# 7.2 Representing HTML Documents in Curry

HTML is a language for specifying the structure and layout of web documents. We also say “HTML document” for a text written in the syntax of HTML. Basically, an HTML document consists of the following elements:

• elementary text

• tags with other HTML elements as contents, like headers (h1, h2,…), lists (ul, ol,…), etc.

• tags without contents, like line breaks (br), images (img), etc.

The plain syntax of HTML, which is interpreted by a web browser when displaying HTML documents, requires that tags be enclosed in angle brackets (<$\cdots$>). The contents of a tag are written between an opening and a closing tag, where the closing tag has the same name as the opening tag but is preceded by a slash. Tags can also contain attributes to attach specific information to tags. If present, attributes are written in the form “$\mathit{name}$=$\mathit{value}$” after the opening tag’s name and before its right bracket.

For instance, “i” and “b” are tags to specify that their contents should be set using an italic and bold font, respectively. Thus, the HTML text

This is the <i>italic</i> and the <b>bold</b> font.

would be displayed by a web browser as this:

This is the italic and the bold font.

Tags without contents have no closing tag. An example is the tag for including images in web documents, where the attribute “src” specifies the file containing the picture and “alt” specifies a text to be displayed as an alternative to the picture:

<img src="picture.jpg" alt="Picture">

A program with a web interface must generate HTML documents that are displayed in the client’s browser. In principle, we can do this in Curry by printing the text of the HTML document directly, as in:

writeHTML = do
  putStrLn "This is the "
  putStrLn "<i>italic</i> and the "
  putStrLn "<b>bold</b> font."

If the program becomes more complex and generates the HTML text with various functions, there is a risk that the generated HTML text is not syntactically correct. For instance, tags with contents must be properly nested, so the following text is not valid HTML (browsers can often display such text, but they may be confused by ill-formed HTML documents):

This is <b>bold and also <i>italic</b></i>.

To avoid such problems in application programs, one can introduce an abstraction layer where HTML documents are modeled as terms of a specific datatype. Thus, a web application program generates such abstract HTML documents instead of the concrete HTML text. This has the advantage that ill-formed web documents correspond to ill-formed expressions in Curry, which would immediately be rejected by the compiler. The actual printing of the concrete HTML text is done by a wrapper function that translates an abstract HTML document into a string.

For representing abstract HTML documents in Curry, we define the following datatype of HTML expressions:

data HtmlExp = HtmlText   String
             | HtmlStruct String [(String,String)] [HtmlExp]

The constructor HtmlText corresponds to elementary text in an HTML document, whereas the constructor HtmlStruct corresponds to HTML elements with a tag and attributes. Its first parameter is the name of the tag, its second parameter of type “[(String,String)]” is the list of attributes, i.e., name/value pairs, and its third parameter is the list of HTML expressions contained in the element.

For instance, our first HTML document above is represented with this datatype as the following list of HTML expressions:

[HtmlText "This is the ",
 HtmlStruct "i" [] [HtmlText "italic"],
 HtmlText " and the ",
 HtmlStruct "b" [] [HtmlText "bold"],
 HtmlText " font."]

Similarly, the image tag above is represented as follows:

HtmlStruct "img" [("src","picture.jpg"),("alt","Picture")] []
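To make the idea of a wrapper function concrete, the following is a minimal sketch of how terms of this datatype could be translated into HTML text. The names renderHtml, renderAttr, and renderAll are hypothetical and chosen for illustration only. The sketch repeats the datatype locally so that it is self-contained, and it simplifies matters by treating every element with an empty contents list as a tag without contents.

```curry
-- Local copy of the datatype from the text, so the sketch needs no library.
data HtmlExp = HtmlText   String
             | HtmlStruct String [(String,String)] [HtmlExp]

-- Hypothetical renderer for a single HTML expression.
-- Simplification: an element with an empty contents list is printed
-- without a closing tag (as for "img" or "br").
renderHtml :: HtmlExp -> String
renderHtml (HtmlText s) = s
renderHtml (HtmlStruct tag attrs hexps) =
  "<" ++ tag ++ concatMap renderAttr attrs ++ ">" ++
  (if null hexps
     then ""
     else concatMap renderHtml hexps ++ "</" ++ tag ++ ">")

-- Hypothetical helper: print one attribute as name="value".
renderAttr :: (String,String) -> String
renderAttr (name,value) = " " ++ name ++ "=\"" ++ value ++ "\""

-- Hypothetical helper: render a list of HTML expressions in sequence.
renderAll :: [HtmlExp] -> String
renderAll hexps = concatMap renderHtml hexps
```

With these definitions, applying renderAll to the list representing the first document yields the text “This is the <i>italic</i> and the <b>bold</b> font.”, and applying renderHtml to the image term yields the img tag shown earlier. Since closing tags are produced by the renderer, improperly nested tags cannot arise.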

Obviously, we can specify any HTML document in this form but this becomes very tedious for a programmer. To avoid this, we define several functions as useful abbreviations of common HTML tags:

h1     hexps  = HtmlStruct "h1" [] hexps                      -- header 1
h2     hexps  = HtmlStruct "h2" [] hexps                      -- header 2
...
bold   hexps  = HtmlStruct "b"  [] hexps                      -- bold font
italic hexps  = HtmlStruct "i"  [] hexps                      -- italic font
hrule         = HtmlStruct "hr" [] []                         -- horizontal rule
breakline     = HtmlStruct "br" [] []                         -- line break
image src alt = HtmlStruct "img" [("src",src),("alt",alt)] [] -- image
...
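Further abbreviations can be defined in the same style. The following sketch shows two of them; the names link and bulletList are hypothetical and only serve as an illustration of the pattern. The datatype is repeated locally (with derived Eq and Show instances, assuming a Curry system with type class support) so that the sketch is self-contained.

```curry
-- Local copy of the datatype from the text, so the sketch needs no library.
data HtmlExp = HtmlText   String
             | HtmlStruct String [(String,String)] [HtmlExp]
  deriving (Eq, Show)

-- Hypothetical abbreviation: a hyperlink to the target `url`.
link :: String -> [HtmlExp] -> HtmlExp
link url hexps = HtmlStruct "a" [("href",url)] hexps

-- Hypothetical abbreviation: an unordered list where each item
-- (a list of HTML expressions) is wrapped into its own "li" element.
bulletList :: [[HtmlExp]] -> HtmlExp
bulletList items =
  HtmlStruct "ul" [] (map (\item -> HtmlStruct "li" [] item) items)
```

For example, bulletList [[HtmlText "one"], [link "two.html" [HtmlText "two"]]] denotes a list with two items, where the second item is a hyperlink.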

Characters that have a special meaning in HTML, like “<”, “>”, “&”, “"”, should be quoted in elementary HTML texts to avoid ill-formed HTML documents. Thus, we define a function “htxt” for writing strings as elementary HTML texts where the special characters are quoted by the function “htmlQuote”:

htxt   :: String -> HtmlExp
htxt s = HtmlText (htmlQuote s)

htmlQuote :: String -> String
htmlQuote [] = []
htmlQuote (c:cs) | c=='<' = "&lt;"   ++ htmlQuote cs
                 | c=='>' = "&gt;"   ++ htmlQuote cs
                 | c=='&' = "&amp;"  ++ htmlQuote cs
                 | c=='"' = "&quot;" ++ htmlQuote cs
                 | otherwise = c : htmlQuote cs
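To see the effect of quoting, the following small sketch applies htmlQuote to two strings containing special characters. It repeats the definition of htmlQuote so that it can be run on its own; the sample strings and the main operation are just illustrations.

```curry
htmlQuote :: String -> String
htmlQuote [] = []
htmlQuote (c:cs)
  | c == '<'  = "&lt;"   ++ htmlQuote cs
  | c == '>'  = "&gt;"   ++ htmlQuote cs
  | c == '&'  = "&amp;"  ++ htmlQuote cs
  | c == '"'  = "&quot;" ++ htmlQuote cs
  | otherwise = c : htmlQuote cs

main :: IO ()
main = do
  putStrLn (htmlQuote "if a < b && b > c")  -- if a &lt; b &amp;&amp; b &gt; c
  putStrLn (htmlQuote "<i>not a tag</i>")   -- &lt;i&gt;not a tag&lt;/i&gt;
```

The second line shows why quoting matters: without it, text that merely looks like a tag would be interpreted as markup by the browser.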

Now we can represent our first HTML document above as follows:

[htxt "This is the ", italic [htxt "italic"],
 htxt " and the ", bold [htxt "bold"], htxt " font."]

All the definitions we have introduced so far are contained in the library “HTML.Base” of the Curry package html. To use this library, one has to add the package html as a dependency with CPM (see Section 5.1) and put the import declaration

import HTML.Base

in the header of the Curry program. The library HTML.Base also defines a wrapper function showHtmlExps to generate the concrete textual representation of an abstract HTML expression. For instance, the value of

showHtmlExps [h1 [htxt "Hello World"], italic [htxt "Hello"], htxt " world!"]

is the string

<h1>
Hello World
</h1>
<i>Hello</i> world!

In order to generate a complete HTML page with header information, the HTML library contains the following definition of HTML pages:

data HtmlPage = HtmlPage String [PageParam] [HtmlExp]

The first argument is the title of the page and the third argument is the contents of the page. The second argument is a list of optional parameters, like the encoding scheme, style sheets, etc. Since they are seldom used in standard pages, the HTML library also contains the following function to specify HTML pages without optional parameters:

page :: String -> [HtmlExp] -> HtmlPage
page title hexps = HtmlPage title [] hexps

Furthermore, the HTML library defines a wrapper function

showHtmlPage :: HtmlPage -> String

to generate the concrete textual representation of a complete HTML page with head and body parts. For instance, the value of

showHtmlPage (page "Hello" [h1 [htxt "Hello World"],
                            italic [htxt "Hello"], htxt " world!"])

is the string

<!DOCTYPE html>
<html lang="en">
<title>
Hello
</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<body>
<h1>
Hello World
</h1>
<i>Hello</i>
world!
</body>
</html>

We can use these functions to write Curry programs that generate HTML documents. For instance, consider the generation of an HTML document that contains a list of all multiplications of digits, i.e., a line in this document should look as follows:

The product of 7 and 6 is 42

First, we define a list of all triples containing such multiplications by the use of list comprehensions (compare Section 4.2.4):

multiplications = [ (x,y,x*y) | x <- [1..10], y <- [1..x] ]
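As a quick check of what this list comprehension produces, the following self-contained sketch prints the first few triples and the total number of triples. The type signature and the main operation are additions for illustration.

```curry
multiplications :: [(Int,Int,Int)]
multiplications = [ (x,y,x*y) | x <- [1..10], y <- [1..x] ]

main :: IO ()
main = do
  print (take 4 multiplications)  -- [(1,1,1),(2,1,2),(2,2,4),(3,1,3)]
  print (length multiplications)  -- 55
```

Since y ranges only from 1 to x, each pair of factors occurs once, with the larger factor first, which gives 1 + 2 + … + 10 = 55 triples.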

Each triple is translated into a list of HTML expressions specifying the layout of a line:

mult2html :: (Int,Int,Int) -> [HtmlExp]
mult2html (x,y,z) =
  [htxt "The product of ", bold [htxt (show x)],
   htxt " and ", bold [htxt (show y)],
   htxt " is ", bold [htxt (show z)], breakline]

Now we can use these definitions to define the complete HTML document (the prelude function concatMap applies a function that maps elements to lists to each element of a list and concatenates the results into a single list):

htmlMultiplications =
  [h1 [htxt "Multiplication of Digits"]] ++ concatMap mult2html multiplications

For instance, we can use the latter function to store the HTML page in a file named “multtable.html” by evaluating the expression:

writeFile "multtable.html"
          (showHtmlPage (page "Multiplication" htmlMultiplications))

###### Exercise 14

Define a function boldItalic to translate text files into HTML documents. The function has two arguments: the name of the input text file and the name of the file where the HTML page should be stored. The HTML document should have the same line structure as the input, but the lines should be formatted alternately in bold and italic, i.e., the first line in bold, the second in italic, the third in bold, the fourth in italic, etc. Hint: use the prelude function lines to split a string into a list of lines.