
Haskell HyperText Processing

161 (C1), 176 (J1), Integrated Programming Laboratory 12th - 19th October 2012

Aims
- To write a program from a given specification of a problem.
- To provide further experience with recursive list processing in Haskell.
- To gain experience of more of Haskell's built-in list processing functions.
- To provide an introduction to web pages and HTML.

Introduction
The World-Wide Web is a universal, standard medium for information interchange. A web browser's job is to display annotated text files called HTML files (HTML stands for HyperText Mark-up Language, which defines the annotation rules). The mark-ups in an HTML file are annotations called HTML tags, placed around selected text. The tags give meaning to the elements they contain, and the browser can understand this meaning and use it. Most commonly, the browser will use the tags to alter the visual appearance and layout of the contents of the tag. For example, the tags <title> and </title> tell the browser the title of the page, and it may then display the intervening text in the title bar at the top of the page; <h1> and </h1> tell the browser that the intervening text is a header, and the browser may then display it in a larger font.

Using a suitable combination of HTML tags, any text document can be marked up so that it can be displayed by any browser on any machine that is connected to the internet. It is quite common to store information in HTML format so that it can be easily shipped to a browser on request. However, it is also common for HTML files to be generated dynamically by assembling information from other sources, for example template files or databases.

This exercise requires you to generate an HTML file dynamically by assembling information from two sources:

1. A template HTML file that describes the basic structure of a web page for displaying project proposals.
2. An information file that provides details of specific projects from a specific proposer.

Your job is to combine the two files, replacing special keywords in the template file with specific data read from the information file.

You don't actually need to know anything about HTML to complete this exercise, but you might like to read up on some HTML basics as you go along; this is a good opportunity to learn more. You also don't need to know anything about text files except that they contain sequences of characters. The contents of a file can therefore be thought of as synonymous with a Haskell String.

To help you with this exercise, a skeleton file GenerateHTML.hs, a template file template.html, an info file pooh.info and a sample output file pooh.html are provided in your repository.

To see what is required, examine the file template.html. Although it is a template file, it comprises valid HTML which can be rendered by any browser. For example, you can tell Firefox to show it by passing the template file as an argument to the command firefox at the Linux prompt:

    firefox template.html &

The & tells your shell to run firefox in the background.

Now examine the file pooh.info in a text editor. It contains information on the project proposals by a certain member of staff called Winnie the Pooh.
You will notice that this file defines the values of the :NAME, :EMAIL, :WWW and :BLURB keywords. The text that follows each keyword is the value associated with the keyword. Additionally, there is a :PROJECT and a :DESC keyword for each of Pooh's proposals; the former is followed by the project name and the latter by the project description.

The file pooh.html is the result of combining the template with the info file. The :NAME, :EMAIL, :WWW and :BLURB keywords have been replaced by their values, and the :PROJECTLIST keyword has been replaced by a list of proposals, each comprising the title and description as specified by the various :PROJECT and :DESC keywords in pooh.info. Again, the HTML file might not be clear from looking at it in a text editor, but you can ask Firefox to render it, using:

    firefox pooh.html &

You will see that it describes a neatly formatted list of Pooh's proposals. If you complete the exercise, your program should be capable of generating the content of the pooh.html file from the content of the pooh.info and template.html files.

Preamble
As per the previous exercise, you will use the git version control system to get the repository with the skeleton files for this exercise and its (incomplete) test suite. You can get your repository with the following command (remember to replace the two instances of login with your login):

    > git clone ssh://login@labranch.doc.ic.ac.uk:10022/lab/login/1213/html

To save you typing, the skeleton file GenerateHTML.hs contains the definitions shown below.

    type FileContents = String
    type Keyword      = String
    type KeywordValue = [ String ]
    type KeywordDefs  = [ ( Keyword, KeywordValue ) ]

Additionally, a top-level function main is provided for performing the file input/output. Haskell I/O is beyond the scope of this course, but suffice it to say that readFile delivers the contents of a file as a string and writeFile does the opposite. The main function will look for the arguments passed to it on the command line, which should be the names of the template, information and output files. Once you have everything working, you should be able to generate your own output HTML file using:

    runghc GenerateHTML template.html pooh.info output.html

To help you process the info file, the function breakAt is provided. You will need to have a clear understanding of this function before you write the rest of the program.
    breakAt ["Here","is","a",":Keyword","and","some","text"] keywords
        (the first word, "Here", is x; the remaining words are xs)

    Invoke breakAt recursively on xs:
        (["is","a"], [":Keyword","and","some","text"])
          before      after

    Reattach x to before:
        ("Here":["is","a"], [":Keyword","and","some","text"])

    Equivalently:
        (["Here","is","a"], [":Keyword","and","some","text"])

Figure 1: An example application of breakAt

Figure 1 shows how the function works when applied to a tokenised form of the text "Here is a :Keyword and some text" and an argument keywords that is assumed to contain at least the string ":Keyword". Notice that the recursive call generates almost the answer we want, but is missing the first word ("Here"). To complete the solution, this word needs to be added back to the front of the list of words preceding ":Keyword" in the result of the recursive call. This involves pattern matching on the returned tuple.

    breakAt :: [ String ] -> [ Keyword ] -> ( [ String ], [ String ] )
    breakAt [] keywords = ( [], [] )
    breakAt ( x : xs ) keywords
      | elem x keywords = ( [], x : xs )
      | otherwise       = ( x : before, after )
      where
        ( before, after ) = breakAt xs keywords

For example:

    *GenerateHTML> breakAt ["one", ":keyA", "two"] [":keyA", ":keyB"]
    (["one"],[":keyA","two"])
    *GenerateHTML> breakAt ["one", "two"] []
    (["one","two"],[])
    *GenerateHTML> breakAt ["one", "two"] [":keyA", ":keyB"]
    (["one","two"],[])

To save you typing, we have also provided the following constant, which lists all the valid keywords with the exception of :PROJECTLIST:

    keywords :: [ Keyword ]
    keywords = [ ":NAME", ":EMAIL", ":WWW", ":BLURB", ":PROJECT", ":DESC" ]
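As a way of checking your understanding of breakAt, it may help to compare it with the Prelude function break, which splits a list at the first element satisfying a predicate. The sketch below repeats breakAt as given and adds an alternative, breakAt', built on break. The name breakAt' is hypothetical and is not part of the skeleton file; it is shown only for comparison.

```haskell
type Keyword = String

-- breakAt as given in the skeleton file.
breakAt :: [ String ] -> [ Keyword ] -> ( [ String ], [ String ] )
breakAt [] keywords = ( [], [] )
breakAt ( x : xs ) keywords
  | elem x keywords = ( [], x : xs )
  | otherwise       = ( x : before, after )
  where
    ( before, after ) = breakAt xs keywords

-- Hypothetical alternative (not in the skeleton): split the word list at the
-- first word that is a member of the keyword list, using the Prelude's break.
breakAt' :: [ String ] -> [ Keyword ] -> ( [ String ], [ String ] )
breakAt' ws keywords = break ( `elem` keywords ) ws
```

Loaded into GHCi, both functions should give the same results on the three examples above. The explicit recursion in breakAt is still worth studying, since the same pattern (recurse, then pattern match on the returned tuple) is useful elsewhere in the exercise.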

The Built-in Functions words and unwords


Two really useful built-in functions are words :: String -> [ String ] and unwords :: [ String ] -> String.

The function words takes a string comprising words separated by whitespace and returns the list of words, for example:

    Prelude> words "Winnie the Pooh is round\n\n \n and cuddly."
    ["Winnie","the","Pooh","is","round","and","cuddly."]

Notice that the whitespace here comprises spaces and newlines ("\n"). The function unwords does approximately the opposite: given a list of words (Strings) it concatenates the words together, inserting exactly one space between them, for example:

    Prelude> unwords ["Winnie","the","Pooh","is","round","and","cuddly."]
    "Winnie the Pooh is round and cuddly."

You'll find this useful when generating your HTML output. Note that unwords is similar to the more general concat function, which joins together the elements of any list of lists (no spaces, though!).
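The following small sketch illustrates why unwords is only approximately the opposite of words, and how it differs from concat. The names normaliseSpaces and joinWithoutSpaces are hypothetical helpers introduced purely for illustration; they are not part of the skeleton file.

```haskell
-- Hypothetical helper: splitting with words and rejoining with unwords
-- collapses every run of whitespace (spaces, newlines) into a single space,
-- so the original layout of the string is not recovered.
normaliseSpaces :: String -> String
normaliseSpaces = unwords . words

-- Hypothetical helper: concat joins the words with nothing in between.
joinWithoutSpaces :: [ String ] -> String
joinWithoutSpaces = concat
```

For instance, normaliseSpaces applied to the string from the words example gives "Winnie the Pooh is round and cuddly.", whereas joinWithoutSpaces ["Winnie","the","Pooh"] gives "WinniethePooh".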

The Problem
Complete the Haskell module GenerateHTML, which contains definitions of the following functions. Each is described more fully below.

- A function lookUp that, given a search string and a list of string/item pairs, returns the list of all items associated with that string in the given list of pairs.
- A function getKeywordDefs that takes a list of strings representing the words in an information file, and returns the list of keyword definitions in that file in the form of a list of ( Keyword, KeywordValue ) pairs.
- A function buildProjects that extracts the list of project titles (":PROJECT") and project descriptions (":DESC") from a given list of keyword definitions, and returns the HTML string representing the complete set of projects.
- A function buildHtml that copies the words in a given template file, replacing the keywords appropriately according to a given information file.

What To Do
Work in GenerateHTML.hs. Don't forget that you can extend the given test suite (Tests.hs), and you should regularly add, commit, and push your work using git as you complete functions.

Write a Haskell function lookUp :: String -> [ ( String, a ) ] -> [ a ] that, given a search string and a list of string/item pairs, returns the list of items whose associated string matches the search string. For example,

    *GenerateHTML> lookUp "A" [("A",8),("B",9),("C",5),("A",7)]
    [8,7]

Hint: try doing this with a list comprehension, as it's much easier that way.

Define a function getKeywordDefs :: [ String ] -> KeywordDefs that takes the contents of an information file in the form of a list of words (strings), and returns a list of keyword/value pairs. The value associated with a keyword is the list of words that lie between that keyword and the next one. For example,

    *GenerateHTML> getKeywordDefs [":NAME","Fred","Bloggs",":BLURB","Blah blah"]
    [(":NAME",["Fred","Bloggs"]),(":BLURB",["Blah blah"])]

Define a function buildProjects :: KeywordDefs -> String that, given a list of keyword/value pairs, returns a string representing the marked-up descriptions of all project proposals. Each proposal should be surrounded by <li> and </li> tags, with the title nested within <h4> and </h4> tags, and the description between <p> and </p> tags. For example (ignoring the breaks in the output),

    *GenerateHTML> buildProjects ( getKeywordDefs [":NAME", "Tigger",
        ":PROJECT", "Bounce", ":DESC", "Springier", "tail?",
        ":PROJECT", "Movie", ":DESC", "Find", "family."] )
    "<li><h4>Bounce</h4><p>Springier tail?</p></li><li>
    <h4>Movie</h4><p>Find family.</p></li>"

Define a function buildHtml :: FileContents -> FileContents -> FileContents that takes the contents of a template file and an information file and combines them, using the above functions, to build a string representing a valid HTML output file. Note that this function is called by the top-level main function that has been defined for you. You can use main to test your buildHtml function, but you will need to inspect the output file in order to do this. Alternatively, you can test it directly by passing in smaller test strings. For example (again, ignoring line breaks),

    *GenerateHTML> buildHtml "<body> <h1> Name: :NAME </h1> <p> :BLURB </p> </body>"
        ":NAME Piglet :BLURB Piglet is a Very Small Animal."
    "<body> <h1> Name: Piglet </h1> <p> Piglet is a Very Small Animal. </p> </body>"

Hint: You might find it useful to define a helper function to replace the words of the template file one by one. Recall that ":PROJECTLIST" needs to be replaced by the HTML for the list of proposals (buildProjects); the other keywords need to be replaced with their associated values (from the KeywordDefs); all other words need to be left unmodified.
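The four functions described above could be sketched as follows. This is one possible approach under stated assumptions, not a model answer: it assumes that the nth :PROJECT entry belongs with the nth :DESC entry, that any words appearing before the first keyword in the info file can be ignored, and that it is acceptable for the template's original line breaks to be collapsed to single spaces. The local helper replaceWord is a hypothetical name; the types, keywords and breakAt are repeated from the skeleton so that the block stands alone.

```haskell
type FileContents = String
type Keyword      = String
type KeywordValue = [ String ]
type KeywordDefs  = [ ( Keyword, KeywordValue ) ]

keywords :: [ Keyword ]
keywords = [ ":NAME", ":EMAIL", ":WWW", ":BLURB", ":PROJECT", ":DESC" ]

-- breakAt as given in the skeleton file.
breakAt :: [ String ] -> [ Keyword ] -> ( [ String ], [ String ] )
breakAt [] ks = ( [], [] )
breakAt ( x : xs ) ks
  | elem x ks = ( [], x : xs )
  | otherwise = ( x : before, after )
  where
    ( before, after ) = breakAt xs ks

-- Collect every item whose key equals the search string (list comprehension,
-- as the hint suggests).
lookUp :: String -> [ ( String, a ) ] -> [ a ]
lookUp key pairs = [ item | ( k, item ) <- pairs, k == key ]

-- Pair each keyword with the words up to (not including) the next keyword.
getKeywordDefs :: [ String ] -> KeywordDefs
getKeywordDefs [] = []
getKeywordDefs ( w : ws )
  | elem w keywords = ( w, value ) : getKeywordDefs rest
  | otherwise       = getKeywordDefs ws  -- assumption: skip stray leading words
  where
    ( value, rest ) = breakAt ws keywords

-- Build one <li> element per project.
-- Assumption: titles and descriptions appear in matching order.
buildProjects :: KeywordDefs -> String
buildProjects defs = concat [ item t d | ( t, d ) <- zip titles descs ]
  where
    titles   = lookUp ":PROJECT" defs
    descs    = lookUp ":DESC" defs
    item t d = "<li><h4>" ++ unwords t ++ "</h4><p>" ++ unwords d ++ "</p></li>"

-- Replace the words of the template one by one.
buildHtml :: FileContents -> FileContents -> FileContents
buildHtml template info = unwords ( map replaceWord ( words template ) )
  where
    defs = getKeywordDefs ( words info )
    replaceWord w
      | w == ":PROJECTLIST" = buildProjects defs
      | elem w keywords     = unwords ( concat ( lookUp w defs ) )
      | otherwise           = w
```

Because the template is split with words and rejoined with unwords, the output is a single line with one space between words. That agrees with the examples above, where line breaks are ignored, but a version that preserves the template's layout would need a different way of splitting the text.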

Submission
As with the previous exercise, you will need to use git to add, commit, and push your work back to the lab server. Then, using the lab webpages at https://www.doc.ic.ac.uk/~tora/firstyear/lab/, get the CATe token for your work on the Haskell HTML exercise, and submit that token to CATe.

Assessment
F - E: Very little to no attempt made. Submissions that fail to compile cannot score above an E.
D - C: Implementations of most functions attempted; solutions may not be correct, or may not have a good style.
B - A: Implementations of all functions up to buildHtml attempted, and solutions are mostly correct. Code style is generally good.
A+: There are no obvious deficiencies in the solution or the student's coding style. In addition, there is evidence of productive testing.
A*: As for an A+, and the student has done additional work beyond the basic spec, e.g. by considering (and clearly commenting) interesting variations or alternatives to the given functions. Or they have done their own research into the theme of the spec and additionally presented extra functions.
