HTML and XML parsing.

For htmlparse and xmlparse, Result remains an unbound variable if an error occurs; otherwise it is bound to a complex term. In the latter case, the term is a list of the form [elt_1,...,elt_n], where each elt_i has the form:

    elt(tag, [attval(attrname,value),...], [elt_1',...,elt_m'])

The second argument is the list of attribute-value pairs. In HTML, some attributes, such as checked, are binary (they appear without a value); for these, the corresponding value is left unbound. The third argument is the list of HTML or XML elements that lie within the scope of tag. These elements have the same syntax as the parent element: elt(tag',attrs,sub-elements). If a tag has no attributes, or if it has no sub-elements, the corresponding list is empty.

One special tag, pcdata, is introduced to represent pieces of text that appear in the document. This tag is our own creation: neither HTML nor XML uses tags to represent text. An important difference between pcdata and other tags is that the third argument of elt(pcdata,...,...) is not a list of elements but the text itself, given as an atom or as a list of characters. If URL was specified as an atom, then the third argument of the pcdata-element is an atom as well. If URL is a character list, then so is the corresponding argument of the pcdata-element.
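The term structure described above can be traversed with ordinary recursive predicates. The following is a minimal sketch, not part of the libwww package: the predicate names doc_text/2, attr_value/3, has_flag/2, append_lists/3 and sample/1 are hypothetical, and the sample term is hand-written for illustration (in particular, the empty attribute list on pcdata and the use of atoms for attribute values are assumptions, not documented behavior).

```prolog
% doc_text(+Elements, -Texts)
% Collects the third argument of every pcdata element, in document order.
doc_text([], []).
doc_text([elt(pcdata, _, Text)|Rest], [Text|Texts]) :-
    !,                                  % pcdata carries text, not sub-elements
    doc_text(Rest, Texts).
doc_text([elt(_, _, Children)|Rest], Texts) :-
    doc_text(Children, ChildTexts),
    doc_text(Rest, RestTexts),
    append_lists(ChildTexts, RestTexts, Texts).

% attr_value(+Name, +Attrs, -Value)
% Looks up an attribute by name; Value stays unbound for a binary attribute.
attr_value(Name, [attval(Name0, Value0)|_], Value) :-
    Name0 == Name,
    !,
    Value = Value0.
attr_value(Name, [_|Rest], Value) :-
    attr_value(Name, Rest, Value).

% has_flag(+Name, +Attrs)
% Succeeds if the attribute is present and has no value.
has_flag(Name, Attrs) :-
    attr_value(Name, Attrs, Value),
    var(Value).

% Local list concatenation, defined here so that no library import is needed.
append_lists([], Ys, Ys).
append_lists([X|Xs], Ys, [X|Zs]) :-
    append_lists(Xs, Ys, Zs).

% sample(-Result): a hand-written term in the shape described above
% (hypothetical data, not actual parser output).
sample([elt(form, [],
            [elt(input, [attval(type, checkbox), attval(checked, _)], []),
             elt(p, [attval(class, note)],
                 [elt(pcdata, [], 'Hello'),
                  elt(b, [], [elt(pcdata, [], world)])])])]).
```

With these definitions, the query sample(R), doc_text(R, T) binds T to ['Hello', world], and has_flag(checked, Attrs) succeeds for the attribute list of the input element, while has_flag(type, Attrs) fails because type has a value. Testing with var/1 is what distinguishes a binary attribute from an ordinary one.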

Luis Fernando P. de Castro 2003-06-27