The sgml package accepts input in the form of files, URLs and Prolog atoms. To load the sgml parser, type
?- [sgml].
at the prompt. If test.html is a file with the following contents
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Demo</title>
</head>
<body>
<h1 align=center>This is a demo</h1>
<p>Paragraphs in HTML need not be closed.
<p>This is called `omitted-tag' handling.
</body>
</html>
then the following call
?- load_html_structure(file('test.html'), Term, Warn).
will parse the document and bind Term to the following Prolog term:
[ element(html,
          [],
          [ element(head,
                    [],
                    [ element(title,
                              [],
                              [ 'Demo'
                              ])
                    ]),
            element(body,
                    [],
                    [ '\n',
                      element(h1,
                              [ align = center
                              ],
                              [ 'This is a demo'
                              ]),
                      '\n\n',
                      element(p,
                              [],
                              [ 'Paragraphs in HTML need not be closed.\n'
                              ]),
                      element(p,
                              [],
                              [ 'This is called `omitted-tag\' handling.'
                              ])
                    ])
          ])
].
The parsed document is converted into a list of Prolog terms of the form element(Name, Attributes, Content). Each term corresponds to one element of the document. Name is the name of the element. Attributes is a list of Name = Value pairs, one for each attribute of the element. Content is a list of child elements and CDATA (character data, which appears as atoms in the examples here). For instance, the XML fragment
<aaa>fooo<bbb>foo1</bbb></aaa>
will be parsed as
element(aaa,[],[fooo, element(bbb,[],[foo1])])
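Because the result is an ordinary Prolog term, it can be inspected with plain unification and recursion. The following is a minimal sketch, not part of the sgml package: the predicate names find_element/3 and attribute_value/3 are hypothetical helpers introduced here for illustration, and the code assumes only the element(Name, Attributes, Content) shape described above.

```prolog
% Hypothetical helpers; none of these names are part of the sgml package.

% find_element(+Name, +Content, -Element)
% Enumerates, on backtracking, every element called Name that occurs
% anywhere in the content list Content (parents before their children).
find_element(Name, [Node|_], Element) :-
    find_in_node(Name, Node, Element).
find_element(Name, [_|Rest], Element) :-
    find_element(Name, Rest, Element).

% The node itself matches, or the match is somewhere inside its content.
% CDATA atoms match neither clause, so they are skipped.
find_in_node(Name, element(Name, Attrs, Content), element(Name, Attrs, Content)).
find_in_node(Name, element(_, _, Content), Element) :-
    find_element(Name, Content, Element).

% attribute_value(+Attributes, +Name, -Value)
% Looks up a Name = Value pair in an attribute list.
attribute_value([Name = Value|_], Name, Value).
attribute_value([_|Rest], Name, Value) :-
    attribute_value(Rest, Name, Value).
```

With these definitions, the query ?- find_element(bbb, [element(aaa,[],[fooo, element(bbb,[],[foo1])])], E). binds E to element(bbb,[],[foo1]), and ?- attribute_value([align = center], align, V). binds V to center. The helpers are written without library predicates so that they do not depend on any particular module being imported.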
Entities (e.g. &lt;) are returned as part of CDATA, unless they cannot be represented. See load_sgml_structure/3 for details.
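Since character data sits in the content lists next to child elements, recovering the text of a document means walking the tree and keeping only the atoms. The sketch below shows one way to do that. The names content_text/2, content_text/3 and node_text/3 are hypothetical and not part of the sgml package, and the code assumes that CDATA is represented as atoms, as in the examples in this section. Any content item that is neither an element/3 term nor an atom is skipped.

```prolog
% Hypothetical helpers; none of these names are part of the sgml package.

% content_text(+Content, -Texts)
% Texts is the list of CDATA atoms found in Content, in document order.
content_text(Content, Texts) :-
    content_text(Content, Texts, []).

% content_text(+Content, -Texts, ?Tail)
% Difference-list version: Texts is the collected atoms followed by Tail.
content_text([], Texts, Texts).
content_text([Node|Rest], Texts, Tail) :-
    node_text(Node, Texts, Mid),
    content_text(Rest, Mid, Tail).

% An element contributes the text of its own content list.
node_text(element(_Name, _Attrs, Content), Texts, Tail) :- !,
    content_text(Content, Texts, Tail).
% An atom is character data and is kept as is.
node_text(Atom, [Atom|Tail], Tail) :-
    atom(Atom), !.
% Anything else is ignored.
node_text(_Other, Tail, Tail).
```

For the small example above, ?- content_text([element(aaa,[],[fooo, element(bbb,[],[foo1])])], T). binds T to [fooo, foo1]. The difference list avoids repeated appends, so the walk stays linear in the size of the parsed term.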