SGML, HTML, and XML documents are parsed by the predicate
load_structure/4, which has many options. For
convenience, a number of commonly used shorthands are provided
to parse SGML, XML, HTML, and XHTML documents
respectively.
- load_sgml_structure(+Source, -Content, -Warn)
- load_xml_structure(+Source, -Content, -Warn)
- load_html_structure(+Source, -Content, -Warn)
- load_xhtml_structure(+Source, -Content, -Warn)
The parameters of these predicates have the same meaning as those in load_structure/4, and are described below.
The above predicates (in fact, just load_xml_structure/3 and
load_html_structure/3) are the most commonly used predicates of the
sgml package. The other predicates described in this section are
needed only for advanced uses of the package.
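As a minimal sketch of how one of these shorthands might be called, the fragment below parses a small XML document held in an atom. The predicate name parse_note/2 and the sample document are hypothetical, and the import directive assumes that the package exports its predicates from a module named sgml; adjust it to the way the package is loaded on your system.

```prolog
% Assumption: the sgml package exports load_xml_structure/3 from module sgml.
:- import load_xml_structure/3 from sgml.

% parse_note(-Content, -Warn)
% Parses a tiny in-memory XML document (hypothetical sample data).
% Content is bound to the parsed document and Warn to the list of
% warnings, which may be empty.
parse_note(Content, Warn) :-
    load_xml_structure(string('<note id="n1">hello</note>'), Content, Warn).
```

Calling parse_note(Content, Warn) should bind Content to a list of the terms described below for load_structure/4.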
-
- load_structure(+Source, -Content, +Options, -Warn)
-
Source can have one of the following forms:
url(url), file(file name), or
string('document as a Prolog atom').
The parsed document is returned in Content.
Warn is bound to a (possibly empty) list of warnings generated
during the parsing process.
Options is a list of parameters that control parsing, which are
described later.
The list Content can have the following members:
-
- A Prolog atom
-
Atoms are used to represent character strings, i.e., CDATA.
- element(Name, Attributes, Content)
-
Name is the name of the element tag. Since SGML is
case-insensitive, all element names are returned as lowercase atoms.
Attributes is a list of pairs of the form Name=Value, where Name is the name of an attribute and
Value is its value. Values of type CDATA are represented
as atoms. The values of multi-valued attributes (NAMES,
etc.) are represented as lists of atoms. Handling of the
attributes of types NUMBER and NUMBERS depends on the
setting of the number(+NumberMode) option of set_sgml_parser/2 or load_structure/4 (see later). By
default the values of such attributes
are represented as atoms, but the number(...) option can also
specify that these values must be converted to
Prolog integers.
Content is a list that represents
the content for the element.
- entity(Code)
-
If a character entity (e.g., &#913;) is encountered that
cannot be represented in the Prolog character set, this term is
returned. It represents the code of the encountered character (e.g.,
entity(913)).
- entity(Name)
-
This is a special case of entity(Code), intended to handle
special symbols by their name rather than character code.
If a named entity refers to a single character that cannot be
represented in the Prolog character set,
this term is returned. For example, if the content of an element is
&Alpha; &lt; &Beta; then it is represented as follows:
[ entity('Alpha'), ' < ', entity('Beta') ]
Note that entity names are case sensitive in both SGML and XML.
- sdata(Text)
-
If an entity with declared content-type SDATA is encountered, this
term is used. The data of the entity instantiates Text.
- ndata(Text)
-
If an entity with declared content-type NDATA is encountered, this
term is used. The data instantiates Text.
- pi(Text)
-
If a processing instruction is encountered (<?...?>), Text holds the text of the processing instruction. Please note that the
<?xml ...?> instruction is ignored and is not treated as a
processing instruction.
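The Content structure described above can be walked with ordinary list recursion. The following sketch collects the CDATA atoms of a parsed document in document order and skips entity/1, sdata/1, ndata/1, and pi/1 terms. The predicate names content_text/2, content_text/3, and node_text/3 are hypothetical helpers and are not part of the sgml package.

```prolog
% content_text(+Content, -Atoms)
% Atoms is the list of CDATA atoms found in Content, in document order.
content_text(Content, Atoms) :-
    content_text(Content, Atoms, []).

% content_text(+Content, -Atoms0, ?Atoms)
% Difference-list version: Atoms0 is the collected atoms followed by Atoms.
content_text([], Atoms, Atoms).
content_text([Node|Rest], Atoms0, Atoms) :-
    node_text(Node, Atoms0, Atoms1),
    content_text(Rest, Atoms1, Atoms).

% An element contributes the text of its own content list.
node_text(element(_Name, _Attributes, Children), Atoms0, Atoms) :- !,
    content_text(Children, Atoms0, Atoms).
% A plain atom is CDATA and is kept.
node_text(Node, Atoms0, Atoms) :-
    atom(Node), !,
    Atoms0 = [Node|Atoms].
% entity/1, sdata/1, ndata/1, and pi/1 terms are skipped.
node_text(_Other, Atoms, Atoms).
```

For example, the query content_text([element(p, [], ['Hello ', element(b, [], [world]), entity(913)])], T) binds T to ['Hello ', world].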
The Options parameter is a list that controls parsing. Members of
that list can have the following forms:
- dtd(?DTD)
-
Reference to a DTD object. If specified, the <!DOCTYPE ...>
declaration supplied with the document
is ignored and the document is parsed and validated against
the provided DTD. If the DTD argument is a variable, then
the variable gets bound to
the DTD object created from the DTD supplied with the document.
- dialect(+Dialect)
-
Specify the parsing dialect. The supported dialects are
sgml (default),
xml
and xmlns.
- space(+SpaceMode)
-
Sets the space handling mode for the initial environment. This mode is
inherited by the other environments, which can override the inherited
value using the XML reserved attribute xml:space. See
Section 9.3.2 for details.
- number(+NumberMode)
-
Determines how attributes of type NUMBER and NUMBERS are
handled. If token is specified (the default) they are passed as
an atom. If integer is specified the parser attempts to convert
the value to an integer. If conversion is successful, the attribute is
represented as a Prolog integer. Otherwise the value is represented as
an atom. Note that SGML defines a numeric attribute to be a sequence
of digits. The - (minus) sign is not allowed and 1 is
different from 01. For this reason the default is to handle
numeric attributes as tokens. If conversion to integer is enabled,
negative values are silently accepted and the minus sign is ignored.
- defaults(+Bool)
-
Determines how default and fixed attributes from the DTD are used. By
default, defaults are included in the output if they do not appear in
the source. If false, only the attributes occurring in the source
are emitted.
- file(+Name)
-
Sets the name of the input file for error reporting.
This is useful if the input is a stream that is not coming from
a file. In this case, errors and warnings will not have the file name
in them, and this option allows one to force inclusion of a file name
in such messages.
- line(+Line)
-
Sets the starting line-number for reporting errors. For instance, if
line(10) is specified and an error is found at line X then the
error message will say that the error occurred at line X+10.
This option is used when the input stream does not start with the first
line of a file.
- max_errors(+Max)
-
Sets the maximum number of errors. The default is 50. If this number
is reached, the following exception is raised:
error(limit_exceeded(max_errors, Max), _)
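The options above can be combined in a single call. The sketch below parses a source as XML with a lowered error limit and catches the limit_exceeded exception shown above. The predicate name parse_xml_source/3 and the too_many_errors/1 marker are hypothetical, and the import directive assumes that load_structure/4 is exported from a module named sgml.

```prolog
% Assumption: the sgml package exports load_structure/4 from module sgml.
:- import load_structure/4 from sgml.

% parse_xml_source(+Source, -Content, -Warn)
% Source is url(...), file(...), or string(...), as for load_structure/4.
parse_xml_source(Source, Content, Warn) :-
    Options = [ dialect(xml),      % parse as XML instead of the SGML default
                number(integer),   % convert NUMBER attributes to integers when possible
                defaults(false),   % emit only attributes that occur in the source
                max_errors(10)     % give up after 10 errors instead of 50
              ],
    catch(load_structure(Source, Content, Options, Warn),
          error(limit_exceeded(max_errors, Max), _),
          % Hypothetical recovery: return no content and a marker term.
          ( Content = [], Warn = [too_many_errors(Max)] )).
```

Whether to recover or to let the exception propagate depends on the application; the handler here only illustrates the shape of the error term.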
Terrance Swift
2007-10-06