The following primitives are used only for more complex types of parsing,
which might not be covered by the load_structure/4 predicate.
-
- new_sgml_parser(-Parser, +Options, -Warn)
-
Creates a new parser. Warn is the list of warnings
generated. A parser can be used once or several times to parse
documents or parts thereof. It may be bound to a DTD, or the DTD may be
left implicit. In the latter case the DTD is created from the document
prologue or, if the prologue does not define one, parsing is performed
without a DTD.
The Options list can contain the following parameters:
- dtd(?DTD)
-
If DTD is bound to a DTD object, this DTD is used for parsing
the document and the document's prologue is ignored. If DTD is a
variable, the variable is bound to a newly created DTD. This DTD may
be created from the document prologue or built implicitly from the
document's content.
- free_sgml_parser(+Parser, -Warn)
-
Destroy all resources related to the parser. This does not
destroy the DTD if the parser was created using the dtd(DTD)
option. Warn is the list of warnings generated during parsing (can
be empty).
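The create, parse and free lifecycle described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the package's own code: it assumes the primitives are imported from a module named sgml and that new_sgml_parser/3 accepts an empty option list. parse_atom/2 is a hypothetical helper name. sgml_parse/3 and its options are described later in this section.

```prolog
% Assumption: the parser primitives are exported by a module named sgml.
:- import new_sgml_parser/3, sgml_parse/3, free_sgml_parser/2 from sgml.

% parse_atom(+Atom, -Content)
% Hypothetical helper: parse a document held in a Prolog atom and
% release the parser afterwards. The warning lists are ignored here.
parse_atom(Atom, Content) :-
    new_sgml_parser(Parser, [], _NewWarn),      % DTD left implicit
    sgml_parse(Parser,
               [source(string(Atom)), document(Content)],
               _ParseWarn),
    free_sgml_parser(Parser, _FreeWarn).
```

A query such as `?- parse_atom('<a><b>text</b></a>', Content).` would then bind Content to the structured term for the document. Note that this sketch does not free the parser if sgml_parse/3 fails or raises an exception; production code would need to handle that case.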
- set_sgml_parser(+Parser, +Option, -Warn)
-
Sets attributes of the parser. Warn is the list of
warnings generated. Options is a list that can contain the
following members:
- file(File)
-
Sets the file for reporting errors and warnings. Sets the line number to 1.
- line(Line)
-
Sets the starting line for error reporting. Useful for generating
correct line numbers when the stream is not positioned at the start of
the (file) object. This option has the same meaning as in the
load_structure/4 predicate.
- charpos(Offset)
-
Sets the starting character location. See also the file(File)
option. Used when the stream does not start from the beginning of a
document.
- dialect(Dialect)
-
Set the markup dialect. Known dialects:
-
- sgml
-
The default dialect. This implies that markup is case-insensitive and
that standard SGML abbreviations are allowed (abbreviated attributes and
omitted tags).
- xml
-
This dialect is selected automatically if the processing instruction
<?xml ...> is encountered.
- xmlns
-
Process the file as an XML file with namespace support.
- qualify_attributes(Boolean)
-
Specifies how to handle unqualified attributes (i.e., attributes without
an explicit namespace) in the XML namespace (xmlns) dialect. By default,
such attributes are not qualified with namespace prefixes.
If true, such attributes are qualified
with the namespace of the element they appear in.
- space(SpaceMode)
-
Define the initial handling of white-space in PCDATA. This attribute is
described in Section 9.3.2.
- number(NumberMode)
-
If token is specified (the default), attributes of type number are represented as Prolog atoms.
If integer is specified, such attributes are translated into Prolog integers. If
the conversion fails (e.g., due to an overflow), a warning is issued and the
value is represented as an atom.
- doctype(Element)
-
Defines the top-level element of the document. If a <!DOCTYPE ...>
declaration has been parsed, that declaration is used. If there is no
DOCTYPE declaration, the parser uses the element given in
doctype(_) as the top-level element. This feature is
useful when parsing part of a document (see the parse option to
sgml_parse/3).
- sgml_parse(+Parser, +Options, -Warn)
-
Parses an SGML or XML document. The parser can operate in two input and
two output modes. Output is a structured term as described with load_structure/4.
Warn is the list of warnings generated. The members of
Options are described below.
- document(+Term)
-
A variable that will be unified with a list describing the content of
the document (see load_structure/4).
- source(+Source)
-
Source can have one of the following forms:
url(url), file(fileName),
string('document as a Prolog atom').
This option must be given.
- content_length(+Characters)
-
Stop parsing after the given number of
Characters. This option is useful for parsing
input embedded in envelopes, such as HTTP envelopes.
- parse(Unit)
-
Defines how much of the input is parsed. This option is used to parse
only parts of a file.
-
- file
-
Default. Parse everything up to the end of the input.
- element
-
The parser stops after reading the first element. Using source(Stream), this implies reading is stopped as soon as the
element is complete, and another call may be issued on the same stream
to read the next element.
- declaration
-
This may be used to stop the parser after reading the first
declaration. This is useful if we want to parse only the doctype
declaration.
- max_errors(+MaxErrors)
-
Sets the maximum number of errors. If this number is exceeded, further
writes to the stream will yield an I/O error exception. Printing of
errors is suppressed after reaching this value. The default is 100.
- syntax_errors(+ErrorMode)
-
Defines how syntax errors are handled.
- quiet
-
Suppress all messages.
- print
-
Default. Print messages.
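As an illustration of how the sgml_parse/3 options above combine, the following sketch parses a whole file while suppressing messages and capping the error count. It is a sketch under assumptions rather than a definitive recipe: the module name sgml in the import and the empty option list for new_sgml_parser/3 are assumptions, load_quietly/3 is a hypothetical helper name, and the limit of 10 is arbitrary.

```prolog
% Assumption: the parser primitives are exported by a module named sgml.
:- import new_sgml_parser/3, sgml_parse/3, free_sgml_parser/2 from sgml.

% load_quietly(+File, -Content, -Warn)
% Hypothetical helper: parse all of File, return the document term in
% Content and the warnings collected by sgml_parse/3 in Warn.
load_quietly(File, Content, Warn) :-
    new_sgml_parser(Parser, [], _NewWarn),
    sgml_parse(Parser,
               [ source(file(File)),      % required: where to read from
                 document(Content),       % unified with the parsed content
                 parse(file),             % the default: read to end of input
                 max_errors(10),          % arbitrary limit for this sketch
                 syntax_errors(quiet)     % suppress printed messages
               ],
               Warn),
    free_sgml_parser(Parser, _FreeWarn).
```

Because syntax_errors(quiet) suppresses printed messages, a caller of such a helper would normally inspect Warn to learn what went wrong.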
Terrance Swift
2007-10-06