
9.3.5 Low-level Parsing Primitives

The following primitives are used only for more complex types of parsing, which might not be covered by the load_structure/4 predicate.

new_sgml_parser(-Parser, +Options, -Warn)

Creates a new parser. Warn is the list of warnings generated. A parser can be used one or more times to parse documents or parts thereof. It may be bound to a DTD, or the DTD may be left implicit. In the latter case, the DTD is created from the document prologue or, if the prologue does not contain one, parsing is performed without a DTD. The Options list can contain the following parameters:

dtd(?DTD)

If DTD is bound to a DTD object, this DTD is used for parsing the document and the document's prologue is ignored. If DTD is a variable, it is bound to a newly created DTD. This DTD may be created from the document prologue or built implicitly from the document's content.

free_sgml_parser(+Parser, -Warn)

Destroys all resources related to the parser. This does not destroy the DTD if the parser was created using the dtd(DTD) option. Warn is the list of warnings generated during parsing (can be empty).
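
The creation and release of a parser described above can be sketched as follows. This is only an illustration: the import line assumes the predicates are exported from a module named sgml, and with_shared_dtd/2 is a hypothetical helper name, not part of the package.

```prolog
% Assumption: the package exports these predicates from module sgml.
:- import new_sgml_parser/3, free_sgml_parser/2 from sgml.

% with_shared_dtd(-DTD, -Warnings): hypothetical helper.
% Passing an unbound variable in dtd/1 asks the parser to create a DTD
% and bind DTD to it. According to the description of free_sgml_parser/2,
% freeing the parser does not destroy a DTD obtained this way.
with_shared_dtd(DTD, [CreateWarn, FreeWarn]) :-
    new_sgml_parser(Parser, [dtd(DTD)], CreateWarn),
    free_sgml_parser(Parser, FreeWarn).
```

The two warning lists are returned separately rather than concatenated, so the sketch does not depend on any list library.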

set_sgml_parser(+Parser, +Option, -Warn)

Sets attributes of the parser. Warn is the list of warnings generated. Options is a list that can contain the following members:

file(File)

Sets the file for reporting errors and warnings, and resets the line number to 1.

line(Line)

Sets the starting line for error reporting. Useful for generating correct line numbers when the stream is not positioned at the start of the (file) object. This option has the same meaning as in the load_structure/4 predicate.

charpos(Offset)

Sets the starting character location. See also the file(File) option. Used when the stream does not start from the beginning of a document.

dialect(Dialect)

Set the markup dialect. Known dialects:

sgml

The default dialect. This implies that markup is case-insensitive and that standard SGML abbreviations are allowed (abbreviated attributes and omitted tags).

xml

This dialect is selected automatically if the processing instruction <?xml ...> is encountered.

xmlns

Process file as XML file with namespace support.

qualify_attributes(Boolean)

Specifies how to handle unqualified attributes (i.e., without an explicit namespace) in XML namespace (xmlns) dialect. By default, such attributes are not qualified with namespace prefixes. If true, such attributes are qualified with the namespace of the element they appear in.

space(SpaceMode)

Defines the initial handling of white-space in PCDATA. This attribute is described in Section 9.3.2.

number(NumberMode)

If token is specified (the default), attributes of type number are represented as Prolog atoms. If integer is specified, such attributes are translated into Prolog integers. If the conversion fails (e.g., due to an overflow), a warning is issued and the value is represented as an atom.

doctype(Element)

Defines the top-level element of the document. If a <!DOCTYPE ...> declaration has been parsed, this declaration is used. If there is no DOCTYPE declaration, the parser can be instructed to use the element given in doctype(_) as the top-level element. This feature is useful when parsing part of a document (see the parse option to sgml_parse/3).
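
As a rough illustration of the attributes listed above, the sketch below configures a parser for a fragment that starts part-way through a file. It assumes, following the +Option argument in the signature, that each call passes a single option term; if the implementation expects a list, the option terms would need to be wrapped accordingly. The import line (module sgml) and the helper name configure_fragment_parser/4 are likewise assumptions.

```prolog
% Assumption: set_sgml_parser/3 is exported from module sgml.
:- import set_sgml_parser/3 from sgml.

% configure_fragment_parser(+Parser, +File, +Line, -Warnings): hypothetical helper.
% file/1 is set first because it resets the line number to 1;
% line/1 then moves the reported position to where the fragment starts.
configure_fragment_parser(Parser, File, Line, [W1, W2, W3, W4]) :-
    set_sgml_parser(Parser, file(File), W1),
    set_sgml_parser(Parser, line(Line), W2),
    set_sgml_parser(Parser, dialect(xml), W3),
    % Assumption: one option term per call. number(integer) requests
    % Prolog integers for attributes of type number.
    set_sgml_parser(Parser, number(integer), W4).
```

Setting file/1 before line/1 matters here: in the opposite order, the file option would reset the line number that had just been set.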

sgml_parse(+Parser, +Options, -Warn)

Parse an XML file. The parser can operate in two input and two output modes. Output is a structured term as described with load_structure/4.

Warn is the list of warnings generated. A full description of Options is given below.

document(-Term)

A variable that will be unified with a list describing the content of the document (see load_structure/4).

source(+Source)

Source can have one of the following forms: url(url), file(fileName), string('document as a Prolog atom'). This option must be given.

content_length(+Characters)

Stops parsing after the given number of Characters. This option is useful for parsing input embedded in envelopes, such as HTTP envelopes.

parse(Unit)

Defines how much of the input is parsed. This option is used to parse only parts of a file.

file

Default. Parse everything up to the end of the input.

element

The parser stops after reading the first element. Using source(Stream), this implies reading is stopped as soon as the element is complete, and another call may be issued on the same stream to read the next element.

declaration

This may be used to stop the parser after reading the first declaration. This is useful if we want to parse only the doctype declaration.

max_errors(+MaxErrors)

Sets the maximum number of errors. If this number is exceeded, further writes to the stream will yield an I/O error exception. Printing of errors is suppressed after reaching this value. The default is 100.

syntax_errors(+ErrorMode)

Defines how syntax errors are handled.

quiet

Suppress all messages.

print

Default. Print messages.
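
Putting the primitives of this section together, a complete parse of a file might look like the sketch below. It is a minimal illustration under stated assumptions: the predicates are assumed to be exported from a module named sgml, an empty option list is assumed to be acceptable to new_sgml_parser/3, and parse_file_quietly/3 is a hypothetical helper name.

```prolog
% Assumption: the package exports these predicates from module sgml.
:- import new_sgml_parser/3, sgml_parse/3, free_sgml_parser/2 from sgml.

% parse_file_quietly(+FileName, -Content, -Warnings): hypothetical helper.
% Parses the whole file without printing syntax-error messages and
% unifies Content with the structured term describing the document.
parse_file_quietly(FileName, Content, [NewWarn, ParseWarn, FreeWarn]) :-
    new_sgml_parser(Parser, [], NewWarn),   % assumption: [] is accepted
    sgml_parse(Parser,
               [ source(file(FileName)),    % mandatory option
                 document(Content),
                 parse(file),               % the default, shown for clarity
                 syntax_errors(quiet)
               ],
               ParseWarn),
    free_sgml_parser(Parser, FreeWarn).
```

A call such as parse_file_quietly('doc.xml', Content, Warnings), where doc.xml is a placeholder file name, would return the document content together with the three warning lists. The sketch does not release the parser if sgml_parse/3 fails or raises an error; production code would need to handle that case.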


Terrance Swift 2007-10-06