The parser can operate in two modes, SGML mode and XML mode, selected by the dialect(Dialect) option. HTML is handled as a special case of SGML mode with a particular DTD. Regardless of this option, if the first line of the document reads as below, the parser switches automatically to XML mode.
<?xml ... ?>
Switching to XML mode implies:
The construct <element attribute ... attribute/> is recognized as an empty element.
The entities lt (<), gt (>), amp (&), apos (') and quot (") are predefined.
Underscore (_) and colon (:) are allowed in names.
<!ATTLIST pre xml:space nmtoken #fixed preserve>
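
The XML-mode rules listed above can be observed with any conforming XML parser. The sketch below is only an illustration: it uses Python's standard xml.etree.ElementTree rather than the parser described here, and the sample document and the summarize() helper are hypothetical names introduced for the example. It checks that an empty-element tag yields an element without content, that the five predefined entities expand to their characters, and that names containing an underscore or a colon are accepted.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the leading <?xml ... ?> line marks it as XML.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<doc>'
    '<br class="x"/>'                                  # empty-element syntax
    '<p>&lt; &gt; &amp; &apos; &quot;</p>'             # predefined entities
    '<my_item xml:space="preserve"> kept </my_item>'   # "_" and ":" in names
    '</doc>'
)

# Namespace that ElementTree substitutes for the reserved "xml" prefix.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def summarize(text):
    """Parse text as XML and report how the XML-mode rules show up."""
    root = ET.fromstring(text)
    br = root.find("br")
    item = root.find("my_item")
    return {
        "empty_children": len(br),         # an empty element has no children
        "empty_text": br.text,             # and no text content
        "entities": root.find("p").text,   # entities expanded to characters
        "underscore_tag": item.tag,        # underscore accepted in a name
        "space_attr": item.get(XML_NS + "space"),  # colon accepted in a name
        "item_text": item.text,            # surrounding white space retained
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, repr(value))
```

ElementTree reports the xml:space attribute under the reserved XML namespace, which is why the lookup uses the XML_NS prefix instead of the literal name.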