Tabling can be used in conjunction with Definite Clause Grammars to get the effect of a more complete parsing strategy. When Prolog is used to evaluate DCG's, the resulting parsing algorithm is ``recursive descent''. Recursive descent parsing, while efficiently implementable, is known to suffer from several deficiencies: 1) its time can be exponential in the size of the input, and 2) it may not terminate for certain context-free grammars (in particular, those that are left or doubly recursive). By appropriate use of tabling, both of these limitations can be overcome. With appropriate tabling, the resulting parsing algorithm is a variant of Earley's algorithm and of chart parsing algorithms.
In the simplest cases, one needs only to add the directive :- auto_table (see Section 3.8.4) to the source file containing a DCG specification. This should generate any necessary table declarations so that infinite loops are avoided (for context-free grammars). That is, with a :- auto_table declaration, left-recursive grammars can be correctly processed. Of course, individual table directives may also be used, but note that the arity must be specified as two more than that shown in the DCG source, to account for the extra arguments added by the expansion. However, the efficiency of tabling for DCGs depends on the representation of the input and output sequences used, a topic to which we now turn.
Consider the expanded DCG rule from the previous section:
p(X, S0, S) :- |
In a Prolog system, each input and output variable, such as S0 or S is bound to a variable or a difference list. In XSB, this is called list mode. Thus, to parse go to lunch stop the phrase would be presented to the DCG rule as a list of tokens [go,to,lunch,stop] via a call to phrase/3 such as:
phrase(p(X),[go,to,lunch,stop]). |
or an explicit call to p/3, such as:
p(X,[go,to,lunch,stop|X],X). |
Terminal elements of the sequence are consumed (or generated) via the predicate 'C'/3 which is defined for Prolog systems as:
'C'([Token|Rest],Token,Rest). |
While such a definition would also work correctly if a DCG rule were tabled, the need to copy sequences into or out of a table can lead to behavior quadratic in the length of the input sequence (See Section 5.2.4). As an alternative, XSB allows a mode of DCGs that defines 'C'/3 as a call to a Datalog predicte word/3 :
'C'(Pos,Token,Next_pos):- word(Pos,Token,Next_pos). |
assuming that each token of the sequence has been asserted as a word/3 fact, e.g:
word(0,go,1). |
The above mode of executing DCGs is called datalog mode.
word/3 facts are asserted via a call to the predicate tphrase_set_string/1. Afterwards, a grammar rule can be called either directly, or via a call to tphrase/1. To parse the list [go,to,lunch,stop] in datalog mode using the predicate p/3 from above, the call
tphrase_set_string([go,to,lunch,stop]) |
would be made, afterwards the sequence could be parsed via the goal:
tphrase(p(X)). |
or
p(X,0,F). |
To summarize, DCGs in list mode have the same syntax as they do in datalog mode: they just use a different definition of 'C'/3. Of course tabled and non-tabled DCGs can use either definition of 'C'/3. Indeed, this property is necessary for tabled DCG predicates to be able to call non-tabled DCG predicates and vice-versa. At the same time,tabled DCG rules may execute faster in datalog mode, while non-tabled DCG rules may execute faster in list mode.
Finally, we note that the mode of DCG parsing is part of XSB's state. XSB's default mode is to use list mode: the mode is set to datalog mode via a call to tphrase_set_string/3 and back to list mode by a call to phrase/2 or by a call to reset_dcg_mode/0.