
Extracting the matches.

The predicate re_match/5 returns only the offsets of the matches. How can we actually get the matched substrings? This is done with the help of the predicate re_substring/4:
    re_substring(+String, +BeginOffset, +EndOffset, -Result).

This predicate works exactly like substring/4 described in Section 1.6, except that the resulting substring is not interned (if it is an atom). The only thing you can do with this string is to convert it immediately, either into a list of character codes (using atom_codes/2) or into a true atom (using intern_string/2, which must be imported from module machine).
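As a minimal sketch of this rule, the two helper predicates below take a match(Begin,End) term of the kind produced by re_match/5 and convert the uninterned result at once. The names match_atom/3 and match_codes/3 are hypothetical and not part of XSB. The sketch also assumes that re_substring/4 can be imported from module regmatch, and that the package has already been loaded.

```prolog
%% Sketch only: match_atom/3 and match_codes/3 are hypothetical helpers.
%% Assumption: re_substring/4 is exported by module regmatch.
:- import re_substring/4 from regmatch.
:- import intern_string/2 from machine.

%% match_atom(+Str, +Match, -Atom)
%% Match is a match(Begin,End) term as returned by re_match/5.
%% The uninterned substring is interned immediately, so Atom is a
%% true atom that can be used in further computations.
match_atom(Str, match(Begin,End), Atom) :-
    re_substring(Str, Begin, End, Uninterned),
    intern_string(Uninterned, Atom).

%% match_codes(+Str, +Match, -Codes)
%% Same as above, but the result is copied into a list of character
%% codes. This adds nothing to the atom table.
match_codes(Str, match(Begin,End), Codes) :-
    re_substring(Str, Begin, End, Uninterned),
    atom_codes(Uninterned, Codes).
```

For instance, with the string used in the example of this section, match_atom('abbbcd\\bbo', match(1,4), A) should bind A to bbb, and match_codes/3 with the same arguments should return [98,98,98].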

The reason for these complications is to allow the user to control the size of the atom table. At present, XSB does not have atom table garbage collection, so heavy use of string manipulation functions can result in atom table overflow. This danger is particularly severe when XSB is used for processing HTML pages. This predicate will become an alias for substring/4 once atom garbage collection is added to XSB.

On the other hand, converting strings into lists (without interning them first) is safe, because lists are garbage-collected in XSB Version 2.0.

Here is a complete example that shows matching followed by a subsequent extraction of the matches:

| ?- import intern_string/2 from machine.

| ?- Str = 'abbbcd\\bbo',
      re_match("a(b*)cd\\\\",Str,0,_,[match(X,Y), match(V,W)|L]),
      re_substring(Str,X,Y,UninternedMatch),
      intern_string(UninternedMatch,Match),
      re_substring(Str,V,W,UninternedParen1),
      atom_codes(UninternedParen1,Paren1).

Str = abbbcd\bbo
X = 0
Y = 7
V = 1
W = 4
L = []
UninternedMatch = bbb
Match = abbbcd\
UninternedParen1 = bbb
Paren1 = [98,98,98]

Note that the strings UninternedMatch and UninternedParen1 cannot be used by themselves. In the first case, we converted the string into a Prolog atom, and in the second case into a list of character codes. The resulting objects (Match and Paren1) can be used in further computations.

Observe that XSB reports that UninternedMatch and UninternedParen1 are both equal to the string "bbb", while Match (the atom obtained from UninternedMatch) is different. This is because UninternedMatch and UninternedParen1 are uninterned and both occupy the same physical space. Thus, the second call to re_substring/4 overwrites the value stored in this location by the first call.
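One way to avoid this pitfall when extracting several matches is to convert each uninterned string before re_substring/4 is called again. The sketch below defines a hypothetical helper, all_match_codes/3 (not an XSB built-in), that walks the list of match(Begin,End) terms returned by re_match/5 and builds a list of character-code lists. It uses atom_codes/2 rather than intern_string/2 so that, in line with the remark on garbage collection above, the atom table does not grow. As before, it assumes re_substring/4 can be imported from module regmatch.

```prolog
%% Sketch only: all_match_codes/3 is a hypothetical helper.
%% Assumption: re_substring/4 is exported by module regmatch.
:- import re_substring/4 from regmatch.

%% all_match_codes(+Str, +Matches, -CodeLists)
%% Matches is a list of match(Begin,End) terms from re_match/5.
%% Each uninterned substring is copied into a code list before the
%% next re_substring/4 call can overwrite the shared buffer.
all_match_codes(_Str, [], []).
all_match_codes(Str, [match(Begin,End)|Matches], [Codes|Rest]) :-
    re_substring(Str, Begin, End, Uninterned),
    atom_codes(Uninterned, Codes),   % copy out of the shared buffer now
    all_match_codes(Str, Matches, Rest).
```

With the string and pattern of the example above, the query

    | ?- Str = 'abbbcd\\bbo',
          re_match("a(b*)cd\\\\",Str,0,_,Ms),
          all_match_codes(Str,Ms,Codes).

should, given the offsets match(0,7) and match(1,4) shown earlier, bind Codes to [[97,98,98,98,99,100,92],[98,98,98]]. To obtain atoms instead, replace atom_codes/2 with intern_string/2, at the cost of one atom table entry per match.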


Luis Fernando P. de Castro 2003-06-27