String Manipulation

XSB has a number of powerful builtins that simplify string manipulation. These builtins are especially powerful when combined with the pattern-matching facilities provided by the regmatch package described in Chapter 7.

str_sub(+Sub, +Str, ?Pos)

string

Succeeds if Sub is a substring of Str. In that case, Pos unifies with the position where the match occurred. Positions start from 0. str_sub/2 is also available, which is equivalent to having _ in the third argument of str_sub/3.
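As a small illustration of the zero-based positions, here is a minimal sketch. It assumes the predicate is imported from the string module, as the module label under the signature suggests. The helper has_prefix/2 is hypothetical and not part of XSB.

```prolog
:- import str_sub/3 from string.

% Per the description above, cd starts at offset 2 in abcdef
% (positions count from 0):
% ?- str_sub(cd, abcdef, Pos).      % Pos = 2

% has_prefix/2 is a hypothetical helper, not an XSB builtin:
% Sub is a prefix of Str when it matches at position 0.
has_prefix(Sub, Str) :- str_sub(Sub, Str, 0).
```

A query such as has_prefix(ab, abcdef) should then succeed, while has_prefix(cd, abcdef) should fail.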

str_match(+Sub, +Str, +Direction, ?Beg, ?End)

string

This is an enhanced version of the previous predicate. Direction can be forward or reverse (or any abbreviation of these). If forward, the predicate finds the first match of Sub from the beginning of Str. If reverse, it finds the first match from the end of the string (i.e., the last match of Sub from the beginning of Str). Beg and End must be integers or unbound variables. (It is possible for one to be bound and the other not.) Beg unifies with the offset of the first character where Sub matched, and End unifies with the offset of the character immediately to the right of the match (such a character might not exist, but the offset is still defined). Offsets start from 0.

Both Beg and End can be bound to negative integers. In this case, the value represents the offset from the second character past the end of Str. Thus -1 represents the character next to the end of Str and can be used to check where the end of Sub matches in Str. In the following examples

    ?- str_match(Sub,Str,forw,X,-1).
    ?- str_match(Sub,Str,rev,X,-1).
    ?- str_match(Sub,Str,forw,0,X).

the first checks if the first match of Sub from the beginning of Str is a suffix of Str (because End represents the character next to the last character in Sub, so End=-1 means that the last characters of Sub and of Str occupy the same position). If so, X is bound to the offset (from the end of Str) of the first character of Sub. The second example checks if the last match of Sub in Str is a suffix of Str and binds X to the offset of the beginning of that match (counted from the beginning of Str). The last example checks if the first match of Sub is a prefix of Str. If so, X is bound to the offset (from the beginning of Str) of the last character of Sub.
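The patterns above can be made concrete with a short sketch. It assumes str_match/5 is imported from the string module, and the expected bindings in the comments follow from the offset rules stated above rather than from a recorded session. The helpers is_suffix/2 and is_prefix/2 are hypothetical names, not XSB builtins.

```prolog
:- import str_match/5 from string.

% On Str = abcab, Sub = ab occurs at offsets 0 and 3:
% ?- str_match(ab, abcab, forward, B, E).   % first match: B = 0, E = 2
% ?- str_match(ab, abcab, reverse, B, E).   % last match:  B = 3, E = 5

% Hypothetical helper: Sub is a suffix of Str if its last match
% ends exactly at the end of Str (End = -1).
is_suffix(Sub, Str) :- str_match(Sub, Str, reverse, _, -1).

% Hypothetical helper: Sub is a prefix of Str if its first match
% begins at offset 0.
is_prefix(Sub, Str) :- str_match(Sub, Str, forward, 0, _).
```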

str_cat(+Str1, +Str2, ?Result)

string

Concatenates Str1 with Str2. Unifies the result with Result.

In addition to this, the predicate fmt_write_string/3 described in Section 1.5 can be used to concatenate strings and do much more. However, for simple string concatenation, str_cat/3 is more efficient.

str_length(+Str, ?Result)

string

Unifies Result with the length of Str.
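A minimal sketch of these two predicates used together, assuming both are imported from the string module as the labels above indicate. The helper cat_with_length/4 is a hypothetical name introduced only for illustration.

```prolog
:- import str_cat/3, str_length/2 from string.

% ?- str_cat(abc, def, R).       % R = abcdef
% ?- str_length(abcdef, N).      % N = 6

% Hypothetical helper (not an XSB builtin): concatenate two strings
% and report the length of the result.
cat_with_length(Str1, Str2, Result, Len) :-
    str_cat(Str1, Str2, Result),
    str_length(Result, Len).
```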

substring(+String, +BeginOffset, +EndOffset, -Result)

string

String can be an atom or a list of characters, and the offsets must be integers. If EndOffset is negative, endof(String)+EndOffset+1 is assumed. Thus, -1 means the end of the string. If BeginOffset is less than 0, then 0 is assumed; if it is greater than the length of the string, then the end of the string is assumed. If EndOffset is non-negative but less than BeginOffset, then the empty string is returned.

Offsets start from 0.

The result returned in the fourth argument is a string if String is an atom, or a list of characters if String is a list of characters.

The substring/4 predicate always succeeds (unless there is an error, such as wrong argument type).

Here are some examples:

| ?- substring('abcdefg', 3, 5, L).

L = de

| ?- substring("abcdefg", 4, -1, L).

L = [101,102,103]

(i.e., L = efg represented using ASCII codes).

string_substitute(+InpStr, +SubstrList, +SubstitutionList, -OutStr)

string

InpStr can be an atom or a list of characters. SubstrList must be a list of terms of the form s(BegOffset, EndOffset), where the name of the functor is immaterial. The meaning of the offsets is the same as for substring/4. (In particular, negative offsets represent offsets from the first character past the end of the input string.) Each such term specifies a substring (between BegOffset and EndOffset; negative EndOffset stands for the end of string) to be replaced. SubstitutionList must be a list of atoms or character lists.

Offsets start from 0, as in C/Java.

This predicate replaces the substrings specified in SubstrList with the corresponding strings from SubstitutionList. The result is returned in OutStr. OutStr is a list of characters if InpStr is a list of characters; otherwise, it is an atom.

If SubstitutionList is shorter than SubstrList, then the last string in SubstitutionList is used for substituting the extra substrings specified in SubstrList. As a special case, this makes it possible to replace all specified substrings with a single string.

As in the case of re_substring/4, if OutStr is an atom, it is not interned. The user should either intern this string or convert it into a list, as explained previously.

The string_substitute/4 predicate always succeeds.

Here are some examples:

| ?- string_substitute('qaddf', [s(2,4)], ['123'] ,L).

L = qa123f

| ?- string_substitute('qaddf', [s(2,-1)], ['123'] ,L).

L = qa123

| ?- string_substitute("abcdefg", [s(4,-1)], ["123"],L).

L = [97,98,99,100,49,50,51]

| ?- string_substitute('1234567890123', [f(1,5),f(5,7),f(9,-2)], ["pppp", lll],X).

X = 1pppplll89lll

| ?- string_substitute('1234567890123', [f(1,5),f(6,7),f(9,-2)], ['---'],X).

X = 1---6---89---

concat_atom(+AtomList,?Atom)

string

AtomList must be a list containing atoms, integers and/or floats. This predicate concatenates the elements of the list into a single atom, returned in Atom. Integers and floats are converted to character strings using number_codes/2.

concat_atom(+AtomList,+Sep,?Atom)

string

AtomList must be a list containing atoms, integers and/or floats, and Sep must be an atom. This predicate concatenates the elements of the list into a single atom, separating each from the next by Sep, and returns the resulting atom in Atom. Integers and floats are converted to character strings using number_codes/2.
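The two forms of concat_atom can be sketched as follows, assuming both are imported from the string module. The helper version_atom/4 is a hypothetical name used only for illustration. The example sticks to atoms and integers, since the exact text produced for floats depends on number_codes/2.

```prolog
:- import concat_atom/2, concat_atom/3 from string.

% ?- concat_atom([file, 42, '.txt'], A).       % A is the atom file42.txt
% ?- concat_atom([usr, local, lib], '/', A).   % A is the atom usr/local/lib

% Hypothetical helper (not an XSB builtin): build a dotted version
% atom such as '3.0.1' from three integers.
version_atom(Major, Minor, Patch, Atom) :-
    concat_atom([Major, Minor, Patch], '.', Atom).
```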

term_to_atom(+Term,-Atom)

string

This predicate converts an arbitrary Prolog term Term into an atom, putting the result in Atom. It uses a format similar to the canonical format of write_canonical, but uses the standard list format for lists. An atom created from a term using this predicate can be converted back to the original term by using atom_to_term/2.

term_to_codes(+Term,-CodeList)

string

This predicate is used in the definition of term_to_atom/2 and converts a term into a list of ASCII codes.

atom_to_term(+Atom,-Term)

string

This predicate converts an atom (Atom) whose characters make up a valid term into that term, placing the result in Term. The accepted syntax is intended to be valid canonical form (with no trailing '.'), extended by a treatment of the usual list syntax. It should be the inverse of term_to_atom/2. Floating point numbers are not completely handled; only a fixed point representation is used. If the atom is not a syntactically valid term, the predicate fails quietly.
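The inverse relationship with term_to_atom/2 can be sketched with a round trip. This assumes both predicates are imported from the string module. The helper round_trip/3 is hypothetical, and the example term is ground and free of floats to stay clear of the floating point limitation noted above.

```prolog
:- import term_to_atom/2, atom_to_term/2 from string.

% Hypothetical helper (not an XSB builtin): convert Term to an atom
% and parse that atom back into a term.
round_trip(Term, Atom, Term2) :-
    term_to_atom(Term, Atom),
    atom_to_term(Atom, Term2).

% ?- round_trip(f(a, [1,2], g(b)), Atom, T).
% T should come back as the original term f(a,[1,2],g(b)).
```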

codes_to_term(+CodeList,-Term)

string

This predicate is used in the definition of atom_to_term/2 and converts a list of ASCII codes making up a valid canonical term into that term. See atom_to_term/2 for details.

read_atom_to_term(+Atom,-Term)

string

This predicate converts an atom Atom, whose characters make up a valid term that can be read by read/1, into the term (Term) it represents. This predicate actually uses XSB's read to process the term, so the operators currently in effect are used. The atom should not contain a terminating period ('.'). If the atom is not a syntactically correct term, then this predicate fails quietly.

read_atom_to_term(+Atom,-Term,-VarList)

string

This predicate is similar to read_atom_to_term/2, but in addition returns in the third argument an (open-tailed) list of vv(VariableName,Variable) pairs associating the variable names with the variables. This is exactly the list returned from file_read/3, so documentation for that predicate gives further details.
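Because the returned list is open-tailed, it should be traversed without binding its tail. The sketch below assumes read_atom_to_term/3 is imported from the string module. The helper var_names/2 is a hypothetical name, not an XSB builtin, and the exact representation of the variable names is left to the file_read/3 documentation.

```prolog
:- import read_atom_to_term/3 from string.

% ?- read_atom_to_term('p(X, Y, X)', Term, Vars).
% Term is a p/3 term whose first and third arguments are the same
% variable; Vars holds one vv(Name, Variable) pair per named variable.

% Hypothetical helper: collect the names from an open-tailed list of
% vv/2 pairs, stopping at the unbound tail instead of instantiating it.
var_names(Vars, []) :- var(Vars), !.
var_names([], []) :- !.
var_names([vv(Name, _)|Rest], [Name|Names]) :-
    var_names(Rest, Names).
```

A plain member/2 call would extend an open-tailed list on backtracking, which is why the helper checks for the unbound tail first.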

Luis Fernando P. de Castro 2003-06-27