Exchanging Complex Data Types

Next: High Level Foreign Predicate Up: Passing Data between XSB Previous: Examples of Using the Contents Index

Exchanging Complex Data Types

The advanced XSB/Prolog interface uses only one data type: prolog_term. A Prolog term (as the name suggests) can be bound to any XSB term. On the C side, the type of the term can be checked and then processed accordingly. For instance, if the term turns out to be a structure, then it can be decomposed and the functor can be extracted along with the arguments. If the term happens to be a list, then it can be processed in a loop and each list member can be further decomposed into its atomic components. The advanced interface also provides functions to check the types of these atomic components and for converting them into C types.

As with the basic C interface, the file emu/cinterf.h must be included in the C program in order to make the prototypes of the relevant functions known to the C compiler.

The first set of functions is typically used to check the type of Prolog terms passed into the C program.

xsbBool is_attv((prolog_term) T): is_attv(T) returns TRUE if T represents an XSB attributed variable, and FALSE otherwise.
xsbBool is_float((prolog_term) T): is_float(T) returns TRUE if T represents an XSB float value, and FALSE otherwise.
xsbBool is_functor((prolog_term) T): is_functor(T) returns TRUE if T represents an XSB structure value (not a list), and FALSE otherwise.
xsbBool is_int((prolog_term) T): is_int(T) returns TRUE if T represents an XSB integer value, and FALSE otherwise.
xsbBool is_list((prolog_term) T): is_list(T) returns TRUE if T represents an XSB list value (not nil), and FALSE otherwise.
xsbBool is_nil((prolog_term) T): is_nil(T) returns TRUE if T represents an XSB [] (nil) value, and FALSE otherwise.
xsbBool is_string((prolog_term) T): is_string(T) returns TRUE if T represents an XSB atom value, and FALSE otherwise.
xsbBool is_var((prolog_term) T): is_var(T) returns TRUE if T represents an XSB variable, and FALSE otherwise.

After checking the types of the arguments passed in from the Prolog side, the next task usually is to convert Prolog data into the types understood by C. This is done with the following functions. The first three convert between the basic types. The last two extract the functor name and the arity. Extraction of the components of a list and the arguments of a structured term is explained later.

int p2c_int((prolog_term) V): The prolog_term argument must represent an integer, and p2c_int returns the value of that integer.
double p2c_float((prolog_term) V): The prolog_term argument must represent a floating point number, and p2c_float returns the value of that floating point number.
char *p2c_string((prolog_term) V): The prolog_term argument must represent an atom, and p2c_string returns the name of that atom as a string. The pointer returned points to the actual atom name in XSB's space, and thus it must NOT be modified by the calling program.
char *p2c_functor((prolog_term) V): The prolog_term argument must represent a structured term (not a list). p2c_functor returns the name of the main functor symbol of that term as a string. The pointer returned points to the actual functor name in XSB's space, and thus it must NOT be modified by the calling program.
int p2c_arity((prolog_term) V): The prolog_term argument must represent a structured term (not a list). p2c_arity returns the arity of the main functor symbol of that term as an integer.

The next batch of functions support conversion of data in the opposite direction: from basic C types to the type prolog_term. These c2p_* functions all return a boolean value TRUE if successful and FALSE if unsuccessful. The XSB term argument must always contain an XSB variable, which will be bound to the indicated value as a side effect of the function call.

xsbBool c2p_int((int) N, (prolog_term) V): c2p_int binds the prolog_term V (which must be a variable) to the integer value N.
xsbBool c2p_float((double) F, (prolog_term) V): c2p_float binds the prolog_term V (which must be a variable) to the (double) float value F.
xsbBool c2p_string((char *) S, (prolog_term) V): c2p_string binds the prolog_term V (which must be a variable) to the atom whose name is the value of S, which must be of type char *.

The following functions create Prolog data structures within a C program. This is usually done in order to pass these structures back to the Prolog side.

xsbBool c2p_functor((char *) S, (int) N, (prolog_term) V): c2p_functor binds the prolog_term V (which must be a variable) to an open term whose main functor symbol is given by S (of type char *) and whose arity is N. An open term is one with all arguments as new distinct variables.
xsbBool c2p_list((prolog_term) V): c2p_list binds the prolog_term V (which must be a variable) to an open list term, i.e., a list term with both car and cdr as new distinct variables. Note: to create an empty list use the function c2p_nil described below.
xsbBool c2p_nil((prolog_term) V): c2p_nil binds the prolog_term V (which must be a variable) to the atom [] (nil).
prolog_term p2p_new(): Create a new Prolog variable. This is sometimes needed when you want to create a Prolog term on the C side and pass it to the Prolog side.

To use the above functions, one must be able to get access to the components of the structured Prolog terms. This is done with the help of the following functions:

prolog_term p2p_arg((prolog_term) T, (int) A): Argument T must be a prolog_term that is a structured term (but not a list). A is a positive integer (no larger than the arity of the term) that specifies an argument position of the term T. p2p_arg returns the A^th subfield of the term T.
prolog_term p2p_car((prolog_term) T): Argument T must be a prolog_term that is a list (not nil). p2p_car returns the car (i.e., head of the list) of the term T.
prolog_term p2p_cdr((prolog_term) T): Argument T must be a prolog_term that is a list (not nil). p2p_car returns the cdr (i.e., tail of the list) of the term T.

It is very important to realize that these functions return the actual Prolog term that is, say, the head of a list or the actual argument of a structured term. Thus, assigning a value to such a prolog term also modifies the head of the corresponding list or the relevant argument of the structured term. It is precisely this feature that allows passing structured terms and lists from the C side to the Prolog side. For instance,

   prolog_term plist,        /* a Prolog list           */
               structure;    /* something like f(a,b,c) */
   prolog_term tail, arg;
   ..........
   tail = p2p_cdr(plist);         /* get the list tail  */
   arg  = p2p_arg(structure, 2);  /* get the second arg */

   /* Assume that the list tail was supposed to be a prolog variable */
   if (is_var(tail))
      c2p_nil(tail);  /* terminate the list */
   else {
      fprintf(stderr, "Something wrong with the list tail!");
      exit(1);
   }
   /* Assume that the argument was supposed to be a prolog variable */
   c2p_string("abcdef", arg);

In the above program fragment, we assume that both the tail of the list and the second argument of the term were supposed to be bound to Prolog variables. In case of the tail, we check if this is, indeed, the case. In case of the argument, no checks are done; XSB will issue an error (which might be hard to track down) if the second argument is not currently bound to a variable.

The last batch of functions is useful for passing data in and out of the Prolog side of XSB. The first function is the only way to get a prolog_term out of the Prolog side; the second function is sometimes needed in order to pass complex structures from C into Prolog.

prolog_term reg_term((int) R): Argument R is an argument number of the Prolog predicate implemented by this C function (range 1 to 255). The function reg_term returns the prolog_term in that predicate argument.
xsbBool p2p_unify(prolog_term T1, prolog_term T2): Unify the two Prolog terms. This is useful when an argument of the Prolog predicate (implemented in C) is a structured term or a list, which acts both as input and output parameter.

For instance, consider the Prolog call test(X, f(Z)), which is implemented by a C function with the following fragment:

    prolog_term newterm, newvar, z_var, arg2;
    .....
    /* process argument 1 */
    c2p_functor("func",1,reg_term(1));
    c2p_string("str",p2p_arg(reg_term(1),1));
    /* process argument 2 */
    arg2 = reg_term(2);
    z_var = p2p_arg(arg2, 1);  /* get the var Z */
    /* bind newterm to abc(V), where V is a new var */
    c2p_functor("abc", 1, newterm);
    newvar = p2p_arg(newterm, 1);
    newvar = p2p_new();
    ....
    /* return TRUE (success), if unify; FALSE (failure) otherwise */
    return p2p_unify(z_var, newterm);

On exit, the variable X will be bound to the term func(str). Processing argument 2 is more interesting. Here, argument 2 is used both for input and output. If test is called as above, then on exit Z will be bound to abc(_h123), where _h123 is some new Prolog variable. But if the call is test(X,f(1)) or test(X,f(Z,V)) then this call will fail (fail as in Prolog, i.e., it is not an error), because the term passed back, abc(_h123), does not unify with f(1) or f(Z,V). This effect is achieved by the use of p2p_unify above.

We conclude with two real examples of functions that pass complex data in and out of the Prolog side of XSB. These functions are part of the Posix regular expression matching package of XSB. The first function uses argument 2 to accept a list of complex prolog terms from the Prolog side and does the processing on the C side. The second function does the opposite: it constructs a list of complex Prolog terms on the C side and passes it over to the Prolog side in argument 5.

/* XSB string substitution entry point: replace substrings specified in Arg2
   with strings in Arg3.
   In: 
       Arg1: string
       Arg2: substring specification, a list [s(B1,E1),s(B2,E2),...]
       Arg3: list of replacement string
   Out:
       Arg4: new (output) string
   Always succeeds, unless error.
*/
int do_regsubstitute__(void)
{
  /* Prolog args are first assigned to these, so we could examine the types
     of these objects to determine if we got strings or atoms. */
  prolog_term input_term, output_term;
  prolog_term subst_reg_term, subst_spec_list_term, subst_spec_list_term1;
  prolog_term subst_str_term=(prolog_term)0,
    subst_str_list_term, subst_str_list_term1;
  char *input_string=NULL;    /* string where matches are to be found */
  char *subst_string=NULL;
  prolog_term beg_term, end_term;
  int beg_offset=0, end_offset=0, input_len;
  int last_pos = 0; /* last scanned pos in input string */
  /* the output buffer is made large enough to include the input string and the
     substitution string. */
  char subst_buf[MAXBUFSIZE];
  char *output_ptr;
  int conversion_required=FALSE; /* from C string to Prolog char list */

  input_term = reg_term(1);  /* Arg1: string to find matches in */
  if (is_string(input_term)) /* check it */
    input_string = string_val(input_term);
  else if (is_list(input_term)) {
    input_string =
      p_charlist_to_c_string(input_term, input_buffer, sizeof(input_buffer),
                             "RE_SUBSTITUTE", "input string");
    conversion_required = TRUE;
  } else
    xsb_abort("RE_SUBSTITUTE: Arg 1 (the input string) must be an atom or a character list");

  input_len = strlen(input_string);

  /* arg 2: substring specification */
  subst_spec_list_term = reg_term(2);
  if (!is_list(subst_spec_list_term) && !is_nil(subst_spec_list_term))
    xsb_abort("RE_SUBSTITUTE: Arg 2 must be a list [s(B1,E1),s(B2,E2),...]");

  /* handle substitution string */
  subst_str_list_term = reg_term(3);
  if (! is_list(subst_str_list_term))
    xsb_abort("RE_SUBSTITUTE: Arg 3 must be a list of strings");

  output_term = reg_term(4);
  if (! is_var(output_term))
    xsb_abort("RE_SUBSTITUTE: Arg 4 (the output) must be an unbound variable");

  subst_spec_list_term1 = subst_spec_list_term;
  subst_str_list_term1 = subst_str_list_term;

  if (is_nil(subst_spec_list_term1)) {
    strncpy(output_buffer, input_string, sizeof(output_buffer));
    goto EXIT;
  }
  if (is_nil(subst_str_list_term1))
    xsb_abort("RE_SUBSTITUTE: Arg 3 must not be an empty list");

  /* initialize output buf */
  output_ptr = output_buffer;

  do {
    subst_reg_term = p2p_car(subst_spec_list_term1);
    subst_spec_list_term1 = p2p_cdr(subst_spec_list_term1);

    if (!is_nil(subst_str_list_term1)) {
      subst_str_term = p2p_car(subst_str_list_term1);
      subst_str_list_term1 = p2p_cdr(subst_str_list_term1);

      if (is_string(subst_str_term)) {
        subst_string = string_val(subst_str_term);
      } else if (is_list(subst_str_term)) {
        subst_string =
          p_charlist_to_c_string(subst_str_term, subst_buf, sizeof(subst_buf),
                                 "RE_SUBSTITUTE", "substitution string");
      } else 
        xsb_abort("RE_SUBSTITUTE: Arg 3 must be a list of strings");
    }

    beg_term = p2p_arg(subst_reg_term,1);
    end_term = p2p_arg(subst_reg_term,2);

    if (!is_int(beg_term) || !is_int(end_term))
      xsb_abort("RE_SUBSTITUTE: Non-integer in Arg 2");
    else{
      beg_offset = int_val(beg_term);
      end_offset = int_val(end_term);
    }
    /* -1 means end of string */
    if (end_offset < 0)
      end_offset = input_len;
    if ((end_offset < beg_offset) || (beg_offset < last_pos))
      xsb_abort("RE_SUBSTITUTE: Substitution regions in Arg 2 not sorted");

    /* do the actual replacement */
    strncpy(output_ptr, input_string + last_pos, beg_offset - last_pos);
    output_ptr = output_ptr + beg_offset - last_pos;
    if (sizeof(output_buffer)
        > (output_ptr - output_buffer + strlen(subst_string)))
      strcpy(output_ptr, subst_string);
    else
      xsb_abort("RE_SUBSTITUTE: Substitution result size %d > maximum %d",
                beg_offset + strlen(subst_string),
                sizeof(output_buffer));
    
    last_pos = end_offset;
    output_ptr = output_ptr + strlen(subst_string);

  } while (!is_nil(subst_spec_list_term1));

  if (sizeof(output_buffer) > (output_ptr-output_buffer+input_len-end_offset))
    strcat(output_ptr, input_string+end_offset);

 EXIT:
  /* get result out */
  if (conversion_required)
    c_string_to_p_charlist(output_buffer,output_term,"RE_SUBSTITUTE","Arg 4");
  else
    /* DO NOT intern. When atom table garbage collection is in place, then
       replace the instruction with this:
                  c2p_string(output_buffer, output_term);
       The reason for not interning is that in Web page
       manipulation it is often necessary to process the same string many
       times. This can cause atom table overflow. Not interning allows us to
       circumvent the problem.  */
    ctop_string(4, output_buffer);
  
  return(TRUE);
}


/* XSB regular expression matcher entry point
   In:
       Arg1: regexp
       Arg2: string
       Arg3: offset
       Arg4: ignorecase
   Out:
       Arg5: list of the form [match(bo0,eo0), match(bo1,eo1),...]
             where bo*,eo* specify the beginning and ending offsets of the
             matched substrings.
             All matched substrings are returned. Parenthesized expressions are
             ignored.
*/
int do_bulkmatch__(void)
{
  prolog_term listHead, listTail;
  /* Prolog args are first assigned to these, so we could examine the types
     of these objects to determine if we got strings or atoms. */
  prolog_term regexp_term, input_term, offset_term;
  prolog_term output_term = p2p_new();
  char *regexp_ptr=NULL;      /* regular expression ptr               */
  char *input_string=NULL;    /* string where matches are to be found */
  int ignorecase=FALSE;
  int return_code, paren_number, offset;
  regmatch_t *match_array;
  int last_pos=0, input_len;
  char regexp_buffer[MAXBUFSIZE];

  if (first_call)
    initialize_regexp_tbl();

  regexp_term = reg_term(1);  /* Arg1: regexp */
  if (is_string(regexp_term)) /* check it */
    regexp_ptr = string_val(regexp_term);
  else if (is_list(regexp_term))
    regexp_ptr =
      p_charlist_to_c_string(regexp_term, regexp_buffer, sizeof(regexp_buffer),
                             "RE_MATCH", "regular expression");
  else
    xsb_abort("RE_MATCH: Arg 1 (the regular expression) must be an atom or a character list");

  input_term = reg_term(2);  /* Arg2: string to find matches in */
  if (is_string(input_term)) /* check it */
    input_string = string_val(input_term);
  else if (is_list(input_term)) {
    input_string =
      p_charlist_to_c_string(input_term, input_buffer, sizeof(input_buffer),
                             "RE_MATCH", "input string");
  } else
    xsb_abort("RE_MATCH: Arg 2 (the input string) must be an atom or a character list");

  input_len = strlen(input_string);
  
  offset_term = reg_term(3); /* arg3: offset within the string */
  if (! is_int(offset_term))
    xsb_abort("RE_MATCH: Arg 3 (the offset) must be an integer");
  offset = int_val(offset_term);
  if (offset < 0 || offset > input_len)
    xsb_abort("RE_MATCH: Arg 3 (=%d) must be between 0 and %d", input_len);

  /* If arg 4 is bound to anything, then consider this as ignore case flag */
  if (! is_var(reg_term(4)))
    ignorecase = TRUE;

  last_pos = offset;
  /* returned result */
  listTail = output_term;
  while (last_pos < input_len) {
    c2p_list(listTail); /* make it into a list */
    listHead = p2p_car(listTail); /* get head of the list */

    return_code = xsb_re_match(regexp_ptr, input_string+last_pos, ignorecase,
                               &match_array, &paren_number);
    /* exit on no match */
    if (! return_code) break;

    /* bind i-th match to listHead as match(beg,end) */
    c2p_functor("match", 2, listHead);
    c2p_int(match_array[0].rm_so+last_pos, p2p_arg(listHead,1));
    c2p_int(match_array[0].rm_eo+last_pos, p2p_arg(listHead,2));

    listTail = p2p_cdr(listTail);
    last_pos = match_array[0].rm_eo+last_pos;
  }
  c2p_nil(listTail); /* bind tail to nil */
  return p2p_unify(output_term, reg_term(5));
}

Next: High Level Foreign Predicate Up: Passing Data between XSB Previous: Examples of Using the Contents Index

Luis Fernando P. de Castro 2003-06-27