--- code/trunk/doc/html/pcrecallout.html 2007/02/24 21:40:37 75
+++ code/trunk/doc/html/pcrecallout.html 2011/08/02 11:00:40 654
@@ -17,6 +17,8 @@
  • MISSING CALLOUTS
  • THE CALLOUT INTERFACE
  • RETURN VALUES
+ • AUTHOR
+ • REVISION
    PCRE CALLOUTS

    @@ -35,11 +37,12 @@
    a number less than 256 after the letter C. The default value is zero.
    For example, this pattern has two callout points:

    -  (?C1)\deabc(?C2)def
    +  (?C1)abc(?C2)def
     
    -If the PCRE_AUTO_CALLOUT option bit is set when pcre_compile() is called,
    -PCRE automatically inserts callouts, all with number 255, before each item in
    -the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
    +If the PCRE_AUTO_CALLOUT option bit is set when pcre_compile() or
    +pcre_compile2() is called, PCRE automatically inserts callouts, all with
    +number 255, before each item in the pattern. For example, if PCRE_AUTO_CALLOUT
    +is used with the pattern
       A(\d{2}|--)
     
    @@ -60,7 +63,8 @@
    MISSING CALLOUTS

    You should be aware that, because of optimizations in the way PCRE matches
    -patterns, callouts sometimes do not happen. For example, if the pattern is
    +patterns by default, callouts sometimes do not happen. For example, if the
    +pattern is

       ab(?C4)cd
     
    @@ -69,28 +73,42 @@
    the callout is never reached. However, with "abyd", though the result is still
    no match, the callout is obeyed.

    +

    +If the pattern is studied, PCRE knows the minimum length of a matching string,
    +and will immediately give a "no match" return without actually running a match
    +if the subject is not long enough, or, for unanchored patterns, if it has
    +been scanned far enough.
    +

    +

    +You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE
    +option to pcre_compile(), pcre_exec(), or pcre_dfa_exec(),
    +or by starting the pattern with (*NO_START_OPT). This slows down the matching
    +process, but does ensure that callouts such as the example above are obeyed.
    +


    THE CALLOUT INTERFACE

    During matching, when PCRE reaches a callout point, the external function
    -defined by pcre_callout is called (if it is set). The only argument is a
    -pointer to a pcre_callout block. This structure contains the following
    -fields:
    +defined by pcre_callout is called (if it is set). This applies to both
    +the pcre_exec() and the pcre_dfa_exec() matching functions. The
    +only argument to the callout function is a pointer to a pcre_callout
    +block. This structure contains the following fields:

    -  int          version;
    -  int          callout_number;
    -  int         *offset_vector;
    -  const char  *subject;
    -  int          subject_length;
    -  int          start_match;
    -  int          current_position;
    -  int          capture_top;
    -  int          capture_last;
    -  void        *callout_data;
    -  int          pattern_position;
    -  int          next_item_length;
    +  int         version;
    +  int         callout_number;
    +  int        *offset_vector;
    +  const char *subject;
    +  int         subject_length;
    +  int         start_match;
    +  int         current_position;
    +  int         capture_top;
    +  int         capture_last;
    +  void       *callout_data;
    +  int         pattern_position;
    +  int         next_item_length;
    +  const unsigned char *mark;
     
    The version field is an integer containing the version number of the
    -block format. The initial version was 0; the current version is 1. The version
    +block format. The initial version was 0; the current version is 2. The version
    number will change again in future if additional fields are added, but the
    intention is never to remove any of the existing fields.

    @@ -101,40 +119,47 @@

    The offset_vector field is a pointer to the vector of offsets that was
    -passed by the caller to pcre_exec(). The contents can be inspected in
    -order to extract substrings that have been matched so far, in the same way as
    -for extracting substrings after a match has completed.
    +passed by the caller to pcre_exec() or pcre_dfa_exec(). When
    +pcre_exec() is used, the contents can be inspected in order to extract
    +substrings that have been matched so far, in the same way as for extracting
    +substrings after a match has completed. For pcre_dfa_exec() this field is
    +not useful.

    The subject and subject_length fields contain copies of the values that were passed to pcre_exec().

    -The start_match field contains the offset within the subject at which the
    -current match attempt started. If the pattern is not anchored, the callout
    -function may be called several times from the same point in the pattern for
    -different starting points in the subject.
    +The start_match field normally contains the offset within the subject at
    +which the current match attempt started. However, if the escape sequence \K
    +has been encountered, this value is changed to reflect the modified starting
    +point. If the pattern is not anchored, the callout function may be called
    +several times from the same point in the pattern for different starting points
    +in the subject.

    The current_position field contains the offset within the subject of the current match pointer.

    -The capture_top field contains one more than the number of the highest
    -numbered captured substring so far. If no substrings have been captured,
    -the value of capture_top is one.
    +When the pcre_exec() function is used, the capture_top field
    +contains one more than the number of the highest numbered captured substring so
    +far. If no substrings have been captured, the value of capture_top is
    +one. This is always the case when pcre_dfa_exec() is used, because it
    +does not support captured substrings.

    The capture_last field contains the number of the most recently captured
    -substring. If no substrings have been captured, its value is -1.
    +substring. If no substrings have been captured, its value is -1. This is always
    +the case when pcre_dfa_exec() is used.
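The capture fields described above can be combined with offset_vector inside a callout made from pcre_exec(). The following is a minimal sketch, not code from PCRE: demo_capture_view is a hypothetical cut-down stand-in for the real callout block (so the example compiles without pcre.h), demo_last_capture is a hypothetical helper, and it assumes the usual pcre_exec() layout in which substring n occupies offset_vector[2n] and offset_vector[2n+1].

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in holding only the fields this sketch reads.
   With the real library these come from the block passed to the callout. */
typedef struct demo_capture_view {
  int        *offset_vector;
  const char *subject;
  int         capture_top;
  int         capture_last;
} demo_capture_view;

/* Copies the most recently captured substring into buf, truncating if
   necessary. Returns the number of bytes copied, or -1 if nothing has been
   captured yet (always the case in callouts from pcre_dfa_exec()). */
int demo_last_capture(const demo_capture_view *cb, char *buf, int bufsize)
{
  int n, start, end, len;

  assert(cb != NULL && buf != NULL && bufsize > 0);

  n = cb->capture_last;
  if (n < 0 || n >= cb->capture_top)
    return -1;                      /* capture_last is -1: no captures */

  start = cb->offset_vector[2 * n];       /* assumed pair layout */
  end   = cb->offset_vector[2 * n + 1];
  if (start < 0 || end < start)
    return -1;                      /* group is unset */

  len = end - start;
  if (len >= bufsize)
    len = bufsize - 1;              /* leave room for the terminator */
  memcpy(buf, cb->subject + start, (size_t)len);
  buf[len] = '\0';
  return len;
}
```

A real callout function would read the same fields directly from the block it receives.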

    The callout_data field contains a value that is passed to
    -pcre_exec() by the caller specifically so that it can be passed back in
    -callouts. It is passed in the pcre_callout field of the pcre_extra
    -data structure. If no such data was passed, the value of callout_data in
    -a pcre_callout block is NULL. There is a description of the
    -pcre_extra structure in the
    +pcre_exec() or pcre_dfa_exec() specifically so that it can be
    +passed back in callouts. It is passed in the pcre_callout field of the
    +pcre_extra data structure. If no such data was passed, the value of
    +callout_data in a pcre_callout block is NULL. There is a
    +description of the pcre_extra structure in the
    pcreapi documentation.

    @@ -156,14 +181,21 @@
    help in distinguishing between different automatic callouts, which all have the
    same callout number. However, they are set for all callouts.

    +

    +The mark field is present from version 2 of the pcre_callout
    +structure. In callouts from pcre_exec() it contains a pointer to the
    +zero-terminated name of the most recently passed (*MARK) item in the match, or
    +NULL if there are no (*MARK)s in the current matching path. In callouts from
    +pcre_dfa_exec() this field always contains NULL.
    +
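Because mark exists only from version 2 of the block, a callout function that may run against an older library would plausibly check version before reading it. A minimal sketch follows, using a hypothetical stand-in struct (demo_mark_view) and a hypothetical helper (demo_mark_name) rather than the real pcre.h declarations:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the two fields this sketch reads. */
typedef struct demo_mark_view {
  int                  version;
  const unsigned char *mark;
} demo_mark_view;

/* Returns the most recently passed (*MARK) name, or "" when the block is
   older than version 2 or no mark is set (always the case in callouts from
   pcre_dfa_exec()). The version test comes first so that mark is never read
   from a block that predates the field. */
const char *demo_mark_name(const demo_mark_view *cb)
{
  assert(cb != NULL);
  if (cb->version < 2 || cb->mark == NULL)
    return "";
  return (const char *)cb->mark;
}
```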


    RETURN VALUES

    The external callout function returns an integer to PCRE. If the value is zero,
    matching proceeds as normal. If the value is greater than zero, matching fails
    -at the current point, but backtracking to test other matching possibilities
    -goes ahead, just as if a lookahead assertion had failed. If the value is less
    -than zero, the match is abandoned, and pcre_exec() returns the negative
    -value.
    +at the current point, but the testing of other matching possibilities goes
    +ahead, just as if a lookahead assertion had failed. If the value is less than
    +zero, the match is abandoned, and pcre_exec() or pcre_dfa_exec()
    +returns the negative value.

    Negative values should normally be chosen from the set of PCRE_ERROR_xxx
    @@ -171,10 +203,21 @@
    The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
    it will never be used by PCRE itself.
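The three kinds of return value can be illustrated with a small callout function. This is a sketch under stated assumptions, not code from PCRE: demo_callout_block is a local copy of the version 2 field list shown earlier so that the example compiles without pcre.h, DEMO_ERROR_CALLOUT is assumed to have the same value as PCRE_ERROR_CALLOUT, and the policy itself (a callout budget passed through callout_data, plus a length limit at callout number 2) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring the version 2 field list in this document. */
typedef struct demo_callout_block {
  int         version;
  int         callout_number;
  int        *offset_vector;
  const char *subject;
  int         subject_length;
  int         start_match;
  int         current_position;
  int         capture_top;
  int         capture_last;
  void       *callout_data;
  int         pattern_position;
  int         next_item_length;
  const unsigned char *mark;
} demo_callout_block;

/* Assumed to match PCRE_ERROR_CALLOUT; real code should use that macro. */
#define DEMO_ERROR_CALLOUT (-9)

/* Hypothetical policy:
   - callout_data, if set, points to an int counting the callouts still
     allowed; once it runs out the whole match is abandoned (negative return);
   - at callout number 2, reject this path if more than three characters have
     been consumed since start_match (positive return, so other matching
     possibilities are still tried);
   - otherwise let matching proceed (zero). */
int demo_callout(demo_callout_block *cb)
{
  int *budget;

  assert(cb != NULL);

  budget = (int *)cb->callout_data;
  if (budget != NULL && (*budget)-- <= 0)
    return DEMO_ERROR_CALLOUT;

  if (cb->callout_number == 2 &&
      cb->current_position - cb->start_match > 3)
    return 1;

  return 0;
}
```

With the real library, a function of this shape would take a pointer to the block type declared in pcre.h and be assigned to pcre_callout before matching starts.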

    +
    AUTHOR

    -Last updated: 09 September 2004
    +Philip Hazel
    +
    +University Computing Service
    +
    +Cambridge CB2 3QH, England.
    +
    +

    +
    REVISION
    +

    +Last updated: 31 July 2011
    +
    +Copyright © 1997-2011 University of Cambridge.
    -Copyright © 1997-2004 University of Cambridge.

    Return to the PCRE index page.