--- code/trunk/doc/html/pcreapi.html 2012/01/21 15:59:35 902 +++ code/trunk/doc/html/pcreapi.html 2012/01/21 16:37:17 903 @@ -34,10 +34,11 @@
  • EXTRACTING CAPTURED SUBSTRINGS BY NAME
  • DUPLICATE SUBPATTERN NAMES
  • FINDING ALL POSSIBLE MATCHES -
  • MATCHING A PATTERN: THE ALTERNATIVE FUNCTION -
  • SEE ALSO -
  • AUTHOR -
  • REVISION +
  • OBTAINING AN ESTIMATE OF STACK USAGE +
  • MATCHING A PATTERN: THE ALTERNATIVE FUNCTION +
  • SEE ALSO +
  • AUTHOR +
  • REVISION

    #include <pcre.h> @@ -174,7 +175,7 @@ start with pcre16_ instead of pcre_. For every option that has UTF8 in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit -option names define the same bit values. +option names define the same bit values.

    References to bytes and UTF-8 in this document should be read as references to @@ -182,7 +183,7 @@ specified otherwise. More details of the specific differences for the 16-bit library are given in the pcre16 -page. +page.


    PCRE API OVERVIEW

    @@ -397,7 +398,7 @@ PCRE_CONFIG_UTF8 The output is an integer that is set to one if UTF-8 support is available; -otherwise it is set to zero. If this option is given to the 16-bit version of +otherwise it is set to zero. If this option is given to the 16-bit version of this function, pcre16_config(), the result is PCRE_ERROR_BADOPTION.

       PCRE_CONFIG_UTF16
    @@ -417,6 +418,13 @@
     The output is an integer that is set to one if support for just-in-time
     compiling is available; otherwise it is set to zero.
     
    +  PCRE_CONFIG_JITTARGET
    +
    +The output is a pointer to a zero-terminated "const char *" string. If JIT +support is available, the string contains the name of the architecture for +which the JIT compiler is configured, for example "x86 32bit (little endian + +unaligned)". If JIT support is not available, the result is NULL. +
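    As a rough illustration of the PCRE_CONFIG_JITTARGET paragraph added above, the following sketch queries the string and prints it. The helper name jit_target_text(), the wrapper print_jit_target(), and the build macro HAVE_PCRE_H are assumptions made for this example and are not part of PCRE; only pcre_config(), PCRE_CONFIG_JITTARGET, and PCRE_ERROR_BADOPTION come from the library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of PCRE): map the pointer produced by
   PCRE_CONFIG_JITTARGET to text that is always safe to print. The
   documentation says the pointer is NULL when JIT support is absent. */
const char *jit_target_text(const char *target)
{
    return target != NULL ? target : "JIT support not available";
}

#ifdef HAVE_PCRE_H  /* assumed macro: define it when building against libpcre */
#include <pcre.h>

/* Query and print the JIT target. Returns 0 on success, otherwise the
   negative error code from pcre_config(). */
int print_jit_target(void)
{
#ifdef PCRE_CONFIG_JITTARGET
    const char *target = NULL;
    int rc = pcre_config(PCRE_CONFIG_JITTARGET, &target);
    if (rc != 0)
        return rc;
    printf("JIT target: %s\n", jit_target_text(target));
    return 0;
#else
    /* The installed pcre.h predates PCRE_CONFIG_JITTARGET. */
    return PCRE_ERROR_BADOPTION;
#endif
}
#endif /* HAVE_PCRE_H */
```

    The NULL check is kept in a separate function so that callers never pass a NULL pointer to printf() when the library was built without JIT support.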
       PCRE_CONFIG_NEWLINE
     
    The output is an integer whose value specifies the default character sequence @@ -738,7 +746,7 @@ that any Unicode newline sequence should be recognized. The Unicode newline sequences are the three just mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line -separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit +separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit library, the last two are recognized only in UTF-8 mode.

    @@ -808,7 +816,7 @@

       PCRE_NO_UTF8_CHECK
     
    -When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 +When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is automatically checked. There is a discussion about the validity of UTF-8 strings in the @@ -825,7 +833,7 @@

    The following table lists the error codes that may be returned by pcre_compile2(), along with the error messages that may be returned by -both compiling functions. Note that error messages are always 8-bit ASCII +both compiling functions. Note that error messages are always 8-bit ASCII strings, even in 16-bit mode. As PCRE has developed, some error codes have fallen out of use. To avoid confusion, they have not been re-used.

    @@ -899,14 +907,14 @@
       65  different names for subpatterns of the same number are
             not allowed
       66  (*MARK) must have an argument
    -  67  this version of PCRE is not compiled with Unicode property 
    +  67  this version of PCRE is not compiled with Unicode property
             support
       68  \c must be followed by an ASCII character
       69  \k is not followed by a braced, angle-bracketed, or quoted name
       70  internal error: unknown opcode in find_fixedlength()
       71  \N is not supported in a class
       72  too many forward references
    -  73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)    
    +  73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
       74  invalid UTF-16 string (specifically UTF-16)
     
    The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may @@ -1101,12 +1109,12 @@ PCRE_ERROR_NULL the argument code was NULL the argument where was NULL PCRE_ERROR_BADMAGIC the "magic number" was not found - PCRE_ERROR_BADENDIANNESS the pattern was compiled with different + PCRE_ERROR_BADENDIANNESS the pattern was compiled with different endianness PCRE_ERROR_BADOPTION the value of what was invalid
    The "magic number" is placed at the start of each compiled pattern as a simple -check against passing an arbitrary memory pointer. The endianness error can +check against passing an arbitrary memory pointer. The endianness error can occur if a compiled pattern is saved and reloaded on a different host. Here is a typical call of pcre_fullinfo(), to obtain the length of the compiled pattern: @@ -1150,8 +1158,8 @@

    If there is a fixed first value, for example, the letter "c" from a pattern -such as (cat|cow|coyote), its value is returned. In the 8-bit library, the -value is always less than 256; in the 16-bit library the value can be up to +such as (cat|cow|coyote), its value is returned. In the 8-bit library, the +value is always less than 256; in the 16-bit library the value can be up to 0xffff.

    @@ -1427,7 +1435,7 @@ const unsigned char *tables; unsigned char **mark; -In the 16-bit version of this structure, the mark field has type +In the 16-bit version of this structure, the mark field has type "PCRE_UCHAR16 **".

    @@ -2067,14 +2075,14 @@

       PCRE_ERROR_BADMODE (-28)
     
    -This error is given if a pattern that was compiled by the 8-bit library is +This error is given if a pattern that was compiled by the 8-bit library is passed to a 16-bit library function, or vice versa.
       PCRE_ERROR_BADENDIANNESS (-29)
     
    -This error is given if a pattern that was compiled and saved is reloaded on a -host with different endianness. The utility function -pcre_pattern_to_host_byte_order() can be used to convert such a pattern +This error is given if a pattern that was compiled and saved is reloaded on a +host with different endianness. The utility function +pcre_pattern_to_host_byte_order() can be used to convert such a pattern so that it runs on the new host.

    @@ -2084,7 +2092,7 @@ Reason codes for invalid UTF-8 strings

    -This section applies only to the 8-bit library. The corresponding information +This section applies only to the 8-bit library. The corresponding information for the 16-bit library is given in the pcre16 page. @@ -2374,8 +2382,32 @@ substring. Then return 1, which forces pcre_exec() to backtrack and try other alternatives. Ultimately, when it runs out of matches, pcre_exec() will yield PCRE_ERROR_NOMATCH. +

    +
    OBTAINING AN ESTIMATE OF STACK USAGE
    +

    +Matching certain patterns using pcre_exec() can use a lot of process +stack, which in certain environments can be rather limited in size. Some users +find it helpful to have an estimate of the amount of stack that is used by +pcre_exec(), to help them set recursion limits, as described in the +pcrestack +documentation. The estimate that is output by pcretest when called with +the -m and -C options is obtained by calling pcre_exec with +the values NULL, NULL, NULL, -999, and -999 for its first five arguments. +

    +

    +Normally, if its first argument is NULL, pcre_exec() immediately returns +the negative error code PCRE_ERROR_NULL, but with this special combination of +arguments, it returns instead a negative number whose absolute value is the +approximate stack frame size in bytes. (A negative number is used so that it is +clear that no match has happened.) The value is approximate because in some +cases, recursive calls to pcre_exec() occur when there are one or two +additional variables on the stack. +

    +

    +If PCRE has been compiled to use the heap instead of the stack for recursion, +the value returned is the size of each block that is obtained from the heap.
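    The special call described in this new section might be wrapped as in the sketch below. The names frame_size_from_result(), estimate_frame_size(), SKETCH_PCRE_ERROR_NULL, and the build macro HAVE_PCRE_H are assumptions made for illustration. The values passed for the last three arguments of pcre_exec() (0, NULL, 0) are also an assumption, since the text specifies only the first five.

```c
#include <stddef.h>

/* Numeric value of PCRE_ERROR_NULL in pcre.h, repeated here so that the
   helper compiles without the PCRE headers. */
#define SKETCH_PCRE_ERROR_NULL (-2)

/* Hypothetical helper: interpret the result of the special pcre_exec()
   call. Returns the approximate frame size in bytes, or -1 when the
   result does not look like an estimate (for example, a library that
   simply reports PCRE_ERROR_NULL for a NULL first argument). */
long frame_size_from_result(int rc)
{
    if (rc >= 0 || rc == SKETCH_PCRE_ERROR_NULL)
        return -1;
    return -(long)rc;  /* the estimate is the absolute value */
}

#ifdef HAVE_PCRE_H  /* assumed macro: define it when building against libpcre */
#include <pcre.h>

/* Ask pcre_exec() for its approximate stack frame size, or for the heap
   block size when PCRE was built to use the heap for recursion. */
long estimate_frame_size(void)
{
    int rc = pcre_exec(NULL, NULL, NULL, -999, -999, 0, NULL, 0);
    return frame_size_from_result(rc);
}
#endif /* HAVE_PCRE_H */
```

    Multiplying the returned size by the intended recursion limit gives a rough upper bound on stack use, which is one way to choose a value for the match_limit_recursion field discussed in the pcrestack documentation.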

    -
    MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
    +
    MATCHING A PATTERN: THE ALTERNATIVE FUNCTION

    int pcre_dfa_exec(const pcre *code, const pcre_extra *extra, const char *subject, int length, int startoffset, @@ -2550,13 +2582,13 @@ error is given if the output vector is not large enough. This should be extremely rare, as a vector of size 1000 is used.

    -
    SEE ALSO
    +
    SEE ALSO

    pcre16(3), pcrebuild(3), pcrecallout(3), pcrecpp(3)(3), pcrematching(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).

    -
    AUTHOR
    +
    AUTHOR

    Philip Hazel
    @@ -2565,9 +2597,9 @@ Cambridge CB2 3QH, England.

    -
    REVISION
    +
    REVISION

    -Last updated: 07 January 2012 +Last updated: 21 January 2012
    Copyright © 1997-2012 University of Cambridge.