Diff of /code/trunk/doc/html/pcreapi.html

revision 902 by ph10, Sat Jan 14 11:16:23 2012 UTC revision 903 by ph10, Sat Jan 21 16:37:17 2012 UTC
# Line 34  man page, in case the conversion went wr Line 34  man page, in case the conversion went wr
34  <li><a name="TOC19" href="#SEC19">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>  <li><a name="TOC19" href="#SEC19">EXTRACTING CAPTURED SUBSTRINGS BY NAME</a>
35  <li><a name="TOC20" href="#SEC20">DUPLICATE SUBPATTERN NAMES</a>  <li><a name="TOC20" href="#SEC20">DUPLICATE SUBPATTERN NAMES</a>
36  <li><a name="TOC21" href="#SEC21">FINDING ALL POSSIBLE MATCHES</a>  <li><a name="TOC21" href="#SEC21">FINDING ALL POSSIBLE MATCHES</a>
37  <li><a name="TOC22" href="#SEC22">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>  <li><a name="TOC22" href="#SEC22">OBTAINING AN ESTIMATE OF STACK USAGE</a>
38  <li><a name="TOC23" href="#SEC23">SEE ALSO</a>  <li><a name="TOC23" href="#SEC23">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a>
39  <li><a name="TOC24" href="#SEC24">AUTHOR</a>  <li><a name="TOC24" href="#SEC24">SEE ALSO</a>
40  <li><a name="TOC25" href="#SEC25">REVISION</a>  <li><a name="TOC25" href="#SEC25">AUTHOR</a>
41    <li><a name="TOC26" href="#SEC26">REVISION</a>
42  </ul>  </ul>
43  <P>  <P>
44  <b>#include &#60;pcre.h&#62;</b>  <b>#include &#60;pcre.h&#62;</b>
# Line 174  just use different data types for their Line 175  just use different data types for their
175  start with <b>pcre16_</b> instead of <b>pcre_</b>. For every option that has UTF8  start with <b>pcre16_</b> instead of <b>pcre_</b>. For every option that has UTF8
176  in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with  in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with
177  UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit  UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit
178  option names define the same bit values.  option names define the same bit values.
179  </P>  </P>
180  <P>  <P>
181  References to bytes and UTF-8 in this document should be read as references to  References to bytes and UTF-8 in this document should be read as references to
# Line 182  References to bytes and UTF-8 in this do Line 183  References to bytes and UTF-8 in this do
183  specified otherwise. More details of the specific differences for the 16-bit  specified otherwise. More details of the specific differences for the 16-bit
184  library are given in the  library are given in the
185  <a href="pcre16.html"><b>pcre16</b></a>  <a href="pcre16.html"><b>pcre16</b></a>
186  page.  page.
187  </P>  </P>
188  <br><a name="SEC6" href="#TOC1">PCRE API OVERVIEW</a><br>  <br><a name="SEC6" href="#TOC1">PCRE API OVERVIEW</a><br>
189  <P>  <P>
# Line 397  not recognized. The following informatio Line 398  not recognized. The following informatio
398    PCRE_CONFIG_UTF8    PCRE_CONFIG_UTF8
399  </pre>  </pre>
400  The output is an integer that is set to one if UTF-8 support is available;  The output is an integer that is set to one if UTF-8 support is available;
401  otherwise it is set to zero. If this option is given to the 16-bit version of  otherwise it is set to zero. If this option is given to the 16-bit version of
402  this function, <b>pcre16_config()</b>, the result is PCRE_ERROR_BADOPTION.  this function, <b>pcre16_config()</b>, the result is PCRE_ERROR_BADOPTION.
403  <pre>  <pre>
404    PCRE_CONFIG_UTF16    PCRE_CONFIG_UTF16
# Line 417  properties is available; otherwise it is Line 418  properties is available; otherwise it is
418  The output is an integer that is set to one if support for just-in-time  The output is an integer that is set to one if support for just-in-time
419  compiling is available; otherwise it is set to zero.  compiling is available; otherwise it is set to zero.
420  <pre>  <pre>
421      PCRE_CONFIG_JITTARGET
422    </pre>
423    The output is a pointer to a zero-terminated "const char *" string. If JIT
424    support is available, the string contains the name of the architecture for
425    which the JIT compiler is configured, for example "x86 32bit (little endian +
426    unaligned)". If JIT support is not available, the result is NULL.
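
The new PCRE_CONFIG_JITTARGET option could be queried roughly as follows. This is a minimal sketch, not code from the PCRE distribution: the macro HAVE_PCRE_H and the helper names jit_target_text() and query_jit_target() are assumptions made for illustration. The part that calls the library is built only when HAVE_PCRE_H is defined, and it needs a pcre.h recent enough to define PCRE_CONFIG_JITTARGET.

```c
#include <stddef.h>
#ifdef HAVE_PCRE_H   /* hypothetical build flag: define it when pcre.h is installed */
#include <pcre.h>
#endif

/* Turn the PCRE_CONFIG_JITTARGET result into printable text.
   A NULL pointer means that JIT support is not available. */
const char *jit_target_text(const char *target)
{
    return target != NULL ? target : "JIT not available";
}

#ifdef HAVE_PCRE_H
/* Ask the library for its JIT target (hypothetical wrapper). */
const char *query_jit_target(void)
{
    const char *target = NULL;

    /* pcre_config() yields zero on success and PCRE_ERROR_BADOPTION
       when the option is not recognized. */
    if (pcre_config(PCRE_CONFIG_JITTARGET, &target) != 0)
        return "PCRE_CONFIG_JITTARGET not recognized";
    return jit_target_text(target);
}
#endif
```

Note that the output argument is a pointer to a "const char *" variable, unlike most other pcre_config() options, which store an integer.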
427    <pre>
428    PCRE_CONFIG_NEWLINE    PCRE_CONFIG_NEWLINE
429  </pre>  </pre>
430  The output is an integer whose value specifies the default character sequence  The output is an integer whose value specifies the default character sequence
# Line 738  preceding sequences should be recognized Line 746  preceding sequences should be recognized
746  that any Unicode newline sequence should be recognized. The Unicode newline  that any Unicode newline sequence should be recognized. The Unicode newline
747  sequences are the three just mentioned, plus the single characters VT (vertical  sequences are the three just mentioned, plus the single characters VT (vertical
748  tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line  tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
749  separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit  separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
750  library, the last two are recognized only in UTF-8 mode.  library, the last two are recognized only in UTF-8 mode.
751  </P>  </P>
752  <P>  <P>
# Line 808  page. Line 816  page.
816  <pre>  <pre>
817    PCRE_NO_UTF8_CHECK    PCRE_NO_UTF8_CHECK
818  </pre>  </pre>
819  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8
820  string is automatically checked. There is a discussion about the  string is automatically checked. There is a discussion about the
821  <a href="pcreunicode.html#utf8strings">validity of UTF-8 strings</a>  <a href="pcreunicode.html#utf8strings">validity of UTF-8 strings</a>
822  in the  in the
# Line 825  validity checking of subject strings. Line 833  validity checking of subject strings.
833  <P>  <P>
834  The following table lists the error codes that may be returned by  The following table lists the error codes that may be returned by
835  <b>pcre_compile2()</b>, along with the error messages that may be returned by  <b>pcre_compile2()</b>, along with the error messages that may be returned by
836  both compiling functions. Note that error messages are always 8-bit ASCII  both compiling functions. Note that error messages are always 8-bit ASCII
837  strings, even in 16-bit mode. As PCRE has developed, some error codes have  strings, even in 16-bit mode. As PCRE has developed, some error codes have
838  fallen out of use. To avoid confusion, they have not been re-used.  fallen out of use. To avoid confusion, they have not been re-used.
839  <pre>  <pre>
# Line 899  fallen out of use. To avoid confusion, t Line 907  fallen out of use. To avoid confusion, t
907    65  different names for subpatterns of the same number are    65  different names for subpatterns of the same number are
908          not allowed          not allowed
909    66  (*MARK) must have an argument    66  (*MARK) must have an argument
910    67  this version of PCRE is not compiled with Unicode property    67  this version of PCRE is not compiled with Unicode property
911          support          support
912    68  \c must be followed by an ASCII character    68  \c must be followed by an ASCII character
913    69  \k is not followed by a braced, angle-bracketed, or quoted name    69  \k is not followed by a braced, angle-bracketed, or quoted name
914    70  internal error: unknown opcode in find_fixedlength()    70  internal error: unknown opcode in find_fixedlength()
915    71  \N is not supported in a class    71  \N is not supported in a class
916    72  too many forward references    72  too many forward references
917    73  disallowed Unicode code point (&#62;= 0xd800 && &#60;= 0xdfff)    73  disallowed Unicode code point (&#62;= 0xd800 && &#60;= 0xdfff)
918    74  invalid UTF-16 string (specifically UTF-16)    74  invalid UTF-16 string (specifically UTF-16)
919  </pre>  </pre>
920  The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may  The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
# Line 1101  the following negative numbers: Line 1109  the following negative numbers:
1109    PCRE_ERROR_NULL           the argument <i>code</i> was NULL    PCRE_ERROR_NULL           the argument <i>code</i> was NULL
1110                              the argument <i>where</i> was NULL                              the argument <i>where</i> was NULL
1111    PCRE_ERROR_BADMAGIC       the "magic number" was not found    PCRE_ERROR_BADMAGIC       the "magic number" was not found
1112    PCRE_ERROR_BADENDIANNESS  the pattern was compiled with different    PCRE_ERROR_BADENDIANNESS  the pattern was compiled with different
1113                              endianness                              endianness
1114    PCRE_ERROR_BADOPTION      the value of <i>what</i> was invalid    PCRE_ERROR_BADOPTION      the value of <i>what</i> was invalid
1115  </pre>  </pre>
1116  The "magic number" is placed at the start of each compiled pattern as a simple  The "magic number" is placed at the start of each compiled pattern as a simple
1117  check against passing an arbitrary memory pointer. The endianness error can  check against passing an arbitrary memory pointer. The endianness error can
1118  occur if a compiled pattern is saved and reloaded on a different host. Here is  occur if a compiled pattern is saved and reloaded on a different host. Here is
1119  a typical call of <b>pcre_fullinfo()</b>, to obtain the length of the compiled  a typical call of <b>pcre_fullinfo()</b>, to obtain the length of the compiled
1120  pattern:  pattern:
# Line 1150  variable. Line 1158  variable.
1158  </P>  </P>
1159  <P>  <P>
1160  If there is a fixed first value, for example, the letter "c" from a pattern  If there is a fixed first value, for example, the letter "c" from a pattern
1161  such as (cat|cow|coyote), its value is returned. In the 8-bit library, the  such as (cat|cow|coyote), its value is returned. In the 8-bit library, the
1162  value is always less than 256; in the 16-bit library the value can be up to  value is always less than 256; in the 16-bit library the value can be up to
1163  0xffff.  0xffff.
1164  </P>  </P>
1165  <P>  <P>
# Line 1427  fields (not necessarily in this order): Line 1435  fields (not necessarily in this order):
1435    const unsigned char *<i>tables</i>;    const unsigned char *<i>tables</i>;
1436    unsigned char **<i>mark</i>;    unsigned char **<i>mark</i>;
1437  </pre>  </pre>
1438  In the 16-bit version of this structure, the <i>mark</i> field has type  In the 16-bit version of this structure, the <i>mark</i> field has type
1439  "PCRE_UCHAR16 **".  "PCRE_UCHAR16 **".
1440  </P>  </P>
1441  <P>  <P>
# Line 2067  documentation for more details. Line 2075  documentation for more details.
2075  <pre>  <pre>
2076    PCRE_ERROR_BADMODE (-28)    PCRE_ERROR_BADMODE (-28)
2077  </pre>  </pre>
2078  This error is given if a pattern that was compiled by the 8-bit library is  This error is given if a pattern that was compiled by the 8-bit library is
2079  passed to a 16-bit library function, or vice versa.  passed to a 16-bit library function, or vice versa.
2080  <pre>  <pre>
2081    PCRE_ERROR_BADENDIANNESS (-29)    PCRE_ERROR_BADENDIANNESS (-29)
2082  </pre>  </pre>
2083  This error is given if a pattern that was compiled and saved is reloaded on a  This error is given if a pattern that was compiled and saved is reloaded on a
2084  host with different endianness. The utility function  host with different endianness. The utility function
2085  <b>pcre_pattern_to_host_byte_order()</b> can be used to convert such a pattern  <b>pcre_pattern_to_host_byte_order()</b> can be used to convert such a pattern
2086  so that it runs on the new host.  so that it runs on the new host.
2087  </P>  </P>
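
The two reload-related errors above might be handled along these lines. This is a sketch under stated assumptions: the enum names, the helper reload_error_hint(), the wrapper fix_byte_order(), and the macro HAVE_PCRE_H are hypothetical; only the numeric values -28 and -29 and the function pcre_pattern_to_host_byte_order() come from the text.

```c
#include <stddef.h>
#ifdef HAVE_PCRE_H   /* hypothetical build flag: define it when pcre.h is installed */
#include <pcre.h>
#endif

/* Numeric values as listed in the documentation above; pcre.h provides
   them as PCRE_ERROR_BADMODE and PCRE_ERROR_BADENDIANNESS. */
enum { RC_BADMODE = -28, RC_BADENDIANNESS = -29 };

/* Map a matching-function return code to a short hint for the caller. */
const char *reload_error_hint(int rc)
{
    switch (rc) {
    case RC_BADMODE:
        return "compiled by the other (8-bit or 16-bit) library";
    case RC_BADENDIANNESS:
        return "saved on a host with different endianness";
    default:
        return "not a reload-related error";
    }
}

#ifdef HAVE_PCRE_H
/* Convert a reloaded pattern for this host. Passing NULL for the
   extra data and the tables is an assumption of this sketch. */
int fix_byte_order(pcre *re)
{
    return pcre_pattern_to_host_byte_order(re, NULL, NULL);
}
#endif
```

A caller that sees the endianness hint could try fix_byte_order() once and then repeat the match; a mode mismatch cannot be repaired this way and requires recompiling the pattern with the matching library.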
2088  <P>  <P>
# Line 2084  Error numbers -16 to -20 and -22 are not Line 2092  Error numbers -16 to -20 and -22 are not
2092  Reason codes for invalid UTF-8 strings  Reason codes for invalid UTF-8 strings
2093  </b><br>  </b><br>
2094  <P>  <P>
2095  This section applies only to the 8-bit library. The corresponding information  This section applies only to the 8-bit library. The corresponding information
2096  for the 16-bit library is given in the  for the 16-bit library is given in the
2097  <a href="pcre16.html"><b>pcre16</b></a>  <a href="pcre16.html"><b>pcre16</b></a>
2098  page.  page.
# Line 2374  When your callout function is called, ex Line 2382  When your callout function is called, ex
2382  substring. Then return 1, which forces <b>pcre_exec()</b> to backtrack and try  substring. Then return 1, which forces <b>pcre_exec()</b> to backtrack and try
2383  other alternatives. Ultimately, when it runs out of matches, <b>pcre_exec()</b>  other alternatives. Ultimately, when it runs out of matches, <b>pcre_exec()</b>
2384  will yield PCRE_ERROR_NOMATCH.  will yield PCRE_ERROR_NOMATCH.
2385    </P>
2386    <br><a name="SEC22" href="#TOC1">OBTAINING AN ESTIMATE OF STACK USAGE</a><br>
2387    <P>
2388    Matching certain patterns using <b>pcre_exec()</b> can use a lot of process
2389    stack, which in certain environments can be rather limited in size. Some users
2390    find it helpful to have an estimate of the amount of stack that is used by
2391    <b>pcre_exec()</b>, to help them set recursion limits, as described in the
2392    <a href="pcrestack.html"><b>pcrestack</b></a>
2393    documentation. The estimate that is output by <b>pcretest</b> when called with
2394    the <b>-m</b> and <b>-C</b> options is obtained by calling <b>pcre_exec</b> with
2395    the values NULL, NULL, NULL, -999, and -999 for its first five arguments.
2396    </P>
2397    <P>
2398    Normally, if its first argument is NULL, <b>pcre_exec()</b> immediately returns
2399    the negative error code PCRE_ERROR_NULL, but with this special combination of
2400    arguments, it returns instead a negative number whose absolute value is the
2401    approximate stack frame size in bytes. (A negative number is used so that it is
2402    clear that no match has happened.) The value is approximate because in some
2403    cases, recursive calls to <b>pcre_exec()</b> occur when there are one or two
2404    additional variables on the stack.
2405    </P>
2406    <P>
2407    If PCRE has been compiled to use the heap instead of the stack for recursion,
2408    the value returned is the size of each block that is obtained from the heap.
2409  <a name="dfamatch"></a></P>  <a name="dfamatch"></a></P>
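
The special call described in the new stack usage section can be sketched as below. The helper names, the macro HAVE_PCRE_H, and the values 0, NULL, 0 for the last three arguments of pcre_exec() are assumptions of this example; the text specifies only the first five arguments. The check for -2 (PCRE_ERROR_NULL) guards against a library that predates this feature.

```c
#include <stddef.h>
#ifdef HAVE_PCRE_H   /* hypothetical build flag: define it when pcre.h is installed */
#include <pcre.h>
#endif

/* Interpret the result of the special pcre_exec() call. Returns the
   approximate frame size in bytes, or 0 if rc is not an estimate
   (-2 is PCRE_ERROR_NULL, the ordinary result for a NULL pattern). */
long frame_size_from_rc(int rc)
{
    if (rc >= 0 || rc == -2)
        return 0;
    return -(long)rc;
}

/* Hypothetical helper: number of frames that fit in a stack budget. */
unsigned long frames_that_fit(size_t stack_bytes, long frame_bytes)
{
    if (frame_bytes <= 0)
        return 0;
    return (unsigned long)(stack_bytes / (size_t)frame_bytes);
}

#ifdef HAVE_PCRE_H
/* Make the special call: NULL, NULL, NULL, -999, -999 as the first
   five arguments; the remaining three are assumed to be ignored. */
long query_frame_size(void)
{
    int rc = pcre_exec(NULL, NULL, NULL, -999, -999, 0, NULL, 0);
    return frame_size_from_rc(rc);
}
#endif
```

Dividing the available stack by the frame size gives an upper bound that may help when choosing a recursion limit as described in the pcrestack documentation. Because the frame size is approximate, leaving some headroom below that bound is prudent.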
2410  <br><a name="SEC22" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>  <br><a name="SEC23" href="#TOC1">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a><br>
2411  <P>  <P>
2412  <b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>  <b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
2413  <b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>  <b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
# Line 2550  recursively, using private vectors for < Line 2582  recursively, using private vectors for <
2582  error is given if the output vector is not large enough. This should be  error is given if the output vector is not large enough. This should be
2583  extremely rare, as a vector of size 1000 is used.  extremely rare, as a vector of size 1000 is used.
2584  </P>  </P>
2585  <br><a name="SEC23" href="#TOC1">SEE ALSO</a><br>  <br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
2586  <P>  <P>
2587  <b>pcre16</b>(3), <b>pcrebuild</b>(3), <b>pcrecallout</b>(3), <b>pcrecpp</b>(3),  <b>pcre16</b>(3), <b>pcrebuild</b>(3), <b>pcrecallout</b>(3), <b>pcrecpp</b>(3),
2588  <b>pcrematching</b>(3), <b>pcrepartial</b>(3), <b>pcreposix</b>(3),  <b>pcrematching</b>(3), <b>pcrepartial</b>(3), <b>pcreposix</b>(3),
2589  <b>pcreprecompile</b>(3), <b>pcresample</b>(3), <b>pcrestack</b>(3).  <b>pcreprecompile</b>(3), <b>pcresample</b>(3), <b>pcrestack</b>(3).
2590  </P>  </P>
2591  <br><a name="SEC24" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
2592  <P>  <P>
2593  Philip Hazel  Philip Hazel
2594  <br>  <br>
# Line 2565  University Computing Service Line 2597  University Computing Service
2597  Cambridge CB2 3QH, England.  Cambridge CB2 3QH, England.
2598  <br>  <br>
2599  </P>  </P>
2600  <br><a name="SEC25" href="#TOC1">REVISION</a><br>  <br><a name="SEC26" href="#TOC1">REVISION</a><br>
2601  <P>  <P>
2602  Last updated: 07 January 2012  Last updated: 21 January 2012
2603  <br>  <br>
2604  Copyright &copy; 1997-2012 University of Cambridge.  Copyright &copy; 1997-2012 University of Cambridge.
2605  <br>  <br>
