Diff of /code/trunk/doc/html/pcreapi.html


revision 1403 by ph10, Fri Jun 14 09:09:28 2013 UTC revision 1404 by ph10, Tue Nov 19 15:36:57 2013 UTC
# Line 484  the Line 484  the
484  <a href="pcreposix.html"><b>pcreposix</b></a>  <a href="pcreposix.html"><b>pcreposix</b></a>
485  documentation.  documentation.
486  <pre>  <pre>
487      PCRE_CONFIG_PARENS_LIMIT
488    </pre>
489    The output is a long integer that gives the maximum depth of nesting of
490    parentheses (of any kind) in a pattern. This limit is imposed to cap the amount
491    of system stack used when a pattern is compiled. It is specified when PCRE is
492    built; the default is 250.
493    <pre>
494    PCRE_CONFIG_MATCH_LIMIT    PCRE_CONFIG_MATCH_LIMIT
495  </pre>  </pre>
496  The output is a long integer that gives the default limit for the number of  The output is a long integer that gives the default limit for the number of
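Note on the hunk above: the new PCRE_CONFIG_PARENS_LIMIT value caps how deeply parentheses may nest in a pattern (default 250, with error 82 reported when the cap is exceeded). The following is a minimal sketch of how an application might pre-check a pattern's nesting depth before compiling it. The function name `paren_depth` is hypothetical and not part of PCRE, and the scan is deliberately simplified: it skips escaped characters and simple character classes, but does not handle \Q...\E or a literal ] at the start of a class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE function): returns the maximum nesting
   depth of parentheses in `pattern`, ignoring escaped characters and
   anything inside a simple [...] character class. */
size_t paren_depth(const char *pattern)
{
    size_t depth = 0, max = 0;
    int in_class = 0;

    assert(pattern != NULL);
    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0') {
            p++;                        /* skip the escaped character */
            continue;
        }
        if (in_class) {
            if (*p == ']') in_class = 0;
            continue;
        }
        if (*p == '[') {
            in_class = 1;
        } else if (*p == '(') {
            if (++depth > max) max = depth;
        } else if (*p == ')' && depth > 0) {
            depth--;
        }
    }
    return max;
}
```

A caller could compare the result with the value obtained for PCRE_CONFIG_PARENS_LIMIT and reject the pattern early; PCRE itself remains the authority on what it accepts.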
# Line 582  If the final argument, <i>tableptr</i>, Line 589  If the final argument, <i>tableptr</i>,
589  character tables that are built when PCRE is compiled, using the default C  character tables that are built when PCRE is compiled, using the default C
590  locale. Otherwise, <i>tableptr</i> must be an address that is the result of a  locale. Otherwise, <i>tableptr</i> must be an address that is the result of a
591  call to <b>pcre_maketables()</b>. This value is stored with the compiled  call to <b>pcre_maketables()</b>. This value is stored with the compiled
592  pattern, and used again by <b>pcre_exec()</b>, unless another table pointer is  pattern, and used again by <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b> when the
593  passed to it. For more discussion, see the section on locale support below.  pattern is matched. For more discussion, see the section on locale support
594    below.
595  </P>  </P>
596  <P>  <P>
597  This code fragment shows a typical straightforward call to <b>pcre_compile()</b>:  This code fragment shows a typical straightforward call to <b>pcre_compile()</b>:
# Line 668  documentation. Line 676  documentation.
676  <pre>  <pre>
677    PCRE_EXTENDED    PCRE_EXTENDED
678  </pre>  </pre>
679  If this bit is set, white space data characters in the pattern are totally  If this bit is set, most white space characters in the pattern are totally
680  ignored except when escaped or inside a character class. White space does not  ignored except when escaped or inside a character class. However, white space
681  include the VT character (code 11). In addition, characters between an  is not allowed within sequences such as (?&#62; that introduce various
682  unescaped # outside a character class and the next newline, inclusive, are also  parenthesized subpatterns, nor within a numerical quantifier such as {1,3}.
683  ignored. This is equivalent to Perl's /x option, and it can be changed within a  However, ignorable white space is permitted between an item and a following
684  pattern by a (?x) option setting.  quantifier and between a quantifier and a following + that indicates
685    possessiveness.
686    </P>
687    <P>
688    White space formerly did not include the VT character (code 11), because Perl
689    did not treat this character as white space. However, Perl changed at release
690    5.18, so PCRE followed at release 8.34, and VT is now treated as white space.
691    </P>
692    <P>
693    PCRE_EXTENDED also causes characters between an unescaped # outside a character
694    class and the next newline, inclusive, to be ignored. PCRE_EXTENDED is
695    equivalent to Perl's /x option, and it can be changed within a pattern by a
696    (?x) option setting.
697  </P>  </P>
698  <P>  <P>
699  Which characters are interpreted as newlines is controlled by the options  Which characters are interpreted as newlines is controlled by the options
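Note on the hunk above: the revised PCRE_EXTENDED text says that white space (now including VT) is ignored outside character classes unless escaped, and that an unescaped # outside a class starts a comment running to the next newline, inclusive. The sketch below illustrates only those two rules. The function `strip_extended` is a hypothetical helper, not a PCRE API, and it does not enforce the other restrictions mentioned in the text (for example, no white space inside (?> or {1,3}), nor does it handle \Q...\E.

```c
#include <assert.h>
#include <stddef.h>

/* White space as treated under PCRE_EXTENDED from release 8.34 on:
   space, HT, NL, VT (code 11), FF and CR. */
static int is_extended_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/* Hypothetical helper: copies `pattern` to `out`, dropping the characters
   that PCRE_EXTENDED would ignore. Returns the length written. */
size_t strip_extended(const char *pattern, char *out, size_t outsize)
{
    size_t n = 0;
    int in_class = 0;

    assert(pattern != NULL && out != NULL && outsize > 0);
    for (const char *p = pattern; *p != '\0' && n + 1 < outsize; p++) {
        if (*p == '\\' && p[1] != '\0') {
            if (n + 2 >= outsize) break;    /* no room for the pair */
            out[n++] = *p++;                /* keep the backslash ... */
            out[n++] = *p;                  /* ... and what it escapes */
        } else if (in_class) {
            if (*p == ']') in_class = 0;
            out[n++] = *p;                  /* classes are kept verbatim */
        } else if (*p == '[') {
            in_class = 1;
            out[n++] = *p;
        } else if (*p == '#') {
            /* comment: skip up to and including the next newline */
            while (p[1] != '\0' && *p != '\n') p++;
        } else if (!is_extended_space(*p)) {
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return n;
}
```

For instance, the pattern text `a b # comment` followed by a newline and `[ #] \ c` keeps the space and # inside the class and the escaped space, and drops everything else that is white space or comment.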
# Line 827  were followed by ?: but named parenthese Line 847  were followed by ?: but named parenthese
847  they acquire numbers in the usual way). There is no equivalent of this option  they acquire numbers in the usual way). There is no equivalent of this option
848  in Perl.  in Perl.
849  <pre>  <pre>
850      PCRE_NO_AUTO_POSSESS
851    </pre>
852    If this option is set, it disables "auto-possessification". This is an
853    optimization that, for example, turns a+b into a++b in order to avoid
854    backtracks into a+ that can never be successful. However, if callouts are in
855    use, auto-possessification means that some of them are never taken. You can set
856    this option if you want the matching functions to do a full unoptimized search
857    and run all the callouts, but it is mainly provided for testing purposes.
858    <pre>
859    PCRE_NO_START_OPTIMIZE    PCRE_NO_START_OPTIMIZE
860  </pre>  </pre>
861  This is an option that acts at matching time; that is, it is really an option  This is an option that acts at matching time; that is, it is really an option
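Note on the hunk above: the PCRE_NO_AUTO_POSSESS text explains that turning a+b into a++b avoids backtracks into a+ that can never succeed. The sketch below is a hand-written matcher for that single pattern, not PCRE code; the function name `match_a_plus_b` and the `tries` counter are assumptions made for illustration. It counts how often the matcher attempts to match the trailing b, which is roughly where a callout placed before b would fire.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: tries to match "a+b" (possessive == 0) or
   "a++b" (possessive != 0) at the start of `s`. Returns 1 on a match,
   0 otherwise. *tries receives the number of attempts to match 'b'. */
int match_a_plus_b(const char *s, int possessive, unsigned *tries)
{
    size_t n = 0;

    assert(s != NULL && tries != NULL);
    *tries = 0;
    while (s[n] == 'a') n++;        /* a+ first takes every 'a' it can */
    if (n == 0) return 0;

    /* Try 'b' after n, n-1, ..., 1 repetitions of 'a'. */
    for (size_t k = n; k >= 1; k--) {
        (*tries)++;
        if (s[k] == 'b') return 1;
        if (possessive) break;      /* a++ never gives characters back */
    }
    return 0;
}
```

On the subject "aaaa" the non-possessive form makes four attempts before failing and the possessive form makes one; both give the same answer, because every position reached by backtracking holds an a, never a b. The skipped attempts are the callouts that, as the text notes, are never taken when the optimization is active.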
# Line 877  page. If an invalid UTF-8 sequence is fo Line 906  page. If an invalid UTF-8 sequence is fo
906  error. If you already know that your pattern is valid, and you want to skip  error. If you already know that your pattern is valid, and you want to skip
907  this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option.  this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option.
908  When it is set, the effect of passing an invalid UTF-8 string as a pattern is  When it is set, the effect of passing an invalid UTF-8 string as a pattern is
909  undefined. It may cause your program to crash. Note that this option can also  undefined. It may cause your program to crash or loop. Note that this option
910  be passed to <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>, to suppress the  can also be passed to <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>, to suppress
911  validity checking of subject strings only. If the same string is being matched  the validity checking of subject strings only. If the same string is being
912  many times, the option can be safely set for the second and subsequent  matched many times, the option can be safely set for the second and subsequent
913  matchings to improve performance.  matchings to improve performance.
914  </P>  </P>
915  <br><a name="SEC12" href="#TOC1">COMPILATION ERROR CODES</a><br>  <br><a name="SEC12" href="#TOC1">COMPILATION ERROR CODES</a><br>
# Line 925  have fallen out of use. To avoid confusi Line 954  have fallen out of use. To avoid confusi
954    31  POSIX collating elements are not supported    31  POSIX collating elements are not supported
955    32  this version of PCRE is compiled without UTF support    32  this version of PCRE is compiled without UTF support
956    33  [this code is not in use]    33  [this code is not in use]
957    34  character value in \x{...} sequence is too large    34  character value in \x{} or \o{} is too large
958    35  invalid condition (?(0)    35  invalid condition (?(0)
959    36  \C not allowed in lookbehind assertion    36  \C not allowed in lookbehind assertion
960    37  PCRE does not support \L, \l, \N{name}, \U, or \u    37  PCRE does not support \L, \l, \N{name}, \U, or \u
# Line 973  have fallen out of use. To avoid confusi Line 1002  have fallen out of use. To avoid confusi
1002    75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)    75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
1003    76  character value in \u.... sequence is too large    76  character value in \u.... sequence is too large
1004    77  invalid UTF-32 string (specifically UTF-32)    77  invalid UTF-32 string (specifically UTF-32)
1005      78  setting UTF is disabled by the application
1006      79  non-hex character in \x{} (closing brace missing?)
1007      80  non-octal character in \o{} (closing brace missing?)
1008      81  missing opening brace after \o
1009      82  parentheses are too deeply nested
1010      83  invalid range in character class
1011  </pre>  </pre>
1012  The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may  The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
1013  be used if the limits were changed when PCRE was built.  be used if the limits were changed when PCRE was built.
# Line 1103  There is a longer discussion of PCRE_NO_ Line 1138  There is a longer discussion of PCRE_NO_
1138  <P>  <P>
1139  PCRE handles caseless matching, and determines whether characters are letters,  PCRE handles caseless matching, and determines whether characters are letters,
1140  digits, or whatever, by reference to a set of tables, indexed by character  digits, or whatever, by reference to a set of tables, indexed by character
1141  value. When running in UTF-8 mode, this applies only to characters  code point. When running in UTF-8 mode, or in the 16- or 32-bit libraries, this
1142  with codes less than 128. By default, higher-valued codes never match escapes  applies only to characters with code points less than 256. By default,
1143  such as \w or \d, but they can be tested with \p if PCRE is built with  higher-valued code points never match escapes such as \w or \d. However, if
1144  Unicode character property support. Alternatively, the PCRE_UCP option can be  PCRE is built with Unicode property support, all characters can be tested with
1145  set at compile time; this causes \w and friends to use Unicode property  \p and \P, or, alternatively, the PCRE_UCP option can be set when a pattern
1146  support instead of built-in tables. The use of locales with Unicode is  is compiled; this causes \w and friends to use Unicode property support
1147  discouraged. If you are handling characters with codes greater than 128, you  instead of the built-in tables.
1148  should either use UTF-8 and Unicode, or use locales, but not try to mix the  </P>
1149  two.  <P>
1150    The use of locales with Unicode is discouraged. If you are handling characters
1151    with code points greater than 128, you should either use Unicode support, or
1152    use locales, but not try to mix the two.
1153  </P>  </P>
1154  <P>  <P>
1155  PCRE contains an internal set of tables that are used when the final argument  PCRE contains an internal set of tables that are used when the final argument
# Line 1129  for this locale support is expected to d Line 1167  for this locale support is expected to d
1167  <P>  <P>
1168  External tables are built by calling the <b>pcre_maketables()</b> function,  External tables are built by calling the <b>pcre_maketables()</b> function,
1169  which has no arguments, in the relevant locale. The result can then be passed  which has no arguments, in the relevant locale. The result can then be passed
1170  to <b>pcre_compile()</b> or <b>pcre_exec()</b> as often as necessary. For  to <b>pcre_compile()</b> as often as necessary. For example, to build and use
1171  example, to build and use tables that are appropriate for the French locale  tables that are appropriate for the French locale (where accented characters
1172  (where accented characters with values greater than 128 are treated as letters),  with values greater than 128 are treated as letters), the following code could
1173  the following code could be used:  be used:
1174  <pre>  <pre>
1175    setlocale(LC_CTYPE, "fr_FR");    setlocale(LC_CTYPE, "fr_FR");
1176    tables = pcre_maketables();    tables = pcre_maketables();
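Note on the hunk above: the text describes building character tables in a given locale and passing them to pcre_compile(). As a rough picture of what "tables indexed by code point" means, the sketch below builds a small set of 256-entry tables from the C library's classification functions in a chosen locale. The `char_tables` layout and both function names are hypothetical; the tables that pcre_maketables() really returns are laid out differently and are opaque to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical layout for illustration only. */
typedef struct {
    unsigned char lower[256];       /* lower-case mapping per code point */
    unsigned char is_letter[256];   /* 1 if the locale says "letter" */
    unsigned char is_digit[256];    /* 1 if the locale says "digit" */
} char_tables;

/* Fills the tables using whatever LC_CTYPE locale is currently set. */
void build_tables(char_tables *t)
{
    assert(t != NULL);
    for (int c = 0; c < 256; c++) {
        t->lower[c] = (unsigned char)tolower(c);
        t->is_letter[c] = (unsigned char)(isalpha(c) != 0);
        t->is_digit[c] = (unsigned char)(isdigit(c) != 0);
    }
}

/* Switches LC_CTYPE to `locale` and builds the tables there.
   Returns 1 on success, 0 if the locale is not available. */
int build_tables_for(const char *locale, char_tables *t)
{
    assert(locale != NULL);
    if (setlocale(LC_CTYPE, locale) == NULL) return 0;
    build_tables(t);
    return 1;
}
```

Because the tables capture the locale at build time, matching with tables from a different locale than the one used at compile time gives inconsistent classifications, which is the hazard the revised documentation warns about.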
# Line 1150  needed. Line 1188  needed.
1188  <P>  <P>
1189  The pointer that is passed to <b>pcre_compile()</b> is saved with the compiled  The pointer that is passed to <b>pcre_compile()</b> is saved with the compiled
1190  pattern, and the same tables are used via this pointer by <b>pcre_study()</b>  pattern, and the same tables are used via this pointer by <b>pcre_study()</b>
1191  and normally also by <b>pcre_exec()</b>. Thus, by default, for any single  and also by <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>. Thus, for any single
1192  pattern, compilation, studying and matching all happen in the same locale, but  pattern, compilation, studying and matching all happen in the same locale, but
1193  different patterns can be compiled in different locales.  different patterns can be processed in different locales.
1194  </P>  </P>
1195  <P>  <P>
1196  It is possible to pass a table pointer or NULL (indicating the use of the  It is possible to pass a table pointer or NULL (indicating the use of the
1197  internal tables) to <b>pcre_exec()</b>. Although not intended for this purpose,  internal tables) to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> (see the
1198  this facility could be used to match a pattern in a different locale from the  discussion below in the section on matching a pattern). This facility is
1199  one in which it was compiled. Passing table pointers at run time is discussed  provided for use with pre-compiled patterns that have been saved and reloaded.
1200  below in the section on matching a pattern.  Character tables are not saved with patterns, so if a non-standard table was
1201    used at compile time, it must be provided again when the reloaded pattern is
1202    matched. Attempting to use this facility to match a pattern in a different
1203    locale from the one in which it was compiled is likely to lead to anomalous
1204    (usually incorrect) results.
1205  <a name="infoaboutpattern"></a></P>  <a name="infoaboutpattern"></a></P>
1206  <br><a name="SEC15" href="#TOC1">INFORMATION ABOUT A PATTERN</a><br>  <br><a name="SEC15" href="#TOC1">INFORMATION ABOUT A PATTERN</a><br>
1207  <P>  <P>
# Line 1305  is -1. Line 1347  is -1.
1347  </P>  </P>
1348  <P>  <P>
1349  Since for the 32-bit library using the non-UTF-32 mode, this function is unable  Since for the 32-bit library using the non-UTF-32 mode, this function is unable
1350  to return the full 32-bit range of the character, this value is deprecated;  to return the full 32-bit range of characters, this value is deprecated;
1351  instead the PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values should  instead the PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values should
1352  be used.  be used.
1353  <pre>  <pre>
1354      PCRE_INFO_MATCH_EMPTY
1355    </pre>
1356    Return 1 if the pattern can match an empty string, otherwise 0. The fourth
1357    argument should point to an <b>int</b> variable.
1358    <pre>
1359    PCRE_INFO_MATCHLIMIT    PCRE_INFO_MATCHLIMIT
1360  </pre>  </pre>
1361  If the pattern set a match limit by including an item of the form  If the pattern set a match limit by including an item of the form
# Line 1366  contains the parenthesis number. The res Line 1413  contains the parenthesis number. The res
1413  name, zero terminated.  name, zero terminated.
1414  </P>  </P>
1415  <P>  <P>
1416  The names are in alphabetical order. Duplicate names may appear if (?| is used  The names are in alphabetical order. If (?| is used to create multiple groups
1417  to create multiple groups with the same number, as described in the  with the same number, as described in the
1418  <a href="pcrepattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>  <a href="pcrepattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
1419  in the  in the
1420  <a href="pcrepattern.html"><b>pcrepattern</b></a>  <a href="pcrepattern.html"><b>pcrepattern</b></a>
1421  page. Duplicate names for subpatterns with different numbers are permitted only  page, the groups may be given the same name, but there is only one entry in the
1422  if PCRE_DUPNAMES is set. In all cases of duplicate names, they appear in the  table. Different names for groups of the same number are not permitted.
1423  table in the order in which they were found in the pattern. In the absence of  Duplicate names for subpatterns with different numbers are permitted,
1424  (?| this is the order of increasing number; when (?| is used this is not  but only if PCRE_DUPNAMES is set. They appear in the table in the order in
1425  necessarily the case because later subpatterns may have lower numbers.  which they were found in the pattern. In the absence of (?| this is the order
1426    of increasing number; when (?| is used this is not necessarily the case because
1427    later subpatterns may have lower numbers.
1428  </P>  </P>
1429  <P>  <P>
1430  As a simple example of the name/number table, consider the following pattern  As a simple example of the name/number table, consider the following pattern
# Line 1489  returned. For anchored patterns, 0 is re Line 1538  returned. For anchored patterns, 0 is re
1538  <pre>  <pre>
1539    PCRE_INFO_FIRSTCHARACTER    PCRE_INFO_FIRSTCHARACTER
1540  </pre>  </pre>
1541  Return the fixed first character value, if PCRE_INFO_FIRSTCHARACTERFLAGS  Return the fixed first character value in the situation where
1542  returned 1; otherwise returns 0. The fourth argument should point to an  PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; otherwise return 0. The fourth
1543  <b>uint_t</b> variable.  argument should point to a <b>uint_t</b> variable.
1544  </P>  </P>
1545  <P>  <P>
1546  In the 8-bit library, the value is always less than 256. In the 16-bit library  In the 8-bit library, the value is always less than 256. In the 16-bit library
1547  the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value  the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
1548  can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.  can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
 </P>  
 <P>  
 If there is no fixed first value, and if either  
 <br>  
 <br>  
 (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch  
 starts with "^", or  
 <br>  
 <br>  
 (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set  
 (if it were set, the pattern would be anchored),  
 <br>  
 <br>  
 -1 is returned, indicating that the pattern matches only at the start of a  
 subject string or after any newline within the string. Otherwise -2 is  
 returned. For anchored patterns, -2 is returned.  
1549  <pre>  <pre>
1550    PCRE_INFO_REQUIREDCHARFLAGS    PCRE_INFO_REQUIREDCHARFLAGS
1551  </pre>  </pre>
# Line 1725  and is described in the Line 1758  and is described in the
1758  documentation.  documentation.
1759  </P>  </P>
1760  <P>  <P>
1761  The <i>tables</i> field is used to pass a character tables pointer to  The <i>tables</i> field is provided for use with patterns that have been
1762  <b>pcre_exec()</b>; this overrides the value that is stored with the compiled  pre-compiled using custom character tables, saved to disc or elsewhere, and
1763  pattern. A non-NULL value is stored with the compiled pattern only if custom  then reloaded, because the tables that were used to compile a pattern are not
1764  tables were supplied to <b>pcre_compile()</b> via its <i>tableptr</i> argument.  saved with it. See the
 If NULL is passed to <b>pcre_exec()</b> using this mechanism, it forces PCRE's  
 internal tables to be used. This facility is helpful when re-using patterns  
 that have been saved after compiling with an external set of tables, because  
 the external tables might be at a different address when <b>pcre_exec()</b> is  
 called. See the  
1765  <a href="pcreprecompile.html"><b>pcreprecompile</b></a>  <a href="pcreprecompile.html"><b>pcreprecompile</b></a>
1766  documentation for a discussion of saving compiled patterns for later use.  documentation for a discussion of saving compiled patterns for later use. If
1767    NULL is passed using this mechanism, it forces PCRE's internal tables to be
1768    used.
1769    </P>
1770    <P>
1771    <b>Warning:</b> The tables that <b>pcre_exec()</b> uses must be the same as those
1772    that were used when the pattern was compiled. If this is not the case, the
1773    behaviour of <b>pcre_exec()</b> is undefined. Therefore, when a pattern is
1774    compiled and matched in the same process, this field should never be set. In
1775    this (the most common) case, the correct table pointer is automatically passed
1776    with the compiled pattern from <b>pcre_compile()</b> to <b>pcre_exec()</b>.
1777  </P>  </P>
1778  <P>  <P>
1779  If PCRE_EXTRA_MARK is set in the <i>flags</i> field, the <i>mark</i> field must  If PCRE_EXTRA_MARK is set in the <i>flags</i> field, the <i>mark</i> field must
# Line 1953  all the matches in a single subject stri Line 1991  all the matches in a single subject stri
1991  the value of <i>startoffset</i> points to the start of a character (or the end  the value of <i>startoffset</i> points to the start of a character (or the end
1992  of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an  of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an
1993  invalid string as a subject or an invalid value of <i>startoffset</i> is  invalid string as a subject or an invalid value of <i>startoffset</i> is
1994  undefined. Your program may crash.  undefined. Your program may crash or loop.
1995  <pre>  <pre>
1996    PCRE_PARTIAL_HARD    PCRE_PARTIAL_HARD
1997    PCRE_PARTIAL_SOFT    PCRE_PARTIAL_SOFT
# Line 2786  matching string is given first. If there Line 2824  matching string is given first. If there
2824  the longest matches. Unlike <b>pcre_exec()</b>, <b>pcre_dfa_exec()</b> can use  the longest matches. Unlike <b>pcre_exec()</b>, <b>pcre_dfa_exec()</b> can use
2825  the entire <i>ovector</i> for returning matched strings.  the entire <i>ovector</i> for returning matched strings.
2826  </P>  </P>
2827    <P>
2828    NOTE: PCRE's "auto-possessification" optimization usually applies to character
2829    repeats at the end of a pattern (as well as internally). For example, the
2830    pattern "a\d+" is compiled as if it were "a\d++" because there is no point
2831    even considering the possibility of backtracking into the repeated digits. For
2832    DFA matching, this means that only one possible match is found. If you really
2833    do want multiple matches in such cases, either use an ungreedy repeat
2834    ("a\d+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
2835    </P>
2836  <br><b>  <br><b>
2837  Error returns from <b>pcre_dfa_exec()</b>  Error returns from <b>pcre_dfa_exec()</b>
2838  </b><br>  </b><br>
# Line 2852  Cambridge CB2 3QH, England. Line 2899  Cambridge CB2 3QH, England.
2899  </P>  </P>
2900  <br><a name="SEC26" href="#TOC1">REVISION</a><br>  <br><a name="SEC26" href="#TOC1">REVISION</a><br>
2901  <P>  <P>
2902  Last updated: 12 June 2013  Last updated: 12 November 2013
2903  <br>  <br>
2904  Copyright &copy; 1997-2013 University of Cambridge.  Copyright &copy; 1997-2013 University of Cambridge.
2905  <br>  <br>
