<a href="pcreposix.html"><b>pcreposix</b></a>
documentation.
<pre>
PCRE_CONFIG_PARENS_LIMIT
</pre>
The output is a long integer that gives the maximum depth of nesting of
parentheses (of any kind) in a pattern. This limit is imposed to cap the amount
of system stack used when a pattern is compiled. It is specified when PCRE is
built; the default is 250.
<pre>
PCRE_CONFIG_MATCH_LIMIT
</pre>
The output is a long integer that gives the default limit for the number of
character tables that are built when PCRE is compiled, using the default C
locale. Otherwise, <i>tableptr</i> must be an address that is the result of a
call to <b>pcre_maketables()</b>. This value is stored with the compiled
pattern, and used again by <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b> when the
pattern is matched. For more discussion, see the section on locale support
below.
</P>
<P>
This code fragment shows a typical straightforward call to <b>pcre_compile()</b>:
<pre>
PCRE_EXTENDED
</pre>
If this bit is set, most white space characters in the pattern are totally
ignored except when escaped or inside a character class. However, white space
is not allowed within sequences such as (?> that introduce various
parenthesized subpatterns, nor within a numerical quantifier such as {1,3}.
On the other hand, ignorable white space is permitted between an item and a
following quantifier, and between a quantifier and a following + that
indicates possessiveness.
</P>
<P>
White space formerly did not include the VT character (code 11), because Perl
did not treat this character as white space. However, Perl changed at release
5.18, so PCRE followed at release 8.34, and VT is now treated as white space.
</P>
<P>
PCRE_EXTENDED also causes characters between an unescaped # outside a character
class and the next newline, inclusive, to be ignored. PCRE_EXTENDED is
equivalent to Perl's /x option, and it can be changed within a pattern by a
(?x) option setting.
</P>
<P>
Which characters are interpreted as newlines is controlled by the options
they acquire numbers in the usual way). There is no equivalent of this option
in Perl.
<pre>
PCRE_NO_AUTO_POSSESS
</pre>
If this option is set, it disables "auto-possessification". This is an
optimization that, for example, turns a+b into a++b in order to avoid
backtracks into a+ that can never be successful. However, if callouts are in
use, auto-possessification means that some of them are never taken. You can set
this option if you want the matching functions to do a full unoptimized search
and run all the callouts, but it is mainly provided for testing purposes.
<pre>
PCRE_NO_START_OPTIMIZE
</pre>
This is an option that acts at matching time; that is, it is really an option
error. If you already know that your pattern is valid, and you want to skip
this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option.
When it is set, the effect of passing an invalid UTF-8 string as a pattern is
undefined. It may cause your program to crash or loop. Note that this option
can also be passed to <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>, to suppress
the validity checking of subject strings only. If the same string is being
matched many times, the option can be safely set for the second and subsequent
matchings to improve performance.
</P>
<br><a name="SEC12" href="#TOC1">COMPILATION ERROR CODES</a><br>
31 POSIX collating elements are not supported
32 this version of PCRE is compiled without UTF support
33 [this code is not in use]
34 character value in \x{} or \o{} is too large
35 invalid condition (?(0)
36 \C not allowed in lookbehind assertion
37 PCRE does not support \L, \l, \N{name}, \U, or \u
75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
76 character value in \u.... sequence is too large
77 invalid UTF-32 string (specifically UTF-32)
78 setting UTF is disabled by the application
79 non-hex character in \x{} (closing brace missing?)
80 non-octal character in \o{} (closing brace missing?)
81 missing opening brace after \o
82 parentheses are too deeply nested
83 invalid range in character class
</pre>
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
be used if the limits were changed when PCRE was built.
<P>
PCRE handles caseless matching, and determines whether characters are letters,
digits, or whatever, by reference to a set of tables, indexed by character
code point. When running in UTF-8 mode, or in the 16- or 32-bit libraries, this
applies only to characters with code points less than 256. By default,
higher-valued code points never match escapes such as \w or \d. However, if
PCRE is built with Unicode property support, all characters can be tested with
\p and \P, or, alternatively, the PCRE_UCP option can be set when a pattern
is compiled; this causes \w and friends to use Unicode property support
instead of the built-in tables.
</P>
<P>
The use of locales with Unicode is discouraged. If you are handling characters
with code points greater than 128, you should either use Unicode support, or
use locales, but not try to mix the two.
</P>
<P>
PCRE contains an internal set of tables that are used when the final argument
<P>
External tables are built by calling the <b>pcre_maketables()</b> function,
which has no arguments, in the relevant locale. The result can then be passed
to <b>pcre_compile()</b> as often as necessary. For example, to build and use
tables that are appropriate for the French locale (where accented characters
with values greater than 128 are treated as letters), the following code could
be used:
<pre>
setlocale(LC_CTYPE, "fr_FR");
tables = pcre_maketables();
<P>
The pointer that is passed to <b>pcre_compile()</b> is saved with the compiled
pattern, and the same tables are used via this pointer by <b>pcre_study()</b>
and also by <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>. Thus, for any single
pattern, compilation, studying and matching all happen in the same locale, but
different patterns can be processed in different locales.
</P>
<P>
It is possible to pass a table pointer or NULL (indicating the use of the
internal tables) to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> (see the
discussion below in the section on matching a pattern). This facility is
provided for use with pre-compiled patterns that have been saved and reloaded.
Character tables are not saved with patterns, so if a non-standard table was
used at compile time, it must be provided again when the reloaded pattern is
matched. Attempting to use this facility to match a pattern in a different
locale from the one in which it was compiled is likely to lead to anomalous
(usually incorrect) results.
<a name="infoaboutpattern"></a></P>
<br><a name="SEC15" href="#TOC1">INFORMATION ABOUT A PATTERN</a><br>
<P>
</P>
<P>
Because this function cannot return the full 32-bit range of characters when
the 32-bit library is used in non-UTF-32 mode, this value is deprecated; the
PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values should be used
instead.
<pre>
PCRE_INFO_MATCH_EMPTY
</pre>
Return 1 if the pattern can match an empty string, otherwise 0. The fourth
argument should point to an <b>int</b> variable.
<pre>
PCRE_INFO_MATCHLIMIT
</pre>
If the pattern set a match limit by including an item of the form
name, zero terminated.
</P>
<P>
The names are in alphabetical order. If (?| is used to create multiple groups
with the same number, as described in the
<a href="pcrepattern.html#dupsubpatternnumber">section on duplicate subpattern numbers</a>
in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
page, the groups may be given the same name, but there is only one entry in the
table. Different names for groups of the same number are not permitted.
Duplicate names for subpatterns with different numbers are permitted,
but only if PCRE_DUPNAMES is set. They appear in the table in the order in
which they were found in the pattern. In the absence of (?| this is the order
of increasing number; when (?| is used this is not necessarily the case because
later subpatterns may have lower numbers.
</P>
<P>
As a simple example of the name/number table, consider the following pattern
<pre>
PCRE_INFO_FIRSTCHARACTER
</pre>
Return the fixed first character value in the situation where
PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; otherwise return 0. The fourth
argument should point to a <b>uint32_t</b> variable.
</P>
<P>
In the 8-bit library, the value is always less than 256. In the 16-bit library
the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
<pre>
PCRE_INFO_REQUIREDCHARFLAGS
</pre>
documentation.
</P>
<P>
The <i>tables</i> field is provided for use with patterns that have been
pre-compiled using custom character tables, saved to disc or elsewhere, and
then reloaded, because the tables that were used to compile a pattern are not
saved with it. See the
<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
documentation for a discussion of saving compiled patterns for later use. If
NULL is passed using this mechanism, it forces PCRE's internal tables to be
used.
</P>
<P>
<b>Warning:</b> The tables that <b>pcre_exec()</b> uses must be the same as those
that were used when the pattern was compiled. If this is not the case, the
behaviour of <b>pcre_exec()</b> is undefined. Therefore, when a pattern is
compiled and matched in the same process, this field should never be set. In
this (the most common) case, the correct table pointer is automatically passed
with the compiled pattern from <b>pcre_compile()</b> to <b>pcre_exec()</b>.
</P>
<P>
If PCRE_EXTRA_MARK is set in the <i>flags</i> field, the <i>mark</i> field must
the value of <i>startoffset</i> points to the start of a character (or the end
of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an
invalid string as a subject or an invalid value of <i>startoffset</i> is
undefined. Your program may crash or loop.
<pre>
PCRE_PARTIAL_HARD
PCRE_PARTIAL_SOFT
the longest matches. Unlike <b>pcre_exec()</b>, <b>pcre_dfa_exec()</b> can use
the entire <i>ovector</i> for returning matched strings.
</P>
<P>
NOTE: PCRE's "auto-possessification" optimization usually applies to character
repeats at the end of a pattern (as well as internally). For example, the
pattern "a\d+" is compiled as if it were "a\d++" because there is no point
even considering the possibility of backtracking into the repeated digits. For
DFA matching, this means that only one possible match is found. If you really
do want multiple matches in such cases, either use an ungreedy repeat
("a\d+?") or set the PCRE_NO_AUTO_POSSESS option when compiling.
</P>
<br><b>
Error returns from <b>pcre_dfa_exec()</b>
</b><br>
</P>
<br><a name="SEC26" href="#TOC1">REVISION</a><br>
<P>
Last updated: 12 November 2013
<br>
Copyright © 1997-2013 University of Cambridge.
<br>