361 |
conventional to use the standard for your operating system. |
conventional to use the standard for your operating system. |
362 |
|
|
363 |
|
|
364 |
|
WHAT \R MATCHES |
365 |
|
|
366 |
|
By default, the sequence \R in a pattern matches any Unicode newline |
367 |
|
sequence, whatever has been selected as the line ending sequence. If |
368 |
|
you specify |
369 |
|
|
370 |
|
--enable-bsr-anycrlf |
371 |
|
|
372 |
|
the default is changed so that \R matches only CR, LF, or CRLF. What- |
373 |
|
ever is selected when PCRE is built can be overridden when the library |
374 |
|
functions are called. |
375 |
|
|
376 |
|
|
377 |
BUILDING SHARED AND STATIC LIBRARIES |
BUILDING SHARED AND STATIC LIBRARIES |
378 |
|
|
379 |
The PCRE building process uses libtool to build both shared and static |
The PCRE building process uses libtool to build both shared and static |
526 |
|
|
527 |
REVISION |
REVISION |
528 |
|
|
529 |
Last updated: 30 July 2007 |
Last updated: 11 September 2007 |
530 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
531 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
532 |
|
|
932 |
dollar metacharacters, the handling of #-comments in /x mode, and, when |
dollar metacharacters, the handling of #-comments in /x mode, and, when |
933 |
CRLF is a recognized line ending sequence, the match position advance- |
CRLF is a recognized line ending sequence, the match position advance- |
934 |
ment for a non-anchored pattern. There is more detail about this in the |
ment for a non-anchored pattern. There is more detail about this in the |
935 |
section on pcre_exec() options below. The choice of newline convention |
section on pcre_exec() options below. |
936 |
does not affect the interpretation of the \n or \r escape sequences. |
|
937 |
|
The choice of newline convention does not affect the interpretation of |
938 |
|
the \n or \r escape sequences, nor does it affect what \R matches, |
939 |
|
which is controlled in a similar way, but by separate options. |
940 |
|
|
941 |
|
|
942 |
MULTITHREADING |
MULTITHREADING |
943 |
|
|
944 |
The PCRE functions can be used in multi-threading applications, with |
The PCRE functions can be used in multi-threading applications, with |
945 |
the proviso that the memory management functions pointed to by |
the proviso that the memory management functions pointed to by |
946 |
pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the |
pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the |
947 |
callout function pointed to by pcre_callout, are shared by all threads. |
callout function pointed to by pcre_callout, are shared by all threads. |
948 |
|
|
949 |
The compiled form of a regular expression is not altered during match- |
The compiled form of a regular expression is not altered during match- |
950 |
ing, so the same compiled pattern can safely be used by several threads |
ing, so the same compiled pattern can safely be used by several threads |
951 |
at once. |
at once. |
952 |
|
|
954 |
SAVING PRECOMPILED PATTERNS FOR LATER USE |
SAVING PRECOMPILED PATTERNS FOR LATER USE |
955 |
|
|
956 |
The compiled form of a regular expression can be saved and re-used at a |
The compiled form of a regular expression can be saved and re-used at a |
957 |
later time, possibly by a different program, and even on a host other |
later time, possibly by a different program, and even on a host other |
958 |
than the one on which it was compiled. Details are given in the |
than the one on which it was compiled. Details are given in the |
959 |
pcreprecompile documentation. However, compiling a regular expression |
pcreprecompile documentation. However, compiling a regular expression |
960 |
with one version of PCRE for use with a different version is not guar- |
with one version of PCRE for use with a different version is not guar- |
961 |
anteed to work and may cause crashes. |
anteed to work and may cause crashes. |
962 |
|
|
963 |
|
|
965 |
|
|
966 |
int pcre_config(int what, void *where); |
int pcre_config(int what, void *where); |
967 |
|
|
968 |
The function pcre_config() makes it possible for a PCRE client to dis- |
The function pcre_config() makes it possible for a PCRE client to dis- |
969 |
cover which optional features have been compiled into the PCRE library. |
cover which optional features have been compiled into the PCRE library. |
970 |
The pcrebuild documentation has more details about these optional fea- |
The pcrebuild documentation has more details about these optional fea- |
971 |
tures. |
tures. |
972 |
|
|
973 |
The first argument for pcre_config() is an integer, specifying which |
The first argument for pcre_config() is an integer, specifying which |
974 |
information is required; the second argument is a pointer to a variable |
information is required; the second argument is a pointer to a variable |
975 |
into which the information is placed. The following information is |
into which the information is placed. The following information is |
976 |
available: |
available: |
977 |
|
|
978 |
PCRE_CONFIG_UTF8 |
PCRE_CONFIG_UTF8 |
979 |
|
|
980 |
The output is an integer that is set to one if UTF-8 support is avail- |
The output is an integer that is set to one if UTF-8 support is avail- |
981 |
able; otherwise it is set to zero. |
able; otherwise it is set to zero. |
982 |
|
|
983 |
PCRE_CONFIG_UNICODE_PROPERTIES |
PCRE_CONFIG_UNICODE_PROPERTIES |
984 |
|
|
985 |
The output is an integer that is set to one if support for Unicode |
The output is an integer that is set to one if support for Unicode |
986 |
character properties is available; otherwise it is set to zero. |
character properties is available; otherwise it is set to zero. |
987 |
|
|
988 |
PCRE_CONFIG_NEWLINE |
PCRE_CONFIG_NEWLINE |
989 |
|
|
990 |
The output is an integer whose value specifies the default character |
The output is an integer whose value specifies the default character |
991 |
sequence that is recognized as meaning "newline". The five values that |
sequence that is recognized as meaning "newline". The five values that |
992 |
are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF, |
are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF, |
993 |
and -1 for ANY. The default should normally be the standard sequence |
and -1 for ANY. The default should normally be the standard sequence |
994 |
for your operating system. |
for your operating system. |
995 |
|
|
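As a rough illustration of the PCRE_CONFIG_NEWLINE values listed above, the sketch below maps the reported integer to a readable name. The helper name newline_config_name is hypothetical and not part of PCRE; in a real program the integer would come from pcre_config(PCRE_CONFIG_NEWLINE, &value). The CRLF value 3338 is simply CR (13) in the high byte and LF (10) in the low byte.

```c
/* Hypothetical helper, not part of the PCRE API: names the integer that
   pcre_config() reports for PCRE_CONFIG_NEWLINE, using only the values
   listed in the documentation text. */
const char *newline_config_name(int value)
{
    switch (value) {
    case 10:             return "LF";
    case 13:             return "CR";
    case (13 << 8) | 10: return "CRLF";     /* 13*256 + 10 = 3338 */
    case -2:             return "ANYCRLF";
    case -1:             return "ANY";
    default:             return "unknown";  /* not a documented value */
    }
}
```

Any value outside the documented set is reported as "unknown" rather than guessed at.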
996 |
|
PCRE_CONFIG_BSR |
997 |
|
|
998 |
|
The output is an integer whose value indicates what character sequences |
999 |
|
the \R escape sequence matches by default. A value of 0 means that \R |
1000 |
|
matches any Unicode line ending sequence; a value of 1 means that \R |
1001 |
|
matches only CR, LF, or CRLF. The default can be overridden when a pat- |
1002 |
|
tern is compiled or matched. |
1003 |
|
|
1004 |
PCRE_CONFIG_LINK_SIZE |
PCRE_CONFIG_LINK_SIZE |
1005 |
|
|
1006 |
The output is an integer that contains the number of bytes used for |
The output is an integer that contains the number of bytes used for |
1007 |
internal linkage in compiled regular expressions. The value is 2, 3, or |
internal linkage in compiled regular expressions. The value is 2, 3, or |
1008 |
4. Larger values allow larger regular expressions to be compiled, at |
4. Larger values allow larger regular expressions to be compiled, at |
1009 |
the expense of slower matching. The default value of 2 is sufficient |
the expense of slower matching. The default value of 2 is sufficient |
1010 |
for all but the most massive patterns, since it allows the compiled |
for all but the most massive patterns, since it allows the compiled |
1011 |
pattern to be up to 64K in size. |
pattern to be up to 64K in size. |
1012 |
|
|
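The 64K figure for a link size of 2 follows from the number of distinct offsets that a two-byte field can hold. The sketch below shows only that arithmetic; the function name link_field_range is hypothetical, and treating the result as the exact pattern size limit for link sizes 3 and 4 is an assumption, since the text confirms the limit only for the default of 2.

```c
/* Illustrative arithmetic only, not PCRE code: an n-byte link field can
   represent 2^(8*n) distinct offsets. Intended for n = 2, 3, or 4. */
unsigned long long link_field_range(int link_size)
{
    return 1ULL << (8 * link_size);
}
```

For a link size of 2 this gives 65536, which is the 64K mentioned above.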
1013 |
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD |
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD |
1014 |
|
|
1015 |
The output is an integer that contains the threshold above which the |
The output is an integer that contains the threshold above which the |
1016 |
POSIX interface uses malloc() for output vectors. Further details are |
POSIX interface uses malloc() for output vectors. Further details are |
1017 |
given in the pcreposix documentation. |
given in the pcreposix documentation. |
1018 |
|
|
1019 |
PCRE_CONFIG_MATCH_LIMIT |
PCRE_CONFIG_MATCH_LIMIT |
1020 |
|
|
1021 |
The output is an integer that gives the default limit for the number of |
The output is an integer that gives the default limit for the number of |
1022 |
internal matching function calls in a pcre_exec() execution. Further |
internal matching function calls in a pcre_exec() execution. Further |
1023 |
details are given with pcre_exec() below. |
details are given with pcre_exec() below. |
1024 |
|
|
1025 |
PCRE_CONFIG_MATCH_LIMIT_RECURSION |
PCRE_CONFIG_MATCH_LIMIT_RECURSION |
1026 |
|
|
1027 |
The output is an integer that gives the default limit for the depth of |
The output is an integer that gives the default limit for the depth of |
1028 |
recursion when calling the internal matching function in a pcre_exec() |
recursion when calling the internal matching function in a pcre_exec() |
1029 |
execution. Further details are given with pcre_exec() below. |
execution. Further details are given with pcre_exec() below. |
1030 |
|
|
1031 |
PCRE_CONFIG_STACKRECURSE |
PCRE_CONFIG_STACKRECURSE |
1032 |
|
|
1033 |
The output is an integer that is set to one if internal recursion when |
The output is an integer that is set to one if internal recursion when |
1034 |
running pcre_exec() is implemented by recursive function calls that use |
running pcre_exec() is implemented by recursive function calls that use |
1035 |
the stack to remember their state. This is the usual way that PCRE is |
the stack to remember their state. This is the usual way that PCRE is |
1036 |
compiled. The output is zero if PCRE was compiled to use blocks of data |
compiled. The output is zero if PCRE was compiled to use blocks of data |
1037 |
on the heap instead of recursive function calls. In this case, |
on the heap instead of recursive function calls. In this case, |
1038 |
pcre_stack_malloc and pcre_stack_free are called to manage memory |
pcre_stack_malloc and pcre_stack_free are called to manage memory |
1039 |
blocks on the heap, thus avoiding the use of the stack. |
blocks on the heap, thus avoiding the use of the stack. |
1040 |
|
|
1041 |
|
|
1052 |
|
|
1053 |
Either of the functions pcre_compile() or pcre_compile2() can be called |
Either of the functions pcre_compile() or pcre_compile2() can be called |
1054 |
to compile a pattern into an internal form. The only difference between |
to compile a pattern into an internal form. The only difference between |
1055 |
the two interfaces is that pcre_compile2() has an additional argument, |
the two interfaces is that pcre_compile2() has an additional argument, |
1056 |
errorcodeptr, via which a numerical error code can be returned. |
errorcodeptr, via which a numerical error code can be returned. |
1057 |
|
|
1058 |
The pattern is a C string terminated by a binary zero, and is passed in |
The pattern is a C string terminated by a binary zero, and is passed in |
1059 |
the pattern argument. A pointer to a single block of memory that is |
the pattern argument. A pointer to a single block of memory that is |
1060 |
obtained via pcre_malloc is returned. This contains the compiled code |
obtained via pcre_malloc is returned. This contains the compiled code |
1061 |
and related data. The pcre type is defined for the returned block; this |
and related data. The pcre type is defined for the returned block; this |
1062 |
is a typedef for a structure whose contents are not externally defined. |
is a typedef for a structure whose contents are not externally defined. |
1063 |
It is up to the caller to free the memory (via pcre_free) when it is no |
It is up to the caller to free the memory (via pcre_free) when it is no |
1064 |
longer required. |
longer required. |
1065 |
|
|
1066 |
Although the compiled code of a PCRE regex is relocatable, that is, it |
Although the compiled code of a PCRE regex is relocatable, that is, it |
1067 |
does not depend on memory location, the complete pcre data block is not |
does not depend on memory location, the complete pcre data block is not |
1068 |
fully relocatable, because it may contain a copy of the tableptr argu- |
fully relocatable, because it may contain a copy of the tableptr argu- |
1069 |
ment, which is an address (see below). |
ment, which is an address (see below). |
1070 |
|
|
1071 |
The options argument contains various bit settings that affect the com- |
The options argument contains various bit settings that affect the com- |
1072 |
pilation. It should be zero if no options are required. The available |
pilation. It should be zero if no options are required. The available |
1073 |
options are described below. Some of them, in particular, those that |
options are described below. Some of them, in particular, those that |
1074 |
are compatible with Perl, can also be set and unset from within the |
are compatible with Perl, can also be set and unset from within the |
1075 |
pattern (see the detailed description in the pcrepattern documenta- |
pattern (see the detailed description in the pcrepattern documenta- |
1076 |
tion). For these options, the contents of the options argument speci- |
tion). For these options, the contents of the options argument speci- |
1077 |
fy their initial settings at the start of compilation and execution. |
fy their initial settings at the start of compilation and execution. |
1078 |
The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at the time |
The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at the time |
1079 |
of matching as well as at compile time. |
of matching as well as at compile time. |
1080 |
|
|
1081 |
If errptr is NULL, pcre_compile() returns NULL immediately. Otherwise, |
If errptr is NULL, pcre_compile() returns NULL immediately. Otherwise, |
1082 |
if compilation of a pattern fails, pcre_compile() returns NULL, and |
if compilation of a pattern fails, pcre_compile() returns NULL, and |
1083 |
sets the variable pointed to by errptr to point to a textual error mes- |
sets the variable pointed to by errptr to point to a textual error mes- |
1084 |
sage. This is a static string that is part of the library. You must not |
sage. This is a static string that is part of the library. You must not |
1085 |
try to free it. The offset from the start of the pattern to the charac- |
try to free it. The offset from the start of the pattern to the charac- |
1086 |
ter where the error was discovered is placed in the variable pointed to |
ter where the error was discovered is placed in the variable pointed to |
1087 |
by erroffset, which must not be NULL. If it is, an immediate error is |
by erroffset, which must not be NULL. If it is, an immediate error is |
1088 |
given. |
given. |
1089 |
|
|
1090 |
If pcre_compile2() is used instead of pcre_compile(), and the error- |
If pcre_compile2() is used instead of pcre_compile(), and the error- |
1091 |
codeptr argument is not NULL, a non-zero error code number is returned |
codeptr argument is not NULL, a non-zero error code number is returned |
1092 |
via this argument in the event of an error. This is in addition to the |
via this argument in the event of an error. This is in addition to the |
1093 |
textual error message. Error codes and messages are listed below. |
textual error message. Error codes and messages are listed below. |
1094 |
|
|
1095 |
If the final argument, tableptr, is NULL, PCRE uses a default set of |
If the final argument, tableptr, is NULL, PCRE uses a default set of |
1096 |
character tables that are built when PCRE is compiled, using the |
character tables that are built when PCRE is compiled, using the |
1097 |
default C locale. Otherwise, tableptr must be an address that is the |
default C locale. Otherwise, tableptr must be an address that is the |
1098 |
result of a call to pcre_maketables(). This value is stored with the |
result of a call to pcre_maketables(). This value is stored with the |
1099 |
compiled pattern, and used again by pcre_exec(), unless another table |
compiled pattern, and used again by pcre_exec(), unless another table |
1100 |
pointer is passed to it. For more discussion, see the section on locale |
pointer is passed to it. For more discussion, see the section on locale |
1101 |
support below. |
support below. |
1102 |
|
|
1103 |
This code fragment shows a typical straightforward call to pcre_com- |
This code fragment shows a typical straightforward call to pcre_com- |
1104 |
pile(): |
pile(): |
1105 |
|
|
1106 |
pcre *re; |
pcre *re; |
1113 |
&erroffset, /* for error offset */ |
&erroffset, /* for error offset */ |
1114 |
NULL); /* use default character tables */ |
NULL); /* use default character tables */ |
1115 |
|
|
1116 |
The following names for option bits are defined in the pcre.h header |
The following names for option bits are defined in the pcre.h header |
1117 |
file: |
file: |
1118 |
|
|
1119 |
PCRE_ANCHORED |
PCRE_ANCHORED |
1120 |
|
|
1121 |
If this bit is set, the pattern is forced to be "anchored", that is, it |
If this bit is set, the pattern is forced to be "anchored", that is, it |
1122 |
is constrained to match only at the first matching point in the string |
is constrained to match only at the first matching point in the string |
1123 |
that is being searched (the "subject string"). This effect can also be |
that is being searched (the "subject string"). This effect can also be |
1124 |
achieved by appropriate constructs in the pattern itself, which is the |
achieved by appropriate constructs in the pattern itself, which is the |
1125 |
only way to do it in Perl. |
only way to do it in Perl. |
1126 |
|
|
1127 |
PCRE_AUTO_CALLOUT |
PCRE_AUTO_CALLOUT |
1128 |
|
|
1129 |
If this bit is set, pcre_compile() automatically inserts callout items, |
If this bit is set, pcre_compile() automatically inserts callout items, |
1130 |
all with number 255, before each pattern item. For discussion of the |
all with number 255, before each pattern item. For discussion of the |
1131 |
callout facility, see the pcrecallout documentation. |
callout facility, see the pcrecallout documentation. |
1132 |
|
|
1133 |
|
PCRE_BSR_ANYCRLF |
1134 |
|
PCRE_BSR_UNICODE |
1135 |
|
|
1136 |
|
These options (which are mutually exclusive) control what the \R escape |
1137 |
|
sequence matches. The choice is either to match only CR, LF, or CRLF, |
1138 |
|
or to match any Unicode newline sequence. The default is specified when |
1139 |
|
PCRE is built. It can be overridden from within the pattern, or by set- |
1140 |
|
ting an option when a compiled pattern is matched. |
1141 |
|
|
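The difference between the two \R settings can be modelled in a few lines. The sketch below is a hypothetical model, not PCRE source: it works on already-decoded code points (so it corresponds to UTF-8 mode, where LS and PS are recognized), and the name bsr_match_length is invented for illustration. The Unicode set used is the one listed under PCRE_NEWLINE_ANY in this document.

```c
#include <stddef.h>

/* Hypothetical model of \R, not taken from PCRE. Returns the number of
   code points that \R would consume at cp[pos], or 0 for no match.
   anycrlf != 0 models PCRE_BSR_ANYCRLF (CR, LF, or CRLF only);
   anycrlf == 0 models PCRE_BSR_UNICODE (any Unicode newline sequence). */
size_t bsr_match_length(const unsigned int *cp, size_t len,
                        size_t pos, int anycrlf)
{
    unsigned int c;

    if (pos >= len)
        return 0;
    c = cp[pos];

    if (c == 0x0D)  /* CR, taken together with a following LF if present */
        return (pos + 1 < len && cp[pos + 1] == 0x0A) ? 2 : 1;
    if (c == 0x0A)  /* LF */
        return 1;
    if (anycrlf)
        return 0;   /* restricted mode stops here */

    switch (c) {
    case 0x0B:      /* VT  */
    case 0x0C:      /* FF  */
    case 0x85:      /* NEL */
    case 0x2028:    /* LS  */
    case 0x2029:    /* PS  */
        return 1;
    default:
        return 0;
    }
}
```

With the subject CR, LF, VT, the restricted mode matches the CRLF pair as one unit and rejects VT, while the Unicode mode accepts both.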
1142 |
PCRE_CASELESS |
PCRE_CASELESS |
1143 |
|
|
1144 |
If this bit is set, letters in the pattern match both upper and lower |
If this bit is set, letters in the pattern match both upper and lower |
1145 |
case letters. It is equivalent to Perl's /i option, and it can be |
case letters. It is equivalent to Perl's /i option, and it can be |
1146 |
changed within a pattern by a (?i) option setting. In UTF-8 mode, PCRE |
changed within a pattern by a (?i) option setting. In UTF-8 mode, PCRE |
1147 |
always understands the concept of case for characters whose values are |
always understands the concept of case for characters whose values are |
1148 |
less than 128, so caseless matching is always possible. For characters |
less than 128, so caseless matching is always possible. For characters |
1149 |
with higher values, the concept of case is supported if PCRE is com- |
with higher values, the concept of case is supported if PCRE is com- |
1150 |
piled with Unicode property support, but not otherwise. If you want to |
piled with Unicode property support, but not otherwise. If you want to |
1151 |
use caseless matching for characters 128 and above, you must ensure |
use caseless matching for characters 128 and above, you must ensure |
1152 |
that PCRE is compiled with Unicode property support as well as with |
that PCRE is compiled with Unicode property support as well as with |
1153 |
UTF-8 support. |
UTF-8 support. |
1154 |
|
|
1155 |
PCRE_DOLLAR_ENDONLY |
PCRE_DOLLAR_ENDONLY |
1156 |
|
|
1157 |
If this bit is set, a dollar metacharacter in the pattern matches only |
If this bit is set, a dollar metacharacter in the pattern matches only |
1158 |
at the end of the subject string. Without this option, a dollar also |
at the end of the subject string. Without this option, a dollar also |
1159 |
matches immediately before a newline at the end of the string (but not |
matches immediately before a newline at the end of the string (but not |
1160 |
before any other newlines). The PCRE_DOLLAR_ENDONLY option is ignored |
before any other newlines). The PCRE_DOLLAR_ENDONLY option is ignored |
1161 |
if PCRE_MULTILINE is set. There is no equivalent to this option in |
if PCRE_MULTILINE is set. There is no equivalent to this option in |
1162 |
Perl, and no way to set it within a pattern. |
Perl, and no way to set it within a pattern. |
1163 |
|
|
1164 |
PCRE_DOTALL |
PCRE_DOTALL |
1165 |
|
|
1166 |
If this bit is set, a dot metacharacter in the pattern matches all char- |
If this bit is set, a dot metacharacter in the pattern matches all char- |
1167 |
acters, including those that indicate newline. Without it, a dot does |
acters, including those that indicate newline. Without it, a dot does |
1168 |
not match when the current position is at a newline. This option is |
not match when the current position is at a newline. This option is |
1169 |
equivalent to Perl's /s option, and it can be changed within a pattern |
equivalent to Perl's /s option, and it can be changed within a pattern |
1170 |
by a (?s) option setting. A negative class such as [^a] always matches |
by a (?s) option setting. A negative class such as [^a] always matches |
1171 |
newline characters, independent of the setting of this option. |
newline characters, independent of the setting of this option. |
1172 |
|
|
1173 |
PCRE_DUPNAMES |
PCRE_DUPNAMES |
1174 |
|
|
1175 |
If this bit is set, names used to identify capturing subpatterns need |
If this bit is set, names used to identify capturing subpatterns need |
1176 |
not be unique. This can be helpful for certain types of pattern when it |
not be unique. This can be helpful for certain types of pattern when it |
1177 |
is known that only one instance of the named subpattern can ever be |
is known that only one instance of the named subpattern can ever be |
1178 |
matched. There are more details of named subpatterns below; see also |
matched. There are more details of named subpatterns below; see also |
1179 |
the pcrepattern documentation. |
the pcrepattern documentation. |
1180 |
|
|
1181 |
PCRE_EXTENDED |
PCRE_EXTENDED |
1182 |
|
|
1183 |
If this bit is set, whitespace data characters in the pattern are |
If this bit is set, whitespace data characters in the pattern are |
1184 |
totally ignored except when escaped or inside a character class. White- |
totally ignored except when escaped or inside a character class. White- |
1185 |
space does not include the VT character (code 11). In addition, charac- |
space does not include the VT character (code 11). In addition, charac- |
1186 |
ters between an unescaped # outside a character class and the next new- |
ters between an unescaped # outside a character class and the next new- |
1187 |
line, inclusive, are also ignored. This is equivalent to Perl's /x |
line, inclusive, are also ignored. This is equivalent to Perl's /x |
1188 |
option, and it can be changed within a pattern by a (?x) option set- |
option, and it can be changed within a pattern by a (?x) option set- |
1189 |
ting. |
ting. |
1190 |
|
|
1191 |
This option makes it possible to include comments inside complicated |
This option makes it possible to include comments inside complicated |
1192 |
patterns. Note, however, that this applies only to data characters. |
patterns. Note, however, that this applies only to data characters. |
1193 |
Whitespace characters may never appear within special character |
Whitespace characters may never appear within special character |
1194 |
sequences in a pattern, for example within the sequence (?( which |
sequences in a pattern, for example within the sequence (?( which |
1195 |
introduces a conditional subpattern. |
introduces a conditional subpattern. |
1196 |
|
|
1197 |
PCRE_EXTRA |
PCRE_EXTRA |
1198 |
|
|
1199 |
This option was invented in order to turn on additional functionality |
This option was invented in order to turn on additional functionality |
1200 |
of PCRE that is incompatible with Perl, but it is currently of very |
of PCRE that is incompatible with Perl, but it is currently of very |
1201 |
little use. When set, any backslash in a pattern that is followed by a |
little use. When set, any backslash in a pattern that is followed by a |
1202 |
letter that has no special meaning causes an error, thus reserving |
letter that has no special meaning causes an error, thus reserving |
1203 |
these combinations for future expansion. By default, as in Perl, a |
these combinations for future expansion. By default, as in Perl, a |
1204 |
backslash followed by a letter with no special meaning is treated as a |
backslash followed by a letter with no special meaning is treated as a |
1205 |
literal. (Perl can, however, be persuaded to give a warning for this.) |
literal. (Perl can, however, be persuaded to give a warning for this.) |
1206 |
There are at present no other features controlled by this option. It |
There are at present no other features controlled by this option. It |
1207 |
can also be set by a (?X) option setting within a pattern. |
can also be set by a (?X) option setting within a pattern. |
1208 |
|
|
1209 |
PCRE_FIRSTLINE |
PCRE_FIRSTLINE |
1210 |
|
|
1211 |
If this option is set, an unanchored pattern is required to match |
If this option is set, an unanchored pattern is required to match |
1212 |
before or at the first newline in the subject string, though the |
before or at the first newline in the subject string, though the |
1213 |
matched text may continue over the newline. |
matched text may continue over the newline. |
1214 |
|
|
1215 |
PCRE_MULTILINE |
PCRE_MULTILINE |
1216 |
|
|
1217 |
By default, PCRE treats the subject string as consisting of a single |
By default, PCRE treats the subject string as consisting of a single |
1218 |
line of characters (even if it actually contains newlines). The "start |
line of characters (even if it actually contains newlines). The "start |
1219 |
of line" metacharacter (^) matches only at the start of the string, |
of line" metacharacter (^) matches only at the start of the string, |
1220 |
while the "end of line" metacharacter ($) matches only at the end of |
while the "end of line" metacharacter ($) matches only at the end of |
1221 |
the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY |
the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY |
1222 |
is set). This is the same as Perl. |
is set). This is the same as Perl. |
1223 |
|
|
1224 |
When PCRE_MULTILINE is set, the "start of line" and "end of line" |
When PCRE_MULTILINE is set, the "start of line" and "end of line" |
1225 |
constructs match immediately following or immediately before internal |
constructs match immediately following or immediately before internal |
1226 |
newlines in the subject string, respectively, as well as at the very |
newlines in the subject string, respectively, as well as at the very |
1227 |
start and end. This is equivalent to Perl's /m option, and it can be |
start and end. This is equivalent to Perl's /m option, and it can be |
1228 |
changed within a pattern by a (?m) option setting. If there are no new- |
changed within a pattern by a (?m) option setting. If there are no new- |
1229 |
lines in a subject string, or no occurrences of ^ or $ in a pattern, |
lines in a subject string, or no occurrences of ^ or $ in a pattern, |
1230 |
setting PCRE_MULTILINE has no effect. |
setting PCRE_MULTILINE has no effect. |
1231 |
|
|
1232 |
PCRE_NEWLINE_CR |
PCRE_NEWLINE_CR |
1235 |
PCRE_NEWLINE_ANYCRLF |
PCRE_NEWLINE_ANYCRLF |
1236 |
PCRE_NEWLINE_ANY |
PCRE_NEWLINE_ANY |
1237 |
|
|
1238 |
These options override the default newline definition that was chosen |
These options override the default newline definition that was chosen |
1239 |
when PCRE was built. Setting the first or the second specifies that a |
when PCRE was built. Setting the first or the second specifies that a |
1240 |
newline is indicated by a single character (CR or LF, respectively). |
newline is indicated by a single character (CR or LF, respectively). |
1241 |
Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by the |
Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by the |
1242 |
two-character CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies |
two-character CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies |
1243 |
that any of the three preceding sequences should be recognized. Setting |
that any of the three preceding sequences should be recognized. Setting |
1244 |
PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be |
PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be |
1245 |
recognized. The Unicode newline sequences are the three just mentioned, |
recognized. The Unicode newline sequences are the three just mentioned, |
1246 |
plus the single characters VT (vertical tab, U+000B), FF (formfeed, |
plus the single characters VT (vertical tab, U+000B), FF (formfeed, |
1247 |
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS |
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS |
1248 |
(paragraph separator, U+2029). The last two are recognized only in |
(paragraph separator, U+2029). The last two are recognized only in |
1249 |
UTF-8 mode. |
UTF-8 mode. |
1250 |
|
|
1251 |
The newline setting in the options word uses three bits that are |
The newline setting in the options word uses three bits that are |
1252 |
treated as a number, giving eight possibilities. Currently only six are |
treated as a number, giving eight possibilities. Currently only six are |
1253 |
used (default plus the five values above). This means that if you set |
used (default plus the five values above). This means that if you set |
1254 |
more than one newline option, the combination may or may not be sensi- |
more than one newline option, the combination may or may not be sensi- |
1255 |
ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to |
ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to |
1256 |
PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers and |
PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers and |
1257 |
cause an error. |
cause an error. |
1258 |
|
|
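The three-bit newline field can be made concrete with a small sketch. The constants below are local stand-ins whose values are assumed to mirror the PCRE_NEWLINE_xxx definitions in the pcre.h of this era; check your own header before relying on them. The X_ names, the mask, and both helper functions are hypothetical and exist only for illustration.

```c
/* Assumed values, believed to match pcre.h for the PCRE_NEWLINE_xxx
   options; the X_ prefix marks them as local illustrations. */
#define X_NEWLINE_CR      0x00100000UL
#define X_NEWLINE_LF      0x00200000UL
#define X_NEWLINE_CRLF    0x00300000UL
#define X_NEWLINE_ANY     0x00400000UL
#define X_NEWLINE_ANYCRLF 0x00500000UL
#define X_NEWLINE_MASK    0x00700000UL  /* hypothetical name for the field */

/* Extract the three newline bits as a number from 0 to 7. */
int newline_field(unsigned long options)
{
    return (int)((options & X_NEWLINE_MASK) >> 20);
}

/* Nonzero if the field holds one of the six values in use:
   0 (default) or 1..5 (the five options above). 6 and 7 are unused. */
int newline_field_is_used(unsigned long options)
{
    return newline_field(options) <= 5;
}
```

Under these assumed values, CR combined with LF yields the same bits as CRLF, whereas LF combined with ANY yields 6, one of the unused numbers that the text says may cause an error.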
1259 |
The only time that a line break is specially recognized when compiling |
The only time that a line break is specially recognized when compiling |
1260 |
a pattern is if PCRE_EXTENDED is set, and an unescaped # outside a |
a pattern is if PCRE_EXTENDED is set, and an unescaped # outside a |
1261 |
character class is encountered. This indicates a comment that lasts |
character class is encountered. This indicates a comment that lasts |
1262 |
until after the next line break sequence. In other circumstances, line |
until after the next line break sequence. In other circumstances, line |
1263 |
break sequences are treated as literal data, except that in |
break sequences are treated as literal data, except that in |
1264 |
PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters |
PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters |
1265 |
and are therefore ignored. |
and are therefore ignored. |
1266 |
|
|
1267 |
The newline option that is set at compile time becomes the default that |
The newline option that is set at compile time becomes the default that |
1268 |
is used for pcre_exec() and pcre_dfa_exec(), but it can be overridden. |
is used for pcre_exec() and pcre_dfa_exec(), but it can be overridden. |
1269 |
|
|
1270 |
PCRE_NO_AUTO_CAPTURE |
PCRE_NO_AUTO_CAPTURE |
1271 |
|
|
1272 |
If this option is set, it disables the use of numbered capturing paren- |
If this option is set, it disables the use of numbered capturing paren- |
1273 |
theses in the pattern. Any opening parenthesis that is not followed by |
theses in the pattern. Any opening parenthesis that is not followed by |
1274 |
? behaves as if it were followed by ?: but named parentheses can still |
? behaves as if it were followed by ?: but named parentheses can still |
1275 |
be used for capturing (and they acquire numbers in the usual way). |
be used for capturing (and they acquire numbers in the usual way). |
1276 |
There is no equivalent of this option in Perl. |
There is no equivalent of this option in Perl. |
1277 |
|
|
1278 |
PCRE_UNGREEDY |
PCRE_UNGREEDY |
1279 |
|
|
1280 |
This option inverts the "greediness" of the quantifiers so that they |
This option inverts the "greediness" of the quantifiers so that they |
1281 |
are not greedy by default, but become greedy if followed by "?". It is |
are not greedy by default, but become greedy if followed by "?". It is |
1282 |
not compatible with Perl. It can also be set by a (?U) option setting |
not compatible with Perl. It can also be set by a (?U) option setting |
1283 |
within the pattern. |
within the pattern. |
1284 |
|
|
1285 |
PCRE_UTF8 |
PCRE_UTF8 |
1286 |
|
|
1287 |
This option causes PCRE to regard both the pattern and the subject as |
This option causes PCRE to regard both the pattern and the subject as |
1288 |
strings of UTF-8 characters instead of single-byte character strings. |
strings of UTF-8 characters instead of single-byte character strings. |
1289 |
However, it is available only when PCRE is built to include UTF-8 sup- |
However, it is available only when PCRE is built to include UTF-8 sup- |
1290 |
port. If not, the use of this option provokes an error. Details of how |
port. If not, the use of this option provokes an error. Details of how |
1291 |
this option changes the behaviour of PCRE are given in the section on |
this option changes the behaviour of PCRE are given in the section on |
1292 |
UTF-8 support in the main pcre page. |
UTF-8 support in the main pcre page. |
1293 |
|
|
1294 |
PCRE_NO_UTF8_CHECK |
PCRE_NO_UTF8_CHECK |
1295 |
|
|
1296 |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
1297 |
automatically checked. There is a discussion about the validity of |
automatically checked. There is a discussion about the validity of |
1298 |
UTF-8 strings in the main pcre page. If an invalid UTF-8 sequence of |
UTF-8 strings in the main pcre page. If an invalid UTF-8 sequence of |
1299 |
bytes is found, pcre_compile() returns an error. If you already know |
bytes is found, pcre_compile() returns an error. If you already know |
1300 |
that your pattern is valid, and you want to skip this check for perfor- |
that your pattern is valid, and you want to skip this check for perfor- |
1301 |
mance reasons, you can set the PCRE_NO_UTF8_CHECK option. When it is |
mance reasons, you can set the PCRE_NO_UTF8_CHECK option. When it is |
1302 |
set, the effect of passing an invalid UTF-8 string as a pattern is |
set, the effect of passing an invalid UTF-8 string as a pattern is |
1303 |
undefined. It may cause your program to crash. Note that this option |
undefined. It may cause your program to crash. Note that this option |
1304 |
can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the |
can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the |
1305 |
UTF-8 validity checking of subject strings. |
UTF-8 validity checking of subject strings. |
1306 |
|
|
1307 |
|
|
1308 |
COMPILATION ERROR CODES |
COMPILATION ERROR CODES |
1309 |
|
|
1310 |
The following table lists the error codes that may be returned by |
The following table lists the error codes that may be returned by |
1311 |
pcre_compile2(), along with the error messages that may be returned by |
pcre_compile2(), along with the error messages that may be returned by |
1312 |
both compiling functions. As PCRE has developed, some error codes have |
both compiling functions. As PCRE has developed, some error codes have |
1313 |
fallen out of use. To avoid confusion, they have not been re-used. |
fallen out of use. To avoid confusion, they have not been re-used. |
1314 |
|
|
1315 |
0 no error |
0 no error |
1365 |
50 [this code is not in use] |
50 [this code is not in use] |
1366 |
51 octal value is greater than \377 (not in UTF-8 mode) |
51 octal value is greater than \377 (not in UTF-8 mode) |
1367 |
52 internal error: overran compiling workspace |
52 internal error: overran compiling workspace |
1368 |
53 internal error: previously-checked referenced subpattern not |
53 internal error: previously-checked referenced subpattern not |
1369 |
found |
found |
1370 |
54 DEFINE group contains more than one branch |
54 DEFINE group contains more than one branch |
1371 |
55 repeating a DEFINE group is not allowed |
55 repeating a DEFINE group is not allowed |
1372 |
56 inconsistent NEWLINE options" |
56 inconsistent NEWLINE options |
1373 |
57 \g is not followed by a braced name or an optionally braced |
57 \g is not followed by a braced name or an optionally braced |
1374 |
non-zero number |
non-zero number |
1375 |
58 (?+ or (?- or (?(+ or (?(- must be followed by a non-zero number |
58 (?+ or (?- or (?(+ or (?(- must be followed by a non-zero number |
1380 |
pcre_extra *pcre_study(const pcre *code, int options, |
pcre_extra *pcre_study(const pcre *code, int options, |
1381 |
const char **errptr); |
const char **errptr); |
1382 |
|
|
1383 |
If a compiled pattern is going to be used several times, it is worth |
If a compiled pattern is going to be used several times, it is worth |
1384 |
spending more time analyzing it in order to speed up the time taken for |
spending more time analyzing it in order to speed up the time taken for |
1385 |
matching. The function pcre_study() takes a pointer to a compiled pat- |
matching. The function pcre_study() takes a pointer to a compiled pat- |
1386 |
tern as its first argument. If studying the pattern produces additional |
tern as its first argument. If studying the pattern produces additional |
1387 |
information that will help speed up matching, pcre_study() returns a |
information that will help speed up matching, pcre_study() returns a |
1388 |
pointer to a pcre_extra block, in which the study_data field points to |
pointer to a pcre_extra block, in which the study_data field points to |
1389 |
the results of the study. |
the results of the study. |
1390 |
|
|
1391 |
The returned value from pcre_study() can be passed directly to |
The returned value from pcre_study() can be passed directly to |
1392 |
pcre_exec(). However, a pcre_extra block also contains other fields |
pcre_exec(). However, a pcre_extra block also contains other fields |
1393 |
that can be set by the caller before the block is passed; these are |
that can be set by the caller before the block is passed; these are |
1394 |
described below in the section on matching a pattern. |
described below in the section on matching a pattern. |
1395 |
|
|
1396 |
If studying the pattern does not produce any additional information |
If studying the pattern does not produce any additional information |
1397 |
pcre_study() returns NULL. In that circumstance, if the calling program |
pcre_study() returns NULL. In that circumstance, if the calling program |
1398 |
wants to pass any of the other fields to pcre_exec(), it must set up |
wants to pass any of the other fields to pcre_exec(), it must set up |
1399 |
its own pcre_extra block. |
its own pcre_extra block. |
1400 |
|
|
1401 |
The second argument of pcre_study() contains option bits. At present, |
The second argument of pcre_study() contains option bits. At present, |
1402 |
no options are defined, and this argument should always be zero. |
no options are defined, and this argument should always be zero. |
1403 |
|
|
1404 |
The third argument for pcre_study() is a pointer for an error message. |
The third argument for pcre_study() is a pointer for an error message. |
1405 |
If studying succeeds (even if no data is returned), the variable it |
If studying succeeds (even if no data is returned), the variable it |
1406 |
points to is set to NULL. Otherwise it is set to point to a textual |
points to is set to NULL. Otherwise it is set to point to a textual |
1407 |
error message. This is a static string that is part of the library. You |
error message. This is a static string that is part of the library. You |
1408 |
must not try to free it. You should test the error pointer for NULL |
must not try to free it. You should test the error pointer for NULL |
1409 |
after calling pcre_study(), to be sure that it has run successfully. |
after calling pcre_study(), to be sure that it has run successfully. |
1410 |
|
|
1411 |
This is a typical call to pcre_study(): |
This is a typical call to pcre_study(): |
1417 |
&error); /* set to NULL or points to a message */ |
&error); /* set to NULL or points to a message */ |
1418 |
|
|
1419 |
At present, studying a pattern is useful only for non-anchored patterns |
At present, studying a pattern is useful only for non-anchored patterns |
1420 |
that do not have a single fixed starting character. A bitmap of possi- |
that do not have a single fixed starting character. A bitmap of possi- |
1421 |
ble starting bytes is created. |
ble starting bytes is created. |
1422 |
|
|
1423 |
|
|
1424 |
LOCALE SUPPORT |
LOCALE SUPPORT |
1425 |
|
|
1426 |
PCRE handles caseless matching, and determines whether characters are |
PCRE handles caseless matching, and determines whether characters are |
1427 |
letters, digits, or whatever, by reference to a set of tables, indexed |
letters, digits, or whatever, by reference to a set of tables, indexed |
1428 |
by character value. When running in UTF-8 mode, this applies only to |
by character value. When running in UTF-8 mode, this applies only to |
1429 |
characters with codes less than 128. Higher-valued codes never match |
characters with codes less than 128. Higher-valued codes never match |
1430 |
escapes such as \w or \d, but can be tested with \p if PCRE is built |
escapes such as \w or \d, but can be tested with \p if PCRE is built |
1431 |
with Unicode character property support. The use of locales with Uni- |
with Unicode character property support. The use of locales with Uni- |
1432 |
code is discouraged. If you are handling characters with codes greater |
code is discouraged. If you are handling characters with codes greater |
1433 |
than 128, you should either use UTF-8 and Unicode, or use locales, but |
than 128, you should either use UTF-8 and Unicode, or use locales, but |
1434 |
not try to mix the two. |
not try to mix the two. |
1435 |
|
|
1436 |
PCRE contains an internal set of tables that are used when the final |
PCRE contains an internal set of tables that are used when the final |
1437 |
argument of pcre_compile() is NULL. These are sufficient for many |
argument of pcre_compile() is NULL. These are sufficient for many |
1438 |
applications. Normally, the internal tables recognize only ASCII char- |
applications. Normally, the internal tables recognize only ASCII char- |
1439 |
acters. However, when PCRE is built, it is possible to cause the inter- |
acters. However, when PCRE is built, it is possible to cause the inter- |
1440 |
nal tables to be rebuilt in the default "C" locale of the local system, |
nal tables to be rebuilt in the default "C" locale of the local system, |
1441 |
which may cause them to be different. |
which may cause them to be different. |
1442 |
|
|
1443 |
The internal tables can always be overridden by tables supplied by the |
The internal tables can always be overridden by tables supplied by the |
1444 |
application that calls PCRE. These may be created in a different locale |
application that calls PCRE. These may be created in a different locale |
1445 |
from the default. As more and more applications change to using Uni- |
from the default. As more and more applications change to using Uni- |
1446 |
code, the need for this locale support is expected to die away. |
code, the need for this locale support is expected to die away. |
1447 |
|
|
1448 |
External tables are built by calling the pcre_maketables() function, |
External tables are built by calling the pcre_maketables() function, |
1449 |
which has no arguments, in the relevant locale. The result can then be |
which has no arguments, in the relevant locale. The result can then be |
1450 |
passed to pcre_compile() or pcre_exec() as often as necessary. For |
passed to pcre_compile() or pcre_exec() as often as necessary. For |
1451 |
example, to build and use tables that are appropriate for the French |
example, to build and use tables that are appropriate for the French |
1452 |
locale (where accented characters with values greater than 128 are |
locale (where accented characters with values greater than 128 are |
1453 |
treated as letters), the following code could be used: |
treated as letters), the following code could be used: |
1454 |
|
|
1455 |
setlocale(LC_CTYPE, "fr_FR"); |
setlocale(LC_CTYPE, "fr_FR"); |
1456 |
tables = pcre_maketables(); |
tables = pcre_maketables(); |
1457 |
re = pcre_compile(..., tables); |
re = pcre_compile(..., tables); |
1458 |
|
|
1459 |
The locale name "fr_FR" is used on Linux and other Unix-like systems; |
The locale name "fr_FR" is used on Linux and other Unix-like systems; |
1460 |
if you are using Windows, the name for the French locale is "french". |
if you are using Windows, the name for the French locale is "french". |
1461 |
|
|
1462 |
When pcre_maketables() runs, the tables are built in memory that is |
When pcre_maketables() runs, the tables are built in memory that is |
1463 |
obtained via pcre_malloc. It is the caller's responsibility to ensure |
obtained via pcre_malloc. It is the caller's responsibility to ensure |
1464 |
that the memory containing the tables remains available for as long as |
that the memory containing the tables remains available for as long as |
1465 |
it is needed. |
it is needed. |
1466 |
|
|
1467 |
The pointer that is passed to pcre_compile() is saved with the compiled |
The pointer that is passed to pcre_compile() is saved with the compiled |
1468 |
pattern, and the same tables are used via this pointer by pcre_study() |
pattern, and the same tables are used via this pointer by pcre_study() |
1469 |
and normally also by pcre_exec(). Thus, by default, for any single pat- |
and normally also by pcre_exec(). Thus, by default, for any single pat- |
1470 |
tern, compilation, studying and matching all happen in the same locale, |
tern, compilation, studying and matching all happen in the same locale, |
1471 |
but different patterns can be compiled in different locales. |
but different patterns can be compiled in different locales. |
1472 |
|
|
1473 |
It is possible to pass a table pointer or NULL (indicating the use of |
It is possible to pass a table pointer or NULL (indicating the use of |
1474 |
the internal tables) to pcre_exec(). Although not intended for this |
the internal tables) to pcre_exec(). Although not intended for this |
1475 |
purpose, this facility could be used to match a pattern in a different |
purpose, this facility could be used to match a pattern in a different |
1476 |
locale from the one in which it was compiled. Passing table pointers at |
locale from the one in which it was compiled. Passing table pointers at |
1477 |
run time is discussed below in the section on matching a pattern. |
run time is discussed below in the section on matching a pattern. |
1478 |
|
|
1482 |
int pcre_fullinfo(const pcre *code, const pcre_extra *extra, |
int pcre_fullinfo(const pcre *code, const pcre_extra *extra, |
1483 |
int what, void *where); |
int what, void *where); |
1484 |
|
|
1485 |
The pcre_fullinfo() function returns information about a compiled pat- |
The pcre_fullinfo() function returns information about a compiled pat- |
1486 |
tern. It replaces the obsolete pcre_info() function, which is neverthe- |
tern. It replaces the obsolete pcre_info() function, which is neverthe- |
1487 |
less retained for backwards compatibility (and is documented below). |
less retained for backwards compatibility (and is documented below). |
1488 |
|
|
1489 |
The first argument for pcre_fullinfo() is a pointer to the compiled |
The first argument for pcre_fullinfo() is a pointer to the compiled |
1490 |
pattern. The second argument is the result of pcre_study(), or NULL if |
pattern. The second argument is the result of pcre_study(), or NULL if |
1491 |
the pattern was not studied. The third argument specifies which piece |
the pattern was not studied. The third argument specifies which piece |
1492 |
of information is required, and the fourth argument is a pointer to a |
of information is required, and the fourth argument is a pointer to a |
1493 |
variable to receive the data. The yield of the function is zero for |
variable to receive the data. The yield of the function is zero for |
1494 |
success, or one of the following negative numbers: |
success, or one of the following negative numbers: |
1495 |
|
|
1496 |
PCRE_ERROR_NULL the argument code was NULL |
PCRE_ERROR_NULL the argument code was NULL |
1498 |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
1499 |
PCRE_ERROR_BADOPTION the value of what was invalid |
PCRE_ERROR_BADOPTION the value of what was invalid |
1500 |
|
|
1501 |
The "magic number" is placed at the start of each compiled pattern as |
The "magic number" is placed at the start of each compiled pattern as |
1502 |
a simple check against passing an arbitrary memory pointer. Here is a |
a simple check against passing an arbitrary memory pointer. Here is a |
1503 |
typical call of pcre_fullinfo(), to obtain the length of the compiled |
typical call of pcre_fullinfo(), to obtain the length of the compiled |
1504 |
pattern: |
pattern: |
1505 |
|
|
1506 |
int rc; |
int rc; |
1511 |
PCRE_INFO_SIZE, /* what is required */ |
PCRE_INFO_SIZE, /* what is required */ |
1512 |
&length); /* where to put the data */ |
&length); /* where to put the data */ |
1513 |
|
|
1514 |
The possible values for the third argument are defined in pcre.h, and |
The possible values for the third argument are defined in pcre.h, and |
1515 |
are as follows: |
are as follows: |
1516 |
|
|
1517 |
PCRE_INFO_BACKREFMAX |
PCRE_INFO_BACKREFMAX |
1518 |
|
|
1519 |
Return the number of the highest back reference in the pattern. The |
Return the number of the highest back reference in the pattern. The |
1520 |
fourth argument should point to an int variable. Zero is returned if |
fourth argument should point to an int variable. Zero is returned if |
1521 |
there are no back references. |
there are no back references. |
1522 |
|
|
1523 |
PCRE_INFO_CAPTURECOUNT |
PCRE_INFO_CAPTURECOUNT |
1524 |
|
|
1525 |
Return the number of capturing subpatterns in the pattern. The fourth |
Return the number of capturing subpatterns in the pattern. The fourth |
1526 |
argument should point to an int variable. |
argument should point to an int variable. |
1527 |
|
|
1528 |
PCRE_INFO_DEFAULT_TABLES |
PCRE_INFO_DEFAULT_TABLES |
1529 |
|
|
1530 |
Return a pointer to the internal default character tables within PCRE. |
Return a pointer to the internal default character tables within PCRE. |
1531 |
The fourth argument should point to an unsigned char * variable. This |
The fourth argument should point to an unsigned char * variable. This |
1532 |
information call is provided for internal use by the pcre_study() func- |
information call is provided for internal use by the pcre_study() func- |
1533 |
tion. External callers can cause PCRE to use its internal tables by |
tion. External callers can cause PCRE to use its internal tables by |
1534 |
passing a NULL table pointer. |
passing a NULL table pointer. |
1535 |
|
|
1536 |
PCRE_INFO_FIRSTBYTE |
PCRE_INFO_FIRSTBYTE |
1537 |
|
|
1538 |
Return information about the first byte of any matched string, for a |
Return information about the first byte of any matched string, for a |
1539 |
non-anchored pattern. The fourth argument should point to an int vari- |
non-anchored pattern. The fourth argument should point to an int vari- |
1540 |
able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old name |
able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old name |
1541 |
is still recognized for backwards compatibility.) |
is still recognized for backwards compatibility.) |
1542 |
|
|
1543 |
If there is a fixed first byte, for example, from a pattern such as |
If there is a fixed first byte, for example, from a pattern such as |
1544 |
(cat|cow|coyote), its value is returned. Otherwise, if either |
(cat|cow|coyote), its value is returned. Otherwise, if either |
1545 |
|
|
1546 |
(a) the pattern was compiled with the PCRE_MULTILINE option, and every |
(a) the pattern was compiled with the PCRE_MULTILINE option, and every |
1547 |
branch starts with "^", or |
branch starts with "^", or |
1548 |
|
|
1549 |
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not |
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not |
1550 |
set (if it were set, the pattern would be anchored), |
set (if it were set, the pattern would be anchored), |
1551 |
|
|
1552 |
-1 is returned, indicating that the pattern matches only at the start |
-1 is returned, indicating that the pattern matches only at the start |
1553 |
of a subject string or after any newline within the string. Otherwise |
of a subject string or after any newline within the string. Otherwise |
1554 |
-2 is returned. For anchored patterns, -2 is returned. |
-2 is returned. For anchored patterns, -2 is returned. |
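
The three kinds of result described above can be told apart with a small helper. This is only a sketch: the enum and classify_first_byte() are hypothetical names, not part of the PCRE API, and the function assumes it is given the int that pcre_fullinfo() stored for PCRE_INFO_FIRSTBYTE.

```c
#include <assert.h>

/* Hypothetical classification of the int obtained via PCRE_INFO_FIRSTBYTE. */
enum first_byte_kind {
  FIRST_BYTE_FIXED,       /* value is the fixed first byte (0..255) */
  FIRST_BYTE_LINE_START,  /* -1: matches only at the start of the subject
                             or after a newline within it */
  FIRST_BYTE_NONE         /* -2: no first-byte information, or the
                             pattern is anchored */
};

static enum first_byte_kind classify_first_byte(int value)
{
  assert(value >= -2 && value <= 255);  /* the only values the text describes */
  if (value >= 0) return FIRST_BYTE_FIXED;
  return (value == -1) ? FIRST_BYTE_LINE_START : FIRST_BYTE_NONE;
}
```

For a pattern such as (cat|cow|coyote) the stored value would be the byte 'c', which this helper reports as FIRST_BYTE_FIXED.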
1555 |
|
|
1556 |
PCRE_INFO_FIRSTTABLE |
PCRE_INFO_FIRSTTABLE |
1557 |
|
|
1558 |
If the pattern was studied, and this resulted in the construction of a |
If the pattern was studied, and this resulted in the construction of a |
1559 |
256-bit table indicating a fixed set of bytes for the first byte in any |
256-bit table indicating a fixed set of bytes for the first byte in any |
1560 |
matching string, a pointer to the table is returned. Otherwise NULL is |
matching string, a pointer to the table is returned. Otherwise NULL is |
1561 |
returned. The fourth argument should point to an unsigned char * vari- |
returned. The fourth argument should point to an unsigned char * vari- |
1562 |
able. |
able. |
1563 |
|
|
1564 |
PCRE_INFO_HASCRORLF |
PCRE_INFO_HASCRORLF |
1565 |
|
|
1566 |
Return 1 if the pattern contains any explicit matches for CR or LF |
Return 1 if the pattern contains any explicit matches for CR or LF |
1567 |
characters, otherwise 0. The fourth argument should point to an int |
characters, otherwise 0. The fourth argument should point to an int |
1568 |
variable. |
variable. |
1569 |
|
|
1570 |
PCRE_INFO_JCHANGED |
PCRE_INFO_JCHANGED |
1571 |
|
|
1572 |
Return 1 if the (?J) option setting is used in the pattern, otherwise |
Return 1 if the (?J) option setting is used in the pattern, otherwise |
1573 |
0. The fourth argument should point to an int variable. The (?J) inter- |
0. The fourth argument should point to an int variable. The (?J) inter- |
1574 |
nal option setting changes the local PCRE_DUPNAMES option. |
nal option setting changes the local PCRE_DUPNAMES option. |
1575 |
|
|
1576 |
PCRE_INFO_LASTLITERAL |
PCRE_INFO_LASTLITERAL |
1577 |
|
|
1578 |
Return the value of the rightmost literal byte that must exist in any |
Return the value of the rightmost literal byte that must exist in any |
1579 |
matched string, other than at its start, if such a byte has been |
matched string, other than at its start, if such a byte has been |
1580 |
recorded. The fourth argument should point to an int variable. If there |
recorded. The fourth argument should point to an int variable. If there |
1581 |
is no such byte, -1 is returned. For anchored patterns, a last literal |
is no such byte, -1 is returned. For anchored patterns, a last literal |
1582 |
byte is recorded only if it follows something of variable length. For |
byte is recorded only if it follows something of variable length. For |
1583 |
example, for the pattern /^a\d+z\d+/ the returned value is "z", but for |
example, for the pattern /^a\d+z\d+/ the returned value is "z", but for |
1584 |
/^a\dz\d/ the returned value is -1. |
/^a\dz\d/ the returned value is -1. |
1585 |
|
|
1587 |
PCRE_INFO_NAMEENTRYSIZE |
PCRE_INFO_NAMEENTRYSIZE |
1588 |
PCRE_INFO_NAMETABLE |
PCRE_INFO_NAMETABLE |
1589 |
|
|
1590 |
PCRE supports the use of named as well as numbered capturing parenthe- |
PCRE supports the use of named as well as numbered capturing parenthe- |
1591 |
ses. The names are just an additional way of identifying the parenthe- |
ses. The names are just an additional way of identifying the parenthe- |
1592 |
ses, which still acquire numbers. Several convenience functions such as |
ses, which still acquire numbers. Several convenience functions such as |
1593 |
pcre_get_named_substring() are provided for extracting captured sub- |
pcre_get_named_substring() are provided for extracting captured sub- |
1594 |
strings by name. It is also possible to extract the data directly, by |
strings by name. It is also possible to extract the data directly, by |
1595 |
first converting the name to a number in order to access the correct |
first converting the name to a number in order to access the correct |
1596 |
pointers in the output vector (described with pcre_exec() below). To do |
pointers in the output vector (described with pcre_exec() below). To do |
1597 |
the conversion, you need to use the name-to-number map, which is |
the conversion, you need to use the name-to-number map, which is |
1598 |
described by these three values. |
described by these three values. |
1599 |
|
|
1600 |
The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT |
The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT |
1601 |
gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size |
gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size |
1602 |
of each entry; both of these return an int value. The entry size |
of each entry; both of these return an int value. The entry size |
1603 |
depends on the length of the longest name. PCRE_INFO_NAMETABLE returns |
depends on the length of the longest name. PCRE_INFO_NAMETABLE returns |
1604 |
a pointer to the first entry of the table (a pointer to char). The |
a pointer to the first entry of the table (a pointer to char). The |
1605 |
first two bytes of each entry are the number of the capturing parenthe- |
first two bytes of each entry are the number of the capturing parenthe- |
1606 |
sis, most significant byte first. The rest of the entry is the corre- |
sis, most significant byte first. The rest of the entry is the corre- |
1607 |
sponding name, zero terminated. The names are in alphabetical order. |
sponding name, zero terminated. The names are in alphabetical order. |
1608 |
When PCRE_DUPNAMES is set, duplicate names are in order of their paren- |
When PCRE_DUPNAMES is set, duplicate names are in order of their paren- |
1609 |
theses numbers. For example, consider the following pattern (assume |
theses numbers. For example, consider the following pattern (assume |
1610 |
PCRE_EXTENDED is set, so white space - including newlines - is |
PCRE_EXTENDED is set, so white space - including newlines - is |
1611 |
ignored): |
ignored): |
1612 |
|
|
1613 |
(?<date> (?<year>(\d\d)?\d\d) - |
(?<date> (?<year>(\d\d)?\d\d) - |
1614 |
(?<month>\d\d) - (?<day>\d\d) ) |
(?<month>\d\d) - (?<day>\d\d) ) |
1615 |
|
|
1616 |
There are four named subpatterns, so the table has four entries, and |
There are four named subpatterns, so the table has four entries, and |
1617 |
each entry in the table is eight bytes long. The table is as follows, |
each entry in the table is eight bytes long. The table is as follows, |
1618 |
with non-printing bytes shown in hexadecimal, and undefined bytes shown |
with non-printing bytes shown in hexadecimal, and undefined bytes shown |
1619 |
as ??: |
as ??: |
1620 |
|
|
1623 |
00 04 m o n t h 00 |
00 04 m o n t h 00 |
1624 |
00 02 y e a r 00 ?? |
00 02 y e a r 00 ?? |
1625 |
|
|
1626 |
When writing code to extract data from named subpatterns using the |
When writing code to extract data from named subpatterns using the |
1627 |
name-to-number map, remember that the length of the entries is likely |
name-to-number map, remember that the length of the entries is likely |
1628 |
to be different for each compiled pattern. |
to be different for each compiled pattern. |
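
A name-to-number lookup over the table layout described above might be sketched as follows. The function find_group_number() is a hypothetical helper, not part of the PCRE API; it assumes the caller has already obtained the table pointer, the entry count, and the entry size (for example via PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the PCRE API: linear search of a
   name-to-number table. Each entry is entry_size bytes: a two-byte
   parenthesis number (most significant byte first) followed by the
   zero-terminated name. Returns the number of the first entry whose
   name matches, or -1 if the name is absent. */
static int find_group_number(const unsigned char *table, int name_count,
                             int entry_size, const char *name)
{
  int i;
  assert(table != NULL && name != NULL && entry_size > 2);
  for (i = 0; i < name_count; i++)
    {
    const unsigned char *entry = table + (size_t)i * entry_size;
    if (strcmp((const char *)(entry + 2), name) == 0)
      return (entry[0] << 8) | entry[1];
    }
  return -1;
}
```

Because the entry size is passed in rather than hard-coded, the same helper works for any compiled pattern. A linear scan is used here for simplicity; since the names are stored in alphabetical order, a binary search would also be possible.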
1629 |
|
|
1630 |
PCRE_INFO_OKPARTIAL |
PCRE_INFO_OKPARTIAL |
1631 |
|
|
1632 |
Return 1 if the pattern can be used for partial matching, otherwise 0. |
Return 1 if the pattern can be used for partial matching, otherwise 0. |
1633 |
The fourth argument should point to an int variable. The pcrepartial |
The fourth argument should point to an int variable. The pcrepartial |
1634 |
documentation lists the restrictions that apply to patterns when par- |
documentation lists the restrictions that apply to patterns when par- |
1635 |
tial matching is used. |
tial matching is used. |
1636 |
|
|
1637 |
PCRE_INFO_OPTIONS |
PCRE_INFO_OPTIONS |
1638 |
|
|
1639 |
Return a copy of the options with which the pattern was compiled. The |
Return a copy of the options with which the pattern was compiled. The |
1640 |
fourth argument should point to an unsigned long int variable. These |
fourth argument should point to an unsigned long int variable. These |
1641 |
option bits are those specified in the call to pcre_compile(), modified |
option bits are those specified in the call to pcre_compile(), modified |
1642 |
by any top-level option settings at the start of the pattern itself. In |
by any top-level option settings at the start of the pattern itself. In |
1643 |
other words, they are the options that will be in force when matching |
other words, they are the options that will be in force when matching |
1644 |
starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with |
starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with |
1645 |
the PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE, |
the PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE, |
1646 |
and PCRE_EXTENDED. |
and PCRE_EXTENDED. |
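
Testing the returned options word is ordinary bit masking. The sketch below assumes the option values shown, which are copied by hand from pcre.h and should be checked against your own header (in a real program you would include pcre.h rather than redefine them); has_options() is a hypothetical helper, not part of the PCRE API.

```c
#include <assert.h>

/* Assumed values, as found in pcre.h; verify against your header. */
#define PCRE_CASELESS   0x00000001
#define PCRE_MULTILINE  0x00000002
#define PCRE_DOTALL     0x00000004
#define PCRE_EXTENDED   0x00000008

/* Hypothetical helper: return 1 if every bit in `wanted` is set in
   `options`, otherwise 0. */
static int has_options(unsigned long int options, unsigned long int wanted)
{
  assert(wanted != 0);
  return (options & wanted) == wanted;
}
```

For the example pattern above, the options word would satisfy has_options() for PCRE_CASELESS, PCRE_MULTILINE, and PCRE_EXTENDED, but not for PCRE_DOTALL.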
1647 |
|
|
1648 |
A pattern is automatically anchored by PCRE if all of its top-level |
A pattern is automatically anchored by PCRE if all of its top-level |
1649 |
alternatives begin with one of the following: |
alternatives begin with one of the following: |
1650 |
|
|
1651 |
^ unless PCRE_MULTILINE is set |
^ unless PCRE_MULTILINE is set |
1659 |
|
|
1660 |
PCRE_INFO_SIZE |
PCRE_INFO_SIZE |
1661 |
|
|
1662 |
Return the size of the compiled pattern, that is, the value that was |
Return the size of the compiled pattern, that is, the value that was |
1663 |
passed as the argument to pcre_malloc() when PCRE was getting memory in |
passed as the argument to pcre_malloc() when PCRE was getting memory in |
1664 |
which to place the compiled data. The fourth argument should point to a |
which to place the compiled data. The fourth argument should point to a |
1665 |
size_t variable. |
size_t variable. |
1667 |
PCRE_INFO_STUDYSIZE |
PCRE_INFO_STUDYSIZE |
1668 |
|
|
1669 |
Return the size of the data block pointed to by the study_data field in |
Return the size of the data block pointed to by the study_data field in |
1670 |
a pcre_extra block. That is, it is the value that was passed to |
a pcre_extra block. That is, it is the value that was passed to |
1671 |
pcre_malloc() when PCRE was getting memory into which to place the data |
pcre_malloc() when PCRE was getting memory into which to place the data |
1672 |
created by pcre_study(). The fourth argument should point to a size_t |
created by pcre_study(). The fourth argument should point to a size_t |
1673 |
variable. |
variable. |
1674 |
|
|
1675 |
|
|
1677 |
|
|
1678 |
int pcre_info(const pcre *code, int *optptr, int *firstcharptr); |
int pcre_info(const pcre *code, int *optptr, int *firstcharptr); |
1679 |
|
|
1680 |
The pcre_info() function is now obsolete because its interface is too |
The pcre_info() function is now obsolete because its interface is too |
1681 |
restrictive to return all the available data about a compiled pattern. |
restrictive to return all the available data about a compiled pattern. |
1682 |
New programs should use pcre_fullinfo() instead. The yield of |
New programs should use pcre_fullinfo() instead. The yield of |
1683 |
pcre_info() is the number of capturing subpatterns, or one of the fol- |
pcre_info() is the number of capturing subpatterns, or one of the fol- |
1684 |
lowing negative numbers: |
lowing negative numbers: |
1685 |
|
|
1686 |
PCRE_ERROR_NULL the argument code was NULL |
PCRE_ERROR_NULL the argument code was NULL |
1687 |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
1688 |
|
|
1689 |
If the optptr argument is not NULL, a copy of the options with which |
If the optptr argument is not NULL, a copy of the options with which |
1690 |
the pattern was compiled is placed in the integer it points to (see |
the pattern was compiled is placed in the integer it points to (see |
1691 |
PCRE_INFO_OPTIONS above). |
PCRE_INFO_OPTIONS above). |
1692 |
|
|
1693 |
If the pattern is not anchored and the firstcharptr argument is not |
If the pattern is not anchored and the firstcharptr argument is not |
1694 |
NULL, it is used to pass back information about the first character of |
NULL, it is used to pass back information about the first character of |
1695 |
any matched string (see PCRE_INFO_FIRSTBYTE above). |
any matched string (see PCRE_INFO_FIRSTBYTE above). |
1696 |
|
|
1697 |
|
|
1699 |
|
|
1700 |
int pcre_refcount(pcre *code, int adjust); |
int pcre_refcount(pcre *code, int adjust); |
1701 |
|
|
1702 |
The pcre_refcount() function is used to maintain a reference count in |
The pcre_refcount() function is used to maintain a reference count in |
1703 |
the data block that contains a compiled pattern. It is provided for the |
the data block that contains a compiled pattern. It is provided for the |
1704 |
benefit of applications that operate in an object-oriented manner, |
benefit of applications that operate in an object-oriented manner, |
1705 |
where different parts of the application may be using the same compiled |
where different parts of the application may be using the same compiled |
1706 |
pattern, but you want to free the block when they are all done. |
pattern, but you want to free the block when they are all done. |
1707 |
|
|
1708 |
When a pattern is compiled, the reference count field is initialized to |
When a pattern is compiled, the reference count field is initialized to |
1709 |
zero. It is changed only by calling this function, whose action is to |
zero. It is changed only by calling this function, whose action is to |
1710 |
add the adjust value (which may be positive or negative) to it. The |
add the adjust value (which may be positive or negative) to it. The |
1711 |
yield of the function is the new value. However, the value of the count |
yield of the function is the new value. However, the value of the count |
1712 |
is constrained to lie between 0 and 65535, inclusive. If the new value |
is constrained to lie between 0 and 65535, inclusive. If the new value |
1713 |
is outside these limits, it is forced to the appropriate limit value. |
is outside these limits, it is forced to the appropriate limit value. |
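The saturating arithmetic described above can be sketched as follows. This is a minimal model of the documented behaviour only; model_refcount() is a hypothetical name and is not part of the PCRE API or taken from its source.

```c
#include <assert.h>

/* Hypothetical model of the documented pcre_refcount() arithmetic:
   add "adjust" to the current count, then force the result into the
   range 0..65535. This is an illustration, not PCRE's own code. */
int model_refcount(int current, int adjust)
{
    long long next;

    assert(current >= 0 && current <= 65535);
    next = (long long)current + adjust;
    if (next < 0) next = 0;
    if (next > 65535) next = 65535;
    return (int)next;
}
```

Under this model, a caller that decrements below zero or increments past 65535 simply sees the limit value returned, so the returned count should not be relied on once it has saturated.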
1714 |
|
|
1715 |
Except when it is zero, the reference count is not correctly preserved |
Except when it is zero, the reference count is not correctly preserved |
1716 |
if a pattern is compiled on one host and then transferred to a host |
if a pattern is compiled on one host and then transferred to a host |
1717 |
whose byte-order is different. (This seems a highly unlikely scenario.) |
whose byte-order is different. (This seems a highly unlikely scenario.) |
1718 |
|
|
1719 |
|
|
1723 |
const char *subject, int length, int startoffset, |
const char *subject, int length, int startoffset, |
1724 |
int options, int *ovector, int ovecsize); |
int options, int *ovector, int ovecsize); |
1725 |
|
|
1726 |
The function pcre_exec() is called to match a subject string against a |
The function pcre_exec() is called to match a subject string against a |
1727 |
compiled pattern, which is passed in the code argument. If the pattern |
compiled pattern, which is passed in the code argument. If the pattern |
1728 |
has been studied, the result of the study should be passed in the extra |
has been studied, the result of the study should be passed in the extra |
1729 |
argument. This function is the main matching facility of the library, |
argument. This function is the main matching facility of the library, |
1730 |
and it operates in a Perl-like manner. For specialist use there is also |
and it operates in a Perl-like manner. For specialist use there is also |
1731 |
an alternative matching function, which is described below in the sec- |
an alternative matching function, which is described below in the sec- |
1732 |
tion about the pcre_dfa_exec() function. |
tion about the pcre_dfa_exec() function. |
1733 |
|
|
1734 |
In most applications, the pattern will have been compiled (and option- |
In most applications, the pattern will have been compiled (and option- |
1735 |
ally studied) in the same process that calls pcre_exec(). However, it |
ally studied) in the same process that calls pcre_exec(). However, it |
1736 |
is possible to save compiled patterns and study data, and then use them |
is possible to save compiled patterns and study data, and then use them |
1737 |
later in different processes, possibly even on different hosts. For a |
later in different processes, possibly even on different hosts. For a |
1738 |
discussion about this, see the pcreprecompile documentation. |
discussion about this, see the pcreprecompile documentation. |
1739 |
|
|
1740 |
Here is an example of a simple call to pcre_exec(): |
Here is an example of a simple call to pcre_exec(): |
1753 |
|
|
1754 |
Extra data for pcre_exec() |
Extra data for pcre_exec() |
1755 |
|
|
1756 |
If the extra argument is not NULL, it must point to a pcre_extra data |
If the extra argument is not NULL, it must point to a pcre_extra data |
1757 |
block. The pcre_study() function returns such a block (when it doesn't |
block. The pcre_study() function returns such a block (when it doesn't |
1758 |
return NULL), but you can also create one for yourself, and pass addi- |
return NULL), but you can also create one for yourself, and pass addi- |
1759 |
tional information in it. The pcre_extra block contains the following |
tional information in it. The pcre_extra block contains the following |
1760 |
fields (not necessarily in this order): |
fields (not necessarily in this order): |
1761 |
|
|
1762 |
unsigned long int flags; |
unsigned long int flags; |
1766 |
void *callout_data; |
void *callout_data; |
1767 |
const unsigned char *tables; |
const unsigned char *tables; |
1768 |
|
|
1769 |
The flags field is a bitmap that specifies which of the other fields |
The flags field is a bitmap that specifies which of the other fields |
1770 |
are set. The flag bits are: |
are set. The flag bits are: |
1771 |
|
|
1772 |
PCRE_EXTRA_STUDY_DATA |
PCRE_EXTRA_STUDY_DATA |
1775 |
PCRE_EXTRA_CALLOUT_DATA |
PCRE_EXTRA_CALLOUT_DATA |
1776 |
PCRE_EXTRA_TABLES |
PCRE_EXTRA_TABLES |
1777 |
|
|
1778 |
Other flag bits should be set to zero. The study_data field is set in |
Other flag bits should be set to zero. The study_data field is set in |
1779 |
the pcre_extra block that is returned by pcre_study(), together with |
the pcre_extra block that is returned by pcre_study(), together with |
1780 |
the appropriate flag bit. You should not set this yourself, but you may |
the appropriate flag bit. You should not set this yourself, but you may |
1781 |
add to the block by setting the other fields and their corresponding |
add to the block by setting the other fields and their corresponding |
1782 |
flag bits. |
flag bits. |
1783 |
|
|
1784 |
The match_limit field provides a means of preventing PCRE from using up |
The match_limit field provides a means of preventing PCRE from using up |
1785 |
a vast amount of resources when running patterns that are not going to |
a vast amount of resources when running patterns that are not going to |
1786 |
match, but which have a very large number of possibilities in their |
match, but which have a very large number of possibilities in their |
1787 |
search trees. The classic example is the use of nested unlimited |
search trees. The classic example is the use of nested unlimited |
1788 |
repeats. |
repeats. |
1789 |
|
|
1790 |
Internally, PCRE uses a function called match() which it calls repeat- |
Internally, PCRE uses a function called match() which it calls repeat- |
1791 |
edly (sometimes recursively). The limit set by match_limit is imposed |
edly (sometimes recursively). The limit set by match_limit is imposed |
1792 |
on the number of times this function is called during a match, which |
on the number of times this function is called during a match, which |
1793 |
has the effect of limiting the amount of backtracking that can take |
has the effect of limiting the amount of backtracking that can take |
1794 |
place. For patterns that are not anchored, the count restarts from zero |
place. For patterns that are not anchored, the count restarts from zero |
1795 |
for each position in the subject string. |
for each position in the subject string. |
1796 |
|
|
1797 |
The default value for the limit can be set when PCRE is built; the |
The default value for the limit can be set when PCRE is built; the |
1798 |
built-in default is 10 million, which handles all but the most extreme |
built-in default is 10 million, which handles all but the most extreme |
1799 |
cases. You can override the default by supplying pcre_exec() with a |
cases. You can override the default by supplying pcre_exec() with a |
1800 |
pcre_extra block in which match_limit is set, and |
pcre_extra block in which match_limit is set, and |
1801 |
PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is |
PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is |
1802 |
exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT. |
exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT. |
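To see why a limit on calls is useful, the following self-contained sketch may help. It is a toy backtracking matcher for the single fixed pattern (a+)+b, which uses nested unlimited repeats. It is not PCRE's match() function: toy_match(), toy_exec(), toy_state and the TOY_ return codes are all hypothetical names invented for this illustration. The counter is reset for each starting position, mirroring the behaviour described above for unanchored patterns.

```c
#include <stddef.h>

#define TOY_NOMATCH (-1)   /* hypothetical return codes for this sketch */
#define TOY_LIMIT   (-2)

typedef struct {
    unsigned long calls;   /* calls made so far at this start position */
    unsigned long limit;   /* maximum number of calls allowed */
} toy_state;

/* Try to match (a+)+b starting exactly at s. Returns 1 on a match,
   TOY_NOMATCH on failure, or TOY_LIMIT if the call limit is exceeded. */
int toy_match(const char *s, toy_state *st)
{
    size_t max = 0, n;

    if (++st->calls > st->limit) return TOY_LIMIT;
    while (s[max] == 'a') max++;
    /* One iteration of the group: a+ takes n letters, longest first. */
    for (n = max; n >= 1; n--) {
        int rc;
        if (s[n] == 'b') return 1;        /* group finished, then "b" */
        rc = toy_match(s + n, st);        /* another group iteration */
        if (rc != TOY_NOMATCH) return rc; /* a match, or limit reached */
    }
    return TOY_NOMATCH;
}

/* Unanchored search: the call count restarts from zero at each
   starting position in the subject. */
int toy_exec(const char *subject, unsigned long limit)
{
    const char *p;

    for (p = subject; ; p++) {
        toy_state st = { 0, limit };
        int rc = toy_match(p, &st);
        if (rc != TOY_NOMATCH) return rc;
        if (*p == '\0') return TOY_NOMATCH;
    }
}
```

In this toy, a subject of k letters "a" followed by a character other than "b" needs about 2^k calls to fail at the first position, so a modest limit turns a very long failing search into a prompt error return.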
1803 |
|
|
1804 |
The match_limit_recursion field is similar to match_limit, but instead |
The match_limit_recursion field is similar to match_limit, but instead |
1805 |
of limiting the total number of times that match() is called, it limits |
of limiting the total number of times that match() is called, it limits |
1806 |
the depth of recursion. The recursion depth is a smaller number than |
the depth of recursion. The recursion depth is a smaller number than |
1807 |
the total number of calls, because not all calls to match() are recur- |
the total number of calls, because not all calls to match() are recur- |
1808 |
sive. This limit is of use only if it is set smaller than match_limit. |
sive. This limit is of use only if it is set smaller than match_limit. |
1809 |
|
|
1810 |
Limiting the recursion depth limits the amount of stack that can be |
Limiting the recursion depth limits the amount of stack that can be |
1811 |
used, or, when PCRE has been compiled to use memory on the heap instead |
used, or, when PCRE has been compiled to use memory on the heap instead |
1812 |
of the stack, the amount of heap memory that can be used. |
of the stack, the amount of heap memory that can be used. |
1813 |
|
|
1814 |
The default value for match_limit_recursion can be set when PCRE is |
The default value for match_limit_recursion can be set when PCRE is |
1815 |
built; the built-in default is the same value as the default for |
built; the built-in default is the same value as the default for |
1816 |
match_limit. You can override the default by supplying pcre_exec() with |
match_limit. You can override the default by supplying pcre_exec() with |
1817 |
a pcre_extra block in which match_limit_recursion is set, and |
a pcre_extra block in which match_limit_recursion is set, and |
1818 |
PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the |
PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the |
1819 |
limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT. |
limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT. |
1820 |
|
|
1821 |
The pcre_callout field is used in conjunction with the "callout" fea- |
The pcre_callout field is used in conjunction with the "callout" fea- |
1822 |
ture, which is described in the pcrecallout documentation. |
ture, which is described in the pcrecallout documentation. |
1823 |
|
|
1824 |
The tables field is used to pass a character tables pointer to |
The tables field is used to pass a character tables pointer to |
1825 |
pcre_exec(); this overrides the value that is stored with the compiled |
pcre_exec(); this overrides the value that is stored with the compiled |
1826 |
pattern. A non-NULL value is stored with the compiled pattern only if |
pattern. A non-NULL value is stored with the compiled pattern only if |
1827 |
custom tables were supplied to pcre_compile() via its tableptr argu- |
custom tables were supplied to pcre_compile() via its tableptr argu- |
1828 |
ment. If NULL is passed to pcre_exec() using this mechanism, it forces |
ment. If NULL is passed to pcre_exec() using this mechanism, it forces |
1829 |
PCRE's internal tables to be used. This facility is helpful when re- |
PCRE's internal tables to be used. This facility is helpful when re- |
1830 |
using patterns that have been saved after compiling with an external |
using patterns that have been saved after compiling with an external |
1831 |
set of tables, because the external tables might be at a different |
set of tables, because the external tables might be at a different |
1832 |
address when pcre_exec() is called. See the pcreprecompile documenta- |
address when pcre_exec() is called. See the pcreprecompile documenta- |
1833 |
tion for a discussion of saving compiled patterns for later use. |
tion for a discussion of saving compiled patterns for later use. |
1834 |
|
|
1835 |
Option bits for pcre_exec() |
Option bits for pcre_exec() |
1836 |
|
|
1837 |
The unused bits of the options argument for pcre_exec() must be zero. |
The unused bits of the options argument for pcre_exec() must be zero. |
1838 |
The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, |
The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, |
1839 |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and |
1840 |
PCRE_PARTIAL. |
PCRE_PARTIAL. |
1841 |
|
|
1842 |
PCRE_ANCHORED |
PCRE_ANCHORED |
1843 |
|
|
1844 |
The PCRE_ANCHORED option limits pcre_exec() to matching at the first |
The PCRE_ANCHORED option limits pcre_exec() to matching at the first |
1845 |
matching position. If a pattern was compiled with PCRE_ANCHORED, or |
matching position. If a pattern was compiled with PCRE_ANCHORED, or |
1846 |
turned out to be anchored by virtue of its contents, it cannot be made |
turned out to be anchored by virtue of its contents, it cannot be made |
1847 |
unanchored at matching time. |
unanchored at matching time. |
1848 |
|
|
1849 |
|
PCRE_BSR_ANYCRLF |
1850 |
|
PCRE_BSR_UNICODE |
1851 |
|
|
1852 |
|
These options (which are mutually exclusive) control what the \R escape |
1853 |
|
sequence matches. The choice is either to match only CR, LF, or CRLF, |
1854 |
|
or to match any Unicode newline sequence. These options override the |
1855 |
|
choice that was made or defaulted when the pattern was compiled. |
1856 |
|
|
1857 |
PCRE_NEWLINE_CR |
PCRE_NEWLINE_CR |
1858 |
PCRE_NEWLINE_LF |
PCRE_NEWLINE_LF |
1859 |
PCRE_NEWLINE_CRLF |
PCRE_NEWLINE_CRLF |
1870 |
When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is |
When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is |
1871 |
set, and a match attempt for an unanchored pattern fails when the cur- |
set, and a match attempt for an unanchored pattern fails when the cur- |
1872 |
rent position is at a CRLF sequence, and the pattern contains no |
rent position is at a CRLF sequence, and the pattern contains no |
1873 |
explicit matches for CR or NL characters, the match position is |
explicit matches for CR or LF characters, the match position is |
1874 |
advanced by two characters instead of one, in other words, to after the |
advanced by two characters instead of one, in other words, to after the |
1875 |
CRLF. |
CRLF. |
1876 |
|
|
1880 |
failing at the start, it skips both the CR and the LF before retrying. |
failing at the start, it skips both the CR and the LF before retrying. |
1881 |
However, the pattern [\r\n]A does match that string, because it con- |
However, the pattern [\r\n]A does match that string, because it con- |
1882 |
tains an explicit CR or LF reference, and so advances only by one char- |
tains an explicit CR or LF reference, and so advances only by one char- |
1883 |
acter after the first failure. Note that an explicit CR or LF refer- |
acter after the first failure. |
|
ence occurs for negated character classes such as [^X] because they can |
|
|
match CR or LF characters. |
|
1884 |
|
|
1885 |
Notwithstanding the above, anomalous effects may still occur when CRLF |
An explicit match for CR or LF is either a literal appearance of one of |
1886 |
|
those characters, or one of the \r or \n escape sequences. Implicit |
1887 |
|
matches such as [^X] do not count, nor does \s (which includes CR and |
1888 |
|
LF in the characters that it matches). |
1889 |
|
|
1890 |
|
Notwithstanding the above, anomalous effects may still occur when CRLF |
1891 |
is a valid newline sequence and explicit \r or \n escapes appear in the |
is a valid newline sequence and explicit \r or \n escapes appear in the |
1892 |
pattern. |
pattern. |
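The advancement rule for an unanchored pattern after a failed attempt can be modelled as below. This is a sketch under stated assumptions, not PCRE's code: next_start() and its two flag parameters are hypothetical, positions are counted in bytes (so UTF-8 mode is not modelled), and the caller is assumed to know whether the pattern contains an explicit match for CR or LF.

```c
#include <stddef.h>

/* Hypothetical model: return the position for the next match attempt
   after an attempt at "pos" has failed.
   crlf_is_newline       non-zero if CRLF is a recognized newline
                         (CRLF, ANYCRLF or ANY convention)
   pattern_has_cr_or_lf  non-zero if the pattern contains an explicit
                         match for CR or LF */
size_t next_start(const char *subject, size_t length, size_t pos,
                  int crlf_is_newline, int pattern_has_cr_or_lf)
{
    if (crlf_is_newline && !pattern_has_cr_or_lf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;   /* step over the whole CRLF pair */
    return pos + 1;       /* ordinary one-character advance */
}
```

With the illustrative subject "\r\nA", the model moves from offset 0 straight to offset 2 when the pattern has no explicit CR or LF, and to offset 1 otherwise.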
1893 |
|
|
1894 |
PCRE_NOTBOL |
PCRE_NOTBOL |
1895 |
|
|
1896 |
This option specifies that the first character of the subject string is not |
This option specifies that the first character of the subject string is not |
1897 |
the beginning of a line, so the circumflex metacharacter should not |
the beginning of a line, so the circumflex metacharacter should not |
1898 |
match before it. Setting this without PCRE_MULTILINE (at compile time) |
match before it. Setting this without PCRE_MULTILINE (at compile time) |
1899 |
causes circumflex never to match. This option affects only the behav- |
causes circumflex never to match. This option affects only the behav- |
1900 |
iour of the circumflex metacharacter. It does not affect \A. |
iour of the circumflex metacharacter. It does not affect \A. |
1901 |
|
|
1902 |
PCRE_NOTEOL |
PCRE_NOTEOL |
1903 |
|
|
1904 |
This option specifies that the end of the subject string is not the end |
This option specifies that the end of the subject string is not the end |
1905 |
of a line, so the dollar metacharacter should not match it nor (except |
of a line, so the dollar metacharacter should not match it nor (except |
1906 |
in multiline mode) a newline immediately before it. Setting this with- |
in multiline mode) a newline immediately before it. Setting this with- |
1907 |
out PCRE_MULTILINE (at compile time) causes dollar never to match. This |
out PCRE_MULTILINE (at compile time) causes dollar never to match. This |
1908 |
option affects only the behaviour of the dollar metacharacter. It does |
option affects only the behaviour of the dollar metacharacter. It does |
1909 |
not affect \Z or \z. |
not affect \Z or \z. |
1910 |
|
|
1911 |
PCRE_NOTEMPTY |
PCRE_NOTEMPTY |
1912 |
|
|
1913 |
An empty string is not considered to be a valid match if this option is |
An empty string is not considered to be a valid match if this option is |
1914 |
set. If there are alternatives in the pattern, they are tried. If all |
set. If there are alternatives in the pattern, they are tried. If all |
1915 |
the alternatives match the empty string, the entire match fails. For |
the alternatives match the empty string, the entire match fails. For |
1916 |
example, if the pattern |
example, if the pattern |
1917 |
|
|
1918 |
a?b? |
a?b? |
1919 |
|
|
1920 |
is applied to a string not beginning with "a" or "b", it matches the |
is applied to a string not beginning with "a" or "b", it matches the |
1921 |
empty string at the start of the subject. With PCRE_NOTEMPTY set, this |
empty string at the start of the subject. With PCRE_NOTEMPTY set, this |
1922 |
match is not valid, so PCRE searches further into the string for occur- |
match is not valid, so PCRE searches further into the string for occur- |
1923 |
rences of "a" or "b". |
rences of "a" or "b". |
1924 |
|
|
1925 |
Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe- |
Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe- |
1926 |
cial case of a pattern match of the empty string within its split() |
cial case of a pattern match of the empty string within its split() |
1927 |
function, and when using the /g modifier. It is possible to emulate |
function, and when using the /g modifier. It is possible to emulate |
1928 |
Perl's behaviour after matching a null string by first trying the match |
Perl's behaviour after matching a null string by first trying the match |
1929 |
again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then |
again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then |
1930 |
if that fails by advancing the starting offset (see below) and trying |
if that fails by advancing the starting offset (see below) and trying |
1931 |
an ordinary match again. There is some code that demonstrates how to do |
an ordinary match again. There is some code that demonstrates how to do |
1932 |
this in the pcredemo.c sample program. |
this in the pcredemo.c sample program. |
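The retry scheme just described can be sketched without the library. In the code below, toy_exec() is a hypothetical stand-in for pcre_exec() that matches only the fixed pattern a*, and TOY_ANCHORED and TOY_NOTEMPTY stand in for the corresponding PCRE options; none of these names exist in PCRE. The sketch advances by one byte after a failed retry, so it does not model UTF-8 mode or CRLF newlines.

```c
#include <stddef.h>
#include <string.h>

enum { TOY_ANCHORED = 1, TOY_NOTEMPTY = 2 };  /* hypothetical flags */

/* Toy stand-in for pcre_exec() with the fixed pattern a* .
   Returns 1 and sets span[0], span[1] on a match, otherwise 0. */
int toy_exec(const char *s, size_t len, size_t start, int flags,
             size_t span[2])
{
    size_t pos;

    for (pos = start; pos <= len; pos++) {
        size_t end = pos;
        while (end < len && s[end] == 'a') end++;
        if (end > pos || !(flags & TOY_NOTEMPTY)) {
            span[0] = pos;
            span[1] = end;
            return 1;
        }
        if (flags & TOY_ANCHORED) break;  /* only one position allowed */
    }
    return 0;
}

/* Count all matches in s, emulating the effect of Perl's /g. */
size_t count_all(const char *s)
{
    size_t len = strlen(s), span[2], start = 0, count = 0;
    int flags = 0;

    for (;;) {
        if (toy_exec(s, len, start, flags, span)) {
            count++;
            start = span[1];
            /* After an empty match, retry at the same offset, but
               insist on a non-empty match anchored at that offset. */
            flags = (span[0] == span[1])
                  ? (TOY_ANCHORED | TOY_NOTEMPTY) : 0;
            if (flags != 0 && start == len) break;  /* empty at end */
        } else {
            if (flags == 0) break;  /* ordinary match failed: done */
            start++;                /* retry failed: advance by one */
            flags = 0;              /* and try an ordinary match */
        }
    }
    return count;
}
```

For the subject "baab" this loop finds an empty match at offset 0, "aa" at offset 1, and empty matches at offsets 3 and 4, without ever looping forever on an empty match.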
1933 |
|
|
1934 |
PCRE_NO_UTF8_CHECK |
PCRE_NO_UTF8_CHECK |
1935 |
|
|
1936 |
When PCRE_UTF8 is set at compile time, the validity of the subject as a |
When PCRE_UTF8 is set at compile time, the validity of the subject as a |
1937 |
UTF-8 string is automatically checked when pcre_exec() is subsequently |
UTF-8 string is automatically checked when pcre_exec() is subsequently |
1938 |
called. The value of startoffset is also checked to ensure that it |
called. The value of startoffset is also checked to ensure that it |
1939 |
points to the start of a UTF-8 character. There is a discussion about |
points to the start of a UTF-8 character. There is a discussion about |
1940 |
the validity of UTF-8 strings in the section on UTF-8 support in the |
the validity of UTF-8 strings in the section on UTF-8 support in the |
1941 |
main pcre page. If an invalid UTF-8 sequence of bytes is found, |
main pcre page. If an invalid UTF-8 sequence of bytes is found, |
1942 |
pcre_exec() returns the error PCRE_ERROR_BADUTF8. If startoffset con- |
pcre_exec() returns the error PCRE_ERROR_BADUTF8. If startoffset con- |
1943 |
tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned. |
tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned. |
1944 |
|
|
1945 |
If you already know that your subject is valid, and you want to skip |
If you already know that your subject is valid, and you want to skip |
1946 |
these checks for performance reasons, you can set the |
these checks for performance reasons, you can set the |
1947 |
PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might want to |
PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might want to |
1948 |
do this for the second and subsequent calls to pcre_exec() if you are |
do this for the second and subsequent calls to pcre_exec() if you are |
1949 |
making repeated calls to find all the matches in a single subject |
making repeated calls to find all the matches in a single subject |
1950 |
string. However, you should be sure that the value of startoffset |
string. However, you should be sure that the value of startoffset |
1951 |
points to the start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is |
points to the start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is |
1952 |
set, the effect of passing an invalid UTF-8 string as a subject, or a |
set, the effect of passing an invalid UTF-8 string as a subject, or a |
1953 |
value of startoffset that does not point to the start of a UTF-8 char- |
value of startoffset that does not point to the start of a UTF-8 char- |
1954 |
acter, is undefined. Your program may crash. |
acter, is undefined. Your program may crash. |
1955 |
|
|
1956 |
PCRE_PARTIAL |
PCRE_PARTIAL |
1957 |
|
|
1958 |
This option turns on the partial matching feature. If the subject |
This option turns on the partial matching feature. If the subject |
1959 |
string fails to match the pattern, but at some point during the match- |
string fails to match the pattern, but at some point during the match- |
1960 |
ing process the end of the subject was reached (that is, the subject |
ing process the end of the subject was reached (that is, the subject |
1961 |
partially matches the pattern and the failure to match occurred only |
partially matches the pattern and the failure to match occurred only |
1962 |
because there were not enough subject characters), pcre_exec() returns |
because there were not enough subject characters), pcre_exec() returns |
1963 |
PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is |
PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is |
1964 |
used, there are restrictions on what may appear in the pattern. These |
used, there are restrictions on what may appear in the pattern. These |
1965 |
are discussed in the pcrepartial documentation. |
are discussed in the pcrepartial documentation. |
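The distinction between a partial match and no match can be illustrated with a deliberately simple model. The sketch below handles only an anchored literal pattern; toy_partial() and the TOY_ return codes are hypothetical and are not PCRE's API. As a choice made for this sketch, an empty subject is reported as no match rather than as a partial match.

```c
#include <stddef.h>
#include <string.h>

enum { TOY_MATCH = 1, TOY_NOMATCH = -1, TOY_PARTIAL = -2 };

/* Hypothetical model for an anchored literal pattern: report
   TOY_PARTIAL when the subject ran out while every character seen
   so far agreed with the pattern. */
int toy_partial(const char *pattern, const char *subject, size_t length)
{
    size_t plen = strlen(pattern);
    size_t n = (length < plen) ? length : plen;

    if (memcmp(pattern, subject, n) != 0) return TOY_NOMATCH;
    if (length >= plen) return TOY_MATCH;
    return (n > 0) ? TOY_PARTIAL : TOY_NOMATCH;
}
```

A caller reading input in pieces could treat the partial result as a request for more data, and the no-match result as a definite rejection.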
1966 |
|
|
1967 |
The string to be matched by pcre_exec() |
The string to be matched by pcre_exec() |
1968 |
|
|
1969 |
The subject string is passed to pcre_exec() as a pointer in subject, a |
The subject string is passed to pcre_exec() as a pointer in subject, a |
1970 |
length in length, and a starting byte offset in startoffset. In UTF-8 |
length in length, and a starting byte offset in startoffset. In UTF-8 |
1971 |
mode, the byte offset must point to the start of a UTF-8 character. |
mode, the byte offset must point to the start of a UTF-8 character. |
1972 |
Unlike the pattern string, the subject may contain binary zero bytes. |
Unlike the pattern string, the subject may contain binary zero bytes. |
1973 |
When the starting offset is zero, the search for a match starts at the |
When the starting offset is zero, the search for a match starts at the |
1974 |
beginning of the subject, and this is by far the most common case. |
beginning of the subject, and this is by far the most common case. |
1975 |
|
|
1976 |
A non-zero starting offset is useful when searching for another match |
A non-zero starting offset is useful when searching for another match |
1977 |
in the same subject by calling pcre_exec() again after a previous suc- |
in the same subject by calling pcre_exec() again after a previous suc- |
1978 |
cess. Setting startoffset differs from just passing over a shortened |
cess. Setting startoffset differs from just passing over a shortened |
1979 |
string and setting PCRE_NOTBOL in the case of a pattern that begins |
string and setting PCRE_NOTBOL in the case of a pattern that begins |
1980 |
with any kind of lookbehind. For example, consider the pattern |
with any kind of lookbehind. For example, consider the pattern |
1981 |
|
|
1982 |
\Biss\B |
\Biss\B |
1983 |
|
|
1984 |
which finds occurrences of "iss" in the middle of words. (\B matches |
which finds occurrences of "iss" in the middle of words. (\B matches |
1985 |
only if the current position in the subject is not a word boundary.) |
only if the current position in the subject is not a word boundary.) |
1986 |
When applied to the string "Mississippi" the first call to pcre_exec() |
When applied to the string "Mississippi" the first call to pcre_exec() |
1987 |
finds the first occurrence. If pcre_exec() is called again with just |
finds the first occurrence. If pcre_exec() is called again with just |
1988 |
the remainder of the subject, namely "issippi", it does not match, |
the remainder of the subject, namely "issippi", it does not match, |
1989 |
because \B is always false at the start of the subject, which is deemed |
because \B is always false at the start of the subject, which is deemed |
1990 |
to be a word boundary. However, if pcre_exec() is passed the entire |
to be a word boundary. However, if pcre_exec() is passed the entire |
1991 |
string again, but with startoffset set to 4, it finds the second occur- |
string again, but with startoffset set to 4, it finds the second occur- |
1992 |
rence of "iss" because it is able to look behind the starting point to |
rence of "iss" because it is able to look behind the starting point to |
1993 |
discover that it is preceded by a letter. |
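The difference between a start offset and a shortened subject can be checked with a small hand-written matcher for \Biss\B. This is only an illustration: find_iss() and not_boundary() are hypothetical helpers, not PCRE functions, and word characters are taken to be ASCII letters, digits and underscore.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Non-zero if the character at index i exists and is a word character. */
static int is_word_at(const char *s, size_t len, size_t i)
{
    return i < len && (isalnum((unsigned char)s[i]) || s[i] == '_');
}

/* Models \B: true when pos is NOT a word boundary. The character
   before pos is examined even if it lies before the start offset. */
static int not_boundary(const char *s, size_t len, size_t pos)
{
    int before = pos > 0 && is_word_at(s, len, pos - 1);
    int after = is_word_at(s, len, pos);
    return before == after;
}

/* Find \Biss\B at or after startoffset; return its offset or -1. */
long find_iss(const char *s, size_t len, size_t startoffset)
{
    size_t pos;

    for (pos = startoffset; pos + 3 <= len; pos++) {
        if (memcmp(s + pos, "iss", 3) == 0 &&
            not_boundary(s, len, pos) &&
            not_boundary(s, len, pos + 3))
            return (long)pos;
    }
    return -1;
}
```

Called on "Mississippi" with a start offset of 4, the model finds the second "iss" at offset 4; called on the shortened subject "issippi" with offset 0, it finds nothing, because offset 0 is then a word boundary.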
discover that it is preceded by a letter. |
1994 |
|
|
1995 |
If a non-zero starting offset is passed when the pattern is anchored, |
If a non-zero starting offset is passed when the pattern is anchored, |
1996 |
one attempt to match at the given offset is made. This can only succeed |
one attempt to match at the given offset is made. This can only succeed |
1997 |
if the pattern does not require the match to be at the start of the |
if the pattern does not require the match to be at the start of the |
1998 |
subject. |
subject. |
1999 |
|
|
2000 |
How pcre_exec() returns captured substrings |
How pcre_exec() returns captured substrings |
2001 |
|
|
2002 |
In general, a pattern matches a certain portion of the subject, and in |
In general, a pattern matches a certain portion of the subject, and in |
2003 |
addition, further substrings from the subject may be picked out by |
addition, further substrings from the subject may be picked out by |
2004 |
parts of the pattern. Following the usage in Jeffrey Friedl's book, |
parts of the pattern. Following the usage in Jeffrey Friedl's book, |
2005 |
this is called "capturing" in what follows, and the phrase "capturing |
this is called "capturing" in what follows, and the phrase "capturing |
2006 |
subpattern" is used for a fragment of a pattern that picks out a sub- |
subpattern" is used for a fragment of a pattern that picks out a sub- |
2007 |
string. PCRE supports several other kinds of parenthesized subpattern |
string. PCRE supports several other kinds of parenthesized subpattern |
2008 |
that do not cause substrings to be captured. |
that do not cause substrings to be captured. |
2009 |
|
|
2010 |
Captured substrings are returned to the caller via a vector of integer |
Captured substrings are returned to the caller via a vector of integer |
2011 |
offsets whose address is passed in ovector. The number of elements in |
offsets whose address is passed in ovector. The number of elements in |
2012 |
the vector is passed in ovecsize, which must be a non-negative number. |
the vector is passed in ovecsize, which must be a non-negative number. |
2013 |
Note: this argument is NOT the size of ovector in bytes. |
Note: this argument is NOT the size of ovector in bytes. |
2014 |
|
|
2015 |
The first two-thirds of the vector is used to pass back captured sub- |
The first two-thirds of the vector is used to pass back captured sub- |
2016 |
strings, each substring using a pair of integers. The remaining third |
strings, each substring using a pair of integers. The remaining third |
2017 |
of the vector is used as workspace by pcre_exec() while matching cap- |
of the vector is used as workspace by pcre_exec() while matching cap- |
2018 |
turing subpatterns, and is not available for passing back information. |
turing subpatterns, and is not available for passing back information. |
2019 |
The length passed in ovecsize should always be a multiple of three. If |
The length passed in ovecsize should always be a multiple of three. If |
2020 |
it is not, it is rounded down. |
it is not, it is rounded down. |
2021 |
|
|
2022 |
When a match is successful, information about captured substrings is |
When a match is successful, information about captured substrings is |
2023 |
returned in pairs of integers, starting at the beginning of ovector, |
returned in pairs of integers, starting at the beginning of ovector, |
2024 |
and continuing up to two-thirds of its length at the most. The first |
and continuing up to two-thirds of its length at the most. The first |
2025 |
element of a pair is set to the offset of the first character in a sub- |
element of a pair is set to the offset of the first character in a sub- |
2026 |
string, and the second is set to the offset of the first character |
string, and the second is set to the offset of the first character |
2027 |
after the end of a substring. The first pair, ovector[0] and ovec- |
after the end of a substring. The first pair, ovector[0] and ovec- |
2028 |
tor[1], identify the portion of the subject string matched by the |
tor[1], identify the portion of the subject string matched by the |
2029 |
entire pattern. The next pair is used for the first capturing subpat- |
entire pattern. The next pair is used for the first capturing subpat- |
2030 |
tern, and so on. The value returned by pcre_exec() is one more than the |
tern, and so on. The value returned by pcre_exec() is one more than the |
2031 |
highest numbered pair that has been set. For example, if two substrings |
highest numbered pair that has been set. For example, if two substrings |
2032 |
have been captured, the returned value is 3. If there are no capturing |
have been captured, the returned value is 3. If there are no capturing |
2033 |
subpatterns, the return value from a successful match is 1, indicating |
subpatterns, the return value from a successful match is 1, indicating |
2034 |
that just the first pair of offsets has been set. |
that just the first pair of offsets has been set. |
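
The pair layout described above can be read back with a few lines of C. This is a minimal sketch, not part of the PCRE API: the function name capture_span() and its return convention are hypothetical, and it assumes that ovector and rc come from a successful pcre_exec() call on subject.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. Given the ovector filled in by a
   successful match and its positive return value rc, report where substring
   n starts in the subject and how long it is.
   Returns 0 on success, or -1 if pair n was not set by the match. */
int capture_span(const char *subject, const int *ovector, int rc,
                 int n, const char **start, int *length)
{
    assert(subject != NULL && ovector != NULL);
    assert(start != NULL && length != NULL);

    if (n < 0 || n >= rc)        /* rc is one more than the highest pair set */
        return -1;
    if (ovector[2 * n] < 0)      /* this subpattern took no part in the match */
        return -1;

    *start  = subject + ovector[2 * n];             /* first character */
    *length = ovector[2 * n + 1] - ovector[2 * n];  /* end offset is exclusive */
    return 0;
}
```

Pair 0 always describes the whole match, so calling the helper with n set to zero yields the text matched by the entire pattern.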
2035 |
|
|
2036 |
If a capturing subpattern is matched repeatedly, it is the last portion |
If a capturing subpattern is matched repeatedly, it is the last portion |
2037 |
of the string that it matched that is returned. |
of the string that it matched that is returned. |
2038 |
|
|
2039 |
If the vector is too small to hold all the captured substring offsets, |
If the vector is too small to hold all the captured substring offsets, |
2040 |
it is used as far as possible (up to two-thirds of its length), and the |
it is used as far as possible (up to two-thirds of its length), and the |
2041 |
function returns a value of zero. In particular, if the substring off- |
function returns a value of zero. In particular, if the substring off- |
2042 |
sets are not of interest, pcre_exec() may be called with ovector passed |
sets are not of interest, pcre_exec() may be called with ovector passed |
2043 |
as NULL and ovecsize as zero. However, if the pattern contains back |
as NULL and ovecsize as zero. However, if the pattern contains back |
2044 |
references and the ovector is not big enough to remember the related |
references and the ovector is not big enough to remember the related |
2045 |
substrings, PCRE has to get additional memory for use during matching. |
substrings, PCRE has to get additional memory for use during matching. |
2046 |
Thus it is usually advisable to supply an ovector. |
Thus it is usually advisable to supply an ovector. |
2047 |
|
|
2048 |
The pcre_info() function can be used to find out how many capturing |
The pcre_info() function can be used to find out how many capturing |
2049 |
subpatterns there are in a compiled pattern. The smallest size for |
subpatterns there are in a compiled pattern. The smallest size for |
2050 |
ovector that will allow for n captured substrings, in addition to the |
ovector that will allow for n captured substrings, in addition to the |
2051 |
offsets of the substring matched by the whole pattern, is (n+1)*3. |
offsets of the substring matched by the whole pattern, is (n+1)*3. |
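
The sizing rule can be restated as two small helpers. Both names are hypothetical and neither function exists in PCRE; the sketch only encodes the arithmetic given in the text, namely (n+1)*3 elements for n capturing subpatterns, and one usable pair for every three elements after rounding down.

```c
#include <assert.h>

/* Hypothetical helper: smallest ovecsize that can return the whole-match
   pair plus n captured substrings. */
int ovector_min_size(int n)
{
    assert(n >= 0);
    return (n + 1) * 3;
}

/* Hypothetical helper: number of offset pairs that an ovector of the given
   size can return. Only the first two-thirds of the vector hold results,
   and a size that is not a multiple of three is rounded down. */
int ovector_usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

For a pattern with two capturing subpatterns, ovector_min_size(2) gives 9: six elements for the three pairs and three elements of workspace.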
2052 |
|
|
2053 |
It is possible for capturing subpattern number n+1 to match some part |
It is possible for capturing subpattern number n+1 to match some part |
2054 |
of the subject when subpattern n has not been used at all. For example, |
of the subject when subpattern n has not been used at all. For example, |
2055 |
if the string "abc" is matched against the pattern (a|(z))(bc) the |
if the string "abc" is matched against the pattern (a|(z))(bc) the |
2056 |
return from the function is 4, and subpatterns 1 and 3 are matched, but |
return from the function is 4, and subpatterns 1 and 3 are matched, but |
2057 |
2 is not. When this happens, both values in the offset pairs corre- |
2 is not. When this happens, both values in the offset pairs corre- |
2058 |
sponding to unused subpatterns are set to -1. |
sponding to unused subpatterns are set to -1. |
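
A caller can test for this case before using a pair. The helper below is a sketch with a hypothetical name; it relies only on the rule just stated, that both offsets of an unused subpattern are -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if pair n in ovector belongs to a subpattern
   that took part in the match, zero if both offsets were left at -1. */
int capture_is_set(const int *ovector, int n)
{
    assert(ovector != NULL && n >= 0);
    return ovector[2 * n] >= 0 && ovector[2 * n + 1] >= 0;
}
```

For the example in the text, "abc" matched against (a|(z))(bc), the first eight elements of ovector would be 0, 3, 0, 1, -1, -1, 1, 3, so the helper reports pairs 1 and 3 as set and pair 2 as unset.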
2059 |
|
|
2060 |
Offset values that correspond to unused subpatterns at the end of the |
Offset values that correspond to unused subpatterns at the end of the |
2061 |
expression are also set to -1. For example, if the string "abc" is |
expression are also set to -1. For example, if the string "abc" is |
2062 |
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not |
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not |
2063 |
matched. The return from the function is 2, because the highest used |
matched. The return from the function is 2, because the highest used |
2064 |
capturing subpattern number is 1. However, you can refer to the offsets |
capturing subpattern number is 1. However, you can refer to the offsets |
2065 |
for the second and third capturing subpatterns if you wish (assuming |
for the second and third capturing subpatterns if you wish (assuming |
2066 |
the vector is large enough, of course). |
the vector is large enough, of course). |
2067 |
|
|
2068 |
Some convenience functions are provided for extracting the captured |
Some convenience functions are provided for extracting the captured |
2069 |
substrings as separate strings. These are described below. |
substrings as separate strings. These are described below. |
2070 |
|
|
2071 |
Error return values from pcre_exec() |
Error return values from pcre_exec() |
2072 |
|
|
2073 |
If pcre_exec() fails, it returns a negative number. The following are |
If pcre_exec() fails, it returns a negative number. The following are |
2074 |
defined in the header file: |
defined in the header file: |
2075 |
|
|
2076 |
PCRE_ERROR_NOMATCH (-1) |
PCRE_ERROR_NOMATCH (-1) |
2079 |
|
|
2080 |
PCRE_ERROR_NULL (-2) |
PCRE_ERROR_NULL (-2) |
2081 |
|
|
2082 |
Either code or subject was passed as NULL, or ovector was NULL and |
Either code or subject was passed as NULL, or ovector was NULL and |
2083 |
ovecsize was not zero. |
ovecsize was not zero. |
2084 |
|
|
2085 |
PCRE_ERROR_BADOPTION (-3) |
PCRE_ERROR_BADOPTION (-3) |
2088 |
|
|
2089 |
PCRE_ERROR_BADMAGIC (-4) |
PCRE_ERROR_BADMAGIC (-4) |
2090 |
|
|
2091 |
PCRE stores a 4-byte "magic number" at the start of the compiled code, |
PCRE stores a 4-byte "magic number" at the start of the compiled code, |
2092 |
to catch the case when it is passed a junk pointer and to detect when a |
to catch the case when it is passed a junk pointer and to detect when a |
2093 |
pattern that was compiled in an environment of one endianness is run in |
pattern that was compiled in an environment of one endianness is run in |
2094 |
an environment with the other endianness. This is the error that PCRE |
an environment with the other endianness. This is the error that PCRE |
2095 |
gives when the magic number is not present. |
gives when the magic number is not present. |
2096 |
|
|
2097 |
PCRE_ERROR_UNKNOWN_OPCODE (-5) |
PCRE_ERROR_UNKNOWN_OPCODE (-5) |
2098 |
|
|
2099 |
While running the pattern match, an unknown item was encountered in the |
While running the pattern match, an unknown item was encountered in the |
2100 |
compiled pattern. This error could be caused by a bug in PCRE or by |
compiled pattern. This error could be caused by a bug in PCRE or by |
2101 |
overwriting of the compiled pattern. |
overwriting of the compiled pattern. |
2102 |
|
|
2103 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
2104 |
|
|
2105 |
If a pattern contains back references, but the ovector that is passed |
If a pattern contains back references, but the ovector that is passed |
2106 |
to pcre_exec() is not big enough to remember the referenced substrings, |
to pcre_exec() is not big enough to remember the referenced substrings, |
2107 |
PCRE gets a block of memory at the start of matching to use for this |
PCRE gets a block of memory at the start of matching to use for this |
2108 |
purpose. If the call via pcre_malloc() fails, this error is given. The |
purpose. If the call via pcre_malloc() fails, this error is given. The |
2109 |
memory is automatically freed at the end of matching. |
memory is automatically freed at the end of matching. |
2110 |
|
|
2111 |
PCRE_ERROR_NOSUBSTRING (-7) |
PCRE_ERROR_NOSUBSTRING (-7) |
2112 |
|
|
2113 |
This error is used by the pcre_copy_substring(), pcre_get_substring(), |
This error is used by the pcre_copy_substring(), pcre_get_substring(), |
2114 |
and pcre_get_substring_list() functions (see below). It is never |
and pcre_get_substring_list() functions (see below). It is never |
2115 |
returned by pcre_exec(). |
returned by pcre_exec(). |
2116 |
|
|
2117 |
PCRE_ERROR_MATCHLIMIT (-8) |
PCRE_ERROR_MATCHLIMIT (-8) |
2118 |
|
|
2119 |
The backtracking limit, as specified by the match_limit field in a |
The backtracking limit, as specified by the match_limit field in a |
2120 |
pcre_extra structure (or defaulted), was reached. See the description |
pcre_extra structure (or defaulted), was reached. See the description |
2121 |
above. |
above. |
2122 |
|
|
2123 |
PCRE_ERROR_CALLOUT (-9) |
PCRE_ERROR_CALLOUT (-9) |
2124 |
|
|
2125 |
This error is never generated by pcre_exec() itself. It is provided for |
This error is never generated by pcre_exec() itself. It is provided for |
2126 |
use by callout functions that want to yield a distinctive error code. |
use by callout functions that want to yield a distinctive error code. |
2127 |
See the pcrecallout documentation for details. |
See the pcrecallout documentation for details. |
2128 |
|
|
2129 |
PCRE_ERROR_BADUTF8 (-10) |
PCRE_ERROR_BADUTF8 (-10) |
2130 |
|
|
2131 |
A string that contains an invalid UTF-8 byte sequence was passed as a |
A string that contains an invalid UTF-8 byte sequence was passed as a |
2132 |
subject. |
subject. |
2133 |
|
|
2134 |
PCRE_ERROR_BADUTF8_OFFSET (-11) |
PCRE_ERROR_BADUTF8_OFFSET (-11) |
2135 |
|
|
2136 |
The UTF-8 byte sequence that was passed as a subject was valid, but the |
The UTF-8 byte sequence that was passed as a subject was valid, but the |
2137 |
value of startoffset did not point to the beginning of a UTF-8 charac- |
value of startoffset did not point to the beginning of a UTF-8 charac- |
2138 |
ter. |
ter. |
2139 |
|
|
2140 |
PCRE_ERROR_PARTIAL (-12) |
PCRE_ERROR_PARTIAL (-12) |
2141 |
|
|
2142 |
The subject string did not match, but it did match partially. See the |
The subject string did not match, but it did match partially. See the |
2143 |
pcrepartial documentation for details of partial matching. |
pcrepartial documentation for details of partial matching. |
2144 |
|
|
2145 |
PCRE_ERROR_BADPARTIAL (-13) |
PCRE_ERROR_BADPARTIAL (-13) |
2146 |
|
|
2147 |
The PCRE_PARTIAL option was used with a compiled pattern containing |
The PCRE_PARTIAL option was used with a compiled pattern containing |
2148 |
items that are not supported for partial matching. See the pcrepartial |
items that are not supported for partial matching. See the pcrepartial |
2149 |
documentation for details of partial matching. |
documentation for details of partial matching. |
2150 |
|
|
2151 |
PCRE_ERROR_INTERNAL (-14) |
PCRE_ERROR_INTERNAL (-14) |
2152 |
|
|
2153 |
An unexpected internal error has occurred. This error could be caused |
An unexpected internal error has occurred. This error could be caused |
2154 |
by a bug in PCRE or by overwriting of the compiled pattern. |
by a bug in PCRE or by overwriting of the compiled pattern. |
2155 |
|
|
2156 |
PCRE_ERROR_BADCOUNT (-15) |
PCRE_ERROR_BADCOUNT (-15) |
2157 |
|
|
2158 |
This error is given if the value of the ovecsize argument is negative. |
This error is given if the value of the ovecsize argument is negative. |
2159 |
|
|
2160 |
PCRE_ERROR_RECURSIONLIMIT (-21) |
PCRE_ERROR_RECURSIONLIMIT (-21) |
2161 |
|
|
2162 |
The internal recursion limit, as specified by the match_limit_recursion |
The internal recursion limit, as specified by the match_limit_recursion |
2163 |
field in a pcre_extra structure (or defaulted), was reached. See the |
field in a pcre_extra structure (or defaulted), was reached. See the |
2164 |
description above. |
description above. |
2165 |
|
|
2166 |
PCRE_ERROR_BADNEWLINE (-23) |
PCRE_ERROR_BADNEWLINE (-23) |
2183 |
int pcre_get_substring_list(const char *subject, |
int pcre_get_substring_list(const char *subject, |
2184 |
int *ovector, int stringcount, const char ***listptr); |
int *ovector, int stringcount, const char ***listptr); |
2185 |
|
|
2186 |
Captured substrings can be accessed directly by using the offsets |
Captured substrings can be accessed directly by using the offsets |
2187 |
returned by pcre_exec() in ovector. For convenience, the functions |
returned by pcre_exec() in ovector. For convenience, the functions |
2188 |
pcre_copy_substring(), pcre_get_substring(), and pcre_get_sub- |
pcre_copy_substring(), pcre_get_substring(), and pcre_get_sub- |
2189 |
string_list() are provided for extracting captured substrings as new, |
string_list() are provided for extracting captured substrings as new, |
2190 |
separate, zero-terminated strings. These functions identify substrings |
separate, zero-terminated strings. These functions identify substrings |
2191 |
by number. The next section describes functions for extracting named |
by number. The next section describes functions for extracting named |
2192 |
substrings. |
substrings. |
2193 |
|
|
2194 |
A substring that contains a binary zero is correctly extracted and has |
A substring that contains a binary zero is correctly extracted and has |
2195 |
a further zero added on the end, but the result is not, of course, a C |
a further zero added on the end, but the result is not, of course, a C |
2196 |
string. However, you can process such a string by referring to the |
string. However, you can process such a string by referring to the |
2197 |
length that is returned by pcre_copy_substring() and pcre_get_sub- |
length that is returned by pcre_copy_substring() and pcre_get_sub- |
2198 |
string(). Unfortunately, the interface to pcre_get_substring_list() is |
string(). Unfortunately, the interface to pcre_get_substring_list() is |
2199 |
not adequate for handling strings containing binary zeros, because the |
not adequate for handling strings containing binary zeros, because the |
2200 |
end of the final string is not independently indicated. |
end of the final string is not independently indicated. |
2201 |
|
|
2202 |
The first three arguments are the same for all three of these func- |
The first three arguments are the same for all three of these func- |
2203 |
tions: subject is the subject string that has just been successfully |
tions: subject is the subject string that has just been successfully |
2204 |
matched, ovector is a pointer to the vector of integer offsets that was |
matched, ovector is a pointer to the vector of integer offsets that was |
2205 |
passed to pcre_exec(), and stringcount is the number of substrings that |
passed to pcre_exec(), and stringcount is the number of substrings that |
2206 |
were captured by the match, including the substring that matched the |
were captured by the match, including the substring that matched the |
2207 |
entire regular expression. This is the value returned by pcre_exec() if |
entire regular expression. This is the value returned by pcre_exec() if |
2208 |
it is greater than zero. If pcre_exec() returned zero, indicating that |
it is greater than zero. If pcre_exec() returned zero, indicating that |
2209 |
it ran out of space in ovector, the value passed as stringcount should |
it ran out of space in ovector, the value passed as stringcount should |
2210 |
be the number of elements in the vector divided by three. |
be the number of elements in the vector divided by three. |
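
The choice of stringcount can be sketched as follows. The function name is hypothetical, and returning -1 for a failed match is a convention of this sketch only, not something PCRE defines.

```c
#include <assert.h>

/* Hypothetical helper: derive the stringcount argument for the substring
   extraction functions from the value rc returned by pcre_exec() and the
   ovecsize that was passed to it. */
int stringcount_from_rc(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;            /* every captured pair fitted in ovector */
    if (rc == 0)
        return ovecsize / 3;  /* ovector was too small: use what fitted */
    return -1;                /* the match failed; nothing to extract */
}
```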
2211 |
|
|
2212 |
The functions pcre_copy_substring() and pcre_get_substring() extract a |
The functions pcre_copy_substring() and pcre_get_substring() extract a |
2213 |
single substring, whose number is given as stringnumber. A value of |
single substring, whose number is given as stringnumber. A value of |
2214 |
zero extracts the substring that matched the entire pattern, whereas |
zero extracts the substring that matched the entire pattern, whereas |
2215 |
higher values extract the captured substrings. For pcre_copy_sub- |
higher values extract the captured substrings. For pcre_copy_sub- |
2216 |
string(), the string is placed in buffer, whose length is given by |
string(), the string is placed in buffer, whose length is given by |
2217 |
buffersize, while for pcre_get_substring() a new block of memory is |
buffersize, while for pcre_get_substring() a new block of memory is |
2218 |
obtained via pcre_malloc, and its address is returned via stringptr. |
obtained via pcre_malloc, and its address is returned via stringptr. |
2219 |
The yield of the function is the length of the string, not including |
The yield of the function is the length of the string, not including |
2220 |
the terminating zero, or one of these error codes: |
the terminating zero, or one of these error codes: |
2221 |
|
|
2222 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
2223 |
|
|
2224 |
The buffer was too small for pcre_copy_substring(), or the attempt to |
The buffer was too small for pcre_copy_substring(), or the attempt to |
2225 |
get memory failed for pcre_get_substring(). |
get memory failed for pcre_get_substring(). |
2226 |
|
|
2227 |
PCRE_ERROR_NOSUBSTRING (-7) |
PCRE_ERROR_NOSUBSTRING (-7) |
2228 |
|
|
2229 |
There is no substring whose number is stringnumber. |
There is no substring whose number is stringnumber. |
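
To make the contract concrete, here is a self-contained sketch of the behaviour described for pcre_copy_substring(). It is not the library's source code: the sketch_ names are hypothetical, and the two constants merely reuse the numeric values quoted above.

```c
#include <assert.h>
#include <string.h>

enum {
    SKETCH_ERROR_NOMEMORY    = -6,  /* same value as PCRE_ERROR_NOMEMORY */
    SKETCH_ERROR_NOSUBSTRING = -7   /* same value as PCRE_ERROR_NOSUBSTRING */
};

/* Copy substring number stringnumber into buffer and add a terminating
   zero. Returns the length without the terminator, or an error code. */
int sketch_copy_substring(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    int start, length;

    assert(subject != NULL && ovector != NULL && buffer != NULL);

    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    start  = ovector[2 * stringnumber];
    /* An unset pair (both offsets -1) is treated as an empty string. */
    length = (start < 0) ? 0 : ovector[2 * stringnumber + 1] - start;

    if (buffersize < length + 1)            /* need room for the zero byte */
        return SKETCH_ERROR_NOMEMORY;

    if (length > 0)
        memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

Because the copy uses memcpy() and reports the length separately, a substring that contains a binary zero is copied in full; the caller must then rely on the returned length rather than on strlen().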
2230 |
|
|
2231 |
The pcre_get_substring_list() function extracts all available sub- |
The pcre_get_substring_list() function extracts all available sub- |
2232 |
strings and builds a list of pointers to them. All this is done in a |
strings and builds a list of pointers to them. All this is done in a |
2233 |
single block of memory that is obtained via pcre_malloc. The address of |
single block of memory that is obtained via pcre_malloc. The address of |
2234 |
the memory block is returned via listptr, which is also the start of |
the memory block is returned via listptr, which is also the start of |
2235 |
the list of string pointers. The end of the list is marked by a NULL |
the list of string pointers. The end of the list is marked by a NULL |
2236 |
pointer. The yield of the function is zero if all went well, or the |
pointer. The yield of the function is zero if all went well, or the |
2237 |
error code |
error code |
2238 |
|
|
2239 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
2240 |
|
|
2241 |
if the attempt to get the memory block failed. |
if the attempt to get the memory block failed. |
2242 |
|
|
2243 |
When any of these functions encounters a substring that is unset, which |
When any of these functions encounters a substring that is unset, which |
2244 |
can happen when capturing subpattern number n+1 matches some part of |
can happen when capturing subpattern number n+1 matches some part of |
2245 |
the subject, but subpattern n has not been used at all, they return an |
the subject, but subpattern n has not been used at all, they return an |
2246 |
empty string. This can be distinguished from a genuine zero-length sub- |
empty string. This can be distinguished from a genuine zero-length sub- |
2247 |
string by inspecting the appropriate offset in ovector, which is nega- |
string by inspecting the appropriate offset in ovector, which is nega- |
2248 |
tive for unset substrings. |
tive for unset substrings. |
2249 |
|
|
2250 |
The two convenience functions pcre_free_substring() and pcre_free_sub- |
The two convenience functions pcre_free_substring() and pcre_free_sub- |
2251 |
string_list() can be used to free the memory returned by a previous |
string_list() can be used to free the memory returned by a previous |
2252 |
call of pcre_get_substring() or pcre_get_substring_list(), respec- |
call of pcre_get_substring() or pcre_get_substring_list(), respec- |
2253 |
tively. They do nothing more than call the function pointed to by |
tively. They do nothing more than call the function pointed to by |
2254 |
pcre_free, which of course could be called directly from a C program. |
pcre_free, which of course could be called directly from a C program. |
2255 |
However, PCRE is used in some situations where it is linked via a spe- |
However, PCRE is used in some situations where it is linked via a spe- |
2256 |
cial interface to another programming language that cannot use |
cial interface to another programming language that cannot use |
2257 |
pcre_free directly; it is for these cases that the functions are pro- |
pcre_free directly; it is for these cases that the functions are pro- |
2258 |
vided. |
vided. |
2259 |
|
|
2260 |
|
|
2273 |
int stringcount, const char *stringname, |
int stringcount, const char *stringname, |
2274 |
const char **stringptr); |
const char **stringptr); |
2275 |
|
|
2276 |
To extract a substring by name, you first have to find the associated |
To extract a substring by name, you first have to find the associated |
2277 |
number. For example, for this pattern |
number. For example, for this pattern |
2278 |
|
|
2279 |
(a+)b(?<xxx>\d+)... |
(a+)b(?<xxx>\d+)... |
2282 |
be unique (PCRE_DUPNAMES was not set), you can find the number from the |
be unique (PCRE_DUPNAMES was not set), you can find the number from the |
2283 |
name by calling pcre_get_stringnumber(). The first argument is the com- |
name by calling pcre_get_stringnumber(). The first argument is the com- |
2284 |
piled pattern, and the second is the name. The yield of the function is |
piled pattern, and the second is the name. The yield of the function is |
2285 |
the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no |
the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no |
2286 |
subpattern of that name. |
subpattern of that name. |
2287 |
|
|
2288 |
Given the number, you can extract the substring directly, or use one of |
Given the number, you can extract the substring directly, or use one of |
2289 |
the functions described in the previous section. For convenience, there |
the functions described in the previous section. For convenience, there |
2290 |
are also two functions that do the whole job. |
are also two functions that do the whole job. |
2291 |
|
|
2292 |
Most of the arguments of pcre_copy_named_substring() and |
Most of the arguments of pcre_copy_named_substring() and |
2293 |
pcre_get_named_substring() are the same as those for the similarly |
pcre_get_named_substring() are the same as those for the similarly |
2294 |
named functions that extract by number. As these are described in the |
named functions that extract by number. As these are described in the |
2295 |
previous section, they are not re-described here. There are just two |
previous section, they are not re-described here. There are just two |
2296 |
differences: |
differences: |
2297 |
|
|
2298 |
First, instead of a substring number, a substring name is given. Sec- |
First, instead of a substring number, a substring name is given. Sec- |
2299 |
ond, there is an extra argument, given at the start, which is a pointer |
ond, there is an extra argument, given at the start, which is a pointer |
2300 |
to the compiled pattern. This is needed in order to gain access to the |
to the compiled pattern. This is needed in order to gain access to the |
2301 |
name-to-number translation table. |
name-to-number translation table. |
2302 |
|
|
2303 |
These functions call pcre_get_stringnumber(), and if it succeeds, they |
These functions call pcre_get_stringnumber(), and if it succeeds, they |
2304 |
then call pcre_copy_substring() or pcre_get_substring(), as appropri- |
then call pcre_copy_substring() or pcre_get_substring(), as appropri- |
2305 |
ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate names, the |
ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate names, the |
2306 |
behaviour may not be what you want (see the next section). |
behaviour may not be what you want (see the next section). |
2307 |
|
|
2308 |
|
|
2311 |
int pcre_get_stringtable_entries(const pcre *code, |
int pcre_get_stringtable_entries(const pcre *code, |
2312 |
const char *name, char **first, char **last); |
const char *name, char **first, char **last); |
2313 |
|
|
2314 |
When a pattern is compiled with the PCRE_DUPNAMES option, names for |
When a pattern is compiled with the PCRE_DUPNAMES option, names for |
2315 |
subpatterns are not required to be unique. Normally, patterns with |
subpatterns are not required to be unique. Normally, patterns with |
2316 |
duplicate names are such that in any one match, only one of the named |
duplicate names are such that in any one match, only one of the named |
2317 |
subpatterns participates. An example is shown in the pcrepattern docu- |
subpatterns participates. An example is shown in the pcrepattern docu- |
2318 |
mentation. |
mentation. |
2319 |
|
|
2320 |
When duplicates are present, pcre_copy_named_substring() and |
When duplicates are present, pcre_copy_named_substring() and |
2321 |
pcre_get_named_substring() return the first substring corresponding to |
pcre_get_named_substring() return the first substring corresponding to |
2322 |
the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING |
the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING |
2323 |
(-7) is returned; no data is returned. The pcre_get_stringnumber() |
(-7) is returned; no data is returned. The pcre_get_stringnumber() |
2324 |
function returns one of the numbers that are associated with the name, |
function returns one of the numbers that are associated with the name, |
2325 |
but it is not defined which it is. |
but it is not defined which it is. |
2326 |
|
|
2327 |
If you want to get full details of all captured substrings for a given |
If you want to get full details of all captured substrings for a given |
2328 |
name, you must use the pcre_get_stringtable_entries() function. The |
name, you must use the pcre_get_stringtable_entries() function. The |
2329 |
first argument is the compiled pattern, and the second is the name. The |
first argument is the compiled pattern, and the second is the name. The |
2330 |
third and fourth are pointers to variables which are updated by the |
third and fourth are pointers to variables which are updated by the |
2331 |
function. After it has run, they point to the first and last entries in |
function. After it has run, they point to the first and last entries in |
2332 |
the name-to-number table for the given name. The function itself |
the name-to-number table for the given name. The function itself |
2333 |
returns the length of each entry, or PCRE_ERROR_NOSUBSTRING (-7) if |
returns the length of each entry, or PCRE_ERROR_NOSUBSTRING (-7) if |
2334 |
there are none. The format of the table is described above in the sec- |
there are none. The format of the table is described above in the sec- |
2335 |
tion entitled Information about a pattern. Given all the relevant |
tion entitled Information about a pattern. Given all the relevant |
2336 |
entries for the name, you can extract each of their numbers, and hence |
entries for the name, you can extract each of their numbers, and hence |
2337 |
the captured data, if any. |
the captured data, if any. |
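
Walking the entries might look like the sketch below. It assumes the entry layout given in the section on information about a pattern, which is not restated here: two bytes holding the subpattern number, most significant byte first, followed by the zero-terminated name. Both function names are hypothetical, and first, last and entry_length stand for the values obtained from pcre_get_stringtable_entries().

```c
#include <assert.h>
#include <stddef.h>

/* Decode the subpattern number stored in the first two bytes of a
   name-table entry (assumed layout: most significant byte first). */
int entry_number(const char *entry)
{
    const unsigned char *p = (const unsigned char *)entry;
    return (p[0] << 8) | p[1];
}

/* Hypothetical helper: visit the entries from first to last inclusive and
   return the number of the first one whose offset pair is set in ovector,
   or -1 if none of the duplicates took part in the match. */
int first_set_duplicate(const char *first, const char *last,
                        int entry_length, const int *ovector,
                        int stringcount)
{
    const char *entry;

    assert(first != NULL && last != NULL && ovector != NULL);
    assert(entry_length > 2 && first <= last);

    for (entry = first; entry <= last; entry += entry_length) {
        int n = entry_number(entry);
        if (n < stringcount && ovector[2 * n] >= 0)
            return n;
    }
    return -1;
}
```

The check against stringcount keeps the helper from reading pairs beyond those that the match filled in.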
2338 |
|
|
2339 |
|
|
2340 |
FINDING ALL POSSIBLE MATCHES |
FINDING ALL POSSIBLE MATCHES |
2341 |
|
|
2342 |
The traditional matching function uses a similar algorithm to Perl, |
The traditional matching function uses a similar algorithm to Perl, |
2343 |
which stops when it finds the first match, starting at a given point in |
which stops when it finds the first match, starting at a given point in |
2344 |
the subject. If you want to find all possible matches, or the longest |
the subject. If you want to find all possible matches, or the longest |
2345 |
possible match, consider using the alternative matching function (see |
possible match, consider using the alternative matching function (see |
2346 |
below) instead. If you cannot use the alternative function, but still |
below) instead. If you cannot use the alternative function, but still |
2347 |
need to find all possible matches, you can kludge it up by making use |
need to find all possible matches, you can kludge it up by making use |
2348 |
of the callout facility, which is described in the pcrecallout documen- |
of the callout facility, which is described in the pcrecallout documen- |
2349 |
tation. |
tation. |
2350 |
|
|
2351 |
What you have to do is to insert a callout right at the end of the pat- |
What you have to do is to insert a callout right at the end of the pat- |
2352 |
tern. When your callout function is called, extract and save the cur- |
tern. When your callout function is called, extract and save the cur- |
2353 |
rent matched substring. Then return 1, which forces pcre_exec() to |
rent matched substring. Then return 1, which forces pcre_exec() to |
2354 |
backtrack and try other alternatives. Ultimately, when it runs out of |
backtrack and try other alternatives. Ultimately, when it runs out of |
2355 |
matches, pcre_exec() will yield PCRE_ERROR_NOMATCH. |
matches, pcre_exec() will yield PCRE_ERROR_NOMATCH. |
2356 |
|
|
2357 |
|
|
2362 |
int options, int *ovector, int ovecsize, |
int options, int *ovector, int ovecsize, |
2363 |
int *workspace, int wscount); |
int *workspace, int wscount); |
2364 |
|
|
2365 |
The function pcre_dfa_exec() is called to match a subject string |
The function pcre_dfa_exec() is called to match a subject string |
2366 |
against a compiled pattern, using a matching algorithm that scans the |
against a compiled pattern, using a matching algorithm that scans the |
2367 |
subject string just once, and does not backtrack. This has different |
subject string just once, and does not backtrack. This has different |
2368 |
characteristics to the normal algorithm, and is not compatible with |
characteristics to the normal algorithm, and is not compatible with |
2369 |
Perl. Some of the features of PCRE patterns are not supported. Never- |
Perl. Some of the features of PCRE patterns are not supported. Never- |
2370 |
theless, there are times when this kind of matching can be useful. For |
theless, there are times when this kind of matching can be useful. For |
2371 |
a discussion of the two matching algorithms, see the pcrematching docu- |
a discussion of the two matching algorithms, see the pcrematching docu- |
2372 |
mentation. |
mentation. |
2373 |
|
|
2374 |
The arguments for the pcre_dfa_exec() function are the same as for |
The arguments for the pcre_dfa_exec() function are the same as for |
2375 |
pcre_exec(), plus two extras. The ovector argument is used in a differ- |
pcre_exec(), plus two extras. The ovector argument is used in a differ- |
2376 |
ent way, and this is described below. The other common arguments are |
ent way, and this is described below. The other common arguments are |
2377 |
used in the same way as for pcre_exec(), so their description is not |
used in the same way as for pcre_exec(), so their description is not |
2378 |
repeated here. |
repeated here. |
2379 |
|
|
2380 |
The two additional arguments provide workspace for the function. The |
The two additional arguments provide workspace for the function. The |
2381 |
workspace vector should contain at least 20 elements. It is used for |
workspace vector should contain at least 20 elements. It is used for |
2382 |
keeping track of multiple paths through the pattern tree. More |
keeping track of multiple paths through the pattern tree. More |
2383 |
workspace will be needed for patterns and subjects where there are a |
workspace will be needed for patterns and subjects where there are a |
2384 |
lot of potential matches. |
lot of potential matches. |
2385 |
|
|
2386 |
Here is an example of a simple call to pcre_dfa_exec(): |
Here is an example of a simple call to pcre_dfa_exec(): |
2402 |
|
|
2403 |
Option bits for pcre_dfa_exec() |
Option bits for pcre_dfa_exec() |
2404 |
|
|
2405 |
The unused bits of the options argument for pcre_dfa_exec() must be |
The unused bits of the options argument for pcre_dfa_exec() must be |
2406 |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEW- |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEW- |
2407 |
LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, |
LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, |
2408 |
PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last |
PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last |
2409 |
three of these are the same as for pcre_exec(), so their description is |
three of these are the same as for pcre_exec(), so their description is |
2410 |
not repeated here. |
not repeated here. |
2411 |
|
|
2412 |
PCRE_PARTIAL |
PCRE_PARTIAL |
2413 |
|
|
2414 |
This has the same general effect as it does for pcre_exec(), but the |
This has the same general effect as it does for pcre_exec(), but the |
2415 |
details are slightly different. When PCRE_PARTIAL is set for |
details are slightly different. When PCRE_PARTIAL is set for |
2416 |
pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is converted into |
pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is converted into |
2417 |
PCRE_ERROR_PARTIAL if the end of the subject is reached, there have |
PCRE_ERROR_PARTIAL if the end of the subject is reached, there have |
2418 |
been no complete matches, but there is still at least one matching pos- |
been no complete matches, but there is still at least one matching pos- |
2419 |
sibility. The portion of the string that provided the partial match is |
sibility. The portion of the string that provided the partial match is |
2420 |
set as the first matching string. |
set as the first matching string. |
2421 |
|
|
2422 |
PCRE_DFA_SHORTEST |
PCRE_DFA_SHORTEST |
2423 |
|
|
2424 |
Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to |
Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to |
2425 |
stop as soon as it has found one match. Because of the way the alterna- |
stop as soon as it has found one match. Because of the way the alterna- |
2426 |
tive algorithm works, this is necessarily the shortest possible match |
tive algorithm works, this is necessarily the shortest possible match |
2427 |
at the first possible matching point in the subject string. |
at the first possible matching point in the subject string. |
2428 |
|
|
2429 |
PCRE_DFA_RESTART |
PCRE_DFA_RESTART |
2430 |
|
|
2431 |
When pcre_dfa_exec() is called with the PCRE_PARTIAL option, and |
When pcre_dfa_exec() is called with the PCRE_PARTIAL option, and |
2432 |
returns a partial match, it is possible to call it again, with addi- |
returns a partial match, it is possible to call it again, with addi- |
2433 |
tional subject characters, and have it continue with the same match. |
tional subject characters, and have it continue with the same match. |
2434 |
The PCRE_DFA_RESTART option requests this action; when it is set, the |
The PCRE_DFA_RESTART option requests this action; when it is set, the |
2435 |
workspace and wscount arguments must reference the same vector as before |
workspace and wscount arguments must reference the same vector as before |
2436 |
because data about the match so far is left in them after a partial |
because data about the match so far is left in them after a partial |
2437 |
match. There is more discussion of this facility in the pcrepartial |
match. There is more discussion of this facility in the pcrepartial |
2438 |
documentation. |
documentation. |
2439 |
|
|
2440 |
Successful returns from pcre_dfa_exec() |
Successful returns from pcre_dfa_exec() |
2441 |
|
|
2442 |
When pcre_dfa_exec() succeeds, it may have matched more than one sub- |
When pcre_dfa_exec() succeeds, it may have matched more than one sub- |
2443 |
string in the subject. Note, however, that all the matches from one run |
string in the subject. Note, however, that all the matches from one run |
2444 |
of the function start at the same point in the subject. The shorter |
of the function start at the same point in the subject. The shorter |
2445 |
matches are all initial substrings of the longer matches. For example, |
matches are all initial substrings of the longer matches. For example, |
2446 |
if the pattern |
if the pattern |
2447 |
|
|
2448 |
<.*> |
<.*> |
2457 |
<something> <something else> |
<something> <something else> |
2458 |
<something> <something else> <something further> |
<something> <something else> <something further> |
2459 |
|
|
2460 |
On success, the yield of the function is a number greater than zero, |
On success, the yield of the function is a number greater than zero, |
2461 |
which is the number of matched substrings. The substrings themselves |
which is the number of matched substrings. The substrings themselves |
2462 |
are returned in ovector. Each string uses two elements; the first is |
are returned in ovector. Each string uses two elements; the first is |
2463 |
the offset to the start, and the second is the offset to the end. In |
the offset to the start, and the second is the offset to the end. In |
2464 |
fact, all the strings have the same start offset. (Space could have |
fact, all the strings have the same start offset. (Space could have |
2465 |
been saved by giving this only once, but it was decided to retain some |
been saved by giving this only once, but it was decided to retain some |
2466 |
compatibility with the way pcre_exec() returns data, even though the |
compatibility with the way pcre_exec() returns data, even though the |
2467 |
meaning of the strings is different.) |
meaning of the strings is different.) |
2468 |
|
|
2469 |
The strings are returned in reverse order of length; that is, the long- |
The strings are returned in reverse order of length; that is, the long- |
2470 |
est matching string is given first. If there were too many matches to |
est matching string is given first. If there were too many matches to |
2471 |
fit into ovector, the yield of the function is zero, and the vector is |
fit into ovector, the yield of the function is zero, and the vector is |
2472 |
filled with the longest matches. |
filled with the longest matches. |
2473 |
|
|
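The ovector layout described above can be sketched in C as follows. This is a minimal illustration, not part of PCRE: the helper names dfa_pair_count() and dfa_copy_match() are hypothetical. Treating a zero return as "ovecsize/2 pairs are valid" is an assumption drawn from the statement that the vector is then filled with the longest matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of valid (start, end) pairs in ovector after a successful call.
   rc > 0:  rc matches were stored.
   rc == 0: there were more matches than fit; assume the vector holds
            the ovecsize/2 longest ones. */
int dfa_pair_count(int rc, int ovecsize)
{
    assert(rc >= 0);   /* negative values are errors, not matches */
    return rc > 0 ? rc : ovecsize / 2;
}

/* Copy match number i (0 is the longest) into buf as a NUL-terminated
   string. Returns the match length, or -1 if buf is too small. */
int dfa_copy_match(const char *subject, const int *ovector, int i,
                   char *buf, size_t bufsize)
{
    int start = ovector[2 * i];      /* same for every match in one run */
    int end = ovector[2 * i + 1];    /* offset just past the match */
    int len = end - start;

    if (len < 0 || (size_t)len + 1 > bufsize)
        return -1;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

A caller would pass the value returned by pcre_dfa_exec() and its ovecsize argument to dfa_pair_count(), then loop from 0 up to that count. Because the matches are stored longest first, index 0 is always the longest match.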
2474 |
Error returns from pcre_dfa_exec() |
Error returns from pcre_dfa_exec() |
2475 |
|
|
2476 |
The pcre_dfa_exec() function returns a negative number when it fails. |
The pcre_dfa_exec() function returns a negative number when it fails. |
2477 |
Many of the errors are the same as for pcre_exec(), and these are |
Many of the errors are the same as for pcre_exec(), and these are |
2478 |
described above. There are in addition the following errors that are |
described above. There are in addition the following errors that are |
2479 |
specific to pcre_dfa_exec(): |
specific to pcre_dfa_exec(): |
2480 |
|
|
2481 |
PCRE_ERROR_DFA_UITEM (-16) |
PCRE_ERROR_DFA_UITEM (-16) |
2482 |
|
|
2483 |
This return is given if pcre_dfa_exec() encounters an item in the pat- |
This return is given if pcre_dfa_exec() encounters an item in the pat- |
2484 |
tern that it does not support, for instance, the use of \C or a back |
tern that it does not support, for instance, the use of \C or a back |
2485 |
reference. |
reference. |
2486 |
|
|
2487 |
PCRE_ERROR_DFA_UCOND (-17) |
PCRE_ERROR_DFA_UCOND (-17) |
2488 |
|
|
2489 |
This return is given if pcre_dfa_exec() encounters a condition item |
This return is given if pcre_dfa_exec() encounters a condition item |
2490 |
that uses a back reference for the condition, or a test for recursion |
that uses a back reference for the condition, or a test for recursion |
2491 |
in a specific group. These are not supported. |
in a specific group. These are not supported. |
2492 |
|
|
2493 |
PCRE_ERROR_DFA_UMLIMIT (-18) |
PCRE_ERROR_DFA_UMLIMIT (-18) |
2494 |
|
|
2495 |
This return is given if pcre_dfa_exec() is called with an extra block |
This return is given if pcre_dfa_exec() is called with an extra block |
2496 |
that contains a setting of the match_limit field. This is not supported |
that contains a setting of the match_limit field. This is not supported |
2497 |
(it is meaningless). |
(it is meaningless). |
2498 |
|
|
2499 |
PCRE_ERROR_DFA_WSSIZE (-19) |
PCRE_ERROR_DFA_WSSIZE (-19) |
2500 |
|
|
2501 |
This return is given if pcre_dfa_exec() runs out of space in the |
This return is given if pcre_dfa_exec() runs out of space in the |
2502 |
workspace vector. |
workspace vector. |
2503 |
|
|
2504 |
PCRE_ERROR_DFA_RECURSE (-20) |
PCRE_ERROR_DFA_RECURSE (-20) |
2505 |
|
|
2506 |
When a recursive subpattern is processed, the matching function calls |
When a recursive subpattern is processed, the matching function calls |
2507 |
itself recursively, using private vectors for ovector and workspace. |
itself recursively, using private vectors for ovector and workspace. |
2508 |
This error is given if the output vector is not large enough. This |
This error is given if the output vector is not large enough. This |
2509 |
should be extremely rare, as a vector of size 1000 is used. |
should be extremely rare, as a vector of size 1000 is used. |
2510 |
|
|
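A caller may want to distinguish the pcre_dfa_exec()-specific errors listed above from the ones shared with pcre_exec(). The sketch below is an illustration only. The enum names and both helper functions are hypothetical, and the numeric values are copied from the list above (pcre.h defines the real PCRE_ERROR_DFA_xxx macros). Falling back to pcre_exec() on an unsupported pattern item is a suggested policy, not documented behaviour.

```c
#include <stddef.h>

/* Local copies of the values listed above, for illustration only. */
enum {
    DFA_ERR_UITEM   = -16,
    DFA_ERR_UCOND   = -17,
    DFA_ERR_UMLIMIT = -18,
    DFA_ERR_WSSIZE  = -19,
    DFA_ERR_RECURSE = -20
};

/* Short description of a DFA-specific error code, or NULL if rc is not
   one of them (for example, an error shared with pcre_exec()). */
const char *dfa_error_text(int rc)
{
    switch (rc) {
    case DFA_ERR_UITEM:   return "unsupported pattern item";
    case DFA_ERR_UCOND:   return "unsupported condition item";
    case DFA_ERR_UMLIMIT: return "match_limit set in extra block";
    case DFA_ERR_WSSIZE:  return "workspace vector too small";
    case DFA_ERR_RECURSE: return "recursion output vector too small";
    default:              return NULL;
    }
}

/* Nonzero if the pattern itself uses something the DFA matcher cannot
   handle, so retrying with the normal matcher might be worthwhile. */
int dfa_should_fall_back(int rc)
{
    return rc == DFA_ERR_UITEM || rc == DFA_ERR_UCOND;
}
```

For PCRE_ERROR_DFA_WSSIZE, the more natural response is to retry with a larger workspace vector rather than to switch matching functions.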
2511 |
|
|
2512 |
SEE ALSO |
SEE ALSO |
2513 |
|
|
2514 |
pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3), |
pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3), |
2515 |
pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3). |
pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3). |
2516 |
|
|
2517 |
|
|
2518 |
AUTHOR |
AUTHOR |
2524 |
|
|
2525 |
REVISION |
REVISION |
2526 |
|
|
2527 |
Last updated: 21 August 2007 |
Last updated: 11 September 2007 |
2528 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
2529 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
2530 |
|
|
2809 |
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAP- |
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAP- |
2810 |
TURE options for pcre_exec() have no Perl equivalents. |
TURE options for pcre_exec() have no Perl equivalents. |
2811 |
|
|
2812 |
(g) The callout facility is PCRE-specific. |
(g) The \R escape sequence can be restricted to match only CR, LF, or |
2813 |
|
CRLF by the PCRE_BSR_ANYCRLF option. |
2814 |
|
|
2815 |
|
(h) The callout facility is PCRE-specific. |
2816 |
|
|
2817 |
(h) The partial matching facility is PCRE-specific. |
(i) The partial matching facility is PCRE-specific. |
2818 |
|
|
2819 |
(i) Patterns compiled by PCRE can be saved and re-used at a later time, |
(j) Patterns compiled by PCRE can be saved and re-used at a later time, |
2820 |
even on different hosts that have the other endianness. |
even on different hosts that have the other endianness. |
2821 |
|
|
2822 |
(j) The alternative matching function (pcre_dfa_exec()) matches in a |
(k) The alternative matching function (pcre_dfa_exec()) matches in a |
2823 |
different way and is not Perl-compatible. |
different way and is not Perl-compatible. |
2824 |
|
|
2825 |
|
(l) PCRE recognizes some special sequences such as (*CR) at the start |
2826 |
|
of a pattern that set overall options that cannot be changed within the |
2827 |
|
pattern. |
2828 |
|
|
2829 |
|
|
2830 |
AUTHOR |
AUTHOR |
2831 |
|
|
2836 |
|
|
2837 |
REVISION |
REVISION |
2838 |
|
|
2839 |
Last updated: 08 August 2007 |
Last updated: 11 September 2007 |
2840 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
2841 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
2842 |
|
|
2904 |
changes the convention to CR. That pattern matches "a\nb" because LF is |
changes the convention to CR. That pattern matches "a\nb" because LF is |
2905 |
no longer a newline. Note that these special settings, which are not |
no longer a newline. Note that these special settings, which are not |
2906 |
Perl-compatible, are recognized only at the very start of a pattern, |
Perl-compatible, are recognized only at the very start of a pattern, |
2907 |
and that they must be in upper case. |
and that they must be in upper case. If more than one of them is |
2908 |
|
present, the last one is used. |
2909 |
|
|
2910 |
|
The newline convention does not affect what the \R escape sequence |
2911 |
|
matches. By default, this is any Unicode newline sequence, for Perl |
2912 |
|
compatibility. However, this can be changed; see the description of \R |
2913 |
|
in the section entitled "Newline sequences" below. |
2914 |
|
|
2915 |
|
|
2916 |
CHARACTERS AND METACHARACTERS |
CHARACTERS AND METACHARACTERS |
3185 |
|
|
3186 |
Newline sequences |
Newline sequences |
3187 |
|
|
3188 |
Outside a character class, the escape sequence \R matches any Unicode |
Outside a character class, by default, the escape sequence \R matches |
3189 |
newline sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is |
any Unicode newline sequence. This is a Perl 5.10 feature. In non-UTF-8 |
3190 |
equivalent to the following: |
mode \R is equivalent to the following: |
3191 |
|
|
3192 |
(?>\r\n|\n|\x0b|\f|\r|\x85) |
(?>\r\n|\n|\x0b|\f|\r|\x85) |
3193 |
|
|
3203 |
rator, U+2029). Unicode character property support is not needed for |
rator, U+2029). Unicode character property support is not needed for |
3204 |
these characters to be recognized. |
these characters to be recognized. |
3205 |
|
|
3206 |
|
It is possible to restrict \R to match only CR, LF, or CRLF (instead of |
3207 |
|
the complete set of Unicode line endings) by setting the option |
3208 |
|
PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched. |
3209 |
|
This can be made the default when PCRE is built; if this is the case, |
3210 |
|
the other behaviour can be requested via the PCRE_BSR_UNICODE option. |
3211 |
|
It is also possible to specify these settings by starting a pattern |
3212 |
|
string with one of the following sequences: |
3213 |
|
|
3214 |
|
(*BSR_ANYCRLF) CR, LF, or CRLF only |
3215 |
|
(*BSR_UNICODE) any Unicode newline sequence |
3216 |
|
|
3217 |
|
These override the default and the options given to pcre_compile(), but |
3218 |
|
they can be overridden by options given to pcre_exec(). Note that these |
3219 |
|
special settings, which are not Perl-compatible, are recognized only at |
3220 |
|
the very start of a pattern, and that they must be in upper case. If |
3221 |
|
more than one of them is present, the last one is used. |
3222 |
|
|
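To make the difference between the two \R settings concrete, here is a small C sketch of what \R consumes at a given position in non-UTF-8 mode. It mirrors the atomic group shown earlier, so CRLF is taken as a unit before CR alone is considered. The function name match_bsr() and its anycrlf flag are hypothetical; this is not how PCRE implements \R internally.

```c
#include <stddef.h>

/* Length in bytes of the \R match at s[pos], or 0 if \R does not match
   there. Non-UTF-8 mode only. If anycrlf is nonzero, only CR, LF, and
   CRLF are accepted, as with PCRE_BSR_ANYCRLF. */
size_t match_bsr(const unsigned char *s, size_t len, size_t pos, int anycrlf)
{
    unsigned char c;

    if (pos >= len)
        return 0;
    c = s[pos];

    if (c == '\r')   /* CRLF is matched as a unit when LF follows CR */
        return (pos + 1 < len && s[pos + 1] == '\n') ? 2 : 1;
    if (c == '\n')
        return 1;
    if (anycrlf)     /* restricted mode stops here */
        return 0;

    /* Remaining single-byte cases: VT (0x0b), FF (0x0c), NEL (0x85) */
    return (c == 0x0b || c == 0x0c || c == 0x85) ? 1 : 0;
}
```

With anycrlf set to zero the function follows the default (PCRE_BSR_UNICODE) behaviour for single-byte subjects. With it set, form feed, vertical tab, and 0x85 are no longer treated as newline sequences.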
3223 |
Inside a character class, \R matches the letter "R". |
Inside a character class, \R matches the letter "R". |
3224 |
|
|
3225 |
Unicode character properties |
Unicode character properties |
4849 |
|
|
4850 |
REVISION |
REVISION |
4851 |
|
|
4852 |
Last updated: 21 August 2007 |
Last updated: 11 September 2007 |
4853 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
4854 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
4855 |
|
|
5157 |
(*ANY) |
(*ANY) |
5158 |
|
|
5159 |
|
|
5160 |
|
WHAT \R MATCHES |
5161 |
|
|
5162 |
|
These are recognized only at the very start of a pattern. |
5163 |
|
|
5164 |
|
(*BSR_ANYCRLF) |
5165 |
|
(*BSR_UNICODE) |
5166 |
|
|
5167 |
|
|
5168 |
CALLOUTS |
CALLOUTS |
5169 |
|
|
5170 |
(?C) callout |
(?C) callout |
5185 |
|
|
5186 |
REVISION |
REVISION |
5187 |
|
|
5188 |
Last updated: 21 August 2007 |
Last updated: 11 September 2007 |
5189 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
5190 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
5191 |
|
|