149 |
A second matching function, \fBpcre_dfa_exec()\fP, which is not |
A second matching function, \fBpcre_dfa_exec()\fP, which is not |
150 |
Perl-compatible, is also provided. This uses a different algorithm for the |
Perl-compatible, is also provided. This uses a different algorithm for the |
151 |
matching. The alternative algorithm finds all possible matches (at a given |
matching. The alternative algorithm finds all possible matches (at a given |
152 |
point in the subject), and scans the subject just once. However, this algorithm |
point in the subject), and scans the subject just once (unless there are |
153 |
does not return captured substrings. A description of the two matching |
lookbehind assertions). However, this algorithm does not return captured |
154 |
algorithms and their advantages and disadvantages is given in the |
substrings. A description of the two matching algorithms and their advantages |
155 |
|
and disadvantages is given in the |
156 |
.\" HREF |
.\" HREF |
157 |
\fBpcrematching\fP |
\fBpcrematching\fP |
158 |
.\" |
.\" |
427 |
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns |
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns |
428 |
NULL, and sets the variable pointed to by \fIerrptr\fP to point to a textual |
NULL, and sets the variable pointed to by \fIerrptr\fP to point to a textual |
429 |
error message. This is a static string that is part of the library. You must |
error message. This is a static string that is part of the library. You must |
430 |
not try to free it. The offset from the start of the pattern to the character |
not try to free it. The byte offset from the start of the pattern to the |
431 |
where the error was discovered is placed in the variable pointed to by |
character that was being processed when the error was discovered is placed in |
432 |
\fIerroffset\fP, which must not be NULL. If it is, an immediate error is given. |
the variable pointed to by \fIerroffset\fP, which must not be NULL. If it is, |
433 |
|
an immediate error is given. Some errors are not detected until checks are |
434 |
|
carried out when the whole pattern has been scanned; in this case the offset is |
435 |
|
set to the end of the pattern. |
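As an illustration of how a caller might use the offset, the following sketch formats a pattern with a caret under the position where the error was reported. The helper name format_error_pointer() is hypothetical and not part of PCRE; it assumes the offset has already been obtained from \fBpcre_compile()\fP. An offset equal to the pattern length is accepted because, as noted above, some errors are reported at the end of the pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a PCRE function): write the pattern, a newline,
 * and a caret placed under byte offset `erroffset` into `buf`.
 * Returns the number of characters written (excluding the terminating NUL),
 * or -1 if the offset is out of range or the buffer is too small.
 * The offset is a byte offset, so for a UTF-8 pattern the caret may not line
 * up with the displayed character.
 */
int format_error_pointer(char *buf, size_t bufsize,
                         const char *pattern, int erroffset)
{
    assert(buf != NULL && pattern != NULL);

    size_t len = strlen(pattern);

    /* An offset equal to len means "end of pattern" and is valid. */
    if (erroffset < 0 || (size_t)erroffset > len)
        return -1;

    /* "%*s" with an empty string emits `erroffset` spaces before the caret. */
    int n = snprintf(buf, bufsize, "%s\n%*s^", pattern, erroffset, "");
    if (n < 0 || (size_t)n >= bufsize)
        return -1;

    return n;
}
```

A caller would typically print the result next to the textual message returned through \fIerrptr\fP.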
436 |
.P |
.P |
437 |
If \fBpcre_compile2()\fP is used instead of \fBpcre_compile()\fP, and the |
If \fBpcre_compile2()\fP is used instead of \fBpcre_compile()\fP, and the |
438 |
\fIerrorcodeptr\fP argument is not NULL, a non-zero error code number is |
\fIerrorcodeptr\fP argument is not NULL, a non-zero error code number is |
772 |
results of the study. |
results of the study. |
773 |
.P |
.P |
774 |
The returned value from \fBpcre_study()\fP can be passed directly to |
The returned value from \fBpcre_study()\fP can be passed directly to |
775 |
\fBpcre_exec()\fP. However, a \fBpcre_extra\fP block also contains other |
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. However, a \fBpcre_extra\fP block |
776 |
fields that can be set by the caller before the block is passed; these are |
also contains other fields that can be set by the caller before the block is |
777 |
described |
passed; these are described |
778 |
.\" HTML <a href="#extradata"> |
.\" HTML <a href="#extradata"> |
779 |
.\" </a> |
.\" </a> |
780 |
below |
below |
781 |
.\" |
.\" |
782 |
in the section on matching a pattern. |
in the section on matching a pattern. |
783 |
.P |
.P |
784 |
If studying the pattern does not produce any additional information |
If studying the pattern does not produce any useful information, |
785 |
\fBpcre_study()\fP returns NULL. In that circumstance, if the calling program |
\fBpcre_study()\fP returns NULL. In that circumstance, if the calling program |
786 |
wants to pass any of the other fields to \fBpcre_exec()\fP, it must set up its |
wants to pass any of the other fields to \fBpcre_exec()\fP or |
787 |
own \fBpcre_extra\fP block. |
\fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block. |
788 |
.P |
.P |
789 |
The second argument of \fBpcre_study()\fP contains option bits. At present, no |
The second argument of \fBpcre_study()\fP contains option bits. At present, no |
790 |
options are defined, and this argument should always be zero. |
options are defined, and this argument should always be zero. |
804 |
0, /* no options exist */ |
0, /* no options exist */ |
805 |
&error); /* set to NULL or points to a message */ |
&error); /* set to NULL or points to a message */ |
806 |
.sp |
.sp |
807 |
At present, studying a pattern is useful only for non-anchored patterns that do |
Studying a pattern does two things: first, a lower bound for the length of the |
808 |
not have a single fixed starting character. A bitmap of possible starting |
subject string that is needed to match the pattern is computed. This does not |
809 |
bytes is created. |
mean that there are any strings of that length that match, but it does |
810 |
|
guarantee that no shorter strings match. The value is used by |
811 |
|
\fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP to avoid wasting time by trying to |
812 |
|
match strings that are shorter than the lower bound. You can find out the value |
813 |
|
in a calling program via the \fBpcre_fullinfo()\fP function. |
814 |
|
.P |
815 |
|
Studying a pattern is also useful for non-anchored patterns that do not have a |
816 |
|
single fixed starting character. A bitmap of possible starting bytes is |
817 |
|
created. This speeds up finding a position in the subject at which to start |
818 |
|
matching. |
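The idea behind a starting-byte bitmap can be sketched as follows. This is a minimal illustration only, and it is not PCRE's internal data layout or code: the type start_bitmap and the functions shown are hypothetical names. For a pattern such as cat|dog, a match can begin only at a "c" or a "d", so every other subject position can be skipped without running the matcher.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 256-bit set, one bit per possible byte value. */
typedef struct {
    unsigned char bits[32];
} start_bitmap;

/* Empty the set. */
void bitmap_clear(start_bitmap *bm)
{
    memset(bm->bits, 0, sizeof bm->bits);
}

/* Record that a match may start with byte c. */
void bitmap_add(start_bitmap *bm, unsigned char c)
{
    bm->bits[c >> 3] |= (unsigned char)(1u << (c & 7));
}

/* Return 1 if a match may start with byte c, otherwise 0. */
int bitmap_has(const start_bitmap *bm, unsigned char c)
{
    return (bm->bits[c >> 3] >> (c & 7)) & 1;
}

/*
 * Return the first offset at or after `start` whose byte is in the set,
 * or `length` if there is no such position in the subject.
 */
size_t next_start(const start_bitmap *bm, const char *subject,
                  size_t length, size_t start)
{
    assert(bm != NULL && subject != NULL);

    size_t i = start;
    while (i < length && !bitmap_has(bm, (unsigned char)subject[i]))
        i++;
    return i;
}
```

With "c" and "d" added to the set, scanning the subject "xxcat" from offset 0 stops at offset 2, which is the only position where an attempt to match cat|dog needs to be made.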
819 |
. |
. |
820 |
. |
. |
821 |
.\" HTML <a name="localesupport"></a> |
.\" HTML <a name="localesupport"></a> |
980 |
/^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value |
/^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value |
981 |
is -1. |
is -1. |
982 |
.sp |
.sp |
983 |
|
PCRE_INFO_MINLENGTH |
984 |
|
.sp |
985 |
|
If the pattern was studied and a minimum length for matching subject strings |
986 |
|
was computed, its value is returned. Otherwise the returned value is -1. The |
987 |
|
value is a number of characters, not bytes (there may be a difference in UTF-8 |
988 |
|
mode). The fourth argument should point to an \fBint\fP variable. A |
989 |
|
non-negative value is a lower bound to the length of any matching string. There |
990 |
|
may not be any strings of that length that do actually match, but every string |
991 |
|
that does match is at least that long. |
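Because the value counts characters rather than bytes, a caller that wants to compare it with a subject in UTF-8 mode has to count characters first. The sketch below shows one way this might be done; utf8_char_count() and may_match() are hypothetical helpers, not PCRE functions, and the code assumes the subject is valid UTF-8. A negative bound stands for the -1 result described above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the characters in a UTF-8 byte string by counting every byte that
 * is not a continuation byte (10xxxxxx). Assumes the input is valid UTF-8.
 */
size_t utf8_char_count(const char *s, size_t byte_length)
{
    assert(s != NULL);

    size_t count = 0;
    for (size_t i = 0; i < byte_length; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/*
 * Return 0 if the subject is certainly too short to match, otherwise 1.
 * `minlength` is the lower bound in characters; a negative value means
 * that no bound is known, so nothing can be ruled out.
 */
int may_match(const char *subject, size_t byte_length,
              int minlength, int utf8)
{
    if (minlength < 0)
        return 1;

    size_t chars = utf8 ? utf8_char_count(subject, byte_length)
                        : byte_length;
    return chars >= (size_t)minlength;
}
```

For example, a subject consisting of two 2-byte characters is 4 bytes long but only 2 characters long, so with a bound of 3 it can be rejected in UTF-8 mode even though its byte length exceeds the bound.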
992 |
|
.sp |
993 |
PCRE_INFO_NAMECOUNT |
PCRE_INFO_NAMECOUNT |
994 |
PCRE_INFO_NAMEENTRYSIZE |
PCRE_INFO_NAMEENTRYSIZE |
995 |
PCRE_INFO_NAMETABLE |
PCRE_INFO_NAMETABLE |
1034 |
.sp |
.sp |
1035 |
PCRE_INFO_OKPARTIAL |
PCRE_INFO_OKPARTIAL |
1036 |
.sp |
.sp |
1037 |
Return 1 if the pattern can be used for partial matching, otherwise 0. The |
Return 1 if the pattern can be used for partial matching with |
1038 |
fourth argument should point to an \fBint\fP variable. From release 8.00, this |
\fBpcre_exec()\fP, otherwise 0. The fourth argument should point to an |
1039 |
always returns 1, because the restrictions that previously applied to partial |
\fBint\fP variable. From release 8.00, this always returns 1, because the |
1040 |
matching have been lifted. The |
restrictions that previously applied to partial matching have been lifted. The |
1041 |
.\" HREF |
.\" HREF |
1042 |
\fBpcrepartial\fP |
\fBpcrepartial\fP |
1043 |
.\" |
.\" |
1078 |
Return the size of the data block pointed to by the \fIstudy_data\fP field in |
Return the size of the data block pointed to by the \fIstudy_data\fP field in |
1079 |
a \fBpcre_extra\fP block. That is, it is the value that was passed to |
a \fBpcre_extra\fP block. That is, it is the value that was passed to |
1080 |
\fBpcre_malloc()\fP when PCRE was getting memory into which to place the data |
\fBpcre_malloc()\fP when PCRE was getting memory into which to place the data |
1081 |
created by \fBpcre_study()\fP. The fourth argument should point to a |
created by \fBpcre_study()\fP. If \fBpcre_extra\fP is NULL, or there is no |
1082 |
|
study data, zero is returned. The fourth argument should point to a |
1083 |
\fBsize_t\fP variable. |
\fBsize_t\fP variable. |
1084 |
. |
. |
1085 |
. |
. |
1141 |
.P |
.P |
1142 |
The function \fBpcre_exec()\fP is called to match a subject string against a |
The function \fBpcre_exec()\fP is called to match a subject string against a |
1143 |
compiled pattern, which is passed in the \fIcode\fP argument. If the |
compiled pattern, which is passed in the \fIcode\fP argument. If the |
1144 |
pattern has been studied, the result of the study should be passed in the |
pattern was studied, the result of the study should be passed in the |
1145 |
\fIextra\fP argument. This function is the main matching facility of the |
\fIextra\fP argument. This function is the main matching facility of the |
1146 |
library, and it operates in a Perl-like manner. For specialist use there is |
library, and it operates in a Perl-like manner. For specialist use there is |
1147 |
also an alternative matching function, which is described |
also an alternative matching function, which is described |
1242 |
PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the \fIflags\fP field. If the limit |
PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the \fIflags\fP field. If the limit |
1243 |
is exceeded, \fBpcre_exec()\fP returns PCRE_ERROR_RECURSIONLIMIT. |
is exceeded, \fBpcre_exec()\fP returns PCRE_ERROR_RECURSIONLIMIT. |
1244 |
.P |
.P |
1245 |
The \fIpcre_callout\fP field is used in conjunction with the "callout" feature, |
The \fIcallout_data\fP field is used in conjunction with the "callout" feature, |
1246 |
which is described in the |
and is described in the |
1247 |
.\" HREF |
.\" HREF |
1248 |
\fBpcrecallout\fP |
\fBpcrecallout\fP |
1249 |
.\" |
.\" |
1269 |
.sp |
.sp |
1270 |
The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be |
The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be |
1271 |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP, |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP, |
1272 |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_START_OPTIMIZE, |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, |
1273 |
PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and PCRE_PARTIAL_HARD. |
PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and |
1274 |
|
PCRE_PARTIAL_HARD. |
1275 |
.sp |
.sp |
1276 |
PCRE_ANCHORED |
PCRE_ANCHORED |
1277 |
.sp |
.sp |
1346 |
.sp |
.sp |
1347 |
a?b? |
a?b? |
1348 |
.sp |
.sp |
1349 |
is applied to a string not beginning with "a" or "b", it matches the empty |
is applied to a string not beginning with "a" or "b", it matches an empty |
1350 |
string at the start of the subject. With PCRE_NOTEMPTY set, this match is not |
string at the start of the subject. With PCRE_NOTEMPTY set, this match is not |
1351 |
valid, so PCRE searches further into the string for occurrences of "a" or "b". |
valid, so PCRE searches further into the string for occurrences of "a" or "b". |
1352 |
.P |
.sp |
1353 |
Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a special case |
PCRE_NOTEMPTY_ATSTART |
1354 |
of a pattern match of the empty string within its \fBsplit()\fP function, and |
.sp |
1355 |
when using the /g modifier. It is possible to emulate Perl's behaviour after |
This is like PCRE_NOTEMPTY, except that an empty string match that is not at |
1356 |
matching a null string by first trying the match again at the same offset with |
the start of the subject is permitted. If the pattern is anchored, such a match |
1357 |
PCRE_NOTEMPTY and PCRE_ANCHORED, and then if that fails by advancing the |
can occur only if the pattern contains \eK. |
1358 |
starting offset (see below) and trying an ordinary match again. There is some |
.P |
1359 |
code that demonstrates how to do this in the |
Perl has no direct equivalent of PCRE_NOTEMPTY or PCRE_NOTEMPTY_ATSTART, but it |
1360 |
|
does make a special case of a pattern match of the empty string within its |
1361 |
|
\fBsplit()\fP function, and when using the /g modifier. It is possible to |
1362 |
|
emulate Perl's behaviour after matching a null string by first trying the match |
1363 |
|
again at the same offset with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then |
1364 |
|
if that fails, by advancing the starting offset (see below) and trying an |
1365 |
|
ordinary match again. There is some code that demonstrates how to do this in |
1366 |
|
the |
1367 |
.\" HREF |
.\" HREF |
1368 |
\fBpcredemo\fP |
\fBpcredemo\fP |
1369 |
.\" |
.\" |
1419 |
PCRE_ERROR_PARTIAL. Otherwise, if PCRE_PARTIAL_SOFT is set, matching continues |
PCRE_ERROR_PARTIAL. Otherwise, if PCRE_PARTIAL_SOFT is set, matching continues |
1420 |
by testing any other alternatives. Only if they all fail is PCRE_ERROR_PARTIAL |
by testing any other alternatives. Only if they all fail is PCRE_ERROR_PARTIAL |
1421 |
returned (instead of PCRE_ERROR_NOMATCH). The portion of the string that |
returned (instead of PCRE_ERROR_NOMATCH). The portion of the string that |
1422 |
provided the partial match is set as the first matching string. There is a more |
was inspected when the partial match was found is set as the first matching |
1423 |
detailed discussion in the |
string. There is a more detailed discussion in the |
1424 |
.\" HREF |
.\" HREF |
1425 |
\fBpcrepartial\fP |
\fBpcrepartial\fP |
1426 |
.\" |
.\" |
1866 |
just once, and does not backtrack. This has different characteristics to the |
just once, and does not backtrack. This has different characteristics to the |
1867 |
normal algorithm, and is not compatible with Perl. Some of the features of PCRE |
normal algorithm, and is not compatible with Perl. Some of the features of PCRE |
1868 |
patterns are not supported. Nevertheless, there are times when this kind of |
patterns are not supported. Nevertheless, there are times when this kind of |
1869 |
matching can be useful. For a discussion of the two matching algorithms, see |
matching can be useful. For a discussion of the two matching algorithms, and a |
1870 |
the |
list of features that \fBpcre_dfa_exec()\fP does not support, see the |
1871 |
.\" HREF |
.\" HREF |
1872 |
\fBpcrematching\fP |
\fBpcrematching\fP |
1873 |
.\" |
.\" |
1906 |
.sp |
.sp |
1907 |
The unused bits of the \fIoptions\fP argument for \fBpcre_dfa_exec()\fP must be |
The unused bits of the \fIoptions\fP argument for \fBpcre_dfa_exec()\fP must be |
1908 |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP, |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP, |
1909 |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, |
1910 |
PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last |
PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, |
1911 |
four of these are exactly the same as for \fBpcre_exec()\fP, so their |
and PCRE_DFA_RESTART. All but the last four of these are exactly the same as |
1912 |
description is not repeated here. |
for \fBpcre_exec()\fP, so their description is not repeated here. |
1913 |
.sp |
.sp |
1914 |
PCRE_PARTIAL_HARD |
PCRE_PARTIAL_HARD |
1915 |
PCRE_PARTIAL_SOFT |
PCRE_PARTIAL_SOFT |
1922 |
been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH |
been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH |
1923 |
is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached, |
is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached, |
1924 |
there have been no complete matches, but there is still at least one matching |
there have been no complete matches, but there is still at least one matching |
1925 |
possibility. The portion of the string that provided the longest partial match |
possibility. The portion of the string that was inspected when the longest |
1926 |
is set as the first matching string in both cases. |
partial match was found is set as the first matching string in both cases. |
1927 |
.sp |
.sp |
1928 |
PCRE_DFA_SHORTEST |
PCRE_DFA_SHORTEST |
1929 |
.sp |
.sp |
2043 |
.rs |
.rs |
2044 |
.sp |
.sp |
2045 |
.nf |
.nf |
2046 |
Last updated: 01 September 2009 |
Last updated: 26 September 2009 |
2047 |
Copyright (c) 1997-2009 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
2048 |
.fi |
.fi |