.\" HREF
\fBpcresample\fP
.\"
documentation describes how to compile and run it.
.P
A second matching function, \fBpcre_dfa_exec()\fP, which is not
Perl-compatible, is also provided. This uses a different algorithm for the
the first newline in the subject string, though the matched text may continue
over the newline.
.sp
  PCRE_JAVASCRIPT_COMPAT
.sp
If this option is set, PCRE's behaviour is changed in some ways so that it is
compatible with JavaScript rather than Perl. The changes are as follows:
.P
(1) A lone closing square bracket in a pattern causes a compile-time error,
because this is illegal in JavaScript (by default it is treated as a data
character). Thus, the pattern AB]CD becomes illegal when this option is set.
.P
(2) At run time, a back reference to an unset subpattern group matches an empty
string (by default this causes the current matching alternative to fail). A
pattern such as (\e1)(a) succeeds when this option is set (assuming it can find
an "a" in the subject), whereas it fails by default, for Perl compatibility.
.sp
  PCRE_MULTILINE
.sp
By default, PCRE treats the subject string as consisting of a single line of
   9  nothing to repeat
  10  [this code is not in use]
  11  internal error: unexpected repeat
  12  unrecognized character after (? or (?-
  13  POSIX named classes are supported only within a class
  14  missing )
  15  reference to non-existent subpattern
  17  unknown option bit(s) set
  18  missing ) after comment
  19  [this code is not in use]
  20  regular expression is too large
  21  failed to get memory
  22  unmatched parentheses
  23  internal error: code overflow
  46  malformed \eP or \ep sequence
  47  unknown property name after \eP or \ep
  48  subpattern name is too long (maximum 32 characters)
  49  too many named subpatterns (maximum 10000)
  50  [this code is not in use]
  51  octal value is greater than \e377 (not in UTF-8 mode)
  52  internal error: overran compiling workspace
  54  DEFINE group contains more than one branch
  55  repeating a DEFINE group is not allowed
  56  inconsistent NEWLINE options
  57  \eg is not followed by a braced, angle-bracketed, or quoted
        name/number or by a plain number
  58  a numbered reference must not be zero
  59  (*VERB) with an argument is not supported
  60  (*VERB) not recognized
  61  number is too big
  62  subpattern name expected
  63  digit expected after (?+
  64  ] is an invalid data character in JavaScript compatibility mode
.sp
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
be used if the limits were changed when PCRE was built.
.
.
.SH "STUDYING A PATTERN"
  PCRE_INFO_HASCRORLF
.sp
Return 1 if the pattern contains any explicit matches for CR or LF characters,
otherwise 0. The fourth argument should point to an \fBint\fP variable. An
explicit match is either a literal CR or LF character, or \er or \en.
.sp
  PCRE_INFO_JCHANGED
.sp
Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise
0. The fourth argument should point to an \fBint\fP variable. (?J) and
(?-J) set and unset the local PCRE_DUPNAMES option, respectively.
.sp
  PCRE_INFO_LASTLITERAL
.sp
.rs
.sp
The subject string is passed to \fBpcre_exec()\fP as a pointer in
\fIsubject\fP, a length (in bytes) in \fIlength\fP, and a starting byte offset
in \fIstartoffset\fP. In UTF-8 mode, the byte offset must point to the start of
a UTF-8 character. Unlike the pattern string, the subject may contain binary
zero bytes. When the starting offset is zero, the search for a match starts at
the beginning of the subject, and this is by far the most common case.
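.P
Because \fIstartoffset\fP is a byte offset, a caller that tracks positions in
characters has to convert them first when working in UTF-8 mode. The following
is a minimal sketch of such a conversion. It assumes a valid UTF-8 subject, and
\fButf8_byte_offset()\fP is a hypothetical helper that is not part of the PCRE
API. For each character, it steps over one lead byte and then over any
continuation bytes (those of the form 10xxxxxx). The result therefore always
points to the start of a character or to the end of the subject.
.nf
```c
/* Hypothetical helper, not part of the PCRE API: returns the byte offset
   at which character number charindex starts in a UTF-8 subject that is
   length bytes long. If charindex equals the number of characters, the
   result is length (the end of the subject). Returns -1 if charindex is
   negative or larger than the number of characters.
   Assumes the subject is valid UTF-8. */
int utf8_byte_offset(const char *subject, int length, int charindex)
{
  int offset = 0;
  if (charindex < 0) return -1;
  while (charindex-- > 0)
  {
    if (offset >= length) return -1;     /* ran out of characters */
    offset++;                            /* step over the lead byte */
    while (offset < length &&
           ((unsigned char)subject[offset] & 0xC0) == 0x80)
      offset++;                          /* skip 10xxxxxx continuation bytes */
  }
  return offset;
}
```
.fi
.P
Consider a subject made up of the bytes of "caf", followed by a two-byte
e-acute and then "s". For that subject, character index 4 maps to byte offset 5,
which is the value to pass as \fIstartoffset\fP.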
.P
A non-zero starting offset is useful when searching for another match in the
same subject by calling \fBpcre_exec()\fP again after a previous success.
a fragment of a pattern that picks out a substring. PCRE supports several other
kinds of parenthesized subpattern that do not cause substrings to be captured.
.P
Captured substrings are returned to the caller via a vector of integers whose
address is passed in \fIovector\fP. The number of elements in the vector is
passed in \fIovecsize\fP, which must be a non-negative number. \fBNote\fP: this
argument is NOT the size of \fIovector\fP in bytes.
.P
The first two-thirds of the vector is used to pass back captured substrings,
each substring using a pair of integers. The remaining third of the vector is
used as workspace by \fBpcre_exec()\fP while matching capturing subpatterns,
and is not available for passing back information. The number passed in
\fIovecsize\fP should always be a multiple of three. If it is not, it is
rounded down.
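.P
The sizing rules above reduce to a few lines of arithmetic, which this sketch
spells out. The helper names are hypothetical and are not part of the PCRE API.
The number of capturing subpatterns would typically come from
\fBpcre_fullinfo()\fP with PCRE_INFO_CAPTURECOUNT.
.nf
```c
/* Hypothetical helpers illustrating the ovector sizing rules; they are
   not part of the PCRE API. */

/* Elements needed so that the whole match plus capture_count capturing
   subpatterns can all be returned: one pair each for passing back
   offsets, plus the extra third that pcre_exec() uses as workspace. */
int ovecsize_for(int capture_count)
{
  return (capture_count + 1) * 3;
}

/* The size that is effectively used when ovecsize is not a multiple of
   three (it is rounded down). */
int effective_ovecsize(int ovecsize)
{
  return (ovecsize / 3) * 3;
}

/* Maximum number of offset pairs that can be passed back: two-thirds of
   the effective size, divided by two elements per pair. */
int max_pairs(int ovecsize)
{
  return ovecsize / 3;
}
```
.fi
.P
Take a pattern with two capturing subpatterns. Here \fBovecsize_for(2)\fP gives
9. Six elements hold three pairs (the whole match plus two substrings), and
three more serve as workspace. An \fIovecsize\fP of 10 or 11 behaves exactly
like 9.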
.P
When a match is successful, information about captured substrings is returned
in pairs of integers, starting at the beginning of \fIovector\fP, and
continuing up to two-thirds of its length at the most. The first element of
each pair is set to the byte offset of the first character in a substring, and
the second is set to the byte offset of the first character after the end of a
substring. \fBNote\fP: these values are always byte offsets, even in UTF-8
mode. They are not character counts.
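.P
Since the offsets count bytes, a program that needs the length of a captured
substring in characters must count the characters itself in UTF-8 mode. The
following is a minimal sketch, assuming valid UTF-8 and using a hypothetical
helper name. Every byte that is not a continuation byte begins a new character.
.nf
```c
/* Hypothetical helper, not part of the PCRE API: counts the characters in
   the UTF-8 text between byte offsets start (inclusive) and end (exclusive),
   for example ovector[2*i] and ovector[2*i+1]. Each byte that is not a
   10xxxxxx continuation byte begins a new character. */
int utf8_char_count(const char *subject, int start, int end)
{
  int count = 0;
  int i;
  for (i = start; i < end; i++)
    if (((unsigned char)subject[i] & 0xC0) != 0x80)
      count++;
  return count;
}
```
.fi
.P
Consider the bytes of "caf", followed by a two-byte e-acute and then "s". The
offsets 0 and 6 span 5 characters, while the offsets 3 and 5 (the e-acute alone)
span 1.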
.P
The first pair of integers, \fIovector[0]\fP and \fIovector[1]\fP, identifies
the portion of the subject string matched by the entire pattern. The next pair
is used for the first capturing subpattern, and so on. The value returned by
\fBpcre_exec()\fP is one more than the highest numbered pair that has been set.
For example, if two substrings have been captured, the returned value is 3. If
there are no capturing subpatterns, the return value from a successful match is
1, indicating that just the first pair of offsets has been set.
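.P
A caller typically walks the first \fIrc\fP pairs of \fIovector\fP to retrieve
the captured text. The sketch below is a hypothetical helper written in the
spirit of \fBpcre_copy_substring()\fP, but it is not that function. It copies
substring \fIn\fP into a caller-supplied buffer. It assumes that \fIrc\fP is the
positive value returned by \fBpcre_exec()\fP. It treats a pair whose first
offset is negative as unset, because PCRE marks a subpattern that did not take
part in the match with offsets of -1.
.nf
```c
#include <string.h>

/* Hypothetical helper (not pcre_copy_substring() itself): copies captured
   substring n out of subject, using the byte offsets that pcre_exec() left
   in ovector. rc is the positive return value from pcre_exec(); pairs at
   or beyond rc were not set. Returns the substring length, or -1 if the
   substring is unset or the buffer cannot hold it plus a terminating zero. */
int copy_capture(const char *subject, const int *ovector, int rc, int n,
                 char *buffer, int buffersize)
{
  int start, end, len;
  if (n < 0 || n >= rc) return -1;          /* pair n was not set */
  start = ovector[2*n];
  end = ovector[2*n + 1];
  if (start < 0 || end < start) return -1;  /* unset subpattern (-1) */
  len = end - start;
  if (len + 1 > buffersize) return -1;      /* no room for text plus zero */
  memcpy(buffer, subject + start, (size_t)len);
  buffer[len] = 0;                          /* terminate the copy */
  return len;
}
```
.fi
.P
If \fBpcre_exec()\fP returns zero because the vector was too small, every pair
that fits has still been set. In that case, passing \fIovecsize\fP/3 as
\fIrc\fP is a reasonable substitute.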
.P
If a capturing subpattern is matched repeatedly, it is the last portion of the
string that it matched that is returned.
.P
If the vector is too small to hold all the captured substring offsets, it is
used as far as possible (up to two-thirds of its length), and the function
returns a value of zero. If the substring offsets are not of interest,
\fBpcre_exec()\fP may be called with \fIovector\fP passed as NULL and
\fIovecsize\fP as zero. However, if the pattern contains back references and
the \fIovector\fP is not big enough to remember the related substrings, PCRE
has to get additional memory for use during matching. Thus it is usually
.rs
.sp
.nf
Last updated: 24 August 2008
Copyright (c) 1997-2008 University of Cambridge.
.fi