140 |
.\" HREF |
.\" HREF |
141 |
\fBpcresample\fP |
\fBpcresample\fP |
142 |
.\" |
.\" |
143 |
documentation describes how to run it. |
documentation describes how to compile and run it. |
144 |
.P |
.P |
145 |
A second matching function, \fBpcre_dfa_exec()\fP, which is not |
A second matching function, \fBpcre_dfa_exec()\fP, which is not |
146 |
Perl-compatible, is also provided. This uses a different algorithm for the |
Perl-compatible, is also provided. This uses a different algorithm for the |
254 |
.\" </a> |
.\" </a> |
255 |
section on \fBpcre_exec()\fP options |
section on \fBpcre_exec()\fP options |
256 |
.\" |
.\" |
257 |
below. The choice of newline convention does not affect the interpretation of |
below. |
258 |
the \en or \er escape sequences. |
.P |
259 |
|
The choice of newline convention does not affect the interpretation of |
260 |
|
the \en or \er escape sequences, nor does it affect what \eR matches, which is |
261 |
|
controlled in a similar way, but by separate options. |
262 |
. |
. |
263 |
. |
. |
264 |
.SH MULTITHREADING |
.SH MULTITHREADING |
320 |
are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF, and -1 for ANY. The |
are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF, and -1 for ANY. The |
321 |
default should normally be the standard sequence for your operating system. |
default should normally be the standard sequence for your operating system. |
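The values listed above follow a simple encoding: a single-character newline is reported as its character code, and 3338 for CRLF is (13 << 8) | 10, that is, CR in the high byte and LF in the low byte. The following is a minimal sketch of decoding such a value. The helper name `describe_newline` is hypothetical and not part of the PCRE API.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API: map the integer reported
   for PCRE_CONFIG_NEWLINE to a readable name. */
static const char *describe_newline(int value)
{
    switch (value) {
    case 10:             return "LF";
    case 13:             return "CR";
    case (13 << 8) | 10: return "CRLF";    /* 3338: CR in high byte, LF in low */
    case -2:             return "ANYCRLF";
    case -1:             return "ANY";
    default:             return NULL;      /* not one of the values listed above */
    }
}
```

In a program linked against PCRE, the value would typically be obtained with `pcre_config(PCRE_CONFIG_NEWLINE, &value)` and then passed to a helper like this one.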
322 |
.sp |
.sp |
323 |
|
PCRE_CONFIG_BSR |
324 |
|
.sp |
325 |
|
The output is an integer whose value indicates what character sequences the \eR |
326 |
|
escape sequence matches by default. A value of 0 means that \eR matches any |
327 |
|
Unicode line ending sequence; a value of 1 means that \eR matches only CR, LF, |
328 |
|
or CRLF. The default can be overridden when a pattern is compiled or matched. |
329 |
|
.sp |
330 |
PCRE_CONFIG_LINK_SIZE |
PCRE_CONFIG_LINK_SIZE |
331 |
.sp |
.sp |
332 |
The output is an integer that contains the number of bytes used for internal |
The output is an integer that contains the number of bytes used for internal |
468 |
.\" |
.\" |
469 |
documentation. |
documentation. |
470 |
.sp |
.sp |
471 |
|
PCRE_BSR_ANYCRLF |
472 |
|
PCRE_BSR_UNICODE |
473 |
|
.sp |
474 |
|
These options (which are mutually exclusive) control what the \eR escape |
475 |
|
sequence matches. The choice is either to match only CR, LF, or CRLF, or to |
476 |
|
match any Unicode newline sequence. The default is specified when PCRE is |
477 |
|
built. It can be overridden from within the pattern, or by setting an option |
478 |
|
when a compiled pattern is matched. |
479 |
|
.sp |
480 |
PCRE_CASELESS |
PCRE_CASELESS |
481 |
.sp |
.sp |
482 |
If this bit is set, letters in the pattern match both upper and lower case |
If this bit is set, letters in the pattern match both upper and lower case |
549 |
the first newline in the subject string, though the matched text may continue |
the first newline in the subject string, though the matched text may continue |
550 |
over the newline. |
over the newline. |
551 |
.sp |
.sp |
552 |
|
PCRE_JAVASCRIPT_COMPAT |
553 |
|
.sp |
554 |
|
If this option is set, PCRE's behaviour is changed in some ways so that it is |
555 |
|
compatible with JavaScript rather than Perl. The changes are as follows: |
556 |
|
.P |
557 |
|
(1) A lone closing square bracket in a pattern causes a compile-time error, |
558 |
|
because this is illegal in JavaScript (by default it is treated as a data |
559 |
|
character). Thus, the pattern AB]CD becomes illegal when this option is set. |
560 |
|
.P |
561 |
|
(2) At run time, a back reference to an unset subpattern group matches an empty |
562 |
|
string (by default this causes the current matching path to fail). A pattern |
563 |
|
such as (\1)(a) succeeds when this option is set (assuming it can find an "a" |
564 |
|
in the subject), whereas it fails by default, for Perl compatibility. |
565 |
|
.sp |
566 |
PCRE_MULTILINE |
PCRE_MULTILINE |
567 |
.sp |
.sp |
568 |
By default, PCRE treats the subject string as consisting of a single line of |
By default, PCRE treats the subject string as consisting of a single line of |
686 |
9 nothing to repeat |
9 nothing to repeat |
687 |
10 [this code is not in use] |
10 [this code is not in use] |
688 |
11 internal error: unexpected repeat |
11 internal error: unexpected repeat |
689 |
12 unrecognized character after (? |
12 unrecognized character after (? or (?- |
690 |
13 POSIX named classes are supported only within a class |
13 POSIX named classes are supported only within a class |
691 |
14 missing ) |
14 missing ) |
692 |
15 reference to non-existent subpattern |
15 reference to non-existent subpattern |
694 |
17 unknown option bit(s) set |
17 unknown option bit(s) set |
695 |
18 missing ) after comment |
18 missing ) after comment |
696 |
19 [this code is not in use] |
19 [this code is not in use] |
697 |
20 regular expression too large |
20 regular expression is too large |
698 |
21 failed to get memory |
21 failed to get memory |
699 |
22 unmatched parentheses |
22 unmatched parentheses |
700 |
23 internal error: code overflow |
23 internal error: code overflow |
723 |
46 malformed \eP or \ep sequence |
46 malformed \eP or \ep sequence |
724 |
47 unknown property name after \eP or \ep |
47 unknown property name after \eP or \ep |
725 |
48 subpattern name is too long (maximum 32 characters) |
48 subpattern name is too long (maximum 32 characters) |
726 |
49 too many named subpatterns (maximum 10,000) |
49 too many named subpatterns (maximum 10000) |
727 |
50 [this code is not in use] |
50 [this code is not in use] |
728 |
51 octal value is greater than \e377 (not in UTF-8 mode) |
51 octal value is greater than \e377 (not in UTF-8 mode) |
729 |
52 internal error: overran compiling workspace |
52 internal error: overran compiling workspace |
730 |
53 internal error: previously-checked referenced subpattern not found |
53 internal error: previously-checked referenced subpattern not found |
731 |
54 DEFINE group contains more than one branch |
54 DEFINE group contains more than one branch |
732 |
55 repeating a DEFINE group is not allowed |
55 repeating a DEFINE group is not allowed |
733 |
56 inconsistent NEWLINE options" |
56 inconsistent NEWLINE options |
734 |
57 \eg is not followed by a braced name or an optionally braced |
57 \eg is not followed by a braced, angle-bracketed, or quoted |
735 |
non-zero number |
name/number or by a plain number |
736 |
58 (?+ or (?- or (?(+ or (?(- must be followed by a non-zero number |
58 a numbered reference must not be zero |
737 |
|
59 (*VERB) with an argument is not supported |
738 |
|
60 (*VERB) not recognized |
739 |
|
61 number is too big |
740 |
|
62 subpattern name expected |
741 |
|
63 digit expected after (?+ |
742 |
|
64 ] is an invalid data character in JavaScript compatibility mode |
743 |
|
.sp |
744 |
|
The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may |
745 |
|
be used if the limits were changed when PCRE was built. |
746 |
. |
. |
747 |
. |
. |
748 |
.SH "STUDYING A PATTERN" |
.SH "STUDYING A PATTERN" |
941 |
PCRE_INFO_HASCRORLF |
PCRE_INFO_HASCRORLF |
942 |
.sp |
.sp |
943 |
Return 1 if the pattern contains any explicit matches for CR or LF characters, |
Return 1 if the pattern contains any explicit matches for CR or LF characters, |
944 |
otherwise 0. The fourth argument should point to an \fBint\fP variable. |
otherwise 0. The fourth argument should point to an \fBint\fP variable. An |
945 |
|
explicit match is either a literal CR or LF character, or \er or \en. |
946 |
.sp |
.sp |
947 |
PCRE_INFO_JCHANGED |
PCRE_INFO_JCHANGED |
948 |
.sp |
.sp |
949 |
Return 1 if the (?J) option setting is used in the pattern, otherwise 0. The |
Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise |
950 |
fourth argument should point to an \fBint\fP variable. The (?J) internal option |
0. The fourth argument should point to an \fBint\fP variable. (?J) and |
951 |
setting changes the local PCRE_DUPNAMES option. |
(?-J) set and unset the local PCRE_DUPNAMES option, respectively. |
952 |
.sp |
.sp |
953 |
PCRE_INFO_LASTLITERAL |
PCRE_INFO_LASTLITERAL |
954 |
.sp |
.sp |
1246 |
to be anchored by virtue of its contents, it cannot be made unanchored at |
to be anchored by virtue of its contents, it cannot be made unanchored at |
1247 |
matching time. |
matching time. |
1248 |
.sp |
.sp |
1249 |
|
PCRE_BSR_ANYCRLF |
1250 |
|
PCRE_BSR_UNICODE |
1251 |
|
.sp |
1252 |
|
These options (which are mutually exclusive) control what the \eR escape |
1253 |
|
sequence matches. The choice is either to match only CR, LF, or CRLF, or to |
1254 |
|
match any Unicode newline sequence. These options override the choice that was |
1255 |
|
made or defaulted when the pattern was compiled. |
1256 |
|
.sp |
1257 |
PCRE_NEWLINE_CR |
PCRE_NEWLINE_CR |
1258 |
PCRE_NEWLINE_LF |
PCRE_NEWLINE_LF |
1259 |
PCRE_NEWLINE_CRLF |
PCRE_NEWLINE_CRLF |
1269 |
.P |
.P |
1270 |
When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set, and a |
When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set, and a |
1271 |
match attempt for an unanchored pattern fails when the current position is at a |
match attempt for an unanchored pattern fails when the current position is at a |
1272 |
CRLF sequence, and the pattern contains no explicit matches for CR or NL |
CRLF sequence, and the pattern contains no explicit matches for CR or LF |
1273 |
characters, the match position is advanced by two characters instead of one, in |
characters, the match position is advanced by two characters instead of one, in |
1274 |
other words, to after the CRLF. |
other words, to after the CRLF. |
1275 |
.P |
.P |
1279 |
start, it skips both the CR and the LF before retrying. However, the pattern |
start, it skips both the CR and the LF before retrying. However, the pattern |
1280 |
[\er\en]A does match that string, because it contains an explicit CR or LF |
[\er\en]A does match that string, because it contains an explicit CR or LF |
1281 |
reference, and so advances only by one character after the first failure. |
reference, and so advances only by one character after the first failure. |
1282 |
Note than an explicit CR or LF reference occurs for negated character classes |
.P |
1283 |
such as [^X] because they can match CR or LF characters. |
An explicit match for CR or LF is either a literal appearance of one of those |
1284 |
|
characters, or one of the \er or \en escape sequences. Implicit matches such as |
1285 |
|
[^X] do not count, nor does \es (which includes CR and LF in the characters |
1286 |
|
that it matches). |
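The advance rule and the definition of an explicit match described above can be sketched as two small functions. This is an illustration under simplifying assumptions, not PCRE's implementation: both helper names are hypothetical, the scan looks only at the raw pattern text, and it ignores constructs such as \eQ...\eE quoting.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the pattern text contains a literal CR or
   LF character, or a \r or \n escape; otherwise 0. Escapes such as \s and
   classes such as [^X] are deliberately not counted. */
static int has_explicit_cr_or_lf(const char *pattern)
{
    for (size_t i = 0; pattern[i] != '\0'; i++) {
        char c = pattern[i];
        if (c == '\r' || c == '\n')
            return 1;                      /* literal CR or LF */
        if (c == '\\' && pattern[i + 1] != '\0') {
            char e = pattern[i + 1];
            if (e == 'r' || e == 'n' || e == '\r' || e == '\n')
                return 1;                  /* \r, \n, or an escaped literal */
            i++;  /* skip the escaped character so "\\" followed by n is not misread */
        }
    }
    return 0;
}

/* Hypothetical helper: next start position after a failed attempt at
   subject[pos], assuming CRLF is a valid newline sequence. */
static size_t advance_after_failure(const char *subject, size_t pos,
                                    int explicit_cr_or_lf)
{
    if (!explicit_cr_or_lf && subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;                    /* step over the whole CRLF */
    return pos + 1;                        /* ordinary one-character advance */
}
```

With these definitions, a pattern written as `[\r\n]A` counts as containing an explicit match and advances by one character, whereas `[^X]` and `\sA` do not and advance past a CRLF in one step.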
1287 |
.P |
.P |
1288 |
Notwithstanding the above, anomalous effects may still occur when CRLF is a |
Notwithstanding the above, anomalous effects may still occur when CRLF is a |
1289 |
valid newline sequence and explicit \er or \en escapes appear in the pattern. |
valid newline sequence and explicit \er or \en escapes appear in the pattern. |
1975 |
.rs |
.rs |
1976 |
.sp |
.sp |
1977 |
.nf |
.nf |
1978 |
Last updated: 21 August 2007 |
Last updated: 12 April 2008 |
1979 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2008 University of Cambridge. |
1980 |
.fi |
.fi |