.br
.B int pcre_exec(const pcre *\fIcode\fR, "const pcre_extra *\fIextra\fR,"
.ti +5n
.B "const char *\fIsubject\fR," int \fIlength\fR, int \fIstartoffset\fR,
.ti +5n
.B int \fIoptions\fR, int *\fIovector\fR, int \fIovecsize\fR);
.PP
.br
.B int pcre_copy_substring(const char *\fIsubject\fR, int *\fIovector\fR,
The tables are built in memory that is obtained via \fBpcre_malloc\fR. The
pointer that is passed to \fBpcre_compile\fR is saved with the compiled
pattern, and the same tables are used via this pointer by \fBpcre_study()\fR
and \fBpcre_exec()\fR. Thus for any single pattern, compilation, studying and
matching all happen in the same locale, but different patterns can be compiled
in different locales. It is the caller's responsibility to ensure that the
memory containing the tables remains available for as long as it is needed. |
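The ownership rule above can be illustrated without the library itself. The following C sketch is hypothetical: \fBmake_ctype_table()\fR, \fBmake_ctype_table_for()\fR, the CT_WORD and CT_SPACE bits and struct compiled_pattern are invented for illustration and are not part of PCRE (whose real tables come from \fBpcre_maketables()\fR). It builds a 256-entry table for the current LC_CTYPE locale in heap memory and stores only a pointer to it in the compiled pattern, so the caller must keep the table alive and free it afterwards.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical flag bits stored in each table entry. */
enum { CT_WORD = 0x01, CT_SPACE = 0x02 };

/* Build a 256-entry character table for the current LC_CTYPE locale.
   The memory comes from malloc(); the caller owns it. */
unsigned char *make_ctype_table(void)
{
    unsigned char *table = malloc(256);
    if (table == NULL)
        return NULL;
    for (int c = 0; c < 256; c++) {
        unsigned char flags = 0;
        if (isalnum(c) || c == '_')
            flags |= CT_WORD;
        if (isspace(c))
            flags |= CT_SPACE;
        table[c] = flags;
    }
    return table;
}

/* Switch the process-wide LC_CTYPE locale, then build a table for it.
   Returns NULL if the locale is not available on this system. */
unsigned char *make_ctype_table_for(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;
    return make_ctype_table();
}

/* A compiled pattern keeps only a pointer to the tables; it does not copy
   them, so they must outlive every pattern that refers to them. */
struct compiled_pattern {
    const unsigned char *tables;
};

/* Later stages (studying, matching) classify characters through the saved
   pointer, so every stage sees the same locale. */
int pattern_is_word_char(const struct compiled_pattern *re, unsigned char c)
{
    assert(re != NULL && re->tables != NULL);
    return (re->tables[c] & CT_WORD) != 0;
}
```

In this sketch the table is built once, its pointer is stored in each pattern compiled under that locale, and \fBfree()\fR is called on it only after the last such pattern has been discarded.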
pattern has been studied, the result of the study should be passed in the
\fIextra\fR argument. Otherwise this must be NULL.

The PCRE_ANCHORED option can be passed in the \fIoptions\fR argument, whose
unused bits must be zero. However, if a pattern was compiled with
PCRE_ANCHORED, or turned out to be anchored by virtue of its contents, it
it. Setting this without PCRE_MULTILINE (at compile time) causes dollar never
to match.
|
The subject string is passed as a pointer in \fIsubject\fR, a length in
\fIlength\fR, and a starting offset in \fIstartoffset\fR. Unlike the pattern
string, it may contain binary zero characters. When the starting offset is
zero, the search for a match starts at the beginning of the subject, and this
is by far the most common case.

A non-zero starting offset is useful when searching for another match in the
same subject by calling \fBpcre_exec()\fR again after a previous success.
Setting \fIstartoffset\fR differs from just passing a shortened string and
setting PCRE_NOTBOL in the case of a pattern that begins with any kind of
lookbehind. For example, consider the pattern

\\Biss\\B

which finds occurrences of "iss" in the middle of words. (\\B matches only if
the current position in the subject is not a word boundary.) When applied to
the string "Mississippi", the first call to \fBpcre_exec()\fR finds the first
occurrence. If \fBpcre_exec()\fR is called again with just the remainder of the
subject, namely "issippi", it does not match, because \\B is always false at the
start of the subject, which is deemed to be a word boundary. However, if
\fBpcre_exec()\fR is passed the entire string again, but with \fIstartoffset\fR
set to 4, it finds the second occurrence of "iss" because it is able to look
behind the starting point to discover that it is preceded by a letter. |
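The difference between restarting with a start offset and passing a shortened string can be reproduced with a small stand-alone C sketch. It does not call PCRE: \fBfind_iss_inside_word()\fR is a hypothetical hand-written matcher for \\Biss\\B, and \fBat_word_boundary()\fR assumes that a word character is a letter, digit or underscore in the current locale. The key point is that both neighbours of a position are looked up in the whole subject, so characters before \fIstartoffset\fR remain visible.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Word character for \b and \B: letter, digit or underscore. */
int is_word(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* True if pos is a word boundary. Both neighbours are taken from the whole
   subject, so characters before the start offset are still visible; a
   position outside the subject counts as a non-word character. */
int at_word_boundary(const char *subject, int length, int pos)
{
    int before = pos > 0 && is_word((unsigned char)subject[pos - 1]);
    int after = pos < length && is_word((unsigned char)subject[pos]);
    return before != after;
}

/* Hypothetical hand-written matcher for \Biss\B. Returns the offset of the
   first match at or after startoffset, or -1 if there is none. */
int find_iss_inside_word(const char *subject, int length, int startoffset)
{
    assert(startoffset >= 0 && startoffset <= length);
    for (int i = startoffset; i + 3 <= length; i++) {
        if (!at_word_boundary(subject, length, i) &&
            memcmp(subject + i, "iss", 3) == 0 &&
            !at_word_boundary(subject, length, i + 3))
            return i;
    }
    return -1;
}
```

Called with offset 0 on "Mississippi", the sketch finds the first occurrence at offset 1. Restarting at offset 4, the end of that match (which \fBpcre_exec()\fR reports in ovector[1]), finds the second occurrence at offset 4, whereas passing the shortened string "issippi" from offset 0 finds nothing.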
|
If a non-zero starting offset is passed when the pattern is anchored, one
attempt to match at the given offset is made. This can only succeed if the
pattern does not require the match to be at the start of the subject. |
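The following hypothetical C sketch shows the effect of a start offset on an anchored pattern, modelling anchoring as requested by the PCRE_ANCHORED option rather than by a leading circumflex or \\A. \fBfind_literal()\fR is an invented helper that matches a fixed string: when \fIanchored\fR is non-zero it makes exactly one attempt, at \fIstartoffset\fR, instead of scanning forward.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical matcher for a fixed string. Returns the offset of the match,
   or -1 if there is none. */
int find_literal(const char *subject, int length, const char *literal,
                 int startoffset, int anchored)
{
    int n = (int)strlen(literal);
    assert(startoffset >= 0 && startoffset <= length);
    for (int i = startoffset; i + n <= length; i++) {
        if (memcmp(subject + i, literal, (size_t)n) == 0)
            return i;
        if (anchored)
            break;  /* anchored: only the attempt at startoffset is made */
    }
    return -1;
}
```

With \\Aabc, or with circumflex outside multiline mode, the single attempt at a non-zero offset always fails instead, because such a pattern requires the match to be at the start of the subject.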
|
In general, a pattern matches a certain portion of the subject, and in
addition, further substrings from the subject may be picked out by parts of the
pattern. Following the usage in Jeffrey Friedl's book, this is called
The \\A, \\Z, and \\z assertions differ from the traditional circumflex and
dollar (described below) in that they only ever match at the very start and end
of the subject string, whatever options are set. They are not affected by the
PCRE_NOTBOL or PCRE_NOTEOL options. If the \fIstartoffset\fR argument of
\fBpcre_exec()\fR is non-zero, \\A can never match. The difference between \\Z
and \\z is that \\Z matches before a newline that is the last character of the
string as well as at the end of the string, whereas \\z matches only at the
end. |
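A hypothetical C sketch of the positions at which the three assertions can succeed, given an offset \fIpos\fR into the whole subject. The helper names are invented for illustration. Because matching never examines positions before \fIstartoffset\fR, \fBat_A()\fR can only be true when the start offset is zero.

```c
#include <assert.h>

/* \A: only at the very start of the subject. */
int at_A(int pos)
{
    return pos == 0;
}

/* \z: only at the very end of the subject. */
int at_z(int length, int pos)
{
    return pos == length;
}

/* \Z: at the very end, or before a newline that is the last character. */
int at_Z(const char *subject, int length, int pos)
{
    assert(pos >= 0 && pos <= length);
    return pos == length || (pos == length - 1 && subject[pos] == '\n');
}
```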
|
.SH CIRCUMFLEX AND DOLLAR
Outside a character class, in the default matching mode, the circumflex
character is an assertion which is true only if the current matching point is
at the start of the subject string. If the \fIstartoffset\fR argument of
\fBpcre_exec()\fR is non-zero, circumflex can never match. Inside a character
class, circumflex has an entirely different meaning (see below).
|
Circumflex need not be the first character of the pattern if a number of
alternatives are involved, but it should be the first thing in each alternative
addition to matching at the start and end of the subject string. For example,
the pattern /^abc$/ matches the subject string "def\\nabc" in multiline mode,
but not otherwise. Consequently, patterns that are anchored in single line mode
because all branches start with "^" are not anchored in multiline mode, and a
match for circumflex is possible when the \fIstartoffset\fR argument of
\fBpcre_exec()\fR is non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if
PCRE_MULTILINE is set. |
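The rules for circumflex and dollar, including the effect of a non-zero start offset, can be modelled by the hypothetical C sketch below. \fBcircumflex_at()\fR and \fBdollar_at()\fR are invented helpers; PCRE_DOLLAR_ENDONLY, PCRE_NOTBOL and PCRE_NOTEOL are not modelled. \fBfind_caret_abc_dollar()\fR is a hand-written matcher for /^abc$/ that tries each position from \fIstartoffset\fR onwards.

```c
#include <assert.h>
#include <string.h>

/* Circumflex: at the start of the subject, or (multiline) after any newline. */
int circumflex_at(const char *subject, int pos, int multiline)
{
    if (pos == 0)
        return 1;
    return multiline && subject[pos - 1] == '\n';
}

/* Dollar: at the end of the subject; otherwise before any newline in
   multiline mode, or before a final newline in the default mode. */
int dollar_at(const char *subject, int length, int pos, int multiline)
{
    assert(pos >= 0 && pos <= length);
    if (pos == length)
        return 1;
    if (multiline)
        return subject[pos] == '\n';
    return pos == length - 1 && subject[pos] == '\n';
}

/* Hypothetical matcher for /^abc$/. Positions before startoffset are never
   tried, so without multiline mode a non-zero offset can never match. */
int find_caret_abc_dollar(const char *subject, int length, int startoffset,
                          int multiline)
{
    assert(startoffset >= 0 && startoffset <= length);
    for (int i = startoffset; i + 3 <= length; i++) {
        if (circumflex_at(subject, i, multiline) &&
            memcmp(subject + i, "abc", 3) == 0 &&
            dollar_at(subject, length, i + 3, multiline))
            return i;
    }
    return -1;
}
```

For the subject "def\\nabc" the sketch returns 4 in multiline mode and -1 otherwise, in line with the /^abc$/ example in this section.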
|
Note that the sequences \\A, \\Z, and \\z can be used to match the start and
end of the subject in both modes, and if all branches of a pattern start with
preceded by "foo".
|
Assertion subpatterns are not capturing subpatterns, and may not be repeated,
because it makes no sense to assert the same thing several times. If any kind
of assertion contains capturing subpatterns within it, these are counted for
the purposes of numbering the capturing subpatterns in the whole pattern.
However, substring capturing is carried out only for positive assertions,
because it does not make sense for negative assertions.
|
Assertions count towards the maximum of 200 parenthesized subpatterns. |
|
.br
Phone: +44 1223 334714

Last updated: 10 June 1999
.br
Copyright (c) 1997-1999 University of Cambridge.