Return information about the first character of any matched
string, for a non-anchored pattern. If there is a fixed first
character, e.g. from a pattern such as (cat|cow|coyote), it is
returned in the integer pointed to by where. Otherwise, if either

(a) the pattern was compiled with the PCRE_MULTILINE option, and
every branch starts with "^", or

(b) every branch of the pattern starts with ".*" and PCRE_DOTALL
is not set (if it were set, the pattern would be anchored),

-1 is returned, indicating that the pattern matches only at the
start of a subject string or after any "\n" within the string.
Otherwise -2 is returned. For anchored patterns, -2 is returned.
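
Because a non-anchored match has to be attempted at successive
start offsets, the three kinds of result above tell a caller which
offsets are worth trying. The sketch below assumes the int stored
through where by pcre_fullinfo() with PCRE_INFO_FIRSTCHAR is already
available; next_candidate() is a hypothetical helper for
illustration, not a PCRE function.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of PCRE). firstchar is the value
// obtained for PCRE_INFO_FIRSTCHAR. Returns the first offset >= pos
// at which a match attempt could succeed, or -1 if there is none.
long next_candidate(const char *subject, size_t len, size_t pos,
                    int firstchar)
{
    if (pos > len)
        return -1;
    if (firstchar >= 0) {
        // Fixed first character: jump to its next occurrence.
        const char *hit = memchr(subject + pos, firstchar, len - pos);
        return hit != NULL ? (long)(hit - subject) : -1;
    }
    if (firstchar == -1) {
        // A match can start only at offset 0 or just after a newline.
        for (size_t i = pos; i <= len; i++)
            if (i == 0 || subject[i - 1] == '\n')
                return (long)i;
        return -1;
    }
    // -2: no information, so every offset has to be tried.
    return (long)pos;
}
```

For example, with the value 'c' that (cat|cow|coyote) yields,
offsets that do not hold a "c" are skipped without attempting a
match at all.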

PCRE_INFO_FIRSTTABLE

entire regular expression. This is the value returned by
pcre_exec() if it is greater than zero. If pcre_exec() returned
zero, indicating that it ran out of space in ovector, the value
passed as stringcount should be the size of the vector divided by
three.
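
The stringcount rule and the way substrings are read out of ovector
can be sketched in a few lines of C. Both functions are hypothetical
re-implementations for illustration only: copy_substring() follows
the idea behind pcre_copy_substring() but uses its own return
conventions. The sketch assumes PCRE's layout, in which substring n
occupies ovector[2n] and ovector[2n+1] and an unset substring has
offsets of -1.

```c
#include <string.h>

// Value to pass on as stringcount, given pcre_exec()'s return code rc
// (only meaningful for rc >= 0) and the ovecsize passed to it.
int effective_stringcount(int rc, int ovecsize)
{
    // rc == 0: the vector was too small, so only ovecsize / 3
    // (start, end) pairs could be filled in.
    return rc == 0 ? ovecsize / 3 : rc;
}

// Copies substring n into buf and returns its length, or -1 if n is
// out of range, the substring is unset, or buf is too small.
int copy_substring(const char *subject, const int *ovector,
                   int stringcount, int n, char *buf, int bufsize)
{
    if (n < 0 || n >= stringcount)
        return -1;
    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    if (start < 0)                 // unset substring: offsets are -1
        return -1;
    int len = end - start;
    if (len + 1 > bufsize)         // keep room for the final '\0'
        return -1;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

With the subject "2024-05" and a made-up ovector of {0, 7, 0, 4, 5,
7, ...} such as a pattern like (\d+)-(\d+) might produce, asking for
substring 1 yields "2024" and substring 2 yields "05".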

The functions pcre_copy_substring() and pcre_get_substring()
extract a single substring, whose number is given as string-

"aba" against the pattern /^(a(b)?)+$/ sets $2 to the value "b",
but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset.
However, if the pattern is changed to /^(aa(b(b))?)+$/, then $2
(and $3) are set.

In Perl 5.004, $2 is set in both cases, and that is also true of
PCRE. If in the future Perl changes to a consistent state

Outside a character class, a dot in the pattern matches any one
character in the subject, including a non-printing character, but
not (by default) newline. If the PCRE_DOTALL option is set, dots
match newlines as well. The handling of dot is entirely
independent of the handling of circumflex and dollar, the only
relationship being that they both involve newline characters. Dot
has no special meaning in a character class.
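
The rule for a dot fits in a one-line predicate. The sketch below
uses hypothetical helpers, with the dotall flag standing in for
PCRE_DOTALL, and also measures how far a greedy ".*" can run, which
is where the newline rule usually becomes visible. It works on
NUL-terminated strings for brevity, whereas pcre_exec() takes an
explicit subject length.

```c
#include <stdbool.h>
#include <stddef.h>

// Does one dot, outside a character class, match the character c?
// Any character except newline does, unless dotall is set.
bool dot_matches(char c, bool dotall)
{
    return dotall || c != '\n';
}

// Number of characters a greedy ".*" can consume starting at s.
size_t dot_star_extent(const char *s, bool dotall)
{
    size_t n = 0;
    while (s[n] != '\0' && dot_matches(s[n], dotall))
        n++;
    return n;
}
```

On "ab\ncd", ".*" stops after two characters by default and covers
all five when dotall is set; a non-printing character such as a tab
is matched either way.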

fails, because it matches the entire string due to the greediness
of the .* item.

However, if a quantifier is followed by a question mark, it ceases
to be greedy, and instead matches the minimum number of times
possible, so the pattern

  /\*.*?\*/
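
The difference between the greedy and non-greedy forms shows up in
where each match ends when it starts at the beginning of the
subject. comment_match_len() is a hypothetical, hand-written helper,
not PCRE code: the greedy form ends at the last "*/" that .* can
reach, the non-greedy form at the first. Because a dot does not
match newline by default, neither form can look past the first
newline.

```c
#include <stddef.h>
#include <string.h>

// Length of the match of /\*.*\*/ (lazy == 0) or /\*.*?\*/
// (lazy != 0) starting at s[0], or 0 if neither can match there.
size_t comment_match_len(const char *s, int lazy)
{
    if (strncmp(s, "/*", 2) != 0)
        return 0;
    // A dot cannot match a newline, so the body must end before one.
    size_t limit = strcspn(s, "\n");
    size_t end = 0;
    // The closing star-slash may start anywhere from offset 2 onwards.
    for (size_t i = 2; i < limit; i++) {
        if (s[i] == '*' && s[i + 1] == '/') {
            end = i + 2;            // match covers s[0] .. s[i + 1]
            if (lazy)
                break;              // fewest repetitions: first one wins
        }
    }
    return end;
}
```

For "/* a */ b /* c */" the non-greedy form matches just "/* a */"
(7 characters), while the greedy form runs to the end of the string
(17 characters).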

that is the only way the rest of the pattern matches.

If the PCRE_UNGREEDY option is set (an option which is not
available in Perl), the quantifiers are not greedy by default, but
individual ones can be made greedy by following them with a
question mark. In other words, it inverts the default behaviour.
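
Put another way, a trailing question mark always flips whatever the
current default is. The hypothetical predicate below (not a PCRE
function) spells out the four combinations of PCRE_UNGREEDY and a
trailing "?".

```c
#include <stdbool.h>

// Is a quantifier greedy?  ungreedy_option stands for PCRE_UNGREEDY;
// followed_by_qmark says whether the quantifier has a trailing '?'.
bool quantifier_is_greedy(bool ungreedy_option, bool followed_by_qmark)
{
    bool default_greedy = !ungreedy_option;   // Perl-style default
    // The '?' inverts the default, whichever way it currently points.
    return followed_by_qmark ? !default_greedy : default_greedy;
}
```

So .* is greedy and .*? is not, unless PCRE_UNGREEDY is set, in
which case the two roles are swapped.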

If a pattern starts with .* or .{0,} and the PCRE_DOTALL option
(equivalent to Perl's /s) is set, thus allowing the dot to match
newlines, the pattern is implicitly anchored, because whatever
follows will be tried against every character position in the
subject string, so there is no point in retrying the overall match
at any position after the first.
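
The rule can be approximated by a check on the pattern text. The
function below is a hypothetical, simplified sketch; PCRE's own test
is more thorough. This version only inspects the first few
characters and ignores alternation, so it would wrongly report a
pattern such as .*a|b as anchored.

```c
#include <stdbool.h>
#include <string.h>

// Simplified sketch: is a pattern implicitly anchored because it starts
// with .* or .{0,} while dot matches newline (PCRE_DOTALL)?
bool implicitly_anchored(const char *pattern, bool dotall)
{
    if (!dotall)
        return false;   // a match could still start after a newline
    return strncmp(pattern, ".*", 2) == 0 ||
           strncmp(pattern, ".{0,}", 5) == 0;
}
```

With dotall false the same patterns are not anchored, because the
.* cannot cross a newline and a later match attempt just after one
might still succeed.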

matches "sense and sensibility" and "response and
responsibility", but not "sense and responsibility". If caseful
matching is in force at the time of the back reference, the case
of letters is relevant. For example,

  ((?i)rah)\s+\1
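
What matters is the option setting at the point of the reference,
not at the point of capture. In ((?i)rah)\s+\1 the (?i) applies only
inside the first group, so \1 is compared casefully. The
hypothetical helper below (not part of PCRE) performs that
comparison, with caseless standing for whichever setting is in force
where the reference appears.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

// Does the captured text cap[0 .. caplen-1] occur again at s?
bool backref_matches(const char *cap, size_t caplen, const char *s,
                     bool caseless)
{
    for (size_t i = 0; i < caplen; i++) {
        if (s[i] == '\0')
            return false;                  // subject too short
        unsigned char a = (unsigned char)cap[i];
        unsigned char b = (unsigned char)s[i];
        if (caseless ? tolower(a) != tolower(b) : a != b)
            return false;
    }
    return true;
}
```

After the group has captured "RAH" in "RAH rah", the call
backref_matches("RAH", 3, "rah", false) returns false, which is why
that subject does not match the pattern.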

There may be more than one back reference to the same subpattern.
If a subpattern has not actually been used in a particular match,
any back references to it always fail. For example, the pattern

  (a|(bc))\2
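
Matching this pattern by hand shows why. In the sketch below, which
is hand-written C and not PCRE's matcher, a capture is either unset
or a slice of the subject, and a reference to an unset capture fails
outright. The first alternative "a" leaves group 2 unset, so \2
cannot succeed after it; only the second alternative, which sets
group 2 to "bc", can lead to a match. The sketch only tries a match
at the start of the subject.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// A capture is either unset or a (start, len) slice of the subject.
typedef struct {
    bool set;
    size_t start, len;
} capture;

// A back reference at pos: fails if the group is unset, otherwise the
// subject at pos must repeat the captured text.
static bool backref_at(const char *s, size_t pos, capture g)
{
    if (!g.set)
        return false;
    return strncmp(s + pos, s + g.start, g.len) == 0;
}

// (a|(bc))\2 tried at offset 0, alternatives in order.
bool match_a_or_bc(const char *s)
{
    capture g2 = { false, 0, 0 };
    if (s[0] == 'a' && backref_at(s, 1, g2))   // g2 unset: always fails
        return true;
    if (strncmp(s, "bc", 2) == 0) {
        g2 = (capture){ true, 0, 2 };          // group 2 captured "bc"
        return backref_at(s, 2, g2);
    }
    return false;
}
```

Only subjects beginning "bcbc" are accepted; anything beginning
with "a" is rejected however it continues.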

Because there may be up to 99 back references, all digits
following the backslash are taken as part of a potential back
reference number. If the pattern continues with a digit character,
some delimiter must be used to terminate the back reference. If the
PCRE_EXTENDED option is set, this can be whitespace. Otherwise an
empty comment, (?#), can be used.
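
A simplified sketch of that digit rule, assuming the parser is
positioned just after the backslash, is shown below.
parse_backref_number() is a hypothetical helper; a real parser also
has to consider the other meanings that a backslash followed by
digits can have, which this sketch ignores.

```c
#include <ctype.h>

// Reads every digit at p as part of one back reference number and
// stores the number of digits consumed in *ndigits.
int parse_backref_number(const char *p, int *ndigits)
{
    int n = 0;
    int k = 0;
    while (isdigit((unsigned char)p[k])) {
        n = n * 10 + (p[k] - '0');
        k++;
    }
    *ndigits = k;
    return n;
}
```

For \10 this reads 10, not back reference 1 followed by a literal 0;
in \1(?#)0 the empty comment ends the digit run after the 1.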

A back reference that occurs inside the parentheses to which it
refers fails when the subpattern is first used, so, for

matches "foo" preceded by three digits that are not "999".
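
A pattern with this behaviour is (?<=\d{3})(?<!999)foo. The sketch
below is a hand-written C equivalent, not PCRE's implementation: it
makes both lookbehind checks at the same offset before matching
"foo", and search_foo() tries every offset in turn, as a
non-anchored match would.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// (?<=\d{3})(?<!999)foo tried at offset pos. Both assertions inspect
// the same three characters before pos; neither moves pos.
static bool matches_at(const char *s, size_t pos)
{
    if (pos < 3)
        return false;                        // (?<=\d{3}) needs 3 chars
    const char *before = s + pos - 3;
    for (int i = 0; i < 3; i++)
        if (!isdigit((unsigned char)before[i]))
            return false;                    // (?<=\d{3}) fails
    if (strncmp(before, "999", 3) == 0)
        return false;                        // (?<!999) fails
    return strncmp(s + pos, "foo", 3) == 0;  // then "foo" itself
}

// Non-anchored search: try every start offset.
bool search_foo(const char *s)
{
    size_t len = strlen(s);
    for (size_t pos = 0; pos <= len; pos++)
        if (matches_at(s, pos))
            return true;
    return false;
}
```

"123foo" is found, while "999foo" and "123abcfoo" are not.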

Notice that each of the assertions is applied independently at the
same point in the subject string. First there is a check that the
previous three characters are all digits, and then there is a check
that the same three characters are not "999". This pattern does not
match "foo" preceded by six characters, the first three of which are
digits and the last three

  ^.*abcd$

then the initial .* matches the entire string at first, but when
this fails (because there is no following "a"), it backtracks to
match all but the last character, then all but the last two
characters, and so on. Once again the search for "a" covers the
entire string, from right to left, so we are no better off.
However, if the pattern is written as

  ^(?>.*)(?<=abcd)

then there can be no backtracking for the .* item; it can match
only the entire string. The subsequent lookbehind assertion does a
single test on the last four characters. If it fails, the match
fails immediately. For long strings, this approach makes a
significant difference to the processing time.
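
The difference in work can be modelled by counting attempts. The
sketch below is a simplified model, not PCRE's matcher: it assumes
a subject with no newlines, treats each position that .* gives back
as one attempt to match abcd$, and counts the single lookbehind test
of the atomic version as one attempt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// ^.*abcd$ : .* first takes all len characters, then gives them back
// one at a time; after each step "abcd" followed by the end is tried.
bool backtracking_version(const char *s, size_t *tries)
{
    size_t len = strlen(s);
    *tries = 0;
    for (size_t i = 0; i <= len; i++) {
        size_t taken = len - i;          // characters held by .*
        ++*tries;
        if (len - taken == 4 && memcmp(s + taken, "abcd", 4) == 0)
            return true;
    }
    return false;
}

// ^(?>.*)(?<=abcd) : the atomic group keeps the whole string, so a
// single lookbehind test on the last four characters decides.
bool atomic_version(const char *s, size_t *tries)
{
    size_t len = strlen(s);
    *tries = 1;
    return len >= 4 && memcmp(s + len - 4, "abcd", 4) == 0;
}
```

For a 1000-character subject that does not end in "abcd", the first
model makes 1001 attempts and the second makes one.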
|
When a pattern contains an unlimited repeat inside a subpattern
that can itself be repeated an unlimited number of

error occurs.

There are two kinds of condition. If the text between the
parentheses consists of a sequence of digits, the condition is
satisfied if the capturing subpattern of that number has
previously matched. Consider the following pattern, which contains
non-significant white space to make it more readable (assume the
PCRE_EXTENDED option) and to divide it into three parts for ease of
discussion:

  ( \( )? [^()]+ (?(1) \) )
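
The three parts can be traced in hand-written C. The function below
is an illustrative equivalent, not PCRE's matcher, and it assumes
the whole string must match, whereas the pattern as shown is not
anchored: an optional "(" sets group 1, a run of one or more
characters other than parentheses follows, and the condition then
requires ")" only if group 1 matched.

```c
#include <stdbool.h>
#include <stddef.h>

// Whole-string equivalent of ( \( )? [^()]+ (?(1) \) )
bool matches_whole(const char *s)
{
    size_t i = 0;
    bool group1 = false;
    if (s[i] == '(') {                   // part 1: ( \( )?
        group1 = true;
        i++;
    }
    size_t start = i;                    // part 2: [^()]+
    while (s[i] != '\0' && s[i] != '(' && s[i] != ')')
        i++;
    if (i == start)
        return false;
    if (group1) {                        // part 3: (?(1) \) )
        if (s[i] != ')')
            return false;
        i++;
    }
    return s[i] == '\0';
}
```

Thus "abc" and "(abc)" are accepted, while "(abc" and "abc)" are
not.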

  \( ( ( (?>[^()]+) | (?R) )* ) \)
     ^                        ^
     ^                        ^

then the string they capture is "ab(cd)ef", the contents of the top
level parentheses. If there are more than 15 capturing parentheses
in a pattern, PCRE has to obtain extra memory to store data during
a recursion, which it does by using pcre_malloc, freeing it