1315 |
.sp |
.sp |
1316 |
/(?|(abc)|(def))\e1/ |
/(?|(abc)|(def))\e1/ |
1317 |
.sp |
.sp |
1318 |
In contrast, a recursive or "subroutine" call to a numbered subpattern always |
In contrast, a subroutine call to a numbered subpattern always refers to the |
1319 |
refers to the first one in the pattern with the given number. The following |
first one in the pattern with the given number. The following pattern matches |
1320 |
pattern matches "abcabc" or "defabc": |
"abcabc" or "defabc": |
1321 |
.sp |
.sp |
1322 |
/(?|(abc)|(def))(?1)/ |
/(?|(abc)|(def))(?1)/ |
1323 |
.sp |
.sp |
1434 |
a character class |
a character class |
1435 |
a back reference (see next section) |
a back reference (see next section) |
1436 |
a parenthesized subpattern (including assertions) |
a parenthesized subpattern (including assertions) |
1437 |
a recursive or "subroutine" call to a subpattern |
a subroutine call to a subpattern (recursive or otherwise) |
1438 |
.sp |
.sp |
1439 |
The general repetition quantifier specifies a minimum and maximum number of |
The general repetition quantifier specifies a minimum and maximum number of |
1440 |
permitted matches, by giving the two numbers in curly brackets (braces), |
permitted matches, by giving the two numbers in curly brackets (braces), |
2123 |
name DEFINE, the condition is always false. In this case, there may be only one |
name DEFINE, the condition is always false. In this case, there may be only one |
2124 |
alternative in the subpattern. It is always skipped if control reaches this |
alternative in the subpattern. It is always skipped if control reaches this |
2125 |
point in the pattern; the idea of DEFINE is that it can be used to define |
point in the pattern; the idea of DEFINE is that it can be used to define |
2126 |
"subroutines" that can be referenced from elsewhere. (The use of |
subroutines that can be referenced from elsewhere. (The use of |
2127 |
.\" HTML <a href="#subpatternsassubroutines"> |
.\" HTML <a href="#subpatternsassubroutines"> |
2128 |
.\" </a> |
.\" </a> |
2129 |
"subroutines" |
subroutines |
2130 |
.\" |
.\" |
2131 |
is described below.) For example, a pattern to match an IPv4 address such as |
is described below.) For example, a pattern to match an IPv4 address such as |
2132 |
"192.168.23.245" could be written like this (ignore whitespace and line |
"192.168.23.245" could be written like this (ignore whitespace and line |
2221 |
this kind of recursion was subsequently introduced into Perl at release 5.10. |
this kind of recursion was subsequently introduced into Perl at release 5.10. |
2222 |
.P |
.P |
2223 |
A special item that consists of (? followed by a number greater than zero and a |
A special item that consists of (? followed by a number greater than zero and a |
2224 |
closing parenthesis is a recursive call of the subpattern of the given number, |
closing parenthesis is a recursive subroutine call of the subpattern of the |
2225 |
provided that it occurs inside that subpattern. (If not, it is a |
given number, provided that it occurs inside that subpattern. (If not, it is a |
2226 |
.\" HTML <a href="#subpatternsassubroutines"> |
.\" HTML <a href="#subpatternsassubroutines"> |
2227 |
.\" </a> |
.\" </a> |
2228 |
"subroutine" |
non-recursive subroutine |
2229 |
.\" |
.\" |
2230 |
call, which is described in the next section.) The special item (?R) or (?0) is |
call, which is described in the next section.) The special item (?R) or (?0) is |
2231 |
a recursive call of the entire regular expression. |
a recursive call of the entire regular expression. |
2260 |
reference is not inside the parentheses that are referenced. They are always |
reference is not inside the parentheses that are referenced. They are always |
2261 |
.\" HTML <a href="#subpatternsassubroutines"> |
.\" HTML <a href="#subpatternsassubroutines"> |
2262 |
.\" </a> |
.\" </a> |
2263 |
"subroutine" |
non-recursive subroutine |
2264 |
.\" |
.\" |
2265 |
calls, as described in the next section. |
calls, as described in the next section. |
2266 |
.P |
.P |
2393 |
.SH "SUBPATTERNS AS SUBROUTINES" |
.SH "SUBPATTERNS AS SUBROUTINES" |
2394 |
.rs |
.rs |
2395 |
.sp |
.sp |
2396 |
If the syntax for a recursive subpattern reference (either by number or by |
If the syntax for a recursive subpattern call (either by number or by |
2397 |
name) is used outside the parentheses to which it refers, it operates like a |
name) is used outside the parentheses to which it refers, it operates like a |
2398 |
subroutine in a programming language. The "called" subpattern may be defined |
subroutine in a programming language. The called subpattern may be defined |
2399 |
before or after the reference. A numbered reference can be absolute or |
before or after the reference. A numbered reference can be absolute or |
2400 |
relative, as in these examples: |
relative, as in these examples: |
2401 |
.sp |
.sp |
2415 |
is used, it does match "sense and responsibility" as well as the other two |
is used, it does match "sense and responsibility" as well as the other two |
2416 |
strings. Another example is given in the discussion of DEFINE above. |
strings. Another example is given in the discussion of DEFINE above. |
2417 |
.P |
.P |
2418 |
Like recursive subpatterns, a subroutine call is always treated as an atomic |
All subroutine calls, whether recursive or not, are always treated as atomic |
2419 |
group. That is, once it has matched some of the subject string, it is never |
groups. That is, once a subroutine has matched some of the subject string, it |
2420 |
re-entered, even if it contains untried alternatives and there is a subsequent |
is never re-entered, even if it contains untried alternatives and there is a |
2421 |
matching failure. Any capturing parentheses that are set during the subroutine |
subsequent matching failure. Any capturing parentheses that are set during the |
2422 |
call revert to their previous values afterwards. |
subroutine call revert to their previous values afterwards. |
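The "never re-entered" behaviour described above can be illustrated, very roughly, outside PCRE. The following is a minimal sketch in Python, whose standard re module has no subroutine calls; as an assumption for illustration only, it emulates an atomic group with a lookahead followed by a back reference. The names BACKTRACKING, ATOMIC_LIKE, and matches are hypothetical and are not part of PCRE.

```python
import re

# Ordinary group: after "a" matches and "c" fails, the engine backtracks
# into the group and tries the untried alternative "ab".
BACKTRACKING = re.compile(r"(a|ab)c")

# Emulated atomic group (an assumption standing in for a subroutine call):
# a lookahead is not re-entered once it has succeeded, so only the first
# alternative that matched ("a") is ever used here.
ATOMIC_LIKE = re.compile(r"(?=(a|ab))\1c")


def matches(pattern, subject):
    """Return True if the compiled pattern matches anywhere in subject."""
    return pattern.search(subject) is not None


print(matches(BACKTRACKING, "abc"))  # the alternative "ab" is retried
print(matches(ATOMIC_LIKE, "abc"))   # "ab" is never tried after "a" matched
print(matches(ATOMIC_LIKE, "ac"))    # the first alternative suffices here
```

With the ordinary group the subject "abc" matches, whereas the atomic-like form fails on it because the untried alternative is discarded once the group has matched. This mirrors the atomic-group rule only; it does not show how capturing parentheses revert after a subroutine call.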
2423 |
.P |
.P |
2424 |
When a subpattern is used as a subroutine, processing options such as |
Processing options such as case-independence are fixed when a subpattern is |
2425 |
case-independence are fixed when the subpattern is defined. They cannot be |
defined, so if it is used as a subroutine, such options cannot be changed for |
2426 |
changed for different calls. For example, consider this pattern: |
different calls. For example, consider this pattern: |
2427 |
.sp |
.sp |
2428 |
(abc)(?i:(?-1)) |
(abc)(?i:(?-1)) |
2429 |
.sp |
.sp |
2504 |
failing negative assertion, they cause an error if encountered by |
failing negative assertion, they cause an error if encountered by |
2505 |
\fBpcre_dfa_exec()\fP. |
\fBpcre_dfa_exec()\fP. |
2506 |
.P |
.P |
2507 |
If any of these verbs are used in an assertion or subroutine subpattern |
If any of these verbs are used in an assertion or in a subpattern that is |
2508 |
(including recursive subpatterns), their effect is confined to that subpattern; |
called as a subroutine (whether or not recursively), their effect is confined |
2509 |
it does not extend to the surrounding pattern, with one exception: a *MARK that |
to that subpattern; it does not extend to the surrounding pattern, with one |
2510 |
is encountered in a positive assertion \fIis\fP passed back (compare capturing |
exception: a *MARK that is encountered in a positive assertion \fIis\fP passed |
2511 |
parentheses in assertions). Note that such subpatterns are processed as |
back (compare capturing parentheses in assertions). Note that such subpatterns |
2512 |
anchored at the point where they are tested. |
are processed as anchored at the point where they are tested. Note also that |
2513 |
|
Perl's treatment of subroutines is different in some cases. |
2514 |
.P |
.P |
2515 |
The new verbs make use of what was previously invalid syntax: an opening |
The new verbs make use of what was previously invalid syntax: an opening |
2516 |
parenthesis followed by an asterisk. They are generally of the form |
parenthesis followed by an asterisk. They are generally of the form |
2517 |
(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour, |
(*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour, |
2518 |
depending on whether or not an argument is present. An name is a sequence of |
depending on whether or not an argument is present. A name is any sequence of |
2519 |
letters, digits, and underscores. If the name is empty, that is, if the closing |
characters that does not include a closing parenthesis. If the name is empty, |
2520 |
parenthesis immediately follows the colon, the effect is as if the colon were |
that is, if the closing parenthesis immediately follows the colon, the effect |
2521 |
not there. Any number of these verbs may occur in a pattern. |
is as if the colon were not there. Any number of these verbs may occur in a |
2522 |
|
pattern. |
2523 |
.P |
.P |
2524 |
PCRE contains some optimizations that are used to speed up matching by running |
PCRE contains some optimizations that are used to speed up matching by running |
2525 |
some checks at the start of each match attempt. For example, it may know the |
some checks at the start of each match attempt. For example, it may know the |
2540 |
(*ACCEPT) |
(*ACCEPT) |
2541 |
.sp |
.sp |
2542 |
This verb causes the match to end successfully, skipping the remainder of the |
This verb causes the match to end successfully, skipping the remainder of the |
2543 |
pattern. When inside a recursion, only the innermost pattern is ended |
pattern. However, when it is inside a subpattern that is called as a |
2544 |
immediately. If (*ACCEPT) is inside capturing parentheses, the data so far is |
subroutine, only that subpattern is ended successfully. Matching then continues |
2545 |
captured. (This feature was added to PCRE at release 8.00.) For example: |
at the outer level. If (*ACCEPT) is inside capturing parentheses, the data so |
2546 |
|
far is captured. For example: |
2547 |
.sp |
.sp |
2548 |
A((?:A|B(*ACCEPT)|C)D) |
A((?:A|B(*ACCEPT)|C)D) |
2549 |
.sp |
.sp |
2552 |
.sp |
.sp |
2553 |
(*FAIL) or (*F) |
(*FAIL) or (*F) |
2554 |
.sp |
.sp |
2555 |
This verb causes the match to fail, forcing backtracking to occur. It is |
This verb causes a matching failure, forcing backtracking to occur. It is |
2556 |
equivalent to (?!) but easier to read. The Perl documentation notes that it is |
equivalent to (?!) but easier to read. The Perl documentation notes that it is |
2557 |
probably useful only when combined with (?{}) or (??{}). Those are, of course, |
probably useful only when combined with (?{}) or (??{}). Those are, of course, |
2558 |
Perl features that are not present in PCRE. The nearest equivalent is the |
Perl features that are not present in PCRE. The nearest equivalent is the |
2605 |
.P |
.P |
2606 |
If (*MARK) is encountered in a positive assertion, its name is recorded and |
If (*MARK) is encountered in a positive assertion, its name is recorded and |
2607 |
passed back if it is the last-encountered. This does not happen for negative |
passed back if it is the last-encountered. This does not happen for negative |
2608 |
assetions. |
assertions. |
2609 |
.P |
.P |
2610 |
A name may also be returned after a failed match if the final path through the |
A name may also be returned after a failed match if the final path through the |
2611 |
pattern involves (*MARK). However, unless (*MARK) is used in conjunction with |
pattern involves (*MARK). However, unless (*MARK) is used in conjunction with |
2719 |
searched for the most recent (*MARK) that has the same name. If one is found, |
searched for the most recent (*MARK) that has the same name. If one is found, |
2720 |
the "bumpalong" advance is to the subject position that corresponds to that |
the "bumpalong" advance is to the subject position that corresponds to that |
2721 |
(*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a |
(*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a |
2722 |
matching name is found, normal "bumpalong" of one character happens (the |
matching name is found, normal "bumpalong" of one character happens (that is, |
2723 |
(*SKIP) is ignored). |
the (*SKIP) is ignored). |
2724 |
.sp |
.sp |
2725 |
(*THEN) or (*THEN:NAME) |
(*THEN) or (*THEN:NAME) |
2726 |
.sp |
.sp |
2727 |
This verb causes a skip to the next alternation in the innermost enclosing |
This verb causes a skip to the next innermost alternative if the rest of the |
2728 |
group if the rest of the pattern does not match. That is, it cancels pending |
pattern does not match. That is, it cancels pending backtracking, but only |
2729 |
backtracking, but only within the current alternation. Its name comes from the |
within the current alternative. Its name comes from the observation that it can |
2730 |
observation that it can be used for a pattern-based if-then-else block: |
be used for a pattern-based if-then-else block: |
2731 |
.sp |
.sp |
2732 |
( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ... |
( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ... |
2733 |
.sp |
.sp |
2734 |
If the COND1 pattern matches, FOO is tried (and possibly further items after |
If the COND1 pattern matches, FOO is tried (and possibly further items after |
2735 |
the end of the group if FOO succeeds); on failure the matcher skips to the |
the end of the group if FOO succeeds); on failure, the matcher skips to the |
2736 |
second alternative and tries COND2, without backtracking into COND1. The |
second alternative and tries COND2, without backtracking into COND1. The |
2737 |
behaviour of (*THEN:NAME) is exactly the same as (*MARK:NAME)(*THEN) if the |
behaviour of (*THEN:NAME) is exactly the same as (*MARK:NAME)(*THEN) if the |
2738 |
overall match fails. If (*THEN) is not directly inside an alternation, it acts |
overall match fails. If (*THEN) is not inside an alternation, it acts like |
2739 |
like (*PRUNE). |
(*PRUNE). |
|
. |
.P |
The above verbs provide four different "strengths" of control when subsequent |
matching fails. (*THEN) is the weakest, carrying on the match at the next |
alternation. (*PRUNE) comes next, failing the match at the current starting |
position, but allowing an advance to the next character (for an unanchored |
pattern). (*SKIP) is similar, except that the advance may be more than one |
character. (*COMMIT) is the strongest, causing the entire match to fail. |
2740 |
.P |
.P |
2741 |
If more than one is present in a pattern, the "stongest" one wins. For example, |
Note that a subpattern that does not contain a | character is just a part of |
2742 |
consider this pattern, where A, B, etc. are complex pattern fragments: |
the enclosing alternative; it is not a nested alternation with only one |
2743 |
|
alternative. The effect of (*THEN) extends beyond such a subpattern to the |
2744 |
|
enclosing alternative. Consider this pattern, where A, B, etc. are complex |
2745 |
|
pattern fragments that do not contain any | characters at this level: |
2746 |
|
.sp |
2747 |
|
A (B(*THEN)C) | D |
2748 |
|
.sp |
2749 |
|
If A and B are matched, but there is a failure in C, matching does not |
2750 |
|
backtrack into A; instead it moves to the next alternative, that is, D. |
2751 |
|
However, if the subpattern containing (*THEN) is given an alternative, it |
2752 |
|
behaves differently: |
2753 |
|
.sp |
2754 |
|
A (B(*THEN)C | (*FAIL)) | D |
2755 |
|
.sp |
2756 |
|
The effect of (*THEN) is now confined to the inner subpattern. After a failure |
2757 |
|
in C, matching moves to (*FAIL), which causes the whole subpattern to fail |
2758 |
|
because there are no more alternatives to try. In this case, matching does now |
2759 |
|
backtrack into A. |
2760 |
|
.P |
2761 |
|
Note also that a conditional subpattern is not considered as having two |
2762 |
|
alternatives, because only one is ever used. In other words, the | character in |
2763 |
|
a conditional subpattern has a different meaning. Ignoring white space, |
2764 |
|
consider: |
2765 |
|
.sp |
2766 |
|
^.*? (?(?=a) a | b(*THEN)c ) |
2767 |
|
.sp |
2768 |
|
If the subject is "ba", this pattern does not match. Because .*? is ungreedy, |
2769 |
|
it initially matches zero characters. The condition (?=a) then fails, the |
2770 |
|
character "b" is matched, but "c" is not. At this point, matching does not |
2771 |
|
backtrack to .*? as might perhaps be expected from the presence of the | |
2772 |
|
character. The conditional subpattern is part of the single alternative that |
2773 |
|
comprises the whole pattern, and so the match fails. (If there was a backtrack |
2774 |
|
into .*?, allowing it to match "b", the match would succeed.) |
2775 |
|
.P |
2776 |
|
The verbs just described provide four different "strengths" of control when |
2777 |
|
subsequent matching fails. (*THEN) is the weakest, carrying on the match at the |
2778 |
|
next alternative. (*PRUNE) comes next, failing the match at the current |
2779 |
|
starting position, but allowing an advance to the next character (for an |
2780 |
|
unanchored pattern). (*SKIP) is similar, except that the advance may be more |
2781 |
|
than one character. (*COMMIT) is the strongest, causing the entire match to |
2782 |
|
fail. |
2783 |
|
.P |
2784 |
|
If more than one such verb is present in a pattern, the "strongest" one wins. |
2785 |
|
For example, consider this pattern, where A, B, etc. are complex pattern |
2786 |
|
fragments: |
2787 |
.sp |
.sp |
2788 |
(A(*COMMIT)B(*THEN)C|D) |
(A(*COMMIT)B(*THEN)C|D) |
2789 |
.sp |
.sp |
2790 |
Once A has matched, PCRE is committed to this match, at the current starting |
Once A has matched, PCRE is committed to this match, at the current starting |
2791 |
position. If subsequently B matches, but C does not, the normal (*THEN) action |
position. If subsequently B matches, but C does not, the normal (*THEN) action |
2792 |
of trying the next alternation (that is, D) does not happen because (*COMMIT) |
of trying the next alternative (that is, D) does not happen because (*COMMIT) |
2793 |
overrides. |
overrides. |
2794 |
. |
. |
2795 |
. |
. |
2814 |
.rs |
.rs |
2815 |
.sp |
.sp |
2816 |
.nf |
.nf |
2817 |
Last updated: 24 August 2011 |
Last updated: 04 October 2011 |
2818 |
Copyright (c) 1997-2011 University of Cambridge. |
Copyright (c) 1997-2011 University of Cambridge. |
2819 |
.fi |
.fi |