35 |
<li><a name="TOC20" href="#SEC20">COMMENTS</a> |
<li><a name="TOC20" href="#SEC20">COMMENTS</a> |
36 |
<li><a name="TOC21" href="#SEC21">RECURSIVE PATTERNS</a> |
<li><a name="TOC21" href="#SEC21">RECURSIVE PATTERNS</a> |
37 |
<li><a name="TOC22" href="#SEC22">SUBPATTERNS AS SUBROUTINES</a> |
<li><a name="TOC22" href="#SEC22">SUBPATTERNS AS SUBROUTINES</a> |
38 |
<li><a name="TOC23" href="#SEC23">CALLOUTS</a> |
<li><a name="TOC23" href="#SEC23">ONIGURUMA SUBROUTINE SYNTAX</a> |
39 |
<li><a name="TOC24" href="#SEC24">BACKTRACKING CONTROL</a> |
<li><a name="TOC24" href="#SEC24">CALLOUTS</a> |
40 |
<li><a name="TOC25" href="#SEC25">SEE ALSO</a> |
<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a> |
41 |
<li><a name="TOC26" href="#SEC26">AUTHOR</a> |
<li><a name="TOC26" href="#SEC26">SEE ALSO</a> |
42 |
<li><a name="TOC27" href="#SEC27">REVISION</a> |
<li><a name="TOC27" href="#SEC27">AUTHOR</a> |
43 |
|
<li><a name="TOC28" href="#SEC28">REVISION</a> |
44 |
</ul> |
</ul> |
45 |
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br> |
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br> |
46 |
<P> |
<P> |
47 |
The syntax and semantics of the regular expressions that are supported by PCRE |
The syntax and semantics of the regular expressions that are supported by PCRE |
48 |
are described in detail below. There is a quick-reference syntax summary in the |
are described in detail below. There is a quick-reference syntax summary in the |
49 |
<a href="pcresyntax.html"><b>pcresyntax</b></a> |
<a href="pcresyntax.html"><b>pcresyntax</b></a> |
50 |
page. Perl's regular expressions are described in its own documentation, and |
page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE |
51 |
|
also supports some alternative regular expression syntax (which does not |
52 |
|
conflict with the Perl syntax) in order to provide some compatibility with |
53 |
|
regular expressions in Python, .NET, and Oniguruma. |
54 |
|
</P> |
55 |
|
<P> |
56 |
|
Perl's regular expressions are described in its own documentation, and |
57 |
regular expressions in general are covered in a number of books, some of which |
regular expressions in general are covered in a number of books, some of which |
58 |
have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", |
have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", |
59 |
published by O'Reilly, covers regular expressions in great detail. This |
published by O'Reilly, covers regular expressions in great detail. This |
319 |
<a href="#subpattern">parenthesized subpatterns.</a> |
<a href="#subpattern">parenthesized subpatterns.</a> |
320 |
</P> |
</P> |
321 |
<br><b> |
<br><b> |
322 |
|
Absolute and relative subroutine calls |
323 |
|
</b><br> |
324 |
|
<P> |
325 |
|
For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or |
326 |
|
a number enclosed in either angle brackets or single quotes is an alternative |
327 |
|
syntax for referencing a subpattern as a "subroutine". Details are discussed |
328 |
|
<a href="#onigurumasubroutines">later.</a> |
329 |
|
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i> |
330 |
|
synonymous. The former is a back reference; the latter is a subroutine call. |
331 |
|
</P> |
332 |
|
<br><b> |
333 |
Generic character types |
Generic character types |
334 |
</b><br> |
</b><br> |
335 |
<P> |
<P> |
1249 |
</P> |
</P> |
1250 |
<P> |
<P> |
1251 |
The quantifier {0} is permitted, causing the expression to behave as if the |
The quantifier {0} is permitted, causing the expression to behave as if the |
1252 |
previous item and the quantifier were not present. |
previous item and the quantifier were not present. This may be useful for |
1253 |
|
subpatterns that are referenced as |
1254 |
|
<a href="#subpatternsassubroutines">subroutines</a> |
1255 |
|
from elsewhere in the pattern. Items other than subpatterns that have a {0} |
1256 |
|
quantifier are omitted from the compiled pattern. |
1257 |
</P> |
</P> |
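<P>
The first sentence of the paragraph above can be checked in any regex engine that accepts {0}. The sketch below uses Python's standard <b>re</b> module rather than PCRE, so it is an illustration under that assumption only: Python's <b>re</b> has no subroutine calls, and PCRE's compile-time omission of such items is not observable here. The names <b>with_zero</b> and <b>is_full_match</b> are hypothetical.
</P>

```python
import re

# "b{0}" asks for exactly zero repetitions of "b", so the pattern
# behaves as if "b" and its quantifier were not written at all.
with_zero = re.compile(r"ab{0}c")


def is_full_match(pattern, text):
    """Return True when the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_full_match(with_zero, "ac"))
    print(is_full_match(with_zero, "abc"))
```

<P>
In this sketch "ac" is accepted and "abc" is rejected, which is the same result the pattern "ac" would give.
</P>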
1258 |
<P> |
<P> |
1259 |
For convenience, the three most common quantifiers have single-character |
For convenience, the three most common quantifiers have single-character |
2053 |
</pre> |
</pre> |
2054 |
It matches "abcabc". It does not match "abcABC" because the change of |
It matches "abcabc". It does not match "abcABC" because the change of |
2055 |
processing option does not affect the called subpattern. |
processing option does not affect the called subpattern. |
2056 |
|
<a name="onigurumasubroutines"></a></P> |
2057 |
|
<br><a name="SEC23" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br> |
2058 |
|
<P> |
2059 |
|
For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or |
2060 |
|
a number enclosed in either angle brackets or single quotes is an alternative |
2061 |
|
syntax for referencing a subpattern as a subroutine, possibly recursively. Here |
2062 |
|
are two of the examples used above, rewritten using this syntax: |
2063 |
|
<pre> |
2064 |
|
(?<pn> \( ( (?>[^()]+) | \g<pn> )* \) ) |
2065 |
|
(sens|respons)e and \g'1'ibility |
2066 |
|
</pre> |
2067 |
|
PCRE supports an extension to Oniguruma: if a number is preceded by a |
2068 |
|
plus or a minus sign it is taken as a relative reference. For example: |
2069 |
|
<pre> |
2070 |
|
(abc)(?i:\g<-1>) |
2071 |
|
</pre> |
2072 |
|
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i> |
2073 |
|
synonymous. The former is a back reference; the latter is a subroutine call. |
2074 |
</P> |
</P> |
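<P>
The difference between a back reference and a subroutine call can be illustrated with the "sense and sensibility" example. This is a minimal sketch using Python's standard <b>re</b> module, not PCRE: <b>re</b> supports back references but has no subroutine call syntax, so the call is emulated here by writing the subpattern out a second time. The names <b>backref</b>, <b>subroutine_like</b> and <b>matches</b> are hypothetical.
</P>

```python
import re

# Back reference: \1 must repeat the exact text that group 1 captured.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Emulated subroutine call: the subpattern itself is applied again,
# so either alternative may match at the second position.
subroutine_like = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")


def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ("sense and sensibility", "sense and responsibility"):
        print(text, matches(backref, text), matches(subroutine_like, text))
```

<P>
Under these assumptions, the back reference accepts "sense and sensibility" but rejects "sense and responsibility", while the emulated subroutine call accepts both.
</P>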
2075 |
<br><a name="SEC23" href="#TOC1">CALLOUTS</a><br> |
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br> |
2076 |
<P> |
<P> |
2077 |
Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl |
Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl |
2078 |
code to be obeyed in the middle of matching a regular expression. This makes it |
code to be obeyed in the middle of matching a regular expression. This makes it |
2107 |
<a href="pcrecallout.html"><b>pcrecallout</b></a> |
<a href="pcrecallout.html"><b>pcrecallout</b></a> |
2108 |
documentation. |
documentation. |
2109 |
</P> |
</P> |
2110 |
<br><a name="SEC24" href="#TOC1">BACKTRACKING CONTROL</a><br> |
<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br> |
2111 |
<P> |
<P> |
2112 |
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which |
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which |
2113 |
are described in the Perl documentation as "experimental and subject to change |
are described in the Perl documentation as "experimental and subject to change |
2116 |
remarks apply to the PCRE features described in this section. |
remarks apply to the PCRE features described in this section. |
2117 |
</P> |
</P> |
2118 |
<P> |
<P> |
2119 |
Since these verbs are specifically related to backtracking, they can be used |
Since these verbs are specifically related to backtracking, most of them can be |
2120 |
only when the pattern is to be matched using <b>pcre_exec()</b>, which uses a |
used only when the pattern is to be matched using <b>pcre_exec()</b>, which uses |
2121 |
backtracking algorithm. They cause an error if encountered by |
a backtracking algorithm. With the exception of (*FAIL), which behaves like a |
2122 |
|
failing negative assertion, they cause an error if encountered by |
2123 |
<b>pcre_dfa_exec()</b>. |
<b>pcre_dfa_exec()</b>. |
2124 |
</P> |
</P> |
2125 |
<P> |
<P> |
2223 |
second alternative and tries COND2, without backtracking into COND1. If (*THEN) |
second alternative and tries COND2, without backtracking into COND1. If (*THEN) |
2224 |
is used outside of any alternation, it acts exactly like (*PRUNE). |
is used outside of any alternation, it acts exactly like (*PRUNE). |
2225 |
</P> |
</P> |
2226 |
<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br> |
<br><a name="SEC26" href="#TOC1">SEE ALSO</a><br> |
2227 |
<P> |
<P> |
2228 |
<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3). |
<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3). |
2229 |
</P> |
</P> |
2230 |
<br><a name="SEC26" href="#TOC1">AUTHOR</a><br> |
<br><a name="SEC27" href="#TOC1">AUTHOR</a><br> |
2231 |
<P> |
<P> |
2232 |
Philip Hazel |
Philip Hazel |
2233 |
<br> |
<br> |
2236 |
Cambridge CB2 3QH, England. |
Cambridge CB2 3QH, England. |
2237 |
<br> |
<br> |
2238 |
</P> |
</P> |
2239 |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
2240 |
<P> |
<P> |
2241 |
Last updated: 17 September 2007 |
Last updated: 19 April 2008 |
2242 |
<br> |
<br> |
2243 |
Copyright © 1997-2007 University of Cambridge. |
Copyright © 1997-2008 University of Cambridge. |
2244 |
<br> |
<br> |
2245 |
<p> |
<p> |
2246 |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |