Diff of /code/trunk/doc/html/pcrepattern.html

revision 211 by ph10, Thu Aug 9 09:52:43 2007 UTC  revision 247 by ph10, Mon Sep 17 09:38:32 2007 UTC
# Line 14  man page, in case the conversion went wr Line 14  man page, in case the conversion went wr
14  <br>  <br>
15  <ul>  <ul>
16  <li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION DETAILS</a>  <li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION DETAILS</a>
17  <li><a name="TOC2" href="#SEC2">CHARACTERS AND METACHARACTERS</a>  <li><a name="TOC2" href="#SEC2">NEWLINE CONVENTIONS</a>
18  <li><a name="TOC3" href="#SEC3">BACKSLASH</a>  <li><a name="TOC3" href="#SEC3">CHARACTERS AND METACHARACTERS</a>
19  <li><a name="TOC4" href="#SEC4">CIRCUMFLEX AND DOLLAR</a>  <li><a name="TOC4" href="#SEC4">BACKSLASH</a>
20  <li><a name="TOC5" href="#SEC5">FULL STOP (PERIOD, DOT)</a>  <li><a name="TOC5" href="#SEC5">CIRCUMFLEX AND DOLLAR</a>
21  <li><a name="TOC6" href="#SEC6">MATCHING A SINGLE BYTE</a>  <li><a name="TOC6" href="#SEC6">FULL STOP (PERIOD, DOT)</a>
22  <li><a name="TOC7" href="#SEC7">SQUARE BRACKETS AND CHARACTER CLASSES</a>  <li><a name="TOC7" href="#SEC7">MATCHING A SINGLE BYTE</a>
23  <li><a name="TOC8" href="#SEC8">POSIX CHARACTER CLASSES</a>  <li><a name="TOC8" href="#SEC8">SQUARE BRACKETS AND CHARACTER CLASSES</a>
24  <li><a name="TOC9" href="#SEC9">VERTICAL BAR</a>  <li><a name="TOC9" href="#SEC9">POSIX CHARACTER CLASSES</a>
25  <li><a name="TOC10" href="#SEC10">INTERNAL OPTION SETTING</a>  <li><a name="TOC10" href="#SEC10">VERTICAL BAR</a>
26  <li><a name="TOC11" href="#SEC11">SUBPATTERNS</a>  <li><a name="TOC11" href="#SEC11">INTERNAL OPTION SETTING</a>
27  <li><a name="TOC12" href="#SEC12">DUPLICATE SUBPATTERN NUMBERS</a>  <li><a name="TOC12" href="#SEC12">SUBPATTERNS</a>
28  <li><a name="TOC13" href="#SEC13">NAMED SUBPATTERNS</a>  <li><a name="TOC13" href="#SEC13">DUPLICATE SUBPATTERN NUMBERS</a>
29  <li><a name="TOC14" href="#SEC14">REPETITION</a>  <li><a name="TOC14" href="#SEC14">NAMED SUBPATTERNS</a>
30  <li><a name="TOC15" href="#SEC15">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>  <li><a name="TOC15" href="#SEC15">REPETITION</a>
31  <li><a name="TOC16" href="#SEC16">BACK REFERENCES</a>  <li><a name="TOC16" href="#SEC16">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
32  <li><a name="TOC17" href="#SEC17">ASSERTIONS</a>  <li><a name="TOC17" href="#SEC17">BACK REFERENCES</a>
33  <li><a name="TOC18" href="#SEC18">CONDITIONAL SUBPATTERNS</a>  <li><a name="TOC18" href="#SEC18">ASSERTIONS</a>
34  <li><a name="TOC19" href="#SEC19">COMMENTS</a>  <li><a name="TOC19" href="#SEC19">CONDITIONAL SUBPATTERNS</a>
35  <li><a name="TOC20" href="#SEC20">RECURSIVE PATTERNS</a>  <li><a name="TOC20" href="#SEC20">COMMENTS</a>
36  <li><a name="TOC21" href="#SEC21">SUBPATTERNS AS SUBROUTINES</a>  <li><a name="TOC21" href="#SEC21">RECURSIVE PATTERNS</a>
37  <li><a name="TOC22" href="#SEC22">CALLOUTS</a>  <li><a name="TOC22" href="#SEC22">SUBPATTERNS AS SUBROUTINES</a>
38  <li><a name="TOC23" href="#SEC23">BACTRACKING CONTROL</a>  <li><a name="TOC23" href="#SEC23">CALLOUTS</a>
39  <li><a name="TOC24" href="#SEC24">SEE ALSO</a>  <li><a name="TOC24" href="#SEC24">BACKTRACKING CONTROL</a>
40  <li><a name="TOC25" href="#SEC25">AUTHOR</a>  <li><a name="TOC25" href="#SEC25">SEE ALSO</a>
41  <li><a name="TOC26" href="#SEC26">REVISION</a>  <li><a name="TOC26" href="#SEC26">AUTHOR</a>
42    <li><a name="TOC27" href="#SEC27">REVISION</a>
43  </ul>  </ul>
44  <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>  <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>
45  <P>  <P>
# Line 74  discussed in the Line 75  discussed in the
75  <a href="pcrematching.html"><b>pcrematching</b></a>  <a href="pcrematching.html"><b>pcrematching</b></a>
76  page.  page.
77  </P>  </P>
78  <br><a name="SEC2" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br>  <br><a name="SEC2" href="#TOC1">NEWLINE CONVENTIONS</a><br>
79    <P>
80    PCRE supports five different conventions for indicating line breaks in
81    strings: a single CR (carriage return) character, a single LF (linefeed)
82    character, the two-character sequence CRLF, any of the three preceding, or any
83    Unicode newline sequence. The
84    <a href="pcreapi.html"><b>pcreapi</b></a>
85    page has
86    <a href="pcreapi.html#newlines">further discussion</a>
87    about newlines, and shows how to set the newline convention in the
88    <i>options</i> arguments for the compiling and matching functions.
89    </P>
90    <P>
91    It is also possible to specify a newline convention by starting a pattern
92    string with one of the following five sequences:
93    <pre>
94      (*CR)        carriage return
95      (*LF)        linefeed
96      (*CRLF)      carriage return, followed by linefeed
97      (*ANYCRLF)   any of the three above
98      (*ANY)       all Unicode newline sequences
99    </pre>
100    These override the default and the options given to <b>pcre_compile()</b>. For
101    example, on a Unix system where LF is the default newline sequence, the pattern
102    <pre>
103      (*CR)a.b
104    </pre>
105    changes the convention to CR. That pattern matches "a\nb" because LF is no
106    longer a newline. Note that these special settings, which are not
107    Perl-compatible, are recognized only at the very start of a pattern, and that
108    they must be in upper case. If more than one of them is present, the last one
109    is used.
110    </P>
111    <P>
112    The newline convention does not affect what the \R escape sequence matches. By
113    default, this is any Unicode newline sequence, for Perl compatibility. However,
114    this can be changed; see the description of \R in the section entitled
115    <a href="#newlineseq">"Newline sequences"</a>
116    below. A change of \R setting can be combined with a change of newline
117    convention.
118    </P>
119    <br><a name="SEC3" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br>
120  <P>  <P>
121  A regular expression is a pattern that is matched against a subject string from  A regular expression is a pattern that is matched against a subject string from
122  left to right. Most characters stand for themselves in a pattern, and match the  left to right. Most characters stand for themselves in a pattern, and match the
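The rules that the new NEWLINE CONVENTIONS section gives for the leading settings (recognized only at the very start of the pattern, upper case only, last one wins) can be sketched in a few lines of Python. This is only an illustration of those rules, not PCRE's own code: `split_newline_prefix`, `DOT_CLASS`, and the `(*LF)` default are hypothetical names and assumptions made for the example, and the emulation of "." covers only the two single-character conventions.

```python
import re

# The five prefixes listed in the added section, in the exact spelling required.
NEWLINE_PREFIXES = ("(*CR)", "(*LF)", "(*CRLF)", "(*ANYCRLF)", "(*ANY)")


def split_newline_prefix(pattern, default="(*LF)"):
    """Return (convention, rest_of_pattern) for a PCRE-style pattern string.

    Prefixes are consumed only from the very start, the comparison is
    case-sensitive, and when several are present the last one is kept.
    """
    convention = default
    while True:
        for prefix in NEWLINE_PREFIXES:
            if pattern.startswith(prefix):
                convention = prefix
                pattern = pattern[len(prefix):]
                break
        else:
            # No further prefix at the start: stop scanning.
            return convention, pattern


# Toy stand-ins for "." under the two single-character conventions
# (an assumption of this sketch; CRLF, ANYCRLF and ANY need more care).
DOT_CLASS = {"(*CR)": r"[^\r]", "(*LF)": r"[^\n]"}

convention, rest = split_newline_prefix("(*CR)a.b")
# Plain string replacement is acceptable for this toy pattern only.
emulated = rest.replace(".", DOT_CLASS[convention])
print(convention, rest, bool(re.fullmatch(emulated, "a\nb")))
```

With the CR convention selected, the emulated dot accepts LF, which mirrors the statement in the diff that `(*CR)a.b` matches "a\nb".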
# Line 131  a character class the only metacharacter Line 173  a character class the only metacharacter
173  </pre>  </pre>
174  The following sections describe the use of each of the metacharacters.  The following sections describe the use of each of the metacharacters.
175  </P>  </P>
176  <br><a name="SEC3" href="#TOC1">BACKSLASH</a><br>  <br><a name="SEC4" href="#TOC1">BACKSLASH</a><br>
177  <P>  <P>
178  The backslash character has several uses. Firstly, if it is followed by a  The backslash character has several uses. Firstly, if it is followed by a
179  non-alphanumeric character, it takes away any special meaning that character  non-alphanumeric character, it takes away any special meaning that character
# Line 180  represents: Line 222  represents:
222    \cx       "control-x", where x is any character    \cx       "control-x", where x is any character
223    \e        escape (hex 1B)    \e        escape (hex 1B)
224    \f        formfeed (hex 0C)    \f        formfeed (hex 0C)
225    \n        newline (hex 0A)    \n        linefeed (hex 0A)
226    \r        carriage return (hex 0D)    \r        carriage return (hex 0D)
227    \t        tab (hex 09)    \t        tab (hex 09)
228    \ddd      character with octal code ddd, or backreference    \ddd      character with octal code ddd, or backreference
# Line 358  page). For example, in a French locale s Line 400  page). For example, in a French locale s
400  or "french" in Windows, some character codes greater than 128 are used for  or "french" in Windows, some character codes greater than 128 are used for
401  accented letters, and these are matched by \w. The use of locales with Unicode  accented letters, and these are matched by \w. The use of locales with Unicode
402  is discouraged.  is discouraged.
403  </P>  <a name="newlineseq"></a></P>
404  <br><b>  <br><b>
405  Newline sequences  Newline sequences
406  </b><br>  </b><br>
407  <P>  <P>
408  Outside a character class, the escape sequence \R matches any Unicode newline  Outside a character class, by default, the escape sequence \R matches any
409  sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is equivalent to  Unicode newline sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is
410  the following:  equivalent to the following:
411  <pre>  <pre>
412    (?&#62;\r\n|\n|\x0b|\f|\r|\x85)    (?&#62;\r\n|\n|\x0b|\f|\r|\x85)
413  </pre>  </pre>
# Line 384  Unicode character property support is no Line 426  Unicode character property support is no
426  recognized.  recognized.
427  </P>  </P>
428  <P>  <P>
429    It is possible to restrict \R to match only CR, LF, or CRLF (instead of the
430    complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF
431    either at compile time or when the pattern is matched. (BSR is an abbreviation
432    for "backslash R".) This can be made the default when PCRE is built; if this is
433    the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option.
434    It is also possible to specify these settings by starting a pattern string with
435    one of the following sequences:
436    <pre>
437      (*BSR_ANYCRLF)   CR, LF, or CRLF only
438      (*BSR_UNICODE)   any Unicode newline sequence
439    </pre>
440    These override the default and the options given to <b>pcre_compile()</b>, but
441    they can be overridden by options given to <b>pcre_exec()</b>. Note that these
442    special settings, which are not Perl-compatible, are recognized only at the
443    very start of a pattern, and that they must be in upper case. If more than one
444    of them is present, the last one is used. They can be combined with a change of
445    newline convention, for example, a pattern can start with:
446    <pre>
447      (*ANY)(*BSR_ANYCRLF)
448    </pre>
449  Inside a character class, \R matches the letter "R".  Inside a character class, \R matches the letter "R".
450  <a name="uniextseq"></a></P>  <a name="uniextseq"></a></P>
451  <br><b>  <br><b>
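The difference between the two \R settings described in this hunk can be approximated with Python's `re` module. This is a rough sketch rather than PCRE behaviour: Python has no \R escape and no `(*BSR_...)` settings, so the two alternations below are written out by hand, and the atomic group from the documented equivalent is replaced by a plain non-capturing group (Python accepts `(?>...)` only from version 3.11). The names `BSR_UNICODE`, `BSR_ANYCRLF`, and `count_line_breaks` are made up for the example.

```python
import re

# Non-UTF-8 equivalent of \R quoted in the text, without the atomic group.
BSR_UNICODE = r"(?:\r\n|\n|\x0b|\f|\r|\x85)"
# The restricted set that PCRE_BSR_ANYCRLF or (*BSR_ANYCRLF) selects.
BSR_ANYCRLF = r"(?:\r\n|\n|\r)"


def count_line_breaks(text, bsr):
    """Count non-overlapping line breaks in text under the given \\R emulation."""
    return len(re.findall(bsr, text))


sample = "one\r\ntwo\x85three\x0bfour\n"
print(count_line_breaks(sample, BSR_UNICODE))  # 4: CRLF, NEL, VT, LF
print(count_line_breaks(sample, BSR_ANYCRLF))  # 2: CRLF, LF
```

In both alternations CRLF is listed first, so the two-character sequence is counted as one line break rather than two.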
# Line 675  If all the alternatives of a pattern beg Line 737  If all the alternatives of a pattern beg
737  to the starting match position, and the "anchored" flag is set in the compiled  to the starting match position, and the "anchored" flag is set in the compiled
738  regular expression.  regular expression.
739  </P>  </P>
740  <br><a name="SEC4" href="#TOC1">CIRCUMFLEX AND DOLLAR</a><br>  <br><a name="SEC5" href="#TOC1">CIRCUMFLEX AND DOLLAR</a><br>
741  <P>  <P>
742  Outside a character class, in the default matching mode, the circumflex  Outside a character class, in the default matching mode, the circumflex
743  character is an assertion that is true only if the current matching point is  character is an assertion that is true only if the current matching point is
# Line 729  Note that the sequences \A, \Z, and \z c Line 791  Note that the sequences \A, \Z, and \z c
791  end of the subject in both modes, and if all branches of a pattern start with  end of the subject in both modes, and if all branches of a pattern start with
792  \A it is always anchored, whether or not PCRE_MULTILINE is set.  \A it is always anchored, whether or not PCRE_MULTILINE is set.
793  </P>  </P>
794  <br><a name="SEC5" href="#TOC1">FULL STOP (PERIOD, DOT)</a><br>  <br><a name="SEC6" href="#TOC1">FULL STOP (PERIOD, DOT)</a><br>
795  <P>  <P>
796  Outside a character class, a dot in the pattern matches any one character in  Outside a character class, a dot in the pattern matches any one character in
797  the subject string except (by default) a character that signifies the end of a  the subject string except (by default) a character that signifies the end of a
# Line 754  The handling of dot is entirely independ Line 816  The handling of dot is entirely independ
816  dollar, the only relationship being that they both involve newlines. Dot has no  dollar, the only relationship being that they both involve newlines. Dot has no
817  special meaning in a character class.  special meaning in a character class.
818  </P>  </P>
819  <br><a name="SEC6" href="#TOC1">MATCHING A SINGLE BYTE</a><br>  <br><a name="SEC7" href="#TOC1">MATCHING A SINGLE BYTE</a><br>
820  <P>  <P>
821  Outside a character class, the escape sequence \C matches any one byte, both  Outside a character class, the escape sequence \C matches any one byte, both
822  in and out of UTF-8 mode. Unlike a dot, it always matches any line-ending  in and out of UTF-8 mode. Unlike a dot, it always matches any line-ending
# Line 769  PCRE does not allow \C to appear in look Line 831  PCRE does not allow \C to appear in look
831  because in UTF-8 mode this would make it impossible to calculate the length of  because in UTF-8 mode this would make it impossible to calculate the length of
832  the lookbehind.  the lookbehind.
833  <a name="characterclass"></a></P>  <a name="characterclass"></a></P>
834  <br><a name="SEC7" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>  <br><a name="SEC8" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>
835  <P>  <P>
836  An opening square bracket introduces a character class, terminated by a closing  An opening square bracket introduces a character class, terminated by a closing
837  square bracket. A closing square bracket on its own is not special. If a  square bracket. A closing square bracket on its own is not special. If a
# Line 864  introducing a POSIX class name - see the Line 926  introducing a POSIX class name - see the
926  closing square bracket. However, escaping other non-alphanumeric characters  closing square bracket. However, escaping other non-alphanumeric characters
927  does no harm.  does no harm.
928  </P>  </P>
929  <br><a name="SEC8" href="#TOC1">POSIX CHARACTER CLASSES</a><br>  <br><a name="SEC9" href="#TOC1">POSIX CHARACTER CLASSES</a><br>
930  <P>  <P>
931  Perl supports the POSIX notation for character classes. This uses names  Perl supports the POSIX notation for character classes. This uses names
932  enclosed by [: and :] within the enclosing square brackets. PCRE also supports  enclosed by [: and :] within the enclosing square brackets. PCRE also supports
# Line 910  supported, and an error is given if they Line 972  supported, and an error is given if they
972  In UTF-8 mode, characters with values greater than 128 do not match any of  In UTF-8 mode, characters with values greater than 128 do not match any of
973  the POSIX character classes.  the POSIX character classes.
974  </P>  </P>
975  <br><a name="SEC9" href="#TOC1">VERTICAL BAR</a><br>  <br><a name="SEC10" href="#TOC1">VERTICAL BAR</a><br>
976  <P>  <P>
977  Vertical bar characters are used to separate alternative patterns. For example,  Vertical bar characters are used to separate alternative patterns. For example,
978  the pattern  the pattern
# Line 925  that succeeds is used. If the alternativ Line 987  that succeeds is used. If the alternativ
987  "succeeds" means matching the rest of the main pattern as well as the  "succeeds" means matching the rest of the main pattern as well as the
988  alternative in the subpattern.  alternative in the subpattern.
989  </P>  </P>
990  <br><a name="SEC10" href="#TOC1">INTERNAL OPTION SETTING</a><br>  <br><a name="SEC11" href="#TOC1">INTERNAL OPTION SETTING</a><br>
991  <P>  <P>
992  The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and  The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
993  PCRE_EXTENDED options can be changed from within the pattern by a sequence of  PCRE_EXTENDED options (which are Perl-compatible) can be changed from within
994  Perl option letters enclosed between "(?" and ")". The option letters are  the pattern by a sequence of Perl option letters enclosed between "(?" and ")".
995    The option letters are
996  <pre>  <pre>
997    i  for PCRE_CASELESS    i  for PCRE_CASELESS
998    m  for PCRE_MULTILINE    m  for PCRE_MULTILINE
# Line 944  permitted. If a letter appears both befo Line 1007  permitted. If a letter appears both befo
1007  unset.  unset.
1008  </P>  </P>
1009  <P>  <P>
1010    The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA can be
1011    changed in the same way as the Perl-compatible options by using the characters
1012    J, U and X respectively.
1013    </P>
1014    <P>
1015  When an option change occurs at top level (that is, not inside subpattern  When an option change occurs at top level (that is, not inside subpattern
1016  parentheses), the change applies to the remainder of the pattern that follows.  parentheses), the change applies to the remainder of the pattern that follows.
1017  If the change is placed right at the start of a pattern, PCRE extracts it into  If the change is placed right at the start of a pattern, PCRE extracts it into
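The mapping from option letters to option names in this section, including the PCRE-specific J, U and X that the revision moves up, can be summarised as a small lookup. The following Python sketch is an illustration only: `parse_option_group` is a hypothetical helper, not part of PCRE, and it handles just the plain `(?letters-letters)` form, not option settings that carry a subpattern.

```python
# Option letters and the PCRE options they control.
OPTION_LETTERS = {
    "i": "PCRE_CASELESS",
    "m": "PCRE_MULTILINE",
    "s": "PCRE_DOTALL",
    "x": "PCRE_EXTENDED",
    "J": "PCRE_DUPNAMES",   # PCRE-specific
    "U": "PCRE_UNGREEDY",   # PCRE-specific
    "X": "PCRE_EXTRA",      # PCRE-specific
}


def parse_option_group(group):
    """Return (set_options, unset_options) for a group such as "(?im-sx)"."""
    if not (group.startswith("(?") and group.endswith(")")):
        raise ValueError("not an option-setting group: %r" % group)
    # Letters before the hyphen are set, letters after it are unset.
    setting, _, unsetting = group[2:-1].partition("-")
    to_unset = {OPTION_LETTERS[letter] for letter in unsetting}
    # A letter on both sides of the hyphen ends up unset.
    to_set = {OPTION_LETTERS[letter] for letter in setting} - to_unset
    return to_set, to_unset


for example in ("(?im-sx)", "(?U)", "(?i-i)"):
    to_set, to_unset = parse_option_group(example)
    print(example, sorted(to_set), sorted(to_unset))
```

An unknown letter raises `KeyError` in this sketch; how PCRE itself reports such a pattern is outside what the example models.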
# Line 967  matches "ab", "aB", "c", and "C", even t Line 1035  matches "ab", "aB", "c", and "C", even t
1035  branch is abandoned before the option setting. This is because the effects of  branch is abandoned before the option setting. This is because the effects of
1036  option settings happen at compile time. There would be some very weird  option settings happen at compile time. There would be some very weird
1037  behaviour otherwise.  behaviour otherwise.
 </P>  
 <P>  
 The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA can be  
 changed in the same way as the Perl-compatible options by using the characters  
 J, U and X respectively.  
1038  <a name="subpattern"></a></P>  <a name="subpattern"></a></P>
1039  <br><a name="SEC11" href="#TOC1">SUBPATTERNS</a><br>  <br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br>
1040  <P>  <P>
1041  Subpatterns are delimited by parentheses (round brackets), which can be nested.  Subpatterns are delimited by parentheses (round brackets), which can be nested.
1042  Turning part of a pattern into a subpattern does two things:  Turning part of a pattern into a subpattern does two things:
# Line 1027  from left to right, and options are not Line 1090  from left to right, and options are not
1090  is reached, an option setting in one branch does affect subsequent branches, so  is reached, an option setting in one branch does affect subsequent branches, so
1091  the above patterns match "SUNDAY" as well as "Saturday".  the above patterns match "SUNDAY" as well as "Saturday".
1092  </P>  </P>
1093  <br><a name="SEC12" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>  <br><a name="SEC13" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>
1094  <P>  <P>
1095  Perl 5.10 introduced a feature whereby each alternative in a subpattern uses  Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
1096  the same numbers for its capturing parentheses. Such a subpattern starts with  the same numbers for its capturing parentheses. Such a subpattern starts with
# Line 1058  the first one in the pattern with the gi Line 1121  the first one in the pattern with the gi
1121  An alternative approach to using this "branch reset" feature is to use  An alternative approach to using this "branch reset" feature is to use
1122  duplicate named subpatterns, as described in the next section.  duplicate named subpatterns, as described in the next section.
1123  </P>  </P>
1124  <br><a name="SEC13" href="#TOC1">NAMED SUBPATTERNS</a><br>  <br><a name="SEC14" href="#TOC1">NAMED SUBPATTERNS</a><br>
1125  <P>  <P>
1126  Identifying capturing parentheses by number is simple, but it can be very hard  Identifying capturing parentheses by number is simple, but it can be very hard
1127  to keep track of the numbers in complicated regular expressions. Furthermore,  to keep track of the numbers in complicated regular expressions. Furthermore,
# Line 1113  details of the interfaces for handling n Line 1176  details of the interfaces for handling n
1176  <a href="pcreapi.html"><b>pcreapi</b></a>  <a href="pcreapi.html"><b>pcreapi</b></a>
1177  documentation.  documentation.
1178  </P>  </P>
1179  <br><a name="SEC14" href="#TOC1">REPETITION</a><br>  <br><a name="SEC15" href="#TOC1">REPETITION</a><br>
1180  <P>  <P>
1181  Repetition is specified by quantifiers, which can follow any of the following  Repetition is specified by quantifiers, which can follow any of the following
1182  items:  items:
# Line 1264  example, after Line 1327  example, after
1327  </pre>  </pre>
1328  matches "aba" the value of the second captured substring is "b".  matches "aba" the value of the second captured substring is "b".
1329  <a name="atomicgroup"></a></P>  <a name="atomicgroup"></a></P>
1330  <br><a name="SEC15" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>  <br><a name="SEC16" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>
1331  <P>  <P>
1332  With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")  With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
1333  repetition, failure of what follows normally causes the repeated item to be  repetition, failure of what follows normally causes the repeated item to be
# Line 1368  an atomic group, like this: Line 1431  an atomic group, like this:
1431  </pre>  </pre>
1432  sequences of non-digits cannot be broken, and failure happens quickly.  sequences of non-digits cannot be broken, and failure happens quickly.
1433  <a name="backreferences"></a></P>  <a name="backreferences"></a></P>
1434  <br><a name="SEC16" href="#TOC1">BACK REFERENCES</a><br>  <br><a name="SEC17" href="#TOC1">BACK REFERENCES</a><br>
1435  <P>  <P>
1436  Outside a character class, a backslash followed by a digit greater than 0 (and  Outside a character class, a backslash followed by a digit greater than 0 (and
1437  possibly further digits) is a back reference to a capturing subpattern earlier  possibly further digits) is a back reference to a capturing subpattern earlier
# Line 1482  that the first iteration does not need t Line 1545  that the first iteration does not need t
1545  done using alternation, as in the example above, or by a quantifier with a  done using alternation, as in the example above, or by a quantifier with a
1546  minimum of zero.  minimum of zero.
1547  <a name="bigassertions"></a></P>  <a name="bigassertions"></a></P>
1548  <br><a name="SEC17" href="#TOC1">ASSERTIONS</a><br>  <br><a name="SEC18" href="#TOC1">ASSERTIONS</a><br>
1549  <P>  <P>
1550  An assertion is a test on the characters following or preceding the current  An assertion is a test on the characters following or preceding the current
1551  matching point that does not actually consume any characters. The simple  matching point that does not actually consume any characters. The simple
# Line 1642  preceded by "foo", while Line 1705  preceded by "foo", while
1705  is another pattern that matches "foo" preceded by three digits and any three  is another pattern that matches "foo" preceded by three digits and any three
1706  characters that are not "999".  characters that are not "999".
1707  <a name="conditions"></a></P>  <a name="conditions"></a></P>
1708  <br><a name="SEC18" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>  <br><a name="SEC19" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>
1709  <P>  <P>
1710  It is possible to cause the matching process to obey a subpattern  It is possible to cause the matching process to obey a subpattern
1711  conditionally or to choose between two alternative subpatterns, depending on  conditionally or to choose between two alternative subpatterns, depending on
# Line 1780  subject is matched against the first alt Line 1843  subject is matched against the first alt
1843  against the second. This pattern matches strings in one of the two forms  against the second. This pattern matches strings in one of the two forms
1844  dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.  dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
1845  <a name="comments"></a></P>  <a name="comments"></a></P>
1846  <br><a name="SEC19" href="#TOC1">COMMENTS</a><br>  <br><a name="SEC20" href="#TOC1">COMMENTS</a><br>
1847  <P>  <P>
1848  The sequence (?# marks the start of a comment that continues up to the next  The sequence (?# marks the start of a comment that continues up to the next
1849  closing parenthesis. Nested parentheses are not permitted. The characters  closing parenthesis. Nested parentheses are not permitted. The characters
# Line 1791  If the PCRE_EXTENDED option is set, an u Line 1854  If the PCRE_EXTENDED option is set, an u
1854  character class introduces a comment that continues to immediately after the  character class introduces a comment that continues to immediately after the
1855  next newline in the pattern.  next newline in the pattern.
1856  <a name="recursion"></a></P>  <a name="recursion"></a></P>
1857  <br><a name="SEC20" href="#TOC1">RECURSIVE PATTERNS</a><br>  <br><a name="SEC21" href="#TOC1">RECURSIVE PATTERNS</a><br>
1858  <P>  <P>
1859  Consider the problem of matching a string in parentheses, allowing for  Consider the problem of matching a string in parentheses, allowing for
1860  unlimited nested parentheses. Without the use of recursion, the best that can  unlimited nested parentheses. Without the use of recursion, the best that can
# Line 1921  In this pattern, (?(R) is the start of a Line 1984  In this pattern, (?(R) is the start of a
1984  different alternatives for the recursive and non-recursive cases. The (?R) item  different alternatives for the recursive and non-recursive cases. The (?R) item
1985  is the actual recursive call.  is the actual recursive call.
1986  <a name="subpatternsassubroutines"></a></P>  <a name="subpatternsassubroutines"></a></P>
1987  <br><a name="SEC21" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>  <br><a name="SEC22" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
1988  <P>  <P>
1989  If the syntax for a recursive subpattern reference (either by number or by  If the syntax for a recursive subpattern reference (either by number or by
1990  name) is used outside the parentheses to which it refers, it operates like a  name) is used outside the parentheses to which it refers, it operates like a
# Line 1961  changed for different calls. For example Line 2024  changed for different calls. For example
2024  It matches "abcabc". It does not match "abcABC" because the change of  It matches "abcabc". It does not match "abcABC" because the change of
2025  processing option does not affect the called subpattern.  processing option does not affect the called subpattern.
2026  </P>  </P>
2027  <br><a name="SEC22" href="#TOC1">CALLOUTS</a><br>  <br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
2028  <P>  <P>
2029  Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl  Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
2030  code to be obeyed in the middle of matching a regular expression. This makes it  code to be obeyed in the middle of matching a regular expression. This makes it
# Line 1996  description of the interface to the call Line 2059  description of the interface to the call
2059  <a href="pcrecallout.html"><b>pcrecallout</b></a>  <a href="pcrecallout.html"><b>pcrecallout</b></a>
2060  documentation.  documentation.
2061  </P>  </P>
2062  <br><a name="SEC23" href="#TOC1">BACTRACKING CONTROL</a><br>  <br><a name="SEC24" href="#TOC1">BACKTRACKING CONTROL</a><br>
2063  <P>  <P>
2064  Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which  Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which
2065  are described in the Perl documentation as "experimental and subject to change  are described in the Perl documentation as "experimental and subject to change
# Line 2111  the end of the group if FOO succeeds); o Line 2174  the end of the group if FOO succeeds); o
2174  second alternative and tries COND2, without backtracking into COND1. If (*THEN)  second alternative and tries COND2, without backtracking into COND1. If (*THEN)
2175  is used outside of any alternation, it acts exactly like (*PRUNE).  is used outside of any alternation, it acts exactly like (*PRUNE).
2176  </P>  </P>
2177  <br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>  <br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
2178  <P>  <P>
2179  <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3).  <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3).
2180  </P>  </P>
2181  <br><a name="SEC25" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
2182  <P>  <P>
2183  Philip Hazel  Philip Hazel
2184  <br>  <br>
# Line 2124  University Computing Service Line 2187  University Computing Service
2187  Cambridge CB2 3QH, England.  Cambridge CB2 3QH, England.
2188  <br>  <br>
2189  </P>  </P>
2190  <br><a name="SEC26" href="#TOC1">REVISION</a><br>  <br><a name="SEC27" href="#TOC1">REVISION</a><br>
2191  <P>  <P>
2192  Last updated: 09 August 2007  Last updated: 14 September 2007
2193  <br>  <br>
2194  Copyright &copy; 1997-2007 University of Cambridge.  Copyright &copy; 1997-2007 University of Cambridge.
2195  <br>  <br>
