Diff of /code/trunk/doc/pcre.txt

revision 708 by ph10, Fri Sep 23 11:03:03 2011 UTC revision 733 by ph10, Tue Oct 11 10:29:36 2011 UTC
# Line 3257  DIFFERENCES BETWEEN PCRE AND PERL Line 3257  DIFFERENCES BETWEEN PCRE AND PERL
3257         "callout"  feature allows an external function to be called during pat-         "callout"  feature allows an external function to be called during pat-
3258         tern matching. See the pcrecallout documentation for details.         tern matching. See the pcrecallout documentation for details.
3259    
3260         10. Subpatterns that are called recursively  or  as  "subroutines"  are         10. Subpatterns that are called as subroutines (whether or  not  recur-
3261         always  treated  as  atomic  groups  in  PCRE. This is like Python, but         sively)  are  always  treated  as  atomic  groups in PCRE. This is like
3262         unlike Perl. There is a discussion of an example that explains this  in         Python, but unlike Perl.  Captured values that are set outside  a  sub-
3263         more  detail  in  the section on recursion differences from Perl in the         routine  call  can  be referenced from inside in PCRE, but not in Perl.
3264         pcrepattern page.         There is a discussion that explains these differences in more detail in
3265           the section on recursion differences from Perl in the pcrepattern page.
3266         11. There are some differences that are concerned with the settings  of  
3267         captured  strings  when  part  of  a  pattern is repeated. For example,         11.  If  (*THEN)  is present in a group that is called as a subroutine,
3268         matching "aba" against the  pattern  /^(a(b)?)+$/  in  Perl  leaves  $2         its action is limited to that group, even if the group does not contain
3269           any | characters.
3270    
3271           12.  There are some differences that are concerned with the settings of
3272           captured strings when part of  a  pattern  is  repeated.  For  example,
3273           matching  "aba"  against  the  pattern  /^(a(b)?)+$/  in Perl leaves $2
3274         unset, but in PCRE it is set to "b".         unset, but in PCRE it is set to "b".
3275    
3276         12.  PCRE's handling of duplicate subpattern numbers and duplicate sub-         13. PCRE's handling of duplicate subpattern numbers and duplicate  sub-
3277         pattern names is not as general as Perl's. This is a consequence of the         pattern names is not as general as Perl's. This is a consequence of the
3278         fact that PCRE works internally just with numbers, using an external ta-         fact that PCRE works internally just with numbers, using an external ta-
3279         ble to translate between numbers and names. In  particular,  a  pattern         ble  to  translate  between numbers and names. In particular, a pattern
3280         such  as  (?|(?<a>A)|(?<b>B)),  where the two capturing parentheses have         such as (?|(?<a>A)|(?<b>B)), where the two  capturing  parentheses  have
3281         the same number but different names, is not supported,  and  causes  an         the  same  number  but different names, is not supported, and causes an
3282         error  at compile time. If it were allowed, it would not be possible to         error at compile time. If it were allowed, it would not be possible  to
3283         distinguish which parentheses matched, because both names map  to  cap-         distinguish  which  parentheses matched, because both names map to cap-
3284         turing subpattern number 1. To avoid this confusing situation, an error         turing subpattern number 1. To avoid this confusing situation, an error
3285         is given at compile time.         is given at compile time.
3286    
3287         13. Perl recognizes comments in some places that  PCRE  does  not,  for         14.  Perl  recognizes  comments  in some places that PCRE does not, for
3288         example,  between  the  ( and ? at the start of a subpattern. If the /x         example, between the ( and ? at the start of a subpattern.  If  the  /x
3289         modifier is set, Perl allows whitespace between ( and ? but PCRE  never         modifier  is set, Perl allows whitespace between ( and ? but PCRE never
3290         does, even if the PCRE_EXTENDED option is set.         does, even if the PCRE_EXTENDED option is set.
3291    
3292         14. PCRE provides some extensions to the Perl regular expression facil-         15. PCRE provides some extensions to the Perl regular expression facil-
3293         ities.  Perl 5.10 includes new features that are not  in  earlier  ver-         ities.   Perl  5.10  includes new features that are not in earlier ver-
3294         sions  of  Perl, some of which (such as named parentheses) have been in         sions of Perl, some of which (such as named parentheses) have  been  in
3295         PCRE for some time. This list is with respect to Perl 5.10:         PCRE for some time. This list is with respect to Perl 5.10:
3296    
3297         (a) Although lookbehind assertions in  PCRE  must  match  fixed  length         (a)  Although  lookbehind  assertions  in  PCRE must match fixed length
3298         strings,  each alternative branch of a lookbehind assertion can match a         strings, each alternative branch of a lookbehind assertion can match  a
3299         different length of string. Perl requires them all  to  have  the  same         different  length  of  string.  Perl requires them all to have the same
3300         length.         length.
3301    
3302         (b)  If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $         (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the  $
3303         meta-character matches only at the very end of the string.         meta-character matches only at the very end of the string.
3304    
3305         (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-         (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
3306         cial meaning is faulted. Otherwise, like Perl, the backslash is quietly         cial meaning is faulted. Otherwise, like Perl, the backslash is quietly
3307         ignored.  (Perl can be made to issue a warning.)         ignored.  (Perl can be made to issue a warning.)
3308    
3309         (d) If PCRE_UNGREEDY is set, the greediness of the  repetition  quanti-         (d)  If  PCRE_UNGREEDY is set, the greediness of the repetition quanti-
3310         fiers is inverted, that is, by default they are not greedy, but if fol-         fiers is inverted, that is, by default they are not greedy, but if fol-
3311         lowed by a question mark they are.         lowed by a question mark they are.
3312    
# Line 3309  DIFFERENCES BETWEEN PCRE AND PERL Line 3314  DIFFERENCES BETWEEN PCRE AND PERL
3314         tried only at the first matching position in the subject string.         tried only at the first matching position in the subject string.
3315    
3316         (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,         (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
3317         and PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no  Perl  equiva-         and  PCRE_NO_AUTO_CAPTURE  options for pcre_exec() have no Perl equiva-
3318         lents.         lents.
3319    
3320         (g)  The  \R escape sequence can be restricted to match only CR, LF, or         (g) The \R escape sequence can be restricted to match only CR,  LF,  or
3321         CRLF by the PCRE_BSR_ANYCRLF option.         CRLF by the PCRE_BSR_ANYCRLF option.
3322    
3323         (h) The callout facility is PCRE-specific.         (h) The callout facility is PCRE-specific.
# Line 3320  DIFFERENCES BETWEEN PCRE AND PERL Line 3325  DIFFERENCES BETWEEN PCRE AND PERL
3325         (i) The partial matching facility is PCRE-specific.         (i) The partial matching facility is PCRE-specific.
3326    
3327         (j) Patterns compiled by PCRE can be saved and re-used at a later time,         (j) Patterns compiled by PCRE can be saved and re-used at a later time,
3328         even on different hosts that have the other endianness.         even on different hosts that have the other endianness.  However,  this
3329           does not apply to optimized data created by the just-in-time compiler.
3330    
3331         (k)  The  alternative  matching function (pcre_dfa_exec()) matches in a         (k)  The  alternative  matching function (pcre_dfa_exec()) matches in a
3332         different way and is not Perl-compatible.         different way and is not Perl-compatible.
# Line 3339  AUTHOR Line 3345  AUTHOR
3345    
3346  REVISION  REVISION
3347    
3348         Last updated: 24 August 2011         Last updated: 09 October 2011
3349         Copyright (c) 1997-2011 University of Cambridge.         Copyright (c) 1997-2011 University of Cambridge.
3350  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3351    
# Line 4469  DUPLICATE SUBPATTERN NUMBERS Line 4475  DUPLICATE SUBPATTERN NUMBERS
4475    
4476           /(?|(abc)|(def))\1/           /(?|(abc)|(def))\1/
4477    
4478         In contrast, a recursive or "subroutine" call to a numbered  subpattern         In contrast, a subroutine call to a numbered subpattern  always  refers
4479         always  refers  to  the first one in the pattern with the given number.         to  the  first  one in the pattern with the given number. The following
4480         The following pattern matches "abcabc" or "defabc":         pattern matches "abcabc" or "defabc":
4481    
4482           /(?|(abc)|(def))(?1)/           /(?|(abc)|(def))(?1)/
4483    
# Line 4567  REPETITION Line 4573  REPETITION
4573           a character class           a character class
4574           a back reference (see next section)           a back reference (see next section)
4575           a parenthesized subpattern (including assertions)           a parenthesized subpattern (including assertions)
4576           a recursive or "subroutine" call to a subpattern           a subroutine call to a subpattern (recursive or otherwise)
4577    
4578         The  general repetition quantifier specifies a minimum and maximum num-         The  general repetition quantifier specifies a minimum and maximum num-
4579         ber of permitted matches, by giving the two numbers in  curly  brackets         ber of permitted matches, by giving the two numbers in  curly  brackets
# Line 5213  CONDITIONAL SUBPATTERNS Line 5219  CONDITIONAL SUBPATTERNS
5219         with  the  name  DEFINE,  the  condition is always false. In this case,         with  the  name  DEFINE,  the  condition is always false. In this case,
5220         there may be only one alternative  in  the  subpattern.  It  is  always         there may be only one alternative  in  the  subpattern.  It  is  always
5221         skipped  if  control  reaches  this  point  in the pattern; the idea of         skipped  if  control  reaches  this  point  in the pattern; the idea of
5222         DEFINE is that it can be used to define "subroutines" that can be  ref-         DEFINE is that it can be used to define subroutines that can be  refer-
5223         erenced  from elsewhere. (The use of "subroutines" is described below.)         enced  from elsewhere. (The use of subroutines is described below.) For
5224         For  example,  a  pattern  to   match   an   IPv4   address   such   as         example, a pattern to match an IPv4 address  such  as  "192.168.23.245"
5225         "192.168.23.245" could be written like this (ignore whitespace and line         could be written like this (ignore whitespace and line breaks):
        breaks):  
5226    
5227           (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )           (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
5228           \b (?&byte) (\.(?&byte)){3} \b           \b (?&byte) (\.(?&byte)){3} \b
5229    
5230         The first part of the pattern is a DEFINE group inside which another         The  first part of the pattern is a DEFINE group inside which another
5231         group  named "byte" is defined. This matches an individual component of         group named "byte" is defined. This matches an individual component  of
5232         an IPv4 address (a number less than 256). When  matching  takes  place,         an  IPv4  address  (a number less than 256). When matching takes place,
5233         this  part  of  the pattern is skipped because DEFINE acts like a false         this part of the pattern is skipped because DEFINE acts  like  a  false
5234         condition. The rest of the pattern uses references to the  named  group         condition.  The  rest of the pattern uses references to the named group
5235         to  match the four dot-separated components of an IPv4 address, insist-         to match the four dot-separated components of an IPv4 address,  insist-
5236         ing on a word boundary at each end.         ing on a word boundary at each end.
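
The IPv4 example above can be approximated outside PCRE. The following is a minimal sketch in Python, whose re module does not support (?(DEFINE)...) or (?&name); it substitutes ordinary string concatenation for the named "byte" subroutine. The names BYTE and IPV4 are illustrative only, and a non-capturing group replaces the capturing group used in the PCRE pattern.

```python
import re

# Assumption: Python's re has no DEFINE groups or subroutine calls, so the
# "byte" subpattern is reused by string concatenation instead.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# \b byte (\. byte){3} \b, as in the PCRE pattern above.
IPV4 = re.compile(r"\b" + BYTE + r"(?:\." + BYTE + r"){3}\b")

found = IPV4.search("host 192.168.23.245 up")
rejected = IPV4.search("256.1.1.1")
```

Here found.group(0) is the address itself, and rejected is None because 256 is not a valid component.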
5237    
5238     Assertion conditions     Assertion conditions
5239    
5240         If the condition is not in any of the above  formats,  it  must  be  an         If  the  condition  is  not  in any of the above formats, it must be an
5241         assertion.   This may be a positive or negative lookahead or lookbehind         assertion.  This may be a positive or negative lookahead or  lookbehind
5242         assertion. Consider  this  pattern,  again  containing  non-significant         assertion.  Consider  this  pattern,  again  containing non-significant
5243         white space, and with the two alternatives on the second line:         white space, and with the two alternatives on the second line:
5244    
5245           (?(?=[^a-z]*[a-z])           (?(?=[^a-z]*[a-z])
5246           \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )           \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
5247    
5248         The  condition  is  a  positive  lookahead  assertion  that  matches an         The condition  is  a  positive  lookahead  assertion  that  matches  an
5249         optional sequence of non-letters followed by a letter. In other  words,         optional  sequence of non-letters followed by a letter. In other words,
5250         it  tests  for the presence of at least one letter in the subject. If a         it tests for the presence of at least one letter in the subject.  If  a
5251         letter is found, the subject is matched against the first  alternative;         letter  is found, the subject is matched against the first alternative;
5252         otherwise  it  is  matched  against  the  second.  This pattern matches         otherwise it is  matched  against  the  second.  This  pattern  matches
5253         strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are         strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
5254         letters and dd are digits.         letters and dd are digits.
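
The effect of the assertion condition can be sketched in an engine that lacks this syntax. Python's re module accepts only group references as conditions, so the sketch below emulates the conditional with a positive lookahead on one branch and the matching negative lookahead on the other. This is an emulation under that assumption, not how PCRE evaluates the pattern, and the names COND, WITH_LETTERS, DIGITS_ONLY and DATE are hypothetical.

```python
import re

COND = r"[^a-z]*[a-z]"                  # "a letter occurs somewhere ahead"
WITH_LETTERS = r"\d{2}-[a-z]{3}-\d{2}"  # dd-aaa-dd
DIGITS_ONLY = r"\d{2}-\d{2}-\d{2}"      # dd-dd-dd

# (?(?=COND) A | B) emulated as (?=COND)A | (?!COND)B
DATE = re.compile(
    "(?=" + COND + ")" + WITH_LETTERS
    + "|(?!" + COND + ")" + DIGITS_ONLY
)

letters_form = DATE.fullmatch("12-abc-34")
digits_form = DATE.fullmatch("12-34-56")
bad_form = DATE.fullmatch("12-ab-34")
```

The first two subjects match one branch each; the third contains a letter, so only the first branch is eligible, and it fails because three letters are required.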
5255    
5256    
# Line 5254  COMMENTS Line 5259  COMMENTS
5259         There are two ways of including comments in patterns that are processed         There are two ways of including comments in patterns that are processed
5260         by PCRE. In both cases, the start of the comment must not be in a char-         by PCRE. In both cases, the start of the comment must not be in a char-
5261         acter class, nor in the middle of any other sequence of related charac-         acter class, nor in the middle of any other sequence of related charac-
5262         ters such as (?: or a subpattern name or number.  The  characters  that         ters  such  as  (?: or a subpattern name or number. The characters that
5263         make up a comment play no part in the pattern matching.         make up a comment play no part in the pattern matching.
5264    
5265         The  sequence (?# marks the start of a comment that continues up to the         The sequence (?# marks the start of a comment that continues up to  the
5266         next closing parenthesis. Nested parentheses are not permitted. If  the         next  closing parenthesis. Nested parentheses are not permitted. If the
5267         PCRE_EXTENDED option is set, an unescaped # character also introduces a         PCRE_EXTENDED option is set, an unescaped # character also introduces a
5268         comment, which in this case continues to  immediately  after  the  next         comment,  which  in  this  case continues to immediately after the next
5269         newline  character  or character sequence in the pattern. Which charac-         newline character or character sequence in the pattern.  Which  charac-
5270         ters are interpreted as newlines is controlled by the options passed to         ters are interpreted as newlines is controlled by the options passed to
5271         pcre_compile() or by a special sequence at the start of the pattern, as         pcre_compile() or by a special sequence at the start of the pattern, as
5272         described in the section entitled  "Newline  conventions"  above.  Note         described  in  the  section  entitled "Newline conventions" above. Note
5273         that  the  end of this type of comment is a literal newline sequence in         that the end of this type of comment is a literal newline  sequence  in
5274         the pattern; escape sequences that happen to represent a newline do not         the pattern; escape sequences that happen to represent a newline do not
5275         count.  For  example,  consider this pattern when PCRE_EXTENDED is set,         count. For example, consider this pattern when  PCRE_EXTENDED  is  set,
5276         and the default newline convention is in force:         and the default newline convention is in force:
5277    
5278           abc #comment \n still comment           abc #comment \n still comment
5279    
5280         On encountering the # character, pcre_compile()  skips  along,  looking         On  encountering  the  # character, pcre_compile() skips along, looking
5281         for  a newline in the pattern. The sequence \n is still literal at this         for a newline in the pattern. The sequence \n is still literal at  this
5282         stage, so it does not terminate the comment. Only an  actual  character         stage,  so  it does not terminate the comment. Only an actual character
5283         with the code value 0x0a (the default newline) does so.         with the code value 0x0a (the default newline) does so.
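
A similar rule can be observed in Python's re module with the re.VERBOSE flag, which is used here only as an analogy and is not PCRE: a # comment ends at a real newline character in the pattern string, not at the two-character escape \n. The variable names are illustrative.

```python
import re

# Raw string: backslash followed by "n" is two characters, so the comment
# runs to the end of the pattern and only "abc" remains.
escaped = re.compile(r"abc #comment \n still comment", re.VERBOSE)

# Non-raw string: "\n" is a real newline (0x0a), which ends the comment,
# so "still" becomes part of the pattern.
real_newline = re.compile("abc #comment \n still", re.VERBOSE)

escaped_matches_abc = escaped.fullmatch("abc") is not None
newline_matches_abcstill = real_newline.fullmatch("abcstill") is not None
```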
5284    
5285    
5286  RECURSIVE PATTERNS  RECURSIVE PATTERNS
5287    
5288         Consider  the problem of matching a string in parentheses, allowing for         Consider the problem of matching a string in parentheses, allowing  for
5289         unlimited nested parentheses. Without the use of  recursion,  the  best         unlimited  nested  parentheses.  Without the use of recursion, the best
5290         that  can  be  done  is  to use a pattern that matches up to some fixed         that can be done is to use a pattern that  matches  up  to  some  fixed
5291         depth of nesting. It is not possible to  handle  an  arbitrary  nesting         depth  of  nesting.  It  is not possible to handle an arbitrary nesting
5292         depth.         depth.
5293    
5294         For some time, Perl has provided a facility that allows regular expres-         For some time, Perl has provided a facility that allows regular expres-
5295         sions to recurse (amongst other things). It does this by  interpolating         sions  to recurse (amongst other things). It does this by interpolating
5296         Perl  code in the expression at run time, and the code can refer to the         Perl code in the expression at run time, and the code can refer to  the
5297         expression itself. A Perl pattern using code interpolation to solve the         expression itself. A Perl pattern using code interpolation to solve the
5298         parentheses problem can be created like this:         parentheses problem can be created like this:
5299    
# Line 5298  RECURSIVE PATTERNS Line 5303  RECURSIVE PATTERNS
5303         refers recursively to the pattern in which it appears.         refers recursively to the pattern in which it appears.
5304    
5305         Obviously, PCRE cannot support the interpolation of Perl code. Instead,         Obviously, PCRE cannot support the interpolation of Perl code. Instead,
5306         it  supports  special  syntax  for recursion of the entire pattern, and         it supports special syntax for recursion of  the  entire  pattern,  and
5307         also for individual subpattern recursion.  After  its  introduction  in         also  for  individual  subpattern  recursion. After its introduction in
5308         PCRE  and  Python,  this  kind of recursion was subsequently introduced         PCRE and Python, this kind of  recursion  was  subsequently  introduced
5309         into Perl at release 5.10.         into Perl at release 5.10.
5310    
5311         A special item that consists of (? followed by a  number  greater  than         A  special  item  that consists of (? followed by a number greater than
5312         zero and a closing parenthesis is a recursive call of the subpattern of         zero and a closing parenthesis is a recursive subroutine  call  of  the
5313         the given number, provided that it occurs inside that  subpattern.  (If         subpattern  of  the  given  number, provided that it occurs inside that
5314         not,  it  is  a  "subroutine" call, which is described in the next sec-         subpattern. (If not, it is a non-recursive subroutine  call,  which  is
5315         tion.) The special item (?R) or (?0) is a recursive call of the  entire         described  in  the  next  section.)  The special item (?R) or (?0) is a
5316         regular expression.         recursive call of the entire regular expression.
5317    
5318         This  PCRE  pattern  solves  the nested parentheses problem (assume the         This PCRE pattern solves the nested  parentheses  problem  (assume  the
5319         PCRE_EXTENDED option is set so that white space is ignored):         PCRE_EXTENDED option is set so that white space is ignored):
5320    
5321           \( ( [^()]++ | (?R) )* \)           \( ( [^()]++ | (?R) )* \)
5322    
5323         First it matches an opening parenthesis. Then it matches any number  of         First  it matches an opening parenthesis. Then it matches any number of
5324         substrings  which  can  either  be  a sequence of non-parentheses, or a         substrings which can either be a  sequence  of  non-parentheses,  or  a
5325         recursive match of the pattern itself (that is, a  correctly  parenthe-         recursive  match  of the pattern itself (that is, a correctly parenthe-
5326         sized substring).  Finally there is a closing parenthesis. Note the use         sized substring).  Finally there is a closing parenthesis. Note the use
5327         of a possessive quantifier to avoid backtracking into sequences of non-         of a possessive quantifier to avoid backtracking into sequences of non-
5328         parentheses.         parentheses.
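
What the recursive pattern accepts can be sketched as a small hand-written matcher. This is not PCRE code; it is a hypothetical Python function, match_parens, that mirrors the structure of \( ( [^()]++ | (?R) )* \): an opening parenthesis, then any number of non-parenthesis characters or nested balanced groups, then a closing parenthesis.

```python
def match_parens(s, i=0):
    """Return the index just past a balanced group starting at s[i], or -1."""
    if i >= len(s) or s[i] != "(":
        return -1
    i += 1                              # consumed \(
    while i < len(s) and s[i] != ")":
        if s[i] == "(":
            i = match_parens(s, i)      # plays the role of (?R)
            if i == -1:
                return -1
        else:
            i += 1                      # one character of a [^()]++ run
    # Either we are at the closing parenthesis or we ran out of input.
    return i + 1 if i < len(s) else -1


balanced_end = match_parens("(ab(cd)ef)")
unbalanced_end = match_parens("(a(b)")
```

Because the two alternatives never compete for the same character, the matcher needs no backtracking, which is the property the possessive quantifier gives the pattern.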
5329    
5330         If  this  were  part of a larger pattern, you would not want to recurse         If this were part of a larger pattern, you would not  want  to  recurse
5331         the entire pattern, so instead you could use this:         the entire pattern, so instead you could use this:
5332    
5333           ( \( ( [^()]++ | (?1) )* \) )           ( \( ( [^()]++ | (?1) )* \) )
5334    
5335         We have put the pattern into parentheses, and caused the  recursion  to         We  have  put the pattern into parentheses, and caused the recursion to
5336         refer to them instead of the whole pattern.         refer to them instead of the whole pattern.
5337    
5338         In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be         In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
5339         tricky. This is made easier by the use of relative references.  Instead         tricky.  This is made easier by the use of relative references. Instead
5340         of (?1) in the pattern above you can write (?-2) to refer to the second         of (?1) in the pattern above you can write (?-2) to refer to the second
5341         most recently opened parentheses  preceding  the  recursion.  In  other         most  recently  opened  parentheses  preceding  the recursion. In other
5342         words,  a  negative  number counts capturing parentheses leftwards from         words, a negative number counts capturing  parentheses  leftwards  from
5343         the point at which it is encountered.         the point at which it is encountered.
5344    
5345         It is also possible to refer to  subsequently  opened  parentheses,  by         It  is  also  possible  to refer to subsequently opened parentheses, by
5346         writing  references  such  as (?+2). However, these cannot be recursive         writing references such as (?+2). However, these  cannot  be  recursive
5347         because the reference is not inside the  parentheses  that  are  refer-         because  the  reference  is  not inside the parentheses that are refer-
5348         enced.  They  are  always  "subroutine" calls, as described in the next         enced. They are always non-recursive subroutine calls, as described  in
5349         section.         the next section.
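
The arithmetic behind relative references can be sketched as follows. The helper resolve_relative is hypothetical and is not part of any PCRE API; it assumes the caller already knows how many capturing parentheses have been opened to the left of the reference.

```python
def resolve_relative(ref, opened):
    """Turn a relative reference such as "-2" or "+2" into a group number.

    opened is the count of capturing parentheses opened to the left of the
    reference; "-1" names the most recently opened one, "+1" the next one.
    """
    offset = int(ref)
    if ref.startswith("-"):
        target = opened + offset + 1
    elif ref.startswith("+"):
        target = opened + offset
    else:
        raise ValueError("not a relative reference: " + ref)
    if target < 1:
        raise ValueError("reference points before the first group")
    return target


# In ( \( ( [^()]++ | (?-2) )* \) ) two groups are open at the reference.
backward = resolve_relative("-2", 2)
forward = resolve_relative("+2", 2)
```

With two groups opened, "-2" resolves to group 1, which is why (?-2) can stand in for (?1) in the pattern above.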
5350    
5351         An alternative approach is to use named parentheses instead.  The  Perl         An  alternative  approach is to use named parentheses instead. The Perl
5352         syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also         syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
5353         supported. We could rewrite the above example as follows:         supported. We could rewrite the above example as follows:
5354    
5355           (?<pn> \( ( [^()]++ | (?&pn) )* \) )           (?<pn> \( ( [^()]++ | (?&pn) )* \) )
5356    
5357         If there is more than one subpattern with the same name,  the  earliest         If  there  is more than one subpattern with the same name, the earliest
5358         one is used.         one is used.
5359    
5360         This  particular  example pattern that we have been looking at contains         This particular example pattern that we have been looking  at  contains
5361         nested unlimited repeats, and so the use of a possessive quantifier for         nested unlimited repeats, and so the use of a possessive quantifier for
5362         matching strings of non-parentheses is important when applying the pat-         matching strings of non-parentheses is important when applying the pat-
5363         tern to strings that do not match. For example, when  this  pattern  is         tern  to  strings  that do not match. For example, when this pattern is
5364         applied to         applied to
5365    
5366           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
5367    
5368         it  yields  "no  match" quickly. However, if a possessive quantifier is         it yields "no match" quickly. However, if a  possessive  quantifier  is
5369         not used, the match runs for a very long time indeed because there  are         not  used, the match runs for a very long time indeed because there are
5370         so  many  different  ways the + and * repeats can carve up the subject,         so many different ways the + and * repeats can carve  up  the  subject,
5371         and all have to be tested before failure can be reported.         and all have to be tested before failure can be reported.
5372    
5373         At the end of a match, the values of capturing  parentheses  are  those         At  the  end  of a match, the values of capturing parentheses are those
5374         from  the outermost level. If you want to obtain intermediate values, a         from the outermost level. If you want to obtain intermediate values,  a
5375         callout function can be used (see below and the pcrecallout  documenta-         callout  function can be used (see below and the pcrecallout documenta-
5376         tion). If the pattern above is matched against         tion). If the pattern above is matched against
5377    
5378           (ab(cd)ef)           (ab(cd)ef)
5379    
5380         the  value  for  the  inner capturing parentheses (numbered 2) is "ef",         the value for the inner capturing parentheses  (numbered  2)  is  "ef",
5381         which is the last value taken on at the top level. If a capturing  sub-         which  is the last value taken on at the top level. If a capturing sub-
5382         pattern is not matched at the top level, its final value is unset, even         pattern is not matched at the top level, its final  captured  value  is
5383         if it is (temporarily) set at a deeper level.         unset,  even  if  it was (temporarily) set at a deeper level during the
5384           matching process.
5385    
5386         If there are more than 15 capturing parentheses in a pattern, PCRE  has         If there are more than 15 capturing parentheses in a pattern, PCRE  has
5387         to  obtain extra memory to store data during a recursion, which it does         to  obtain extra memory to store data during a recursion, which it does
# Line 5394  RECURSIVE PATTERNS Line 5400  RECURSIVE PATTERNS
5400         two different alternatives for the recursive and  non-recursive  cases.         two different alternatives for the recursive and  non-recursive  cases.
5401         The (?R) item is the actual recursive call.         The (?R) item is the actual recursive call.
5402    
5403     Recursion difference from Perl     Differences in recursion processing between PCRE and Perl
5404    
5405         In  PCRE (like Python, but unlike Perl), a recursive subpattern call is         Recursion  processing  in PCRE differs from Perl in two important ways.
5406           In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
5407         always treated as an atomic group. That is, once it has matched some of         always treated as an atomic group. That is, once it has matched some of
5408         the subject string, it is never re-entered, even if it contains untried         the subject string, it is never re-entered, even if it contains untried
5409         alternatives and there is a subsequent matching failure.  This  can  be         alternatives  and  there  is a subsequent matching failure. This can be
5410         illustrated  by the following pattern, which purports to match a palin-         illustrated by the following pattern, which purports to match a  palin-
5411         dromic string that contains an odd number of characters  (for  example,         dromic  string  that contains an odd number of characters (for example,
5412         "a", "aba", "abcba", "abcdcba"):         "a", "aba", "abcba", "abcdcba"):
5413    
5414           ^(.|(.)(?1)\2)$           ^(.|(.)(?1)\2)$
5415    
5416         The idea is that it either matches a single character, or two identical         The idea is that it either matches a single character, or two identical
5417         characters surrounding a sub-palindrome. In Perl, this  pattern  works;         characters  surrounding  a sub-palindrome. In Perl, this pattern works;
5418         in  PCRE  it  does  not if the subject is longer than three characters.         in PCRE it does not if the subject is  longer  than  three  characters.
5419         Consider the subject string "abcba":         Consider the subject string "abcba":
5420    
5421         At the top level, the first character is matched, but as it is  not  at         At  the  top level, the first character is matched, but as it is not at
5422         the end of the string, the first alternative fails; the second alterna-         the end of the string, the first alternative fails; the second alterna-
5423         tive is taken and the recursion kicks in. The recursive call to subpat-         tive is taken and the recursion kicks in. The recursive call to subpat-
5424         tern  1  successfully  matches the next character ("b"). (Note that the         tern 1 successfully matches the next character ("b").  (Note  that  the
5425         beginning and end of line tests are not part of the recursion).         beginning and end of line tests are not part of the recursion).
5426    
5427         Back at the top level, the next character ("c") is compared  with  what         Back  at  the top level, the next character ("c") is compared with what
5428         subpattern  2 matched, which was "a". This fails. Because the recursion         subpattern 2 matched, which was "a". This fails. Because the  recursion
5429         is treated as an atomic group, there are now  no  backtracking  points,         is  treated  as  an atomic group, there are now no backtracking points,
5430         and  so  the  entire  match fails. (Perl is able, at this point, to re-         and so the entire match fails. (Perl is able, at  this  point,  to  re-
5431         enter the recursion and try the second alternative.)  However,  if  the         enter  the  recursion  and try the second alternative.) However, if the
5432         pattern is written with the alternatives in the other order, things are         pattern is written with the alternatives in the other order, things are
5433         different:         different:
5434    
5435           ^((.)(?1)\2|.)$           ^((.)(?1)\2|.)$
5436    
5437         This time, the recursing alternative is tried first, and  continues  to         This  time,  the recursing alternative is tried first, and continues to
5438         recurse  until  it runs out of characters, at which point the recursion         recurse until it runs out of characters, at which point  the  recursion
5439         fails. But this time we do have  another  alternative  to  try  at  the         fails.  But  this  time  we  do  have another alternative to try at the
5440         higher  level.  That  is  the  big difference: in the previous case the         higher level. That is the big difference:  in  the  previous  case  the
5441         remaining alternative is at a deeper recursion level, which PCRE cannot         remaining alternative is at a deeper recursion level, which PCRE cannot
5442         use.         use.
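The walk-through above can be reproduced with a small hand-written model. This is only a sketch: it does not call PCRE or Perl, and the names group1_ends and matches, as well as the recursing_first and atomic flags, are invented for illustration. It assumes that an atomic call keeps only the first way the called group can match, and that a non-atomic call leaves every way available for backtracking.

```python
from itertools import islice


def group1_ends(subject, start, recursing_first, atomic):
    # Yield, in the order a backtracking matcher would try them, each
    # position at which group 1 can finish when it starts at `start`.
    #   recursing_first=False models (.|(.)(?1)\2)
    #   recursing_first=True  models ((.)(?1)\2|.)
    if start >= len(subject):
        return  # both alternatives need at least one character
    if not recursing_first:
        yield start + 1  # the "." alternative
    inner = group1_ends(subject, start + 1, recursing_first, atomic)
    if atomic:
        inner = islice(inner, 1)  # atomic call: only its first match is kept
    for end in inner:
        # \2 must repeat the character that (.) took at this level
        if end < len(subject) and subject[end] == subject[start]:
            yield end + 1
    if recursing_first:
        yield start + 1  # the "." alternative


def matches(subject, recursing_first, atomic):
    # The top-level group is not a call, so every one of its results is
    # available; "$" accepts only a result at the end of the subject.
    ends = group1_ends(subject, 0, recursing_first, atomic)
    return any(end == len(subject) for end in ends)


print(matches("abcba", recursing_first=False, atomic=True))   # False
print(matches("abcba", recursing_first=False, atomic=False))  # True
print(matches("abcba", recursing_first=True, atomic=True))    # True
```

With the single-character alternative first, the atomic model rejects "abcba" while the backtracking model accepts it; with the recursing alternative first, the atomic model accepts it as well.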
5443    
5444         To  change  the pattern so that it matches all palindromic strings, not         To change the pattern so that it matches all palindromic  strings,  not
5445         just those with an odd number of characters, it is tempting  to  change         just  those  with an odd number of characters, it is tempting to change
5446         the pattern to this:         the pattern to this:
5447    
5448           ^((.)(?1)\2|.?)$           ^((.)(?1)\2|.?)$
5449    
5450         Again,  this  works  in Perl, but not in PCRE, and for the same reason.         Again, this works in Perl, but not in PCRE, and for  the  same  reason.
5451         When a deeper recursion has matched a single character,  it  cannot  be         When  a  deeper  recursion has matched a single character, it cannot be
5452         entered  again  in  order  to match an empty string. The solution is to         entered again in order to match an empty string.  The  solution  is  to
5453         separate the two cases, and write out the odd and even cases as  alter-         separate  the two cases, and write out the odd and even cases as alter-
5454         natives at the higher level:         natives at the higher level:
5455    
5456           ^(?:((.)(?1)\2|)|((.)(?3)\4|.))           ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
5457    
5458         If  you  want  to match typical palindromic phrases, the pattern has to         If you want to match typical palindromic phrases, the  pattern  has  to
5459         ignore all non-word characters, which can be done like this:         ignore all non-word characters, which can be done like this:
5460    
5461           ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$           ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
5462    
5463         If run with the PCRE_CASELESS option, this pattern matches phrases such         If run with the PCRE_CASELESS option, this pattern matches phrases such
5464         as "A man, a plan, a canal: Panama!" and it works well in both PCRE and         as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
5465         Perl. Note the use of the possessive quantifier *+ to avoid  backtrack-         Perl.  Note the use of the possessive quantifier *+ to avoid backtrack-
5466         ing  into  sequences of non-word characters. Without this, PCRE takes a         ing into sequences of non-word characters. Without this, PCRE  takes  a
5467         great deal longer (ten times or more) to  match  typical  phrases,  and         great  deal  longer  (ten  times or more) to match typical phrases, and
5468         Perl takes so long that you think it has gone into a loop.         Perl takes so long that you think it has gone into a loop.
5469    
5470         WARNING:  The  palindrome-matching patterns above work only if the sub-         WARNING: The palindrome-matching patterns above work only if  the  sub-
5471         ject string does not start with a palindrome that is shorter  than  the         ject  string  does not start with a palindrome that is shorter than the
5472         entire  string.  For example, although "abcba" is correctly matched, if         entire string.  For example, although "abcba" is correctly matched,  if
5473         the subject is "ababa", PCRE finds the palindrome "aba" at  the  start,         the  subject  is "ababa", PCRE finds the palindrome "aba" at the start,
5474         then  fails at top level because the end of the string does not follow.         then fails at top level because the end of the string does not  follow.
5475         Once again, it cannot jump back into the recursion to try other  alter-         Once  again, it cannot jump back into the recursion to try other alter-
5476         natives, so the entire match fails.         natives, so the entire match fails.
5477    
5478           The second way in which PCRE and Perl differ in  their  recursion  pro-
5479           cessing  is in the handling of captured values. In Perl, when a subpat-
5480           tern is called recursively or as a subroutine (see the  next  section),
5481           it  has  no  access to any values that were captured outside the recur-
5482           sion, whereas in PCRE these values can  be  referenced.  Consider  this
5483           pattern:
5484    
5485             ^(.)(\1|a(?2))
5486    
5487           In  PCRE,  this  pattern matches "bab". The first capturing parentheses
5488           match "b", then in the second group, when the back reference  \1  fails
5489           to  match "b", the second alternative matches "a" and then recurses. In
5490           the recursion, \1 does now match "b" and so the whole  match  succeeds.
5491           In  Perl,  the pattern fails to match because inside the recursive call
5492           \1 cannot access the externally set value.
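The difference in capture visibility can be sketched in the same spirit. The model below is an assumption-laden illustration, not either engine: group2, matches and the outer_visible flag are hypothetical names, and an inaccessible \1 is modelled simply as an unset capture that cannot match.

```python
def group2(subject, pos, cap1, outer_visible):
    # Model of group 2 in ^(.)(\1|a(?2)): return the position where the
    # group finishes, or None if it cannot match at `pos`.
    # `cap1` is the value of capture 1 as seen by this invocation.
    if cap1 is not None and subject.startswith(cap1, pos):
        return pos + len(cap1)  # first alternative: \1
    if subject.startswith("a", pos):
        # second alternative: "a" followed by the call (?2)
        inner_cap1 = cap1 if outer_visible else None
        return group2(subject, pos + 1, inner_cap1, outer_visible)
    return None


def matches(subject, outer_visible):
    if not subject:
        return False  # (.) needs one character
    # ^(.) takes the first character; group 2 starts right after it.
    return group2(subject, 1, subject[0], outer_visible) is not None


print(matches("bab", outer_visible=True))   # True: \1 is usable inside the call
print(matches("bab", outer_visible=False))  # False: \1 is unset inside the call
```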
5493    
5494    
5495  SUBPATTERNS AS SUBROUTINES  SUBPATTERNS AS SUBROUTINES
5496    
5497         If the syntax for a recursive subpattern reference (either by number or         If the syntax for a recursive subpattern call (either by number  or  by
5498         by name) is used outside the parentheses to which it refers,  it  oper-         name)  is  used outside the parentheses to which it refers, it operates
5499         ates  like a subroutine in a programming language. The "called" subpat-         like a subroutine in a programming language. The called subpattern  may
5500         tern may be defined before or after the reference. A numbered reference         be  defined  before or after the reference. A numbered reference can be
5501         can be absolute or relative, as in these examples:         absolute or relative, as in these examples:
5502    
5503           (...(absolute)...)...(?2)...           (...(absolute)...)...(?2)...
5504           (...(relative)...)...(?-1)...           (...(relative)...)...(?-1)...
# Line 5485  SUBPATTERNS AS SUBROUTINES Line 5508  SUBPATTERNS AS SUBROUTINES
5508    
5509           (sens|respons)e and \1ibility           (sens|respons)e and \1ibility
5510    
5511         matches  "sense and sensibility" and "response and responsibility", but         matches "sense and sensibility" and "response and responsibility",  but
5512         not "sense and responsibility". If instead the pattern         not "sense and responsibility". If instead the pattern
5513    
5514           (sens|respons)e and (?1)ibility           (sens|respons)e and (?1)ibility
5515    
5516         is used, it does match "sense and responsibility" as well as the  other         is  used, it does match "sense and responsibility" as well as the other
5517         two  strings.  Another  example  is  given  in the discussion of DEFINE         two strings. Another example is  given  in  the  discussion  of  DEFINE
5518         above.         above.
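The contrast between a back reference and a subroutine call can be tried with Python's standard re module, with one caveat: re has no (?1) syntax. The sketch below therefore emulates the call by writing the group's text out a second time as a non-capturing group, which has the same effect for this particular pattern. The variable names are illustrative only.

```python
import re

# Back reference: \1 must repeat the text that group 1 actually matched.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Emulated subroutine call: the subpattern itself is run again, so it is
# free to match a different alternative the second time.
call_like = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")

for subject in ("sense and sensibility",
                "response and responsibility",
                "sense and responsibility"):
    print(subject,
          bool(backref.fullmatch(subject)),
          bool(call_like.fullmatch(subject)))
```

Only the last subject distinguishes the two: the back reference version rejects it and the call-like version accepts it.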
5519    
5520         Like recursive subpatterns, a subroutine call is always treated  as  an         All  subroutine  calls, whether recursive or not, are always treated as
5521         atomic  group. That is, once it has matched some of the subject string,         atomic groups. That is, once a subroutine has matched some of the  sub-
5522         it is never re-entered, even if it contains  untried  alternatives  and         ject string, it is never re-entered, even if it contains untried alter-
5523         there  is a subsequent matching failure. Any capturing parentheses that         natives and there is  a  subsequent  matching  failure.  Any  capturing
5524         are set during the subroutine call  revert  to  their  previous  values         parentheses  that  are  set  during the subroutine call revert to their
5525         afterwards.         previous values afterwards.
5526    
5527         When  a  subpattern is used as a subroutine, processing options such as         Processing options such as case-independence are fixed when  a  subpat-
5528         case-independence are fixed when the subpattern is defined. They cannot         tern  is defined, so if it is used as a subroutine, such options cannot
5529         be changed for different calls. For example, consider this pattern:         be changed for different calls. For example, consider this pattern:
5530    
5531           (abc)(?i:(?-1))           (abc)(?i:(?-1))
5532    
5533         It  matches  "abcabc". It does not match "abcABC" because the change of         It matches "abcabc". It does not match "abcABC" because the  change  of
5534         processing option does not affect the called subpattern.         processing option does not affect the called subpattern.
5535    
5536    
5537  ONIGURUMA SUBROUTINE SYNTAX  ONIGURUMA SUBROUTINE SYNTAX
5538    
5539         For compatibility with Oniguruma, the non-Perl syntax \g followed by  a         For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
5540         name or a number enclosed either in angle brackets or single quotes, is         name or a number enclosed either in angle brackets or single quotes, is
5541         an alternative syntax for referencing a  subpattern  as  a  subroutine,         an  alternative  syntax  for  referencing a subpattern as a subroutine,
5542         possibly  recursively. Here are two of the examples used above, rewrit-         possibly recursively. Here are two of the examples used above,  rewrit-
5543         ten using this syntax:         ten using this syntax:
5544    
5545           (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )           (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
5546           (sens|respons)e and \g'1'ibility           (sens|respons)e and \g'1'ibility
5547    
5548         PCRE supports an extension to Oniguruma: if a number is preceded  by  a         PCRE  supports  an extension to Oniguruma: if a number is preceded by a
5549         plus or a minus sign it is taken as a relative reference. For example:         plus or a minus sign it is taken as a relative reference. For example:
5550    
5551           (abc)(?i:\g<-1>)           (abc)(?i:\g<-1>)
5552    
5553         Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not         Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
5554         synonymous. The former is a back reference; the latter is a  subroutine         synonymous.  The former is a back reference; the latter is a subroutine
5555         call.         call.
5556    
5557    
5558  CALLOUTS  CALLOUTS
5559    
5560         Perl has a feature whereby using the sequence (?{...}) causes arbitrary         Perl has a feature whereby using the sequence (?{...}) causes arbitrary
5561         Perl code to be obeyed in the middle of matching a regular  expression.         Perl  code to be obeyed in the middle of matching a regular expression.
5562         This makes it possible, amongst other things, to extract different sub-         This makes it possible, amongst other things, to extract different sub-
5563         strings that match the same pair of parentheses when there is a repeti-         strings that match the same pair of parentheses when there is a repeti-
5564         tion.         tion.
5565    
5566         PCRE provides a similar feature, but of course it cannot obey arbitrary         PCRE provides a similar feature, but of course it cannot obey arbitrary
5567         Perl code. The feature is called "callout". The caller of PCRE provides         Perl code. The feature is called "callout". The caller of PCRE provides
5568         an  external function by putting its entry point in the global variable         an external function by putting its entry point in the global  variable
5569         pcre_callout.  By default, this variable contains NULL, which  disables         pcre_callout.   By default, this variable contains NULL, which disables
5570         all calling out.         all calling out.
5571    
5572         Within  a  regular  expression,  (?C) indicates the points at which the         Within a regular expression, (?C) indicates the  points  at  which  the
5573         external function is to be called. If you want  to  identify  different         external  function  is  to be called. If you want to identify different
5574         callout  points, you can put a number less than 256 after the letter C.         callout points, you can put a number less than 256 after the letter  C.
5575         The default value is zero.  For example, this pattern has  two  callout         The  default  value is zero.  For example, this pattern has two callout
5576         points:         points:
5577    
5578           (?C1)abc(?C2)def           (?C1)abc(?C2)def
5579    
5580         If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are         If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
5581         automatically installed before each item in the pattern. They  are  all         automatically  installed  before each item in the pattern. They are all
5582         numbered 255.         numbered 255.
5583    
5584         During matching, when PCRE reaches a callout point (and pcre_callout is         During matching, when PCRE reaches a callout point (and pcre_callout is
5585         set), the external function is called. It is provided with  the  number         set),  the  external function is called. It is provided with the number
5586         of  the callout, the position in the pattern, and, optionally, one item         of the callout, the position in the pattern, and, optionally, one  item
5587         of data originally supplied by the caller of pcre_exec().  The  callout         of  data  originally supplied by the caller of pcre_exec(). The callout
5588         function  may cause matching to proceed, to backtrack, or to fail alto-         function may cause matching to proceed, to backtrack, or to fail  alto-
5589         gether. A complete description of the interface to the callout function         gether. A complete description of the interface to the callout function
5590         is given in the pcrecallout documentation.         is given in the pcrecallout documentation.
5591    
5592    
5593  BACKTRACKING CONTROL  BACKTRACKING CONTROL
5594    
5595         Perl  5.10 introduced a number of "Special Backtracking Control Verbs",         Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
5596         which are described in the Perl documentation as "experimental and sub-         which are described in the Perl documentation as "experimental and sub-
5597         ject  to  change or removal in a future version of Perl". It goes on to         ject to change or removal in a future version of Perl". It goes  on  to
5598         say: "Their usage in production code should be noted to avoid  problems         say:  "Their usage in production code should be noted to avoid problems
5599         during upgrades." The same remarks apply to the PCRE features described         during upgrades." The same remarks apply to the PCRE features described
5600         in this section.         in this section.
5601    
5602         Since these verbs are specifically related  to  backtracking,  most  of         Since  these  verbs  are  specifically related to backtracking, most of
5603         them  can  be  used  only  when  the  pattern  is  to  be matched using         them can be  used  only  when  the  pattern  is  to  be  matched  using
5604         pcre_exec(), which uses a backtracking algorithm. With the exception of         pcre_exec(), which uses a backtracking algorithm. With the exception of
5605         (*FAIL), which behaves like a failing negative assertion, they cause an         (*FAIL), which behaves like a failing negative assertion, they cause an
5606         error if encountered by pcre_dfa_exec().         error if encountered by pcre_dfa_exec().
5607    
5608         If any of these verbs are used in an assertion or subroutine subpattern         If  any of these verbs are used in an assertion or in a subpattern that
5609         (including  recursive  subpatterns),  their  effect is confined to that         is called as a subroutine (whether or not recursively), their effect is
5610         subpattern; it does not extend to the  surrounding  pattern,  with  one         confined to that subpattern; it does not extend to the surrounding pat-
5611         exception:  a  *MARK  that  is  encountered  in a positive assertion is         tern, with one exception: a *MARK that is  encountered  in  a  positive
5612         passed back (compare capturing parentheses in  assertions).  Note  that         assertion is passed back (compare capturing parentheses in assertions).
5613         such  subpatterns are processed as anchored at the point where they are         Note that such subpatterns are processed as anchored at the point where
5614         tested.         they are tested. Note also that Perl's treatment of subroutines is dif-
5615           ferent in some cases.
5616    
5617         The new verbs make use of what was previously invalid syntax: an  open-         The new verbs make use of what was previously invalid syntax: an  open-
5618         ing parenthesis followed by an asterisk. They are generally of the form         ing parenthesis followed by an asterisk. They are generally of the form
5619         (*VERB) or (*VERB:NAME). Some may take either form, with differing  be-         (*VERB) or (*VERB:NAME). Some may take either form, with differing  be-
5620         haviour, depending on whether or not an argument is present. An name is         haviour,  depending on whether or not an argument is present. A name is
5621         a sequence of letters, digits, and underscores. If the name  is  empty,         any sequence of characters that does not include a closing parenthesis.
5622         that  is, if the closing parenthesis immediately follows the colon, the         If  the  name is empty, that is, if the closing parenthesis immediately
5623         effect is as if the colon were not there. Any number of these verbs may         follows the colon, the effect is as if the colon were  not  there.  Any
5624         occur in a pattern.         number of these verbs may occur in a pattern.
5625    
5626         PCRE  contains some optimizations that are used to speed up matching by         PCRE  contains some optimizations that are used to speed up matching by
5627         running some checks at the start of each match attempt. For example, it         running some checks at the start of each match attempt. For example, it
# Line 5616  BACKTRACKING CONTROL Line 5640  BACKTRACKING CONTROL
5640            (*ACCEPT)            (*ACCEPT)
5641    
5642         This  verb causes the match to end successfully, skipping the remainder         This  verb causes the match to end successfully, skipping the remainder
5643         of the pattern. When inside a recursion, only the innermost pattern  is         of the pattern. However, when it is inside a subpattern that is  called
5644         ended  immediately.  If  (*ACCEPT) is inside capturing parentheses, the         as  a  subroutine, only that subpattern is ended successfully. Matching
5645         data so far is captured. (This feature was added  to  PCRE  at  release         then continues at the outer level. If  (*ACCEPT)  is  inside  capturing
5646         8.00.) For example:         parentheses, the data so far is captured. For example:
5647    
5648           A((?:A|B(*ACCEPT)|C)D)           A((?:A|B(*ACCEPT)|C)D)
5649    
# Line 5628  BACKTRACKING CONTROL Line 5652  BACKTRACKING CONTROL
5652    
5653           (*FAIL) or (*F)           (*FAIL) or (*F)
5654    
5655         This verb causes the match to fail, forcing backtracking to  occur.  It         This verb causes a matching failure, forcing backtracking to occur.  It
5656         is  equivalent to (?!) but easier to read. The Perl documentation notes         is  equivalent to (?!) but easier to read. The Perl documentation notes
5657         that it is probably useful only when combined  with  (?{})  or  (??{}).         that it is probably useful only when combined  with  (?{})  or  (??{}).
5658         Those  are,  of course, Perl features that are not present in PCRE. The         Those  are,  of course, Perl features that are not present in PCRE. The
# Line 5674  BACKTRACKING CONTROL Line 5698  BACKTRACKING CONTROL
5698    
5699         If (*MARK) is encountered in a positive assertion, its name is recorded         If (*MARK) is encountered in a positive assertion, its name is recorded
5700         and passed back if it is the last-encountered. This does not happen for         and passed back if it is the last-encountered. This does not happen for
5701         negative assetions.         negative assertions.
5702    
5703         A name may also be returned after a failed  match  if  the  final  path         A name may also be returned after a failed  match  if  the  final  path
5704         through  the  pattern involves (*MARK). However, unless (*MARK) used in         through  the  pattern involves (*MARK). However, unless (*MARK) used in
# Line 5795  BACKTRACKING CONTROL Line 5819  BACKTRACKING CONTROL
5819         is found, the "bumpalong" advance is to the subject position that  cor-         is found, the "bumpalong" advance is to the subject position that  cor-
5820         responds  to  that (*MARK) instead of to where (*SKIP) was encountered.         responds  to  that (*MARK) instead of to where (*SKIP) was encountered.
5821         If no (*MARK) with a matching name is found, normal "bumpalong" of  one         If no (*MARK) with a matching name is found, normal "bumpalong" of  one
5822         character happens (the (*SKIP) is ignored).         character happens (that is, the (*SKIP) is ignored).
5823    
5824           (*THEN) or (*THEN:NAME)           (*THEN) or (*THEN:NAME)
5825    
5826         This  verb  causes  a  skip  to  the  next alternation in the innermost         This  verb  causes a skip to the next innermost alternative if the rest
5827         enclosing group if the rest of the pattern does not match. That is,  it         of the pattern does not match. That is, it cancels  pending  backtrack-
5828         cancels  pending backtracking, but only within the current alternation.         ing,  but  only within the current alternative. Its name comes from the
5829         Its name comes from the observation that it can be used for a  pattern-         observation that it can be used for a pattern-based if-then-else block:
        based if-then-else block:  
5830    
5831           ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...           ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
5832    
5833         If  the COND1 pattern matches, FOO is tried (and possibly further items         If the COND1 pattern matches, FOO is tried (and possibly further  items
5834         after the end of the group if FOO succeeds);  on  failure  the  matcher         after  the  end  of the group if FOO succeeds); on failure, the matcher
5835         skips  to  the second alternative and tries COND2, without backtracking         skips to the second alternative and tries COND2,  without  backtracking
5836         into COND1. The behaviour  of  (*THEN:NAME)  is  exactly  the  same  as         into  COND1.  The  behaviour  of  (*THEN:NAME)  is  exactly the same as
5837         (*MARK:NAME)(*THEN)  if  the  overall  match  fails.  If (*THEN) is not         (*MARK:NAME)(*THEN) if the overall  match  fails.  If  (*THEN)  is  not
5838         directly inside an alternation, it acts like (*PRUNE).         inside an alternation, it acts like (*PRUNE).
5839    
5840         The above verbs provide four different "strengths" of control when sub-         Note  that  a  subpattern that does not contain a | character is just a
5841         sequent  matching  fails. (*THEN) is the weakest, carrying on the match         part of the enclosing alternative; it is not a nested alternation  with
5842         at the next alternation. (*PRUNE) comes next, failing the match at  the         only  one alternative. The effect of (*THEN) extends beyond such a sub-
5843         current  starting position, but allowing an advance to the next charac-         pattern to the enclosing alternative. Consider this pattern,  where  A,
5844         ter (for an unanchored pattern). (*SKIP) is similar,  except  that  the         B, etc. are complex pattern fragments that do not contain any | charac-
5845         advance  may  be  more  than one character. (*COMMIT) is the strongest,         ters at this level:
5846    
5847             A (B(*THEN)C) | D
5848    
5849           If A and B are matched, but there is a failure in C, matching does  not
5850           backtrack into A; instead it moves to the next alternative, that is, D.
5851           However, if the subpattern containing (*THEN) is given an  alternative,
5852           it behaves differently:
5853    
5854             A (B(*THEN)C | (*FAIL)) | D
5855    
5856           The  effect of (*THEN) is now confined to the inner subpattern. After a
5857           failure in C, matching moves to (*FAIL), which causes the whole subpat-
5858           tern  to  fail  because  there are no more alternatives to try. In this
5859           case, matching does now backtrack into A.
5860    
5861           Note also that a conditional subpattern is not considered as having two
5862           alternatives,  because  only  one  is  ever used. In other words, the |
5863           character in a conditional subpattern has a different meaning. Ignoring
5864           white space, consider:
5865    
5866             ^.*? (?(?=a) a | b(*THEN)c )
5867    
5868           If  the  subject  is  "ba", this pattern does not match. Because .*? is
5869           ungreedy, it initially matches zero  characters.  The  condition  (?=a)
5870           then  fails,  the  character  "b"  is  matched, but "c" is not. At this
5871           point, matching does not backtrack to .*? as might perhaps be  expected
5872           from  the  presence  of  the | character. The conditional subpattern is
5873           part of the single alternative that comprises the whole pattern, and so
5874           the  match  fails.  (If there were a backtrack into .*?, allowing it to
5875           match "b", the match would succeed.)
5876    
5877           The verbs just described provide four different "strengths" of  control
5878           when subsequent matching fails. (*THEN) is the weakest, carrying on the
5879           match at the next alternative. (*PRUNE) comes next, failing  the  match
5880           at  the  current starting position, but allowing an advance to the next
5881           character (for an unanchored pattern). (*SKIP) is similar, except  that
5882           the advance may be more than one character. (*COMMIT) is the strongest,
5883         causing the entire match to fail.         causing the entire match to fail.
5884    
5885         If more than one is present in a pattern, the "stongest" one wins.  For         If more than one such verb is present in a pattern, the "strongest" one
5886         example,  consider  this  pattern, where A, B, etc. are complex pattern         wins.  For example, consider this pattern, where A, B, etc. are complex
5887         fragments:         pattern fragments:
5888    
5889           (A(*COMMIT)B(*THEN)C|D)           (A(*COMMIT)B(*THEN)C|D)
5890    
5891         Once A has matched, PCRE is committed to this  match,  at  the  current         Once A has matched, PCRE is committed to this  match,  at  the  current
5892         starting  position. If subsequently B matches, but C does not, the nor-         starting  position. If subsequently B matches, but C does not, the nor-
5893         mal (*THEN) action of trying the next alternation (that is, D) does not         mal (*THEN) action of trying the next alternative (that is, D) does not
5894         happen because (*COMMIT) overrides.         happen because (*COMMIT) overrides.
5895    
5896    
# Line 5848  AUTHOR Line 5908  AUTHOR
5908    
5909  REVISION  REVISION
5910    
5911         Last updated: 24 August 2011         Last updated: 09 October 2011
5912         Copyright (c) 1997-2011 University of Cambridge.         Copyright (c) 1997-2011 University of Cambridge.
5913  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
5914    
# Line 6410  AVAILABILITY OF JIT SUPPORT Line 6470  AVAILABILITY OF JIT SUPPORT
6470           ARM v5, v7, and Thumb2           ARM v5, v7, and Thumb2
6471           Intel x86 32-bit and 64-bit           Intel x86 32-bit and 64-bit
6472           MIPS 32-bit           MIPS 32-bit
6473           Power PC 32-bit and 64-bit           Power PC 32-bit and 64-bit (experimental)
6474    
6475         If --enable-jit is set on an unsupported platform, compilation fails.         The Power PC support is designated as experimental because it  has  not
6476           been  fully  tested. If --enable-jit is set on an unsupported platform,
6477           compilation fails.
6478    
6479         A program can tell if JIT support is available by calling pcre_config()         A program can tell if JIT support is available by calling pcre_config()
6480         with the PCRE_CONFIG_JIT option. The result is 1 when JIT is available,         with the PCRE_CONFIG_JIT option. The result is 1 when JIT is available,
# Line 6629  AUTHOR Line 6691  AUTHOR
6691    
6692  REVISION  REVISION
6693    
6694         Last updated: 23 September 2011         Last updated: 05 October 2011
6695         Copyright (c) 1997-2011 University of Cambridge.         Copyright (c) 1997-2011 University of Cambridge.
6696  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
6697    

