Diff of /code/trunk/doc/pcre.txt

revision 453 by ph10, Fri Sep 18 19:12:35 2009 UTC revision 454 by ph10, Tue Sep 22 09:42:11 2009 UTC
# Line 1156  COMPILING A PATTERN Line 1156  COMPILING A PATTERN
1156         if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and         if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
1157         sets the variable pointed to by errptr to point to a textual error mes-         sets the variable pointed to by errptr to point to a textual error mes-
1158         sage. This is a static string that is part of the library. You must not         sage. This is a static string that is part of the library. You must not
1159         try to free it. The offset from the start of the pattern to the charac-         try to free it. The byte offset from the start of the  pattern  to  the
1160         ter where the error was discovered is placed in the variable pointed to         character  that  was  being  processed when the error was discovered is
1161         by erroffset, which must not be NULL. If it is, an immediate  error  is         placed in the variable pointed to by erroffset, which must not be NULL.
1162         given.         If  it  is,  an  immediate error is given. Some errors are not detected
1163           until checks are carried out when the whole pattern has  been  scanned;
1164           in this case the offset is set to the end of the pattern.
1165    
1166         If  pcre_compile2()  is  used instead of pcre_compile(), and the error-         If  pcre_compile2()  is  used instead of pcre_compile(), and the error-
1167         codeptr argument is not NULL, a non-zero error code number is  returned         codeptr argument is not NULL, a non-zero error code number is  returned
# Line 2666  AUTHOR Line 2668  AUTHOR
2668    
2669  REVISION  REVISION
2670    
2671         Last updated: 11 September 2009         Last updated: 22 September 2009
2672         Copyright (c) 1997-2009 University of Cambridge.         Copyright (c) 1997-2009 University of Cambridge.
2673  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2674    
# Line 4483  ASSERTIONS Line 4485  ASSERTIONS
4485    
4486         causes  an  error at compile time. Branches that match different length         causes  an  error at compile time. Branches that match different length
4487         strings are permitted only at the top level of a lookbehind  assertion.         strings are permitted only at the top level of a lookbehind  assertion.
4488         This  is  an  extension  compared  with  Perl (at least for 5.8), which         This  is an extension compared with Perl (5.8 and 5.10), which requires
4489         requires all branches to match the same length of string. An  assertion         all branches to match the same length of string. An assertion such as
        such as  
4490    
4491           (?<=ab(c|de))           (?<=ab(c|de))
4492    
4493         is  not  permitted,  because  its single top-level branch can match two         is not permitted, because its single top-level  branch  can  match  two
4494         different lengths, but it is acceptable if rewritten to  use  two  top-         different lengths, but it is acceptable to PCRE if rewritten to use two
4495         level branches:         top-level branches:
4496    
4497           (?<=abc|abde)           (?<=abc|abde)
4498    
4499         In some cases, the Perl 5.10 escape sequence \K (see above) can be used         In some cases, the Perl 5.10 escape sequence \K (see above) can be used
4500         instead of a lookbehind assertion; this is not restricted to  a  fixed-         instead  of  a  lookbehind  assertion  to  get  round  the fixed-length
4501         length.         restriction.
4502    
4503         The  implementation  of lookbehind assertions is, for each alternative,         The implementation of lookbehind assertions is, for  each  alternative,
4504         to temporarily move the current position back by the fixed  length  and         to  temporarily  move the current position back by the fixed length and
4505         then try to match. If there are insufficient characters before the cur-         then try to match. If there are insufficient characters before the cur-
4506         rent position, the assertion fails.         rent position, the assertion fails.
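The "move back by the fixed length, then try to match" step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not PCRE's implementation: the function name lookbehind is hypothetical, and each alternative is assumed to be a literal string so that its length is trivially fixed.

```python
def lookbehind(subject, pos, alternatives):
    """Return True if any fixed-length alternative ends exactly at pos.

    Hypothetical helper: alternatives are plain literal strings here,
    standing in for the fixed-length top-level branches of a lookbehind.
    """
    for alt in alternatives:
        start = pos - len(alt)   # move back by this alternative's fixed length
        if start < 0:
            continue             # insufficient characters before pos
        if subject[start:pos] == alt:
            return True          # this alternative matches, assertion succeeds
    return False                 # no alternative matched, assertion fails


subject = "xxabdeZ"
print(lookbehind(subject, 6, ["abc", "abde"]))  # position just before "Z"
print(lookbehind(subject, 2, ["abc", "abde"]))  # too close to the start
```

Each alternative is tried with its own length, which is why branches of different lengths are workable at the top level of the assertion.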
4507    
4508         PCRE does not allow the \C escape (which matches a single byte in UTF-8         PCRE does not allow the \C escape (which matches a single byte in UTF-8
4509         mode)  to appear in lookbehind assertions, because it makes it impossi-         mode) to appear in lookbehind assertions, because it makes it  impossi-
4510         ble to calculate the length of the lookbehind. The \X and  \R  escapes,         ble  to  calculate the length of the lookbehind. The \X and \R escapes,
4511         which can match different numbers of bytes, are also not permitted.         which can match different numbers of bytes, are also not permitted.
4512    
4513         Possessive  quantifiers  can  be  used  in  conjunction with lookbehind         "Subroutine" calls (see below) such as (?2) or (?&X) are  permitted  in
4514         assertions to specify efficient matching at  the  end  of  the  subject         lookbehinds,  as  long as the subpattern matches a fixed-length string.
4515           Recursion, however, is not supported.
4516    
4517           Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
4518           assertions  to  specify  efficient  matching  at the end of the subject
4519         string. Consider a simple pattern such as         string. Consider a simple pattern such as
4520    
4521           abcd$           abcd$
4522    
4523         when  applied  to  a  long string that does not match. Because matching         when applied to a long string that does  not  match.  Because  matching
4524         proceeds from left to right, PCRE will look for each "a" in the subject         proceeds from left to right, PCRE will look for each "a" in the subject
4525         and  then  see  if what follows matches the rest of the pattern. If the         and then see if what follows matches the rest of the  pattern.  If  the
4526         pattern is specified as         pattern is specified as
4527    
4528           ^.*abcd$           ^.*abcd$
4529    
4530         the initial .* matches the entire string at first, but when this  fails         the  initial .* matches the entire string at first, but when this fails
4531         (because there is no following "a"), it backtracks to match all but the         (because there is no following "a"), it backtracks to match all but the
4532         last character, then all but the last two characters, and so  on.  Once         last  character,  then all but the last two characters, and so on. Once
4533         again  the search for "a" covers the entire string, from right to left,         again the search for "a" covers the entire string, from right to  left,
4534         so we are no better off. However, if the pattern is written as         so we are no better off. However, if the pattern is written as
4535    
4536           ^.*+(?<=abcd)           ^.*+(?<=abcd)
4537    
4538         there can be no backtracking for the .*+ item; it can  match  only  the         there  can  be  no backtracking for the .*+ item; it can match only the
4539         entire  string.  The subsequent lookbehind assertion does a single test         entire string. The subsequent lookbehind assertion does a  single  test
4540         on the last four characters. If it fails, the match fails  immediately.         on  the last four characters. If it fails, the match fails immediately.
4541         For  long  strings, this approach makes a significant difference to the         For long strings, this approach makes a significant difference  to  the
4542         processing time.         processing time.
4543    
4544     Using multiple assertions     Using multiple assertions
# Line 4542  ASSERTIONS Line 4547  ASSERTIONS
4547    
4548           (?<=\d{3})(?<!999)foo           (?<=\d{3})(?<!999)foo
4549    
4550         matches "foo" preceded by three digits that are not "999". Notice  that         matches  "foo" preceded by three digits that are not "999". Notice that
4551         each  of  the  assertions is applied independently at the same point in         each of the assertions is applied independently at the  same  point  in
4552         the subject string. First there is a  check  that  the  previous  three         the  subject  string.  First  there  is a check that the previous three
4553         characters  are  all  digits,  and  then there is a check that the same         characters are all digits, and then there is  a  check  that  the  same
4554         three characters are not "999".  This pattern does not match "foo" pre-         three characters are not "999".  This pattern does not match "foo" pre-
4555         ceded  by  six  characters,  the first of which are digits and the last         ceded by six characters, the first of which are  digits  and  the  last
4556         three of which are not "999". For example, it  doesn't  match  "123abc-         three  of  which  are not "999". For example, it doesn't match "123abc-
4557         foo". A pattern to do that is         foo". A pattern to do that is
4558    
4559           (?<=\d{3}...)(?<!999)foo           (?<=\d{3}...)(?<!999)foo
4560    
4561         This  time  the  first assertion looks at the preceding six characters,         This time the first assertion looks at the  preceding  six  characters,
4562         checking that the first three are digits, and then the second assertion         checking that the first three are digits, and then the second assertion
4563         checks that the preceding three characters are not "999".         checks that the preceding three characters are not "999".
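The two patterns above can be tried out with a short script. The sketch below uses Python's re module rather than PCRE itself, on the assumption that these particular fixed-length lookbehinds behave the same way in both engines; the variable and function names are made up for the example.

```python
import re

# Assumption: Python's re engine stands in for PCRE for these two patterns.
three_digits = re.compile(r"(?<=\d{3})(?<!999)foo")
six_chars = re.compile(r"(?<=\d{3}...)(?<!999)foo")


def found(pattern, subject):
    """Return True if the pattern matches somewhere in the subject."""
    return pattern.search(subject) is not None


for subject in ("123foo", "999foo", "123abcfoo", "123999foo"):
    print(subject, found(three_digits, subject), found(six_chars, subject))
```

Both assertions in each pattern are tested at the same point, immediately before "foo", which is why "123abcfoo" is rejected by the first pattern but accepted by the second.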
4564    
# Line 4561  ASSERTIONS Line 4566  ASSERTIONS
4566    
4567           (?<=(?<!foo)bar)baz           (?<=(?<!foo)bar)baz
4568    
4569         matches  an occurrence of "baz" that is preceded by "bar" which in turn         matches an occurrence of "baz" that is preceded by "bar" which in  turn
4570         is not preceded by "foo", while         is not preceded by "foo", while
4571    
4572           (?<=\d{3}(?!999)...)foo           (?<=\d{3}(?!999)...)foo
4573    
4574         is another pattern that matches "foo" preceded by three digits and  any         is  another pattern that matches "foo" preceded by three digits and any
4575         three characters that are not "999".         three characters that are not "999".
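The nested forms can be checked in the same way. Again this is a sketch using Python's re module as a stand-in, assuming it treats these nested assertions as PCRE does; the names are hypothetical.

```python
import re

# Assumption: Python's re engine accepts these nested assertions like PCRE.
bar_not_after_foo = re.compile(r"(?<=(?<!foo)bar)baz")
digits_then_not_999 = re.compile(r"(?<=\d{3}(?!999)...)foo")


def found(pattern, subject):
    """Return True if the pattern matches somewhere in the subject."""
    return pattern.search(subject) is not None


print(found(bar_not_after_foo, "xbarbaz"))      # "bar" is not preceded by "foo"
print(found(bar_not_after_foo, "foobarbaz"))    # "bar" is preceded by "foo"
print(found(digits_then_not_999, "123abcfoo"))  # last three are not "999"
print(found(digits_then_not_999, "123999foo"))  # last three are "999"
```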
4576    
4577    
4578  CONDITIONAL SUBPATTERNS  CONDITIONAL SUBPATTERNS
4579    
4580         It  is possible to cause the matching process to obey a subpattern con-         It is possible to cause the matching process to obey a subpattern  con-
4581         ditionally or to choose between two alternative subpatterns,  depending         ditionally  or to choose between two alternative subpatterns, depending
4582         on  the result of an assertion, or whether a previous capturing subpat-         on the result of an assertion, or whether a previous capturing  subpat-
4583         tern matched or not. The two possible forms of  conditional  subpattern         tern  matched  or not. The two possible forms of conditional subpattern
4584         are         are
4585    
4586           (?(condition)yes-pattern)           (?(condition)yes-pattern)
4587           (?(condition)yes-pattern|no-pattern)           (?(condition)yes-pattern|no-pattern)
4588    
4589         If  the  condition is satisfied, the yes-pattern is used; otherwise the         If the condition is satisfied, the yes-pattern is used;  otherwise  the
4590         no-pattern (if present) is used. If there are more  than  two  alterna-         no-pattern  (if  present)  is used. If there are more than two alterna-
4591         tives in the subpattern, a compile-time error occurs.         tives in the subpattern, a compile-time error occurs.
4592    
4593         There  are  four  kinds of condition: references to subpatterns, refer-         There are four kinds of condition: references  to  subpatterns,  refer-
4594         ences to recursion, a pseudo-condition called DEFINE, and assertions.         ences to recursion, a pseudo-condition called DEFINE, and assertions.
4595    
4596     Checking for a used subpattern by number     Checking for a used subpattern by number
4597    
4598         If the text between the parentheses consists of a sequence  of  digits,         If  the  text between the parentheses consists of a sequence of digits,
4599         the  condition  is  true if the capturing subpattern of that number has         the condition is true if the capturing subpattern of  that  number  has
4600         previously matched. An alternative notation is to  precede  the  digits         previously  matched.  An  alternative notation is to precede the digits
4601         with a plus or minus sign. In this case, the subpattern number is rela-         with a plus or minus sign. In this case, the subpattern number is rela-
4602         tive rather than absolute.  The most recently opened parentheses can be         tive rather than absolute.  The most recently opened parentheses can be
4603         referenced  by  (?(-1),  the  next most recent by (?(-2), and so on. In         referenced by (?(-1), the next most recent by (?(-2),  and  so  on.  In
4604         looping constructs it can also make sense to refer to subsequent groups         looping constructs it can also make sense to refer to subsequent groups
4605         with constructs such as (?(+2).         with constructs such as (?(+2).
4606    
4607         Consider  the  following  pattern, which contains non-significant white         Consider the following pattern, which  contains  non-significant  white
4608         space to make it more readable (assume the PCRE_EXTENDED option) and to         space to make it more readable (assume the PCRE_EXTENDED option) and to
4609         divide it into three parts for ease of discussion:         divide it into three parts for ease of discussion:
4610    
4611           ( \( )?    [^()]+    (?(1) \) )           ( \( )?    [^()]+    (?(1) \) )
4612    
4613         The  first  part  matches  an optional opening parenthesis, and if that         The first part matches an optional opening  parenthesis,  and  if  that
4614         character is present, sets it as the first captured substring. The sec-         character is present, sets it as the first captured substring. The sec-
4615         ond  part  matches one or more characters that are not parentheses. The         ond part matches one or more characters that are not  parentheses.  The
4616         third part is a conditional subpattern that tests whether the first set         third part is a conditional subpattern that tests whether the first set
4617         of parentheses matched or not. If they did, that is, if the subject started         of parentheses matched or not. If they did, that is, if the subject started
4618         with an opening parenthesis, the condition is true, and so the yes-pat-         with an opening parenthesis, the condition is true, and so the yes-pat-
4619         tern  is  executed  and  a  closing parenthesis is required. Otherwise,         tern is executed and a  closing  parenthesis  is  required.  Otherwise,
4620         since no-pattern is not present, the  subpattern  matches  nothing.  In         since  no-pattern  is  not  present, the subpattern matches nothing. In
4621         other  words,  this  pattern  matches  a  sequence  of non-parentheses,         other words,  this  pattern  matches  a  sequence  of  non-parentheses,
4622         optionally enclosed in parentheses.         optionally enclosed in parentheses.
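The behaviour of this conditional pattern can be seen with a small test. The sketch uses Python's re module, whose (?(1)...) condition and VERBOSE flag are assumed here to correspond to PCRE's numbered condition and the PCRE_EXTENDED option; whole-string matching is used so that a stray parenthesis cannot simply be skipped.

```python
import re

# Assumption: re.VERBOSE plays the role of PCRE_EXTENDED for this pattern.
optional_parens = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)


def whole_match(subject):
    """Return True if the entire subject matches the pattern."""
    return optional_parens.fullmatch(subject) is not None


for subject in ("abc", "(abc)", "(abc", "abc)"):
    print(repr(subject), whole_match(subject))
```

Only the first two subjects are accepted: a closing parenthesis is required exactly when the opening one was captured.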
4623    
4624         If you were embedding this pattern in a larger one,  you  could  use  a         If  you  were  embedding  this pattern in a larger one, you could use a
4625         relative reference:         relative reference:
4626    
4627           ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...           ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
4628    
4629         This  makes  the  fragment independent of the parentheses in the larger         This makes the fragment independent of the parentheses  in  the  larger
4630         pattern.         pattern.
4631    
4632     Checking for a used subpattern by name     Checking for a used subpattern by name
4633    
4634         Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a         Perl  uses  the  syntax  (?(<name>)...) or (?('name')...) to test for a
4635         used  subpattern  by  name.  For compatibility with earlier versions of         used subpattern by name. For compatibility  with  earlier  versions  of
4636         PCRE, which had this facility before Perl, the syntax  (?(name)...)  is         PCRE,  which  had this facility before Perl, the syntax (?(name)...) is
4637         also  recognized. However, there is a possible ambiguity with this syn-         also recognized. However, there is a possible ambiguity with this  syn-
4638         tax, because subpattern names may  consist  entirely  of  digits.  PCRE         tax,  because  subpattern  names  may  consist entirely of digits. PCRE
4639         looks  first for a named subpattern; if it cannot find one and the name         looks first for a named subpattern; if it cannot find one and the  name
4640         consists entirely of digits, PCRE looks for a subpattern of  that  num-         consists  entirely  of digits, PCRE looks for a subpattern of that num-
4641         ber,  which must be greater than zero. Using subpattern names that con-         ber, which must be greater than zero. Using subpattern names that  con-
4642         sist entirely of digits is not recommended.         sist entirely of digits is not recommended.
4643    
4644         Rewriting the above example to use a named subpattern gives this:         Rewriting the above example to use a named subpattern gives this:
# Line 4644  CONDITIONAL SUBPATTERNS Line 4649  CONDITIONAL SUBPATTERNS
4649     Checking for pattern recursion     Checking for pattern recursion
4650    
4651         If the condition is the string (R), and there is no subpattern with the         If the condition is the string (R), and there is no subpattern with the
4652         name  R, the condition is true if a recursive call to the whole pattern         name R, the condition is true if a recursive call to the whole  pattern
4653         or any subpattern has been made. If digits or a name preceded by amper-         or any subpattern has been made. If digits or a name preceded by amper-
4654         sand follow the letter R, for example:         sand follow the letter R, for example:
4655    
4656           (?(R3)...) or (?(R&name)...)           (?(R3)...) or (?(R&name)...)
4657    
4658         the  condition is true if the most recent recursion is into the subpat-         the condition is true if the most recent recursion is into the  subpat-
4659         tern whose number or name is given. This condition does not  check  the         tern  whose  number or name is given. This condition does not check the
4660         entire recursion stack.         entire recursion stack.
4661    
4662         At  "top  level", all these recursion test conditions are false. Recur-         At "top level", all these recursion test conditions are false.   Recur-
4663         sive patterns are described below.         sive patterns are described below.
4664    
4665     Defining subpatterns for use by reference only     Defining subpatterns for use by reference only
4666    
4667         If the condition is the string (DEFINE), and  there  is  no  subpattern         If  the  condition  is  the string (DEFINE), and there is no subpattern
4668         with  the  name  DEFINE,  the  condition is always false. In this case,         with the name DEFINE, the condition is  always  false.  In  this  case,
4669         there may be only one alternative  in  the  subpattern.  It  is  always         there  may  be  only  one  alternative  in the subpattern. It is always
4670         skipped  if  control  reaches  this  point  in the pattern; the idea of         skipped if control reaches this point  in  the  pattern;  the  idea  of
4671         DEFINE is that it can be used to define "subroutines" that can be  ref-         DEFINE  is that it can be used to define "subroutines" that can be ref-
4672         erenced  from elsewhere. (The use of "subroutines" is described below.)         erenced from elsewhere. (The use of "subroutines" is described  below.)
4673         For example, a pattern to match an IPv4 address could be  written  like         For  example,  a pattern to match an IPv4 address could be written like
4674         this (ignore whitespace and line breaks):         this (ignore whitespace and line breaks):
4675    
4676           (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )           (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
4677           \b (?&byte) (\.(?&byte)){3} \b           \b (?&byte) (\.(?&byte)){3} \b
4678    
4679         The  first part of the pattern is a DEFINE group inside which another         The first part of the pattern is a DEFINE group inside which another
4680         group named "byte" is defined. This matches an individual component  of         group  named "byte" is defined. This matches an individual component of
4681         an  IPv4  address  (a number less than 256). When matching takes place,         an IPv4 address (a number less than 256). When  matching  takes  place,
4682         this part of the pattern is skipped because DEFINE acts  like  a  false         this  part  of  the pattern is skipped because DEFINE acts like a false
4683         condition.         condition.
4684    
4685         The rest of the pattern uses references to the named group to match the         The rest of the pattern uses references to the named group to match the
4686         four dot-separated components of an IPv4 address, insisting on  a  word         four  dot-separated  components of an IPv4 address, insisting on a word
4687         boundary at each end.         boundary at each end.
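For engines without DEFINE, a similar effect can be approximated by building the pattern from a reusable fragment. The sketch below does this in Python, whose re module has no (?(DEFINE)...) or (?&name) syntax; this is a substitute technique (string interpolation), not what PCRE does internally. The name BYTE is hypothetical, and non-capturing groups are used where the original has a capturing group.

```python
import re

# One component of an IPv4 address (a number less than 256), written once
# and reused by interpolation instead of by a (?&byte) reference.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# Four dot-separated components with a word boundary at each end.
ipv4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")

print(ipv4.fullmatch("192.168.1.1") is not None)
print(ipv4.fullmatch("256.1.1.1") is not None)
print(ipv4.search("addr 10.0.0.255 ok").group())
```

Unlike a DEFINE group, the interpolated fragment is duplicated in the compiled pattern, so this only mirrors the readability benefit.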
4688    
4689     Assertion conditions     Assertion conditions
4690    
4691         If  the  condition  is  not  in any of the above formats, it must be an         If the condition is not in any of the above  formats,  it  must  be  an
4692         assertion.  This may be a positive or negative lookahead or  lookbehind         assertion.   This may be a positive or negative lookahead or lookbehind
4693         assertion.  Consider  this  pattern,  again  containing non-significant         assertion. Consider  this  pattern,  again  containing  non-significant
4694         white space, and with the two alternatives on the second line:         white space, and with the two alternatives on the second line:
4695    
4696           (?(?=[^a-z]*[a-z])           (?(?=[^a-z]*[a-z])
4697           \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )           \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
4698    
4699         The condition  is  a  positive  lookahead  assertion  that  matches  an         The  condition  is  a  positive  lookahead  assertion  that  matches an
4700         optional  sequence of non-letters followed by a letter. In other words,         optional sequence of non-letters followed by a letter. In other  words,
4701         it tests for the presence of at least one letter in the subject.  If  a         it  tests  for the presence of at least one letter in the subject. If a
4702         letter  is found, the subject is matched against the first alternative;         letter is found, the subject is matched against the first  alternative;
4703         otherwise it is  matched  against  the  second.  This  pattern  matches         otherwise  it  is  matched  against  the  second.  This pattern matches
4704         strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are         strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
4705         letters and dd are digits.         letters and dd are digits.
4706    
4707    
4708  COMMENTS  COMMENTS
4709    
4710         The sequence (?# marks the start of a comment that continues up to  the         The  sequence (?# marks the start of a comment that continues up to the
4711         next  closing  parenthesis.  Nested  parentheses are not permitted. The         next closing parenthesis. Nested parentheses  are  not  permitted.  The
4712         characters that make up a comment play no part in the pattern  matching         characters  that make up a comment play no part in the pattern matching
4713         at all.         at all.
4714    
4715         If  the PCRE_EXTENDED option is set, an unescaped # character outside a         If the PCRE_EXTENDED option is set, an unescaped # character outside  a
4716         character class introduces a  comment  that  continues  to  immediately         character  class  introduces  a  comment  that continues to immediately
4717         after the next newline in the pattern.         after the next newline in the pattern.
4718    
4719    
4720  RECURSIVE PATTERNS  RECURSIVE PATTERNS
4721    
4722         Consider  the problem of matching a string in parentheses, allowing for         Consider the problem of matching a string in parentheses, allowing  for
4723         unlimited nested parentheses. Without the use of  recursion,  the  best         unlimited  nested  parentheses.  Without the use of recursion, the best
4724         that  can  be  done  is  to use a pattern that matches up to some fixed         that can be done is to use a pattern that  matches  up  to  some  fixed
4725         depth of nesting. It is not possible to  handle  an  arbitrary  nesting         depth  of  nesting.  It  is not possible to handle an arbitrary nesting
4726         depth.         depth.
4727    
4728         For some time, Perl has provided a facility that allows regular expres-         For some time, Perl has provided a facility that allows regular expres-
4729         sions to recurse (amongst other things). It does this by  interpolating         sions  to recurse (amongst other things). It does this by interpolating
4730         Perl  code in the expression at run time, and the code can refer to the         Perl code in the expression at run time, and the code can refer to  the
4731         expression itself. A Perl pattern using code interpolation to solve the         expression itself. A Perl pattern using code interpolation to solve the
4732         parentheses problem can be created like this:         parentheses problem can be created like this:
4733    
# Line 4732  RECURSIVE PATTERNS Line 4737  RECURSIVE PATTERNS
4737         refers recursively to the pattern in which it appears.         refers recursively to the pattern in which it appears.
4738    
4739         Obviously, PCRE cannot support the interpolation of Perl code. Instead,         Obviously, PCRE cannot support the interpolation of Perl code. Instead,
4740         it  supports  special  syntax  for recursion of the entire pattern, and         it supports special syntax for recursion of  the  entire  pattern,  and
4741         also for individual subpattern recursion.  After  its  introduction  in         also  for  individual  subpattern  recursion. After its introduction in
4742         PCRE  and  Python,  this  kind of recursion was subsequently introduced         PCRE and Python, this kind of  recursion  was  subsequently  introduced
4743         into Perl at release 5.10.         into Perl at release 5.10.
4744    
4745         A special item that consists of (? followed by a  number  greater  than         A  special  item  that consists of (? followed by a number greater than
4746         zero and a closing parenthesis is a recursive call of the subpattern of         zero and a closing parenthesis is a recursive call of the subpattern of
4747         the given number, provided that it occurs inside that  subpattern.  (If         the  given  number, provided that it occurs inside that subpattern. (If
4748         not,  it  is  a  "subroutine" call, which is described in the next sec-         not, it is a "subroutine" call, which is described  in  the  next  sec-
4749         tion.) The special item (?R) or (?0) is a recursive call of the  entire         tion.)  The special item (?R) or (?0) is a recursive call of the entire
4750         regular expression.         regular expression.
4751    
4752         This  PCRE  pattern  solves  the nested parentheses problem (assume the         This PCRE pattern solves the nested  parentheses  problem  (assume  the
4753         PCRE_EXTENDED option is set so that white space is ignored):         PCRE_EXTENDED option is set so that white space is ignored):
4754    
4755           \( ( (?>[^()]+) | (?R) )* \)           \( ( (?>[^()]+) | (?R) )* \)
4756    
4757         First it matches an opening parenthesis. Then it matches any number  of         First  it matches an opening parenthesis. Then it matches any number of
4758         substrings  which  can  either  be  a sequence of non-parentheses, or a         substrings which can either be a  sequence  of  non-parentheses,  or  a
4759         recursive match of the pattern itself (that is, a  correctly  parenthe-         recursive  match  of the pattern itself (that is, a correctly parenthe-
4760         sized substring).  Finally there is a closing parenthesis.         sized substring).  Finally there is a closing parenthesis.
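The structure of this recursive pattern can be mirrored by a small hand-written recursive function. The Python sketch below is only an analogy for how the pattern reads (Python's re module has no (?R)); it is not how PCRE carries out the match, and the function name match_parens is hypothetical.

```python
def match_parens(subject, pos=0):
    """Return the index just past a balanced group starting at pos, or None."""
    # \(  : an opening parenthesis is required first
    if pos >= len(subject) or subject[pos] != "(":
        return None
    pos += 1
    # ( [^()]+ | (?R) )*  : any number of plain runs or nested groups
    while pos < len(subject) and subject[pos] != ")":
        if subject[pos] == "(":
            end = match_parens(subject, pos)  # plays the role of (?R)
            if end is None:
                return None
            pos = end
        else:
            pos += 1                          # one character of a [^()]+ run
    # \)  : finally a closing parenthesis is required
    if pos < len(subject):
        return pos + 1
    return None


print(match_parens("(ab(cd)ef)"))
print(match_parens("(ab(cd"))
```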
4761    
4762         If  this  were  part of a larger pattern, you would not want to recurse         If this were part of a larger pattern, you would not  want  to  recurse
4763         the entire pattern, so instead you could use this:         the entire pattern, so instead you could use this:
4764    
4765           ( \( ( (?>[^()]+) | (?1) )* \) )           ( \( ( (?>[^()]+) | (?1) )* \) )
4766    
4767         We have put the pattern into parentheses, and caused the  recursion  to         We  have  put the pattern into parentheses, and caused the recursion to
4768         refer to them instead of the whole pattern.         refer to them instead of the whole pattern.
4769    
4770         In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be         In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
4771         tricky. This is made easier by the use of relative references. (A  Perl         tricky.  This is made easier by the use of relative references. (A Perl
4772         5.10  feature.)   Instead  of  (?1)  in the pattern above you can write         5.10 feature.)  Instead of (?1) in the  pattern  above  you  can  write
4773         (?-2) to refer to the second most recently opened parentheses preceding         (?-2) to refer to the second most recently opened parentheses preceding
4774         the  recursion.  In  other  words,  a  negative number counts capturing         the recursion. In other  words,  a  negative  number  counts  capturing
4775         parentheses leftwards from the point at which it is encountered.         parentheses leftwards from the point at which it is encountered.
4776    
4777         It is also possible to refer to  subsequently  opened  parentheses,  by         It  is  also  possible  to refer to subsequently opened parentheses, by
4778         writing  references  such  as (?+2). However, these cannot be recursive         writing references such as (?+2). However, these  cannot  be  recursive
4779         because the reference is not inside the  parentheses  that  are  refer-         because  the  reference  is  not inside the parentheses that are refer-
4780         enced.  They  are  always  "subroutine" calls, as described in the next         enced. They are always "subroutine" calls, as  described  in  the  next
4781         section.         section.
4782    
4783         An alternative approach is to use named parentheses instead.  The  Perl         An  alternative  approach is to use named parentheses instead. The Perl
4784         syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also         syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
4785         supported. We could rewrite the above example as follows:         supported. We could rewrite the above example as follows:
4786    
4787           (?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )           (?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )
4788    
4789         If there is more than one subpattern with the same name,  the  earliest         If  there  is more than one subpattern with the same name, the earliest
4790         one is used.         one is used.
4791    
4792         This  particular  example pattern that we have been looking at contains         This particular example pattern that we have been looking  at  contains
4793         nested unlimited repeats, and so the use of atomic grouping for  match-         nested  unlimited repeats, and so the use of atomic grouping for match-
4794         ing  strings  of non-parentheses is important when applying the pattern         ing strings of non-parentheses is important when applying  the  pattern
4795         to strings that do not match. For example, when this pattern is applied         to strings that do not match. For example, when this pattern is applied
4796         to         to
4797    
4798           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
4799    
4800         it  yields "no match" quickly. However, if atomic grouping is not used,         it yields "no match" quickly. However, if atomic grouping is not  used,
4801         the match runs for a very long time indeed because there  are  so  many         the  match  runs  for a very long time indeed because there are so many
4802         different  ways  the  + and * repeats can carve up the subject, and all         different ways the + and * repeats can carve up the  subject,  and  all
4803         have to be tested before failure can be reported.         have to be tested before failure can be reported.
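To get a feel for "so many different ways", the sketch below counts the ways in which a run of n non-parenthesis characters can be cut into non-empty pieces, each piece being one iteration of the inner + repeat. This is only an illustration of the growth rate, not a model of the exact work PCRE does; ways_to_split is a hypothetical name.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways_to_split(n):
    """Number of ways to cut a run of n characters into non-empty pieces."""
    if n == 0:
        return 1
    # Choose the length of the first piece, then split whatever is left.
    return sum(ways_to_split(n - first) for first in range(1, n + 1))
```

The count doubles with every extra character (it equals 2 to the power n-1), which is why the running time becomes impractical for a run of a few dozen characters when nothing prevents the repeats from being re-divided.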
4804    
4805         At the end of a match, the values set for any capturing subpatterns are         At the end of a match, the values set for any capturing subpatterns are
4806         those from the outermost level of the recursion at which the subpattern         those from the outermost level of the recursion at which the subpattern
4807         value is set.  If you want to obtain  intermediate  values,  a  callout         value  is  set.   If  you want to obtain intermediate values, a callout
4808         function  can be used (see below and the pcrecallout documentation). If         function can be used (see below and the pcrecallout documentation).  If
4809         the pattern above is matched against         the pattern above is matched against
4810    
4811           (ab(cd)ef)           (ab(cd)ef)
4812    
4813         the value for the capturing parentheses is  "ef",  which  is  the  last         the  value  for  the  capturing  parentheses is "ef", which is the last
4814         value  taken  on at the top level. If additional parentheses are added,         value taken on at the top level. If additional parentheses  are  added,
4815         giving         giving
4816    
4817           \( ( ( (?>[^()]+) | (?R) )* ) \)           \( ( ( (?>[^()]+) | (?R) )* ) \)
4818              ^                        ^              ^                        ^
4819              ^                        ^              ^                        ^
4820    
4821         the string they capture is "ab(cd)ef", the contents of  the  top  level         the  string  they  capture is "ab(cd)ef", the contents of the top level
4822         parentheses.  If there are more than 15 capturing parentheses in a pat-         parentheses. If there are more than 15 capturing parentheses in a  pat-
4823         tern, PCRE has to obtain extra memory to store data during a recursion,         tern, PCRE has to obtain extra memory to store data during a recursion,
4824         which  it  does  by  using pcre_malloc, freeing it via pcre_free after-         which it does by using pcre_malloc, freeing  it  via  pcre_free  after-
4825         wards. If  no  memory  can  be  obtained,  the  match  fails  with  the         wards.  If  no  memory  can  be  obtained,  the  match  fails  with the
4826         PCRE_ERROR_NOMEMORY error.         PCRE_ERROR_NOMEMORY error.
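The two capture results above ("ef" and "ab(cd)ef") follow from the usual rule that a repeated capturing group keeps the value from its last iteration. Python's re module has no recursion, so the sketch below is an approximation: it replaces (?R) with a hand-written alternative that handles a single level of nesting only.

```python
import re

# One level of nesting only: \([^()]*\) stands in for the recursive call.
inner_only = re.compile(r"\(([^()]+|\([^()]*\))*\)")
with_outer = re.compile(r"\((([^()]+|\([^()]*\))*)\)")

subject = "(ab(cd)ef)"
last_piece = inner_only.fullmatch(subject).group(1)      # last iteration
whole_contents = with_outer.fullmatch(subject).group(1)  # added parentheses
```

Here last_piece holds the final value taken by the repeated group, and whole_contents holds everything between the outer parentheses.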
4827    
4828         Do  not  confuse  the (?R) item with the condition (R), which tests for         Do not confuse the (?R) item with the condition (R),  which  tests  for
4829         recursion.  Consider this pattern, which matches text in  angle  brack-         recursion.   Consider  this pattern, which matches text in angle brack-
4830         ets,  allowing for arbitrary nesting. Only digits are allowed in nested         ets, allowing for arbitrary nesting. Only digits are allowed in  nested
4831         brackets (that is, when recursing), whereas any characters are  permit-         brackets  (that is, when recursing), whereas any characters are permit-
4832         ted at the outer level.         ted at the outer level.
4833    
4834           < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >           < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
4835    
4836         In  this  pattern, (?(R) is the start of a conditional subpattern, with         In this pattern, (?(R) is the start of a conditional  subpattern,  with
4837         two different alternatives for the recursive and  non-recursive  cases.         two  different  alternatives for the recursive and non-recursive cases.
4838         The (?R) item is the actual recursive call.         The (?R) item is the actual recursive call.
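The roles of the (R) condition and the (?R) call can be separated in a hand-written sketch. The function match_angle below is hypothetical and does not use a regex engine; its recursing flag plays the part of the (R) test, and the recursive call plays the part of (?R). It is anchored at the given position and accepts ASCII digits only, both of which are assumptions of the sketch.

```python
def match_angle(s, pos=0, recursing=False):
    """Return the index just past a <...> group starting at pos, or None."""
    if pos >= len(s) or s[pos] != "<":
        return None
    pos += 1
    while True:
        start = pos
        if recursing:
            # (R) is true: only digits are allowed, as in \d++
            while pos < len(s) and s[pos] in "0123456789":
                pos += 1
        else:
            # (R) is false: anything except angle brackets, as in [^<>]*+
            while pos < len(s) and s[pos] not in "<>":
                pos += 1
        if pos > start:
            continue
        inner = match_angle(s, pos, recursing=True)   # the (?R) call
        if inner is None:
            break
        pos = inner
    return pos + 1 if pos < len(s) and s[pos] == ">" else None
```

With this helper, "<abc<123>def>" is accepted in full, whereas "<abc<12a>def>" is rejected because a non-digit appears inside the nested brackets.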
4839    
4840     Recursion difference from Perl     Recursion difference from Perl
4841    
4842         In  PCRE (like Python, but unlike Perl), a recursive subpattern call is         In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
4843         always treated as an atomic group. That is, once it has matched some of         always treated as an atomic group. That is, once it has matched some of
4844         the subject string, it is never re-entered, even if it contains untried         the subject string, it is never re-entered, even if it contains untried
4845         alternatives and there is a subsequent matching failure.  This  can  be         alternatives  and  there  is a subsequent matching failure. This can be
4846         illustrated  by the following pattern, which purports to match a palin-         illustrated by the following pattern, which purports to match a  palin-
4847         dromic string that contains an odd number of characters  (for  example,         dromic  string  that contains an odd number of characters (for example,
4848         "a", "aba", "abcba", "abcdcba"):         "a", "aba", "abcba", "abcdcba"):
4849    
4850           ^(.|(.)(?1)\2)$           ^(.|(.)(?1)\2)$
4851    
4852         The idea is that it either matches a single character, or two identical         The idea is that it either matches a single character, or two identical
4853         characters surrounding a sub-palindrome. In Perl, this  pattern  works;         characters  surrounding  a sub-palindrome. In Perl, this pattern works;
4854         in  PCRE  it  does  not if the subject is longer than three characters.         in PCRE it does not if the subject is  longer  than  three  characters.
4855         Consider the subject string "abcba":         Consider the subject string "abcba":
4856    
4857         At the top level, the first character is matched, but as it is  not  at         At  the  top level, the first character is matched, but as it is not at
4858         the end of the string, the first alternative fails; the second alterna-         the end of the string, the first alternative fails; the second alterna-
4859         tive is taken and the recursion kicks in. The recursive call to subpat-         tive is taken and the recursion kicks in. The recursive call to subpat-
4860         tern  1  successfully  matches the next character ("b"). (Note that the         tern 1 successfully matches the next character ("b").  (Note  that  the
4861         beginning and end of line tests are not part of the recursion).         beginning and end of line tests are not part of the recursion).
4862    
4863         Back at the top level, the next character ("c") is compared  with  what         Back  at  the top level, the next character ("c") is compared with what
4864         subpattern  2 matched, which was "a". This fails. Because the recursion         subpattern 2 matched, which was "a". This fails. Because the  recursion
4865         is treated as an atomic group, there are now  no  backtracking  points,         is  treated  as  an atomic group, there are now no backtracking points,
4866         and  so  the  entire  match fails. (Perl is able, at this point, to re-         and so the entire match fails. (Perl is able, at  this  point,  to  re-
4867         enter the recursion and try the second alternative.)  However,  if  the         enter  the  recursion  and try the second alternative.) However, if the
4868         pattern is written with the alternatives in the other order, things are         pattern is written with the alternatives in the other order, things are
4869         different:         different:
4870    
4871           ^((.)(?1)\2|.)$           ^((.)(?1)\2|.)$
4872    
4873         This time, the recursing alternative is tried first, and  continues  to         This  time,  the recursing alternative is tried first, and continues to
4874         recurse  until  it runs out of characters, at which point the recursion         recurse until it runs out of characters, at which point  the  recursion
4875         fails. But this time we do have  another  alternative  to  try  at  the         fails.  But  this  time  we  do  have another alternative to try at the
4876         higher  level.  That  is  the  big difference: in the previous case the         higher level. That is the big difference:  in  the  previous  case  the
4877         remaining alternative is at a deeper recursion level, which PCRE cannot         remaining alternative is at a deeper recursion level, which PCRE cannot
4878         use.         use.
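The difference between the two orderings can be reproduced with a small model in which each subpattern is a generator that yields every end position it could match, in the order a backtracking matcher would try them. Treating a recursive call as atomic then means taking only the first position it yields. This is a sketch under those assumptions, not PCRE's implementation; group1 and full_match are hypothetical names.

```python
from itertools import islice


def group1(s, pos, atomic, recurse_first):
    """Yield the end positions for (.|(.)(?1)\\2), or for ((.)(?1)\\2|.)
    when recurse_first is true, matched at pos."""

    def single():                       # the alternative: .
        if pos < len(s):
            yield pos + 1

    def wrapped():                      # the alternative: (.)(?1)\2
        if pos < len(s):
            inner = group1(s, pos + 1, atomic, recurse_first)
            if atomic:
                inner = islice(inner, 1)    # never re-enter the recursion
            for end in inner:
                if end < len(s) and s[end] == s[pos]:
                    yield end + 1

    order = (wrapped, single) if recurse_first else (single, wrapped)
    for alternative in order:
        yield from alternative()


def full_match(s, atomic, recurse_first):
    """Model of ^...$ around group 1."""
    return any(end == len(s) for end in group1(s, 0, atomic, recurse_first))
```

For the subject "abcba", full_match succeeds with atomic=False in the original order (the Perl behaviour described above), fails with atomic=True in that order, and succeeds again with atomic=True once the recursing alternative is placed first.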
4879    
4880         To change the pattern so that it matches all palindromic strings, not just         To change the pattern so that it matches all palindromic strings, not just
4881         those with an odd number of characters, it is tempting  to  change  the         those  with  an  odd number of characters, it is tempting to change the
4882         pattern to this:         pattern to this:
4883    
4884           ^((.)(?1)\2|.?)$           ^((.)(?1)\2|.?)$
4885    
4886         Again,  this  works  in Perl, but not in PCRE, and for the same reason.         Again, this works in Perl, but not in PCRE, and for  the  same  reason.
4887         When a deeper recursion has matched a single character,  it  cannot  be         When  a  deeper  recursion has matched a single character, it cannot be
4888         entered  again  in  order  to match an empty string. The solution is to         entered again in order to match an empty string.  The  solution  is  to
4889         separate the two cases, and write out the odd and even cases as  alter-         separate  the two cases, and write out the odd and even cases as alter-
4890         natives at the higher level:         natives at the higher level:
4891    
4892           ^(?:((.)(?1)\2|)|((.)(?3)\4|.))$           ^(?:((.)(?1)\2|)|((.)(?3)\4|.))$
4893    
4894         If  you  want  to match typical palindromic phrases, the pattern has to         If you want to match typical palindromic phrases, the  pattern  has  to
4895         ignore all non-word characters, which can be done like this:         ignore all non-word characters, which can be done like this:
4896    
4897           ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$           ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
4898    
4899         If run with the PCRE_CASELESS option, this pattern matches phrases such         If run with the PCRE_CASELESS option, this pattern matches phrases such
4900         as "A man, a plan, a canal: Panama!" and it works well in both PCRE and         as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
4901         Perl. Note the use of the possessive quantifier *+ to avoid  backtrack-         Perl.  Note the use of the possessive quantifier *+ to avoid backtrack-
4902         ing  into  sequences of non-word characters. Without this, PCRE takes a         ing into sequences of non-word characters. Without this, PCRE  takes  a
4903         great deal longer (ten times or more) to  match  typical  phrases,  and         great  deal  longer  (ten  times or more) to match typical phrases, and
4904         Perl takes so long that you think it has gone into a loop.         Perl takes so long that you think it has gone into a loop.
4905    
4906    
4907  SUBPATTERNS AS SUBROUTINES  SUBPATTERNS AS SUBROUTINES
4908    
4909         If the syntax for a recursive subpattern reference (either by number or         If the syntax for a recursive subpattern reference (either by number or
4910         by name) is used outside the parentheses to which it refers,  it  oper-         by  name)  is used outside the parentheses to which it refers, it oper-
4911         ates  like a subroutine in a programming language. The "called" subpat-         ates like a subroutine in a programming language. The "called"  subpat-
4912         tern may be defined before or after the reference. A numbered reference         tern may be defined before or after the reference. A numbered reference
4913         can be absolute or relative, as in these examples:         can be absolute or relative, as in these examples:
4914    
# Line 4915  SUBPATTERNS AS SUBROUTINES Line 4920  SUBPATTERNS AS SUBROUTINES
4920    
4921           (sens|respons)e and \1ibility           (sens|respons)e and \1ibility
4922    
4923         matches  "sense and sensibility" and "response and responsibility", but         matches "sense and sensibility" and "response and responsibility",  but
4924         not "sense and responsibility". If instead the pattern         not "sense and responsibility". If instead the pattern
4925    
4926           (sens|respons)e and (?1)ibility           (sens|respons)e and (?1)ibility
4927    
4928         is used, it does match "sense and responsibility" as well as the  other         is  used, it does match "sense and responsibility" as well as the other
4929         two  strings.  Another  example  is  given  in the discussion of DEFINE         two strings. Another example is  given  in  the  discussion  of  DEFINE
4930         above.         above.
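The contrast between a back reference and a subroutine call can be tried out with Python's re module, with one caveat: re supports \1 but has no (?1). The sketch below therefore emulates the subroutine call by repeating the text of the subpattern, which is roughly what such a call amounts to for a simple group like this one.

```python
import re

# Back reference: the second part must repeat what group 1 actually matched.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Emulated subroutine call: the subpattern itself is applied a second time.
subroutine_like = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")

phrases = [
    "sense and sensibility",
    "response and responsibility",
    "sense and responsibility",
]
backref_hits = [p for p in phrases if backref.fullmatch(p)]
subroutine_hits = [p for p in phrases if subroutine_like.fullmatch(p)]
```

Only the first two phrases end up in backref_hits, while all three end up in subroutine_hits.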
4931    
4932         Like recursive subpatterns, a "subroutine" call is always treated as an         Like recursive subpatterns, a "subroutine" call is always treated as an
4933         atomic  group. That is, once it has matched some of the subject string,         atomic group. That is, once it has matched some of the subject  string,
4934         it is never re-entered, even if it contains  untried  alternatives  and         it  is  never  re-entered, even if it contains untried alternatives and
4935         there is a subsequent matching failure.         there is a subsequent matching failure.
4936    
4937         When  a  subpattern is used as a subroutine, processing options such as         When a subpattern is used as a subroutine, processing options  such  as
4938         case-independence are fixed when the subpattern is defined. They cannot         case-independence are fixed when the subpattern is defined. They cannot
4939         be changed for different calls. For example, consider this pattern:         be changed for different calls. For example, consider this pattern:
4940    
4941           (abc)(?i:(?-1))           (abc)(?i:(?-1))
4942    
4943         It  matches  "abcabc". It does not match "abcABC" because the change of         It matches "abcabc". It does not match "abcABC" because the  change  of
4944         processing option does not affect the called subpattern.         processing option does not affect the called subpattern.
4945    
4946    
4947  ONIGURUMA SUBROUTINE SYNTAX  ONIGURUMA SUBROUTINE SYNTAX
4948    
4949         For compatibility with Oniguruma, the non-Perl syntax \g followed by  a         For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
4950         name or a number enclosed either in angle brackets or single quotes, is         name or a number enclosed either in angle brackets or single quotes, is
4951         an alternative syntax for referencing a  subpattern  as  a  subroutine,         an  alternative  syntax  for  referencing a subpattern as a subroutine,
4952         possibly  recursively. Here are two of the examples used above, rewrit-         possibly recursively. Here are two of the examples used above,  rewrit-
4953         ten using this syntax:         ten using this syntax:
4954    
4955           (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )           (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
4956           (sens|respons)e and \g'1'ibility           (sens|respons)e and \g'1'ibility
4957    
4958         PCRE supports an extension to Oniguruma: if a number is preceded  by  a         PCRE  supports  an extension to Oniguruma: if a number is preceded by a
4959         plus or a minus sign it is taken as a relative reference. For example:         plus or a minus sign it is taken as a relative reference. For example:
4960    
4961           (abc)(?i:\g<-1>)           (abc)(?i:\g<-1>)
4962    
4963         Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not         Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not
4964         synonymous. The former is a back reference; the latter is a  subroutine         synonymous.  The former is a back reference; the latter is a subroutine
4965         call.         call.
4966    
4967    
4968  CALLOUTS  CALLOUTS
4969    
4970         Perl has a feature whereby using the sequence (?{...}) causes arbitrary         Perl has a feature whereby using the sequence (?{...}) causes arbitrary
4971         Perl code to be obeyed in the middle of matching a regular  expression.         Perl  code to be obeyed in the middle of matching a regular expression.
4972         This makes it possible, amongst other things, to extract different sub-         This makes it possible, amongst other things, to extract different sub-
4973         strings that match the same pair of parentheses when there is a repeti-         strings that match the same pair of parentheses when there is a repeti-
4974         tion.         tion.
4975    
4976         PCRE provides a similar feature, but of course it cannot obey arbitrary         PCRE provides a similar feature, but of course it cannot obey arbitrary
4977         Perl code. The feature is called "callout". The caller of PCRE provides         Perl code. The feature is called "callout". The caller of PCRE provides
4978         an  external function by putting its entry point in the global variable         an external function by putting its entry point in the global  variable
4979         pcre_callout.  By default, this variable contains NULL, which  disables         pcre_callout.   By default, this variable contains NULL, which disables
4980         all calling out.         all calling out.
4981    
4982         Within  a  regular  expression,  (?C) indicates the points at which the         Within a regular expression, (?C) indicates the  points  at  which  the
4983         external function is to be called. If you want  to  identify  different         external  function  is  to be called. If you want to identify different
4984         callout  points, you can put a number less than 256 after the letter C.         callout points, you can put a number less than 256 after the letter  C.
4985         The default value is zero.  For example, this pattern has  two  callout         The  default  value is zero.  For example, this pattern has two callout
4986         points:         points:
4987    
4988           (?C1)abc(?C2)def           (?C1)abc(?C2)def
4989    
4990         If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are         If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
4991         automatically installed before each item in the pattern. They  are  all         automatically  installed  before each item in the pattern. They are all
4992         numbered 255.         numbered 255.
4993    
4994         During matching, when PCRE reaches a callout point (and pcre_callout is         During matching, when PCRE reaches a callout point (and pcre_callout is
4995         set), the external function is called. It is provided with  the  number         set),  the  external function is called. It is provided with the number
4996         of  the callout, the position in the pattern, and, optionally, one item         of the callout, the position in the pattern, and, optionally, one  item
4997         of data originally supplied by the caller of pcre_exec().  The  callout         of  data  originally supplied by the caller of pcre_exec(). The callout
4998         function  may cause matching to proceed, to backtrack, or to fail alto-         function may cause matching to proceed, to backtrack, or to fail  alto-
4999         gether. A complete description of the interface to the callout function         gether. A complete description of the interface to the callout function
5000         is given in the pcrecallout documentation.         is given in the pcrecallout documentation.
5001    
5002    
5003  BACKTRACKING CONTROL  BACKTRACKING CONTROL
5004    
5005         Perl  5.10 introduced a number of "Special Backtracking Control Verbs",         Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
5006         which are described in the Perl documentation as "experimental and sub-         which are described in the Perl documentation as "experimental and sub-
5007         ject  to  change or removal in a future version of Perl". It goes on to         ject to change or removal in a future version of Perl". It goes  on  to
5008         say: "Their usage in production code should be noted to avoid  problems         say:  "Their usage in production code should be noted to avoid problems
5009         during upgrades." The same remarks apply to the PCRE features described         during upgrades." The same remarks apply to the PCRE features described
5010         in this section.         in this section.
5011    
5012         Since these verbs are specifically related  to  backtracking,  most  of         Since  these  verbs  are  specifically related to backtracking, most of
5013         them  can  be  used  only  when  the  pattern  is  to  be matched using         them can be  used  only  when  the  pattern  is  to  be  matched  using
5014         pcre_exec(), which uses a backtracking algorithm. With the exception of         pcre_exec(), which uses a backtracking algorithm. With the exception of
5015         (*FAIL), which behaves like a failing negative assertion, they cause an         (*FAIL), which behaves like a failing negative assertion, they cause an
5016         error if encountered by pcre_dfa_exec().         error if encountered by pcre_dfa_exec().
5017    
5018         If any of these verbs are used in an assertion subpattern, their effect         If any of these verbs are used in an assertion subpattern, their effect
5019         is  confined  to that subpattern; it does not extend to the surrounding         is confined to that subpattern; it does not extend to  the  surrounding
5020         pattern.  Note that assertion subpatterns are processed as anchored  at         pattern.   Note that assertion subpatterns are processed as anchored at
5021         the point where they are tested.         the point where they are tested.
5022    
5023         The  new verbs make use of what was previously invalid syntax: an open-         The new verbs make use of what was previously invalid syntax: an  open-
5024         ing parenthesis followed by an asterisk. In Perl, they are generally of         ing parenthesis followed by an asterisk. In Perl, they are generally of
5025         the form (*VERB:ARG) but PCRE does not support the use of arguments, so         the form (*VERB:ARG) but PCRE does not support the use of arguments, so
5026         its general form is just (*VERB). Any number of these verbs  may  occur         its  general  form is just (*VERB). Any number of these verbs may occur
5027         in a pattern. There are two kinds:         in a pattern. There are two kinds:
5028    
5029     Verbs that act immediately     Verbs that act immediately
# Line 5027  BACKTRACKING CONTROL Line 5032  BACKTRACKING CONTROL
5032    
5033            (*ACCEPT)            (*ACCEPT)
5034    
5035         This  verb causes the match to end successfully, skipping the remainder         This verb causes the match to end successfully, skipping the  remainder
5036         of the pattern. When inside a recursion, only the innermost pattern  is         of  the pattern. When inside a recursion, only the innermost pattern is
5037         ended  immediately.  If  the (*ACCEPT) is inside capturing parentheses,         ended immediately. If the (*ACCEPT) is  inside  capturing  parentheses,
5038         the data so far is captured. (This feature was added to PCRE at release         the data so far is captured. (This feature was added to PCRE at release
5039         8.00.) For example:         8.00.) For example:
5040    
5041           A((?:A|B(*ACCEPT)|C)D)           A((?:A|B(*ACCEPT)|C)D)
5042    
5043         This  matches  "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap-         This matches "AB", "AAD", or "ACD"; when it matches "AB", "B"  is  cap-
5044         tured by the outer parentheses.         tured by the outer parentheses.
5045    
5046           (*FAIL) or (*F)           (*FAIL) or (*F)
5047    
5048         This verb causes the match to fail, forcing backtracking to  occur.  It         This  verb  causes the match to fail, forcing backtracking to occur. It
5049         is  equivalent to (?!) but easier to read. The Perl documentation notes         is equivalent to (?!) but easier to read. The Perl documentation  notes
5050         that it is probably useful only when combined  with  (?{})  or  (??{}).         that  it  is  probably  useful only when combined with (?{}) or (??{}).
5051         Those  are,  of course, Perl features that are not present in PCRE. The         Those are, of course, Perl features that are not present in  PCRE.  The
5052         nearest equivalent is the callout feature, as for example in this  pat-         nearest  equivalent is the callout feature, as for example in this pat-
5053         tern:         tern:
5054    
5055           a+(?C)(*FAIL)           a+(?C)(*FAIL)
5056    
5057         A  match  with the string "aaaa" always fails, but the callout is taken         A match with the string "aaaa" always fails, but the callout  is  taken
5058         before each backtrack happens (in this example, 10 times).         before each backtrack happens (in this example, 10 times).
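The figure of 10 can be checked by counting. The sketch below is a simplified model, not PCRE itself: it assumes that a match attempt is made at every start position, that a+ first takes the longest run and then gives back one character at a time, and that the callout is reached once for each length tried. The name count_callouts is hypothetical.

```python
def count_callouts(subject):
    """Count how often (?C) is reached for a+(?C)(*FAIL) in a simple
    backtracking model with no start-of-match optimizations."""
    calls = 0
    for start in range(len(subject)):
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # a+ tries lengths run, run-1, ..., 1; each one reaches (?C)
        # and is then rejected by (*FAIL).
        for _length in range(run, 0, -1):
            calls += 1
    return calls
```

For "aaaa" the four start positions contribute 4, 3, 2 and 1 callouts respectively.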
5059    
5060     Verbs that act after backtracking     Verbs that act after backtracking
5061    
5062         The following verbs do nothing when they are encountered. Matching con-         The following verbs do nothing when they are encountered. Matching con-
5063         tinues  with what follows, but if there is no subsequent match, a fail-         tinues with what follows, but if there is no subsequent match, a  fail-
5064         ure is forced.  The verbs  differ  in  exactly  what  kind  of  failure         ure  is  forced.   The  verbs  differ  in  exactly what kind of failure
5065         occurs.         occurs.
5066    
5067           (*COMMIT)           (*COMMIT)
5068    
5069         This  verb  causes  the whole match to fail outright if the rest of the         This verb causes the whole match to fail outright if the  rest  of  the
5070         pattern does not match. Even if the pattern is unanchored,  no  further         pattern  does  not match. Even if the pattern is unanchored, no further
5071         attempts  to find a match by advancing the start point take place. Once         attempts to find a match by advancing the start point take place.  Once
5072         (*COMMIT) has been passed, pcre_exec() is committed to finding a  match         (*COMMIT)  has been passed, pcre_exec() is committed to finding a match
5073         at the current starting point, or not at all. For example:         at the current starting point, or not at all. For example:
5074    
5075           a+(*COMMIT)b           a+(*COMMIT)b
5076    
5077         This  matches  "xxaab" but not "aacaab". It can be thought of as a kind         This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
5078         of dynamic anchor, or "I've started, so I must finish."         of dynamic anchor, or "I've started, so I must finish."
5079    
5080           (*PRUNE)           (*PRUNE)
5081    
5082         This verb causes the match to fail at the current position if the  rest         This  verb causes the match to fail at the current position if the rest
5083         of the pattern does not match. If the pattern is unanchored, the normal         of the pattern does not match. If the pattern is unanchored, the normal
5084         "bumpalong" advance to the next starting character then happens.  Back-         "bumpalong"  advance to the next starting character then happens. Back-
5085         tracking  can  occur as usual to the left of (*PRUNE), or when matching         tracking can occur as usual to the left of (*PRUNE), or  when  matching
5086         to the right of (*PRUNE), but if there is no match to the right,  back-         to  the right of (*PRUNE), but if there is no match to the right, back-
5087         tracking  cannot  cross (*PRUNE).  In simple cases, the use of (*PRUNE)         tracking cannot cross (*PRUNE).  In simple cases, the use  of  (*PRUNE)
5088         is just an alternative to an atomic group or possessive quantifier, but         is just an alternative to an atomic group or possessive quantifier, but
5089         there  are  some uses of (*PRUNE) that cannot be expressed in any other         there are some uses of (*PRUNE) that cannot be expressed in  any  other
5090         way.         way.
5091    
5092           (*SKIP)           (*SKIP)
5093    
5094         This verb is like (*PRUNE), except that if the pattern  is  unanchored,         This  verb  is like (*PRUNE), except that if the pattern is unanchored,
5095         the  "bumpalong" advance is not to the next character, but to the posi-         the "bumpalong" advance is not to the next character, but to the  posi-
5096         tion in the subject where (*SKIP) was  encountered.  (*SKIP)  signifies         tion  in  the  subject where (*SKIP) was encountered. (*SKIP) signifies
5097         that  whatever  text  was  matched leading up to it cannot be part of a         that whatever text was matched leading up to it cannot  be  part  of  a
5098         successful match. Consider:         successful match. Consider:
5099    
5100           a+(*SKIP)b           a+(*SKIP)b
5101    
5102         If the subject is "aaaac...",  after  the  first  match  attempt  fails         If  the  subject  is  "aaaac...",  after  the first match attempt fails
5103         (starting  at  the  first  character in the string), the starting point         (starting at the first character in the  string),  the  starting  point
5104         skips on to start the next attempt at "c". Note that a possessive quan-         skips on to start the next attempt at "c". Note that a possessive quan-
5105         tifier  does not have the same effect in this example; although it would         tifier does not have the same effect in this example; although it  would
5106         suppress backtracking  during  the  first  match  attempt,  the  second         suppress  backtracking  during  the  first  match  attempt,  the second
5107         attempt  would  start at the second character instead of skipping on to         attempt would start at the second character instead of skipping  on  to
5108         "c".         "c".
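
The difference in starting points described above can be sketched with a toy matcher. This is not PCRE and does not call any PCRE function: it is a hand-written Python model of the single pattern a+(*SKIP)b (and of a++b when use_skip is False), and the names match_a_plus_b and start_positions are hypothetical helpers introduced only for this illustration. It records which subject offsets are tried as starting points, and ignores any start-of-match optimizations a real matcher might apply.

```python
def match_a_plus_b(subject, start):
    """Try to match a+ followed by b, beginning at offset `start`.

    Returns (end, skip_pos):
      end      -- offset just past the match, or None if the attempt failed
      skip_pos -- offset reached after a+ (where (*SKIP) would sit in the
                  pattern), or None if a+ itself did not match
    """
    pos = start
    while pos < len(subject) and subject[pos] == "a":
        pos += 1
    if pos == start:
        return None, None          # a+ failed, so (*SKIP) was never reached
    if pos < len(subject) and subject[pos] == "b":
        return pos + 1, pos        # whole pattern matched
    return None, pos               # a+ matched, b did not


def start_positions(subject, use_skip):
    """Model the unanchored "bumpalong" loop.

    With use_skip=True a failure after a+ moves the next starting point to
    the offset where (*SKIP) was reached; otherwise (modelling a++b) the
    starting point advances by one character.
    Returns (list of starting offsets tried, (start, end) of match or None).
    """
    tried = []
    start = 0
    while start <= len(subject):
        tried.append(start)
        end, skip_pos = match_a_plus_b(subject, start)
        if end is not None:
            return tried, (start, end)
        if use_skip and skip_pos is not None:
            start = skip_pos       # always > start, so the loop terminates
        else:
            start += 1
    return tried, None


if __name__ == "__main__":
    print(start_positions("aaaac", use_skip=True)[0])   # [0, 4, 5]
    print(start_positions("aaaac", use_skip=False)[0])  # [0, 1, 2, 3, 4, 5]
```

For the subject "aaaac", the model with (*SKIP) tries offset 0 and then jumps straight to offset 4 (the "c"), whereas the possessive model retries at offsets 1, 2 and 3 before reaching it.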
5109    
5110           (*THEN)           (*THEN)
5111    
5112         This verb causes a skip to the next alternation if the rest of the pat-         This verb causes a skip to the next alternation if the rest of the pat-
5113         tern does not match. That is, it cancels pending backtracking, but only         tern does not match. That is, it cancels pending backtracking, but only
5114         within the current alternation. Its name  comes  from  the  observation         within  the  current  alternation.  Its name comes from the observation
5115         that it can be used for a pattern-based if-then-else block:         that it can be used for a pattern-based if-then-else block:
5116    
5117           ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...           ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
5118    
5119         If  the COND1 pattern matches, FOO is tried (and possibly further items         If the COND1 pattern matches, FOO is tried (and possibly further  items
5120         after the end of the group if FOO succeeds);  on  failure  the  matcher         after  the  end  of  the group if FOO succeeds); on failure the matcher
5121         skips  to  the second alternative and tries COND2, without backtracking         skips to the second alternative and tries COND2,  without  backtracking
5122         into COND1. If (*THEN) is used outside  of  any  alternation,  it  acts         into  COND1.  If  (*THEN)  is  used outside of any alternation, it acts
5123         exactly like (*PRUNE).         exactly like (*PRUNE).
5124    
5125    
# Line 5132  AUTHOR Line 5137  AUTHOR
5137    
5138  REVISION  REVISION
5139    
5140         Last updated: 18 September 2009         Last updated: 22 September 2009
5141         Copyright (c) 1997-2009 University of Cambridge.         Copyright (c) 1997-2009 University of Cambridge.
5142  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
5143    
