
Diff of /code/trunk/doc/pcrepattern.3


revision 724 by ph10, Sun Oct 9 16:23:45 2011 UTC
revision 733 by ph10, Tue Oct 11 10:29:36 2011 UTC
# Line 2389  PCRE finds the palindrome "aba" at the s
the end of the string does not follow. Once again, it cannot jump back into the
recursion to try other alternatives, so the entire match fails.
.P
The second way in which PCRE and Perl differ in their recursion processing is
in the handling of captured values. In Perl, when a subpattern is called
recursively or as a subroutine (see the next section), it has no access to any
values that were captured outside the recursion, whereas in PCRE these values
can be referenced. Consider this pattern:
.sp
  ^(.)(\e1|a(?2))
.sp
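To make the capture-visibility difference concrete, here is a minimal Python sketch. It is a hand-written backtracking matcher for this one pattern only, not PCRE's code, and it does not model other engine details. The names `group2` and `matches` and the `share_captures` flag are hypothetical: `share_captures=True` models PCRE (group 1 stays visible inside the recursion), while `False` models the Perl behaviour described here, where the recursion starts with group 1 unset and a back reference to an unset group fails.

```python
# Hand-written matcher for the single pattern ^(.)(\1|a(?2)).
# Not PCRE's implementation: it only models whether group 1's value is
# visible inside the recursion (PCRE) or not (Perl, as described above).

from typing import Iterator, Optional


def group2(s: str, pos: int, g1: Optional[str], share_captures: bool) -> Iterator[int]:
    r"""Yield every end position at which (\1|a(?2)) can match from pos."""
    # First alternative: the back reference \1. An unset group never matches.
    if g1 is not None and s.startswith(g1, pos):
        yield pos + len(g1)
    # Second alternative: a literal "a" followed by a recursive call of group 2.
    if s.startswith("a", pos):
        # share_captures=True: the recursion still sees group 1 (PCRE).
        # share_captures=False: the recursion starts with group 1 unset (Perl).
        inner_g1 = g1 if share_captures else None
        yield from group2(s, pos + 1, inner_g1, share_captures)


def matches(s: str, share_captures: bool = True) -> bool:
    r"""Return True if ^(.)(\1|a(?2)) matches at the start of s."""
    if not s or s[0] == "\n":   # (.) needs one character other than newline
        return False
    g1 = s[0]                    # group 1 captures exactly that character
    return next(group2(s, 1, g1, share_captures), None) is not None


if __name__ == "__main__":
    print(matches("bab", share_captures=True))   # PCRE-style: prints True
    print(matches("bab", share_captures=False))  # Perl-style: prints False
```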
In PCRE, this pattern matches "bab". The first capturing parentheses match "b",
then in the second group, when the back reference \e1 fails to match "b", the
second alternative matches "a" and then recurses. In the recursion, \e1 does
now match "b" and so the whole match succeeds. In Perl, the pattern fails to
# Line 2762  pattern fragments that do not contain an
.sp
  A (B(*THEN)C) | D
.sp
If A and B are matched, but there is a failure in C, matching does not
backtrack into A; instead it moves to the next alternative, that is, D.
However, if the subpattern containing (*THEN) is given an alternative, it
behaves differently:
# Line 2770  behaves differently:
  A (B(*THEN)C | (*FAIL)) | D
.sp
The effect of (*THEN) is now confined to the inner subpattern. After a failure
in C, matching moves to (*FAIL), which causes the whole subpattern to fail
because there are no more alternatives to try. In this case, matching does now
backtrack into A.
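The two patterns above can be compared with a small continuation-passing backtracking matcher in Python. This is a sketch under stated assumptions, not PCRE's engine: each group that contains | gets its own token, (*THEN) raises an exception tagged with the token of its innermost enclosing alternation, and only that alternation catches it and moves on to its next branch. A group without | just passes its enclosing token through. The combinator names (`lit`, `seq`, `alt`, `THEN`, `FAIL`, `match_at_start`) are hypothetical, and A, B, C and D are made-up stand-ins; A is `a|ab` so that there is something to backtrack into, and the subject "abbc" can only match if A ends up matching "ab".

```python
# Tiny continuation-passing backtracking matcher, written only to illustrate
# which alternation (*THEN) targets. A sketch, not PCRE's engine.

from typing import Callable

Cont = Callable[[int], bool]                        # rest of the match
Matcher = Callable[[str, int, object, Cont], bool]


class Then(Exception):
    """Raised when backtracking reaches (*THEN); carries its target alternation."""

    def __init__(self, token: object) -> None:
        super().__init__()
        self.token = token


def lit(text: str) -> Matcher:
    return lambda s, i, ctx, k: s.startswith(text, i) and k(i + len(text))


def seq(*parts: Matcher) -> Matcher:
    # A group without | is just a sequence: it hands the enclosing
    # alternation's token (ctx) straight through to its parts.
    def m(s: str, i: int, ctx: object, k: Cont) -> bool:
        def step(n: int, j: int) -> bool:
            if n == len(parts):
                return k(j)
            return parts[n](s, j, ctx, lambda j2: step(n + 1, j2))
        return step(0, i)
    return m


def alt(*branches: Matcher) -> Matcher:
    # A group with | creates a fresh token; (*THEN) directly inside it targets it.
    def m(s: str, i: int, ctx: object, k: Cont) -> bool:
        token = object()
        for branch in branches:
            try:
                if branch(s, i, token, k):
                    return True
            except Then as t:
                if t.token is not token:
                    raise            # aimed at an outer alternation
                # aimed at this group: fall through to the next alternative
        return False
    return m


def THEN(s: str, i: int, ctx: object, k: Cont) -> bool:
    if k(i):
        return True
    raise Then(ctx)                  # matching backtracked onto (*THEN)


def FAIL(s: str, i: int, ctx: object, k: Cont) -> bool:
    return False


def match_at_start(pattern: Matcher, s: str) -> bool:
    try:
        return pattern(s, 0, object(), lambda j: True)
    except Then:
        return False                 # no enclosing alternation: fail this attempt


# Hypothetical stand-ins: A = (a|ab) can be backtracked into.
A, B, C, D = alt(lit("a"), lit("ab")), lit("b"), lit("c"), lit("abd")

p1 = alt(seq(A, seq(B, THEN, C)), D)               # A (B(*THEN)C) | D
p2 = alt(seq(A, alt(seq(B, THEN, C), FAIL)), D)    # A (B(*THEN)C | (*FAIL)) | D

print(match_at_start(p1, "abbc"))    # prints False
print(match_at_start(p2, "abbc"))    # prints True
```

With these choices the first pattern fails on "abbc": C fails, (*THEN) skips straight to D without retrying A, and D does not match. In the second pattern the failure is caught by the inner group, which then fails through (*FAIL), so ordinary backtracking lets A match "ab" and the rest succeeds.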
.P
Note also that a conditional subpattern is not considered as having two
alternatives, because only one is ever used. In other words, the | character in
a conditional subpattern has a different meaning. Ignoring white space,
consider:
.sp
  ^.*? (?(?=a) a | b(*THEN)c )
.sp
If the subject is "ba", this pattern does not match. Because .*? is ungreedy,
it initially matches zero characters. The condition (?=a) then fails, the
character "b" is matched, but "c" is not. At this point, matching does not
backtrack to .*? as might perhaps be expected from the presence of the |
character. The conditional subpattern is part of the single alternative that
comprises the whole pattern, and so the match fails. (If there were a backtrack
into .*?, allowing it to match "b", the match would succeed.)
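A minimal Python sketch of this case follows. It is hand-written for this one pattern and is not PCRE's engine; the names `conditional`, `matches` and `Then` and the `honour_then` flag are hypothetical. It assumes a subject without newlines, since . does not match newline by default. Setting `honour_then=False` simulates the plain backtrack into .*? mentioned in the parenthesis above.

```python
# Hand-written matcher for the single pattern ^.*? (?(?=a) a | b(*THEN)c )
# (white space ignored). A model of the behaviour described above, not
# PCRE's engine; the subject is assumed to contain no newlines.


class Then(Exception):
    """Raised when backtracking reaches (*THEN)."""


def conditional(s: str, i: int) -> bool:
    """Match (?(?=a) a | b(*THEN)c ) at position i; nothing follows it."""
    if s.startswith("a", i):        # condition (?=a) true: "yes" branch, a
        return True
    # Condition false: "no" branch, b(*THEN)c. The | here separates the two
    # branches of a conditional, so it is not an alternation for (*THEN).
    if not s.startswith("b", i):
        return False                # b failed before (*THEN) was reached
    if s.startswith("c", i + 1):
        return True
    raise Then()                    # c failed: backtracking reaches (*THEN)


def matches(s: str, honour_then: bool = True) -> bool:
    # .*? is lazy: try consuming 0, 1, 2, ... characters before the conditional.
    for j in range(len(s) + 1):
        try:
            if conditional(s, j):
                return True
        except Then:
            if honour_then:
                # (*THEN) has no enclosing alternation, so the whole match
                # fails; because of ^ there is no other start position to try.
                return False
            # honour_then=False: pretend (*THEN) is absent and keep
            # backtracking into .*?
    return False


print(matches("ba"))                     # prints False
print(matches("ba", honour_then=False))  # prints True
```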
.P
The verbs just described provide four different "strengths" of control when
