
Diff of /code/trunk/doc/pcrepattern.3


revision 758 by ph10, Mon Nov 21 12:05:36 2011 UTC revision 771 by ph10, Tue Nov 29 15:34:12 2011 UTC
# Line 2569  failing negative assertion, they cause a Line 2569  failing negative assertion, they cause a
2569  If any of these verbs are used in an assertion or in a subpattern that is  If any of these verbs are used in an assertion or in a subpattern that is
2570  called as a subroutine (whether or not recursively), their effect is confined  called as a subroutine (whether or not recursively), their effect is confined
2571  to that subpattern; it does not extend to the surrounding pattern, with one  to that subpattern; it does not extend to the surrounding pattern, with one
2572  exception: a *MARK that is encountered in a positive assertion \fIis\fP passed  exception: the name from a (*MARK), (*PRUNE), or (*THEN) that is encountered in
2573  back (compare capturing parentheses in assertions). Note that such subpatterns  a successful positive assertion \fIis\fP passed back when a match succeeds
2574  are processed as anchored at the point where they are tested. Note also that  (compare capturing parentheses in assertions). Note that such subpatterns are
2575  Perl's treatment of subroutines is different in some cases.  processed as anchored at the point where they are tested. Note also that Perl's
2576    treatment of subroutines is different in some cases.
2577  .P  .P
2578  The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
2579  parenthesis followed by an asterisk. They are generally of the form  parenthesis followed by an asterisk. They are generally of the form
# Line 2591  included backtracking verbs will not, of Line 2592  included backtracking verbs will not, of
2592  the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option  the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option
2593  when calling \fBpcre_compile()\fP or \fBpcre_exec()\fP, or by starting the  when calling \fBpcre_compile()\fP or \fBpcre_exec()\fP, or by starting the
2594  pattern with (*NO_START_OPT).  pattern with (*NO_START_OPT).
2595    .P
2596    Experiments with Perl suggest that it too has similar optimizations, sometimes
2597    leading to anomalous results.
2598  .  .
2599  .  .
2600  .SS "Verbs that act immediately"  .SS "Verbs that act immediately"
# Line 2638  starting point (see (*SKIP) below). Line 2642  starting point (see (*SKIP) below).
2642  A name is always required with this verb. There may be as many instances of  A name is always required with this verb. There may be as many instances of
2643  (*MARK) as you like in a pattern, and their names do not have to be unique.  (*MARK) as you like in a pattern, and their names do not have to be unique.
2644  .P  .P
2645  When a match succeeds, the name of the last-encountered (*MARK) is passed back  When a match succeeds, the name of the last-encountered (*MARK) on the matching
2646  to the caller via the \fIpcre_extra\fP data structure, as described in the  path is passed back to the caller via the \fIpcre_extra\fP data structure, as
2647    described in the
2648  .\" HTML <a href="pcreapi.html#extradata">  .\" HTML <a href="pcreapi.html#extradata">
2649  .\" </a>  .\" </a>
2650  section on \fIpcre_extra\fP  section on \fIpcre_extra\fP
# Line 2648  in the Line 2653  in the
2653  .\" HREF  .\" HREF
2654  \fBpcreapi\fP  \fBpcreapi\fP
2655  .\"  .\"
2656  documentation. No data is returned for a partial match. Here is an example of  documentation. Here is an example of \fBpcretest\fP output, where the /K
2657  \fBpcretest\fP output, where the /K modifier requests the retrieval and  modifier requests the retrieval and outputting of (*MARK) data:
 outputting of (*MARK) data:  
2658  .sp  .sp
2659    /X(*MARK:A)Y|X(*MARK:B)Z/K      re> /X(*MARK:A)Y|X(*MARK:B)Z/K
2660    XY    data> XY
2661     0: XY     0: XY
2662    MK: A    MK: A
2663    XZ    XZ
# Line 2669  If (*MARK) is encountered in a positive Line 2673  If (*MARK) is encountered in a positive
2673  passed back if it is the last-encountered. This does not happen for negative  passed back if it is the last-encountered. This does not happen for negative
2674  assertions.  assertions.
2675  .P  .P
2676  A name may also be returned after a failed match if the final path through the  After a partial match or a failed match, the name of the last encountered
2677  pattern involves (*MARK). However, unless (*MARK) is used in conjunction with  (*MARK) in the entire match process is returned. For example:
 (*COMMIT), this is unlikely to happen for an unanchored pattern because, as the  
 starting point for matching is advanced, the final check is often with an empty  
 string, causing a failure before (*MARK) is reached. For example:  
 .sp  
   /X(*MARK:A)Y|X(*MARK:B)Z/K  
   XP  
   No match  
 .sp  
 There are three potential starting points for this match (starting with X,  
 starting with P, and with an empty string). If the pattern is anchored, the  
 result is different:  
2678  .sp  .sp
2679    /^X(*MARK:A)Y|^X(*MARK:B)Z/K      re> /X(*MARK:A)Y|X(*MARK:B)Z/K
2680    XP    data> XP
2681    No match, mark = B    No match, mark = B
2682  .sp  .sp
2683  PCRE's start-of-match optimizations can also interfere with this. For example,  Note that in this unanchored example the mark is retained from the match
2684  if, as a result of a call to \fBpcre_study()\fP, it knows the minimum  attempt that started at the letter "X". Subsequent match attempts starting at
2685  subject length for a match, a shorter subject will not be scanned at all.  "P" and then with an empty string do not get as far as the (*MARK) item, but
2686  .P  nevertheless do not reset it.
 Note that similar anomalies (though different in detail) exist in Perl, no  
 doubt for the same reasons. The use of (*MARK) data after a failed match of an  
 unanchored pattern is not recommended, unless (*COMMIT) is involved.  
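The rule added in v.771 (on success, report the last mark on the matching path; on failure, report the last mark met anywhere in the whole match process) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not PCRE's implementation: `toy_match`, `Item`, and `PATTERN` are hypothetical names, and the model handles nothing beyond a top-level alternation of literal characters and marks, with none of PCRE's start-of-match optimizations.

```python
from typing import List, Optional, Tuple, Union

# A pattern item is a literal character or a ("MARK", name) pair (toy representation).
Item = Union[str, Tuple[str, str]]


def toy_match(alternatives: List[List[Item]],
              subject: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (matched_text, mark) for a top-level alternation of literals and marks."""
    last_seen: Optional[str] = None  # last mark met anywhere in the whole process
    # Try every starting point, including the empty string at the end of the subject.
    for start in range(len(subject) + 1):
        for alt in alternatives:
            pos = start
            path_mark: Optional[str] = None  # last mark on this particular path
            ok = True
            for item in alt:
                if isinstance(item, tuple):  # a ("MARK", name) item
                    path_mark = last_seen = item[1]
                elif pos < len(subject) and subject[pos] == item:
                    pos += 1
                else:
                    ok = False
                    break
            if ok:
                # Success: only the mark on the matching path is reported.
                return subject[start:pos], path_mark
    # Failure: later attempts that never reach a mark do not reset it.
    return None, last_seen


# Toy equivalent of /X(*MARK:A)Y|X(*MARK:B)Z/
PATTERN = [["X", ("MARK", "A"), "Y"], ["X", ("MARK", "B"), "Z"]]
print(toy_match(PATTERN, "XY"))  # ('XY', 'A')
print(toy_match(PATTERN, "XP"))  # (None, 'B')
```

In the "XP" case the attempts that start at "P" and at the empty string fail before reaching any mark, so the model still reports "B" from the attempt that started at "X", in line with the "No match, mark = B" output above.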
2687  .  .
2688  .  .
2689  .SS "Verbs that act after backtracking"  .SS "Verbs that act after backtracking"
# Line 2730  Note that (*COMMIT) at the start of a pa Line 2720  Note that (*COMMIT) at the start of a pa
2720  unless PCRE's start-of-match optimizations are turned off, as shown in this  unless PCRE's start-of-match optimizations are turned off, as shown in this
2721  \fBpcretest\fP example:  \fBpcretest\fP example:
2722  .sp  .sp
2723    /(*COMMIT)abc/      re> /(*COMMIT)abc/
2724    xyzabc    data> xyzabc
2725     0: abc     0: abc
2726    xyzabc\eY    xyzabc\eY
2727    No match    No match
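The interaction between (*COMMIT) and the start-of-match optimization in this example can be sketched with another toy model. It rests on assumptions: `commit_literal_match` is a hypothetical helper that models only a pattern of the form (*COMMIT) followed by a non-empty literal, and it reduces the optimization to "skip ahead to the first occurrence of the literal's first character". It does not call PCRE.

```python
from typing import Optional


def commit_literal_match(literal: str, subject: str,
                         start_optimize: bool = True) -> Optional[str]:
    """Toy model of /(*COMMIT)<literal>/: one match attempt only, no bumpalong."""
    start = 0
    if start_optimize:
        # Modelled optimization: begin at the first place where the
        # literal's first character occurs in the subject.
        start = subject.find(literal[0])
        if start < 0:
            return None
    # (*COMMIT) is passed immediately, so a failure at this point is final:
    # the matcher does not advance to a later starting position.
    if subject.startswith(literal, start):
        return subject[start:start + len(literal)]
    return None


print(commit_literal_match("abc", "xyzabc"))                        # abc
print(commit_literal_match("abc", "xyzabc", start_optimize=False))  # None
```

The second call plays the role of the \eY data escape in the \fBpcretest\fP example: with the optimization off, the only attempt starts at "x", fails, and (*COMMIT) prevents any further attempts.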
# Line 2752  reached, or when matching to the right o Line 2742  reached, or when matching to the right o
2742  the right, backtracking cannot cross (*PRUNE). In simple cases, the use of  the right, backtracking cannot cross (*PRUNE). In simple cases, the use of
2743  (*PRUNE) is just an alternative to an atomic group or possessive quantifier,  (*PRUNE) is just an alternative to an atomic group or possessive quantifier,
2744  but there are some uses of (*PRUNE) that cannot be expressed in any other way.  but there are some uses of (*PRUNE) that cannot be expressed in any other way.
2745  The behaviour of (*PRUNE:NAME) is the same as (*MARK:NAME)(*PRUNE) when the  The behaviour of (*PRUNE:NAME) is the same as (*MARK:NAME)(*PRUNE). In an
2746  match fails completely; the name is passed back if this is the final attempt.  anchored pattern (*PRUNE) has the same effect as (*COMMIT).
 (*PRUNE:NAME) does not pass back a name if the match succeeds. In an anchored  
 pattern (*PRUNE) has the same effect as (*COMMIT).  
2747  .sp  .sp
2748    (*SKIP)    (*SKIP)
2749  .sp  .sp
# Line 2781  following pattern fails to match, the pr Line 2769  following pattern fails to match, the pr
2769  searched for the most recent (*MARK) that has the same name. If one is found,  searched for the most recent (*MARK) that has the same name. If one is found,
2770  the "bumpalong" advance is to the subject position that corresponds to that  the "bumpalong" advance is to the subject position that corresponds to that
2771  (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a  (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a
2772  matching name is found, normal "bumpalong" of one character happens (that is,  matching name is found, the (*SKIP) is ignored.
 the (*SKIP) is ignored).  
2773  .sp  .sp
2774    (*THEN) or (*THEN:NAME)    (*THEN) or (*THEN:NAME)
2775  .sp  .sp
# Line 2796  be used for a pattern-based if-then-else Line 2783  be used for a pattern-based if-then-else
2783  If the COND1 pattern matches, FOO is tried (and possibly further items after  If the COND1 pattern matches, FOO is tried (and possibly further items after
2784  the end of the group if FOO succeeds); on failure, the matcher skips to the  the end of the group if FOO succeeds); on failure, the matcher skips to the
2785  second alternative and tries COND2, without backtracking into COND1. The  second alternative and tries COND2, without backtracking into COND1. The
2786  behaviour of (*THEN:NAME) is exactly the same as (*MARK:NAME)(*THEN) if the  behaviour of (*THEN:NAME) is exactly the same as (*MARK:NAME)(*THEN).
2787  overall match fails. If (*THEN) is not inside an alternation, it acts like  If (*THEN) is not inside an alternation, it acts like (*PRUNE).
 (*PRUNE).  
2788  .P  .P
2789  Note that a subpattern that does not contain a | character is just a part of  Note that a subpattern that does not contain a | character is just a part of
2790  the enclosing alternative; it is not a nested alternation with only one  the enclosing alternative; it is not a nested alternation with only one
# Line 2876  Cambridge CB2 3QH, England. Line 2862  Cambridge CB2 3QH, England.
2862  .rs  .rs
2863  .sp  .sp
2864  .nf  .nf
2865  Last updated: 19 November 2011  Last updated: 29 November 2011
2866  Copyright (c) 1997-2011 University of Cambridge.  Copyright (c) 1997-2011 University of Cambridge.
2867  .fi  .fi
