Diff of /code/trunk/doc/pcrepattern.3

revision 510 by ph10, Sat Mar 27 17:45:29 2010 UTC revision 512 by ph10, Tue Mar 30 11:11:52 2010 UTC
# Line 2341  processed as anchored at the point where Line 2341  processed as anchored at the point where
2341  .P  .P
2342  The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
2343  parenthesis followed by an asterisk. They are generally of the form  parenthesis followed by an asterisk. They are generally of the form
2344  (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,  (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
2345  depending on whether or not an argument is present. A name is a sequence of  depending on whether or not an argument is present. A name is a sequence of
2346  letters, digits, and underscores. If the name is empty, that is, if the closing  letters, digits, and underscores. If the name is empty, that is, if the closing
2347  parenthesis immediately follows the colon, the effect is as if the colon were  parenthesis immediately follows the colon, the effect is as if the colon were
2348  not there. Any number of these verbs may occur in a pattern.  not there. Any number of these verbs may occur in a pattern.
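
The syntax rules in the paragraph above can be sketched as a small parser. This is only an illustration of the stated rules, not PCRE's own code: the function name parse_verb and the regular expression are hypothetical, and the sketch assumes verb keywords are written in upper-case letters, as in the examples in this document.

```python
import re

# Assumed shape, taken from the text: "(*" VERB [":" NAME] ")".
# NAME is restricted to letters, digits, and underscores.
VERB_RE = re.compile(r"\(\*([A-Z]*)(?::([A-Za-z0-9_]*))?\)")


def parse_verb(text):
    """Return (verb, name) for a string such as '(*MARK:A)', or None.

    Hypothetical helper: it models only the syntax described above.
    """
    m = VERB_RE.fullmatch(text)
    if m is None:
        return None
    verb = m.group(1)
    # An empty name has the same effect as leaving the colon out.
    name = m.group(2) or None
    if not verb:
        if name is None:
            return None  # "(*)" or "(*:)" names no verb at all
        verb = "MARK"  # (*:NAME) is the short form of (*MARK:NAME)
    return (verb, name)
```

For instance, parse_verb("(*PRUNE:)") gives the same result as parse_verb("(*PRUNE)"), because the empty name is treated as if the colon were absent.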
2349  .P  .P
2350  PCRE contains some optimizations that are used to speed up matching by running  PCRE contains some optimizations that are used to speed up matching by running
2351  some checks at the start of each match attempt. For example, it may know the  some checks at the start of each match attempt. For example, it may know the
2352  minimum length of a matching subject, or that a particular character must be  minimum length of a matching subject, or that a particular character must be
2353  present. When one of these optimizations suppresses the running of a match, any  present. When one of these optimizations suppresses the running of a match, any
2354  included backtracking verbs will not, of course, be processed. You can suppress  included backtracking verbs will not, of course, be processed. You can suppress
2355  the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option  the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option
2356  when calling \fBpcre_exec()\fP.  when calling \fBpcre_exec()\fP.
2357  .  .
2358  .  .
2359  .SS "Verbs that act immediately"  .SS "Verbs that act immediately"
2360  .rs  .rs
2361  .sp  .sp
2362  The following verbs act as soon as they are encountered. They may not be  The following verbs act as soon as they are encountered. They may not be
2363  followed by a name.  followed by a name.
2364  .sp  .sp
2365     (*ACCEPT)     (*ACCEPT)
# Line 2391  each backtrack happens (in this example, Line 2391  each backtrack happens (in this example,
2391  .SS "Recording which path was taken"  .SS "Recording which path was taken"
2392  .rs  .rs
2393  .sp  .sp
2394  There is one verb whose main purpose is to track how a match was arrived at,  There is one verb whose main purpose is to track how a match was arrived at,
2395  though it also has a secondary use in conjunction with advancing the match  though it also has a secondary use in conjunction with advancing the match
2396  starting point (see (*SKIP) below).  starting point (see (*SKIP) below).
2397  .sp  .sp
2398    (*MARK:NAME) or (*:NAME)    (*MARK:NAME) or (*:NAME)
# Line 2406  to the caller via the \fIpcre_extra\fP d Line 2406  to the caller via the \fIpcre_extra\fP d
2406  .\" </a>  .\" </a>
2407  section on \fIpcre_extra\fP  section on \fIpcre_extra\fP
2408  .\"  .\"
2409  in the  in the
2410  .\" HREF  .\" HREF
2411  \fBpcreapi\fP  \fBpcreapi\fP
2412  .\"  .\"
# Line 2422  outputting of (*MARK) data: Line 2422  outputting of (*MARK) data:
2422     0: XZ     0: XZ
2423    MK: B    MK: B
2424  .sp  .sp
2425  The (*MARK) name is tagged with "MK:" in this output, and in this example it  The (*MARK) name is tagged with "MK:" in this output, and in this example it
2426  indicates which of the two alternatives matched. This is a more efficient way  indicates which of the two alternatives matched. This is a more efficient way
2427  of obtaining this information than putting each alternative in its own  of obtaining this information than putting each alternative in its own
2428  capturing parentheses.  capturing parentheses.
2429  .P  .P
# Line 2438  string, causing a failure before (*MARK) Line 2438  string, causing a failure before (*MARK)
2438    No match    No match
2439  .sp  .sp
2440  There are three potential starting points for this match (starting with X,  There are three potential starting points for this match (starting with X,
2441  starting with P, and with an empty string). If the pattern is anchored, the  starting with P, and with an empty string). If the pattern is anchored, the
2442  result is different:  result is different:
2443  .sp  .sp
2444    /^X(*MARK:A)Y|^X(*MARK:B)Z/K    /^X(*MARK:A)Y|^X(*MARK:B)Z/K
2445    XP    XP
2446    No match, mark = B    No match, mark = B
2447  .sp  .sp
2448  PCRE's start-of-match optimizations can also interfere with this. For example,  PCRE's start-of-match optimizations can also interfere with this. For example,
2449  if, as a result of a call to \fBpcre_study()\fP, it knows the minimum  if, as a result of a call to \fBpcre_study()\fP, it knows the minimum
2450  subject length for a match, a shorter subject will not be scanned at all.  subject length for a match, a shorter subject will not be scanned at all.
2451  .P  .P
2452  Note that similar anomalies (though different in detail) exist in Perl, no  Note that similar anomalies (though different in detail) exist in Perl, no
2453  doubt for the same reasons. The use of (*MARK) data after a failed match of an  doubt for the same reasons. The use of (*MARK) data after a failed match of an
2454  unanchored pattern is not recommended, unless (*COMMIT) is involved.  unanchored pattern is not recommended, unless (*COMMIT) is involved.
2455  .  .
2456  .  .
# Line 2463  the verb, a failure is forced. That is, Line 2463  the verb, a failure is forced. That is,
2463  the verb. However, when one of these verbs appears inside an atomic group, its  the verb. However, when one of these verbs appears inside an atomic group, its
2464  effect is confined to that group, because once the group has been matched,  effect is confined to that group, because once the group has been matched,
2465  there is never any backtracking into it. In this situation, backtracking can  there is never any backtracking into it. In this situation, backtracking can
2466  "jump back" to the left of the entire atomic group. (Remember also, as stated  "jump back" to the left of the entire atomic group. (Remember also, as stated
2467  above, that this localization also applies in subroutine calls and assertions.)  above, that this localization also applies in subroutine calls and assertions.)
2468  .P  .P
2469  These verbs differ in exactly what kind of failure occurs when backtracking  These verbs differ in exactly what kind of failure occurs when backtracking
# Line 2480  finding a match at the current starting Line 2480  finding a match at the current starting
2480    a+(*COMMIT)b    a+(*COMMIT)b
2481  .sp  .sp
2482  This matches "xxaab" but not "aacaab". It can be thought of as a kind of  This matches "xxaab" but not "aacaab". It can be thought of as a kind of
2483  dynamic anchor, or "I've started, so I must finish." The name of the most  dynamic anchor, or "I've started, so I must finish." The name of the most
2484  recently passed (*MARK) in the path is passed back when (*COMMIT) forces a  recently passed (*MARK) in the path is passed back when (*COMMIT) forces a
2485  match failure.  match failure.
2486  .P  .P
2487  Note that (*COMMIT) at the start of a pattern is not the same as an anchor,  Note that (*COMMIT) at the start of a pattern is not the same as an anchor,
2488  unless PCRE's start-of-match optimizations are turned off, as shown in this  unless PCRE's start-of-match optimizations are turned off, as shown in this
2489  \fBpcretest\fP example:  \fBpcretest\fP example:
2490  .sp  .sp
2491    /(*COMMIT)abc/    /(*COMMIT)abc/
# Line 2494  unless PCRE's start-of-match optimizatio Line 2494  unless PCRE's start-of-match optimizatio
2494    xyzabc\eY    xyzabc\eY
2495    No match    No match
2496  .sp  .sp
2497  PCRE knows that any match must start with "a", so the optimization skips along  PCRE knows that any match must start with "a", so the optimization skips along
2498  the subject to "a" before running the first match attempt, which succeeds. When  the subject to "a" before running the first match attempt, which succeeds. When
2499  the optimization is disabled by the \eY escape in the second subject, the match  the optimization is disabled by the \eY escape in the second subject, the match
2500  starts at "x" and so the (*COMMIT) causes it to fail without trying any other  starts at "x" and so the (*COMMIT) causes it to fail without trying any other
2501  starting points.  starting points.
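
The interaction described above can be imitated with a toy model. The sketch below does not call PCRE; the function match_commit_abc is hypothetical and hard-codes the pattern /(*COMMIT)abc/. It assumes the only start-of-match optimization in play is the skip to the first possible starting character.

```python
def match_commit_abc(subject, start_optimize=True):
    """Toy model of an unanchored match of /(*COMMIT)abc/.

    Returns (start, end) offsets on success, or None for "No match".
    Hypothetical helper written for illustration only.
    """
    literal = "abc"
    start = 0
    if start_optimize:
        # Modelled optimization: any match must begin with "a", so skip
        # along the subject to the first "a" before the first attempt.
        start = subject.find(literal[0])
        if start < 0:
            return None
    # (*COMMIT) is the first item in the pattern, so it is passed on the
    # very first attempt. If the rest fails, no other starting point is
    # tried: there is no "bumpalong" after this attempt.
    if subject.startswith(literal, start):
        return (start, start + len(literal))
    return None
```

With the optimization modelled, match_commit_abc("xyzabc") finds "abc" at offset 3. With start_optimize=False the first attempt begins at "x", fails, and the result is None, which mirrors the "No match" line in the example above.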
2502  .sp  .sp
2503    (*PRUNE) or (*PRUNE:NAME)    (*PRUNE) or (*PRUNE:NAME)
2504  .sp  .sp
2505  This verb causes the match to fail at the current starting position in the  This verb causes the match to fail at the current starting position in the
2506  subject if the rest of the pattern does not match. If the pattern is  subject if the rest of the pattern does not match. If the pattern is
2507  unanchored, the normal "bumpalong" advance to the next starting character then  unanchored, the normal "bumpalong" advance to the next starting character then
2508  happens. Backtracking can occur as usual to the left of (*PRUNE), before it is  happens. Backtracking can occur as usual to the left of (*PRUNE), before it is
# Line 2534  instead of skipping on to "c". Line 2534  instead of skipping on to "c".
2534  .sp  .sp
2535    (*SKIP:NAME)    (*SKIP:NAME)
2536  .sp  .sp
2537  When (*SKIP) has an associated name, its behaviour is modified. If the  When (*SKIP) has an associated name, its behaviour is modified. If the
2538  following pattern fails to match, the previous path through the pattern is  following pattern fails to match, the previous path through the pattern is
2539  searched for the most recent (*MARK) that has the same name. If one is found,  searched for the most recent (*MARK) that has the same name. If one is found,
2540  the "bumpalong" advance is to the subject position that corresponds to that  the "bumpalong" advance is to the subject position that corresponds to that
2541  (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a  (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with a
2542  matching name is found, normal "bumpalong" of one character happens (the  matching name is found, normal "bumpalong" of one character happens (the
2543  (*SKIP) is ignored).  (*SKIP) is ignored).
2544  .sp  .sp
2545    (*THEN) or (*THEN:NAME)    (*THEN) or (*THEN:NAME)
