Diff of /code/trunk/doc/pcrepattern.3

revision 171 by ph10, Mon Jun 4 14:28:58 2007 UTC
revision 175 by ph10, Mon Jun 11 13:38:38 2007 UTC
# Line 30  The remainder of this document discusses Line 30  The remainder of this document discusses
30  PCRE when its main matching function, \fBpcre_exec()\fP, is used.  PCRE when its main matching function, \fBpcre_exec()\fP, is used.
31  From release 6.0, PCRE offers a second matching function,  From release 6.0, PCRE offers a second matching function,
32  \fBpcre_dfa_exec()\fP, which matches using a different algorithm that is not  \fBpcre_dfa_exec()\fP, which matches using a different algorithm that is not
33  Perl-compatible. Some of the features discussed below are not available when  Perl-compatible. Some of the features discussed below are not available when
34  \fBpcre_dfa_exec()\fP is used. The advantages and disadvantages of the  \fBpcre_dfa_exec()\fP is used. The advantages and disadvantages of the
35  alternative function, and how it differs from the normal function, are  alternative function, and how it differs from the normal function, are
36  discussed in the  discussed in the
# Line 241  meanings Line 241  meanings
241  .rs  .rs
242  .sp  .sp
243  The sequence \eg followed by a positive or negative number, optionally enclosed  The sequence \eg followed by a positive or negative number, optionally enclosed
244  in braces, is an absolute or relative back reference. A named back reference  in braces, is an absolute or relative back reference. A named back reference
245  can be coded as \eg{name}. Back references are discussed  can be coded as \eg{name}. Back references are discussed
246  .\" HTML <a href="#backreferences">  .\" HTML <a href="#backreferences">
247  .\" </a>  .\" </a>
# Line 525  properties in PCRE. Line 525  properties in PCRE.
525  .SS "Resetting the match start"  .SS "Resetting the match start"
526  .rs  .rs
527  .sp  .sp
528  The escape sequence \eK, which is a Perl 5.10 feature, causes any previously  The escape sequence \eK, which is a Perl 5.10 feature, causes any previously
529  matched characters not to be included in the final matched sequence. For  matched characters not to be included in the final matched sequence. For
530  example, the pattern:  example, the pattern:
531  .sp  .sp
532    foo\eKbar    foo\eKbar
533  .sp  .sp
534  matches "foobar", but reports that it has matched "bar". This feature is  matches "foobar", but reports that it has matched "bar". This feature is
535  similar to a lookbehind assertion  similar to a lookbehind assertion
536  .\" HTML <a href="#lookbehind">  .\" HTML <a href="#lookbehind">
537  .\" </a>  .\" </a>
538  (described below).  (described below).
539  .\"  .\"
540  However, in this case, the part of the subject before the real match does not  However, in this case, the part of the subject before the real match does not
541  have to be of fixed length, as lookbehind assertions do. The use of \eK does  have to be of fixed length, as lookbehind assertions do. The use of \eK does
542  not interfere with the setting of  not interfere with the setting of
543  .\" HTML <a href="#subpattern">  .\" HTML <a href="#subpattern">
544  .\" </a>  .\" </a>
545  captured substrings.  captured substrings.
546  .\"  .\"
547  For example, when the pattern  For example, when the pattern
548  .sp  .sp
549    (foo)\eKbar    (foo)\eKbar
550  .sp  .sp
551  matches "foobar", the first substring is still set to "foo".  matches "foobar", the first substring is still set to "foo".
552  .  .
553  .  .
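The effect of \eK described above can be approximated in an engine that lacks it. The sketch below is an emulation only, not PCRE code: it assumes Python's standard re module (which has no \eK), and the helper name search_with_reset and the group name "reported" are hypothetical. The text that PCRE would report as the match is captured in a named group, while any capturing groups in the prefix are still set.

```python
import re

# Emulation sketch: PCRE's  prefix\Krest  is approximated by matching
# prefix followed by a named group holding "rest". Python's re module
# does not support \K itself; the names below are illustrative only.
def search_with_reset(prefix, rest, subject):
    pattern = "(?:" + prefix + ")(?P<reported>" + rest + ")"
    return re.search(pattern, subject)

# Corresponds to the pattern (foo)\Kbar applied to "foobar".
m = search_with_reset(r"(foo)", r"bar", "foobar")
print(m.group("reported"))  # bar  (what PCRE would report as the match)
print(m.group(1))           # foo  (the first captured substring is still set)

# Unlike a lookbehind, the prefix may have variable length.
m2 = search_with_reset(r"fo+", r"bar", "xfooooobar")
print(m2.span("reported"))
```

The named group stands in for the reported match, so a caller would read its span rather than the span of the whole match.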
554  .\" HTML <a name="smallassertions"></a>  .\" HTML <a name="smallassertions"></a>
# Line 958  is reached, an option setting in one bra Line 958  is reached, an option setting in one bra
958  the above patterns match "SUNDAY" as well as "Saturday".  the above patterns match "SUNDAY" as well as "Saturday".
959  .  .
960  .  .
961    .SH "DUPLICATE SUBPATTERN NUMBERS"
962    .rs
963    .sp
964    Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
965    the same numbers for its capturing parentheses. Such a subpattern starts with
966    (?| and is itself a non-capturing subpattern. For example, consider this
967    pattern:
968    .sp
969      (?|(Sat)ur|(Sun))day
970    .sp
971    Because the two alternatives are inside a (?| group, both sets of capturing
972    parentheses are numbered one. Thus, when the pattern matches, you can look
973    at captured substring number one, whichever alternative matched. This construct
974    is useful when you want to capture part, but not all, of one of a number of
975    alternatives. Inside a (?| group, parentheses are numbered as usual, but the
976    number is reset at the start of each branch. The numbers of any capturing
977    buffers that follow the subpattern start after the highest number used in any
978    branch. The following example is taken from the Perl documentation.
979    The numbers underneath show in which buffer the captured content will be
980    stored.
981    .sp
982      # before  ---------------branch-reset----------- after
983      / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
984      # 1            2         2  3        2     3     4
985    .sp
986    A backreference or a recursive call to a numbered subpattern always refers to
987    the first one in the pattern with the given number.
988    .P
989    An alternative approach to using this "branch reset" feature is to use
990    duplicate named subpatterns, as described in the next section.
991    .
992    .
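The numbering rule for (?| groups can be checked with a small sketch. This is not how PCRE is implemented; it is a simplified illustration in Python that assumes the pattern contains only plain capturing parentheses, (?| groups, other (? groups treated as non-capturing, alternation, and backslash escapes. Named groups and character classes are not handled, and the function name number_groups is hypothetical.

```python
def number_groups(pattern):
    """Return the capture number given to each capturing '(' in order."""
    numbers = []
    next_num = 1
    stack = []  # one entry per open group; None unless it is a (?| group
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip an escaped character
            continue
        if ch == "(":
            if pattern.startswith("(?|", i):
                # Remember where numbering started and the highest value seen.
                stack.append({"start": next_num, "high": next_num})
                i += 3
                continue
            if pattern.startswith("(?", i):
                stack.append(None)  # simplification: treated as non-capturing
                i += 2
                continue
            numbers.append(next_num)
            next_num += 1
            stack.append(None)
        elif ch == "|" and stack and stack[-1] is not None:
            # New branch directly inside a (?| group: reset the number.
            frame = stack[-1]
            frame["high"] = max(frame["high"], next_num)
            next_num = frame["start"]
        elif ch == ")" and stack:
            frame = stack.pop()
            if frame is not None:
                # Groups after (?|...) continue past the highest number used.
                next_num = max(frame["high"], next_num)
        i += 1
    return numbers

print(number_groups("(?|(Sat)ur|(Sun))day"))
print(number_groups("( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z )"))
```

For the Perl documentation example this yields 1, 2, 2, 3, 2, 3, 4, which agrees with the numbers shown under that pattern.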
993  .SH "NAMED SUBPATTERNS"  .SH "NAMED SUBPATTERNS"
994  .rs  .rs
995  .sp  .sp
# Line 1007  abbreviation. This pattern (ignoring the Line 1039  abbreviation. This pattern (ignoring the
1039    (?<DN>Sat)(?:urday)?    (?<DN>Sat)(?:urday)?
1040  .sp  .sp
1041  There are five capturing substrings, but only one is ever set after a match.  There are five capturing substrings, but only one is ever set after a match.
1042    (An alternative way of solving this problem is to use a "branch reset"
1043    subpattern, as described in the previous section.)
1044    .P
1045  The convenience function for extracting the data by name returns the substring  The convenience function for extracting the data by name returns the substring
1046  for the first (and in this example, the only) subpattern of that name that  for the first (and in this example, the only) subpattern of that name that
1047  matched. This saves searching to find which numbered subpattern it was. If you  matched. This saves searching to find which numbered subpattern it was. If you
# Line 1458  lengths, but it is acceptable if rewritt Line 1493  lengths, but it is acceptable if rewritt
1493  .sp  .sp
1494    (?<=abc|abde)    (?<=abc|abde)
1495  .sp  .sp
1496  In some cases, the Perl 5.10 escape sequence \eK  In some cases, the Perl 5.10 escape sequence \eK
1497  .\" HTML <a href="#resetmatchstart">  .\" HTML <a href="#resetmatchstart">
1498  .\" </a>  .\" </a>
1499  (see above)  (see above)
# Line 1560  recursion, a pseudo-condition called DEF Line 1595  recursion, a pseudo-condition called DEF
1595  .sp  .sp
1596  If the text between the parentheses consists of a sequence of digits, the  If the text between the parentheses consists of a sequence of digits, the
1597  condition is true if the capturing subpattern of that number has previously  condition is true if the capturing subpattern of that number has previously
1598  matched. An alternative notation is to precede the digits with a plus or minus  matched. An alternative notation is to precede the digits with a plus or minus
1599  sign. In this case, the subpattern number is relative rather than absolute.  sign. In this case, the subpattern number is relative rather than absolute.
1600  The most recently opened parentheses can be referenced by (?(-1), the next most  The most recently opened parentheses can be referenced by (?(-1), the next most
1601  recent by (?(-2), and so on. In looping constructs it can also make sense to  recent by (?(-2), and so on. In looping constructs it can also make sense to
1602  refer to subsequent groups with constructs such as (?(+2).  refer to subsequent groups with constructs such as (?(+2).
1603  .P  .P
# Line 1582  parenthesis is required. Otherwise, sinc Line 1617  parenthesis is required. Otherwise, sinc
1617  subpattern matches nothing. In other words, this pattern matches a sequence of  subpattern matches nothing. In other words, this pattern matches a sequence of
1618  non-parentheses, optionally enclosed in parentheses.  non-parentheses, optionally enclosed in parentheses.
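The behaviour of a numbered condition can be tried out in Python, whose standard re module supports the absolute form (?(1)...) but, as far as is assumed here, not the relative (?(-1) form. The sketch below therefore uses the absolute reference; the pattern is written in extended (whitespace-ignoring) style, and the variable name OPTIONAL_PARENS is illustrative.

```python
import re

# Group 1 is an optional opening parenthesis. The condition (?(1) ... )
# requires a closing parenthesis only if group 1 took part in the match.
OPTIONAL_PARENS = re.compile(r"( \( )?  [^()]+  (?(1) \) )", re.VERBOSE)

for subject in ["abc", "(abc)", "(abc", "abc)"]:
    matched = OPTIONAL_PARENS.fullmatch(subject) is not None
    print(subject, matched)
```

Only the first two subjects match in full: an unbalanced opening or closing parenthesis leaves the condition unsatisfied or leaves a character unmatched.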
1619  .P  .P
1620  If you were embedding this pattern in a larger one, you could use a relative  If you were embedding this pattern in a larger one, you could use a relative
1621  reference:  reference:
1622  .sp  .sp
1623    ...other stuff... ( \e( )?    [^()]+    (?(-1) \e) ) ...    ...other stuff... ( \e( )?    [^()]+    (?(-1) \e) ) ...
# Line 1730  pattern, so instead you could use this: Line 1765  pattern, so instead you could use this:
1765    ( \e( ( (?>[^()]+) | (?1) )* \e) )    ( \e( ( (?>[^()]+) | (?1) )* \e) )
1766  .sp  .sp
1767  We have put the pattern into parentheses, and caused the recursion to refer to  We have put the pattern into parentheses, and caused the recursion to refer to
1768  them instead of the whole pattern.  them instead of the whole pattern.
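What the recursive call (?1) does can be pictured as an ordinary recursive function. The following is a hand-written Python sketch of the same idea, not PCRE's matching algorithm; Python's standard re module is assumed to have no recursion syntax, and the function name match_group is hypothetical. It returns the index just past a balanced parenthesized group that starts at a given position.

```python
def match_group(subject, pos=0):
    """Return the end index of a balanced (...) group at pos, or None."""
    if pos >= len(subject) or subject[pos] != "(":
        return None
    pos += 1  # consume the opening parenthesis
    while pos < len(subject):
        ch = subject[pos]
        if ch == ")":
            return pos + 1  # closing parenthesis ends this group
        if ch == "(":
            # Plays the role of (?1): match a nested group recursively.
            end = match_group(subject, pos)
            if end is None:
                return None
            pos = end
        else:
            pos += 1  # a character matched by [^()]
    return None  # ran out of subject before the group was closed

print(match_group("(ab(cd)ef)"))
print(match_group("(ab(cd)ef"))
```

Because the function calls itself only for a nested group, it can be invoked at any position in a larger subject, which mirrors recursing into the parenthesized subpattern rather than into the whole pattern.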
1769  .P  .P
1770  In a larger pattern, keeping track of parenthesis numbers can be tricky. This  In a larger pattern, keeping track of parenthesis numbers can be tricky. This
1771  is made easier by the use of relative references. (A Perl 5.10 feature.)  is made easier by the use of relative references. (A Perl 5.10 feature.)
# Line 1751  could rewrite the above example as follo Line 1786  could rewrite the above example as follo
1786    (?<pn> \e( ( (?>[^()]+) | (?&pn) )* \e) )    (?<pn> \e( ( (?>[^()]+) | (?&pn) )* \e) )
1787  .sp  .sp
1788  If there is more than one subpattern with the same name, the earliest one is  If there is more than one subpattern with the same name, the earliest one is
1789  used.  used.
1790  .P  .P
1791  This particular example pattern that we have been looking at contains nested  This particular example pattern that we have been looking at contains nested
1792  unlimited repeats, and so the use of atomic grouping for matching strings of  unlimited repeats, and so the use of atomic grouping for matching strings of
# Line 1813  relative, as in these examples: Line 1848  relative, as in these examples:
1848  .sp  .sp
1849    (...(absolute)...)...(?2)...    (...(absolute)...)...(?2)...
1850    (...(relative)...)...(?-1)...    (...(relative)...)...(?-1)...
1851    (...(?+1)...(relative)...    (...(?+1)...(relative)...
1852  .sp  .sp
1853  An earlier example pointed out that the pattern  An earlier example pointed out that the pattern
1854  .sp  .sp
# Line 1898  Cambridge CB2 3QH, England. Line 1933  Cambridge CB2 3QH, England.
1933  .rs  .rs
1934  .sp  .sp
1935  .nf  .nf
1936  Last updated: 29 May 2007  Last updated: 11 June 2007
1937  Copyright (c) 1997-2007 University of Cambridge.  Copyright (c) 1997-2007 University of Cambridge.
1938  .fi  .fi
