Diff of /code/trunk/doc/pcrepattern.3

revision 96 by nigel, Fri Mar 2 13:10:43 2007 UTC revision 167 by ph10, Wed May 9 15:53:54 2007 UTC
# Line 291  in the Line 291  in the
291  .\" HREF  .\" HREF
292  \fBpcreapi\fP  \fBpcreapi\fP
293  .\"  .\"
294  page). For example, in the "fr_FR" (French) locale, some character codes  page). For example, in a French locale such as "fr_FR" in Unix-like systems,
295  greater than 128 are used for accented letters, and these are matched by \ew.  or "french" in Windows, some character codes greater than 128 are used for
296    accented letters, and these are matched by \ew.
297  .P  .P
298  In UTF-8 mode, characters with values greater than 128 never match \ed, \es, or  In UTF-8 mode, characters with values greater than 128 never match \ed, \es, or
299  \ew, and always match \eD, \eS, and \eW. This is true even when Unicode  \ew, and always match \eD, \eS, and \eW. This is true even when Unicode
# Line 740  example [\ex{100}-\ex{2ff}]. Line 741  example [\ex{100}-\ex{2ff}].
741  If a range that includes letters is used when caseless matching is set, it  If a range that includes letters is used when caseless matching is set, it
742  matches the letters in either case. For example, [W-c] is equivalent to  matches the letters in either case. For example, [W-c] is equivalent to
743  [][\e\e^_`wxyzabc], matched caselessly, and in non-UTF-8 mode, if character  [][\e\e^_`wxyzabc], matched caselessly, and in non-UTF-8 mode, if character
744  tables for the "fr_FR" locale are in use, [\exc8-\excb] matches accented E  tables for a French locale are in use, [\exc8-\excb] matches accented E
745  characters in both cases. In UTF-8 mode, PCRE supports the concept of case for  characters in both cases. In UTF-8 mode, PCRE supports the concept of case for
746  characters with values greater than 128 only when it is compiled with Unicode  characters with values greater than 128 only when it is compiled with Unicode
747  property support.  property support.
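
The caseless range described in this hunk can be checked outside PCRE. The following is a minimal sketch using Python's built-in re module, which is a different engine from PCRE but treats a caseless ASCII range the same way. The helper name caseless_range_members is an assumption made for illustration, and the equivalent class is written with escaped brackets rather than in the [][...] form used in the man page.

```python
import re


def caseless_range_members(char_class: str) -> str:
    """Return the printable ASCII characters that char_class matches caselessly.

    Hypothetical helper for illustration; it uses Python's re module, not PCRE.
    """
    pattern = re.compile(char_class, re.IGNORECASE | re.ASCII)
    return "".join(
        chr(code) for code in range(0x20, 0x7F) if pattern.fullmatch(chr(code))
    )


if __name__ == "__main__":
    # The range W-c spans W X Y Z [ \ ] ^ _ ` a b c; caseless matching
    # adds the other case of every letter in that span.
    print(caseless_range_members(r"[W-c]"))
    # The explicit class that the text says is equivalent.
    print(caseless_range_members(r"[\]\[\\^_`wxyzabc]"))
```

Both calls should report the same set of characters, which is what the text means by calling the two classes equivalent under caseless matching.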
# Line 1514  recursion, a pseudo-condition called DEF Line 1515  recursion, a pseudo-condition called DEF
1515  .sp  .sp
1516  If the text between the parentheses consists of a sequence of digits, the  If the text between the parentheses consists of a sequence of digits, the
1517  condition is true if the capturing subpattern of that number has previously  condition is true if the capturing subpattern of that number has previously
1518  matched.  matched. An alternative notation is to precede the digits with a plus or minus
1519    sign. In this case, the subpattern number is relative rather than absolute.
1520    The most recently opened parentheses can be referenced by (?(-1), the next most
1521    recent by (?(-2), and so on. In looping constructs it can also make sense to
1522    refer to subsequent groups with constructs such as (?(+2).
1523  .P  .P
1524  Consider the following pattern, which contains non-significant white space to  Consider the following pattern, which contains non-significant white space to
1525  make it more readable (assume the PCRE_EXTENDED option) and to divide it into  make it more readable (assume the PCRE_EXTENDED option) and to divide it into
# Line 1531  the condition is true, and so the yes-pa Line 1536  the condition is true, and so the yes-pa
1536  parenthesis is required. Otherwise, since no-pattern is not present, the  parenthesis is required. Otherwise, since no-pattern is not present, the
1537  subpattern matches nothing. In other words, this pattern matches a sequence of  subpattern matches nothing. In other words, this pattern matches a sequence of
1538  non-parentheses, optionally enclosed in parentheses.  non-parentheses, optionally enclosed in parentheses.
1539    .P
1540    If you were embedding this pattern in a larger one, you could use a relative
1541    reference:
1542    .sp
1543      ...other stuff... ( \e( )?    [^()]+    (?(-1) \e) ) ...
1544    .sp
1545    This makes the fragment independent of the parentheses in the larger pattern.
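
To see why a relative reference helps, here is a minimal sketch of the same fragment using absolute numbers. It assumes Python's re module, which accepts the (?(1)...) form but not the relative (?(-1)...) form added in this revision, so the group number has to be passed in by hand. The function name optional_parens_fragment and the surrounding key/value pattern are hypothetical.

```python
import re


def optional_parens_fragment(group_number: int) -> str:
    """Build the 'optionally parenthesised' fragment for a given absolute
    group number. Hypothetical helper; written for re.VERBOSE patterns.
    """
    return rf"( \( )? [^()]+ (?({group_number}) \) )"


# Standing alone, the optional opening parenthesis is group 1.
standalone = re.compile(optional_parens_fragment(1), re.VERBOSE)

# Embedded after another capturing group, the same parenthesis becomes
# group 2, so the condition must be renumbered by hand. A relative
# reference such as (?(-1) in PCRE avoids this renumbering.
embedded = re.compile(r"(\w+) \s*=\s* " + optional_parens_fragment(2), re.VERBOSE)

if __name__ == "__main__":
    for text in ["abc", "(abc)", "(abc", "abc)"]:
        print(text, bool(standalone.fullmatch(text)))
    for text in ["key=(value)", "key=value", "key=(value"]:
        print(text, bool(embedded.fullmatch(text)))
```

With the absolute form, moving the fragment into a larger pattern silently changes which group the condition tests unless the number is updated.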
1546  .  .
1547  .SS "Checking for a used subpattern by name"  .SS "Checking for a used subpattern by name"
1548  .rs  .rs
# Line 1673  pattern, so instead you could use this: Line 1685  pattern, so instead you could use this:
1685    ( \e( ( (?>[^()]+) | (?1) )* \e) )    ( \e( ( (?>[^()]+) | (?1) )* \e) )
1686  .sp  .sp
1687  We have put the pattern into parentheses, and caused the recursion to refer to  We have put the pattern into parentheses, and caused the recursion to refer to
1688  them instead of the whole pattern. In a larger pattern, keeping track of  them instead of the whole pattern.
1689  parenthesis numbers can be tricky. It may be more convenient to use named  .P
1690  parentheses instead. The Perl syntax for this is (?&name); PCRE's earlier  In a larger pattern, keeping track of parenthesis numbers can be tricky. This
1691  syntax (?P>name) is also supported. We could rewrite the above example as  is made easier by the use of relative references. (A Perl 5.10 feature.)
1692  follows:  Instead of (?1) in the pattern above you can write (?-2) to refer to the second
1693    most recently opened parentheses preceding the recursion. In other words, a
1694    negative number counts capturing parentheses leftwards from the point at which
1695    it is encountered.
1696    .P
1697    It is also possible to refer to subsequently opened parentheses, by writing
1698    references such as (?+2). However, these cannot be recursive because the
1699    reference is not inside the parentheses that are referenced. They are always
1700    "subroutine" calls, as described in the next section.
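
The counting rule for relative references can be sketched as a small resolver that maps a relative number to an absolute one. This is an illustration only, not PCRE's implementation: the function names are hypothetical, and the scanner is deliberately simplified (it handles backslash escapes, simple character classes, and named groups, but not \Q...\E quoting, comments, or a class that begins with a literal "]").

```python
def count_capturing_groups(prefix: str) -> int:
    """Count capturing parentheses opened in prefix (simplified scanner)."""
    count = 0
    i = 0
    in_class = False
    while i < len(prefix):
        ch = prefix[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            rest = prefix[i + 1:]
            if not rest.startswith("?"):
                count += 1  # plain capturing group
            elif rest.startswith(("?<", "?P<", "?'")) and not rest.startswith(
                ("?<=", "?<!")
            ):
                count += 1  # named capturing group, not a lookbehind
        i += 1
    return count


def resolve_relative(pattern: str, position: int, relative: int) -> int:
    """Return the absolute group number for a relative reference that
    starts at index position in pattern."""
    opened = count_capturing_groups(pattern[:position])
    if relative < 0:
        absolute = opened + relative + 1  # -1 is the most recently opened group
    elif relative > 0:
        absolute = opened + relative  # +1 is the next group to be opened
    else:
        raise ValueError("a relative reference cannot be zero")
    if absolute < 1:
        raise ValueError("reference points before the first group")
    return absolute


if __name__ == "__main__":
    pattern = r"( \( ( (?>[^()]+) | (?-2) )* \) )"
    print(resolve_relative(pattern, pattern.index("(?-2)"), -2))
```

Under these assumptions, (?-2) in the recursive pattern above resolves to group 1, which agrees with the text's statement that it can replace (?1).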
1701    .P
1702    An alternative approach is to use named parentheses instead. The Perl syntax
1703    for this is (?&name); PCRE's earlier syntax (?P>name) is also supported. We
1704    could rewrite the above example as follows:
1705  .sp  .sp
1706    (?<pn> \e( ( (?>[^()]+) | (?&pn) )* \e) )    (?<pn> \e( ( (?>[^()]+) | (?&pn) )* \e) )
1707  .sp  .sp
1708  If there is more than one subpattern with the same name, the earliest one is  If there is more than one subpattern with the same name, the earliest one is
1709  used. This particular example pattern contains nested unlimited repeats, and so  used.
1710  the use of atomic grouping for matching strings of non-parentheses is important  .P
1711  when applying the pattern to strings that do not match. For example, when this  This particular example pattern that we have been looking at contains nested
1712  pattern is applied to  unlimited repeats, and so the use of atomic grouping for matching strings of
1713    non-parentheses is important when applying the pattern to strings that do not
1714    match. For example, when this pattern is applied to
1715  .sp  .sp
1716    (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()    (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
1717  .sp  .sp
# Line 1737  is the actual recursive call. Line 1763  is the actual recursive call.
1763  If the syntax for a recursive subpattern reference (either by number or by  If the syntax for a recursive subpattern reference (either by number or by
1764  name) is used outside the parentheses to which it refers, it operates like a  name) is used outside the parentheses to which it refers, it operates like a
1765  subroutine in a programming language. The "called" subpattern may be defined  subroutine in a programming language. The "called" subpattern may be defined
1766  before or after the reference. An earlier example pointed out that the pattern  before or after the reference. A numbered reference can be absolute or
1767    relative, as in these examples:
1768    .sp
1769      (...(absolute)...)...(?2)...
1770      (...(relative)...)...(?-1)...
1771      (...(?+1)...(relative)...
1772    .sp
1773    An earlier example pointed out that the pattern
1774  .sp  .sp
1775    (sens|respons)e and \e1ibility    (sens|respons)e and \e1ibility
1776  .sp  .sp
# Line 1758  When a subpattern is used as a subroutin Line 1791  When a subpattern is used as a subroutin
1791  case-independence are fixed when the subpattern is defined. They cannot be  case-independence are fixed when the subpattern is defined. They cannot be
1792  changed for different calls. For example, consider this pattern:  changed for different calls. For example, consider this pattern:
1793  .sp  .sp
1794    (abc)(?i:(?1))    (abc)(?i:(?-1))
1795  .sp  .sp
1796  It matches "abcabc". It does not match "abcABC" because the change of  It matches "abcabc". It does not match "abcABC" because the change of
1797  processing option does not affect the called subpattern.  processing option does not affect the called subpattern.
# Line 1782  function is to be called. If you want to Line 1815  function is to be called. If you want to
1815  can put a number less than 256 after the letter C. The default value is zero.  can put a number less than 256 after the letter C. The default value is zero.
1816  For example, this pattern has two callout points:  For example, this pattern has two callout points:
1817  .sp  .sp
1818    (?C1)\dabc(?C2)def    (?C1)abc(?C2)def
1819  .sp  .sp
1820  If the PCRE_AUTO_CALLOUT flag is passed to \fBpcre_compile()\fP, callouts are  If the PCRE_AUTO_CALLOUT flag is passed to \fBpcre_compile()\fP, callouts are
1821  automatically installed before each item in the pattern. They are all numbered  automatically installed before each item in the pattern. They are all numbered
# Line 1804  documentation. Line 1837  documentation.
1837  .rs  .rs
1838  .sp  .sp
1839  \fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3), \fBpcre\fP(3).  \fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3), \fBpcre\fP(3).
1840  .P  .
1841  .in 0  .
1842  Last updated: 06 December 2006  .SH AUTHOR
1843  .br  .rs
1844  Copyright (c) 1997-2006 University of Cambridge.  .sp
1845    .nf
1846    Philip Hazel
1847    University Computing Service
1848    Cambridge CB2 3QH, England.
1849    .fi
1850    .
1851    .
1852    .SH REVISION
1853    .rs
1854    .sp
1855    .nf
1856    Last updated: 09 May 2007
1857    Copyright (c) 1997-2007 University of Cambridge.
1858    .fi
