
Diff of /code/trunk/doc/pcrepattern.3


revision 607 by ph10, Sun Jun 12 15:09:49 2011 UTC revision 678 by ph10, Sun Aug 28 15:23:03 2011 UTC
# Line 32  Starting a pattern with this sequence is Line 32  Starting a pattern with this sequence is
32  option. This feature is not Perl-compatible. How setting UTF-8 mode affects  option. This feature is not Perl-compatible. How setting UTF-8 mode affects
33  pattern matching is mentioned in several places below. There is also a summary  pattern matching is mentioned in several places below. There is also a summary
34  of UTF-8 features in the  of UTF-8 features in the
 .\" HTML <a href="pcre.html#utf8support">  
 .\" </a>  
 section on UTF-8 support  
 .\"  
 in the main  
35  .\" HREF  .\" HREF
36  \fBpcre\fP  \fBpcreunicode\fP
37  .\"  .\"
38  page.  page.
39  .P  .P
# Line 220  Perl, $ and @ cause variable interpolati Line 215  Perl, $ and @ cause variable interpolati
215    \eQabc\eE\e$\eQxyz\eE   abc$xyz        abc$xyz    \eQabc\eE\e$\eQxyz\eE   abc$xyz        abc$xyz
216  .sp  .sp
217  The \eQ...\eE sequence is recognized both inside and outside character classes.  The \eQ...\eE sequence is recognized both inside and outside character classes.
218  An isolated \eE that is not preceded by \eQ is ignored. If \eQ is not followed  An isolated \eE that is not preceded by \eQ is ignored. If \eQ is not followed
219  by \eE later in the pattern, the literal interpretation continues to the end of  by \eE later in the pattern, the literal interpretation continues to the end of
220  the pattern (that is, \eE is assumed at the end). If the isolated \eQ is inside  the pattern (that is, \eE is assumed at the end). If the isolated \eQ is inside
221  a character class, this causes an error, because the character class is not  a character class, this causes an error, because the character class is not
222  terminated.  terminated.
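
The \Q...\E rules in the hunk above (written \eQ...\eE in the man page source) can be sketched as a small preprocessing step. This is only an illustration, not how PCRE implements it: Python's `re` module has no \Q...\E support, so the hypothetical helper `expand_q_e` rewrites a pattern into an equivalent one with the quoted text escaped. It assumes the sequence is used outside character classes only.

```python
import re


def expand_q_e(pattern: str) -> str:
    """Rewrite \\Q...\\E spans as escaped literals (hypothetical helper).

    Follows the rules described above: an isolated \\E is ignored, and a
    \\Q without a later \\E quotes everything up to the end of the pattern.
    Character classes are not handled in this sketch.
    """
    out = []
    i = 0
    literal = False
    while i < len(pattern):
        two = pattern[i:i + 2]
        if literal:
            if two == r"\E":
                literal = False          # end of quoted span
                i += 2
            else:
                out.append(re.escape(pattern[i]))
                i += 1
        elif two == r"\Q":
            literal = True               # start of quoted span
            i += 2
        elif two == r"\E":
            i += 2                       # isolated \E is ignored
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(two)              # keep any other escape intact
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


# The table row shown above: \Qabc\E\$\Qxyz\E should match "abc$xyz".
converted = expand_q_e(r"\Qabc\E\$\Qxyz\E")
assert re.fullmatch(converted, "abc$xyz")
```

Escaping character by character keeps the helper short; a real implementation would also need to track whether it is inside a character class, where an unterminated \Q is an error.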
# Line 757  Characters with the "mark" property are Line 752  Characters with the "mark" property are
752  preceding character. None of them have codepoints less than 256, so in  preceding character. None of them have codepoints less than 256, so in
753  non-UTF-8 mode \eX matches any one character.  non-UTF-8 mode \eX matches any one character.
754  .P  .P
755    Note that recent versions of Perl have changed \eX to match what Unicode calls
756    an "extended grapheme cluster", which has a more complicated definition.
757    .P
758  Matching characters by Unicode property is not fast, because PCRE has to search  Matching characters by Unicode property is not fast, because PCRE has to search
759  a structure that contains data for over fifteen thousand characters. That is  a structure that contains data for over fifteen thousand characters. That is
760  why the traditional escape sequences such as \ed and \ew do not use Unicode  why the traditional escape sequences such as \ed and \ew do not use Unicode
# Line 1435  items: Line 1433  items:
1433    an escape such as \ed or \epL that matches a single character    an escape such as \ed or \epL that matches a single character
1434    a character class    a character class
1435    a back reference (see next section)    a back reference (see next section)
1436    a parenthesized subpattern (unless it is an assertion)    a parenthesized subpattern (including assertions)
1437    a recursive or "subroutine" call to a subpattern    a recursive or "subroutine" call to a subpattern
1438  .sp  .sp
1439  The general repetition quantifier specifies a minimum and maximum number of  The general repetition quantifier specifies a minimum and maximum number of
# Line 1826  those that look ahead of the current pos Line 1824  those that look ahead of the current pos
1824  that look behind it. An assertion subpattern is matched in the normal way,  that look behind it. An assertion subpattern is matched in the normal way,
1825  except that it does not cause the current matching position to be changed.  except that it does not cause the current matching position to be changed.
1826  .P  .P
1827  Assertion subpatterns are not capturing subpatterns, and may not be repeated,  Assertion subpatterns are not capturing subpatterns. If such an assertion
1828  because it makes no sense to assert the same thing several times. If any kind  contains capturing subpatterns within it, these are counted for the purposes of
1829  of assertion contains capturing subpatterns within it, these are counted for  numbering the capturing subpatterns in the whole pattern. However, substring
1830  the purposes of numbering the capturing subpatterns in the whole pattern.  capturing is carried out only for positive assertions, because it does not make
1831  However, substring capturing is carried out only for positive assertions,  sense for negative assertions.
1832  because it does not make sense for negative assertions.  .P
1833    For compatibility with Perl, assertion subpatterns may be repeated; though
1834    it makes no sense to assert the same thing several times, the side effect of
1835    capturing parentheses may occasionally be useful. In practice, there are only three
1836    cases:
1837    .sp
1838    (1) If the quantifier is {0}, the assertion is never obeyed during matching.
1839    However, it may contain internal capturing parenthesized groups that are called
1840    from elsewhere via the
1841    .\" HTML <a href="#subpatternsassubroutines">
1842    .\" </a>
1843    subroutine mechanism.
1844    .\"
1845    .sp
1846    (2) If the quantifier is {0,n} where n is greater than zero, it is treated as if it
1847    were {0,1}. At run time, the rest of the pattern match is tried with and
1848    without the assertion, the order depending on the greediness of the quantifier.
1849    .sp
1850    (3) If the minimum repetition is greater than zero, the quantifier is ignored.
1851    The assertion is obeyed just once when encountered during matching.
1852  .  .
1853  .  .
1854  .SS "Lookahead assertions"  .SS "Lookahead assertions"
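
The statement in the hunk above that capturing groups inside assertions are numbered, but only set by positive assertions, can be checked with a short sketch. It uses Python's `re` module as a stand-in, which is an assumption: `re` is not PCRE, although it behaves the same way for this particular case. The repeated-assertion rules added in r678 are PCRE-specific and are not exercised here.

```python
import re

# A capturing group inside a positive lookahead is set on a successful match.
m_pos = re.match(r"(?=(a))a", "a")
assert m_pos.group(1) == "a"

# A capturing group inside a negative lookahead is numbered but never set.
m_neg = re.match(r"(?!(b))a", "a")
assert m_neg.group(1) is None

# Groups inside assertions still count toward the pattern's group numbering.
group_count = re.compile(r"(?=(a))(a)").groups
assert group_count == 2
```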
# Line 2489  failing negative assertion, they cause a Line 2506  failing negative assertion, they cause a
2506  .P  .P
2507  If any of these verbs are used in an assertion or subroutine subpattern  If any of these verbs are used in an assertion or subroutine subpattern
2508  (including recursive subpatterns), their effect is confined to that subpattern;  (including recursive subpatterns), their effect is confined to that subpattern;
2509  it does not extend to the surrounding pattern. Note that such subpatterns are  it does not extend to the surrounding pattern, with one exception: a *MARK that
2510  processed as anchored at the point where they are tested.  is encountered in a positive assertion \fIis\fP passed back (compare capturing
2511    parentheses in assertions). Note that such subpatterns are processed as
2512    anchored at the point where they are tested.
2513  .P  .P
2514  The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
2515  parenthesis followed by an asterisk. They are generally of the form  parenthesis followed by an asterisk. They are generally of the form
# Line 2581  indicates which of the two alternatives Line 2600  indicates which of the two alternatives
2600  of obtaining this information than putting each alternative in its own  of obtaining this information than putting each alternative in its own
2601  capturing parentheses.  capturing parentheses.
2602  .P  .P
2603    If (*MARK) is encountered in a positive assertion, its name is recorded and
2604    passed back if it is the last-encountered. This does not happen for negative
2605    assertions.
2606    .P
2607  A name may also be returned after a failed match if the final path through the  A name may also be returned after a failed match if the final path through the
2608  pattern involves (*MARK). However, unless (*MARK) is used in conjunction with  pattern involves (*MARK). However, unless (*MARK) is used in conjunction with
2609  (*COMMIT), this is unlikely to happen for an unanchored pattern because, as the  (*COMMIT), this is unlikely to happen for an unanchored pattern because, as the
# Line 2752  Cambridge CB2 3QH, England. Line 2775  Cambridge CB2 3QH, England.
2775  .rs  .rs
2776  .sp  .sp
2777  .nf  .nf
2778  Last updated: 12 June 2011  Last updated: 24 August 2011
2779  Copyright (c) 1997-2011 University of Cambridge.  Copyright (c) 1997-2011 University of Cambridge.
2780  .fi  .fi

