/[pcre]/code/trunk/doc/html/pcrepattern.html

Diff of /code/trunk/doc/html/pcrepattern.html


revision 566 by ph10, Thu Jun 3 19:18:24 2010 UTC vs. revision 567 by ph10, Sat Nov 6 17:10:00 2010 UTC
# Line 99  alternative function, and how it differs Line 99  alternative function, and how it differs
99  discussed in the  discussed in the
100  <a href="pcrematching.html"><b>pcrematching</b></a>  <a href="pcrematching.html"><b>pcrematching</b></a>
101  page.  page.
102  </P>  <a name="newlines"></a></P>
103  <br><a name="SEC2" href="#TOC1">NEWLINE CONVENTIONS</a><br>  <br><a name="SEC2" href="#TOC1">NEWLINE CONVENTIONS</a><br>
104  <P>  <P>
105  PCRE supports five different conventions for indicating line breaks in  PCRE supports five different conventions for indicating line breaks in
# Line 234  Perl, $ and @ cause variable interpolati Line 234  Perl, $ and @ cause variable interpolati
234    \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz    \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
235  </pre>  </pre>
236  The \Q...\E sequence is recognized both inside and outside character classes.  The \Q...\E sequence is recognized both inside and outside character classes.
237    An isolated \E that is not preceded by \Q is ignored.
238  <a name="digitsafterbackslash"></a></P>  <a name="digitsafterbackslash"></a></P>
239  <br><b>  <br><b>
240  Non-printing characters  Non-printing characters
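This hunk states three quoting rules:

- everything between \Q and \E is literal;
- this applies both inside and outside character classes;
- an isolated \E is ignored.

These rules can be illustrated as a pattern pre-processing step. Python's re module does not support \Q...\E. The helper expand_quoting below is therefore hypothetical and not part of PCRE or of Python. It is a minimal sketch that escapes the quoted text with re.escape and leaves all other pattern syntax untouched.

```python
import re


def expand_quoting(pattern):
    """Hypothetical helper: rewrite PCRE-style \\Q...\\E quoting for Python's re.

    Text between \\Q and \\E (or the end of the pattern) is escaped with
    re.escape, inside or outside a character class; an isolated \\E is dropped.
    """
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        if pattern.startswith("\\Q", i):
            end = pattern.find("\\E", i + 2)
            if end == -1:
                end = n  # \Q without \E quotes to the end of the pattern
            out.append(re.escape(pattern[i + 2:end]))
            i = end + 2
        elif pattern.startswith("\\E", i):
            i += 2  # isolated \E that is not preceded by \Q: ignored
        elif pattern[i] == "\\" and i + 1 < n:
            out.append(pattern[i:i + 2])  # any other escape is copied unchanged
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)
```

For example, expand_quoting(r"\Qabc\E\$\Qxyz\E") returns abc\$xyz, which matches "abc$xyz", in line with the table row above. Inside a class, r"[\Q^]\E]" becomes a class that matches either "^" or "]".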
# Line 1936  already been matched. The two possible f Line 1937  already been matched. The two possible f
1937  </pre>  </pre>
1938  If the condition is satisfied, the yes-pattern is used; otherwise the  If the condition is satisfied, the yes-pattern is used; otherwise the
1939  no-pattern (if present) is used. If there are more than two alternatives in the  no-pattern (if present) is used. If there are more than two alternatives in the
1940  subpattern, a compile-time error occurs.  subpattern, a compile-time error occurs. Each of the two alternatives may
1941    itself contain nested subpatterns of any form, including conditional
1942    subpatterns; the restriction to two alternatives applies only at the level of
1943    the condition. This pattern fragment is an example where the alternatives are
1944    complex:
1945    <pre>
1946      (?(1) (A|B|C) | (D | (?(2)E|F) | E) )
1947    
1948    </PRE>
1949  </P>  </P>
1950  <P>  <P>
1951  There are four kinds of condition: references to subpatterns, references to  There are four kinds of condition: references to subpatterns, references to
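Python's re module is not PCRE, but it accepts the same (?(n)yes-pattern|no-pattern) form. It also rejects a condition with more than two alternatives at its own level. That makes it a quick way to try out the rule added in this hunk. The pattern names and the helper fails_to_compile are illustrative only, and Python's error message differs from PCRE's.

```python
import re

# Optional opening parenthesis; if group 1 matched, require a closing one.
bracketed = re.compile(r"^(\()?\d+(?(1)\))$")

# Each of the two alternatives may itself contain groups with alternation.
nested = re.compile(r"^(<)?(?(1)(a|b|c)>|(d|e))$")


def fails_to_compile(pattern):
    """Return True if re.compile() rejects `pattern` with re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False
```

With these definitions, (a)?(?(1)b|c|d) fails to compile. By contrast, (a)?(?(1)b|(c|d)) compiles, because the extra alternative is nested inside its own group, as in the (?(1) (A|B|C) | ...) fragment above.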
# Line 2071  dd-aaa-dd or dd-dd-dd, where aaa are let Line 2080  dd-aaa-dd or dd-dd-dd, where aaa are let
2080  <a name="comments"></a></P>  <a name="comments"></a></P>
2081  <br><a name="SEC20" href="#TOC1">COMMENTS</a><br>  <br><a name="SEC20" href="#TOC1">COMMENTS</a><br>
2082  <P>  <P>
2083  The sequence (?# marks the start of a comment that continues up to the next  There are two ways of including comments in patterns that are processed by
2084  closing parenthesis. Nested parentheses are not permitted. The characters  PCRE. In both cases, the start of the comment must not be in a character class,
2085  that make up a comment play no part in the pattern matching at all.  nor in the middle of any other sequence of related characters such as (?: or a
2086    subpattern name or number. The characters that make up a comment play no part
2087    in the pattern matching.
2088  </P>  </P>
2089  <P>  <P>
2090  If the PCRE_EXTENDED option is set, an unescaped # character outside a  The sequence (?# marks the start of a comment that continues up to the next
2091  character class introduces a comment that continues to immediately after the  closing parenthesis. Nested parentheses are not permitted. If the PCRE_EXTENDED
2092  next newline in the pattern.  option is set, an unescaped # character also introduces a comment, which in
2093    this case continues to immediately after the next newline character or
2094    character sequence in the pattern. Which characters are interpreted as newlines
2095    is controlled by the options passed to <b>pcre_compile()</b> or by a special
2096    sequence at the start of the pattern, as described in the section entitled
2097    <a href="#newlines">"Newline conventions"</a>
2098    above. Note that the end of this type of comment is a literal newline sequence in
2099    the pattern; escape sequences that happen to represent a newline do not count.
2100    For example, consider this pattern when PCRE_EXTENDED is set, and the default
2101    newline convention is in force:
2102    <pre>
2103      abc #comment \n still comment
2104    </pre>
2105    On encountering the # character, <b>pcre_compile()</b> skips along, looking for
2106    a newline in the pattern. The sequence \n is still literal at this stage, so
2107    it does not terminate the comment. Only an actual character with the code value
2108    0x0a (the default newline) does so.
2109  <a name="recursion"></a></P>  <a name="recursion"></a></P>
2110  <br><a name="SEC21" href="#TOC1">RECURSIVE PATTERNS</a><br>  <br><a name="SEC21" href="#TOC1">RECURSIVE PATTERNS</a><br>
2111  <P>  <P>
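Both comment forms in this hunk also exist in Python's re module, where re.VERBOSE corresponds to PCRE_EXTENDED. The sketch below therefore uses Python as a stand-in. One caveat: Python recognizes only the linefeed character as a newline, so it cannot show PCRE's other newline conventions.

```python
import re

# (?#...) comment: it ends at the next ")", so it cannot contain parentheses.
inline = re.compile(r"abc(?# this text is ignored )def")

# re.VERBOSE plays the role of PCRE_EXTENDED. The comment contains a backslash
# followed by "n"; that escape is still literal and does not end the comment.
# Only the real linefeed supplied by the non-raw "\n" literal ends it.
extended = re.compile(r"abc #comment \n still comment" "\n" r"def", re.VERBOSE)

# Inside a character class, "#" is an ordinary character even in verbose mode.
hash_class = re.compile(r"a[#]b", re.VERBOSE)
```

The \n escape inside the extended-mode comment is skipped along with the rest of the comment, so extended is equivalent to abcdef and matches "abcdef" exactly.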
# Line 2600  matching name is found, normal "bumpalon Line 2627  matching name is found, normal "bumpalon
2627  <pre>  <pre>
2628    (*THEN) or (*THEN:NAME)    (*THEN) or (*THEN:NAME)
2629  </pre>  </pre>
2630  This verb causes a skip to the next alternation if the rest of the pattern does  This verb causes a skip to the next alternation in the innermost enclosing
2631  not match. That is, it cancels pending backtracking, but only within the  group if the rest of the pattern does not match. That is, it cancels pending
2632  current alternation. Its name comes from the observation that it can be used  backtracking, but only within the current alternation. Its name comes from the
2633  for a pattern-based if-then-else block:  observation that it can be used for a pattern-based if-then-else block:
2634  <pre>  <pre>
2635    ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...    ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
2636  </pre>  </pre>
# Line 2614  behaviour of (*THEN:NAME) is exactly the Line 2641  behaviour of (*THEN:NAME) is exactly the
2641  overall match fails. If (*THEN) is not directly inside an alternation, it acts  overall match fails. If (*THEN) is not directly inside an alternation, it acts
2642  like (*PRUNE).  like (*PRUNE).
2643  </P>  </P>
2644    <P>
2645    The above verbs provide four different "strengths" of control when subsequent
2646    matching fails. (*THEN) is the weakest, carrying on the match at the next
2647    alternation. (*PRUNE) comes next, failing the match at the current starting
2648    position, but allowing an advance to the next character (for an unanchored
2649    pattern). (*SKIP) is similar, except that the advance may be more than one
2650    character. (*COMMIT) is the strongest, causing the entire match to fail.
2651    </P>
2652    <P>
2653    If more than one such verb is present in a pattern, the "strongest" one wins. For example,
2654    consider this pattern, where A, B, etc. are complex pattern fragments:
2655    <pre>
2656      (A(*COMMIT)B(*THEN)C|D)
2657    </pre>
2658    Once A has matched, PCRE is committed to this match, at the current starting
2659    position. If subsequently B matches, but C does not, the normal (*THEN) action
2660    of trying the next alternation (that is, D) does not happen because (*COMMIT)
2661    overrides.
2662    </P>
2663  <br><a name="SEC26" href="#TOC1">SEE ALSO</a><br>  <br><a name="SEC26" href="#TOC1">SEE ALSO</a><br>
2664  <P>  <P>
2665  <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3),  <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3),
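Python's re has no backtracking control verbs, so the behaviour described in the hunk above is sketched here with a hypothetical toy matcher. It is not PCRE's implementation, and every name in it (lit, seq, alt, search, THEN, PRUNE, SKIP, COMMIT) is invented for this illustration. It handles only literal strings, sequences and alternation.

Each verb matches the empty string and raises a signal if the rest of the pattern then fails. A verb that the signal backtracks through replaces any weaker signal with its own, which is one way to model the "strongest one wins" rule. (*THEN) is caught by the innermost enclosing alternation and, as the text says, acts like (*PRUNE) when it is not inside one.

```python
# Hypothetical toy backtracking matcher (NOT PCRE) modelling what happens
# when matching fails after (*THEN), (*PRUNE), (*SKIP) or (*COMMIT).
# A node is m(s, i, k, group): match s from index i, then call the
# continuation k(j) for the rest of the pattern. `group` identifies the
# innermost enclosing alternation, which is where (*THEN) is caught.

class Then(Exception):
    def __init__(self, group):
        super().__init__()
        self.group = group

class Prune(Exception):
    pass

class Skip(Exception):
    def __init__(self, pos):
        super().__init__()
        self.pos = pos

class Commit(Exception):
    pass

def lit(text):
    def m(s, i, k, group):
        return s.startswith(text, i) and k(i + len(text))
    return m

def seq(*nodes):
    def m(s, i, k, group):
        def run(idx, j):
            if idx == len(nodes):
                return k(j)
            return nodes[idx](s, j, lambda j2: run(idx + 1, j2), group)
        return run(0, i)
    return m

def alt(*branches):
    def m(s, i, k, group):
        token = object()  # identifies this group as the target of (*THEN)
        for branch in branches:
            try:
                if branch(s, i, k, token):
                    return True
            except Then as e:
                if e.group is not token:
                    raise  # belongs to an enclosing group
                if len(branches) == 1:
                    raise Prune() from None  # no alternation: act like (*PRUNE)
        return False
    return m

def _verb(make_signal, weaker):
    # A verb matches the empty string. If the rest of the pattern fails, it
    # raises its own signal, replacing any weaker signal that passes through.
    def m(s, i, k, group):
        try:
            if k(i):
                return True
        except weaker:
            pass
        raise make_signal(i, group)
    return m

THEN = _verb(lambda i, g: Then(g), ())
PRUNE = _verb(lambda i, g: Prune(), (Then,))
SKIP = _verb(lambda i, g: Skip(i), (Then, Prune))
COMMIT = _verb(lambda i, g: Commit(), (Then, Prune, Skip))

def search(pattern, s):
    """Unanchored search: return (start, end) of the first match, or None."""
    start = 0
    while start <= len(s):
        ends = []

        def accept(j):
            ends.append(j)
            return True

        try:
            if pattern(s, start, accept, None):
                return (start, ends[-1])
        except (Then, Prune):
            pass  # fail at this start; advance by one character
        except Skip as e:
            if e.pos > start:
                start = e.pos  # advance to where (*SKIP) was passed
                continue
        except Commit:
            return None  # the entire match fails
        start += 1
    return None
```

Some results from this model:

- With cond = alt(lit("ab"), lit("a")), searching "abc" for alt(seq(cond, lit("bc")), lit("x")) backtracks into cond and returns (0, 3).
- Inserting THEN after cond makes that search return None. (*THEN) moves straight to the next alternative instead of retrying cond.
- alt(seq(lit("a"), lit("b"), THEN, lit("c")), lit("abd")) finds (0, 3) in "abd" via the second alternative.
- Adding COMMIT after lit("a") in that pattern makes the search fail, mirroring the (A(*COMMIT)B(*THEN)C|D) example.
- On "aaab", seq(lit("aa"), PRUNE, lit("b")) returns (1, 4). The SKIP version returns None because it resumes at offset 2.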
# Line 2630  Cambridge CB2 3QH, England. Line 2676  Cambridge CB2 3QH, England.
2676  </P>  </P>
2677  <br><a name="SEC28" href="#TOC1">REVISION</a><br>  <br><a name="SEC28" href="#TOC1">REVISION</a><br>
2678  <P>  <P>
2679  Last updated: 18 May 2010  Last updated: 31 October 2010
2680  <br>  <br>
2681  Copyright &copy; 1997-2010 University of Cambridge.  Copyright &copy; 1997-2010 University of Cambridge.
2682  <br>  <br>

