--- code/trunk/doc/html/pcrepattern.html 2010/11/03 18:32:55 566
+++ code/trunk/doc/html/pcrepattern.html 2010/11/06 17:10:00 567
@@ -99,7 +99,7 @@
discussed in the pcrematching page.
-

+


NEWLINE CONVENTIONS

PCRE supports five different conventions for indicating line breaks in
@@ -234,6 +234,7 @@
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
The \Q...\E sequence is recognized both inside and outside character classes.
+An isolated \E that is not preceded by \Q is ignored.


Non-printing characters
@@ -1936,7 +1937,15 @@
If the condition is satisfied, the yes-pattern is used; otherwise the no-pattern (if present) is used. If there are more than two alternatives in the
-subpattern, a compile-time error occurs.
+subpattern, a compile-time error occurs. Each of the two alternatives may
+itself contain nested subpatterns of any form, including conditional
+subpatterns; the restriction to two alternatives applies only at the level of
+the condition. This pattern fragment is an example where the alternatives are
+complex:
+
+  (?(1) (A|B|C) | (D | (?(2)E|F) | E) )
+
+
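
As a rough illustration of the two-alternative rule described in this hunk, the sketch below uses Python's re module. This is an assumption of convenience: re is a different engine from PCRE, but it accepts the same (?(N)yes|no) syntax. The pattern and the helper names matches() and compiles() are hypothetical and not part of this change.

```python
import re

# Python's re engine stands in for PCRE here; both accept (?(N)yes|no).
# Hypothetical pattern: group 1 is an optional "<", group 2 an optional "!".
# The outer condition has exactly two alternatives, each of which is itself
# a group; the "no" alternative contains a nested conditional on group 2.
NESTED = re.compile(r'(<)?(!)?(?(1)(?:A|B|C)|(?:D|(?(2)E|F)|G))')


def matches(subject):
    """Return True if the whole subject matches NESTED."""
    return NESTED.fullmatch(subject) is not None


def compiles(pattern):
    """Return True if Python's re accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

With this sketch, matches("<A") and matches("!E") hold, while matches("<D") does not, because a set group 1 forces the first alternative. A third alternative at the level of the condition itself, as in (a)?(?(1)b|c|d), is rejected at compile time by Python's engine as well.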

There are four kinds of condition: references to subpatterns, references to
@@ -2071,14 +2080,32 @@


COMMENTS

-The sequence (?# marks the start of a comment that continues up to the next
-closing parenthesis. Nested parentheses are not permitted. The characters
-that make up a comment play no part in the pattern matching at all.
+There are two ways of including comments in patterns that are processed by
+PCRE. In both cases, the start of the comment must not be in a character class,
+nor in the middle of any other sequence of related characters such as (?: or a
+subpattern name or number. The characters that make up a comment play no part
+in the pattern matching.

-If the PCRE_EXTENDED option is set, an unescaped # character outside a
-character class introduces a comment that continues to immediately after the
-next newline in the pattern.
+The sequence (?# marks the start of a comment that continues up to the next
+closing parenthesis. Nested parentheses are not permitted. If the PCRE_EXTENDED
+option is set, an unescaped # character also introduces a comment, which in
+this case continues to immediately after the next newline character or
+character sequence in the pattern. Which characters are interpreted as newlines
+is controlled by the options passed to pcre_compile() or by a special
+sequence at the start of the pattern, as described in the section entitled
+"Newline conventions"
+above. Note that the end of this type of comment is a literal newline sequence in
+the pattern; escape sequences that happen to represent a newline do not count.
+For example, consider this pattern when PCRE_EXTENDED is set, and the default
+newline convention is in force:
+

+  abc #comment \n still comment
+
+On encountering the # character, pcre_compile() skips along, looking for
+a newline in the pattern. The sequence \n is still literal at this stage, so
+it does not terminate the comment. Only an actual character with the code value
+0x0a (the default newline) does so.
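
The distinction between an escape sequence and a literal newline can be sketched outside PCRE. The example below assumes Python's re module with re.VERBOSE as a stand-in for PCRE_EXTENDED; it is a different engine, and Python treats only the character 0x0a as the end of a # comment. The function names are hypothetical.

```python
import re


def escaped_newline_stays_in_comment():
    # Raw string: "\n" is two literal characters (backslash, n), so the
    # comment runs to the end of the pattern and only "abc" is matched.
    pattern = r'abc #comment \n still comment'
    return re.fullmatch(pattern, 'abc', re.VERBOSE) is not None


def literal_newline_ends_comment():
    # Non-raw string: "\n" is an actual 0x0a character in the pattern, so
    # the comment ends there and "def" is part of the pattern again.
    pattern = 'abc #comment \n def'
    return re.fullmatch(pattern, 'abcdef', re.VERBOSE) is not None


def inline_comment_is_ignored():
    # (?#...) comments need no option and end at the next ")".
    return re.fullmatch(r'ab(?#note)c', 'abc') is not None
```

All three functions return True under Python's engine, which mirrors the behaviour this hunk describes for pcre_compile() under the default newline convention.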


RECURSIVE PATTERNS

@@ -2600,10 +2627,10 @@

   (*THEN) or (*THEN:NAME)
 
-This verb causes a skip to the next alternation if the rest of the pattern does
-not match. That is, it cancels pending backtracking, but only within the
-current alternation. Its name comes from the observation that it can be used
-for a pattern-based if-then-else block:
+This verb causes a skip to the next alternation in the innermost enclosing
+group if the rest of the pattern does not match. That is, it cancels pending
+backtracking, but only within the current alternation. Its name comes from the
+observation that it can be used for a pattern-based if-then-else block:
   ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
 
@@ -2614,6 +2641,25 @@
overall match fails. If (*THEN) is not directly inside an alternation, it acts like (*PRUNE).

+

+The above verbs provide four different "strengths" of control when subsequent
+matching fails. (*THEN) is the weakest, carrying on the match at the next
+alternation. (*PRUNE) comes next, failing the match at the current starting
+position, but allowing an advance to the next character (for an unanchored
+pattern). (*SKIP) is similar, except that the advance may be more than one
+character. (*COMMIT) is the strongest, causing the entire match to fail.
+

+

+If more than one is present in a pattern, the "strongest" one wins. For example,
+consider this pattern, where A, B, etc. are complex pattern fragments:
+

+  (A(*COMMIT)B(*THEN)C|D)
+
+Once A has matched, PCRE is committed to this match, at the current starting
+position. If subsequently B matches, but C does not, the normal (*THEN) action
+of trying the next alternation (that is, D) does not happen because (*COMMIT)
+overrides.
+


SEE ALSO

pcreapi(3), pcrecallout(3), pcrematching(3),
@@ -2630,7 +2676,7 @@


REVISION

-Last updated: 18 May 2010
+Last updated: 31 October 2010
Copyright © 1997-2010 University of Cambridge.