--- code/trunk/doc/html/pcreapi.html 2007/08/21 11:46:08 226
+++ code/trunk/doc/html/pcreapi.html 2007/08/21 15:00:15 227
@@ -243,7 +243,7 @@
 points during a matching operation. Details are given in the
 pcrecallout documentation.
-
+
 PCRE supports five different conventions for indicating line breaks in
@@ -262,13 +262,22 @@
 matched.
+At compile time, the newline convention can be specified by the options
+argument of pcre_compile(), or it can be specified by special text at the
+start of the pattern itself; this overrides any other settings. See the
+pcrepattern
+page for details of the special character sequences.
+
+
 In the PCRE documentation the word "newline" is used to mean "the character or
 pair of characters that indicate a line break". The choice of newline
 convention affects the handling of the dot, circumflex, and dollar
 metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
 recognized line ending sequence, the match position advancement for a
-non-anchored pattern. The choice of newline convention does not affect the
-interpretation of the \n or \r escape sequences.
+non-anchored pattern. There is more detail about this in the
+section on pcre_exec() options
+below. The choice of newline convention does not affect the interpretation of
+the \n or \r escape sequences.
@@ -894,6 +903,11 @@
 string, a pointer to the table is returned. Otherwise NULL is returned. The
 fourth argument should point to an unsigned char * variable.
+  PCRE_INFO_HASCRORLF
+
+Return 1 if the pattern contains any explicit matches for CR or LF characters,
+otherwise 0. The fourth argument should point to an int variable.
+
   PCRE_INFO_JCHANGED
 Return 1 if the (?J) option setting is used in the pattern, otherwise 0. The
@@ -1176,7 +1190,7 @@
 called. See the pcreprecompile documentation for a discussion of saving
 compiled patterns for later use.
-
+
+When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set, and a
+match attempt for an unanchored pattern fails when the current position is at a
+CRLF sequence, and the pattern contains no explicit matches for CR or NL
+characters, the match position is advanced by two characters instead of one, in
+other words, to after the CRLF.
+
+
+The above rule is a compromise that makes the most common cases work as
+expected. For example, if the pattern is .+A (and the PCRE_DOTALL option is not
+set), it does not match the string "\r\nA" because, after failing at the
+start, it skips both the CR and the LF before retrying. However, the pattern
+[\r\n]A does match that string, because it contains an explicit CR or LF
+reference, and so advances only by one character after the first failure.
+Note that an explicit CR or LF reference occurs for negated character classes
+such as [^X] because they can match CR or LF characters.
+
+
+Notwithstanding the above, anomalous effects may still occur when CRLF is a
+valid newline sequence and explicit \r or \n escapes appear in the pattern.
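
The advancement rule added in this hunk can be sketched in C as a small standalone function. This is only an illustration of the rule as worded above, not PCRE's own code: next_start_offset() and its parameters are hypothetical names, the subject is assumed to be non-UTF-8 so that one character is one byte, and has_cr_or_lf is assumed to correspond to the property that PCRE_INFO_HASCRORLF reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API. Given the offset at which
   an unanchored match attempt has just failed, return the offset at which
   the next attempt would start under the rule described above.

   crlf_is_newline  nonzero when PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,
                    or PCRE_NEWLINE_ANY is in force
   has_cr_or_lf     nonzero when the pattern contains an explicit match
                    for CR or LF (assumed: what PCRE_INFO_HASCRORLF reports) */
size_t next_start_offset(const char *subject, size_t length, size_t pos,
                         int crlf_is_newline, int has_cr_or_lf)
{
    assert(pos < length);  /* a retry only makes sense inside the subject */

    if (crlf_is_newline && !has_cr_or_lf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;    /* step over the whole CRLF pair */

    return pos + 1;        /* ordinary one-character advance */
}
```

For the subject "\r\nA" with CRLF recognized, a pattern such as .+A (no explicit CR or LF) retries at offset 2, where only "A" remains and the match fails. A pattern such as [\r\n]A retries at offset 1, where "\nA" can match.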
   PCRE_NOTBOL
@@ -1883,7 +1915,7 @@
-Last updated: 09 August 2007
+Last updated: 21 August 2007
Copyright © 1997-2007 University of Cambridge.