--- code/trunk/doc/pcreapi.3 2007/08/20 14:38:34 225
+++ code/trunk/doc/pcreapi.3 2007/08/21 11:46:08 226
@@ -240,8 +240,13 @@
convention affects the handling of the dot, circumflex, and dollar
metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
recognized line ending sequence, the match position advancement for a
-non-anchored pattern. The choice of newline convention does not affect the
-interpretation of the \en or \er escape sequences.
+non-anchored pattern. There is more detail about this in the
+.\" HTML
+.\"
+section on \fBpcre_exec()\fP options
+.\"
+below. The choice of newline convention does not affect the interpretation of
+the \en or \er escape sequences.
.
.
.SH MULTITHREADING
@@ -882,6 +887,11 @@
string, a pointer to the table is returned. Otherwise NULL is returned. The
fourth argument should point to an \fBunsigned char *\fP variable.
.sp
+ PCRE_INFO_HASCRORLF
+.sp
+Return 1 if the pattern contains any explicit matches for CR or LF characters,
+otherwise 0. The fourth argument should point to an \fBint\fP variable.
+.sp
PCRE_INFO_JCHANGED
.sp
Return 1 if the (?J) option setting is used in the pattern, otherwise 0. The
@@ -1169,6 +1179,7 @@
.\"
documentation for a discussion of saving compiled patterns for later use.
.
+.\" HTML
.SS "Option bits for \fBpcre_exec()\fP"
.rs
.sp
@@ -1194,19 +1205,25 @@
\fBpcre_compile()\fP above. During matching, the newline choice affects the
behaviour of the dot, circumflex, and dollar metacharacters. It may also alter
the way the match position is advanced after a match failure for an unanchored
-pattern. When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is
-set, and a match attempt fails when the current position is at a CRLF sequence,
-the match position is advanced by two characters instead of one, in other
-words, to after the CRLF.
-.P
-Anomalous effects can occur when CRLF is a valid newline sequence and explicit
-\er or \en escapes appear in the pattern. For example, the string "\er\enA"
-matches the unanchored pattern \enA but not [X\en]A. This happens because, in
-the first case, PCRE knows that the match must start with \en, and so it skips
-there before trying to match. In the second case, it has no knowledge about the
-starting character, so it starts matching at the beginning of the string, and
-on failing, skips over the CRLF as described above. However, if the pattern is
-studied, the match succeeds, because then PCRE once again knows where to start.
+pattern.
+.P
+When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set, and a
+match attempt for an unanchored pattern fails when the current position is at a
+CRLF sequence, and the pattern contains no explicit matches for CR or LF
+characters, the match position is advanced by two characters instead of one, in
+other words, to after the CRLF.
+.P
+The above rule is a compromise that makes the most common cases work as
+expected. For example, if the pattern is .+A (and the PCRE_DOTALL option is not
+set), it does not match the string "\er\enA" because, after failing at the
+start, it skips both the CR and the LF before retrying. However, the pattern
+[\er\en]A does match that string, because it contains an explicit CR or LF
+reference, and so advances only by one character after the first failure.
+Note that an explicit CR or LF reference occurs for negated character classes
+such as [^X] because they can match CR or LF characters.
+.P
+Notwithstanding the above, anomalous effects may still occur when CRLF is a
+valid newline sequence and explicit \er or \en escapes appear in the pattern.
.sp
PCRE_NOTBOL
.sp
@@ -1895,6 +1912,6 @@
.rs
.sp
.nf
-Last updated: 20 August 2007
+Last updated: 21 August 2007
Copyright (c) 1997-2007 University of Cambridge.
.fi
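
Reviewer note: the start-position rule that this patch documents for pcre_exec() can be sketched in a few lines of C. This is only an illustration of the documented behaviour, not PCRE's implementation. The function name `next_start_offset` and its parameters are hypothetical: `crlf_is_newline` stands for "PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set", and `has_crorlf` stands for the value that PCRE_INFO_HASCRORLF would report for the pattern.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the PCRE API): given that a match
 * attempt for an unanchored pattern has just failed at offset `pos`,
 * return the offset at which the next attempt would start according to
 * the rule described in the documentation.
 *
 * crlf_is_newline: non-zero if CRLF is a recognized newline sequence.
 * has_crorlf:      non-zero if the pattern contains explicit matches for
 *                  CR or LF (assumed to mirror PCRE_INFO_HASCRORLF).
 */
static size_t next_start_offset(const char *subject, size_t length,
                                size_t pos, int crlf_is_newline,
                                int has_crorlf)
{
    assert(pos < length);

    /* Skip the whole CRLF only when the pattern cannot match CR or LF. */
    if (crlf_is_newline && !has_crorlf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;

    /* Otherwise advance by a single character. */
    return pos + 1;
}
```

With the subject "\r\nA", a pattern such as .+A (no explicit CR or LF) would retry at offset 2, whereas [\r\n]A would retry at offset 1, which is why the second pattern can match at the LF.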