Diff of /code/trunk/doc/html/pcreapi.html


revision 226 by ph10, Thu Aug 9 09:52:43 2007 UTC revision 227 by ph10, Tue Aug 21 15:00:15 2007 UTC
# Line 243  by the caller to a "callout" function, w Line 243  by the caller to a "callout" function, w
243  points during a matching operation. Details are given in the  points during a matching operation. Details are given in the
244  <a href="pcrecallout.html"><b>pcrecallout</b></a>  <a href="pcrecallout.html"><b>pcrecallout</b></a>
245  documentation.  documentation.
246  </P>  <a name="newlines"></a></P>
247  <br><a name="SEC3" href="#TOC1">NEWLINES</a><br>  <br><a name="SEC3" href="#TOC1">NEWLINES</a><br>
248  <P>  <P>
249  PCRE supports five different conventions for indicating line breaks in  PCRE supports five different conventions for indicating line breaks in
# Line 262  default can be overridden, either when a Line 262  default can be overridden, either when a
262  matched.  matched.
263  </P>  </P>
264  <P>  <P>
265    At compile time, the newline convention can be specified by the <i>options</i>
266    argument of <b>pcre_compile()</b>, or it can be specified by special text at the
267    start of the pattern itself; this overrides any other settings. See the
268    <a href="pcrepattern.html"><b>pcrepattern</b></a>
269    page for details of the special character sequences.
270    </P>
271    <P>
272  In the PCRE documentation the word "newline" is used to mean "the character or  In the PCRE documentation the word "newline" is used to mean "the character or
273  pair of characters that indicate a line break". The choice of newline  pair of characters that indicate a line break". The choice of newline
274  convention affects the handling of the dot, circumflex, and dollar  convention affects the handling of the dot, circumflex, and dollar
275  metacharacters, the handling of #-comments in /x mode, and, when CRLF is a  metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
276  recognized line ending sequence, the match position advancement for a  recognized line ending sequence, the match position advancement for a
277  non-anchored pattern. The choice of newline convention does not affect the  non-anchored pattern. There is more detail about this in the
278  interpretation of the \n or \r escape sequences.  <a href="#execoptions">section on <b>pcre_exec()</b> options</a>
279    below. The choice of newline convention does not affect the interpretation of
280    the \n or \r escape sequences.
281  </P>  </P>
282  <br><a name="SEC4" href="#TOC1">MULTITHREADING</a><br>  <br><a name="SEC4" href="#TOC1">MULTITHREADING</a><br>
283  <P>  <P>
# Line 894  table indicating a fixed set of bytes fo Line 903  table indicating a fixed set of bytes fo
903  string, a pointer to the table is returned. Otherwise NULL is returned. The  string, a pointer to the table is returned. Otherwise NULL is returned. The
904  fourth argument should point to an <b>unsigned char *</b> variable.  fourth argument should point to an <b>unsigned char *</b> variable.
905  <pre>  <pre>
906      PCRE_INFO_HASCRORLF
907    </pre>
908    Return 1 if the pattern contains any explicit matches for CR or LF characters,
909    otherwise 0. The fourth argument should point to an <b>int</b> variable.
910    <pre>
911    PCRE_INFO_JCHANGED    PCRE_INFO_JCHANGED
912  </pre>  </pre>
913  Return 1 if the (?J) option setting is used in the pattern, otherwise 0. The  Return 1 if the (?J) option setting is used in the pattern, otherwise 0. The
# Line 1176  the external tables might be at a differ Line 1190  the external tables might be at a differ
1190  called. See the  called. See the
1191  <a href="pcreprecompile.html"><b>pcreprecompile</b></a>  <a href="pcreprecompile.html"><b>pcreprecompile</b></a>
1192  documentation for a discussion of saving compiled patterns for later use.  documentation for a discussion of saving compiled patterns for later use.
1193  </P>  <a name="execoptions"></a></P>
1194  <br><b>  <br><b>
1195  Option bits for <b>pcre_exec()</b>  Option bits for <b>pcre_exec()</b>
1196  </b><br>  </b><br>
# Line 1203  the pattern was compiled. For details, s Line 1217  the pattern was compiled. For details, s
1217  <b>pcre_compile()</b> above. During matching, the newline choice affects the  <b>pcre_compile()</b> above. During matching, the newline choice affects the
1218  behaviour of the dot, circumflex, and dollar metacharacters. It may also alter  behaviour of the dot, circumflex, and dollar metacharacters. It may also alter
1219  the way the match position is advanced after a match failure for an unanchored  the way the match position is advanced after a match failure for an unanchored
1220  pattern. When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is  pattern.
1221  set, and a match attempt fails when the current position is at a CRLF sequence,  </P>
1222  the match position is advanced by two characters instead of one, in other  <P>
1223  words, to after the CRLF.  When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set, and a
1224    match attempt for an unanchored pattern fails when the current position is at a
1225    CRLF sequence, and the pattern contains no explicit matches for CR or LF
1226    characters, the match position is advanced by two characters instead of one, in
1227    other words, to after the CRLF.
1228    </P>
1229    <P>
1230    The above rule is a compromise that makes the most common cases work as
1231    expected. For example, if the pattern is .+A (and the PCRE_DOTALL option is not
1232    set), it does not match the string "\r\nA" because, after failing at the
1233    start, it skips both the CR and the LF before retrying. However, the pattern
1234    [\r\n]A does match that string, because it contains an explicit CR or LF
1235    reference, and so advances only by one character after the first failure.
1236    Note that an explicit CR or LF reference occurs for negated character classes
1237    such as [^X] because they can match CR or LF characters.
1238    </P>
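The advancement rule added in this revision can be sketched as a small standalone function. This is only an illustration of the rule as worded above, not PCRE's actual source: `next_start_offset` and its parameters are hypothetical names, `has_cr_or_lf` stands for the property that PCRE_INFO_HASCRORLF reports, and the sketch assumes one character is one byte (no UTF-8 mode).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
 *
 * A match attempt for an unanchored pattern has just failed at offset `pos`.
 * Return the offset at which the next attempt would start.
 *
 *   crlf_is_newline  CRLF is a recognized newline sequence (assumed to
 *                    correspond to PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,
 *                    or PCRE_NEWLINE_ANY being in effect)
 *   has_cr_or_lf     the pattern contains an explicit match for CR or LF
 */
size_t next_start_offset(const char *subject, size_t length, size_t pos,
                         bool crlf_is_newline, bool has_cr_or_lf)
{
    assert(pos < length);   /* there must be a character to step over */

    bool at_crlf = pos + 1 < length &&
                   subject[pos] == '\r' && subject[pos + 1] == '\n';

    if (crlf_is_newline && !has_cr_or_lf && at_crlf)
        return pos + 2;     /* skip the whole CRLF pair */

    return pos + 1;         /* ordinary one-character advance */
}
```

For the subject "\r\nA" with CRLF recognized as a newline, a failure at offset 0 moves the start to offset 2 for a pattern such as .+A (no explicit CR or LF), but only to offset 1 for [\r\n]A, which is why the second pattern can still match.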
1239    <P>
1240    Notwithstanding the above, anomalous effects may still occur when CRLF is a
1241    valid newline sequence and explicit \r or \n escapes appear in the pattern.
1242  <pre>  <pre>
1243    PCRE_NOTBOL    PCRE_NOTBOL
1244  </pre>  </pre>
# Line 1883  Cambridge CB2 3QH, England. Line 1915  Cambridge CB2 3QH, England.
1915  </P>  </P>
1916  <br><a name="SEC22" href="#TOC1">REVISION</a><br>  <br><a name="SEC22" href="#TOC1">REVISION</a><br>
1917  <P>  <P>
1918  Last updated: 09 August 2007  Last updated: 21 August 2007
1919  <br>  <br>
1920  Copyright &copy; 1997-2007 University of Cambridge.  Copyright &copy; 1997-2007 University of Cambridge.
1921  <br>  <br>
