246 |
</P> |
</P> |
247 |
<br><a name="SEC3" href="#TOC1">NEWLINES</a><br> |
<br><a name="SEC3" href="#TOC1">NEWLINES</a><br> |
248 |
<P> |
<P> |
249 |
PCRE supports four different conventions for indicating line breaks in |
PCRE supports five different conventions for indicating line breaks in |
250 |
strings: a single CR (carriage return) character, a single LF (linefeed) |
strings: a single CR (carriage return) character, a single LF (linefeed) |
251 |
character, the two-character sequence CRLF, or any Unicode newline sequence. |
character, the two-character sequence CRLF, any of the three preceding, or any |
252 |
The Unicode newline sequences are the three just mentioned, plus the single |
Unicode newline sequence. The Unicode newline sequences are the three just |
253 |
characters VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, |
mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed, |
254 |
U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). |
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS |
255 |
|
(paragraph separator, U+2029). |
256 |
</P> |
</P> |
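The five conventions listed in the revised paragraph can be illustrated with a small classifier. This is a minimal sketch, not PCRE's own code: the enum and the function name `newline_len` are hypothetical, and it assumes single-byte (non-UTF-8) subjects, so NEL is taken to be the byte 0x85 and LS and PS are not handled.

```c
#include <stddef.h>

/* Hypothetical names for the five conventions; these are not PCRE identifiers. */
enum nl_mode { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Return the length in bytes of the newline sequence that starts at s[0],
   or 0 if no newline starts there. n is the number of bytes available.
   Assumption: single-byte mode, so LS (U+2028) and PS (U+2029) are ignored. */
size_t newline_len(const unsigned char *s, size_t n, enum nl_mode mode)
{
    int crlf;

    if (n == 0)
        return 0;
    crlf = (n >= 2 && s[0] == '\r' && s[1] == '\n');

    switch (mode) {
    case NL_CR:
        return s[0] == '\r' ? 1 : 0;
    case NL_LF:
        return s[0] == '\n' ? 1 : 0;
    case NL_CRLF:
        return crlf ? 2 : 0;
    case NL_ANYCRLF:                 /* any of the three preceding */
        if (crlf)
            return 2;
        return (s[0] == '\r' || s[0] == '\n') ? 1 : 0;
    case NL_ANY:                     /* the three above plus VT, FF, NEL */
        if (crlf)
            return 2;
        switch (s[0]) {
        case '\r': case '\n': case 0x0B: case 0x0C: case 0x85:
            return 1;
        default:
            return 0;
        }
    }
    return 0;
}
```

Note that under the ANYCRLF convention a vertical tab is not a newline, whereas under ANY it is; that is the practical difference the new option introduces.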
257 |
<P> |
<P> |
258 |
Each of the first three conventions is used by at least one operating system as |
Each of the first three conventions is used by at least one operating system as |
318 |
</pre> |
</pre> |
319 |
The output is an integer whose value specifies the default character sequence |
The output is an integer whose value specifies the default character sequence |
320 |
that is recognized as meaning "newline". The four values that are supported |
that is recognized as meaning "newline". The five values that are supported |
321 |
are: 10 for LF, 13 for CR, 3338 for CRLF, and -1 for ANY. The default should |
are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF, and -1 for ANY. The |
322 |
normally be the standard sequence for your operating system. |
default should normally be the standard sequence for your operating system. |
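The integer values listed above can be mapped back to convention names. The following is a minimal sketch: the helper name `newline_config_name` is hypothetical, and it assumes the value has already been obtained (for instance by passing PCRE_CONFIG_NEWLINE and the address of an int to <b>pcre_config()</b>).

```c
/* Map a PCRE_CONFIG_NEWLINE result to a readable name.
   The helper name is hypothetical; the values are those listed in the text. */
const char *newline_config_name(int value)
{
    switch (value) {
    case 10:   return "LF";
    case 13:   return "CR";
    case 3338: return "CRLF";     /* 13 * 256 + 10, i.e. CR followed by LF */
    case -2:   return "ANYCRLF";
    case -1:   return "ANY";
    default:   return "unknown";
    }
}
```

The value 3338 is simply the two character codes packed together, which is why it stands for the two-character CRLF sequence.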
323 |
<pre> |
<pre> |
324 |
PCRE_CONFIG_LINK_SIZE |
PCRE_CONFIG_LINK_SIZE |
325 |
</pre> |
</pre> |
547 |
PCRE_NEWLINE_CR |
PCRE_NEWLINE_CR |
548 |
PCRE_NEWLINE_LF |
PCRE_NEWLINE_LF |
549 |
PCRE_NEWLINE_CRLF |
PCRE_NEWLINE_CRLF |
550 |
|
PCRE_NEWLINE_ANYCRLF |
551 |
PCRE_NEWLINE_ANY |
PCRE_NEWLINE_ANY |
552 |
</pre> |
</pre> |
553 |
These options override the default newline definition that was chosen when PCRE |
These options override the default newline definition that was chosen when PCRE |
554 |
was built. Setting the first or the second specifies that a newline is |
was built. Setting the first or the second specifies that a newline is |
555 |
indicated by a single character (CR or LF, respectively). Setting |
indicated by a single character (CR or LF, respectively). Setting |
556 |
PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character |
PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character |
557 |
CRLF sequence. Setting PCRE_NEWLINE_ANY specifies that any Unicode newline |
CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies that any of the three |
558 |
sequence should be recognized. The Unicode newline sequences are the three just |
preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies |
559 |
mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed, |
that any Unicode newline sequence should be recognized. The Unicode newline |
560 |
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS |
sequences are the three just mentioned, plus the single characters VT (vertical |
561 |
(paragraph separator, U+2029). The last two are recognized only in UTF-8 mode. |
tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line |
562 |
|
separator, U+2028), and PS (paragraph separator, U+2029). The last two are |
563 |
|
recognized only in UTF-8 mode. |
564 |
</P> |
</P> |
565 |
<P> |
<P> |
566 |
The newline setting in the options word uses three bits that are treated |
The newline setting in the options word uses three bits that are treated |
567 |
as a number, giving eight possibilities. Currently only five are used (default |
as a number, giving eight possibilities. Currently only six are used (default |
568 |
plus the four values above). This means that if you set more than one newline |
plus the five values above). This means that if you set more than one newline |
569 |
option, the combination may or may not be sensible. For example, |
option, the combination may or may not be sensible. For example, |
570 |
PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but |
PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but |
571 |
other combinations yield unused numbers and cause an error. |
other combinations may yield unused numbers and cause an error. |
572 |
</P> |
</P> |
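The three-bit field described above can be sketched as follows. The macro values are an assumption based on the pcre.h of this period and should be checked against the installed header; the `NL_OPT_` names and the function `newline_field` are hypothetical, chosen so that the sketch does not depend on pcre.h.

```c
/* Assumed bit values for the newline options (verify against pcre.h). */
#define NL_OPT_CR       0x00100000
#define NL_OPT_LF       0x00200000
#define NL_OPT_CRLF     0x00300000
#define NL_OPT_ANY      0x00400000
#define NL_OPT_ANYCRLF  0x00500000
#define NL_OPT_MASK     0x00700000   /* the three bits treated as a number */

/* Extract the newline field from an options word.
   Returns 0 for "use the default", 1 to 5 for the five defined settings,
   or -1 when the combination yields one of the two unused numbers. */
int newline_field(int options)
{
    int field = (options & NL_OPT_MASK) >> 20;
    return field <= 5 ? field : -1;
}
```

Under these assumed values, combining CR with LF gives the same number as CRLF, while combining LF with ANY gives 6, which is one of the unused numbers.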
573 |
<P> |
<P> |
574 |
The only time that a line break is specially recognized when compiling a |
The only time that a line break is specially recognized when compiling a |
1170 |
PCRE_NEWLINE_CR |
PCRE_NEWLINE_CR |
1171 |
PCRE_NEWLINE_LF |
PCRE_NEWLINE_LF |
1172 |
PCRE_NEWLINE_CRLF |
PCRE_NEWLINE_CRLF |
1173 |
|
PCRE_NEWLINE_ANYCRLF |
1174 |
PCRE_NEWLINE_ANY |
PCRE_NEWLINE_ANY |
1175 |
</pre> |
</pre> |
1176 |
These options override the newline definition that was chosen or defaulted when |
These options override the newline definition that was chosen or defaulted when |
1178 |
<b>pcre_compile()</b> above. During matching, the newline choice affects the |
<b>pcre_compile()</b> above. During matching, the newline choice affects the |
1179 |
behaviour of the dot, circumflex, and dollar metacharacters. It may also alter |
behaviour of the dot, circumflex, and dollar metacharacters. It may also alter |
1180 |
the way the match position is advanced after a match failure for an unanchored |
the way the match position is advanced after a match failure for an unanchored |
1181 |
pattern. When PCRE_NEWLINE_CRLF or PCRE_NEWLINE_ANY is set, and a match attempt |
pattern. When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is |
1182 |
fails when the current position is at a CRLF sequence, the match position is |
set, and a match attempt fails when the current position is at a CRLF sequence, |
1183 |
advanced by two characters instead of one, in other words, to after the CRLF. |
the match position is advanced by two characters instead of one, in other |
1184 |
|
words, to after the CRLF. |
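The advance rule described above can be sketched as a small helper. This is a simplified illustration, not PCRE's matching loop: the function name `next_start` and the flag `crlf_is_newline` are hypothetical, and the flag is assumed to be nonzero when PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is in effect.

```c
#include <stddef.h>

/* Return the offset at which the next match attempt should start after an
   attempt at offset pos has failed (unanchored pattern, single-byte subject).
   crlf_is_newline is a hypothetical flag: nonzero when the newline setting
   is CRLF, ANYCRLF, or ANY. */
size_t next_start(const char *subject, size_t length, size_t pos,
                  int crlf_is_newline)
{
    if (crlf_is_newline && pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;   /* skip the whole CRLF pair */
    return pos + 1;       /* ordinary one-character advance */
}
```

Skipping both characters avoids starting a match attempt between the CR and the LF, that is, in the middle of what the current setting treats as one newline.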
1185 |
<pre> |
<pre> |
1186 |
PCRE_NOTBOL |
PCRE_NOTBOL |
1187 |
</pre> |
</pre> |
1857 |
</P> |
</P> |
1858 |
<br><a name="SEC22" href="#TOC1">REVISION</a><br> |
<br><a name="SEC22" href="#TOC1">REVISION</a><br> |
1859 |
<P> |
<P> |
1860 |
Last updated: 06 March 2007 |
Last updated: 16 April 2007 |
1861 |
<br> |
<br> |
1862 |
Copyright © 1997-2007 University of Cambridge. |
Copyright © 1997-2007 University of Cambridge. |
1863 |
<br> |
<br> |