.SH NEWLINES
.rs
.sp
PCRE supports five different conventions for indicating line breaks in
strings: a single CR (carriage return) character, a single LF (linefeed)
character, the two-character sequence CRLF, any of the three preceding, or any
Unicode newline sequence. The Unicode newline sequences are CR, LF, and CRLF,
plus the single characters VT (vertical tab, U+000B), FF (formfeed, U+000C),
NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph
separator, U+2029).
.P
Each of the first three conventions is used by at least one operating system as
its standard newline sequence. When PCRE is built, a default can be specified.
.sp
The output is an integer whose value specifies the default character sequence
that is recognized as meaning "newline". The five values that are supported
are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF, and -1 for ANY. The
default should normally be the standard sequence for your operating system.
.sp
PCRE_CONFIG_LINK_SIZE
.sp
PCRE_NEWLINE_CR
PCRE_NEWLINE_LF
PCRE_NEWLINE_CRLF
PCRE_NEWLINE_ANYCRLF
PCRE_NEWLINE_ANY
.sp
These options override the default newline definition that was chosen when PCRE
was built. Setting the first or the second specifies that a newline is
indicated by a single character (CR or LF, respectively). Setting
PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character
CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies that any of the three
preceding sequences (CR, LF, or CRLF) should be recognized. Setting
PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be
recognized. The Unicode newline sequences are CR, LF, and CRLF, plus the single
characters VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line,
U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). The
last two are recognized only in UTF-8 mode.
.P
The newline setting in the options word uses three bits that are treated
as a number, giving eight possibilities. Currently only six are used (default
plus the five values above). This means that if you set more than one newline
option, the combination may or may not be sensible. For example,
PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but
other combinations may yield unused numbers and cause an error.
.P
The only time that a line break is specially recognized when compiling a
pattern is if PCRE_EXTENDED is set, and an unescaped # outside a character
PCRE_NEWLINE_CR
PCRE_NEWLINE_LF
PCRE_NEWLINE_CRLF
PCRE_NEWLINE_ANYCRLF
PCRE_NEWLINE_ANY
.sp
These options override the newline definition that was chosen or defaulted when
the pattern was compiled. For details, see the description of
\fBpcre_compile()\fP above. During matching, the newline choice affects the
behaviour of the dot, circumflex, and dollar metacharacters. It may also alter
the way the match position is advanced after a match failure for an unanchored
pattern. When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is
set and a match attempt fails at a position where a CRLF sequence starts, the
match position is advanced by two characters instead of one, that is, to just
after the CRLF.
.sp
PCRE_NOTBOL
.sp
.rs
.sp
.nf
Last updated: 16 April 2007
Copyright (c) 1997-2007 University of Cambridge.
.fi