.SH NEWLINES
.rs
.sp
PCRE supports five different conventions for indicating line breaks in
strings: a single CR (carriage return) character, a single LF (linefeed)
character, the two-character sequence CRLF, any of the three preceding, or any
Unicode newline sequence. The Unicode newline sequences are the three just
mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed,
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
(paragraph separator, U+2029).
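.P
As an illustration only, the five conventions can be written as a small C
function that reports how many bytes of newline start at a given offset. The
names \fBnewline_length()\fP and \fBenum newline_kind\fP are hypothetical and
are not part of the PCRE API. The sketch assumes 8-bit (non-UTF-8) data, so LS
and PS, which need more than one byte, are not handled.
.sp
.nf
```c
#include <stddef.h>

/* Hypothetical names for the five conventions; not the PCRE option macros. */
enum newline_kind { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Byte values written as hex: LF, VT, FF, CR, and NEL (U+0085 as one byte). */
enum { CH_LF = 0x0a, CH_VT = 0x0b, CH_FF = 0x0c, CH_CR = 0x0d, CH_NEL = 0x85 };

/* Return the length of the newline sequence starting at s[pos], or 0 if
   there is none under the given convention. Assumes 8-bit data. */
size_t newline_length(const unsigned char *s, size_t len, size_t pos,
                      enum newline_kind kind)
{
    if (pos >= len) return 0;
    unsigned char c = s[pos];
    /* True when a CR at pos is immediately followed by an LF. */
    int crlf = (c == CH_CR && pos + 1 < len && s[pos + 1] == CH_LF);

    switch (kind) {
    case NL_CR:   return c == CH_CR ? 1 : 0;
    case NL_LF:   return c == CH_LF ? 1 : 0;
    case NL_CRLF: return crlf ? 2 : 0;
    case NL_ANYCRLF:
        if (crlf) return 2;
        return (c == CH_CR || c == CH_LF) ? 1 : 0;
    case NL_ANY:
        if (crlf) return 2;
        return (c == CH_CR || c == CH_LF || c == CH_VT ||
                c == CH_FF || c == CH_NEL) ? 1 : 0;
    }
    return 0;
}
```
.fi
.P
For the two bytes CR LF at offset 0, the sketch returns 1 under the CR
convention, 0 under LF, and 2 under CRLF, ANYCRLF, and ANY.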
.P
Each of the first three conventions is used by at least one operating system as
its standard newline sequence. When PCRE is built, a default can be specified.
.\" HREF
\fBpcreprecompile\fP
.\"
documentation. However, compiling a regular expression with one version of PCRE
for use with a different version is not guaranteed to work and may cause
crashes.
.
.
.SH "CHECKING BUILD-TIME OPTIONS"
.sp
The output is an integer whose value specifies the default character sequence
that is recognized as meaning "newline". The five values that are supported
are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF, and -1 for ANY. The
default should normally be the standard sequence for your operating system.
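.P
A program that wants the default as text could map the integer with a
function such as the hypothetical \fBnewline_name()\fP sketched here. The value
3338 is CR and LF packed into one integer: (13 << 8) | 10 = 3328 + 10. In a real
program, the argument would typically come from
\fBpcre_config(PCRE_CONFIG_NEWLINE, &value)\fP. The sketch itself does not call
PCRE.
.sp
.nf
```c
/* Hypothetical helper: map the default-newline integer to a name.
   Only the numeric values are taken from the text. */
const char *newline_name(int value)
{
    switch (value) {
    case 10:   return "LF";
    case 13:   return "CR";
    case 3338: return "CRLF";     /* (13 << 8) | 10 */
    case -2:   return "ANYCRLF";
    case -1:   return "ANY";
    default:   return "unknown";
    }
}
```
.fi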
.sp
PCRE_CONFIG_LINK_SIZE
.sp
PCRE_NEWLINE_CR
PCRE_NEWLINE_LF
PCRE_NEWLINE_CRLF
PCRE_NEWLINE_ANYCRLF
PCRE_NEWLINE_ANY
.sp
These options override the default newline definition that was chosen when PCRE
was built. Setting the first or the second specifies that a newline is
indicated by a single character (CR or LF, respectively). Setting
PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character
CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies that any of the three
preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies
that any Unicode newline sequence should be recognized. The Unicode newline
sequences are the three just mentioned, plus the single characters VT (vertical
tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
separator, U+2028), and PS (paragraph separator, U+2029). The last two are
recognized only in UTF-8 mode.
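.P
Outside UTF-8 mode, each byte is one character, so code points above 255 such
as LS (U+2028) and PS (U+2029) cannot occur. In UTF-8, they are the three-byte
sequences E2 80 A8 and E2 80 A9, and NEL is the two bytes C2 85. The following
minimal sketch recognizes the PCRE_NEWLINE_ANY set in UTF-8 data. The function
name is hypothetical, and this is not PCRE's own implementation.
.sp
.nf
```c
#include <stddef.h>

/* Hypothetical helper: length in bytes of a Unicode newline sequence
   starting at s[pos] in UTF-8 data, or 0 if there is none. */
size_t utf8_any_newline_length(const unsigned char *s, size_t len, size_t pos)
{
    if (pos >= len) return 0;
    unsigned char c = s[pos];

    if (c == 0x0d)                                  /* CR, possibly CRLF */
        return (pos + 1 < len && s[pos + 1] == 0x0a) ? 2 : 1;
    if (c == 0x0a || c == 0x0b || c == 0x0c)        /* LF, VT, FF */
        return 1;
    if (c == 0xc2 && pos + 1 < len && s[pos + 1] == 0x85)    /* NEL */
        return 2;
    if (c == 0xe2 && pos + 2 < len && s[pos + 1] == 0x80 &&
        (s[pos + 2] == 0xa8 || s[pos + 2] == 0xa9))           /* LS, PS */
        return 3;
    return 0;
}
```
.fi
.P
A lone byte 0x85 is not NEL in UTF-8 data, so the sketch returns 0 for it.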
.P
The newline setting in the options word uses three bits that are treated
as a number, giving eight possibilities. Currently only six are used (default
plus the five values above). This means that if you set more than one newline
option, the combination may or may not be sensible. For example,
PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but
other combinations may yield unused numbers and cause an error.
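.P
The three-bit field can be modelled as follows. The macro values are
illustrative: a 3-bit number starting at bit 20, chosen so that CR (1) combined
with LF (2) gives CRLF (3). They are assumed to mirror the PCRE_NEWLINE_* values
in \fBpcre.h\fP, which should be checked before relying on them. The macro and
function names are hypothetical.
.sp
.nf
```c
#include <stdint.h>

/* Illustrative values (assumed to mirror pcre.h): a 3-bit field at bit 20. */
#define NL_SHIFT    20
#define NL_MASK     (UINT32_C(7) << NL_SHIFT)
#define OPT_CR      (UINT32_C(1) << NL_SHIFT)
#define OPT_LF      (UINT32_C(2) << NL_SHIFT)
#define OPT_CRLF    (UINT32_C(3) << NL_SHIFT)
#define OPT_ANY     (UINT32_C(4) << NL_SHIFT)
#define OPT_ANYCRLF (UINT32_C(5) << NL_SHIFT)

/* Return 1 if the newline bits hold one of the six used numbers
   (0 = default, 1..5 = the five options), or 0 for an unused number. */
int newline_bits_valid(uint32_t options)
{
    uint32_t field = (options & NL_MASK) >> NL_SHIFT;
    return field <= 5;
}
```
.fi
.P
With these values, OPT_CR combined with OPT_LF equals OPT_CRLF. OPT_LF combined
with OPT_ANY gives 6, which is one of the two unused numbers.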
.P
The only time that a line break is specially recognized when compiling a
pattern is if PCRE_EXTENDED is set, and an unescaped # outside a character
PCRE_NEWLINE_CR
PCRE_NEWLINE_LF
PCRE_NEWLINE_CRLF
PCRE_NEWLINE_ANYCRLF
PCRE_NEWLINE_ANY
.sp
These options override the newline definition that was chosen or defaulted when
\fBpcre_compile()\fP above. During matching, the newline choice affects the
behaviour of the dot, circumflex, and dollar metacharacters. It may also alter
the way the match position is advanced after a match failure for an unanchored
pattern. When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is
set, and a match attempt fails when the current position is at a CRLF sequence,
the match position is advanced by two characters instead of one, in other
words, to after the CRLF.
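.P
The advance rule can be sketched as follows. This is a simplified model, not
PCRE's matching code. The hypothetical \fBnext_start()\fP returns the offset of
the next match attempt, and its \fIcrlf_is_newline\fP flag stands for any of
the three settings named above.
.sp
.nf
```c
#include <stddef.h>

/* Simplified model of the bump-along rule: after a failed attempt at
   start, move on by one, but skip both bytes of a CRLF when CRLF is a
   recognized newline (the CRLF, ANYCRLF, or ANY settings). */
size_t next_start(const unsigned char *s, size_t len, size_t start,
                  int crlf_is_newline)
{
    if (crlf_is_newline && start + 1 < len &&
        s[start] == 0x0d && s[start + 1] == 0x0a)
        return start + 2;
    return start + 1;
}
```
.fi
.P
For the input "a", CR, LF, "b", a failed attempt at offset 1 moves to offset 3
when the flag is set, and to offset 2 when it is not.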
.sp
PCRE_NOTBOL
.sp
.rs
.sp
.nf
Last updated: 24 April 2007
Copyright (c) 1997-2007 University of Cambridge.
.fi