105 |
changes the convention to CR. That pattern matches "a\nb" because LF is no |
changes the convention to CR. That pattern matches "a\nb" because LF is no |
106 |
longer a newline. Note that these special settings, which are not |
longer a newline. Note that these special settings, which are not |
107 |
Perl-compatible, are recognized only at the very start of a pattern, and that |
Perl-compatible, are recognized only at the very start of a pattern, and that |
108 |
they must be in upper case. |
they must be in upper case. If more than one of them is present, the last one |
109 |
|
is used. |
110 |
|
</P> |
111 |
|
<P> |
112 |
|
The newline convention does not affect what the \R escape sequence matches. By |
113 |
|
default, this is any Unicode newline sequence, for Perl compatibility. However, |
114 |
|
this can be changed; see the description of \R in the section entitled |
115 |
|
<a href="#newlineseq">"Newline sequences"</a> |
116 |
|
below. |
117 |
</P> |
</P> |
118 |
<br><a name="SEC3" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br> |
<br><a name="SEC3" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br> |
119 |
<P> |
<P> |
399 |
or "french" in Windows, some character codes greater than 128 are used for |
or "french" in Windows, some character codes greater than 128 are used for |
400 |
accented letters, and these are matched by \w. The use of locales with Unicode |
accented letters, and these are matched by \w. The use of locales with Unicode |
401 |
is discouraged. |
is discouraged. |
402 |
</P> |
<a name="newlineseq"></a></P> |
403 |
<br><b> |
<br><b> |
404 |
Newline sequences |
Newline sequences |
405 |
</b><br> |
</b><br> |
406 |
<P> |
<P> |
407 |
Outside a character class, the escape sequence \R matches any Unicode newline |
Outside a character class, by default, the escape sequence \R matches any |
408 |
sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is equivalent to |
Unicode newline sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is |
409 |
the following: |
equivalent to the following: |
410 |
<pre> |
<pre> |
411 |
(?>\r\n|\n|\x0b|\f|\r|\x85) |
(?>\r\n|\n|\x0b|\f|\r|\x85) |
412 |
</pre> |
</pre> |
425 |
recognized. |
recognized. |
426 |
</P> |
</P> |
427 |
<P> |
<P> |
428 |
|
It is possible to restrict \R to match only CR, LF, or CRLF (instead of the |
429 |
|
complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF |
430 |
|
either at compile time or when the pattern is matched. This can be made the |
431 |
|
default when PCRE is built; if this is the case, the other behaviour can be |
432 |
|
requested via the PCRE_BSR_UNICODE option. It is also possible to specify these |
433 |
|
settings by starting a pattern string with one of the following sequences: |
434 |
|
<pre> |
435 |
|
(*BSR_ANYCRLF) CR, LF, or CRLF only |
436 |
|
(*BSR_UNICODE) any Unicode newline sequence |
437 |
|
</pre> |
438 |
|
These override the default and the options given to <b>pcre_compile()</b>, but |
439 |
|
they can be overridden by options given to <b>pcre_exec()</b>. Note that these |
440 |
|
special settings, which are not Perl-compatible, are recognized only at the |
441 |
|
very start of a pattern, and that they must be in upper case. If more than one |
442 |
|
of them is present, the last one is used. |
443 |
|
</P> |
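<P>
The difference between the two \R settings described above can be sketched outside PCRE. The following is a rough Python emulation, not a call into the PCRE API: the names BSR_UNICODE, BSR_ANYCRLF, and split_lines are hypothetical helpers introduced only for this illustration, and the patterns use a plain non-capturing group instead of the atomic group (?>...) shown earlier, on the assumption that this makes no difference when the pattern is used only as a separator.
</P>

```python
import re

# Python emulation only: these are NOT PCRE calls. The two patterns mirror
# what the text says \R matches in non-UTF-8 mode under each setting.
BSR_UNICODE = r"(?:\r\n|\n|\x0b|\f|\r|\x85)"  # default: any Unicode newline sequence
BSR_ANYCRLF = r"(?:\r\n|\n|\r)"               # restricted: CR, LF, or CRLF only


def split_lines(text, bsr):
    """Split text wherever the emulated \\R pattern matches."""
    return re.split(bsr, text)


# A sample mixing CRLF, VT (\x0b), LF, and NEL (\x85) as separators.
sample = "one\r\ntwo\x0bthree\nfour\x85five"

unicode_parts = split_lines(sample, BSR_UNICODE)
anycrlf_parts = split_lines(sample, BSR_ANYCRLF)

print(unicode_parts)
print(anycrlf_parts)
```

<P>
Under the emulated default, all four separators end a line. Under the restricted setting, only the CRLF and the LF do, so the VT and NEL characters stay inside their fields. In both patterns the CRLF alternative is listed first so that a CR followed by LF is consumed as a single sequence.
</P>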
444 |
|
<P> |
445 |
Inside a character class, \R matches the letter "R". |
Inside a character class, \R matches the letter "R". |
446 |
<a name="uniextseq"></a></P> |
<a name="uniextseq"></a></P> |
447 |
<br><b> |
<br><b> |
2184 |
</P> |
</P> |
2185 |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
<br><a name="SEC27" href="#TOC1">REVISION</a><br> |
2186 |
<P> |
<P> |
2187 |
Last updated: 21 August 2007 |
Last updated: 11 September 2007 |
2188 |
<br> |
<br> |
2189 |
Copyright © 1997-2007 University of Cambridge. |
Copyright © 1997-2007 University of Cambridge. |
2190 |
<br> |
<br> |