241 |
\et tab (hex 09) |
\et tab (hex 09) |
242 |
\eddd character with octal code ddd, or back reference |
\eddd character with octal code ddd, or back reference |
243 |
\exhh character with hex code hh |
\exhh character with hex code hh |
244 |
\ex{hhh..} character with hex code hhh.. |
\ex{hhh..} character with hex code hhh.. (non-JavaScript mode) |
245 |
|
\euhhhh character with hex code hhhh (JavaScript mode only) |
246 |
.sp |
.sp |
247 |
The precise effect of \ecx is as follows: if x is a lower case letter, it |
The precise effect of \ecx is as follows: if x is a lower case letter, it |
248 |
is converted to upper case. Then bit 6 of the character (hex 40) is inverted. |
is converted to upper case. Then bit 6 of the character (hex 40) is inverted. |
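The \ecx rule above can be sketched as plain arithmetic. This is an illustrative model only, not PCRE's code: the function name `control_escape_value` is hypothetical, and the sketch assumes ASCII input (it does not model the EBCDIC variant, which flips the 0xc0 bits).

```python
def control_escape_value(x: str) -> int:
    """Return the character code that \\cx would produce for an ASCII character x.

    Hypothetical helper for illustration; assumes a single ASCII character.
    """
    if len(x) != 1 or ord(x) > 127:
        raise ValueError("expected a single ASCII character")
    code = ord(x)
    # Step 1: a lower case letter is converted to upper case.
    if "a" <= x <= "z":
        code = ord(x.upper())
    # Step 2: bit 6 of the character (hex 40) is inverted.
    return code ^ 0x40


print(hex(control_escape_value("z")))  # z is hex 7A, upper case Z is hex 5A
print(hex(control_escape_value("{")))  # { is hex 7B
print(hex(control_escape_value(";")))  # ; is hex 3B
```

Because only letters are upper-cased, a non-letter such as "{" simply has bit 6 flipped, so the result is not always a control character.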
253 |
values are valid. A lower case letter is converted to upper case, and then the |
values are valid. A lower case letter is converted to upper case, and then the |
254 |
0xc0 bits are flipped.) |
0xc0 bits are flipped.) |
255 |
.P |
.P |
256 |
After \ex, from zero to two hexadecimal digits are read (letters can be in |
By default, after \ex, from zero to two hexadecimal digits are read (letters |
257 |
upper or lower case). Any number of hexadecimal digits may appear between \ex{ |
can be in upper or lower case). Any number of hexadecimal digits may appear |
258 |
and }, but the value of the character code must be less than 256 in non-UTF-8 |
between \ex{ and }, but the value of the character code must be less than 256 |
259 |
mode, and less than 2**31 in UTF-8 mode. That is, the maximum value in |
in non-UTF-8 mode, and less than 2**31 in UTF-8 mode. That is, the maximum |
260 |
hexadecimal is 7FFFFFFF. Note that this is bigger than the largest Unicode code |
value in hexadecimal is 7FFFFFFF. Note that this is bigger than the largest |
261 |
point, which is 10FFFF. |
Unicode code point, which is 10FFFF. |
262 |
.P |
.P |
263 |
If characters other than hexadecimal digits appear between \ex{ and }, or if |
If characters other than hexadecimal digits appear between \ex{ and }, or if |
264 |
there is no terminating }, this form of escape is not recognized. Instead, the |
there is no terminating }, this form of escape is not recognized. Instead, the |
265 |
initial \ex will be interpreted as a basic hexadecimal escape, with no |
initial \ex will be interpreted as a basic hexadecimal escape, with no |
266 |
following digits, giving a character whose value is zero. |
following digits, giving a character whose value is zero. |
267 |
.P |
.P |
268 |
|
If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \ex is |
269 |
|
as just described only when it is followed by two hexadecimal digits. |
270 |
|
Otherwise, it matches a literal "x" character. In JavaScript mode, support for |
271 |
|
code points greater than 255 is provided by \eu, which must be followed by |
272 |
|
four hexadecimal digits; otherwise it matches a literal "u" character. |
273 |
|
.P |
274 |
Characters whose value is less than 256 can be defined by either of the two |
Characters whose value is less than 256 can be defined by either of the two |
275 |
syntaxes for \ex. There is no difference in the way they are handled. For |
syntaxes for \ex (or by \eu in JavaScript mode). There is no difference in the |
276 |
example, \exdc is exactly the same as \ex{dc}. |
way they are handled. For example, \exdc is exactly the same as \ex{dc} (or |
277 |
|
\eu00dc in JavaScript mode). |
278 |
.P |
.P |
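The reading rules for \ex and \eu described above can be modelled with a small sketch. It is not PCRE's implementation: the function name `read_hex_escape` and its `javascript_compat` flag (standing in for PCRE_JAVASCRIPT_COMPAT) are hypothetical. It also assumes that an empty \ex{} is not recognized, which the text does not state, and it does not check the 256 or 2**31 limits on the value.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def read_hex_escape(text: str, javascript_compat: bool = False):
    """Model how the text after a backslash is read when it starts with x or u.

    Returns (character_code, chars_consumed); the count includes the x or u.
    Hypothetical helper for illustration only.
    """
    kind, rest = text[0], text[1:]
    if kind not in ("x", "u"):
        raise ValueError("text must start with x or u")

    if javascript_compat:
        # \xhh needs exactly two hex digits, \uhhhh exactly four.
        need = 2 if kind == "x" else 4
        digits = rest[:need]
        if len(digits) == need and all(c in HEX_DIGITS for c in digits):
            return int(digits, 16), 1 + need
        # Otherwise the escape matches a literal "x" or "u".
        return ord(kind), 1

    if kind == "u":
        raise ValueError("\\u is not supported outside JavaScript mode")

    # \x{hhh..}: recognized only if hex digits are followed by a closing brace.
    if rest.startswith("{"):
        end = rest.find("}")
        body = rest[1:end] if end != -1 else ""
        # Assumption: an empty body is treated as not recognized.
        if body and all(c in HEX_DIGITS for c in body):
            return int(body, 16), 1 + end + 1

    # Basic form: from zero to two hex digits; zero digits gives value zero.
    n = 0
    while n < 2 and n < len(rest) and rest[n] in HEX_DIGITS:
        n += 1
    return (int(rest[:n], 16) if n else 0), 1 + n


print(read_hex_escape("xdc"))      # basic two-digit form
print(read_hex_escape("x{dc}"))    # braced form, same character code
print(read_hex_escape("x{dz}"))    # non-hex digit: falls back to \x with no digits
print(read_hex_escape("u00dc", javascript_compat=True))
```

In this model, \exdc, \ex{dc}, and (in JavaScript mode) \eu00dc all yield the same character code, which matches the statement that there is no difference in how they are handled.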
279 |
After \e0 up to two further octal digits are read. If there are fewer than two |
After \e0 up to two further octal digits are read. If there are fewer than two |
280 |
digits, just those that are present are used. Thus the sequence \e0\ex\e07 |
digits, just those that are present are used. Thus the sequence \e0\ex\e07 |
336 |
set. Outside a character class, these sequences have different meanings. |
set. Outside a character class, these sequences have different meanings. |
337 |
. |
. |
338 |
. |
. |
339 |
|
.SS "Unsupported escape sequences" |
340 |
|
.rs |
341 |
|
.sp |
342 |
|
In Perl, the sequences \el, \eL, \eu, and \eU are recognized by its string |
343 |
|
handler and used to modify the case of following characters. By default, PCRE |
344 |
|
does not support these escape sequences. However, if the PCRE_JAVASCRIPT_COMPAT |
345 |
|
option is set, \eU matches a "U" character, and \eu can be used to define a |
346 |
|
character by code point, as described in the previous section. |
347 |
|
. |
348 |
|
. |
349 |
.SS "Absolute and relative back references" |
.SS "Absolute and relative back references" |
350 |
.rs |
.rs |
351 |
.sp |
.sp |
405 |
.\" </a> |
.\" </a> |
406 |
the "." metacharacter |
the "." metacharacter |
407 |
.\" |
.\" |
408 |
when PCRE_DOTALL is not set. |
when PCRE_DOTALL is not set. Perl also uses \eN to match characters by name; |
409 |
|
PCRE does not support this. |
410 |
.P |
.P |
411 |
Each pair of lower and upper case escape sequences partitions the complete set |
Each pair of lower and upper case escape sequences partitions the complete set |
412 |
of characters into two disjoint sets. Any given character matches one, and only |
of characters into two disjoint sets. Any given character matches one, and only |
983 |
.P |
.P |
984 |
The escape sequence \eN behaves like a dot, except that it is not affected by |
The escape sequence \eN behaves like a dot, except that it is not affected by |
985 |
the PCRE_DOTALL option. In other words, it matches any character except one |
the PCRE_DOTALL option. In other words, it matches any character except one |
986 |
that signifies the end of a line. |
that signifies the end of a line. Perl also uses \eN to match characters by |
987 |
|
name; PCRE does not support this. |
988 |
. |
. |
989 |
. |
. |
990 |
.SH "MATCHING A SINGLE BYTE" |
.SH "MATCHING A SINGLE BYTE" |
2874 |
.rs |
.rs |
2875 |
.sp |
.sp |
2876 |
.nf |
.nf |
2877 |
Last updated: 19 October 2011 |
Last updated: 14 November 2011 |
2878 |
Copyright (c) 1997-2011 University of Cambridge. |
Copyright (c) 1997-2011 University of Cambridge. |
2879 |
.fi |
.fi |