505 |
\eX an extended Unicode sequence |
\eX an extended Unicode sequence |
506 |
.sp |
.sp |
507 |
The property names represented by \fIxx\fP above are limited to the Unicode |
The property names represented by \fIxx\fP above are limited to the Unicode |
508 |
script names, the general category properties, and "Any", which matches any |
script names, the general category properties, "Any", which matches any |
509 |
character (including newline). Other properties such as "InMusicalSymbols" are |
character (including newline), and some special PCRE properties (described |
510 |
not currently supported by PCRE. Note that \eP{Any} does not match any |
in the |
511 |
characters, so always causes a match failure. |
.\" HTML <a href="#extraprops"> |
512 |
|
.\" </a> |
513 |
|
next section). |
514 |
|
.\" |
515 |
|
Other Perl properties such as "InMusicalSymbols" are not currently supported by |
516 |
|
PCRE. Note that \eP{Any} does not match any characters, so always causes a |
517 |
|
match failure. |
518 |
.P |
.P |
519 |
Sets of Unicode characters are defined as belonging to certain scripts. A |
Sets of Unicode characters are defined as belonging to certain scripts. A |
520 |
character from one of these sets can be matched using a script name. For |
character from one of these sets can be matched using a script name. For |
619 |
Vai, |
Vai, |
620 |
Yi. |
Yi. |
621 |
.P |
.P |
622 |
Each character has exactly one general category property, specified by a |
Each character has exactly one Unicode general category property, specified by |
623 |
two-letter abbreviation. For compatibility with Perl, negation can be specified |
a two-letter abbreviation. For compatibility with Perl, negation can be |
624 |
by including a circumflex between the opening brace and the property name. For |
specified by including a circumflex between the opening brace and the property |
625 |
example, \ep{^Lu} is the same as \eP{Lu}. |
name. For example, \ep{^Lu} is the same as \eP{Lu}. |
626 |
.P |
.P |
627 |
If only one letter is specified with \ep or \eP, it includes all the general |
If only one letter is specified with \ep or \eP, it includes all the general |
628 |
category properties that start with that letter. In this case, in the absence |
category properties that start with that letter. In this case, in the absence |
724 |
properties in PCRE. |
properties in PCRE. |
725 |
. |
. |
726 |
. |
. |
727 |
|
.\" HTML <a name="extraprops"></a> |
728 |
|
.SS PCRE's additional properties |
729 |
|
.rs |
730 |
|
.sp |
731 |
|
As well as the standard Unicode properties described in the previous |
732 |
|
section, PCRE supports four more that make it possible to convert traditional |
733 |
|
escape sequences such as \ew and \es and POSIX character classes to use Unicode |
734 |
|
properties. These are: |
735 |
|
.sp |
736 |
|
Xan Any alphanumeric character |
737 |
|
Xps Any POSIX space character |
738 |
|
Xsp Any Perl space character |
739 |
|
Xwd Any Perl "word" character |
740 |
|
.sp |
741 |
|
Xan matches characters that have either the L (letter) or the N (number) |
742 |
|
property. Xps matches the characters tab, linefeed, vertical tab, formfeed, or |
743 |
|
carriage return, and any other character that has the Z (separator) property. |
744 |
|
Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the |
745 |
|
same characters as Xan, plus underscore. |
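The definitions of the four properties above can be restated as a minimal sketch. It is written in Python and uses only the standard unicodedata module. It does not call PCRE, and the function names (is_xan, is_xps, is_xsp, is_xwd) are hypothetical names chosen for illustration. Python's Unicode tables may be a different version from the ones PCRE was built with, so results for individual characters can differ.

```python
import unicodedata

# Control characters that Xps accepts in addition to the Z (separator)
# categories: tab, linefeed, vertical tab, formfeed, carriage return.
POSIX_SPACE_CONTROLS = {"\t", "\n", "\x0b", "\x0c", "\r"}


def major_category(ch):
    """Return the first letter of the general category, e.g. 'L' for 'Lu'."""
    return unicodedata.category(ch)[0]


def is_xan(ch):
    """Xan: any character with the L (letter) or N (number) property."""
    return major_category(ch) in ("L", "N")


def is_xps(ch):
    """Xps: the five listed control characters, or any Z (separator)."""
    return ch in POSIX_SPACE_CONTROLS or major_category(ch) == "Z"


def is_xsp(ch):
    """Xsp: the same as Xps, except that vertical tab is excluded."""
    return ch != "\x0b" and is_xps(ch)


def is_xwd(ch):
    """Xwd: the same characters as Xan, plus underscore."""
    return ch == "_" or is_xan(ch)
```

For example, under these definitions is_xps("\x0b") is true while is_xsp("\x0b") is false, and is_xwd("_") is true while is_xan("_") is false.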
746 |
|
. |
747 |
|
. |
748 |
.\" HTML <a name="resetmatchstart"></a> |
.\" HTML <a name="resetmatchstart"></a> |
749 |
.SS "Resetting the match start" |
.SS "Resetting the match start" |
750 |
.rs |
.rs |
2624 |
.rs |
.rs |
2625 |
.sp |
.sp |
2626 |
.nf |
.nf |
2627 |
Last updated: 03 May 2010 |
Last updated: 05 May 2010 |
2628 |
Copyright (c) 1997-2010 University of Cambridge. |
Copyright (c) 1997-2010 University of Cambridge. |
2629 |
.fi |
.fi |