Another use of backslash is for specifying generic character types. The
following are always recognized:
.sp
  \ed     any decimal digit
  \eD     any character that is not a decimal digit
  \eh     any horizontal whitespace character
  \eH     any character that is not a horizontal whitespace character
  \es     any whitespace character
  \eS     any character that is not a whitespace character
  \ev     any vertical whitespace character
  \eV     any character that is not a vertical whitespace character
  \ew     any "word" character
  \eW     any "non-word" character
.sp
.P
In UTF-8 mode, characters with values greater than 127 never match \ed, \es, or
\ew, and always match \eD, \eS, and \eW. This is true even when Unicode
character property support is available. These sequences retain their original
meanings from before UTF-8 support was available, mainly for efficiency
reasons.
.P
The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the
other sequences, these do match certain high-valued codepoints in UTF-8 mode.
The horizontal space characters are:
.sp
.SH "DUPLICATE SUBPATTERN NUMBERS"
.rs
.sp
Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
the same numbers for its capturing parentheses. Such a subpattern starts with
(?| and is itself a non-capturing subpattern. For example, consider this
pattern:
.sp
  (?|(Sat)ur|(Sun))day
.sp
Because the two alternatives are inside a (?| group, both sets of capturing
parentheses are numbered one. Thus, when the pattern matches, you can look
at captured substring number one, whichever alternative matched. This construct
is useful when you want to capture part, but not all, of one of a number of
alternatives. Inside a (?| group, parentheses are numbered as usual, but the
number is reset at the start of each branch. The numbers of any capturing
buffers that follow the subpattern start after the highest number used in any
branch. The following example is taken from the Perl documentation.
The numbers underneath show in which buffer the captured content will be
stored.
.sp
  # before  ---------------branch-reset----------- after
  / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
  # 1            2         2  3        2     3     4
.sp
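The numbering rule described above can be sketched in a few lines of Python.
This is an illustration only and is not part of PCRE: the function name
number_groups is hypothetical, and the sketch assumes a simplified pattern
syntax with no backslash escapes, no character classes, and no named groups
(any group opening with "(?" other than "(?|" is treated as non-capturing).
.sp
```python
def number_groups(pattern):
    """Return capture numbers in the order the opening parentheses appear."""
    numbers = []
    count = 0    # highest number handed out on the current path
    stack = []   # one entry per open group: None, or [start, highest] for (?|
    i = 0
    while i < len(pattern):
        if pattern.startswith("(?|", i):
            # Branch-reset group: remember the count on entry.
            stack.append([count, count])
            i += 3
            continue
        if pattern.startswith("(?", i):
            # Assumption: every other (? group is non-capturing.
            stack.append(None)
            i += 2
            continue
        ch = pattern[i]
        if ch == "(":
            count += 1
            numbers.append(count)
            stack.append(None)
        elif ch == "|" and stack and stack[-1] is not None:
            # New branch of a (?| group: record the high-water mark, then reset.
            frame = stack[-1]
            frame[1] = max(frame[1], count)
            count = frame[0]
        elif ch == ")" and stack:
            frame = stack.pop()
            if frame is not None:
                # Leaving (?|: continue after the highest number in any branch.
                count = max(frame[1], count)
        i += 1
    return numbers


print(number_groups("( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z )"))
# [1, 2, 2, 3, 2, 3, 4]
```
.sp
For the shorter pattern (?|(Sat)ur|(Sun))day the same function returns
[1, 1], which agrees with the statement that both sets of capturing
parentheses are numbered one.
.P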
A backreference or a recursive call to a numbered subpattern always refers to
the first one in the pattern with the given number.
.P
  (?<DN>Sat)(?:urday)?
.sp
There are five capturing substrings, but only one is ever set after a match.
(An alternative way of solving this problem is to use a "branch reset"
subpattern, as described in the previous section.)
.P
The convenience function for extracting the data by name returns the substring