Diff of /code/trunk/doc/pcrepattern.3

Old: revision 1376 by ph10, Sat Oct 12 18:02:11 2013 UTC
New: revision 1387 by ph10, Sat Nov 2 18:29:05 2013 UTC
# Line 1  Line 1 
1  .TH PCREPATTERN 3 "12 October 2013" "PCRE 8.34"  .TH PCREPATTERN 3 "02 November 2013" "PCRE 8.34"
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .SH "PCRE REGULAR EXPRESSION DETAILS"  .SH "PCRE REGULAR EXPRESSION DETAILS"
# Line 925  the "mark" property always have the "ext Line 925  the "mark" property always have the "ext
925  .sp  .sp
926  As well as the standard Unicode properties described above, PCRE supports four  As well as the standard Unicode properties described above, PCRE supports four
927  more that make it possible to convert traditional escape sequences such as \ew  more that make it possible to convert traditional escape sequences such as \ew
928  and \es and POSIX character classes to use Unicode properties. PCRE uses these  and \es to use Unicode properties. PCRE uses these non-standard, non-Perl
929  non-standard, non-Perl properties internally when PCRE_UCP is set. However,  properties internally when PCRE_UCP is set. However, they may also be used
930  they may also be used explicitly. These properties are:  explicitly. These properties are:
931  .sp  .sp
932    Xan   Any alphanumeric character    Xan   Any alphanumeric character
933    Xps   Any POSIX space character    Xps   Any POSIX space character
# Line 937  they may also be used explicitly. These Line 937  they may also be used explicitly. These
937  Xan matches characters that have either the L (letter) or the N (number)  Xan matches characters that have either the L (letter) or the N (number)
938  property. Xps matches the characters tab, linefeed, vertical tab, form feed, or  property. Xps matches the characters tab, linefeed, vertical tab, form feed, or
939  carriage return, and any other character that has the Z (separator) property.  carriage return, and any other character that has the Z (separator) property.
940  Xsp is the same as Xps, except that vertical tab is excluded. Xwd matches the  Xsp is the same as Xps; it used to exclude vertical tab, for Perl
941  same characters as Xan, plus underscore.  compatibility, but Perl changed, and so PCRE followed at release 8.34. Xwd
942    matches the same characters as Xan, plus underscore.
943  .P  .P
944  There is another non-standard property, Xuc, which matches any character that  There is another non-standard property, Xuc, which matches any character that
945  can be represented by a Universal Character Name in C++ and other programming  can be represented by a Universal Character Name in C++ and other programming
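
Illustration (not part of the diff): the definitions of Xan, Xps, Xsp and Xwd in the hunk above can be sketched in Python with the standard `unicodedata` module. This is only a model of the wording in the man page, not PCRE's implementation. The function names and the `pre_8_34` flag are hypothetical, and Python's Unicode tables may be a different version from the ones PCRE was built with.

```python
import unicodedata

# Characters that Xps lists explicitly: tab, linefeed, vertical tab,
# form feed, carriage return. Their general category is Cc, not Z,
# so they need a separate check.
_EXPLICIT_SPACES = {"\t", "\n", "\v", "\f", "\r"}


def is_xan(ch):
    """Xan: general category starts with L (letter) or N (number)."""
    return unicodedata.category(ch)[0] in ("L", "N")


def is_xps(ch):
    """Xps: the five explicit characters, or any Z (separator) character."""
    return ch in _EXPLICIT_SPACES or unicodedata.category(ch)[0] == "Z"


def is_xsp(ch, pre_8_34=False):
    """Xsp: same as Xps from release 8.34 on.

    pre_8_34 is a hypothetical switch that models the older behaviour,
    in which vertical tab was excluded.
    """
    if pre_8_34 and ch == "\v":
        return False
    return is_xps(ch)


def is_xwd(ch):
    """Xwd: the same characters as Xan, plus underscore."""
    return is_xan(ch) or ch == "_"
```

With this model, `is_xsp("\v")` and `is_xps("\v")` agree, while `is_xsp("\v", pre_8_34=True)` reproduces the behaviour described in the left-hand column of the diff.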
# Line 1332  supported, and an error is given if they Line 1333  supported, and an error is given if they
1333  By default, in UTF modes, characters with values greater than 128 do not match  By default, in UTF modes, characters with values greater than 128 do not match
1334  any of the POSIX character classes. However, if the PCRE_UCP option is passed  any of the POSIX character classes. However, if the PCRE_UCP option is passed
1335  to \fBpcre_compile()\fP, some of the classes are changed so that Unicode  to \fBpcre_compile()\fP, some of the classes are changed so that Unicode
1336  character properties are used. This is achieved by replacing the POSIX classes  character properties are used. This is achieved by replacing certain POSIX
1337  by other sequences, as follows:  classes by other sequences, as follows:
1338  .sp  .sp
1339    [:alnum:]  becomes  \ep{Xan}    [:alnum:]  becomes  \ep{Xan}
1340    [:alpha:]  becomes  \ep{L}    [:alpha:]  becomes  \ep{L}
# Line 1344  by other sequences, as follows: Line 1345  by other sequences, as follows:
1345    [:upper:]  becomes  \ep{Lu}    [:upper:]  becomes  \ep{Lu}
1346    [:word:]   becomes  \ep{Xwd}    [:word:]   becomes  \ep{Xwd}
1347  .sp  .sp
1348  Negated versions, such as [:^alpha:] use \eP instead of \ep. The other POSIX  Negated versions, such as [:^alpha:] use \eP instead of \ep. Three other POSIX
1349  classes are unchanged, and match only characters with code points less than  classes are handled specially in UCP mode:
1350  128.  .TP 10
1351    [:graph:]
1352    This matches characters that have glyphs that mark the page when printed. In
1353    Unicode property terms, it matches all characters with the L, M, N, P, S, or Cf
1354    properties, except for:
1355    .sp
1356      U+061C           Arabic Letter Mark
1357      U+180E           Mongolian Vowel Separator
1358      U+2066 - U+2069  Various "isolate"s
1359    .sp
1360    .TP 10
1361    [:print:]
1362    This matches the same characters as [:graph:] plus space characters that are
1363    not controls, that is, characters with the Zs property.
1364    .TP 10
1365    [:punct:]
1366    This matches all characters that have the Unicode P (punctuation) property,
1367    plus those characters whose code points are less than 128 that have the S
1368    (Symbol) property.
1369    .P
1370    The other POSIX classes are unchanged, and match only characters with code
1371    points less than 128.
1372  .  .
1373  .  .
1374  .SH "VERTICAL BAR"  .SH "VERTICAL BAR"
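
Illustration (not part of the diff): the three specially handled classes added in revision 1387 can be modelled from their prose definitions. The sketch below assumes Python's `unicodedata` categories stand in for PCRE's property tables, which may differ by Unicode version. The function names are hypothetical and this is not how PCRE implements the classes internally.

```python
import unicodedata

# Code points that the text excludes from [:graph:]:
# U+061C Arabic Letter Mark, U+180E Mongolian Vowel Separator,
# U+2066 to U+2069 (the "isolate" controls).
_GRAPH_EXCLUDED = {0x061C, 0x180E, 0x2066, 0x2067, 0x2068, 0x2069}


def is_graph(ch):
    """[:graph:] in UCP mode: L, M, N, P, S or Cf, minus the exclusions."""
    if ord(ch) in _GRAPH_EXCLUDED:
        return False
    cat = unicodedata.category(ch)
    return cat[0] in ("L", "M", "N", "P", "S") or cat == "Cf"


def is_print(ch):
    """[:print:] in UCP mode: [:graph:] plus characters with the Zs property."""
    return is_graph(ch) or unicodedata.category(ch) == "Zs"


def is_punct(ch):
    """[:punct:] in UCP mode: any P character, plus S characters below 128."""
    cat = unicodedata.category(ch)
    return cat[0] == "P" or (ord(ch) < 128 and cat[0] == "S")
```

One consequence of the [:punct:] rule shows up directly in this model: `is_punct("$")` is true because the dollar sign is an ASCII symbol, whereas `is_punct("\u20ac")` (the euro sign) is false because it is a symbol with a code point above 127.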
# Line 3176  Cambridge CB2 3QH, England. Line 3198  Cambridge CB2 3QH, England.
3198  .rs  .rs
3199  .sp  .sp
3200  .nf  .nf
3201  Last updated: 12 October 2013  Last updated: 02 November 2013
3202  Copyright (c) 1997-2013 University of Cambridge.  Copyright (c) 1997-2013 University of Cambridge.
3203  .fi  .fi
