Diff of /code/trunk/doc/pcrepattern.3

revision 1398 by ph10, Tue Nov 12 15:20:26 2013 UTC revision 1401 by ph10, Tue Nov 12 17:44:07 2013 UTC
# Line 533  there is no character to match. Line 533  there is no character to match.
533  .P  .P
534  For compatibility with Perl, \es did not used to match the VT character (code  For compatibility with Perl, \es did not used to match the VT character (code
535  11), which made it different from the the POSIX "space" class. However, Perl  11), which made it different from the the POSIX "space" class. However, Perl
536  added VT at release 5.18, and PCRE followed suit at release 8.34. The \es  added VT at release 5.18, and PCRE followed suit at release 8.34. The default
537  characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space (32).  \es characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space
538    (32), which are defined as white space in the "C" locale. This list may vary if
539    locale-specific matching is taking place; in particular, in some locales the
540    "non-breaking space" character (\exA0) is recognized as white space.
541  .P  .P
542  A "word" character is an underscore or any character that is a letter or digit.  A "word" character is an underscore or any character that is a letter or digit.
543  By default, the definition of letters and digits is controlled by PCRE's  By default, the definition of letters and digits is controlled by PCRE's
# Line 549  in the Line 552  in the
552  \fBpcreapi\fP  \fBpcreapi\fP
553  .\"  .\"
554  page). For example, in a French locale such as "fr_FR" in Unix-like systems,  page). For example, in a French locale such as "fr_FR" in Unix-like systems,
555  or "french" in Windows, some character codes greater than 128 are used for  or "french" in Windows, some character codes greater than 127 are used for
556  accented letters, and these are then matched by \ew. The use of locales with  accented letters, and these are then matched by \ew. The use of locales with
557  Unicode is discouraged.  Unicode is discouraged.
558  .P  .P
559  By default, in a UTF mode, characters with values greater than 128 never match  By default, characters whose code points are greater than 127 never match \ed,
560  \ed, \es, or \ew, and always match \eD, \eS, and \eW. These sequences retain  \es, or \ew, and always match \eD, \eS, and \eW, although this may vary for
561  their original meanings from before UTF support was available, mainly for  characters in the range 128-255 when locale-specific matching is happening.
562  efficiency reasons. However, if PCRE is compiled with Unicode property support,  These escape sequences retain their original meanings from before Unicode
563  and the PCRE_UCP option is set, the behaviour is changed so that Unicode  support was available, mainly for efficiency reasons. If PCRE is compiled with
564  properties are used to determine character types, as follows:  Unicode property support, and the PCRE_UCP option is set, the behaviour is
565    changed so that Unicode properties are used to determine character types, as
566    follows:
567  .sp  .sp
568    \ed  any character that matches \ep{Nd} (decimal digit)    \ed  any character that matches \ep{Nd} (decimal digit)
569    \es  any character that matches \ep{Z} or \eh or \ev    \es  any character that matches \ep{Z} or \eh or \ev
# Line 572  is noticeably slower when PCRE_UCP is se Line 577  is noticeably slower when PCRE_UCP is se
577  .P  .P
578  The sequences \eh, \eH, \ev, and \eV are features that were added to Perl at  The sequences \eh, \eH, \ev, and \eV are features that were added to Perl at
579  release 5.10. In contrast to the other sequences, which match only ASCII  release 5.10. In contrast to the other sequences, which match only ASCII
580  characters by default, these always match certain high-valued codepoints,  characters by default, these always match certain high-valued code points,
581  whether or not PCRE_UCP is set. The horizontal space characters are:  whether or not PCRE_UCP is set. The horizontal space characters are:
582  .sp  .sp
583    U+0009     Horizontal tab (HT)    U+0009     Horizontal tab (HT)
# Line 1339  are: Line 1344  are:
1344    word     "word" characters (same as \ew)    word     "word" characters (same as \ew)
1345    xdigit   hexadecimal digits    xdigit   hexadecimal digits
1346  .sp  .sp
1347  The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), and  The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
1348  space (32). "Space" used to be different to \es, which did not include VT, for  and space (32). If locale-specific matching is taking place, there may be
1349  Perl compatibility. However, Perl changed at release 5.18, and PCRE followed at  additional space characters. "Space" used to be different to \es, which did not
1350  release 8.34. "Space" and \es now match the same set of characters.  include VT, for Perl compatibility. However, Perl changed at release 5.18, and
1351    PCRE followed at release 8.34. "Space" and \es now match the same set of
1352    characters.
1353  .P  .P
1354  The name "word" is a Perl extension, and "blank" is a GNU extension from Perl  The name "word" is a Perl extension, and "blank" is a GNU extension from Perl
1355  5.8. Another Perl extension is negation, which is indicated by a ^ character  5.8. Another Perl extension is negation, which is indicated by a ^ character
# Line 1354  matches "1", "2", or any non-digit. PCRE Line 1361  matches "1", "2", or any non-digit. PCRE
1361  syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not  syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
1362  supported, and an error is given if they are encountered.  supported, and an error is given if they are encountered.
1363  .P  .P
1364  By default, in UTF modes, characters with values greater than 128 do not match  By default, characters with values greater than 128 do not match any of the
1365  any of the POSIX character classes. However, if the PCRE_UCP option is passed  POSIX character classes. However, if the PCRE_UCP option is passed to
1366  to \fBpcre_compile()\fP, some of the classes are changed so that Unicode  \fBpcre_compile()\fP, some of the classes are changed so that Unicode character
1367  character properties are used. This is achieved by replacing certain POSIX  properties are used. This is achieved by replacing certain POSIX classes by
1368  classes by other sequences, as follows:  other sequences, as follows:
1369  .sp  .sp
1370    [:alnum:]  becomes  \ep{Xan}    [:alnum:]  becomes  \ep{Xan}
1371    [:alpha:]  becomes  \ep{L}    [:alpha:]  becomes  \ep{L}
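
The revised wording above separates the default (ASCII-only) meaning of \es and \ed from the wider sets that apply under locale-specific matching or PCRE_UCP. The sketch below illustrates that distinction by analogy only: it uses Python's standard re module, not PCRE, with re.ASCII standing in for PCRE's default behaviour and Python's default Unicode mode standing in for a wider definition. The names DEFAULT_SPACE, PRE_8_34_SPACE and matching_code_points are hypothetical helpers introduced for this illustration; PCRE's own results depend on its build options and character tables.

```python
import re

# Default \s set listed in the revised text (code points in the "C" locale).
DEFAULT_SPACE = {9, 10, 11, 12, 13, 32}
# Before PCRE 8.34, \s did not include VT (11), per the text above.
PRE_8_34_SPACE = DEFAULT_SPACE - {11}


def matching_code_points(pattern, flags=0, limit=256):
    """Return the code points below `limit` matched by a one-character pattern."""
    rx = re.compile(pattern, flags)
    return {cp for cp in range(limit) if rx.fullmatch(chr(cp))}


# Assumption: re.ASCII is only an analogue of PCRE's default tables.
ascii_space = matching_code_points(r"\s", re.ASCII)
# Assumption: Python's Unicode mode is only an analogue of a wider definition.
unicode_space = matching_code_points(r"\s")

print(sorted(ascii_space))        # the six default white space code points
print(0xA0 in ascii_space)        # non-breaking space under the ASCII definition
print(0xA0 in unicode_space)      # non-breaking space under the wider definition

# A decimal digit above 127 (ARABIC-INDIC DIGIT THREE, general category Nd).
arabic_three = "\u0663"
print(re.fullmatch(r"\d", arabic_three, re.ASCII) is None)
print(re.fullmatch(r"\d", arabic_three) is not None)
```

In this analogue the ASCII-only run yields exactly the six code points the text lists, while the wider run also accepts U+00A0 and a non-ASCII decimal digit, which mirrors the kind of difference the text describes for locale-specific matching and for \ed under PCRE_UCP.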

