
Diff of /code/trunk/doc/pcrepattern.3


revision 737 by ph10, Wed Oct 19 17:37:29 2011 UTC
revision 738 by ph10, Fri Oct 21 09:04:01 2011 UTC
Line 989
because in UTF-8 mode this would make it
the lookbehind.
.P
In general, the \eC escape sequence is best avoided in UTF-8 mode. However, one
way of using it that avoids the problem of malformed UTF-8 characters is to
use a lookahead to check the length of the next character, as in this pattern
(ignore white space and line breaks):
.sp
  (?| (?=[\ex00-\ex7f])(\eC) |
      (?=[\ex80-\ex{7ff}])(\eC)(\eC) |
      (?=[\ex{800}-\ex{ffff}])(\eC)(\eC)(\eC) |
      (?=[\ex{10000}-\ex{1fffff}])(\eC)(\eC)(\eC)(\eC))
.sp
A group that starts with (?| resets the capturing parentheses numbers in each
alternative (see
.\" HTML <a href="#dupsubpatternnumber">
.\" </a>
"Duplicate Subpattern Numbers"
.\"
below). The assertions at the start of each branch check the next UTF-8
character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The
character's individual bytes are then captured by the appropriate number of
groups.
.
