Diff of /code/trunk/doc/pcrepattern.3

revision 1219 by ph10, Sun Nov 11 18:04:37 2012 UTC revision 1221 by ph10, Sun Nov 11 20:27:03 2012 UTC
# Line 32  these special sequences: Line 32  these special sequences:
32    (*UTF8)    (*UTF8)
33    (*UTF16)    (*UTF16)
34    (*UTF32)    (*UTF32)
35    (*UTF)    (*UTF)
36  .sp  .sp
37  (*UTF) is a generic sequence that can be used with any of the libraries.  (*UTF) is a generic sequence that can be used with any of the libraries.
38  Starting a pattern with such a sequence is equivalent to setting the relevant  Starting a pattern with such a sequence is equivalent to setting the relevant
# Line 76  page. Line 76  page.
76  .SH "EBCDIC CHARACTER CODES"  .SH "EBCDIC CHARACTER CODES"
77  .rs  .rs
78  .sp  .sp
79  PCRE can be compiled to run in an environment that uses EBCDIC as its character  PCRE can be compiled to run in an environment that uses EBCDIC as its character
80  code rather than ASCII or Unicode (typically a mainframe system). In the  code rather than ASCII or Unicode (typically a mainframe system). In the
81  sections below, character code values are ASCII or Unicode; in an EBCDIC  sections below, character code values are ASCII or Unicode; in an EBCDIC
82  environment these characters may have different code values, and there are no  environment these characters may have different code values, and there are no
83  code points greater than 255.  code points greater than 255.
84  .  .
85  .  .
# Line 274  recognized when PCRE is compiled in EBCD Line 274  recognized when PCRE is compiled in EBCD
274  bytes. In this mode, all values are valid after \ec. If the next character is a  bytes. In this mode, all values are valid after \ec. If the next character is a
275  lower case letter, it is converted to upper case. Then the 0xc0 bits of the  lower case letter, it is converted to upper case. Then the 0xc0 bits of the
276  byte are inverted. Thus \ecA becomes hex 01, as in ASCII (A is C1), but because  byte are inverted. Thus \ecA becomes hex 01, as in ASCII (A is C1), but because
277  the EBCDIC letters are disjoint, \ecZ becomes hex 29 (Z is E9), and other  the EBCDIC letters are disjoint, \ecZ becomes hex 29 (Z is E9), and other
278  characters also generate different values.  characters also generate different values.
279  .P  .P
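The \ec arithmetic described above can be checked with a short sketch. It assumes Python's built-in `cp037` codec as a stand-in for the EBCDIC code page (the code page PCRE is actually built for may differ), and `ebcdic_control` is a hypothetical helper name, not a PCRE function.

```python
def ebcdic_control(ch: str) -> int:
    """Return the byte that \\c followed by `ch` would yield under the rule above."""
    # Assumption: cp037 stands in for the EBCDIC code page in use.
    # Step 1: a lower case letter is converted to upper case.
    byte = ch.upper().encode("cp037")[0]
    # Step 2: the 0xc0 bits of the byte are inverted.
    return byte ^ 0xC0


if __name__ == "__main__":
    for letter in ("A", "a", "Z"):
        print(letter, hex(ebcdic_control(letter)))
```

Because EBCDIC letters are not contiguous, the results for letters after I do not follow the ASCII pattern of 1 to 26.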
280  By default, after \ex, from zero to two hexadecimal digits are read (letters  By default, after \ex, from zero to two hexadecimal digits are read (letters
# Line 835  property, and creating rules that use th Line 835  property, and creating rules that use th
835  of extended grapheme clusters. In releases of PCRE later than 8.31, \eX matches  of extended grapheme clusters. In releases of PCRE later than 8.31, \eX matches
836  one of these clusters.  one of these clusters.
837  .P  .P
838  \eX always matches at least one character. Then it decides whether to add  \eX always matches at least one character. Then it decides whether to add
839  additional characters according to the following rules for ending a cluster:  additional characters according to the following rules for ending a cluster:
840  .P  .P
841  1. End at the end of the subject string.  1. End at the end of the subject string.
842  .P  .P
843  2. Do not end between CR and LF; otherwise end after any control character.  2. Do not end between CR and LF; otherwise end after any control character.
844  .P  .P
845  3. Do not break Hangul (a Korean script) syllable sequences. Hangul characters  3. Do not break Hangul (a Korean script) syllable sequences. Hangul characters
846  are of five types: L, V, T, LV, and LVT. An L character may be followed by an  are of five types: L, V, T, LV, and LVT. An L character may be followed by an
847  L, V, LV, or LVT character; an LV or V character may be followed by a V or T  L, V, LV, or LVT character; an LV or V character may be followed by a V or T
848  character; an LVT or T character may be followed only by a T character.  character; an LVT or T character may be followed only by a T character.
849  .P  .P
850  4. Do not end before extending characters or spacing marks. Characters with  4. Do not end before extending characters or spacing marks. Characters with
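
Rule 3 above amounts to a small transition table over the five Hangul character types. The following is a minimal sketch of that table only, not PCRE's implementation; `ALLOWED_NEXT`, `may_end_between`, and `split_syllables` are hypothetical names, and the input is a list of type labels rather than real characters.

```python
# For each Hangul type, the types that may follow it without ending the cluster.
ALLOWED_NEXT = {
    "L": {"L", "V", "LV", "LVT"},
    "LV": {"V", "T"},
    "V": {"V", "T"},
    "LVT": {"T"},
    "T": {"T"},
}


def may_end_between(prev_type: str, next_type: str) -> bool:
    """True if rule 3 permits a cluster to end between the two types."""
    return next_type not in ALLOWED_NEXT.get(prev_type, set())


def split_syllables(types: list[str]) -> list[list[str]]:
    """Group a sequence of Hangul type labels into unbroken runs."""
    clusters: list[list[str]] = []
    for t in types:
        if clusters and not may_end_between(clusters[-1][-1], t):
            clusters[-1].append(t)  # rule 3 forbids ending here
        else:
            clusters.append([t])  # start a new run
    return clusters
```

For example, the label sequence L, V, T, L, LV, T, T, V splits into three runs: a T may be followed only by another T, so a run ends before the second L and again before the final V.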
# Line 979  regular expression. Line 979  regular expression.
979  .SH "CIRCUMFLEX AND DOLLAR"  .SH "CIRCUMFLEX AND DOLLAR"
980  .rs  .rs
981  .sp  .sp
982  The circumflex and dollar metacharacters are zero-width assertions. That is,  The circumflex and dollar metacharacters are zero-width assertions. That is,
983  they test for a particular condition being true without consuming any  they test for a particular condition being true without consuming any
984  characters from the subject string.  characters from the subject string.
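
The zero-width behaviour can be observed in any regex engine that reports match offsets. The sketch below uses Python's `re` module as a stand-in for PCRE, which is an assumption: the two engines differ in other details, but in their default modes both treat the circumflex and dollar as assertions that consume nothing.

```python
import re

subject = "abc"

# Each assertion matches at a position; start and end offsets are equal.
start_span = re.search(r"^", subject).span()
end_span = re.search(r"$", subject).span()

# Anchoring a pattern at both ends adds nothing to the matched text.
anchored = re.search(r"^abc$", subject).group()

if __name__ == "__main__":
    print(start_span, end_span, repr(anchored))
```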
985  .P  .P
986  Outside a character class, in the default matching mode, the circumflex  Outside a character class, in the default matching mode, the circumflex
# Line 1693  one succeeds. Consider this pattern: Line 1693  one succeeds. Consider this pattern:
1693  .sp  .sp
1694    (?>.*?a)b    (?>.*?a)b
1695  .sp  .sp
1696  It matches "ab" in the subject "aab". The use of the backtracking control verbs  It matches "ab" in the subject "aab". The use of the backtracking control verbs
1697  (*PRUNE) and (*SKIP) also disables this optimization.  (*PRUNE) and (*SKIP) also disables this optimization.
1698  .P  .P
1699  When a capturing subpattern is repeated, the value captured is the substring  When a capturing subpattern is repeated, the value captured is the substring
