
Diff of /code/trunk/doc/pcreunicode.3


Old: revision 878 by ph10, Sun Jan 15 15:44:47 2012 UTC
New: revision 903 by ph10, Sat Jan 21 16:37:17 2012 UTC
# Line 5: PCRE - Perl-compatible regular expressio
5  .rs
6  .sp
7  From Release 8.30, in addition to its previous UTF-8 support, PCRE also
8  supports UTF-16 by means of a separate 16-bit library. This can be built as
9  well as, or instead of, the 8-bit library.
10  .
11  .
# Line 77: releases of PCRE followed the rules of R
77  range U+0 to U+10FFFF, excluding U+D800 to U+DFFF.
78  .P
79  The excluded code points are the "Surrogate Area" of Unicode. They are reserved
80  for use by UTF-16, where they are used in pairs to encode codepoints with
81  values greater than 0xFFFF. The code points that are encoded by UTF-16 pairs
82  are available independently in the UTF-8 encoding. (In other words, the whole
83  surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)
# Line 148: two-byte characters for values greater t
148  3. Repeat quantifiers apply to complete UTF characters, not to individual
149  data units, for example: \ex{100}{3}.
150  .P
151  4. The dot metacharacter matches one UTF character instead of a single data
152  unit.
153  .P
154  5. The escape sequence \eC can be used to match a single byte in UTF-8 mode, or
# Line 166: be carried out by the normal interpretiv
166  .P
167  6. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
168  test characters of any code value, but, by default, the characters that PCRE
169  recognizes as digits, spaces, or word characters remain the same set as in
170  non-UTF mode, all with values less than 256. This remains true even when PCRE
171  is built to include Unicode property support, because to do otherwise would
172  slow down PCRE in many common cases. Note in particular that this applies to
