Diff of /code/trunk/doc/html/pcreunicode.html

revision 902 by ph10, Sun Jan 15 15:44:47 2012 UTC revision 903 by ph10, Sat Jan 21 16:37:17 2012 UTC
# Line 17  UTF-8, UTF-16, AND UNICODE PROPERTY SUPP
17  </b><br>
18  <P>
19  From Release 8.30, in addition to its previous UTF-8 support, PCRE also
20  supports UTF-16 by means of a separate 16-bit library. This can be built as
21  well as, or instead of, the 8-bit library.
22  </P>
23  <br><b>
# Line 82  range U+0 to U+10FFFF, excluding U+D800
82  </P>
83  <P>
84  The excluded code points are the "Surrogate Area" of Unicode. They are reserved
85  for use by UTF-16, where they are used in pairs to encode codepoints with
86  values greater than 0xFFFF. The code points that are encoded by UTF-16 pairs
87  are available independently in the UTF-8 encoding. (In other words, the whole
88  surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)
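
The surrogate mechanism described in this hunk can be illustrated with a small sketch. It is not taken from PCRE; the function name to_utf16_units is hypothetical, and the arithmetic follows the standard UTF-16 encoding rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves). It shows why the range U+D800 to U+DFFF cannot hold characters of its own, while the same code point is encoded directly in UTF-8.

```python
def to_utf16_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration only; not part of PCRE.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside U+0..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        # The Surrogate Area is reserved, so it cannot be encoded as a character.
        raise ValueError("surrogate code points are not valid characters")
    if code_point <= 0xFFFF:
        # Values up to 0xFFFF fit in a single 16-bit unit.
        return [code_point]
    # Values above 0xFFFF are split into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


# U+1F600 needs a surrogate pair in UTF-16 ...
units = to_utf16_units(0x1F600)
# ... but is encoded independently, as four bytes, in UTF-8.
utf8_bytes = "\U0001F600".encode("utf-8")
```

Here units holds the two values 0xD83D and 0xDE00, both inside the excluded range, whereas utf8_bytes contains no surrogate values at all.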
# Line 161  two-byte characters for values greater t
161  data units, for example: \x{100}{3}.
162  </P>
163  <P>
164  4. The dot metacharacter matches one UTF character instead of a single data
165  unit.
166  </P>
167  <P>
# Line 179  be carried out by the normal interpretiv
179  <P>
180  6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
181  test characters of any code value, but, by default, the characters that PCRE
182  recognizes as digits, spaces, or word characters remain the same set as in
183  non-UTF mode, all with values less than 256. This remains true even when PCRE
184  is built to include Unicode property support, because to do otherwise would
185  slow down PCRE in many common cases. Note in particular that this applies to
