Diff of /code/trunk/doc/pcreunicode.3

--- revision 1103 by chpe, Tue Oct 16 15:56:38 2012 UTC
+++ revision 1107 by chpe, Tue Oct 16 15:56:51 2012 UTC
@@ -93,14 +93,17 @@ place. From release 7.3 of PCRE, the che
 which are themselves derived from the Unicode specification. Earlier releases
 of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
 values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
-to U+10FFFF, excluding U+D800 to U+DFFF.
+to U+10FFFF, excluding the surrogate area, and the non-characters.
 .P
-The excluded code points are the "Surrogate Area" of Unicode. They are reserved
+Excluded code points are the "Surrogate Area" of Unicode. They are reserved
 for use by UTF-16, where they are used in pairs to encode codepoints with
 values greater than 0xFFFF. The code points that are encoded by UTF-16 pairs
 are available independently in the UTF-8 encoding. (In other words, the whole
 surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)
 .P
+Also excluded are the "Non-Characters" code points, which are U+FDD0 to U+FDEF
+and the last two code points in each plane, U+??FFFE and U+??FFFF.
+.P
 If an invalid UTF-8 string is passed to PCRE, an error return is given. At
 compile time, the only additional information is the offset to the first byte
 of the failing character. The run-time functions \fBpcre_exec()\fP and
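The UTF-8 rule described in revision 1107 can be illustrated with the C sketch below. It is a minimal illustration under stated assumptions and is not PCRE's own validation code. The function name utf8_first_invalid is hypothetical. It returns the offset of the first byte of the failing character, or -1 if the string is valid, which mirrors the compile-time behaviour described in the text.

The sketch works in two stages:

1. It decodes each sequence and rejects malformed, truncated, and overlong forms.
2. It applies the range, surrogate, and non-character exclusions to the decoded value.

Because surrogates are excluded, an encoded surrogate such as ED A0 80 (U+D800) is rejected. U+10000, by contrast, is accepted as the four-byte sequence F0 90 80 80, with no surrogate pair involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not PCRE's internal checker. Returns the byte offset
 * of the first byte of the first invalid character, or -1 if every character
 * is in U+0..U+10FFFF and is neither a surrogate nor a non-character. */
long utf8_first_invalid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t start = i;
        unsigned char b = s[i];
        uint32_t c;
        size_t extra;   /* number of continuation bytes expected */

        if (b < 0x80)                    { c = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { c = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { c = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { c = b & 0x07; extra = 3; }
        else return (long)start;  /* continuation byte, C0/C1, or F5..FF */

        if (len - i - 1 < extra)
            return (long)start;   /* sequence truncated by end of input */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char cb = s[i + k];
            if ((cb & 0xC0) != 0x80)
                return (long)start;   /* not a continuation byte */
            c = (c << 6) | (cb & 0x3F);
        }
        i += 1 + extra;

        /* Reject overlong 3- and 4-byte forms and values above U+10FFFF. */
        if ((extra == 2 && c < 0x800) ||
            (extra == 3 && (c < 0x10000 || c > 0x10FFFF)))
            return (long)start;

        /* Surrogate area, U+FDD0..U+FDEF, and U+??FFFE / U+??FFFF. */
        if ((c >= 0xD800 && c <= 0xDFFF) ||
            (c >= 0xFDD0 && c <= 0xFDEF) ||
            (c & 0xFFFE) == 0xFFFE)
            return (long)start;
    }
    return -1;
}
```

For example, given the bytes 41 ED A0 80, the sketch returns 1, which is the offset of the encoded surrogate.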
@@ -143,6 +146,9 @@ to the relevant functions. Values other
 U+D800 to U+DFFF are independent code points. Values in the surrogate range
 must be used in pairs in the correct manner.
 .P
+Excluded are the "Non-Characters" code points, which are U+FDD0 to U+FDEF
+and the last two code points in each plane, U+??FFFE and U+??FFFF.
+.P
 If an invalid UTF-16 string is passed to PCRE, an error return is given. At
 compile time, the only additional information is the offset to the first data
 unit of the failing character. The run-time functions \fBpcre16_exec()\fP and
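For UTF-16, two rules apply: surrogates may appear only in correctly ordered pairs, and non-characters are excluded (the latter added in revision 1107). The sketch below illustrates both. The name utf16_first_invalid is hypothetical and is not part of the pcre16 API. It returns the index of the first 16-bit data unit of the failing character, or -1 if the string is valid.

A lead surrogate (U+D800 to U+DBFF) must be followed by a trail surrogate (U+DC00 to U+DFFF). The pair is combined as 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00), and the non-character test is then applied to the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not PCRE's pcre16 checker. Returns the index of the
 * first 16-bit unit of the first invalid character, or -1 if valid. */
long utf16_first_invalid(const uint16_t *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t start = i;
        uint32_t c = s[i];

        if (c >= 0xD800 && c <= 0xDBFF) {          /* lead surrogate */
            if (i + 1 >= len || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return (long)start;                /* unpaired lead */
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(s[i + 1] - 0xDC00);
            i += 2;
        } else if (c >= 0xDC00 && c <= 0xDFFF) {   /* trail without lead */
            return (long)start;
        } else {
            i += 1;
        }

        /* Non-characters: U+FDD0..U+FDEF and U+??FFFE / U+??FFFF. */
        if ((c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE)
            return (long)start;
    }
    return -1;
}
```

Under this sketch, the pair D83F DFFE decodes to U+1FFFE and is rejected as a non-character, even though the pair itself is well formed.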
@@ -163,7 +169,9 @@ sequences. In this case, it does not dia
 When you set the PCRE_UTF32 flag, the strings of 32-bit data units that are
 passed as patterns and subjects are (by default) checked for validity on entry
 to the relevant functions.  This check allows only values in the range U+0
-to U+10FFFF, excluding the surrogate are U+D800 to U+DFFF, and U+FFEF.
+to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF, and the
+"Non-Characters" code points, which are U+FDD0 to U+FDEF and the last two
+characters in each plane, U+??FFFE and U+??FFFF.
 .P
 If an invalid UTF-32 string is passed to PCRE, an error return is given. At
 compile time, the only additional information is the offset to the first data
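Every UTF-32 data unit holds one complete code point, so the check in revision 1107 reduces to a range test on each unit. The sketch below shows this. The function name utf32_first_invalid is hypothetical and is not PCRE's API. It returns the index of the first failing unit, or -1 if every unit is valid.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: returns the index of the first invalid 32-bit unit,
 * or -1 if every unit is in U+0..U+10FFFF and is neither a surrogate
 * (U+D800..U+DFFF) nor a non-character (U+FDD0..U+FDEF, U+??FFFE, U+??FFFF). */
long utf32_first_invalid(const uint32_t *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t c = s[i];
        if (c > 0x10FFFF ||
            (c >= 0xD800 && c <= 0xDFFF) ||
            (c >= 0xFDD0 && c <= 0xFDEF) ||
            (c & 0xFFFE) == 0xFFFE)   /* low 16 bits are FFFE or FFFF */
            return (long)i;
    }
    return -1;
}
```

The mask test (c & 0xFFFE) == 0xFFFE catches U+??FFFE and U+??FFFF in all 17 planes in a single comparison, so there is no need to list the 34 values individually.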
