Diff of /code/trunk/doc/pcreunicode.3

revision 1220 by ph10, Sun Nov 11 18:04:37 2012 UTC revision 1221 by ph10, Sun Nov 11 20:27:03 2012 UTC
# Line 26  instead of strings of individual 1-byte Line 26  instead of strings of individual 1-byte
26  .SH "UTF-16 AND UTF-32 SUPPORT"  .SH "UTF-16 AND UTF-32 SUPPORT"
27  .rs  .rs
28  .sp  .sp
29  In order to process UTF-16 or UTF-32 strings, you must build PCRE's 16-bit or  In order to process UTF-16 or UTF-32 strings, you must build PCRE's 16-bit or
30  32-bit library with UTF support, and, in addition, you must call  32-bit library with UTF support, and, in addition, you must call
31  .\" HREF  .\" HREF
32  \fBpcre16_compile()\fP  \fBpcre16_compile()\fP
# Line 90  Characters in the "Surrogate Area" of Un Line 90  Characters in the "Surrogate Area" of Un
90  where they are used in pairs to encode codepoints with values greater than  where they are used in pairs to encode codepoints with values greater than
91  0xFFFF. The code points that are encoded by UTF-16 pairs are available  0xFFFF. The code points that are encoded by UTF-16 pairs are available
92  independently in the UTF-8 and UTF-32 encodings. (In other words, the whole  independently in the UTF-8 and UTF-32 encodings. (In other words, the whole
93  surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and  surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and
94  UTF-32.)  UTF-32.)
95  .P  .P
96  Also excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF  Also excluded are the "Non-Character" code points, which are U+FDD0 to U+FDEF
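The surrogate-pair encoding mentioned in this hunk can be illustrated with a small, self-contained sketch. It is not taken from PCRE's sources; the function names `is_surrogate`, `to_surrogate_pair` and `from_surrogate_pair` are hypothetical and exist only for this example. It shows why U+D800 to U+DFFF cannot stand alone as characters: UTF-16 uses those values in pairs to carry code points above 0xFFFF.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of the PCRE API. */

/* Returns 1 if cp lies in the surrogate area U+D800..U+DFFF. */
int is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Encode a code point in U+10000..U+10FFFF as a UTF-16 surrogate pair.
   Returns 0 on success, -1 if cp is outside that range. */
int to_surrogate_pair(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return -1;
    cp -= 0x10000;                              /* 20 bits remain */
    *high = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* bottom 10 bits */
    return 0;
}

/* Decode a surrogate pair back to its code point. The caller is assumed
   to have checked that high is in D800..DBFF and low is in DC00..DFFF. */
uint32_t from_surrogate_pair(uint16_t high, uint16_t low)
{
    return 0x10000 + (((uint32_t)(high - 0xD800)) << 10)
                   + (uint32_t)(low - 0xDC00);
}
```

In UTF-8 and UTF-32 the same code point is encoded directly, so the values U+D800 to U+DFFF have no use there and are treated as invalid.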
# Line 109  If you set the PCRE_NO_UTF8_CHECK flag a Line 109  If you set the PCRE_NO_UTF8_CHECK flag a
109  assumes that the pattern or subject it is given (respectively) contains only  assumes that the pattern or subject it is given (respectively) contains only
110  valid UTF-8 codes. In this case, it does not diagnose an invalid UTF-8 string.  valid UTF-8 codes. In this case, it does not diagnose an invalid UTF-8 string.
111  .P  .P
112  Note that passing PCRE_NO_UTF8_CHECK to \fBpcre_compile()\fP just disables the  Note that passing PCRE_NO_UTF8_CHECK to \fBpcre_compile()\fP just disables the
113  check for the pattern; it does not also apply to subject strings. If you want  check for the pattern; it does not also apply to subject strings. If you want
114  to disable the check for a subject string you must pass this option to  to disable the check for a subject string you must pass this option to
115  \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP.  \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP.
116  .P  .P
117  If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, the result  If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, the result
118  is undefined and your program may crash.  is undefined and your program may crash.
119  .  .
120  .  .
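Because an invalid string combined with PCRE_NO_UTF8_CHECK gives undefined behaviour, an application that cannot vouch for its input might screen it first. The following is a minimal sketch of such a screen, assuming a hypothetical helper named `is_valid_utf8`; it is not PCRE's own check and is not guaranteed to agree with it in every detail (for example, it does not reject the "Non-Character" code points). It rejects malformed lead or continuation bytes, truncated sequences, overlong forms, surrogates, and values above U+10FFFF.

```c
#include <stddef.h>

/* Hypothetical helper for illustration; not part of the PCRE API.
   Returns 1 if s[0..len) is structurally well-formed UTF-8, else 0. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being assembled */
        size_t k;

        if (c < 0x80) {                 /* single-byte (ASCII) character */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07;
        } else {
            return 0;                   /* stray continuation or bad lead byte */
        }

        if (len - i <= n)
            return 0;                   /* sequence truncated at end of input */

        for (k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and out-of-range values. */
        if ((n == 1 && cp < 0x80) ||
            (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000))
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        if (cp > 0x10FFFF)
            return 0;

        i += n + 1;
    }
    return 1;
}
```

A caller would consider passing PCRE_NO_UTF8_CHECK to \fBpcre_exec()\fP only for subjects that have passed a check of this kind, or that were already validated by an earlier call without the flag.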
# Line 166  In some situations, you may already know Line 166  In some situations, you may already know
166  therefore want to skip these checks in order to improve performance. If you set  therefore want to skip these checks in order to improve performance. If you set
167  the PCRE_NO_UTF32_CHECK flag at compile time or at run time, PCRE assumes that  the PCRE_NO_UTF32_CHECK flag at compile time or at run time, PCRE assumes that
168  the pattern or subject it is given (respectively) contains only valid UTF-32  the pattern or subject it is given (respectively) contains only valid UTF-32
169  sequences. In this case, it does not diagnose an invalid UTF-32 string.  sequences. In this case, it does not diagnose an invalid UTF-32 string.
170  However, if an invalid string is passed, the result is undefined.  However, if an invalid string is passed, the result is undefined.
171  .  .
172  .  .
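The same precaution applies before setting PCRE_NO_UTF32_CHECK. The sketch below scans an array of 32-bit units and reports the first one that is not a usable code point. The validity rule used here (at most 0x10FFFF and outside the surrogate area) is an assumption made for the example; the exact set of values PCRE rejects may be wider at this revision (it may also exclude non-characters). The names `is_valid_utf32_unit` and `first_invalid_utf32` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the PCRE API. */

/* Returns 1 if cp is in U+0..U+10FFFF and is not a surrogate. */
int is_valid_utf32_unit(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogate area */
    return 1;
}

/* Returns the index of the first invalid unit in s[0..len),
   or -1 if every unit passes is_valid_utf32_unit(). */
long first_invalid_utf32(const uint32_t *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (!is_valid_utf32_unit(s[i]))
            return (long)i;
    }
    return -1;
}
```

A return value of -1 indicates that the string passed this screen; any other value gives an offset the application can report instead of handing the string to PCRE with checking disabled.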