
Diff of /code/trunk/doc/html/pcreunicode.html


revision 958 by ph10, Sat Jan 21 16:37:17 2012 UTC revision 959 by ph10, Sat Apr 14 16:16:58 2012 UTC
# Line 74  Validity of UTF-8 strings Line 74  Validity of UTF-8 strings
74  <P>  <P>
75  When you set the PCRE_UTF8 flag, the byte strings passed as patterns and  When you set the PCRE_UTF8 flag, the byte strings passed as patterns and
76  subjects are (by default) checked for validity on entry to the relevant  subjects are (by default) checked for validity on entry to the relevant
77  functions. From release 7.3 of PCRE, the check is according the rules of RFC  functions. The entire string is checked before any other processing takes
78  3629, which are themselves derived from the Unicode specification. Earlier  place. From release 7.3 of PCRE, the check is according the rules of RFC 3629,
79  releases of PCRE followed the rules of RFC 2279, which allows the full range of  which are themselves derived from the Unicode specification. Earlier releases
80  31-bit values (0 to 0x7FFFFFFF). The current check allows only values in the  of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
81  range U+0 to U+10FFFF, excluding U+D800 to U+DFFF.  values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
82    to U+10FFFF, excluding U+D800 to U+DFFF.
83  </P>  </P>
84  <P>  <P>
85  The excluded code points are the "Surrogate Area" of Unicode. They are reserved  The excluded code points are the "Surrogate Area" of Unicode. They are reserved
# Line 96  detailed reason code if the caller has p Line 97  detailed reason code if the caller has p
97  </P>  </P>
98  <P>  <P>
99  In some situations, you may already know that your strings are valid, and  In some situations, you may already know that your strings are valid, and
100  therefore want to skip these checks in order to improve performance. If you set  therefore want to skip these checks in order to improve performance, for
101  the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that  example in the case of a long subject string that is being scanned repeatedly
102  the pattern or subject it is given (respectively) contains only valid UTF-8  with different patterns. If you set the PCRE_NO_UTF8_CHECK flag at compile time
103  codes. In this case, it does not diagnose an invalid UTF-8 string.  or at run time, PCRE assumes that the pattern or subject it is given
104    (respectively) contains only valid UTF-8 codes. In this case, it does not
105    diagnose an invalid UTF-8 string.
106  </P>  </P>
107  <P>  <P>
108  If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what  If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what
# Line 228  Cambridge CB2 3QH, England. Line 231  Cambridge CB2 3QH, England.
231  REVISION  REVISION
232  </b><br>  </b><br>
233  <P>  <P>
234  Last updated: 13 January 2012  Last updated: 14 April 2012
235  <br>  <br>
236  Copyright &copy; 1997-2012 University of Cambridge.  Copyright &copy; 1997-2012 University of Cambridge.
237  <br>  <br>
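
For illustration, the validity rule described in the first hunk (values in the range U+0 to U+10FFFF, excluding U+D800 to U+DFFF) can be sketched as a standalone C check. This is not PCRE's own code: `utf8_is_valid` is a hypothetical helper name, and the rejection of overlong encodings is an assumption taken from RFC 3629 rather than from the text of this diff. A caller might run a check of this kind once on a long subject string and only then consider setting PCRE_NO_UTF8_CHECK for repeated scans.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: returns 1 if the first len bytes
 * of s are valid UTF-8 under RFC 3629 rules, otherwise 0. */
static int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;   /* decoded code point */
        size_t extra;       /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80) {                 /* single-byte (ASCII) character */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07;
        } else {
            /* Stray continuation byte, or a 5/6-byte lead byte that only
             * the older RFC 2279 rules allowed. */
            return 0;
        }

        if (len - i <= extra)           /* sequence truncated at end of input */
            return 0;

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Overlong encodings (assumption: rejected, as RFC 3629 requires). */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return 0;

        if (cp > 0x10FFFF)              /* beyond the Unicode range */
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) /* the "Surrogate Area" */
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

The sketch walks the entire string before reporting success, mirroring the sentence added in revision 959 that the whole string is checked before any other processing takes place.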
