Diff of /code/trunk/doc/pcreunicode.3

revision 958 by ph10, Sat Mar 31 18:09:26 2012 UTC revision 959 by ph10, Sat Apr 14 16:16:58 2012 UTC
# Line 1  Line 1 
1  .TH PCREUNICODE 3 "13 January 2012" "PCRE 8.30"  .TH PCREUNICODE 3 "14 April 2012" "PCRE 8.30"
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .SH "UTF-8, UTF-16, AND UNICODE PROPERTY SUPPORT"  .SH "UTF-8, UTF-16, AND UNICODE PROPERTY SUPPORT"
# Line 70  compatibility with Perl 5.6. PCRE does n Line 70  compatibility with Perl 5.6. PCRE does n
70  .sp  .sp
71  When you set the PCRE_UTF8 flag, the byte strings passed as patterns and  When you set the PCRE_UTF8 flag, the byte strings passed as patterns and
72  subjects are (by default) checked for validity on entry to the relevant  subjects are (by default) checked for validity on entry to the relevant
73  functions. From release 7.3 of PCRE, the check is according the rules of RFC  functions. The entire string is checked before any other processing takes
74  3629, which are themselves derived from the Unicode specification. Earlier  place. From release 7.3 of PCRE, the check is according the rules of RFC 3629,
75  releases of PCRE followed the rules of RFC 2279, which allows the full range of  which are themselves derived from the Unicode specification. Earlier releases
76  31-bit values (0 to 0x7FFFFFFF). The current check allows only values in the  of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit
77  range U+0 to U+10FFFF, excluding U+D800 to U+DFFF.  values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0
78    to U+10FFFF, excluding U+D800 to U+DFFF.
79  .P  .P
80  The excluded code points are the "Surrogate Area" of Unicode. They are reserved  The excluded code points are the "Surrogate Area" of Unicode. They are reserved
81  for use by UTF-16, where they are used in pairs to encode codepoints with  for use by UTF-16, where they are used in pairs to encode codepoints with
# Line 89  of the failing character. The runtime fu Line 90  of the failing character. The runtime fu
90  detailed reason code if the caller has provided memory in which to do this.  detailed reason code if the caller has provided memory in which to do this.
91  .P  .P
92  In some situations, you may already know that your strings are valid, and  In some situations, you may already know that your strings are valid, and
93  therefore want to skip these checks in order to improve performance. If you set  therefore want to skip these checks in order to improve performance, for
94  the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE assumes that  example in the case of a long subject string that is being scanned repeatedly
95  the pattern or subject it is given (respectively) contains only valid UTF-8  with different patterns. If you set the PCRE_NO_UTF8_CHECK flag at compile time
96  codes. In this case, it does not diagnose an invalid UTF-8 string.  or at run time, PCRE assumes that the pattern or subject it is given
97    (respectively) contains only valid UTF-8 codes. In this case, it does not
98    diagnose an invalid UTF-8 string.
99  .P  .P
100  If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what  If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what
101  happens depends on why the string is invalid. If the string conforms to the  happens depends on why the string is invalid. If the string conforms to the
# Line 217  Cambridge CB2 3QH, England. Line 220  Cambridge CB2 3QH, England.
220  .rs  .rs
221  .sp  .sp
222  .nf  .nf
223  Last updated: 13 January 2012  Last updated: 14 April 2012
224  Copyright (c) 1997-2012 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
225  .fi  .fi
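
Note on the hunk at line 70: the validity rule quoted there (values U+0 to U+10FFFF, excluding the surrogates U+D800 to U+DFFF, with the whole string checked before other processing) can be illustrated with a short C sketch. This is not PCRE's own code. The function names `codepoint_allowed` and `first_invalid_utf8` are hypothetical, and rejecting overlong encodings is an assumption drawn from RFC 3629 rather than from the text of this diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE: 1 if the code point lies in
   U+0..U+10FFFF and is not in the surrogate area U+D800..U+DFFF. */
static int codepoint_allowed(uint32_t cp)
{
    if (cp > 0x10FFFF) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    return 1;
}

/* Hypothetical helper, not part of PCRE: scans the entire byte string and
   returns -1 if it is valid UTF-8 under the rule above, otherwise the byte
   offset of the first failing character. */
static long first_invalid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n, k;        /* n = number of continuation bytes expected */
        uint32_t cp, min;   /* min = smallest value allowed for this length */

        if (c < 0x80) { i++; continue; }                 /* ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return (long)i;       /* stray continuation or 5/6-byte lead */

        if (len - i <= n) return (long)i;                /* truncated */
        for (k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return (long)i;
            cp = (cp << 6) | (uint32_t)(s[i + k] & 0x3F);
        }
        /* Assumption from RFC 3629: overlong forms are invalid too. */
        if (cp < min || !codepoint_allowed(cp)) return (long)i;
        i += n + 1;
    }
    return -1;
}
```

A caller that has already run a check of this kind once on a long subject string is the situation in which the revised text at line 93 suggests setting PCRE_NO_UTF8_CHECK for subsequent matches.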
