
Diff of /code/trunk/doc/pcre.3


revision 517 by ph10, Mon Mar 1 17:45:08 2010 UTC
revision 518 by ph10, Tue May 18 15:47:01 2010 UTC
# Line 11  appeared in Perl are also available usin Line 11  appeared in Perl are also available usin
11  support for one or two .NET and Oniguruma syntax items, and there is an option  support for one or two .NET and Oniguruma syntax items, and there is an option
12  for requesting some minor changes that give better JavaScript compatibility.  for requesting some minor changes that give better JavaScript compatibility.
13  .P  .P
14  The current implementation of PCRE corresponds approximately with Perl 5.10,  The current implementation of PCRE corresponds approximately with Perl
15  including support for UTF-8 encoded strings and Unicode general category  5.10/5.11, including support for UTF-8 encoded strings and Unicode general
16  properties. However, UTF-8 and Unicode support has to be explicitly enabled; it  category properties. However, UTF-8 and Unicode support has to be explicitly
17  is not the default. The Unicode tables correspond to Unicode release 5.2.0.  enabled; it is not the default. The Unicode tables correspond to Unicode
18    release 5.2.0.
19  .P  .P
20  In addition to the Perl-compatible matching function, PCRE contains an  In addition to the Perl-compatible matching function, PCRE contains an
21  alternative function that matches the same compiled patterns in a different  alternative function that matches the same compiled patterns in a different
# Line 150  documentation. Line 151  documentation.
151  .  .
152  .  .
153  .\" HTML <a name="utf8support"></a>  .\" HTML <a name="utf8support"></a>
 .  
154  .SH "UTF-8 AND UNICODE PROPERTY SUPPORT"  .SH "UTF-8 AND UNICODE PROPERTY SUPPORT"
155  .rs  .rs
156  .sp  .sp
# Line 189  compatibility with Perl 5.6. PCRE does n Line 189  compatibility with Perl 5.6. PCRE does n
189  .  .
190  .  .
191  .\" HTML <a name="utf8strings"></a>  .\" HTML <a name="utf8strings"></a>
 .  
192  .SS "Validity of UTF-8 strings"  .SS "Validity of UTF-8 strings"
193  .rs  .rs
194  .sp  .sp
# Line 231  encoded in a UTF-8-like manner as per th Line 230  encoded in a UTF-8-like manner as per th
230  PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in this  PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in this
231  situation, you will have to apply your own validity check.  situation, you will have to apply your own validity check.
232  .  .
233    .
234  .SS "General comments about UTF-8 mode"  .SS "General comments about UTF-8 mode"
235  .rs  .rs
236  .sp  .sp
# Line 250  but its use can lead to some strange eff Line 250  but its use can lead to some strange eff
250  the alternative matching function, \fBpcre_dfa_exec()\fP.  the alternative matching function, \fBpcre_dfa_exec()\fP.
251  .P  .P
252  6. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly  6. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
253  test characters of any code value, but the characters that PCRE recognizes as  test characters of any code value, but, by default, the characters that PCRE
254  digits, spaces, or word characters remain the same set as before, all with  recognizes as digits, spaces, or word characters remain the same set as before,
255  values less than 256. This remains true even when PCRE includes Unicode  all with values less than 256. This remains true even when PCRE is built to
256  property support, because to do otherwise would slow down PCRE in many common  include Unicode property support, because to do otherwise would slow down PCRE
257  cases. If you really want to test for a wider sense of, say, "digit", you  in many common cases. Note that this also applies to \eb, because it is defined
258  must use Unicode property tests such as \ep{Nd}. Note that this also applies to  in terms of \ew and \eW. If you really want to test for a wider sense of, say,
259  \eb, because it is defined in terms of \ew and \eW.  "digit", you can use explicit Unicode property tests such as \ep{Nd}.
260    Alternatively, if you set the PCRE_UCP option, the way that the character
261    escapes work is changed so that Unicode properties are used to determine which
262    characters match. There are more details in the section on
263    .\" HTML <a href="pcrepattern.html#genericchartypes">
264    .\" </a>
265    generic character types
266    .\"
267    in the
268    .\" HREF
269    \fBpcrepattern\fP
270    .\"
271    documentation.
272  .P  .P
273  7. Similarly, characters that match the POSIX named character classes are all  7. Similarly, characters that match the POSIX named character classes are all
274  low-valued characters.  low-valued characters, unless the PCRE_UCP option is set.
275  .P  .P
276  8. However, the Perl 5.10 horizontal and vertical whitespace matching escapes  8. However, the Perl 5.10 horizontal and vertical whitespace matching escapes
277  (\eh, \eH, \ev, and \eV) do match all the appropriate Unicode characters.  (\eh, \eH, \ev, and \eV) do match all the appropriate Unicode characters,
278    whether or not PCRE_UCP is set.
279  .P  .P
280  9. Case-insensitive matching applies only to characters whose values are less  9. Case-insensitive matching applies only to characters whose values are less
281  than 128, unless PCRE is built with Unicode property support. Even when Unicode  than 128, unless PCRE is built with Unicode property support. Even when Unicode
# Line 293  two digits 10, at the domain cam.ac.uk. Line 306  two digits 10, at the domain cam.ac.uk.
306  .rs  .rs
307  .sp  .sp
308  .nf  .nf
309  Last updated: 01 March 2010  Last updated: 12 May 2010
310  Copyright (c) 1997-2010 University of Cambridge.  Copyright (c) 1997-2010 University of Cambridge.
311  .fi  .fi
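
Items 6 to 8 in this diff contrast two behaviours. By default the escapes \d, \s and \w match only a fixed set of low-valued characters. With PCRE_UCP set, Unicode properties decide what matches. The sketch below illustrates that contrast by analogy only: it uses Python's standard re module, not PCRE, and treating the two as comparable is an assumption. The defaults are also inverted: Python str patterns are Unicode-aware unless re.ASCII is given, whereas PCRE is narrow by default and PCRE_UCP widens it. The helper name `matches` and the sample characters are illustrative, not taken from the PCRE documentation.

```python
import re

# Illustrative sample characters (hypothetical choices, not from the PCRE docs).
ASCII_DIGIT = "7"
ARABIC_INDIC_DIGIT = "\u0663"  # ARABIC-INDIC DIGIT THREE, Unicode category Nd
ASCII_LETTER = "a"
ACCENTED_LETTER = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE


def matches(pattern, text, flags=0):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text, flags) is not None


# Narrow behaviour (re.ASCII): only ASCII digits and word characters match.
# This plays the role of PCRE's default, table-based matching.
narrow = {
    "ascii_digit": matches(r"\d", ASCII_DIGIT, re.ASCII),
    "arabic_indic_digit": matches(r"\d", ARABIC_INDIC_DIGIT, re.ASCII),
    "ascii_letter": matches(r"\w", ASCII_LETTER, re.ASCII),
    "accented_letter": matches(r"\w", ACCENTED_LETTER, re.ASCII),
}

# Wide behaviour (Python's default for str patterns): Unicode properties
# decide what matches. This plays the role of matching with PCRE_UCP set.
wide = {
    "ascii_digit": matches(r"\d", ASCII_DIGIT),
    "arabic_indic_digit": matches(r"\d", ARABIC_INDIC_DIGIT),
    "ascii_letter": matches(r"\w", ASCII_LETTER),
    "accented_letter": matches(r"\w", ACCENTED_LETTER),
}

if __name__ == "__main__":
    for name in narrow:
        print(f"{name}: narrow={narrow[name]} wide={wide[name]}")
```

In the narrow mode only the ASCII digit and the ASCII letter match. In the wide mode all four characters match, which is the kind of difference the revised text attributes to setting PCRE_UCP.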

