63 |
The original operation of PCRE was on strings of one-byte characters. However, |
The original operation of PCRE was on strings of one-byte characters. However, |
64 |
there is now also support for UTF-8 character strings. To use this, you must |
there is now also support for UTF-8 character strings. To use this, you must |
65 |
build PCRE to include UTF-8 support, and then call <b>pcre_compile()</b> with |
build PCRE to include UTF-8 support, and then call <b>pcre_compile()</b> with |
66 |
the PCRE_UTF8 option. How this affects pattern matching is mentioned in several |
the PCRE_UTF8 option. There is also a special sequence that can be given at the |
67 |
places below. There is also a summary of UTF-8 features in the |
start of a pattern: |
68 |
|
<pre> |
69 |
|
(*UTF8) |
70 |
|
</pre> |
71 |
|
Starting a pattern with this sequence is equivalent to setting the PCRE_UTF8 |
72 |
|
option. This feature is not Perl-compatible. How setting UTF-8 mode affects |
73 |
|
pattern matching is mentioned in several places below. There is also a summary |
74 |
|
of UTF-8 features in the |
75 |
<a href="pcre.html#utf8support">section on UTF-8 support</a> |
<a href="pcre.html#utf8support">section on UTF-8 support</a> |
76 |
in the main |
in the main |
77 |
<a href="pcre.html"><b>pcre</b></a> |
<a href="pcre.html"><b>pcre</b></a> |
1038 |
J, U and X respectively. |
J, U and X respectively. |
1039 |
</P> |
</P> |
1040 |
<P> |
<P> |
1041 |
When an option change occurs at top level (that is, not inside subpattern |
When one of these option changes occurs at top level (that is, not inside |
1042 |
parentheses), the change applies to the remainder of the pattern that follows. |
subpattern parentheses), the change applies to the remainder of the pattern |
1043 |
If the change is placed right at the start of a pattern, PCRE extracts it into |
that follows. If the change is placed right at the start of a pattern, PCRE |
1044 |
the global options (and it will therefore show up in data extracted by the |
extracts it into the global options (and it will therefore show up in data |
1045 |
<b>pcre_fullinfo()</b> function). |
extracted by the <b>pcre_fullinfo()</b> function). |
1046 |
</P> |
</P> |
1047 |
<P> |
<P> |
1048 |
An option change within a subpattern (see below for a description of |
An option change within a subpattern (see below for a description of |
1065 |
<P> |
<P> |
1066 |
<b>Note:</b> There are other PCRE-specific options that can be set by the |
<b>Note:</b> There are other PCRE-specific options that can be set by the |
1067 |
application when the compile or match functions are called. In some cases the |
application when the compile or match functions are called. In some cases the |
1068 |
pattern can contain special leading sequences to override what the application |
pattern can contain special leading sequences such as (*CRLF) to override what |
1069 |
has set or what has been defaulted. Details are given in the section entitled |
the application has set or what has been defaulted. Details are given in the |
1070 |
|
section entitled |
1071 |
<a href="#newlineseq">"Newline sequences"</a> |
<a href="#newlineseq">"Newline sequences"</a> |
1072 |
above. |
above. There is also the (*UTF8) leading sequence that can be used to set UTF-8 |
1073 |
|
mode; this is equivalent to setting the PCRE_UTF8 option. |
1074 |
<a name="subpattern"></a></P> |
<a name="subpattern"></a></P> |
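<P>
The equivalence described above, where a leading (*UTF8) has the same effect as
the PCRE_UTF8 option, can be pictured with a minimal sketch. This is not PCRE's
own code: the function <b>sketch_leading_options()</b> and the flag
SKETCH_OPT_UTF8 are hypothetical names invented for illustration, and the real
PCRE_UTF8 constant is defined in pcre.h. The sketch only shows the idea of
recognizing the sequence at the very start of a pattern, turning it into an
option bit, and handing the rest of the pattern on for compilation.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value for this sketch only; it is NOT the real
   PCRE_UTF8 constant, which is defined in pcre.h. */
#define SKETCH_OPT_UTF8 0x1u

/* Hypothetical helper: returns the option bits implied by a leading
   "(*UTF8)" and stores in *rest a pointer to the remainder of the
   pattern. The sequence is recognized only at the very start. */
static unsigned int sketch_leading_options(const char *pattern,
                                           const char **rest)
{
    static const char utf8_seq[] = "(*UTF8)";
    const size_t len = sizeof utf8_seq - 1;   /* length without the NUL */
    unsigned int options = 0;

    assert(pattern != NULL && rest != NULL);

    if (strncmp(pattern, utf8_seq, len) == 0) {
        options |= SKETCH_OPT_UTF8;   /* same effect as passing the flag */
        pattern += len;               /* skip past the leading sequence */
    }

    *rest = pattern;
    return options;
}
```

<P>
Under these assumptions, calling the helper on "(*UTF8)caf" yields the UTF-8 bit
with "caf" as the remaining pattern, whereas "a(*UTF8)" yields no option bits
and is left unchanged, because the sequence is not at the start.
</P>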
1075 |
<br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br> |
<br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br> |
1076 |
<P> |
<P> |
2253 |
</P> |
</P> |
2254 |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
<br><a name="SEC28" href="#TOC1">REVISION</a><br> |
2255 |
<P> |
<P> |
2256 |
Last updated: 18 March 2009 |
Last updated: 11 April 2009 |
2257 |
<br> |
<br> |
2258 |
Copyright © 1997-2009 University of Cambridge. |
Copyright © 1997-2009 University of Cambridge. |
2259 |
<br> |
<br> |