116 |
|
|
117 |
The following comments apply when PCRE is running in UTF-8 mode: |
The following comments apply when PCRE is running in UTF-8 mode: |
118 |
|
|
119 |
1. PCRE assumes that the strings it is given contain valid UTF-8 codes. It does |
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects |
120 |
not diagnose invalid UTF-8 strings. If you pass invalid UTF-8 strings to PCRE, |
are checked for validity on entry to the relevant functions. If an invalid |
121 |
the results are undefined. |
UTF-8 string is passed, an error return is given. In some situations, you may |
122 |
|
already know that your strings are valid, and therefore want to skip these |
123 |
|
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag |
124 |
|
at compile time or at run time, PCRE assumes that the pattern or subject it |
125 |
|
is given (respectively) contains only valid UTF-8 codes. In this case, it does |
126 |
|
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to |
127 |
|
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program |
128 |
|
may crash. |
129 |
|
|
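The validity check described in item 1 can be pictured with a small standalone routine. The sketch below is an illustration only and is not PCRE's own code: is_valid_utf8() is a hypothetical helper name, and it applies the RFC 3629 rules (at most four bytes per character, no overlong forms, no surrogates), which is an assumption, since the set of sequences PCRE itself accepts may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: returns 1 if the len bytes at s
   are well-formed UTF-8 under the RFC 3629 rules, and 0 otherwise. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;        /* continuation bytes expected after c */
        size_t k;
        unsigned long cp;    /* code point being decoded */
        unsigned long min;   /* smallest code point legal at this length */

        if (c < 0x80) {                      /* single-byte (ASCII) */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {     /* 2-byte sequence */
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {     /* 3-byte sequence */
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {     /* 4-byte sequence */
            extra = 3; cp = c & 0x07; min = 0x10000UL;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < extra)
            return 0;   /* sequence is cut off by the end of the string */

        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        if (cp < min)
            return 0;   /* overlong encoding */
        if (cp > 0x10FFFFUL)
            return 0;   /* beyond the Unicode range */
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0;   /* UTF-16 surrogate */

        i += extra + 1;
    }
    return 1;
}
```

A caller that has already validated its input with a routine of this kind, once and up front, is in the situation item 1 describes: the strings are known to be valid, so setting PCRE_NO_UTF8_CHECK to skip the repeated checks may be reasonable. If the caller's notion of validity is looser than PCRE's, the results are undefined, as stated above.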
130 |
2. In a pattern, the escape sequence \\x{...}, where the contents of the braces |
2. In a pattern, the escape sequence \\x{...}, where the contents of the braces |
131 |
is a string of hexadecimal digits, is interpreted as a UTF-8 character whose |
is a string of hexadecimal digits, is interpreted as a UTF-8 character whose |
169 |
Phone: +44 1223 334714 |
Phone: +44 1223 334714 |
170 |
|
|
171 |
.in 0 |
.in 0 |
172 |
Last updated: 04 February 2003 |
Last updated: 20 August 2003 |
173 |
.br |
.br |
174 |
Copyright (c) 1997-2003 University of Cambridge. |
Copyright (c) 1997-2003 University of Cambridge. |