601 |
PCRE_NO_UTF8_CHECK |
PCRE_NO_UTF8_CHECK |
602 |
.sp |
.sp |
603 |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
604 |
automatically checked. If an invalid UTF-8 sequence of bytes is found, |
automatically checked. Note that the check is for a syntactically valid UTF-8 |
605 |
\fBpcre_compile()\fP returns an error. If you already know that your pattern is |
byte string, as defined by RFC 2279. It is \fInot\fP a check for a UTF-8 string |
606 |
valid, and you want to skip this check for performance reasons, you can set the |
of assigned or allowable Unicode code points. |
607 |
PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid |
.P |
608 |
UTF-8 string as a pattern is undefined. It may cause your program to crash. |
If an invalid UTF-8 sequence of bytes is found, \fBpcre_compile()\fP returns an |
609 |
Note that this option can also be passed to \fBpcre_exec()\fP and |
error. If you already know that your pattern is valid, and you want to skip |
610 |
\fBpcre_dfa_exec()\fP, to suppress the UTF-8 validity checking of subject |
this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option. |
611 |
strings. |
When it is set, the effect of passing an invalid UTF-8 string as a pattern is |
612 |
|
undefined. It may cause your program to crash. Note that this option can also |
613 |
|
be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the UTF-8 |
614 |
|
validity checking of subject strings. |
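The distinction drawn above between a syntactically valid UTF-8 byte string and a string of assigned code points can be illustrated with a small structural check. The following is only a sketch under stated assumptions: \fButf8_extra_bytes()\fP and \fButf8_is_well_formed()\fP are hypothetical helpers, not part of the PCRE API and not PCRE's own checking code. They test only the lead-byte and continuation-byte layout (1 to 6 bytes per character, as in RFC 2279) and make no attempt to reject overlong encodings.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API. Returns the number of
   continuation bytes implied by lead byte c, or -1 if c cannot start a
   character under the RFC 2279 layout (1 to 6 bytes per character). */
static int utf8_extra_bytes(unsigned char c)
{
    if (c < 0x80) return 0;   /* 0xxxxxxx: single byte */
    if (c < 0xc0) return -1;  /* 10xxxxxx: continuation byte, not a lead */
    if (c < 0xe0) return 1;   /* 110xxxxx */
    if (c < 0xf0) return 2;   /* 1110xxxx */
    if (c < 0xf8) return 3;   /* 11110xxx */
    if (c < 0xfc) return 4;   /* 111110xx */
    if (c < 0xfe) return 5;   /* 1111110x */
    return -1;                /* 0xfe and 0xff are never lead bytes */
}

/* Hypothetical helper. Returns 1 if the buffer has a well-formed byte
   structure, 0 otherwise. It does not look at which code points the
   bytes encode, so unassigned code points pass the check. */
static int utf8_is_well_formed(const char *s, size_t length)
{
    size_t i = 0;
    while (i < length) {
        int extra = utf8_extra_bytes((unsigned char)s[i++]);
        if (extra < 0 || (size_t)extra > length - i) return 0;
        while (extra-- > 0) {
            /* Every following byte must look like 10xxxxxx. */
            if (((unsigned char)s[i++] & 0xc0) != 0x80) return 0;
        }
    }
    return 1;
}
```

A caller that has already run a check of this kind on its input might reasonably decide to set PCRE_NO_UTF8_CHECK. Note that the byte sequence for U+FFFF passes this sketch even though it does not encode an assigned character.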
615 |
. |
. |
616 |
. |
. |
617 |
.SH "COMPILATION ERROR CODES" |
.SH "COMPILATION ERROR CODES" |
1234 |
.sp |
.sp |
1235 |
When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8 |
When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8 |
1236 |
string is automatically checked when \fBpcre_exec()\fP is subsequently called. |
string is automatically checked when \fBpcre_exec()\fP is subsequently called. |
1237 |
The value of \fIstartoffset\fP is also checked to ensure that it points to the |
Note that the check is for a syntactically valid UTF-8 byte string, as defined |
1238 |
start of a UTF-8 character. If an invalid UTF-8 sequence of bytes is found, |
by RFC 2279. It is \fInot\fP a check for a UTF-8 string of assigned or |
1239 |
\fBpcre_exec()\fP returns the error PCRE_ERROR_BADUTF8. If \fIstartoffset\fP |
allowable Unicode code points. The value of \fIstartoffset\fP is also checked |
1240 |
contains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned. |
to ensure that it points to the start of a UTF-8 character. If an invalid UTF-8 |
1241 |
|
sequence of bytes is found, \fBpcre_exec()\fP returns the error |
1242 |
|
PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value, |
1243 |
|
PCRE_ERROR_BADUTF8_OFFSET is returned. |
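The requirement that \fIstartoffset\fP point to the start of a UTF-8 character can be sketched as a test on a single byte: an offset is unsuitable if the byte there is a continuation byte of the form 10xxxxxx. The helper below, \fButf8_offset_is_char_start()\fP, is hypothetical and is not part of the PCRE API. Treating an offset equal to the subject length as acceptable is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API. Returns 1 if offset
   points to the start of a character in subject, 0 otherwise. */
static int utf8_offset_is_char_start(const char *subject, size_t length,
                                     size_t offset)
{
    if (offset > length) return 0;   /* beyond the end of the subject */
    if (offset == length) return 1;  /* assumption: the end is acceptable */
    /* A continuation byte (10xxxxxx) lies in the middle of a character. */
    return ((unsigned char)subject[offset] & 0xc0) != 0x80;
}
```

For the three-byte subject "a" followed by the two-byte encoding of U+00E9, offsets 0, 1 and 3 are accepted by this sketch and offset 2 is rejected, because it falls inside the two-byte character.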
1244 |
.P |
.P |
1245 |
If you already know that your subject is valid, and you want to skip these |
If you already know that your subject is valid, and you want to skip these |
1246 |
checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when |
checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when |
1874 |
.rs |
.rs |
1875 |
.sp |
.sp |
1876 |
.nf |
.nf |
1877 |
Last updated: 30 July 2007 |
Last updated: 07 August 2007 |
1878 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
1879 |
.fi |
.fi |