601 |
PCRE_NO_UTF8_CHECK |
PCRE_NO_UTF8_CHECK |
602 |
.sp |
.sp |
603 |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
604 |
automatically checked. Note that the check is for a syntactically valid UTF-8 |
automatically checked. There is a discussion about the |
605 |
byte string, as defined by RFC 2279. It is \fInot\fP a check for a UTF-8 string |
.\" HTML <a href="pcre.html#utf8strings"> |
606 |
of assigned or allowable Unicode code points. |
.\" </a> |
607 |
.P |
validity of UTF-8 strings |
608 |
If an invalid UTF-8 sequence of bytes is found, \fBpcre_compile()\fP returns an |
.\" |
609 |
error. If you already know that your pattern is valid, and you want to skip |
in the main |
610 |
this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option. |
.\" HREF |
611 |
When it is set, the effect of passing an invalid UTF-8 string as a pattern is |
\fBpcre\fP |
612 |
undefined. It may cause your program to crash. Note that this option can also |
.\" |
613 |
be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the UTF-8 |
page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_compile()\fP |
614 |
validity checking of subject strings. |
returns an error. If you already know that your pattern is valid, and you want |
615 |
|
to skip this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK |
616 |
|
option. When it is set, the effect of passing an invalid UTF-8 string as a |
617 |
|
pattern is undefined. It may cause your program to crash. Note that this option |
618 |
|
can also be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress |
619 |
|
the UTF-8 validity checking of subject strings. |
620 |
. |
. |
621 |
. |
. |
622 |
.SH "COMPILATION ERROR CODES" |
.SH "COMPILATION ERROR CODES" |
1239 |
.sp |
.sp |
1240 |
When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8 |
When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8 |
1241 |
string is automatically checked when \fBpcre_exec()\fP is subsequently called. |
string is automatically checked when \fBpcre_exec()\fP is subsequently called. |
1242 |
Note that the check is for a syntactically valid UTF-8 byte string, as defined |
The value of \fIstartoffset\fP is also checked to ensure that it points to the |
1243 |
by RFC 2279. It is \fInot\fP a check for a UTF-8 string of assigned or |
start of a UTF-8 character. There is a discussion about the validity of UTF-8 |
1244 |
allowable Unicode code points. The value of \fIstartoffset\fP is also checked |
strings in the |
1245 |
to ensure that it points to the start of a UTF-8 character. If an invalid UTF-8 |
.\" HTML <a href="pcre.html#utf8strings"> |
1246 |
sequence of bytes is found, \fBpcre_exec()\fP returns the error |
.\" </a> |
1247 |
PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value, |
section on UTF-8 support |
1248 |
|
.\" |
1249 |
|
in the main |
1250 |
|
.\" HREF |
1251 |
|
\fBpcre\fP |
1252 |
|
.\" |
1253 |
|
page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns |
1254 |
|
the error PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value, |
1255 |
PCRE_ERROR_BADUTF8_OFFSET is returned. |
PCRE_ERROR_BADUTF8_OFFSET is returned. |
1256 |
.P |
.P |
1257 |
If you already know that your subject is valid, and you want to skip these |
If you already know that your subject is valid, and you want to skip these |
1886 |
.rs |
.rs |
1887 |
.sp |
.sp |
1888 |
.nf |
.nf |
1889 |
Last updated: 07 August 2007 |
Last updated: 09 August 2007 |
1890 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
1891 |
.fi |
.fi |