Diff of /code/trunk/doc/pcreapi.3

revision 209 by ph10, Tue Aug 7 09:22:06 2007 UTC
revision 225 by ph10, Mon Aug 20 14:38:34 2007 UTC
# Line 601  page. Line 601  page.
601    PCRE_NO_UTF8_CHECK    PCRE_NO_UTF8_CHECK
602  .sp  .sp
603  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
604  automatically checked. Note that the check is for a syntactically valid UTF-8  automatically checked. There is a discussion about the
605  byte string, as defined by RFC 2279. It is \fInot\fP a check for a UTF-8 string  .\" HTML <a href="pcre.html#utf8strings">
606  of assigned or allowable Unicode code points.  .\" </a>
607  .P  validity of UTF-8 strings
608  If an invalid UTF-8 sequence of bytes is found, \fBpcre_compile()\fP returns an  .\"
609  error. If you already know that your pattern is valid, and you want to skip  in the main
610  this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option.  .\" HREF
611  When it is set, the effect of passing an invalid UTF-8 string as a pattern is  \fBpcre\fP
612  undefined. It may cause your program to crash. Note that this option can also  .\"
613  be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the UTF-8  page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_compile()\fP
614  validity checking of subject strings.  returns an error. If you already know that your pattern is valid, and you want
615    to skip this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK
616    option. When it is set, the effect of passing an invalid UTF-8 string as a
617    pattern is undefined. It may cause your program to crash. Note that this option
618    can also be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress
619    the UTF-8 validity checking of subject strings.
620  .  .
621  .  .
622  .SH "COMPILATION ERROR CODES"  .SH "COMPILATION ERROR CODES"
# Line 1193  pattern. When PCRE_NEWLINE_CRLF, PCRE_NE Line 1198  pattern. When PCRE_NEWLINE_CRLF, PCRE_NE
1198  set, and a match attempt fails when the current position is at a CRLF sequence,  set, and a match attempt fails when the current position is at a CRLF sequence,
1199  the match position is advanced by two characters instead of one, in other  the match position is advanced by two characters instead of one, in other
1200  words, to after the CRLF.  words, to after the CRLF.
1201    .P
1202    Anomalous effects can occur when CRLF is a valid newline sequence and explicit
1203    \er or \en escapes appear in the pattern. For example, the string "\er\enA"
1204    matches the unanchored pattern \enA but not [X\en]A. This happens because, in
1205    the first case, PCRE knows that the match must start with \en, and so it skips
1206    there before trying to match. In the second case, it has no knowledge about the
1207    starting character, so it starts matching at the beginning of the string, and
1208    on failing, skips over the CRLF as described above. However, if the pattern is
1209    studied, the match succeeds, because then PCRE once again knows where to start.
1210  .sp  .sp
1211    PCRE_NOTBOL    PCRE_NOTBOL
1212  .sp  .sp
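
The paragraph added at lines 1201-1209 is easier to follow with the start-position rule written out. The following is a minimal sketch of that rule only, not PCRE's actual code: the helper name next_start_offset and the crlf_is_newline flag are hypothetical and stand in for whichever newline option makes CRLF a valid newline sequence.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the PCRE API): given that a match
 * attempt has just failed at position pos, return the position at which
 * the next attempt starts. When CRLF is a valid newline sequence and pos
 * sits on a CRLF pair, the position advances by two instead of one.
 */
size_t next_start_offset(const char *subject, size_t length, size_t pos,
                         int crlf_is_newline)
{
    assert(subject != NULL && pos < length);

    if (crlf_is_newline && pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;   /* step over the whole CRLF */

    return pos + 1;       /* ordinary single-character advance */
}
```

With the subject "\r\nA" and CRLF enabled, a failure at offset 0 moves the next attempt to offset 2, so the \n at offset 1 is never tried as a starting point. This is why, per the added text, the unstudied pattern [X\en]A does not match.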
# Line 1234  code that demonstrates how to do this in Line 1248  code that demonstrates how to do this in
1248  .sp  .sp
1249  When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8  When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
1250  string is automatically checked when \fBpcre_exec()\fP is subsequently called.  string is automatically checked when \fBpcre_exec()\fP is subsequently called.
1251  Note that the check is for a syntactically valid UTF-8 byte string, as defined  The value of \fIstartoffset\fP is also checked to ensure that it points to the
1252  by RFC 2279. It is \fInot\fP a check for a UTF-8 string of assigned or  start of a UTF-8 character. There is a discussion about the validity of UTF-8
1253  allowable Unicode code points. The value of \fIstartoffset\fP is also checked  strings in the
1254  to ensure that it points to the start of a UTF-8 character. If an invalid UTF-8  .\" HTML <a href="pcre.html#utf8strings">
1255  sequence of bytes is found, \fBpcre_exec()\fP returns the error  .\" </a>
1256  PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value,  section on UTF-8 support
1257    .\"
1258    in the main
1259    .\" HREF
1260    \fBpcre\fP
1261    .\"
1262    page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns
1263    the error PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value,
1264  PCRE_ERROR_BADUTF8_OFFSET is returned.  PCRE_ERROR_BADUTF8_OFFSET is returned.
1265  .P  .P
1266  If you already know that your subject is valid, and you want to skip these  If you already know that your subject is valid, and you want to skip these
# Line 1874  Cambridge CB2 3QH, England. Line 1895  Cambridge CB2 3QH, England.
1895  .rs  .rs
1896  .sp  .sp
1897  .nf  .nf
1898  Last updated: 07 August 2007  Last updated: 20 August 2007
1899  Copyright (c) 1997-2007 University of Cambridge.  Copyright (c) 1997-2007 University of Cambridge.
1900  .fi  .fi
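
The hunk at line 1248 says that startoffset is checked to ensure it points to the start of a UTF-8 character. A caller that wants to pre-validate an offset could do something like the sketch below. The helper name offset_is_utf8_char_start is hypothetical, and treating an offset equal to the subject length as valid is an assumption of this sketch, not something the diff states.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the PCRE API): return 1 if offset
 * falls on a UTF-8 character boundary in subject, 0 otherwise.
 * A UTF-8 continuation byte has the bit pattern 10xxxxxx, so an offset
 * that lands on such a byte is in the middle of a character.
 */
int offset_is_utf8_char_start(const unsigned char *subject, size_t length,
                              size_t offset)
{
    assert(subject != NULL);

    if (offset > length)
        return 0;                 /* past the end of the subject */
    if (offset == length)
        return 1;                 /* assumption: end of subject is accepted */

    return (subject[offset] & 0xC0) != 0x80;
}
```

For the four-byte subject 'a', 0xC3, 0xA9, 'b', offsets 0, 1 and 3 are character starts, while offset 2 lands on the continuation byte 0xA9 and is rejected. This helper only tests the boundary; it does not validate the subject as a whole.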
