Diff of /code/trunk/doc/pcreapi.3

revision 182 by ph10, Wed Jun 13 15:09:54 2007 UTC | revision 225 by ph10, Mon Aug 20 14:38:34 2007 UTC
# Line 601  page. Line 601  page.
601    PCRE_NO_UTF8_CHECK    PCRE_NO_UTF8_CHECK
602  .sp  .sp
603  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
604  automatically checked. If an invalid UTF-8 sequence of bytes is found,  automatically checked. There is a discussion about the
605  \fBpcre_compile()\fP returns an error. If you already know that your pattern is  .\" HTML <a href="pcre.html#utf8strings">
606  valid, and you want to skip this check for performance reasons, you can set the  .\" </a>
607  PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid  validity of UTF-8 strings
608  UTF-8 string as a pattern is undefined. It may cause your program to crash.  .\"
609  Note that this option can also be passed to \fBpcre_exec()\fP and  in the main
610  \fBpcre_dfa_exec()\fP, to suppress the UTF-8 validity checking of subject  .\" HREF
611  strings.  \fBpcre\fP
612    .\"
613    page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_compile()\fP
614    returns an error. If you already know that your pattern is valid, and you want
615    to skip this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK
616    option. When it is set, the effect of passing an invalid UTF-8 string as a
617    pattern is undefined. It may cause your program to crash. Note that this option
618    can also be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress
619    the UTF-8 validity checking of subject strings.
620  .  .
621  .  .
622  .SH "COMPILATION ERROR CODES"  .SH "COMPILATION ERROR CODES"
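
Note on the PCRE_NO_UTF8_CHECK hunk above: the revised text says the option is only safe when the caller already knows the string is valid UTF-8. One way an application might establish that is to validate the data itself once, before any matching. The sketch below is a minimal illustration that assumes the RFC 3629 rules (no overlong forms, no surrogates, nothing above U+10FFFF). The name `is_valid_utf8` is hypothetical and not part of PCRE, and PCRE's own check may differ in detail.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: returns 1 if the first `len`
   bytes of `s` are well-formed UTF-8 under RFC 3629 rules, 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {                  /* single-byte (ASCII) character */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) { /* 2-byte sequence */
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) { /* 3-byte sequence */
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) { /* 4-byte sequence */
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                    /* stray continuation byte or 0xF8..0xFF */
        }

        if (extra > len - i - 1)
            return 0;                    /* sequence is truncated */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;                /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;                    /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;                    /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                    /* UTF-16 surrogate */

        i += extra + 1;
    }
    return 1;
}
```

If a subject passes a check like this once, a caller that matches many patterns against it could then pass PCRE_NO_UTF8_CHECK to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP, as the text describes, so the same bytes are not checked again on every call.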
# Line 669  out of use. To avoid confusion, they hav Line 677  out of use. To avoid confusion, they hav
677    47  unknown property name after \eP or \ep    47  unknown property name after \eP or \ep
678    48  subpattern name is too long (maximum 32 characters)    48  subpattern name is too long (maximum 32 characters)
679    49  too many named subpatterns (maximum 10,000)    49  too many named subpatterns (maximum 10,000)
680    50  repeated subpattern is too long    50  [this code is not in use]
681    51  octal value is greater than \e377 (not in UTF-8 mode)    51  octal value is greater than \e377 (not in UTF-8 mode)
682    52  internal error: overran compiling workspace    52  internal error: overran compiling workspace
683    53  internal error: previously-checked referenced subpattern not found    53  internal error: previously-checked referenced subpattern not found
# Line 947  matching is used. Line 955  matching is used.
955  Return a copy of the options with which the pattern was compiled. The fourth  Return a copy of the options with which the pattern was compiled. The fourth
956  argument should point to an \fBunsigned long int\fP variable. These option bits  argument should point to an \fBunsigned long int\fP variable. These option bits
957  are those specified in the call to \fBpcre_compile()\fP, modified by any  are those specified in the call to \fBpcre_compile()\fP, modified by any
958  top-level option settings within the pattern itself.  top-level option settings at the start of the pattern itself. In other words,
959    they are the options that will be in force when matching starts. For example,
960    if the pattern /(?im)abc(?-i)d/ is compiled with the PCRE_EXTENDED option, the
961    result is PCRE_CASELESS, PCRE_MULTILINE, and PCRE_EXTENDED.
962  .P  .P
963  A pattern is automatically anchored by PCRE if all of its top-level  A pattern is automatically anchored by PCRE if all of its top-level
964  alternatives begin with one of the following:  alternatives begin with one of the following:
# Line 1187  pattern. When PCRE_NEWLINE_CRLF, PCRE_NE Line 1198  pattern. When PCRE_NEWLINE_CRLF, PCRE_NE
1198  set, and a match attempt fails when the current position is at a CRLF sequence,  set, and a match attempt fails when the current position is at a CRLF sequence,
1199  the match position is advanced by two characters instead of one, in other  the match position is advanced by two characters instead of one, in other
1200  words, to after the CRLF.  words, to after the CRLF.
1201    .P
1202    Anomalous effects can occur when CRLF is a valid newline sequence and explicit
1203    \er or \en escapes appear in the pattern. For example, the string "\er\enA"
1204    matches the unanchored pattern \enA but not [X\en]A. This happens because, in
1205    the first case, PCRE knows that the match must start with \en, and so it skips
1206    there before trying to match. In the second case, it has no knowledge about the
1207    starting character, so it starts matching at the beginning of the string, and
1208    on failing, skips over the CRLF as described above. However, if the pattern is
1209    studied, the match succeeds, because then PCRE once again knows where to start.
1210  .sp  .sp
1211    PCRE_NOTBOL    PCRE_NOTBOL
1212  .sp  .sp
# Line 1229  code that demonstrates how to do this in Line 1249  code that demonstrates how to do this in
1249  When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8  When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
1250  string is automatically checked when \fBpcre_exec()\fP is subsequently called.  string is automatically checked when \fBpcre_exec()\fP is subsequently called.
1251  The value of \fIstartoffset\fP is also checked to ensure that it points to the  The value of \fIstartoffset\fP is also checked to ensure that it points to the
1252  start of a UTF-8 character. If an invalid UTF-8 sequence of bytes is found,  start of a UTF-8 character. There is a discussion about the validity of UTF-8
1253  \fBpcre_exec()\fP returns the error PCRE_ERROR_BADUTF8. If \fIstartoffset\fP  strings in the
1254  contains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.  .\" HTML <a href="pcre.html#utf8strings">
1255    .\" </a>
1256    section on UTF-8 support
1257    .\"
1258    in the main
1259    .\" HREF
1260    \fBpcre\fP
1261    .\"
1262    page. If an invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fP returns
1263    the error PCRE_ERROR_BADUTF8. If \fIstartoffset\fP contains an invalid value,
1264    PCRE_ERROR_BADUTF8_OFFSET is returned.
1265  .P  .P
1266  If you already know that your subject is valid, and you want to skip these  If you already know that your subject is valid, and you want to skip these
1267  checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when  checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when
# Line 1463  The internal recursion limit, as specifi Line 1493  The internal recursion limit, as specifi
1493  field in a \fBpcre_extra\fP structure (or defaulted) was reached. See the  field in a \fBpcre_extra\fP structure (or defaulted) was reached. See the
1494  description above.  description above.
1495  .sp  .sp
   PCRE_ERROR_NULLWSLIMIT    (-22)  
 .sp  
 When a group that can match an empty substring is repeated with an unbounded  
 upper limit, the subject position at the start of the group must be remembered,  
 so that a test for an empty string can be made when the end of the group is  
 reached. Some workspace is required for this; if it runs out, this error is  
 given.  
 .sp  
1496    PCRE_ERROR_BADNEWLINE     (-23)    PCRE_ERROR_BADNEWLINE     (-23)
1497  .sp  .sp
1498  An invalid combination of PCRE_NEWLINE_\fIxxx\fP options was given.  An invalid combination of PCRE_NEWLINE_\fIxxx\fP options was given.
1499  .P  .P
1500  Error numbers -16 to -20 are not used by \fBpcre_exec()\fP.  Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP.
1501  .  .
1502  .  .
1503  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"
# Line 1640  example is shown in the Line 1662  example is shown in the
1662  .\" HREF  .\" HREF
1663  \fBpcrepattern\fP  \fBpcrepattern\fP
1664  .\"  .\"
1665  documentation. When duplicates are present, \fBpcre_copy_named_substring()\fP  documentation.
1666  and \fBpcre_get_named_substring()\fP return the first substring corresponding  .P
1667  to the given name that is set. If none are set, an empty string is returned.  When duplicates are present, \fBpcre_copy_named_substring()\fP and
1668  The \fBpcre_get_stringnumber()\fP function returns one of the numbers that are  \fBpcre_get_named_substring()\fP return the first substring corresponding to
1669  associated with the name, but it is not defined which it is.  the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING (-7) is
1670  .sp  returned; no data is returned. The \fBpcre_get_stringnumber()\fP function
1671    returns one of the numbers that are associated with the name, but it is not
1672    defined which it is.
1673    .P
1674  If you want to get full details of all captured substrings for a given name,  If you want to get full details of all captured substrings for a given name,
1675  you must use the \fBpcre_get_stringtable_entries()\fP function. The first  you must use the \fBpcre_get_stringtable_entries()\fP function. The first
1676  argument is the compiled pattern, and the second is the name. The third and  argument is the compiled pattern, and the second is the name. The third and
# Line 1870  Cambridge CB2 3QH, England. Line 1895  Cambridge CB2 3QH, England.
1895  .rs  .rs
1896  .sp  .sp
1897  .nf  .nf
1898  Last updated: 13 June 2007  Last updated: 20 August 2007
1899  Copyright (c) 1997-2007 University of Cambridge.  Copyright (c) 1997-2007 University of Cambridge.
1900  .fi  .fi
