Diff of /code/trunk/doc/pcreapi.3

revision 887 by ph10, Tue Jan 17 14:32:32 2012 UTC revision 964 by ph10, Fri May 4 13:03:39 2012 UTC
# Line 1  Line 1 
1  .TH PCREAPI 3  .TH PCREAPI 3 "04 May 2012" "PCRE 8.31"
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .sp  .sp
# Line 148  just use different data types for their Line 148  just use different data types for their
148  start with \fBpcre16_\fP instead of \fBpcre_\fP. For every option that has UTF8  start with \fBpcre16_\fP instead of \fBpcre_\fP. For every option that has UTF8
149  in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with  in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with
150  UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit  UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit
151  option names define the same bit values.  option names define the same bit values.
152  .P  .P
153  References to bytes and UTF-8 in this document should be read as references to  References to bytes and UTF-8 in this document should be read as references to
154  16-bit data quantities and UTF-16 when using the 16-bit library, unless  16-bit data quantities and UTF-16 when using the 16-bit library, unless
# Line 157  library are given in the Line 157  library are given in the
157  .\" HREF  .\" HREF
158  \fBpcre16\fP  \fBpcre16\fP
159  .\"  .\"
160  page.  page.
161  .  .
162  .  .
163  .SH "PCRE API OVERVIEW"  .SH "PCRE API OVERVIEW"
# Line 392  not recognized. The following informatio Line 392  not recognized. The following informatio
392    PCRE_CONFIG_UTF8    PCRE_CONFIG_UTF8
393  .sp  .sp
394  The output is an integer that is set to one if UTF-8 support is available;  The output is an integer that is set to one if UTF-8 support is available;
395  otherwise it is set to zero. If this option is given to the 16-bit version of  otherwise it is set to zero. If this option is given to the 16-bit version of
396  this function, \fBpcre16_config()\fP, the result is PCRE_ERROR_BADOPTION.  this function, \fBpcre16_config()\fP, the result is PCRE_ERROR_BADOPTION.
397  .sp  .sp
398    PCRE_CONFIG_UTF16    PCRE_CONFIG_UTF16
# Line 415  compiling is available; otherwise it is Line 415  compiling is available; otherwise it is
415    PCRE_CONFIG_JITTARGET    PCRE_CONFIG_JITTARGET
416  .sp  .sp
417  The output is a pointer to a zero-terminated "const char *" string. If JIT  The output is a pointer to a zero-terminated "const char *" string. If JIT
418  support is available, the string contains the name of the architecture for  support is available, the string contains the name of the architecture for
419  which the JIT compiler is configured, for example "x86 32bit (little endian +  which the JIT compiler is configured, for example "x86 32bit (little endian +
420  unaligned)". If JIT support is not available, the result is NULL.  unaligned)". If JIT support is not available, the result is NULL.
421  .sp  .sp
422    PCRE_CONFIG_NEWLINE    PCRE_CONFIG_NEWLINE
# Line 526  documentation). For those options that c Line 526  documentation). For those options that c
526  the pattern, the contents of the \fIoptions\fP argument specifies their  the pattern, the contents of the \fIoptions\fP argument specifies their
527  settings at the start of compilation and execution. The PCRE_ANCHORED,  settings at the start of compilation and execution. The PCRE_ANCHORED,
528  PCRE_BSR_\fIxxx\fP, PCRE_NEWLINE_\fIxxx\fP, PCRE_NO_UTF8_CHECK, and  PCRE_BSR_\fIxxx\fP, PCRE_NEWLINE_\fIxxx\fP, PCRE_NO_UTF8_CHECK, and
529  PCRE_NO_START_OPT options can be set at the time of matching as well as at  PCRE_NO_START_OPTIMIZE options can be set at the time of matching as well as at
530  compile time.  compile time.
531  .P  .P
532  If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.  If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
# Line 742  preceding sequences should be recognized Line 742  preceding sequences should be recognized
742  that any Unicode newline sequence should be recognized. The Unicode newline  that any Unicode newline sequence should be recognized. The Unicode newline
743  sequences are the three just mentioned, plus the single characters VT (vertical  sequences are the three just mentioned, plus the single characters VT (vertical
744  tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line  tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
745  separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit  separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
746  library, the last two are recognized only in UTF-8 mode.  library, the last two are recognized only in UTF-8 mode.
747  .P  .P
748  The newline setting in the options word uses three bits that are treated  The newline setting in the options word uses three bits that are treated
# Line 819  page. Line 819  page.
819  .sp  .sp
820    PCRE_NO_UTF8_CHECK    PCRE_NO_UTF8_CHECK
821  .sp  .sp
822  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8
823  string is automatically checked. There is a discussion about the  string is automatically checked. There is a discussion about the
824  .\" HTML <a href="pcreunicode.html#utf8strings">  .\" HTML <a href="pcreunicode.html#utf8strings">
825  .\" </a>  .\" </a>
826  validity of UTF-8 strings  validity of UTF-8 strings
827  .\"  .\"
828  in the  in the
829  .\" HREF  .\" HREF
# Line 843  validity checking of subject strings. Line 843  validity checking of subject strings.
843  .sp  .sp
844  The following table lists the error codes that may be returned by  The following table lists the error codes that may be returned by
845  \fBpcre_compile2()\fP, along with the error messages that may be returned by  \fBpcre_compile2()\fP, along with the error messages that may be returned by
846  both compiling functions. Note that error messages are always 8-bit ASCII  both compiling functions. Note that error messages are always 8-bit ASCII
847  strings, even in 16-bit mode. As PCRE has developed, some error codes have  strings, even in 16-bit mode. As PCRE has developed, some error codes have
848  fallen out of use. To avoid confusion, they have not been re-used.  fallen out of use. To avoid confusion, they have not been re-used.
849  .sp  .sp
# Line 917  fallen out of use. To avoid confusion, t Line 917  fallen out of use. To avoid confusion, t
917    65  different names for subpatterns of the same number are    65  different names for subpatterns of the same number are
918          not allowed          not allowed
919    66  (*MARK) must have an argument    66  (*MARK) must have an argument
920    67  this version of PCRE is not compiled with Unicode property    67  this version of PCRE is not compiled with Unicode property
921          support          support
922    68  \ec must be followed by an ASCII character    68  \ec must be followed by an ASCII character
923    69  \ek is not followed by a braced, angle-bracketed, or quoted name    69  \ek is not followed by a braced, angle-bracketed, or quoted name
924    70  internal error: unknown opcode in find_fixedlength()    70  internal error: unknown opcode in find_fixedlength()
925    71  \eN is not supported in a class    71  \eN is not supported in a class
926    72  too many forward references    72  too many forward references
927    73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)    73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
928    74  invalid UTF-16 string (specifically UTF-16)    74  invalid UTF-16 string (specifically UTF-16)
929      75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
930  .sp  .sp
931  The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may  The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
932  be used if the limits were changed when PCRE was built.  be used if the limits were changed when PCRE was built.
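
Error 73 in the table above rejects code points in the surrogate range. The range test it describes can be sketched as follows; the helper name is hypothetical and is not part of the PCRE API.

```c
#include <stdint.h>

/* Hypothetical helper, not a PCRE function: returns 1 if cp lies in the
   surrogate range 0xd800..0xdfff that error 73 reports as disallowed,
   and 0 otherwise. */
static int is_surrogate_code_point(uint32_t cp)
{
    return cp >= 0xd800 && cp <= 0xdfff;
}
```

A caller that builds patterns from numeric code points could use such a check to reject bad input before calling the compile function.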
# Line 962  If studying the pattern does not produce Line 963  If studying the pattern does not produce
963  wants to pass any of the other fields to \fBpcre_exec()\fP or  wants to pass any of the other fields to \fBpcre_exec()\fP or
964  \fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block.  \fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block.
965  .P  .P
966  The second argument of \fBpcre_study()\fP contains option bits. There is only  The second argument of \fBpcre_study()\fP contains option bits. There are three
967  one option: PCRE_STUDY_JIT_COMPILE. If this is set, and the just-in-time  options:
968  compiler is available, the pattern is further compiled into machine code that  .sp
969  executes much faster than the \fBpcre_exec()\fP matching function. If    PCRE_STUDY_JIT_COMPILE
970  the just-in-time compiler is not available, this option is ignored. All other    PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
971  bits in the \fIoptions\fP argument must be zero.    PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
972    .sp
973    If any of these are set, and the just-in-time compiler is available, the
974    pattern is further compiled into machine code that executes much faster than
975    the \fBpcre_exec()\fP interpretive matching function. If the just-in-time
976    compiler is not available, these options are ignored. All other bits in the
977    \fIoptions\fP argument must be zero.
978  .P  .P
979  JIT compilation is a heavyweight optimization. It can take some time for  JIT compilation is a heavyweight optimization. It can take some time for
980  patterns to be analyzed, and for one-off matches and simple patterns the  patterns to be analyzed, and for one-off matches and simple patterns the
# Line 991  When you are finished with a pattern, yo Line 998  When you are finished with a pattern, yo
998  study data by calling \fBpcre_free_study()\fP. This function was added to the  study data by calling \fBpcre_free_study()\fP. This function was added to the
999  API for release 8.20. For earlier versions, the memory could be freed with  API for release 8.20. For earlier versions, the memory could be freed with
1000  \fBpcre_free()\fP, just like the pattern itself. This will still work in cases  \fBpcre_free()\fP, just like the pattern itself. This will still work in cases
1001  where PCRE_STUDY_JIT_COMPILE is not used, but it is advisable to change to the  where JIT optimization is not used, but it is advisable to change to the new
1002  new function when convenient.  function when convenient.
1003  .P  .P
1004  This is a typical way in which \fBpcre_study\fP() is used (except that in a  This is a typical way in which \fBpcre_study\fP() is used (except that in a
1005  real application there should be tests for errors):  real application there should be tests for errors):
# Line 1025  created. This speeds up finding a positi Line 1032  created. This speeds up finding a positi
1032  matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256.)  matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256.)
1033  .P  .P
1034  These two optimizations apply to both \fBpcre_exec()\fP and  These two optimizations apply to both \fBpcre_exec()\fP and
1035  \fBpcre_dfa_exec()\fP. However, they are not used by \fBpcre_exec()\fP if  \fBpcre_dfa_exec()\fP, and the information is also used by the JIT compiler.
1036  \fBpcre_study()\fP is called with the PCRE_STUDY_JIT_COMPILE option, and  The optimizations can be disabled by setting the PCRE_NO_START_OPTIMIZE option
1037  just-in-time compiling is successful. The optimizations can be disabled by  when calling \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP, but if this is done,
1038  setting the PCRE_NO_START_OPTIMIZE option when calling \fBpcre_exec()\fP or  JIT execution is also disabled. You might want to do this if your pattern
1039  \fBpcre_dfa_exec()\fP. You might want to do this if your pattern contains  contains callouts or (*MARK) and you want to make use of these facilities in
1040  callouts or (*MARK) (which cannot be handled by the JIT compiler), and you want  cases where matching fails. See the discussion of PCRE_NO_START_OPTIMIZE
 to make use of these facilities in cases where matching fails. See the  
 discussion of PCRE_NO_START_OPTIMIZE  
1041  .\" HTML <a href="#execoptions">  .\" HTML <a href="#execoptions">
1042  .\" </a>  .\" </a>
1043  below.  below.
# Line 1120  the following negative numbers: Line 1125  the following negative numbers:
1125    PCRE_ERROR_NULL           the argument \fIcode\fP was NULL    PCRE_ERROR_NULL           the argument \fIcode\fP was NULL
1126                              the argument \fIwhere\fP was NULL                              the argument \fIwhere\fP was NULL
1127    PCRE_ERROR_BADMAGIC       the "magic number" was not found    PCRE_ERROR_BADMAGIC       the "magic number" was not found
1128    PCRE_ERROR_BADENDIANNESS  the pattern was compiled with different    PCRE_ERROR_BADENDIANNESS  the pattern was compiled with different
1129                              endianness                              endianness
1130    PCRE_ERROR_BADOPTION      the value of \fIwhat\fP was invalid    PCRE_ERROR_BADOPTION      the value of \fIwhat\fP was invalid
1131  .sp  .sp
1132  The "magic number" is placed at the start of each compiled pattern as a simple  The "magic number" is placed at the start of each compiled pattern as a simple
1133  check against passing an arbitrary memory pointer. The endianness error can  check against passing an arbitrary memory pointer. The endianness error can
1134  occur if a compiled pattern is saved and reloaded on a different host. Here is  occur if a compiled pattern is saved and reloaded on a different host. Here is
1135  a typical call of \fBpcre_fullinfo()\fP, to obtain the length of the compiled  a typical call of \fBpcre_fullinfo()\fP, to obtain the length of the compiled
1136  pattern:  pattern:
# Line 1168  where data units are bytes.) The fourth Line 1173  where data units are bytes.) The fourth
1173  variable.  variable.
1174  .P  .P
1175  If there is a fixed first value, for example, the letter "c" from a pattern  If there is a fixed first value, for example, the letter "c" from a pattern
1176  such as (cat|cow|coyote), its value is returned. In the 8-bit library, the  such as (cat|cow|coyote), its value is returned. In the 8-bit library, the
1177  value is always less than 256; in the 16-bit library the value can be up to  value is always less than 256; in the 16-bit library the value can be up to
1178  0xffff.  0xffff.
1179  .P  .P
1180  If there is no fixed first value, and if either  If there is no fixed first value, and if either
# Line 1205  Return 1 if the (?J) or (?-J) option set Line 1210  Return 1 if the (?J) or (?-J) option set
1210  .sp  .sp
1211    PCRE_INFO_JIT    PCRE_INFO_JIT
1212  .sp  .sp
1213  Return 1 if the pattern was studied with the PCRE_STUDY_JIT_COMPILE option, and  Return 1 if the pattern was studied with one of the JIT options, and
1214  just-in-time compiling was successful. The fourth argument should point to an  just-in-time compiling was successful. The fourth argument should point to an
1215  \fBint\fP variable. A return value of 0 means that JIT support is not available  \fBint\fP variable. A return value of 0 means that JIT support is not available
1216  in this version of PCRE, or that the pattern was not studied with the  in this version of PCRE, or that the pattern was not studied with a JIT option,
1217  PCRE_STUDY_JIT_COMPILE option, or that the JIT compiler could not handle this  or that the JIT compiler could not handle this particular pattern. See the
 particular pattern. See the  
1218  .\" HREF  .\" HREF
1219  \fBpcrejit\fP  \fBpcrejit\fP
1220  .\"  .\"
# Line 1218  documentation for details of what can an Line 1222  documentation for details of what can an
1222  .sp  .sp
1223    PCRE_INFO_JITSIZE    PCRE_INFO_JITSIZE
1224  .sp  .sp
1225  If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option,  If the pattern was successfully studied with a JIT option, return the size of
1226  return the size of the JIT compiled code, otherwise return zero. The fourth  the JIT compiled code, otherwise return zero. The fourth argument should point
1227  argument should point to a \fBsize_t\fP variable.  to a \fBsize_t\fP variable.
1228  .sp  .sp
1229    PCRE_INFO_LASTLITERAL    PCRE_INFO_LASTLITERAL
1230  .sp  .sp
# Line 1232  only if it follows something of variable Line 1236  only if it follows something of variable
1236  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value
1237  is -1.  is -1.
1238  .sp  .sp
1239      PCRE_INFO_MAXLOOKBEHIND
1240    .sp
1241    Return the number of characters (NB not bytes) in the longest lookbehind
1242    assertion in the pattern. Note that the simple assertions \eb and \eB require a
1243    one-character lookbehind. This information is useful when doing multi-segment
1244    matching using the partial matching facilities.
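
The added PCRE_INFO_MAXLOOKBEHIND text says the value helps with multi-segment matching. One possible use, sketched under assumptions: when a segment ends in a partial match, keep enough data before the partial match start to satisfy the longest lookbehind. The helper name is hypothetical, and the sketch assumes the 8-bit library in non-UTF-8 mode, where one character is one byte.

```c
#include <stddef.h>

/* Hypothetical helper, not a PCRE function. Given the offset at which a
   partial match started in the current segment and the value obtained
   for PCRE_INFO_MAXLOOKBEHIND, return the offset from which data should
   be retained for the next segment. Assumes one byte per character. */
static size_t retain_offset(size_t partial_start, int max_lookbehind)
{
    size_t back = max_lookbehind > 0 ? (size_t)max_lookbehind : 0;
    return partial_start > back ? partial_start - back : 0;
}
```

In UTF-8 or UTF-16 mode the count is in characters, so a real implementation would have to step back over whole characters rather than subtract a byte count.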
1245    .sp
1246    PCRE_INFO_MINLENGTH    PCRE_INFO_MINLENGTH
1247  .sp  .sp
1248  If the pattern was studied and a minimum length for matching subject strings  If the pattern was studied and a minimum length for matching subject strings
# Line 1459  fields (not necessarily in this order): Line 1470  fields (not necessarily in this order):
1470    const unsigned char *\fItables\fP;    const unsigned char *\fItables\fP;
1471    unsigned char **\fImark\fP;    unsigned char **\fImark\fP;
1472  .sp  .sp
1473  In the 16-bit version of this structure, the \fImark\fP field has type  In the 16-bit version of this structure, the \fImark\fP field has type
1474  "PCRE_UCHAR16 **".  "PCRE_UCHAR16 **".
1475  .P  .P
1476  The \fIflags\fP field is a bitmap that specifies which of the other fields  The \fIflags\fP field is used to specify which of the other fields are set. The
1477  are set. The flag bits are:  flag bits are:
1478  .sp  .sp
1479    PCRE_EXTRA_STUDY_DATA    PCRE_EXTRA_CALLOUT_DATA
1480    PCRE_EXTRA_EXECUTABLE_JIT    PCRE_EXTRA_EXECUTABLE_JIT
1481      PCRE_EXTRA_MARK
1482    PCRE_EXTRA_MATCH_LIMIT    PCRE_EXTRA_MATCH_LIMIT
1483    PCRE_EXTRA_MATCH_LIMIT_RECURSION    PCRE_EXTRA_MATCH_LIMIT_RECURSION
1484    PCRE_EXTRA_CALLOUT_DATA    PCRE_EXTRA_STUDY_DATA
1485    PCRE_EXTRA_TABLES    PCRE_EXTRA_TABLES
   PCRE_EXTRA_MARK  
1486  .sp  .sp
1487  Other flag bits should be set to zero. The \fIstudy_data\fP field and sometimes  Other flag bits should be set to zero. The \fIstudy_data\fP field and sometimes
1488  the \fIexecutable_jit\fP field are set in the \fBpcre_extra\fP block that is  the \fIexecutable_jit\fP field are set in the \fBpcre_extra\fP block that is
1489  returned by \fBpcre_study()\fP, together with the appropriate flag bits. You  returned by \fBpcre_study()\fP, together with the appropriate flag bits. You
1490  should not set these yourself, but you may add to the block by setting the  should not set these yourself, but you may add to the block by setting other
1491  other fields and their corresponding flag bits.  fields and their corresponding flag bits.
1492  .P  .P
1493  The \fImatch_limit\fP field provides a means of preventing PCRE from using up a  The \fImatch_limit\fP field provides a means of preventing PCRE from using up a
1494  vast amount of resources when running patterns that are not going to match,  vast amount of resources when running patterns that are not going to match,
# Line 1492  patterns that are not anchored, the coun Line 1503  patterns that are not anchored, the coun
1503  in the subject string.  in the subject string.
1504  .P  .P
1505  When \fBpcre_exec()\fP is called with a pattern that was successfully studied  When \fBpcre_exec()\fP is called with a pattern that was successfully studied
1506  with the PCRE_STUDY_JIT_COMPILE option, the way that the matching is executed  with a JIT option, the way that the matching is executed is entirely different.
1507  is entirely different. However, there is still the possibility of runaway  However, there is still the possibility of runaway matching that goes on for a
1508  matching that goes on for a very long time, and so the \fImatch_limit\fP value  very long time, and so the \fImatch_limit\fP value is also used in this case
1509  is also used in this case (but in a different way) to limit how long the  (but in a different way) to limit how long the matching can continue.
 matching can continue.  
1510  .P  .P
1511  The default value for the limit can be set when PCRE is built; the default  The default value for the limit can be set when PCRE is built; the default
1512  default is 10 million, which handles all but the most extreme cases. You can  default is 10 million, which handles all but the most extreme cases. You can
# Line 1514  This limit is of use only if it is set s Line 1524  This limit is of use only if it is set s
1524  Limiting the recursion depth limits the amount of machine stack that can be  Limiting the recursion depth limits the amount of machine stack that can be
1525  used, or, when PCRE has been compiled to use memory on the heap instead of the  used, or, when PCRE has been compiled to use memory on the heap instead of the
1526  stack, the amount of heap memory that can be used. This limit is not relevant,  stack, the amount of heap memory that can be used. This limit is not relevant,
1527  and is ignored, if the pattern was successfully studied with  and is ignored, when matching is done using JIT compiled code.
 PCRE_STUDY_JIT_COMPILE.  
1528  .P  .P
1529  The default value for \fImatch_limit_recursion\fP can be set when PCRE is  The default value for \fImatch_limit_recursion\fP can be set when PCRE is
1530  built; the default default is the same value as the default for  built; the default default is the same value as the default for
# Line 1572  documentation. Line 1581  documentation.
1581  The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be  The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be
1582  zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP,  zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP,
1583  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
1584  PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and  PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, and
1585  PCRE_PARTIAL_HARD.  PCRE_PARTIAL_SOFT.
1586  .P  .P
1587  If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option,  If the pattern was successfully studied with one of the just-in-time (JIT)
1588  the only supported options for JIT execution are PCRE_NO_UTF8_CHECK,  compile options, the only supported options for JIT execution are
1589  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NOTEMPTY_ATSTART. Note in  PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
1590  particular that partial matching is not supported. If an unsupported option is  PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an
1591  used, JIT execution is disabled and the normal interpretive code in  unsupported option is used, JIT execution is disabled and the normal
1592  \fBpcre_exec()\fP is run.  interpretive code in \fBpcre_exec()\fP is run.
1593  .sp  .sp
1594    PCRE_ANCHORED    PCRE_ANCHORED
1595  .sp  .sp
# Line 1699  causing performance to suffer, but ensur Line 1708  causing performance to suffer, but ensur
1708  "no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)  "no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
1709  are considered at every possible starting position in the subject string. If  are considered at every possible starting position in the subject string. If
1710  PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching  PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
1711  time.  time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
1712    matching is always done interpretively.
1713  .P  .P
1714  Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.  Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
1715  Consider the pattern  Consider the pattern
# Line 1732  returned. Line 1742  returned.
1742  .sp  .sp
1743  When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8  When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
1744  string is automatically checked when \fBpcre_exec()\fP is subsequently called.  string is automatically checked when \fBpcre_exec()\fP is subsequently called.
1745  The value of \fIstartoffset\fP is also checked to ensure that it points to the  The entire string is checked before any other processing takes place. The value
1746  start of a UTF-8 character. There is a discussion about the validity of UTF-8  of \fIstartoffset\fP is also checked to ensure that it points to the start of a
1747  strings in the  UTF-8 character. There is a discussion about the
1748    .\" HTML <a href="pcreunicode.html#utf8strings">
1749    .\" </a>
1750    validity of UTF-8 strings
1751    .\"
1752    in the
1753  .\" HREF  .\" HREF
1754  \fBpcreunicode\fP  \fBpcreunicode\fP
1755  .\"  .\"
# Line 1882  string that it matched that is returned. Line 1897  string that it matched that is returned.
1897  .P  .P
1898  If the vector is too small to hold all the captured substring offsets, it is  If the vector is too small to hold all the captured substring offsets, it is
1899  used as far as possible (up to two-thirds of its length), and the function  used as far as possible (up to two-thirds of its length), and the function
1900  returns a value of zero. If neither the actual string matched not any captured  returns a value of zero. If neither the actual string matched nor any captured
1901  substrings are of interest, \fBpcre_exec()\fP may be called with \fIovector\fP  substrings are of interest, \fBpcre_exec()\fP may be called with \fIovector\fP
1902  passed as NULL and \fIovecsize\fP as zero. However, if the pattern contains  passed as NULL and \fIovecsize\fP as zero. However, if the pattern contains
1903  back references and the \fIovector\fP is not big enough to remember the related  back references and the \fIovector\fP is not big enough to remember the related
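
The "two-thirds" rule in this hunk can be made concrete with a little arithmetic. This sketch assumes the usual convention that \fIovecsize\fP is a multiple of three and that only the first two-thirds of the vector hold offset pairs; both helper names are hypothetical, not PCRE functions.

```c
/* Hypothetical helpers, not PCRE functions. */

/* Number of (start, end) offset pairs an ovector of the given size can
   return, counting pair 0 for the whole match. Only two-thirds of the
   vector hold pairs, so each pair costs three slots. */
static int ovector_pair_capacity(int ovecsize)
{
    return ovecsize > 0 ? ovecsize / 3 : 0;
}

/* Smallest ovecsize that can hold the whole match plus capture_count
   captured substrings. */
static int ovecsize_for_captures(int capture_count)
{
    return (capture_count + 1) * 3;
}
```

With these, a return value of zero from \fBpcre_exec()\fP corresponds to a pattern that set more pairs than ovector_pair_capacity(ovecsize).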
# Line 2082  time. Line 2097  time.
2097  .sp  .sp
2098    PCRE_ERROR_JIT_STACKLIMIT (-27)    PCRE_ERROR_JIT_STACKLIMIT (-27)
2099  .sp  .sp
2100  This error is returned when a pattern that was successfully studied using the  This error is returned when a pattern that was successfully studied using a
2101  PCRE_STUDY_JIT_COMPILE option is being matched, but the memory available for  JIT compile option is being matched, but the memory available for the
2102  the just-in-time processing stack is not large enough. See the  just-in-time processing stack is not large enough. See the
2103  .\" HREF  .\" HREF
2104  \fBpcrejit\fP  \fBpcrejit\fP
2105  .\"  .\"
2106  documentation for more details.  documentation for more details.
2107  .sp  .sp
2108    PCRE_ERROR_BADMODE (-28)    PCRE_ERROR_BADMODE        (-28)
2109  .sp  .sp
2110  This error is given if a pattern that was compiled by the 8-bit library is  This error is given if a pattern that was compiled by the 8-bit library is
2111  passed to a 16-bit library function, or vice versa.  passed to a 16-bit library function, or vice versa.
2112  .sp  .sp
2113    PCRE_ERROR_BADENDIANNESS (-29)    PCRE_ERROR_BADENDIANNESS  (-29)
2114  .sp  .sp
2115  This error is given if a pattern that was compiled and saved is reloaded on a  This error is given if a pattern that was compiled and saved is reloaded on a
2116  host with different endianness. The utility function  host with different endianness. The utility function
2117  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern
2118  so that it runs on the new host.  so that it runs on the new host.
2119  .P  .P
2120  Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP.  Error numbers -16 to -20, -22, and -30 are not used by \fBpcre_exec()\fP.
2121  .  .
2122  .  .
2123  .\" HTML <a name="badutf8reasons"></a>  .\" HTML <a name="badutf8reasons"></a>
2124  .SS "Reason codes for invalid UTF-8 strings"  .SS "Reason codes for invalid UTF-8 strings"
2125  .rs  .rs
2126  .sp  .sp
2127  This section applies only to the 8-bit library. The corresponding information  This section applies only to the 8-bit library. The corresponding information
2128  for the 16-bit library is given in the  for the 16-bit library is given in the
2129  .\" HREF  .\" HREF
2130  \fBpcre16\fP  \fBpcre16\fP
# Line 2413  other alternatives. Ultimately, when it Line 2428  other alternatives. Ultimately, when it
2428  will yield PCRE_ERROR_NOMATCH.  will yield PCRE_ERROR_NOMATCH.
2429  .  .
2430  .  .
2431    .SH "OBTAINING AN ESTIMATE OF STACK USAGE"
2432    .rs
2433    .sp
2434    Matching certain patterns using \fBpcre_exec()\fP can use a lot of process
2435    stack, which in certain environments can be rather limited in size. Some users
2436    find it helpful to have an estimate of the amount of stack that is used by
2437    \fBpcre_exec()\fP, to help them set recursion limits, as described in the
2438    .\" HREF
2439    \fBpcrestack\fP
2440    .\"
2441    documentation. The estimate that is output by \fBpcretest\fP when called with
2442    the \fB-m\fP and \fB-C\fP options is obtained by calling \fBpcre_exec\fP with
2443    the values NULL, NULL, NULL, -999, and -999 for its first five arguments.
2444    .P
2445    Normally, if its first argument is NULL, \fBpcre_exec()\fP immediately returns
2446    the negative error code PCRE_ERROR_NULL, but with this special combination of
2447    arguments, it returns instead a negative number whose absolute value is the
2448    approximate stack frame size in bytes. (A negative number is used so that it is
2449    clear that no match has happened.) The value is approximate because in some
2450    cases, recursive calls to \fBpcre_exec()\fP occur when there are one or two
2451    additional variables on the stack.
2452    .P
2453    If PCRE has been compiled to use the heap instead of the stack for recursion,
2454    the value returned is the size of each block that is obtained from the heap.
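
The new section says the estimate helps when setting recursion limits. One rough way to turn it into a limit is sketched below. The helper names and the divide-stack-by-frame-size policy are assumptions made for illustration; they are not taken from the PCRE documentation.

```c
#include <stddef.h>

/* Hypothetical helpers, not PCRE functions. */

/* rc is the value returned by the special call described above, that is
   pcre_exec() with NULL, NULL, NULL, -999, -999 as its first five
   arguments. The frame size is the absolute value of that negative
   number; a non-negative rc is treated as "no estimate". */
static size_t frame_size_from_estimate(int rc)
{
    return rc < 0 ? (size_t)(-(long)rc) : 0;
}

/* Upper bound on the recursion depth that fits in stack_bytes if every
   level uses frame_size bytes. Returns 0 if no estimate is available. */
static unsigned long recursion_limit_for_stack(size_t stack_bytes,
                                               size_t frame_size)
{
    if (frame_size == 0)
        return 0;
    return (unsigned long)(stack_bytes / frame_size);
}
```

Because the estimate is approximate and the rest of the program also uses the stack, a real application would probably pass only a fraction of its stack size as stack_bytes before storing the result in the \fImatch_limit_recursion\fP field.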
2455    .
2456    .
2457  .\" HTML <a name="dfamatch"></a>  .\" HTML <a name="dfamatch"></a>
2458  .SH "MATCHING A PATTERN: THE ALTERNATIVE FUNCTION"  .SH "MATCHING A PATTERN: THE ALTERNATIVE FUNCTION"
2459  .rs  .rs
# Line 2594  When a recursive subpattern is processed Line 2635  When a recursive subpattern is processed
2635  recursively, using private vectors for \fIovector\fP and \fIworkspace\fP. This  recursively, using private vectors for \fIovector\fP and \fIworkspace\fP. This
2636  error is given if the output vector is not large enough. This should be  error is given if the output vector is not large enough. This should be
2637  extremely rare, as a vector of size 1000 is used.  extremely rare, as a vector of size 1000 is used.
2638    .sp
2639      PCRE_ERROR_DFA_BADRESTART (-30)
2640    .sp
2641    When \fBpcre_dfa_exec()\fP is called with the \fBPCRE_DFA_RESTART\fP option,
2642    some plausibility checks are made on the contents of the workspace, which
2643    should contain data about the previous partial match. If any of these checks
2644    fail, this error is given.
2645  .  .
2646  .  .
2647  .SH "SEE ALSO"  .SH "SEE ALSO"
# Line 2618  Cambridge CB2 3QH, England. Line 2666  Cambridge CB2 3QH, England.
2666  .rs  .rs
2667  .sp  .sp
2668  .nf  .nf
2669  Last updated: 17 January 2012  Last updated: 04 May 2012
2670  Copyright (c) 1997-2012 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
2671  .fi  .fi