Diff of /code/trunk/doc/pcreapi.3

revision 903 by ph10, Sat Jan 21 16:37:17 2012 UTC revision 964 by ph10, Fri May 4 13:03:39 2012 UTC
# Line 1  Line 1 
1  .TH PCREAPI 3  .TH PCREAPI 3 "04 May 2012" "PCRE 8.31"
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .sp  .sp
# Line 526  documentation). For those options that c Line 526  documentation). For those options that c
526  the pattern, the contents of the \fIoptions\fP argument specifies their  the pattern, the contents of the \fIoptions\fP argument specifies their
527  settings at the start of compilation and execution. The PCRE_ANCHORED,  settings at the start of compilation and execution. The PCRE_ANCHORED,
528  PCRE_BSR_\fIxxx\fP, PCRE_NEWLINE_\fIxxx\fP, PCRE_NO_UTF8_CHECK, and  PCRE_BSR_\fIxxx\fP, PCRE_NEWLINE_\fIxxx\fP, PCRE_NO_UTF8_CHECK, and
529  PCRE_NO_START_OPT options can be set at the time of matching as well as at  PCRE_NO_START_OPTIMIZE options can be set at the time of matching as well as at
530  compile time.  compile time.
531  .P  .P
532  If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.  If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
# Line 926  fallen out of use. To avoid confusion, t Line 926  fallen out of use. To avoid confusion, t
926    72  too many forward references    72  too many forward references
927    73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)    73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
928    74  invalid UTF-16 string (specifically UTF-16)    74  invalid UTF-16 string (specifically UTF-16)
929      75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
930  .sp  .sp
931  The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may  The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
932  be used if the limits were changed when PCRE was built.  be used if the limits were changed when PCRE was built.
# Line 962  If studying the pattern does not produce Line 963  If studying the pattern does not produce
963  wants to pass any of the other fields to \fBpcre_exec()\fP or  wants to pass any of the other fields to \fBpcre_exec()\fP or
964  \fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block.  \fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block.
965  .P  .P
966  The second argument of \fBpcre_study()\fP contains option bits. There is only  The second argument of \fBpcre_study()\fP contains option bits. There are three
967  one option: PCRE_STUDY_JIT_COMPILE. If this is set, and the just-in-time  options:
968  compiler is available, the pattern is further compiled into machine code that  .sp
969  executes much faster than the \fBpcre_exec()\fP matching function. If    PCRE_STUDY_JIT_COMPILE
970  the just-in-time compiler is not available, this option is ignored. All other    PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
971  bits in the \fIoptions\fP argument must be zero.    PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
972    .sp
973    If any of these are set, and the just-in-time compiler is available, the
974    pattern is further compiled into machine code that executes much faster than
975    the \fBpcre_exec()\fP interpretive matching function. If the just-in-time
976    compiler is not available, these options are ignored. All other bits in the
977    \fIoptions\fP argument must be zero.
978  .P  .P
979  JIT compilation is a heavyweight optimization. It can take some time for  JIT compilation is a heavyweight optimization. It can take some time for
980  patterns to be analyzed, and for one-off matches and simple patterns the  patterns to be analyzed, and for one-off matches and simple patterns the
# Line 991  When you are finished with a pattern, yo Line 998  When you are finished with a pattern, yo
998  study data by calling \fBpcre_free_study()\fP. This function was added to the  study data by calling \fBpcre_free_study()\fP. This function was added to the
999  API for release 8.20. For earlier versions, the memory could be freed with  API for release 8.20. For earlier versions, the memory could be freed with
1000  \fBpcre_free()\fP, just like the pattern itself. This will still work in cases  \fBpcre_free()\fP, just like the pattern itself. This will still work in cases
1001  where PCRE_STUDY_JIT_COMPILE is not used, but it is advisable to change to the  where JIT optimization is not used, but it is advisable to change to the new
1002  new function when convenient.  function when convenient.
1003  .P  .P
1004  This is a typical way in which \fBpcre_study\fP() is used (except that in a  This is a typical way in which \fBpcre_study\fP() is used (except that in a
1005  real application there should be tests for errors):  real application there should be tests for errors):
# Line 1025  created. This speeds up finding a positi Line 1032  created. This speeds up finding a positi
1032  matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256.)  matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256.)
1033  .P  .P
1034  These two optimizations apply to both \fBpcre_exec()\fP and  These two optimizations apply to both \fBpcre_exec()\fP and
1035  \fBpcre_dfa_exec()\fP. However, they are not used by \fBpcre_exec()\fP if  \fBpcre_dfa_exec()\fP, and the information is also used by the JIT compiler.
1036  \fBpcre_study()\fP is called with the PCRE_STUDY_JIT_COMPILE option, and  The optimizations can be disabled by setting the PCRE_NO_START_OPTIMIZE option
1037  just-in-time compiling is successful. The optimizations can be disabled by  when calling \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP, but if this is done,
1038  setting the PCRE_NO_START_OPTIMIZE option when calling \fBpcre_exec()\fP or  JIT execution is also disabled. You might want to do this if your pattern
1039  \fBpcre_dfa_exec()\fP. You might want to do this if your pattern contains  contains callouts or (*MARK) and you want to make use of these facilities in
1040  callouts or (*MARK) (which cannot be handled by the JIT compiler), and you want  cases where matching fails. See the discussion of PCRE_NO_START_OPTIMIZE
 to make use of these facilities in cases where matching fails. See the  
 discussion of PCRE_NO_START_OPTIMIZE  
1041  .\" HTML <a href="#execoptions">  .\" HTML <a href="#execoptions">
1042  .\" </a>  .\" </a>
1043  below.  below.
# Line 1205  Return 1 if the (?J) or (?-J) option set Line 1210  Return 1 if the (?J) or (?-J) option set
1210  .sp  .sp
1211    PCRE_INFO_JIT    PCRE_INFO_JIT
1212  .sp  .sp
1213  Return 1 if the pattern was studied with the PCRE_STUDY_JIT_COMPILE option, and  Return 1 if the pattern was studied with one of the JIT options, and
1214  just-in-time compiling was successful. The fourth argument should point to an  just-in-time compiling was successful. The fourth argument should point to an
1215  \fBint\fP variable. A return value of 0 means that JIT support is not available  \fBint\fP variable. A return value of 0 means that JIT support is not available
1216  in this version of PCRE, or that the pattern was not studied with the  in this version of PCRE, or that the pattern was not studied with a JIT option,
1217  PCRE_STUDY_JIT_COMPILE option, or that the JIT compiler could not handle this  or that the JIT compiler could not handle this particular pattern. See the
 particular pattern. See the  
1218  .\" HREF  .\" HREF
1219  \fBpcrejit\fP  \fBpcrejit\fP
1220  .\"  .\"
# Line 1218  documentation for details of what can an Line 1222  documentation for details of what can an
1222  .sp  .sp
1223    PCRE_INFO_JITSIZE    PCRE_INFO_JITSIZE
1224  .sp  .sp
1225  If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option,  If the pattern was successfully studied with a JIT option, return the size of
1226  return the size of the JIT compiled code, otherwise return zero. The fourth  the JIT compiled code, otherwise return zero. The fourth argument should point
1227  argument should point to a \fBsize_t\fP variable.  to a \fBsize_t\fP variable.
1228  .sp  .sp
1229    PCRE_INFO_LASTLITERAL    PCRE_INFO_LASTLITERAL
1230  .sp  .sp
# Line 1232  only if it follows something of variable Line 1236  only if it follows something of variable
1236  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value
1237  is -1.  is -1.
1238  .sp  .sp
1239      PCRE_INFO_MAXLOOKBEHIND
1240    .sp
1241    Return the number of characters (NB not bytes) in the longest lookbehind
1242    assertion in the pattern. Note that the simple assertions \eb and \eB require a
1243    one-character lookbehind. This information is useful when doing multi-segment
1244    matching using the partial matching facilities.
1245    .sp
1246    PCRE_INFO_MINLENGTH    PCRE_INFO_MINLENGTH
1247  .sp  .sp
1248  If the pattern was studied and a minimum length for matching subject strings  If the pattern was studied and a minimum length for matching subject strings
# Line 1462  fields (not necessarily in this order): Line 1473  fields (not necessarily in this order):
1473  In the 16-bit version of this structure, the \fImark\fP field has type  In the 16-bit version of this structure, the \fImark\fP field has type
1474  "PCRE_UCHAR16 **".  "PCRE_UCHAR16 **".
1475  .P  .P
1476  The \fIflags\fP field is a bitmap that specifies which of the other fields  The \fIflags\fP field is used to specify which of the other fields are set. The
1477  are set. The flag bits are:  flag bits are:
1478  .sp  .sp
1479    PCRE_EXTRA_STUDY_DATA    PCRE_EXTRA_CALLOUT_DATA
1480    PCRE_EXTRA_EXECUTABLE_JIT    PCRE_EXTRA_EXECUTABLE_JIT
1481      PCRE_EXTRA_MARK
1482    PCRE_EXTRA_MATCH_LIMIT    PCRE_EXTRA_MATCH_LIMIT
1483    PCRE_EXTRA_MATCH_LIMIT_RECURSION    PCRE_EXTRA_MATCH_LIMIT_RECURSION
1484    PCRE_EXTRA_CALLOUT_DATA    PCRE_EXTRA_STUDY_DATA
1485    PCRE_EXTRA_TABLES    PCRE_EXTRA_TABLES
   PCRE_EXTRA_MARK  
1486  .sp  .sp
1487  Other flag bits should be set to zero. The \fIstudy_data\fP field and sometimes  Other flag bits should be set to zero. The \fIstudy_data\fP field and sometimes
1488  the \fIexecutable_jit\fP field are set in the \fBpcre_extra\fP block that is  the \fIexecutable_jit\fP field are set in the \fBpcre_extra\fP block that is
1489  returned by \fBpcre_study()\fP, together with the appropriate flag bits. You  returned by \fBpcre_study()\fP, together with the appropriate flag bits. You
1490  should not set these yourself, but you may add to the block by setting the  should not set these yourself, but you may add to the block by setting other
1491  other fields and their corresponding flag bits.  fields and their corresponding flag bits.
1492  .P  .P
1493  The \fImatch_limit\fP field provides a means of preventing PCRE from using up a  The \fImatch_limit\fP field provides a means of preventing PCRE from using up a
1494  vast amount of resources when running patterns that are not going to match,  vast amount of resources when running patterns that are not going to match,
# Line 1492  patterns that are not anchored, the coun Line 1503  patterns that are not anchored, the coun
1503  in the subject string.  in the subject string.
1504  .P  .P
1505  When \fBpcre_exec()\fP is called with a pattern that was successfully studied  When \fBpcre_exec()\fP is called with a pattern that was successfully studied
1506  with the PCRE_STUDY_JIT_COMPILE option, the way that the matching is executed  with a JIT option, the way that the matching is executed is entirely different.
1507  is entirely different. However, there is still the possibility of runaway  However, there is still the possibility of runaway matching that goes on for a
1508  matching that goes on for a very long time, and so the \fImatch_limit\fP value  very long time, and so the \fImatch_limit\fP value is also used in this case
1509  is also used in this case (but in a different way) to limit how long the  (but in a different way) to limit how long the matching can continue.
 matching can continue.  
1510  .P  .P
1511  The default value for the limit can be set when PCRE is built; the default  The default value for the limit can be set when PCRE is built; the default
1512  default is 10 million, which handles all but the most extreme cases. You can  default is 10 million, which handles all but the most extreme cases. You can
# Line 1514  This limit is of use only if it is set s Line 1524  This limit is of use only if it is set s
1524  Limiting the recursion depth limits the amount of machine stack that can be  Limiting the recursion depth limits the amount of machine stack that can be
1525  used, or, when PCRE has been compiled to use memory on the heap instead of the  used, or, when PCRE has been compiled to use memory on the heap instead of the
1526  stack, the amount of heap memory that can be used. This limit is not relevant,  stack, the amount of heap memory that can be used. This limit is not relevant,
1527  and is ignored, if the pattern was successfully studied with  and is ignored, when matching is done using JIT compiled code.
 PCRE_STUDY_JIT_COMPILE.  
1528  .P  .P
1529  The default value for \fImatch_limit_recursion\fP can be set when PCRE is  The default value for \fImatch_limit_recursion\fP can be set when PCRE is
1530  built; the default default is the same value as the default for  built; the default default is the same value as the default for
# Line 1572  documentation. Line 1581  documentation.
1581  The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be  The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be
1582  zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP,  zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP,
1583  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
1584  PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and  PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, and
1585  PCRE_PARTIAL_HARD.  PCRE_PARTIAL_SOFT.
1586  .P  .P
1587  If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE option,  If the pattern was successfully studied with one of the just-in-time (JIT)
1588  the only supported options for JIT execution are PCRE_NO_UTF8_CHECK,  compile options, the only supported options for JIT execution are
1589  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NOTEMPTY_ATSTART. Note in  PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
1590  particular that partial matching is not supported. If an unsupported option is  PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an
1591  used, JIT execution is disabled and the normal interpretive code in  unsupported option is used, JIT execution is disabled and the normal
1592  \fBpcre_exec()\fP is run.  interpretive code in \fBpcre_exec()\fP is run.
1593  .sp  .sp
1594    PCRE_ANCHORED    PCRE_ANCHORED
1595  .sp  .sp
# Line 1699  causing performance to suffer, but ensur Line 1708  causing performance to suffer, but ensur
1708  "no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)  "no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
1709  are considered at every possible starting position in the subject string. If  are considered at every possible starting position in the subject string. If
1710  PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching  PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
1711  time.  time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
1712    matching is always done using interpretively.
1713  .P  .P
1714  Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.  Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation.
1715  Consider the pattern  Consider the pattern
# Line 1732  returned. Line 1742  returned.
1742  .sp  .sp
1743  When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8  When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8
1744  string is automatically checked when \fBpcre_exec()\fP is subsequently called.  string is automatically checked when \fBpcre_exec()\fP is subsequently called.
1745  The value of \fIstartoffset\fP is also checked to ensure that it points to the  The entire string is checked before any other processing takes place. The value
1746  start of a UTF-8 character. There is a discussion about the validity of UTF-8  of \fIstartoffset\fP is also checked to ensure that it points to the start of a
1747  strings in the  UTF-8 character. There is a discussion about the
1748    .\" HTML <a href="pcreunicode.html#utf8strings">
1749    .\" </a>
1750    validity of UTF-8 strings
1751    .\"
1752    in the
1753  .\" HREF  .\" HREF
1754  \fBpcreunicode\fP  \fBpcreunicode\fP
1755  .\"  .\"
# Line 1882  string that it matched that is returned. Line 1897  string that it matched that is returned.
1897  .P  .P
1898  If the vector is too small to hold all the captured substring offsets, it is  If the vector is too small to hold all the captured substring offsets, it is
1899  used as far as possible (up to two-thirds of its length), and the function  used as far as possible (up to two-thirds of its length), and the function
1900  returns a value of zero. If neither the actual string matched not any captured  returns a value of zero. If neither the actual string matched nor any captured
1901  substrings are of interest, \fBpcre_exec()\fP may be called with \fIovector\fP  substrings are of interest, \fBpcre_exec()\fP may be called with \fIovector\fP
1902  passed as NULL and \fIovecsize\fP as zero. However, if the pattern contains  passed as NULL and \fIovecsize\fP as zero. However, if the pattern contains
1903  back references and the \fIovector\fP is not big enough to remember the related  back references and the \fIovector\fP is not big enough to remember the related
# Line 2082  time. Line 2097  time.
2097  .sp  .sp
2098    PCRE_ERROR_JIT_STACKLIMIT (-27)    PCRE_ERROR_JIT_STACKLIMIT (-27)
2099  .sp  .sp
2100  This error is returned when a pattern that was successfully studied using the  This error is returned when a pattern that was successfully studied using a
2101  PCRE_STUDY_JIT_COMPILE option is being matched, but the memory available for  JIT compile option is being matched, but the memory available for the
2102  the just-in-time processing stack is not large enough. See the  just-in-time processing stack is not large enough. See the
2103  .\" HREF  .\" HREF
2104  \fBpcrejit\fP  \fBpcrejit\fP
2105  .\"  .\"
2106  documentation for more details.  documentation for more details.
2107  .sp  .sp
2108    PCRE_ERROR_BADMODE (-28)    PCRE_ERROR_BADMODE        (-28)
2109  .sp  .sp
2110  This error is given if a pattern that was compiled by the 8-bit library is  This error is given if a pattern that was compiled by the 8-bit library is
2111  passed to a 16-bit library function, or vice versa.  passed to a 16-bit library function, or vice versa.
2112  .sp  .sp
2113    PCRE_ERROR_BADENDIANNESS (-29)    PCRE_ERROR_BADENDIANNESS  (-29)
2114  .sp  .sp
2115  This error is given if a pattern that was compiled and saved is reloaded on a  This error is given if a pattern that was compiled and saved is reloaded on a
2116  host with different endianness. The utility function  host with different endianness. The utility function
2117  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern
2118  so that it runs on the new host.  so that it runs on the new host.
2119  .P  .P
2120  Error numbers -16 to -20 and -22 are not used by \fBpcre_exec()\fP.  Error numbers -16 to -20, -22, and -30 are not used by \fBpcre_exec()\fP.
2121  .  .
2122  .  .
2123  .\" HTML <a name="badutf8reasons"></a>  .\" HTML <a name="badutf8reasons"></a>
# Line 2620  When a recursive subpattern is processed Line 2635  When a recursive subpattern is processed
2635  recursively, using private vectors for \fIovector\fP and \fIworkspace\fP. This  recursively, using private vectors for \fIovector\fP and \fIworkspace\fP. This
2636  error is given if the output vector is not large enough. This should be  error is given if the output vector is not large enough. This should be
2637  extremely rare, as a vector of size 1000 is used.  extremely rare, as a vector of size 1000 is used.
2638    .sp
2639      PCRE_ERROR_DFA_BADRESTART (-30)
2640    .sp
2641    When \fBpcre_dfa_exec()\fP is called with the \fBPCRE_DFA_RESTART\fP option,
2642    some plausibility checks are made on the contents of the workspace, which
2643    should contain data about the previous partial match. If any of these checks
2644    fail, this error is given.
2645  .  .
2646  .  .
2647  .SH "SEE ALSO"  .SH "SEE ALSO"
# Line 2644  Cambridge CB2 3QH, England. Line 2666  Cambridge CB2 3QH, England.
2666  .rs  .rs
2667  .sp  .sp
2668  .nf  .nf
2669  Last updated: 21 January 2012  Last updated: 04 May 2012
2670  Copyright (c) 1997-2012 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
2671  .fi  .fi
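
The error-code changes in this diff can be illustrated with a short C sketch. It is only an illustration under stated assumptions: the helper names exec_error_name() and not_used_by_pcre_exec() are hypothetical and not part of PCRE, and the numeric values are copied from the text above rather than taken from pcre.h, which is where a real program would get them.

```c
#include <string.h>

/* Local stand-ins for the values listed in the diff above. A real program
   would include <pcre.h> instead of defining these itself. */
#ifndef PCRE_ERROR_JIT_STACKLIMIT
#define PCRE_ERROR_JIT_STACKLIMIT  (-27)
#define PCRE_ERROR_BADMODE         (-28)
#define PCRE_ERROR_BADENDIANNESS   (-29)
#define PCRE_ERROR_DFA_BADRESTART  (-30)
#endif

/* Hypothetical helper: map one of the four codes above to its name. */
const char *exec_error_name(int rc)
{
    switch (rc) {
    case PCRE_ERROR_JIT_STACKLIMIT: return "PCRE_ERROR_JIT_STACKLIMIT";
    case PCRE_ERROR_BADMODE:        return "PCRE_ERROR_BADMODE";
    case PCRE_ERROR_BADENDIANNESS:  return "PCRE_ERROR_BADENDIANNESS";
    case PCRE_ERROR_DFA_BADRESTART: return "PCRE_ERROR_DFA_BADRESTART";
    default:                        return "unknown";
    }
}

/* Hypothetical helper encoding the revised sentence: error numbers
   -16 to -20, -22, and -30 are not used by pcre_exec(). */
int not_used_by_pcre_exec(int rc)
{
    return (rc <= -16 && rc >= -20) || rc == -22 || rc == -30;
}
```

A caller might use not_used_by_pcre_exec() as a sanity check on a pcre_exec() return value; with revision 964, -30 joins the set because PCRE_ERROR_DFA_BADRESTART is documented only for pcre_dfa_exec().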
