Diff of /code/trunk/doc/pcre.txt

revision 903 by ph10, Sat Jan 21 16:37:17 2012 UTC revision 930 by ph10, Fri Feb 24 12:05:54 2012 UTC
# Line 138  REVISION Line 138  REVISION
138         Last updated: 10 January 2012         Last updated: 10 January 2012
139         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
140  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
141    
142    
143  PCRE(3)                                                                PCRE(3)  PCRE(3)                                                                PCRE(3)
144    
145    
# Line 463  REVISION Line 463  REVISION
463         Last updated: 08 January 2012         Last updated: 08 January 2012
464         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
465  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
466    
467    
468  PCREBUILD(3)                                                      PCREBUILD(3)  PCREBUILD(3)                                                      PCREBUILD(3)
469    
470    
# Line 859  REVISION Line 859  REVISION
859         Last updated: 07 January 2012         Last updated: 07 January 2012
860         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
861  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
862    
863    
864  PCREMATCHING(3)                                                PCREMATCHING(3)  PCREMATCHING(3)                                                PCREMATCHING(3)
865    
866    
# Line 1066  REVISION Line 1066  REVISION
1066         Last updated: 08 January 2012         Last updated: 08 January 2012
1067         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
1068  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
1069    
1070    
1071  PCREAPI(3)                                                          PCREAPI(3)  PCREAPI(3)                                                          PCREAPI(3)
1072    
1073    
# Line 1511  COMPILING A PATTERN Line 1511  COMPILING A PATTERN
1511         different parts of the pattern, the contents of  the  options  argument         different parts of the pattern, the contents of  the  options  argument
1512         specify their settings at the start of compilation and execution. The         specify their settings at the start of compilation and execution. The
1513         PCRE_ANCHORED, PCRE_BSR_xxx, PCRE_NEWLINE_xxx, PCRE_NO_UTF8_CHECK,  and         PCRE_ANCHORED, PCRE_BSR_xxx, PCRE_NEWLINE_xxx, PCRE_NO_UTF8_CHECK,  and
1514         PCRE_NO_START_OPT options can be set at the time of matching as well as         PCRE_NO_START_OPTIMIZE  options  can  be set at the time of matching as
1515         at compile time.         well as at compile time.
1516    
1517         If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,         If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
1518         if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and         if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
# Line 1921  STUDYING A PATTERN Line 1921  STUDYING A PATTERN
1921         wants  to  pass  any  of   the   other   fields   to   pcre_exec()   or         wants  to  pass  any  of   the   other   fields   to   pcre_exec()   or
1922         pcre_dfa_exec(), it must set up its own pcre_extra block.         pcre_dfa_exec(), it must set up its own pcre_extra block.
1923    
1924         The second argument of pcre_study() contains option bits. There is only         The  second  argument  of  pcre_study() contains option bits. There are
1925         one option: PCRE_STUDY_JIT_COMPILE. If this is set,  and  the  just-in-         three options:
1926         time  compiler  is  available,  the  pattern  is  further compiled into  
1927         machine code that executes much faster than  the  pcre_exec()  matching           PCRE_STUDY_JIT_COMPILE
1928         function. If the just-in-time compiler is not available, this option is           PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
1929         ignored. All other bits in the options argument must be zero.           PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
1930    
1931           If any of these are set, and the just-in-time  compiler  is  available,
1932           the  pattern  is  further compiled into machine code that executes much
1933           faster than the pcre_exec()  interpretive  matching  function.  If  the
1934           just-in-time  compiler is not available, these options are ignored. All
1935           other bits in the options argument must be zero.
1936    
1937         JIT compilation is a heavyweight optimization. It can  take  some  time         JIT compilation is a heavyweight optimization. It can  take  some  time
1938         for  patterns  to  be analyzed, and for one-off matches and simple pat-         for  patterns  to  be analyzed, and for one-off matches and simple pat-
# Line 1947  STUDYING A PATTERN Line 1953  STUDYING A PATTERN
1953         the study data by calling pcre_free_study(). This function was added to         the study data by calling pcre_free_study(). This function was added to
1954         the  API  for  release  8.20. For earlier versions, the memory could be         the  API  for  release  8.20. For earlier versions, the memory could be
1955         freed with pcre_free(), just like the pattern itself. This  will  still         freed with pcre_free(), just like the pattern itself. This  will  still
1956         work  in  cases  where  PCRE_STUDY_JIT_COMPILE  is  not used, but it is         work  in  cases where JIT optimization is not used, but it is advisable
1957         advisable to change to the new function when convenient.         to change to the new function when convenient.
1958    
1959         This is a typical way in which pcre_study() is used (except that  in  a         This is a typical way in which pcre_study() is used (except that  in  a
1960         real application there should be tests for errors):         real application there should be tests for errors):
# Line 1981  STUDYING A PATTERN Line 1987  STUDYING A PATTERN
1987         which to start matching. (In 16-bit mode, the bitmap is used for 16-bit         which to start matching. (In 16-bit mode, the bitmap is used for 16-bit
1988         values less than 256.)         values less than 256.)
1989    
1990         These  two optimizations apply to both pcre_exec() and pcre_dfa_exec().         These  two optimizations apply to both pcre_exec() and pcre_dfa_exec(),
1991         However, they are not used by pcre_exec()  if  pcre_study()  is  called         and the information is also used by the JIT  compiler.   The  optimiza-
1992         with  the  PCRE_STUDY_JIT_COMPILE option, and just-in-time compiling is         tions can be disabled by setting the PCRE_NO_START_OPTIMIZE option when
1993         successful.  The  optimizations  can  be  disabled   by   setting   the         calling pcre_exec() or pcre_dfa_exec(), but if this is done, JIT execu-
1994         PCRE_NO_START_OPTIMIZE    option    when    calling    pcre_exec()   or         tion  is  also disabled. You might want to do this if your pattern con-
1995         pcre_dfa_exec(). You might want to do this  if  your  pattern  contains         tains callouts or (*MARK) and you want to make use of these  facilities
1996         callouts  or (*MARK) (which cannot be handled by the JIT compiler), and         in    cases    where    matching   fails.   See   the   discussion   of
1997         you want to make use of these facilities in cases where matching fails.         PCRE_NO_START_OPTIMIZE below.
        See the discussion of PCRE_NO_START_OPTIMIZE below.  
1998    
1999    
2000  LOCALE SUPPORT  LOCALE SUPPORT
2001    
2002         PCRE  handles  caseless matching, and determines whether characters are         PCRE handles caseless matching, and determines whether  characters  are
2003         letters, digits, or whatever, by reference to a set of tables,  indexed         letters,  digits, or whatever, by reference to a set of tables, indexed
2004         by  character  value.  When running in UTF-8 mode, this applies only to         by character value. When running in UTF-8 mode, this  applies  only  to
2005         characters with codes less than 128. By  default,  higher-valued  codes         characters  with  codes  less than 128. By default, higher-valued codes
2006         never match escapes such as \w or \d, but they can be tested with \p if         never match escapes such as \w or \d, but they can be tested with \p if
2007         PCRE is built with Unicode character property  support.  Alternatively,         PCRE  is  built with Unicode character property support. Alternatively,
2008         the  PCRE_UCP  option  can  be  set at compile time; this causes \w and         the PCRE_UCP option can be set at compile  time;  this  causes  \w  and
2009         friends to use Unicode property support instead of built-in tables. The         friends to use Unicode property support instead of built-in tables. The
2010         use of locales with Unicode is discouraged. If you are handling charac-         use of locales with Unicode is discouraged. If you are handling charac-
2011         ters with codes greater than 128, you should either use UTF-8 and  Uni-         ters  with codes greater than 128, you should either use UTF-8 and Uni-
2012         code, or use locales, but not try to mix the two.         code, or use locales, but not try to mix the two.
2013    
2014         PCRE  contains  an  internal set of tables that are used when the final         PCRE contains an internal set of tables that are used  when  the  final
2015         argument of pcre_compile() is  NULL.  These  are  sufficient  for  many         argument  of  pcre_compile()  is  NULL.  These  are sufficient for many
2016         applications.  Normally, the internal tables recognize only ASCII char-         applications.  Normally, the internal tables recognize only ASCII char-
2017         acters. However, when PCRE is built, it is possible to cause the inter-         acters. However, when PCRE is built, it is possible to cause the inter-
2018         nal tables to be rebuilt in the default "C" locale of the local system,         nal tables to be rebuilt in the default "C" locale of the local system,
2019         which may cause them to be different.         which may cause them to be different.
2020    
2021         The internal tables can always be overridden by tables supplied by  the         The  internal tables can always be overridden by tables supplied by the
2022         application that calls PCRE. These may be created in a different locale         application that calls PCRE. These may be created in a different locale
2023         from the default. As more and more applications change  to  using  Uni-         from  the  default.  As more and more applications change to using Uni-
2024         code, the need for this locale support is expected to die away.         code, the need for this locale support is expected to die away.
2025    
2026         External  tables  are  built by calling the pcre_maketables() function,         External tables are built by calling  the  pcre_maketables()  function,
2027         which has no arguments, in the relevant locale. The result can then  be         which  has no arguments, in the relevant locale. The result can then be
2028         passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For         passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
2029         example, to build and use tables that are appropriate  for  the  French         example,  to  build  and use tables that are appropriate for the French
2030         locale  (where  accented  characters  with  values greater than 128 are         locale (where accented characters with  values  greater  than  128  are
2031         treated as letters), the following code could be used:         treated as letters), the following code could be used:
2032    
2033           setlocale(LC_CTYPE, "fr_FR");           setlocale(LC_CTYPE, "fr_FR");
2034           tables = pcre_maketables();           tables = pcre_maketables();
2035           re = pcre_compile(..., tables);           re = pcre_compile(..., tables);
2036    
2037         The locale name "fr_FR" is used on Linux and other  Unix-like  systems;         The  locale  name "fr_FR" is used on Linux and other Unix-like systems;
2038         if you are using Windows, the name for the French locale is "french".         if you are using Windows, the name for the French locale is "french".
2039    
2040         When  pcre_maketables()  runs,  the  tables are built in memory that is         When pcre_maketables() runs, the tables are built  in  memory  that  is
2041         obtained via pcre_malloc. It is the caller's responsibility  to  ensure         obtained  via  pcre_malloc. It is the caller's responsibility to ensure
2042         that  the memory containing the tables remains available for as long as         that the memory containing the tables remains available for as long  as
2043         it is needed.         it is needed.
2044    
2045         The pointer that is passed to pcre_compile() is saved with the compiled         The pointer that is passed to pcre_compile() is saved with the compiled
2046         pattern,  and the same tables are used via this pointer by pcre_study()         pattern, and the same tables are used via this pointer by  pcre_study()
2047         and normally also by pcre_exec(). Thus, by default, for any single pat-         and normally also by pcre_exec(). Thus, by default, for any single pat-
2048         tern, compilation, studying and matching all happen in the same locale,         tern, compilation, studying and matching all happen in the same locale,
2049         but different patterns can be compiled in different locales.         but different patterns can be compiled in different locales.
2050    
2051         It is possible to pass a table pointer or NULL (indicating the  use  of         It  is  possible to pass a table pointer or NULL (indicating the use of
2052         the  internal  tables)  to  pcre_exec(). Although not intended for this         the internal tables) to pcre_exec(). Although  not  intended  for  this
2053         purpose, this facility could be used to match a pattern in a  different         purpose,  this facility could be used to match a pattern in a different
2054         locale from the one in which it was compiled. Passing table pointers at         locale from the one in which it was compiled. Passing table pointers at
2055         run time is discussed below in the section on matching a pattern.         run time is discussed below in the section on matching a pattern.
2056    
# Line 2055  INFORMATION ABOUT A PATTERN Line 2060  INFORMATION ABOUT A PATTERN
2060         int pcre_fullinfo(const pcre *code, const pcre_extra *extra,         int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
2061              int what, void *where);              int what, void *where);
2062    
2063         The pcre_fullinfo() function returns information about a compiled  pat-         The  pcre_fullinfo() function returns information about a compiled pat-
2064         tern.  It replaces the pcre_info() function, which was removed from the         tern. It replaces the pcre_info() function, which was removed from  the
2065         library at version 8.30, after more than 10 years of obsolescence.         library at version 8.30, after more than 10 years of obsolescence.
2066    
2067         The first argument for pcre_fullinfo() is a  pointer  to  the  compiled         The  first  argument  for  pcre_fullinfo() is a pointer to the compiled
2068         pattern.  The second argument is the result of pcre_study(), or NULL if         pattern. The second argument is the result of pcre_study(), or NULL  if
2069         the pattern was not studied. The third argument specifies  which  piece         the  pattern  was not studied. The third argument specifies which piece
2070         of  information  is required, and the fourth argument is a pointer to a         of information is required, and the fourth argument is a pointer  to  a
2071         variable to receive the data. The yield of the  function  is  zero  for         variable  to  receive  the  data. The yield of the function is zero for
2072         success, or one of the following negative numbers:         success, or one of the following negative numbers:
2073    
2074           PCRE_ERROR_NULL           the argument code was NULL           PCRE_ERROR_NULL           the argument code was NULL
# Line 2073  INFORMATION ABOUT A PATTERN Line 2078  INFORMATION ABOUT A PATTERN
2078                                     endianness                                     endianness
2079           PCRE_ERROR_BADOPTION      the value of what was invalid           PCRE_ERROR_BADOPTION      the value of what was invalid
2080    
2081         The  "magic  number" is placed at the start of each compiled pattern as         The "magic number" is placed at the start of each compiled  pattern  as
2082         a simple check against passing an arbitrary memory pointer. The         a simple check against passing an arbitrary memory pointer. The
2083         endianness error can occur if a compiled pattern is saved and reloaded on a         endianness error can occur if a compiled pattern is saved and reloaded on a
2084         different host. Here is a typical call of  pcre_fullinfo(),  to  obtain         different  host.  Here  is a typical call of pcre_fullinfo(), to obtain
2085         the length of the compiled pattern:         the length of the compiled pattern:
2086    
2087           int rc;           int rc;
# Line 2087  INFORMATION ABOUT A PATTERN Line 2092  INFORMATION ABOUT A PATTERN
2092             PCRE_INFO_SIZE,   /* what is required */             PCRE_INFO_SIZE,   /* what is required */
2093             &length);         /* where to put the data */             &length);         /* where to put the data */
2094    
2095         The  possible  values for the third argument are defined in pcre.h, and         The possible values for the third argument are defined in  pcre.h,  and
2096         are as follows:         are as follows:
2097    
2098           PCRE_INFO_BACKREFMAX           PCRE_INFO_BACKREFMAX
2099    
2100         Return the number of the highest back reference  in  the  pattern.  The         Return  the  number  of  the highest back reference in the pattern. The
2101         fourth  argument  should  point to an int variable. Zero is returned if         fourth argument should point to an int variable. Zero  is  returned  if
2102         there are no back references.         there are no back references.
2103    
2104           PCRE_INFO_CAPTURECOUNT           PCRE_INFO_CAPTURECOUNT
2105    
2106         Return the number of capturing subpatterns in the pattern.  The  fourth         Return  the  number of capturing subpatterns in the pattern. The fourth
2107         argument should point to an int variable.         argument should point to an int variable.
2108    
2109           PCRE_INFO_DEFAULT_TABLES           PCRE_INFO_DEFAULT_TABLES
2110    
2111         Return  a pointer to the internal default character tables within PCRE.         Return a pointer to the internal default character tables within  PCRE.
2112         The fourth argument should point to an unsigned char *  variable.  This         The  fourth  argument should point to an unsigned char * variable. This
2113         information call is provided for internal use by the pcre_study() func-         information call is provided for internal use by the pcre_study() func-
2114         tion. External callers can cause PCRE to use  its  internal  tables  by         tion.  External  callers  can  cause PCRE to use its internal tables by
2115         passing a NULL table pointer.         passing a NULL table pointer.
2116    
2117           PCRE_INFO_FIRSTBYTE           PCRE_INFO_FIRSTBYTE
2118    
2119         Return information about the first data unit of any matched string, for         Return information about the first data unit of any matched string, for
2120         a non-anchored pattern. (The name of this option refers  to  the  8-bit         a  non-anchored  pattern.  (The name of this option refers to the 8-bit
2121         library,  where data units are bytes.) The fourth argument should point         library, where data units are bytes.) The fourth argument should  point
2122         to an int variable.         to an int variable.
2123    
2124         If there is a fixed first value, for example, the  letter  "c"  from  a         If  there  is  a  fixed first value, for example, the letter "c" from a
2125         pattern  such  as (cat|cow|coyote), its value is returned. In the 8-bit         pattern such as (cat|cow|coyote), its value is returned. In  the  8-bit
2126         library, the value is always less than 256; in the 16-bit  library  the         library,  the  value is always less than 256; in the 16-bit library the
2127         value can be up to 0xffff.         value can be up to 0xffff.
2128    
2129         If there is no fixed first value, and if either         If there is no fixed first value, and if either
2130    
2131         (a)  the pattern was compiled with the PCRE_MULTILINE option, and every         (a) the pattern was compiled with the PCRE_MULTILINE option, and  every
2132         branch starts with "^", or         branch starts with "^", or
2133    
2134         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
2135         set (if it were set, the pattern would be anchored),         set (if it were set, the pattern would be anchored),
2136    
2137         -1  is  returned, indicating that the pattern matches only at the start         -1 is returned, indicating that the pattern matches only at  the  start
2138         of a subject string or after any newline within the  string.  Otherwise         of  a  subject string or after any newline within the string. Otherwise
2139         -2 is returned. For anchored patterns, -2 is returned.         -2 is returned. For anchored patterns, -2 is returned.
2140    
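       The three kinds of result described above for PCRE_INFO_FIRSTBYTE can
       be told apart with a small helper. This is only an illustrative sketch:
       the names first_unit_kind and classify_first_unit() are hypothetical
       and not part of the PCRE API. The argument is the int value that
       pcre_fullinfo() stored through its fourth argument.

```c
/* Hypothetical helper, not part of PCRE: classify the int value that
   pcre_fullinfo() stores for PCRE_INFO_FIRSTBYTE. */
enum first_unit_kind {
    FIRST_FIXED,          /* value >= 0: every match starts with this data unit */
    FIRST_AT_LINE_START,  /* -1: matches only at subject start or after a newline */
    FIRST_UNKNOWN         /* -2: no first-unit information, or pattern is anchored */
};

enum first_unit_kind classify_first_unit(int first)
{
    if (first >= 0)
        return FIRST_FIXED;
    if (first == -1)
        return FIRST_AT_LINE_START;
    return FIRST_UNKNOWN;
}
```

       For the pattern (cat|cow|coyote) mentioned above, the stored value
       would be the code of the letter "c", which this helper reports as
       FIRST_FIXED.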
2141           PCRE_INFO_FIRSTTABLE           PCRE_INFO_FIRSTTABLE
2142    
2143         If  the pattern was studied, and this resulted in the construction of a         If the pattern was studied, and this resulted in the construction of  a
2144         256-bit table indicating a fixed set of values for the first data  unit         256-bit  table indicating a fixed set of values for the first data unit
2145         in  any  matching string, a pointer to the table is returned. Otherwise         in any matching string, a pointer to the table is  returned.  Otherwise
2146         NULL is returned. The fourth argument should point to an unsigned  char         NULL  is returned. The fourth argument should point to an unsigned char
2147         * variable.         * variable.
2148    
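       A caller that obtains the 256-bit table could test whether a given
       data unit can start a match along the following lines. This is a
       sketch under an assumption that the text above does not state: that
       the value c is represented by bit (c & 7) of byte (c / 8) in the
       32-byte table. The function name first_unit_allowed() is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. ASSUMPTION: data unit value c
   is represented by bit (c & 7) of byte (c / 8) of the 32-byte table
   returned for PCRE_INFO_FIRSTTABLE. */
int first_unit_allowed(const unsigned char *table, unsigned char c)
{
    if (table == NULL)
        return 1;   /* no table was built, so nothing can be ruled out */
    return (table[c / 8] & (1u << (c & 7))) != 0;
}
```

       A NULL table is treated as "anything may start a match", because
       NULL only means that no such table was constructed.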
2149           PCRE_INFO_HASCRORLF           PCRE_INFO_HASCRORLF
2150    
2151         Return  1  if  the  pattern  contains any explicit matches for CR or LF         Return 1 if the pattern contains any explicit  matches  for  CR  or  LF
2152         characters, otherwise 0. The fourth argument should  point  to  an  int         characters,  otherwise  0.  The  fourth argument should point to an int
2153         variable.  An explicit match is either a literal CR or LF character, or         variable. An explicit match is either a literal CR or LF character,  or
2154         \r or \n.         \r or \n.
2155    
2156           PCRE_INFO_JCHANGED           PCRE_INFO_JCHANGED
2157    
2158         Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,         Return  1  if  the (?J) or (?-J) option setting is used in the pattern,
2159         otherwise  0. The fourth argument should point to an int variable. (?J)         otherwise 0. The fourth argument should point to an int variable.  (?J)
2160         and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.         and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.
2161    
2162           PCRE_INFO_JIT           PCRE_INFO_JIT
2163    
2164         Return 1 if the pattern was  studied  with  the  PCRE_STUDY_JIT_COMPILE         Return  1  if  the pattern was studied with one of the JIT options, and
2165         option,  and just-in-time compiling was successful. The fourth argument         just-in-time compiling was successful. The fourth argument should point
2166         should point to an int variable. A return value of  0  means  that  JIT         to  an  int variable. A return value of 0 means that JIT support is not
2167         support  is  not available in this version of PCRE, or that the pattern         available in this version of PCRE, or that the pattern was not  studied
2168         was not studied with the PCRE_STUDY_JIT_COMPILE option, or that the JIT         with  a JIT option, or that the JIT compiler could not handle this par-
2169         compiler could not handle this particular pattern. See the pcrejit doc-         ticular pattern. See the pcrejit documentation for details of what  can
2170         umentation for details of what can and cannot be handled.         and cannot be handled.
2171    
2172           PCRE_INFO_JITSIZE           PCRE_INFO_JITSIZE
2173    
2174         If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE         If  the  pattern was successfully studied with a JIT option, return the
2175         option,  return  the  size  of  the JIT compiled code, otherwise return         size of the JIT compiled code, otherwise return zero. The fourth  argu-
2176         zero. The fourth argument should point to a size_t variable.         ment should point to a size_t variable.
2177    
2178           PCRE_INFO_LASTLITERAL           PCRE_INFO_LASTLITERAL
2179    
2180         Return the value of the rightmost literal data unit that must exist  in         Return  the value of the rightmost literal data unit that must exist in
2181         any  matched  string, other than at its start, if such a value has been         any matched string, other than at its start, if such a value  has  been
2182         recorded. The fourth argument should point to an int variable. If there         recorded. The fourth argument should point to an int variable. If there
2183         is no such value, -1 is returned. For anchored patterns, a last literal         is no such value, -1 is returned. For anchored patterns, a last literal
2184         value is recorded only if it follows something of variable length.  For         value  is recorded only if it follows something of variable length. For
2185         example, for the pattern /^a\d+z\d+/ the returned value is "z", but for         example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
2186         /^a\dz\d/ the returned value is -1.         /^a\dz\d/ the returned value is -1.
2187    
2188           PCRE_INFO_MINLENGTH           PCRE_INFO_MINLENGTH
2189    
2190         If the pattern was studied and a minimum length  for  matching  subject         If  the  pattern  was studied and a minimum length for matching subject
2191         strings  was  computed,  its  value is returned. Otherwise the returned         strings was computed, its value is  returned.  Otherwise  the  returned
2192         value is -1. The value is a number of characters, which in  UTF-8  mode         value  is  -1. The value is a number of characters, which in UTF-8 mode
2193         may  be  different from the number of bytes. The fourth argument should         may be different from the number of bytes. The fourth  argument  should
2194         point to an int variable. A non-negative value is a lower bound to  the         point  to an int variable. A non-negative value is a lower bound to the
2195         length  of  any  matching  string. There may not be any strings of that         length of any matching string. There may not be  any  strings  of  that
2196         length that do actually match, but every string that does match  is  at         length  that  do actually match, but every string that does match is at
2197         least that long.         least that long.
2198    
2199           PCRE_INFO_NAMECOUNT           PCRE_INFO_NAMECOUNT
2200           PCRE_INFO_NAMEENTRYSIZE           PCRE_INFO_NAMEENTRYSIZE
2201           PCRE_INFO_NAMETABLE           PCRE_INFO_NAMETABLE
2202    
2203         PCRE  supports the use of named as well as numbered capturing parenthe-         PCRE supports the use of named as well as numbered capturing  parenthe-
2204         ses. The names are just an additional way of identifying the  parenthe-         ses.  The names are just an additional way of identifying the parenthe-
2205         ses, which still acquire numbers. Several convenience functions such as         ses, which still acquire numbers. Several convenience functions such as
2206         pcre_get_named_substring() are provided for  extracting  captured  sub-         pcre_get_named_substring()  are  provided  for extracting captured sub-
2207         strings  by  name. It is also possible to extract the data directly, by         strings by name. It is also possible to extract the data  directly,  by
2208         first converting the name to a number in order to  access  the  correct         first  converting  the  name to a number in order to access the correct
2209         pointers in the output vector (described with pcre_exec() below). To do         pointers in the output vector (described with pcre_exec() below). To do
2210         the conversion, you need  to  use  the  name-to-number  map,  which  is         the  conversion,  you  need  to  use  the  name-to-number map, which is
2211         described by these three values.         described by these three values.
2212    
2213         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
2214         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
2215         of  each  entry;  both  of  these  return  an int value. The entry size         of each entry; both of these  return  an  int  value.  The  entry  size
2216         depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns         depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns
2217         a pointer to the first entry of the table. This is a pointer to char in         a pointer to the first entry of the table. This is a pointer to char in
2218         the 8-bit library, where the first two bytes of each entry are the num-         the 8-bit library, where the first two bytes of each entry are the num-
2219         ber  of  the capturing parenthesis, most significant byte first. In the         ber of the capturing parenthesis, most significant byte first.  In  the
2220         16-bit library, the pointer points to 16-bit data units, the  first  of         16-bit  library,  the pointer points to 16-bit data units, the first of
2221         which  contains  the  parenthesis  number. The rest of the entry is the         which contains the parenthesis number. The rest of  the  entry  is  the
2222         corresponding name, zero terminated.         corresponding name, zero terminated.
2223    
2224         The names are in alphabetical order. Duplicate names may appear if  (?|         The  names are in alphabetical order. Duplicate names may appear if (?|
2225         is used to create multiple groups with the same number, as described in         is used to create multiple groups with the same number, as described in
2226         the section on duplicate subpattern numbers in  the  pcrepattern  page.         the  section  on  duplicate subpattern numbers in the pcrepattern page.
2227         Duplicate  names  for  subpatterns with different numbers are permitted         Duplicate names for subpatterns with different  numbers  are  permitted
2228         only if PCRE_DUPNAMES is set. In all cases  of  duplicate  names,  they         only  if  PCRE_DUPNAMES  is  set. In all cases of duplicate names, they
2229         appear  in  the table in the order in which they were found in the pat-         appear in the table in the order in which they were found in  the  pat-
2230         tern. In the absence of (?| this is the  order  of  increasing  number;         tern.  In  the  absence  of (?| this is the order of increasing number;
2231         when (?| is used this is not necessarily the case because later subpat-         when (?| is used this is not necessarily the case because later subpat-
2232         terns may have lower numbers.         terns may have lower numbers.
2233    
2234         As a simple example of the name/number table,  consider  the  following         As  a  simple  example of the name/number table, consider the following
2235         pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is         pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is
2236         set, so white space - including newlines - is ignored):         set, so white space - including newlines - is ignored):
2237    
2238           (?<date> (?<year>(\d\d)?\d\d) -           (?<date> (?<year>(\d\d)?\d\d) -
2239           (?<month>\d\d) - (?<day>\d\d) )           (?<month>\d\d) - (?<day>\d\d) )
2240    
2241         There are four named subpatterns, so the table has  four  entries,  and         There  are  four  named subpatterns, so the table has four entries, and
2242         each  entry  in the table is eight bytes long. The table is as follows,         each entry in the table is eight bytes long. The table is  as  follows,
2243         with non-printing bytes shown in hexadecimal, and undefined bytes shown         with non-printing bytes shown in hexadecimal, and undefined bytes shown
2244         as ??:         as ??:
2245    
# Line 2243  INFORMATION ABOUT A PATTERN Line 2248  INFORMATION ABOUT A PATTERN
2248           00 04 m  o  n  t  h  00           00 04 m  o  n  t  h  00
2249           00 02 y  e  a  r  00 ??           00 02 y  e  a  r  00 ??
2250    
2251         When  writing  code  to  extract  data from named subpatterns using the         When writing code to extract data  from  named  subpatterns  using  the
2252         name-to-number map, remember that the length of the entries  is  likely         name-to-number  map,  remember that the length of the entries is likely
2253         to be different for each compiled pattern.         to be different for each compiled pattern.
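
       The lookup described above can be sketched in C. This is a minimal
       illustration for the 8-bit library only, not code taken from PCRE.
       The function name lookup_group_number() and the hand-built
       example_table are assumptions made for this sketch. In a real
       program the table pointer, entry count, and entry size would come
       from PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and
       PCRE_INFO_NAMEENTRYSIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up a group name in an 8-bit name table laid out as described above:
   each entry is entry_size bytes long, the first two bytes hold the group
   number (most significant byte first), and the zero-terminated name
   follows. Returns the group number, or -1 if the name is not present.
   If a name is duplicated, the first entry found is returned. */
static int lookup_group_number(const unsigned char *table, int name_count,
                               int entry_size, const char *name)
{
    assert(table != NULL && name != NULL && entry_size > 2);
    for (int i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* Hypothetical hand-built table for the date pattern above: four 8-byte
   entries in alphabetical order. Trailing bytes are zero here; in a real
   table they are undefined and must not be relied on. */
static const unsigned char example_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0
};
```

       Because the entry size is passed in rather than hard-coded, the same
       function works for any compiled pattern, whatever the length of its
       longest name.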
2254    
2255           PCRE_INFO_OKPARTIAL           PCRE_INFO_OKPARTIAL
2256    
2257         Return  1  if  the  pattern  can  be  used  for  partial  matching with         Return 1  if  the  pattern  can  be  used  for  partial  matching  with
2258         pcre_exec(), otherwise 0. The fourth argument should point  to  an  int         pcre_exec(),  otherwise  0.  The fourth argument should point to an int
2259         variable.  From  release  8.00,  this  always  returns  1,  because the         variable. From  release  8.00,  this  always  returns  1,  because  the
2260         restrictions that previously applied  to  partial  matching  have  been         restrictions  that  previously  applied  to  partial matching have been
2261         lifted.  The  pcrepartial documentation gives details of partial match-         lifted. The pcrepartial documentation gives details of  partial  match-
2262         ing.         ing.
2263    
2264           PCRE_INFO_OPTIONS           PCRE_INFO_OPTIONS
2265    
2266         Return a copy of the options with which the pattern was  compiled.  The         Return  a  copy of the options with which the pattern was compiled. The
2267         fourth  argument  should  point to an unsigned long int variable. These         fourth argument should point to an unsigned long  int  variable.  These
2268         option bits are those specified in the call to pcre_compile(), modified         option bits are those specified in the call to pcre_compile(), modified
2269         by any top-level option settings at the start of the pattern itself. In         by any top-level option settings at the start of the pattern itself. In
2270         other words, they are the options that will be in force  when  matching         other  words,  they are the options that will be in force when matching
2271         starts.  For  example, if the pattern /(?im)abc(?-i)d/ is compiled with         starts. For example, if the pattern /(?im)abc(?-i)d/ is  compiled  with
2272         the PCRE_EXTENDED option, the result is PCRE_CASELESS,  PCRE_MULTILINE,         the  PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
2273         and PCRE_EXTENDED.         and PCRE_EXTENDED.
2274    
2275         A  pattern  is  automatically  anchored by PCRE if all of its top-level         A pattern is automatically anchored by PCRE if  all  of  its  top-level
2276         alternatives begin with one of the following:         alternatives begin with one of the following:
2277    
2278           ^     unless PCRE_MULTILINE is set           ^     unless PCRE_MULTILINE is set
# Line 2281  INFORMATION ABOUT A PATTERN Line 2286  INFORMATION ABOUT A PATTERN
2286    
2287           PCRE_INFO_SIZE           PCRE_INFO_SIZE
2288    
2289         Return  the size of the compiled pattern in bytes (for both libraries).         Return the size of the compiled pattern in bytes (for both  libraries).
2290         The fourth argument should point to a size_t variable. This value  does         The  fourth argument should point to a size_t variable. This value does
2291         not  include  the  size  of  the  pcre  structure  that  is returned by         not include the  size  of  the  pcre  structure  that  is  returned  by
2292         pcre_compile(). The value that is passed as the argument  to  pcre_mal-         pcre_compile().  The  value that is passed as the argument to pcre_mal-
2293         loc()  when pcre_compile() is getting memory in which to place the com-         loc() when pcre_compile() is getting memory in which to place the  com-
2294         piled data is the value returned by this option plus the  size  of  the         piled  data  is  the value returned by this option plus the size of the
2295         pcre  structure. Studying a compiled pattern, with or without JIT, does         pcre structure. Studying a compiled pattern, with or without JIT,  does
2296         not alter the value returned by this option.         not alter the value returned by this option.
2297    
2298           PCRE_INFO_STUDYSIZE           PCRE_INFO_STUDYSIZE
2299    
2300         Return the size in bytes of the data block pointed to by the study_data         Return the size in bytes of the data block pointed to by the study_data
2301         field  in  a  pcre_extra  block.  If pcre_extra is NULL, or there is no         field in a pcre_extra block. If pcre_extra is  NULL,  or  there  is  no
2302         study data, zero is returned. The fourth argument  should  point  to  a         study  data,  zero  is  returned. The fourth argument should point to a
2303         size_t  variable. The study_data field is set by pcre_study() to record         size_t variable. The study_data field is set by pcre_study() to  record
2304         information that will speed  up  matching  (see  the  section  entitled         information  that  will  speed  up  matching  (see the section entitled
2305         "Studying a pattern" above). The format of the study_data block is pri-         "Studying a pattern" above). The format of the study_data block is pri-
2306         vate, but its length is made available via this option so that  it  can         vate,  but  its length is made available via this option so that it can
2307         be  saved  and  restored  (see  the  pcreprecompile  documentation  for         be  saved  and  restored  (see  the  pcreprecompile  documentation  for
2308         details).         details).
2309    
# Line 2307  REFERENCE COUNTS Line 2312  REFERENCE COUNTS
2312    
2313         int pcre_refcount(pcre *code, int adjust);         int pcre_refcount(pcre *code, int adjust);
2314    
2315         The pcre_refcount() function is used to maintain a reference  count  in         The  pcre_refcount()  function is used to maintain a reference count in
2316         the data block that contains a compiled pattern. It is provided for the         the data block that contains a compiled pattern. It is provided for the
2317         benefit of applications that  operate  in  an  object-oriented  manner,         benefit  of  applications  that  operate  in an object-oriented manner,
2318         where different parts of the application may be using the same compiled         where different parts of the application may be using the same compiled
2319         pattern, but you want to free the block when they are all done.         pattern, but you want to free the block when they are all done.
2320    
2321         When a pattern is compiled, the reference count field is initialized to         When a pattern is compiled, the reference count field is initialized to
2322         zero.   It is changed only by calling this function, whose action is to         zero.  It is changed only by calling this function, whose action is  to
2323         add the adjust value (which may be positive or  negative)  to  it.  The         add  the  adjust  value  (which may be positive or negative) to it. The
2324         yield of the function is the new value. However, the value of the count         yield of the function is the new value. However, the value of the count
2325         is constrained to lie between 0 and 65535, inclusive. If the new  value         is  constrained to lie between 0 and 65535, inclusive. If the new value
2326         is outside these limits, it is forced to the appropriate limit value.         is outside these limits, it is forced to the appropriate limit value.
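
       The arithmetic just described can be modelled as follows. This is a
       sketch of the documented behaviour only, not the library's source;
       adjusted_refcount() is a hypothetical name used for illustration.

```c
#include <assert.h>

/* Model of the documented rule: add adjust to the current count, then
   force the result into the range 0..65535 and return the new value. */
static int adjusted_refcount(int current, int adjust)
{
    assert(current >= 0 && current <= 65535);
    long long next = (long long)current + adjust;  /* no int overflow */
    if (next < 0) next = 0;
    if (next > 65535) next = 65535;
    return (int)next;
}
```

       One consequence of the clamping is that a count that has been forced
       to a limit no longer reflects the true number of users, so an
       application that might exceed 65535 references cannot rely on the
       count reaching zero at the right moment.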
2327    
2328         Except  when it is zero, the reference count is not correctly preserved         Except when it is zero, the reference count is not correctly  preserved
2329         if a pattern is compiled on one host and then  transferred  to  a  host         if  a  pattern  is  compiled on one host and then transferred to a host
2330         whose byte-order is different. (This seems a highly unlikely scenario.)         whose byte-order is different. (This seems a highly unlikely scenario.)
2331    
2332    
# Line 2331  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2336  MATCHING A PATTERN: THE TRADITIONAL FUNC
2336              const char *subject, int length, int startoffset,              const char *subject, int length, int startoffset,
2337              int options, int *ovector, int ovecsize);              int options, int *ovector, int ovecsize);
2338    
2339         The  function pcre_exec() is called to match a subject string against a         The function pcre_exec() is called to match a subject string against  a
2340         compiled pattern, which is passed in the code argument. If the  pattern         compiled  pattern, which is passed in the code argument. If the pattern
2341         was  studied,  the  result  of  the study should be passed in the extra         was studied, the result of the study should  be  passed  in  the  extra
2342         argument. You can call pcre_exec() with the same code and  extra  argu-         argument.  You  can call pcre_exec() with the same code and extra argu-
2343         ments  as  many  times as you like, in order to match different subject         ments as many times as you like, in order to  match  different  subject
2344         strings with the same pattern.         strings with the same pattern.
2345    
2346         This function is the main matching facility  of  the  library,  and  it         This  function  is  the  main  matching facility of the library, and it
2347         operates  in  a  Perl-like  manner. For specialist use there is also an         operates in a Perl-like manner. For specialist use  there  is  also  an
2348         alternative matching function, which is described below in the  section         alternative  matching function, which is described below in the section
2349         about the pcre_dfa_exec() function.         about the pcre_dfa_exec() function.
2350    
2351         In  most applications, the pattern will have been compiled (and option-         In most applications, the pattern will have been compiled (and  option-
2352         ally studied) in the same process that calls pcre_exec().  However,  it         ally  studied)  in the same process that calls pcre_exec(). However, it
2353         is possible to save compiled patterns and study data, and then use them         is possible to save compiled patterns and study data, and then use them
2354         later in different processes, possibly even on different hosts.  For  a         later  in  different processes, possibly even on different hosts. For a
2355         discussion about this, see the pcreprecompile documentation.         discussion about this, see the pcreprecompile documentation.
2356    
2357         Here is an example of a simple call to pcre_exec():         Here is an example of a simple call to pcre_exec():
# Line 2365  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2370  MATCHING A PATTERN: THE TRADITIONAL FUNC
2370    
2371     Extra data for pcre_exec()     Extra data for pcre_exec()
2372    
2373         If  the  extra argument is not NULL, it must point to a pcre_extra data         If the extra argument is not NULL, it must point to a  pcre_extra  data
2374         block. The pcre_study() function returns such a block (when it  doesn't         block.  The pcre_study() function returns such a block (when it doesn't
2375         return  NULL), but you can also create one for yourself, and pass addi-         return NULL), but you can also create one for yourself, and pass  addi-
2376         tional information in it. The pcre_extra block contains  the  following         tional  information  in it. The pcre_extra block contains the following
2377         fields (not necessarily in this order):         fields (not necessarily in this order):
2378    
2379           unsigned long int flags;           unsigned long int flags;
# Line 2380  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2385  MATCHING A PATTERN: THE TRADITIONAL FUNC
2385           const unsigned char *tables;           const unsigned char *tables;
2386           unsigned char **mark;           unsigned char **mark;
2387    
2388         In  the  16-bit  version  of  this  structure,  the mark field has type         In the 16-bit version of  this  structure,  the  mark  field  has  type
2389         "PCRE_UCHAR16 **".         "PCRE_UCHAR16 **".
2390    
2391         The flags field is a bitmap that specifies which of  the  other  fields         The  flags  field is used to specify which of the other fields are set.
2392         are set. The flag bits are:         The flag bits are:
2393    
2394           PCRE_EXTRA_STUDY_DATA           PCRE_EXTRA_CALLOUT_DATA
2395           PCRE_EXTRA_EXECUTABLE_JIT           PCRE_EXTRA_EXECUTABLE_JIT
2396             PCRE_EXTRA_MARK
2397           PCRE_EXTRA_MATCH_LIMIT           PCRE_EXTRA_MATCH_LIMIT
2398           PCRE_EXTRA_MATCH_LIMIT_RECURSION           PCRE_EXTRA_MATCH_LIMIT_RECURSION
2399           PCRE_EXTRA_CALLOUT_DATA           PCRE_EXTRA_STUDY_DATA
2400           PCRE_EXTRA_TABLES           PCRE_EXTRA_TABLES
          PCRE_EXTRA_MARK  
2401    
2402         Other  flag  bits should be set to zero. The study_data field and some-         Other flag bits should be set to zero. The study_data field  and  some-
2403         times the executable_jit field are set in the pcre_extra block that  is         times  the executable_jit field are set in the pcre_extra block that is
2404         returned  by pcre_study(), together with the appropriate flag bits. You         returned by pcre_study(), together with the appropriate flag bits.  You
2405         should not set these yourself, but you may add to the block by  setting         should  not set these yourself, but you may add to the block by setting
2406         the other fields and their corresponding flag bits.         other fields and their corresponding flag bits.
2407    
2408         The match_limit field provides a means of preventing PCRE from using up         The match_limit field provides a means of preventing PCRE from using up
2409         a vast amount of resources when running patterns that are not going  to         a  vast amount of resources when running patterns that are not going to
2410         match,  but  which  have  a very large number of possibilities in their         match, but which have a very large number  of  possibilities  in  their
2411         search trees. The classic example is a pattern that uses nested  unlim-         search  trees. The classic example is a pattern that uses nested unlim-
2412         ited repeats.         ited repeats.
2413    
2414         Internally,  pcre_exec() uses a function called match(), which it calls         Internally, pcre_exec() uses a function called match(), which it  calls
2415         repeatedly (sometimes recursively). The limit  set  by  match_limit  is         repeatedly  (sometimes  recursively).  The  limit set by match_limit is
2416         imposed  on the number of times this function is called during a match,         imposed on the number of times this function is called during a  match,
2417         which has the effect of limiting the amount of  backtracking  that  can         which  has  the  effect of limiting the amount of backtracking that can
2418         take place. For patterns that are not anchored, the count restarts from         take place. For patterns that are not anchored, the count restarts from
2419         zero for each position in the subject string.         zero for each position in the subject string.
2420    
2421         When pcre_exec() is called with a pattern that was successfully studied         When pcre_exec() is called with a pattern that was successfully studied
2422         with  the  PCRE_STUDY_JIT_COMPILE  option, the way that the matching is         with a JIT option, the way that the matching is  executed  is  entirely
2423         executed is entirely different. However, there is still the possibility         different.  However, there is still the possibility of runaway matching
2424         of  runaway  matching  that  goes  on  for a very long time, and so the         that goes on for a very long time, and so the match_limit value is also
2425         match_limit value is also used in this case (but in a different way) to         used in this case (but in a different way) to limit how long the match-
2426         limit how long the matching can continue.         ing can continue.
2427    
2428         The  default  value  for  the  limit can be set when PCRE is built; the         The default value for the limit can be set  when  PCRE  is  built;  the
2429         default default is 10 million, which handles all but the  most  extreme         default  default  is 10 million, which handles all but the most extreme
2430         cases.  You  can  override  the default by supplying pcre_exec() with a         cases. You can override the default  by  supplying  pcre_exec() with  a
2431         pcre_extra    block    in    which    match_limit    is    set,     and         pcre_extra     block    in    which    match_limit    is    set,    and
2432         PCRE_EXTRA_MATCH_LIMIT  is  set  in  the  flags  field. If the limit is         PCRE_EXTRA_MATCH_LIMIT is set in the  flags  field.  If  the  limit  is
2433         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
2434    
2435         The match_limit_recursion field is similar to match_limit, but  instead         The  match_limit_recursion field is similar to match_limit, but instead
2436         of limiting the total number of times that match() is called, it limits         of limiting the total number of times that match() is called, it limits
2437         the depth of recursion. The recursion depth is a  smaller  number  than         the  depth  of  recursion. The recursion depth is a smaller number than
2438         the  total number of calls, because not all calls to match() are recur-         the total number of calls, because not all calls to match() are  recur-
2439         sive.  This limit is of use only if it is set smaller than match_limit.         sive.  This limit is of use only if it is set smaller than match_limit.
2440    
2441         Limiting the recursion depth limits the amount of  machine  stack  that         Limiting  the  recursion  depth limits the amount of machine stack that
2442         can  be used, or, when PCRE has been compiled to use memory on the heap         can be used, or, when PCRE has been compiled to use memory on the  heap
2443         instead of the stack, the amount of heap memory that can be used.  This         instead  of the stack, the amount of heap memory that can be used. This
2444         limit  is not relevant, and is ignored, if the pattern was successfully         limit is not relevant, and is ignored, when matching is done using  JIT
2445         studied with PCRE_STUDY_JIT_COMPILE.         compiled code.
2446    
2447         The default value for match_limit_recursion can be  set  when  PCRE  is         The  default  value  for  match_limit_recursion can be set when PCRE is
2448         built;  the  default  default  is  the  same  value  as the default for         built; the default default  is  the  same  value  as  the  default  for
2449         match_limit. You can override the default by supplying pcre_exec() with         match_limit. You can override the default by supplying pcre_exec() with
2450         a   pcre_extra   block  in  which  match_limit_recursion  is  set,  and         a  pcre_extra  block  in  which  match_limit_recursion  is   set,   and
2451         PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in  the  flags  field.  If  the         PCRE_EXTRA_MATCH_LIMIT_RECURSION  is  set  in  the  flags field. If the
2452         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
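
       The difference between the two limits can be seen in a toy
       backtracking matcher. The sketch below is not PCRE's match()
       function, and every name in it (toy_match, toy_run, the TOY_ codes)
       is invented for illustration. It handles only literal characters and
       the * quantifier, matches at the start of the subject, counts every
       call, and tracks how deeply the calls are nested.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_CALL_LIMIT = -1, TOY_DEPTH_LIMIT = -2 };

typedef struct {
    unsigned long calls;        /* total calls made so far */
    unsigned long call_limit;   /* plays the role of match_limit */
    unsigned long depth_limit;  /* plays the role of match_limit_recursion */
} toy_state;

/* Match the pattern re at the start of s. */
static int toy_match(const char *re, const char *s, toy_state *st,
                     unsigned long depth)
{
    if (++st->calls > st->call_limit) return TOY_CALL_LIMIT;
    if (depth > st->depth_limit) return TOY_DEPTH_LIMIT;
    if (re[0] == '\0') return TOY_MATCH;
    if (re[1] == '*') {
        /* Try zero, one, two, ... copies of re[0], backtracking each time. */
        const char *p = s;
        for (;;) {
            int rc = toy_match(re + 2, p, st, depth + 1);
            if (rc != TOY_NOMATCH) return rc;   /* match or limit reached */
            if (*p == '\0' || *p != re[0]) return TOY_NOMATCH;
            p++;
        }
    }
    if (*s != '\0' && *s == re[0])
        return toy_match(re + 1, s + 1, st, depth + 1);
    return TOY_NOMATCH;
}

/* Convenience wrapper: start with a fresh call counter at depth zero. */
static int toy_run(const char *re, const char *s,
                   unsigned long call_limit, unsigned long depth_limit)
{
    assert(re != NULL && s != NULL);
    toy_state st = { 0, call_limit, depth_limit };
    return toy_match(re, s, &st, 0);
}
```

       In this model a pattern such as a*a*a*b applied to a long run of
       "a" characters makes many calls at a shallow depth, so it is stopped
       by the call limit, whereas a long literal pattern makes few calls
       but nests them deeply, so it is stopped by the depth limit.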
2453    
2454         The  callout_data  field is used in conjunction with the "callout" fea-         The callout_data field is used in conjunction with the  "callout"  fea-
2455         ture, and is described in the pcrecallout documentation.         ture, and is described in the pcrecallout documentation.
2456    
2457         The tables field  is  used  to  pass  a  character  tables  pointer  to         The  tables  field  is  used  to  pass  a  character  tables pointer to
2458         pcre_exec();  this overrides the value that is stored with the compiled         pcre_exec(); this overrides the value that is stored with the  compiled
2459         pattern. A non-NULL value is stored with the compiled pattern  only  if         pattern.  A  non-NULL value is stored with the compiled pattern only if
2460         custom  tables  were  supplied to pcre_compile() via its tableptr argu-         custom tables were supplied to pcre_compile() via  its  tableptr  argu-
2461         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
2462         PCRE's  internal  tables  to be used. This facility is helpful when re-         PCRE's internal tables to be used. This facility is  helpful  when  re-
2463         using patterns that have been saved after compiling  with  an  external         using  patterns  that  have been saved after compiling with an external
2464         set  of  tables,  because  the  external tables might be at a different         set of tables, because the external tables  might  be  at  a  different
2465         address when pcre_exec() is called. See the  pcreprecompile  documenta-         address  when  pcre_exec() is called. See the pcreprecompile documenta-
2466         tion for a discussion of saving compiled patterns for later use.         tion for a discussion of saving compiled patterns for later use.
2467    
2468         If  PCRE_EXTRA_MARK  is  set in the flags field, the mark field must be         If PCRE_EXTRA_MARK is set in the flags field, the mark  field  must  be
2469         set to point to a suitable variable. If the pattern contains any  back-         set  to point to a suitable variable. If the pattern contains any back-
2470         tracking  control verbs such as (*MARK:NAME), and the execution ends up         tracking control verbs such as (*MARK:NAME), and the execution ends  up
2471         with a name to pass back, a pointer to the  name  string  (zero  termi-         with  a  name  to  pass back, a pointer to the name string (zero termi-
2472         nated)  is  placed  in  the  variable pointed to by the mark field. The         nated) is placed in the variable pointed to  by  the  mark  field.  The
2473         names are within the compiled pattern; if you wish  to  retain  such  a         names  are  within  the  compiled pattern; if you wish to retain such a
2474         name  you must copy it before freeing the memory of a compiled pattern.         name you must copy it before freeing the memory of a compiled  pattern.
2475         If there is no name to pass back, the variable pointed to by  the  mark         If  there  is no name to pass back, the variable pointed to by the mark
2476         field  is  set  to NULL. For details of the backtracking control verbs,         field is set to NULL. For details of the  backtracking  control  verbs,
2477         see the section entitled "Backtracking control" in the pcrepattern doc-         see the section entitled "Backtracking control" in the pcrepattern doc-
2478         umentation.         umentation.
2479    
2480     Option bits for pcre_exec()     Option bits for pcre_exec()
2481    
2482         The  unused  bits of the options argument for pcre_exec() must be zero.         The unused bits of the options argument for pcre_exec() must  be  zero.
2483         The only bits that may  be  set  are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,         The  only  bits  that  may  be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx,
2484         PCRE_NOTBOL,    PCRE_NOTEOL,    PCRE_NOTEMPTY,   PCRE_NOTEMPTY_ATSTART,         PCRE_NOTBOL,   PCRE_NOTEOL,    PCRE_NOTEMPTY,    PCRE_NOTEMPTY_ATSTART,
2485         PCRE_NO_START_OPTIMIZE,  PCRE_NO_UTF8_CHECK,   PCRE_PARTIAL_SOFT,   and         PCRE_NO_START_OPTIMIZE,   PCRE_NO_UTF8_CHECK,   PCRE_PARTIAL_HARD,  and
2486         PCRE_PARTIAL_HARD.         PCRE_PARTIAL_SOFT.
2487    
2488         If the pattern was successfully studied with the PCRE_STUDY_JIT_COMPILE         If the pattern was successfully studied with one  of  the  just-in-time
2489         option,  the   only   supported   options   for   JIT   execution   are         (JIT) compile options, the only supported options for JIT execution are
2490         PCRE_NO_UTF8_CHECK,   PCRE_NOTBOL,   PCRE_NOTEOL,   PCRE_NOTEMPTY,  and         PCRE_NO_UTF8_CHECK,    PCRE_NOTBOL,     PCRE_NOTEOL,     PCRE_NOTEMPTY,
2491         PCRE_NOTEMPTY_ATSTART. Note in particular that partial matching is  not         PCRE_NOTEMPTY_ATSTART,  PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an
2492         supported.  If an unsupported option is used, JIT execution is disabled         unsupported option is used, JIT execution is disabled  and  the  normal
2493         and the normal interpretive code in pcre_exec() is run.         interpretive code in pcre_exec() is run.
2494    
2495           PCRE_ANCHORED           PCRE_ANCHORED
2496    
2497         The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first         The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first
2498         matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or         matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or
2499         turned out to be anchored by virtue of its contents, it cannot be  made         turned  out to be anchored by virtue of its contents, it cannot be made
2500         unanchored at matching time.         unanchored at matching time.
2501    
2502           PCRE_BSR_ANYCRLF           PCRE_BSR_ANYCRLF
2503           PCRE_BSR_UNICODE           PCRE_BSR_UNICODE
2504    
2505         These options (which are mutually exclusive) control what the \R escape         These options (which are mutually exclusive) control what the \R escape
2506         sequence matches. The choice is either to match only CR, LF,  or  CRLF,         sequence  matches.  The choice is either to match only CR, LF, or CRLF,
2507         or  to  match  any Unicode newline sequence. These options override the         or to match any Unicode newline sequence. These  options  override  the
2508         choice that was made or defaulted when the pattern was compiled.         choice that was made or defaulted when the pattern was compiled.
2509    
2510           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
# Line 2508  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2513  MATCHING A PATTERN: THE TRADITIONAL FUNC
2513           PCRE_NEWLINE_ANYCRLF           PCRE_NEWLINE_ANYCRLF
2514           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
2515    
2516         These options override  the  newline  definition  that  was  chosen  or         These  options  override  the  newline  definition  that  was chosen or
2517         defaulted  when the pattern was compiled. For details, see the descrip-         defaulted when the pattern was compiled. For details, see the  descrip-
2518         tion of pcre_compile()  above.  During  matching,  the  newline  choice         tion  of  pcre_compile()  above.  During  matching,  the newline choice
2519         affects  the  behaviour  of the dot, circumflex, and dollar metacharac-         affects the behaviour of the dot, circumflex,  and  dollar  metacharac-
2520         ters. It may also alter the way the match position is advanced after  a         ters.  It may also alter the way the match position is advanced after a
2521         match failure for an unanchored pattern.         match failure for an unanchored pattern.
2522    
2523         When  PCRE_NEWLINE_CRLF,  PCRE_NEWLINE_ANYCRLF,  or PCRE_NEWLINE_ANY is         When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY  is
2524         set, and a match attempt for an unanchored pattern fails when the  cur-         set,  and a match attempt for an unanchored pattern fails when the cur-
2525         rent  position  is  at  a  CRLF  sequence,  and the pattern contains no         rent position is at a  CRLF  sequence,  and  the  pattern  contains  no
2526         explicit matches for  CR  or  LF  characters,  the  match  position  is         explicit  matches  for  CR  or  LF  characters,  the  match position is
2527         advanced by two characters instead of one, in other words, to after the         advanced by two characters instead of one, in other words, to after the
2528         CRLF.         CRLF.
2529    
2530         The above rule is a compromise that makes the most common cases work as         The above rule is a compromise that makes the most common cases work as
2531         expected.  For  example,  if  the  pattern  is .+A (and the PCRE_DOTALL         expected. For example, if the  pattern  is  .+A  (and  the  PCRE_DOTALL
2532         option is not set), it does not match the string "\r\nA" because, after         option is not set), it does not match the string "\r\nA" because, after
2533         failing  at the start, it skips both the CR and the LF before retrying.         failing at the start, it skips both the CR and the LF before  retrying.
2534         However, the pattern [\r\n]A does match that string,  because  it  con-         However,  the  pattern  [\r\n]A does match that string, because it con-
2535         tains an explicit CR or LF reference, and so advances only by one char-         tains an explicit CR or LF reference, and so advances only by one char-
2536         acter after the first failure.         acter after the first failure.
2537    
2538         An explicit match for CR or LF is either a literal appearance of one of         An explicit match for CR or LF is either a literal appearance of one of
2539         those  characters,  or  one  of the \r or \n escape sequences. Implicit         those characters, or one of the \r or  \n  escape  sequences.  Implicit
2540         matches such as [^X] do not count, nor does \s (which includes  CR  and         matches  such  as [^X] do not count, nor does \s (which includes CR and
2541         LF in the characters that it matches).         LF in the characters that it matches).
2542    
2543         Notwithstanding  the above, anomalous effects may still occur when CRLF         Notwithstanding the above, anomalous effects may still occur when  CRLF
2544         is a valid newline sequence and explicit \r or \n escapes appear in the         is a valid newline sequence and explicit \r or \n escapes appear in the
2545         pattern.         pattern.
2546    
2547           PCRE_NOTBOL           PCRE_NOTBOL
2548    
2549         This option specifies that the first character of the subject string is not         This option specifies that the first character of the subject string is not
2550         the beginning of a line, so the  circumflex  metacharacter  should  not         the  beginning  of  a  line, so the circumflex metacharacter should not
2551         match  before it. Setting this without PCRE_MULTILINE (at compile time)         match before it. Setting this without PCRE_MULTILINE (at compile  time)
2552         causes circumflex never to match. This option affects only  the  behav-         causes  circumflex  never to match. This option affects only the behav-
2553         iour of the circumflex metacharacter. It does not affect \A.         iour of the circumflex metacharacter. It does not affect \A.
2554    
2555           PCRE_NOTEOL           PCRE_NOTEOL
2556    
2557         This option specifies that the end of the subject string is not the end         This option specifies that the end of the subject string is not the end
2558         of a line, so the dollar metacharacter should not match it nor  (except         of  a line, so the dollar metacharacter should not match it nor (except
2559         in  multiline mode) a newline immediately before it. Setting this with-         in multiline mode) a newline immediately before it. Setting this  with-
2560         out PCRE_MULTILINE (at compile time) causes dollar never to match. This         out PCRE_MULTILINE (at compile time) causes dollar never to match. This
2561         option  affects only the behaviour of the dollar metacharacter. It does         option affects only the behaviour of the dollar metacharacter. It  does
2562         not affect \Z or \z.         not affect \Z or \z.
2563    
2564           PCRE_NOTEMPTY           PCRE_NOTEMPTY
2565    
2566         An empty string is not considered to be a valid match if this option is         An empty string is not considered to be a valid match if this option is
2567         set.  If  there are alternatives in the pattern, they are tried. If all         set. If there are alternatives in the pattern, they are tried.  If  all
2568         the alternatives match the empty string, the entire  match  fails.  For         the  alternatives  match  the empty string, the entire match fails. For
2569         example, if the pattern         example, if the pattern
2570    
2571           a?b?           a?b?
2572    
2573         is  applied  to  a  string not beginning with "a" or "b", it matches an         is applied to a string not beginning with "a" or  "b",  it  matches  an
2574         empty string at the start of the subject. With PCRE_NOTEMPTY set,  this         empty  string at the start of the subject. With PCRE_NOTEMPTY set, this
2575         match is not valid, so PCRE searches further into the string for occur-         match is not valid, so PCRE searches further into the string for occur-
2576         rences of "a" or "b".         rences of "a" or "b".
2577    
2578           PCRE_NOTEMPTY_ATSTART           PCRE_NOTEMPTY_ATSTART
2579    
2580         This is like PCRE_NOTEMPTY, except that an empty string match  that  is         This  is  like PCRE_NOTEMPTY, except that an empty string match that is
2581         not  at  the  start  of  the  subject  is  permitted. If the pattern is         not at the start of  the  subject  is  permitted.  If  the  pattern  is
2582         anchored, such a match can occur only if the pattern contains \K.         anchored, such a match can occur only if the pattern contains \K.
2583    
2584         Perl    has    no    direct    equivalent    of    PCRE_NOTEMPTY     or         Perl     has    no    direct    equivalent    of    PCRE_NOTEMPTY    or
2585         PCRE_NOTEMPTY_ATSTART,  but  it  does  make a special case of a pattern         PCRE_NOTEMPTY_ATSTART, but it does make a special  case  of  a  pattern
2586         match of the empty string within its split() function, and  when  using         match  of  the empty string within its split() function, and when using
2587         the  /g  modifier.  It  is  possible  to emulate Perl's behaviour after         the /g modifier. It is  possible  to  emulate  Perl's  behaviour  after
2588         matching a null string by first trying the match again at the same off-         matching a null string by first trying the match again at the same off-
2589         set  with  PCRE_NOTEMPTY_ATSTART  and  PCRE_ANCHORED,  and then if that         set with PCRE_NOTEMPTY_ATSTART and  PCRE_ANCHORED,  and  then  if  that
2590         fails, by advancing the starting offset (see below) and trying an ordi-         fails, by advancing the starting offset (see below) and trying an ordi-
2591         nary  match  again. There is some code that demonstrates how to do this         nary match again. There is some code that demonstrates how to  do  this
2592         in the pcredemo sample program. In the most general case, you  have  to         in  the  pcredemo sample program. In the most general case, you have to
2593         check  to  see  if the newline convention recognizes CRLF as a newline,         check to see if the newline convention recognizes CRLF  as  a  newline,
2594         and if so, and the current character is CR followed by LF, advance  the         and  if so, and the current character is CR followed by LF, advance the
2595         starting offset by two characters instead of one.         starting offset by two characters instead of one.
2596    
2597           PCRE_NO_START_OPTIMIZE           PCRE_NO_START_OPTIMIZE
2598    
2599         There  are a number of optimizations that pcre_exec() uses at the start         There are a number of optimizations that pcre_exec() uses at the  start
2600         of a match, in order to speed up the process. For  example,  if  it  is         of  a  match,  in  order to speed up the process. For example, if it is
2601         known that an unanchored match must start with a specific character, it         known that an unanchored match must start with a specific character, it
2602         searches the subject for that character, and fails  immediately  if  it         searches  the  subject  for that character, and fails immediately if it
2603         cannot  find  it,  without actually running the main matching function.         cannot find it, without actually running the  main  matching  function.
2604         This means that a special item such as (*COMMIT) at the start of a pat-         This means that a special item such as (*COMMIT) at the start of a pat-
2605         tern  is  not  considered until after a suitable starting point for the         tern is not considered until after a suitable starting  point  for  the
2606         match has been found. When callouts or (*MARK) items are in use,  these         match  has been found. When callouts or (*MARK) items are in use, these
2607         "start-up" optimizations can cause them to be skipped if the pattern is         "start-up" optimizations can cause them to be skipped if the pattern is
2608         never actually used. The start-up optimizations are in  effect  a  pre-         never  actually  used.  The start-up optimizations are in effect a pre-
2609         scan of the subject that takes place before the pattern is run.         scan of the subject that takes place before the pattern is run.
2610    
2611         The  PCRE_NO_START_OPTIMIZE option disables the start-up optimizations,         The PCRE_NO_START_OPTIMIZE option disables the start-up  optimizations,
2612         possibly causing performance to suffer,  but  ensuring  that  in  cases         possibly  causing  performance  to  suffer,  but ensuring that in cases
2613         where  the  result is "no match", the callouts do occur, and that items         where the result is "no match", the callouts do occur, and  that  items
2614         such as (*COMMIT) and (*MARK) are considered at every possible starting         such as (*COMMIT) and (*MARK) are considered at every possible starting
2615         position  in  the  subject  string. If PCRE_NO_START_OPTIMIZE is set at         position in the subject string. If  PCRE_NO_START_OPTIMIZE  is  set  at
2616         compile time, it cannot be unset at matching time.         compile  time,  it  cannot  be  unset  at  matching  time.  The  use of
2617           PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set, matching
2618           is always done using interpretive code.
2619    
2620         Setting PCRE_NO_START_OPTIMIZE can change the  outcome  of  a  matching         Setting  PCRE_NO_START_OPTIMIZE  can  change  the outcome of a matching
2621         operation.  Consider the pattern         operation.  Consider the pattern
2622    
2623           (*COMMIT)ABC           (*COMMIT)ABC
2624    
2625         When  this  is  compiled, PCRE records the fact that a match must start         When this is compiled, PCRE records the fact that a  match  must  start
2626         with the character "A". Suppose the subject  string  is  "DEFABC".  The         with  the  character  "A".  Suppose the subject string is "DEFABC". The
2627         start-up  optimization  scans along the subject, finds "A" and runs the         start-up optimization scans along the subject, finds "A" and  runs  the
2628         first match attempt from there. The (*COMMIT) item means that the  pat-         first  match attempt from there. The (*COMMIT) item means that the pat-
2629         tern  must  match the current starting position, which in this case, it         tern must match the current starting position, which in this  case,  it
2630         does. However, if the same match  is  run  with  PCRE_NO_START_OPTIMIZE         does.  However,  if  the  same match is run with PCRE_NO_START_OPTIMIZE
2631         set,  the  initial  scan  along the subject string does not happen. The         set, the initial scan along the subject string  does  not  happen.  The
2632         first match attempt is run starting  from  "D"  and  when  this  fails,         first  match  attempt  is  run  starting  from "D" and when this fails,
2633         (*COMMIT)  prevents  any  further  matches  being tried, so the overall         (*COMMIT) prevents any further matches  being  tried,  so  the  overall
2634         result is "no match". If the pattern is studied,  more  start-up  opti-         result  is  "no  match". If the pattern is studied, more start-up opti-
2635         mizations  may  be  used. For example, a minimum length for the subject         mizations may be used. For example, a minimum length  for  the  subject
2636         may be recorded. Consider the pattern         may be recorded. Consider the pattern
2637    
2638           (*MARK:A)(X|Y)           (*MARK:A)(X|Y)
2639    
2640         The minimum length for a match is one  character.  If  the  subject  is         The  minimum  length  for  a  match is one character. If the subject is
2641         "ABC",  there  will  be  attempts  to  match "ABC", "BC", "C", and then         "ABC", there will be attempts to  match  "ABC",  "BC",  "C",  and  then
2642         finally an empty string.  If the pattern is studied, the final  attempt         finally  an empty string.  If the pattern is studied, the final attempt
2643         does  not take place, because PCRE knows that the subject is too short,         does not take place, because PCRE knows that the subject is too  short,
2644         and so the (*MARK) is never encountered.  In this  case,  studying  the         and  so  the  (*MARK) is never encountered.  In this case, studying the
2645         pattern  does  not  affect the overall match result, which is still "no         pattern does not affect the overall match result, which  is  still  "no
2646         match", but it does affect the auxiliary information that is returned.         match", but it does affect the auxiliary information that is returned.
2647    
2648           PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
2649    
2650         When PCRE_UTF8 is set at compile time, the validity of the subject as a         When PCRE_UTF8 is set at compile time, the validity of the subject as a
2651         UTF-8  string is automatically checked when pcre_exec() is subsequently         UTF-8 string is automatically checked when pcre_exec() is  subsequently
2652         called.  The value of startoffset is also checked  to  ensure  that  it         called.   The  value  of  startoffset is also checked to ensure that it
2653         points  to  the start of a UTF-8 character. There is a discussion about         points to the start of a UTF-8 character. There is a  discussion  about
2654         the validity of UTF-8 strings in the pcreunicode page.  If  an  invalid         the  validity  of  UTF-8 strings in the pcreunicode page. If an invalid
2655         sequence   of   bytes   is   found,   pcre_exec()   returns  the  error         sequence  of  bytes   is   found,   pcre_exec()   returns   the   error
2656         PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a         PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a
2657         truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In         truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In
2658         both cases, information about the precise nature of the error may  also         both  cases, information about the precise nature of the error may also
2659         be  returned (see the descriptions of these errors in the section enti-         be returned (see the descriptions of these errors in the section  enti-
2660         tled Error return values from pcre_exec() below).  If startoffset  con-         tled  Error return values from pcre_exec() below).  If startoffset con-
2661         tains a value that does not point to the start of a UTF-8 character (or         tains a value that does not point to the start of a UTF-8 character (or
2662         to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is returned.         to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is returned.
2663    
2664         If you already know that your subject is valid, and you  want  to  skip         If  you  already  know that your subject is valid, and you want to skip
2665         these    checks    for   performance   reasons,   you   can   set   the         these   checks   for   performance   reasons,   you   can    set    the
2666         PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might  want  to         PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
2667         do  this  for the second and subsequent calls to pcre_exec() if you are         do this for the second and subsequent calls to pcre_exec() if  you  are
2668         making repeated calls to find all  the  matches  in  a  single  subject         making  repeated  calls  to  find  all  the matches in a single subject
2669         string.  However,  you  should  be  sure  that the value of startoffset         string. However, you should be  sure  that  the  value  of  startoffset
2670         points to the start of a character (or the end of  the  subject).  When         points  to  the  start of a character (or the end of the subject). When
2671         PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a         PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a
2672         subject or an invalid value of startoffset is undefined.  Your  program         subject  or  an invalid value of startoffset is undefined. Your program
2673         may crash.         may crash.
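The precondition on startoffset described above can be checked cheaply before PCRE_NO_UTF8_CHECK is passed. The following is a minimal sketch, not part of the PCRE API: the helper name offset_is_char_start is hypothetical, and it tests only that the offset does not land on a UTF-8 continuation byte (10xxxxxx). It does not validate the subject as a whole.

```c
#include <stddef.h>

/* Hypothetical helper (not a PCRE function). Returns 1 if `offset` is either
   the end of the subject or does not point at a UTF-8 continuation byte
   (bit pattern 10xxxxxx); returns 0 otherwise. The subject itself is assumed
   to be valid UTF-8 already; this does not check that. */
static int offset_is_char_start(const char *subject, int length, int offset)
{
    if (offset < 0 || offset > length)
        return 0;                       /* outside the subject */
    if (offset == length)
        return 1;                       /* end of subject is acceptable */
    return (((unsigned char)subject[offset]) & 0xC0) != 0x80;
}
```

A caller might run this test on each new startoffset in a loop that finds all the matches, and fall back to a checked call (without PCRE_NO_UTF8_CHECK) when it returns 0.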
2674    
2675           PCRE_PARTIAL_HARD           PCRE_PARTIAL_HARD
2676           PCRE_PARTIAL_SOFT           PCRE_PARTIAL_SOFT
2677    
2678         These  options turn on the partial matching feature. For backwards com-         These options turn on the partial matching feature. For backwards  com-
2679         patibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A  partial         patibility,  PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial
2680         match  occurs if the end of the subject string is reached successfully,         match occurs if the end of the subject string is reached  successfully,
2681         but there are not enough subject characters to complete the  match.  If         but  there  are not enough subject characters to complete the match. If
2682         this happens when PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set,         this happens when PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set,
2683         matching continues by testing any remaining alternatives.  Only  if  no         matching  continues  by  testing any remaining alternatives. Only if no
2684         complete  match  can be found is PCRE_ERROR_PARTIAL returned instead of         complete match can be found is PCRE_ERROR_PARTIAL returned  instead  of
2685         PCRE_ERROR_NOMATCH. In other words,  PCRE_PARTIAL_SOFT  says  that  the         PCRE_ERROR_NOMATCH.  In  other  words,  PCRE_PARTIAL_SOFT says that the
2686         caller  is  prepared to handle a partial match, but only if no complete         caller is prepared to handle a partial match, but only if  no  complete
2687         match can be found.         match can be found.
2688    
2689         If PCRE_PARTIAL_HARD is set, it overrides  PCRE_PARTIAL_SOFT.  In  this         If  PCRE_PARTIAL_HARD  is  set, it overrides PCRE_PARTIAL_SOFT. In this
2690         case,  if  a  partial  match  is found, pcre_exec() immediately returns         case, if a partial match  is  found,  pcre_exec()  immediately  returns
2691         PCRE_ERROR_PARTIAL, without  considering  any  other  alternatives.  In         PCRE_ERROR_PARTIAL,  without  considering  any  other  alternatives. In
2692         other  words, when PCRE_PARTIAL_HARD is set, a partial match is         other words, when PCRE_PARTIAL_HARD is set, a partial match is
2693         considered to be more important than an alternative complete match.         considered to be more important than an alternative complete match.
2694    
2695         In both cases, the portion of the string that was  inspected  when  the         In  both  cases,  the portion of the string that was inspected when the
2696         partial match was found is set as the first matching string. There is a         partial match was found is set as the first matching string. There is a
2697         more detailed discussion of partial and  multi-segment  matching,  with         more  detailed  discussion  of partial and multi-segment matching, with
2698         examples, in the pcrepartial documentation.         examples, in the pcrepartial documentation.
2699    
2700     The string to be matched by pcre_exec()     The string to be matched by pcre_exec()
2701    
2702         The  subject string is passed to pcre_exec() as a pointer in subject, a         The subject string is passed to pcre_exec() as a pointer in subject,  a
2703         length in bytes in length, and a starting byte offset  in  startoffset.         length  in  bytes in length, and a starting byte offset in startoffset.
2704         If  this  is  negative  or  greater  than  the  length  of the subject,         If this is  negative  or  greater  than  the  length  of  the  subject,
2705         pcre_exec() returns PCRE_ERROR_BADOFFSET. When the starting  offset  is         pcre_exec()  returns  PCRE_ERROR_BADOFFSET. When the starting offset is
2706         zero,  the  search  for a match starts at the beginning of the subject,         zero, the search for a match starts at the beginning  of  the  subject,
2707         and this is by far the most common case. In UTF-8 mode, the byte offset         and this is by far the most common case. In UTF-8 mode, the byte offset
2708         must  point  to  the start of a UTF-8 character (or the end of the sub-         must point to the start of a UTF-8 character (or the end  of  the  sub-
2709         ject). Unlike the pattern string, the subject may contain  binary  zero         ject).  Unlike  the pattern string, the subject may contain binary zero
2710         bytes.         bytes.
2711    
2712         A  non-zero  starting offset is useful when searching for another match         A non-zero starting offset is useful when searching for  another  match
2713         in the same subject by calling pcre_exec() again after a previous  suc-         in  the same subject by calling pcre_exec() again after a previous suc-
2714         cess.   Setting  startoffset differs from just passing over a shortened         cess.  Setting startoffset differs from just passing over  a  shortened
2715         string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins         string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
2716         with any kind of lookbehind. For example, consider the pattern         with any kind of lookbehind. For example, consider the pattern
2717    
2718           \Biss\B           \Biss\B
2719    
2720         which  finds  occurrences  of "iss" in the middle of words. (\B matches         which finds occurrences of "iss" in the middle of  words.  (\B  matches
2721         only if the current position in the subject is not  a  word  boundary.)         only  if  the  current position in the subject is not a word boundary.)
2722         When  applied  to the string "Mississippi" the first call to pcre_exec()         When applied to the string "Mississippi" the first call  to  pcre_exec()
2723         finds the first occurrence. If pcre_exec() is called  again  with  just         finds  the  first  occurrence. If pcre_exec() is called again with just
2724         the  remainder  of  the  subject,  namely  "issippi", it does not match,         the remainder of the subject,  namely  "issippi",  it  does  not  match,
2725         because \B is always false at the start of the subject, which is deemed         because \B is always false at the start of the subject, which is deemed
2726         to  be  a  word  boundary. However, if pcre_exec() is passed the entire         to be a word boundary. However, if pcre_exec()  is  passed  the  entire
2727         string again, but with startoffset set to 4, it finds the second occur-         string again, but with startoffset set to 4, it finds the second occur-
2728         rence  of "iss" because it is able to look behind the starting point to         rence of "iss" because it is able to look behind the starting point  to
2729         discover that it is preceded by a letter.         discover that it is preceded by a letter.
2730    
2731         Finding all the matches in a subject is tricky  when  the  pattern  can         Finding  all  the  matches  in a subject is tricky when the pattern can
2732         match an empty string. It is possible to emulate Perl's /g behaviour by         match an empty string. It is possible to emulate Perl's /g behaviour by
2733         first  trying  the  match  again  at  the   same   offset,   with   the         first   trying   the   match   again  at  the  same  offset,  with  the
2734         PCRE_NOTEMPTY_ATSTART  and  PCRE_ANCHORED  options,  and  then  if that         PCRE_NOTEMPTY_ATSTART and  PCRE_ANCHORED  options,  and  then  if  that
2735         fails, advancing the starting  offset  and  trying  an  ordinary  match         fails,  advancing  the  starting  offset  and  trying an ordinary match
2736         again. There is some code that demonstrates how to do this in the pcre-         again. There is some code that demonstrates how to do this in the pcre-
2737         demo sample program. In the most general case, you have to check to see         demo sample program. In the most general case, you have to check to see
2738         if  the newline convention recognizes CRLF as a newline, and if so, and         if the newline convention recognizes CRLF as a newline, and if so,  and
2739         the current character is CR followed by LF, advance the starting offset         the current character is CR followed by LF, advance the starting offset
2740         by two characters instead of one.         by two characters instead of one.
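The advance step described above (taken after the retry with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED has failed) might be sketched as follows. This is an illustration under stated assumptions, not the code from pcredemo: the function name and the crlf_is_newline flag are hypothetical, the caller is assumed to have worked out whether the newline convention recognizes CRLF, and the subject is assumed not to be in UTF-8 mode, so one character is one byte.

```c
#include <stddef.h>

/* Hypothetical helper (not a PCRE function). Given the offset at which an
   empty match was found, returns the offset at which to try an ordinary
   match next, or -1 if the end of the subject has been reached.
   `crlf_is_newline` is non-zero when the newline convention in force
   recognizes CRLF as a newline. Assumes one byte per character. */
static int advance_after_empty_match(const char *subject, int length,
                                     int offset, int crlf_is_newline)
{
    if (offset >= length)
        return -1;                      /* nothing left to search */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;              /* step over the whole CRLF */
    return offset + 1;                  /* ordinary one-character advance */
}
```

In UTF-8 mode a character may occupy several bytes, so a real loop would also have to step over continuation bytes; this sketch leaves that out.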
2741    
2742         If  a  non-zero starting offset is passed when the pattern is anchored,         If a non-zero starting offset is passed when the pattern  is  anchored,
2743         one attempt to match at the given offset is made. This can only succeed         one attempt to match at the given offset is made. This can only succeed
2744         if  the  pattern  does  not require the match to be at the start of the         if the pattern does not require the match to be at  the  start  of  the
2745         subject.         subject.
2746    
2747     How pcre_exec() returns captured substrings     How pcre_exec() returns captured substrings
2748    
2749         In general, a pattern matches a certain portion of the subject, and  in         In  general, a pattern matches a certain portion of the subject, and in
2750         addition,  further  substrings  from  the  subject may be picked out by         addition, further substrings from the subject  may  be  picked  out  by
2751         parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,         parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,
2752         this  is  called "capturing" in what follows, and the phrase "capturing         this is called "capturing" in what follows, and the  phrase  "capturing
2753         subpattern" is used for a fragment of a pattern that picks out  a  sub-         subpattern"  is  used for a fragment of a pattern that picks out a sub-
2754         string.  PCRE  supports several other kinds of parenthesized subpattern         string. PCRE supports several other kinds of  parenthesized  subpattern
2755         that do not cause substrings to be captured.         that do not cause substrings to be captured.
2756    
2757         Captured substrings are returned to the caller via a vector of integers         Captured substrings are returned to the caller via a vector of integers
2758         whose  address is passed in ovector. The number of elements in the vec-         whose address is passed in ovector. The number of elements in the  vec-
2759         tor is passed in ovecsize, which must be a non-negative  number.  Note:         tor  is  passed in ovecsize, which must be a non-negative number. Note:
2760         this argument is NOT the size of ovector in bytes.         this argument is NOT the size of ovector in bytes.
2761    
2762         The  first  two-thirds of the vector is used to pass back captured sub-         The first two-thirds of the vector is used to pass back  captured  sub-
2763         strings, each substring using a pair of integers. The  remaining  third         strings,  each  substring using a pair of integers. The remaining third
2764         of  the  vector is used as workspace by pcre_exec() while matching cap-         of the vector is used as workspace by pcre_exec() while  matching  cap-
2765         turing subpatterns, and is not available for passing back  information.         turing  subpatterns, and is not available for passing back information.
2766         The  number passed in ovecsize should always be a multiple of three. If         The number passed in ovecsize should always be a multiple of three.  If
2767         it is not, it is rounded down.         it is not, it is rounded down.
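The arithmetic behind the two-thirds rule can be made concrete. The helpers below are hypothetical and only restate the paragraph above: one third of the vector is workspace, and each returned substring needs a pair of integers. The sizing function assumes the caller wants room for the whole match plus capture_count capturing subpatterns.

```c
/* Hypothetical helpers (not PCRE functions). */

/* Number of offset pairs that pcre_exec() can pass back for a vector of
   `ovecsize` integers: ovecsize is rounded down to a multiple of three, and
   two-thirds of that, counted in pairs, is one pair per three integers. */
static int usable_pairs(int ovecsize)
{
    return ovecsize / 3;
}

/* Smallest ovecsize with room for the whole-match pair plus one pair for
   each of `capture_count` capturing subpatterns. */
static int ovecsize_for_captures(int capture_count)
{
    return (capture_count + 1) * 3;
}
```

For example, a vector of 30 integers has room for 10 pairs (the whole match and 9 captured substrings), and a vector of 31 integers has room for no more than that.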
2768    
2769         When a match is successful, information about  captured  substrings  is         When  a  match  is successful, information about captured substrings is
2770         returned  in  pairs  of integers, starting at the beginning of ovector,         returned in pairs of integers, starting at the  beginning  of  ovector,
2771         and continuing up to two-thirds of its length at the  most.  The  first         and  continuing  up  to two-thirds of its length at the most. The first
2772         element  of  each pair is set to the byte offset of the first character         element of each pair is set to the byte offset of the  first  character
2773         in a substring, and the second is set to the byte offset of  the  first         in  a  substring, and the second is set to the byte offset of the first
2774         character  after  the end of a substring. Note: these values are always         character after the end of a substring. Note: these values  are  always
2775         byte offsets, even in UTF-8 mode. They are not character counts.         byte offsets, even in UTF-8 mode. They are not character counts.
2776    
2777         The first pair of integers, ovector[0]  and  ovector[1],  identify  the         The  first  pair  of  integers, ovector[0] and ovector[1], identify the
2778         portion  of  the subject string matched by the entire pattern. The next         portion of the subject string matched by the entire pattern.  The  next
2779         pair is used for the first capturing subpattern, and so on.  The  value         pair  is  used for the first capturing subpattern, and so on. The value
2780         returned by pcre_exec() is one more than the highest numbered pair that         returned by pcre_exec() is one more than the highest numbered pair that
2781         has been set.  For example, if two substrings have been  captured,  the         has  been  set.  For example, if two substrings have been captured, the
2782         returned  value is 3. If there are no capturing subpatterns, the return         returned value is 3. If there are no capturing subpatterns, the  return
2783         value from a successful match is 1, indicating that just the first pair         value from a successful match is 1, indicating that just the first pair
2784         of offsets has been set.         of offsets has been set.
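The offset pairs described above can be turned into ordinary C strings with a few lines of code. The following is a minimal sketch, not part of PCRE: copy_capture() is a hypothetical helper, and the ovector contents in the usage note are filled in by hand according to the rules in the text rather than produced by a real call to pcre_exec().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): copy capture `group` out of
 * `subject` using the byte offsets stored in `ovector`. `rc` is the positive
 * value returned by a successful match, that is, one more than the highest
 * numbered pair that has been set. Returns the substring length, or -1 if
 * the group is out of range, unset, or does not fit in `buf`. */
static int copy_capture(const char *subject, const int *ovector, int rc,
                        int group, char *buf, size_t bufsize)
{
    int start, end;
    size_t len;

    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (group < 0 || group >= rc)
        return -1;
    start = ovector[2 * group];      /* byte offset of the first character */
    end = ovector[2 * group + 1];    /* byte offset just after the last one */
    if (start < 0 || end < start)
        return -1;
    len = (size_t)(end - start);
    if (len + 1 > bufsize)
        return -1;
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

For a subject "abc" in which the whole match covers bytes 0 to 3, the first capture covers 0 to 1 and the second covers 1 to 3, the return value would be 3, and copy_capture(subject, ovector, 3, 2, buf, sizeof buf) would place "bc" in buf. Because the offsets are byte offsets, the same arithmetic applies unchanged in UTF-8 mode.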
2785    
2786         If a capturing subpattern is matched repeatedly, it is the last portion         If a capturing subpattern is matched repeatedly, it is the last portion
2787         of the string that it matched that is returned.         of the string that it matched that is returned.
2788    
2789         If the vector is too small to hold all the captured substring  offsets,         If  the vector is too small to hold all the captured substring offsets,
2790         it is used as far as possible (up to two-thirds of its length), and the         it is used as far as possible (up to two-thirds of its length), and the
2791         function returns a value of zero. If neither the actual string  matched         function  returns a value of zero. If neither the actual string matched
2792         nor  any captured substrings are of interest, pcre_exec() may be called         nor any captured substrings are of interest, pcre_exec() may be  called
2793         with ovector passed as NULL and ovecsize as zero. However, if the  pat-         with  ovector passed as NULL and ovecsize as zero. However, if the pat-
2794         tern  contains  back  references  and  the ovector is not big enough to         tern contains back references and the ovector  is  not  big  enough  to
2795         remember the related substrings, PCRE has to get additional memory  for         remember  the related substrings, PCRE has to get additional memory for
2796         use  during matching. Thus it is usually advisable to supply an ovector         use during matching. Thus it is usually advisable to supply an  ovector
2797         of reasonable size.         of reasonable size.
2798    
2799         There are some cases where zero is returned  (indicating  vector  over-         There  are  some  cases where zero is returned (indicating vector over-
2800         flow)  when  in fact the vector is exactly the right size for the final         flow) when in fact the vector is exactly the right size for  the  final
2801         match. For example, consider the pattern         match. For example, consider the pattern
2802    
2803           (a)(?:(b)c|bd)           (a)(?:(b)c|bd)
2804    
2805         If a vector of 6 elements (allowing for only 1 captured  substring)  is         If  a  vector of 6 elements (allowing for only 1 captured substring) is
2806         given with subject string "abd", pcre_exec() will try to set the second         given with subject string "abd", pcre_exec() will try to set the second
2807         captured string, thereby recording a vector overflow, before failing to         captured string, thereby recording a vector overflow, before failing to
2808         match  "c"  and  backing  up  to  try  the second alternative. The zero         match "c" and backing up  to  try  the  second  alternative.  The  zero
2809         return, however, does correctly indicate that  the  maximum  number  of         return,  however,  does  correctly  indicate that the maximum number of
2810         slots (namely 2) has been filled. In similar cases where there is         slots (namely 2) has been filled. In similar cases where there is
2811         temporary overflow, but the final number of used slots is actually less         temporary overflow, but the final number of used slots is actually less
2812         than the maximum, a non-zero value is returned.         than the maximum, a non-zero value is returned.
2813    
2814         The pcre_fullinfo() function can be used to find out how many capturing         The pcre_fullinfo() function can be used to find out how many capturing
2815         subpatterns there are in a compiled  pattern.  The  smallest  size  for         subpatterns  there  are  in  a  compiled pattern. The smallest size for
2816         ovector  that  will allow for n captured substrings, in addition to the         ovector that will allow for n captured substrings, in addition  to  the
2817         offsets of the substring matched by the whole pattern, is (n+1)*3.         offsets of the substring matched by the whole pattern, is (n+1)*3.
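The sizing rule can be written down directly. This is a small sketch with hypothetical helper names; in a real program the capture count would typically come from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT, which is not called here so that the example stays self-contained.

```c
#include <assert.h>

/* Smallest ovector length (in ints) that holds the whole-match pair plus
 * n captured substrings: (n+1)*3. */
static int min_ovecsize(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}

/* Number of offset pairs that a vector of `ovecsize` ints can return.
 * The size is rounded down to a multiple of three, and only the first
 * two-thirds hold pairs, so the result is ovecsize / 3. */
static int usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

For example, min_ovecsize(1) is 6, and usable_pairs(6) is 2, which agrees with the 6-element vector discussed above that has room for the whole match and one captured substring. A size of 7 is treated as 6, so usable_pairs(7) is also 2.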
2818    
2819         It is possible for capturing subpattern number n+1 to match  some  part         It  is  possible for capturing subpattern number n+1 to match some part
2820         of the subject when subpattern n has not been used at all. For example,         of the subject when subpattern n has not been used at all. For example,
2821         if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the         if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
2822         return from the function is 4, and subpatterns 1 and 3 are matched, but         return from the function is 4, and subpatterns 1 and 3 are matched, but
2823         2 is not. When this happens, both values in  the  offset  pairs  corre-         2  is  not.  When  this happens, both values in the offset pairs corre-
2824         sponding to unused subpatterns are set to -1.         sponding to unused subpatterns are set to -1.
2825    
2826         Offset  values  that correspond to unused subpatterns at the end of the         Offset values that correspond to unused subpatterns at the end  of  the
2827         expression are also set to -1. For example,  if  the  string  "abc"  is         expression  are  also  set  to  -1. For example, if the string "abc" is
2828         matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not         matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
2829         matched. The return from the function is 2, because  the  highest  used         matched.  The  return  from the function is 2, because the highest used
2830         capturing  subpattern  number  is 1, and the offsets for the second         capturing subpattern number is 1, and the offsets  for  the  second
2831         and third capturing subpatterns (assuming the vector is  large  enough,         and  third  capturing subpatterns (assuming the vector is large enough,
2832         of course) are set to -1.         of course) are set to -1.
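Both kinds of unused subpattern (one in the middle, and one beyond the highest used number) can be detected with the same test. The sketch below uses a hypothetical helper, group_is_set(); the vectors in the usage note are written out by hand from the two examples in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE function): report whether capturing
 * subpattern `group` took part in the match. Groups at or beyond `rc` were
 * not set, and groups below `rc` that were not used have both offsets
 * equal to -1. */
static int group_is_set(const int *ovector, int rc, int group)
{
    assert(ovector != NULL);
    if (group < 0 || group >= rc)
        return 0;
    return ovector[2 * group] >= 0 && ovector[2 * group + 1] >= 0;
}
```

For "abc" matched against (a|(z))(bc), the return value is 4 and the pairs are {0,3}, {0,1}, {-1,-1}, {1,3}, so groups 1 and 3 are reported as set and group 2 is not. For "abc" matched against (abc)(x(yz)?)?, the return value is 2, so groups 2 and 3 are reported as unset without their offsets being examined.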
2833    
2834         Note:  Elements  in  the first two-thirds of ovector that do not corre-         Note: Elements in the first two-thirds of ovector that  do  not  corre-
2835         spond to capturing parentheses in the pattern are never  changed.  That         spond  to  capturing parentheses in the pattern are never changed. That
2836         is,  if  a pattern contains n capturing parentheses, no more than ovec-         is, if a pattern contains n capturing parentheses, no more  than  ovec-
2837         tor[0] to ovector[2n+1] are set by pcre_exec(). The other elements  (in         tor[0]  to ovector[2n+1] are set by pcre_exec(). The other elements (in
2838         the first two-thirds) retain whatever values they previously had.         the first two-thirds) retain whatever values they previously had.
2839    
2840         Some  convenience  functions  are  provided for extracting the captured         Some convenience functions are provided  for  extracting  the  captured
2841         substrings as separate strings. These are described below.         substrings as separate strings. These are described below.
2842    
2843     Error return values from pcre_exec()     Error return values from pcre_exec()
2844    
2845         If pcre_exec() fails, it returns a negative number. The  following  are         If  pcre_exec()  fails, it returns a negative number. The following are
2846         defined in the header file:         defined in the header file:
2847    
2848           PCRE_ERROR_NOMATCH        (-1)           PCRE_ERROR_NOMATCH        (-1)
# Line 2844  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2851  MATCHING A PATTERN: THE TRADITIONAL FUNC
2851    
2852           PCRE_ERROR_NULL           (-2)           PCRE_ERROR_NULL           (-2)
2853    
2854         Either  code  or  subject  was  passed as NULL, or ovector was NULL and         Either code or subject was passed as NULL,  or  ovector  was  NULL  and
2855         ovecsize was not zero.         ovecsize was not zero.
2856    
2857           PCRE_ERROR_BADOPTION      (-3)           PCRE_ERROR_BADOPTION      (-3)
# Line 2853  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2860  MATCHING A PATTERN: THE TRADITIONAL FUNC
2860    
2861           PCRE_ERROR_BADMAGIC       (-4)           PCRE_ERROR_BADMAGIC       (-4)
2862    
2863         PCRE stores a 4-byte "magic number" at the start of the compiled  code,         PCRE  stores a 4-byte "magic number" at the start of the compiled code,
2864         to catch the case when it is passed a junk pointer and to detect when a         to catch the case when it is passed a junk pointer and to detect when a
2865         pattern that was compiled in an environment of one endianness is run in         pattern that was compiled in an environment of one endianness is run in
2866         an  environment  with the other endianness. This is the error that PCRE         an environment with the other endianness. This is the error  that  PCRE
2867         gives when the magic number is not present.         gives when the magic number is not present.
2868    
2869           PCRE_ERROR_UNKNOWN_OPCODE (-5)           PCRE_ERROR_UNKNOWN_OPCODE (-5)
2870    
2871         While running the pattern match, an unknown item was encountered in the         While running the pattern match, an unknown item was encountered in the
2872         compiled  pattern.  This  error  could be caused by a bug in PCRE or by         compiled pattern. This error could be caused by a bug  in  PCRE  or  by
2873         overwriting of the compiled pattern.         overwriting of the compiled pattern.
2874    
2875           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2876    
2877         If a pattern contains back references, but the ovector that  is  passed         If  a  pattern contains back references, but the ovector that is passed
2878         to pcre_exec() is not big enough to remember the referenced substrings,         to pcre_exec() is not big enough to remember the referenced substrings,
2879         PCRE gets a block of memory at the start of matching to  use  for  this         PCRE  gets  a  block of memory at the start of matching to use for this
2880         purpose.  If the call via pcre_malloc() fails, this error is given. The         purpose. If the call via pcre_malloc() fails, this error is given.  The
2881         memory is automatically freed at the end of matching.         memory is automatically freed at the end of matching.
2882    
2883         This error is also given if pcre_stack_malloc() fails  in  pcre_exec().         This  error  is also given if pcre_stack_malloc() fails in pcre_exec().
2884         This  can happen only when PCRE has been compiled with --disable-stack-         This can happen only when PCRE has been compiled with  --disable-stack-
2885         for-recursion.         for-recursion.
2886    
2887           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
2888    
2889         This error is used by the pcre_copy_substring(),  pcre_get_substring(),         This  error is used by the pcre_copy_substring(), pcre_get_substring(),
2890         and  pcre_get_substring_list()  functions  (see  below).  It  is  never         and  pcre_get_substring_list()  functions  (see  below).  It  is  never
2891         returned by pcre_exec().         returned by pcre_exec().
2892    
2893           PCRE_ERROR_MATCHLIMIT     (-8)           PCRE_ERROR_MATCHLIMIT     (-8)
2894    
2895         The backtracking limit, as specified by  the  match_limit  field  in  a         The  backtracking  limit,  as  specified  by the match_limit field in a
2896         pcre_extra  structure  (or  defaulted) was reached. See the description         pcre_extra structure (or defaulted) was reached.  See  the  description
2897         above.         above.
2898    
2899           PCRE_ERROR_CALLOUT        (-9)           PCRE_ERROR_CALLOUT        (-9)
2900    
2901         This error is never generated by pcre_exec() itself. It is provided for         This error is never generated by pcre_exec() itself. It is provided for
2902         use  by  callout functions that want to yield a distinctive error code.         use by callout functions that want to yield a distinctive  error  code.
2903         See the pcrecallout documentation for details.         See the pcrecallout documentation for details.
2904    
2905           PCRE_ERROR_BADUTF8        (-10)           PCRE_ERROR_BADUTF8        (-10)
2906    
2907         A string that contains an invalid UTF-8 byte sequence was passed  as  a         A  string  that contains an invalid UTF-8 byte sequence was passed as a
2908         subject,  and the PCRE_NO_UTF8_CHECK option was not set. If the size of         subject, and the PCRE_NO_UTF8_CHECK option was not set. If the size  of
2909         the output vector (ovecsize) is at least 2,  the  byte  offset  to  the         the  output  vector  (ovecsize)  is  at least 2, the byte offset to the
2910         start of the invalid UTF-8 character is placed in the first element,         start of the invalid UTF-8 character is placed in the first element,
2911         and a reason code is placed in the second element. The reason         and a reason code is placed in the second element. The reason
2912         codes are listed in the following section.  For backward compatibility,         codes are listed in the following section.  For backward compatibility,
2913         if PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8  char-         if  PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8 char-
2914         acter   at   the   end   of   the   subject  (reason  codes  1  to  5),         acter  at  the  end  of  the   subject   (reason   codes   1   to   5),
2915         PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8.         PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8.
2916    
2917           PCRE_ERROR_BADUTF8_OFFSET (-11)           PCRE_ERROR_BADUTF8_OFFSET (-11)
2918    
2919         The UTF-8 byte sequence that was passed as a subject  was  checked  and         The  UTF-8  byte  sequence that was passed as a subject was checked and
2920         found  to be valid (the PCRE_NO_UTF8_CHECK option was not set), but the         found to be valid (the PCRE_NO_UTF8_CHECK option was not set), but  the
2921         value of startoffset did not point to the beginning of a UTF-8  charac-         value  of startoffset did not point to the beginning of a UTF-8 charac-
2922         ter or the end of the subject.         ter or the end of the subject.
2923    
2924           PCRE_ERROR_PARTIAL        (-12)           PCRE_ERROR_PARTIAL        (-12)
2925    
2926         The  subject  string did not match, but it did match partially. See the         The subject string did not match, but it did match partially.  See  the
2927         pcrepartial documentation for details of partial matching.         pcrepartial documentation for details of partial matching.
2928    
2929           PCRE_ERROR_BADPARTIAL     (-13)           PCRE_ERROR_BADPARTIAL     (-13)
2930    
2931         This code is no longer in  use.  It  was  formerly  returned  when  the         This  code  is  no  longer  in  use.  It was formerly returned when the
2932         PCRE_PARTIAL  option  was used with a compiled pattern containing items         PCRE_PARTIAL option was used with a compiled pattern  containing  items
2933         that were  not  supported  for  partial  matching.  From  release  8.00         that  were  not  supported  for  partial  matching.  From  release 8.00
2934         onwards, there are no restrictions on partial matching.         onwards, there are no restrictions on partial matching.
2935    
2936           PCRE_ERROR_INTERNAL       (-14)           PCRE_ERROR_INTERNAL       (-14)
2937    
2938         An  unexpected  internal error has occurred. This error could be caused         An unexpected internal error has occurred. This error could  be  caused
2939         by a bug in PCRE or by overwriting of the compiled pattern.         by a bug in PCRE or by overwriting of the compiled pattern.
2940    
2941           PCRE_ERROR_BADCOUNT       (-15)           PCRE_ERROR_BADCOUNT       (-15)
# Line 2938  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2945  MATCHING A PATTERN: THE TRADITIONAL FUNC
2945           PCRE_ERROR_RECURSIONLIMIT (-21)           PCRE_ERROR_RECURSIONLIMIT (-21)
2946    
2947         The internal recursion limit, as specified by the match_limit_recursion         The internal recursion limit, as specified by the match_limit_recursion
2948         field  in  a  pcre_extra  structure (or defaulted) was reached. See the         field in a pcre_extra structure (or defaulted)  was  reached.  See  the
2949         description above.         description above.
2950    
2951           PCRE_ERROR_BADNEWLINE     (-23)           PCRE_ERROR_BADNEWLINE     (-23)
# Line 2952  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2959  MATCHING A PATTERN: THE TRADITIONAL FUNC
2959    
2960           PCRE_ERROR_SHORTUTF8      (-25)           PCRE_ERROR_SHORTUTF8      (-25)
2961    
2962         This  error  is returned instead of PCRE_ERROR_BADUTF8 when the subject         This error is returned instead of PCRE_ERROR_BADUTF8 when  the  subject
2963         string ends with a truncated UTF-8 character and the  PCRE_PARTIAL_HARD         string  ends with a truncated UTF-8 character and the PCRE_PARTIAL_HARD
2964         option  is  set.   Information  about  the  failure  is returned as for         option is set.  Information  about  the  failure  is  returned  as  for
2965         PCRE_ERROR_BADUTF8. That information is in fact sufficient to detect         PCRE_ERROR_BADUTF8. That information is in fact sufficient to detect
2966         this case, but this special error code for PCRE_PARTIAL_HARD precedes         this case, but this special error code for PCRE_PARTIAL_HARD precedes
2967         the implementation of returned information; it is retained for         the implementation of returned information; it is retained for
2968         backwards compatibility.         backwards compatibility.
2969    
2970           PCRE_ERROR_RECURSELOOP    (-26)           PCRE_ERROR_RECURSELOOP    (-26)
2971    
2972         This error is returned when pcre_exec() detects a recursion loop within         This error is returned when pcre_exec() detects a recursion loop within
2973         the pattern. Specifically, it means that either the whole pattern or  a         the  pattern. Specifically, it means that either the whole pattern or a
2974         subpattern  has been called recursively for the second time at the same         subpattern has been called recursively for the second time at the  same
2975         position in the subject string. Some simple patterns that might do this         position in the subject string. Some simple patterns that might do this
2976         are  detected  and faulted at compile time, but more complicated cases,         are detected and faulted at compile time, but more  complicated  cases,
2977         in particular mutual recursions between two different subpatterns, can-         in particular mutual recursions between two different subpatterns, can-
2978         not be detected until run time.         not be detected until run time.
2979    
2980           PCRE_ERROR_JIT_STACKLIMIT (-27)           PCRE_ERROR_JIT_STACKLIMIT (-27)
2981    
2982         This  error  is  returned  when a pattern that was successfully studied         This error is returned when a pattern  that  was  successfully  studied
2983         using the PCRE_STUDY_JIT_COMPILE option is being matched, but the  mem-         using  a  JIT compile option is being matched, but the memory available
2984         ory  available  for  the  just-in-time  processing  stack  is not large         for the just-in-time processing stack is  not  large  enough.  See  the
2985         enough. See the pcrejit documentation for more details.         pcrejit documentation for more details.
2986    
2987           PCRE_ERROR_BADMODE (-28)           PCRE_ERROR_BADMODE (-28)
2988    
# Line 2984  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2991  MATCHING A PATTERN: THE TRADITIONAL FUNC
2991    
2992           PCRE_ERROR_BADENDIANNESS (-29)           PCRE_ERROR_BADENDIANNESS (-29)
2993    
2994         This  error  is  given  if  a  pattern  that  was compiled and saved is         This error is given if  a  pattern  that  was  compiled  and  saved  is
2995         reloaded on a host with  different  endianness.  The  utility  function         reloaded  on  a  host  with  different endianness. The utility function
2996         pcre_pattern_to_host_byte_order() can be used to convert such a pattern         pcre_pattern_to_host_byte_order() can be used to convert such a pattern
2997         so that it runs on the new host.         so that it runs on the new host.
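A caller usually wants to turn a negative return value into a message. The following sketch maps some of the codes listed above to short descriptions. The SKETCH_ constants are local stand-ins that copy the numeric values shown in this section so that the example compiles on its own; a real program would include pcre.h and use the PCRE_ERROR_ names instead. The function name and the message wording are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the values listed in the text (assumption: a real
 * program takes these from pcre.h rather than redefining them). */
enum {
    SKETCH_ERROR_NOMATCH        = -1,
    SKETCH_ERROR_NULL           = -2,
    SKETCH_ERROR_BADMAGIC       = -4,
    SKETCH_ERROR_NOMEMORY       = -6,
    SKETCH_ERROR_MATCHLIMIT     = -8,
    SKETCH_ERROR_BADUTF8        = -10,
    SKETCH_ERROR_BADUTF8_OFFSET = -11,
    SKETCH_ERROR_PARTIAL        = -12,
    SKETCH_ERROR_RECURSIONLIMIT = -21,
    SKETCH_ERROR_SHORTUTF8      = -25,
    SKETCH_ERROR_RECURSELOOP    = -26,
    SKETCH_ERROR_JIT_STACKLIMIT = -27,
    SKETCH_ERROR_BADENDIANNESS  = -29
};

/* Hypothetical helper: describe a negative return value. */
static const char *exec_error_text(int rc)
{
    assert(rc < 0);
    switch (rc) {
    case SKETCH_ERROR_NOMATCH:        return "no match";
    case SKETCH_ERROR_NULL:           return "NULL argument";
    case SKETCH_ERROR_BADMAGIC:       return "magic number missing";
    case SKETCH_ERROR_NOMEMORY:       return "memory allocation failed";
    case SKETCH_ERROR_MATCHLIMIT:     return "backtracking limit reached";
    case SKETCH_ERROR_BADUTF8:        return "invalid UTF-8 in subject";
    case SKETCH_ERROR_BADUTF8_OFFSET: return "startoffset inside a UTF-8 character";
    case SKETCH_ERROR_PARTIAL:        return "partial match only";
    case SKETCH_ERROR_RECURSIONLIMIT: return "recursion limit reached";
    case SKETCH_ERROR_SHORTUTF8:      return "truncated UTF-8 at end of subject";
    case SKETCH_ERROR_RECURSELOOP:    return "recursion loop in pattern";
    case SKETCH_ERROR_JIT_STACKLIMIT: return "JIT stack too small";
    case SKETCH_ERROR_BADENDIANNESS:  return "pattern saved with other endianness";
    default:                          return "other error";
    }
}
```

A typical use is to call such a function only when the return value is below zero; a return of zero is not an error but the vector-overflow indication described earlier.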
2998    
# Line 2993  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3000  MATCHING A PATTERN: THE TRADITIONAL FUNC
3000    
3001     Reason codes for invalid UTF-8 strings     Reason codes for invalid UTF-8 strings
3002    
3003         This section applies only  to  the  8-bit  library.  The  corresponding         This  section  applies  only  to  the  8-bit library. The corresponding
3004         information for the 16-bit library is given in the pcre16 page.         information for the 16-bit library is given in the pcre16 page.
3005    
3006         When pcre_exec() returns either PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORT-         When pcre_exec() returns either PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORT-
3007         UTF8, and the size of the output vector (ovecsize) is at least  2,  the         UTF8,  and  the size of the output vector (ovecsize) is at least 2, the
3008         offset  of  the  start  of the invalid UTF-8 character is placed in the         offset of the start of the invalid UTF-8 character  is  placed  in  the
3009         first output vector element (ovector[0]) and a reason code is placed in         first output vector element (ovector[0]) and a reason code is placed in
3010         the  second  element  (ovector[1]). The reason codes are given names in         the second element (ovector[1]). The reason codes are  given  names  in
3011         the pcre.h header file:         the pcre.h header file:
3012    
3013           PCRE_UTF8_ERR1           PCRE_UTF8_ERR1
# Line 3009  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3016  MATCHING A PATTERN: THE TRADITIONAL FUNC
3016           PCRE_UTF8_ERR4           PCRE_UTF8_ERR4
3017           PCRE_UTF8_ERR5           PCRE_UTF8_ERR5
3018    
3019         The string ends with a truncated UTF-8 character;  the  code  specifies         The  string  ends  with a truncated UTF-8 character; the code specifies
3020         how  many bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8         how many bytes are missing (1 to 5). Although RFC 3629 restricts  UTF-8
3021         characters to be no longer than 4 bytes, the  encoding  scheme  (origi-         characters  to  be  no longer than 4 bytes, the encoding scheme (origi-
3022         nally  defined  by  RFC  2279)  allows  for  up to 6 bytes, and this is         nally defined by RFC 2279) allows for  up  to  6  bytes,  and  this  is
3023         checked first; hence the possibility of 4 or 5 missing bytes.         checked first; hence the possibility of 4 or 5 missing bytes.
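The "missing bytes" count follows from the first byte of the character alone. The sketch below is not PCRE's own validity check; it is an independent illustration, with hypothetical function names, of how the original RFC 2279 scheme determines a sequence length of up to 6 bytes and hence how 1 to 5 bytes can be missing.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence length implied by a first byte under the RFC 2279 scheme
 * (1 to 6), or 0 if the byte cannot start a character (a 10xxxxxx
 * continuation byte, 0xfe, or 0xff). */
static int utf8_expected_length(unsigned char lead)
{
    if (lead < 0x80) return 1;   /* 0xxxxxxx */
    if (lead < 0xc0) return 0;   /* 10xxxxxx: continuation byte */
    if (lead < 0xe0) return 2;   /* 110xxxxx */
    if (lead < 0xf0) return 3;   /* 1110xxxx */
    if (lead < 0xf8) return 4;   /* 11110xxx */
    if (lead < 0xfc) return 5;   /* 111110xx */
    if (lead < 0xfe) return 6;   /* 1111110x */
    return 0;                    /* 0xfe or 0xff */
}

/* Number of bytes missing from the character that starts at s[pos] in a
 * subject of `len` bytes, or 0 if the character is not truncated. */
static int utf8_missing_bytes(const unsigned char *s, size_t len, size_t pos)
{
    int need;
    size_t have;

    assert(s != NULL && pos < len);
    need = utf8_expected_length(s[pos]);
    have = len - pos;
    if (need == 0 || (size_t)need <= have)
        return 0;
    return need - (int)have;
}
```

For instance, a subject that ends with the single byte 0xfc announces a 6-byte character of which only one byte is present, so five bytes are missing; a subject that ends with 0xe2 0x82 is one byte short of a 3-byte character.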
3024    
3025           PCRE_UTF8_ERR6           PCRE_UTF8_ERR6
# Line 3022  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3029  MATCHING A PATTERN: THE TRADITIONAL FUNC
3029           PCRE_UTF8_ERR10           PCRE_UTF8_ERR10
3030    
3031         The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of         The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of
3032         the  character  do  not have the binary value 0b10 (that is, either the         the character do not have the binary value 0b10 (that  is,  either  the
3033         most significant bit is 0, or the next bit is 1).         most significant bit is 0, or the next bit is 1).
3034    
3035           PCRE_UTF8_ERR11           PCRE_UTF8_ERR11
3036           PCRE_UTF8_ERR12           PCRE_UTF8_ERR12
3037    
3038         A character that is valid by the RFC 2279 rules is either 5 or 6  bytes         A  character that is valid by the RFC 2279 rules is either 5 or 6 bytes
3039         long; these code points are excluded by RFC 3629.         long; these code points are excluded by RFC 3629.
3040    
3041           PCRE_UTF8_ERR13           PCRE_UTF8_ERR13
3042    
3043         A  4-byte character has a value greater than 0x10ffff; these code points         A 4-byte character has a value greater than 0x10ffff; these code  points
3044         are excluded by RFC 3629.         are excluded by RFC 3629.
3045    
3046           PCRE_UTF8_ERR14           PCRE_UTF8_ERR14
3047    
3048         A 3-byte character has a value in the  range  0xd800  to  0xdfff;  this         A  3-byte  character  has  a  value in the range 0xd800 to 0xdfff; this
3049         range  of code points is reserved by RFC 3629 for use with UTF-16, and         range of code points is reserved by RFC 3629 for use with UTF-16,  and
3050         so is excluded from UTF-8.         so is excluded from UTF-8.
3051    
3052           PCRE_UTF8_ERR15           PCRE_UTF8_ERR15
# Line 3048  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3055  MATCHING A PATTERN: THE TRADITIONAL FUNC
3055           PCRE_UTF8_ERR18           PCRE_UTF8_ERR18
3056           PCRE_UTF8_ERR19           PCRE_UTF8_ERR19
3057    
3058         A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it  codes         A  2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes
3059         for  a  value that can be represented by fewer bytes, which is invalid.         for a value that can be represented by fewer bytes, which  is  invalid.
3060         For example, the two bytes 0xc0, 0xae give the value 0x2e,  whose  cor-         For  example,  the two bytes 0xc0, 0xae give the value 0x2e, whose cor-
3061         rect coding uses just one byte.         rect coding uses just one byte.
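The overlong example can be checked by decoding the two bytes by hand. The following sketch covers only the 2-byte case and uses hypothetical function names; it illustrates the arithmetic and is not the check that PCRE itself performs.

```c
#include <assert.h>

/* Decode a well-formed 2-byte sequence (110xxxxx 10xxxxxx) into its value:
 * five payload bits from the first byte, six from the second. */
static unsigned int decode_utf8_2byte(unsigned char b0, unsigned char b1)
{
    assert((b0 & 0xe0) == 0xc0 && (b1 & 0xc0) == 0x80);
    return ((unsigned int)(b0 & 0x1f) << 6) | (unsigned int)(b1 & 0x3f);
}

/* A 2-byte sequence is overlong if its value fits in one byte (below 0x80). */
static int is_overlong_2byte(unsigned char b0, unsigned char b1)
{
    return decode_utf8_2byte(b0, b1) < 0x80;
}
```

With the bytes from the text, decode_utf8_2byte(0xc0, 0xae) gives 0x2e, which is below 0x80, so the sequence is overlong. By contrast, 0xc3 0xa9 decodes to 0xe9, which genuinely needs two bytes.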
3062    
3063           PCRE_UTF8_ERR20           PCRE_UTF8_ERR20
3064    
3065         The two most significant bits of the first byte of a character have the         The two most significant bits of the first byte of a character have the
3066         binary value 0b10 (that is, the most significant bit is 1 and the  sec-         binary  value 0b10 (that is, the most significant bit is 1 and the sec-
3067         ond  is  0). Such a byte can only validly occur as the second or subse-         ond is 0). Such a byte can only validly occur as the second  or  subse-
3068         quent byte of a multi-byte character.         quent byte of a multi-byte character.
3069    
3070           PCRE_UTF8_ERR21           PCRE_UTF8_ERR21
3071    
3072         The first byte of a character has the value 0xfe or 0xff. These  values         The  first byte of a character has the value 0xfe or 0xff. These values
3073         can never occur in a valid UTF-8 string.         can never occur in a valid UTF-8 string.
3074    
3075    
# Line 3079  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER Line 3086  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
3086         int pcre_get_substring_list(const char *subject,         int pcre_get_substring_list(const char *subject,
3087              int *ovector, int stringcount, const char ***listptr);              int *ovector, int stringcount, const char ***listptr);
3088    
3089         Captured  substrings  can  be  accessed  directly  by using the offsets         Captured substrings can be  accessed  directly  by  using  the  offsets
3090         returned by pcre_exec() in  ovector.  For  convenience,  the  functions         returned  by  pcre_exec()  in  ovector.  For convenience, the functions
3091         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-
3092         string_list() are provided for extracting captured substrings  as  new,         string_list()  are  provided for extracting captured substrings as new,
3093         separate,  zero-terminated strings. These functions identify substrings         separate, zero-terminated strings. These functions identify  substrings
3094         by number. The next section describes functions  for  extracting  named         by  number.  The  next section describes functions for extracting named
3095         substrings.         substrings.
3096    
3097         A  substring that contains a binary zero is correctly extracted and has         A substring that contains a binary zero is correctly extracted and  has
3098         a further zero added on the end, but the result is not, of course, a  C         a  further zero added on the end, but the result is not, of course, a C
3099         string.   However,  you  can  process such a string by referring to the         string.  However, you can process such a string  by  referring  to  the
3100         length that is  returned  by  pcre_copy_substring()  and  pcre_get_sub-         length  that  is  returned  by  pcre_copy_substring() and pcre_get_sub-
3101         string().  Unfortunately, the interface to pcre_get_substring_list() is         string().  Unfortunately, the interface to pcre_get_substring_list() is
3102         not adequate for handling strings containing binary zeros, because  the         not  adequate for handling strings containing binary zeros, because the
3103         end of the final string is not independently indicated.         end of the final string is not independently indicated.
3104    
3105         The  first  three  arguments  are the same for all three of these func-         The first three arguments are the same for all  three  of  these  func-
3106         tions: subject is the subject string that has  just  been  successfully         tions:  subject  is  the subject string that has just been successfully
3107         matched, ovector is a pointer to the vector of integer offsets that was         matched, ovector is a pointer to the vector of integer offsets that was
3108         passed to pcre_exec(), and stringcount is the number of substrings that         passed to pcre_exec(), and stringcount is the number of substrings that
3109         were  captured  by  the match, including the substring that matched the         were captured by the match, including the substring  that  matched  the
3110         entire regular expression. This is the value returned by pcre_exec() if         entire regular expression. This is the value returned by pcre_exec() if
3111         it  is greater than zero. If pcre_exec() returned zero, indicating that         it is greater than zero. If pcre_exec() returned zero, indicating  that
3112         it ran out of space in ovector, the value passed as stringcount  should         it  ran out of space in ovector, the value passed as stringcount should
3113         be the number of elements in the vector divided by three.         be the number of elements in the vector divided by three.
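
A minimal sketch of the stringcount rule described above. The helper name effective_stringcount is hypothetical and not part of PCRE; it only turns the return value of pcre_exec() and the size of ovector into the value to pass as stringcount.

```c
#include <assert.h>

/* Hypothetical helper, not part of the PCRE API.
 * rc       : value returned by pcre_exec()
 * ovecsize : number of int elements in ovector
 * Returns the value to pass as stringcount, or 0 if the match failed. */
int effective_stringcount(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;            /* normal case: rc counts the substrings */
    if (rc == 0)
        return ovecsize / 3;  /* ovector was too small: use its capacity */
    return 0;                 /* negative rc: no match, nothing to extract */
}
```

With a 30-element ovector, a return of zero from pcre_exec() therefore gives a stringcount of 10.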
3114    
3115         The  functions pcre_copy_substring() and pcre_get_substring() extract a         The functions pcre_copy_substring() and pcre_get_substring() extract  a
3116         single substring, whose number is given as  stringnumber.  A  value  of         single  substring,  whose  number  is given as stringnumber. A value of
3117         zero  extracts  the  substring that matched the entire pattern, whereas         zero extracts the substring that matched the  entire  pattern,  whereas
3118         higher values  extract  the  captured  substrings.  For  pcre_copy_sub-         higher  values  extract  the  captured  substrings.  For pcre_copy_sub-
3119         string(),  the  string  is  placed  in buffer, whose length is given by         string(), the string is placed in buffer,  whose  length  is  given  by
3120         buffersize, while for pcre_get_substring() a new  block  of  memory  is         buffersize,  while  for  pcre_get_substring()  a new block of memory is
3121         obtained  via  pcre_malloc,  and its address is returned via stringptr.         obtained via pcre_malloc, and its address is  returned  via  stringptr.
3122         The yield of the function is the length of the  string,  not  including         The  yield  of  the function is the length of the string, not including
3123         the terminating zero, or one of these error codes:         the terminating zero, or one of these error codes:
3124    
3125           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
3126    
3127         The  buffer  was too small for pcre_copy_substring(), or the attempt to         The buffer was too small for pcre_copy_substring(), or the  attempt  to
3128         get memory failed for pcre_get_substring().         get memory failed for pcre_get_substring().
3129    
3130           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
3131    
3132         There is no substring whose number is stringnumber.         There is no substring whose number is stringnumber.
3133    
3134         The pcre_get_substring_list()  function  extracts  all  available  sub-         The  pcre_get_substring_list()  function  extracts  all  available sub-
3135         strings  and  builds  a list of pointers to them. All this is done in a         strings and builds a list of pointers to them. All this is  done  in  a
3136         single block of memory that is obtained via pcre_malloc. The address of         single block of memory that is obtained via pcre_malloc. The address of
3137         the  memory  block  is returned via listptr, which is also the start of         the memory block is returned via listptr, which is also  the  start  of
3138         the list of string pointers. The end of the list is marked  by  a  NULL         the  list  of  string pointers. The end of the list is marked by a NULL
3139         pointer.  The  yield  of  the function is zero if all went well, or the         pointer. The yield of the function is zero if all  went  well,  or  the
3140         error code         error code
3141    
3142           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
3143    
3144         if the attempt to get the memory block failed.         if the attempt to get the memory block failed.
3145    
3146         When any of these functions encounters a substring that is unset, which         When any of these functions encounters a substring that is unset, which
3147         can  happen  when  capturing subpattern number n+1 matches some part of         can happen when capturing subpattern number n+1 matches  some  part  of
3148         the subject, but subpattern n has not been used at all, they return  an         the  subject, but subpattern n has not been used at all, they return an
3149         empty string. This can be distinguished from a genuine zero-length sub-         empty string. This can be distinguished from a genuine zero-length sub-
3150         string by inspecting the appropriate offset in ovector, which is  nega-         string  by inspecting the appropriate offset in ovector, which is nega-
3151         tive for unset substrings.         tive for unset substrings.
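
The ovector check described above can be sketched as follows. The enum and function names are hypothetical, and the sketch works on a plain int array standing in for the ovector that pcre_exec() would have filled in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, not part of the PCRE API. */
enum capture_state { CAPTURE_UNSET, CAPTURE_EMPTY, CAPTURE_NONEMPTY };

/* Classify capture group n from the offset pair stored at
 * ovector[2*n] and ovector[2*n + 1]. The caller must ensure that
 * n is smaller than the stringcount for this match. */
enum capture_state classify_capture(const int *ovector, int n)
{
    assert(ovector != NULL && n >= 0);
    if (ovector[2 * n] < 0)
        return CAPTURE_UNSET;     /* unset groups have negative offsets */
    if (ovector[2 * n] == ovector[2 * n + 1])
        return CAPTURE_EMPTY;     /* set, but matched zero characters */
    return CAPTURE_NONEMPTY;
}
```

For example, if the pattern (a)|(b) is matched against "b", group 2 is set while group 1 is not, so the offset pair for group 1 is negative and the helper reports CAPTURE_UNSET for it.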
3152    
3153         The  two convenience functions pcre_free_substring() and pcre_free_sub-         The two convenience functions pcre_free_substring() and  pcre_free_sub-
3154         string_list() can be used to free the memory  returned  by  a  previous         string_list()  can  be  used  to free the memory returned by a previous
3155         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-
3156         tively. They do nothing more than  call  the  function  pointed  to  by         tively.  They  do  nothing  more  than  call the function pointed to by
3157         pcre_free,  which  of course could be called directly from a C program.         pcre_free, which of course could be called directly from a  C  program.
3158         However, PCRE is used in some situations where it is linked via a  spe-         However,  PCRE is used in some situations where it is linked via a spe-
3159         cial   interface  to  another  programming  language  that  cannot  use         cial  interface  to  another  programming  language  that  cannot   use
3160         pcre_free directly; it is for these cases that the functions  are  pro-         pcre_free  directly;  it is for these cases that the functions are pro-
3161         vided.         vided.
3162    
3163    
# Line 3169  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 3176  EXTRACTING CAPTURED SUBSTRINGS BY NAME
3176              int stringcount, const char *stringname,              int stringcount, const char *stringname,
3177              const char **stringptr);              const char **stringptr);
3178    
3179         To extract a substring by name, you first have to find the associated         To extract a substring by name, you first have to find the associated
3180         number. For example, for this pattern         number. For example, for this pattern
3181    
3182           (a+)b(?<xxx>\d+)...           (a+)b(?<xxx>\d+)...
# Line 3178  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 3185  EXTRACTING CAPTURED SUBSTRINGS BY NAME
3185         be unique (PCRE_DUPNAMES was not set), you can find the number from the         be unique (PCRE_DUPNAMES was not set), you can find the number from the
3186         name by calling pcre_get_stringnumber(). The first argument is the com-         name by calling pcre_get_stringnumber(). The first argument is the com-
3187         piled pattern, and the second is the name. The yield of the function is         piled pattern, and the second is the name. The yield of the function is
3188         the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if  there  is  no         the  subpattern  number,  or PCRE_ERROR_NOSUBSTRING (-7) if there is no
3189         subpattern of that name.         subpattern of that name.
3190    
3191         Given the number, you can extract the substring directly, or use one of         Given the number, you can extract the substring directly, or use one of
3192         the functions described in the previous section. For convenience, there         the functions described in the previous section. For convenience, there
3193         are also two functions that do the whole job.         are also two functions that do the whole job.
3194    
3195         Most    of    the    arguments   of   pcre_copy_named_substring()   and         Most   of   the   arguments    of    pcre_copy_named_substring()    and
3196         pcre_get_named_substring() are the same  as  those  for  the  similarly         pcre_get_named_substring()  are  the  same  as  those for the similarly
3197         named  functions  that extract by number. As these are described in the         named functions that extract by number. As these are described  in  the
3198         previous section, they are not re-described here. There  are  just  two         previous  section,  they  are not re-described here. There are just two
3199         differences:         differences:
3200    
3201         First,  instead  of a substring number, a substring name is given. Sec-         First, instead of a substring number, a substring name is  given.  Sec-
3202         ond, there is an extra argument, given at the start, which is a pointer         ond, there is an extra argument, given at the start, which is a pointer
3203         to  the compiled pattern. This is needed in order to gain access to the         to the compiled pattern. This is needed in order to gain access to  the
3204         name-to-number translation table.         name-to-number translation table.
3205    
3206         These functions call pcre_get_stringnumber(), and if it succeeds,  they         These  functions call pcre_get_stringnumber(), and if it succeeds, they
3207         then  call  pcre_copy_substring() or pcre_get_substring(), as appropri-         then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-
3208         ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate  names,  the         ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the
3209         behaviour may not be what you want (see the next section).         behaviour may not be what you want (see the next section).
3210    
3211         Warning: If the pattern uses the (?| feature to set up multiple subpat-         Warning: If the pattern uses the (?| feature to set up multiple subpat-
3212         terns with the same number, as described in the  section  on  duplicate         terns  with  the  same number, as described in the section on duplicate
3213         subpattern  numbers  in  the  pcrepattern page, you cannot use names to         subpattern numbers in the pcrepattern page, you  cannot  use  names  to
3214         distinguish the different subpatterns, because names are  not  included         distinguish  the  different subpatterns, because names are not included
3215         in  the compiled code. The matching process uses only numbers. For this         in the compiled code. The matching process uses only numbers. For  this
3216         reason, the use of different names for subpatterns of the  same  number         reason,  the  use of different names for subpatterns of the same number
3217         causes an error at compile time.         causes an error at compile time.
3218    
3219    
# Line 3215  DUPLICATE SUBPATTERN NAMES Line 3222  DUPLICATE SUBPATTERN NAMES
3222         int pcre_get_stringtable_entries(const pcre *code,         int pcre_get_stringtable_entries(const pcre *code,
3223              const char *name, char **first, char **last);              const char *name, char **first, char **last);
3224    
3225         When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for         When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for
3226         subpatterns are not required to be unique. (Duplicate names are  always         subpatterns  are not required to be unique. (Duplicate names are always
3227         allowed  for subpatterns with the same number, created by using the (?|         allowed for subpatterns with the same number, created by using the  (?|
3228         feature. Indeed, if such subpatterns are named, they  are  required  to         feature.  Indeed,  if  such subpatterns are named, they are required to
3229         use the same names.)         use the same names.)
3230    
3231         Normally, patterns with duplicate names are such that in any one match,         Normally, patterns with duplicate names are such that in any one match,
3232         only one of the named subpatterns participates. An example is shown  in         only  one of the named subpatterns participates. An example is shown in
3233         the pcrepattern documentation.         the pcrepattern documentation.
3234    
3235         When    duplicates   are   present,   pcre_copy_named_substring()   and         When   duplicates   are   present,   pcre_copy_named_substring()    and
3236         pcre_get_named_substring() return the first substring corresponding  to         pcre_get_named_substring()  return the first substring corresponding to
3237         the  given  name  that  is set. If none are set, PCRE_ERROR_NOSUBSTRING         the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING
3238         (-7) is returned; no  data  is  returned.  The  pcre_get_stringnumber()         (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()
3239         function  returns one of the numbers that are associated with the name,         function returns one of the numbers that are associated with the  name,
3240         but it is not defined which it is.         but it is not defined which it is.
3241    
3242         If you want to get full details of all captured substrings for a  given         If  you want to get full details of all captured substrings for a given
3243         name,  you  must  use  the pcre_get_stringtable_entries() function. The         name, you must use  the  pcre_get_stringtable_entries()  function.  The
3244         first argument is the compiled pattern, and the second is the name. The         first argument is the compiled pattern, and the second is the name. The
3245         third  and  fourth  are  pointers to variables which are updated by the         third and fourth are pointers to variables which  are  updated  by  the
3246         function. After it has run, they point to the first and last entries in         function. After it has run, they point to the first and last entries in
3247         the  name-to-number  table  for  the  given  name.  The function itself         the name-to-number table  for  the  given  name.  The  function  itself
3248         returns the length of each entry,  or  PCRE_ERROR_NOSUBSTRING  (-7)  if         returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
3249         there are none. The format of the table is described above in the         there are none. The format of the table is described above in the
3250         section entitled "Information about a pattern". Given all the relevant         section entitled "Information about a pattern". Given all the relevant
3251         entries for the name, you can extract each of their numbers, and hence         entries for the name, you can extract each of their numbers, and hence
3252         the captured data, if any.         the captured data, if any.
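
A sketch of how the entries between first and last might be walked. It assumes the 8-bit table layout given in the section on information about a pattern: each entry starts with the two-byte subpattern number, most significant byte first, followed by the zero-terminated name. The helper name collect_group_numbers is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
 * first, last : pointers set by pcre_get_stringtable_entries()
 * entry_size  : the value returned by that function
 * Stores each subpattern number in out[] (at most max_out of them)
 * and returns the number of entries visited. */
int collect_group_numbers(const char *first, const char *last,
                          int entry_size, int *out, int max_out)
{
    int count = 0;
    const char *entry;

    assert(first != NULL && last != NULL && entry_size > 2);
    for (entry = first; entry <= last; entry += entry_size) {
        const unsigned char *p = (const unsigned char *)entry;
        int number = (p[0] << 8) | p[1];   /* most significant byte first */
        if (count < max_out)
            out[count] = number;
        count++;
    }
    return count;
}
```

Each collected number can then be used as an index into ovector, or passed to pcre_get_substring(), to find out which of the duplicates actually captured data.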
3253    
3254    
3255  FINDING ALL POSSIBLE MATCHES  FINDING ALL POSSIBLE MATCHES
3256    
3257         The traditional matching function uses a  similar  algorithm  to  Perl,         The  traditional  matching  function  uses a similar algorithm to Perl,
3258         which stops when it finds the first match, starting at a given point in         which stops when it finds the first match, starting at a given point in
3259         the subject. If you want to find all possible matches, or  the  longest         the  subject.  If you want to find all possible matches, or the longest
3260         possible  match,  consider using the alternative matching function (see         possible match, consider using the alternative matching  function  (see
3261         below) instead. If you cannot use the alternative function,  but  still         below)  instead.  If you cannot use the alternative function, but still
3262         need  to  find all possible matches, you can kludge it up by making use         need to find all possible matches, you can kludge it up by  making  use
3263         of the callout facility, which is described in the pcrecallout documen-         of the callout facility, which is described in the pcrecallout documen-
3264         tation.         tation.
3265    
3266         What you have to do is to insert a callout right at the end of the pat-         What you have to do is to insert a callout right at the end of the pat-
3267         tern.  When your callout function is called, extract and save the  cur-         tern.   When your callout function is called, extract and save the cur-
3268         rent  matched  substring.  Then  return  1, which forces pcre_exec() to         rent matched substring. Then return  1,  which  forces  pcre_exec()  to
3269         backtrack and try other alternatives. Ultimately, when it runs  out  of         backtrack  and  try other alternatives. Ultimately, when it runs out of
3270         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
3271    
3272    
3273  OBTAINING AN ESTIMATE OF STACK USAGE  OBTAINING AN ESTIMATE OF STACK USAGE
3274    
3275         Matching  certain  patterns  using pcre_exec() can use a lot of process         Matching certain patterns using pcre_exec() can use a  lot  of  process
3276         stack, which in certain environments can be  rather  limited  in  size.         stack,  which  in  certain  environments can be rather limited in size.
3277         Some  users  find it helpful to have an estimate of the amount of stack         Some users find it helpful to have an estimate of the amount  of  stack
3278         that is used by pcre_exec(), to help  them  set  recursion  limits,  as         that  is  used  by  pcre_exec(),  to help them set recursion limits, as
3279         described  in  the pcrestack documentation. The estimate that is output         described in the pcrestack documentation. The estimate that  is  output
3280         by pcretest when called with the -m and -C options is obtained by call-         by pcretest when called with the -m and -C options is obtained by call-
3281         ing  pcre_exec with the values NULL, NULL, NULL, -999, and -999 for its         ing pcre_exec with the values NULL, NULL, NULL, -999, and -999 for  its
3282         first five arguments.         first five arguments.
3283    
3284         Normally, if  its  first  argument  is  NULL,  pcre_exec()  immediately         Normally,  if  its  first  argument  is  NULL,  pcre_exec() immediately
3285         returns  the negative error code PCRE_ERROR_NULL, but with this special         returns the negative error code PCRE_ERROR_NULL, but with this  special
3286         combination of arguments, it returns instead a  negative  number  whose         combination  of  arguments,  it returns instead a negative number whose
3287         absolute  value  is the approximate stack frame size in bytes. (A nega-         absolute value is the approximate stack frame size in bytes.  (A  nega-
3288         tive number is used so that it is clear that no  match  has  happened.)         tive  number  is  used so that it is clear that no match has happened.)
3289         The  value  is  approximate  because  in some cases, recursive calls to         The value is approximate because in  some  cases,  recursive  calls  to
3290         pcre_exec() occur when there are one or two additional variables on the         pcre_exec() occur when there are one or two additional variables on the
3291         stack.         stack.
3292    
3293         If  PCRE  has  been  compiled  to use the heap instead of the stack for         If PCRE has been compiled to use the heap  instead  of  the  stack  for
3294         recursion, the value returned  is  the  size  of  each  block  that  is         recursion,  the  value  returned  is  the  size  of  each block that is
3295         obtained from the heap.         obtained from the heap.
3296    
3297    
# Line 3295  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 3302  MATCHING A PATTERN: THE ALTERNATIVE FUNC
3302              int options, int *ovector, int ovecsize,              int options, int *ovector, int ovecsize,
3303              int *workspace, int wscount);              int *workspace, int wscount);
3304    
3305         The  function  pcre_dfa_exec()  is  called  to  match  a subject string         The function pcre_dfa_exec()  is  called  to  match  a  subject  string
3306         against a compiled pattern, using a matching algorithm that  scans  the         against  a  compiled pattern, using a matching algorithm that scans the
3307         subject  string  just  once, and does not backtrack. This has different         subject string just once, and does not backtrack.  This  has  different
3308         characteristics to the normal algorithm, and  is  not  compatible  with         characteristics  to  the  normal  algorithm, and is not compatible with
3309         Perl.  Some  of the features of PCRE patterns are not supported. Never-         Perl. Some of the features of PCRE patterns are not  supported.  Never-
3310         theless, there are times when this kind of matching can be useful.  For         theless,  there are times when this kind of matching can be useful. For
3311         a  discussion  of  the  two matching algorithms, and a list of features         a discussion of the two matching algorithms, and  a  list  of  features
3312         that pcre_dfa_exec() does not support, see the pcrematching  documenta-         that  pcre_dfa_exec() does not support, see the pcrematching documenta-
3313         tion.         tion.
3314    
3315         The  arguments  for  the  pcre_dfa_exec()  function are the same as for         The arguments for the pcre_dfa_exec() function  are  the  same  as  for
3316         pcre_exec(), plus two extras. The ovector argument is used in a differ-         pcre_exec(), plus two extras. The ovector argument is used in a differ-
3317         ent  way,  and  this is described below. The other common arguments are         ent way, and this is described below. The other  common  arguments  are
3318         used in the same way as for pcre_exec(), so their  description  is  not         used  in  the  same way as for pcre_exec(), so their description is not
3319         repeated here.         repeated here.
3320    
3321         The  two  additional  arguments provide workspace for the function. The         The two additional arguments provide workspace for  the  function.  The
3322         workspace vector should contain at least 20 elements. It  is  used  for         workspace  vector  should  contain at least 20 elements. It is used for
3323         keeping  track  of  multiple  paths  through  the  pattern  tree.  More         keeping  track  of  multiple  paths  through  the  pattern  tree.  More
3324         workspace will be needed for patterns and subjects where  there  are  a         workspace  will  be  needed for patterns and subjects where there are a
3325         lot of potential matches.         lot of potential matches.
3326    
3327         Here is an example of a simple call to pcre_dfa_exec():         Here is an example of a simple call to pcre_dfa_exec():
# Line 3336  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 3343  MATCHING A PATTERN: THE ALTERNATIVE FUNC
3343    
3344     Option bits for pcre_dfa_exec()     Option bits for pcre_dfa_exec()
3345    
3346         The  unused  bits  of  the options argument for pcre_dfa_exec() must be         The unused bits of the options argument  for  pcre_dfa_exec()  must  be
3347         zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-         zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
3348         LINE_xxx,        PCRE_NOTBOL,        PCRE_NOTEOL,        PCRE_NOTEMPTY,         LINE_xxx,        PCRE_NOTBOL,        PCRE_NOTEOL,        PCRE_NOTEMPTY,
3349         PCRE_NOTEMPTY_ATSTART,      PCRE_NO_UTF8_CHECK,       PCRE_BSR_ANYCRLF,         PCRE_NOTEMPTY_ATSTART,       PCRE_NO_UTF8_CHECK,      PCRE_BSR_ANYCRLF,
3350         PCRE_BSR_UNICODE,  PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD, PCRE_PAR-         PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD,  PCRE_PAR-
3351         TIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART.  All but  the  last         TIAL_SOFT,  PCRE_DFA_SHORTEST,  and PCRE_DFA_RESTART.  All but the last
3352         four  of  these  are  exactly  the  same  as  for pcre_exec(), so their         four of these are  exactly  the  same  as  for  pcre_exec(),  so  their
3353         description is not repeated here.         description is not repeated here.
3354    
3355           PCRE_PARTIAL_HARD           PCRE_PARTIAL_HARD
3356           PCRE_PARTIAL_SOFT           PCRE_PARTIAL_SOFT
3357    
3358         These have the same general effect as they do for pcre_exec(), but  the         These  have the same general effect as they do for pcre_exec(), but the
3359         details  are  slightly  different.  When  PCRE_PARTIAL_HARD  is set for         details are slightly  different.  When  PCRE_PARTIAL_HARD  is  set  for
3360         pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of  the  sub-         pcre_dfa_exec(),  it  returns PCRE_ERROR_PARTIAL if the end of the sub-
3361         ject  is  reached  and there is still at least one matching possibility         ject is reached and there is still at least  one  matching  possibility
3362         that requires additional characters. This happens even if some complete         that requires additional characters. This happens even if some complete
3363         matches have also been found. When PCRE_PARTIAL_SOFT is set, the return         matches have also been found. When PCRE_PARTIAL_SOFT is set, the return
3364         code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end         code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end
3365         of  the  subject  is  reached, there have been no complete matches, but         of the subject is reached, there have been  no  complete  matches,  but
3366         there is still at least one matching possibility. The  portion  of  the         there  is  still  at least one matching possibility. The portion of the
3367         string  that  was inspected when the longest partial match was found is         string that was inspected when the longest partial match was  found  is
3368         set as the first matching string  in  both  cases.   There  is  a  more         set  as  the  first  matching  string  in  both cases.  There is a more
3369         detailed  discussion  of partial and multi-segment matching, with exam-         detailed discussion of partial and multi-segment matching,  with  exam-
3370         ples, in the pcrepartial documentation.         ples, in the pcrepartial documentation.
3371    
3372           PCRE_DFA_SHORTEST           PCRE_DFA_SHORTEST
3373    
3374         Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to         Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to
3375         stop as soon as it has found one match. Because of the way the alterna-         stop as soon as it has found one match. Because of the way the alterna-
3376         tive algorithm works, this is necessarily the shortest  possible  match         tive  algorithm  works, this is necessarily the shortest possible match
3377         at the first possible matching point in the subject string.         at the first possible matching point in the subject string.
3378    
3379           PCRE_DFA_RESTART           PCRE_DFA_RESTART
3380    
3381         When pcre_dfa_exec() returns a partial match, it is possible to call it         When pcre_dfa_exec() returns a partial match, it is possible to call it
3382         again, with additional subject characters, and have  it  continue  with         again,  with  additional  subject characters, and have it continue with
3383         the  same match. The PCRE_DFA_RESTART option requests this action; when         the same match. The PCRE_DFA_RESTART option requests this action;  when
3384         it is set, the workspace and wscount arguments must reference the same         it is set, the workspace and wscount arguments must reference the same
3385         vector  as  before  because data about the match so far is left in them         vector as before because data about the match so far is  left  in  them
3386         after a partial match. There is more discussion of this facility in the         after a partial match. There is more discussion of this facility in the
3387         pcrepartial documentation.         pcrepartial documentation.
3388    
3389     Successful returns from pcre_dfa_exec()     Successful returns from pcre_dfa_exec()
3390    
3391         When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-         When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-
3392         string in the subject. Note, however, that all the matches from one run         string in the subject. Note, however, that all the matches from one run
3393         of  the  function  start  at the same point in the subject. The shorter         of the function start at the same point in  the  subject.  The  shorter
3394         matches are all initial substrings of the longer matches. For  example,         matches  are all initial substrings of the longer matches. For example,
3395         if the pattern         if the pattern
3396    
3397           <.*>           <.*>
# Line 3399  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 3406  MATCHING A PATTERN: THE ALTERNATIVE FUNC
3406           <something> <something else>           <something> <something else>
3407           <something> <something else> <something further>           <something> <something else> <something further>
3408    
3409         On  success,  the  yield of the function is a number greater than zero,         On success, the yield of the function is a number  greater  than  zero,
3410         which is the number of matched substrings.  The  substrings  themselves         which  is  the  number of matched substrings. The substrings themselves
3411         are  returned  in  ovector. Each string uses two elements; the first is         are returned in ovector. Each string uses two elements;  the  first  is
3412         the offset to the start, and the second is the offset to  the  end.  In         the  offset  to  the start, and the second is the offset to the end. In
3413         fact,  all  the  strings  have the same start offset. (Space could have         fact, all the strings have the same start  offset.  (Space  could  have
3414         been saved by giving this only once, but it was decided to retain  some         been  saved by giving this only once, but it was decided to retain some
3415         compatibility  with  the  way pcre_exec() returns data, even though the         compatibility with the way pcre_exec() returns data,  even  though  the
3416         meaning of the strings is different.)         meaning of the strings is different.)
3417    
3418         The strings are returned in reverse order of length; that is, the long-         The strings are returned in reverse order of length; that is, the long-
3419         est  matching  string is given first. If there were too many matches to         est matching string is given first. If there were too many  matches  to
3420         fit into ovector, the yield of the function is zero, and the vector  is         fit  into ovector, the yield of the function is zero, and the vector is
3421         filled  with  the  longest matches. Unlike pcre_exec(), pcre_dfa_exec()         filled with the longest matches.  Unlike  pcre_exec(),  pcre_dfa_exec()
3422         can use the entire ovector for returning matched strings.         can use the entire ovector for returning matched strings.
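
The layout of the results can be sketched as a loop over the offset pairs. The helper name print_dfa_matches is hypothetical; it takes the return value of pcre_dfa_exec() together with the ovector it filled, and treats a return of zero as "the vector is full", so that ovecsize / 2 pairs are read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of the PCRE API.
 * rc       : value returned by pcre_dfa_exec()
 * ovecsize : number of int elements in ovector
 * Prints each match, longest first, and returns how many
 * (start, end) pairs were read. */
int print_dfa_matches(const char *subject, const int *ovector,
                      int rc, int ovecsize)
{
    int i, count;

    assert(subject != NULL && ovector != NULL);
    if (rc < 0)
        return 0;                           /* error or no match */
    count = (rc == 0) ? ovecsize / 2 : rc;  /* rc == 0: vector was filled */
    for (i = 0; i < count; i++) {
        int start = ovector[2 * i];         /* same for every match */
        int end = ovector[2 * i + 1];
        printf("%d: %.*s\n", i, end - start, subject + start);
    }
    return count;
}
```

Because every match starts at the same offset, only the end offsets differ from one pair to the next.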
3423    
3424     Error returns from pcre_dfa_exec()     Error returns from pcre_dfa_exec()
3425    
3426         The pcre_dfa_exec() function returns a negative number when  it  fails.         The  pcre_dfa_exec()  function returns a negative number when it fails.
3427         Many  of  the  errors  are  the  same as for pcre_exec(), and these are         Many of the errors are the same  as  for  pcre_exec(),  and  these  are
3428         described above.  There are in addition the following errors  that  are         described  above.   There are in addition the following errors that are
3429         specific to pcre_dfa_exec():         specific to pcre_dfa_exec():
3430    
3431           PCRE_ERROR_DFA_UITEM      (-16)           PCRE_ERROR_DFA_UITEM      (-16)
3432    
3433         This  return is given if pcre_dfa_exec() encounters an item in the pat-         This return is given if pcre_dfa_exec() encounters an item in the  pat-
3434         tern that it does not support, for instance, the use of \C  or  a  back         tern  that  it  does not support, for instance, the use of \C or a back
3435         reference.         reference.
3436    
3437           PCRE_ERROR_DFA_UCOND      (-17)           PCRE_ERROR_DFA_UCOND      (-17)
3438    
3439         This  return  is  given  if pcre_dfa_exec() encounters a condition item         This return is given if pcre_dfa_exec()  encounters  a  condition  item
3440         that uses a back reference for the condition, or a test  for  recursion         that  uses  a back reference for the condition, or a test for recursion
3441         in a specific group. These are not supported.         in a specific group. These are not supported.
3442    
3443           PCRE_ERROR_DFA_UMLIMIT    (-18)           PCRE_ERROR_DFA_UMLIMIT    (-18)
3444    
3445         This  return  is given if pcre_dfa_exec() is called with an extra block         This return is given if pcre_dfa_exec() is called with an  extra  block
3446         that contains a setting of  the  match_limit  or  match_limit_recursion         that  contains  a  setting  of the match_limit or match_limit_recursion
3447         fields.  This  is  not  supported (these fields are meaningless for DFA         fields. This is not supported (these fields  are  meaningless  for  DFA
3448         matching).         matching).
3449    
3450           PCRE_ERROR_DFA_WSSIZE     (-19)           PCRE_ERROR_DFA_WSSIZE     (-19)
3451    
3452         This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the         This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
3453         workspace vector.         workspace vector.
3454    
3455           PCRE_ERROR_DFA_RECURSE    (-20)           PCRE_ERROR_DFA_RECURSE    (-20)
3456    
3457         When  a  recursive subpattern is processed, the matching function calls         When a recursive subpattern is processed, the matching  function  calls
3458         itself recursively, using private vectors for  ovector  and  workspace.         itself  recursively,  using  private vectors for ovector and workspace.
3459         This  error  is  given  if  the output vector is not large enough. This         This error is given if the output vector  is  not  large  enough.  This
3460         should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
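
For diagnostics it can help to turn these codes into text. The following is a minimal sketch under stated assumptions: dfa_error_text() and the DEMO_ constants are hypothetical names, and the numeric values are copied from the list above so that the fragment compiles without pcre.h. A real program would include pcre.h and use the PCRE_ERROR_DFA_ macros directly.

```c
#include <assert.h>
#include <string.h>

/* Values copied from the list above. The DEMO_ prefix is an assumption made
   here to avoid clashing with the real macros defined in pcre.h. */
enum {
    DEMO_ERROR_DFA_UITEM   = -16,
    DEMO_ERROR_DFA_UCOND   = -17,
    DEMO_ERROR_DFA_UMLIMIT = -18,
    DEMO_ERROR_DFA_WSSIZE  = -19,
    DEMO_ERROR_DFA_RECURSE = -20
};

/* Hypothetical helper: describe a negative yield from pcre_dfa_exec(). */
static const char *dfa_error_text(int rc)
{
    assert(rc < 0);  /* only failure returns are meaningful here */
    switch (rc) {
    case DEMO_ERROR_DFA_UITEM:
        return "unsupported pattern item (for example \\C or a back reference)";
    case DEMO_ERROR_DFA_UCOND:
        return "unsupported condition (back reference or recursion test)";
    case DEMO_ERROR_DFA_UMLIMIT:
        return "match_limit fields are set in the extra block";
    case DEMO_ERROR_DFA_WSSIZE:
        return "workspace vector is too small";
    case DEMO_ERROR_DFA_RECURSE:
        return "internal output vector too small during recursion";
    default:
        return "not specific to pcre_dfa_exec(); see the pcre_exec() errors";
    }
}
```

Of these, only the workspace error can be addressed by retrying with a larger buffer; the others call for a change to the pattern or to the extra block.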
3461    
3462    
3463  SEE ALSO  SEE ALSO
3464    
3465         pcre16(3), pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),         pcre16(3), pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),
3466         pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3),         pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3),
3467         pcrestack(3).         pcrestack(3).
3468    
# Line 3469  AUTHOR Line 3476  AUTHOR
3476    
3477  REVISION  REVISION
3478    
3479         Last updated: 21 January 2012         Last updated: 22 February 2012
3480         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3481  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3482    
3483    
3484  PCRECALLOUT(3)                                                  PCRECALLOUT(3)  PCRECALLOUT(3)                                                  PCRECALLOUT(3)
3485    
3486    
# Line 3671  REVISION Line 3678  REVISION
3678         Last updated: 08 January 2012         Last updated: 08 January 2012
3679         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3680  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3681    
3682    
3683  PCRECOMPAT(3)                                                    PCRECOMPAT(3)  PCRECOMPAT(3)                                                    PCRECOMPAT(3)
3684    
3685    
# Line 3846  REVISION Line 3853  REVISION
3853         Last updated: 08 January 2012         Last updated: 08 January 2012
3854         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3855  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3856    
3857    
3858  PCREPATTERN(3)                                                  PCREPATTERN(3)  PCREPATTERN(3)                                                  PCREPATTERN(3)
3859    
3860    
# Line 6181  BACKTRACKING CONTROL Line 6188  BACKTRACKING CONTROL
6188         follows the colon, the effect is as if the colon were  not  there.  Any         follows the colon, the effect is as if the colon were  not  there.  Any
6189         number of these verbs may occur in a pattern.         number of these verbs may occur in a pattern.
6190    
6191       Optimizations that affect backtracking verbs
6192    
6193         PCRE  contains some optimizations that are used to speed up matching by         PCRE  contains some optimizations that are used to speed up matching by
6194         running some checks at the start of each match attempt. For example, it         running some checks at the start of each match attempt. For example, it
6195         may  know  the minimum length of matching subject, or that a particular         may  know  the minimum length of matching subject, or that a particular
# Line 6189  BACKTRACKING CONTROL Line 6198  BACKTRACKING CONTROL
6198         course, be processed. You can suppress the start-of-match optimizations         course, be processed. You can suppress the start-of-match optimizations
6199         by  setting  the  PCRE_NO_START_OPTIMIZE  option when calling pcre_com-         by  setting  the  PCRE_NO_START_OPTIMIZE  option when calling pcre_com-
6200         pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).         pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).
6201           There is more discussion of this option in the section entitled "Option
6202           bits for pcre_exec()" in the pcreapi documentation.
6203    
6204         Experiments with Perl suggest that it too  has  similar  optimizations,         Experiments with Perl suggest that it too  has  similar  optimizations,
6205         sometimes leading to anomalous results.         sometimes leading to anomalous results.
# Line 6268  BACKTRACKING CONTROL Line 6279  BACKTRACKING CONTROL
6279           No match, mark = B           No match, mark = B
6280    
6281         Note  that  in  this  unanchored  example the mark is retained from the         Note  that  in  this  unanchored  example the mark is retained from the
6282         match attempt that started at the letter "X". Subsequent match attempts         match attempt that started at the letter "X" in the subject. Subsequent
6283         starting  at "P" and then with an empty string do not get as far as the         match attempts starting at "P" and then with an empty string do not get
6284         (*MARK) item, but nevertheless do not reset it.         as far as the (*MARK) item, but nevertheless do not reset it.
6285    
6286           If you are interested in  (*MARK)  values  after  failed  matches,  you
6287           should  probably  set  the PCRE_NO_START_OPTIMIZE option (see above) to
6288           ensure that the match is always attempted.
6289    
6290     Verbs that act after backtracking     Verbs that act after backtracking
6291    
# Line 6448  AUTHOR Line 6463  AUTHOR
6463    
6464  REVISION  REVISION
6465    
6466         Last updated: 09 January 2012         Last updated: 24 February 2012
6467         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
6468  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
6469    
6470    
6471  PCRESYNTAX(3)                                                    PCRESYNTAX(3)  PCRESYNTAX(3)                                                    PCRESYNTAX(3)
6472    
6473    
# Line 6827  REVISION Line 6842  REVISION
6842         Last updated: 10 January 2012         Last updated: 10 January 2012
6843         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
6844  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
6845    
6846    
6847  PCREUNICODE(3)                                                  PCREUNICODE(3)  PCREUNICODE(3)                                                  PCREUNICODE(3)
6848    
6849    
# Line 7025  REVISION Line 7040  REVISION
7040         Last updated: 13 January 2012         Last updated: 13 January 2012
7041         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7042  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7043    
7044    
7045  PCREJIT(3)                                                          PCREJIT(3)  PCREJIT(3)                                                          PCREJIT(3)
7046    
7047    
# Line 7072  AVAILABILITY OF JIT SUPPORT Line 7087  AVAILABILITY OF JIT SUPPORT
7087           MIPS 32-bit           MIPS 32-bit
7088           Power PC 32-bit and 64-bit           Power PC 32-bit and 64-bit
7089    
7090         The Power PC support is designated as experimental because it  has  not         If --enable-jit is set on an unsupported platform, compilation fails.
        been  fully  tested. If --enable-jit is set on an unsupported platform,  
        compilation fails.  
7091    
7092         A program that is linked with PCRE 8.20 or later can tell if  JIT  sup-         A program that is linked with PCRE 8.20 or later can tell if  JIT  sup-
7093         port  is  available  by  calling pcre_config() with the PCRE_CONFIG_JIT         port  is  available  by  calling pcre_config() with the PCRE_CONFIG_JIT
7094         option. The result is 1 when JIT is available, and  0  otherwise.  How-         option. The result is 1 when JIT is available, and  0  otherwise.  How-
7095         ever, a simple program does not need to check this in order to use JIT.         ever, a simple program does not need to check this in order to use JIT.
7096         The API is implemented in a way that falls back to  the  ordinary  PCRE         The API is implemented in a way that falls  back  to  the  interpretive
7097         code if JIT is not available.         code if JIT is not available.
7098    
7099         If  your program may sometimes be linked with versions of PCRE that are         If  your program may sometimes be linked with versions of PCRE that are
# Line 7099  SIMPLE USE OF JIT Line 7112  SIMPLE USE OF JIT
7112               pcre_exec().               pcre_exec().
7113    
7114           (2) Use pcre_free_study() to free the pcre_extra block when it is           (2) Use pcre_free_study() to free the pcre_extra block when it is
7115               no longer needed instead of just freeing it yourself. This               no longer needed, instead of just freeing it yourself. This
7116               ensures that any JIT data is also freed.               ensures that any JIT data is also freed.
7117    
7118         For  a  program  that may be linked with pre-8.20 versions of PCRE, you         For  a  program  that may be linked with pre-8.20 versions of PCRE, you
# Line 7118  SIMPLE USE OF JIT Line 7131  SIMPLE USE OF JIT
7131               pcre_free(study_ptr);               pcre_free(study_ptr);
7132           #endif           #endif
7133    
7134         In  some circumstances you may need to call additional functions. These         PCRE_STUDY_JIT_COMPILE  requests  the JIT compiler to generate code for
7135         are described in the  section  entitled  "Controlling  the  JIT  stack"         complete matches.  If  you  want  to  run  partial  matches  using  the
7136           PCRE_PARTIAL_HARD  or  PCRE_PARTIAL_SOFT  options  of  pcre_exec(), you
7137           should set one or both of the following  options  in  addition  to,  or
7138           instead of, PCRE_STUDY_JIT_COMPILE when you call pcre_study():
7139    
7140             PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
7141             PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
7142    
7143           The  JIT  compiler  generates  different optimized code for each of the
7144           three modes (normal, soft partial, hard partial). When  pcre_exec()  is
7145           called,  the appropriate code is run if it is available. Otherwise, the
7146           pattern is matched using interpretive code.
7147    
7148           In some circumstances you may need to call additional functions.  These
7149           are  described  in  the  section  entitled  "Controlling the JIT stack"
7150         below.         below.
7151    
7152         If JIT support is not available, PCRE_STUDY_JIT_COMPILE is ignored, and         If JIT  support  is  not  available,  PCRE_STUDY_JIT_COMPILE  etc.  are
7153         no JIT data is set up. Otherwise, the compiled pattern is passed to the         ignored, and no JIT data is created. Otherwise, the compiled pattern is
7154         JIT  compiler,  which  turns  it  into  machine code that executes much         passed to the JIT compiler, which turns it into machine code that  exe-
7155         faster than the normal interpretive code. When pcre_exec() is passed  a         cutes  much  faster than the normal interpretive code. When pcre_exec()
7156         pcre_extra  block  containing  a  pointer  to  JIT  code, it obeys that         is passed a pcre_extra block containing a pointer to JIT  code  of  the
7157         instead of the normal code. The result is identical, but the code  runs         appropriate  mode  (normal  or  hard/soft  partial), it obeys that code
7158         much faster.         instead of running the interpreter. The result is  identical,  but  the
7159           compiled JIT code runs much faster.
7160    
7161         There  are some pcre_exec() options that are not supported for JIT exe-         There  are some pcre_exec() options that are not supported for JIT exe-
7162         cution. There are also some  pattern  items  that  JIT  cannot  handle.         cution. There are also some  pattern  items  that  JIT  cannot  handle.
7163         Details  are  given below. In both cases, execution automatically falls         Details  are  given below. In both cases, execution automatically falls
7164         back to the interpretive code.         back to the interpretive code. If you want  to  know  whether  JIT  was
7165           actually  used  for  a  particular  match, you should arrange for a JIT
7166           callback function to be set up as described  in  the  section  entitled
7167           "Controlling  the JIT stack" below, even if you do not need to supply a
7168           non-default JIT stack. Such a callback function is called whenever  JIT
7169           code  is about to be obeyed. If the execution options are not right for
7170           JIT execution, the callback function is not obeyed.
7171    
7172         If the JIT compiler finds an unsupported item, no JIT  data  is  gener-         If the JIT compiler finds an unsupported item, no JIT  data  is  gener-
7173         ated.  You  can find out if JIT execution is available after studying a         ated.  You  can find out if JIT execution is available after studying a
7174         pattern by calling pcre_fullinfo() with  the  PCRE_INFO_JIT  option.  A         pattern by calling pcre_fullinfo() with  the  PCRE_INFO_JIT  option.  A
7175         result  of  1  means that JIT compilation was successful. A result of 0         result  of  1  means that JIT compilation was successful. A result of 0
7176         means that JIT support is not available, or the pattern was not studied         means that JIT support is not available, or the pattern was not studied
7177         with PCRE_STUDY_JIT_COMPILE, or the JIT compiler was not able to handle         with  PCRE_STUDY_JIT_COMPILE  etc., or the JIT compiler was not able to
7178         the pattern.         handle the pattern.
7179    
7180         Once a pattern has been studied, with or without JIT, it can be used as         Once a pattern has been studied, with or without JIT, it can be used as
7181         many times as you like for matching different subject strings.         many times as you like for matching different subject strings.
# Line 7150  SIMPLE USE OF JIT Line 7184  SIMPLE USE OF JIT
7184  UNSUPPORTED OPTIONS AND PATTERN ITEMS  UNSUPPORTED OPTIONS AND PATTERN ITEMS
7185    
7186         The  only  pcre_exec() options that are supported for JIT execution are         The  only  pcre_exec() options that are supported for JIT execution are
7187         PCRE_NO_UTF8_CHECK,  PCRE_NOTBOL,   PCRE_NOTEOL,   PCRE_NOTEMPTY,   and         PCRE_NO_UTF8_CHECK,    PCRE_NOTBOL,     PCRE_NOTEOL,     PCRE_NOTEMPTY,
7188         PCRE_NOTEMPTY_ATSTART.  Note in particular that partial matching is not         PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT.
        supported.  
7189    
7190         The unsupported pattern items are:         The unsupported pattern items are:
7191    
# Line 7169  UNSUPPORTED OPTIONS AND PATTERN ITEMS Line 7202  UNSUPPORTED OPTIONS AND PATTERN ITEMS
7202    
7203  RETURN VALUES FROM JIT EXECUTION  RETURN VALUES FROM JIT EXECUTION
7204    
7205         When a pattern is matched using JIT execution, the  return  values  are         When  a  pattern  is matched using JIT execution, the return values are
7206         the  same as those given by the interpretive pcre_exec() code, with the         the same as those given by the interpretive pcre_exec() code, with  the
7207         addition of one new error code: PCRE_ERROR_JIT_STACKLIMIT.  This  means         addition  of  one new error code: PCRE_ERROR_JIT_STACKLIMIT. This means
7208         that  the memory used for the JIT stack was insufficient. See "Control-         that the memory used for the JIT stack was insufficient. See  "Control-
7209         ling the JIT stack" below for a discussion of JIT stack usage. For com-         ling the JIT stack" below for a discussion of JIT stack usage. For com-
7210         patibility  with  the  interpretive pcre_exec() code, no more than two-         patibility with the interpretive pcre_exec() code, no  more  than  two-
7211         thirds of the ovector argument is used for passing back  captured  sub-         thirds  of  the ovector argument is used for passing back captured sub-
7212         strings.         strings.
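
The two-thirds rule determines how large ovector must be. The following is a minimal sketch, not part of PCRE: exec_max_pairs() and exec_ovecsize_for() are hypothetical names, and the arithmetic assumes that each captured substring uses one pair of offsets, that pair 0 holds the whole match, and that an ovecsize which is not a multiple of three is rounded down.

```c
#include <assert.h>

/* Hypothetical helper: how many offset pairs pcre_exec() can pass back
   (interpreter or JIT) for a vector of ovecsize ints. Only the first
   two-thirds of the vector carries pairs, so the answer is ovecsize / 3. */
static int exec_max_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}

/* Hypothetical helper: smallest ovecsize that returns every substring of a
   pattern with capture_count capturing groups. One extra pair is needed
   for the whole match, and each pair costs three ints of vector. */
static int exec_ovecsize_for(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}
```

For instance, a pattern with two capturing groups needs a vector of nine ints under these assumptions.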
7213    
7214         The  error  code  PCRE_ERROR_MATCHLIMIT  is returned by the JIT code if         The error code PCRE_ERROR_MATCHLIMIT is returned by  the  JIT  code  if
7215         searching a very large pattern tree goes on for too long, as it  is  in         searching  a  very large pattern tree goes on for too long, as it is in
7216         the  same circumstance when JIT is not used, but the details of exactly         the same circumstance when JIT is not used, but the details of  exactly
7217         what is counted are not the same. The  PCRE_ERROR_RECURSIONLIMIT  error         what  is  counted are not the same. The PCRE_ERROR_RECURSIONLIMIT error
7218         code is never returned by JIT execution.         code is never returned by JIT execution.
7219    
7220    
7221  SAVING AND RESTORING COMPILED PATTERNS  SAVING AND RESTORING COMPILED PATTERNS
7222    
7223         The  code  that  is  generated by the JIT compiler is architecture-spe-         The code that is generated by the  JIT  compiler  is  architecture-spe-
7224         cific, and is also position dependent. For those reasons it  cannot  be         cific,  and  is also position dependent. For those reasons it cannot be
7225         saved  (in a file or database) and restored later like the bytecode and         saved (in a file or database) and restored later like the bytecode  and
7226         other data of a compiled pattern. Saving and  restoring  compiled  pat-         other  data  of  a compiled pattern. Saving and restoring compiled pat-
7227         terns  is not something many people do. More detail about this facility         terns is not something many people do. More detail about this  facility
7228         is given in the pcreprecompile documentation. It should be possible  to         is  given in the pcreprecompile documentation. It should be possible to
7229         run  pcre_study() on a saved and restored pattern, and thereby recreate         run pcre_study() on a saved and restored pattern, and thereby  recreate
7230         the JIT data, but because JIT compilation uses  significant  resources,         the  JIT  data, but because JIT compilation uses significant resources,
7231         it  is  probably  not worth doing this; you might as well recompile the         it is probably not worth doing this; you might as  well  recompile  the
7232         original pattern.         original pattern.
7233    
7234    
7235  CONTROLLING THE JIT STACK  CONTROLLING THE JIT STACK
7236    
7237         When the compiled JIT code runs, it needs a block of memory to use as a         When the compiled JIT code runs, it needs a block of memory to use as a
7238         stack.   By  default,  it  uses 32K on the machine stack. However, some         stack.  By default, it uses 32K on the  machine  stack.  However,  some
7239         large  or  complicated  patterns  need  more  than  this.   The   error         large   or   complicated  patterns  need  more  than  this.  The  error
7240         PCRE_ERROR_JIT_STACKLIMIT  is  given  when  there  is not enough stack.         PCRE_ERROR_JIT_STACKLIMIT is given when  there  is  not  enough  stack.
7241         Three functions are provided for managing blocks of memory for  use  as         Three  functions  are provided for managing blocks of memory for use as
7242         JIT  stacks. There is further discussion about the use of JIT stacks in         JIT stacks. There is further discussion about the use of JIT stacks  in
7243         the section entitled "JIT stack FAQ" below.         the section entitled "JIT stack FAQ" below.
7244    
7245         The pcre_jit_stack_alloc() function creates a JIT stack. Its  arguments         The  pcre_jit_stack_alloc() function creates a JIT stack. Its arguments
7246         are  a starting size and a maximum size, and it returns a pointer to an         are a starting size and a maximum size, and it returns a pointer to  an
7247         opaque structure of type pcre_jit_stack, or NULL if there is an  error.         opaque  structure of type pcre_jit_stack, or NULL if there is an error.
7248         The  pcre_jit_stack_free() function can be used to free a stack that is         The pcre_jit_stack_free() function can be used to free a stack that  is
7249         no longer needed. (For the technically minded:  the  address  space  is         no  longer  needed.  (For  the technically minded: the address space is
7250         allocated by mmap or VirtualAlloc.)         allocated by mmap or VirtualAlloc.)
7251    
7252         JIT  uses far less memory for recursion than the interpretive code, and         JIT uses far less memory for recursion than the interpretive code,  and
7253         a maximum stack size of 512K to 1M should be more than enough  for  any         a  maximum  stack size of 512K to 1M should be more than enough for any
7254         pattern.         pattern.
7255    
7256         The  pcre_assign_jit_stack()  function  specifies  which stack JIT code         The pcre_assign_jit_stack() function specifies  which  stack  JIT  code
7257         should use. Its arguments are as follows:         should use. Its arguments are as follows:
7258    
7259           pcre_extra         *extra           pcre_extra         *extra
7260           pcre_jit_callback  callback           pcre_jit_callback  callback
7261           void               *data           void               *data
7262    
7263         The extra argument must be  the  result  of  studying  a  pattern  with         The  extra  argument  must  be  the  result  of studying a pattern with
7264         PCRE_STUDY_JIT_COMPILE.  There  are  three  cases for the values of the         PCRE_STUDY_JIT_COMPILE etc. There are three cases for the values of the
7265         other two options:         other two options:
7266    
7267           (1) If callback is NULL and data is NULL, an internal 32K block           (1) If callback is NULL and data is NULL, an internal 32K block
# Line 7237  CONTROLLING THE JIT STACK Line 7270  CONTROLLING THE JIT STACK
7270           (2) If callback is NULL and data is not NULL, data must be           (2) If callback is NULL and data is not NULL, data must be
7271               a valid JIT stack, the result of calling pcre_jit_stack_alloc().               a valid JIT stack, the result of calling pcre_jit_stack_alloc().
7272    
7273           (3) If callback not NULL, it must point to a function that is called           (3) If callback is not NULL, it must point to a function that is
7274               with data as an argument at the start of matching, in order to               called with data as an argument at the start of matching, in
7275               set up a JIT stack. If the result is NULL, the internal 32K stack               order to set up a JIT stack. If the return from the callback
7276               is used; otherwise the return value must be a valid JIT stack,               function is NULL, the internal 32K stack is used; otherwise the
7277               the result of calling pcre_jit_stack_alloc().               return value must be a valid JIT stack, the result of calling
7278                 pcre_jit_stack_alloc().
7279         You may safely assign the same JIT stack to more than one  pattern,  as  
7280         long as they are all matched sequentially in the same thread. In a mul-         A  callback function is obeyed whenever JIT code is about to be run; it
7281         tithread application, each thread must use its own JIT stack.         is not obeyed when pcre_exec() is called with options that  are  incom-
7282           patible for JIT execution. A callback function can therefore be used to
7283         Strictly speaking, even more is allowed. You can assign the same  stack         determine whether a match operation was  executed  by  JIT  or  by  the
7284         to  any number of patterns as long as they are not used for matching by         interpreter.
7285         multiple threads at the same time. For example, you can assign the same  
7286         stack  to all compiled patterns, and use a global mutex in the callback         You may safely use the same JIT stack for more than one pattern (either
7287         to wait until the stack is available for use. However, this is an inef-         by assigning directly or by callback), as long as the patterns are  all
7288         ficient solution, and not recommended.         matched  sequentially in the same thread. In a multithread application,
7289           if you do not specify a JIT stack, or if you assign or pass  back  NULL
7290           from  a  callback, that is thread-safe, because each thread has its own
7291           machine stack. However, if you assign  or  pass  back  a  non-NULL  JIT
7292           stack,  this  must  be  a  different  stack for each thread so that the
7293           application is thread-safe.
7294    
7295           Strictly speaking, even more is allowed. You can assign the  same  non-
7296           NULL  stack  to any number of patterns as long as they are not used for
7297           matching by multiple threads at the same time.  For  example,  you  can
7298           assign  the same stack to all compiled patterns, and use a global mutex
7299           in the callback to wait until the stack is available for use.  However,
7300           this is an inefficient solution, and not recommended.
7301    
7302         This  is  a  suggestion  for  how a typical multithreaded program might         This  is a suggestion for how a multithreaded program that needs to set
7303         operate:         up non-default JIT stacks might operate:
7304    
7305           During thread initialization           During thread initialization
7306             thread_local_var = pcre_jit_stack_alloc(...)             thread_local_var = pcre_jit_stack_alloc(...)
# Line 7269  CONTROLLING THE JIT STACK Line 7314  CONTROLLING THE JIT STACK
7314         All the functions described in this section do nothing if  JIT  is  not         All the functions described in this section do nothing if  JIT  is  not
7315         available,  and  pcre_assign_jit_stack()  does nothing unless the extra         available,  and  pcre_assign_jit_stack()  does nothing unless the extra
7316         argument is non-NULL and points to  a  pcre_extra  block  that  is  the         argument is non-NULL and points to  a  pcre_extra  block  that  is  the
7317         result of a successful study with PCRE_STUDY_JIT_COMPILE.         result of a successful study with PCRE_STUDY_JIT_COMPILE etc.
7318    
7319    
7320  JIT STACK FAQ  JIT STACK FAQ
# Line 7329  JIT STACK FAQ Line 7374  JIT STACK FAQ
7374    
7375         Especially on embedded systems, it might be a good idea to release         Especially on embedded systems, it might be a good idea to release
7376         memory sometimes without freeing the stack. There is no API for this at         memory sometimes without freeing the stack. There is no API for this at
7377         the moment. Probably a function call which returns with  the  currently         the moment.  Probably a function call which returns with the  currently
7378         allocated  memory for any stack and another which allows releasing mem-         allocated  memory for any stack and another which allows releasing mem-
7379         ory (shrinking the stack) would be a good idea if someone needs this.         ory (shrinking the stack) would be a good idea if someone needs this.
7380    
# Line 7378  AUTHOR Line 7423  AUTHOR
7423    
7424  REVISION  REVISION
7425    
7426         Last updated: 08 January 2012         Last updated: 23 February 2012
7427         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7428  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7429    
7430    
7431  PCREPARTIAL(3)                                                  PCREPARTIAL(3)  PCREPARTIAL(3)                                                  PCREPARTIAL(3)
7432    
7433    
# Line 7422  PARTIAL MATCHING IN PCRE Line 7467  PARTIAL MATCHING IN PCRE
7467         matching function. If both options  are  set,  PCRE_PARTIAL_HARD  takes         matching function. If both options  are  set,  PCRE_PARTIAL_HARD  takes
7468         precedence.         precedence.
7469    
7470         Setting  a partial matching option disables the use of any just-in-time         If  you  want to use partial matching with just-in-time optimized code,
7471         code that was  set  up  by  studying  the  compiled  pattern  with  the         you must call pcre_study() or pcre16_study() with one or both of  these
7472         PCRE_STUDY_JIT_COMPILE  option. It also disables two of PCRE's standard         options:
7473         optimizations. PCRE remembers the last literal data unit in a  pattern,  
7474         and  abandons  matching immediately if it is not present in the subject           PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
7475             PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
7476    
7477           PCRE_STUDY_JIT_COMPILE  should also be set if you are going to run non-
7478           partial matches on the same pattern. If the appropriate JIT study  mode
7479           has not been set for a match, the interpretive matching code is used.
7480    
7481           Setting a partial matching option disables two of PCRE's standard opti-
7482           mizations. PCRE remembers the last literal data unit in a pattern,  and
7483           abandons  matching  immediately  if  it  is  not present in the subject
7484         string. This optimization cannot be used  for  a  subject  string  that         string. This optimization cannot be used  for  a  subject  string  that
7485         might  match only partially. If the pattern was studied, PCRE knows the         might  match only partially. If the pattern was studied, PCRE knows the
7486         minimum length of a matching string, and does not  bother  to  run  the         minimum length of a matching string, and does not  bother  to  run  the
# Line 7801  AUTHOR Line 7855  AUTHOR
7855    
7856  REVISION  REVISION
7857    
7858         Last updated: 21 January 2012         Last updated: 18 February 2012
7859         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7860  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7861    
7862    
7863  PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)  PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)
7864    
7865    
# Line 7939  REVISION Line 7993  REVISION
7993         Last updated: 10 January 2012         Last updated: 10 January 2012
7994         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7995  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7996    
7997    
7998  PCREPERFORM(3)                                                  PCREPERFORM(3)  PCREPERFORM(3)                                                  PCREPERFORM(3)
7999    
8000    
# Line 8109  REVISION Line 8163  REVISION
8163         Last updated: 09 January 2012         Last updated: 09 January 2012
8164         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8165  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8166    
8167    
8168  PCREPOSIX(3)                                                      PCREPOSIX(3)  PCREPOSIX(3)                                                      PCREPOSIX(3)
8169    
8170    
# Line 8373  REVISION Line 8427  REVISION
8427         Last updated: 09 January 2012         Last updated: 09 January 2012
8428         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8429  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8430    
8431    
8432  PCRECPP(3)                                                          PCRECPP(3)  PCRECPP(3)                                                          PCRECPP(3)
8433    
8434    
# Line 8715  REVISION Line 8769  REVISION
8769    
8770         Last updated: 08 January 2012         Last updated: 08 January 2012
8771  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8772    
8773    
8774  PCRESAMPLE(3)                                                    PCRESAMPLE(3)  PCRESAMPLE(3)                                                    PCRESAMPLE(3)
8775    
8776    
# Line 8859  REVISION Line 8913  REVISION
8913         Last updated: 08 January 2012         Last updated: 08 January 2012
8914         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8915  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8916    
8917    
8918  PCRESTACK(3)                                                      PCRESTACK(3)  PCRESTACK(3)                                                      PCRESTACK(3)
8919    
8920    
# Line 9044  REVISION Line 9098  REVISION
9098         Last updated: 21 January 2012         Last updated: 21 January 2012
9099         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
9100  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
9101    
9102    
