Diff of /code/trunk/doc/pcre.txt

revision 953 by ph10, Fri Feb 24 12:05:54 2012 UTC vs. revision 954 by ph10, Sat Mar 31 18:09:26 2012 UTC
# Line 2185  INFORMATION ABOUT A PATTERN Line 2185  INFORMATION ABOUT A PATTERN
       example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
       /^a\dz\d/ the returned value is -1.

         PCRE_INFO_MAXLOOKBEHIND

       Return the number of characters (not bytes) in the longest lookbehind
       assertion in the pattern. Note that the simple assertions \b and \B
       each require a one-character lookbehind. This information is useful
       when doing multi-segment matching with the partial matching facilities.

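       A minimal sketch of how the value might be used between segments,
       assuming the 8-bit library without UTF-8 (so one character is one byte)
       and that partial_start is the offset at which the partially matched
       string begins in the buffered data. The helper retain_from() is
       hypothetical, not part of PCRE: it returns the earliest offset to keep
       so that a lookbehind can still inspect enough earlier characters, and
       keeps everything when fewer characters than that are available.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. partial_start is where the
   partially matched string begins; max_lookbehind is the value obtained
   with PCRE_INFO_MAXLOOKBEHIND. Returns the earliest offset that should
   be retained before the next segment is appended. Assumes one byte per
   character (8-bit library, no UTF-8). */
size_t retain_from(size_t partial_start, int max_lookbehind)
{
  size_t back = (max_lookbehind > 0) ? (size_t)max_lookbehind : 0;

  /* Near the start of the data there may be fewer characters than the
     lookbehind needs; in that case keep all of them. */
  return (partial_start > back) ? partial_start - back : 0;
}
```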
         PCRE_INFO_MINLENGTH

       If the pattern was studied and a minimum length for matching subject
       strings was computed, its value is returned. Otherwise the returned
       value is -1. The value is a number of characters, which in UTF-8 mode
       may differ from the number of bytes. The fourth argument should point
       to an int variable. A non-negative value is a lower bound on the length
       of any matching string. There may be no string of that length that
       actually matches, but every string that does match is at least that
       long.

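       An application could use this value to skip subjects that cannot
       possibly match. The helper below is hypothetical, not a PCRE function;
       it assumes minlength was obtained with PCRE_INFO_MINLENGTH and that the
       subject length is measured in characters, as described above.

```c
/* Hypothetical helper. Returns 0 only when the subject is certainly too
   short to match; returns 1 when a match attempt is still worthwhile. */
int could_match_by_length(int subject_chars, int minlength)
{
  if (minlength < 0)
    return 1;                     /* -1: no minimum was computed */
  return subject_chars >= minlength;
}
```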
         PCRE_INFO_NAMECOUNT
         PCRE_INFO_NAMEENTRYSIZE
         PCRE_INFO_NAMETABLE

       PCRE supports the use of named as well as numbered capturing
       parentheses. The names are just an additional way of identifying the
       parentheses, which still acquire numbers. Several convenience functions
       such as pcre_get_named_substring() are provided for extracting captured
       substrings by name. It is also possible to extract the data directly,
       by first converting the name to a number in order to access the correct
       pointers in the output vector (described with pcre_exec() below). To do
       the conversion, you need to use the name-to-number map, which is
       described by these three values.

       The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
       gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
       of each entry; both of these return an int value. The entry size
       depends on the length of the longest name. PCRE_INFO_NAMETABLE returns
       a pointer to the first entry of the table. This is a pointer to char in
       the 8-bit library, where the first two bytes of each entry are the
       number of the capturing parenthesis, most significant byte first. In
       the 16-bit library, the pointer points to 16-bit data units, the first
       of which contains the parenthesis number. The rest of the entry is the
       corresponding name, zero terminated.

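       The 8-bit entry layout can be decoded in a few lines of C. This is a
       sketch with hypothetical helper names (entry_number, entry_name,
       lookup_name); it reads the table through unsigned char so that the
       high byte of the group number is not sign-extended, and it scans
       linearly, returning the first entry whose name matches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers for one entry of the 8-bit name table: a 2-byte
   group number, most significant byte first, then the zero-terminated
   name. */
int entry_number(const unsigned char *entry)
{
  return (entry[0] << 8) | entry[1];
}

const char *entry_name(const unsigned char *entry)
{
  return (const char *)(entry + 2);
}

/* Linear scan of count entries, each entry_size bytes long. Returns the
   group number of the first entry whose name equals name, or -1. */
int lookup_name(const unsigned char *table, int count, int entry_size,
  const char *name)
{
  for (int i = 0; i < count; i++)
    {
    const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
    if (strcmp(entry_name(entry), name) == 0) return entry_number(entry);
    }
  return -1;
}
```

       With the real library, table would be the pointer obtained with
       PCRE_INFO_NAMETABLE (cast to const unsigned char *), and count and
       entry_size the PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE values.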
       The names are in alphabetical order. Duplicate names may appear if (?|
       is used to create multiple groups with the same number, as described in
       the section on duplicate subpattern numbers in the pcrepattern page.
       Duplicate names for subpatterns with different numbers are permitted
       only if PCRE_DUPNAMES is set. In all cases, duplicate names appear in
       the table in the order in which they were found in the pattern. In the
       absence of (?|, this is the order of increasing number; when (?| is
       used, this is not necessarily the case, because later subpatterns may
       have lower numbers.

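       Because the names are sorted, a lookup can use binary search and then
       step forward over any run of duplicates, which are adjacent and appear
       in pattern order. A sketch under stated assumptions: the 8-bit
       library, "alphabetical" meaning strcmp() byte order, and the
       hypothetical helpers name_at() and find_name_range(), which are not
       PCRE functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the name stored in entry i of an 8-bit table. */
static const char *name_at(const unsigned char *table, int entry_size, int i)
{
  return (const char *)(table + (size_t)i * (size_t)entry_size + 2);
}

/* Hypothetical helper, not part of PCRE. Finds every entry for name in a
   table sorted in strcmp() order. On success, stores the indexes of the
   first and last matching entries and returns 1; otherwise returns 0. */
int find_name_range(const unsigned char *table, int count, int entry_size,
  const char *name, int *first, int *last)
{
  int lo = 0, hi = count;          /* binary search over [lo, hi) */
  while (lo < hi)
    {
    int mid = lo + (hi - lo) / 2;
    if (strcmp(name_at(table, entry_size, mid), name) < 0) lo = mid + 1;
    else hi = mid;
    }
  /* lo is now the first entry not less than name, or count. */
  if (lo == count || strcmp(name_at(table, entry_size, lo), name) != 0)
    return 0;
  int i = lo;
  while (i + 1 < count && strcmp(name_at(table, entry_size, i + 1), name) == 0)
    i++;                           /* equal names are adjacent when sorted */
  *first = lo;
  *last = i;
  return 1;
}
```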
       As a simple example of the name/number table, consider the following
       pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is
       set, so white space, including newlines, is ignored):

         (?<date> (?<year>(\d\d)?\d\d) -
         (?<month>\d\d) - (?<day>\d\d) )

       There are four named subpatterns, so the table has four entries, and
       each entry in the table is eight bytes long. The table is as follows,
       with non-printing bytes shown in hexadecimal, and undefined bytes shown
       as ??:

# Line 2248  INFORMATION ABOUT A PATTERN Line 2255  INFORMATION ABOUT A PATTERN
         00 04 m  o  n  t  h  00
         00 02 y  e  a  r  00 ??

       When writing code to extract data from named subpatterns using the
       name-to-number map, remember that the length of the entries is likely
       to be different for each compiled pattern.

         PCRE_INFO_OKPARTIAL

       Return 1 if the pattern can be used for partial matching with
       pcre_exec(), otherwise 0. The fourth argument should point to an int
       variable. From release 8.00, this always returns 1, because the
       restrictions that previously applied to partial matching have been
       lifted. The pcrepartial documentation gives details of partial
       matching.

         PCRE_INFO_OPTIONS

       Return a copy of the options with which the pattern was compiled. The
       fourth argument should point to an unsigned long int variable. These
       option bits are those specified in the call to pcre_compile(), modified
       by any top-level option settings at the start of the pattern itself. In
       other words, they are the options that will be in force when matching
       starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with
       the PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
       and PCRE_EXTENDED.

       A pattern is automatically anchored by PCRE if all of its top-level
       alternatives begin with one of the following:

         ^     unless PCRE_MULTILINE is set
# Line 2286  INFORMATION ABOUT A PATTERN Line 2293  INFORMATION ABOUT A PATTERN

         PCRE_INFO_SIZE

       Return the size of the compiled pattern in bytes (for both libraries).
       The fourth argument should point to a size_t variable. This value does
       not include the size of the pcre structure that is returned by
       pcre_compile(). The value that is passed as the argument to
       pcre_malloc() when pcre_compile() is getting memory in which to place
       the compiled data is the value returned by this option plus the size of
       the pcre structure. Studying a compiled pattern, with or without JIT,
       does not alter the value returned by this option.

         PCRE_INFO_STUDYSIZE

       Return the size in bytes of the data block pointed to by the study_data
       field in a pcre_extra block. If pcre_extra is NULL, or there is no
       study data, zero is returned. The fourth argument should point to a
       size_t variable. The study_data field is set by pcre_study() to record
       information that will speed up matching (see the section entitled
       "Studying a pattern" above). The format of the study_data block is
       private, but its length is made available via this option so that it
       can be saved and restored (see the pcreprecompile documentation for
       details).

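       Once the size is known, the study data can be written out as an opaque
       block. The helpers below are hypothetical and simply store a size_t
       length prefix followed by the bytes; they do not normalize the width
       of size_t or the byte order, so the output is only suitable for
       reloading on a compatible host. The pcreprecompile documentation
       remains the authority on what must be saved alongside the pattern.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write a length prefix (e.g. the value obtained
   with PCRE_INFO_STUDYSIZE) followed by the block itself.
   Returns 0 on success, -1 on a write error. */
int save_blob(FILE *f, const void *data, size_t size)
{
  if (fwrite(&size, sizeof size, 1, f) != 1) return -1;
  if (size > 0 && fwrite(data, 1, size, f) != size) return -1;
  return 0;
}

/* Hypothetical helper: read back a block written by save_blob() into a
   newly allocated buffer owned by the caller. Returns NULL on failure. */
void *load_blob(FILE *f, size_t *size_out)
{
  size_t size;
  if (fread(&size, sizeof size, 1, f) != 1) return NULL;
  void *data = malloc(size > 0 ? size : 1);   /* avoid malloc(0) */
  if (data == NULL) return NULL;
  if (size > 0 && fread(data, 1, size, f) != size)
    {
    free(data);
    return NULL;
    }
  *size_out = size;
  return data;
}
```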
# Line 2312  REFERENCE COUNTS Line 2319  REFERENCE COUNTS

         int pcre_refcount(pcre *code, int adjust);

       The pcre_refcount() function is used to maintain a reference count in
       the data block that contains a compiled pattern. It is provided for the
       benefit of applications that operate in an object-oriented manner,
       where different parts of the application may be using the same compiled
       pattern and the block should be freed only when they are all done.

       When a pattern is compiled, the reference count field is initialized to
       zero. It is changed only by calling this function, whose action is to
       add the adjust value (which may be positive or negative) to it. The
       yield of the function is the new value. However, the value of the count
       is constrained to lie between 0 and 65535, inclusive. If the new value
       is outside these limits, it is forced to the appropriate limit value.

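       The documented arithmetic can be modelled as follows. This is an
       illustrative model with a hypothetical name, refcount_adjust(), not the
       library's implementation: it adds the adjustment, clamps the result to
       0..65535, stores it, and returns the new value.

```c
/* Hypothetical model of the pcre_refcount() arithmetic described above. */
int refcount_adjust(unsigned short *count, int adjust)
{
  long value = (long)*count + adjust;

  if (value < 0) value = 0;            /* forced to the lower limit */
  if (value > 65535) value = 65535;    /* forced to the upper limit */
  *count = (unsigned short)value;
  return (int)value;                   /* the yield is the new value */
}
```

       In an application, each component that keeps the pattern would call
       pcre_refcount(code, 1), and the component whose call to
       pcre_refcount(code, -1) yields 0 would free the pattern.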
       Except when it is zero, the reference count is not correctly preserved
       if a pattern is compiled on one host and then transferred to a host
       whose byte order is different. (This seems a highly unlikely scenario.)


# Line 2336  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2343  MATCHING A PATTERN: THE TRADITIONAL FUNC
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       The function pcre_exec() is called to match a subject string against a
       compiled pattern, which is passed in the code argument. If the pattern
       was studied, the result of the study should be passed in the extra
       argument. You can call pcre_exec() with the same code and extra
       arguments as many times as you like, in order to match different
       subject strings with the same pattern.

       This function is the main matching facility of the library, and it
       operates in a Perl-like manner. For specialist use there is also an
       alternative matching function, which is described below in the section
       about the pcre_dfa_exec() function.

       In most applications, the pattern will have been compiled (and
       optionally studied) in the same process that calls pcre_exec().
       However, it is possible to save compiled patterns and study data, and
       then use them later in different processes, possibly even on different
       hosts. For a discussion about this, see the pcreprecompile
       documentation.

       Here is an example of a simple call to pcre_exec():
# Line 2370  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2377  MATCHING A PATTERN: THE TRADITIONAL FUNC

   Extra data for pcre_exec()

       If the extra argument is not NULL, it must point to a pcre_extra data
       block. The pcre_study() function returns such a block (when it doesn't
       return NULL), but you can also create one for yourself, and pass
       additional information in it. The pcre_extra block contains the
       following fields (not necessarily in this order):

         unsigned long int flags;
# Line 2385  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2392  MATCHING A PATTERN: THE TRADITIONAL FUNC
         const unsigned char *tables;
         unsigned char **mark;

       In the 16-bit version of this structure, the mark field has type
       "PCRE_UCHAR16 **".

       The flags field is used to specify which of the other fields are set.
       The flag bits are:

         PCRE_EXTRA_CALLOUT_DATA
# Line 2399  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2406  MATCHING A PATTERN: THE TRADITIONAL FUNC
         PCRE_EXTRA_STUDY_DATA
         PCRE_EXTRA_TABLES

       Other flag bits should be set to zero. The study_data field and
       sometimes the executable_jit field are set in the pcre_extra block that
       is returned by pcre_study(), together with the appropriate flag bits.
       You should not set these yourself, but you may add to the block by
       setting other fields and their corresponding flag bits.

       The match_limit field provides a means of preventing PCRE from using up
       a vast amount of resources when running patterns that are not going to
       match, but which have a very large number of possibilities in their
       search trees. The classic example is a pattern that uses nested
       unlimited repeats.

       Internally, pcre_exec() uses a function called match(), which it calls
       repeatedly (sometimes recursively). The limit set by match_limit is
       imposed on the number of times this function is called during a match,
       which has the effect of limiting the amount of backtracking that can
       take place. For patterns that are not anchored, the count restarts from
       zero for each position in the subject string.

       When pcre_exec() is called with a pattern that was successfully studied
       with a JIT option, the way that the matching is executed is entirely
       different. However, there is still the possibility of runaway matching
       that goes on for a very long time, and so the match_limit value is also
       used in this case (but in a different way) to limit how long the
       matching can continue.

       The default value for the limit can be set when PCRE is built; the
       built-in default is 10 million, which handles all but the most extreme
       cases. You can override the default by supplying pcre_exec() with a
       pcre_extra block in which match_limit is set, and
       PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is
       exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.

       The match_limit_recursion field is similar to match_limit, but instead
       of limiting the total number of times that match() is called, it limits
       the depth of recursion. The recursion depth is a smaller number than
       the total number of calls, because not all calls to match() are
       recursive. This limit is of use only if it is set smaller than
       match_limit.

       Limiting the recursion depth limits the amount of machine stack that
       can be used, or, when PCRE has been compiled to use memory on the heap
       instead of the stack, the amount of heap memory that can be used. This
       limit is not relevant, and is ignored, when matching is done using JIT
       compiled code.

       The default value for match_limit_recursion can be set when PCRE is
       built; the built-in default is the same value as the default for
       match_limit. You can override the default by supplying pcre_exec() with
       a pcre_extra block in which match_limit_recursion is set, and
       PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the
       limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.

       The callout_data field is used in conjunction with the "callout"
       feature, and is described in the pcrecallout documentation.

       The tables field is used to pass a character tables pointer to
       pcre_exec(); this overrides the value that is stored with the compiled
       pattern. A non-NULL value is stored with the compiled pattern only if
       custom tables were supplied to pcre_compile() via its tableptr
       argument. If NULL is passed to pcre_exec() using this mechanism, it
       forces PCRE's internal tables to be used. This facility is helpful when
       reusing patterns that have been saved after compiling with an external
       set of tables, because the external tables might be at a different
       address when pcre_exec() is called. See the pcreprecompile
       documentation for a discussion of saving compiled patterns for later
       use.

       If PCRE_EXTRA_MARK is set in the flags field, the mark field must be
       set to point to a suitable variable. If the pattern contains any
       backtracking control verbs such as (*MARK:NAME), and the execution ends
       up with a name to pass back, a pointer to the name string (zero
       terminated) is placed in the variable pointed to by the mark field. The
       names are within the compiled pattern; if you wish to retain such a
       name you must copy it before freeing the memory of a compiled pattern.
       If there is no name to pass back, the variable pointed to by the mark
       field is set to NULL. For details of the backtracking control verbs,
       see the section entitled "Backtracking control" in the pcrepattern
       documentation.

2487     Option bits for pcre_exec()     Option bits for pcre_exec()
2488    
2489         The unused bits of the options argument for pcre_exec() must  be  zero.         The  unused  bits of the options argument for pcre_exec() must be zero.
2490         The  only  bits  that  may  be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx,         The only bits that may  be  set  are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,
2491         PCRE_NOTBOL,   PCRE_NOTEOL,    PCRE_NOTEMPTY,    PCRE_NOTEMPTY_ATSTART,         PCRE_NOTBOL,    PCRE_NOTEOL,    PCRE_NOTEMPTY,   PCRE_NOTEMPTY_ATSTART,
2492         PCRE_NO_START_OPTIMIZE,   PCRE_NO_UTF8_CHECK,   PCRE_PARTIAL_HARD,  and         PCRE_NO_START_OPTIMIZE,  PCRE_NO_UTF8_CHECK,   PCRE_PARTIAL_HARD,   and
2493         PCRE_PARTIAL_SOFT.         PCRE_PARTIAL_SOFT.
2494    
2495         If the pattern was successfully studied with one  of  the  just-in-time         If  the  pattern  was successfully studied with one of the just-in-time
2496         (JIT) compile options, the only supported options for JIT execution are         (JIT) compile options, the only supported options for JIT execution are
2497         PCRE_NO_UTF8_CHECK,    PCRE_NOTBOL,     PCRE_NOTEOL,     PCRE_NOTEMPTY,         PCRE_NO_UTF8_CHECK,     PCRE_NOTBOL,     PCRE_NOTEOL,    PCRE_NOTEMPTY,
2498         PCRE_NOTEMPTY_ATSTART,  PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an         PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If  an
2499         unsupported option is used, JIT execution is disabled  and  the  normal         unsupported  option  is  used, JIT execution is disabled and the normal
2500         interpretive code in pcre_exec() is run.         interpretive code in pcre_exec() is run.
2501    
2502           PCRE_ANCHORED           PCRE_ANCHORED
2503    
2504         The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first         The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first
2505         matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or         matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or
2506         turned  out to be anchored by virtue of its contents, it cannot be made         turned out to be anchored by virtue of its contents, it cannot be  made
2507         unanchored at matching time.         unanchored at matching time.
2508    
2509           PCRE_BSR_ANYCRLF           PCRE_BSR_ANYCRLF
2510           PCRE_BSR_UNICODE           PCRE_BSR_UNICODE
2511    
2512         These options (which are mutually exclusive) control what the \R escape         These options (which are mutually exclusive) control what the \R escape
2513         sequence  matches.  The choice is either to match only CR, LF, or CRLF,         sequence matches. The choice is either to match only CR, LF,  or  CRLF,
2514         or to match any Unicode newline sequence. These  options  override  the         or  to  match  any Unicode newline sequence. These options override the
2515         choice that was made or defaulted when the pattern was compiled.         choice that was made or defaulted when the pattern was compiled.
2516    
2517           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
# Line 2513  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2520  MATCHING A PATTERN: THE TRADITIONAL FUNC
2520           PCRE_NEWLINE_ANYCRLF           PCRE_NEWLINE_ANYCRLF
2521           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
2522    
2523         These  options  override  the  newline  definition  that  was chosen or         These options override  the  newline  definition  that  was  chosen  or
2524         defaulted when the pattern was compiled. For details, see the  descrip-         defaulted  when the pattern was compiled. For details, see the descrip-
2525         tion  of  pcre_compile()  above.  During  matching,  the newline choice         tion of pcre_compile()  above.  During  matching,  the  newline  choice
2526         affects the behaviour of the dot, circumflex,  and  dollar  metacharac-         affects  the  behaviour  of the dot, circumflex, and dollar metacharac-
2527         ters.  It may also alter the way the match position is advanced after a         ters. It may also alter the way the match position is advanced after  a
2528         match failure for an unanchored pattern.         match failure for an unanchored pattern.
2529    
2530         When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY  is         When  PCRE_NEWLINE_CRLF,  PCRE_NEWLINE_ANYCRLF,  or PCRE_NEWLINE_ANY is
2531         set,  and a match attempt for an unanchored pattern fails when the cur-         set, and a match attempt for an unanchored pattern fails when the  cur-
2532         rent position is at a  CRLF  sequence,  and  the  pattern  contains  no         rent  position  is  at  a  CRLF  sequence,  and the pattern contains no
2533         explicit  matches  for  CR  or  LF  characters,  the  match position is         explicit matches for  CR  or  LF  characters,  the  match  position  is
2534         advanced by two characters instead of one, in other words, to after the         advanced by two characters instead of one, in other words, to after the
2535         CRLF.         CRLF.
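The advance-after-failure rule above can be modelled in a few lines of C. The helper name bumpalong() and its boolean parameters are hypothetical and are not part of the PCRE API. The sketch also assumes one byte per character, so it ignores multi-byte UTF-8 characters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the documented rule: the position from which an
   unanchored match is retried after the attempt at `pos` has failed.
   Assumes single-byte characters (no UTF-8 handling). */
static size_t bumpalong(const char *subject, size_t length, size_t pos,
                        bool crlf_is_newline, bool pattern_has_explicit_crlf)
{
    /* Skip a whole CRLF only when CRLF is a valid newline, the pattern
       contains no explicit CR or LF match, and a CRLF starts here. */
    if (crlf_is_newline && !pattern_has_explicit_crlf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;

    /* Otherwise advance by a single character. */
    return pos + 1;
}
```

For the subject "\r\nA" with CRLF as a newline, a pattern such as .+A resumes at offset 2, past the CRLF. A pattern such as [\r\n]A resumes at offset 1.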
2536    
2537         The above rule is a compromise that makes the most common cases work as         The above rule is a compromise that makes the most common cases work as
2538         expected. For example, if the  pattern  is  .+A  (and  the  PCRE_DOTALL         expected.  For  example,  if  the  pattern  is .+A (and the PCRE_DOTALL
2539         option is not set), it does not match the string "\r\nA" because, after         option is not set), it does not match the string "\r\nA" because, after
2540         failing at the start, it skips both the CR and the LF before  retrying.         failing  at the start, it skips both the CR and the LF before retrying.
2541         However,  the  pattern  [\r\n]A does match that string, because it con-         However, the pattern [\r\n]A does match that string,  because  it  con-
2542         tains an explicit CR or LF reference, and so advances only by one char-         tains an explicit CR or LF reference, and so advances only by one char-
2543         acter after the first failure.         acter after the first failure.
2544    
2545         An explicit match for CR or LF is either a literal appearance of one of         An explicit match for CR or LF is either a literal appearance of one of
2546         those characters, or one of the \r or  \n  escape  sequences.  Implicit         those  characters,  or  one  of the \r or \n escape sequences. Implicit
2547         matches  such  as [^X] do not count, nor does \s (which includes CR and         matches such as [^X] do not count, nor does \s (which includes  CR  and
2548         LF in the characters that it matches).         LF in the characters that it matches).
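Below is a rough sketch of what counts as an "explicit" CR or LF under the definition above. It scans the pattern text for a literal CR or LF byte, or for a \r or \n escape. PCRE records this information internally while compiling; has_explicit_crlf() is a hypothetical stand-in. It deliberately ignores other notations, such as \x0d, octal escapes, \Q...\E quoting and extended-mode comments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified scan of a pattern string. Returns true if the
   pattern contains a literal CR/LF byte or a \r or \n escape. Escaped
   characters are skipped so that "\\n" (an escaped backslash followed by
   'n') is not mistaken for \n. Implicit matches such as [^X] or \s are
   not counted, matching the documented definition. */
static bool has_explicit_crlf(const char *pattern)
{
    for (size_t i = 0; pattern[i] != '\0'; i++) {
        char c = pattern[i];
        if (c == '\r' || c == '\n')
            return true;                /* literal CR or LF in the pattern */
        if (c == '\\') {
            char next = pattern[i + 1];
            if (next == 'r' || next == 'n')
                return true;            /* \r or \n escape */
            if (next == '\0')
                break;                  /* trailing backslash: stop */
            i++;                        /* skip the escaped character */
        }
    }
    return false;
}
```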
2549    
2550         Notwithstanding the above, anomalous effects may still occur when  CRLF         Notwithstanding  the above, anomalous effects may still occur when CRLF
2551         is a valid newline sequence and explicit \r or \n escapes appear in the         is a valid newline sequence and explicit \r or \n escapes appear in the
2552         pattern.         pattern.
2553    
2554           PCRE_NOTBOL           PCRE_NOTBOL
2555    
2556         This option specifies that the first character of the subject string is not         This option specifies that the first character of the subject string is not
2557         the  beginning  of  a  line, so the circumflex metacharacter should not         the beginning of a line, so the  circumflex  metacharacter  should  not
2558         match before it. Setting this without PCRE_MULTILINE (at compile  time)         match  before it. Setting this without PCRE_MULTILINE (at compile time)
2559         causes  circumflex  never to match. This option affects only the behav-         causes circumflex never to match. This option affects only  the  behav-
2560         iour of the circumflex metacharacter. It does not affect \A.         iour of the circumflex metacharacter. It does not affect \A.
2561    
2562           PCRE_NOTEOL           PCRE_NOTEOL
2563    
2564         This option specifies that the end of the subject string is not the end         This option specifies that the end of the subject string is not the end
2565         of  a line, so the dollar metacharacter should not match it nor (except         of a line, so the dollar metacharacter should not match it nor  (except
2566         in multiline mode) a newline immediately before it. Setting this  with-         in  multiline mode) a newline immediately before it. Setting this with-
2567         out PCRE_MULTILINE (at compile time) causes dollar never to match. This         out PCRE_MULTILINE (at compile time) causes dollar never to match. This
2568         option affects only the behaviour of the dollar metacharacter. It  does         option  affects only the behaviour of the dollar metacharacter. It does
2569         not affect \Z or \z.         not affect \Z or \z.
2570    
2571           PCRE_NOTEMPTY           PCRE_NOTEMPTY
2572    
2573         An empty string is not considered to be a valid match if this option is         An empty string is not considered to be a valid match if this option is
2574         set. If there are alternatives in the pattern, they are tried.  If  all         set.  If  there are alternatives in the pattern, they are tried. If all
2575         the  alternatives  match  the empty string, the entire match fails. For         the alternatives match the empty string, the entire  match  fails.  For
2576         example, if the pattern         example, if the pattern
2577    
2578           a?b?           a?b?
2579    
2580         is applied to a string not beginning with "a" or  "b",  it  matches  an         is  applied  to  a  string not beginning with "a" or "b", it matches an
2581         empty  string at the start of the subject. With PCRE_NOTEMPTY set, this         empty string at the start of the subject. With PCRE_NOTEMPTY set,  this
2582         match is not valid, so PCRE searches further into the string for occur-         match is not valid, so PCRE searches further into the string for occur-
2583         rences of "a" or "b".         rences of "a" or "b".
2584    
2585           PCRE_NOTEMPTY_ATSTART           PCRE_NOTEMPTY_ATSTART
2586    
2587         This  is  like PCRE_NOTEMPTY, except that an empty string match that is         This is like PCRE_NOTEMPTY, except that an empty string match  that  is
2588         not at the start of  the  subject  is  permitted.  If  the  pattern  is         not  at  the  start  of  the  subject  is  permitted. If the pattern is
2589         anchored, such a match can occur only if the pattern contains \K.         anchored, such a match can occur only if the pattern contains \K.
2590    
2591         Perl     has    no    direct    equivalent    of    PCRE_NOTEMPTY    or         Perl    has    no    direct    equivalent    of    PCRE_NOTEMPTY     or
2592         PCRE_NOTEMPTY_ATSTART, but it does make a special  case  of  a  pattern         PCRE_NOTEMPTY_ATSTART,  but  it  does  make a special case of a pattern
2593         match  of  the empty string within its split() function, and when using         match of the empty string within its split() function, and  when  using
2594         the /g modifier. It is  possible  to  emulate  Perl's  behaviour  after         the  /g  modifier.  It  is  possible  to emulate Perl's behaviour after
2595         matching a null string by first trying the match again at the same off-         matching a null string by first trying the match again at the same off-
2596         set with PCRE_NOTEMPTY_ATSTART and  PCRE_ANCHORED,  and  then  if  that         set  with  PCRE_NOTEMPTY_ATSTART  and  PCRE_ANCHORED,  and then if that
2597         fails, by advancing the starting offset (see below) and trying an ordi-         fails, by advancing the starting offset (see below) and trying an ordi-
2598         nary match again. There is some code that demonstrates how to  do  this         nary  match  again. There is some code that demonstrates how to do this
2599         in  the  pcredemo sample program. In the most general case, you have to         in the pcredemo sample program. In the most general case, you  have  to
2600         check to see if the newline convention recognizes CRLF  as  a  newline,         check  to  see  if the newline convention recognizes CRLF as a newline,
2601         and  if so, and the current character is CR followed by LF, advance the         and if so, and the current character is CR followed by LF, advance  the
2602         starting offset by two characters instead of one.         starting offset by two characters instead of one.
2603    
2604           PCRE_NO_START_OPTIMIZE           PCRE_NO_START_OPTIMIZE
2605    
2606         There are a number of optimizations that pcre_exec() uses at the  start         There  are a number of optimizations that pcre_exec() uses at the start
2607         of  a  match,  in  order to speed up the process. For example, if it is         of a match, in order to speed up the process. For  example,  if  it  is
2608         known that an unanchored match must start with a specific character, it         known that an unanchored match must start with a specific character, it
2609         searches  the  subject  for that character, and fails immediately if it         searches the subject for that character, and fails  immediately  if  it
2610         cannot find it, without actually running the  main  matching  function.         cannot  find  it,  without actually running the main matching function.
2611         This means that a special item such as (*COMMIT) at the start of a pat-         This means that a special item such as (*COMMIT) at the start of a pat-
2612         tern is not considered until after a suitable starting  point  for  the         tern  is  not  considered until after a suitable starting point for the
2613         match  has been found. When callouts or (*MARK) items are in use, these         match has been found. When callouts or (*MARK) items are in use,  these
2614         "start-up" optimizations can cause them to be skipped if the pattern is         "start-up" optimizations can cause them to be skipped if the pattern is
2615         never  actually  used.  The start-up optimizations are in effect a pre-         never actually used. The start-up optimizations are in  effect  a  pre-
2616         scan of the subject that takes place before the pattern is run.         scan of the subject that takes place before the pattern is run.
2617    
2618         The PCRE_NO_START_OPTIMIZE option disables the start-up  optimizations,         The  PCRE_NO_START_OPTIMIZE option disables the start-up optimizations,
2619         possibly  causing  performance  to  suffer,  but ensuring that in cases         possibly causing performance to suffer,  but  ensuring  that  in  cases
2620         where the result is "no match", the callouts do occur, and  that  items         where  the  result is "no match", the callouts do occur, and that items
2621         such as (*COMMIT) and (*MARK) are considered at every possible starting         such as (*COMMIT) and (*MARK) are considered at every possible starting
2622         position in the subject string. If  PCRE_NO_START_OPTIMIZE  is  set  at         position  in  the  subject  string. If PCRE_NO_START_OPTIMIZE is set at
2623         compile  time,  it  cannot  be  unset  at  matching  time.  The  use of         compile time,  it  cannot  be  unset  at  matching  time.  The  use  of
2624         PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set, matching         PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set, matching
2625         is always done interpretively.         is always done interpretively.
2626    
2627         Setting  PCRE_NO_START_OPTIMIZE  can  change  the outcome of a matching         Setting PCRE_NO_START_OPTIMIZE can change the  outcome  of  a  matching
2628         operation.  Consider the pattern         operation.  Consider the pattern
2629    
2630           (*COMMIT)ABC           (*COMMIT)ABC
2631    
2632         When this is compiled, PCRE records the fact that a  match  must  start         When  this  is  compiled, PCRE records the fact that a match must start
2633         with  the  character  "A".  Suppose the subject string is "DEFABC". The         with the character "A". Suppose the subject  string  is  "DEFABC".  The
2634         start-up optimization scans along the subject, finds "A" and  runs  the         start-up  optimization  scans along the subject, finds "A" and runs the
2635         first  match attempt from there. The (*COMMIT) item means that the pat-         first match attempt from there. The (*COMMIT) item means that the  pat-
2636         tern must match the current starting position, which in this  case,  it         tern  must  match the current starting position, which in this case, it
2637         does.  However,  if  the  same match is run with PCRE_NO_START_OPTIMIZE         does. However, if the same match  is  run  with  PCRE_NO_START_OPTIMIZE
2638         set, the initial scan along the subject string  does  not  happen.  The         set,  the  initial  scan  along the subject string does not happen. The
2639         first  match  attempt  is  run  starting  from "D" and when this fails,         first match attempt is run starting  from  "D"  and  when  this  fails,
2640         (*COMMIT) prevents any further matches  being  tried,  so  the  overall         (*COMMIT)  prevents  any  further  matches  being tried, so the overall
2641         result  is  "no  match". If the pattern is studied, more start-up opti-         result is "no match". If the pattern is studied,  more  start-up  opti-
2642         mizations may be used. For example, a minimum length  for  the  subject         mizations  may  be  used. For example, a minimum length for the subject
2643         may be recorded. Consider the pattern         may be recorded. Consider the pattern
2644    
2645           (*MARK:A)(X|Y)           (*MARK:A)(X|Y)
2646    
2647         The  minimum  length  for  a  match is one character. If the subject is         The minimum length for a match is one  character.  If  the  subject  is
2648         "ABC", there will be attempts to  match  "ABC",  "BC",  "C",  and  then         "ABC",  there  will  be  attempts  to  match "ABC", "BC", "C", and then
2649         finally  an empty string.  If the pattern is studied, the final attempt         finally an empty string.  If the pattern is studied, the final  attempt
2650         does not take place, because PCRE knows that the subject is too  short,         does  not take place, because PCRE knows that the subject is too short,
2651         and  so  the  (*MARK) is never encountered.  In this case, studying the         and so the (*MARK) is never encountered.  In this  case,  studying  the
2652         pattern does not affect the overall match result, which  is  still  "no         pattern  does  not  affect the overall match result, which is still "no
2653         match", but it does affect the auxiliary information that is returned.         match", but it does affect the auxiliary information that is returned.
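The (*COMMIT)ABC example can be reproduced with a toy model. This is not how PCRE is implemented. The hypothetical function commit_abc() allows a single match attempt, because (*COMMIT) forbids retrying at later positions. The start_optimize flag decides whether that attempt begins at the first "A" or at offset 0.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of matching (*COMMIT)ABC against `subject`.
   Returns the offset of the match, or -1 for "no match". */
static int commit_abc(const char *subject, bool start_optimize)
{
    static const char literal[] = "ABC";
    size_t len = strlen(subject);
    size_t start = 0;

    if (start_optimize) {
        /* Start-up optimization: any match must begin with 'A', so scan
           for it first and fail at once if it is absent. */
        const char *a = strchr(subject, 'A');
        if (a == NULL)
            return -1;
        start = (size_t)(a - subject);
    }

    /* (*COMMIT) is passed immediately, so if "ABC" does not follow this
       starting point, no later starting point is ever tried. */
    if (len - start >= 3 && memcmp(subject + start, literal, 3) == 0)
        return (int)start;
    return -1;
}
```

Applied to "DEFABC", the model finds a match at offset 3 when the start-up scan is enabled. With the scan disabled, it reports no match.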
2654    
2655           PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
2656    
2657         When PCRE_UTF8 is set at compile time, the validity of the subject as a         When PCRE_UTF8 is set at compile time, the validity of the subject as a
2658         UTF-8 string is automatically checked when pcre_exec() is  subsequently         UTF-8  string is automatically checked when pcre_exec() is subsequently
2659         called.   The  value  of  startoffset is also checked to ensure that it         called.  The value of startoffset is also checked  to  ensure  that  it
2660         points to the start of a UTF-8 character. There is a  discussion  about         points  to  the start of a UTF-8 character. There is a discussion about
2661         the  validity  of  UTF-8 strings in the pcreunicode page. If an invalid         the validity of UTF-8 strings in the pcreunicode page.  If  an  invalid
2662         sequence  of  bytes   is   found,   pcre_exec()   returns   the   error         sequence   of   bytes   is   found,   pcre_exec()   returns  the  error
2663         PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a         PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a
2664         truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In         truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In
2665         both  cases, information about the precise nature of the error may also         both cases, information about the precise nature of the error may  also
2666         be returned (see the descriptions of these errors in the section  enti-         be  returned (see the descriptions of these errors in the section enti-
2667         tled  Error return values from pcre_exec() below).  If startoffset con-         tled Error return values from pcre_exec() below).  If startoffset  con-
2668         tains a value that does not point to the start of a UTF-8 character (or         tains a value that does not point to the start of a UTF-8 character (or
2669         to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is returned.         to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is returned.
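The startoffset check described above reduces to a range test plus a look at one byte. In UTF-8, every continuation byte has the form 10xxxxxx. The sketch below is a hypothetical helper, valid_utf8_start_offset(), not PCRE's own code. It assumes the subject has already been validated as UTF-8, and it folds the two distinct PCRE error codes into a single false result.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check of a byte offset into an already-valid UTF-8 subject.
   Out-of-range offsets correspond to PCRE_ERROR_BADOFFSET; an offset that
   lands on a continuation byte corresponds to PCRE_ERROR_BADUTF8_OFFSET.
   Both cases simply return false here. */
static bool valid_utf8_start_offset(const unsigned char *subject,
                                    int length, int startoffset)
{
    if (startoffset < 0 || startoffset > length)
        return false;                        /* outside the subject */
    if (startoffset == length)
        return true;                         /* the end is permitted */
    return (subject[startoffset] & 0xC0) != 0x80; /* not 10xxxxxx */
}
```

For the five-byte subject "caf\xc3\xa9" ("café"), offsets 3 and 5 are accepted and offset 4 is rejected, because it points into the middle of the two-byte "é".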
2670    
2671         If  you  already  know that your subject is valid, and you want to skip         If you already know that your subject is valid, and you  want  to  skip
2672         these   checks   for   performance   reasons,   you   can    set    the         these    checks    for   performance   reasons,   you   can   set   the
2673         PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to         PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might  want  to
2674         do this for the second and subsequent calls to pcre_exec() if  you  are         do  this  for the second and subsequent calls to pcre_exec() if you are
2675         making  repeated  calls  to  find  all  the matches in a single subject         making repeated calls to find all  the  matches  in  a  single  subject
2676         string. However, you should be  sure  that  the  value  of  startoffset         string.  However,  you  should  be  sure  that the value of startoffset
2677         points  to  the  start of a character (or the end of the subject). When         points to the start of a character (or the end of  the  subject).  When
2678         PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a         PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a
2679         subject  or  an invalid value of startoffset is undefined. Your program         subject or an invalid value of startoffset is undefined.  Your  program
2680         may crash.         may crash.
2681    
2682           PCRE_PARTIAL_HARD           PCRE_PARTIAL_HARD
2683           PCRE_PARTIAL_SOFT           PCRE_PARTIAL_SOFT
2684    
2685         These options turn on the partial matching feature. For backwards  com-         These  options turn on the partial matching feature. For backwards com-
2686         patibility,  PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial         patibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A  partial
2687         match occurs if the end of the subject string is reached  successfully,         match  occurs if the end of the subject string is reached successfully,
2688         but  there  are not enough subject characters to complete the match. If         but there are not enough subject characters to complete the  match.  If
2689         this happens when PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set,         this happens when PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set,
2690         matching  continues  by  testing any remaining alternatives. Only if no         matching continues by testing any remaining alternatives.  Only  if  no
2691         complete match can be found is PCRE_ERROR_PARTIAL returned  instead  of         complete  match  can be found is PCRE_ERROR_PARTIAL returned instead of
2692         PCRE_ERROR_NOMATCH.  In  other  words,  PCRE_PARTIAL_SOFT says that the         PCRE_ERROR_NOMATCH. In other words,  PCRE_PARTIAL_SOFT  says  that  the
2693         caller is prepared to handle a partial match, but only if  no  complete         caller  is  prepared to handle a partial match, but only if no complete
2694         match can be found.         match can be found.
2695    
2696         If  PCRE_PARTIAL_HARD  is  set, it overrides PCRE_PARTIAL_SOFT. In this         If PCRE_PARTIAL_HARD is set, it overrides  PCRE_PARTIAL_SOFT.  In  this
2697         case, if a partial match  is  found,  pcre_exec()  immediately  returns         case,  if  a  partial  match  is found, pcre_exec() immediately returns
2698         PCRE_ERROR_PARTIAL,  without  considering  any  other  alternatives. In         PCRE_ERROR_PARTIAL, without  considering  any  other  alternatives.  In
2699         other words, when PCRE_PARTIAL_HARD is set, a partial match is  consid-         other  words, when PCRE_PARTIAL_HARD is set, a partial match is consid-
2700         ered to be more important than an alternative complete match.         ered to be more important than an alternative complete match.
2701    
2702         In  both  cases,  the portion of the string that was inspected when the         In both cases, the portion of the string that was  inspected  when  the
2703         partial match was found is set as the first matching string. There is a         partial match was found is set as the first matching string. There is a
2704         more  detailed  discussion  of partial and multi-segment matching, with         more detailed discussion of partial and  multi-segment  matching,  with
2705         examples, in the pcrepartial documentation.         examples, in the pcrepartial documentation.
2706    
2707     The string to be matched by pcre_exec()     The string to be matched by pcre_exec()
2708    
2709         The subject string is passed to pcre_exec() as a pointer in subject,  a         The  subject string is passed to pcre_exec() as a pointer in subject, a
2710         length  in  bytes in length, and a starting byte offset in startoffset.         length in bytes in length, and a starting byte offset  in  startoffset.
2711         If this is  negative  or  greater  than  the  length  of  the  subject,         If  this  is  negative  or  greater  than  the  length  of the subject,
2712         pcre_exec()  returns  PCRE_ERROR_BADOFFSET. When the starting offset is         pcre_exec() returns PCRE_ERROR_BADOFFSET. When the starting  offset  is
2713         zero, the search for a match starts at the beginning  of  the  subject,         zero,  the  search  for a match starts at the beginning of the subject,
2714         and this is by far the most common case. In UTF-8 mode, the byte offset         and this is by far the most common case. In UTF-8 mode, the byte offset
2715         must point to the start of a UTF-8 character (or the end  of  the  sub-         must  point  to  the start of a UTF-8 character (or the end of the sub-
2716         ject).  Unlike  the pattern string, the subject may contain binary zero         ject). Unlike the pattern string, the subject may contain  binary  zero
2717         bytes.         bytes.
2718    
2719         A non-zero starting offset is useful when searching for  another  match         A  non-zero  starting offset is useful when searching for another match
2720         in  the same subject by calling pcre_exec() again after a previous suc-         in the same subject by calling pcre_exec() again after a previous  suc-
2721         cess.  Setting startoffset differs from just passing over  a  shortened         cess.   Setting  startoffset differs from just passing over a shortened
2722         string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins         string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins
2723         with any kind of lookbehind. For example, consider the pattern         with any kind of lookbehind. For example, consider the pattern
2724    
2725           \Biss\B           \Biss\B
2726    
2727         which finds occurrences of "iss" in the middle of  words.  (\B  matches         which  finds  occurrences  of "iss" in the middle of words. (\B matches
2728         only  if  the  current position in the subject is not a word boundary.)         only if the current position in the subject is not  a  word  boundary.)
2729         When applied to the string "Mississippi" the first call  to  pcre_exec()         When  applied  to the string "Mississippi" the first call to pcre_exec()
2730         finds  the  first  occurrence. If pcre_exec() is called again with just         finds the first occurrence. If pcre_exec() is called  again  with  just
2731         the remainder of the subject,  namely  "issippi",  it  does  not  match,         the  remainder  of  the  subject,  namely  "issippi", it does not match,
2732         because \B is always false at the start of the subject, which is deemed         because \B is always false at the start of the subject, which is deemed
2733         to be a word boundary. However, if pcre_exec()  is  passed  the  entire         to  be  a  word  boundary. However, if pcre_exec() is passed the entire
2734         string again, but with startoffset set to 4, it finds the second occur-         string again, but with startoffset set to 4, it finds the second occur-
2735         rence of "iss" because it is able to look behind the starting point  to         rence  of "iss" because it is able to look behind the starting point to
2736         discover that it is preceded by a letter.         discover that it is preceded by a letter.
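A word boundary depends on the characters on both sides of a position, which is why startoffset and a shortened subject behave differently here. The hypothetical helper word_boundary() below is a simplified ASCII-only model and is not PCRE's implementation. It treats letters, digits and underscore as word characters, and it treats positions outside the subject as non-word.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, ASCII-only notion of a word character. */
static bool is_word(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* \b holds at `pos` when exactly one neighbour is a word character; \B is
   its negation. Characters before offset 0 or at `length` are not visible,
   so they count as non-word characters. */
static bool word_boundary(const char *subject, size_t length, size_t pos)
{
    bool before = pos > 0 && is_word((unsigned char)subject[pos - 1]);
    bool after = pos < length && is_word((unsigned char)subject[pos]);
    return before != after;
}
```

With the whole subject, offset 4 lies between "s" and "i", so there is no boundary and \B can match there. With only the remainder passed, position 0 has no visible left neighbour, so it is a boundary and \B fails.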
2737    
2738         Finding  all  the  matches  in a subject is tricky when the pattern can         Finding all the matches in a subject is tricky  when  the  pattern  can
2739         match an empty string. It is possible to emulate Perl's /g behaviour by         match an empty string. It is possible to emulate Perl's /g behaviour by
2740         first   trying   the   match   again  at  the  same  offset,  with  the         first  trying  the  match  again  at  the   same   offset,   with   the
2741         PCRE_NOTEMPTY_ATSTART and  PCRE_ANCHORED  options,  and  then  if  that         PCRE_NOTEMPTY_ATSTART  and  PCRE_ANCHORED  options,  and  then  if that
2742         fails,  advancing  the  starting  offset  and  trying an ordinary match         fails, advancing the starting  offset  and  trying  an  ordinary  match
2743         again. There is some code that demonstrates how to do this in the pcre-         again. There is some code that demonstrates how to do this in the pcre-
2744         demo sample program. In the most general case, you have to check to see         demo sample program. In the most general case, you have to check to see
2745         if the newline convention recognizes CRLF as a newline, and if so,  and         if  the newline convention recognizes CRLF as a newline, and if so, and
2746         the current character is CR followed by LF, advance the starting offset         the current character is CR followed by LF, advance the starting offset
2747         by two characters instead of one.         by two characters instead of one.
2748    
2749         If a non-zero starting offset is passed when the pattern  is  anchored,         If  a  non-zero starting offset is passed when the pattern is anchored,
2750         one attempt to match at the given offset is made. This can only succeed         one attempt to match at the given offset is made. This can only succeed
2751         if the pattern does not require the match to be at  the  start  of  the         if  the  pattern  does  not require the match to be at the start of the
2752         subject.         subject.
2753    
2754     How pcre_exec() returns captured substrings     How pcre_exec() returns captured substrings
2755    
2756         In  general, a pattern matches a certain portion of the subject, and in         In general, a pattern matches a certain portion of the subject, and  in
2757         addition, further substrings from the subject  may  be  picked  out  by         addition,  further  substrings  from  the  subject may be picked out by
2758         parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,         parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
2759         this is called "capturing" in what follows, and the  phrase  "capturing         this  is  called "capturing" in what follows, and the phrase "capturing
2760         subpattern"  is  used for a fragment of a pattern that picks out a sub-         subpattern" is used for a fragment of a pattern that picks out  a  sub-
2761         string. PCRE supports several other kinds of  parenthesized  subpattern         string.  PCRE  supports several other kinds of parenthesized subpattern
2762         that do not cause substrings to be captured.         that do not cause substrings to be captured.
2763    
2764         Captured substrings are returned to the caller via a vector of integers         Captured substrings are returned to the caller via a vector of integers
2765         whose address is passed in ovector. The number of elements in the  vec-         whose  address is passed in ovector. The number of elements in the vec-
2766         tor  is  passed in ovecsize, which must be a non-negative number. Note:         tor is passed in ovecsize, which must be a non-negative  number.  Note:
2767         this argument is NOT the size of ovector in bytes.         this argument is NOT the size of ovector in bytes.
2768    
2769         The first two-thirds of the vector is used to pass back  captured  sub-         The  first  two-thirds of the vector is used to pass back captured sub-
2770         strings,  each  substring using a pair of integers. The remaining third         strings, each substring using a pair of integers. The  remaining  third
2771         of the vector is used as workspace by pcre_exec() while  matching  cap-         of  the  vector is used as workspace by pcre_exec() while matching cap-
2772         turing  subpatterns, and is not available for passing back information.         turing subpatterns, and is not available for passing back  information.
2773         The number passed in ovecsize should always be a multiple of three.  If         The  number passed in ovecsize should always be a multiple of three. If
2774         it is not, it is rounded down.         it is not, it is rounded down.
2775    
2776         When  a  match  is successful, information about captured substrings is         When a match is successful, information about  captured  substrings  is
2777         returned in pairs of integers, starting at the  beginning  of  ovector,         returned  in  pairs  of integers, starting at the beginning of ovector,
2778         and  continuing  up  to two-thirds of its length at the most. The first         and continuing up to two-thirds of its length at the  most.  The  first
2779         element of each pair is set to the byte offset of the  first  character         element  of  each pair is set to the byte offset of the first character
2780         in  a  substring, and the second is set to the byte offset of the first         in a substring, and the second is set to the byte offset of  the  first
2781         character after the end of a substring. Note: these values  are  always         character  after  the end of a substring. Note: these values are always
2782         byte offsets, even in UTF-8 mode. They are not character counts.         byte offsets, even in UTF-8 mode. They are not character counts.
2783    
2784         The  first  pair  of  integers, ovector[0] and ovector[1], identify the         The first pair of integers, ovector[0]  and  ovector[1],  identify  the
2785         portion of the subject string matched by the entire pattern.  The  next         portion  of  the subject string matched by the entire pattern. The next
2786         pair  is  used for the first capturing subpattern, and so on. The value         pair is used for the first capturing subpattern, and so on.  The  value
2787         returned by pcre_exec() is one more than the highest numbered pair that         returned by pcre_exec() is one more than the highest numbered pair that
2788         has  been  set.  For example, if two substrings have been captured, the         has been set.  For example, if two substrings have been  captured,  the
2789         returned value is 3. If there are no capturing subpatterns, the  return         returned  value is 3. If there are no capturing subpatterns, the return
2790         value from a successful match is 1, indicating that just the first pair         value from a successful match is 1, indicating that just the first pair
2791         of offsets has been set.         of offsets has been set.
2792    
2793         If a capturing subpattern is matched repeatedly, it is the last portion         If a capturing subpattern is matched repeatedly, it is the last portion
2794         of the string that it matched that is returned.         of the string that it matched that is returned.
2795    
2796         If  the vector is too small to hold all the captured substring offsets,         If the vector is too small to hold all the captured substring  offsets,
2797         it is used as far as possible (up to two-thirds of its length), and the         it is used as far as possible (up to two-thirds of its length), and the
2798         function  returns a value of zero. If neither the actual string matched         function returns a value of zero. If neither the actual string  matched
2799         not any captured substrings are of interest, pcre_exec() may be  called         nor  any captured substrings are of interest, pcre_exec() may be called
2800         with  ovector passed as NULL and ovecsize as zero. However, if the pat-         with ovector passed as NULL and ovecsize as zero. However, if the  pat-
2801         tern contains back references and the ovector  is  not  big  enough  to         tern  contains  back  references  and  the ovector is not big enough to
2802         remember  the related substrings, PCRE has to get additional memory for         remember the related substrings, PCRE has to get additional memory  for
2803         use during matching. Thus it is usually advisable to supply an  ovector         use  during matching. Thus it is usually advisable to supply an ovector
2804         of reasonable size.         of reasonable size.
2805    
2806         There  are  some  cases where zero is returned (indicating vector over-         There are some cases where zero is returned  (indicating  vector  over-
2807         flow) when in fact the vector is exactly the right size for  the  final         flow)  when  in fact the vector is exactly the right size for the final
2808         match. For example, consider the pattern         match. For example, consider the pattern
2809    
2810           (a)(?:(b)c|bd)           (a)(?:(b)c|bd)
2811    
2812         If  a  vector of 6 elements (allowing for only 1 captured substring) is         If a vector of 6 elements (allowing for only 1 captured  substring)  is
2813         given with subject string "abd", pcre_exec() will try to set the second         given with subject string "abd", pcre_exec() will try to set the second
2814         captured string, thereby recording a vector overflow, before failing to         captured string, thereby recording a vector overflow, before failing to
2815         match "c" and backing up  to  try  the  second  alternative.  The  zero         match  "c"  and  backing  up  to  try  the second alternative. The zero
2816         return,  however,  does  correctly  indicate that the maximum number of         return, however, does correctly indicate that  the  maximum  number  of
2817         slots (namely 2) have been filled. In similar cases where there is tem-         slots (namely 2) have been filled. In similar cases where there is tem-
2818         porary  overflow,  but  the final number of used slots is actually less         porary overflow, but the final number of used slots  is  actually  less
2819         than the maximum, a non-zero value is returned.         than the maximum, a non-zero value is returned.
2820    
2821         The pcre_fullinfo() function can be used to find out how many capturing         The pcre_fullinfo() function can be used to find out how many capturing
2822         subpatterns  there  are  in  a  compiled pattern. The smallest size for         subpatterns there are in a compiled  pattern.  The  smallest  size  for
2823         ovector that will allow for n captured substrings, in addition  to  the         ovector  that  will allow for n captured substrings, in addition to the
2824         offsets of the substring matched by the whole pattern, is (n+1)*3.         offsets of the substring matched by the whole pattern, is (n+1)*3.
2825    
2826         It  is  possible for capturing subpattern number n+1 to match some part         It is possible for capturing subpattern number n+1 to match  some  part
2827         of the subject when subpattern n has not been used at all. For example,         of the subject when subpattern n has not been used at all. For example,
2828         if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the         if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the
2829         return from the function is 4, and subpatterns 1 and 3 are matched, but         return from the function is 4, and subpatterns 1 and 3 are matched, but
2830         2  is  not.  When  this happens, both values in the offset pairs corre-         2 is not. When this happens, both values in  the  offset  pairs  corre-
2831         sponding to unused subpatterns are set to -1.         sponding to unused subpatterns are set to -1.
2832    
2833         Offset values that correspond to unused subpatterns at the end  of  the         Offset  values  that correspond to unused subpatterns at the end of the
2834         expression  are  also  set  to  -1. For example, if the string "abc" is         expression are also set to -1. For example,  if  the  string  "abc"  is
2835         matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not         matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not
2836         matched.  The  return  from the function is 2, because the highest used         matched. The return from the function is 2, because  the  highest  used
2837         capturing subpattern number is 1, and the offsets for the second         capturing  subpattern  number  is 1, and the offsets for the second
2838         and  third  capturing subpatterns (assuming the vector is large enough,         and third capturing subpatterns (assuming the vector is  large  enough,
2839         of course) are set to -1.         of course) are set to -1.
2840    
2841         Note: Elements in the first two-thirds of ovector that  do  not  corre-         Note:  Elements  in  the first two-thirds of ovector that do not corre-
2842         spond  to  capturing parentheses in the pattern are never changed. That         spond to capturing parentheses in the pattern are never  changed.  That
2843         is, if a pattern contains n capturing parentheses, no more  than  ovec-         is,  if  a pattern contains n capturing parentheses, no more than ovec-
2844         tor[0]  to ovector[2n+1] are set by pcre_exec(). The other elements (in         tor[0] to ovector[2n+1] are set by pcre_exec(). The other elements  (in
2845         the first two-thirds) retain whatever values they previously had.         the first two-thirds) retain whatever values they previously had.
2846    
2847         Some convenience functions are provided  for  extracting  the  captured         Some  convenience  functions  are  provided for extracting the captured
2848         substrings as separate strings. These are described below.         substrings as separate strings. These are described below.
2849    
2850     Error return values from pcre_exec()     Error return values from pcre_exec()
2851    
2852         If  pcre_exec()  fails, it returns a negative number. The following are         If pcre_exec() fails, it returns a negative number. The  following  are
2853         defined in the header file:         defined in the header file:
2854    
2855           PCRE_ERROR_NOMATCH        (-1)           PCRE_ERROR_NOMATCH        (-1)
# Line 2851  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2858  MATCHING A PATTERN: THE TRADITIONAL FUNC
2858    
2859           PCRE_ERROR_NULL           (-2)           PCRE_ERROR_NULL           (-2)
2860    
2861         Either code or subject was passed as NULL,  or  ovector  was  NULL  and         Either  code  or  subject  was  passed as NULL, or ovector was NULL and
2862         ovecsize was not zero.         ovecsize was not zero.
2863    
2864           PCRE_ERROR_BADOPTION      (-3)           PCRE_ERROR_BADOPTION      (-3)
# Line 2860  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2867  MATCHING A PATTERN: THE TRADITIONAL FUNC
2867    
2868           PCRE_ERROR_BADMAGIC       (-4)           PCRE_ERROR_BADMAGIC       (-4)
2869    
2870         PCRE  stores a 4-byte "magic number" at the start of the compiled code,         PCRE stores a 4-byte "magic number" at the start of the compiled  code,
2871         to catch the case when it is passed a junk pointer and to detect when a         to catch the case when it is passed a junk pointer and to detect when a
2872         pattern that was compiled in an environment of one endianness is run in         pattern that was compiled in an environment of one endianness is run in
2873         an environment with the other endianness. This is the error  that  PCRE         an  environment  with the other endianness. This is the error that PCRE
2874         gives when the magic number is not present.         gives when the magic number is not present.
2875    
2876           PCRE_ERROR_UNKNOWN_OPCODE (-5)           PCRE_ERROR_UNKNOWN_OPCODE (-5)
2877    
2878         While running the pattern match, an unknown item was encountered in the         While running the pattern match, an unknown item was encountered in the
2879         compiled pattern. This error could be caused by a bug  in  PCRE  or  by         compiled  pattern.  This  error  could be caused by a bug in PCRE or by
2880         overwriting of the compiled pattern.         overwriting of the compiled pattern.
2881    
2882           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2883    
2884         If  a  pattern contains back references, but the ovector that is passed         If a pattern contains back references, but the ovector that  is  passed
2885         to pcre_exec() is not big enough to remember the referenced substrings,         to pcre_exec() is not big enough to remember the referenced substrings,
2886         PCRE  gets  a  block of memory at the start of matching to use for this         PCRE gets a block of memory at the start of matching to  use  for  this
2887         purpose. If the call via pcre_malloc() fails, this error is given.  The         purpose.  If the call via pcre_malloc() fails, this error is given. The
2888         memory is automatically freed at the end of matching.         memory is automatically freed at the end of matching.
2889    
2890         This  error  is also given if pcre_stack_malloc() fails in pcre_exec().         This error is also given if pcre_stack_malloc() fails  in  pcre_exec().
2891         This can happen only when PCRE has been compiled with  --disable-stack-         This  can happen only when PCRE has been compiled with --disable-stack-
2892         for-recursion.         for-recursion.
2893    
2894           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
2895    
2896         This  error is used by the pcre_copy_substring(), pcre_get_substring(),         This error is used by the pcre_copy_substring(),  pcre_get_substring(),
2897         and  pcre_get_substring_list()  functions  (see  below).  It  is  never         and  pcre_get_substring_list()  functions  (see  below).  It  is  never
2898         returned by pcre_exec().         returned by pcre_exec().
2899    
2900           PCRE_ERROR_MATCHLIMIT     (-8)           PCRE_ERROR_MATCHLIMIT     (-8)
2901    
2902         The  backtracking  limit,  as  specified  by the match_limit field in a         The backtracking limit, as specified by  the  match_limit  field  in  a
2903         pcre_extra structure (or defaulted), was reached. See the description         pcre_extra structure (or defaulted), was reached. See the description
2904         above.         above.
2905    
2906           PCRE_ERROR_CALLOUT        (-9)           PCRE_ERROR_CALLOUT        (-9)
2907    
2908         This error is never generated by pcre_exec() itself. It is provided for         This error is never generated by pcre_exec() itself. It is provided for
2909         use by callout functions that want to yield a distinctive  error  code.         use  by  callout functions that want to yield a distinctive error code.
2910         See the pcrecallout documentation for details.         See the pcrecallout documentation for details.
2911    
2912           PCRE_ERROR_BADUTF8        (-10)           PCRE_ERROR_BADUTF8        (-10)
2913    
2914         A  string  that contains an invalid UTF-8 byte sequence was passed as a         A string that contains an invalid UTF-8 byte sequence was passed  as  a
2915         subject, and the PCRE_NO_UTF8_CHECK option was not set. If the size  of         subject,  and the PCRE_NO_UTF8_CHECK option was not set. If the size of
2916         the  output  vector  (ovecsize)  is  at least 2, the byte offset to the         the output vector (ovecsize) is at least 2,  the  byte  offset  to  the
2917         start of the invalid UTF-8 character is placed in the first         start of the invalid UTF-8 character is placed in the first
2918         element, and a reason code is placed in the second element. The reason         element, and a reason code is placed in the second element. The reason
2919         codes are listed in the following section.  For backward compatibility,         codes are listed in the following section.  For backward compatibility,
2920         if  PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8 char-         if PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8  char-
2921         acter  at  the  end  of  the   subject   (reason   codes   1   to   5),         acter   at   the   end   of   the   subject  (reason  codes  1  to  5),
2922         PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8.         PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8.
2923    
2924           PCRE_ERROR_BADUTF8_OFFSET (-11)           PCRE_ERROR_BADUTF8_OFFSET (-11)
2925    
2926         The  UTF-8  byte  sequence that was passed as a subject was checked and         The UTF-8 byte sequence that was passed as a subject  was  checked  and
2927         found to be valid (the PCRE_NO_UTF8_CHECK option was not set), but  the         found  to be valid (the PCRE_NO_UTF8_CHECK option was not set), but the
2928         value  of startoffset did not point to the beginning of a UTF-8 charac-         value of startoffset did not point to the beginning of a UTF-8  charac-
2929         ter or the end of the subject.         ter or the end of the subject.
2930    
2931           PCRE_ERROR_PARTIAL        (-12)           PCRE_ERROR_PARTIAL        (-12)
2932    
2933         The subject string did not match, but it did match partially.  See  the         The  subject  string did not match, but it did match partially. See the
2934         pcrepartial documentation for details of partial matching.         pcrepartial documentation for details of partial matching.
2935    
2936           PCRE_ERROR_BADPARTIAL     (-13)           PCRE_ERROR_BADPARTIAL     (-13)
2937    
2938         This  code  is  no  longer  in  use.  It was formerly returned when the         This code is no longer in  use.  It  was  formerly  returned  when  the
2939         PCRE_PARTIAL option was used with a compiled pattern  containing  items         PCRE_PARTIAL  option  was used with a compiled pattern containing items
2940         that  were  not  supported  for  partial  matching.  From  release 8.00         that were  not  supported  for  partial  matching.  From  release  8.00
2941         onwards, there are no restrictions on partial matching.         onwards, there are no restrictions on partial matching.
2942    
2943           PCRE_ERROR_INTERNAL       (-14)           PCRE_ERROR_INTERNAL       (-14)
2944    
2945         An unexpected internal error has occurred. This error could  be  caused         An  unexpected  internal error has occurred. This error could be caused
2946         by a bug in PCRE or by overwriting of the compiled pattern.         by a bug in PCRE or by overwriting of the compiled pattern.
2947    
2948           PCRE_ERROR_BADCOUNT       (-15)           PCRE_ERROR_BADCOUNT       (-15)
# Line 2945  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2952  MATCHING A PATTERN: THE TRADITIONAL FUNC
2952           PCRE_ERROR_RECURSIONLIMIT (-21)           PCRE_ERROR_RECURSIONLIMIT (-21)
2953    
2954         The internal recursion limit, as specified by the match_limit_recursion         The internal recursion limit, as specified by the match_limit_recursion
2955         field in a pcre_extra structure (or defaulted), was reached. See the         field in a pcre_extra structure (or defaulted), was reached. See the
2956         description above.         description above.
2957    
2958           PCRE_ERROR_BADNEWLINE     (-23)           PCRE_ERROR_BADNEWLINE     (-23)
# Line 2959  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2966  MATCHING A PATTERN: THE TRADITIONAL FUNC
2966    
2967           PCRE_ERROR_SHORTUTF8      (-25)           PCRE_ERROR_SHORTUTF8      (-25)
2968    
2969         This error is returned instead of PCRE_ERROR_BADUTF8 when  the  subject         This  error  is returned instead of PCRE_ERROR_BADUTF8 when the subject
2970         string  ends with a truncated UTF-8 character and the PCRE_PARTIAL_HARD         string ends with a truncated UTF-8 character and the  PCRE_PARTIAL_HARD
2971         option is set.  Information  about  the  failure  is  returned  as  for         option  is  set.   Information  about  the  failure  is returned as for
2972         PCRE_ERROR_BADUTF8.  It  is in fact sufficient to detect this case, but         PCRE_ERROR_BADUTF8. It is in fact sufficient to detect this  case,  but
2973         this special error code for PCRE_PARTIAL_HARD precedes the  implementa-         this  special error code for PCRE_PARTIAL_HARD precedes the implementa-
2974         tion  of returned information; it is retained for backwards compatibil-         tion of returned information; it is retained for backwards  compatibil-
2975         ity.         ity.
2976    
2977           PCRE_ERROR_RECURSELOOP    (-26)           PCRE_ERROR_RECURSELOOP    (-26)
2978    
2979         This error is returned when pcre_exec() detects a recursion loop within         This error is returned when pcre_exec() detects a recursion loop within
2980         the  pattern. Specifically, it means that either the whole pattern or a         the pattern. Specifically, it means that either the whole pattern or  a
2981         subpattern has been called recursively for the second time at the  same         subpattern  has been called recursively for the second time at the same
2982         position in the subject string. Some simple patterns that might do this         position in the subject string. Some simple patterns that might do this
2983         are detected and faulted at compile time, but more  complicated  cases,         are  detected  and faulted at compile time, but more complicated cases,
2984         in particular mutual recursions between two different subpatterns, can-         in particular mutual recursions between two different subpatterns, can-
2985         not be detected until run time.         not be detected until run time.
2986    
2987           PCRE_ERROR_JIT_STACKLIMIT (-27)           PCRE_ERROR_JIT_STACKLIMIT (-27)
2988    
2989         This error is returned when a pattern  that  was  successfully  studied         This  error  is  returned  when a pattern that was successfully studied
2990         using  a  JIT compile option is being matched, but the memory available         using a JIT compile option is being matched, but the  memory  available
2991         for the just-in-time processing stack is  not  large  enough.  See  the         for  the  just-in-time  processing  stack  is not large enough. See the
2992         pcrejit documentation for more details.         pcrejit documentation for more details.
2993    
2994           PCRE_ERROR_BADMODE (-28)           PCRE_ERROR_BADMODE (-28)
# Line 2991  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2998  MATCHING A PATTERN: THE TRADITIONAL FUNC
2998    
2999           PCRE_ERROR_BADENDIANNESS (-29)           PCRE_ERROR_BADENDIANNESS (-29)
3000    
3001         This error is given if  a  pattern  that  was  compiled  and  saved  is         This  error  is  given  if  a  pattern  that  was compiled and saved is
3002         reloaded  on  a  host  with  different endianness. The utility function         reloaded on a host with  different  endianness.  The  utility  function
3003         pcre_pattern_to_host_byte_order() can be used to convert such a pattern         pcre_pattern_to_host_byte_order() can be used to convert such a pattern
3004         so that it runs on the new host.         so that it runs on the new host.
3005    
# Line 3000  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3007  MATCHING A PATTERN: THE TRADITIONAL FUNC
3007    
3008     Reason codes for invalid UTF-8 strings     Reason codes for invalid UTF-8 strings
3009    
3010         This  section  applies  only  to  the  8-bit library. The corresponding         This section applies only  to  the  8-bit  library.  The  corresponding
3011         information for the 16-bit library is given in the pcre16 page.         information for the 16-bit library is given in the pcre16 page.
3012    
3013         When pcre_exec() returns either PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORT-         When pcre_exec() returns either PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORT-
3014         UTF8,  and  the size of the output vector (ovecsize) is at least 2, the         UTF8, and the size of the output vector (ovecsize) is at least  2,  the
3015         offset of the start of the invalid UTF-8 character  is  placed  in  the         offset  of  the  start  of the invalid UTF-8 character is placed in the
3016         first output vector element (ovector[0]) and a reason code is placed in         first output vector element (ovector[0]) and a reason code is placed in
3017         the second element (ovector[1]). The reason codes are  given  names  in         the  second  element  (ovector[1]). The reason codes are given names in
3018         the pcre.h header file:         the pcre.h header file:
3019    
3020           PCRE_UTF8_ERR1           PCRE_UTF8_ERR1
# Line 3016  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3023  MATCHING A PATTERN: THE TRADITIONAL FUNC
3023           PCRE_UTF8_ERR4           PCRE_UTF8_ERR4
3024           PCRE_UTF8_ERR5           PCRE_UTF8_ERR5
3025    
3026         The  string  ends  with a truncated UTF-8 character; the code specifies         The string ends with a truncated UTF-8 character;  the  code  specifies
3027         how many bytes are missing (1 to 5). Although RFC 3629 restricts  UTF-8         how  many bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8
3028         characters  to  be  no longer than 4 bytes, the encoding scheme (origi-         characters to be no longer than 4 bytes, the  encoding  scheme  (origi-
3029         nally defined by RFC 2279) allows for  up  to  6  bytes,  and  this  is         nally  defined  by  RFC  2279)  allows  for  up to 6 bytes, and this is
3030         checked first; hence the possibility of 4 or 5 missing bytes.         checked first; hence the possibility of 4 or 5 missing bytes.
3031    
3032           PCRE_UTF8_ERR6           PCRE_UTF8_ERR6
# Line 3029  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3036  MATCHING A PATTERN: THE TRADITIONAL FUNC
3036           PCRE_UTF8_ERR10           PCRE_UTF8_ERR10
3037    
3038         The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of         The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of
3039         the character do not have the binary value 0b10 (that  is,  either  the         the  character  do  not have the binary value 0b10 (that is, either the
3040         most significant bit is 0, or the next bit is 1).         most significant bit is 0, or the next bit is 1).
3041    
3042           PCRE_UTF8_ERR11           PCRE_UTF8_ERR11
3043           PCRE_UTF8_ERR12           PCRE_UTF8_ERR12
3044    
3045         A  character that is valid by the RFC 2279 rules is either 5 or 6 bytes         A character that is valid by the RFC 2279 rules is either 5 or 6  bytes
3046         long; these code points are excluded by RFC 3629.         long; these code points are excluded by RFC 3629.
3047    
3048           PCRE_UTF8_ERR13           PCRE_UTF8_ERR13
3049    
3050         A 4-byte character has a value greater than 0x10ffff; these code  points         A  4-byte character has a value greater than 0x10ffff; these code points
3051         are excluded by RFC 3629.         are excluded by RFC 3629.
3052    
3053           PCRE_UTF8_ERR14           PCRE_UTF8_ERR14
3054    
3055         A  3-byte  character  has  a  value in the range 0xd800 to 0xdfff; this         A 3-byte character has a value in the  range  0xd800  to  0xdfff;  this
3056         range of code points is reserved by RFC 3629 for use with UTF-16,  and         range  of code points is reserved by RFC 3629 for use with UTF-16, and
3057         so are excluded from UTF-8.         so are excluded from UTF-8.
3058    
3059           PCRE_UTF8_ERR15           PCRE_UTF8_ERR15
# Line 3055  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3062  MATCHING A PATTERN: THE TRADITIONAL FUNC
3062           PCRE_UTF8_ERR18           PCRE_UTF8_ERR18
3063           PCRE_UTF8_ERR19           PCRE_UTF8_ERR19
3064    
3065         A  2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes         A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it  codes
3066         for a value that can be represented by fewer bytes, which  is  invalid.         for  a  value that can be represented by fewer bytes, which is invalid.
3067         For  example,  the two bytes 0xc0, 0xae give the value 0x2e, whose cor-         For example, the two bytes 0xc0, 0xae give the value 0x2e,  whose  cor-
3068         rect coding uses just one byte.         rect coding uses just one byte.

         PCRE_UTF8_ERR20

       The two most significant bits of the first byte of a character have the
       binary value 0b10 (that is, the most significant bit is 1 and the second
       is 0). Such a byte can only validly occur as the second or subsequent
       byte of a multi-byte character.

         PCRE_UTF8_ERR21

       The first byte of a character has the value 0xfe or 0xff. These values
       can never occur in a valid UTF-8 string.
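
       The first-byte rules for PCRE_UTF8_ERR20 and PCRE_UTF8_ERR21 depend
       only on the top bits of the byte. The hypothetical helper
       utf8_lead_length() below is not part of the PCRE API. It sketches that
       classification. It still reports lengths of 5 and 6 for the old
       RFC 2279 lead bytes, and the RFC 3629 rules described above would then
       reject those characters.

```c
/* Hypothetical return codes for this sketch, not PCRE error codes. */
enum { LEAD_CONTINUATION = -1, LEAD_NEVER_VALID = -2 };

/* Return the character length (1 to 6 bytes) implied by a first byte under
   the old RFC 2279 scheme, or a negative code when the byte cannot start a
   character at all. */
static int utf8_lead_length(unsigned char b)
{
  if (b < 0x80) return 1;                  /* 0xxxxxxx: plain ASCII */
  if ((b & 0xc0) == 0x80)
    return LEAD_CONTINUATION;              /* 10xxxxxx: compare ERR20 */
  if (b >= 0xfe) return LEAD_NEVER_VALID;  /* 0xfe, 0xff: compare ERR21 */
  if (b < 0xe0) return 2;                  /* 110xxxxx */
  if (b < 0xf0) return 3;                  /* 1110xxxx */
  if (b < 0xf8) return 4;                  /* 11110xxx */
  if (b < 0xfc) return 5;                  /* 111110xx: RFC 2279 only */
  return 6;                                /* 1111110x: RFC 2279 only */
}
```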


# Line 3086  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER Line 3093  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       Captured substrings can be accessed directly by using the offsets
       returned by pcre_exec() in ovector. For convenience, the functions
       pcre_copy_substring(), pcre_get_substring(), and
       pcre_get_substring_list() are provided for extracting captured
       substrings as new, separate, zero-terminated strings. These functions
       identify substrings by number. The next section describes functions for
       extracting named substrings.

       A substring that contains a binary zero is correctly extracted and has
       a further zero added on the end, but the result is not, of course, a C
       string. However, you can process such a string by referring to the
       length that is returned by pcre_copy_substring() and
       pcre_get_substring(). Unfortunately, the interface to
       pcre_get_substring_list() is not adequate for handling strings
       containing binary zeros, because the end of the final string is not
       independently indicated.

       The first three arguments are the same for all three of these
       functions: subject is the subject string that has just been
       successfully matched, ovector is a pointer to the vector of integer
       offsets that was passed to pcre_exec(), and stringcount is the number
       of substrings that were captured by the match, including the substring
       that matched the entire regular expression. This is the value returned
       by pcre_exec() if it is greater than zero. If pcre_exec() returned
       zero, indicating that it ran out of space in ovector, the value passed
       as stringcount should be the number of elements in the vector divided
       by three.
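
       The rule for choosing stringcount fits in a small helper.
       substring_count() is a hypothetical name, not a PCRE function.
       Returning 0 for a negative rc (no match or an error, when there is
       nothing to extract) is this sketch's own choice.

```c
/* Hypothetical helper: the stringcount to pass to pcre_copy_substring() and
   its relatives, given rc, the value returned by pcre_exec(), and ovecsize,
   the number of ints in ovector. */
static int substring_count(int rc, int ovecsize)
{
  if (rc > 0) return rc;             /* normal case: use the return value */
  if (rc == 0) return ovecsize / 3;  /* ovector was too small */
  return 0;                          /* negative rc: no match or an error */
}
```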

       The functions pcre_copy_substring() and pcre_get_substring() extract a
       single substring, whose number is given as stringnumber. A value of
       zero extracts the substring that matched the entire pattern, whereas
       higher values extract the captured substrings. For
       pcre_copy_substring(), the string is placed in buffer, whose length is
       given by buffersize, while for pcre_get_substring() a new block of
       memory is obtained via pcre_malloc, and its address is returned via
       stringptr. The yield of the function is the length of the string, not
       including the terminating zero, or one of these error codes:

         PCRE_ERROR_NOMEMORY       (-6)

       The buffer was too small for pcre_copy_substring(), or the attempt to
       get memory failed for pcre_get_substring().

         PCRE_ERROR_NOSUBSTRING    (-7)

       There is no substring whose number is stringnumber.
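
       The sketch below makes the yield and the two error codes concrete by
       mimicking the documented behaviour of pcre_copy_substring() in plain C.
       It is not PCRE's implementation. copy_substring_sketch() and the
       SKETCH_ERROR_* macros are hypothetical names, although the numeric
       values are the ones listed above. It copies with memcpy() and the
       computed length, so a substring containing a binary zero survives. It
       treats a negative start offset as an unset substring and returns an
       empty string for it.

```c
#include <string.h>

#define SKETCH_ERROR_NOMEMORY    (-6)   /* value of PCRE_ERROR_NOMEMORY */
#define SKETCH_ERROR_NOSUBSTRING (-7)   /* value of PCRE_ERROR_NOSUBSTRING */

/* Copy substring stringnumber into buffer and add a terminating zero.
   Returns the length, not counting that zero, or a negative error code. */
static int copy_substring_sketch(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
  int start, end, yield;

  if (stringnumber < 0 || stringnumber >= stringcount)
    return SKETCH_ERROR_NOSUBSTRING;

  start = ovector[2 * stringnumber];
  end = ovector[2 * stringnumber + 1];
  yield = (start < 0) ? 0 : end - start;   /* unset gives an empty string */

  if (buffersize < yield + 1)              /* room for the extra zero too */
    return SKETCH_ERROR_NOMEMORY;

  if (yield > 0)
    memcpy(buffer, subject + start, (size_t)yield);  /* copies binary zeros */
  buffer[yield] = 0;
  return yield;
}
```

       A caller that keeps the returned length, rather than calling strlen()
       on the buffer, can handle substrings that contain binary zeros.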

       The pcre_get_substring_list() function extracts all available
       substrings and builds a list of pointers to them. All this is done in a
       single block of memory that is obtained via pcre_malloc. The address of
       the memory block is returned via listptr, which is also the start of
       the list of string pointers. The end of the list is marked by a NULL
       pointer. The yield of the function is zero if all went well, or the
       error code

         PCRE_ERROR_NOMEMORY       (-6)

       if the attempt to get the memory block failed.
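
       The single-block layout just described can be sketched as follows. The
       pointer array, including its terminating NULL, comes first, and the
       zero-terminated copies of the substrings follow it.
       get_substring_list_sketch() is a hypothetical stand-in that uses
       malloc() rather than pcre_malloc. It is not PCRE's own code. Because
       only pointers are stored, a caller cannot recover the length of a
       substring that contains a binary zero, which is the limitation
       mentioned earlier.

```c
#include <stdlib.h>
#include <string.h>

/* Build one malloc()ed block: stringcount + 1 pointers (the last is NULL),
   followed by the zero-terminated substrings they point to. Returns 0, or
   -6 (the value of PCRE_ERROR_NOMEMORY) if malloc() fails. */
static int get_substring_list_sketch(const char *subject, const int *ovector,
                                     int stringcount, const char ***listptr)
{
  size_t size = sizeof(char *);          /* the terminating NULL pointer */
  char **list;
  char *p;
  int i;

  for (i = 0; i < stringcount; i++) {
    int start = ovector[2 * i];
    size_t len = (start < 0) ? 0 : (size_t)(ovector[2 * i + 1] - start);
    size += sizeof(char *) + len + 1;    /* pointer, bytes, and a zero */
  }

  list = malloc(size);
  if (list == NULL) return -6;

  p = (char *)(list + stringcount + 1);  /* string data follows the pointers */
  for (i = 0; i < stringcount; i++) {
    int start = ovector[2 * i];
    size_t len = (start < 0) ? 0 : (size_t)(ovector[2 * i + 1] - start);
    if (len > 0) memcpy(p, subject + start, len);
    p[len] = 0;                          /* unset substrings become "" */
    list[i] = p;
    p += len + 1;
  }
  list[stringcount] = NULL;
  *listptr = (const char **)list;
  return 0;
}
```

       In this sketch a single free() releases the pointers and the strings
       together, because they share one block.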

       When any of these functions encounters a substring that is unset, which
       can happen when capturing subpattern number n+1 matches some part of
       the subject but subpattern n has not been used at all, it returns an
       empty string. This can be distinguished from a genuine zero-length
       substring by inspecting the appropriate offset in ovector, which is
       negative for unset substrings.
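
       In code, the check described above is a single comparison.
       substring_is_set() is a hypothetical helper. The ovector contents
       mentioned below were written by hand to follow the documented
       convention; they were not produced by a real pcre_exec() call.

```c
/* Hypothetical helper: nonzero if capture n was set by the match. An unset
   capture has a negative start offset; a set but empty one does not. */
static int substring_is_set(const int *ovector, int n)
{
  return ovector[2 * n] >= 0;
}
```

       For the pattern (a)|(b) matched against "b", ovector would start
       0, 1, -1, -1, 0, 1, so group 1 is unset. For (x*)b matched against
       "b", it would start 0, 1, 0, 0, so group 1 is set but empty.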

       The two convenience functions pcre_free_substring() and
       pcre_free_substring_list() can be used to free the memory returned by a
       previous call of pcre_get_substring() or pcre_get_substring_list(),
       respectively. They do nothing more than call the function pointed to by
       pcre_free, which of course could be called directly from a C program.
       However, PCRE is used in some situations where it is linked via a
       special interface to another programming language that cannot use
       pcre_free directly; it is for these cases that the functions are
       provided.


# Line 3176  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 3183  EXTRACTING CAPTURED SUBSTRINGS BY NAME
            int stringcount, const char *stringname,
            const char **stringptr);

       To extract a substring by name, you first have to find the associated
       number. For example, for this pattern

         (a+)b(?<xxx>\d+)...

# Line 3185  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 3192  EXTRACTING CAPTURED SUBSTRINGS BY NAME
       be unique (PCRE_DUPNAMES was not set), you can find the number from the
       name by calling pcre_get_stringnumber(). The first argument is the
       compiled pattern, and the second is the name. The yield of the function
       is the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no
       subpattern of that name.
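
       A small hand-built table shows how the name-to-number lookup can work.
       The sketch assumes the name table layout described in the section on
       information about a pattern. Each fixed-size entry holds the group
       number in two bytes, most significant byte first, followed by the
       zero-terminated name, and the entries are in alphabetical order.
       name_to_number() is a hypothetical function, not
       pcre_get_stringnumber() itself.

```c
#include <string.h>

/* Hypothetical lookup in a name table of count entries, each entrysize
   bytes long. Because the entries are assumed to be sorted by name, a
   binary search is used. Returns the group number, or -7 (the value of
   PCRE_ERROR_NOSUBSTRING) if the name is not present. */
static int name_to_number(const unsigned char *table, int count,
                          int entrysize, const char *name)
{
  int bot = 0, top = count;

  while (top > bot) {
    int mid = (top + bot) / 2;
    const unsigned char *entry = table + mid * entrysize;
    int c = strcmp(name, (const char *)(entry + 2));  /* name after 2 bytes */
    if (c == 0) return (entry[0] << 8) | entry[1];    /* big-endian number */
    if (c > 0) bot = mid + 1; else top = mid;
  }
  return -7;
}
```

       With an entry for xxx holding the number 2, as for the example pattern
       above, the lookup of "xxx" yields 2.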

       Given the number, you can extract the substring directly, or use one of
       the functions described in the previous section. For convenience, there
       are also two functions that do the whole job.

       Most of the arguments of pcre_copy_named_substring() and
       pcre_get_named_substring() are the same as those for the similarly
       named functions that extract by number. As these are described in the
       previous section, they are not re-described here. There are just two
       differences:

       First, instead of a substring number, a substring name is given.
       Second, there is an extra argument, given at the start, which is a
       pointer to the compiled pattern. This is needed in order to gain access
       to the name-to-number translation table.

       These functions call pcre_get_stringnumber(), and if it succeeds, they
       then call pcre_copy_substring() or pcre_get_substring(), as
       appropriate. NOTE: If PCRE_DUPNAMES is set and there are duplicate
       names, the behaviour may not be what you want (see the next section).

       Warning: If the pattern uses the (?| feature to set up multiple
       subpatterns with the same number, as described in the section on
       duplicate subpattern numbers in the pcrepattern page, you cannot use
       names to distinguish the different subpatterns, because names are not
       included in the compiled code. The matching process uses only numbers.
       For this reason, the use of different names for subpatterns of the same
       number causes an error at compile time.


# Line 3222  DUPLICATE SUBPATTERN NAMES Line 3229  DUPLICATE SUBPATTERN NAMES
       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       When a pattern is compiled with the PCRE_DUPNAMES option, names for
       subpatterns are not required to be unique. (Duplicate names are always
       allowed for subpatterns with the same number, created by using the (?|
       feature. Indeed, if such subpatterns are named, they are required to
       use the same names.)

       Normally, patterns with duplicate names are such that in any one match,
       only one of the named subpatterns participates. An example is shown in
       the pcrepattern documentation.

       When duplicates are present, pcre_copy_named_substring() and
       pcre_get_named_substring() return the first substring corresponding to
       the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING
       (-7) is returned; no data is returned. The pcre_get_stringnumber()
       function returns one of the numbers that are associated with the name,
       but it is not defined which it is.

       If you want to get full details of all captured substrings for a given
       name, you must use the pcre_get_stringtable_entries() function. The
       first argument is the compiled pattern, and the second is the name. The
       third and fourth are pointers to variables that are updated by the
       function. After it has run, they point to the first and last entries in
       the name-to-number table for the given name. The function itself
       returns the length of each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
       there are none. The format of the table is described above in the
       section entitled Information about a pattern. Given all the relevant
       entries for the name, you can extract each of their numbers, and hence
       the captured data, if any.
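
       The following sketch combines the table walk with the rule for
       duplicates. It finds the first and last table entries for a name (the
       range that pcre_get_stringtable_entries() reports) and returns the
       number of the first of those groups that is set. It assumes the name
       table layout (a two-byte group number, most significant byte first,
       followed by the zero-terminated name, in fixed-size entries), assumes
       that entries for one name are adjacent, and uses hand-written ovector
       contents. first_set_number() is a hypothetical helper, not a PCRE
       function.

```c
#include <string.h>

/* Hypothetical helper: scan a name table of count entries of entrysize
   bytes, find the run of entries for name, and return the number of the
   first of those groups that is set in ovector. Returns -7 (the value of
   PCRE_ERROR_NOSUBSTRING) if the name is absent or none of its groups is
   set. */
static int first_set_number(const unsigned char *table, int count,
                            int entrysize, const char *name,
                            const int *ovector, int stringcount)
{
  int i, first = -1, last = -1;

  for (i = 0; i < count; i++) {          /* linear scan, for clarity */
    if (strcmp(name, (const char *)(table + i * entrysize + 2)) == 0) {
      if (first < 0) first = i;
      last = i;
    }
  }
  if (first < 0) return -7;              /* no entries for this name */

  for (i = first; i <= last; i++) {
    const unsigned char *entry = table + i * entrysize;
    int n = (entry[0] << 8) | entry[1];
    if (n < stringcount && ovector[2 * n] >= 0) return n;  /* set */
  }
  return -7;                             /* name exists but nothing is set */
}
```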


FINDING ALL POSSIBLE MATCHES

       The traditional matching function uses a similar algorithm to Perl,
       which stops when it finds the first match, starting at a given point in
       the subject. If you want to find all possible matches, or the longest
       possible match, consider using the alternative matching function (see
       below) instead. If you cannot use the alternative function, but still
       need to find all possible matches, you can kludge it up by making use
       of the callout facility, which is described in the pcrecallout
       documentation.

       What you have to do is to insert a callout right at the end of the
       pattern. When your callout function is called, extract and save the
       current matched substring. Then return 1, which forces pcre_exec() to
       backtrack and try other alternatives. Ultimately, when it runs out of
       matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.


OBTAINING AN ESTIMATE OF STACK USAGE

       Matching certain patterns using pcre_exec() can use a lot of process
       stack, which in certain environments can be rather limited in size.
       Some users find it helpful to have an estimate of the amount of stack
       that is used by pcre_exec(), to help them set recursion limits, as
       described in the pcrestack documentation. The estimate that is output
       by pcretest when called with the -m and -C options is obtained by
       calling pcre_exec() with the values NULL, NULL, NULL, -999, and -999
       for its first five arguments.

       Normally, if its first argument is NULL, pcre_exec() immediately
       returns the negative error code PCRE_ERROR_NULL, but with this special
       combination of arguments, it returns instead a negative number whose
       absolute value is the approximate stack frame size in bytes. (A
       negative number is used so that it is clear that no match has
       happened.) The value is approximate because in some cases, recursive
       calls to pcre_exec() occur when there are one or two additional
       variables on the stack.
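
       The helper below gives a rough idea of how the estimate might be used.
       It turns the negative value from the special call into a frame size and
       divides an available stack size by it, giving an approximate upper
       bound for a recursion limit. recursion_levels_that_fit() is
       hypothetical, and the stack size must come from the application's own
       knowledge of its environment. Because the frame size is itself
       approximate, leave a safety margin. The pcrestack documentation
       discusses how to choose limits.

```c
/* Hypothetical helper. rc is the value returned by
     pcre_exec(NULL, NULL, NULL, -999, -999, ...)
   which is minus the approximate frame size in bytes. Returns how many
   frames of that size fit into stack_bytes, or -1 if rc is not negative. */
static long recursion_levels_that_fit(int rc, long stack_bytes)
{
  long frame_size;

  if (rc >= 0) return -1;     /* not a frame-size estimate */
  frame_size = -(long)rc;     /* absolute value is the frame size */
  return stack_bytes / frame_size;
}
```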

       If PCRE has been compiled to use the heap instead of the stack for
       recursion, the value returned is the size of each block that is
       obtained from the heap.


# Line 3302  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 3309  MATCHING A PATTERN: THE ALTERNATIVE FUNC
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       The function pcre_dfa_exec() is called to match a subject string
       against a compiled pattern, using a matching algorithm that scans the
       subject string just once, and does not backtrack. This has different
       characteristics from the normal algorithm, and is not compatible with
       Perl. Some of the features of PCRE patterns are not supported.
       Nevertheless, there are times when this kind of matching can be useful.
       For a discussion of the two matching algorithms, and a list of features
       that pcre_dfa_exec() does not support, see the pcrematching
       documentation.

       The arguments for the pcre_dfa_exec() function are the same as for
       pcre_exec(), plus two extras. The ovector argument is used in a
       different way, and this is described below. The other common arguments
       are used in the same way as for pcre_exec(), so their description is
       not repeated here.

       The two additional arguments provide workspace for the function. The
       workspace vector should contain at least 20 elements. It is used for
       keeping track of multiple paths through the pattern tree. More
       workspace will be needed for patterns and subjects where there are a
       lot of potential matches.

       Here is an example of a simple call to pcre_dfa_exec():
# Line 3343  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 3350  MATCHING A PATTERN: THE ALTERNATIVE FUNC

   Option bits for pcre_dfa_exec()

       The unused bits of the options argument for pcre_dfa_exec() must be
       zero. The only bits that may be set are PCRE_ANCHORED,
       PCRE_NEWLINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,
       PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_BSR_ANYCRLF,
       PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD,
       PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the
       last four of these are exactly the same as for pcre_exec(), so their
       description is not repeated here.

         PCRE_PARTIAL_HARD
         PCRE_PARTIAL_SOFT

       These have the same general effect as they do for pcre_exec(), but the
       details are slightly different. When PCRE_PARTIAL_HARD is set for
       pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of the
       subject is reached and there is still at least one matching possibility
       that requires additional characters. This happens even if some complete
       matches have also been found. When PCRE_PARTIAL_SOFT is set, the return
       code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end
       of the subject is reached, there have been no complete matches, but
       there is still at least one matching possibility. The portion of the
       string that was inspected when the longest partial match was found is
       set as the first matching string in both cases. There is a more
       detailed discussion of partial and multi-segment matching, with
       examples, in the pcrepartial documentation.
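
       The two partial-matching options differ in which outcome wins when the
       end of the subject is reached. The function below is a hypothetical
       model of the rules just described, not PCRE code. MODEL_NOMATCH and
       MODEL_PARTIAL stand in for PCRE_ERROR_NOMATCH and PCRE_ERROR_PARTIAL,
       the MODE_* names are invented for the sketch, and a positive result
       stands for the number of complete matches.

```c
/* Hypothetical stand-ins; the numeric values are not PCRE's. */
enum { MODEL_NOMATCH = -1, MODEL_PARTIAL = -2 };
enum partial_mode { MODE_NONE, MODE_SOFT, MODE_HARD };

/* complete_matches: number of complete matches found.
   end_reached_with_live_path: nonzero if the end of the subject was reached
   while at least one matching possibility still needed more characters. */
static int dfa_partial_outcome(enum partial_mode mode, int complete_matches,
                               int end_reached_with_live_path)
{
  /* HARD: a live path at the end wins, even over complete matches. */
  if (mode == MODE_HARD && end_reached_with_live_path) return MODEL_PARTIAL;

  if (complete_matches > 0) return complete_matches;

  /* SOFT: only a "no match" result is turned into "partial". */
  if (mode == MODE_SOFT && end_reached_with_live_path) return MODEL_PARTIAL;

  return MODEL_NOMATCH;
}
```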

         PCRE_DFA_SHORTEST

       Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to
       stop as soon as it has found one match. Because of the way the
       alternative algorithm works, this is necessarily the shortest possible
       match at the first possible matching point in the subject string.

         PCRE_DFA_RESTART

       When pcre_dfa_exec() returns a partial match, it is possible to call it
       again, with additional subject characters, and have it continue with
       the same match. The PCRE_DFA_RESTART option requests this action; when
       it is set, the workspace and wscount arguments must reference the same
       vector as before because data about the match so far is left in them
       after a partial match. There is more discussion of this facility in the
       pcrepartial documentation.

   Successful returns from pcre_dfa_exec()

       When pcre_dfa_exec() succeeds, it may have matched more than one
       substring in the subject. Note, however, that all the matches from one
       run of the function start at the same point in the subject. The shorter
3401         matches  are all initial substrings of the longer matches. For example,         matches are all initial substrings of the longer matches. For  example,
3402         if the pattern         if the pattern
3403    
3404           <.*>           <.*>
# Line 3406  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 3413  MATCHING A PATTERN: THE ALTERNATIVE FUNC
3413           <something> <something else>           <something> <something else>
3414           <something> <something else> <something further>           <something> <something else> <something further>
3415    
3416         On success, the yield of the function is a number  greater  than  zero,         On  success,  the  yield of the function is a number greater than zero,
3417         which  is  the  number of matched substrings. The substrings themselves         which is the number of matched substrings.  The  substrings  themselves
3418         are returned in ovector. Each string uses two elements;  the  first  is         are  returned  in  ovector. Each string uses two elements; the first is
3419         the  offset  to  the start, and the second is the offset to the end. In         the offset to the start, and the second is the offset to  the  end.  In
3420         fact, all the strings have the same start  offset.  (Space  could  have         fact,  all  the  strings  have the same start offset. (Space could have
3421         been  saved by giving this only once, but it was decided to retain some         been saved by giving this only once, but it was decided to retain  some
3422         compatibility with the way pcre_exec() returns data,  even  though  the         compatibility  with  the  way pcre_exec() returns data, even though the
3423         meaning of the strings is different.)         meaning of the strings is different.)
3424    
3425         The strings are returned in reverse order of length; that is, the long-         The strings are returned in reverse order of length; that is, the long-
3426         est matching string is given first. If there were too many  matches  to         est  matching  string is given first. If there were too many matches to
3427         fit  into ovector, the yield of the function is zero, and the vector is         fit into ovector, the yield of the function is zero, and the vector  is
3428         filled with the longest matches.  Unlike  pcre_exec(),  pcre_dfa_exec()         filled  with  the  longest matches. Unlike pcre_exec(), pcre_dfa_exec()
3429         can use the entire ovector for returning matched strings.         can use the entire ovector for returning matched strings.
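As a sketch of how a caller might read these results, the C helpers below interpret the yield and ovector of pcre_dfa_exec() using only the conventions described above:

- a positive yield is the count of strings;
- a yield of zero means the whole ovector was filled with the longest matches;
- every pair shares the same start offset;
- the longest string comes first.

The function names are hypothetical, not part of the PCRE API.

```c
#include <stdio.h>

/* Hypothetical helper: number of valid (start, end) pairs in ovector after
   pcre_dfa_exec() returned rc; ovecsize is the ovector size (in ints) that
   was passed to the call. */
int dfa_pair_count(int rc, int ovecsize)
{
  if (rc > 0) return rc;             /* rc matched substrings */
  if (rc == 0) return ovecsize / 2;  /* too many: whole vector used, longest first */
  return 0;                          /* negative rc: an error, no valid pairs */
}

/* Hypothetical helper: check the documented layout, i.e. every pair starts
   at ovector[0] and the end offsets never increase (longest match first). */
int dfa_ovector_is_consistent(const int *ovector, int npairs)
{
  for (int i = 1; i < npairs; i++) {
    if (ovector[2*i] != ovector[0]) return 0;
    if (ovector[2*i + 1] > ovector[2*i - 1]) return 0;
  }
  return 1;
}

/* Print the matched substrings in the order returned (longest first). */
void dfa_print_matches(const char *subject, const int *ovector, int npairs)
{
  for (int i = 0; i < npairs; i++) {
    int start = ovector[2*i], end = ovector[2*i + 1];
    printf("%d: %.*s\n", i, end - start, subject + start);
  }
}
```

Assume, purely for illustration, that the subject consists of just "<something> <something else> <something further>". The three matches shown above would then occupy the pairs (0,48), (0,28) and (0,11), in that order.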
3430    
3431     Error returns from pcre_dfa_exec()     Error returns from pcre_dfa_exec()
3432    
3433         The  pcre_dfa_exec()  function returns a negative number when it fails.         The pcre_dfa_exec() function returns a negative number when  it  fails.
3434         Many of the errors are the same  as  for  pcre_exec(),  and  these  are         Many  of  the  errors  are  the  same as for pcre_exec(), and these are
3435         described  above.   There are in addition the following errors that are         described above.  There are in addition the following errors  that  are
3436         specific to pcre_dfa_exec():         specific to pcre_dfa_exec():
3437    
3438           PCRE_ERROR_DFA_UITEM      (-16)           PCRE_ERROR_DFA_UITEM      (-16)
3439    
3440         This return is given if pcre_dfa_exec() encounters an item in the  pat-         This  return is given if pcre_dfa_exec() encounters an item in the pat-
3441         tern  that  it  does not support, for instance, the use of \C or a back         tern that it does not support, for instance, the use of \C  or  a  back
3442         reference.         reference.
3443    
3444           PCRE_ERROR_DFA_UCOND      (-17)           PCRE_ERROR_DFA_UCOND      (-17)
3445    
3446         This return is given if pcre_dfa_exec()  encounters  a  condition  item         This  return  is  given  if pcre_dfa_exec() encounters a condition item
3447         that  uses  a back reference for the condition, or a test for recursion         that uses a back reference for the condition, or a test  for  recursion
3448         in a specific group. These are not supported.         in a specific group. These are not supported.
3449    
3450           PCRE_ERROR_DFA_UMLIMIT    (-18)           PCRE_ERROR_DFA_UMLIMIT    (-18)
3451    
3452         This return is given if pcre_dfa_exec() is called with an  extra  block         This  return  is given if pcre_dfa_exec() is called with an extra block
3453         that  contains  a  setting  of the match_limit or match_limit_recursion         that contains a setting of  the  match_limit  or  match_limit_recursion
3454         fields. This is not supported (these fields  are  meaningless  for  DFA         fields.  This  is  not  supported (these fields are meaningless for DFA
3455         matching).         matching).
3456    
3457           PCRE_ERROR_DFA_WSSIZE     (-19)           PCRE_ERROR_DFA_WSSIZE     (-19)
3458    
3459         This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the         This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
3460         workspace vector.         workspace vector.
3461    
3462           PCRE_ERROR_DFA_RECURSE    (-20)           PCRE_ERROR_DFA_RECURSE    (-20)
3463    
3464         When a recursive subpattern is processed, the matching  function  calls         When  a  recursive subpattern is processed, the matching function calls
3465         itself  recursively,  using  private vectors for ovector and workspace.         itself recursively, using private vectors for  ovector  and  workspace.
3466         This error is given if the output vector  is  not  large  enough.  This         This  error  is  given  if  the output vector is not large enough. This
3467         should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
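The DFA-specific codes listed above can be told apart from the shared pcre_exec() codes with a small lookup. The sketch below is hypothetical: it copies the numeric values from this list into a local enum so that it compiles without pcre.h. Real code should switch on the PCRE_ERROR_DFA_* macros from pcre.h instead.

```c
#include <stddef.h>

/* Local copies of the values listed above; real code should use the
   PCRE_ERROR_DFA_* macros from pcre.h rather than these names. */
enum {
  DFA_UITEM   = -16,
  DFA_UCOND   = -17,
  DFA_UMLIMIT = -18,
  DFA_WSSIZE  = -19,
  DFA_RECURSE = -20
};

/* Hypothetical helper: a short explanation for errors specific to
   pcre_dfa_exec(). Returns NULL for any other code, which should be
   interpreted as for pcre_exec(). */
const char *dfa_error_text(int rc)
{
  switch (rc) {
  case DFA_UITEM:
    return "pattern item not supported by DFA matching (e.g. \\C, back reference)";
  case DFA_UCOND:
    return "condition uses a back reference or a specific-group recursion test";
  case DFA_UMLIMIT:
    return "extra block sets match_limit or match_limit_recursion";
  case DFA_WSSIZE:
    return "workspace vector too small";
  case DFA_RECURSE:
    return "internal output vector too small in a recursive subpattern";
  default:
    return NULL;
  }
}
```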
3468    
3469    
3470  SEE ALSO  SEE ALSO
3471    
3472         pcre16(3),   pcrebuild(3),  pcrecallout(3),  pcrecpp(3)(3),  pcrematch-         pcre16(3),  pcrebuild(3),  pcrecallout(3),  pcrecpp(3)(3),   pcrematch-
3473         ing(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3),         ing(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3),
3474         pcrestack(3).         pcrestack(3).
3475    
# Line 3476  AUTHOR Line 3483  AUTHOR
3483    
3484  REVISION  REVISION
3485    
3486         Last updated: 22 February 2012         Last updated: 24 February 2012
3487         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3488  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3489    
# Line 4373  BACKSLASH Line 4380  BACKSLASH
4380         Those that are not part of an identified script are lumped together  as         Those that are not part of an identified script are lumped together  as
4381         "Common". The current list of scripts is:         "Common". The current list of scripts is:
4382    
4383         Arabic, Armenian, Avestan, Balinese, Bamum, Bengali, Bopomofo, Braille,         Arabic,  Armenian,  Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,
4384         Buginese, Buhid, Canadian_Aboriginal, Carian, Cham,  Cherokee,  Common,         Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian,  Chakma,
4385         Coptic,   Cuneiform,  Cypriot,  Cyrillic,  Deseret,  Devanagari,  Egyp-         Cham,  Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
4386         tian_Hieroglyphs,  Ethiopic,  Georgian,  Glagolitic,   Gothic,   Greek,         Devanagari,  Egyptian_Hieroglyphs,  Ethiopic,   Georgian,   Glagolitic,
4387         Gujarati,  Gurmukhi,  Han,  Hangul,  Hanunoo,  Hebrew,  Hiragana, Impe-         Gothic,  Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-
4388         rial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian,         gana,  Imperial_Aramaic,  Inherited,  Inscriptional_Pahlavi,   Inscrip-
4389         Javanese,  Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao,         tional_Parthian,   Javanese,   Kaithi,   Kannada,  Katakana,  Kayah_Li,
4390         Latin,  Lepcha,  Limbu,  Linear_B,  Lisu,  Lycian,  Lydian,  Malayalam,         Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B,  Lisu,  Lycian,
4391         Meetei_Mayek,  Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic,         Lydian,    Malayalam,    Mandaic,    Meetei_Mayek,    Meroitic_Cursive,
4392         Old_Persian, Old_South_Arabian, Old_Turkic, Ol_Chiki,  Oriya,  Osmanya,         Meroitic_Hieroglyphs,  Miao,  Mongolian,  Myanmar,  New_Tai_Lue,   Nko,
4393         Phags_Pa,  Phoenician,  Rejang,  Runic, Samaritan, Saurashtra, Shavian,         Ogham,    Old_Italic,   Old_Persian,   Old_South_Arabian,   Old_Turkic,
4394         Sinhala, Sundanese, Syloti_Nagri, Syriac,  Tagalog,  Tagbanwa,  Tai_Le,         Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic,  Samari-
4395         Tai_Tham,  Tai_Viet,  Tamil,  Telugu,  Thaana, Thai, Tibetan, Tifinagh,         tan,  Saurashtra,  Sharada,  Shavian, Sinhala, Sora_Sompeng, Sundanese,
4396         Ugaritic, Vai, Yi.         Syloti_Nagri, Syriac, Tagalog, Tagbanwa,  Tai_Le,  Tai_Tham,  Tai_Viet,
4397           Takri,  Tamil,  Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,
4398           Yi.
4399    
4400         Each character has exactly one Unicode general category property, spec-         Each character has exactly one Unicode general category property, spec-
4401         ified  by a two-letter abbreviation. For compatibility with Perl, nega-         ified  by a two-letter abbreviation. For compatibility with Perl, nega-
# Line 6586  PCRE SPECIAL CATEGORY PROPERTIES FOR \p Line 6595  PCRE SPECIAL CATEGORY PROPERTIES FOR \p
6595    
6596  SCRIPT NAMES FOR \p AND \P  SCRIPT NAMES FOR \p AND \P
6597    
6598         Arabic, Armenian, Avestan, Balinese, Bamum, Bengali, Bopomofo, Braille,         Arabic,  Armenian,  Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,
6599         Buginese, Buhid, Canadian_Aboriginal, Carian, Cham,  Cherokee,  Common,         Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian,  Chakma,
6600         Coptic,   Cuneiform,  Cypriot,  Cyrillic,  Deseret,  Devanagari,  Egyp-         Cham,  Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
6601         tian_Hieroglyphs,  Ethiopic,  Georgian,  Glagolitic,   Gothic,   Greek,         Devanagari,  Egyptian_Hieroglyphs,  Ethiopic,   Georgian,   Glagolitic,
6602         Gujarati,  Gurmukhi,  Han,  Hangul,  Hanunoo,  Hebrew,  Hiragana, Impe-         Gothic,  Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-
6603         rial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian,         gana,  Imperial_Aramaic,  Inherited,  Inscriptional_Pahlavi,   Inscrip-
6604         Javanese,  Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao,         tional_Parthian,   Javanese,   Kaithi,   Kannada,  Katakana,  Kayah_Li,
6605         Latin,  Lepcha,  Limbu,  Linear_B,  Lisu,  Lycian,  Lydian,  Malayalam,         Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B,  Lisu,  Lycian,
6606         Meetei_Mayek,  Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic,         Lydian,    Malayalam,    Mandaic,    Meetei_Mayek,    Meroitic_Cursive,
6607         Old_Persian, Old_South_Arabian, Old_Turkic, Ol_Chiki,  Oriya,  Osmanya,         Meroitic_Hieroglyphs,  Miao,  Mongolian,  Myanmar,  New_Tai_Lue,   Nko,
6608         Phags_Pa,  Phoenician,  Rejang,  Runic, Samaritan, Saurashtra, Shavian,         Ogham,    Old_Italic,   Old_Persian,   Old_South_Arabian,   Old_Turkic,
6609         Sinhala, Sundanese, Syloti_Nagri, Syriac,  Tagalog,  Tagbanwa,  Tai_Le,         Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic,  Samari-
6610         Tai_Tham,  Tai_Viet,  Tamil,  Telugu,  Thaana, Thai, Tibetan, Tifinagh,         tan,  Saurashtra,  Sharada,  Shavian, Sinhala, Sora_Sompeng, Sundanese,
6611         Ugaritic, Vai, Yi.         Syloti_Nagri, Syriac, Tagalog, Tagbanwa,  Tai_Le,  Tai_Tham,  Tai_Viet,
6612           Takri,  Tamil,  Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,
6613           Yi.
6614    
6615    
6616  CHARACTER CLASSES  CHARACTER CLASSES
# Line 7736  MULTI-SEGMENT MATCHING WITH pcre_exec() Line 7747  MULTI-SEGMENT MATCHING WITH pcre_exec()
7747    
7748         At this stage, an application could discard the text preceding  "23ja",         At this stage, an application could discard the text preceding  "23ja",
7749         add  on  text  from  the  next  segment, and call the matching function         add  on  text  from  the  next  segment, and call the matching function
7750         again. Unlike the DFA matching functions  the  entire  matching  string         again. Unlike the DFA matching functions, the  entire  matching  string
7751         must  always be available, and the complete matching process occurs for         must  always be available, and the complete matching process occurs for
7752         each call, so more memory and more processing time is needed.         each call, so more memory and more processing time is needed.
7753    
# Line 7744  MULTI-SEGMENT MATCHING WITH pcre_exec() Line 7755  MULTI-SEGMENT MATCHING WITH pcre_exec()
7755         with \b or \B, the string that is returned for a partial match includes         with \b or \B, the string that is returned for a partial match includes
7756         characters that precede the partially matched  string  itself,  because         characters that precede the partially matched  string  itself,  because
7757         these  must be retained when adding on more characters for a subsequent         these  must be retained when adding on more characters for a subsequent
7758         matching attempt.         matching attempt.  However, in some cases you may need to  retain  even
7759           earlier characters, as discussed in the next section.
7760    
7761    
7762  ISSUES WITH MULTI-SEGMENT MATCHING  ISSUES WITH MULTI-SEGMENT MATCHING
# Line 7753  ISSUES WITH MULTI-SEGMENT MATCHING Line 7765  ISSUES WITH MULTI-SEGMENT MATCHING
7765         whichever matching function is used.         whichever matching function is used.
7766    
7767         1. If the pattern contains a test for the beginning of a line, you need         1. If the pattern contains a test for the beginning of a line, you need
7768         to pass the PCRE_NOTBOL option when the subject  string  for  any  call         to  pass  the  PCRE_NOTBOL  option when the subject string for any call
7769         does  start  at  the  beginning  of a line. There is also a PCRE_NOTEOL         does start at the beginning of a line.  There  is  also  a  PCRE_NOTEOL
7770         option, but in practice when doing multi-segment matching you should be         option, but in practice when doing multi-segment matching you should be
7771         using PCRE_PARTIAL_HARD, which includes the effect of PCRE_NOTEOL.         using PCRE_PARTIAL_HARD, which includes the effect of PCRE_NOTEOL.
7772    
7773         2.  Lookbehind  assertions at the start of a pattern are catered for in         2. Lookbehind assertions that have already been obeyed are catered  for
7774         the offsets that are returned for a partial match. However, in  theory,         in the offsets that are returned for a partial match. However a lookbe-
7775         a  lookbehind assertion later in the pattern could require even earlier         hind assertion later in the pattern could require even earlier  charac-
7776         characters to be inspected, and it might not have been reached  when  a         ters   to  be  inspected.  You  can  handle  this  case  by  using  the
7777         partial  match occurs. This is probably an extremely unlikely case; you         PCRE_INFO_MAXLOOKBEHIND    option    of    the    pcre_fullinfo()    or
7778         could guard against it to a certain extent by  always  including  extra         pcre16_fullinfo() functions to obtain the length of the largest lookbe-
7779         characters at the start.         hind in the pattern. This length is given in characters, not bytes.  If
7780           you  always  retain  at least that many characters before the partially
7781           matched string, all should be well. (Of course, near the start  of  the
7782           subject,  fewer  characters may be present; in that case all characters
7783           should be retained.)
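Because the PCRE_INFO_MAXLOOKBEHIND value is a count of characters, a UTF-8 application has to step back over whole characters, not bytes, when deciding what to retain. The value could be obtained with pcre_fullinfo(re, extra, PCRE_INFO_MAXLOOKBEHIND, &maxlb), with maxlb an int.

The helper keep_from() below is a hypothetical sketch of the stepping-back part. It assumes:

- the 8-bit library in UTF-8 mode;
- a valid UTF-8 subject;
- a start offset (for example ovector[0] of the partial match) that lies on a character boundary.

In non-UTF mode characters are bytes, so a plain subtraction would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the byte offset from which to keep text so
   that at least nchars UTF-8 characters before byte offset `start` are
   retained. If fewer characters exist, returns 0, i.e. keep everything. */
size_t keep_from(const unsigned char *subject, size_t start, int nchars)
{
  assert(nchars >= 0);
  size_t p = start;
  while (nchars > 0 && p > 0) {
    p--;
    /* Skip UTF-8 continuation bytes (10xxxxxx) back to the lead byte. */
    while (p > 0 && (subject[p] & 0xC0) == 0x80) p--;
    nchars--;
  }
  return p;
}
```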
7784    
7785           3. Because a partial match must always contain at least one  character,
7786           what  might  be  considered a partial match of an empty string actually
7787           gives a "no match" result. For example:
7788    
7789         3.  Matching  a subject string that is split into multiple segments may             re> /c(?<=abc)x/
7790             data> ab\P
7791             No match
7792    
7793           If the next segment begins "cx", a match should be found, but this will
7794           only  happen  if characters from the previous segment are retained. For
7795           this reason, a "no match" result  should  be  interpreted  as  "partial
7796           match of an empty string" when the pattern contains lookbehinds.
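Items 2 and 3 together suggest a retention rule that can be sketched in C:

- After a partial match, keep the text from the partial-match start minus the longest lookbehind.
- After "no match", treat the result as a partial match of an empty string at the end of the segment. This means keeping the last maxlookbehind characters.

The helper keep_offset() is hypothetical. It assumes non-UTF 8-bit mode, where one character is one byte. The two return-code values are copied from pcre.h under local names so the sketch compiles on its own.

```c
/* Values of PCRE_ERROR_NOMATCH and PCRE_ERROR_PARTIAL from pcre.h, copied
   under local names so this sketch compiles without the header. */
#define SKETCH_ERROR_NOMATCH  (-1)
#define SKETCH_ERROR_PARTIAL (-12)

/* Hypothetical helper (non-UTF mode: characters are bytes). Given the
   return code of a matching call on a segment of seg_len bytes, the start
   offset of a partial match (ovector[0]) and the longest lookbehind,
   return the offset from which to keep the segment's text before the
   next segment is appended, or -1 for codes this sketch does not handle. */
long keep_offset(int rc, long seg_len, long partial_start, int maxlookbehind)
{
  long from;
  if (rc == SKETCH_ERROR_PARTIAL)
    from = partial_start - maxlookbehind;
  else if (rc == SKETCH_ERROR_NOMATCH)
    from = seg_len - maxlookbehind;  /* "partial match of an empty string" at the end */
  else
    return -1;
  return from < 0 ? 0 : from;        /* near the start, keep everything */
}
```

For the example above, /c(?<=abc)x/ has a three-character lookbehind. After "no match" on the segment "ab", keep_offset(-1, 2, 0, 3) returns 0, so "ab" is kept. When "cx" is appended, the combined subject "abcx" contains the characters the lookbehind needs.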
7797    
7798           4.  Matching  a subject string that is split into multiple segments may
7799         not always produce exactly the same result as matching over one  single         not always produce exactly the same result as matching over one  single
7800         long  string,  especially  when  PCRE_PARTIAL_SOFT is used. The section         long  string,  especially  when  PCRE_PARTIAL_SOFT is used. The section
7801         "Partial Matching and Word Boundaries" above describes  an  issue  that         "Partial Matching and Word Boundaries" above describes  an  issue  that
# Line 7810  ISSUES WITH MULTI-SEGMENT MATCHING Line 7839  ISSUES WITH MULTI-SEGMENT MATCHING
7839           data> gsb\R\P\P\D           data> gsb\R\P\P\D
7840           Partial match: gsb           Partial match: gsb
7841    
7842         4. Patterns that contain alternatives at the top level which do not all         5. Patterns that contain alternatives at the top level which do not all
7843         start  with  the  same  pattern  item  may  not  work  as expected when         start  with  the  same  pattern  item  may  not  work  as expected when
7844         PCRE_DFA_RESTART is used. For example, consider this pattern:         PCRE_DFA_RESTART is used. For example, consider this pattern:
7845    
# Line 7855  AUTHOR Line 7884  AUTHOR
7884    
7885  REVISION  REVISION
7886    
7887         Last updated: 18 February 2012         Last updated: 24 February 2012
7888         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7889  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7890    
