Diff of /code/trunk/doc/pcre.txt

revision 1297 by ph10, Sun Nov 11 20:27:03 2012 UTC revision 1298 by ph10, Fri Mar 22 16:13:13 2013 UTC
# Line 8  pcretest commands. Line 8  pcretest commands.
8  -----------------------------------------------------------------------------  -----------------------------------------------------------------------------
9    
10    
11  PCRE(3)                                                                PCRE(3)  PCRE(3)                    Library Functions Manual                    PCRE(3)
12    
13    
14    
15  NAME  NAME
16         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
17    
   
18  INTRODUCTION  INTRODUCTION
19    
20         The  PCRE  library is a set of functions that implement regular expres-         The  PCRE  library is a set of functions that implement regular expres-
# Line 177  REVISION Line 177  REVISION
177         Last updated: 11 November 2012         Last updated: 11 November 2012
178         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
179  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
180    
181    
182    PCRE(3)                    Library Functions Manual                    PCRE(3)
183    
184    
 PCRE(3)                                                                PCRE(3)  
   
185    
186  NAME  NAME
187         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
# Line 506  REVISION Line 507  REVISION
507         Last updated: 08 November 2012         Last updated: 08 November 2012
508         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
509  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
510    
511    
512    PCRE(3)                    Library Functions Manual                    PCRE(3)
513    
514    
 PCRE(3)                                                                PCRE(3)  
   
515    
516  NAME  NAME
517         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
# Line 832  REVISION Line 834  REVISION
834         Last updated: 08 November 2012         Last updated: 08 November 2012
835         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
836  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
837    
838    
839    PCREBUILD(3)               Library Functions Manual               PCREBUILD(3)
840    
841    
 PCREBUILD(3)                                                      PCREBUILD(3)  
   
842    
843  NAME  NAME
844         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
845    
   
846  PCRE BUILD-TIME OPTIONS  PCRE BUILD-TIME OPTIONS
847    
848         This  document  describes  the  optional  features  of PCRE that can be         This  document  describes  the  optional  features  of PCRE that can be
# Line 1322  REVISION Line 1324  REVISION
1324         Last updated: 30 October 2012         Last updated: 30 October 2012
1325         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
1326  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
1327    
1328    
1329    PCREMATCHING(3)            Library Functions Manual            PCREMATCHING(3)
1330    
1331    
 PCREMATCHING(3)                                                PCREMATCHING(3)  
   
1332    
1333  NAME  NAME
1334         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
1335    
   
1336  PCRE MATCHING ALGORITHMS  PCRE MATCHING ALGORITHMS
1337    
1338         This document describes the two different algorithms that are available         This document describes the two different algorithms that are available
# Line 1531  REVISION Line 1533  REVISION
1533         Last updated: 08 January 2012         Last updated: 08 January 2012
1534         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
1535  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
1536    
1537    
1538    PCREAPI(3)                 Library Functions Manual                 PCREAPI(3)
1539    
1540    
 PCREAPI(3)                                                          PCREAPI(3)  
   
1541    
1542  NAME  NAME
1543         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
# Line 2707  INFORMATION ABOUT A PATTERN Line 2710  INFORMATION ABOUT A PATTERN
2710           PCRE_INFO_MAXLOOKBEHIND           PCRE_INFO_MAXLOOKBEHIND
2711    
2712         Return  the  number of characters (NB not bytes) in the longest lookbe-         Return  the  number of characters (NB not bytes) in the longest lookbe-
2713         hind assertion in the pattern. Note that the simple assertions  \b  and         hind assertion in the pattern. This information is  useful  when  doing
2714         \B  require a one-character lookbehind. This information is useful when         multi-segment matching using the partial matching facilities. Note that
2715         doing multi-segment matching using the partial matching facilities.         the simple assertions \b and \B require a one-character lookbehind.  \A
2716           also  registers a one-character lookbehind, though it does not actually
2717           inspect the previous character. This is to ensure  that  at  least  one
2718           character  from  the old segment is retained when a new segment is pro-
2719           cessed. Otherwise, if there are no lookbehinds in the pattern, \A might
2720           match incorrectly at the start of a new segment.
2721    
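The retention rule described for PCRE_INFO_MAXLOOKBEHIND can be sketched as a small offset calculation. This is only an illustration: retain_offset() is a hypothetical helper, not a PCRE function, and it assumes the 8-bit library without UTF-8, so that one character is one byte.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. Given the length of the segment
   just processed and the value reported by PCRE_INFO_MAXLOOKBEHIND, return
   the offset from which the old segment should be kept, so that lookbehinds
   (including the one-character lookbehind registered by \b, \B and \A) can
   still see the preceding text. Assumes one byte per character. */
size_t retain_offset(size_t segment_len, int max_lookbehind)
{
    size_t keep = (max_lookbehind > 0) ? (size_t)max_lookbehind : 0;
    if (keep >= segment_len)
        return 0;                 /* keep the whole old segment */
    return segment_len - keep;    /* keep only the last 'keep' characters */
}
```

For a 10-byte segment and a maximum lookbehind of 1, the helper returns 9, meaning only the final byte is carried over into the next segment.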
2722           PCRE_INFO_MINLENGTH           PCRE_INFO_MINLENGTH
2723    
2724         If the pattern was studied and a minimum length  for  matching  subject         If  the  pattern  was studied and a minimum length for matching subject
2725         strings  was  computed,  its  value is returned. Otherwise the returned         strings was computed, its value is  returned.  Otherwise  the  returned
2726         value is -1. The value is a number of characters, which in  UTF-8  mode         value  is  -1. The value is a number of characters, which in UTF-8 mode
2727         may  be  different from the number of bytes. The fourth argument should         may be different from the number of bytes. The fourth  argument  should
2728         point to an int variable. A non-negative value is a lower bound to  the         point  to an int variable. A non-negative value is a lower bound to the
2729         length  of  any  matching  string. There may not be any strings of that         length of any matching string. There may not be  any  strings  of  that
2730         length that do actually match, but every string that does match  is  at         length  that  do actually match, but every string that does match is at
2731         least that long.         least that long.
2732    
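Because a non-negative PCRE_INFO_MINLENGTH value is a lower bound, an application could use it to skip subjects that are too short before calling the matcher. A minimal sketch, assuming the caller already knows the subject length in characters; subject_too_short() is a hypothetical name, not a PCRE function.

```c
/* Hypothetical pre-check, not part of PCRE. min_length is the value obtained
   via PCRE_INFO_MINLENGTH (-1 when no minimum was computed); subject_chars
   is the subject length in characters. Returns 1 when no match is possible
   according to the lower bound, otherwise 0. */
int subject_too_short(int min_length, int subject_chars)
{
    if (min_length < 0)
        return 0;                      /* no bound available: cannot reject */
    return subject_chars < min_length; /* shorter than every possible match */
}
```

A result of 0 does not mean the subject matches; it only means the bound does not rule it out.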
2733           PCRE_INFO_NAMECOUNT           PCRE_INFO_NAMECOUNT
2734           PCRE_INFO_NAMEENTRYSIZE           PCRE_INFO_NAMEENTRYSIZE
2735           PCRE_INFO_NAMETABLE           PCRE_INFO_NAMETABLE
2736    
2737         PCRE  supports the use of named as well as numbered capturing parenthe-         PCRE supports the use of named as well as numbered capturing  parenthe-
2738         ses. The names are just an additional way of identifying the  parenthe-         ses.  The names are just an additional way of identifying the parenthe-
2739         ses, which still acquire numbers. Several convenience functions such as         ses, which still acquire numbers. Several convenience functions such as
2740         pcre_get_named_substring() are provided for  extracting  captured  sub-         pcre_get_named_substring()  are  provided  for extracting captured sub-
2741         strings  by  name. It is also possible to extract the data directly, by         strings by name. It is also possible to extract the data  directly,  by
2742         first converting the name to a number in order to  access  the  correct         first  converting  the  name to a number in order to access the correct
2743         pointers in the output vector (described with pcre_exec() below). To do         pointers in the output vector (described with pcre_exec() below). To do
2744         the conversion, you need  to  use  the  name-to-number  map,  which  is         the  conversion,  you  need  to  use  the  name-to-number map, which is
2745         described by these three values.         described by these three values.
2746    
2747         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
2748         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
2749         of  each  entry;  both  of  these  return  an int value. The entry size         of each entry; both of these  return  an  int  value.  The  entry  size
2750         depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns         depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns
2751         a pointer to the first entry of the table. This is a pointer to char in         a pointer to the first entry of the table. This is a pointer to char in
2752         the 8-bit library, where the first two bytes of each entry are the num-         the 8-bit library, where the first two bytes of each entry are the num-
2753         ber  of  the capturing parenthesis, most significant byte first. In the         ber of the capturing parenthesis, most significant byte first.  In  the
2754         16-bit library, the pointer points to 16-bit data units, the  first  of         16-bit  library,  the pointer points to 16-bit data units, the first of
2755         which  contains  the  parenthesis  number.   In the 32-bit library, the         which contains the parenthesis number.   In  the  32-bit  library,  the
2756         pointer points to 32-bit data units, the first of  which  contains  the         pointer  points  to  32-bit data units, the first of which contains the
2757         parenthesis  number.  The  rest of the entry is the corresponding name,         parenthesis number. The rest of the entry is  the  corresponding  name,
2758         zero terminated.         zero terminated.
2759    
2760         The names are in alphabetical order. Duplicate names may appear if  (?|         The  names are in alphabetical order. Duplicate names may appear if (?|
2761         is used to create multiple groups with the same number, as described in         is used to create multiple groups with the same number, as described in
2762         the section on duplicate subpattern numbers in  the  pcrepattern  page.         the  section  on  duplicate subpattern numbers in the pcrepattern page.
2763         Duplicate  names  for  subpatterns with different numbers are permitted         Duplicate names for subpatterns with different  numbers  are  permitted
2764         only if PCRE_DUPNAMES is set. In all cases  of  duplicate  names,  they         only  if  PCRE_DUPNAMES  is  set. In all cases of duplicate names, they
2765         appear  in  the table in the order in which they were found in the pat-         appear in the table in the order in which they were found in  the  pat-
2766         tern. In the absence of (?| this is the  order  of  increasing  number;         tern.  In  the  absence  of (?| this is the order of increasing number;
2767         when (?| is used this is not necessarily the case because later subpat-         when (?| is used this is not necessarily the case because later subpat-
2768         terns may have lower numbers.         terns may have lower numbers.
2769    
2770         As a simple example of the name/number table,  consider  the  following         As  a  simple  example of the name/number table, consider the following
2771         pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is         pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is
2772         set, so white space - including newlines - is ignored):         set, so white space - including newlines - is ignored):
2773    
2774           (?<date> (?<year>(\d\d)?\d\d) -           (?<date> (?<year>(\d\d)?\d\d) -
2775           (?<month>\d\d) - (?<day>\d\d) )           (?<month>\d\d) - (?<day>\d\d) )
2776    
2777         There are four named subpatterns, so the table has  four  entries,  and         There  are  four  named subpatterns, so the table has four entries, and
2778         each  entry  in the table is eight bytes long. The table is as follows,         each entry in the table is eight bytes long. The table is  as  follows,
2779         with non-printing bytes shown in hexadecimal, and undefined bytes shown         with non-printing bytes shown in hexadecimal, and undefined bytes shown
2780         as ??:         as ??:
2781    
# Line 2776  INFORMATION ABOUT A PATTERN Line 2784  INFORMATION ABOUT A PATTERN
2784           00 04 m  o  n  t  h  00           00 04 m  o  n  t  h  00
2785           00 02 y  e  a  r  00 ??           00 02 y  e  a  r  00 ??
2786    
2787         When  writing  code  to  extract  data from named subpatterns using the         When writing code to extract data  from  named  subpatterns  using  the
2788         name-to-number map, remember that the length of the entries  is  likely         name-to-number  map,  remember that the length of the entries is likely
2789         to be different for each compiled pattern.         to be different for each compiled pattern.
2790    
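The 8-bit table layout described above (fixed-size entries, a two-byte group number with the most significant byte first, then a zero-terminated name) can be walked as sketched here. lookup_group_number() and example_table are hypothetical names for illustration. The table holds only the "month" and "year" entries visible in this diff, with the undefined ?? byte filled with zero. In real code the table pointer, entry count and entry size would come from PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE: linear search of an 8-bit
   name-to-number table. Returns the group number, or -1 if the name
   is not present. */
int lookup_group_number(const unsigned char *table, int name_count,
                        int entry_size, const char *name)
{
    assert(table != NULL && name != NULL && entry_size > 2);
    for (int i = 0; i < name_count; i++) {
        /* Entries are entry_size bytes apart; the size differs per pattern. */
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        /* Bytes 0-1: group number, most significant byte first.
           Bytes 2..: zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* Two eight-byte entries copied from the example table (?? written as 0). */
const unsigned char example_table[] = {
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00
};
```

Since the names are stored in alphabetical order, a binary search would also work; the linear scan keeps the sketch short.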
2791           PCRE_INFO_OKPARTIAL           PCRE_INFO_OKPARTIAL
2792    
2793         Return  1  if  the  pattern  can  be  used  for  partial  matching with         Return 1  if  the  pattern  can  be  used  for  partial  matching  with
2794         pcre_exec(), otherwise 0. The fourth argument should point  to  an  int         pcre_exec(),  otherwise  0.  The fourth argument should point to an int
2795         variable.  From  release  8.00,  this  always  returns  1,  because the         variable. From  release  8.00,  this  always  returns  1,  because  the
2796         restrictions that previously applied  to  partial  matching  have  been         restrictions  that  previously  applied  to  partial matching have been
2797         lifted.  The  pcrepartial documentation gives details of partial match-         lifted. The pcrepartial documentation gives details of  partial  match-
2798         ing.         ing.
2799    
2800           PCRE_INFO_OPTIONS           PCRE_INFO_OPTIONS
2801    
2802         Return a copy of the options with which the pattern was  compiled.  The         Return  a  copy of the options with which the pattern was compiled. The
2803         fourth  argument  should  point to an unsigned long int variable. These         fourth argument should point to an unsigned long  int  variable.  These
2804         option bits are those specified in the call to pcre_compile(), modified         option bits are those specified in the call to pcre_compile(), modified
2805         by any top-level option settings at the start of the pattern itself. In         by any top-level option settings at the start of the pattern itself. In
2806         other words, they are the options that will be in force  when  matching         other  words,  they are the options that will be in force when matching
2807         starts.  For  example, if the pattern /(?im)abc(?-i)d/ is compiled with         starts. For example, if the pattern /(?im)abc(?-i)d/ is  compiled  with
2808         the PCRE_EXTENDED option, the result is PCRE_CASELESS,  PCRE_MULTILINE,         the  PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
2809         and PCRE_EXTENDED.         and PCRE_EXTENDED.
2810    
2811         A  pattern  is  automatically  anchored by PCRE if all of its top-level         A pattern is automatically anchored by PCRE if  all  of  its  top-level
2812         alternatives begin with one of the following:         alternatives begin with one of the following:
2813    
2814           ^     unless PCRE_MULTILINE is set           ^     unless PCRE_MULTILINE is set
# Line 2814  INFORMATION ABOUT A PATTERN Line 2822  INFORMATION ABOUT A PATTERN
2822    
2823           PCRE_INFO_SIZE           PCRE_INFO_SIZE
2824    
2825         Return  the size of the compiled pattern in bytes (for both libraries).         Return the size of the compiled pattern in bytes (for both  libraries).
2826         The fourth argument should point to a size_t variable. This value  does         The  fourth argument should point to a size_t variable. This value does
2827         not  include  the  size  of  the  pcre  structure  that  is returned by         not include the  size  of  the  pcre  structure  that  is  returned  by
2828         pcre_compile(). The value that is passed as the argument  to  pcre_mal-         pcre_compile().  The  value that is passed as the argument to pcre_mal-
2829         loc()  when pcre_compile() is getting memory in which to place the com-         loc() when pcre_compile() is getting memory in which to place the  com-
2830         piled data is the value returned by this option plus the  size  of  the         piled  data  is  the value returned by this option plus the size of the
2831         pcre  structure. Studying a compiled pattern, with or without JIT, does         pcre structure. Studying a compiled pattern, with or without JIT,  does
2832         not alter the value returned by this option.         not alter the value returned by this option.
2833    
2834           PCRE_INFO_STUDYSIZE           PCRE_INFO_STUDYSIZE
2835    
2836         Return the size in bytes of the data block pointed to by the study_data         Return the size in bytes of the data block pointed to by the study_data
2837         field  in  a  pcre_extra  block.  If pcre_extra is NULL, or there is no         field in a pcre_extra block. If pcre_extra is  NULL,  or  there  is  no
2838         study data, zero is returned. The fourth argument  should  point  to  a         study  data,  zero  is  returned. The fourth argument should point to a
2839         size_t  variable. The study_data field is set by pcre_study() to record         size_t variable. The study_data field is set by pcre_study() to  record
2840         information that will speed  up  matching  (see  the  section  entitled         information  that  will  speed  up  matching  (see the section entitled
2841         "Studying a pattern" above). The format of the study_data block is pri-         "Studying a pattern" above). The format of the study_data block is pri-
2842         vate, but its length is made available via this option so that  it  can         vate,  but  its length is made available via this option so that it can
2843         be  saved  and  restored  (see  the  pcreprecompile  documentation  for         be  saved  and  restored  (see  the  pcreprecompile  documentation  for
2844         details).         details).
2845    
2846           PCRE_INFO_FIRSTCHARACTERFLAGS           PCRE_INFO_FIRSTCHARACTERFLAGS
2847    
2848         Return information about the first data unit of any matched string, for         Return information about the first data unit of any matched string, for
2849         a  non-anchored  pattern.  The  fourth  argument should point to an int         a non-anchored pattern. The fourth argument  should  point  to  an  int
2850         variable.         variable.
2851    
2852         If there is a fixed first value, for example, the  letter  "c"  from  a         If  there  is  a  fixed first value, for example, the letter "c" from a
2853         pattern  such  as  (cat|cow|coyote),  1  is returned, and the character         pattern such as (cat|cow|coyote), 1  is  returned,  and  the  character
2854         value can be retrieved using PCRE_INFO_FIRSTCHARACTER.         value can be retrieved using PCRE_INFO_FIRSTCHARACTER.
2855    
2856         If there is no fixed first value, and if either         If there is no fixed first value, and if either
2857    
2858         (a) the pattern was compiled with the PCRE_MULTILINE option, and  every         (a)  the pattern was compiled with the PCRE_MULTILINE option, and every
2859         branch starts with "^", or         branch starts with "^", or
2860    
2861         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
# Line 2859  INFORMATION ABOUT A PATTERN Line 2867  INFORMATION ABOUT A PATTERN
2867    
2868           PCRE_INFO_FIRSTCHARACTER           PCRE_INFO_FIRSTCHARACTER
2869    
2870         Return the fixed first character  value,  if  PCRE_INFO_FIRSTCHARACTER-         Return  the  fixed  first character value, if PCRE_INFO_FIRSTCHARACTER-
2871         FLAGS returned 1; otherwise returns 0. The fourth argument should point         FLAGS returned 1; otherwise returns 0. The fourth argument should point
2872         to a uint32_t variable.         to a uint32_t variable.
2873    
2874         In the 8-bit library, the value is always less than 256. In the  16-bit         In  the 8-bit library, the value is always less than 256. In the 16-bit
2875         library  the value can be up to 0xffff. In the 32-bit library in UTF-32         library the value can be up to 0xffff. In the 32-bit library in  UTF-32
2876         mode the value can be up to 0x10ffff, and up  to  0xffffffff  when  not         mode  the  value  can  be up to 0x10ffff, and up to 0xffffffff when not
2877         using UTF-32 mode.         using UTF-32 mode.
2878    
2879         If there is no fixed first value, and if either         If there is no fixed first value, and if either
2880    
2881         (a)  the pattern was compiled with the PCRE_MULTILINE option, and every         (a) the pattern was compiled with the PCRE_MULTILINE option, and  every
2882         branch starts with "^", or         branch starts with "^", or
2883    
2884         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
2885         set (if it were set, the pattern would be anchored),         set (if it were set, the pattern would be anchored),
2886    
2887         -1  is  returned, indicating that the pattern matches only at the start         -1 is returned, indicating that the pattern matches only at  the  start
2888         of a subject string or after any newline within the  string.  Otherwise         of  a  subject string or after any newline within the string. Otherwise
2889         -2 is returned. For anchored patterns, -2 is returned.         -2 is returned. For anchored patterns, -2 is returned.
2890    
2891           PCRE_INFO_REQUIREDCHARFLAGS           PCRE_INFO_REQUIREDCHARFLAGS
2892    
2893         Returns  1 if there is a rightmost literal data unit that must exist in         Returns 1 if there is a rightmost literal data unit that must exist  in
2894         any matched string, other than at its start. The fourth argument should         any matched string, other than at its start. The fourth argument should
2895         point  to an int variable. If there is no such value, 0 is returned. If         point to an int variable. If there is no such value, 0 is returned.  If
2896         returning  1,  the  character  value  itself  can  be  retrieved  using         returning  1,  the  character  value  itself  can  be  retrieved  using
2897         PCRE_INFO_REQUIREDCHAR.         PCRE_INFO_REQUIREDCHAR.
2898    
2899         For anchored patterns, a last literal value is recorded only if it fol-         For anchored patterns, a last literal value is recorded only if it fol-
2900         lows something  of  variable  length.  For  example,  for  the  pattern         lows  something  of  variable  length.  For  example,  for  the pattern
2901         /^a\d+z\d+/  the   returned   value  is  1  (with  "z"  returned   from         /^a\d+z\d+/  the  returned  value   is  1  (with  "z"   returned   from
2902         PCRE_INFO_REQUIREDCHAR), but for /^a\dz\d/ the returned value is 0.         PCRE_INFO_REQUIREDCHAR), but for /^a\dz\d/ the returned value is 0.
2903    
2904           PCRE_INFO_REQUIREDCHAR           PCRE_INFO_REQUIREDCHAR
2905    
2906         Return the value of the rightmost literal data unit that must exist  in         Return  the value of the rightmost literal data unit that must exist in
2907         any  matched  string, other than at its start, if such a value has been         any matched string, other than at its start, if such a value  has  been
2908         recorded. The fourth argument should point to an uint32_t variable.  If         recorded.  The fourth argument should point to an uint32_t variable. If
2909         there is no such value, 0 is returned.         there is no such value, 0 is returned.
2910    
2911    
# Line 2905  REFERENCE COUNTS Line 2913  REFERENCE COUNTS
2913    
2914         int pcre_refcount(pcre *code, int adjust);         int pcre_refcount(pcre *code, int adjust);
2915    
2916         The  pcre_refcount()  function is used to maintain a reference count in         The pcre_refcount() function is used to maintain a reference  count  in
2917         the data block that contains a compiled pattern. It is provided for the         the data block that contains a compiled pattern. It is provided for the
2918         benefit  of  applications  that  operate  in an object-oriented manner,         benefit of applications that  operate  in  an  object-oriented  manner,
2919         where different parts of the application may be using the same compiled         where different parts of the application may be using the same compiled
2920         pattern, but you want to free the block when they are all done.         pattern, but you want to free the block when they are all done.
2921    
2922         When a pattern is compiled, the reference count field is initialized to         When a pattern is compiled, the reference count field is initialized to
2923         zero.  It is changed only by calling this function, whose action is  to         zero.   It is changed only by calling this function, whose action is to
2924         add  the  adjust  value  (which may be positive or negative) to it. The         add the adjust value (which may be positive or  negative)  to  it.  The
2925         yield of the function is the new value. However, the value of the count         yield of the function is the new value. However, the value of the count
2926         is  constrained to lie between 0 and 65535, inclusive. If the new value         is constrained to lie between 0 and 65535, inclusive. If the new  value
2927         is outside these limits, it is forced to the appropriate limit value.         is outside these limits, it is forced to the appropriate limit value.
2928    
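The arithmetic described for pcre_refcount() amounts to a saturating add. The function below is a hypothetical model of that documented behaviour, not PCRE's own source: it adds the adjustment and forces the result into the range 0 to 65535.

```c
/* Hypothetical model of the documented rule, not PCRE's implementation.
   Adds 'adjust' to 'current' and clamps the result to 0..65535. */
int adjusted_refcount(int current, int adjust)
{
    long long next = (long long)current + (long long)adjust;
    if (next < 0)
        next = 0;        /* forced to the lower limit */
    if (next > 65535)
        next = 65535;    /* forced to the upper limit */
    return (int)next;
}
```

Under this model a caller that releases more references than it took would see the count stay at zero rather than go negative.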
2929         Except when it is zero, the reference count is not correctly  preserved         Except  when it is zero, the reference count is not correctly preserved
2930         if  a  pattern  is  compiled on one host and then transferred to a host         if a pattern is compiled on one host and then  transferred  to  a  host
2931         whose byte-order is different. (This seems a highly unlikely scenario.)         whose byte-order is different. (This seems a highly unlikely scenario.)
2932    
2933    
# Line 2929  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2937  MATCHING A PATTERN: THE TRADITIONAL FUNC
2937              const char *subject, int length, int startoffset,              const char *subject, int length, int startoffset,
2938              int options, int *ovector, int ovecsize);              int options, int *ovector, int ovecsize);
2939    
2940         The function pcre_exec() is called to match a subject string against  a         The  function pcre_exec() is called to match a subject string against a
2941         compiled  pattern, which is passed in the code argument. If the pattern         compiled pattern, which is passed in the code argument. If the  pattern
2942         was studied, the result of the study should  be  passed  in  the  extra         was  studied,  the  result  of  the study should be passed in the extra
2943         argument.  You  can call pcre_exec() with the same code and extra argu-         argument. You can call pcre_exec() with the same code and  extra  argu-
2944         ments as many times as you like, in order to  match  different  subject         ments  as  many  times as you like, in order to match different subject
2945         strings with the same pattern.         strings with the same pattern.
2946    
2947         This  function  is  the  main  matching facility of the library, and it         This function is the main matching facility  of  the  library,  and  it
2948         operates in a Perl-like manner. For specialist use  there  is  also  an         operates  in  a  Perl-like  manner. For specialist use there is also an
2949         alternative  matching function, which is described below in the section         alternative matching function, which is described below in the  section
2950         about the pcre_dfa_exec() function.         about the pcre_dfa_exec() function.
2951    
2952         In most applications, the pattern will have been compiled (and  option-         In  most applications, the pattern will have been compiled (and option-
2953         ally  studied)  in the same process that calls pcre_exec(). However, it         ally studied) in the same process that calls pcre_exec().  However,  it
2954         is possible to save compiled patterns and study data, and then use them         is possible to save compiled patterns and study data, and then use them
2955         later  in  different processes, possibly even on different hosts. For a         later in different processes, possibly even on different hosts.  For  a
2956         discussion about this, see the pcreprecompile documentation.         discussion about this, see the pcreprecompile documentation.
2957    
2958         Here is an example of a simple call to pcre_exec():         Here is an example of a simple call to pcre_exec():
# Line 2963  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2971  MATCHING A PATTERN: THE TRADITIONAL FUNC
2971    
2972     Extra data for pcre_exec()     Extra data for pcre_exec()
2973    
2974         If the extra argument is not NULL, it must point to a  pcre_extra  data         If  the  extra argument is not NULL, it must point to a pcre_extra data
2975         block.  The pcre_study() function returns such a block (when it doesn't         block. The pcre_study() function returns such a block (when it  doesn't
2976         return NULL), but you can also create one for yourself, and pass  addi-         return  NULL), but you can also create one for yourself, and pass addi-
2977         tional  information  in it. The pcre_extra block contains the following         tional information in it. The pcre_extra block contains  the  following
2978         fields (not necessarily in this order):         fields (not necessarily in this order):
2979    
2980           unsigned long int flags;           unsigned long int flags;
# Line 2978  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2986  MATCHING A PATTERN: THE TRADITIONAL FUNC
2986           const unsigned char *tables;           const unsigned char *tables;
2987           unsigned char **mark;           unsigned char **mark;
2988    
2989         In the 16-bit version of  this  structure,  the  mark  field  has  type         In  the  16-bit  version  of  this  structure,  the mark field has type
2990         "PCRE_UCHAR16 **".         "PCRE_UCHAR16 **".
2991    
2992         In  the  32-bit  version  of  this  structure,  the mark field has type         In the 32-bit version of  this  structure,  the  mark  field  has  type
2993         "PCRE_UCHAR32 **".         "PCRE_UCHAR32 **".
2994    
2995         The flags field is used to specify which of the other fields  are  set.         The  flags  field is used to specify which of the other fields are set.
2996         The flag bits are:         The flag bits are:
2997    
2998           PCRE_EXTRA_CALLOUT_DATA           PCRE_EXTRA_CALLOUT_DATA
# Line 2995  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3003  MATCHING A PATTERN: THE TRADITIONAL FUNC
3003           PCRE_EXTRA_STUDY_DATA           PCRE_EXTRA_STUDY_DATA
3004           PCRE_EXTRA_TABLES           PCRE_EXTRA_TABLES
3005    
3006         Other  flag  bits should be set to zero. The study_data field and some-         Other flag bits should be set to zero. The study_data field  and  some-
3007         times the executable_jit field are set in the pcre_extra block that  is         times  the executable_jit field are set in the pcre_extra block that is
3008         returned  by pcre_study(), together with the appropriate flag bits. You         returned by pcre_study(), together with the appropriate flag bits.  You
3009         should not set these yourself, but you may add to the block by  setting         should  not set these yourself, but you may add to the block by setting
3010         other fields and their corresponding flag bits.         other fields and their corresponding flag bits.
3011    
3012         The match_limit field provides a means of preventing PCRE from using up         The match_limit field provides a means of preventing PCRE from using up
3013         a vast amount of resources when running patterns that are not going  to         a  vast amount of resources when running patterns that are not going to
3014         match,  but  which  have  a very large number of possibilities in their         match, but which have a very large number  of  possibilities  in  their
3015         search trees. The classic example is a pattern that uses nested  unlim-         search  trees. The classic example is a pattern that uses nested unlim-
3016         ited repeats.         ited repeats.
3017    
3018         Internally,  pcre_exec() uses a function called match(), which it calls         Internally, pcre_exec() uses a function called match(), which it  calls
3019         repeatedly (sometimes recursively). The limit  set  by  match_limit  is         repeatedly  (sometimes  recursively).  The  limit set by match_limit is
3020         imposed  on the number of times this function is called during a match,         imposed on the number of times this function is called during a  match,
3021         which has the effect of limiting the amount of  backtracking  that  can         which  has  the  effect of limiting the amount of backtracking that can
3022         take place. For patterns that are not anchored, the count restarts from         take place. For patterns that are not anchored, the count restarts from
3023         zero for each position in the subject string.         zero for each position in the subject string.
3024    
3025         When pcre_exec() is called with a pattern that was successfully studied         When pcre_exec() is called with a pattern that was successfully studied
3026         with  a  JIT  option, the way that the matching is executed is entirely         with a JIT option, the way that the matching is  executed  is  entirely
3027         different.  However, there is still the possibility of runaway matching         different.  However, there is still the possibility of runaway matching
3028         that goes on for a very long time, and so the match_limit value is also         that goes on for a very long time, and so the match_limit value is also
3029         used in this case (but in a different way) to limit how long the match-         used in this case (but in a different way) to limit how long the match-
3030         ing can continue.         ing can continue.
3031    
3032         The  default  value  for  the  limit can be set when PCRE is built; the         The default value for the limit can be set  when  PCRE  is  built;  the
3033         default default is 10 million, which handles all but the  most  extreme         default  default  is 10 million, which handles all but the most extreme
3034         cases.  You  can  override  the  default by supplying pcre_exec() with a         cases. You can override the default  by  supplying  pcre_exec()  with  a
3035         pcre_extra    block    in    which    match_limit    is    set,     and         pcre_extra     block    in    which    match_limit    is    set,    and
3036         PCRE_EXTRA_MATCH_LIMIT  is  set  in  the  flags  field. If the limit is         PCRE_EXTRA_MATCH_LIMIT is set in the  flags  field.  If  the  limit  is
3037         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
3038    
3039         The match_limit_recursion field is similar to match_limit, but  instead         The  match_limit_recursion field is similar to match_limit, but instead
3040         of limiting the total number of times that match() is called, it limits         of limiting the total number of times that match() is called, it limits
3041         the depth of recursion. The recursion depth is a  smaller  number  than         the  depth  of  recursion. The recursion depth is a smaller number than
3042         the  total number of calls, because not all calls to match() are recur-         the total number of calls, because not all calls to match() are  recur-
3043         sive.  This limit is of use only if it is set smaller than match_limit.         sive.  This limit is of use only if it is set smaller than match_limit.
3044    
3045         Limiting the recursion depth limits the amount of  machine  stack  that         Limiting  the  recursion  depth limits the amount of machine stack that
3046         can  be used, or, when PCRE has been compiled to use memory on the heap         can be used, or, when PCRE has been compiled to use memory on the  heap
3047         instead of the stack, the amount of heap memory that can be used.  This         instead  of the stack, the amount of heap memory that can be used. This
3048         limit  is not relevant, and is ignored, when matching is done using JIT         limit is not relevant, and is ignored, when matching is done using  JIT
3049         compiled code.         compiled code.
3050    
3051         The default value for match_limit_recursion can be  set  when  PCRE  is         The  default  value  for  match_limit_recursion can be set when PCRE is
3052         built;  the  default  default  is  the  same  value  as the default for         built; the default default  is  the  same  value  as  the  default  for
3053         match_limit. You can override the default by supplying pcre_exec()  with         match_limit.  You can override the default by supplying pcre_exec() with
3054         a   pcre_extra   block  in  which  match_limit_recursion  is  set,  and         a  pcre_extra  block  in  which  match_limit_recursion  is   set,   and
3055         PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in  the  flags  field.  If  the         PCRE_EXTRA_MATCH_LIMIT_RECURSION  is  set  in  the  flags field. If the
3056         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
3057    
3058         The  callout_data  field is used in conjunction with the "callout" fea-         The callout_data field is used in conjunction with the  "callout"  fea-
3059         ture, and is described in the pcrecallout documentation.         ture, and is described in the pcrecallout documentation.
3060    
3061         The tables field  is  used  to  pass  a  character  tables  pointer  to         The  tables  field  is  used  to  pass  a  character  tables pointer to
3062         pcre_exec();  this overrides the value that is stored with the compiled         pcre_exec(); this overrides the value that is stored with the  compiled
3063         pattern. A non-NULL value is stored with the compiled pattern  only  if         pattern.  A  non-NULL value is stored with the compiled pattern only if
3064         custom  tables  were  supplied to pcre_compile() via its tableptr argu-         custom tables were supplied to pcre_compile() via  its  tableptr  argu-
3065         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
3066         PCRE's  internal  tables  to be used. This facility is helpful when re-         PCRE's internal tables to be used. This facility is  helpful  when  re-
3067         using patterns that have been saved after compiling  with  an  external         using  patterns  that  have been saved after compiling with an external
3068         set  of  tables,  because  the  external tables might be at a different         set of tables, because the external tables  might  be  at  a  different
3069         address when pcre_exec() is called. See the  pcreprecompile  documenta-         address  when  pcre_exec() is called. See the pcreprecompile documenta-
3070         tion for a discussion of saving compiled patterns for later use.         tion for a discussion of saving compiled patterns for later use.
3071    
3072         If  PCRE_EXTRA_MARK  is  set in the flags field, the mark field must be         If PCRE_EXTRA_MARK is set in the flags field, the mark  field  must  be
3073         set to point to a suitable variable. If the pattern contains any  back-         set  to point to a suitable variable. If the pattern contains any back-
3074         tracking  control verbs such as (*MARK:NAME), and the execution ends up         tracking control verbs such as (*MARK:NAME), and the execution ends  up
3075         with a name to pass back, a pointer to the  name  string  (zero  termi-         with  a  name  to  pass back, a pointer to the name string (zero termi-
3076         nated)  is  placed  in  the  variable pointed to by the mark field. The         nated) is placed in the variable pointed to  by  the  mark  field.  The
3077         names are within the compiled pattern; if you wish  to  retain  such  a         names  are  within  the  compiled pattern; if you wish to retain such a
3078         name  you must copy it before freeing the memory of a compiled pattern.         name you must copy it before freeing the memory of a compiled  pattern.
3079         If there is no name to pass back, the variable pointed to by  the  mark         If  there  is no name to pass back, the variable pointed to by the mark
3080         field  is  set  to NULL. For details of the backtracking control verbs,         field is set to NULL. For details of the  backtracking  control  verbs,
3081         see the section entitled "Backtracking control" in the pcrepattern doc-         see the section entitled "Backtracking control" in the pcrepattern doc-
3082         umentation.         umentation.
3083    
3084     Option bits for pcre_exec()     Option bits for pcre_exec()
3085    
3086         The  unused  bits of the options argument for pcre_exec() must be zero.         The unused bits of the options argument for pcre_exec() must  be  zero.
3087         The only bits that may  be  set  are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,         The  only  bits  that  may  be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx,
3088         PCRE_NOTBOL,    PCRE_NOTEOL,    PCRE_NOTEMPTY,   PCRE_NOTEMPTY_ATSTART,         PCRE_NOTBOL,   PCRE_NOTEOL,    PCRE_NOTEMPTY,    PCRE_NOTEMPTY_ATSTART,
3089         PCRE_NO_START_OPTIMIZE,  PCRE_NO_UTF8_CHECK,   PCRE_PARTIAL_HARD,   and         PCRE_NO_START_OPTIMIZE,   PCRE_NO_UTF8_CHECK,   PCRE_PARTIAL_HARD,  and
3090         PCRE_PARTIAL_SOFT.         PCRE_PARTIAL_SOFT.
3091    
3092         If  the  pattern  was successfully studied with one of the just-in-time         If the pattern was successfully studied with one  of  the  just-in-time
3093         (JIT) compile options, the only supported options for JIT execution are         (JIT) compile options, the only supported options for JIT execution are
3094         PCRE_NO_UTF8_CHECK,     PCRE_NOTBOL,     PCRE_NOTEOL,    PCRE_NOTEMPTY,         PCRE_NO_UTF8_CHECK,    PCRE_NOTBOL,     PCRE_NOTEOL,     PCRE_NOTEMPTY,
3095         PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If  an         PCRE_NOTEMPTY_ATSTART,  PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an
3096         unsupported  option  is  used, JIT execution is disabled and the normal         unsupported option is used, JIT execution is disabled  and  the  normal
3097         interpretive code in pcre_exec() is run.         interpretive code in pcre_exec() is run.
3098    
3099           PCRE_ANCHORED           PCRE_ANCHORED
3100    
3101         The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first         The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first
3102         matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or         matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or
3103         turned out to be anchored by virtue of its contents, it cannot be  made         turned  out to be anchored by virtue of its contents, it cannot be made
3104         unanchored at matching time.         unanchored at matching time.
3105    
3106           PCRE_BSR_ANYCRLF           PCRE_BSR_ANYCRLF
3107           PCRE_BSR_UNICODE           PCRE_BSR_UNICODE
3108    
3109         These options (which are mutually exclusive) control what the \R escape         These options (which are mutually exclusive) control what the \R escape
3110         sequence matches. The choice is either to match only CR, LF,  or  CRLF,         sequence  matches.  The choice is either to match only CR, LF, or CRLF,
3111         or  to  match  any Unicode newline sequence. These options override the         or to match any Unicode newline sequence. These  options  override  the
3112         choice that was made or defaulted when the pattern was compiled.         choice that was made or defaulted when the pattern was compiled.
3113    
3114           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
# Line 3109  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3117  MATCHING A PATTERN: THE TRADITIONAL FUNC
3117           PCRE_NEWLINE_ANYCRLF           PCRE_NEWLINE_ANYCRLF
3118           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
3119    
3120         These options override  the  newline  definition  that  was  chosen  or         These  options  override  the  newline  definition  that  was chosen or
3121         defaulted  when the pattern was compiled. For details, see the descrip-         defaulted when the pattern was compiled. For details, see the  descrip-
3122         tion of pcre_compile()  above.  During  matching,  the  newline  choice         tion  of  pcre_compile()  above.  During  matching,  the newline choice
3123         affects  the  behaviour  of the dot, circumflex, and dollar metacharac-         affects the behaviour of the dot, circumflex,  and  dollar  metacharac-
3124         ters. It may also alter the way the match position is advanced after  a         ters.  It may also alter the way the match position is advanced after a
3125         match failure for an unanchored pattern.         match failure for an unanchored pattern.
3126    
3127         When  PCRE_NEWLINE_CRLF,  PCRE_NEWLINE_ANYCRLF,  or PCRE_NEWLINE_ANY is         When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY  is
3128         set, and a match attempt for an unanchored pattern fails when the  cur-         set,  and a match attempt for an unanchored pattern fails when the cur-
3129         rent  position  is  at  a  CRLF  sequence,  and the pattern contains no         rent position is at a  CRLF  sequence,  and  the  pattern  contains  no
3130         explicit matches for  CR  or  LF  characters,  the  match  position  is         explicit  matches  for  CR  or  LF  characters,  the  match position is
3131         advanced by two characters instead of one, in other words, to after the         advanced by two characters instead of one, in other words, to after the
3132         CRLF.         CRLF.
3133    
3134         The above rule is a compromise that makes the most common cases work as         The above rule is a compromise that makes the most common cases work as
3135         expected.  For  example,  if  the  pattern  is .+A (and the PCRE_DOTALL         expected. For example, if the  pattern  is  .+A  (and  the  PCRE_DOTALL
3136         option is not set), it does not match the string "\r\nA" because, after         option is not set), it does not match the string "\r\nA" because, after
3137         failing  at the start, it skips both the CR and the LF before retrying.         failing at the start, it skips both the CR and the LF before  retrying.
3138         However, the pattern [\r\n]A does match that string,  because  it  con-         However,  the  pattern  [\r\n]A does match that string, because it con-
3139         tains an explicit CR or LF reference, and so advances only by one char-         tains an explicit CR or LF reference, and so advances only by one char-
3140         acter after the first failure.         acter after the first failure.
3141    
3142         An explicit match for CR or LF is either a literal appearance of one of         An explicit match for CR or LF is either a literal appearance of one of
3143         those  characters,  or  one  of the \r or \n escape sequences. Implicit         those characters, or one of the \r or  \n  escape  sequences.  Implicit
3144         matches such as [^X] do not count, nor does \s (which includes  CR  and         matches  such  as [^X] do not count, nor does \s (which includes CR and
3145         LF in the characters that it matches).         LF in the characters that it matches).
3146    
3147         Notwithstanding  the above, anomalous effects may still occur when CRLF         Notwithstanding the above, anomalous effects may still occur when  CRLF
3148         is a valid newline sequence and explicit \r or \n escapes appear in the         is a valid newline sequence and explicit \r or \n escapes appear in the
3149         pattern.         pattern.
3150    
3151           PCRE_NOTBOL           PCRE_NOTBOL
3152    
3153         This option specifies that the first character of the subject string is not         This option specifies that the first character of the subject string is not
3154         the beginning of a line, so the  circumflex  metacharacter  should  not         the  beginning  of  a  line, so the circumflex metacharacter should not
3155         match  before it. Setting this without PCRE_MULTILINE (at compile time)         match before it. Setting this without PCRE_MULTILINE (at compile  time)
3156         causes circumflex never to match. This option affects only  the  behav-         causes  circumflex  never to match. This option affects only the behav-
3157         iour of the circumflex metacharacter. It does not affect \A.         iour of the circumflex metacharacter. It does not affect \A.
3158    
3159           PCRE_NOTEOL           PCRE_NOTEOL
3160    
3161         This option specifies that the end of the subject string is not the end         This option specifies that the end of the subject string is not the end
3162         of a line, so the dollar metacharacter should not match it nor  (except         of  a line, so the dollar metacharacter should not match it nor (except
3163         in  multiline mode) a newline immediately before it. Setting this with-         in multiline mode) a newline immediately before it. Setting this  with-
3164         out PCRE_MULTILINE (at compile time) causes dollar never to match. This         out PCRE_MULTILINE (at compile time) causes dollar never to match. This
3165         option  affects only the behaviour of the dollar metacharacter. It does         option affects only the behaviour of the dollar metacharacter. It  does
3166         not affect \Z or \z.         not affect \Z or \z.
3167    
3168           PCRE_NOTEMPTY           PCRE_NOTEMPTY
3169    
3170         An empty string is not considered to be a valid match if this option is         An empty string is not considered to be a valid match if this option is
3171         set.  If  there are alternatives in the pattern, they are tried. If all         set. If there are alternatives in the pattern, they are tried.  If  all
3172         the alternatives match the empty string, the entire  match  fails.  For         the  alternatives  match  the empty string, the entire match fails. For
3173         example, if the pattern         example, if the pattern
3174    
3175           a?b?           a?b?
3176    
3177         is  applied  to  a  string not beginning with "a" or "b", it matches an         is applied to a string not beginning with "a" or  "b",  it  matches  an
3178         empty string at the start of the subject. With PCRE_NOTEMPTY set,  this         empty  string at the start of the subject. With PCRE_NOTEMPTY set, this
3179         match is not valid, so PCRE searches further into the string for occur-         match is not valid, so PCRE searches further into the string for occur-
3180         rences of "a" or "b".         rences of "a" or "b".
3181    
3182           PCRE_NOTEMPTY_ATSTART           PCRE_NOTEMPTY_ATSTART
3183    
3184         This is like PCRE_NOTEMPTY, except that an empty string match  that  is         This  is  like PCRE_NOTEMPTY, except that an empty string match that is
3185         not  at  the  start  of  the  subject  is  permitted. If the pattern is         not at the start of  the  subject  is  permitted.  If  the  pattern  is
3186         anchored, such a match can occur only if the pattern contains \K.         anchored, such a match can occur only if the pattern contains \K.
3187    
3188         Perl    has    no    direct    equivalent    of    PCRE_NOTEMPTY     or         Perl     has    no    direct    equivalent    of    PCRE_NOTEMPTY    or
3189         PCRE_NOTEMPTY_ATSTART,  but  it  does  make a special case of a pattern         PCRE_NOTEMPTY_ATSTART, but it does make a special  case  of  a  pattern
3190         match of the empty string within its split() function, and  when  using         match  of  the empty string within its split() function, and when using
3191         the  /g  modifier.  It  is  possible  to emulate Perl's behaviour after         the /g modifier. It is  possible  to  emulate  Perl's  behaviour  after
3192         matching a null string by first trying the match again at the same off-         matching a null string by first trying the match again at the same off-
3193         set  with  PCRE_NOTEMPTY_ATSTART  and  PCRE_ANCHORED,  and then if that         set with PCRE_NOTEMPTY_ATSTART and  PCRE_ANCHORED,  and  then  if  that
3194         fails, by advancing the starting offset (see below) and trying an ordi-         fails, by advancing the starting offset (see below) and trying an ordi-
3195         nary  match  again. There is some code that demonstrates how to do this         nary match again. There is some code that demonstrates how to  do  this
3196         in the pcredemo sample program. In the most general case, you  have  to         in  the  pcredemo sample program. In the most general case, you have to
3197         check  to  see  if the newline convention recognizes CRLF as a newline,         check to see if the newline convention recognizes CRLF  as  a  newline,
3198         and if so, and the current character is CR followed by LF, advance  the         and  if so, and the current character is CR followed by LF, advance the
3199         starting offset by two characters instead of one.         starting offset by two characters instead of one.
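
The offset-advancing step of this emulation can be sketched in plain C, independently of the library. The helper name advance_after_empty_match and its parameters are hypothetical; the caller is assumed to have already determined whether the newline convention recognizes CRLF. The utf8 parameter is an addition beyond the text above: it skips UTF-8 continuation bytes so that the new offset lands on a character boundary.

```c
#include <stddef.h>

/* Return the starting offset to use for the ordinary retry after the
   anchored, not-empty retry at `offset` has failed.
   crlf_is_newline: non-zero if CRLF is a recognized newline.
   utf8:            non-zero if the subject is a UTF-8 string.  */
size_t advance_after_empty_match(const char *subject, size_t length,
                                 size_t offset, int crlf_is_newline,
                                 int utf8)
{
    if (offset >= length)
        return length;               /* already at the end: nothing left */

    /* CR followed by LF counts as one newline: step over both. */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;

    offset++;                        /* otherwise advance one character */

    if (utf8) {
        /* Skip continuation bytes (10xxxxxx) of a multibyte character. */
        while (offset < length &&
               ((unsigned char)subject[offset] & 0xC0) == 0x80)
            offset++;
    }
    return offset;
}
```

A caller would stop its matching loop when the offset returned equals the subject length and the previous match was empty.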
3200    
3201           PCRE_NO_START_OPTIMIZE           PCRE_NO_START_OPTIMIZE
3202    
3203         There  are a number of optimizations that pcre_exec() uses at the start         There are a number of optimizations that pcre_exec() uses at the  start
3204         of a match, in order to speed up the process. For  example,  if  it  is         of  a  match,  in  order to speed up the process. For example, if it is
3205         known that an unanchored match must start with a specific character, it         known that an unanchored match must start with a specific character, it
3206         searches the subject for that character, and fails  immediately  if  it         searches  the  subject  for that character, and fails immediately if it
3207         cannot  find  it,  without actually running the main matching function.         cannot find it, without actually running the  main  matching  function.
3208         This means that a special item such as (*COMMIT) at the start of a pat-         This means that a special item such as (*COMMIT) at the start of a pat-
3209         tern  is  not  considered until after a suitable starting point for the         tern is not considered until after a suitable starting  point  for  the
3210         match has been found. When callouts or (*MARK) items are in use,  these         match  has been found. When callouts or (*MARK) items are in use, these
3211         "start-up" optimizations can cause them to be skipped if the pattern is         "start-up" optimizations can cause them to be skipped if the pattern is
3212         never actually used. The start-up optimizations are in  effect  a  pre-         never  actually  used.  The start-up optimizations are in effect a pre-
3213         scan of the subject that takes place before the pattern is run.         scan of the subject that takes place before the pattern is run.
3214    
3215         The  PCRE_NO_START_OPTIMIZE option disables the start-up optimizations,         The PCRE_NO_START_OPTIMIZE option disables the start-up  optimizations,
3216         possibly causing performance to suffer,  but  ensuring  that  in  cases         possibly  causing  performance  to  suffer,  but ensuring that in cases
3217         where  the  result is "no match", the callouts do occur, and that items         where the result is "no match", the callouts do occur, and  that  items
3218         such as (*COMMIT) and (*MARK) are considered at every possible starting         such as (*COMMIT) and (*MARK) are considered at every possible starting
3219         position  in  the  subject  string. If PCRE_NO_START_OPTIMIZE is set at         position in the subject string. If  PCRE_NO_START_OPTIMIZE  is  set  at
3220         compile time,  it  cannot  be  unset  at  matching  time.  The  use  of         compile  time,  it  cannot  be  unset  at  matching  time.  The  use of
3221         PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set, matching         PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set, matching
3222         is always done interpretively.         is always done interpretively.
3223    
3224         Setting PCRE_NO_START_OPTIMIZE can change the  outcome  of  a  matching         Setting  PCRE_NO_START_OPTIMIZE  can  change  the outcome of a matching
3225         operation.  Consider the pattern         operation.  Consider the pattern
3226    
3227           (*COMMIT)ABC           (*COMMIT)ABC
3228    
3229         When  this  is  compiled, PCRE records the fact that a match must start         When this is compiled, PCRE records the fact that a  match  must  start
3230         with the character "A". Suppose the subject  string  is  "DEFABC".  The         with  the  character  "A".  Suppose the subject string is "DEFABC". The
3231         start-up  optimization  scans along the subject, finds "A" and runs the         start-up optimization scans along the subject, finds "A" and  runs  the
3232         first match attempt from there. The (*COMMIT) item means that the  pat-         first  match attempt from there. The (*COMMIT) item means that the pat-
3233         tern  must  match the current starting position, which in this case, it         tern must match the current starting position, which in this  case,  it
3234         does. However, if the same match  is  run  with  PCRE_NO_START_OPTIMIZE         does.  However,  if  the  same match is run with PCRE_NO_START_OPTIMIZE
3235         set,  the  initial  scan  along the subject string does not happen. The         set, the initial scan along the subject string  does  not  happen.  The
3236         first match attempt is run starting  from  "D"  and  when  this  fails,         first  match  attempt  is  run  starting  from "D" and when this fails,
3237         (*COMMIT)  prevents  any  further  matches  being tried, so the overall         (*COMMIT) prevents any further matches  being  tried,  so  the  overall
3238         result is "no match". If the pattern is studied,  more  start-up  opti-         result  is  "no  match". If the pattern is studied, more start-up opti-
3239         mizations  may  be  used. For example, a minimum length for the subject         mizations may be used. For example, a minimum length  for  the  subject
3240         may be recorded. Consider the pattern         may be recorded. Consider the pattern
3241    
3242           (*MARK:A)(X|Y)           (*MARK:A)(X|Y)
3243    
3244         The minimum length for a match is one  character.  If  the  subject  is         The  minimum  length  for  a  match is one character. If the subject is
3245         "ABC",  there  will  be  attempts  to  match "ABC", "BC", "C", and then         "ABC", there will be attempts to  match  "ABC",  "BC",  "C",  and  then
3246         finally an empty string.  If the pattern is studied, the final  attempt         finally  an empty string.  If the pattern is studied, the final attempt
3247         does  not take place, because PCRE knows that the subject is too short,         does not take place, because PCRE knows that the subject is too  short,
3248         and so the (*MARK) is never encountered.  In this  case,  studying  the         and  so  the  (*MARK) is never encountered.  In this case, studying the
3249         pattern  does  not  affect the overall match result, which is still "no         pattern does not affect the overall match result, which  is  still  "no
3250         match", but it does affect the auxiliary information that is returned.         match", but it does affect the auxiliary information that is returned.
3251    
3252           PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
3253    
3254         When PCRE_UTF8 is set at compile time, the validity of the subject as a         When PCRE_UTF8 is set at compile time, the validity of the subject as a
3255         UTF-8  string is automatically checked when pcre_exec() is subsequently         UTF-8 string is automatically checked when pcre_exec() is  subsequently
3256         called.  The entire string is checked before any other processing takes         called.  The entire string is checked before any other processing takes
3257         place.  The  value  of  startoffset  is  also checked to ensure that it         place. The value of startoffset is  also  checked  to  ensure  that  it
3258         points to the start of a UTF-8 character. There is a  discussion  about         points  to  the start of a UTF-8 character. There is a discussion about
3259         the  validity  of  UTF-8 strings in the pcreunicode page. If an invalid         the validity of UTF-8 strings in the pcreunicode page.  If  an  invalid
3260         sequence  of  bytes   is   found,   pcre_exec()   returns   the   error         sequence   of   bytes   is   found,   pcre_exec()   returns  the  error
3261         PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a         PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a
3262         truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In         truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In
3263         both  cases, information about the precise nature of the error may also         both cases, information about the precise nature of the error may  also
3264         be returned (see the descriptions of these errors in the section  enti-         be  returned (see the descriptions of these errors in the section enti-
3265         tled  Error return values from pcre_exec() below).  If startoffset con-         tled Error return values from pcre_exec() below).  If startoffset  con-
3266         tains a value that does not point to the start of a UTF-8 character (or         tains a value that does not point to the start of a UTF-8 character (or
3267         to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is returned.         to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is returned.
3268    
3269         If  you  already  know that your subject is valid, and you want to skip         If you already know that your subject is valid, and you  want  to  skip
3270         these   checks   for   performance   reasons,   you   can    set    the         these    checks    for   performance   reasons,   you   can   set   the
3271         PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to         PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might  want  to
3272         do this for the second and subsequent calls to pcre_exec() if  you  are         do  this  for the second and subsequent calls to pcre_exec() if you are
3273         making  repeated  calls  to  find  all  the matches in a single subject         making repeated calls to find all  the  matches  in  a  single  subject
3274         string. However, you should be  sure  that  the  value  of  startoffset         string.  However,  you  should  be  sure  that the value of startoffset
3275         points  to  the  start of a character (or the end of the subject). When         points to the start of a character (or the end of  the  subject).  When
3276         PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a         PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a
3277         subject  or  an invalid value of startoffset is undefined. Your program         subject or an invalid value of startoffset is undefined.  Your  program
3278         may crash.         may crash.
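The startoffset requirement above can be checked cheaply before PCRE_NO_UTF8_CHECK is used. The following is a minimal sketch, not part of the PCRE API: the helper name offset_is_char_start is hypothetical, and it assumes the subject is already known to be valid UTF-8. It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx.

```c
/* Hypothetical helper, not part of PCRE. Returns 1 if offset is either
   the end of the subject or the first byte of a UTF-8 character, and 0
   otherwise. Assumes the subject itself is valid UTF-8. */
static int offset_is_char_start(const char *subject, int length, int offset)
{
    if (offset < 0 || offset > length) return 0;  /* outside the subject */
    if (offset == length) return 1;               /* end of subject is allowed */
    /* Continuation bytes look like 10xxxxxx; any other byte starts a character. */
    return (((unsigned char)subject[offset]) & 0xC0) != 0x80;
}
```

For the four-byte subject "a", U+00E9 (two bytes), "z", offsets 0, 1, 3 and 4 are acceptable and offset 2 is not, because it points into the middle of the two-byte character.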
3279    
3280           PCRE_PARTIAL_HARD           PCRE_PARTIAL_HARD
3281           PCRE_PARTIAL_SOFT           PCRE_PARTIAL_SOFT
3282    
3283         These options turn on the partial matching feature. For backwards  com-         These  options turn on the partial matching feature. For backwards com-
3284         patibility,  PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial         patibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A  partial
3285         match occurs if the end of the subject string is reached  successfully,         match  occurs if the end of the subject string is reached successfully,
3286         but  there  are not enough subject characters to complete the match. If         but there are not enough subject characters to complete the  match.  If
3287         this happens when PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set,         this happens when PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set,
3288         matching  continues  by  testing any remaining alternatives. Only if no         matching continues by testing any remaining alternatives.  Only  if  no
3289         complete match can be found is PCRE_ERROR_PARTIAL returned  instead  of         complete  match  can be found is PCRE_ERROR_PARTIAL returned instead of
3290         PCRE_ERROR_NOMATCH.  In  other  words,  PCRE_PARTIAL_SOFT says that the         PCRE_ERROR_NOMATCH. In other words,  PCRE_PARTIAL_SOFT  says  that  the
3291         caller is prepared to handle a partial match, but only if  no  complete         caller  is  prepared to handle a partial match, but only if no complete
3292         match can be found.         match can be found.
3293    
3294         If  PCRE_PARTIAL_HARD  is  set, it overrides PCRE_PARTIAL_SOFT. In this         If PCRE_PARTIAL_HARD is set, it overrides  PCRE_PARTIAL_SOFT.  In  this
3295         case, if a partial match  is  found,  pcre_exec()  immediately  returns         case,  if  a  partial  match  is found, pcre_exec() immediately returns
3296         PCRE_ERROR_PARTIAL,  without  considering  any  other  alternatives. In         PCRE_ERROR_PARTIAL, without  considering  any  other  alternatives.  In
3297         other words, when PCRE_PARTIAL_HARD is set, a partial match is  consid-         other  words, when PCRE_PARTIAL_HARD is set, a partial match is consid-
3298         ered to be more important than an alternative complete match.         ered to be more important than an alternative complete match.
3299    
3300         In  both  cases,  the portion of the string that was inspected when the         In both cases, the portion of the string that was  inspected  when  the
3301         partial match was found is set as the first matching string. There is a         partial match was found is set as the first matching string. There is a
3302         more  detailed  discussion  of partial and multi-segment matching, with         more detailed discussion of partial and  multi-segment  matching,  with
3303         examples, in the pcrepartial documentation.         examples, in the pcrepartial documentation.
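A caller that enables partial matching has to tell four outcomes apart. The sketch below shows one way to do that from the value returned by pcre_exec(). The enum and the function name classify_exec_result are hypothetical. The two numeric values are the ones given in the error list of this document and are defined here only if pcre.h has not already supplied them.

```c
/* Fallback definitions using the values listed in this document;
   normally these come from pcre.h. */
#ifndef PCRE_ERROR_NOMATCH
#define PCRE_ERROR_NOMATCH  (-1)
#endif
#ifndef PCRE_ERROR_PARTIAL
#define PCRE_ERROR_PARTIAL  (-12)
#endif

/* Hypothetical names, for illustration only. */
enum match_outcome {
    OUTCOME_MATCH,     /* complete match (rc >= 0) */
    OUTCOME_PARTIAL,   /* end of subject reached part-way through a match */
    OUTCOME_NOMATCH,   /* neither a complete nor a partial match */
    OUTCOME_ERROR      /* any other negative return */
};

static enum match_outcome classify_exec_result(int rc)
{
    if (rc >= 0) return OUTCOME_MATCH;   /* 0 is still a match (vector overflow) */
    if (rc == PCRE_ERROR_PARTIAL) return OUTCOME_PARTIAL;
    if (rc == PCRE_ERROR_NOMATCH) return OUTCOME_NOMATCH;
    return OUTCOME_ERROR;
}
```

With PCRE_PARTIAL_SOFT, OUTCOME_PARTIAL is seen only when no complete match exists. With PCRE_PARTIAL_HARD, it can be seen even when a complete match through another alternative would have been possible.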
3304    
3305     The string to be matched by pcre_exec()     The string to be matched by pcre_exec()
3306    
3307         The subject string is passed to pcre_exec() as a pointer in subject,  a         The  subject string is passed to pcre_exec() as a pointer in subject, a
3308         length  in  bytes in length, and a starting byte offset in startoffset.         length in bytes in length, and a starting byte offset  in  startoffset.
3309         If this is  negative  or  greater  than  the  length  of  the  subject,         If  this  is  negative  or  greater  than  the  length  of the subject,
3310         pcre_exec()  returns  PCRE_ERROR_BADOFFSET. When the starting offset is         pcre_exec() returns PCRE_ERROR_BADOFFSET. When the starting  offset  is
3311         zero, the search for a match starts at the beginning  of  the  subject,         zero,  the  search  for a match starts at the beginning of the subject,
3312         and this is by far the most common case. In UTF-8 mode, the byte offset         and this is by far the most common case. In UTF-8 mode, the byte offset
3313         must point to the start of a UTF-8 character (or the end  of  the  sub-         must  point  to  the start of a UTF-8 character (or the end of the sub-
3314         ject).  Unlike  the pattern string, the subject may contain binary zero         ject). Unlike the pattern string, the subject may contain  binary  zero
3315         bytes.         bytes.
3316    
3317         A non-zero starting offset is useful when searching for  another  match         A  non-zero  starting offset is useful when searching for another match
3318         in  the same subject by calling pcre_exec() again after a previous suc-         in the same subject by calling pcre_exec() again after a previous  suc-
3319         cess.  Setting startoffset differs from just passing over  a  shortened         cess.   Setting  startoffset differs from just passing over a shortened
3320         string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins         string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins
3321         with any kind of lookbehind. For example, consider the pattern         with any kind of lookbehind. For example, consider the pattern
3322    
3323           \Biss\B           \Biss\B
3324    
3325         which finds occurrences of "iss" in the middle of  words.  (\B  matches         which  finds  occurrences  of "iss" in the middle of words. (\B matches
3326         only  if  the  current position in the subject is not a word boundary.)         only if the current position in the subject is not  a  word  boundary.)
3327         When applied to the string "Mississipi" the first call  to  pcre_exec()         When  applied  to the string "Mississipi" the first call to pcre_exec()
3328         finds  the  first  occurrence. If pcre_exec() is called again with just         finds the first occurrence. If pcre_exec() is called  again  with  just
3329         the remainder of the subject,  namely  "issipi",  it  does  not  match,         the  remainder  of  the  subject,  namely  "issipi", it does not match,
3330         because \B is always false at the start of the subject, which is deemed         because \B is always false at the start of the subject, which is deemed
3331         to be a word boundary. However, if pcre_exec()  is  passed  the  entire         to  be  a  word  boundary. However, if pcre_exec() is passed the entire
3332         string again, but with startoffset set to 4, it finds the second occur-         string again, but with startoffset set to 4, it finds the second occur-
3333         rence of "iss" because it is able to look behind the starting point  to         rence  of "iss" because it is able to look behind the starting point to
3334         discover that it is preceded by a letter.         discover that it is preceded by a letter.
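The difference between the two calls can be reproduced without PCRE. The sketch below is a hand-written stand-in for the pattern \Biss\B, not PCRE's matcher: it treats ASCII letters, digits and underscore as word characters and treats positions outside the subject as non-word. All function names are hypothetical.

```c
#include <ctype.h>

/* 1 if the byte at pos is an ASCII word character; positions outside
   the subject count as non-word. */
static int is_word(const char *s, int len, int pos)
{
    if (pos < 0 || pos >= len) return 0;
    return isalnum((unsigned char)s[pos]) || s[pos] == '_';
}

/* \B holds at pos when the characters on either side are both word
   characters or both non-word characters. */
static int not_word_boundary(const char *s, int len, int pos)
{
    return is_word(s, len, pos - 1) == is_word(s, len, pos);
}

/* Stand-in for \Biss\B: returns the byte offset of the first match at
   or after start, or -1 if there is none. The whole subject stays
   visible, so the check at a match start can look behind start. */
static int find_Biss_B(const char *s, int len, int start)
{
    int i;
    for (i = start; i + 3 <= len; i++) {
        if (s[i] == 'i' && s[i + 1] == 's' && s[i + 2] == 's' &&
            not_word_boundary(s, len, i) &&
            not_word_boundary(s, len, i + 3))
            return i;
    }
    return -1;
}
```

Called on "Mississipi" (length 10) with a start of 0 this returns 1, and with a start of 4 it returns 4, because the "s" at offset 3 is still visible. Called on the shortened string "issipi" with a start of 0 it returns -1, since nothing precedes the leading "i".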
3335    
3336         Finding  all  the  matches  in a subject is tricky when the pattern can         Finding all the matches in a subject is tricky  when  the  pattern  can
3337         match an empty string. It is possible to emulate Perl's /g behaviour by         match an empty string. It is possible to emulate Perl's /g behaviour by
3338         first   trying   the   match   again  at  the  same  offset,  with  the         first  trying  the  match  again  at  the   same   offset,   with   the
3339         PCRE_NOTEMPTY_ATSTART and  PCRE_ANCHORED  options,  and  then  if  that         PCRE_NOTEMPTY_ATSTART  and  PCRE_ANCHORED  options,  and  then  if that
3340         fails,  advancing  the  starting  offset  and  trying an ordinary match         fails, advancing the starting  offset  and  trying  an  ordinary  match
3341         again. There is some code that demonstrates how to do this in the pcre-         again. There is some code that demonstrates how to do this in the pcre-
3342         demo sample program. In the most general case, you have to check to see         demo sample program. In the most general case, you have to check to see
3343         if the newline convention recognizes CRLF as a newline, and if so,  and         if  the newline convention recognizes CRLF as a newline, and if so, and
3344         the current character is CR followed by LF, advance the starting offset         the current character is CR followed by LF, advance the starting offset
3345         by two characters instead of one.         by two characters instead of one.
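The advancing step described above can be sketched as a small helper. This is an illustration in the spirit of the pcredemo logic, not a copy of it. The function name and parameters are hypothetical, and the caller is assumed to have already determined whether the newline convention recognizes CRLF and whether the pattern was compiled in UTF-8 mode. The caller must also ensure that offset is less than length.

```c
/* Hypothetical helper. An empty match was found at offset and the retry
   with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED failed; return the
   offset at which the next ordinary match attempt should start. */
static int advance_after_empty_match(const char *subject, int length,
                                     int offset, int crlf_is_newline,
                                     int utf8)
{
    int next = offset + 1;                    /* default: one byte forward */

    if (crlf_is_newline && offset < length - 1 &&
        subject[offset] == '\r' && subject[offset + 1] == '\n') {
        next += 1;                            /* step over CR LF as a unit */
    } else if (utf8) {
        /* Skip continuation bytes (10xxxxxx) so that next lands on the
           start of a character or on the end of the subject. */
        while (next < length &&
               (((unsigned char)subject[next]) & 0xC0) == 0x80)
            next += 1;
    }
    return next;
}
```

For the subject "a\r\nb" with offset 1, the result is 3 when CRLF is a recognized newline and 2 when it is not.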
3346    
3347         If a non-zero starting offset is passed when the pattern  is  anchored,         If  a  non-zero starting offset is passed when the pattern is anchored,
3348         one attempt to match at the given offset is made. This can only succeed         one attempt to match at the given offset is made. This can only succeed
3349         if the pattern does not require the match to be at  the  start  of  the         if  the  pattern  does  not require the match to be at the start of the
3350         subject.         subject.
3351    
3352     How pcre_exec() returns captured substrings     How pcre_exec() returns captured substrings
3353    
3354         In  general, a pattern matches a certain portion of the subject, and in         In general, a pattern matches a certain portion of the subject, and  in
3355         addition, further substrings from the subject  may  be  picked  out  by         addition,  further  substrings  from  the  subject may be picked out by
3356         parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,         parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
3357         this is called "capturing" in what follows, and the  phrase  "capturing         this  is  called "capturing" in what follows, and the phrase "capturing
3358         subpattern"  is  used for a fragment of a pattern that picks out a sub-         subpattern" is used for a fragment of a pattern that picks out  a  sub-
3359         string. PCRE supports several other kinds of  parenthesized  subpattern         string.  PCRE  supports several other kinds of parenthesized subpattern
3360         that do not cause substrings to be captured.         that do not cause substrings to be captured.
3361    
3362         Captured substrings are returned to the caller via a vector of integers         Captured substrings are returned to the caller via a vector of integers
3363         whose address is passed in ovector. The number of elements in the  vec-         whose  address is passed in ovector. The number of elements in the vec-
3364         tor  is  passed in ovecsize, which must be a non-negative number. Note:         tor is passed in ovecsize, which must be a non-negative  number.  Note:
3365         this argument is NOT the size of ovector in bytes.         this argument is NOT the size of ovector in bytes.
3366    
3367         The first two-thirds of the vector is used to pass back  captured  sub-         The  first  two-thirds of the vector is used to pass back captured sub-
3368         strings,  each  substring using a pair of integers. The remaining third         strings, each substring using a pair of integers. The  remaining  third
3369         of the vector is used as workspace by pcre_exec() while  matching  cap-         of  the  vector is used as workspace by pcre_exec() while matching cap-
3370         turing  subpatterns, and is not available for passing back information.         turing subpatterns, and is not available for passing back  information.
3371         The number passed in ovecsize should always be a multiple of three.  If         The  number passed in ovecsize should always be a multiple of three. If
3372         it is not, it is rounded down.         it is not, it is rounded down.
3373    
3374         When  a  match  is successful, information about captured substrings is         When a match is successful, information about  captured  substrings  is
3375         returned in pairs of integers, starting at the  beginning  of  ovector,         returned  in  pairs  of integers, starting at the beginning of ovector,
3376         and  continuing  up  to two-thirds of its length at the most. The first         and continuing up to two-thirds of its length at the  most.  The  first
3377         element of each pair is set to the byte offset of the  first  character         element  of  each pair is set to the byte offset of the first character
3378         in  a  substring, and the second is set to the byte offset of the first         in a substring, and the second is set to the byte offset of  the  first
3379         character after the end of a substring. Note: these values  are  always         character  after  the end of a substring. Note: these values are always
3380         byte offsets, even in UTF-8 mode. They are not character counts.         byte offsets, even in UTF-8 mode. They are not character counts.
3381    
3382         The  first  pair  of  integers, ovector[0] and ovector[1], identify the         The first pair of integers, ovector[0]  and  ovector[1],  identify  the
3383         portion of the subject string matched by the entire pattern.  The  next         portion  of  the subject string matched by the entire pattern. The next
3384         pair  is  used for the first capturing subpattern, and so on. The value         pair is used for the first capturing subpattern, and so on.  The  value
3385         returned by pcre_exec() is one more than the highest numbered pair that         returned by pcre_exec() is one more than the highest numbered pair that
3386         has  been  set.  For example, if two substrings have been captured, the         has been set.  For example, if two substrings have been  captured,  the
3387         returned value is 3. If there are no capturing subpatterns, the  return         returned  value is 3. If there are no capturing subpatterns, the return
3388         value from a successful match is 1, indicating that just the first pair         value from a successful match is 1, indicating that just the first pair
3389         of offsets has been set.         of offsets has been set.
3390    
3391         If a capturing subpattern is matched repeatedly, it is the last portion         If a capturing subpattern is matched repeatedly, it is the last portion
3392         of the string that it matched that is returned.         of the string that it matched that is returned.
3393    
3394         If  the vector is too small to hold all the captured substring offsets,         If the vector is too small to hold all the captured substring  offsets,
3395         it is used as far as possible (up to two-thirds of its length), and the         it is used as far as possible (up to two-thirds of its length), and the
3396         function  returns a value of zero. If neither the actual string matched         function returns a value of zero. If neither the actual string  matched
3397         nor any captured substrings are of interest, pcre_exec() may be  called         nor  any captured substrings are of interest, pcre_exec() may be called
3398         with  ovector passed as NULL and ovecsize as zero. However, if the pat-         with ovector passed as NULL and ovecsize as zero. However, if the  pat-
3399         tern contains back references and the ovector  is  not  big  enough  to         tern  contains  back  references  and  the ovector is not big enough to
3400         remember  the related substrings, PCRE has to get additional memory for         remember the related substrings, PCRE has to get additional memory  for
3401         use during matching. Thus it is usually advisable to supply an  ovector         use  during matching. Thus it is usually advisable to supply an ovector
3402         of reasonable size.         of reasonable size.
3403    
3404         There  are  some  cases where zero is returned (indicating vector over-         There are some cases where zero is returned  (indicating  vector  over-
3405         flow) when in fact the vector is exactly the right size for  the  final         flow)  when  in fact the vector is exactly the right size for the final
3406         match. For example, consider the pattern         match. For example, consider the pattern
3407    
3408           (a)(?:(b)c|bd)           (a)(?:(b)c|bd)
3409    
3410         If  a  vector of 6 elements (allowing for only 1 captured substring) is         If a vector of 6 elements (allowing for only 1 captured  substring)  is
3411         given with subject string "abd", pcre_exec() will try to set the second         given with subject string "abd", pcre_exec() will try to set the second
3412         captured string, thereby recording a vector overflow, before failing to         captured string, thereby recording a vector overflow, before failing to
3413         match "c" and backing up  to  try  the  second  alternative.  The  zero         match  "c"  and  backing  up  to  try  the second alternative. The zero
3414         return,  however,  does  correctly  indicate that the maximum number of         return, however, does correctly indicate that  the  maximum  number  of
3415         slots (namely 2) have been filled. In similar cases where there is tem-         slots (namely 2) have been filled. In similar cases where there is tem-
3416         porary  overflow,  but  the final number of used slots is actually less         porary overflow, but the final number of used slots  is  actually  less
3417         than the maximum, a non-zero value is returned.         than the maximum, a non-zero value is returned.
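Because a zero return still denotes a successful match, code that walks the offset pairs needs to translate it into a pair count. The following sketch shows one way to do so. The function name pairs_filled is hypothetical.

```c
/* Hypothetical helper. Given the value returned by pcre_exec() and the
   ovecsize that was passed, return how many offset pairs at the start
   of ovector hold captured-substring information. */
static int pairs_filled(int rc, int ovecsize)
{
    if (rc < 0) return 0;              /* no complete match: no capture pairs */
    if (rc == 0) return ovecsize / 3;  /* overflow: every available pair used */
    return rc;                         /* one more than the highest pair set */
}
```

For the example above, a vector of 6 elements and a return of zero give 2 pairs, which agrees with the "namely 2" in the text.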
3418    
3419         The pcre_fullinfo() function can be used to find out how many capturing         The pcre_fullinfo() function can be used to find out how many capturing
3420         subpatterns  there  are  in  a  compiled pattern. The smallest size for         subpatterns there are in a compiled  pattern.  The  smallest  size  for
3421         ovector that will allow for n captured substrings, in addition  to  the         ovector  that  will allow for n captured substrings, in addition to the
3422         offsets of the substring matched by the whole pattern, is (n+1)*3.         offsets of the substring matched by the whole pattern, is (n+1)*3.
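The sizing rule is simple arithmetic, shown here as a sketch. The helper names are hypothetical. In a real program the capture count would typically come from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT, which is not called here so that the fragment stays self-contained.

```c
/* Smallest ovecsize that can return n captured substrings plus the
   offsets of the overall match: (n+1)*3. */
static int min_ovecsize(int capture_count)
{
    return (capture_count + 1) * 3;
}

/* Number of offset pairs an ovector of the given size can return.
   Only the first two-thirds hold results, and a size that is not a
   multiple of three is rounded down. */
static int usable_pairs(int ovecsize)
{
    return ovecsize / 3;
}
```

A pattern with two capturing subpatterns therefore needs an ovector of at least 9 integers, and a vector of 10 integers is no more useful than one of 9.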
3423    
3424         It  is  possible for capturing subpattern number n+1 to match some part         It is possible for capturing subpattern number n+1 to match  some  part
3425         of the subject when subpattern n has not been used at all. For example,         of the subject when subpattern n has not been used at all. For example,
3426         if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the         if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the
3427         return from the function is 4, and subpatterns 1 and 3 are matched, but         return from the function is 4, and subpatterns 1 and 3 are matched, but
3428         2  is  not.  When  this happens, both values in the offset pairs corre-         2 is not. When this happens, both values in  the  offset  pairs  corre-
3429         sponding to unused subpatterns are set to -1.         sponding to unused subpatterns are set to -1.
3430    
3431         Offset values that correspond to unused subpatterns at the end  of  the         Offset  values  that correspond to unused subpatterns at the end of the
3432         expression  are  also  set  to  -1. For example, if the string "abc" is         expression are also set to -1. For example,  if  the  string  "abc"  is
3433         matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not         matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not
3434         matched.  The  return  from the function is 2, because the highest used         matched. The return from the function is 2, because  the  highest  used
3435         capturing subpattern number is 1, and the offsets for the second         capturing  subpattern  number  is 1, and the offsets for the second
3436         and  third  capturing subpatterns (assuming the vector is large enough,         and third capturing subpatterns (assuming the vector is  large  enough,
3437         of course) are set to -1.         of course) are set to -1.
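The layout of ovector, including the -1 markers for unused subpatterns, can be illustrated by a small extraction routine. This is a simplified stand-in written for this illustration, not one of PCRE's own convenience functions. The name copy_group and its return convention are assumptions.

```c
#include <string.h>

/* Hypothetical helper. Copies capture group `group` (0 is the whole
   match) into buf as a zero-terminated string. `pairs` is the number
   of offset pairs that were set. Returns the substring length in
   bytes, or -1 if the group is out of range, is unset (offsets of -1),
   or does not fit in buf. */
static int copy_group(const char *subject, const int *ovector, int pairs,
                      int group, char *buf, int bufsize)
{
    int start, end, len;

    if (group < 0 || group >= pairs) return -1;
    start = ovector[2 * group];        /* byte offset of first character */
    end = ovector[2 * group + 1];      /* byte offset just past the end */
    if (start < 0 || end < start) return -1;   /* unset subpattern */
    len = end - start;
    if (len >= bufsize) return -1;     /* no room for the terminator */
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

For "abc" matched against (a|(z))(bc), the return value is 4 and the first eight elements of ovector are 0, 3, 0, 1, -1, -1, 1, 3. Group 1 then yields "a", group 3 yields "bc", and group 2 is reported as unset.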
3438    
3439         Note: Elements in the first two-thirds of ovector that  do  not  corre-         Note:  Elements  in  the first two-thirds of ovector that do not corre-
3440         spond  to  capturing parentheses in the pattern are never changed. That         spond to capturing parentheses in the pattern are never  changed.  That
3441         is, if a pattern contains n capturing parentheses, no more  than  ovec-         is,  if  a pattern contains n capturing parentheses, no more than ovec-
3442         tor[0]  to ovector[2n+1] are set by pcre_exec(). The other elements (in         tor[0] to ovector[2n+1] are set by pcre_exec(). The other elements  (in
3443         the first two-thirds) retain whatever values they previously had.         the first two-thirds) retain whatever values they previously had.
3444    
3445         Some convenience functions are provided  for  extracting  the  captured         Some  convenience  functions  are  provided for extracting the captured
3446         substrings as separate strings. These are described below.         substrings as separate strings. These are described below.
3447    
3448     Error return values from pcre_exec()     Error return values from pcre_exec()
3449    
3450         If  pcre_exec()  fails, it returns a negative number. The following are         If pcre_exec() fails, it returns a negative number. The  following  are
3451         defined in the header file:         defined in the header file:
3452    
3453           PCRE_ERROR_NOMATCH        (-1)           PCRE_ERROR_NOMATCH        (-1)
# Line 3448  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3456  MATCHING A PATTERN: THE TRADITIONAL FUNC
3456    
3457           PCRE_ERROR_NULL           (-2)           PCRE_ERROR_NULL           (-2)
3458    
3459         Either code or subject was passed as NULL,  or  ovector  was  NULL  and         Either  code  or  subject  was  passed as NULL, or ovector was NULL and
3460         ovecsize was not zero.         ovecsize was not zero.
3461    
3462           PCRE_ERROR_BADOPTION      (-3)           PCRE_ERROR_BADOPTION      (-3)
# Line 3457  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3465  MATCHING A PATTERN: THE TRADITIONAL FUNC
3465    
3466           PCRE_ERROR_BADMAGIC       (-4)           PCRE_ERROR_BADMAGIC       (-4)
3467    
3468         PCRE  stores a 4-byte "magic number" at the start of the compiled code,         PCRE stores a 4-byte "magic number" at the start of the compiled  code,
3469         to catch the case when it is passed a junk pointer and to detect when a         to catch the case when it is passed a junk pointer and to detect when a
3470         pattern that was compiled in an environment of one endianness is run in         pattern that was compiled in an environment of one endianness is run in
3471         an environment with the other endianness. This is the error  that  PCRE         an  environment  with the other endianness. This is the error that PCRE
3472         gives when the magic number is not present.         gives when the magic number is not present.
3473    
3474           PCRE_ERROR_UNKNOWN_OPCODE (-5)           PCRE_ERROR_UNKNOWN_OPCODE (-5)
3475    
3476         While running the pattern match, an unknown item was encountered in the         While running the pattern match, an unknown item was encountered in the
3477         compiled pattern. This error could be caused by a bug  in  PCRE  or  by         compiled  pattern.  This  error  could be caused by a bug in PCRE or by
3478         overwriting of the compiled pattern.         overwriting of the compiled pattern.
3479    
3480           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
3481    
3482         If  a  pattern contains back references, but the ovector that is passed         If a pattern contains back references, but the ovector that  is  passed
3483         to pcre_exec() is not big enough to remember the referenced substrings,         to pcre_exec() is not big enough to remember the referenced substrings,
3484         PCRE  gets  a  block of memory at the start of matching to use for this         PCRE gets a block of memory at the start of matching to  use  for  this
3485         purpose. If the call via pcre_malloc() fails, this error is given.  The         purpose.  If the call via pcre_malloc() fails, this error is given. The
3486         memory is automatically freed at the end of matching.         memory is automatically freed at the end of matching.
3487    
3488         This  error  is also given if pcre_stack_malloc() fails in pcre_exec().         This error is also given if pcre_stack_malloc() fails  in  pcre_exec().
3489         This can happen only when PCRE has been compiled with  --disable-stack-         This  can happen only when PCRE has been compiled with --disable-stack-
3490         for-recursion.         for-recursion.
3491    
3492           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
3493    
3494         This  error is used by the pcre_copy_substring(), pcre_get_substring(),         This error is used by the pcre_copy_substring(),  pcre_get_substring(),
3495         and  pcre_get_substring_list()  functions  (see  below).  It  is  never         and  pcre_get_substring_list()  functions  (see  below).  It  is  never
3496         returned by pcre_exec().         returned by pcre_exec().
3497    
3498           PCRE_ERROR_MATCHLIMIT     (-8)           PCRE_ERROR_MATCHLIMIT     (-8)
3499    
3500         The  backtracking  limit,  as  specified  by the match_limit field in a         The backtracking limit, as specified by  the  match_limit  field  in  a
3501         pcre_extra structure (or defaulted) was reached.  See  the  description         pcre_extra  structure  (or  defaulted) was reached. See the description
3502         above.         above.
3503    
3504           PCRE_ERROR_CALLOUT        (-9)           PCRE_ERROR_CALLOUT        (-9)
3505    
3506         This error is never generated by pcre_exec() itself. It is provided for         This error is never generated by pcre_exec() itself. It is provided for
3507         use by callout functions that want to yield a distinctive  error  code.         use  by  callout functions that want to yield a distinctive error code.
3508         See the pcrecallout documentation for details.         See the pcrecallout documentation for details.
3509    
3510           PCRE_ERROR_BADUTF8        (-10)           PCRE_ERROR_BADUTF8        (-10)
3511    
3512         A  string  that contains an invalid UTF-8 byte sequence was passed as a         A string that contains an invalid UTF-8 byte sequence was passed  as  a
3513         subject, and the PCRE_NO_UTF8_CHECK option was not set. If the size  of         subject,  and the PCRE_NO_UTF8_CHECK option was not set. If the size of
3514         the  output  vector  (ovecsize)  is  at least 2, the byte offset to the         the output vector (ovecsize) is at least 2,  the  byte  offset  to  the
3515         start of the invalid UTF-8 character is placed in  the  first  ele-         start  of  the invalid UTF-8 character is placed in the first ele-
3516         ment,  and  a  reason  code is placed in the second element. The reason         ment, and a reason code is placed in the  second  element.  The  reason
3517         codes are listed in the following section.  For backward compatibility,         codes are listed in the following section.  For backward compatibility,
3518         if  PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8 char-         if PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8  char-
3519         acter  at  the  end  of  the   subject   (reason   codes   1   to   5),         acter   at   the   end   of   the   subject  (reason  codes  1  to  5),
3520         PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8.         PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8.
3521    
3522           PCRE_ERROR_BADUTF8_OFFSET (-11)           PCRE_ERROR_BADUTF8_OFFSET (-11)
3523    
3524         The  UTF-8  byte  sequence that was passed as a subject was checked and         The UTF-8 byte sequence that was passed as a subject  was  checked  and
3525         found to be valid (the PCRE_NO_UTF8_CHECK option was not set), but  the         found  to be valid (the PCRE_NO_UTF8_CHECK option was not set), but the
3526         value  of startoffset did not point to the beginning of a UTF-8 charac-         value of startoffset did not point to the beginning of a UTF-8  charac-
3527         ter or the end of the subject.         ter or the end of the subject.
3528    
3529           PCRE_ERROR_PARTIAL        (-12)           PCRE_ERROR_PARTIAL        (-12)
3530    
3531         The subject string did not match, but it did match partially.  See  the         The  subject  string did not match, but it did match partially. See the
3532         pcrepartial documentation for details of partial matching.         pcrepartial documentation for details of partial matching.
3533    
3534           PCRE_ERROR_BADPARTIAL     (-13)           PCRE_ERROR_BADPARTIAL     (-13)
3535    
3536         This  code  is  no  longer  in  use.  It was formerly returned when the         This code is no longer in  use.  It  was  formerly  returned  when  the
3537         PCRE_PARTIAL option was used with a compiled pattern  containing  items         PCRE_PARTIAL  option  was used with a compiled pattern containing items
3538         that  were  not  supported  for  partial  matching.  From  release 8.00         that were  not  supported  for  partial  matching.  From  release  8.00
3539         onwards, there are no restrictions on partial matching.         onwards, there are no restrictions on partial matching.
3540    
3541           PCRE_ERROR_INTERNAL       (-14)           PCRE_ERROR_INTERNAL       (-14)
3542    
3543         An unexpected internal error has occurred. This error could  be  caused         An  unexpected  internal error has occurred. This error could be caused
3544         by a bug in PCRE or by overwriting of the compiled pattern.         by a bug in PCRE or by overwriting of the compiled pattern.
3545    
3546           PCRE_ERROR_BADCOUNT       (-15)           PCRE_ERROR_BADCOUNT       (-15)
# Line 3542  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3550  MATCHING A PATTERN: THE TRADITIONAL FUNC
3550           PCRE_ERROR_RECURSIONLIMIT (-21)           PCRE_ERROR_RECURSIONLIMIT (-21)
3551    
3552         The internal recursion limit, as specified by the match_limit_recursion         The internal recursion limit, as specified by the match_limit_recursion
3553         field in a pcre_extra structure (or defaulted)  was  reached.  See  the         field  in  a  pcre_extra  structure (or defaulted) was reached. See the
3554         description above.         description above.
3555    
3556           PCRE_ERROR_BADNEWLINE     (-23)           PCRE_ERROR_BADNEWLINE     (-23)
# Line 3556  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3564  MATCHING A PATTERN: THE TRADITIONAL FUNC
3564    
3565           PCRE_ERROR_SHORTUTF8      (-25)           PCRE_ERROR_SHORTUTF8      (-25)
3566    
3567         This error is returned instead of PCRE_ERROR_BADUTF8 when  the  subject         This  error  is returned instead of PCRE_ERROR_BADUTF8 when the subject
3568         string  ends with a truncated UTF-8 character and the PCRE_PARTIAL_HARD         string ends with a truncated UTF-8 character and the  PCRE_PARTIAL_HARD
3569         option is set.  Information  about  the  failure  is  returned  as  for         option  is  set.   Information  about  the  failure  is returned as for
3570         PCRE_ERROR_BADUTF8.  It  is in fact sufficient to detect this case, but         PCRE_ERROR_BADUTF8. It is in fact sufficient to detect this  case,  but
3571         this special error code for PCRE_PARTIAL_HARD precedes the  implementa-         this  special error code for PCRE_PARTIAL_HARD precedes the implementa-
3572         tion  of returned information; it is retained for backwards compatibil-         tion of returned information; it is retained for backwards  compatibil-
3573         ity.         ity.
3574    
3575           PCRE_ERROR_RECURSELOOP    (-26)           PCRE_ERROR_RECURSELOOP    (-26)
3576    
3577         This error is returned when pcre_exec() detects a recursion loop within         This error is returned when pcre_exec() detects a recursion loop within
3578         the  pattern. Specifically, it means that either the whole pattern or a         the pattern. Specifically, it means that either the whole pattern or  a
3579         subpattern has been called recursively for the second time at the  same         subpattern  has been called recursively for the second time at the same
3580         position in the subject string. Some simple patterns that might do this         position in the subject string. Some simple patterns that might do this
3581         are detected and faulted at compile time, but more  complicated  cases,         are  detected  and faulted at compile time, but more complicated cases,
3582         in particular mutual recursions between two different subpatterns, can-         in particular mutual recursions between two different subpatterns, can-
3583         not be detected until run time.         not be detected until run time.
3584    
3585           PCRE_ERROR_JIT_STACKLIMIT (-27)           PCRE_ERROR_JIT_STACKLIMIT (-27)
3586    
3587         This error is returned when a pattern  that  was  successfully  studied         This  error  is  returned  when a pattern that was successfully studied
3588         using  a  JIT compile option is being matched, but the memory available         using a JIT compile option is being matched, but the  memory  available
3589         for the just-in-time processing stack is  not  large  enough.  See  the         for  the  just-in-time  processing  stack  is not large enough. See the
3590         pcrejit documentation for more details.         pcrejit documentation for more details.
3591    
3592           PCRE_ERROR_BADMODE        (-28)           PCRE_ERROR_BADMODE        (-28)
# Line 3588  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3596  MATCHING A PATTERN: THE TRADITIONAL FUNC
3596    
3597           PCRE_ERROR_BADENDIANNESS  (-29)           PCRE_ERROR_BADENDIANNESS  (-29)
3598    
3599         This error is given if  a  pattern  that  was  compiled  and  saved  is         This  error  is  given  if  a  pattern  that  was compiled and saved is
3600         reloaded  on  a  host  with  different endianness. The utility function         reloaded on a host with  different  endianness.  The  utility  function
3601         pcre_pattern_to_host_byte_order() can be used to convert such a pattern         pcre_pattern_to_host_byte_order() can be used to convert such a pattern
3602         so that it runs on the new host.         so that it runs on the new host.
3603    
3604           PCRE_ERROR_JIT_BADOPTION           PCRE_ERROR_JIT_BADOPTION
3605    
3606         This  error  is  returned  when a pattern that was successfully studied         This error is returned when a pattern  that  was  successfully  studied
3607         using a JIT compile option is being  matched,  but  the  matching  mode         using  a  JIT  compile  option  is being matched, but the matching mode
3608         (partial  or complete match) does not correspond to any JIT compilation         (partial or complete match) does not correspond to any JIT  compilation
3609         mode. When the JIT fast path function is used, this error may  also  be         mode.  When  the JIT fast path function is used, this error may also be
3610         given  for  invalid  options.  See  the  pcrejit documentation for more         given for invalid options.  See  the  pcrejit  documentation  for  more
3611         details.         details.
3612    
3613           PCRE_ERROR_BADLENGTH      (-32)           PCRE_ERROR_BADLENGTH      (-32)
3614    
3615         This error is given if pcre_exec() is called with a negative value  for         This  error is given if pcre_exec() is called with a negative value for
3616         the length argument.         the length argument.
3617    
3618         Error numbers -16 to -20, -22, and -30 are not used by pcre_exec().         Error numbers -16 to -20, -22, and -30 are not used by pcre_exec().
3619    
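The error numbers listed above can be turned into readable names when logging a failed pcre_exec() call. The following is a minimal sketch in C, not part of PCRE: the function names sketch_exec_error_name() and sketch_has_utf8_details() are hypothetical, and the numeric values are copied from the list above rather than taken from pcre.h so that the fragment compiles on its own. PCRE_ERROR_JIT_BADOPTION is omitted because its value is not given in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a negative pcre_exec() return value to the name used in this
   section. Returns NULL for values that are not listed above. */
const char *sketch_exec_error_name(int rc)
{
  switch (rc)
    {
    case -10: return "PCRE_ERROR_BADUTF8";
    case -11: return "PCRE_ERROR_BADUTF8_OFFSET";
    case -12: return "PCRE_ERROR_PARTIAL";
    case -13: return "PCRE_ERROR_BADPARTIAL";
    case -14: return "PCRE_ERROR_INTERNAL";
    case -15: return "PCRE_ERROR_BADCOUNT";
    case -21: return "PCRE_ERROR_RECURSIONLIMIT";
    case -23: return "PCRE_ERROR_BADNEWLINE";
    case -25: return "PCRE_ERROR_SHORTUTF8";
    case -26: return "PCRE_ERROR_RECURSELOOP";
    case -27: return "PCRE_ERROR_JIT_STACKLIMIT";
    case -28: return "PCRE_ERROR_BADMODE";
    case -29: return "PCRE_ERROR_BADENDIANNESS";
    case -32: return "PCRE_ERROR_BADLENGTH";
    default:  return NULL;
    }
}

/* For BADUTF8 (-10) and SHORTUTF8 (-25), the offset and reason code are
   placed in ovector[0] and ovector[1] only if ovecsize is at least 2. */
int sketch_has_utf8_details(int rc, int ovecsize)
{
  return (rc == -10 || rc == -25) && ovecsize >= 2;
}
```

In a real program the symbolic names from pcre.h would normally be used in the case labels instead of the literal numbers.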
3620     Reason codes for invalid UTF-8 strings     Reason codes for invalid UTF-8 strings
3621    
3622         This  section  applies  only  to  the  8-bit library. The corresponding         This section applies only  to  the  8-bit  library.  The  corresponding
3623         information for the 16-bit and 32-bit libraries is given in the  pcre16         information  for the 16-bit and 32-bit libraries is given in the pcre16
3624         and pcre32 pages.         and pcre32 pages.
3625    
3626         When pcre_exec() returns either PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORT-         When pcre_exec() returns either PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORT-
3627         UTF8, and the size of the output vector (ovecsize) is at least  2,  the         UTF8,  and  the size of the output vector (ovecsize) is at least 2, the
3628         offset  of  the  start  of the invalid UTF-8 character is placed in the         offset of the start of the invalid UTF-8 character  is  placed  in  the
3629         first output vector element (ovector[0]) and a reason code is placed in         first output vector element (ovector[0]) and a reason code is placed in
3630         the  second  element  (ovector[1]). The reason codes are given names in         the second element (ovector[1]). The reason codes are  given  names  in
3631         the pcre.h header file:         the pcre.h header file:
3632    
3633           PCRE_UTF8_ERR1           PCRE_UTF8_ERR1
# Line 3628  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3636  MATCHING A PATTERN: THE TRADITIONAL FUNC
3636           PCRE_UTF8_ERR4           PCRE_UTF8_ERR4
3637           PCRE_UTF8_ERR5           PCRE_UTF8_ERR5
3638    
3639         The string ends with a truncated UTF-8 character;  the  code  specifies         The  string  ends  with a truncated UTF-8 character; the code specifies
3640         how  many bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8         how many bytes are missing (1 to 5). Although RFC 3629 restricts  UTF-8
3641         characters to be no longer than 4 bytes, the  encoding  scheme  (origi-         characters  to  be  no longer than 4 bytes, the encoding scheme (origi-
3642         nally  defined  by  RFC  2279)  allows  for  up to 6 bytes, and this is         nally defined by RFC 2279) allows for  up  to  6  bytes,  and  this  is
3643         checked first; hence the possibility of 4 or 5 missing bytes.         checked first; hence the possibility of 4 or 5 missing bytes.
3644    
3645           PCRE_UTF8_ERR6           PCRE_UTF8_ERR6
# Line 3641  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3649  MATCHING A PATTERN: THE TRADITIONAL FUNC
3649           PCRE_UTF8_ERR10           PCRE_UTF8_ERR10
3650    
3651         The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of         The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of
3652         the  character  do  not have the binary value 0b10 (that is, either the         the character do not have the binary value 0b10 (that  is,  either  the
3653         most significant bit is 0, or the next bit is 1).         most significant bit is 0, or the next bit is 1).
3654    
3655           PCRE_UTF8_ERR11           PCRE_UTF8_ERR11
3656           PCRE_UTF8_ERR12           PCRE_UTF8_ERR12
3657    
3658         A character that is valid by the RFC 2279 rules is either 5 or 6  bytes         A  character that is valid by the RFC 2279 rules is either 5 or 6 bytes
3659         long; these code points are excluded by RFC 3629.         long; these code points are excluded by RFC 3629.
3660    
3661           PCRE_UTF8_ERR13           PCRE_UTF8_ERR13
3662    
3663         A  4-byte character has a value greater than 0x10ffff; these code points         A 4-byte character has a value greater than 0x10ffff; these code  points
3664         are excluded by RFC 3629.         are excluded by RFC 3629.
3665    
3666           PCRE_UTF8_ERR14           PCRE_UTF8_ERR14
3667    
3668         A 3-byte character has a value in the  range  0xd800  to  0xdfff;  this         A  3-byte  character  has  a  value in the range 0xd800 to 0xdfff; this
3669         range  of code points is reserved by RFC 3629 for use with UTF-16,  and         range of code points is reserved by RFC 3629 for use with UTF-16,   and
3670         so is excluded from UTF-8.         so is excluded from UTF-8.
3671    
3672           PCRE_UTF8_ERR15           PCRE_UTF8_ERR15
# Line 3667  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 3675  MATCHING A PATTERN: THE TRADITIONAL FUNC
3675           PCRE_UTF8_ERR18           PCRE_UTF8_ERR18
3676           PCRE_UTF8_ERR19           PCRE_UTF8_ERR19
3677    
3678         A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it  codes         A  2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes
3679         for  a  value that can be represented by fewer bytes, which is invalid.         for a value that can be represented by fewer bytes, which  is  invalid.
3680         For example, the two bytes 0xc0, 0xae give the value 0x2e,  whose  cor-         For  example,  the two bytes 0xc0, 0xae give the value 0x2e, whose cor-
3681         rect coding uses just one byte.         rect coding uses just one byte.
3682    
3683           PCRE_UTF8_ERR20           PCRE_UTF8_ERR20
3684    
3685         The two most significant bits of the first byte of a character have the         The two most significant bits of the first byte of a character have the
3686         binary value 0b10 (that is, the most significant bit is 1 and the  sec-         binary  value 0b10 (that is, the most significant bit is 1 and the sec-
3687         ond  is  0). Such a byte can only validly occur as the second or subse-         ond is 0). Such a byte can only validly occur as the second  or  subse-
3688         quent byte of a multi-byte character.         quent byte of a multi-byte character.
3689    
3690           PCRE_UTF8_ERR21           PCRE_UTF8_ERR21
3691    
3692         The first byte of a character has the value 0xfe or 0xff. These  values         The  first byte of a character has the value 0xfe or 0xff. These values
3693         can never occur in a valid UTF-8 string.         can never occur in a valid UTF-8 string.
3694    
3695           PCRE_UTF8_ERR2           PCRE_UTF8_ERR22
3696    
3697         Non-character. These are the last two characters in each plane (0xfffe,         This error code was formerly used when  the  presence  of  a  so-called
3698         0xffff, 0x1fffe, 0x1ffff .. 0x10fffe,  0x10ffff),  and  the  characters         "non-character"  caused an error. Unicode corrigendum #9 makes it clear
3699         0xfdd0..0xfdef.         that such characters should not cause a string to be rejected,  and  so
3700           this code is no longer in use and is never returned.
3701    
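The categories of invalid UTF-8 described above can be illustrated with a small standalone checker. This is a sketch under stated assumptions, not PCRE's own validation code: the enum names and the function sketch_utf8_check() are hypothetical, the categories are coarser than the individual PCRE_UTF8_ERRn codes, and the order of the checks is only modelled on the descriptions above (truncation is tested first, using the RFC 2279 lengths of up to 6 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical categories; each comment names the PCRE_UTF8_ERRn
   codes described above that fall into the category. */
enum sketch_utf8_problem {
  SKETCH_UTF8_OK = 0,
  SKETCH_UTF8_TRUNCATED,         /* ERR1 to ERR5: bytes missing at end  */
  SKETCH_UTF8_BAD_CONTINUATION,  /* ERR6 to ERR10: top bits not 0b10    */
  SKETCH_UTF8_TOO_LONG,          /* ERR11, ERR12: 5 or 6 bytes          */
  SKETCH_UTF8_TOO_LARGE,         /* ERR13: value above 0x10ffff         */
  SKETCH_UTF8_SURROGATE,         /* ERR14: 0xd800 to 0xdfff             */
  SKETCH_UTF8_OVERLONG,          /* ERR15 to ERR19                      */
  SKETCH_UTF8_STRAY_CONTINUATION,/* ERR20: first byte is 0b10xxxxxx     */
  SKETCH_UTF8_BAD_LEAD           /* ERR21: first byte is 0xfe or 0xff   */
};

/* Scan s[0..len-1]. On failure, *offset is the start of the bad
   character; *missing is the number of absent bytes when truncated. */
enum sketch_utf8_problem
sketch_utf8_check(const unsigned char *s, size_t len,
                  size_t *offset, int *missing)
{
  /* Smallest value that genuinely needs n bytes, indexed by n. */
  static const unsigned long min_value[] =
    { 0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000 };
  size_t i = 0;

  assert(s != NULL && offset != NULL && missing != NULL);
  *missing = 0;
  while (i < len)
    {
    unsigned char c = s[i];
    unsigned long v;
    int n, k;

    *offset = i;
    if (c < 0x80) { i++; continue; }          /* one-byte character */
    if (c < 0xc0) return SKETCH_UTF8_STRAY_CONTINUATION;
    if (c >= 0xfe) return SKETCH_UTF8_BAD_LEAD;

    /* Total length implied by the first byte (RFC 2279 rules). */
    n = (c < 0xe0)? 2 : (c < 0xf0)? 3 : (c < 0xf8)? 4 : (c < 0xfc)? 5 : 6;
    if (len - i < (size_t)n)
      {
      *missing = n - (int)(len - i);
      return SKETCH_UTF8_TRUNCATED;
      }

    v = c & (0xffu >> (n + 1));               /* data bits of first byte */
    for (k = 1; k < n; k++)
      {
      if ((s[i + k] & 0xc0) != 0x80) return SKETCH_UTF8_BAD_CONTINUATION;
      v = (v << 6) | (s[i + k] & 0x3fu);
      }

    if (v < min_value[n]) return SKETCH_UTF8_OVERLONG;
    if (n > 4) return SKETCH_UTF8_TOO_LONG;
    if (v > 0x10ffff) return SKETCH_UTF8_TOO_LARGE;
    if (v >= 0xd800 && v <= 0xdfff) return SKETCH_UTF8_SURROGATE;
    i += (size_t)n;
    }
  *offset = len;
  return SKETCH_UTF8_OK;
}
```

For the overlong example given above, the two bytes 0xc0, 0xae decode to 0x2e, which is below the two-byte minimum of 0x80, so the sketch reports the overlong category at offset 0.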
3702    
3703  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
# Line 4101  AUTHOR Line 4110  AUTHOR
4110    
4111  REVISION  REVISION
4112    
4113         Last updated: 08 November 2012         Last updated: 27 February 2013
4114         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
4115  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
4116    
4117    
4118    PCRECALLOUT(3)             Library Functions Manual             PCRECALLOUT(3)
4119    
4120    
 PCRECALLOUT(3)                                                  PCRECALLOUT(3)  
   
4121    
4122  NAME  NAME
4123         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
4124    
   
4125  SYNOPSIS  SYNOPSIS
4126    
4127         #include <pcre.h>         #include <pcre.h>
# Line 4153  DESCRIPTION Line 4162  DESCRIPTION
4162         (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)         (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
4163    
4164         Notice  that  there  is a callout before and after each parenthesis and         Notice  that  there  is a callout before and after each parenthesis and
4165         alternation bar. Automatic  callouts  can  be  used  for  tracking  the         alternation bar. If the pattern contains a conditional group whose con-
4166         progress  of  pattern matching. The pcretest command has an option that         dition  is  an  assertion, an automatic callout is inserted immediately
4167         sets automatic callouts; when it is used, the output indicates how  the         before the condition. Such a callout may also be  inserted  explicitly,
4168         pattern  is  matched. This is useful information when you are trying to         for example:
4169         optimize the performance of a particular pattern.  
4170             (?(?C9)(?=a)ab|de)
4171         The use of callouts in a pattern makes it ineligible  for  optimization  
4172         by  the  just-in-time  compiler.  Studying  such  a  pattern  with  the         This  applies only to assertion conditions (because they are themselves
4173         PCRE_STUDY_JIT_COMPILE option always fails.         independent groups).
4174    
4175           Automatic callouts can be used for tracking  the  progress  of  pattern
4176           matching.  The pcretest command has an option that sets automatic call-
4177           outs; when it is used, the output indicates how the pattern is matched.
4178           This  is useful information when you are trying to optimize the perfor-
4179           mance of a particular pattern.
4180    
4181    
4182  MISSING CALLOUTS  MISSING CALLOUTS
# Line 4251  THE CALLOUT INTERFACE Line 4266  THE CALLOUT INTERFACE
4266         are used, because they do not support captured substrings.         are used, because they do not support captured substrings.
4267    
4268         The  capture_last  field  contains the number of the most recently cap-         The  capture_last  field  contains the number of the most recently cap-
4269         tured substring. If no substrings have been captured, its value is  -1.         tured substring. However, when a recursion exits, the value reverts  to
4270         This is always the case for the DFA matching functions.         what  it  was  outside  the recursion, as do the values of all captured
4271           substrings. If no substrings have been  captured,  the  value  of  cap-
4272           ture_last  is  -1.  This  is always the case for the DFA matching func-
4273           tions.
4274    
4275         The  callout_data  field  contains a value that is passed to a matching         The callout_data field contains a value that is passed  to  a  matching
4276         function specifically so that it can be passed back in callouts. It  is         function  specifically so that it can be passed back in callouts. It is
4277         passed  in  the callout_data field of a pcre_extra or pcre[16|32]_extra         passed in the callout_data field of a pcre_extra  or  pcre[16|32]_extra
4278         data structure. If no such data was passed, the value  of  callout_data         data  structure.  If no such data was passed, the value of callout_data
4279         in  a  callout  block is NULL. There is a description of the pcre_extra         in a callout block is NULL. There is a description  of  the  pcre_extra
4280         structure in the pcreapi documentation.         structure in the pcreapi documentation.
4281    
4282         The pattern_position field is present from version  1  of  the  callout         The  pattern_position  field  is  present from version 1 of the callout
4283         structure. It contains the offset to the next item to be matched in the         structure. It contains the offset to the next item to be matched in the
4284         pattern string.         pattern string.
4285    
4286         The next_item_length field is present from version  1  of  the  callout         The  next_item_length  field  is  present from version 1 of the callout
4287         structure. It contains the length of the next item to be matched in the         structure. It contains the length of the next item to be matched in the
4288         pattern string. When the callout immediately  precedes  an  alternation         pattern  string.  When  the callout immediately precedes an alternation
4289         bar,  a  closing  parenthesis, or the end of the pattern, the length is         bar, a closing parenthesis, or the end of the pattern,  the  length  is
4290         zero. When the callout precedes an opening parenthesis, the  length  is         zero.  When  the callout precedes an opening parenthesis, the length is
4291         that of the entire subpattern.         that of the entire subpattern.
4292    
4293         The  pattern_position  and next_item_length fields are intended to help         The pattern_position and next_item_length fields are intended  to  help
4294         in distinguishing between different automatic callouts, which all  have         in  distinguishing between different automatic callouts, which all have
4295         the same callout number. However, they are set for all callouts.         the same callout number. However, they are set for all callouts.
4296    
4297         The  mark  field is present from version 2 of the callout structure. In         The mark field is present from version 2 of the callout  structure.  In
4298         callouts from pcre_exec() or pcre[16|32]_exec() it contains  a  pointer         callouts  from  pcre_exec() or pcre[16|32]_exec() it contains a pointer
4299         to  the  zero-terminated  name  of  the  most  recently passed (*MARK),         to the zero-terminated  name  of  the  most  recently  passed  (*MARK),
4300         (*PRUNE), or (*THEN) item in the match, or NULL if no such  items  have         (*PRUNE),  or  (*THEN) item in the match, or NULL if no such items have
4301         been  passed.  Instances  of  (*PRUNE) or (*THEN) without a name do not         been passed. Instances of (*PRUNE) or (*THEN) without  a  name  do  not
4302         obliterate a previous (*MARK). In callouts from the DFA matching  func-         obliterate  a previous (*MARK). In callouts from the DFA matching func-
4303         tions this field always contains NULL.         tions this field always contains NULL.
4304    
4305    
4306  RETURN VALUES  RETURN VALUES
4307    
4308         The  external callout function returns an integer to PCRE. If the value         The external callout function returns an integer to PCRE. If the  value
4309         is zero, matching proceeds as normal. If  the  value  is  greater  than         is  zero,  matching  proceeds  as  normal. If the value is greater than
4310         zero,  matching  fails  at  the current point, but the testing of other         zero, matching fails at the current point, but  the  testing  of  other
4311         matching possibilities goes ahead, just as if a lookahead assertion had         matching possibilities goes ahead, just as if a lookahead assertion had
4312         failed.  If  the  value  is less than zero, the match is abandoned, and         failed. If the value is less than zero, the  match  is  abandoned,  and
4313         the matching function returns the negative value.         the matching function returns the negative value.
4314    
4315         Negative  values  should  normally  be   chosen   from   the   set   of         Negative   values   should   normally   be   chosen  from  the  set  of
4316         PCRE_ERROR_xxx values. In particular, PCRE_ERROR_NOMATCH forces a stan-         PCRE_ERROR_xxx values. In particular, PCRE_ERROR_NOMATCH forces a stan-
4317         dard "no  match"  failure.   The  error  number  PCRE_ERROR_CALLOUT  is         dard  "no  match"  failure.   The  error  number  PCRE_ERROR_CALLOUT is
4318         reserved  for  use  by callout functions; it will never be used by PCRE         reserved for use by callout functions; it will never be  used  by  PCRE
4319         itself.         itself.
4320    
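The three classes of return value can be sketched as follows. This is an illustration only. So that it compiles without pcre.h, it uses a hypothetical stand-in structure holding just three of the fields described above; a real callout function receives a pointer to a pcre_callout_block and is installed through the pcre_callout variable. The budget structure, the policy, and all names beginning with sketch_ or SKETCH_ are assumptions made for the example. The value -9 is that of PCRE_ERROR_CALLOUT in pcre.h.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as PCRE_ERROR_CALLOUT in pcre.h, repeated here so that
   the sketch compiles on its own. */
#define SKETCH_ERROR_CALLOUT  (-9)

/* Hypothetical stand-in for the pcre_callout_block fields used below. */
struct sketch_callout_block {
  int   callout_number;   /* number given in (?Cn)                     */
  int   capture_last;     /* -1 if nothing has been captured yet       */
  void *callout_data;     /* passed through from the extra data block  */
};

/* Hypothetical application data: how many callouts are still allowed. */
struct sketch_budget {
  int calls_left;
};

int sketch_callout(struct sketch_callout_block *cb)
{
  struct sketch_budget *budget;

  assert(cb != NULL);
  budget = (struct sketch_budget *)cb->callout_data;

  /* No data was passed: let matching proceed as normal. */
  if (budget == NULL) return 0;

  /* Budget exhausted: a negative value abandons the whole match and
     becomes the return value of the matching function. */
  if (budget->calls_left <= 0) return SKETCH_ERROR_CALLOUT;
  budget->calls_left--;

  /* Example policy: at callout 9, refuse to continue until something
     has been captured. A positive value fails at the current point
     only; other matching possibilities are still tried. */
  if (cb->callout_number == 9 && cb->capture_last == -1) return 1;

  return 0;
}
```

Returning a value from the PCRE_ERROR_xxx set, as here, keeps the result of the matching function easy to interpret in the calling code.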
4321    
# Line 4310  AUTHOR Line 4328  AUTHOR
4328    
4329  REVISION  REVISION
4330    
4331         Last updated: 24 June 2012         Last updated: 03 March 2013
4332         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
4333  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
4334    
4335    
4336    PCRECOMPAT(3)              Library Functions Manual              PCRECOMPAT(3)
4337    
4338    
 PCRECOMPAT(3)                                                    PCRECOMPAT(3)  
   
4339    
4340  NAME  NAME
4341         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
4342    
   
4343  DIFFERENCES BETWEEN PCRE AND PERL  DIFFERENCES BETWEEN PCRE AND PERL
4344    
4345         This  document describes the differences in the ways that PCRE and Perl         This  document describes the differences in the ways that PCRE and Perl
# Line 4340  DIFFERENCES BETWEEN PCRE AND PERL Line 4358  DIFFERENCES BETWEEN PCRE AND PERL
4358    
4359         3.  Capturing  subpatterns  that occur inside negative lookahead asser-         3.  Capturing  subpatterns  that occur inside negative lookahead asser-
4360         tions are counted, but their entries in the offsets  vector  are  never         tions are counted, but their entries in the offsets  vector  are  never
4361         set.  Perl sets its numerical variables from any such patterns that are         set.  Perl sometimes (but not always) sets its numerical variables from
4362         matched before the assertion fails to match something (thereby succeed-         inside negative assertions.
        ing),  but  only  if the negative lookahead assertion contains just one  
        branch.  
4363    
4364         4. Though binary zero characters are supported in the  subject  string,         4. Though binary zero characters are supported in the  subject  string,
4365         they are not allowed in a pattern string because it is passed as a nor-         they are not allowed in a pattern string because it is passed as a nor-
# Line 4398  DIFFERENCES BETWEEN PCRE AND PERL Line 4414  DIFFERENCES BETWEEN PCRE AND PERL
4414         There is a discussion that explains these differences in more detail in         There is a discussion that explains these differences in more detail in
4415         the section on recursion differences from Perl in the pcrepattern page.         the section on recursion differences from Perl in the pcrepattern page.
4416    
4417         10. If any of the backtracking control verbs are used in  an  assertion         10. If any of the backtracking control verbs are used in  a  subpattern
4418         or  in  a  subpattern  that  is  called as a subroutine (whether or not         that  is  called  as  a  subroutine (whether or not recursively), their
4419         recursively), their effect is confined to that subpattern; it does  not         effect is confined to that subpattern; it does not extend to  the  sur-
4420         extend to the surrounding pattern. This is not always the case in Perl.         rounding  pattern.  This is not always the case in Perl. In particular,
4421         In particular, if (*THEN) is present in a group that  is  called  as  a         if (*THEN) is present in a group that is called as  a  subroutine,  its
4422         subroutine, its action is limited to that group, even if the group does         action is limited to that group, even if the group does not contain any
4423         not contain any | characters. There is one exception to this: the  name         | characters. Note that such subpatterns are processed as  anchored  at
4424         from  a *(MARK), (*PRUNE), or (*THEN) that is encountered in a success-         the point where they are tested.
4425         ful positive assertion is passed back when a  match  succeeds  (compare  
4426         capturing  parentheses  in  assertions). Note that such subpatterns are         11.  If a pattern contains more than one backtracking control verb, the
4427         processed as anchored at the point where they are tested.         first one that is backtracked onto acts. For example,  in  the  pattern
4428           A(*COMMIT)B(*PRUNE)C  a  failure in B triggers (*COMMIT), but a failure
4429           in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases
4430           it is the same as PCRE, but there are examples where it differs.
4431    
4432         11. There are some differences that are concerned with the settings  of         12.  Most  backtracking  verbs in assertions have their normal actions.
4433           They are not confined to the assertion.
4434    
4435           13. There are some differences that are concerned with the settings  of
4436         captured  strings  when  part  of  a  pattern is repeated. For example,         captured  strings  when  part  of  a  pattern is repeated. For example,
4437         matching "aba" against the  pattern  /^(a(b)?)+$/  in  Perl  leaves  $2         matching "aba" against the  pattern  /^(a(b)?)+$/  in  Perl  leaves  $2
4438         unset, but in PCRE it is set to "b".         unset, but in PCRE it is set to "b".
4439    
4440         12.  PCRE's handling of duplicate subpattern numbers and duplicate sub-         14.  PCRE's handling of duplicate subpattern numbers and duplicate sub-
4441         pattern names is not as general as Perl's. This is a consequence of the         pattern names is not as general as Perl's. This is a consequence of the
4442         fact that PCRE works internally just with numbers, using an external ta-         fact that PCRE works internally just with numbers, using an external ta-
4443         ble to translate between numbers and names. In  particular,  a  pattern         ble to translate between numbers and names. In  particular,  a  pattern
# Line 4426  DIFFERENCES BETWEEN PCRE AND PERL Line 4448  DIFFERENCES BETWEEN PCRE AND PERL
4448         turing subpattern number 1. To avoid this confusing situation, an error         turing subpattern number 1. To avoid this confusing situation, an error
4449         is given at compile time.         is given at compile time.
4450    
4451         13. Perl recognizes comments in some places that  PCRE  does  not,  for         15. Perl recognizes comments in some places that  PCRE  does  not,  for
4452         example,  between  the  ( and ? at the start of a subpattern. If the /x         example,  between  the  ( and ? at the start of a subpattern. If the /x
4453         modifier is set, Perl allows white space between ( and ? but PCRE never         modifier is set, Perl allows white space between ( and ? but PCRE never
4454         does, even if the PCRE_EXTENDED option is set.         does, even if the PCRE_EXTENDED option is set.
4455    
4456         14. PCRE provides some extensions to the Perl regular expression facil-         16.  In  PCRE,  the upper/lower case character properties Lu and Ll are
4457         ities.  Perl 5.10 includes new features that are not  in  earlier  ver-         not affected when case-independent matching is specified. For  example,
4458         sions  of  Perl, some of which (such as named parentheses) have been in         \p{Lu} always matches an upper case letter. I think Perl has changed in
4459           this respect; in the release at the time of writing (5.16), \p{Lu}  and
4460           \p{Ll} match all letters, regardless of case, when case independence is
4461           specified.
4462    
4463           17. PCRE provides some extensions to the Perl regular expression facil-
4464           ities.   Perl  5.10  includes new features that are not in earlier ver-
4465           sions of Perl, some of which (such as named parentheses) have  been  in
4466         PCRE for some time. This list is with respect to Perl 5.10:         PCRE for some time. This list is with respect to Perl 5.10:
4467    
4468         (a) Although lookbehind assertions in  PCRE  must  match  fixed  length         (a)  Although  lookbehind  assertions  in  PCRE must match fixed length
4469         strings,  each alternative branch of a lookbehind assertion can match a         strings, each alternative branch of a lookbehind assertion can match  a
4470         different length of string. Perl requires them all  to  have  the  same         different  length  of  string.  Perl requires them all to have the same
4471         length.         length.
4472    
4473         (b)  If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $         (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the  $
4474         meta-character matches only at the very end of the string.         meta-character matches only at the very end of the string.
4475    
4476         (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-         (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
4477         cial meaning is faulted. Otherwise, like Perl, the backslash is quietly         cial meaning is faulted. Otherwise, like Perl, the backslash is quietly
4478         ignored.  (Perl can be made to issue a warning.)         ignored.  (Perl can be made to issue a warning.)
4479    
4480         (d) If PCRE_UNGREEDY is set, the greediness of the  repetition  quanti-         (d)  If  PCRE_UNGREEDY is set, the greediness of the repetition quanti-
4481         fiers is inverted, that is, by default they are not greedy, but if fol-         fiers is inverted, that is, by default they are not greedy, but if fol-
4482         lowed by a question mark they are.         lowed by a question mark they are.
4483    
# Line 4456  DIFFERENCES BETWEEN PCRE AND PERL Line 4485  DIFFERENCES BETWEEN PCRE AND PERL
4485         tried only at the first matching position in the subject string.         tried only at the first matching position in the subject string.
4486    
4487         (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,         (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART,
4488         and PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no  Perl  equiva-         and  PCRE_NO_AUTO_CAPTURE  options for pcre_exec() have no Perl equiva-
4489         lents.         lents.
4490    
4491         (g)  The  \R escape sequence can be restricted to match only CR, LF, or         (g) The \R escape sequence can be restricted to match only CR,  LF,  or
4492         CRLF by the PCRE_BSR_ANYCRLF option.         CRLF by the PCRE_BSR_ANYCRLF option.
4493    
4494         (h) The callout facility is PCRE-specific.         (h) The callout facility is PCRE-specific.
# Line 4467  DIFFERENCES BETWEEN PCRE AND PERL Line 4496  DIFFERENCES BETWEEN PCRE AND PERL
4496         (i) The partial matching facility is PCRE-specific.         (i) The partial matching facility is PCRE-specific.
4497    
4498         (j) Patterns compiled by PCRE can be saved and re-used at a later time,         (j) Patterns compiled by PCRE can be saved and re-used at a later time,
4499         even  on  different hosts that have the other endianness. However, this         even on different hosts that have the other endianness.  However,  this
4500         does not apply to optimized data created by the just-in-time compiler.         does not apply to optimized data created by the just-in-time compiler.
4501    
4502         (k)    The    alternative    matching    functions    (pcre_dfa_exec(),         (k)    The    alternative    matching    functions    (pcre_dfa_exec(),
4503         pcre16_dfa_exec()  and pcre32_dfa_exec()) match in a different way  and         pcre16_dfa_exec() and pcre32_dfa_exec()) match in a different  way  and
4504         are not Perl-compatible.         are not Perl-compatible.
4505    
4506         (l) PCRE recognizes some special sequences such as (*CR) at  the  start         (l)  PCRE  recognizes some special sequences such as (*CR) at the start
4507         of a pattern that set overall options that cannot be changed within the         of a pattern that set overall options that cannot be changed within the
4508         pattern.         pattern.
4509    
# Line 4488  AUTHOR Line 4517  AUTHOR
4517    
4518  REVISION  REVISION
4519    
4520         Last updated: 25 August 2012         Last updated: 19 March 2013
4521         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
4522  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
4523    
4524    
4525    PCREPATTERN(3)             Library Functions Manual             PCREPATTERN(3)
4526    
4527    
 PCREPATTERN(3)                                                  PCREPATTERN(3)  
   
4528    
4529  NAME  NAME
4530         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
4531    
   
4532  PCRE REGULAR EXPRESSION DETAILS  PCRE REGULAR EXPRESSION DETAILS
4533    
4534         The  syntax and semantics of the regular expressions that are supported         The  syntax and semantics of the regular expressions that are supported
# Line 5136  BACKSLASH Line 5165  BACKSLASH
5165         in the Unicode table.         in the Unicode table.
5166    
5167         Specifying  caseless  matching  does not affect these escape sequences.         Specifying  caseless  matching  does not affect these escape sequences.
5168         For example, \p{Lu} always matches only upper case letters.         For example, \p{Lu} always matches only upper  case  letters.  This  is
5169           different from the behaviour of current versions of Perl.
5170    
5171         Matching characters by Unicode property is not fast, because  PCRE  has         Matching  characters  by Unicode property is not fast, because PCRE has
5172         to  do  a  multistage table lookup in order to find a character's prop-         to do a multistage table lookup in order to find  a  character's  prop-
5173         erty. That is why the traditional escape sequences such as \d and \w do         erty. That is why the traditional escape sequences such as \d and \w do
5174         not use Unicode properties in PCRE by default, though you can make them         not use Unicode properties in PCRE by default, though you can make them
5175         do so by setting the PCRE_UCP option or by starting  the  pattern  with         do  so  by  setting the PCRE_UCP option or by starting the pattern with
5176         (*UCP).         (*UCP).
5177    
5178     Extended grapheme clusters     Extended grapheme clusters
5179    
5180         The  \X  escape  matches  any number of Unicode characters that form an         The \X escape matches any number of Unicode  characters  that  form  an
5181         "extended grapheme cluster", and treats the sequence as an atomic group         "extended grapheme cluster", and treats the sequence as an atomic group
5182         (see  below).   Up  to and including release 8.31, PCRE matched an ear-         (see below).  Up to and including release 8.31, PCRE  matched  an  ear-
5183         lier, simpler definition that was equivalent to         lier, simpler definition that was equivalent to
5184    
5185           (?>\PM\pM*)           (?>\PM\pM*)
5186    
5187         That is, it matched a character without the "mark"  property,  followed         That  is,  it matched a character without the "mark" property, followed
5188         by  zero  or  more characters with the "mark" property. Characters with         by zero or more characters with the "mark"  property.  Characters  with
5189         the "mark" property are typically non-spacing accents that  affect  the         the  "mark"  property are typically non-spacing accents that affect the
5190         preceding character.         preceding character.
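The older definition, (?>\PM\pM*), can be sketched outside PCRE. The following is a minimal illustration in Python, not PCRE's implementation: Python's standard re module has no \p escapes, so the sketch assumes that the "mark" property corresponds to the Unicode general categories Mn, Mc and Me as reported by unicodedata. The names is_mark and match_legacy_x are hypothetical.

```python
import unicodedata


def is_mark(ch):
    # Assumption: "mark" means general category M (Mn, Mc or Me).
    return unicodedata.category(ch).startswith("M")


def match_legacy_x(text, pos=0):
    # Emulates (?>\PM\pM*) at position pos.
    # Returns the end index of the match, or None if there is no match.
    if pos >= len(text) or is_mark(text[pos]):
        return None  # \PM needs one character without the mark property
    end = pos + 1
    # \pM* inside an atomic group: take every following mark and never
    # give any of them back.
    while end < len(text) and is_mark(text[end]):
        end += 1
    return end
```

For the subject "e" followed by U+0301 (combining acute accent) and then "a", the first match covers the base letter together with its accent, and the next match covers "a" alone.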
5191    
5192         This  simple definition was extended in Unicode to include more compli-         This simple definition was extended in Unicode to include more  compli-
5193         cated kinds of composite character by giving each character a  grapheme         cated  kinds of composite character by giving each character a grapheme
5194         breaking  property,  and  creating  rules  that use these properties to         breaking property, and creating rules  that  use  these  properties  to
5195         define the boundaries of extended grapheme  clusters.  In  releases  of         define  the  boundaries  of  extended grapheme clusters. In releases of
5196         PCRE later than 8.31, \X matches one of these clusters.         PCRE later than 8.31, \X matches one of these clusters.
5197    
5198         \X  always  matches  at least one character. Then it decides whether to         \X always matches at least one character. Then it  decides  whether  to
5199         add additional characters according to the following rules for ending a         add additional characters according to the following rules for ending a
5200         cluster:         cluster:
5201    
5202         1. End at the end of the subject string.         1. End at the end of the subject string.
5203    
5204         2.  Do not end between CR and LF; otherwise end after any control char-         2. Do not end between CR and LF; otherwise end after any control  char-
5205         acter.         acter.
5206    
5207         3. Do not break Hangul (a Korean  script)  syllable  sequences.  Hangul         3.  Do  not  break  Hangul (a Korean script) syllable sequences. Hangul
5208         characters  are of five types: L, V, T, LV, and LVT. An L character may         characters are of five types: L, V, T, LV, and LVT. An L character  may
5209         be followed by an L, V, LV, or LVT character; an LV or V character  may         be  followed by an L, V, LV, or LVT character; an LV or V character may
5210         be followed by a V or T character; an LVT or T character may be followed         be followed by a V or T character; an LVT or T character may be followed
5211         only by a T character.         only by a T character.
5212    
5213         4. Do not end before extending characters or spacing marks.  Characters         4.  Do not end before extending characters or spacing marks. Characters
5214         with  the  "mark"  property  always have the "extend" grapheme breaking         with the "mark" property always have  the  "extend"  grapheme  breaking
5215         property.         property.
5216    
5217         5. Do not end after prepend characters.         5. Do not end after prepend characters.
# Line 5190  BACKSLASH Line 5220  BACKSLASH
5220    
5221     PCRE's additional properties     PCRE's additional properties
5222    
5223         As well as the standard Unicode properties described above,  PCRE  sup-         As  well  as the standard Unicode properties described above, PCRE sup-
5224         ports  four  more  that  make it possible to convert traditional escape         ports four more that make it possible  to  convert  traditional  escape
5225         sequences such as \w and \s and POSIX character classes to use  Unicode         sequences  such as \w and \s and POSIX character classes to use Unicode
5226         properties.  PCRE  uses  these non-standard, non-Perl properties inter-         properties. PCRE uses these non-standard,  non-Perl  properties  inter-
5227         nally when PCRE_UCP is set. They are:         nally  when PCRE_UCP is set. However, they may also be used explicitly.
5228           These properties are:
5229    
5230           Xan   Any alphanumeric character           Xan   Any alphanumeric character
5231           Xps   Any POSIX space character           Xps   Any POSIX space character
# Line 5207  BACKSLASH Line 5238  BACKSLASH
5238         (separator) property.  Xsp is the same as Xps, except that vertical tab         (separator) property.  Xsp is the same as Xps, except that vertical tab
5239         is excluded. Xwd matches the same characters as Xan, plus underscore.         is excluded. Xwd matches the same characters as Xan, plus underscore.
5240    
5241           There is another non-standard property, Xuc, which matches any  charac-
5242           ter  that  can  be represented by a Universal Character Name in C++ and
5243           other programming languages. These are the characters $,  @,  `  (grave
5244           accent),  and  all  characters with Unicode code points greater than or
5245           equal to U+00A0, except for the surrogates U+D800 to U+DFFF. Note  that
5246           most  base  (ASCII) characters are excluded. (Universal Character Names
5247           are of the form \uHHHH or \UHHHHHHHH where H is  a  hexadecimal  digit.
5248           Note that the Xuc property does not match these sequences but the char-
5249           acters that they represent.)
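The set of characters described for Xuc can be written down directly. This is a small sketch in Python that follows the wording above; it is not taken from the PCRE source, and the function name is_xuc is hypothetical.

```python
def is_xuc(ch):
    # True if the single character ch is in the set described for Xuc.
    cp = ord(ch)
    if ch in "$@`":
        return True   # the three ASCII characters that are included
    if 0xD800 <= cp <= 0xDFFF:
        return False  # surrogates are excluded
    return cp >= 0xA0  # everything else from U+00A0 upwards
```

Under this definition is_xuc("$") and is_xuc("\u00e9") are true, while is_xuc("a") is false, because most ASCII characters are excluded.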
5250    
5251     Resetting the match start     Resetting the match start
5252    
5253         The escape sequence \K causes any previously matched characters not  to         The escape sequence \K causes any previously matched characters not  to
# Line 6227  ASSERTIONS Line 6268  ASSERTIONS
6268         tion contains capturing subpatterns within it, these  are  counted  for         tion contains capturing subpatterns within it, these  are  counted  for
6269         the  purposes  of numbering the capturing subpatterns in the whole pat-         the  purposes  of numbering the capturing subpatterns in the whole pat-
6270         tern. However, substring capturing is carried  out  only  for  positive         tern. However, substring capturing is carried  out  only  for  positive
6271         assertions, because it does not make sense for negative assertions.         assertions. (Perl sometimes, but not always, does do capturing in nega-
6272           tive assertions.)
6273    
6274         For  compatibility  with  Perl,  assertion subpatterns may be repeated;         For compatibility with Perl, assertion  subpatterns  may  be  repeated;
6275         though it makes no sense to assert the same thing  several  times,  the         though  it  makes  no sense to assert the same thing several times, the
6276         side  effect  of  capturing  parentheses may occasionally be useful. In         side effect of capturing parentheses may  occasionally  be  useful.  In
6277         practice, there are only three cases:         practice, there are only three cases:
6278    
6279         (1) If the quantifier is {0}, the  assertion  is  never  obeyed  during         (1)  If  the  quantifier  is  {0}, the assertion is never obeyed during
6280         matching.   However,  it  may  contain internal capturing parenthesized         matching.  However, it may  contain  internal  capturing  parenthesized
6281         groups that are called from elsewhere via the subroutine mechanism.         groups that are called from elsewhere via the subroutine mechanism.
6282    
6283         (2) If the quantifier is {0,n} where n is greater than zero, it is treated         (2) If the quantifier is {0,n} where n is greater than zero, it is treated
6284         as  if  it  were  {0,1}.  At run time, the rest of the pattern match is         as if it were {0,1}. At run time, the rest  of  the  pattern  match  is
6285         tried with and without the assertion, the order depending on the greed-         tried with and without the assertion, the order depending on the greed-
6286         iness of the quantifier.         iness of the quantifier.
6287    
6288         (3)  If  the minimum repetition is greater than zero, the quantifier is         (3) If the minimum repetition is greater than zero, the  quantifier  is
6289         ignored.  The assertion is obeyed just  once  when  encountered  during         ignored.   The  assertion  is  obeyed just once when encountered during
6290         matching.         matching.
6291    
6292     Lookahead assertions     Lookahead assertions
# Line 6254  ASSERTIONS Line 6296  ASSERTIONS
6296    
6297           \w+(?=;)           \w+(?=;)
6298    
6299         matches a word followed by a semicolon, but does not include the  semi-         matches  a word followed by a semicolon, but does not include the semi-
6300         colon in the match, and         colon in the match, and
6301    
6302           foo(?!bar)           foo(?!bar)
6303    
6304         matches  any  occurrence  of  "foo" that is not followed by "bar". Note         matches any occurrence of "foo" that is not  followed  by  "bar".  Note
6305         that the apparently similar pattern         that the apparently similar pattern
6306    
6307           (?!foo)bar           (?!foo)bar
6308    
6309         does not find an occurrence of "bar"  that  is  preceded  by  something         does  not  find  an  occurrence  of "bar" that is preceded by something
6310         other  than "foo"; it finds any occurrence of "bar" whatsoever, because         other than "foo"; it finds any occurrence of "bar" whatsoever,  because
6311         the assertion (?!foo) is always true when the next three characters are         the assertion (?!foo) is always true when the next three characters are
6312         "bar". A lookbehind assertion is needed to achieve the other effect.         "bar". A lookbehind assertion is needed to achieve the other effect.
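The difference between these patterns is easy to check by experiment. The sketch below uses Python's re module rather than PCRE; the two engines agree on these simple assertions, but that equivalence is an assumption of the example. The subject string is made up for illustration.

```python
import re

subject = "foobar foobaz bar"

# "foo" that is not followed by "bar": only the "foo" of "foobaz".
foo_not_before_bar = [m.start() for m in re.finditer(r"foo(?!bar)", subject)]

# The assertion is tested where "bar" begins, so it is always true there:
# every "bar" is found.
any_bar = [m.start() for m in re.finditer(r"(?!foo)bar", subject)]

# A lookbehind gives the other effect: "bar" not preceded by "foo".
bar_not_after_foo = [m.start() for m in re.finditer(r"(?<!foo)bar", subject)]
```

Here foo_not_before_bar is [7], any_bar is [3, 14], and bar_not_after_foo is [14].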
6313    
6314         If you want to force a matching failure at some point in a pattern, the         If you want to force a matching failure at some point in a pattern, the
6315         most convenient way to do it is  with  (?!)  because  an  empty  string         most  convenient  way  to  do  it  is with (?!) because an empty string
6316         always  matches, so an assertion that requires there not to be an empty         always matches, so an assertion that requires there not to be an  empty
6317         string must always fail.  The backtracking control verb (*FAIL) or (*F)         string must always fail.  The backtracking control verb (*FAIL) or (*F)
6318         is a synonym for (?!).         is a synonym for (?!).
6319    
6320     Lookbehind assertions     Lookbehind assertions
6321    
6322         Lookbehind  assertions start with (?<= for positive assertions and (?<!         Lookbehind assertions start with (?<= for positive assertions and  (?<!
6323         for negative assertions. For example,         for negative assertions. For example,
6324    
6325           (?<!foo)bar           (?<!foo)bar
6326    
6327         does find an occurrence of "bar" that is not  preceded  by  "foo".  The         does  find  an  occurrence  of "bar" that is not preceded by "foo". The
6328         contents  of  a  lookbehind  assertion are restricted such that all the         contents of a lookbehind assertion are restricted  such  that  all  the
6329         strings it matches must have a fixed length. However, if there are sev-         strings it matches must have a fixed length. However, if there are sev-
6330         eral  top-level  alternatives,  they  do  not all have to have the same         eral top-level alternatives, they do not all  have  to  have  the  same
6331         fixed length. Thus         fixed length. Thus
6332    
6333           (?<=bullock|donkey)           (?<=bullock|donkey)
# Line 6294  ASSERTIONS Line 6336  ASSERTIONS
6336    
6337           (?<!dogs?|cats?)           (?<!dogs?|cats?)
6338    
6339         causes an error at compile time. Branches that match  different  length         causes  an  error at compile time. Branches that match different length
6340         strings  are permitted only at the top level of a lookbehind assertion.         strings are permitted only at the top level of a lookbehind  assertion.
6341         This is an extension compared with Perl, which requires all branches to         This is an extension compared with Perl, which requires all branches to
6342         match the same length of string. An assertion such as         match the same length of string. An assertion such as
6343    
6344           (?<=ab(c|de))           (?<=ab(c|de))
6345    
6346         is  not  permitted,  because  its single top-level branch can match two         is not permitted, because its single top-level  branch  can  match  two
6347         different lengths, but it is acceptable to PCRE if rewritten to use two         different lengths, but it is acceptable to PCRE if rewritten to use two
6348         top-level branches:         top-level branches:
6349    
6350           (?<=abc|abde)           (?<=abc|abde)
6351    
6352         In  some  cases, the escape sequence \K (see above) can be used instead         In some cases, the escape sequence \K (see above) can be  used  instead
6353         of a lookbehind assertion to get round the fixed-length restriction.         of a lookbehind assertion to get round the fixed-length restriction.
6354    
6355         The implementation of lookbehind assertions is, for  each  alternative,         The  implementation  of lookbehind assertions is, for each alternative,
6356         to  temporarily  move the current position back by the fixed length and         to temporarily move the current position back by the fixed  length  and
6357         then try to match. If there are insufficient characters before the cur-         then try to match. If there are insufficient characters before the cur-
6358         rent position, the assertion fails.         rent position, the assertion fails.
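The procedure described above can be sketched for the simple case where every top-level alternative is a literal string. This is an illustration in Python, not PCRE's code; the function name lookbehind and its arguments are hypothetical.

```python
def lookbehind(subject, pos, alternatives):
    # Positive lookbehind at pos, for literal alternatives only.
    for alt in alternatives:
        start = pos - len(alt)  # move back by this alternative's fixed length
        if start < 0:
            continue  # not enough characters before pos: this branch fails
        if subject.startswith(alt, start):
            return True  # the branch matches the text that ends at pos
    return False
```

For example, lookbehind("a donkey", 8, ["bullock", "donkey"]) is true: the first alternative would need seven characters ending at position 8 to be "bullock", and the second finds "donkey" six characters back. Because each alternative carries its own length, the alternatives need not all be the same length.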
6359    
6360         In  a UTF mode, PCRE does not allow the \C escape (which matches a sin-         In a UTF mode, PCRE does not allow the \C escape (which matches a  sin-
6361         gle data unit even in a UTF mode) to appear in  lookbehind  assertions,         gle  data  unit even in a UTF mode) to appear in lookbehind assertions,
6362         because  it  makes it impossible to calculate the length of the lookbe-         because it makes it impossible to calculate the length of  the  lookbe-
6363         hind. The \X and \R escapes, which can match different numbers of  data         hind.  The \X and \R escapes, which can match different numbers of data
6364         units, are also not permitted.         units, are also not permitted.
6365    
6366         "Subroutine"  calls  (see below) such as (?2) or (?&X) are permitted in         "Subroutine" calls (see below) such as (?2) or (?&X) are  permitted  in
6367         lookbehinds, as long as the subpattern matches a  fixed-length  string.         lookbehinds,  as  long as the subpattern matches a fixed-length string.
6368         Recursion, however, is not supported.         Recursion, however, is not supported.
6369    
6370         Possessive  quantifiers  can  be  used  in  conjunction with lookbehind         Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
6371         assertions to specify efficient matching of fixed-length strings at the         assertions to specify efficient matching of fixed-length strings at the
6372         end of subject strings. Consider a simple pattern such as         end of subject strings. Consider a simple pattern such as
6373    
6374           abcd$           abcd$
6375    
6376         when  applied  to  a  long string that does not match. Because matching         when applied to a long string that does  not  match.  Because  matching
6377         proceeds from left to right, PCRE will look for each "a" in the subject         proceeds from left to right, PCRE will look for each "a" in the subject
6378         and  then  see  if what follows matches the rest of the pattern. If the         and then see if what follows matches the rest of the  pattern.  If  the
6379         pattern is specified as         pattern is specified as
6380    
6381           ^.*abcd$           ^.*abcd$
6382    
6383         the initial .* matches the entire string at first, but when this  fails         the  initial .* matches the entire string at first, but when this fails
6384         (because there is no following "a"), it backtracks to match all but the         (because there is no following "a"), it backtracks to match all but the
6385         last character, then all but the last two characters, and so  on.  Once         last  character,  then all but the last two characters, and so on. Once
6386         again  the search for "a" covers the entire string, from right to left,         again the search for "a" covers the entire string, from right to  left,
6387         so we are no better off. However, if the pattern is written as         so we are no better off. However, if the pattern is written as
6388    
6389           ^.*+(?<=abcd)           ^.*+(?<=abcd)
6390    
6391         there can be no backtracking for the .*+ item; it can  match  only  the         there  can  be  no backtracking for the .*+ item; it can match only the
6392         entire  string.  The subsequent lookbehind assertion does a single test         entire string. The subsequent lookbehind assertion does a  single  test
6393         on the last four characters. If it fails, the match fails  immediately.         on  the last four characters. If it fails, the match fails immediately.
6394         For  long  strings, this approach makes a significant difference to the         For long strings, this approach makes a significant difference  to  the
6395         processing time.         processing time.
6396    
6397     Using multiple assertions     Using multiple assertions
# Line 6358  ASSERTIONS Line 6400  ASSERTIONS
6400    
6401           (?<=\d{3})(?<!999)foo           (?<=\d{3})(?<!999)foo
6402    
6403         matches "foo" preceded by three digits that are not "999". Notice  that         matches  "foo" preceded by three digits that are not "999". Notice that
6404         each  of  the  assertions is applied independently at the same point in         each of the assertions is applied independently at the  same  point  in
6405         the subject string. First there is a  check  that  the  previous  three         the  subject  string.  First  there  is a check that the previous three
6406         characters  are  all  digits,  and  then there is a check that the same         characters are all digits, and then there is  a  check  that  the  same
6407         three characters are not "999".  This pattern does not match "foo" pre-         three characters are not "999".  This pattern does not match "foo" pre-
6408         ceded  by  six  characters,  the first of which are digits and the last         ceded by six characters, the first of which are  digits  and  the  last
6409         three of which are not "999". For example, it  doesn't  match  "123abc-         three  of  which  are not "999". For example, it doesn't match "123abc-
6410         foo". A pattern to do that is         foo". A pattern to do that is
6411    
6412           (?<=\d{3}...)(?<!999)foo           (?<=\d{3}...)(?<!999)foo
6413    
6414         This  time  the  first assertion looks at the preceding six characters,         This time the first assertion looks at the  preceding  six  characters,
6415         checking that the first three are digits, and then the second assertion         checking that the first three are digits, and then the second assertion
6416         checks that the preceding three characters are not "999".         checks that the preceding three characters are not "999".
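Both patterns can be tried out as follows. The sketch uses Python's re module, which accepts these fixed-length lookbehinds; treating its behaviour as representative of PCRE is an assumption of the example.

```python
import re

# Both assertions look at the same three preceding characters.
same_three = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first assertion looks back six characters, the second only three.
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")

results = {
    "same_three/123foo": same_three.search("123foo") is not None,
    "same_three/999foo": same_three.search("999foo") is not None,
    "same_three/123abcfoo": same_three.search("123abcfoo") is not None,
    "six_back/123abcfoo": six_back.search("123abcfoo") is not None,
    "six_back/123999foo": six_back.search("123999foo") is not None,
}
```

The first pattern matches "123foo" but neither "999foo" nor "123abcfoo"; the second matches "123abcfoo" but not "123999foo".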
6417    
# Line 6377  ASSERTIONS Line 6419  ASSERTIONS
6419    
6420           (?<=(?<!foo)bar)baz           (?<=(?<!foo)bar)baz
6421    
6422         matches  an occurrence of "baz" that is preceded by "bar" which in turn         matches an occurrence of "baz" that is preceded by "bar" which in  turn
6423         is not preceded by "foo", while         is not preceded by "foo", while
6424    
6425           (?<=\d{3}(?!999)...)foo           (?<=\d{3}(?!999)...)foo
6426    
6427         is another pattern that matches "foo" preceded by three digits and  any         is  another pattern that matches "foo" preceded by three digits and any
6428         three characters that are not "999".         three characters that are not "999".
6429    
6430    
6431  CONDITIONAL SUBPATTERNS  CONDITIONAL SUBPATTERNS
6432    
6433         It  is possible to cause the matching process to obey a subpattern con-         It is possible to cause the matching process to obey a subpattern  con-
6434         ditionally or to choose between two alternative subpatterns,  depending         ditionally  or to choose between two alternative subpatterns, depending
6435         on  the result of an assertion, or whether a specific capturing subpat-         on the result of an assertion, or whether a specific capturing  subpat-
6436         tern has already been matched. The two possible  forms  of  conditional         tern  has  already  been matched. The two possible forms of conditional
6437         subpattern are:         subpattern are:
6438    
6439           (?(condition)yes-pattern)           (?(condition)yes-pattern)
6440           (?(condition)yes-pattern|no-pattern)           (?(condition)yes-pattern|no-pattern)
6441    
6442         If  the  condition is satisfied, the yes-pattern is used; otherwise the         If the condition is satisfied, the yes-pattern is used;  otherwise  the
6443         no-pattern (if present) is used. If there are more  than  two  alterna-         no-pattern  (if  present)  is used. If there are more than two alterna-
6444         tives  in  the subpattern, a compile-time error occurs. Each of the two         tives in the subpattern, a compile-time error occurs. Each of  the  two
6445         alternatives may itself contain nested subpatterns of any form, includ-         alternatives may itself contain nested subpatterns of any form, includ-
6446         ing  conditional  subpatterns;  the  restriction  to  two  alternatives         ing  conditional  subpatterns;  the  restriction  to  two  alternatives
6447         applies only at the level of the condition. This pattern fragment is an         applies only at the level of the condition. This pattern fragment is an
# Line 6408  CONDITIONAL SUBPATTERNS Line 6450  CONDITIONAL SUBPATTERNS
6450           (?(1) (A|B|C) | (D | (?(2)E|F) | E) )           (?(1) (A|B|C) | (D | (?(2)E|F) | E) )
6451    
6452    
6453         There  are  four  kinds of condition: references to subpatterns, refer-         There are four kinds of condition: references  to  subpatterns,  refer-
6454         ences to recursion, a pseudo-condition called DEFINE, and assertions.         ences to recursion, a pseudo-condition called DEFINE, and assertions.
6455    
6456     Checking for a used subpattern by number     Checking for a used subpattern by number
6457    
6458         If the text between the parentheses consists of a sequence  of  digits,         If  the  text between the parentheses consists of a sequence of digits,
6459         the condition is true if a capturing subpattern of that number has pre-         the condition is true if a capturing subpattern of that number has pre-
6460         viously matched. If there is more than one  capturing  subpattern  with         viously  matched.  If  there is more than one capturing subpattern with
6461         the  same  number  (see  the earlier section about duplicate subpattern         the same number (see the earlier  section  about  duplicate  subpattern
6462         numbers), the condition is true if any of them have matched. An  alter-         numbers),  the condition is true if any of them have matched. An alter-
6463         native  notation is to precede the digits with a plus or minus sign. In         native notation is to precede the digits with a plus or minus sign.  In
6464         this case, the subpattern number is relative rather than absolute.  The         this  case, the subpattern number is relative rather than absolute. The
6465         most  recently opened parentheses can be referenced by (?(-1), the next         most recently opened parentheses can be referenced by (?(-1), the  next
6466         most recent by (?(-2), and so on. Inside loops it can also  make  sense         most  recent  by (?(-2), and so on. Inside loops it can also make sense
6467         to refer to subsequent groups. The next parentheses to be opened can be         to refer to subsequent groups. The next parentheses to be opened can be
6468         referenced as (?(+1), and so on. (The value zero in any of these  forms         referenced  as (?(+1), and so on. (The value zero in any of these forms
6469         is not used; it provokes a compile-time error.)         is not used; it provokes a compile-time error.)
6470    
6471         Consider  the  following  pattern, which contains non-significant white         Consider the following pattern, which  contains  non-significant  white
6472         space to make it more readable (assume the PCRE_EXTENDED option) and to         space to make it more readable (assume the PCRE_EXTENDED option) and to
6473         divide it into three parts for ease of discussion:         divide it into three parts for ease of discussion:
6474    
6475           ( \( )?    [^()]+    (?(1) \) )           ( \( )?    [^()]+    (?(1) \) )
6476    
6477         The  first  part  matches  an optional opening parenthesis, and if that         The first part matches an optional opening  parenthesis,  and  if  that
6478         character is present, sets it as the first captured substring. The sec-         character is present, sets it as the first captured substring. The sec-
6479         ond  part  matches one or more characters that are not parentheses. The         ond part matches one or more characters that are not  parentheses.  The
6480         third part is a conditional subpattern that tests whether  or  not  the         third  part  is  a conditional subpattern that tests whether or not the
6481         first set of parentheses matched. If they did, that is, if the  subject         first set of parentheses matched. If they did, that is, if the  subject
6482         started with an opening parenthesis, the condition is true, and so  the         started  with an opening parenthesis, the condition is true, and so the
6483         yes-pattern  is  executed and a closing parenthesis is required. Other-         yes-pattern is executed and a closing parenthesis is  required.  Other-
6484         wise, since no-pattern is not present, the subpattern matches  nothing.         wise,  since no-pattern is not present, the subpattern matches nothing.
6485         In  other  words,  this  pattern matches a sequence of non-parentheses,         In other words, this pattern matches  a  sequence  of  non-parentheses,
6486         optionally enclosed in parentheses.         optionally enclosed in parentheses.
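
As a rough illustration only, the same conditional can be tried with Python's standard re module, which accepts the (?(1)...) form. This is Python's engine rather than PCRE, and re.VERBOSE is assumed here to stand in for PCRE_EXTENDED. The names PAREN_OPT and is_optionally_parenthesized are hypothetical.

```python
import re

# re.VERBOSE is used in place of PCRE_EXTENDED: white space in the pattern is ignored.
# Assumption: Python's re module, not PCRE, is doing the matching here.
PAREN_OPT = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)


def is_optionally_parenthesized(text):
    """Return True if text is a run of non-parentheses, optionally wrapped in ( )."""
    return PAREN_OPT.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["abc", "(abc)", "(abc", "abc)"]:
        print(repr(sample), is_optionally_parenthesized(sample))
```

The closing parenthesis is demanded only when group 1 took part in the match, so an unbalanced subject such as "(abc" is rejected as a whole.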
6487    
6488         If you were embedding this pattern in a larger one,  you  could  use  a         If  you  were  embedding  this pattern in a larger one, you could use a
6489         relative reference:         relative reference:
6490    
6491           ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...           ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
6492    
6493         This  makes  the  fragment independent of the parentheses in the larger         This makes the fragment independent of the parentheses  in  the  larger
6494         pattern.         pattern.
6495    
6496     Checking for a used subpattern by name     Checking for a used subpattern by name
6497    
6498         Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a         Perl  uses  the  syntax  (?(<name>)...) or (?('name')...) to test for a
6499         used  subpattern  by  name.  For compatibility with earlier versions of         used subpattern by name. For compatibility  with  earlier  versions  of
6500         PCRE, which had this facility before Perl, the syntax  (?(name)...)  is         PCRE,  which  had this facility before Perl, the syntax (?(name)...) is
6501         also  recognized. However, there is a possible ambiguity with this syn-         also recognized. However, there is a possible ambiguity with this  syn-
6502         tax, because subpattern names may  consist  entirely  of  digits.  PCRE         tax,  because  subpattern  names  may  consist entirely of digits. PCRE
6503         looks  first for a named subpattern; if it cannot find one and the name         looks first for a named subpattern; if it cannot find one and the  name
6504         consists entirely of digits, PCRE looks for a subpattern of  that  num-         consists  entirely  of digits, PCRE looks for a subpattern of that num-
6505         ber,  which must be greater than zero. Using subpattern names that con-         ber, which must be greater than zero. Using subpattern names that  con-
6506         sist entirely of digits is not recommended.         sist entirely of digits is not recommended.
6507    
6508         Rewriting the above example to use a named subpattern gives this:         Rewriting the above example to use a named subpattern gives this:
6509    
6510           (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )           (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )
6511    
6512         If the name used in a condition of this kind is a duplicate,  the  test         If  the  name used in a condition of this kind is a duplicate, the test
6513         is  applied to all subpatterns of the same name, and is true if any one         is applied to all subpatterns of the same name, and is true if any  one
6514         of them has matched.         of them has matched.
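
A hedged sketch of the named form, again using Python's re module as a stand-in for PCRE. Python spells the group (?P<OPEN>...) and accepts only the bare (?(OPEN)...) condition, which corresponds to the older PCRE syntax mentioned above. The helper name opened_with_paren is hypothetical.

```python
import re

# Assumption: Python's re module; it does not accept the (?(<OPEN>)...) spelling,
# so the bare-name condition (?(OPEN)...) is used instead.
NAMED = re.compile(r"(?P<OPEN> \( )?    [^()]+    (?(OPEN) \) )", re.VERBOSE)


def opened_with_paren(text):
    """Return True/False for whether OPEN matched, or None if text does not match."""
    m = NAMED.fullmatch(text)
    if m is None:
        return None
    return m.group("OPEN") is not None


if __name__ == "__main__":
    for sample in ["(x y)", "x y", "(x y"]:
        print(repr(sample), opened_with_paren(sample))
```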
6515    
6516     Checking for pattern recursion     Checking for pattern recursion
6517    
6518         If the condition is the string (R), and there is no subpattern with the         If the condition is the string (R), and there is no subpattern with the
6519         name  R, the condition is true if a recursive call to the whole pattern         name R, the condition is true if a recursive call to the whole  pattern
6520         or any subpattern has been made. If digits or a name preceded by amper-         or any subpattern has been made. If digits or a name preceded by amper-
6521         sand follow the letter R, for example:         sand follow the letter R, for example:
6522    
# Line 6482  CONDITIONAL SUBPATTERNS Line 6524  CONDITIONAL SUBPATTERNS
6524    
6525         the condition is true if the most recent recursion is into a subpattern         the condition is true if the most recent recursion is into a subpattern
6526         whose number or name is given. This condition does not check the entire         whose number or name is given. This condition does not check the entire
6527         recursion  stack.  If  the  name  used in a condition of this kind is a         recursion stack. If the name used in a condition  of  this  kind  is  a
6528         duplicate, the test is applied to all subpatterns of the same name, and         duplicate, the test is applied to all subpatterns of the same name, and
6529         is true if any one of them is the most recent recursion.         is true if any one of them is the most recent recursion.
6530    
6531         At  "top  level",  all  these recursion test conditions are false.  The         At "top level", all these recursion test  conditions  are  false.   The
6532         syntax for recursive patterns is described below.         syntax for recursive patterns is described below.
6533    
6534     Defining subpatterns for use by reference only     Defining subpatterns for use by reference only
6535    
6536         If the condition is the string (DEFINE), and  there  is  no  subpattern         If  the  condition  is  the string (DEFINE), and there is no subpattern
6537         with  the  name  DEFINE,  the  condition is always false. In this case,         with the name DEFINE, the condition is  always  false.  In  this  case,
6538         there may be only one alternative  in  the  subpattern.  It  is  always         there  may  be  only  one  alternative  in the subpattern. It is always
6539         skipped  if  control  reaches  this  point  in the pattern; the idea of         skipped if control reaches this point  in  the  pattern;  the  idea  of
6540         DEFINE is that it can be used to define subroutines that can be  refer-         DEFINE  is that it can be used to define subroutines that can be refer-
6541         enced  from elsewhere. (The use of subroutines is described below.) For         enced from elsewhere. (The use of subroutines is described below.)  For
6542         example, a pattern to match an IPv4 address  such  as  "192.168.23.245"         example,  a  pattern  to match an IPv4 address such as "192.168.23.245"
6543         could be written like this (ignore white space and line breaks):         could be written like this (ignore white space and line breaks):
6544    
6545           (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )           (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
6546           \b (?&byte) (\.(?&byte)){3} \b           \b (?&byte) (\.(?&byte)){3} \b
6547    
6548         The  first  part  of the pattern is a DEFINE group inside which another         The  first  part  of the pattern is a DEFINE group inside which another
6549         group named "byte" is defined. This matches an individual component  of         group  named "byte" is defined. This matches an individual component of
6550         an  IPv4  address  (a number less than 256). When matching takes place,         an IPv4 address (a number less than 256). When  matching  takes  place,
6551         this part of the pattern is skipped because DEFINE acts  like  a  false         this  part  of  the pattern is skipped because DEFINE acts like a false
6552         condition.  The  rest of the pattern uses references to the named group         condition. The rest of the pattern uses references to the  named  group
6553         to match the four dot-separated components of an IPv4 address,  insist-         to  match the four dot-separated components of an IPv4 address, insist-
6554         ing on a word boundary at each end.         ing on a word boundary at each end.
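
For engines without DEFINE or subroutine calls, a similar effect can be approximated by defining the component once as a string and interpolating it. The sketch below does that with Python's re module. It is textual reuse, not a PCRE subroutine call, and the names BYTE and IPV4 are illustrative.

```python
import re

# The "byte" component is written once, mirroring the body of the DEFINE group.
# Assumption: textual interpolation replaces the (?&byte) calls of the PCRE pattern.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# re.ASCII restricts \d and \b to ASCII so that only 0-9 count as digits.
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b", re.ASCII)

if __name__ == "__main__":
    for sample in ["192.168.23.245", "256.1.1.1", "host 10.0.0.1 up"]:
        found = IPV4.search(sample)
        print(repr(sample), found.group() if found else None)
```

Unlike DEFINE, this copies the component into the pattern four times, so the compiled pattern grows with each use.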
6555    
6556     Assertion conditions     Assertion conditions
6557    
6558         If  the  condition  is  not  in any of the above formats, it must be an         If the condition is not in any of the above  formats,  it  must  be  an
6559         assertion.  This may be a positive or negative lookahead or  lookbehind         assertion.   This may be a positive or negative lookahead or lookbehind
6560         assertion.  Consider  this  pattern,  again  containing non-significant         assertion. Consider  this  pattern,  again  containing  non-significant
6561         white space, and with the two alternatives on the second line:         white space, and with the two alternatives on the second line:
6562    
6563           (?(?=[^a-z]*[a-z])           (?(?=[^a-z]*[a-z])
6564           \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )           \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
6565    
6566         The condition  is  a  positive  lookahead  assertion  that  matches  an         The  condition  is  a  positive  lookahead  assertion  that  matches an
6567         optional  sequence of non-letters followed by a letter. In other words,         optional sequence of non-letters followed by a letter. In other  words,
6568         it tests for the presence of at least one letter in the subject.  If  a         it  tests  for the presence of at least one letter in the subject. If a
6569         letter  is found, the subject is matched against the first alternative;         letter is found, the subject is matched against the first  alternative;
6570         otherwise it is  matched  against  the  second.  This  pattern  matches         otherwise  it  is  matched  against  the  second.  This pattern matches
6571         strings  in  one  of the two forms dd-aaa-dd or dd-dd-dd, where aaa are         strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
6572         letters and dd are digits.         letters and dd are digits.
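
Python's re module has no assertion conditions, but the same branching can be sketched by guarding each alternative with a lookahead: a positive one for the yes-pattern and its negation for the no-pattern. This is an equivalent rewriting for illustration, not the PCRE syntax itself, and the name DATE_LIKE is hypothetical.

```python
import re

# (?(?=C) A | B) is rewritten as (?=C) A | (?!C) B.
# Assumption: Python's re module; re.VERBOSE permits the layout and comments.
DATE_LIKE = re.compile(
    r"""
    (?= [^a-z]*[a-z] ) \d{2}-[a-z]{3}-\d{2}    # a letter lies ahead: dd-aaa-dd
    |
    (?! [^a-z]*[a-z] ) \d{2}-\d{2}-\d{2}       # no letter ahead: dd-dd-dd
    """,
    re.VERBOSE | re.ASCII,
)

if __name__ == "__main__":
    for sample in ["12-abc-34", "12-34-56", "12-ab-34"]:
        print(repr(sample), DATE_LIKE.fullmatch(sample) is not None)
```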
6573    
6574    
# Line 6535  COMMENTS Line 6577  COMMENTS
6577         There are two ways of including comments in patterns that are processed         There are two ways of including comments in patterns that are processed
6578         by PCRE. In both cases, the start of the comment must not be in a char-         by PCRE. In both cases, the start of the comment must not be in a char-
6579         acter class, nor in the middle of any other sequence of related charac-         acter class, nor in the middle of any other sequence of related charac-
6580         ters  such  as  (?: or a subpattern name or number. The characters that         ters such as (?: or a subpattern name or number.  The  characters  that
6581         make up a comment play no part in the pattern matching.         make up a comment play no part in the pattern matching.
6582    
6583         The sequence (?# marks the start of a comment that continues up to  the         The  sequence (?# marks the start of a comment that continues up to the
6584         next  closing parenthesis. Nested parentheses are not permitted. If the         next closing parenthesis. Nested parentheses are not permitted. If  the
6585         PCRE_EXTENDED option is set, an unescaped # character also introduces a         PCRE_EXTENDED option is set, an unescaped # character also introduces a
6586         comment,  which  in  this  case continues to immediately after the next         comment, which in this case continues to  immediately  after  the  next
6587         newline character or character sequence in the pattern.  Which  charac-         newline  character  or character sequence in the pattern. Which charac-
6588         ters are interpreted as newlines is controlled by the options passed to         ters are interpreted as newlines is controlled by the options passed to
6589         a compiling function or by a special sequence at the start of the  pat-         a  compiling function or by a special sequence at the start of the pat-
6590         tern, as described in the section entitled "Newline conventions" above.         tern, as described in the section entitled "Newline conventions" above.
6591         Note that the end of this type of comment is a literal newline sequence         Note that the end of this type of comment is a literal newline sequence
6592         in  the pattern; escape sequences that happen to represent a newline do         in the pattern; escape sequences that happen to represent a newline  do
6593         not count. For example, consider this  pattern  when  PCRE_EXTENDED  is         not  count.  For  example,  consider this pattern when PCRE_EXTENDED is
6594         set, and the default newline convention is in force:         set, and the default newline convention is in force:
6595    
6596           abc #comment \n still comment           abc #comment \n still comment
6597    
6598         On  encountering  the  # character, pcre_compile() skips along, looking         On encountering the # character, pcre_compile()  skips  along,  looking
6599         for a newline in the pattern. The sequence \n is still literal at  this         for  a newline in the pattern. The sequence \n is still literal at this
6600         stage,  so  it does not terminate the comment. Only an actual character         stage, so it does not terminate the comment. Only an  actual  character
6601         with the code value 0x0a (the default newline) does so.         with the code value 0x0a (the default newline) does so.
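
Python's re.VERBOSE mode treats # comments in a comparable way, so it can serve as a small experiment: an escape sequence \n inside the pattern text does not end the comment, whereas a real newline character does. This sketch uses Python's re module, not pcre_compile(), and the variable names are illustrative.

```python
import re

# Raw string: the pattern holds a backslash followed by "n", not a newline,
# so the comment runs to the end and only "abc" remains.
escaped = re.compile(r"abc #comment \n still comment", re.VERBOSE)

# Ordinary string: "\n" is a real newline character (0x0a), which ends the
# comment, so "def" is part of the pattern again.
literal = re.compile("abc #comment \n def", re.VERBOSE)

if __name__ == "__main__":
    print(escaped.fullmatch("abc") is not None)
    print(literal.fullmatch("abcdef") is not None)
```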
6602    
6603    
6604  RECURSIVE PATTERNS  RECURSIVE PATTERNS
6605    
6606         Consider the problem of matching a string in parentheses, allowing  for         Consider  the problem of matching a string in parentheses, allowing for
6607         unlimited  nested  parentheses.  Without the use of recursion, the best         unlimited nested parentheses. Without the use of  recursion,  the  best
6608         that can be done is to use a pattern that  matches  up  to  some  fixed         that  can  be  done  is  to use a pattern that matches up to some fixed
6609         depth  of  nesting.  It  is not possible to handle an arbitrary nesting         depth of nesting. It is not possible to  handle  an  arbitrary  nesting
6610         depth.         depth.
6611    
6612         For some time, Perl has provided a facility that allows regular expres-         For some time, Perl has provided a facility that allows regular expres-
6613         sions  to recurse (amongst other things). It does this by interpolating         sions to recurse (amongst other things). It does this by  interpolating
6614         Perl code in the expression at run time, and the code can refer to  the         Perl  code in the expression at run time, and the code can refer to the
6615         expression itself. A Perl pattern using code interpolation to solve the         expression itself. A Perl pattern using code interpolation to solve the
6616         parentheses problem can be created like this:         parentheses problem can be created like this:
6617    
# Line 6579  RECURSIVE PATTERNS Line 6621  RECURSIVE PATTERNS
6621         refers recursively to the pattern in which it appears.         refers recursively to the pattern in which it appears.
6622    
6623         Obviously, PCRE cannot support the interpolation of Perl code. Instead,         Obviously, PCRE cannot support the interpolation of Perl code. Instead,
6624         it supports special syntax for recursion of  the  entire  pattern,  and         it  supports  special  syntax  for recursion of the entire pattern, and
6625         also  for  individual  subpattern  recursion. After its introduction in         also for individual subpattern recursion.  After  its  introduction  in
6626         PCRE and Python, this kind of  recursion  was  subsequently  introduced         PCRE  and  Python,  this  kind of recursion was subsequently introduced
6627         into Perl at release 5.10.         into Perl at release 5.10.
6628    
6629         A  special  item  that consists of (? followed by a number greater than         A special item that consists of (? followed by a  number  greater  than
6630         zero and a closing parenthesis is a recursive subroutine  call  of  the         zero  and  a  closing parenthesis is a recursive subroutine call of the
6631         subpattern  of  the  given  number, provided that it occurs inside that         subpattern of the given number, provided that  it  occurs  inside  that
6632         subpattern. (If not, it is a non-recursive subroutine  call,  which  is         subpattern.  (If  not,  it is a non-recursive subroutine call, which is
6633         described  in  the  next  section.)  The special item (?R) or (?0) is a         described in the next section.) The special item  (?R)  or  (?0)  is  a
6634         recursive call of the entire regular expression.         recursive call of the entire regular expression.
6635    
6636         This PCRE pattern solves the nested  parentheses  problem  (assume  the         This  PCRE  pattern  solves  the nested parentheses problem (assume the
6637         PCRE_EXTENDED option is set so that white space is ignored):         PCRE_EXTENDED option is set so that white space is ignored):
6638    
6639           \( ( [^()]++ | (?R) )* \)           \( ( [^()]++ | (?R) )* \)
6640    
6641         First  it matches an opening parenthesis. Then it matches any number of         First it matches an opening parenthesis. Then it matches any number  of
6642         substrings which can either be a  sequence  of  non-parentheses,  or  a         substrings  which  can  either  be  a sequence of non-parentheses, or a
6643         recursive  match  of the pattern itself (that is, a correctly parenthe-         recursive match of the pattern itself (that is, a  correctly  parenthe-
6644         sized substring).  Finally there is a closing parenthesis. Note the use         sized substring).  Finally there is a closing parenthesis. Note the use
6645         of a possessive quantifier to avoid backtracking into sequences of non-         of a possessive quantifier to avoid backtracking into sequences of non-
6646         parentheses.         parentheses.
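
Python's re module has no (?R), so the recursive structure of the pattern is sketched below as a hand-written recursive function instead. It is meant only to show how the recursion mirrors the nesting and is not how PCRE implements matching. The name match_parens is hypothetical.

```python
def match_parens(s, pos=0):
    """Return the index just past a balanced (...) group starting at s[pos], else None.

    Mirrors  \\( ( [^()]++ | (?R) )* \\)  : an opening parenthesis, then any mix of
    non-parenthesis characters and nested groups, then a closing parenthesis.
    """
    if pos >= len(s) or s[pos] != "(":
        return None
    pos += 1
    while pos < len(s):
        ch = s[pos]
        if ch == ")":
            return pos + 1              # the closing parenthesis ends this level
        if ch == "(":
            end = match_parens(s, pos)  # the recursive call, like (?R)
            if end is None:
                return None
            pos = end
        else:
            pos += 1                    # a non-parenthesis character
    return None                         # ran out of input before the closing ")"


if __name__ == "__main__":
    for sample in ["(ab(cd)ef)", "(a(b)", "()"]:
        print(repr(sample), match_parens(sample))
```

Because each character is consumed exactly once, this function never revisits earlier choices, which is the same effect the possessive quantifier has on runs of non-parentheses in the pattern.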
6647    
6648         If this were part of a larger pattern, you would not  want  to  recurse         If  this  were  part of a larger pattern, you would not want to recurse
6649         the entire pattern, so instead you could use this:         the entire pattern, so instead you could use this:
6650    
6651           ( \( ( [^()]++ | (?1) )* \) )           ( \( ( [^()]++ | (?1) )* \) )
6652    
6653         We  have  put the pattern into parentheses, and caused the recursion to         We have put the pattern into parentheses, and caused the  recursion  to
6654         refer to them instead of the whole pattern.         refer to them instead of the whole pattern.
6655    
6656         In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be         In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
6657         tricky.  This is made easier by the use of relative references. Instead         tricky. This is made easier by the use of relative references.  Instead
6658         of (?1) in the pattern above you can write (?-2) to refer to the second         of (?1) in the pattern above you can write (?-2) to refer to the second
6659         most  recently  opened  parentheses  preceding  the recursion. In other         most recently opened parentheses  preceding  the  recursion.  In  other
6660         words, a negative number counts capturing  parentheses  leftwards  from         words,  a  negative  number counts capturing parentheses leftwards from
6661         the point at which it is encountered.         the point at which it is encountered.
6662    
6663         It  is  also  possible  to refer to subsequently opened parentheses, by         It is also possible to refer to  subsequently  opened  parentheses,  by
6664         writing references such as (?+2). However, these  cannot  be  recursive         writing  references  such  as (?+2). However, these cannot be recursive
6665         because  the  reference  is  not inside the parentheses that are refer-         because the reference is not inside the  parentheses  that  are  refer-
6666         enced. They are always non-recursive subroutine calls, as described  in         enced.  They are always non-recursive subroutine calls, as described in
6667         the next section.         the next section.
6668    
6669         An  alternative  approach is to use named parentheses instead. The Perl         An alternative approach is to use named parentheses instead.  The  Perl
6670         syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also         syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also
6671         supported. We could rewrite the above example as follows:         supported. We could rewrite the above example as follows:
6672    
6673           (?<pn> \( ( [^()]++ | (?&pn) )* \) )           (?<pn> \( ( [^()]++ | (?&pn) )* \) )
6674    
6675         If  there  is more than one subpattern with the same name, the earliest         If there is more than one subpattern with the same name,  the  earliest
6676         one is used.         one is used.
6677    
6678         This particular example pattern that we have been looking  at  contains         This  particular  example pattern that we have been looking at contains
6679         nested unlimited repeats, and so the use of a possessive quantifier for         nested unlimited repeats, and so the use of a possessive quantifier for
6680         matching strings of non-parentheses is important when applying the pat-         matching strings of non-parentheses is important when applying the pat-
6681         tern  to  strings  that do not match. For example, when this pattern is         tern to strings that do not match. For example, when  this  pattern  is
6682         applied to         applied to
6683    
6684           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
6685    
6686         it yields "no match" quickly. However, if a  possessive  quantifier  is         it  yields  "no  match" quickly. However, if a possessive quantifier is
6687         not  used, the match runs for a very long time indeed because there are         not used, the match runs for a very long time indeed because there  are
6688         so many different ways the + and * repeats can carve  up  the  subject,         so  many  different  ways the + and * repeats can carve up the subject,
6689         and all have to be tested before failure can be reported.         and all have to be tested before failure can be reported.
6690    
6691         At  the  end  of a match, the values of capturing parentheses are those         At the end of a match, the values of capturing  parentheses  are  those
6692         from the outermost level. If you want to obtain intermediate values,  a         from  the outermost level. If you want to obtain intermediate values, a
6693         callout  function can be used (see below and the pcrecallout documenta-         callout function can be used (see below and the pcrecallout  documenta-
6694         tion). If the pattern above is matched against         tion). If the pattern above is matched against
6695    
6696           (ab(cd)ef)           (ab(cd)ef)
6697    
6698         the value for the inner capturing parentheses  (numbered  2)  is  "ef",         the  value  for  the  inner capturing parentheses (numbered 2) is "ef",
6699         which  is the last value taken on at the top level. If a capturing sub-         which is the last value taken on at the top level. If a capturing  sub-
6700         pattern is not matched at the top level, its final  captured  value  is         pattern  is  not  matched at the top level, its final captured value is
6701         unset,  even  if  it was (temporarily) set at a deeper level during the         unset, even if it was (temporarily) set at a deeper  level  during  the
6702         matching process.         matching process.
6703    
6704         If there are more than 15 capturing parentheses in a pattern, PCRE  has         If  there are more than 15 capturing parentheses in a pattern, PCRE has
6705         to  obtain extra memory to store data during a recursion, which it does         to obtain extra memory to store data during a recursion, which it  does
6706         by using pcre_malloc, freeing it via pcre_free afterwards. If no memory         by using pcre_malloc, freeing it via pcre_free afterwards. If no memory
6707         can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.         can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
6708    
6709         Do  not  confuse  the (?R) item with the condition (R), which tests for         Do not confuse the (?R) item with the condition (R),  which  tests  for
6710         recursion.  Consider this pattern, which matches text in  angle  brack-         recursion.   Consider  this pattern, which matches text in angle brack-
6711         ets,  allowing for arbitrary nesting. Only digits are allowed in nested         ets, allowing for arbitrary nesting. Only digits are allowed in  nested
6712         brackets (that is, when recursing), whereas any characters are  permit-         brackets  (that is, when recursing), whereas any characters are permit-
6713         ted at the outer level.         ted at the outer level.
6714    
6715           < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >           < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
6716    
6717         In  this  pattern, (?(R) is the start of a conditional subpattern, with         In this pattern, (?(R) is the start of a conditional  subpattern,  with
6718         two different alternatives for the recursive and  non-recursive  cases.         two  different  alternatives for the recursive and non-recursive cases.
6719         The (?R) item is the actual recursive call.         The (?R) item is the actual recursive call.
6720    
6721     Differences in recursion processing between PCRE and Perl     Differences in recursion processing between PCRE and Perl
6722    
6723         Recursion  processing  in PCRE differs from Perl in two important ways.         Recursion processing in PCRE differs from Perl in two  important  ways.
6724         In PCRE (like Python, but unlike Perl), a recursive subpattern call  is         In  PCRE (like Python, but unlike Perl), a recursive subpattern call is
6725         always treated as an atomic group. That is, once it has matched some of         always treated as an atomic group. That is, once it has matched some of
6726         the subject string, it is never re-entered, even if it contains untried         the subject string, it is never re-entered, even if it contains untried
6727         alternatives  and  there  is a subsequent matching failure. This can be         alternatives and there is a subsequent matching failure.  This  can  be
6728         illustrated by the following pattern, which purports to match a  palin-         illustrated  by the following pattern, which purports to match a palin-
6729         dromic  string  that contains an odd number of characters (for example,         dromic string that contains an odd number of characters  (for  example,
6730         "a", "aba", "abcba", "abcdcba"):         "a", "aba", "abcba", "abcdcba"):
6731    
6732           ^(.|(.)(?1)\2)$           ^(.|(.)(?1)\2)$
6733    
6734         The idea is that it either matches a single character, or two identical         The idea is that it either matches a single character, or two identical
6735         characters  surrounding  a sub-palindrome. In Perl, this pattern works;         characters surrounding a sub-palindrome. In Perl, this  pattern  works;
6736         in PCRE it does not if the subject is  longer  than  three  characters.         in  PCRE  it  does  not if the subject is longer than three characters.
6737         Consider the subject string "abcba":         Consider the subject string "abcba":
6738    
6739         At  the  top level, the first character is matched, but as it is not at         At the top level, the first character is matched, but as it is  not  at
6740         the end of the string, the first alternative fails; the second alterna-         the end of the string, the first alternative fails; the second alterna-
6741         tive is taken and the recursion kicks in. The recursive call to subpat-         tive is taken and the recursion kicks in. The recursive call to subpat-
6742         tern 1 successfully matches the next character ("b").  (Note  that  the         tern  1  successfully  matches the next character ("b"). (Note that the
6743         beginning and end of line tests are not part of the recursion.)         beginning and end of line tests are not part of the recursion.)
6744    
6745         Back  at  the top level, the next character ("c") is compared with what         Back at the top level, the next character ("c") is compared  with  what
6746         subpattern 2 matched, which was "a". This fails. Because the  recursion         subpattern  2 matched, which was "a". This fails. Because the recursion
6747         is  treated  as  an atomic group, there are now no backtracking points,         is treated as an atomic group, there are now  no  backtracking  points,
6748         and so the entire match fails. (Perl is able, at  this  point,  to  re-         and  so  the  entire  match fails. (Perl is able, at this point, to re-
6749         enter  the  recursion  and try the second alternative.) However, if the         enter the recursion and try the second alternative.)  However,  if  the
6750         pattern is written with the alternatives in the other order, things are         pattern is written with the alternatives in the other order, things are
6751         different:         different:
6752    
6753           ^((.)(?1)\2|.)$           ^((.)(?1)\2|.)$
6754    
6755         This  time,  the recursing alternative is tried first, and continues to         This time, the recursing alternative is tried first, and  continues  to
6756         recurse until it runs out of characters, at which point  the  recursion         recurse  until  it runs out of characters, at which point the recursion
6757         fails.  But  this  time  we  do  have another alternative to try at the         fails. But this time we do have  another  alternative  to  try  at  the
6758         higher level. That is the big difference:  in  the  previous  case  the         higher  level.  That  is  the  big difference: in the previous case the
6759         remaining alternative is at a deeper recursion level, which PCRE cannot         remaining alternative is at a deeper recursion level, which PCRE cannot
6760         use.         use.
6761    
6762         To change the pattern so that it matches all palindromic  strings,  not         To  change  the pattern so that it matches all palindromic strings, not
6763         just  those  with an odd number of characters, it is tempting to change         just those with an odd number of characters, it is tempting  to  change
6764         the pattern to this:         the pattern to this:
6765    
6766           ^((.)(?1)\2|.?)$           ^((.)(?1)\2|.?)$
6767    
6768         Again, this works in Perl, but not in PCRE, and for  the  same  reason.         Again,  this  works  in Perl, but not in PCRE, and for the same reason.
6769         When  a  deeper  recursion has matched a single character, it cannot be         When a deeper recursion has matched a single character,  it  cannot  be
6770         entered again in order to match an empty string.  The  solution  is  to         entered  again  in  order  to match an empty string. The solution is to
6771         separate  the two cases, and write out the odd and even cases as alter-         separate the two cases, and write out the odd and even cases as  alter-
6772         natives at the higher level:         natives at the higher level:
6773    
6774           ^(?:((.)(?1)\2|)|((.)(?3)\4|.))           ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
6775    
6776         If you want to match typical palindromic phrases, the  pattern  has  to         If  you  want  to match typical palindromic phrases, the pattern has to
6777         ignore all non-word characters, which can be done like this:         ignore all non-word characters, which can be done like this:
6778    
6779           ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$           ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
6780    
6781         If run with the PCRE_CASELESS option, this pattern matches phrases such         If run with the PCRE_CASELESS option, this pattern matches phrases such
6782         as "A man, a plan, a canal: Panama!" and it works well in both PCRE and         as "A man, a plan, a canal: Panama!" and it works well in both PCRE and
6783         Perl.  Note the use of the possessive quantifier *+ to avoid backtrack-         Perl. Note the use of the possessive quantifier *+ to avoid  backtrack-
6784         ing into sequences of non-word characters. Without this, PCRE  takes  a         ing  into  sequences of non-word characters. Without this, PCRE takes a
6785         great  deal  longer  (ten  times or more) to match typical phrases, and         great deal longer (ten times or more) to  match  typical  phrases,  and
6786         Perl takes so long that you think it has gone into a loop.         Perl takes so long that you think it has gone into a loop.
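
As a cross-check on the phrase-matching pattern above, the same test can be written procedurally. The following Python sketch is not part of PCRE: the function name is hypothetical, and it assumes that a "word character" is whatever str.isalnum() accepts plus the underscore, which only approximates \w.

```python
def is_palindromic_phrase(text):
    """Return True if text reads the same both ways, ignoring case
    and non-word characters (hypothetical helper, not a PCRE API)."""
    # Keep only word characters and fold case. This plays the role of
    # the \W*+ skipping and the PCRE_CASELESS option in the pattern.
    chars = [c.lower() for c in text if c.isalnum() or c == "_"]
    return chars == chars[::-1]


print(is_palindromic_phrase("A man, a plan, a canal: Panama!"))
print(is_palindromic_phrase("A man, a plan, a canal: Suez!"))
```

A check of this kind is useful for validating the results of the regular expression on sample phrases; it does not model PCRE's recursion or backtracking behaviour.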
6787    
6788         WARNING: The palindrome-matching patterns above work only if  the  sub-         WARNING:  The  palindrome-matching patterns above work only if the sub-
6789         ject  string  does not start with a palindrome that is shorter than the         ject string does not start with a palindrome that is shorter  than  the
6790         entire string.  For example, although "abcba" is correctly matched,  if         entire  string.  For example, although "abcba" is correctly matched, if
6791         the  subject  is "ababa", PCRE finds the palindrome "aba" at the start,         the subject is "ababa", PCRE finds the palindrome "aba" at  the  start,
6792         then fails at top level because the end of the string does not  follow.         then  fails at top level because the end of the string does not follow.
6793         Once  again, it cannot jump back into the recursion to try other alter-         Once again, it cannot jump back into the recursion to try other  alter-
6794         natives, so the entire match fails.         natives, so the entire match fails.
6795    
6796         The second way in which PCRE and Perl differ in  their  recursion  pro-         The  second  way  in which PCRE and Perl differ in their recursion pro-
6797         cessing  is in the handling of captured values. In Perl, when a subpat-         cessing is in the handling of captured values. In Perl, when a  subpat-
6798         tern is called recursively or as a subroutine (see the  next  section),         tern  is  called recursively or as a subroutine (see the next section),
6799         it  has  no  access to any values that were captured outside the recur-         it has no access to any values that were captured  outside  the  recur-
6800         sion, whereas in PCRE these values can  be  referenced.  Consider  this         sion,  whereas  in  PCRE  these values can be referenced. Consider this
6801         pattern:         pattern:
6802    
6803           ^(.)(\1|a(?2))           ^(.)(\1|a(?2))
6804    
6805         In  PCRE,  this  pattern matches "bab". The first capturing parentheses         In PCRE, this pattern matches "bab". The  first  capturing  parentheses
6806         match "b", then in the second group, when the back reference  \1  fails         match  "b",  then in the second group, when the back reference \1 fails
6807         to  match "b", the second alternative matches "a" and then recurses. In         to match "b", the second alternative matches "a" and then recurses.  In
6808         the recursion, \1 does now match "b" and so the whole  match  succeeds.         the  recursion,  \1 does now match "b" and so the whole match succeeds.
6809         In  Perl,  the pattern fails to match because inside the recursive call         In Perl, the pattern fails to match because inside the  recursive  call
6810         \1 cannot access the externally set value.         \1 cannot access the externally set value.
6811    
6812    
6813  SUBPATTERNS AS SUBROUTINES  SUBPATTERNS AS SUBROUTINES
6814    
6815         If the syntax for a recursive subpattern call (either by number  or  by         If  the  syntax for a recursive subpattern call (either by number or by
6816         name)  is  used outside the parentheses to which it refers, it operates         name) is used outside the parentheses to which it refers,  it  operates
6817         like a subroutine in a programming language. The called subpattern  may         like  a subroutine in a programming language. The called subpattern may
6818         be  defined  before or after the reference. A numbered reference can be         be defined before or after the reference. A numbered reference  can  be
6819         absolute or relative, as in these examples:         absolute or relative, as in these examples:
6820    
6821           (...(absolute)...)...(?2)...           (...(absolute)...)...(?2)...
# Line 6784  SUBPATTERNS AS SUBROUTINES Line 6826  SUBPATTERNS AS SUBROUTINES
6826    
6827           (sens|respons)e and \1ibility           (sens|respons)e and \1ibility
6828    
6829         matches "sense and sensibility" and "response and responsibility",  but         matches  "sense and sensibility" and "response and responsibility", but
6830         not "sense and responsibility". If instead the pattern         not "sense and responsibility". If instead the pattern
6831    
6832           (sens|respons)e and (?1)ibility           (sens|respons)e and (?1)ibility
6833    
6834         is  used, it does match "sense and responsibility" as well as the other         is used, it does match "sense and responsibility" as well as the  other
6835         two strings. Another example is  given  in  the  discussion  of  DEFINE         two  strings.  Another  example  is  given  in the discussion of DEFINE
6836         above.         above.
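
The difference between a back reference and a subroutine call can be illustrated outside PCRE. The sketch below uses Python's re module, which has no (?1) syntax, so the call is emulated by writing out the body of group 1 again as a non-capturing group. That expansion is an assumption made for illustration only; it is not how PCRE implements the call.

```python
import re

# Back reference: \1 must repeat the text that group 1 actually matched.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Emulated subroutine call: the subpattern of group 1 is run again,
# so either alternative may match the second time.
subroutine = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")

for subject in ("sense and sensibility",
                "response and responsibility",
                "sense and responsibility"):
    print(subject,
          bool(backref.fullmatch(subject)),
          bool(subroutine.fullmatch(subject)))
```

Only the emulated subroutine form accepts "sense and responsibility", because it re-applies the pattern instead of re-using the matched text.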
6837    
6838         All  subroutine  calls, whether recursive or not, are always treated as         All subroutine calls, whether recursive or not, are always  treated  as
6839         atomic groups. That is, once a subroutine has matched some of the  sub-         atomic  groups. That is, once a subroutine has matched some of the sub-
6840         ject string, it is never re-entered, even if it contains untried alter-         ject string, it is never re-entered, even if it contains untried alter-
6841         natives and there is  a  subsequent  matching  failure.  Any  capturing         natives  and  there  is  a  subsequent  matching failure. Any capturing
6842         parentheses  that  are  set  during the subroutine call revert to their         parentheses that are set during the subroutine  call  revert  to  their
6843         previous values afterwards.         previous values afterwards.
6844    
6845         Processing options such as case-independence are fixed when  a  subpat-         Processing  options  such as case-independence are fixed when a subpat-
6846         tern  is defined, so if it is used as a subroutine, such options cannot         tern is defined, so if it is used as a subroutine, such options  cannot
6847         be changed for different calls. For example, consider this pattern:         be changed for different calls. For example, consider this pattern:
6848    
6849           (abc)(?i:(?-1))           (abc)(?i:(?-1))
6850    
6851         It matches "abcabc". It does not match "abcABC" because the  change  of         It  matches  "abcabc". It does not match "abcABC" because the change of
6852         processing option does not affect the called subpattern.         processing option does not affect the called subpattern.
6853    
6854    
6855  ONIGURUMA SUBROUTINE SYNTAX  ONIGURUMA SUBROUTINE SYNTAX
6856    
6857         For  compatibility with Oniguruma, the non-Perl syntax \g followed by a         For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
6858         name or a number enclosed either in angle brackets or single quotes, is         name or a number enclosed either in angle brackets or single quotes, is
6859         an  alternative  syntax  for  referencing a subpattern as a subroutine,         an alternative syntax for referencing a  subpattern  as  a  subroutine,
6860         possibly recursively. Here are two of the examples used above,  rewrit-         possibly  recursively. Here are two of the examples used above, rewrit-
6861         ten using this syntax:         ten using this syntax:
6862    
6863           (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )           (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
6864           (sens|respons)e and \g'1'ibility           (sens|respons)e and \g'1'ibility
6865    
6866         PCRE  supports  an extension to Oniguruma: if a number is preceded by a         PCRE supports an extension to Oniguruma: if a number is preceded  by  a
6867         plus or a minus sign it is taken as a relative reference. For example:         plus or a minus sign it is taken as a relative reference. For example:
6868    
6869           (abc)(?i:\g<-1>)           (abc)(?i:\g<-1>)
6870    
6871         Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are  not         Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
6872         synonymous.  The former is a back reference; the latter is a subroutine         synonymous. The former is a back reference; the latter is a  subroutine
6873         call.         call.
6874    
6875    
6876  CALLOUTS  CALLOUTS
6877    
6878         Perl has a feature whereby using the sequence (?{...}) causes arbitrary         Perl has a feature whereby using the sequence (?{...}) causes arbitrary
6879         Perl  code to be obeyed in the middle of matching a regular expression.         Perl code to be obeyed in the middle of matching a regular  expression.
6880         This makes it possible, amongst other things, to extract different sub-         This makes it possible, amongst other things, to extract different sub-
6881         strings that match the same pair of parentheses when there is a repeti-         strings that match the same pair of parentheses when there is a repeti-
6882         tion.         tion.
6883    
6884         PCRE provides a similar feature, but of course it cannot obey arbitrary         PCRE provides a similar feature, but of course it cannot obey arbitrary
6885         Perl code. The feature is called "callout". The caller of PCRE provides         Perl code. The feature is called "callout". The caller of PCRE provides
6886         an external function by putting its entry point in the global  variable         an  external function by putting its entry point in the global variable
6887         pcre_callout  (8-bit  library) or pcre[16|32]_callout (16-bit or 32-bit         pcre_callout (8-bit library) or pcre[16|32]_callout (16-bit  or  32-bit
6888         library).  By default, this variable contains NULL, which disables  all         library).   By default, this variable contains NULL, which disables all
6889         calling out.         calling out.
6890    
6891         Within  a  regular  expression,  (?C) indicates the points at which the         Within a regular expression, (?C) indicates the  points  at  which  the
6892         external function is to be called. If you want  to  identify  different         external  function  is  to be called. If you want to identify different
6893         callout  points, you can put a number less than 256 after the letter C.         callout points, you can put a number less than 256 after the letter  C.
6894         The default value is zero.  For example, this pattern has  two  callout         The  default  value is zero.  For example, this pattern has two callout
6895         points:         points:
6896    
6897           (?C1)abc(?C2)def           (?C1)abc(?C2)def
6898    
6899         If  the PCRE_AUTO_CALLOUT flag is passed to a compiling function, call-         If the PCRE_AUTO_CALLOUT flag is passed to a compiling function,  call-
6900         outs are automatically installed before each item in the pattern.  They         outs  are automatically installed before each item in the pattern. They
6901         are all numbered 255.         are all numbered 255. If there is a conditional group  in  the  pattern
6902           whose condition is an assertion, an additional callout is inserted just
6903         During  matching, when PCRE reaches a callout point, the external func-         before the condition. An explicit callout may also be set at this posi-
6904         tion is called. It is provided with the  number  of  the  callout,  the         tion, as in this example:
6905         position  in  the pattern, and, optionally, one item of data originally  
6906         supplied by the caller of the matching function. The  callout  function           (?(?C9)(?=a)abc|def)
6907         may  cause  matching to proceed, to backtrack, or to fail altogether. A  
6908         complete description of the interface to the callout function is  given         Note that this applies only to assertion conditions, not to other types
6909           of condition.
6910    
6911           During matching, when PCRE reaches a callout point, the external  func-
6912           tion  is  called.  It  is  provided with the number of the callout, the
6913           position in the pattern, and, optionally, one item of  data  originally
6914           supplied  by  the caller of the matching function. The callout function
6915           may cause matching to proceed, to backtrack, or to fail  altogether.  A
6916           complete  description of the interface to the callout function is given
6917         in the pcrecallout documentation.         in the pcrecallout documentation.
6918    
6919    
6920  BACKTRACKING CONTROL  BACKTRACKING CONTROL
6921    
6922         Perl  5.10 introduced a number of "Special Backtracking Control Verbs",         Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
6923         which are described in the Perl documentation as "experimental and sub-         which  are  still  described in the Perl documentation as "experimental
6924         ject  to  change or removal in a future version of Perl". It goes on to         and subject to change or removal in a future version of Perl". It  goes
6925         say: "Their usage in production code should be noted to avoid  problems         on  to  say:  "Their  usage in production code should be noted to avoid
6926         during upgrades." The same remarks apply to the PCRE features described         problems during upgrades." The same remarks apply to the PCRE  features
6927         in this section.         described in this section.
   
        Since these verbs are specifically related  to  backtracking,  most  of  
        them  can  be  used only when the pattern is to be matched using one of  
        the traditional matching functions, which use a backtracking algorithm.  
        With  the  exception  of (*FAIL), which behaves like a failing negative  
        assertion, they cause an error if encountered by a DFA  matching  func-  
        tion.  
   
        If  any of these verbs are used in an assertion or in a subpattern that  
        is called as a subroutine (whether or not recursively), their effect is  
        confined to that subpattern; it does not extend to the surrounding pat-  
        tern, with one exception: the name from a (*MARK), (*PRUNE), or (*THEN)  
        that  is  encountered in a successful positive assertion is passed back  
        when a match succeeds (compare capturing  parentheses  in  assertions).  
        Note that such subpatterns are processed as anchored at the point where  
        they are tested. Note also that Perl's  treatment  of  subroutines  and  
        assertions is different in some cases.  
6928    
6929         The  new verbs make use of what was previously invalid syntax: an open-         The  new verbs make use of what was previously invalid syntax: an open-
6930         ing parenthesis followed by an asterisk. They are generally of the form         ing parenthesis followed by an asterisk. They are generally of the form
6931         (*VERB)  or (*VERB:NAME). Some may take either form, with differing be-         (*VERB)  or (*VERB:NAME). Some may take either form, with differing be-
6932         haviour, depending on whether or not an argument is present. A name  is         haviour, depending on whether or not a name is present. A name  is  any
6933         any sequence of characters that does not include a closing parenthesis.         sequence of characters that does not include a closing parenthesis. The
6934         The maximum length of name is 255 in the 8-bit library and 65535 in the         maximum length of name is 255 in the 8-bit library  and  65535  in  the
6935         16-bit and 32-bit library.  If the name is empty, that is, if the clos-         16-bit  and  32-bit  libraries.   If the name is empty, that is, if the
6936         ing parenthesis immediately follows the colon, the effect is as if  the         closing parenthesis immediately follows the colon, the effect is as  if
6937         colon were not there. Any number of these verbs may occur in a pattern.         the colon were not there. Any number of these verbs may occur in a pat-
6938           tern.
6939    
6940           Since these verbs are specifically related  to  backtracking,  most  of
6941           them  can  be  used only when the pattern is to be matched using one of
6942           the traditional matching functions, because these  use  a  backtracking
6943           algorithm.  With the exception of (*FAIL), which behaves like a failing
6944           negative assertion, the backtracking control verbs cause  an  error  if
6945           encountered by a DFA matching function.
6946    
6947           The  behaviour  of  these  verbs in repeated groups, assertions, and in
6948           subpatterns called as subroutines (whether or not recursively) is docu-
6949           mented below.
6950    
6951     Optimizations that affect backtracking verbs     Optimizations that affect backtracking verbs
6952    
6953         PCRE  contains some optimizations that are used to speed up matching by         PCRE  contains some optimizations that are used to speed up matching by
6954         running some checks at the start of each match attempt. For example, it         running some checks at the start of each match attempt. For example, it
6955         may  know  the minimum length of matching subject, or that a particular         may  know  the minimum length of matching subject, or that a particular
6956         character must be present. When one of these  optimizations  suppresses         character must be present. When one of these optimizations bypasses the
6957         the  running  of  a match, any included backtracking verbs will not, of         running  of  a  match,  any  included  backtracking  verbs will not, of
6958         course, be processed. You can suppress the start-of-match optimizations         course, be processed. You can suppress the start-of-match optimizations
6959         by  setting  the  PCRE_NO_START_OPTIMIZE  option when calling pcre_com-         by  setting  the  PCRE_NO_START_OPTIMIZE  option when calling pcre_com-
6960         pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).         pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT).
# Line 6929  BACKTRACKING CONTROL Line 6974  BACKTRACKING CONTROL
6974         This verb causes the match to end successfully, skipping the  remainder         This verb causes the match to end successfully, skipping the  remainder
6975         of  the pattern. However, when it is inside a subpattern that is called         of  the pattern. However, when it is inside a subpattern that is called
6976         as a subroutine, only that subpattern is ended  successfully.  Matching         as a subroutine, only that subpattern is ended  successfully.  Matching
6977         then  continues  at  the  outer level. If (*ACCEPT) is inside capturing         then continues at the outer level. If (*ACCEPT) is triggered in a posi-
6978         parentheses, the data so far is captured. For example:         tive assertion, the assertion succeeds; in a  negative  assertion,  the
6979           assertion fails.
6980    
6981           If  (*ACCEPT)  is inside capturing parentheses, the data so far is cap-
6982           tured. For example:
6983    
6984           A((?:A|B(*ACCEPT)|C)D)           A((?:A|B(*ACCEPT)|C)D)
6985    
# Line 6963  BACKTRACKING CONTROL Line 7012  BACKTRACKING CONTROL
7012         instances of (*MARK) as you like in a pattern, and their names  do  not         instances of (*MARK) as you like in a pattern, and their names  do  not
7013         have to be unique.         have to be unique.
7014    
7015         When  a match succeeds, the name of the last-encountered (*MARK) on the         When  a  match succeeds, the name of the last-encountered (*MARK:NAME),
7016         matching path is passed back to the caller as described in the  section         (*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed  back  to
7017         entitled  "Extra  data  for  pcre_exec()" in the pcreapi documentation.         the  caller  as  described  in  the  section  entitled  "Extra data for
7018         Here is an example of pcretest output, where the /K  modifier  requests         pcre_exec()" in the  pcreapi  documentation.  Here  is  an  example  of
7019         the retrieval and outputting of (*MARK) data:         pcretest  output, where the /K modifier requests the retrieval and out-
7020           putting of (*MARK) data:
7021    
7022             re> /X(*MARK:A)Y|X(*MARK:B)Z/K             re> /X(*MARK:A)Y|X(*MARK:B)Z/K
7023           data> XY           data> XY
# Line 6978  BACKTRACKING CONTROL Line 7028  BACKTRACKING CONTROL
7028           MK: B           MK: B
7029    
7030         The (*MARK) name is tagged with "MK:" in this output, and in this exam-         The (*MARK) name is tagged with "MK:" in this output, and in this exam-
7031         ple it indicates which of the two alternatives matched. This is a  more         ple  it indicates which of the two alternatives matched. This is a more
7032         efficient  way of obtaining this information than putting each alterna-         efficient way of obtaining this information than putting each  alterna-
7033         tive in its own capturing parentheses.         tive in its own capturing parentheses.
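
For comparison, the capturing-parentheses approach that (*MARK) is said to improve on can be sketched with Python's re module, which does not support (*MARK). The group names A and B mirror the mark names in the example; the function name is hypothetical.

```python
import re

# Each alternative sits in its own named capturing group, so the name of
# the group that took part in the match identifies the alternative.
pattern = re.compile(r"(?P<A>XY)|(?P<B>XZ)")


def which_alternative(subject):
    """Return the name of the alternative that matched, or None."""
    m = pattern.search(subject)
    return m.lastgroup if m else None


print(which_alternative("XY"))
print(which_alternative("XZ"))
print(which_alternative("XP"))
```

Unlike (*MARK), this approach reports nothing after a failed match, and it costs one capturing group per alternative.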
7034    
7035         If (*MARK) is encountered in a positive assertion, its name is recorded         If  a verb with a name is encountered in a positive assertion, its name
7036         and passed back if it is the last-encountered. This does not happen for         is recorded and passed back if it is the  last-encountered.  This  does
7037         negative assertions.         not happen for negative assertions.
7038    
7039         After a partial match or a failed match, the name of the  last  encoun-         After  a  partial match or a failed match, the last encountered name in
7040         tered (*MARK) in the entire match process is returned. For example:         the entire match process is returned. For example:
7041    
7042             re> /X(*MARK:A)Y|X(*MARK:B)Z/K             re> /X(*MARK:A)Y|X(*MARK:B)Z/K
7043           data> XP           data> XP
7044           No match, mark = B           No match, mark = B
7045    
7046         Note  that  in  this  unanchored  example the mark is retained from the         Note that in this unanchored example the  mark  is  retained  from  the
7047         match attempt that started at the letter "X" in the subject. Subsequent         match attempt that started at the letter "X" in the subject. Subsequent
7048         match attempts starting at "P" and then with an empty string do not get         match attempts starting at "P" and then with an empty string do not get
7049         as far as the (*MARK) item, but nevertheless do not reset it.         as far as the (*MARK) item, but nevertheless do not reset it.
7050    
7051         If you are interested in  (*MARK)  values  after  failed  matches,  you         If  you  are  interested  in  (*MARK)  values after failed matches, you
7052         should  probably  set  the PCRE_NO_START_OPTIMIZE option (see above) to         should probably set the PCRE_NO_START_OPTIMIZE option  (see  above)  to
7053         ensure that the match is always attempted.         ensure that the match is always attempted.
7054    
7055     Verbs that act after backtracking     Verbs that act after backtracking
7056    
7057         The following verbs do nothing when they are encountered. Matching con-         The following verbs do nothing when they are encountered. Matching con-
7058         tinues  with what follows, but if there is no subsequent match, causing         tinues with what follows, but if there is no subsequent match,  causing
7059         a backtrack to the verb, a failure is  forced.  That  is,  backtracking         a  backtrack  to  the  verb, a failure is forced. That is, backtracking
7060         cannot  pass  to the left of the verb. However, when one of these verbs         cannot pass to the left of the verb. However, when one of  these  verbs
7061         appears inside an atomic group, its effect is confined to  that  group,         appears  inside an atomic group or an assertion, its effect is confined
7062         because  once the group has been matched, there is never any backtrack-         to that group, because once the group has been matched, there is  never
7063         ing into it. In this situation, backtracking can  "jump  back"  to  the         any  backtracking  into  it.  In this situation, backtracking can "jump
7064         left  of the entire atomic group. (Remember also, as stated above, that         back" to the left of the entire atomic group  or  assertion.  (Remember
7065         this localization also applies in subroutine calls and assertions.)         also,  as  stated above, that this localization also applies in subrou-
7066           tine calls.)
7067    
7068         These verbs differ in exactly what kind of failure  occurs  when  back-         These verbs differ in exactly what kind of failure  occurs  when  back-
7069         tracking reaches them.         tracking reaches them.
# Line 7020  BACKTRACKING CONTROL Line 7071  BACKTRACKING CONTROL
7071           (*COMMIT)           (*COMMIT)
7072    
7073         This  verb, which may not be followed by a name, causes the whole match         This  verb, which may not be followed by a name, causes the whole match
7074         to fail outright if the rest of the pattern does not match. Even if the         to fail outright if there is a later matching failure that causes back-
7075         pattern is unanchored, no further attempts to find a match by advancing         tracking  to  reach  it.  Even if the pattern is unanchored, no further
7076         the  starting  point  take  place.  Once  (*COMMIT)  has  been  passed,         attempts to find a match by advancing the starting point take place. If
7077         pcre_exec()  is  committed  to  finding a match at the current starting         (*COMMIT)  is  the  only backtracking verb that is encountered, once it
7078         point, or not at all. For example:         has been passed pcre_exec() is committed to finding a match at the cur-
7079           rent starting point, or not at all. For example:
7080    
7081           a+(*COMMIT)b           a+(*COMMIT)b
7082    
7083         This matches "xxaab" but not "aacaab". It can be thought of as  a  kind         This  matches  "xxaab" but not "aacaab". It can be thought of as a kind
7084         of dynamic anchor, or "I've started, so I must finish." The name of the         of dynamic anchor, or "I've started, so I must finish." The name of the
7085         most recently passed (*MARK) in the path is passed back when  (*COMMIT)         most  recently passed (*MARK) in the path is passed back when (*COMMIT)
7086         forces a match failure.         forces a match failure.
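
The effect of (*COMMIT) in the pattern a+(*COMMIT)b can be modelled by hand. The Python sketch below is an emulation under stated assumptions, not PCRE's algorithm: Python's re module has no backtracking verbs, the function name is hypothetical, and start-of-match optimizations are ignored.

```python
import re


def search_a_plus_commit_b(subject):
    """Emulate a+(*COMMIT)b; return the matched text or None."""
    a_run = re.compile(r"a+")
    for start in range(len(subject)):
        m = a_run.match(subject, start)
        if m is None:
            # a+ failed before (*COMMIT) was reached: bump along as usual.
            continue
        # (*COMMIT) has been passed: succeed here or fail outright.
        end = m.end()
        if subject.startswith("b", end):
            return subject[start:end + 1]
        # A failure after the verb backtracks onto it, so no later
        # starting point is tried.
        return None
    return None


print(search_a_plus_commit_b("xxaab"))
print(search_a_plus_commit_b("aacaab"))
print(re.search(r"a+b", "aacaab").group())
```

The last line shows the same subject matched without the verb: an ordinary search advances past the failed attempt at the first "aa" and finds "aab" later in the string.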
7087    
7088           If there is more than one backtracking verb in a pattern,  a  different
7089           one  that  follows  (*COMMIT) may be triggered first, so merely passing
7090           (*COMMIT) during a match does not always guarantee that a match must be
7091           at this starting point.
7092    
7093         Note  that  (*COMMIT)  at  the start of a pattern is not the same as an         Note  that  (*COMMIT)  at  the start of a pattern is not the same as an
7094         anchor, unless PCRE's start-of-match optimizations are turned  off,  as         anchor, unless PCRE's start-of-match optimizations are turned  off,  as
7095         shown in this pcretest example:         shown in this pcretest example:
# Line 7052  BACKTRACKING CONTROL Line 7109  BACKTRACKING CONTROL
7109           (*PRUNE) or (*PRUNE:NAME)           (*PRUNE) or (*PRUNE:NAME)
7110    
7111         This  verb causes the match to fail at the current starting position in         This  verb causes the match to fail at the current starting position in
7112         the subject if the rest of the pattern does not match. If  the  pattern         the subject if there is a later matching failure that causes backtrack-
7113         is  unanchored,  the  normal  "bumpalong"  advance to the next starting         ing  to  reach it. If the pattern is unanchored, the normal "bumpalong"
7114         character then happens. Backtracking can occur as usual to the left  of         advance to the next starting character then happens.  Backtracking  can
7115         (*PRUNE),  before  it  is  reached,  or  when  matching to the right of         occur  as  usual to the left of (*PRUNE), before it is reached, or when
7116         (*PRUNE), but if there is no match to the  right,  backtracking  cannot         matching to the right of (*PRUNE), but if there  is  no  match  to  the
7117         cross  (*PRUNE). In simple cases, the use of (*PRUNE) is just an alter-         right,  backtracking cannot cross (*PRUNE). In simple cases, the use of
7118         native to an atomic group or possessive quantifier, but there are  some         (*PRUNE) is just an alternative to an atomic group or possessive  quan-
7119         uses of (*PRUNE) that cannot be expressed in any other way.  The behav-         tifier, but there are some uses of (*PRUNE) that cannot be expressed in
7120         iour of (*PRUNE:NAME)  is  the  same  as  (*MARK:NAME)(*PRUNE).  In  an         any other way. In an anchored pattern (*PRUNE) has the same  effect  as
7121         anchored pattern (*PRUNE) has the same effect as (*COMMIT).         (*COMMIT).
7122    
7123           The  behaviour  of  (*PRUNE:NAME)  is  not  the  same  as
7124           (*MARK:NAME)(*PRUNE).  It is like (*MARK:NAME)  in  that  the  name  is
7125           remembered  for  passing  back  to  the  caller.  However, (*SKIP:NAME)
7126           searches only for names set with (*MARK).
7127    
7128           (*SKIP)           (*SKIP)
7129    
7130         This  verb, when given without a name, is like (*PRUNE), except that if         This verb, when given without a name, is like (*PRUNE), except that  if
7131         the pattern is unanchored, the "bumpalong" advance is not to  the  next         the  pattern  is unanchored, the "bumpalong" advance is not to the next
7132         character, but to the position in the subject where (*SKIP) was encoun-         character, but to the position in the subject where (*SKIP) was encoun-
7133         tered. (*SKIP) signifies that whatever text was matched leading  up  to         tered.  (*SKIP)  signifies that whatever text was matched leading up to
7134         it cannot be part of a successful match. Consider:         it cannot be part of a successful match. Consider:
7135    
7136           a+(*SKIP)b           a+(*SKIP)b
7137    
7138         If  the  subject  is  "aaaac...",  after  the first match attempt fails         If the subject is "aaaac...",  after  the  first  match  attempt  fails
7139         (starting at the first character in the  string),  the  starting  point         (starting  at  the  first  character in the string), the starting point
7140         skips on to start the next attempt at "c". Note that a possessive quan-         skips on to start the next attempt at "c". Note that a possessive quan-
7141         tifier does not have the same effect as this example; although it would         tifier does not have the same effect as this example; although it would
7142         suppress  backtracking  during  the  first  match  attempt,  the second         suppress backtracking  during  the  first  match  attempt,  the  second
7143         attempt would start at the second character instead of skipping  on  to         attempt  would  start at the second character instead of skipping on to
7144         "c".         "c".
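
The difference in "bumpalong" behaviour between a+b, a+(*SKIP)b and
a+(*COMMIT)b can be illustrated with a small hand-written model. This is
only a sketch: it does not call PCRE, the names model_match, VERB_NONE,
VERB_SKIP and VERB_COMMIT are hypothetical, and it ignores PCRE's
start-of-match optimizations, so the attempt counts describe the model
rather than what pcre_exec() does internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical modes for this model; these are not PCRE identifiers. */
enum verb { VERB_NONE, VERB_SKIP, VERB_COMMIT };

/*
 * Model of matching a+<verb>b against s with an unanchored pattern.
 * Returns the starting offset of the match, or -1 if there is none.
 * *attempts receives the number of starting positions that were tried.
 */
int model_match(const char *s, enum verb v, int *attempts)
{
    size_t len = strlen(s);
    size_t start = 0;

    *attempts = 0;
    while (start < len) {
        size_t p = start;

        (*attempts)++;
        while (p < len && s[p] == 'a')      /* greedy a+ */
            p++;

        if (p > start) {                    /* a+ matched, verb was passed */
            if (p < len && s[p] == 'b')
                return (int)start;          /* overall match */
            /* Giving back an "a" cannot help: the next item must be "b". */
            if (v == VERB_COMMIT)
                return -1;                  /* no further starting points */
            if (v == VERB_SKIP) {
                start = p;                  /* jump to where the verb was met */
                continue;
            }
        }
        start++;                            /* normal one-character bumpalong */
    }
    return -1;
}
```

For the subject "aaaac", the model tries five starting positions with no
verb, two with (*SKIP) (offset 0, then the "c" at offset 4), and one with
(*COMMIT). In commit mode it finds a match at offset 2 in "xxaab" and none
in "aacaab", in line with the (*COMMIT) example earlier in this section.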
7145    
7146           (*SKIP:NAME)           (*SKIP:NAME)
7147    
7148         When  (*SKIP) has an associated name, its behaviour is modified. If the         When (*SKIP) has an associated name, its behaviour is modified. When it
7149         following pattern fails to match, the previous path through the pattern         is triggered, the previous path through the pattern is searched for the
7150         is  searched for the most recent (*MARK) that has the same name. If one         most recent (*MARK) that has the  same  name.  If  one  is  found,  the
7151         is found, the "bumpalong" advance is to the subject position that  cor-         "bumpalong" advance is to the subject position that corresponds to that
7152         responds  to  that (*MARK) instead of to where (*SKIP) was encountered.         (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with
7153         If no (*MARK) with a matching name is found, the (*SKIP) is ignored.         a matching name is found, the (*SKIP) is ignored.
7154    
7155           Note  that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
7156           ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
7157    
7158           (*THEN) or (*THEN:NAME)           (*THEN) or (*THEN:NAME)
7159    
7160         This verb causes a skip to the next innermost alternative if  the  rest         This verb causes a skip to the next innermost  alternative  when  back-
7161         of  the  pattern does not match. That is, it cancels pending backtrack-         tracking  reaches  it.  That  is,  it  cancels any further backtracking
7162         ing, but only within the current alternative. Its name comes  from  the         within the current alternative. Its name  comes  from  the  observation
7163         observation that it can be used for a pattern-based if-then-else block:         that it can be used for a pattern-based if-then-else block:
7164    
7165           ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...           ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
7166    
7167         If  the COND1 pattern matches, FOO is tried (and possibly further items         If  the COND1 pattern matches, FOO is tried (and possibly further items
7168         after the end of the group if FOO succeeds); on  failure,  the  matcher         after the end of the group if FOO succeeds); on  failure,  the  matcher
7169         skips  to  the second alternative and tries COND2, without backtracking         skips  to  the second alternative and tries COND2, without backtracking
7170         into COND1. The behaviour  of  (*THEN:NAME)  is  exactly  the  same  as         into COND1.  If (*THEN) is not inside  an  alternation,  it  acts  like
7171         (*MARK:NAME)(*THEN).   If (*THEN) is not inside an alternation, it acts         (*PRUNE).
7172         like (*PRUNE).  
7173           The  behaviour  of  (*THEN:NAME)  is  not  the  same  as
7174         Note that a subpattern that does not contain a | character  is  just  a         (*MARK:NAME)(*THEN).  It is like  (*MARK:NAME)  in  that  the  name  is
7175         part  of the enclosing alternative; it is not a nested alternation with         remembered  for  passing  back  to  the  caller.  However, (*SKIP:NAME)
7176         only one alternative. The effect of (*THEN) extends beyond such a  sub-         searches only for names set with (*MARK).
7177         pattern  to  the enclosing alternative. Consider this pattern, where A,  
7178         B, etc. are complex pattern fragments that do not contain any | charac-         A subpattern that does not contain a | character is just a part of  the
7179         ters at this level:         enclosing  alternative;  it  is  not a nested alternation with only one
7180           alternative. The effect of (*THEN) extends beyond such a subpattern  to
7181           the  enclosing alternative. Consider this pattern, where A, B, etc. are
7182           complex pattern fragments that do not contain any | characters at  this
7183           level:
7184    
7185           A (B(*THEN)C) | D           A (B(*THEN)C) | D
7186    
# Line 7127  BACKTRACKING CONTROL Line 7196  BACKTRACKING CONTROL
7196         tern to fail because there are no more alternatives  to  try.  In  this         tern to fail because there are no more alternatives  to  try.  In  this
7197         case, matching does now backtrack into A.         case, matching does now backtrack into A.
7198    
7199         Note also that a conditional subpattern is not considered as having two         Note  that  a  conditional  subpattern  is not considered as having two
7200         alternatives, because only one is ever used.  In  other  words,  the  |         alternatives, because only one is ever used.  In  other  words,  the  |
7201         character in a conditional subpattern has a different meaning. Ignoring         character in a conditional subpattern has a different meaning. Ignoring
7202         white space, consider:         white space, consider:
# Line 7151  BACKTRACKING CONTROL Line 7220  BACKTRACKING CONTROL
7220         the advance may be more than one character. (*COMMIT) is the strongest,         the advance may be more than one character. (*COMMIT) is the strongest,
7221         causing the entire match to fail.         causing the entire match to fail.
7222    
7223         If more than one such verb is present in a pattern, the "strongest" one     More than one backtracking verb
7224         wins.  For example, consider this pattern, where A, B, etc. are complex  
7225         pattern fragments:         If  more  than  one  backtracking verb is present in a pattern, the one
7226           that is backtracked onto first acts. For example,  consider  this  pat-
7227           (A(*COMMIT)B(*THEN)C|D)         tern, where A, B, etc. are complex pattern fragments:
7228    
7229         Once  A  has  matched,  PCRE is committed to this match, at the current           (A(*COMMIT)B(*THEN)C|ABD)
7230         starting position. If subsequently B matches, but C does not, the  nor-  
7231         mal (*THEN) action of trying the next alternative (that is, D) does not         If  A matches but B fails, the backtrack to (*COMMIT) causes the entire
7232         happen because (*COMMIT) overrides.         match to fail. However, if A and B match, but C fails, the backtrack to
7233           (*THEN)  causes  the next alternative (ABD) to be tried. This behaviour
7234           is consistent, but is not always the same as Perl's. It means  that  if
7235           two  or  more backtracking verbs appear in succession, all but the last
7236           of them have no effect. Consider this example:
7237    
7238             ...(*COMMIT)(*PRUNE)...
7239    
7240           If there is a matching failure to the right, backtracking onto (*PRUNE)
7241           causes it to be triggered, and its action is taken. There can never be a
7242           backtrack onto (*COMMIT).
7243    
7244       Backtracking verbs in repeated groups
7245    
7246           PCRE differs from  Perl  in  its  handling  of  backtracking  verbs  in
7247           repeated groups. For example, consider:
7248    
7249             /(a(*COMMIT)b)+ac/
7250    
7251           If  the  subject  is  "abac",  Perl matches, but PCRE fails because the
7252           (*COMMIT) in the second repeat of the group acts.
7253    
7254       Backtracking verbs in assertions
7255    
7256           (*FAIL) in an assertion has its normal effect: it forces  an  immediate
7257           backtrack.
7258    
7259           (*ACCEPT) in a positive assertion causes the assertion to succeed with-
7260           out any further processing. In a negative assertion,  (*ACCEPT)  causes
7261           the assertion to fail without any further processing.
7262    
7263           The  other  backtracking verbs are not treated specially if they appear
7264           in an assertion. In particular, (*THEN) skips to the  next  alternative
7265           in  the innermost enclosing group that has alternations, whether or not
7266           this is within the assertion.
7267    
7268       Backtracking verbs in subroutines
7269    
7270           These behaviours occur whether or not the subpattern is  called  recur-
7271           sively.  Perl's treatment of subroutines is different in some cases.
7272    
7273           (*FAIL)  in  a subpattern called as a subroutine has its normal effect:
7274           it forces an immediate backtrack.
7275    
7276           (*ACCEPT) in a subpattern called as a subroutine causes the  subroutine
7277           match  to succeed without any further processing. Matching then contin-
7278           ues after the subroutine call.
7279    
7280           (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
7281           cause the subroutine match to fail.
7282    
7283           (*THEN)  skips to the next alternative in the innermost enclosing group
7284           within the subpattern that has alternatives. If there is no such  group
7285           within the subpattern, (*THEN) causes the subroutine match to fail.
7286    
7287    
7288  SEE ALSO  SEE ALSO
7289    
7290         pcreapi(3), pcrecallout(3),  pcrematching(3),  pcresyntax(3),  pcre(3),         pcreapi(3),  pcrecallout(3),  pcrematching(3),  pcresyntax(3), pcre(3),
7291         pcre16(3), pcre32(3).         pcre16(3), pcre32(3).
7292    
7293    
# Line 7178  AUTHOR Line 7300  AUTHOR
7300    
7301  REVISION  REVISION
7302    
7303         Last updated: 11 November 2012         Last updated: 22 March 2013
7304         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
7305  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7306    
7307    
7308    PCRESYNTAX(3)              Library Functions Manual              PCRESYNTAX(3)
7309    
7310    
 PCRESYNTAX(3)                                                    PCRESYNTAX(3)  
   
7311    
7312  NAME  NAME
7313         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
7314    
   
7315  PCRE REGULAR EXPRESSION SYNTAX SUMMARY  PCRE REGULAR EXPRESSION SYNTAX SUMMARY
7316    
7317         The  full syntax and semantics of the regular expressions that are sup-         The  full syntax and semantics of the regular expressions that are sup-
# Line 7296  PCRE SPECIAL CATEGORY PROPERTIES FOR \p Line 7418  PCRE SPECIAL CATEGORY PROPERTIES FOR \p
7418           Xan        Alphanumeric: union of properties L and N           Xan        Alphanumeric: union of properties L and N
7419           Xps        POSIX space: property Z or tab, NL, VT, FF, CR           Xps        POSIX space: property Z or tab, NL, VT, FF, CR
7420           Xsp        Perl space: property Z or tab, NL, FF, CR           Xsp        Perl space: property Z or tab, NL, FF, CR
7421             Xuc        Universally-named character: one that can be
7422                          represented by a Universal Character Name
7423           Xwd        Perl word: property Xan or underscore           Xwd        Perl word: property Xan or underscore
7424    
7425    
# Line 7558  AUTHOR Line 7682  AUTHOR
7682    
7683  REVISION  REVISION
7684    
7685         Last updated: 11 November 2012         Last updated: 27 February 2013
7686         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
7687  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7688    
7689    
7690    PCREUNICODE(3)             Library Functions Manual             PCREUNICODE(3)
7691    
7692    
 PCREUNICODE(3)                                                  PCREUNICODE(3)  
   
7693    
7694  NAME  NAME
7695         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
7696    
   
7697  UTF-8, UTF-16, UTF-32, AND UNICODE PROPERTY SUPPORT  UTF-8, UTF-16, UTF-32, AND UNICODE PROPERTY SUPPORT
7698    
7699         As well as UTF-8 support, PCRE also supports UTF-16 (from release 8.30)         As well as UTF-8 support, PCRE also supports UTF-16 (from release 8.30)
# Line 7632  UNICODE PROPERTY SUPPORT Line 7756  UNICODE PROPERTY SUPPORT
7756         fication. Earlier releases of PCRE followed  the  rules  of  RFC  2279,         fication. Earlier releases of PCRE followed  the  rules  of  RFC  2279,
7757         which  allows  the  full  range of 31-bit values (0 to 0x7FFFFFFF). The         which  allows  the  full  range of 31-bit values (0 to 0x7FFFFFFF). The
7758         current check allows only values in the range U+0 to U+10FFFF,  exclud-         current check allows only values in the range U+0 to U+10FFFF,  exclud-
7759         ing the surrogate area and the non-characters.         ing  the  surrogate area. (From release 8.33 the so-called "non-charac-
7760           ter" code points are no longer excluded because Unicode corrigendum  #9
7761           makes it clear that they should not be.)
7762    
7763         Characters  in  the "Surrogate Area" of Unicode are reserved for use by         Characters  in  the "Surrogate Area" of Unicode are reserved for use by
7764         UTF-16, where they are used in pairs to encode codepoints  with  values         UTF-16, where they are used in pairs to encode codepoints  with  values
# Line 7641  UNICODE PROPERTY SUPPORT Line 7767  UNICODE PROPERTY SUPPORT
7767         other  words,  the  whole  surrogate  thing is a fudge for UTF-16 which         other  words,  the  whole  surrogate  thing is a fudge for UTF-16 which
7768         unfortunately messes up UTF-8 and UTF-32.)         unfortunately messes up UTF-8 and UTF-32.)
7769    
        Also excluded are the "Non-Character" code points, which are U+FDD0  to  
        U+FDEF  and  the  last  two  code  points  in  each plane, U+??FFFE and  
        U+??FFFF.  
   
7770         If an invalid UTF-8 string is passed to PCRE, an error return is given.         If an invalid UTF-8 string is passed to PCRE, an error return is given.
7771         At  compile  time, the only additional information is the offset to the         At  compile  time, the only additional information is the offset to the
7772         first byte of the failing character. The run-time functions pcre_exec()         first byte of the failing character. The run-time functions pcre_exec()
# Line 7676  UNICODE PROPERTY SUPPORT Line 7798  UNICODE PROPERTY SUPPORT
7798         surrogate range U+D800 to U+DFFF are independent code points. Values in         surrogate range U+D800 to U+DFFF are independent code points. Values in
7799         the surrogate range must be used in pairs in the correct manner.         the surrogate range must be used in pairs in the correct manner.
7800    
        Excluded  are  the  "Non-Character"  code  points,  which are U+FDD0 to  
        U+FDEF and the last  two  code  points  in  each  plane,  U+??FFFE  and  
        U+??FFFF.  
   
7801         If  an  invalid  UTF-16  string  is  passed to PCRE, an error return is         If  an  invalid  UTF-16  string  is  passed to PCRE, an error return is
7802         given. At compile time, the only additional information is  the  offset         given. At compile time, the only additional information is  the  offset
7803         to the first data unit of the failing character. The run-time functions         to the first data unit of the failing character. The run-time functions
# Line 7701  UNICODE PROPERTY SUPPORT Line 7819  UNICODE PROPERTY SUPPORT
7819         are passed as patterns and subjects are (by default) checked for valid-         are passed as patterns and subjects are (by default) checked for valid-
7820         ity on entry to the relevant functions.  This check allows only  values         ity on entry to the relevant functions.  This check allows only  values
7821         in  the  range  U+0 to U+10FFFF, excluding the surrogate area U+D800 to         in  the  range  U+0 to U+10FFFF, excluding the surrogate area U+D800 to
7822         U+DFFF, and the "Non-Character" code points, which are U+FDD0 to U+FDEF         U+DFFF.
        and the last two characters in each plane, U+??FFFE and U+??FFFF.  
7823    
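
The per-code-point rule in the revised (right-hand) text can be sketched
as a short predicate. This is an illustration under stated assumptions,
not PCRE's own validity check: the function name
is_valid_utf32_code_point is hypothetical, and the real check also
reports an offset and a reason code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the PCRE API.
 * Accepts U+0 to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF.
 * "Non-character" code points such as U+FDD0 or U+FFFE are accepted,
 * following the revised text.
 */
bool is_valid_utf32_code_point(uint32_t c)
{
    if (c > 0x10FFFFu)
        return false;                       /* beyond the Unicode range */
    if (c >= 0xD800u && c <= 0xDFFFu)
        return false;                       /* surrogate area */
    return true;
}
```

Under this rule U+FFFE and U+FDD0 are accepted, whereas the older
(left-hand) text would have rejected them.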
7824         If  an  invalid  UTF-32  string  is  passed to PCRE, an error return is         If an invalid UTF-32 string is passed  to  PCRE,  an  error  return  is
7825         given. At compile time, the only additional information is  the  offset         given.  At  compile time, the only additional information is the offset
7826         to the first data unit of the failing character. The run-time functions         to the first data unit of the failing character. The run-time functions
7827         pcre32_exec() and pcre32_dfa_exec() also pass back this information, as         pcre32_exec() and pcre32_dfa_exec() also pass back this information, as
7828         well  as  a more detailed reason code if the caller has provided memory         well as a more detailed reason code if the caller has  provided  memory
7829         in which to do this.         in which to do this.
7830    
7831         In some situations, you may already know that your strings  are  valid,         In  some  situations, you may already know that your strings are valid,
7832         and  therefore  want  to  skip these checks in order to improve perfor-         and therefore want to skip these checks in  order  to  improve  perfor-
7833         mance. If you set the PCRE_NO_UTF32_CHECK flag at compile  time  or  at         mance.  If  you  set the PCRE_NO_UTF32_CHECK flag at compile time or at
7834         run time, PCRE assumes that the pattern or subject it is given (respec-         run time, PCRE assumes that the pattern or subject it is given (respec-
7835         tively) contains only valid UTF-32 sequences. In this case, it does not         tively) contains only valid UTF-32 sequences. In this case, it does not
7836         diagnose  an  invalid  UTF-32 string.  However, if an invalid string is         diagnose an invalid UTF-32 string.  However, if an  invalid  string  is
7837         passed, the result is undefined.         passed, the result is undefined.
7838    
7839     General comments about UTF modes     General comments about UTF modes
7840    
7841         1. Codepoints less than 256 can be  specified  in  patterns  by  either         1.  Codepoints  less  than  256  can be specified in patterns by either
7842         braced or unbraced hexadecimal escape sequences (for example, \x{b3} or         braced or unbraced hexadecimal escape sequences (for example, \x{b3} or
7843         \xb3). Larger values have to use braced sequences.         \xb3). Larger values have to use braced sequences.
7844    
7845         2. Octal numbers up to \777 are recognized,  and  in  UTF-8  mode  they         2.  Octal  numbers  up  to  \777 are recognized, and in UTF-8 mode they
7846         match two-byte characters for values greater than \177.         match two-byte characters for values greater than \177.
7847    
7848         3. Repeat quantifiers apply to complete UTF characters, not to individ-         3. Repeat quantifiers apply to complete UTF characters, not to individ-
7849         ual data units, for example: \x{100}{3}.         ual data units, for example: \x{100}{3}.
7850    
7851         4. The dot metacharacter matches one UTF character instead of a  single         4.  The dot metacharacter matches one UTF character instead of a single
7852         data unit.         data unit.
7853    
7854         5.  The  escape sequence \C can be used to match a single byte in UTF-8         5. The escape sequence \C can be used to match a single byte  in  UTF-8
7855         mode, or a single 16-bit data unit in UTF-16 mode, or a  single  32-bit         mode,  or  a single 16-bit data unit in UTF-16 mode, or a single 32-bit
7856         data  unit in UTF-32 mode, but its use can lead to some strange effects         data unit in UTF-32 mode, but its use can lead to some strange  effects
7857         because it breaks up multi-unit characters (see the description  of  \C         because  it  breaks up multi-unit characters (see the description of \C
7858         in  the  pcrepattern  documentation). The use of \C is not supported in         in the pcrepattern documentation). The use of \C is  not  supported  in
7859         the alternative matching function  pcre[16|32]_dfa_exec(),  nor  is  it         the  alternative  matching  function  pcre[16|32]_dfa_exec(), nor is it
7860         supported in UTF mode by the JIT optimization of pcre[16|32]_exec(). If         supported in UTF mode by the JIT optimization of pcre[16|32]_exec(). If
7861         JIT optimization is requested for a UTF pattern that  contains  \C,  it         JIT  optimization  is  requested for a UTF pattern that contains \C, it
7862         will not succeed, and so the matching will be carried out by the normal         will not succeed, and so the matching will be carried out by the normal
7863         interpretive function.         interpretive function.
7864    
7865         6. The character escapes \b, \B, \d, \D, \s, \S, \w, and  \W  correctly         6.  The  character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
7866         test characters of any code value, but, by default, the characters that         test characters of any code value, but, by default, the characters that
7867         PCRE recognizes as digits, spaces, or word characters remain  the  same         PCRE  recognizes  as digits, spaces, or word characters remain the same
7868         set  as  in  non-UTF  mode, all with values less than 256. This remains         set as in non-UTF mode, all with values less  than  256.  This  remains
7869         true even when PCRE is  built  to  include  Unicode  property  support,         true  even  when  PCRE  is  built  to include Unicode property support,
7870         because to do otherwise would slow down PCRE in many common cases. Note         because to do otherwise would slow down PCRE in many common cases. Note
7871         in particular that this applies to \b and \B, because they are  defined         in  particular that this applies to \b and \B, because they are defined
7872         in terms of \w and \W. If you really want to test for a wider sense of,         in terms of \w and \W. If you really want to test for a wider sense of,
7873         say, "digit", you can use  explicit  Unicode  property  tests  such  as         say,  "digit",  you  can  use  explicit  Unicode property tests such as
7874         \p{Nd}. Alternatively, if you set the PCRE_UCP option, the way that the         \p{Nd}. Alternatively, if you set the PCRE_UCP option, the way that the
7875         character escapes work is changed so that Unicode properties  are  used         character  escapes  work is changed so that Unicode properties are used
7876         to determine which characters match. There are more details in the sec-         to determine which characters match. There are more details in the sec-
7877         tion on generic character types in the pcrepattern documentation.         tion on generic character types in the pcrepattern documentation.
7878    
7879         7. Similarly, characters that match the POSIX named  character  classes         7.  Similarly,  characters that match the POSIX named character classes
7880         are all low-valued characters, unless the PCRE_UCP option is set.         are all low-valued characters, unless the PCRE_UCP option is set.
7881    
7882         8.  However,  the  horizontal and vertical white space matching escapes         8. However, the horizontal and vertical white  space  matching  escapes
7883         (\h, \H, \v, and \V) do match all the appropriate  Unicode  characters,         (\h,  \H,  \v, and \V) do match all the appropriate Unicode characters,
7884         whether or not PCRE_UCP is set.         whether or not PCRE_UCP is set.
7885    
7886         9.  Case-insensitive  matching  applies only to characters whose values         9. Case-insensitive matching applies only to  characters  whose  values
7887         are less than 128, unless PCRE is built with Unicode property  support.         are  less than 128, unless PCRE is built with Unicode property support.
7888         A  few  Unicode characters such as Greek sigma have more than two code-         A few Unicode characters such as Greek sigma have more than  two  code-
7889         points that are case-equivalent. Up to and including PCRE release 8.31,         points that are case-equivalent. Up to and including PCRE release 8.31,
7890         only  one-to-one case mappings were supported, but later releases (with         only one-to-one case mappings were supported, but later releases  (with
7891         Unicode property support) do treat as case-equivalent all  versions  of         Unicode  property  support) do treat as case-equivalent all versions of
7892         characters such as Greek sigma.         characters such as Greek sigma.
7893    
7894    
# Line 7784  AUTHOR Line 7901  AUTHOR
7901    
7902  REVISION  REVISION
7903    
7904         Last updated: 11 November 2012         Last updated: 27 February 2013
7905         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
7906  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7907    
7908    
7909    PCREJIT(3)                 Library Functions Manual                 PCREJIT(3)
7910    
7911    
 PCREJIT(3)                                                          PCREJIT(3)  
   
7912    
7913  NAME  NAME
7914         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
7915    
   
7916  PCRE JUST-IN-TIME COMPILER SUPPORT  PCRE JUST-IN-TIME COMPILER SUPPORT
7917    
7918         Just-in-time  compiling  is a heavyweight optimization that can greatly         Just-in-time  compiling  is a heavyweight optimization that can greatly
# Line 7941  UNSUPPORTED OPTIONS AND PATTERN ITEMS Line 8058  UNSUPPORTED OPTIONS AND PATTERN ITEMS
8058         BOL,  PCRE_NOTEOL,  PCRE_NOTEMPTY,   PCRE_NOTEMPTY_ATSTART,   PCRE_PAR-         BOL,  PCRE_NOTEOL,  PCRE_NOTEMPTY,   PCRE_NOTEMPTY_ATSTART,   PCRE_PAR-
8059         TIAL_HARD, and PCRE_PARTIAL_SOFT.         TIAL_HARD, and PCRE_PARTIAL_SOFT.
8060    
8061         The unsupported pattern items are:         The  only  unsupported  pattern items are \C (match a single data unit)
8062           when running in a UTF mode, and a callout immediately before an  asser-
8063           \C             match a single byte; not supported in UTF-8 mode         tion condition in a conditional group.
          (?Cn)          callouts  
          (*PRUNE)       )  
          (*SKIP)        ) backtracking control verbs  
          (*THEN)        )  
   
        Support for some of these may be added in future.  
8064    
8065    
8066  RETURN VALUES FROM JIT EXECUTION  RETURN VALUES FROM JIT EXECUTION
# Line 8203  AUTHOR Line 8314  AUTHOR
8314    
8315  REVISION  REVISION
8316    
8317         Last updated: 31 October 2012         Last updated: 17 March 2013
8318         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
8319  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8320    
8321    
8322    PCREPARTIAL(3)             Library Functions Manual             PCREPARTIAL(3)
8323    
8324    
 PCREPARTIAL(3)                                                  PCREPARTIAL(3)  
   
8325    
8326  NAME  NAME
8327         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
8328    
   
8329  PARTIAL MATCHING IN PCRE  PARTIAL MATCHING IN PCRE
8330    
8331         In normal use of PCRE, if the subject string that is passed to a match-         In normal use of PCRE, if the subject string that is passed to a match-
# Line 8273  PARTIAL MATCHING USING pcre_exec() OR pc Line 8384  PARTIAL MATCHING USING pcre_exec() OR pc
8384         A  partial   match   occurs   during   a   call   to   pcre_exec()   or         A  partial   match   occurs   during   a   call   to   pcre_exec()   or
8385         pcre[16|32]_exec()  when  the end of the subject string is reached suc-         pcre[16|32]_exec()  when  the end of the subject string is reached suc-
8386         cessfully, but matching cannot continue  because  more  characters  are         cessfully, but matching cannot continue  because  more  characters  are
8387         needed.  However,  at least one character in the subject must have been         needed.   However, at least one character in the subject must have been
8388         inspected. This character need not  form  part  of  the  final  matched         inspected. This character need not  form  part  of  the  final  matched
8389         string;  lookbehind  assertions and the \K escape sequence provide ways         string;  lookbehind  assertions and the \K escape sequence provide ways
8390         of inspecting characters before the start of a matched  substring.  The         of inspecting characters before the start of a matched  substring.  The
# Line 8286  PARTIAL MATCHING USING pcre_exec() OR pc Line 8397  PARTIAL MATCHING USING pcre_exec() OR pc
8397         match  is returned, the first slot is set to the offset of the earliest         match  is returned, the first slot is set to the offset of the earliest
8398         character that was inspected. For convenience, the second offset points         character that was inspected. For convenience, the second offset points
8399         to the end of the subject so that a substring can easily be identified.         to the end of the subject so that a substring can easily be identified.
8400           If there are at least three slots in the offsets vector, the third slot
8401           is set to the offset of the character where matching started.
8402    
8403         For  the majority of patterns, the first offset identifies the start of         For the majority of patterns, the contents of the first and third slots
8404         the partially matched string. However, for patterns that contain  look-         will be the same. However, for patterns that contain lookbehind  asser-
8405         behind  assertions,  or  \K, or begin with \b or \B, earlier characters         tions, or begin with \b or \B, characters before the one where matching
8406         have been inspected while carrying out the match. For example:         started may have been inspected while carrying out the match. For exam-
8407           ple, consider this pattern:
8408    
8409           /(?<=abc)123/           /(?<=abc)123/
8410    
8411         This pattern matches "123", but only if it is preceded by "abc". If the         This pattern matches "123", but only if it is preceded by "abc". If the
8412         subject string is "xyzabc12", the offsets after a partial match are for         subject string is "xyzabc12", the first two  offsets  after  a  partial
8413         the substring "abc12", because  all  these  characters  are  needed  if         match  are for the substring "abc12", because all these characters were
8414         another match is tried with extra characters added to the subject.         inspected. However, the third offset is set to 6, because that  is  the
8415           offset where matching began.
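
As a rough illustration of how the three slots might be used, the sketch below extracts the inspected text and the partially matched text from a subject string. The helper name copy_span is hypothetical and not part of PCRE. The offsets 3, 8 and 6 for "xyzabc12" are derived from the description above and assume one data unit per character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): copy subject[from..to)
   into buf as a NUL-terminated string and return buf. */
char *copy_span(const char *subject, int from, int to,
                char *buf, size_t bufsize)
{
    size_t len;
    assert(from >= 0 && to >= from);
    len = (size_t)(to - from);
    assert(len < bufsize);          /* leave room for the terminator */
    memcpy(buf, subject + from, len);
    buf[len] = '\0';
    return buf;
}
```

With an offsets vector holding 3, 8 and 6 after a partial match on "xyzabc12", copy_span(subject, offsets[0], offsets[1], ...) gives the inspected text "abc12", while copy_span(subject, offsets[2], offsets[1], ...) gives "12", the text from the point where matching started.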
8416    
8417         What happens when a partial match is identified depends on which of the         What happens when a partial match is identified depends on which of the
8418         two partial matching options are set.         two partial matching options are set.
# Line 8523  MULTI-SEGMENT MATCHING WITH pcre_exec() Line 8638  MULTI-SEGMENT MATCHING WITH pcre_exec()
8638    
8639         Note:  If  the pattern contains lookbehind assertions, or \K, or starts         Note:  If  the pattern contains lookbehind assertions, or \K, or starts
8640         with \b or \B, the string that is returned for a partial match includes         with \b or \B, the string that is returned for a partial match includes
8641         characters  that  precede  the partially matched string itself, because         characters  that precede the start of what would be returned for a com-
8642         these must be retained when adding on more characters for a  subsequent         plete match, because it contains all the characters that were inspected
8643         matching  attempt.   However, in some cases you may need to retain even         during the partial match.
        earlier characters, as discussed in the next section.  
8644    
8645    
8646  ISSUES WITH MULTI-SEGMENT MATCHING  ISSUES WITH MULTI-SEGMENT MATCHING
# Line 8535  ISSUES WITH MULTI-SEGMENT MATCHING Line 8649  ISSUES WITH MULTI-SEGMENT MATCHING
8649         whichever matching function is used.         whichever matching function is used.
8650    
8651         1. If the pattern contains a test for the beginning of a line, you need         1. If the pattern contains a test for the beginning of a line, you need
8652         to pass the PCRE_NOTBOL option when the subject  string  for  any  call         to  pass  the  PCRE_NOTBOL  option when the subject string for any call
8653         does not start at the beginning of a line. There is also a PCRE_NOTEOL         does not start at the beginning of a line. There is also a PCRE_NOTEOL
8654         option, but in practice when doing multi-segment matching you should be         option, but in practice when doing multi-segment matching you should be
8655         using PCRE_PARTIAL_HARD, which includes the effect of PCRE_NOTEOL.         using PCRE_PARTIAL_HARD, which includes the effect of PCRE_NOTEOL.
8656    
8657         2.  Lookbehind assertions that have already been obeyed are catered for         2. Lookbehind assertions that have already been obeyed are catered  for
8658         in the offsets that are returned for a partial match. However a lookbe-         in the offsets that are returned for a partial match. However a lookbe-
8659         hind  assertion later in the pattern could require even earlier charac-         hind assertion later in the pattern could require even earlier  charac-
8660         ters  to  be  inspected.  You  can  handle  this  case  by  using   the         ters   to  be  inspected.  You  can  handle  this  case  by  using  the
8661         PCRE_INFO_MAXLOOKBEHIND    option    of    the    pcre_fullinfo()    or         PCRE_INFO_MAXLOOKBEHIND    option    of    the    pcre_fullinfo()    or
8662         pcre[16|32]_fullinfo() functions to obtain the length  of  the  largest         pcre[16|32]_fullinfo()  functions  to  obtain the length of the longest
8663         lookbehind  in  the  pattern.  This  length is given in characters, not         lookbehind in the pattern. This length  is  given  in  characters,  not
8664         bytes. If you always retain at least that many  characters  before  the         bytes.  If  you  always retain at least that many characters before the
8665         partially  matched  string,  all  should  be well. (Of course, near the         partially matched string, all should be  well.  (Of  course,  near  the
8666         start of the subject, fewer characters may be present; in that case all         start of the subject, fewer characters may be present; in that case all
8667         characters should be retained.)         characters should be retained.)
8668    
8669           From release 8.33, there is a more accurate way of deciding which char-
8670           acters  to  retain.  Instead  of  subtracting the length of the longest
8671           lookbehind from the  earliest  inspected  character  (offsets[0]),  the
8672           match  start  position  (offsets[2]) should be used, and the next match
8673           attempt started at the offsets[2] character by setting the  startoffset
8674           argument of pcre_exec() or pcre_dfa_exec().
8675    
8676           For  example, if the pattern "(?<=123)abc" is partially matched against
8677           the string "xx123a", the three offset values returned are 2, 6, and  5.
8678           This  indicates  that  the  matching  process that gave a partial match
8679           started at offset 5, but the characters "123a" were all inspected.  The
8680           maximum  lookbehind  for  that pattern is 3, so taking that away from 5
8681           shows that we need only keep "123a", and the next match attempt can  be
8682           started at offset 3 (that is, at "a") when further characters have been
8683           added. When the match start is not the  earliest  inspected  character,
8684           pcretest shows it explicitly:
8685    
8686               re> "(?<=123)abc"
8687             data> xx123a\P\P
8688             Partial match at offset 5: 123a
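
The arithmetic in the release 8.33 approach can be sketched as follows. The function names retain_from and next_startoffset are hypothetical, not part of the PCRE API. The sketch assumes that one character occupies one data unit (for example, the 8-bit library without UTF-8), because the maximum lookbehind is reported in characters while offsets count data units.

```c
#include <assert.h>

/* Hypothetical helper: offset of the first subject character that must
   be retained, given the match start (offsets[2]) and the value
   obtained via PCRE_INFO_MAXLOOKBEHIND. */
int retain_from(int match_start, int max_lookbehind)
{
    int from;
    assert(match_start >= 0 && max_lookbehind >= 0);
    from = match_start - max_lookbehind;
    return from < 0 ? 0 : from;   /* near the start, keep everything */
}

/* Hypothetical helper: startoffset for the next match attempt,
   measured from the start of the retained text. */
int next_startoffset(int match_start, int max_lookbehind)
{
    return match_start - retain_from(match_start, max_lookbehind);
}
```

For the "xx123a" example, retain_from(5, 3) is 2, so the text from offset 2 onwards ("123a") is kept, and next_startoffset(5, 3) is 3, which is the position of "a" in the retained text.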
8689    
8690         3.  Because a partial match must always contain at least one character,         3.  Because a partial match must always contain at least one character,
8691         what might be considered a partial match of an  empty  string  actually         what might be considered a partial match of an  empty  string  actually
8692         gives a "no match" result. For example:         gives a "no match" result. For example:
# Line 8654  AUTHOR Line 8789  AUTHOR
8789    
8790  REVISION  REVISION
8791    
8792         Last updated: 24 June 2012         Last updated: 20 February 2013
8793         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
8794  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8795    
8796    
8797    PCREPRECOMPILE(3)          Library Functions Manual          PCREPRECOMPILE(3)
8798    
8799    
 PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)  
   
8800    
8801  NAME  NAME
8802         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
8803    
   
8804  SAVING AND RE-USING PRECOMPILED PCRE PATTERNS  SAVING AND RE-USING PRECOMPILED PCRE PATTERNS
8805    
8806         If  you  are running an application that uses a large number of regular         If  you  are running an application that uses a large number of regular
# Line 8792  REVISION Line 8927  REVISION
8927         Last updated: 24 June 2012         Last updated: 24 June 2012
8928         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8929  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8930    
8931    
8932    PCREPERFORM(3)             Library Functions Manual             PCREPERFORM(3)
8933    
8934    
 PCREPERFORM(3)                                                  PCREPERFORM(3)  
   
8935    
8936  NAME  NAME
8937         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
8938    
   
8939  PCRE PERFORMANCE  PCRE PERFORMANCE
8940    
8941         Two  aspects  of performance are discussed below: memory usage and pro-         Two  aspects  of performance are discussed below: memory usage and pro-
# Line 8962  REVISION Line 9097  REVISION
9097         Last updated: 25 August 2012         Last updated: 25 August 2012
9098         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
9099  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
9100    
9101    
9102    PCREPOSIX(3)               Library Functions Manual               PCREPOSIX(3)
9103    
9104    
 PCREPOSIX(3)                                                      PCREPOSIX(3)  
   
9105    
9106  NAME  NAME
9107         PCRE - Perl-compatible regular expressions.         PCRE - Perl-compatible regular expressions.
9108    
   
9109  SYNOPSIS OF POSIX API  SYNOPSIS OF POSIX API
9110    
9111         #include <pcreposix.h>         #include <pcreposix.h>
# Line 9227  REVISION Line 9362  REVISION
9362         Last updated: 09 January 2012         Last updated: 09 January 2012
9363         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
9364  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
9365    
9366    
9367    PCRECPP(3)                 Library Functions Manual                 PCRECPP(3)
9368    
9369    
 PCRECPP(3)                                                          PCRECPP(3)  
   
9370    
9371  NAME  NAME
9372         PCRE - Perl-compatible regular expressions.         PCRE - Perl-compatible regular expressions.
9373    
   
9374  SYNOPSIS OF C++ WRAPPER  SYNOPSIS OF C++ WRAPPER
9375    
9376         #include <pcrecpp.h>         #include <pcrecpp.h>
# Line 9570  REVISION Line 9705  REVISION
9705    
9706         Last updated: 08 January 2012         Last updated: 08 January 2012
9707  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
9708    
9709    
9710    PCRESAMPLE(3)              Library Functions Manual              PCRESAMPLE(3)
9711    
9712    
 PCRESAMPLE(3)                                                    PCRESAMPLE(3)  
   
9713    
9714  NAME  NAME
9715         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
9716    
   
9717  PCRE SAMPLE PROGRAM  PCRE SAMPLE PROGRAM
9718    
9719         A simple, complete demonstration program, to get you started with using         A simple, complete demonstration program, to get you started with using
# Line 9658  REVISION Line 9793  REVISION
9793         Last updated: 10 January 2012         Last updated: 10 January 2012
9794         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
9795  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
9796  PCRELIMITS(3)                                                    PCRELIMITS(3)  PCRELIMITS(3)              Library Functions Manual              PCRELIMITS(3)
9797    
9798    
9799    
9800  NAME  NAME
9801         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
9802    
   
9803  SIZE AND OTHER LIMITATIONS  SIZE AND OTHER LIMITATIONS
9804    
9805         There  are some size limitations in PCRE but it is hoped that they will         There  are some size limitations in PCRE but it is hoped that they will
# Line 9719  REVISION Line 9854  REVISION
9854         Last updated: 04 May 2012         Last updated: 04 May 2012
9855         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
9856  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
9857    
9858    
9859    PCRESTACK(3)               Library Functions Manual               PCRESTACK(3)
9860    
9861    
 PCRESTACK(3)                                                      PCRESTACK(3)  
   
9862    
9863  NAME  NAME
9864         PCRE - Perl-compatible regular expressions         PCRE - Perl-compatible regular expressions
9865    
   
9866  PCRE DISCUSSION OF STACK USAGE  PCRE DISCUSSION OF STACK USAGE
9867    
9868         When  you call pcre[16|32]_exec(), it makes use of an internal function         When  you call pcre[16|32]_exec(), it makes use of an internal function
# Line 9905  REVISION Line 10040