Diff of /code/trunk/doc/pcre.txt

revision 86 by nigel, Sat Feb 24 21:41:06 2007 UTC revision 87 by nigel, Sat Feb 24 21:41:21 2007 UTC
# Line 137  UTF-8 AND UNICODE PROPERTY SUPPORT Line 137  UTF-8 AND UNICODE PROPERTY SUPPORT
137         UTF-8  support),  the  escape sequences \p{..}, \P{..}, and \X are sup-         UTF-8  support),  the  escape sequences \p{..}, \P{..}, and \X are sup-
138         ported.  The available properties that can be tested are limited to the         ported.  The available properties that can be tested are limited to the
139         general  category  properties such as Lu for an upper case letter or Nd         general  category  properties such as Lu for an upper case letter or Nd
140         for a decimal number. A full list is given in the pcrepattern  documen-         for a decimal number, the Unicode script names such as Arabic  or  Han,
141         tation. The PCRE library is increased in size by about 90K when Unicode         and  the  derived  properties  Any  and L&. A full list is given in the
142         property support is included.         pcrepattern documentation. Only the short names for properties are sup-
143           ported.  For example, \p{L} matches a letter. Its Perl synonym, \p{Let-
144           ter}, is not supported.  Furthermore,  in  Perl,  many  properties  may
145           optionally  be  prefixed by "Is", for compatibility with Perl 5.6. PCRE
146           does not support this.
147    
148         The following comments apply when PCRE is running in UTF-8 mode:         The following comments apply when PCRE is running in UTF-8 mode:
149    
# Line 155  UTF-8 AND UNICODE PROPERTY SUPPORT Line 159  UTF-8 AND UNICODE PROPERTY SUPPORT
159         PCRE_NO_UTF8_CHECK  is set, the results are undefined. Your program may         PCRE_NO_UTF8_CHECK  is set, the results are undefined. Your program may
160         crash.         crash.
161    
162         2. In a pattern, the escape sequence \x{...}, where the contents of the         2. An unbraced hexadecimal escape sequence (such  as  \xb3)  matches  a
163         braces  is  a  string  of hexadecimal digits, is interpreted as a UTF-8         two-byte UTF-8 character if the value is greater than 127.
        character whose code number is the given hexadecimal number, for  exam-  
        ple:  \x{1234}.  If a non-hexadecimal digit appears between the braces,  
        the item is not recognized.  This escape sequence can be used either as  
        a literal, or within a character class.  
164    
165         3.  The  original hexadecimal escape sequence, \xhh, matches a two-byte         3.  Repeat quantifiers apply to complete UTF-8 characters, not to indi-
        UTF-8 character if the value is greater than 127.  
   
        4. Repeat quantifiers apply to complete UTF-8 characters, not to  indi-  
166         vidual bytes, for example: \x{100}{3}.         vidual bytes, for example: \x{100}{3}.
167    
168         5.  The dot metacharacter matches one UTF-8 character instead of a sin-         4. The dot metacharacter matches one UTF-8 character instead of a  sin-
169         gle byte.         gle byte.
170    
171         6. The escape sequence \C can be used to match a single byte  in  UTF-8         5.  The  escape sequence \C can be used to match a single byte in UTF-8
172         mode,  but  its  use can lead to some strange effects. This facility is         mode, but its use can lead to some strange effects.  This  facility  is
173         not available in the alternative matching function, pcre_dfa_exec().         not available in the alternative matching function, pcre_dfa_exec().
174    
175         7. The character escapes \b, \B, \d, \D, \s, \S, \w, and  \W  correctly         6.  The  character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
176         test  characters of any code value, but the characters that PCRE recog-         test characters of any code value, but the characters that PCRE  recog-
177         nizes as digits, spaces, or word characters  remain  the  same  set  as         nizes  as  digits,  spaces,  or  word characters remain the same set as
178         before, all with values less than 256. This remains true even when PCRE         before, all with values less than 256. This remains true even when PCRE
179         includes Unicode property support, because to do otherwise  would  slow         includes  Unicode  property support, because to do otherwise would slow
180         down  PCRE in many common cases. If you really want to test for a wider         down PCRE in many common cases. If you really want to test for a  wider
181         sense of, say, "digit", you must use Unicode  property  tests  such  as         sense  of,  say,  "digit",  you must use Unicode property tests such as
182         \p{Nd}.         \p{Nd}.
183    
184         8.  Similarly,  characters that match the POSIX named character classes         7. Similarly, characters that match the POSIX named  character  classes
185         are all low-valued characters.         are all low-valued characters.
186    
187         9. Case-insensitive matching applies only to  characters  whose  values         8.  Case-insensitive  matching  applies only to characters whose values
188         are  less than 128, unless PCRE is built with Unicode property support.         are less than 128, unless PCRE is built with Unicode property  support.
189         Even when Unicode property support is available, PCRE  still  uses  its         Even  when  Unicode  property support is available, PCRE still uses its
190         own  character  tables when checking the case of low-valued characters,         own character tables when checking the case of  low-valued  characters,
191         so as not to degrade performance.  The Unicode property information  is         so  as not to degrade performance.  The Unicode property information is
192         used only for characters with higher values.         used only for characters with higher values. Even when Unicode property
193           support is available, PCRE supports case-insensitive matching only when
194           there is a one-to-one mapping between a letter's  cases.  There  are  a
195           small  number  of  many-to-one  mappings in Unicode; these are not sup-
196           ported by PCRE.
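The byte-level effect of the unbraced \xb3 escape and of the \x{100}{3} example above can be sketched in plain C. This is only an illustration of standard UTF-8 encoding, not code taken from PCRE; the function names encode_latin1_as_utf8 and utf8_char_count are hypothetical, and the counter assumes its input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: encode a code point in the
   range 0..0xFF as UTF-8. Writes 1 or 2 bytes to out and returns the
   number of bytes written. Values above 127 need two bytes. */
size_t encode_latin1_as_utf8(unsigned int cp, unsigned char out[2])
{
    assert(out != NULL && cp <= 0xFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;               /* 0xxxxxxx */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* 10xxxxxx */
    return 2;
}

/* Hypothetical helper: count complete characters in a buffer that is
   assumed to hold valid UTF-8, by skipping continuation bytes. */
size_t utf8_char_count(const unsigned char *s, size_t nbytes)
{
    size_t i, count = 0;
    assert(s != NULL || nbytes == 0);
    for (i = 0; i < nbytes; i++) {
        if ((s[i] & 0xC0) != 0x80)   /* not a 10xxxxxx continuation byte */
            count++;
    }
    return count;
}
```

Under these assumptions, the value 0xb3 is encoded as the two bytes 0xC2 0xB3, and a subject matched by \x{100}{3} occupies six bytes (0xC4 0x80 three times) but counts as three characters, which is the unit that quantifiers and the dot metacharacter operate on in UTF-8 mode.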
197    
198    
199  AUTHOR  AUTHOR
# Line 201  AUTHOR Line 202  AUTHOR
202         University Computing Service,         University Computing Service,
203         Cambridge CB2 3QG, England.         Cambridge CB2 3QG, England.
204    
205         Putting  an actual email address here seems to have been a spam magnet,         Putting an actual email address here seems to have been a spam  magnet,
206         so I've taken it away. If you want to email me, use my initial and sur-         so I've taken it away. If you want to email me, use my initial and sur-
207         name, separated by a dot, at the domain ucs.cam.ac.uk.         name, separated by a dot, at the domain ucs.cam.ac.uk.
208    
209  Last updated: 07 March 2005  Last updated: 24 January 2006
210  Copyright (c) 1997-2005 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
211  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
212    
213    
# Line 808  CHECKING BUILD-TIME OPTIONS Line 809  CHECKING BUILD-TIME OPTIONS
809         internal  matching  function  calls in a pcre_exec() execution. Further         internal  matching  function  calls in a pcre_exec() execution. Further
810         details are given with pcre_exec() below.         details are given with pcre_exec() below.
811    
812             PCRE_CONFIG_MATCH_LIMIT_RECURSION
813    
814           The output is an integer that gives the default limit for the depth  of
815           recursion  when calling the internal matching function in a pcre_exec()
816           execution. Further details are given with pcre_exec() below.
817    
818           PCRE_CONFIG_STACKRECURSE           PCRE_CONFIG_STACKRECURSE
819    
820         The output is an integer that is set to one if internal recursion  when         The output is an integer that is set to one if internal recursion  when
# Line 861  COMPILING A PATTERN Line 868  COMPILING A PATTERN
868         If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,         If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
869         if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and         if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
870         sets the variable pointed to by errptr to point to a textual error mes-         sets the variable pointed to by errptr to point to a textual error mes-
871         sage.  The  offset from the start of the pattern to the character where         sage. This is a static string that is part of the library. You must not
872         the error was discovered is  placed  in  the  variable  pointed  to  by         try to free it. The offset from the start of the pattern to the charac-
873         erroffset,  which  must  not  be  NULL. If it is, an immediate error is         ter where the error was discovered is placed in the variable pointed to
874           by erroffset, which must not be NULL. If it is, an immediate  error  is
875         given.         given.
876    
877         If pcre_compile2() is used instead of pcre_compile(),  and  the  error-         If  pcre_compile2()  is  used instead of pcre_compile(), and the error-
878         codeptr  argument is not NULL, a non-zero error code number is returned         codeptr argument is not NULL, a non-zero error code number is  returned
879         via this argument in the event of an error. This is in addition to  the         via  this argument in the event of an error. This is in addition to the
880         textual error message. Error codes and messages are listed below.         textual error message. Error codes and messages are listed below.
881    
882         If  the  final  argument, tableptr, is NULL, PCRE uses a default set of         If the final argument, tableptr, is NULL, PCRE uses a  default  set  of
883         character tables that are  built  when  PCRE  is  compiled,  using  the         character  tables  that  are  built  when  PCRE  is compiled, using the
884         default  C  locale.  Otherwise, tableptr must be an address that is the         default C locale. Otherwise, tableptr must be an address  that  is  the
885         result of a call to pcre_maketables(). This value is  stored  with  the         result  of  a  call to pcre_maketables(). This value is stored with the
886         compiled  pattern,  and used again by pcre_exec(), unless another table         compiled pattern, and used again by pcre_exec(), unless  another  table
887         pointer is passed to it. For more discussion, see the section on locale         pointer is passed to it. For more discussion, see the section on locale
888         support below.         support below.
889    
890         This  code  fragment  shows a typical straightforward call to pcre_com-         This code fragment shows a typical straightforward  call  to  pcre_com-
891         pile():         pile():
892    
893           pcre *re;           pcre *re;
# Line 892  COMPILING A PATTERN Line 900  COMPILING A PATTERN
900             &erroffset,       /* for error offset */             &erroffset,       /* for error offset */
901             NULL);            /* use default character tables */             NULL);            /* use default character tables */
902    
903         The following names for option bits are defined in  the  pcre.h  header         The  following  names  for option bits are defined in the pcre.h header
904         file:         file:
905    
906           PCRE_ANCHORED           PCRE_ANCHORED
907    
908         If this bit is set, the pattern is forced to be "anchored", that is, it         If this bit is set, the pattern is forced to be "anchored", that is, it
909         is constrained to match only at the first matching point in the  string         is  constrained to match only at the first matching point in the string
910         that  is being searched (the "subject string"). This effect can also be         that is being searched (the "subject string"). This effect can also  be
911         achieved by appropriate constructs in the pattern itself, which is  the         achieved  by appropriate constructs in the pattern itself, which is the
912         only way to do it in Perl.         only way to do it in Perl.
913    
914           PCRE_AUTO_CALLOUT           PCRE_AUTO_CALLOUT
915    
916         If this bit is set, pcre_compile() automatically inserts callout items,         If this bit is set, pcre_compile() automatically inserts callout items,
917         all with number 255, before each pattern item. For  discussion  of  the         all  with  number  255, before each pattern item. For discussion of the
918         callout facility, see the pcrecallout documentation.         callout facility, see the pcrecallout documentation.
919    
920           PCRE_CASELESS           PCRE_CASELESS
921    
922         If  this  bit is set, letters in the pattern match both upper and lower         If this bit is set, letters in the pattern match both upper  and  lower
923         case letters. It is equivalent to Perl's  /i  option,  and  it  can  be         case  letters.  It  is  equivalent  to  Perl's /i option, and it can be
924         changed  within a pattern by a (?i) option setting. In UTF-8 mode, PCRE         changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE
925         always understands the concept of case for characters whose values  are         always  understands the concept of case for characters whose values are
926         less  than 128, so caseless matching is always possible. For characters         less than 128, so caseless matching is always possible. For  characters
927         with higher values, the concept of case is supported if  PCRE  is  com-         with  higher  values,  the concept of case is supported if PCRE is com-
928         piled  with Unicode property support, but not otherwise. If you want to         piled with Unicode property support, but not otherwise. If you want  to
929         use caseless matching for characters 128 and  above,  you  must  ensure         use  caseless  matching  for  characters 128 and above, you must ensure
930         that  PCRE  is  compiled  with Unicode property support as well as with         that PCRE is compiled with Unicode property support  as  well  as  with
931         UTF-8 support.         UTF-8 support.
932    
933           PCRE_DOLLAR_ENDONLY           PCRE_DOLLAR_ENDONLY
934    
935         If this bit is set, a dollar metacharacter in the pattern matches  only         If  this bit is set, a dollar metacharacter in the pattern matches only
936         at  the  end  of the subject string. Without this option, a dollar also         at the end of the subject string. Without this option,  a  dollar  also
937         matches immediately before the final character if it is a newline  (but         matches  immediately before the final character if it is a newline (but
938         not  before  any  other  newlines).  The  PCRE_DOLLAR_ENDONLY option is         not before any  other  newlines).  The  PCRE_DOLLAR_ENDONLY  option  is
939         ignored if PCRE_MULTILINE is set. There is no equivalent to this option         ignored if PCRE_MULTILINE is set. There is no equivalent to this option
940         in Perl, and no way to set it within a pattern.         in Perl, and no way to set it within a pattern.
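The PCRE_DOLLAR_ENDONLY rule described above can be modelled as a small predicate. This is a sketch of the documented behaviour only, assuming PCRE_MULTILINE is not set; dollar_can_match is a hypothetical name and does not correspond to any function inside PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not PCRE code: may "$" match at byte offset pos
   of a subject of length len when PCRE_MULTILINE is not set?
   dollar_endonly is non-zero if PCRE_DOLLAR_ENDONLY is in effect. */
int dollar_can_match(const char *subject, size_t len, size_t pos,
                     int dollar_endonly)
{
    assert(subject != NULL && pos <= len);
    if (pos == len)
        return 1;   /* the very end of the subject always qualifies */
    if (!dollar_endonly && pos + 1 == len && subject[pos] == '\n')
        return 1;   /* immediately before a final newline */
    return 0;       /* any other position, including earlier newlines */
}
```

For the four-byte subject "abc\n", the model accepts offset 3 only when the option is off, and accepts offset 4 in both cases.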
941    
942           PCRE_DOTALL           PCRE_DOTALL
943    
944         If this bit is set, a dot metacharacter in the pattern matches all char-         If this bit is set, a dot metacharacter in the pattern matches all char-
945         acters, including newlines. Without it,  newlines  are  excluded.  This         acters,  including  newlines.  Without  it, newlines are excluded. This
946         option  is equivalent to Perl's /s option, and it can be changed within         option is equivalent to Perl's /s option, and it can be changed  within
947         a pattern by a (?s) option setting.  A  negative  class  such  as  [^a]         a  pattern  by  a  (?s)  option  setting. A negative class such as [^a]
948         always  matches a newline character, independent of the setting of this         always matches a newline character, independent of the setting of  this
949         option.         option.
950    
951           PCRE_EXTENDED           PCRE_EXTENDED
952    
953         If this bit is set, whitespace  data  characters  in  the  pattern  are         If  this  bit  is  set,  whitespace  data characters in the pattern are
954         totally ignored except when escaped or inside a character class. White-         totally ignored except when escaped or inside a character class. White-
955         space does not include the VT character (code 11). In addition, charac-         space does not include the VT character (code 11). In addition, charac-
956         ters between an unescaped # outside a character class and the next new-         ters between an unescaped # outside a character class and the next new-
957         line character, inclusive, are also  ignored.  This  is  equivalent  to         line  character,  inclusive,  are  also  ignored. This is equivalent to
958         Perl's  /x  option,  and  it  can be changed within a pattern by a (?x)         Perl's /x option, and it can be changed within  a  pattern  by  a  (?x)
959         option setting.         option setting.
960    
961         This option makes it possible to include  comments  inside  complicated         This  option  makes  it possible to include comments inside complicated
962         patterns.   Note,  however,  that this applies only to data characters.         patterns.  Note, however, that this applies only  to  data  characters.
963         Whitespace  characters  may  never  appear  within  special   character         Whitespace   characters  may  never  appear  within  special  character
964         sequences  in  a  pattern,  for  example  within the sequence (?( which         sequences in a pattern, for  example  within  the  sequence  (?(  which
965         introduces a conditional subpattern.         introduces a conditional subpattern.
966    
967           PCRE_EXTRA           PCRE_EXTRA
968    
969         This option was invented in order to turn on  additional  functionality         This  option  was invented in order to turn on additional functionality
970         of  PCRE  that  is  incompatible with Perl, but it is currently of very         of PCRE that is incompatible with Perl, but it  is  currently  of  very
971         little use. When set, any backslash in a pattern that is followed by  a         little  use. When set, any backslash in a pattern that is followed by a
972         letter  that  has  no  special  meaning causes an error, thus reserving         letter that has no special meaning  causes  an  error,  thus  reserving
973         these combinations for future expansion. By  default,  as  in  Perl,  a         these  combinations  for  future  expansion.  By default, as in Perl, a
974         backslash  followed by a letter with no special meaning is treated as a         backslash followed by a letter with no special meaning is treated as  a
975         literal. There are at present no  other  features  controlled  by  this         literal.  There  are  at  present  no other features controlled by this
976         option. It can also be set by a (?X) option setting within a pattern.         option. It can also be set by a (?X) option setting within a pattern.
977    
978           PCRE_FIRSTLINE           PCRE_FIRSTLINE
979    
980         If  this  option  is  set,  an  unanchored pattern is required to match         If this option is set, an  unanchored  pattern  is  required  to  match
981         before or at the first newline character in the subject string,  though         before  or at the first newline character in the subject string, though
982         the matched text may continue over the newline.         the matched text may continue over the newline.
983    
984           PCRE_MULTILINE           PCRE_MULTILINE
985    
986         By  default,  PCRE  treats the subject string as consisting of a single         By default, PCRE treats the subject string as consisting  of  a  single
987         line of characters (even if it actually contains newlines). The  "start         line  of characters (even if it actually contains newlines). The "start
988         of  line"  metacharacter  (^)  matches only at the start of the string,         of line" metacharacter (^) matches only at the  start  of  the  string,
989         while the "end of line" metacharacter ($) matches only at  the  end  of         while  the  "end  of line" metacharacter ($) matches only at the end of
990         the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY         the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
991         is set). This is the same as Perl.         is set). This is the same as Perl.
992    
993         When PCRE_MULTILINE is set, the "start of line" and  "end  of  line"         When  PCRE_MULTILINE  is set, the "start of line" and "end of line"
994         constructs  match  immediately following or immediately before any new-         constructs match immediately following or immediately before  any  new-
995         line in the subject string, respectively, as well as at the very  start         line  in the subject string, respectively, as well as at the very start
996         and  end. This is equivalent to Perl's /m option, and it can be changed         and end. This is equivalent to Perl's /m option, and it can be  changed
997         within a pattern by a (?m) option setting. If there are no "\n" charac-         within a pattern by a (?m) option setting. If there are no "\n" charac-
998         ters  in  a  subject  string, or no occurrences of ^ or $ in a pattern,         ters in a subject string, or no occurrences of ^ or  $  in  a  pattern,
999         setting PCRE_MULTILINE has no effect.         setting PCRE_MULTILINE has no effect.
1000    
1001           PCRE_NO_AUTO_CAPTURE           PCRE_NO_AUTO_CAPTURE
1002    
1003         If this option is set, it disables the use of numbered capturing paren-         If this option is set, it disables the use of numbered capturing paren-
1004         theses  in the pattern. Any opening parenthesis that is not followed by         theses in the pattern. Any opening parenthesis that is not followed  by
1005         ? behaves as if it were followed by ?: but named parentheses can  still         ?  behaves as if it were followed by ?: but named parentheses can still
1006         be  used  for  capturing  (and  they acquire numbers in the usual way).         be used for capturing (and they acquire  numbers  in  the  usual  way).
1007         There is no equivalent of this option in Perl.         There is no equivalent of this option in Perl.
1008    
1009           PCRE_UNGREEDY           PCRE_UNGREEDY
1010    
1011         This option inverts the "greediness" of the quantifiers  so  that  they         This  option  inverts  the "greediness" of the quantifiers so that they
1012         are  not greedy by default, but become greedy if followed by "?". It is         are not greedy by default, but become greedy if followed by "?". It  is
1013         not compatible with Perl. It can also be set by a (?U)  option  setting         not  compatible  with Perl. It can also be set by a (?U) option setting
1014         within the pattern.         within the pattern.
1015    
1016           PCRE_UTF8           PCRE_UTF8
1017    
1018         This  option  causes PCRE to regard both the pattern and the subject as         This option causes PCRE to regard both the pattern and the  subject  as
1019         strings of UTF-8 characters instead of single-byte  character  strings.         strings  of  UTF-8 characters instead of single-byte character strings.
1020         However,  it is available only when PCRE is built to include UTF-8 sup-         However, it is available only when PCRE is built to include UTF-8  sup-
1021         port. If not, the use of this option provokes an error. Details of  how         port.  If not, the use of this option provokes an error. Details of how
1022         this  option  changes the behaviour of PCRE are given in the section on         this option changes the behaviour of PCRE are given in the  section  on
1023         UTF-8 support in the main pcre page.         UTF-8 support in the main pcre page.
1024    
1025           PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
1026    
1027         When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is         When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
1028         automatically  checked. If an invalid UTF-8 sequence of bytes is found,         automatically checked. If an invalid UTF-8 sequence of bytes is  found,
1029         pcre_compile() returns an error. If you already know that your  pattern         pcre_compile()  returns an error. If you already know that your pattern
1030         is  valid, and you want to skip this check for performance reasons, you         is valid, and you want to skip this check for performance reasons,  you
1031         can set the PCRE_NO_UTF8_CHECK option. When it is set,  the  effect  of         can  set  the  PCRE_NO_UTF8_CHECK option. When it is set, the effect of
1032         passing an invalid UTF-8 string as a pattern is undefined. It may cause         passing an invalid UTF-8 string as a pattern is undefined. It may cause
1033         your program to crash.  Note that this option can  also  be  passed  to         your  program  to  crash.   Note that this option can also be passed to
1034         pcre_exec()  and pcre_dfa_exec(), to suppress the UTF-8 validity check-         pcre_exec() and pcre_dfa_exec(), to suppress the UTF-8 validity  check-
1035         ing of subject strings.         ing of subject strings.
1036    
1037    
1038  COMPILATION ERROR CODES  COMPILATION ERROR CODES
1039    
1040         The following table lists the error  codes  that  may  be  returned  by         The  following  table  lists  the  error  codes that may be returned by
1041         pcre_compile2(),  along with the error messages that may be returned by         pcre_compile2(), along with the error messages that may be returned  by
1042         both compiling functions.         both compiling functions.
1043    
1044            0  no error            0  no error
# Line 1088  STUDYING A PATTERN Line 1096  STUDYING A PATTERN
1096         pcre_extra *pcre_study(const pcre *code, int options,         pcre_extra *pcre_study(const pcre *code, int options,
1097              const char **errptr);              const char **errptr);
1098    
1099         If a compiled pattern is going to be used several times,  it  is  worth         If  a  compiled  pattern is going to be used several times, it is worth
1100         spending more time analyzing it in order to speed up the time taken for         spending more time analyzing it in order to speed up the time taken for
1101         matching. The function pcre_study() takes a pointer to a compiled  pat-         matching.  The function pcre_study() takes a pointer to a compiled pat-
1102         tern as its first argument. If studying the pattern produces additional         tern as its first argument. If studying the pattern produces additional
1103         information that will help speed up matching,  pcre_study()  returns  a         information  that  will  help speed up matching, pcre_study() returns a
1104         pointer  to a pcre_extra block, in which the study_data field points to         pointer to a pcre_extra block, in which the study_data field points  to
1105         the results of the study.         the results of the study.
1106    
1107         The  returned  value  from  pcre_study()  can  be  passed  directly  to         The  returned  value  from  pcre_study()  can  be  passed  directly  to
1108         pcre_exec().  However,  a  pcre_extra  block also contains other fields         pcre_exec(). However, a pcre_extra block  also  contains  other  fields
1109         that can be set by the caller before the block  is  passed;  these  are         that  can  be  set  by the caller before the block is passed; these are
1110         described below in the section on matching a pattern.         described below in the section on matching a pattern.
1111    
1112         If  studying  the  pattern  does not produce any additional information,         If studying the pattern does not  produce  any  additional  information,
1113         pcre_study() returns NULL. In that circumstance, if the calling program         pcre_study() returns NULL. In that circumstance, if the calling program
1114         wants  to  pass  any of the other fields to pcre_exec(), it must set up         wants to pass any of the other fields to pcre_exec(), it  must  set  up
1115         its own pcre_extra block.         its own pcre_extra block.
1116    
1117         The second argument of pcre_study() contains option bits.  At  present,         The  second  argument of pcre_study() contains option bits. At present,
1118         no options are defined, and this argument should always be zero.         no options are defined, and this argument should always be zero.
1119    
1120         The  third argument for pcre_study() is a pointer for an error message.         The third argument for pcre_study() is a pointer for an error  message.
1121         If studying succeeds (even if no data is  returned),  the  variable  it         If  studying  succeeds  (even  if no data is returned), the variable it
1122         points  to  is set to NULL. Otherwise it points to a textual error mes-         points to is set to NULL. Otherwise it is set to  point  to  a  textual
1123         sage. You should therefore test the error pointer for NULL after  call-         error message. This is a static string that is part of the library. You
1124         ing pcre_study(), to be sure that it has run successfully.         must not try to free it. You should test the  error  pointer  for  NULL
1125           after calling pcre_study(), to be sure that it has run successfully.
1126    
1127         This is a typical call to pcre_study():         This is a typical call to pcre_study():
1128    
# Line 1135  LOCALE SUPPORT Line 1144  LOCALE SUPPORT
1144         by  character  value.  When running in UTF-8 mode, this applies only to         by  character  value.  When running in UTF-8 mode, this applies only to
1145         characters with codes less than 128. Higher-valued  codes  never  match         characters with codes less than 128. Higher-valued  codes  never  match
1146         escapes  such  as  \w or \d, but can be tested with \p if PCRE is built         escapes  such  as  \w or \d, but can be tested with \p if PCRE is built
1147         with Unicode character property support.         with Unicode character property support. The use of locales  with  Uni-
1148           code is discouraged.
1149    
1150         An internal set of tables is created in the default C locale when  PCRE         An  internal set of tables is created in the default C locale when PCRE
1151         is  built.  This  is  used when the final argument of pcre_compile() is         is built. This is used when the final  argument  of  pcre_compile()  is
1152         NULL, and is sufficient for many applications. An  alternative  set  of         NULL,  and  is  sufficient for many applications. An alternative set of
1153         tables  can,  however, be supplied. These may be created in a different         tables can, however, be supplied. These may be created in  a  different
1154         locale from the default. As more and more applications change to  using         locale  from the default. As more and more applications change to using
1155         Unicode, the need for this locale support is expected to die away.         Unicode, the need for this locale support is expected to die away.
1156    
1157         External  tables  are  built by calling the pcre_maketables() function,         External tables are built by calling  the  pcre_maketables()  function,
1158         which has no arguments, in the relevant locale. The result can then  be         which  has no arguments, in the relevant locale. The result can then be
1159         passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For         passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
1160         example, to build and use tables that are appropriate  for  the  French         example,  to  build  and use tables that are appropriate for the French
1161         locale  (where  accented  characters  with  values greater than 128 are         locale (where accented characters with  values  greater  than  128  are
1162         treated as letters), the following code could be used:         treated as letters), the following code could be used:
1163    
1164           setlocale(LC_CTYPE, "fr_FR");           setlocale(LC_CTYPE, "fr_FR");
1165           tables = pcre_maketables();           tables = pcre_maketables();
1166           re = pcre_compile(..., tables);           re = pcre_compile(..., tables);
1167    
1168         When pcre_maketables() runs, the tables are built  in  memory  that  is         When  pcre_maketables()  runs,  the  tables are built in memory that is
1169         obtained  via  pcre_malloc. It is the caller's responsibility to ensure         obtained via pcre_malloc. It is the caller's responsibility  to  ensure
1170         that the memory containing the tables remains available for as long  as         that  the memory containing the tables remains available for as long as
1171         it is needed.         it is needed.
1172    
1173         The pointer that is passed to pcre_compile() is saved with the compiled         The pointer that is passed to pcre_compile() is saved with the compiled
1174         pattern, and the same tables are used via this pointer by  pcre_study()         pattern,  and the same tables are used via this pointer by pcre_study()
1175         and normally also by pcre_exec(). Thus, by default, for any single pat-         and normally also by pcre_exec(). Thus, by default, for any single pat-
1176         tern, compilation, studying and matching all happen in the same locale,         tern, compilation, studying and matching all happen in the same locale,
1177         but different patterns can be compiled in different locales.         but different patterns can be compiled in different locales.
1178    
1179         It  is  possible to pass a table pointer or NULL (indicating the use of         It is possible to pass a table pointer or NULL (indicating the  use  of
1180         the internal tables) to pcre_exec(). Although  not  intended  for  this         the  internal  tables)  to  pcre_exec(). Although not intended for this
1181         purpose,  this facility could be used to match a pattern in a different         purpose, this facility could be used to match a pattern in a  different
1182         locale from the one in which it was compiled. Passing table pointers at         locale from the one in which it was compiled. Passing table pointers at
1183         run time is discussed below in the section on matching a pattern.         run time is discussed below in the section on matching a pattern.
1184    
# Line 1178  INFORMATION ABOUT A PATTERN Line 1188  INFORMATION ABOUT A PATTERN
1188         int pcre_fullinfo(const pcre *code, const pcre_extra *extra,         int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
1189              int what, void *where);              int what, void *where);
1190    
1191         The  pcre_fullinfo() function returns information about a compiled pat-         The pcre_fullinfo() function returns information about a compiled  pat-
1192         tern. It replaces the obsolete pcre_info() function, which is neverthe-         tern. It replaces the obsolete pcre_info() function, which is neverthe-
1193         less retained for backwards compatibility (and is documented below).         less retained for backwards compatibility (and is documented below).
1194    
1195         The  first  argument  for  pcre_fullinfo() is a pointer to the compiled         The first argument for pcre_fullinfo() is a  pointer  to  the  compiled
1196         pattern. The second argument is the result of pcre_study(), or NULL  if         pattern.  The second argument is the result of pcre_study(), or NULL if
1197         the  pattern  was not studied. The third argument specifies which piece         the pattern was not studied. The third argument specifies  which  piece
1198         of information is required, and the fourth argument is a pointer  to  a         of  information  is required, and the fourth argument is a pointer to a
1199         variable  to  receive  the  data. The yield of the function is zero for         variable to receive the data. The yield of the  function  is  zero  for
1200         success, or one of the following negative numbers:         success, or one of the following negative numbers:
1201    
1202           PCRE_ERROR_NULL       the argument code was NULL           PCRE_ERROR_NULL       the argument code was NULL
# Line 1194  INFORMATION ABOUT A PATTERN Line 1204  INFORMATION ABOUT A PATTERN
1204           PCRE_ERROR_BADMAGIC   the "magic number" was not found           PCRE_ERROR_BADMAGIC   the "magic number" was not found
1205           PCRE_ERROR_BADOPTION  the value of what was invalid           PCRE_ERROR_BADOPTION  the value of what was invalid
1206    
1207         The "magic number" is placed at the start of each compiled  pattern  as         The  "magic  number" is placed at the start of each compiled pattern as
1208         a  simple check against passing an arbitrary memory pointer. Here is a         a simple check against passing an arbitrary memory pointer. Here is  a
1209         typical call of pcre_fullinfo(), to obtain the length of  the  compiled         typical  call  of pcre_fullinfo(), to obtain the length of the compiled
1210         pattern:         pattern:
1211    
1212           int rc;           int rc;
# Line 1207  INFORMATION ABOUT A PATTERN Line 1217  INFORMATION ABOUT A PATTERN
1217             PCRE_INFO_SIZE,   /* what is required */             PCRE_INFO_SIZE,   /* what is required */
1218             &length);         /* where to put the data */             &length);         /* where to put the data */
1219    
1220         The  possible  values for the third argument are defined in pcre.h, and         The possible values for the third argument are defined in  pcre.h,  and
1221         are as follows:         are as follows:
1222    
1223           PCRE_INFO_BACKREFMAX           PCRE_INFO_BACKREFMAX
1224    
1225         Return the number of the highest back reference  in  the  pattern.  The         Return  the  number  of  the highest back reference in the pattern. The
1226         fourth  argument  should  point to an int variable. Zero is returned if         fourth argument should point to an int variable. Zero  is  returned  if
1227         there are no back references.         there are no back references.
1228    
1229           PCRE_INFO_CAPTURECOUNT           PCRE_INFO_CAPTURECOUNT
1230    
1231         Return the number of capturing subpatterns in the pattern.  The  fourth         Return  the  number of capturing subpatterns in the pattern. The fourth
1232         argument should point to an int variable.         argument should point to an int variable.
1233    
1234           PCRE_INFO_DEFAULT_TABLES           PCRE_INFO_DEFAULT_TABLES
1235    
1236         Return  a pointer to the internal default character tables within PCRE.         Return a pointer to the internal default character tables within  PCRE.
1237         The fourth argument should point to an unsigned char *  variable.  This         The  fourth  argument should point to an unsigned char * variable. This
1238         information call is provided for internal use by the pcre_study() func-         information call is provided for internal use by the pcre_study() func-
1239         tion. External callers can cause PCRE to use  its  internal  tables  by         tion.  External  callers  can  cause PCRE to use its internal tables by
1240         passing a NULL table pointer.         passing a NULL table pointer.
1241    
1242           PCRE_INFO_FIRSTBYTE           PCRE_INFO_FIRSTBYTE
1243    
1244         Return  information  about  the first byte of any matched string, for a         Return information about the first byte of any matched  string,  for  a
1245         non-anchored   pattern.   (This    option    used    to    be    called         non-anchored    pattern.    (This    option    used    to   be   called
1246         PCRE_INFO_FIRSTCHAR;  the  old  name  is still recognized for backwards         PCRE_INFO_FIRSTCHAR; the old name is  still  recognized  for  backwards
1247         compatibility.)         compatibility.)
1248    
1249         If there is a fixed first byte, for example, from  a  pattern  such  as         If  there  is  a  fixed first byte, for example, from a pattern such as
1250         (cat|cow|coyote),  it  is  returned in the integer pointed to by where.         (cat|cow|coyote), it is returned in the integer pointed  to  by  where.
1251         Otherwise, if either         Otherwise, if either
1252    
1253         (a) the pattern was compiled with the PCRE_MULTILINE option, and  every         (a)  the pattern was compiled with the PCRE_MULTILINE option, and every
1254         branch starts with "^", or         branch starts with "^", or
1255    
1256         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
1257         set (if it were set, the pattern would be anchored),         set (if it were set, the pattern would be anchored),
1258    
1259         -1 is returned, indicating that the pattern matches only at  the  start         -1  is  returned, indicating that the pattern matches only at the start
1260         of  a  subject string or after any newline within the string. Otherwise         of a subject string or after any newline within the  string.  Otherwise
1261         -2 is returned. For anchored patterns, -2 is returned.         -2 is returned. For anchored patterns, -2 is returned.
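As a rough illustration of the three outcomes just described, the sketch below classifies a value obtained with PCRE_INFO_FIRSTBYTE. The function classify_first_byte() and the enum names are hypothetical and are not part of PCRE; only the meanings of a non-negative value, -1, and -2 are taken from the text above.

```c
/* Hypothetical classification of a PCRE_INFO_FIRSTBYTE result.
   None of these names exist in pcre.h. */
enum first_byte_kind {
    FIRST_BYTE_FIXED,       /* value >= 0: every match starts with this byte */
    FIRST_BYTE_LINE_START,  /* -1: matches only at the start of the subject
                               or after a newline within it */
    FIRST_BYTE_NONE,        /* -2: no first-byte information, or the
                               pattern is anchored */
    FIRST_BYTE_UNEXPECTED   /* any other negative value */
};

enum first_byte_kind classify_first_byte(int value)
{
    if (value >= 0)
        return FIRST_BYTE_FIXED;
    if (value == -1)
        return FIRST_BYTE_LINE_START;
    if (value == -2)
        return FIRST_BYTE_NONE;
    return FIRST_BYTE_UNEXPECTED;
}
```

For a pattern such as (cat|cow|coyote), the value would be the byte 'c', which this sketch reports as FIRST_BYTE_FIXED.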
1262    
1263           PCRE_INFO_FIRSTTABLE           PCRE_INFO_FIRSTTABLE
1264    
1265         If the pattern was studied, and this resulted in the construction of  a         If  the pattern was studied, and this resulted in the construction of a
1266         256-bit table indicating a fixed set of bytes for the first byte in any         256-bit table indicating a fixed set of bytes for the first byte in any
1267         matching string, a pointer to the table is returned. Otherwise NULL  is         matching  string, a pointer to the table is returned. Otherwise NULL is
1268         returned.  The fourth argument should point to an unsigned char * vari-         returned. The fourth argument should point to an unsigned char *  vari-
1269         able.         able.
1270    
1271           PCRE_INFO_LASTLITERAL           PCRE_INFO_LASTLITERAL
1272    
1273         Return the value of the rightmost literal byte that must exist  in  any         Return  the  value of the rightmost literal byte that must exist in any
1274         matched  string,  other  than  at  its  start,  if such a byte has been         matched string, other than at its  start,  if  such  a  byte  has  been
1275         recorded. The fourth argument should point to an int variable. If there         recorded. The fourth argument should point to an int variable. If there
1276         is  no such byte, -1 is returned. For anchored patterns, a last literal         is no such byte, -1 is returned. For anchored patterns, a last  literal
1277         byte is recorded only if it follows something of variable  length.  For         byte  is  recorded only if it follows something of variable length. For
1278         example, for the pattern /^a\d+z\d+/ the returned value is "z", but for         example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
1279         /^a\dz\d/ the returned value is -1.         /^a\dz\d/ the returned value is -1.
1280    
# Line 1272  INFORMATION ABOUT A PATTERN Line 1282  INFORMATION ABOUT A PATTERN
1282           PCRE_INFO_NAMEENTRYSIZE           PCRE_INFO_NAMEENTRYSIZE
1283           PCRE_INFO_NAMETABLE           PCRE_INFO_NAMETABLE
1284    
1285         PCRE supports the use of named as well as numbered capturing  parenthe-         PCRE  supports the use of named as well as numbered capturing parenthe-
1286         ses.  The names are just an additional way of identifying the parenthe-         ses. The names are just an additional way of identifying the  parenthe-
1287         ses,  which  still  acquire  numbers.  A  convenience  function  called         ses,  which  still  acquire  numbers.  A  convenience  function  called
1288         pcre_get_named_substring()  is  provided  for  extracting an individual         pcre_get_named_substring() is provided  for  extracting  an  individual
1289         captured substring by name. It is also possible  to  extract  the  data         captured  substring  by  name.  It is also possible to extract the data
1290         directly,  by  first converting the name to a number in order to access         directly, by first converting the name to a number in order  to  access
1291         the correct pointers in the output vector (described  with  pcre_exec()         the  correct  pointers in the output vector (described with pcre_exec()
1292         below).  To  do the conversion, you need to use the name-to-number map,         below). To do the conversion, you need to use the  name-to-number  map,
1293         which is described by these three values.         which is described by these three values.
1294    
1295         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
1296         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
1297         of each entry; both of these  return  an  int  value.  The  entry  size         of  each  entry;  both  of  these  return  an int value. The entry size
1298         depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns         depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns
1299         a pointer to the first entry of the table  (a  pointer  to  char).  The         a  pointer  to  the  first  entry of the table (a pointer to char). The
1300         first two bytes of each entry are the number of the capturing parenthe-         first two bytes of each entry are the number of the capturing parenthe-
1301         sis, most significant byte first. The rest of the entry is  the  corre-         sis,  most  significant byte first. The rest of the entry is the corre-
1302         sponding  name,  zero  terminated. The names are in alphabetical order.         sponding name, zero terminated. The names are  in  alphabetical  order.
1303         For example, consider the following pattern  (assume  PCRE_EXTENDED  is         For  example,  consider  the following pattern (assume PCRE_EXTENDED is
1304         set, so white space - including newlines - is ignored):         set, so white space - including newlines - is ignored):
1305    
1306           (?P<date> (?P<year>(\d\d)?\d\d) -           (?P<date> (?P<year>(\d\d)?\d\d) -
1307           (?P<month>\d\d) - (?P<day>\d\d) )           (?P<month>\d\d) - (?P<day>\d\d) )
1308    
1309         There  are  four  named subpatterns, so the table has four entries, and         There are four named subpatterns, so the table has  four  entries,  and
1310         each entry in the table is eight bytes long. The table is  as  follows,         each  entry  in the table is eight bytes long. The table is as follows,
1311         with non-printing bytes shown in hexadecimal, and undefined bytes shown         with non-printing bytes shown in hexadecimal, and undefined bytes shown
1312         as ??:         as ??:
1313    
# Line 1306  INFORMATION ABOUT A PATTERN Line 1316  INFORMATION ABOUT A PATTERN
1316           00 04 m  o  n  t  h  00           00 04 m  o  n  t  h  00
1317           00 02 y  e  a  r  00 ??           00 02 y  e  a  r  00 ??
1318    
1319         When writing code to extract data  from  named  subpatterns  using  the         When  writing  code  to  extract  data from named subpatterns using the
1320         name-to-number map, remember that the length of each entry is likely to         name-to-number map, remember that the length of each entry is likely to
1321         be different for each compiled pattern.         be different for each compiled pattern.
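The entry layout described above can be walked as in the following minimal sketch. The function name_to_number() and the array example_table are hypothetical and are not part of PCRE. In real use, the table pointer, entry count, and entry size would come from PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE. The sample data is the table for the date pattern above, with the date and day rows derived from the group numbering of that pattern and with undefined bytes written as zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the number of the capturing parenthesis
   called name, or -1 if the table has no such name. Each entry is
   entry_size bytes: two bytes of group number, most significant byte
   first, followed by the zero-terminated name. */
int name_to_number(const unsigned char *table, int name_count,
                   int entry_size, const char *name)
{
    int i;
    for (i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* Sample table: four entries of eight bytes each, in alphabetical order. */
const unsigned char example_table[4 * 8] = {
    0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
    0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00
};
```

A linear scan is used here for simplicity. Because the names are stored in alphabetical order, a binary search would also work. The entry size is passed in rather than fixed, since it differs from one compiled pattern to another.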
1322    
1323           PCRE_INFO_OPTIONS           PCRE_INFO_OPTIONS
1324    
1325         Return a copy of the options with which the pattern was  compiled.  The         Return  a  copy of the options with which the pattern was compiled. The
1326         fourth  argument  should  point to an unsigned long int variable. These         fourth argument should point to an unsigned long  int  variable.  These
1327         option bits are those specified in the call to pcre_compile(), modified         option bits are those specified in the call to pcre_compile(), modified
1328         by any top-level option settings within the pattern itself.         by any top-level option settings within the pattern itself.
1329    
1330         A  pattern  is  automatically  anchored by PCRE if all of its top-level         A pattern is automatically anchored by PCRE if  all  of  its  top-level
1331         alternatives begin with one of the following:         alternatives begin with one of the following:
1332    
1333           ^     unless PCRE_MULTILINE is set           ^     unless PCRE_MULTILINE is set
# Line 1331  INFORMATION ABOUT A PATTERN Line 1341  INFORMATION ABOUT A PATTERN
1341    
1342           PCRE_INFO_SIZE           PCRE_INFO_SIZE
1343    
1344         Return  the  size  of the compiled pattern, that is, the value that was         Return the size of the compiled pattern, that is, the  value  that  was
1345         passed as the argument to pcre_malloc() when PCRE was getting memory in         passed as the argument to pcre_malloc() when PCRE was getting memory in
1346         which to place the compiled data. The fourth argument should point to a         which to place the compiled data. The fourth argument should point to a
1347         size_t variable.         size_t variable.
# Line 1339  INFORMATION ABOUT A PATTERN Line 1349  INFORMATION ABOUT A PATTERN
1349           PCRE_INFO_STUDYSIZE           PCRE_INFO_STUDYSIZE
1350    
1351         Return the size of the data block pointed to by the study_data field in         Return the size of the data block pointed to by the study_data field in
1352         a  pcre_extra  block.  That  is,  it  is  the  value that was passed to         a pcre_extra block. That is,  it  is  the  value  that  was  passed  to
1353         pcre_malloc() when PCRE was getting memory into which to place the data         pcre_malloc() when PCRE was getting memory into which to place the data
1354         created  by  pcre_study(). The fourth argument should point to a size_t         created by pcre_study(). The fourth argument should point to  a  size_t
1355         variable.         variable.
1356    
1357    
# Line 1349  OBSOLETE INFO FUNCTION Line 1359  OBSOLETE INFO FUNCTION
1359    
1360         int pcre_info(const pcre *code, int *optptr, int *firstcharptr);         int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
1361    
1362         The pcre_info() function is now obsolete because its interface  is  too         The  pcre_info()  function is now obsolete because its interface is too
1363         restrictive  to return all the available data about a compiled pattern.         restrictive to return all the available data about a compiled  pattern.
1364         New  programs  should  use  pcre_fullinfo()  instead.  The   yield   of         New   programs   should  use  pcre_fullinfo()  instead.  The  yield  of
1365         pcre_info()  is the number of capturing subpatterns, or one of the fol-         pcre_info() is the number of capturing subpatterns, or one of the  fol-
1366         lowing negative numbers:         lowing negative numbers:
1367    
1368           PCRE_ERROR_NULL       the argument code was NULL           PCRE_ERROR_NULL       the argument code was NULL
1369           PCRE_ERROR_BADMAGIC   the "magic number" was not found           PCRE_ERROR_BADMAGIC   the "magic number" was not found
1370    
1371         If the optptr argument is not NULL, a copy of the  options  with  which         If  the  optptr  argument is not NULL, a copy of the options with which
1372         the  pattern  was  compiled  is placed in the integer it points to (see         the pattern was compiled is placed in the integer  it  points  to  (see
1373         PCRE_INFO_OPTIONS above).         PCRE_INFO_OPTIONS above).
1374    
1375         If the pattern is not anchored and the  firstcharptr  argument  is  not         If  the  pattern  is  not anchored and the firstcharptr argument is not
1376         NULL,  it is used to pass back information about the first character of         NULL, it is used to pass back information about the first character  of
1377         any matched string (see PCRE_INFO_FIRSTBYTE above).         any matched string (see PCRE_INFO_FIRSTBYTE above).
1378    
1379    
# Line 1371  REFERENCE COUNTS Line 1381  REFERENCE COUNTS
1381    
1382         int pcre_refcount(pcre *code, int adjust);         int pcre_refcount(pcre *code, int adjust);
1383    
1384         The pcre_refcount() function is used to maintain a reference  count  in         The  pcre_refcount()  function is used to maintain a reference count in
1385         the data block that contains a compiled pattern. It is provided for the         the data block that contains a compiled pattern. It is provided for the
1386         benefit of applications that  operate  in  an  object-oriented  manner,         benefit  of  applications  that  operate  in an object-oriented manner,
1387         where different parts of the application may be using the same compiled         where different parts of the application may be using the same compiled
1388         pattern, but you want to free the block when they are all done.         pattern, but you want to free the block when they are all done.
1389    
1390         When a pattern is compiled, the reference count field is initialized to         When a pattern is compiled, the reference count field is initialized to
1391         zero.   It is changed only by calling this function, whose action is to         zero.  It is changed only by calling this function, whose action is  to
1392         add the adjust value (which may be positive or  negative)  to  it.  The         add  the  adjust  value  (which may be positive or negative) to it. The
1393         yield of the function is the new value. However, the value of the count         yield of the function is the new value. However, the value of the count
1394         is constrained to lie between 0 and 65535, inclusive. If the new  value         is  constrained to lie between 0 and 65535, inclusive. If the new value
1395         is outside these limits, it is forced to the appropriate limit value.         is outside these limits, it is forced to the appropriate limit value.
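The adjust-and-clamp behaviour described above can be modelled as follows. This is an illustrative sketch only, not the source of pcre_refcount(); the function adjusted_refcount() is hypothetical and works on a plain integer rather than on a compiled pattern.

```c
/* Hypothetical model of the documented rule: add adjust to the current
   count, then force the result into the range 0 to 65535 inclusive. */
int adjusted_refcount(int current, int adjust)
{
    long next = (long)current + (long)adjust;

    if (next < 0)
        next = 0;        /* forced to the lower limit */
    if (next > 65535)
        next = 65535;    /* forced to the upper limit */
    return (int)next;
}
```

Under this model, decrementing a count that is already zero leaves it at zero, so a caller cannot detect an unbalanced release from the returned value alone.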
1396    
1397         Except  when it is zero, the reference count is not correctly preserved         Except when it is zero, the reference count is not correctly  preserved
1398         if a pattern is compiled on one host and then  transferred  to  a  host         if  a  pattern  is  compiled on one host and then transferred to a host
1399         whose byte-order is different. (This seems a highly unlikely scenario.)         whose byte-order is different. (This seems a highly unlikely scenario.)
1400    
1401    
# Line 1395  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1405  MATCHING A PATTERN: THE TRADITIONAL FUNC
1405              const char *subject, int length, int startoffset,              const char *subject, int length, int startoffset,
1406              int options, int *ovector, int ovecsize);              int options, int *ovector, int ovecsize);
1407    
1408         The function pcre_exec() is called to match a subject string against  a         The  function pcre_exec() is called to match a subject string against a
1409         compiled  pattern, which is passed in the code argument. If the pattern         compiled pattern, which is passed in the code argument. If the  pattern
1410         has been studied, the result of the study should be passed in the extra         has been studied, the result of the study should be passed in the extra
1411         argument.  This  function is the main matching facility of the library,         argument. This function is the main matching facility of  the  library,
1412         and it operates in a Perl-like manner. For specialist use there is also         and it operates in a Perl-like manner. For specialist use there is also
1413         an  alternative matching function, which is described below in the sec-         an alternative matching function, which is described below in the  sec-
1414         tion about the pcre_dfa_exec() function.         tion about the pcre_dfa_exec() function.
1415    
1416         In most applications, the pattern will have been compiled (and  option-         In  most applications, the pattern will have been compiled (and option-
1417         ally  studied)  in the same process that calls pcre_exec(). However, it         ally studied) in the same process that calls pcre_exec().  However,  it
1418         is possible to save compiled patterns and study data, and then use them         is possible to save compiled patterns and study data, and then use them
1419         later  in  different processes, possibly even on different hosts. For a         later in different processes, possibly even on different hosts.  For  a
1420         discussion about this, see the pcreprecompile documentation.         discussion about this, see the pcreprecompile documentation.
1421    
1422         Here is an example of a simple call to pcre_exec():         Here is an example of a simple call to pcre_exec():
# Line 1425  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1435  MATCHING A PATTERN: THE TRADITIONAL FUNC
1435    
1436     Extra data for pcre_exec()     Extra data for pcre_exec()
1437    
1438         If the extra argument is not NULL, it must point to a  pcre_extra  data         If  the  extra argument is not NULL, it must point to a pcre_extra data
1439         block.  The pcre_study() function returns such a block (when it doesn't         block. The pcre_study() function returns such a block (when it  doesn't
1440         return NULL), but you can also create one for yourself, and pass  addi-         return  NULL), but you can also create one for yourself, and pass addi-
1441         tional  information in it. The fields in a pcre_extra block are as fol-         tional information in it. The pcre_extra block contains  the  following
1442         lows:         fields (not necessarily in this order):
1443    
1444           unsigned long int flags;           unsigned long int flags;
1445           void *study_data;           void *study_data;
1446           unsigned long int match_limit;           unsigned long int match_limit;
1447             unsigned long int match_limit_recursion;
1448           void *callout_data;           void *callout_data;
1449           const unsigned char *tables;           const unsigned char *tables;
1450    
1451         The flags field is a bitmap that specifies which of  the  other  fields         The  flags  field  is a bitmap that specifies which of the other fields
1452         are set. The flag bits are:         are set. The flag bits are:
1453    
1454           PCRE_EXTRA_STUDY_DATA           PCRE_EXTRA_STUDY_DATA
1455           PCRE_EXTRA_MATCH_LIMIT           PCRE_EXTRA_MATCH_LIMIT
1456             PCRE_EXTRA_MATCH_LIMIT_RECURSION
1457           PCRE_EXTRA_CALLOUT_DATA           PCRE_EXTRA_CALLOUT_DATA
1458           PCRE_EXTRA_TABLES           PCRE_EXTRA_TABLES
1459    
1460         Other  flag  bits should be set to zero. The study_data field is set in         Other flag bits should be set to zero. The study_data field is  set  in
1461         the pcre_extra block that is returned by  pcre_study(),  together  with         the  pcre_extra  block  that is returned by pcre_study(), together with
1462         the appropriate flag bit. You should not set this yourself, but you may         the appropriate flag bit. You should not set this yourself, but you may
1463         add to the block by setting the other fields  and  their  corresponding         add  to  the  block by setting the other fields and their corresponding
1464         flag bits.         flag bits.
1465    
1466         The match_limit field provides a means of preventing PCRE from using up         The match_limit field provides a means of preventing PCRE from using up
1467         a vast amount of resources when running patterns that are not going  to         a  vast amount of resources when running patterns that are not going to
1468         match,  but  which  have  a very large number of possibilities in their         match, but which have a very large number  of  possibilities  in  their
1469         search trees. The classic  example  is  the  use  of  nested  unlimited         search  trees.  The  classic  example  is  the  use of nested unlimited
1470         repeats.         repeats.
1471    
1472         Internally,  PCRE uses a function called match() which it calls repeat-         Internally, PCRE uses a function called match() which it calls  repeat-
1473         edly (sometimes recursively). The limit is imposed  on  the  number  of         edly  (sometimes  recursively). The limit set by match_limit is imposed
1474         times  this  function is called during a match, which has the effect of         on the number of times this function is called during  a  match,  which
1475         limiting the amount of recursion and backtracking that can take  place.         has  the  effect  of  limiting the amount of backtracking that can take
1476         For patterns that are not anchored, the count starts from zero for each         place. For patterns that are not anchored, the count restarts from zero
1477         position in the subject string.         for each position in the subject string.
1478    
1479         The default limit for the library can be set when PCRE  is  built;  the         The  default  value  for  the  limit can be set when PCRE is built; the
1480         default  default  is 10 million, which handles all but the most extreme         default default is 10 million, which handles all but the  most  extreme
1481         cases. You can reduce  the  default  by  supplying  pcre_exec()  with  a         cases.  You  can  override  the  default by supplying pcre_exec() with a
1482         pcre_extra  block  in  which match_limit is set to a smaller value, and         pcre_extra    block    in    which    match_limit    is    set,     and
1483         PCRE_EXTRA_MATCH_LIMIT is set in the  flags  field.  If  the  limit  is         PCRE_EXTRA_MATCH_LIMIT  is  set  in  the  flags  field. If the limit is
1484         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
1485    
1486         The  pcre_callout  field is used in conjunction with the "callout" fea-         The match_limit_recursion field is similar to match_limit, but  instead
1487           of limiting the total number of times that match() is called, it limits
1488           the depth of recursion. The recursion depth is a  smaller  number  than
1489           the  total number of calls, because not all calls to match() are recur-
1490           sive.  This limit is of use only if it is set smaller than match_limit.
1491    
1492           Limiting  the  recursion  depth  limits the amount of stack that can be
1493           used, or, when PCRE has been compiled to use memory on the heap instead
1494           of the stack, the amount of heap memory that can be used.
1495    
1496           The  default  value  for  match_limit_recursion can be set when PCRE is
1497           built; the default default  is  the  same  value  as  the  default  for
1498           match_limit.  You can override the default by supplying pcre_exec() with
1499           a  pcre_extra  block  in  which  match_limit_recursion  is   set,   and
1500           PCRE_EXTRA_MATCH_LIMIT_RECURSION  is  set  in  the  flags field. If the
1501           limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
1502    
1503           The pcre_callout field is used in conjunction with the  "callout"  fea-
1504         ture, which is described in the pcrecallout documentation.         ture, which is described in the pcrecallout documentation.
1505    
1506         The tables field  is  used  to  pass  a  character  tables  pointer  to         The  tables  field  is  used  to  pass  a  character  tables pointer to
1507         pcre_exec();  this overrides the value that is stored with the compiled         pcre_exec(); this overrides the value that is stored with the  compiled
1508         pattern. A non-NULL value is stored with the compiled pattern  only  if         pattern.  A  non-NULL value is stored with the compiled pattern only if
1509         custom  tables  were  supplied to pcre_compile() via its tableptr argu-         custom tables were supplied to pcre_compile() via  its  tableptr  argu-
1510         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
1511         PCRE's  internal  tables  to be used. This facility is helpful when re-         PCRE's internal tables to be used. This facility is  helpful  when  re-
1512         using patterns that have been saved after compiling  with  an  external         using  patterns  that  have been saved after compiling with an external
1513         set  of  tables,  because  the  external tables might be at a different         set of tables, because the external tables  might  be  at  a  different
1514         address when pcre_exec() is called. See the  pcreprecompile  documenta-         address  when  pcre_exec() is called. See the pcreprecompile documenta-
1515         tion for a discussion of saving compiled patterns for later use.         tion for a discussion of saving compiled patterns for later use.
1516    
1517     Option bits for pcre_exec()     Option bits for pcre_exec()
1518    
1519         The  unused  bits of the options argument for pcre_exec() must be zero.         The unused bits of the options argument for pcre_exec() must  be  zero.
1520         The  only  bits  that  may  be  set  are  PCRE_ANCHORED,   PCRE_NOTBOL,         The   only  bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NOTBOL,
1521         PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.         PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.
1522    
1523           PCRE_ANCHORED           PCRE_ANCHORED
1524    
1525         The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first         The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first
1526         matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or         matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or
1527         turned  out to be anchored by virtue of its contents, it cannot be made         turned out to be anchored by virtue of its contents, it cannot be  made
1528         unanchored at matching time.         unanchored at matching time.
1529    
1530           PCRE_NOTBOL           PCRE_NOTBOL
1531    
1532         This option specifies that the first character of the subject string is not         This option specifies that the first character of the subject string is not
1533         the  beginning  of  a  line, so the circumflex metacharacter should not         the beginning of a line, so the  circumflex  metacharacter  should  not
1534         match before it. Setting this without PCRE_MULTILINE (at compile  time)         match  before it. Setting this without PCRE_MULTILINE (at compile time)
1535         causes  circumflex  never to match. This option affects only the behav-         causes circumflex never to match. This option affects only  the  behav-
1536         iour of the circumflex metacharacter. It does not affect \A.         iour of the circumflex metacharacter. It does not affect \A.
1537    
1538           PCRE_NOTEOL           PCRE_NOTEOL
1539    
1540         This option specifies that the end of the subject string is not the end         This option specifies that the end of the subject string is not the end
1541         of  a line, so the dollar metacharacter should not match it nor (except         of a line, so the dollar metacharacter should not match it nor  (except
1542         in multiline mode) a newline immediately before it. Setting this  with-         in  multiline mode) a newline immediately before it. Setting this with-
1543         out PCRE_MULTILINE (at compile time) causes dollar never to match. This         out PCRE_MULTILINE (at compile time) causes dollar never to match. This
1544         option affects only the behaviour of the dollar metacharacter. It  does         option  affects only the behaviour of the dollar metacharacter. It does
1545         not affect \Z or \z.         not affect \Z or \z.
1546    
1547           PCRE_NOTEMPTY           PCRE_NOTEMPTY
1548    
1549         An empty string is not considered to be a valid match if this option is         An empty string is not considered to be a valid match if this option is
1550         set. If there are alternatives in the pattern, they are tried.  If  all         set.  If  there are alternatives in the pattern, they are tried. If all
1551         the  alternatives  match  the empty string, the entire match fails. For         the alternatives match the empty string, the entire  match  fails.  For
1552         example, if the pattern         example, if the pattern
1553    
1554           a?b?           a?b?
1555    
1556         is applied to a string not beginning with "a" or "b",  it  matches  the         is  applied  to  a string not beginning with "a" or "b", it matches the
1557         empty  string at the start of the subject. With PCRE_NOTEMPTY set, this         empty string at the start of the subject. With PCRE_NOTEMPTY set,  this
1558         match is not valid, so PCRE searches further into the string for occur-         match is not valid, so PCRE searches further into the string for occur-
1559         rences of "a" or "b".         rences of "a" or "b".
1560    
1561         Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-         Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-
1562         cial case of a pattern match of the empty  string  within  its  split()         cial  case  of  a  pattern match of the empty string within its split()
1563         function,  and  when  using  the /g modifier. It is possible to emulate         function, and when using the /g modifier. It  is  possible  to  emulate
1564         Perl's behaviour after matching a null string by first trying the match         Perl's behaviour after matching a null string by first trying the match
1565         again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then         again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
1566         if that fails by advancing the starting offset (see below)  and  trying         if  that  fails by advancing the starting offset (see below) and trying
1567         an ordinary match again. There is some code that demonstrates how to do         an ordinary match again. There is some code that demonstrates how to do
1568         this in the pcredemo.c sample program.         this in the pcredemo.c sample program.
1569    
1570           PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
1571    
1572         When PCRE_UTF8 is set at compile time, the validity of the subject as a         When PCRE_UTF8 is set at compile time, the validity of the subject as a
1573         UTF-8  string is automatically checked when pcre_exec() is subsequently         UTF-8 string is automatically checked when pcre_exec() is  subsequently
1574         called.  The value of startoffset is also checked  to  ensure  that  it         called.   The  value  of  startoffset is also checked to ensure that it
1575         points  to the start of a UTF-8 character. If an invalid UTF-8 sequence         points to the start of a UTF-8 character. If an invalid UTF-8  sequence
1576         of bytes is found, pcre_exec() returns the error PCRE_ERROR_BADUTF8. If         of bytes is found, pcre_exec() returns the error PCRE_ERROR_BADUTF8. If
1577         startoffset  contains  an  invalid  value, PCRE_ERROR_BADUTF8_OFFSET is         startoffset contains an  invalid  value,  PCRE_ERROR_BADUTF8_OFFSET  is
1578         returned.         returned.
1579    
1580         If you already know that your subject is valid, and you  want  to  skip         If  you  already  know that your subject is valid, and you want to skip
1581         these    checks    for   performance   reasons,   you   can   set   the         these   checks   for   performance   reasons,   you   can    set    the
1582         PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might  want  to         PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
1583         do  this  for the second and subsequent calls to pcre_exec() if you are         do this for the second and subsequent calls to pcre_exec() if  you  are
1584         making repeated calls to find all  the  matches  in  a  single  subject         making  repeated  calls  to  find  all  the matches in a single subject
1585         string.  However,  you  should  be  sure  that the value of startoffset         string. However, you should be  sure  that  the  value  of  startoffset
1586         points to the start of a UTF-8 character.  When  PCRE_NO_UTF8_CHECK  is         points  to  the  start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
1587         set,  the  effect of passing an invalid UTF-8 string as a subject, or a         set, the effect of passing an invalid UTF-8 string as a subject,  or  a
1588         value of startoffset that does not point to the start of a UTF-8  char-         value  of startoffset that does not point to the start of a UTF-8 char-
1589         acter, is undefined. Your program may crash.         acter, is undefined. Your program may crash.
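Because the behaviour is undefined when PCRE_NO_UTF8_CHECK is set and startoffset lands inside a character, a caller may want a cheap check of its own before skipping PCRE's validation. The following is a minimal sketch, not part of PCRE: the helper name utf8_offset_is_char_start is hypothetical. It relies only on the UTF-8 rule that continuation bytes have the bit pattern 10xxxxxx, and it does not validate the subject as a whole.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE function). Returns 1 if `offset` is the
   end of the subject or points at a byte that is not a UTF-8 continuation
   byte (10xxxxxx); returns 0 otherwise. It does NOT check that the subject
   is valid UTF-8 overall. */
int utf8_offset_is_char_start(const char *subject, int length, int offset)
{
    assert(subject != NULL && length >= 0);
    if (offset < 0 || offset > length)
        return 0;                       /* outside the subject */
    if (offset == length)
        return 1;                       /* end of subject is a boundary */
    return ((unsigned char)subject[offset] & 0xC0) != 0x80;
}
```

A check like this only guards the offset; whether the subject itself is valid still has to be known from elsewhere, for example from an earlier call made without PCRE_NO_UTF8_CHECK.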
1590    
1591           PCRE_PARTIAL           PCRE_PARTIAL
1592    
1593         This  option  turns  on  the  partial  matching feature. If the subject         This option turns on the  partial  matching  feature.  If  the  subject
1594         string fails to match the pattern, but at some point during the  match-         string  fails to match the pattern, but at some point during the match-
1595         ing  process  the  end of the subject was reached (that is, the subject         ing process the end of the subject was reached (that  is,  the  subject
1596         partially matches the pattern and the failure to  match  occurred  only         partially  matches  the  pattern and the failure to match occurred only
1597         because  there were not enough subject characters), pcre_exec() returns         because there were not enough subject characters), pcre_exec()  returns
1598         PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL  is         PCRE_ERROR_PARTIAL  instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is
1599         used,  there  are restrictions on what may appear in the pattern. These         used, there are restrictions on what may appear in the  pattern.  These
1600         are discussed in the pcrepartial documentation.         are discussed in the pcrepartial documentation.
1601    
1602     The string to be matched by pcre_exec()     The string to be matched by pcre_exec()
1603    
1604         The subject string is passed to pcre_exec() as a pointer in subject,  a         The  subject string is passed to pcre_exec() as a pointer in subject, a
1605         length  in  length, and a starting byte offset in startoffset. In UTF-8         length in length, and a starting byte offset in startoffset.  In  UTF-8
1606         mode, the byte offset must point to the start  of  a  UTF-8  character.         mode,  the  byte  offset  must point to the start of a UTF-8 character.
1607         Unlike  the  pattern string, the subject may contain binary zero bytes.         Unlike the pattern string, the subject may contain binary  zero  bytes.
1608         When the starting offset is zero, the search for a match starts at  the         When  the starting offset is zero, the search for a match starts at the
1609         beginning of the subject, and this is by far the most common case.         beginning of the subject, and this is by far the most common case.
1610    
1611         A  non-zero  starting offset is useful when searching for another match         A non-zero starting offset is useful when searching for  another  match
1612         in the same subject by calling pcre_exec() again after a previous  suc-         in  the same subject by calling pcre_exec() again after a previous suc-
1613         cess.   Setting  startoffset differs from just passing over a shortened         cess.  Setting startoffset differs from just passing over  a  shortened
1614         string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins         string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
1615         with any kind of lookbehind. For example, consider the pattern         with any kind of lookbehind. For example, consider the pattern
1616    
1617           \Biss\B           \Biss\B
1618    
1619         which  finds  occurrences  of "iss" in the middle of words. (\B matches         which finds occurrences of "iss" in the middle of  words.  (\B  matches
1620         only if the current position in the subject is not  a  word  boundary.)         only  if  the  current position in the subject is not a word boundary.)
1621         When  applied  to the string "Mississippi" the first call to pcre_exec()         When applied to the string "Mississippi" the first call  to  pcre_exec()
1622         finds the first occurrence. If pcre_exec() is called  again  with  just         finds  the  first  occurrence. If pcre_exec() is called again with just
1623         the  remainder  of  the  subject,  namely  "issippi", it does not match,         the remainder of the subject,  namely  "issippi",  it  does  not  match,
1624         because \B is always false at the start of the subject, which is deemed         because \B is always false at the start of the subject, which is deemed
1625         to  be  a  word  boundary. However, if pcre_exec() is passed the entire         to be a word boundary. However, if pcre_exec()  is  passed  the  entire
1626         string again, but with startoffset set to 4, it finds the second occur-         string again, but with startoffset set to 4, it finds the second occur-
1627         rence  of "iss" because it is able to look behind the starting point to         rence of "iss" because it is able to look behind the starting point  to
1628         discover that it is preceded by a letter.         discover that it is preceded by a letter.
1629    
1630         If a non-zero starting offset is passed when the pattern  is  anchored,         If  a  non-zero starting offset is passed when the pattern is anchored,
1631         one attempt to match at the given offset is made. This can only succeed         one attempt to match at the given offset is made. This can only succeed
1632         if the pattern does not require the match to be at  the  start  of  the         if  the  pattern  does  not require the match to be at the start of the
1633         subject.         subject.
1634    
1635     How pcre_exec() returns captured substrings     How pcre_exec() returns captured substrings
1636    
1637         In  general, a pattern matches a certain portion of the subject, and in         In general, a pattern matches a certain portion of the subject, and  in
1638         addition, further substrings from the subject  may  be  picked  out  by         addition,  further  substrings  from  the  subject may be picked out by
1639         parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,         parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
1640         this is called "capturing" in what follows, and the  phrase  "capturing         this  is  called "capturing" in what follows, and the phrase "capturing
1641         subpattern"  is  used for a fragment of a pattern that picks out a sub-         subpattern" is used for a fragment of a pattern that picks out  a  sub-
1642         string. PCRE supports several other kinds of  parenthesized  subpattern         string.  PCRE  supports several other kinds of parenthesized subpattern
1643         that do not cause substrings to be captured.         that do not cause substrings to be captured.
1644    
1645         Captured  substrings are returned to the caller via a vector of integer         Captured substrings are returned to the caller via a vector of  integer
1646         offsets whose address is passed in ovector. The number of  elements  in         offsets  whose  address is passed in ovector. The number of elements in
1647         the  vector is passed in ovecsize, which must be a non-negative number.         the vector is passed in ovecsize, which must be a non-negative  number.
1648         Note: this argument is NOT the size of ovector in bytes.         Note: this argument is NOT the size of ovector in bytes.
1649    
1650         The first two-thirds of the vector is used to pass back  captured  sub-         The  first  two-thirds of the vector is used to pass back captured sub-
1651         strings,  each  substring using a pair of integers. The remaining third         strings, each substring using a pair of integers. The  remaining  third
1652         of the vector is used as workspace by pcre_exec() while  matching  cap-         of  the  vector is used as workspace by pcre_exec() while matching cap-
1653         turing  subpatterns, and is not available for passing back information.         turing subpatterns, and is not available for passing back  information.
1654         The length passed in ovecsize should always be a multiple of three.  If         The  length passed in ovecsize should always be a multiple of three. If
1655         it is not, it is rounded down.         it is not, it is rounded down.
1656    
1657         When  a  match  is successful, information about captured substrings is         When a match is successful, information about  captured  substrings  is
1658         returned in pairs of integers, starting at the  beginning  of  ovector,         returned  in  pairs  of integers, starting at the beginning of ovector,
1659         and  continuing  up  to two-thirds of its length at the most. The first         and continuing up to two-thirds of its length at the  most.  The  first
1660         element of a pair is set to the offset of the first character in a sub-         element of a pair is set to the offset of the first character in a sub-
1661         string,  and  the  second  is  set to the offset of the first character         string, and the second is set to the  offset  of  the  first  character
1662         after the end of a substring. The  first  pair,  ovector[0]  and  ovec-         after  the  end  of  a  substring. The first pair, ovector[0] and ovec-
1663         tor[1],  identify  the  portion  of  the  subject string matched by the         tor[1], identify the portion of  the  subject  string  matched  by  the
1664         entire pattern. The next pair is used for the first  capturing  subpat-         entire  pattern.  The next pair is used for the first capturing subpat-
1665         tern,  and  so  on.  The value returned by pcre_exec() is the number of         tern, and so on. The value returned by pcre_exec()  is  the  number  of
1666         pairs that have been set. If there are no  capturing  subpatterns,  the         pairs  that  have  been set. If there are no capturing subpatterns, the
1667         return  value  from  a  successful match is 1, indicating that just the         return value from a successful match is 1,  indicating  that  just  the
1668         first pair of offsets has been set.         first pair of offsets has been set.
1669    
1670         Some convenience functions are provided  for  extracting  the  captured         Some  convenience  functions  are  provided for extracting the captured
1671         substrings  as  separate  strings. These are described in the following         substrings as separate strings. These are described  in  the  following
1672         section.         section.
1673    
1674         It is possible for a capturing subpattern number  n+1  to  match  some         It  is  possible  for  a capturing subpattern number n+1 to match some
1675         part  of  the  subject  when subpattern n has not been used at all. For         part of the subject when subpattern n has not been  used  at  all.  For
1676         example, if the string "abc" is matched against the pattern (a|(z))(bc)         example, if the string "abc" is matched against the pattern (a|(z))(bc)
1677         subpatterns  1 and 3 are matched, but 2 is not. When this happens, both         subpatterns 1 and 3 are matched, but 2 is not. When this happens,  both
1678         offset values corresponding to the unused subpattern are set to -1.         offset values corresponding to the unused subpattern are set to -1.
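To make the offset pairs concrete, here is a minimal sketch of reading one captured substring out of ovector, including the -1 case for an unused subpattern. The function copy_capture and its return codes are hypothetical, written only for illustration; in real code the library's own convenience functions would normally be used instead. The caller is assumed to pass a pair number that pcre_exec() actually set.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function). Copies captured substring
   number n (0 = whole match) into buf as a zero-terminated string.
   Returns the substring length, -1 if the subpattern was not used
   (both offsets are -1), or -2 if buf is too small. */
int copy_capture(const char *subject, const int *ovector, int n,
                 char *buf, int bufsize)
{
    int start, end, len;

    assert(subject != NULL && ovector != NULL && buf != NULL && n >= 0);
    start = ovector[2 * n];         /* offset of first character */
    end   = ovector[2 * n + 1];     /* offset just past the last character */
    if (start < 0 || end < 0)
        return -1;                  /* unused subpattern */
    len = end - start;
    if (len + 1 > bufsize)
        return -2;                  /* no room for the string and its zero */
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

For the example above, "abc" matched against (a|(z))(bc), the offsets would be {0,3, 0,1, -1,-1, 1,3}, so pair 2 reports an unused subpattern while pair 3 yields "bc".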
1679    
1680         If a capturing subpattern is matched repeatedly, it is the last portion         If a capturing subpattern is matched repeatedly, it is the last portion
1681         of the string that it matched that is returned.         of the string that it matched that is returned.
1682    
1683         If  the vector is too small to hold all the captured substring offsets,         If the vector is too small to hold all the captured substring  offsets,
1684         it is used as far as possible (up to two-thirds of its length), and the         it is used as far as possible (up to two-thirds of its length), and the
1685         function  returns a value of zero. In particular, if the substring off-         function returns a value of zero. In particular, if the substring  off-
1686         sets are not of interest, pcre_exec() may be called with ovector passed         sets are not of interest, pcre_exec() may be called with ovector passed
1687         as  NULL  and  ovecsize  as zero. However, if the pattern contains back         as NULL and ovecsize as zero. However, if  the  pattern  contains  back
1688         references and the ovector is not big enough to  remember  the  related         references  and  the  ovector is not big enough to remember the related
1689         substrings,  PCRE has to get additional memory for use during matching.         substrings, PCRE has to get additional memory for use during  matching.
1690         Thus it is usually advisable to supply an ovector.         Thus it is usually advisable to supply an ovector.
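A zero return is easy to mistake for "no pairs set". The sketch below shows one way a caller might work out how many offset pairs are readable after pcre_exec() returns; the helper name usable_pairs is hypothetical. It assumes, as described above, that on a zero return the first two-thirds of the (rounded-down) vector has been filled.

```c
#include <assert.h>

/* Hypothetical helper (not a PCRE function). Given the value returned by
   pcre_exec() and the ovecsize that was passed, returns how many offset
   pairs in ovector may be read. */
int usable_pairs(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;              /* rc pairs were set */
    if (rc == 0)
        return ovecsize / 3;    /* vector too small: two-thirds was filled */
    return 0;                   /* negative rc: an error, nothing to read */
}
```

With ovecsize 6 and a zero return, for instance, two pairs are readable: the whole match and the first capturing subpattern.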
1691    
1692         Note that pcre_info() can be used to find out how many  capturing  sub-         Note  that  pcre_info() can be used to find out how many capturing sub-
1693         patterns there are in a compiled pattern. The smallest size for ovector         patterns there are in a compiled pattern. The smallest size for ovector
1694         that will allow for n captured substrings, in addition to  the  offsets         that  will  allow for n captured substrings, in addition to the offsets
1695         of the substring matched by the whole pattern, is (n+1)*3.         of the substring matched by the whole pattern, is (n+1)*3.
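The sizing rule can be written down directly. This is a small sketch with hypothetical helper names: ovector_size_for applies the (n+1)*3 formula, and effective_ovecsize shows the rounding down to a multiple of three that is applied to ovecsize.

```c
#include <assert.h>

/* Hypothetical helpers (not PCRE functions). */

/* Smallest ovector size, in ints, that holds the whole-match pair plus
   n captured substrings, including the workspace third. */
int ovector_size_for(int n)
{
    assert(n >= 0);
    return (n + 1) * 3;
}

/* The size actually used when ovecsize is not a multiple of three:
   it is rounded down. */
int effective_ovecsize(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize - ovecsize % 3;
}
```

A pattern with three capturing subpatterns therefore needs an ovector of at least 12 ints, and passing an ovecsize of 10 behaves like passing 9.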
1696    
1697     Return values from pcre_exec()     Return values from pcre_exec()
1698    
1699         If  pcre_exec()  fails, it returns a negative number. The following are         If pcre_exec() fails, it returns a negative number. The  following  are
1700         defined in the header file:         defined in the header file:
1701    
1702           PCRE_ERROR_NOMATCH        (-1)           PCRE_ERROR_NOMATCH        (-1)
# Line 1676  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1705  MATCHING A PATTERN: THE TRADITIONAL FUNC
1705    
1706           PCRE_ERROR_NULL           (-2)           PCRE_ERROR_NULL           (-2)
1707    
1708         Either code or subject was passed as NULL,  or  ovector  was  NULL  and         Either  code  or  subject  was  passed as NULL, or ovector was NULL and
1709         ovecsize was not zero.         ovecsize was not zero.
1710    
1711           PCRE_ERROR_BADOPTION      (-3)           PCRE_ERROR_BADOPTION      (-3)
# Line 1685  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1714  MATCHING A PATTERN: THE TRADITIONAL FUNC
1714    
1715           PCRE_ERROR_BADMAGIC       (-4)           PCRE_ERROR_BADMAGIC       (-4)
1716    
1717         PCRE  stores a 4-byte "magic number" at the start of the compiled code,         PCRE stores a 4-byte "magic number" at the start of the compiled  code,
1718         to catch the case when it is passed a junk pointer and to detect when a         to catch the case when it is passed a junk pointer and to detect when a
1719         pattern that was compiled in an environment of one endianness is run in         pattern that was compiled in an environment of one endianness is run in
1720         an environment with the other endianness. This is the error  that  PCRE         an  environment  with the other endianness. This is the error that PCRE
1721         gives when the magic number is not present.         gives when the magic number is not present.
1722    
1723           PCRE_ERROR_UNKNOWN_NODE   (-5)           PCRE_ERROR_UNKNOWN_NODE   (-5)
1724    
1725         While running the pattern match, an unknown item was encountered in the         While running the pattern match, an unknown item was encountered in the
1726         compiled pattern. This error could be caused by a bug  in  PCRE  or  by         compiled  pattern.  This  error  could be caused by a bug in PCRE or by
1727         overwriting of the compiled pattern.         overwriting of the compiled pattern.
1728    
1729           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
1730    
1731         If  a  pattern contains back references, but the ovector that is passed         If a pattern contains back references, but the ovector that  is  passed
1732         to pcre_exec() is not big enough to remember the referenced substrings,         to pcre_exec() is not big enough to remember the referenced substrings,
1733         PCRE  gets  a  block of memory at the start of matching to use for this         PCRE gets a block of memory at the start of matching to  use  for  this
1734         purpose. If the call via pcre_malloc() fails, this error is given.  The         purpose.  If the call via pcre_malloc() fails, this error is given. The
1735         memory is automatically freed at the end of matching.         memory is automatically freed at the end of matching.
1736    
1737           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
1738    
1739         This  error is used by the pcre_copy_substring(), pcre_get_substring(),         This error is used by the pcre_copy_substring(),  pcre_get_substring(),
1740         and  pcre_get_substring_list()  functions  (see  below).  It  is  never         and  pcre_get_substring_list()  functions  (see  below).  It  is  never
1741         returned by pcre_exec().         returned by pcre_exec().
1742    
1743           PCRE_ERROR_MATCHLIMIT     (-8)           PCRE_ERROR_MATCHLIMIT     (-8)
1744    
1745         The  recursion  and backtracking limit, as specified by the match_limit         The backtracking limit, as specified by  the  match_limit  field  in  a
1746         field in a pcre_extra structure (or defaulted)  was  reached.  See  the         pcre_extra  structure  (or  defaulted) was reached. See the description
1747           above.
1748    
1749             PCRE_ERROR_RECURSIONLIMIT (-21)
1750    
1751           The internal recursion limit, as specified by the match_limit_recursion
1752           field  in  a  pcre_extra  structure (or defaulted) was reached. See the
1753         description above.         description above.
1754    
1755           PCRE_ERROR_CALLOUT        (-9)           PCRE_ERROR_CALLOUT        (-9)
1756    
1757         This error is never generated by pcre_exec() itself. It is provided for         This error is never generated by pcre_exec() itself. It is provided for
1758         use by callout functions that want to yield a distinctive  error  code.         use  by  callout functions that want to yield a distinctive error code.
1759         See the pcrecallout documentation for details.         See the pcrecallout documentation for details.
1760    
1761           PCRE_ERROR_BADUTF8        (-10)           PCRE_ERROR_BADUTF8        (-10)
1762    
1763         A  string  that contains an invalid UTF-8 byte sequence was passed as a         A string that contains an invalid UTF-8 byte sequence was passed  as  a
1764         subject.         subject.
1765    
1766           PCRE_ERROR_BADUTF8_OFFSET (-11)           PCRE_ERROR_BADUTF8_OFFSET (-11)
1767    
1768         The UTF-8 byte sequence that was passed as a subject was valid, but the         The UTF-8 byte sequence that was passed as a subject was valid, but the
1769         value  of startoffset did not point to the beginning of a UTF-8 charac-         value of startoffset did not point to the beginning of a UTF-8  charac-
1770         ter.         ter.
1771    
1772           PCRE_ERROR_PARTIAL        (-12)           PCRE_ERROR_PARTIAL        (-12)
1773    
1774         The subject string did not match, but it did match partially.  See  the         The  subject  string did not match, but it did match partially. See the
1775         pcrepartial documentation for details of partial matching.         pcrepartial documentation for details of partial matching.
1776    
1777           PCRE_ERROR_BADPARTIAL     (-13)           PCRE_ERROR_BADPARTIAL     (-13)
1778    
1779         The  PCRE_PARTIAL  option  was  used with a compiled pattern containing         The PCRE_PARTIAL option was used with  a  compiled  pattern  containing
1780         items that are not supported for partial matching. See the  pcrepartial         items  that are not supported for partial matching. See the pcrepartial
1781         documentation for details of partial matching.         documentation for details of partial matching.
1782    
1783           PCRE_ERROR_INTERNAL       (-14)           PCRE_ERROR_INTERNAL       (-14)
1784    
1785         An  unexpected  internal error has occurred. This error could be caused         An unexpected internal error has occurred. This error could  be  caused
1786         by a bug in PCRE or by overwriting of the compiled pattern.         by a bug in PCRE or by overwriting of the compiled pattern.
1787    
1788           PCRE_ERROR_BADCOUNT       (-15)           PCRE_ERROR_BADCOUNT       (-15)
1789    
1790         This error is given if the value of the ovecsize argument is  negative.         This  error is given if the value of the ovecsize argument is negative.
1791    
1792    
1793  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
# Line 1768  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER Line 1803  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
1803         int pcre_get_substring_list(const char *subject,         int pcre_get_substring_list(const char *subject,
1804              int *ovector, int stringcount, const char ***listptr);              int *ovector, int stringcount, const char ***listptr);
1805    
1806         Captured  substrings  can  be  accessed  directly  by using the offsets         Captured substrings can be  accessed  directly  by  using  the  offsets
1807         returned by pcre_exec() in  ovector.  For  convenience,  the  functions         returned  by  pcre_exec()  in  ovector.  For convenience, the functions
1808         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-
1809         string_list() are provided for extracting captured substrings  as  new,         string_list()  are  provided for extracting captured substrings as new,
1810         separate,  zero-terminated strings. These functions identify substrings         separate, zero-terminated strings. These functions identify  substrings
1811         by number. The next section describes functions  for  extracting  named         by  number.  The  next section describes functions for extracting named
1812         substrings.  A  substring  that  contains  a  binary  zero is correctly         substrings. A substring  that  contains  a  binary  zero  is  correctly
1813         extracted and has a further zero added on the end, but  the  result  is         extracted  and  has  a further zero added on the end, but the result is
1814         not, of course, a C string.         not, of course, a C string.
1815    
1816         The  first  three  arguments  are the same for all three of these func-         The first three arguments are the same for all  three  of  these  func-
1817         tions: subject is the subject string that has  just  been  successfully         tions:  subject  is  the subject string that has just been successfully
1818         matched, ovector is a pointer to the vector of integer offsets that was         matched, ovector is a pointer to the vector of integer offsets that was
1819         passed to pcre_exec(), and stringcount is the number of substrings that         passed to pcre_exec(), and stringcount is the number of substrings that
1820         were  captured  by  the match, including the substring that matched the         were captured by the match, including the substring  that  matched  the
1821         entire regular expression. This is the value returned by pcre_exec() if         entire regular expression. This is the value returned by pcre_exec() if
1822         it  is greater than zero. If pcre_exec() returned zero, indicating that         it is greater than zero. If pcre_exec() returned zero, indicating  that
1823         it ran out of space in ovector, the value passed as stringcount  should         it  ran out of space in ovector, the value passed as stringcount should
1824         be the number of elements in the vector divided by three.         be the number of elements in the vector divided by three.
1825    
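       The rule for choosing stringcount can be sketched as a small helper.
       This is only an illustration: stringcount_for() is a hypothetical name
       and is not part of the PCRE API. It takes the value returned by
       pcre_exec() and the number of elements in ovector.

```c
/* Illustrative helper, NOT part of PCRE: pick the stringcount value to
   pass to the substring extraction functions. */
int stringcount_for(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;            /* captured substrings, including the whole match */
    if (rc == 0)
        return ovecsize / 3;  /* ovector was too small: elements divided by three */
    return rc;                /* negative: an error code, nothing to extract */
}
```

       A negative result is passed through unchanged so that the caller can
       still test for an error before extracting anything.
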
1826         The  functions pcre_copy_substring() and pcre_get_substring() extract a         The functions pcre_copy_substring() and pcre_get_substring() extract  a
1827         single substring, whose number is given as  stringnumber.  A  value  of         single  substring,  whose  number  is given as stringnumber. A value of
1828         zero  extracts  the  substring that matched the entire pattern, whereas         zero extracts the substring that matched the  entire  pattern,  whereas
1829         higher values  extract  the  captured  substrings.  For  pcre_copy_sub-         higher  values  extract  the  captured  substrings.  For pcre_copy_sub-
1830         string(),  the  string  is  placed  in buffer, whose length is given by         string(), the string is placed in buffer,  whose  length  is  given  by
1831         buffersize, while for pcre_get_substring() a new  block  of  memory  is         buffersize,  while  for  pcre_get_substring()  a new block of memory is
1832         obtained  via  pcre_malloc,  and its address is returned via stringptr.         obtained via pcre_malloc, and its address is  returned  via  stringptr.
1833         The yield of the function is the length of the  string,  not  including         The  yield  of  the function is the length of the string, not including
1834         the terminating zero, or one of         the terminating zero, or one of
1835    
1836           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
1837    
1838         The  buffer  was too small for pcre_copy_substring(), or the attempt to         The buffer was too small for pcre_copy_substring(), or the  attempt  to
1839         get memory failed for pcre_get_substring().         get memory failed for pcre_get_substring().
1840    
1841           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
1842    
1843         There is no substring whose number is stringnumber.         There is no substring whose number is stringnumber.
1844    
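       The documented behaviour of pcre_copy_substring() can be approximated
       in plain C, as below. This is a sketch and not the library's source:
       the function name and the SKETCH_ macros are hypothetical stand-ins,
       and it assumes the usual ovector layout in which substring n occupies
       elements 2n (start offset) and 2n+1 (end offset).

```c
#include <string.h>

/* Local stand-ins for the two error codes listed above. */
#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Copy substring number stringnumber into buffer as a zero-terminated
   string. Returns its length, or one of the negative codes above. */
int copy_substring_sketch(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    int start, length;

    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    start  = ovector[2 * stringnumber];
    length = ovector[2 * stringnumber + 1] - start;

    if (buffersize < length + 1)          /* need room for the final zero */
        return SKETCH_ERROR_NOMEMORY;

    if (length > 0)                       /* unset substrings have length 0 */
        memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```
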
1845         The pcre_get_substring_list()  function  extracts  all  available  sub-         The  pcre_get_substring_list()  function  extracts  all  available sub-
1846         strings  and  builds  a list of pointers to them. All this is done in a         strings and builds a list of pointers to them. All this is  done  in  a
1847         single block of memory that is obtained via pcre_malloc. The address of         single block of memory that is obtained via pcre_malloc. The address of
1848         the  memory  block  is returned via listptr, which is also the start of         the memory block is returned via listptr, which is also  the  start  of
1849         the list of string pointers. The end of the list is marked  by  a  NULL         the  list  of  string pointers. The end of the list is marked by a NULL
1850         pointer. The yield of the function is zero if all went well, or         pointer. The yield of the function is zero if all went well, or
1851    
1852           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
1853    
1854         if the attempt to get the memory block failed.         if the attempt to get the memory block failed.
1855    
1856         When  any of these functions encounter a substring that is unset, which         When any of these functions encounter a substring that is unset,  which
1857         can happen when capturing subpattern number n+1 matches  some  part  of         can  happen  when  capturing subpattern number n+1 matches some part of
1858         the  subject, but subpattern n has not been used at all, they return an         the subject, but subpattern n has not been used at all, they return  an
1859         empty string. This can be distinguished from a genuine zero-length sub-         empty string. This can be distinguished from a genuine zero-length sub-
1860         string  by inspecting the appropriate offset in ovector, which is nega-         string by inspecting the appropriate offset in ovector, which is  nega-
1861         tive for unset substrings.         tive for unset substrings.
1862    
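       The test for an unset substring can be written as a one-line check on
       ovector. The helper name substring_is_unset() is hypothetical, and the
       sketch assumes that the start offset of substring n is stored in
       element 2n.

```c
/* Illustrative helper, NOT part of PCRE. Returns 1 if capturing
   subpattern n is unset, or 0 if it is set (possibly to an empty string). */
int substring_is_unset(const int *ovector, int n)
{
    return ovector[2 * n] < 0;
}
```

       For example, after matching (a)|(b) against "b", the offsets would be
       expected to look like { 0, 1, -1, -1, 0, 1 }: subpattern 1 is unset,
       while subpattern 2 is set.
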
1863         The two convenience functions pcre_free_substring() and  pcre_free_sub-         The  two convenience functions pcre_free_substring() and pcre_free_sub-
1864         string_list()  can  be  used  to free the memory returned by a previous         string_list() can be used to free the memory  returned  by  a  previous
1865         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-
1866         tively.  They  do  nothing  more  than  call the function pointed to by         tively. They do nothing more than  call  the  function  pointed  to  by
1867         pcre_free, which of course could be called directly from a  C  program.         pcre_free,  which  of course could be called directly from a C program.
1868         However,  PCRE is used in some situations where it is linked via a spe-         However, PCRE is used in some situations where it is linked via a  spe-
1869         cial  interface  to  another  programming  language  which  cannot  use         cial  interface  to  another  programming  language  which  cannot  use
1870         pcre_free  directly;  it is for these cases that the functions are pro-         pcre_free directly; it is for these cases that the functions  are  pro-
1871         vided.         vided.
1872    
1873    
# Line 1851  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 1886  EXTRACTING CAPTURED SUBSTRINGS BY NAME
1886              int stringcount, const char *stringname,              int stringcount, const char *stringname,
1887              const char **stringptr);              const char **stringptr);
1888    
1889         To extract a substring by name, you first have to find  the  associated         To extract a substring by name, you first have to find  the  associated
1890         number. For example, for this pattern         number. For example, for this pattern
1891    
1892           (a+)b(?P<xxx>\d+)...           (a+)b(?P<xxx>\d+)...
1893    
1894         the number of the subpattern called "xxx" is 2. You can find the number         the number of the subpattern called "xxx" is 2. You can find the number
1895         from the name by calling pcre_get_stringnumber(). The first argument is         from the name by calling pcre_get_stringnumber(). The first argument is
1896         the  compiled  pattern,  and  the  second is the name. The yield of the         the compiled pattern, and the second is the  name.  The  yield  of  the
1897         function is the subpattern number, or  PCRE_ERROR_NOSUBSTRING  (-7)  if         function  is  the  subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if
1898         there is no subpattern of that name.         there is no subpattern of that name.
1899    
1900         Given the number, you can extract the substring directly, or use one of         Given the number, you can extract the substring directly, or use one of
1901         the functions described in the previous section. For convenience, there         the functions described in the previous section. For convenience, there
1902         are also two functions that do the whole job.         are also two functions that do the whole job.
1903    
1904         Most    of    the    arguments   of   pcre_copy_named_substring()   and         Most   of   the   arguments    of    pcre_copy_named_substring()    and
1905         pcre_get_named_substring() are the same  as  those  for  the  similarly         pcre_get_named_substring()  are  the  same  as  those for the similarly
1906         named  functions  that extract by number. As these are described in the         named functions that extract by number. As these are described  in  the
1907         previous section, they are not re-described here. There  are  just  two         previous  section,  they  are not re-described here. There are just two
1908         differences:         differences:
1909    
1910         First,  instead  of a substring number, a substring name is given. Sec-         First, instead of a substring number, a substring name is  given.  Sec-
1911         ond, there is an extra argument, given at the start, which is a pointer         ond, there is an extra argument, given at the start, which is a pointer
1912         to  the compiled pattern. This is needed in order to gain access to the         to the compiled pattern. This is needed in order to gain access to  the
1913         name-to-number translation table.         name-to-number translation table.
1914    
1915         These functions call pcre_get_stringnumber(), and if it succeeds,  they         These  functions call pcre_get_stringnumber(), and if it succeeds, they
1916         then  call  pcre_copy_substring() or pcre_get_substring(), as appropri-         then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-
1917         ate.         ate.
1918    
1919    
1920  FINDING ALL POSSIBLE MATCHES  FINDING ALL POSSIBLE MATCHES
1921    
1922         The traditional matching function uses a  similar  algorithm  to  Perl,         The  traditional  matching  function  uses a similar algorithm to Perl,
1923         which stops when it finds the first match, starting at a given point in         which stops when it finds the first match, starting at a given point in
1924         the subject. If you want to find all possible matches, or  the  longest         the  subject.  If you want to find all possible matches, or the longest
1925         possible  match,  consider using the alternative matching function (see         possible match, consider using the alternative matching  function  (see
1926         below) instead. If you cannot use the alternative function,  but  still         below)  instead.  If you cannot use the alternative function, but still
1927         need  to  find all possible matches, you can kludge it up by making use         need to find all possible matches, you can kludge it up by  making  use
1928         of the callout facility, which is described in the pcrecallout documen-         of the callout facility, which is described in the pcrecallout documen-
1929         tation.         tation.
1930    
1931         What you have to do is to insert a callout right at the end of the pat-         What you have to do is to insert a callout right at the end of the pat-
1932         tern.  When your callout function is called, extract and save the  cur-         tern.   When your callout function is called, extract and save the cur-
1933         rent  matched  substring.  Then  return  1, which forces pcre_exec() to         rent matched substring. Then return  1,  which  forces  pcre_exec()  to
1934         backtrack and try other alternatives. Ultimately, when it runs  out  of         backtrack  and  try other alternatives. Ultimately, when it runs out of
1935         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
1936    
1937    
# Line 1907  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 1942  MATCHING A PATTERN: THE ALTERNATIVE FUNC
1942              int options, int *ovector, int ovecsize,              int options, int *ovector, int ovecsize,
1943              int *workspace, int wscount);              int *workspace, int wscount);
1944    
1945         The  function  pcre_dfa_exec()  is  called  to  match  a subject string         The function pcre_dfa_exec()  is  called  to  match  a  subject  string
1946         against a compiled pattern, using a "DFA" matching algorithm. This  has         against  a compiled pattern, using a "DFA" matching algorithm. This has
1947         different  characteristics to the normal algorithm, and is not compati-         different characteristics to the normal algorithm, and is not  compati-
1948         ble with Perl. Some of the features of PCRE patterns are not supported.         ble with Perl. Some of the features of PCRE patterns are not supported.
1949         Nevertheless, there are times when this kind of matching can be useful.         Nevertheless, there are times when this kind of matching can be useful.
1950         For a discussion of the two matching algorithms, see  the  pcrematching         For  a  discussion of the two matching algorithms, see the pcrematching
1951         documentation.         documentation.
1952    
1953         The  arguments  for  the  pcre_dfa_exec()  function are the same as for         The arguments for the pcre_dfa_exec() function  are  the  same  as  for
1954         pcre_exec(), plus two extras. The ovector argument is used in a differ-         pcre_exec(), plus two extras. The ovector argument is used in a differ-
1955         ent  way,  and  this is described below. The other common arguments are         ent way, and this is described below. The other  common  arguments  are
1956         used in the same way as for pcre_exec(), so their  description  is  not         used  in  the  same way as for pcre_exec(), so their description is not
1957         repeated here.         repeated here.
1958    
1959         The  two  additional  arguments provide workspace for the function. The         The two additional arguments provide workspace for  the  function.  The
1960         workspace vector should contain at least 20 elements. It  is  used  for         workspace  vector  should  contain at least 20 elements. It is used for
1961         keeping  track  of  multiple  paths  through  the  pattern  tree.  More         keeping  track  of  multiple  paths  through  the  pattern  tree.  More
1962         workspace will be needed for patterns and subjects where  there  are  a         workspace  will  be  needed for patterns and subjects where there are a
1963         lot of possible matches.         lot of possible matches.
1964    
1965         Here is an example of a simple call to pcre_exec():         Here is an example of a simple call to pcre_dfa_exec():
1966    
1967           int rc;           int rc;
1968           int ovector[10];           int ovector[10];
1969           int wspace[20];           int wspace[20];
1970           rc = pcre_exec(           rc = pcre_dfa_exec(
1971             re,             /* result of pcre_compile() */             re,             /* result of pcre_compile() */
1972             NULL,           /* we didn't study the pattern */             NULL,           /* we didn't study the pattern */
1973             "some string",  /* the subject string */             "some string",  /* the subject string */
# Line 1946  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 1981  MATCHING A PATTERN: THE ALTERNATIVE FUNC
1981    
1982     Option bits for pcre_dfa_exec()     Option bits for pcre_dfa_exec()
1983    
1984         The  unused  bits  of  the options argument for pcre_dfa_exec() must be         The unused bits of the options argument  for  pcre_dfa_exec()  must  be
1985         zero. The only bits that may be  set  are  PCRE_ANCHORED,  PCRE_NOTBOL,         zero.  The  only  bits  that may be set are PCRE_ANCHORED, PCRE_NOTBOL,
1986         PCRE_NOTEOL,     PCRE_NOTEMPTY,    PCRE_NO_UTF8_CHECK,    PCRE_PARTIAL,         PCRE_NOTEOL,    PCRE_NOTEMPTY,    PCRE_NO_UTF8_CHECK,     PCRE_PARTIAL,
1987         PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All  but  the  last  three  of         PCRE_DFA_SHORTEST,  and  PCRE_DFA_RESTART.  All  but  the last three of
1988         these  are  the  same  as  for pcre_exec(), so their description is not         these are the same as for pcre_exec(),  so  their  description  is  not
1989         repeated here.         repeated here.
1990    
1991           PCRE_PARTIAL           PCRE_PARTIAL
1992    
1993         This has the same general effect as it does for  pcre_exec(),  but  the         This  has  the  same general effect as it does for pcre_exec(), but the
1994         details   are   slightly   different.  When  PCRE_PARTIAL  is  set  for         details  are  slightly  different.  When  PCRE_PARTIAL   is   set   for
1995         pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is  converted  into         pcre_dfa_exec(),  the  return code PCRE_ERROR_NOMATCH is converted into
1996         PCRE_ERROR_PARTIAL  if  the  end  of the subject is reached, there have         PCRE_ERROR_PARTIAL if the end of the subject  is  reached,  there  have
1997         been no complete matches, but there is still at least one matching pos-         been no complete matches, but there is still at least one matching pos-
1998         sibility.  The portion of the string that provided the partial match is         sibility. The portion of the string that provided the partial match  is
1999         set as the first matching string.         set as the first matching string.
2000    
2001           PCRE_DFA_SHORTEST           PCRE_DFA_SHORTEST
2002    
2003         Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to         Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to
2004         stop  as  soon  as  it  has found one match. Because of the way the DFA         stop as soon as it has found one match. Because  of  the  way  the  DFA
2005         algorithm works, this is necessarily the shortest possible match at the         algorithm works, this is necessarily the shortest possible match at the
2006         first possible matching point in the subject string.         first possible matching point in the subject string.
2007    
2008           PCRE_DFA_RESTART           PCRE_DFA_RESTART
2009    
2010         When  pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option, and         When pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option,  and
2011         returns a partial match, it is possible to call it  again,  with  addi-         returns  a  partial  match, it is possible to call it again, with addi-
2012         tional  subject  characters,  and have it continue with the same match.         tional subject characters, and have it continue with  the  same  match.
2013         The PCRE_DFA_RESTART option requests this action; when it is  set,  the         The  PCRE_DFA_RESTART  option requests this action; when it is set, the
2014         workspace and wscount arguments must reference the same vector as before         workspace and wscount arguments must reference the same vector as before
2015         because data about the match so far is left in  them  after  a  partial         because  data  about  the  match so far is left in them after a partial
2016         match.  There  is  more  discussion of this facility in the pcrepartial         match. There is more discussion of this  facility  in  the  pcrepartial
2017         documentation.         documentation.
2018    
2019     Successful returns from pcre_dfa_exec()     Successful returns from pcre_dfa_exec()
2020    
2021         When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-         When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-
2022         string in the subject. Note, however, that all the matches from one run         string in the subject. Note, however, that all the matches from one run
2023         of the function start at the same point in  the  subject.  The  shorter         of  the  function  start  at the same point in the subject. The shorter
2024         matches  are all initial substrings of the longer matches. For example,         matches are all initial substrings of the longer matches. For  example,
2025         if the pattern         if the pattern
2026    
2027           <.*>           <.*>
# Line 2001  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2036  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2036           <something> <something else>           <something> <something else>
2037           <something> <something else> <something further>           <something> <something else> <something further>
2038    
2039         On success, the yield of the function is a number  greater  than  zero,         On  success,  the  yield of the function is a number greater than zero,
2040         which  is  the  number of matched substrings. The substrings themselves         which is the number of matched substrings.  The  substrings  themselves
2041         are returned in ovector. Each string uses two elements;  the  first  is         are  returned  in  ovector. Each string uses two elements; the first is
2042         the  offset  to the start, and the second is the offset to the end. All         the offset to the start, and the second is the offset to the  end.  All
2043         the strings have the same start offset. (Space could have been saved by         the strings have the same start offset. (Space could have been saved by
2044         giving  this only once, but it was decided to retain some compatibility         giving this only once, but it was decided to retain some  compatibility
2045         with the way pcre_exec() returns data, even though the meaning  of  the         with  the  way pcre_exec() returns data, even though the meaning of the
2046         strings is different.)         strings is different.)
2047    
2048         The strings are returned in reverse order of length; that is, the long-         The strings are returned in reverse order of length; that is, the long-
2049         est matching string is given first. If there were too many  matches  to         est  matching  string is given first. If there were too many matches to
2050         fit  into ovector, the yield of the function is zero, and the vector is         fit into ovector, the yield of the function is zero, and the vector  is
2051         filled with the longest matches.         filled with the longest matches.
2052    
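       Reading the results back out of ovector might look like the sketch
       below. The function names are hypothetical and not part of PCRE. It is
       also an assumption here that a zero yield leaves ovecsize/2 complete
       pairs in the vector, because each string uses two elements.

```c
#include <stdio.h>

/* Number of (start, end) pairs to read from ovector after pcre_dfa_exec(). */
int dfa_match_count(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;             /* all matches fitted */
    if (rc == 0)
        return ovecsize / 2;   /* vector full: assumed to hold this many pairs */
    return 0;                  /* error or no match */
}

/* Length of match i, where match 0 is the longest. */
int dfa_match_length(const int *ovector, int i)
{
    return ovector[2 * i + 1] - ovector[2 * i];
}

/* Print every match, longest first. */
void print_dfa_matches(const char *subject, const int *ovector,
                       int rc, int ovecsize)
{
    int i, n = dfa_match_count(rc, ovecsize);

    for (i = 0; i < n; i++)
        printf("%d: %.*s\n", i, dfa_match_length(ovector, i),
               subject + ovector[2 * i]);
}
```
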
2053     Error returns from pcre_dfa_exec()     Error returns from pcre_dfa_exec()
2054    
2055         The pcre_dfa_exec() function returns a negative number when  it  fails.         The  pcre_dfa_exec()  function returns a negative number when it fails.
2056         Many  of  the  errors  are  the  same as for pcre_exec(), and these are         Many of the errors are the same  as  for  pcre_exec(),  and  these  are
2057         described above.  There are in addition the following errors  that  are         described  above.   There are in addition the following errors that are
2058         specific to pcre_dfa_exec():         specific to pcre_dfa_exec():
2059    
2060           PCRE_ERROR_DFA_UITEM      (-16)           PCRE_ERROR_DFA_UITEM      (-16)
2061    
2062         This  return is given if pcre_dfa_exec() encounters an item in the pat-         This return is given if pcre_dfa_exec() encounters an item in the  pat-
2063         tern that it does not support, for instance, the use of \C  or  a  back         tern  that  it  does not support, for instance, the use of \C or a back
2064         reference.         reference.
2065    
2066           PCRE_ERROR_DFA_UCOND      (-17)           PCRE_ERROR_DFA_UCOND      (-17)
2067    
2068         This  return is given if pcre_dfa_exec() encounters a condition item in         This return is given if pcre_dfa_exec() encounters a condition item  in
2069         a pattern that uses a back reference for the  condition.  This  is  not         a  pattern  that  uses  a back reference for the condition. This is not
2070         supported.         supported.
2071    
2072           PCRE_ERROR_DFA_UMLIMIT    (-18)           PCRE_ERROR_DFA_UMLIMIT    (-18)
2073    
2074         This  return  is given if pcre_dfa_exec() is called with an extra block         This return is given if pcre_dfa_exec() is called with an  extra  block
2075         that contains a setting of the match_limit field. This is not supported         that contains a setting of the match_limit field. This is not supported
2076         (it is meaningless).         (it is meaningless).
2077    
2078           PCRE_ERROR_DFA_WSSIZE     (-19)           PCRE_ERROR_DFA_WSSIZE     (-19)
2079    
2080         This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the         This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
2081         workspace vector.         workspace vector.
2082    
2083           PCRE_ERROR_DFA_RECURSE    (-20)           PCRE_ERROR_DFA_RECURSE    (-20)
2084    
2085         When a recursive subpattern is processed, the matching  function  calls         When  a  recursive subpattern is processed, the matching function calls
2086         itself  recursively,  using  private vectors for ovector and workspace.         itself recursively, using private vectors for  ovector  and  workspace.
2087         This error is given if the output vector  is  not  large  enough.  This         This  error  is  given  if  the output vector is not large enough. This
2088         should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
2089    
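       When logging failures it can help to turn a negative yield into its
       symbolic name. The table below is a sketch covering only the codes
       quoted in this part of the documentation. The function name is
       hypothetical, and the numbers are written out literally so that the
       fragment compiles without pcre.h.

```c
#include <stddef.h>

struct error_name { int code; const char *name; };

/* Codes and names as quoted in the text above; not a complete list. */
static const struct error_name error_names[] = {
    {  -6, "PCRE_ERROR_NOMEMORY" },
    {  -7, "PCRE_ERROR_NOSUBSTRING" },
    {  -9, "PCRE_ERROR_CALLOUT" },
    { -10, "PCRE_ERROR_BADUTF8" },
    { -11, "PCRE_ERROR_BADUTF8_OFFSET" },
    { -12, "PCRE_ERROR_PARTIAL" },
    { -13, "PCRE_ERROR_BADPARTIAL" },
    { -14, "PCRE_ERROR_INTERNAL" },
    { -15, "PCRE_ERROR_BADCOUNT" },
    { -16, "PCRE_ERROR_DFA_UITEM" },
    { -17, "PCRE_ERROR_DFA_UCOND" },
    { -18, "PCRE_ERROR_DFA_UMLIMIT" },
    { -19, "PCRE_ERROR_DFA_WSSIZE" },
    { -20, "PCRE_ERROR_DFA_RECURSE" },
    { -21, "PCRE_ERROR_RECURSIONLIMIT" }
};

/* Illustrative helper, NOT part of PCRE: map a yield to its name. */
const char *pcre_error_name_sketch(int rc)
{
    size_t i;

    for (i = 0; i < sizeof error_names / sizeof error_names[0]; i++)
        if (error_names[i].code == rc)
            return error_names[i].name;
    return "unknown";
}
```
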
2090  Last updated: 16 May 2005  Last updated: 18 January 2006
2091  Copyright (c) 1997-2005 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
2092  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2093    
2094    
# Line 2229  DIFFERENCES BETWEEN PCRE AND PERL Line 2264  DIFFERENCES BETWEEN PCRE AND PERL
2264         handle regular expressions. The differences  described  here  are  with         handle regular expressions. The differences  described  here  are  with
2265         respect to Perl 5.8.         respect to Perl 5.8.
2266    
2267         1.  PCRE does not have full UTF-8 support. Details of what it does have         1.  PCRE has only a subset of Perl's UTF-8 and Unicode support. Details
2268         are given in the section on UTF-8 support in the main pcre page.         of what it does have are given in the section on UTF-8 support  in  the
2269           main pcre page.
2270    
2271         2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl         2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl
2272         permits  them,  but they do not mean what you might think. For example,         permits them, but they do not mean what you might think.  For  example,
2273         (?!a){3} does not assert that the next three characters are not "a". It         (?!a){3} does not assert that the next three characters are not "a". It
2274         just asserts that the next character is not "a" three times.         just asserts that the next character is not "a" three times.
2275    
2276         3.  Capturing  subpatterns  that occur inside negative lookahead asser-         3. Capturing subpatterns that occur inside  negative  lookahead  asser-
2277         tions are counted, but their entries in the offsets  vector  are  never         tions  are  counted,  but their entries in the offsets vector are never
2278         set.  Perl sets its numerical variables from any such patterns that are         set. Perl sets its numerical variables from any such patterns that  are
2279         matched before the assertion fails to match something (thereby succeed-         matched before the assertion fails to match something (thereby succeed-
2280         ing),  but  only  if the negative lookahead assertion contains just one         ing), but only if the negative lookahead assertion  contains  just  one
2281         branch.         branch.
2282    
2283         4. Though binary zero characters are supported in the  subject  string,         4.  Though  binary zero characters are supported in the subject string,
2284         they are not allowed in a pattern string because it is passed as a nor-         they are not allowed in a pattern string because it is passed as a nor-
2285         mal C string, terminated by zero. The escape sequence \0 can be used in         mal C string, terminated by zero. The escape sequence \0 can be used in
2286         the pattern to represent a binary zero.         the pattern to represent a binary zero.
2287    
2288         5.  The  following Perl escape sequences are not supported: \l, \u, \L,         5. The following Perl escape sequences are not supported: \l,  \u,  \L,
2289         \U, and \N. In fact these are implemented by Perl's general string-han-         \U, and \N. In fact these are implemented by Perl's general string-han-
2290         dling  and are not part of its pattern matching engine. If any of these         dling and are not part of its pattern matching engine. If any of  these
2291         are encountered by PCRE, an error is generated.         are encountered by PCRE, an error is generated.
2292    
2293         6. The Perl escape sequences \p, \P, and \X are supported only if  PCRE         6.  The Perl escape sequences \p, \P, and \X are supported only if PCRE
2294         is  built  with Unicode character property support. The properties that         is built with Unicode character property support. The  properties  that
2295         can be tested with \p and \P are limited to the general category  prop-         can  be tested with \p and \P are limited to the general category prop-
2296         erties such as Lu and Nd.         erties such as Lu and Nd, script names such as Greek or  Han,  and  the
2297           derived properties Any and L&.
2298    
2299         7. PCRE does support the \Q...\E escape for quoting substrings. Charac-         7. PCRE does support the \Q...\E escape for quoting substrings. Charac-
2300         ters in between are treated as literals.  This  is  slightly  different         ters in between are treated as literals.  This  is  slightly  different
# Line 2330  DIFFERENCES BETWEEN PCRE AND PERL Line 2367  DIFFERENCES BETWEEN PCRE AND PERL
2367         (n)  The  alternative  matching function (pcre_dfa_exec()) matches in a         (n)  The  alternative  matching function (pcre_dfa_exec()) matches in a
2368         different way and is not Perl-compatible.         different way and is not Perl-compatible.
2369    
2370  Last updated: 28 February 2005  Last updated: 24 January 2006
2371  Copyright (c) 1997-2005 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
2372  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2373    
2374    
# Line 2477  BACKSLASH Line 2514  BACKSLASH
2514           \t        tab (hex 09)           \t        tab (hex 09)
2515           \ddd      character with octal code ddd, or backreference           \ddd      character with octal code ddd, or backreference
2516           \xhh      character with hex code hh           \xhh      character with hex code hh
2517           \x{hhh..} character with hex code hhh... (UTF-8 mode only)           \x{hhh..} character with hex code hhh..
2518    
2519         The  precise  effect of \cx is as follows: if x is a lower case letter,         The  precise  effect of \cx is as follows: if x is a lower case letter,
2520         it is converted to upper case. Then bit 6 of the character (hex 40)  is         it is converted to upper case. Then bit 6 of the character (hex 40)  is
# Line 2485  BACKSLASH Line 2522  BACKSLASH
2522         becomes hex 7B.         becomes hex 7B.
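
The \cx rule can be sketched in a few lines of C. This is an illustration only, not code from PCRE: the function name control_escape_value is hypothetical, and the sketch assumes that bit 6 is inverted (the diff context above elides the end of that sentence), as in the full pcrepattern text.

```c
#include <ctype.h>

/* Hypothetical helper: map the character x that follows "\c" to the
   value the escape represents, following the rule described above. */
unsigned char control_escape_value(unsigned char x)
{
    if (islower(x))
        x = (unsigned char)toupper(x);   /* lower case letters are upper-cased first */
    return (unsigned char)(x ^ 0x40);    /* then bit 6 (hex 40) is inverted */
}
```

Under these assumptions, control_escape_value(';') gives hex 7B, which agrees with the last value quoted in the paragraph above.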
2523    
2524         After \x, from zero to two hexadecimal digits are read (letters can  be         After \x, from zero to two hexadecimal digits are read (letters can  be
2525         in  upper or lower case). In UTF-8 mode, any number of hexadecimal dig-         in  upper  or  lower case). Any number of hexadecimal digits may appear
2526         its may appear between \x{ and }, but the value of the  character  code         between \x{ and }, but the value of the character  code  must  be  less
2527         must  be  less  than  2**31  (that is, the maximum hexadecimal value is         than 256 in non-UTF-8 mode, and less than 2**31 in UTF-8 mode (that is,
2528         7FFFFFFF). If characters other than hexadecimal digits  appear  between         the maximum hexadecimal value is 7FFFFFFF). If  characters  other  than
2529         \x{  and }, or if there is no terminating }, this form of escape is not         hexadecimal  digits  appear between \x{ and }, or if there is no termi-
2530         recognized. Instead, the initial \x will  be  interpreted  as  a  basic         nating }, this form of escape is not recognized.  Instead, the  initial
2531         hexadecimal  escape, with no following digits, giving a character whose         \x will be interpreted as a basic hexadecimal escape, with no following
2532         value is zero.         digits, giving a character whose value is zero.
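
The revised \x rules can be sketched as a small parser. This is an illustration under stated assumptions, not PCRE's own code: the names parse_hex_escape and hex_digit_value are hypothetical, an out-of-range value is reported by returning -1 (the text above only says the value "must be less than" the limit), and empty braces are treated as unrecognized.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: numeric value of one hexadecimal digit. */
static int hex_digit_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    return tolower((unsigned char)ch) - 'a' + 10;
}

/* Parse the text that follows "\x". Returns the character value, or -1
   if a braced value is out of range for the mode. *consumed receives
   the number of characters used after "\x". */
long parse_hex_escape(const char *p, int utf8, size_t *consumed)
{
    size_t i;
    long value = 0;

    if (p[0] == '{') {
        unsigned long long v = 0;
        int too_big = 0;
        i = 1;
        while (isxdigit((unsigned char)p[i])) {
            if (!too_big) {
                v = v * 16 + (unsigned)hex_digit_value(p[i]);
                if (v > 0x7FFFFFFFULL) too_big = 1;   /* 2**31 or more */
            }
            i++;
        }
        if (p[i] == '}' && i > 1) {                   /* well-formed \x{hhh..} */
            unsigned long long limit = utf8 ? 0x7FFFFFFFULL : 0xFFULL;
            *consumed = i + 1;
            return (too_big || v > limit) ? -1 : (long)v;
        }
        /* Not recognized: fall through to the basic form, which then
           reads no digits at all because '{' is not a hex digit. */
    }

    /* Basic form: from zero to two hexadecimal digits. */
    for (i = 0; i < 2 && isxdigit((unsigned char)p[i]); i++)
        value = value * 16 + hex_digit_value(p[i]);
    *consumed = i;
    return value;
}
```

With this sketch, "dc" and "{dc}" both yield hex DC in either mode, "{100}" is accepted only when utf8 is non-zero, and "{12g}" yields zero with nothing consumed.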
2533    
2534         Characters whose value is less than 256 can be defined by either of the         Characters whose value is less than 256 can be defined by either of the
2535         two  syntaxes for \x when PCRE is in UTF-8 mode. There is no difference         two  syntaxes  for  \x. There is no difference in the way they are han-
2536         in the way they are handled. For example, \xdc is exactly the  same  as         dled. For example, \xdc is exactly the same as \x{dc}.
2537         \x{dc}.  
2538           After \0 up to two further octal digits are read.  In  both  cases,  if
2539         After  \0  up  to  two further octal digits are read. In both cases, if         there  are fewer than two digits, just those that are present are used.
2540         there are fewer than two digits, just those that are present are  used.         Thus the sequence \0\x\07 specifies two binary zeros followed by a  BEL
2541         Thus  the sequence \0\x\07 specifies two binary zeros followed by a BEL         character  (code  value  7).  Make sure you supply two digits after the
2542         character (code value 7). Make sure you supply  two  digits  after  the         initial zero if the pattern character that follows is itself  an  octal
        initial  zero  if the pattern character that follows is itself an octal  
2543         digit.         digit.
2544    
2545         The handling of a backslash followed by a digit other than 0 is compli-         The handling of a backslash followed by a digit other than 0 is compli-
2546         cated.  Outside a character class, PCRE reads it and any following dig-         cated.  Outside a character class, PCRE reads it and any following dig-
2547         its as a decimal number. If the number is less than  10,  or  if  there         its  as  a  decimal  number. If the number is less than 10, or if there
2548         have been at least that many previous capturing left parentheses in the         have been at least that many previous capturing left parentheses in the
2549         expression, the entire  sequence  is  taken  as  a  back  reference.  A         expression,  the  entire  sequence  is  taken  as  a  back reference. A
2550         description  of how this works is given later, following the discussion         description of how this works is given later, following the  discussion
2551         of parenthesized subpatterns.         of parenthesized subpatterns.
2552    
2553         Inside a character class, or if the decimal number is  greater  than  9         Inside  a  character  class, or if the decimal number is greater than 9
2554         and  there have not been that many capturing subpatterns, PCRE re-reads         and there have not been that many capturing subpatterns, PCRE  re-reads
2555         up to three octal digits following the backslash, and generates a  sin-         up  to three octal digits following the backslash, and generates a sin-
2556         gle byte from the least significant 8 bits of the value. Any subsequent         gle byte from the least significant 8 bits of the value. Any subsequent
2557         digits stand for themselves.  For example:         digits stand for themselves.  For example:
2558    
# Line 2535  BACKSLASH Line 2571  BACKSLASH
2571           \81    is either a back reference, or a binary zero           \81    is either a back reference, or a binary zero
2572                     followed by the two characters "8" and "1"                     followed by the two characters "8" and "1"
2573    
2574         Note that octal values of 100 or greater must not be  introduced  by  a         Note  that  octal  values of 100 or greater must not be introduced by a
2575         leading zero, because no more than three octal digits are ever read.         leading zero, because no more than three octal digits are ever read.
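
The decision between a back reference and an octal escape, as described above, can be sketched in C. This is a minimal illustration, not PCRE's implementation: the type digit_escape, the function classify_digit_escape, and the parameter names are all hypothetical. The digits argument is the text after the backslash, whose first character is a digit other than 0.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct {
    int is_backref;    /* 1 if the escape is a back reference */
    int value;         /* group number, or byte value of the octal escape */
    size_t consumed;   /* number of digits used after the backslash */
} digit_escape;

digit_escape classify_digit_escape(const char *digits, int groups_so_far,
                                   int in_class)
{
    digit_escape r = {0, 0, 0};
    size_t i;
    int v = 0;

    if (!in_class) {
        /* Outside a class: read all following digits as a decimal number. */
        long n = 0;
        for (i = 0; isdigit((unsigned char)digits[i]); i++)
            if (n < 100000) n = n * 10 + (digits[i] - '0');  /* avoid overflow */
        if (i > 0 && (n < 10 || n <= groups_so_far)) {
            r.is_backref = 1;
            r.value = (int)n;
            r.consumed = i;
            return r;
        }
    }

    /* Otherwise re-read up to three octal digits and keep the least
       significant 8 bits. Any later digits stand for themselves. */
    for (i = 0; i < 3 && digits[i] >= '0' && digits[i] <= '7'; i++)
        v = v * 8 + (digits[i] - '0');
    r.value = v & 0xFF;
    r.consumed = i;
    return r;
}
```

For the digits "81" with no capturing groups, the sketch reports a byte value of zero with no digits consumed, so "8" and "1" remain as literal characters, matching the \81 example in the table above.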
2576    
2577         All  the  sequences  that  define a single byte value or a single UTF-8         All the sequences that define a single byte value  or  a  single  UTF-8
2578         character (in UTF-8 mode) can be used both inside and outside character         character (in UTF-8 mode) can be used both inside and outside character
2579         classes.  In  addition,  inside  a  character class, the sequence \b is         classes. In addition, inside a character  class,  the  sequence  \b  is
2580         interpreted as the backspace character (hex 08), and the sequence \X is         interpreted as the backspace character (hex 08), and the sequence \X is
2581         interpreted  as  the  character  "X".  Outside a character class, these         interpreted as the character "X".  Outside  a  character  class,  these
2582         sequences have different meanings (see below).         sequences have different meanings (see below).
2583    
2584     Generic character types     Generic character types
2585    
2586         The third use of backslash is for specifying generic  character  types.         The  third  use of backslash is for specifying generic character types.
2587         The following are always recognized:         The following are always recognized:
2588    
2589           \d     any decimal digit           \d     any decimal digit
# Line 2558  BACKSLASH Line 2594  BACKSLASH
2594           \W     any "non-word" character           \W     any "non-word" character
2595    
2596         Each pair of escape sequences partitions the complete set of characters         Each pair of escape sequences partitions the complete set of characters
2597         into two disjoint sets. Any given character matches one, and only  one,         into  two disjoint sets. Any given character matches one, and only one,
2598         of each pair.         of each pair.
2599    
2600         These character type sequences can appear both inside and outside char-         These character type sequences can appear both inside and outside char-
2601         acter classes. They each match one character of the  appropriate  type.         acter  classes.  They each match one character of the appropriate type.
2602         If  the current matching point is at the end of the subject string, all         If the current matching point is at the end of the subject string,  all
2603         of them fail, since there is no character to match.         of them fail, since there is no character to match.
2604    
2605         For compatibility with Perl, \s does not match the VT  character  (code         For  compatibility  with Perl, \s does not match the VT character (code
2606         11).   This makes it different from the POSIX "space" class. The \s         11).  This makes it different from the POSIX "space" class. The  \s
2607         characters are HT (9), LF (10), FF (12), CR (13), and space (32).         characters are HT (9), LF (10), FF (12), CR (13), and space (32).
2608    
2609         A "word" character is an underscore or any character less than 256 that         A "word" character is an underscore or any character less than 256 that
2610         is  a  letter  or  digit.  The definition of letters and digits is con-         is a letter or digit. The definition of  letters  and  digits  is  con-
2611         trolled by PCRE's low-valued character tables, and may vary if  locale-         trolled  by PCRE's low-valued character tables, and may vary if locale-
2612         specific  matching is taking place (see "Locale support" in the pcreapi         specific matching is taking place (see "Locale support" in the  pcreapi
2613         page). For example, in the  "fr_FR"  (French)  locale,  some  character         page).  For  example,  in  the  "fr_FR" (French) locale, some character
2614         codes  greater  than  128  are used for accented letters, and these are         codes greater than 128 are used for accented  letters,  and  these  are
2615         matched by \w.         matched by \w.
2616    
2617         In UTF-8 mode, characters with values greater than 128 never match  \d,         In  UTF-8 mode, characters with values greater than 128 never match \d,
2618         \s, or \w, and always match \D, \S, and \W. This is true even when Uni-         \s, or \w, and always match \D, \S, and \W. This is true even when Uni-
2619         code character property support is available.         code  character  property support is available. The use of locales with
2620           Unicode is discouraged.
2621    
2622     Unicode character properties     Unicode character properties
2623    
2624         When PCRE is built with Unicode character property support, three addi-         When PCRE is built with Unicode character property support, three addi-
2625         tional  escape sequences to match generic character types are available         tional  escape  sequences  to  match character properties are available
2626         when UTF-8 mode is selected. They are:         when UTF-8 mode is selected. They are:
2627    
2628          \p{xx}   a character with the xx property           \p{xx}   a character with the xx property
2629          \P{xx}   a character without the xx property           \P{xx}   a character without the xx property
2630          \X       an extended Unicode sequence           \X       an extended Unicode sequence
2631    
2632         The property names represented by xx above are limited to  the  Unicode         The property names represented by xx above are limited to  the  Unicode
2633         general  category properties. Each character has exactly one such prop-         script names, the general category properties, and "Any", which matches
2634         erty, specified by a two-letter abbreviation.  For  compatibility  with         any character (including newline). Other properties such as "InMusical-
2635         Perl,  negation  can be specified by including a circumflex between the         Symbols"  are  not  currently supported by PCRE. Note that \P{Any} does
2636         opening brace and the property name. For example, \p{^Lu} is  the  same         not match any characters, so always causes a match failure.
2637         as \P{Lu}.  
2638           Sets of Unicode characters are defined as belonging to certain scripts.
2639         If  only  one  letter  is  specified with \p or \P, it includes all the         A  character from one of these sets can be matched using a script name.
2640         properties that start with that letter. In this case, in the absence of         For example:
2641         negation, the curly brackets in the escape sequence are optional; these  
2642         two examples have the same effect:           \p{Greek}
2643             \P{Han}
2644    
2645           Those that are not part of an identified script are lumped together  as
2646           "Common". The current list of scripts is:
2647    
2648           Arabic,  Armenian,  Bengali,  Bopomofo, Braille, Buginese, Buhid, Cana-
2649           dian_Aboriginal, Cherokee, Common, Coptic, Cypriot, Cyrillic,  Deseret,
2650           Devanagari,  Ethiopic,  Georgian,  Glagolitic, Gothic, Greek, Gujarati,
2651           Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana,  Inherited,  Kannada,
2652           Katakana,  Kharoshthi,  Khmer,  Lao, Latin, Limbu, Linear_B, Malayalam,
2653           Mongolian, Myanmar, New_Tai_Lue, Ogham, Old_Italic, Old_Persian, Oriya,
2654           Osmanya,  Runic,  Shavian, Sinhala, Syloti_Nagri, Syriac, Tagalog, Tag-
2655           banwa,  Tai_Le,  Tamil,  Telugu,  Thaana,  Thai,   Tibetan,   Tifinagh,
2656           Ugaritic, Yi.
2657    
2658           Each  character has exactly one general category property, specified by
2659           a two-letter abbreviation. For compatibility with Perl, negation can be
2660           specified  by  including a circumflex between the opening brace and the
2661           property name. For example, \p{^Lu} is the same as \P{Lu}.
2662    
2663           If only one letter is specified with \p or \P, it includes all the gen-
2664           eral  category properties that start with that letter. In this case, in
2665           the absence of negation, the curly brackets in the escape sequence  are
2666           optional; these two examples have the same effect:
2667    
2668           \p{L}           \p{L}
2669           \pL           \pL
2670    
2671         The following property codes are supported:         The following general category property codes are supported:
2672    
2673           C     Other           C     Other
2674           Cc    Control           Cc    Control
# Line 2653  BACKSLASH Line 2714  BACKSLASH
2714           Zp    Paragraph separator           Zp    Paragraph separator
2715           Zs    Space separator           Zs    Space separator
2716    
2717         Extended properties such as "Greek" or "InMusicalSymbols" are not  sup-         The  special property L& is also supported: it matches a character that
2718         ported by PCRE.         has the Lu, Ll, or Lt property, in other words, a letter  that  is  not
2719           classified as a modifier or "other".
2720    
2721           The  long  synonyms  for  these  properties that Perl supports (such as
2722           \p{Letter}) are not supported by PCRE. Nor is it  permitted  to  prefix
2723           any of these properties with "Is".
2724    
2725           No character that is in the Unicode table has the Cn (unassigned) prop-
2726           erty.  Instead, this property is assumed for any code point that is not
2727           in the Unicode table.
2728    
2729         Specifying  caseless  matching  does not affect these escape sequences.         Specifying  caseless  matching  does not affect these escape sequences.
2730         For example, \p{Lu} always matches only upper case letters.         For example, \p{Lu} always matches only upper case letters.
# Line 3633  RECURSIVE PATTERNS Line 3703  RECURSIVE PATTERNS
3703         tion.)  The special item (?R) is a recursive call of the entire regular         tion.)  The special item (?R) is a recursive call of the entire regular
3704         expression.         expression.
3705    
3706         For example, this PCRE pattern solves the  nested  parentheses  problem         A recursive subpattern call is always treated as an atomic group.  That
3707         (assume  the  PCRE_EXTENDED  option  is  set  so  that  white  space is         is,  once  it  has  matched some of the subject string, it is never re-
3708         ignored):         entered, even if it contains untried alternatives and there is a subse-
3709           quent matching failure.
3710    
3711           This  PCRE  pattern  solves  the nested parentheses problem (assume the
3712           PCRE_EXTENDED option is set so that white space is ignored):
3713    
3714           \( ( (?>[^()]+) | (?R) )* \)           \( ( (?>[^()]+) | (?R) )* \)
3715    
3716         First it matches an opening parenthesis. Then it matches any number  of         First it matches an opening parenthesis. Then it matches any number  of
3717         substrings  which  can  either  be  a sequence of non-parentheses, or a         substrings  which  can  either  be  a sequence of non-parentheses, or a
3718         recursive match of the pattern itself (that is  a  correctly  parenthe-         recursive match of the pattern itself (that is, a  correctly  parenthe-
3719         sized substring).  Finally there is a closing parenthesis.         sized substring).  Finally there is a closing parenthesis.
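
To make the behaviour of the pattern concrete, here is a hand-written C recognizer that accepts the same strings when matching starts at the first character. It is a sketch for illustration only: the function name match_parens is hypothetical, and it does not use the PCRE library. Each nested opening parenthesis triggers a recursive call, which plays the role of (?R).

```c
#include <stddef.h>

/* Returns the length of the correctly parenthesized substring that
   starts at s[0], or 0 if there is none. */
size_t match_parens(const char *s)
{
    size_t i = 0;

    if (s[i] != '(')
        return 0;                      /* must begin with an opening parenthesis */
    i++;
    for (;;) {
        if (s[i] == '(') {             /* nested group: recurse, like (?R) */
            size_t n = match_parens(s + i);
            if (n == 0)
                return 0;              /* unbalanced inner group: no match */
            i += n;
        } else if (s[i] == ')') {
            return i + 1;              /* closing parenthesis ends the match */
        } else if (s[i] == '\0') {
            return 0;                  /* ran out of subject before closing */
        } else {
            i++;                       /* part of a run of non-parentheses */
        }
    }
}
```

For example, match_parens("(a(b)c)") returns 7, while match_parens("(ab") returns 0 because the closing parenthesis is missing.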
3720    
3721         If  this  were  part of a larger pattern, you would not want to recurse         If  this  were  part of a larger pattern, you would not want to recurse
# Line 3725  SUBPATTERNS AS SUBROUTINES Line 3799  SUBPATTERNS AS SUBROUTINES
3799         two strings. Such references must, however, follow  the  subpattern  to         two strings. Such references must, however, follow  the  subpattern  to
3800         which they refer.         which they refer.
3801    
3802           Like recursive subpatterns, a "subroutine" call is always treated as an
3803           atomic group. That is, once it has matched some of the subject  string,
3804           it  is  never  re-entered, even if it contains untried alternatives and
3805           there is a subsequent matching failure.
3806    
3807    
3808  CALLOUTS  CALLOUTS
3809    
3810         Perl has a feature whereby using the sequence (?{...}) causes arbitrary         Perl has a feature whereby using the sequence (?{...}) causes arbitrary
3811         Perl code to be obeyed in the middle of matching a regular  expression.         Perl  code to be obeyed in the middle of matching a regular expression.
3812         This makes it possible, amongst other things, to extract different sub-         This makes it possible, amongst other things, to extract different sub-
3813         strings that match the same pair of parentheses when there is a repeti-         strings that match the same pair of parentheses when there is a repeti-
3814         tion.         tion.
3815    
3816         PCRE provides a similar feature, but of course it cannot obey arbitrary         PCRE provides a similar feature, but of course it cannot obey arbitrary
3817         Perl code. The feature is called "callout". The caller of PCRE provides         Perl code. The feature is called "callout". The caller of PCRE provides
3818         an  external function by putting its entry point in the global variable         an external function by putting its entry point in the global  variable
3819         pcre_callout.  By default, this variable contains NULL, which  disables         pcre_callout.   By default, this variable contains NULL, which disables
3820         all calling out.         all calling out.
3821    
3822         Within  a  regular  expression,  (?C) indicates the points at which the         Within a regular expression, (?C) indicates the  points  at  which  the
3823         external function is to be called. If you want  to  identify  different         external  function  is  to be called. If you want to identify different
3824         callout  points, you can put a number less than 256 after the letter C.         callout points, you can put a number less than 256 after the letter  C.
3825         The default value is zero.  For example, this pattern has  two  callout         The  default  value is zero.  For example, this pattern has two callout
3826         points:         points:
3827    
3828           (?C1)abc(?C2)def           (?C1)abc(?C2)def
3829    
3830         If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are         If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
3831         automatically installed before each item in the pattern. They  are  all         automatically  installed  before each item in the pattern. They are all
3832         numbered 255.         numbered 255.
3833    
3834         During matching, when PCRE reaches a callout point (and pcre_callout is         During matching, when PCRE reaches a callout point (and pcre_callout is
3835         set), the external function is called. It is provided with  the  number         set),  the  external function is called. It is provided with the number
3836         of  the callout, the position in the pattern, and, optionally, one item         of the callout, the position in the pattern, and, optionally, one  item
3837         of data originally supplied by the caller of pcre_exec().  The  callout         of  data  originally supplied by the caller of pcre_exec(). The callout
3838         function  may cause matching to proceed, to backtrack, or to fail alto-         function may cause matching to proceed, to backtrack, or to fail  alto-
3839         gether. A complete description of the interface to the callout function         gether. A complete description of the interface to the callout function
3840         is given in the pcrecallout documentation.         is given in the pcrecallout documentation.
3841    
3842  Last updated: 28 February 2005  Last updated: 24 January 2006
3843  Copyright (c) 1997-2005 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
3844  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3845    
3846    
# Line 3851  EXAMPLE OF PARTIAL MATCHING USING PCRETE Line 3930  EXAMPLE OF PARTIAL MATCHING USING PCRETE
3930         uses the date example quoted above:         uses the date example quoted above:
3931    
3932             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
3933           data> 25jun04P           data> 25jun04\P
3934            0: 25jun04            0: 25jun04
3935            1: jun            1: jun
3936           data> 25dec3P           data> 25dec3\P
3937           Partial match           Partial match
3938           data> 3juP           data> 3ju\P
3939           Partial match           Partial match
3940           data> 3jujP           data> 3juj\P
3941           No match           No match
3942           data> jP           data> j\P
3943           No match           No match
3944    
3945         The first data string is matched  completely,  so  pcretest  shows  the         The first data string is matched  completely,  so  pcretest  shows  the
# Line 3950  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe Line 4029  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe
4029         Because of this phenomenon, it does not usually make  sense  to  end  a         Because of this phenomenon, it does not usually make  sense  to  end  a
4030         pattern that is going to be matched in this way with a variable repeat.         pattern that is going to be matched in this way with a variable repeat.
4031    
4032  Last updated: 28 February 2005         4. Patterns that contain alternatives at the top level which do not all
4033  Copyright (c) 1997-2005 University of Cambridge.         start with the same pattern item may not work as expected. For example,
4034           consider this pattern:
4035    
4036             1234|3789
4037    
4038           If the first part of the subject is "ABC123", a partial  match  of  the
4039           first  alternative  is found at offset 3. There is no partial match for
4040           the second alternative, because such a match does not start at the same
4041           point  in  the  subject  string. Attempting to continue with the string
4042           "789" does not yield a match because only those alternatives that match
4043           at  one point in the subject are remembered. The problem arises because
4044           the start of the second alternative matches within the  first  alterna-
4045           tive. There is no problem with anchored patterns or patterns such as:
4046    
4047             1234|ABCD
4048    
4049           where no string can be a partial match for both alternatives.
4050    
4051    Last updated: 16 January 2006
4052    Copyright (c) 1997-2006 University of Cambridge.
4053  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
4054    
4055    
# Line 4065  COMPATIBILITY WITH DIFFERENT PCRE RELEAS Line 4163  COMPATIBILITY WITH DIFFERENT PCRE RELEAS
4163         them for release 5.0. However, from now on, it should  be  possible  to         them for release 5.0. However, from now on, it should  be  possible  to
4164         make changes in a compatible manner.         make changes in a compatible manner.
4165    
4166  Last updated: 28 February 2005         Notwithstanding the above, if you have any saved patterns in UTF-8 mode
4167  Copyright (c) 1997-2005 University of Cambridge.         that use \p or \P that were compiled with any release up to and includ-
4168           ing 6.4, you will have to recompile them for release 6.5 and above.
4169    
4170    Last updated: 01 February 2006
4171    Copyright (c) 1997-2006 University of Cambridge.
4172  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
4173    
4174    
# Line 4191  DESCRIPTION Line 4293  DESCRIPTION
4293         functions call the native ones, it is also necessary to add -lpcre.         functions call the native ones, it is also necessary to add -lpcre.
4294    
4295         I have implemented only those option bits that can be reasonably mapped         I have implemented only those option bits that can be reasonably mapped
4296         to  PCRE  native  options.  In  addition,  the options REG_EXTENDED and         to PCRE native options. In addition, the option REG_EXTENDED is defined
4297         REG_NOSUB are defined with the value zero. They  have  no  effect,  but         with the value zero. This has no effect, but since  programs  that  are
4298         since  programs that are written to the POSIX interface often use them,         written  to  the  POSIX interface often use it, this makes it easier to
4299         this makes it easier to slot in PCRE as a  replacement  library.  Other         slot in PCRE as a replacement library. Other POSIX options are not even
4300         POSIX options are not even defined.         defined.
4301    
4302         When  PCRE  is  called  via these functions, it is only the API that is         When  PCRE  is  called  via these functions, it is only the API that is
4303         POSIX-like in style. The syntax and semantics of  the  regular  expres-         POSIX-like in style. The syntax and semantics of  the  regular  expres-
# Line 4220  COMPILING A PATTERN Line 4322  COMPILING A PATTERN
4322         form. The pattern is a C string terminated by a  binary  zero,  and  is         form. The pattern is a C string terminated by a  binary  zero,  and  is
4323         passed  in  the  argument  pattern. The preg argument is a pointer to a         passed  in  the  argument  pattern. The preg argument is a pointer to a
4324         regex_t structure that is used as a base for storing information  about         regex_t structure that is used as a base for storing information  about
4325         the compiled expression.         the compiled regular expression.
4326    
4327         The argument cflags is either zero, or contains one or more of the bits         The argument cflags is either zero, or contains one or more of the bits
4328         defined by the following macros:         defined by the following macros:
4329    
4330           REG_DOTALL           REG_DOTALL
4331    
4332         The PCRE_DOTALL option is set when the expression is passed for  compi-         The PCRE_DOTALL option is set when the regular expression is passed for
4333         lation  to the native function. Note that REG_DOTALL is not part of the         compilation to the native function. Note that REG_DOTALL is not part of
4334         POSIX standard.         the POSIX standard.
4335    
4336           REG_ICASE           REG_ICASE
4337    
4338         The PCRE_CASELESS option is set when the expression is passed for  com-         The PCRE_CASELESS option is set when the regular expression  is  passed
4339         pilation to the native function.         for compilation to the native function.
4340    
4341           REG_NEWLINE           REG_NEWLINE
4342    
4343         The PCRE_MULTILINE option is set when the expression is passed for com-         The  PCRE_MULTILINE option is set when the regular expression is passed
4344         pilation to the native function. Note that  this  does  not  mimic  the         for compilation to the native function. Note that this does  not  mimic
4345         defined POSIX behaviour for REG_NEWLINE (see the following section).         the  defined  POSIX  behaviour  for REG_NEWLINE (see the following sec-
4346           tion).
4347    
4348             REG_NOSUB
4349    
4350           The PCRE_NO_AUTO_CAPTURE option is set when the regular  expression  is
4351           passed for compilation to the native function. In addition, when a pat-
4352           tern that is compiled with this flag is passed to regexec() for  match-
4353           ing,  the  nmatch  and  pmatch  arguments  are ignored, and no captured
4354           strings are returned.
4355    
4356             REG_UTF8
4357    
4358           The PCRE_UTF8 option is set when the regular expression is  passed  for
4359           compilation  to the native function. This causes the pattern itself and
4360           all data strings used for matching it to be treated as  UTF-8  strings.
4361           Note that REG_UTF8 is not part of the POSIX standard.
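
The flag descriptions above can be exercised with a short sketch. It is only an illustration under stated assumptions: it includes the system <regex.h> so that it builds without PCRE installed, whereas a program using the PCRE wrapper would include pcreposix.h instead and link with -lpcreposix -lpcre. REG_DOTALL and REG_UTF8 are left out because they exist only in the PCRE wrapper. REG_EXTENDED is passed because the system library needs it; in the PCRE wrapper it is defined as zero and has no effect. The helper name matches_caseless() is hypothetical.

```c
#include <regex.h>   /* assumption: system POSIX header; with PCRE use <pcreposix.h> */
#include <stddef.h>

/* Hypothetical helper: 1 if subject matches pattern ignoring case,
   0 if it does not, -1 if the pattern fails to compile. */
int matches_caseless(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_ICASE requests caseless matching; REG_NOSUB says that no
       captured substrings are wanted, only a yes/no answer. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;

    /* With REG_NOSUB the nmatch and pmatch arguments are ignored,
       so zero and NULL are passed here. */
    rc = regexec(&re, subject, 0, NULL, 0);

    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A call such as matches_caseless("hello", "Say HELLO") reports a match because of REG_ICASE; no offsets are available afterwards, which is the point of REG_NOSUB.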
4362    
4363         In  the  absence  of  these  flags, no options are passed to the native         In  the  absence  of  these  flags, no options are passed to the native
4364         function.  This means that the regex  is  compiled  with  PCRE  default         function.  This means that the regex  is  compiled  with  PCRE  default
# Line 4307  MATCHING A PATTERN Line 4425  MATCHING A PATTERN
4425         The PCRE_NOTEOL option is set when calling the underlying PCRE matching         The PCRE_NOTEOL option is set when calling the underlying PCRE matching
4426         function.         function.
4427    
4428         The  portion of the string that was matched, and also any captured sub-         If  the pattern was compiled with the REG_NOSUB flag, no data about any
4429         strings, are returned via the pmatch argument, which points to an array         matched strings  is  returned.  The  nmatch  and  pmatch  arguments  of
4430         of  nmatch  structures of type regmatch_t, containing the members rm_so         regexec() are ignored.
4431         and rm_eo. These contain the offset to the first character of each sub-  
4432         string and the offset to the first character after the end of each sub-         Otherwise, the portion of the string that was matched, and also any cap-
4433         string, respectively. The 0th element of  the  vector  relates  to  the         tured substrings, are returned via the pmatch argument, which points to
4434         entire  portion  of string that was matched; subsequent elements relate         an  array  of nmatch structures of type regmatch_t, containing the mem-
4435         to the capturing subpatterns of the regular expression. Unused  entries         bers rm_so and rm_eo. These contain the offset to the  first  character
4436         in the array have both structure members set to -1.         of  each  substring and the offset to the first character after the end
4437           of each substring, respectively. The 0th element of the vector  relates
4438           to  the  entire portion of string that was matched; subsequent elements
4439           relate to the capturing subpatterns of the regular  expression.  Unused
4440           entries in the array have both structure members set to -1.
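
The layout of the pmatch vector described above can be sketched as follows. This is a minimal illustration, not part of the PCRE distribution: it uses the system <regex.h> so that it builds without PCRE (with the PCRE wrapper, include pcreposix.h instead), and the names MAX_SLOTS and match_offsets() are hypothetical.

```c
#include <regex.h>   /* assumption: system POSIX header; with PCRE use <pcreposix.h> */
#include <stddef.h>

#define MAX_SLOTS 3  /* slot 0 = whole match, slots 1..2 = capturing subpatterns */

/* Hypothetical helper: compiles pattern, matches it against subject and
   copies the rm_so/rm_eo offsets into so[] and eo[]. Returns the regcomp()
   error code if compilation fails, otherwise the regexec() result
   (zero on a match, REG_NOMATCH if there is none). */
int match_offsets(const char *pattern, const char *subject,
                  long so[MAX_SLOTS], long eo[MAX_SLOTS])
{
    regex_t re;
    regmatch_t pmatch[MAX_SLOTS];
    int rc, i;

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    rc = regexec(&re, subject, MAX_SLOTS, pmatch, 0);
    if (rc == 0) {
        for (i = 0; i < MAX_SLOTS; i++) {
            so[i] = (long)pmatch[i].rm_so;  /* offset of first character */
            eo[i] = (long)pmatch[i].rm_eo;  /* offset just past the end  */
        }
    }

    regfree(&re);
    return rc;
}
```

Matching the pattern a(b+)(c)? against the subject xxabbbd, element 0 covers the whole match "abbb", element 1 covers "bbb", and element 2 has both members set to -1 because the second subpattern took no part in the match.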
4441    
4442         A  successful  match  yields  a  zero  return;  various error codes are         A  successful  match  yields  a  zero  return;  various error codes are
4443         defined in the header file, of  which  REG_NOMATCH  is  the  "expected"         defined in the header file, of  which  REG_NOMATCH  is  the  "expected"
# Line 4346  AUTHOR Line 4468  AUTHOR
4468         University Computing Service,         University Computing Service,
4469         Cambridge CB2 3QG, England.         Cambridge CB2 3QG, England.
4470    
4471  Last updated: 28 February 2005  Last updated: 16 January 2006
4472  Copyright (c) 1997-2005 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
4473  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
4474    
4475    
# Line 4520  PASSING MODIFIERS TO THE REGULAR EXPRESS Line 4642  PASSING MODIFIERS TO THE REGULAR EXPRESS
4642    
4643           RE_Options & set_caseless(bool)           RE_Options & set_caseless(bool)
4644    
4645         which sets or unsets the  modifier.  Moreover,  PCRE_CONFIG_MATCH_LIMIT         which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can
4646         can  be accessed through the set_match_limit() and match_limit() member         be  accessed  through  the  set_match_limit()  and match_limit() member
4647         functions. Setting match_limit to a non-zero value will limit the  exe-         functions. Setting match_limit to a non-zero value will limit the  exe-
4648         cution  of pcre to keep it from doing bad things like blowing the stack         cution  of pcre to keep it from doing bad things like blowing the stack
4649         or taking an eternity to return a result.  A  value  of  5000  is  good         or taking an eternity to return a result.  A  value  of  5000  is  good
4650         enough  to stop stack blowup in a 2MB thread stack. Setting match_limit         enough  to stop stack blowup in a 2MB thread stack. Setting match_limit
4651         to zero disables match limiting.         to  zero  disables  match  limiting.  Alternatively,   you   can   call
4652           match_limit_recursion()  which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to
4653           limit how much  PCRE  recurses.  match_limit()  limits  the  number  of
4654           matches PCRE does; match_limit_recursion() limits the depth of internal
4655           recursion, and therefore the amount of stack that is used.
4656    
4657         Normally, to pass one or more modifiers to a RE class,  you  declare  a         Normally, to pass one or more modifiers to a RE class,  you  declare  a
4658         RE_Options object, set the appropriate options, and pass this object to         RE_Options object, set the appropriate options, and pass this object to
