Diff of /code/trunk/doc/pcre.txt

revision 406 by ph10, Mon Mar 23 12:05:43 2009 UTC revision 416 by ph10, Sat Apr 11 14:34:02 2009 UTC
# Line 28  INTRODUCTION Line 28  INTRODUCTION
28         mately with Perl 5.10, including support for UTF-8 encoded strings  and         mately with Perl 5.10, including support for UTF-8 encoded strings  and
29         Unicode general category properties. However, UTF-8 and Unicode support         Unicode general category properties. However, UTF-8 and Unicode support
30         has to be explicitly enabled; it is not the default. The Unicode tables         has to be explicitly enabled; it is not the default. The Unicode tables
31         correspond to Unicode release 5.0.0.         correspond to Unicode release 5.1.
32    
33         In  addition to the Perl-compatible matching function, PCRE contains an         In  addition to the Perl-compatible matching function, PCRE contains an
34         alternative matching function that matches the same  compiled  patterns         alternative matching function that matches the same  compiled  patterns
# Line 136  UTF-8 AND UNICODE PROPERTY SUPPORT Line 136  UTF-8 AND UNICODE PROPERTY SUPPORT
136    
137         In order to process UTF-8 strings, you must build PCRE to include UTF-8         In order to process UTF-8 strings, you must build PCRE to include UTF-8
138         support  in  the  code,  and, in addition, you must call pcre_compile()         support  in  the  code,  and, in addition, you must call pcre_compile()
139         with the PCRE_UTF8 option flag. When you do this, both the pattern  and         with the PCRE_UTF8 option flag, or the  pattern  must  start  with  the
140         any  subject  strings  that are matched against it are treated as UTF-8         sequence  (*UTF8).  When  either of these is the case, both the pattern
141         strings instead of just strings of bytes.         and any subject strings that are matched  against  it  are  treated  as
142           UTF-8 strings instead of just strings of bytes.
143    
144         If you compile PCRE with UTF-8 support, but do not use it at run  time,         If  you compile PCRE with UTF-8 support, but do not use it at run time,
145         the  library will be a bit bigger, but the additional run time overhead         the library will be a bit bigger, but the additional run time  overhead
146         is limited to testing the PCRE_UTF8 flag occasionally, so should not be         is limited to testing the PCRE_UTF8 flag occasionally, so should not be
147         very big.         very big.
148    
149         If PCRE is built with Unicode character property support (which implies         If PCRE is built with Unicode character property support (which implies
150         UTF-8 support), the escape sequences \p{..}, \P{..}, and  \X  are  sup-         UTF-8  support),  the  escape sequences \p{..}, \P{..}, and \X are sup-
151         ported.  The available properties that can be tested are limited to the         ported.  The available properties that can be tested are limited to the
152         general category properties such as Lu for an upper case letter  or  Nd         general  category  properties such as Lu for an upper case letter or Nd
153         for  a  decimal number, the Unicode script names such as Arabic or Han,         for a decimal number, the Unicode script names such as Arabic  or  Han,
154         and the derived properties Any and L&. A full  list  is  given  in  the         and  the  derived  properties  Any  and L&. A full list is given in the
155         pcrepattern documentation. Only the short names for properties are sup-         pcrepattern documentation. Only the short names for properties are sup-
156         ported. For example, \p{L} matches a letter. Its Perl synonym,  \p{Let-         ported.  For example, \p{L} matches a letter. Its Perl synonym, \p{Let-
157         ter},  is  not  supported.   Furthermore,  in Perl, many properties may         ter}, is not supported.  Furthermore,  in  Perl,  many  properties  may
158         optionally be prefixed by "Is", for compatibility with Perl  5.6.  PCRE         optionally  be  prefixed by "Is", for compatibility with Perl 5.6. PCRE
159         does not support this.         does not support this.
160    
161     Validity of UTF-8 strings     Validity of UTF-8 strings
162    
163         When  you  set  the  PCRE_UTF8 flag, the strings passed as patterns and         When you set the PCRE_UTF8 flag, the strings  passed  as  patterns  and
164         subjects are (by default) checked for validity on entry to the relevant         subjects are (by default) checked for validity on entry to the relevant
165         functions.  From  release 7.3 of PCRE, the check is according to the rules         functions. From release 7.3 of PCRE, the check is according to the rules
166         of RFC 3629, which are themselves derived from the  Unicode  specifica-         of  RFC  3629, which are themselves derived from the Unicode specifica-
167         tion.  Earlier  releases  of PCRE followed the rules of RFC 2279, which         tion. Earlier releases of PCRE followed the rules of  RFC  2279,  which
168         allows the full range of 31-bit values (0 to 0x7FFFFFFF).  The  current         allows  the  full range of 31-bit values (0 to 0x7FFFFFFF). The current
169         check allows only values in the range U+0 to U+10FFFF, excluding U+D800         check allows only values in the range U+0 to U+10FFFF, excluding U+D800
170         to U+DFFF.         to U+DFFF.
171    
172         The excluded code points are the "Low Surrogate Area"  of  Unicode,  of         The  excluded  code  points are the "Low Surrogate Area" of Unicode, of
173         which  the Unicode Standard says this: "The Low Surrogate Area does not         which the Unicode Standard says this: "The Low Surrogate Area does  not
174         contain any  character  assignments,  consequently  no  character  code         contain  any  character  assignments,  consequently  no  character code
175         charts or namelists are provided for this area. Surrogates are reserved         charts or namelists are provided for this area. Surrogates are reserved
176         for use with UTF-16 and then must be used in pairs."  The  code  points         for  use  with  UTF-16 and then must be used in pairs." The code points
177         that  are  encoded  by  UTF-16  pairs are available as independent code         that are encoded by UTF-16 pairs  are  available  as  independent  code
178         points in the UTF-8 encoding. (In  other  words,  the  whole  surrogate         points  in  the  UTF-8  encoding.  (In other words, the whole surrogate
179         thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)         thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)
180    
181         If  an  invalid  UTF-8  string  is  passed  to  PCRE,  an  error return         If an  invalid  UTF-8  string  is  passed  to  PCRE,  an  error  return
182         (PCRE_ERROR_BADUTF8) is given. In some situations, you may already know         (PCRE_ERROR_BADUTF8) is given. In some situations, you may already know
183         that your strings are valid, and therefore want to skip these checks in         that your strings are valid, and therefore want to skip these checks in
184         order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag at         order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag at
185         compile  time  or at run time, PCRE assumes that the pattern or subject         compile time or at run time, PCRE assumes that the pattern  or  subject
186         it is given (respectively) contains only valid  UTF-8  codes.  In  this         it  is  given  (respectively)  contains only valid UTF-8 codes. In this
187         case, it does not diagnose an invalid UTF-8 string.         case, it does not diagnose an invalid UTF-8 string.
188    
189         If  you  pass  an  invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set,         If you pass an invalid UTF-8 string  when  PCRE_NO_UTF8_CHECK  is  set,
190         what happens depends on why the string is invalid. If the  string  con-         what  happens  depends on why the string is invalid. If the string con-
191         forms to the "old" definition of UTF-8 (RFC 2279), it is processed as a         forms to the "old" definition of UTF-8 (RFC 2279), it is processed as a
192         string of characters in the range 0  to  0x7FFFFFFF.  In  other  words,         string  of  characters  in  the  range 0 to 0x7FFFFFFF. In other words,
193         apart from the initial validity test, PCRE (when in UTF-8 mode) handles         apart from the initial validity test, PCRE (when in UTF-8 mode) handles
194         strings according to the more liberal rules of RFC  2279.  However,  if         strings  according  to  the more liberal rules of RFC 2279. However, if
195         the  string does not even conform to RFC 2279, the result is undefined.         the string does not even conform to RFC 2279, the result is  undefined.
196         Your program may crash.         Your program may crash.
197    
198         If you want to process strings  of  values  in  the  full  range  0  to         If  you  want  to  process  strings  of  values  in the full range 0 to
199         0x7FFFFFFF,  encoded in a UTF-8-like manner as per the old RFC, you can         0x7FFFFFFF, encoded in a UTF-8-like manner as per the old RFC, you  can
200         set PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in         set PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in
201         this situation, you will have to apply your own validity check.         this situation, you will have to apply your own validity check.
202    
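The passage above leaves the "own validity check" unspecified. As an illustration only, here is a minimal sketch in C of a check that follows the RFC 3629 limits quoted in this section (values U+0 to U+10FFFF, excluding U+D800 to U+DFFF). The function name is_valid_utf8_rfc3629 is hypothetical and is not part of the PCRE API. The rejection of overlong encodings comes from RFC 3629 itself rather than from the text above. A caller that wants the wider RFC 2279 range would need a different, more permissive check.

```c
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
   Returns 1 if the first len bytes of s are valid UTF-8 under RFC 3629,
   otherwise 0. */
int is_valid_utf8_rfc3629(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest value allowed for this length */
        size_t extra;       /* continuation bytes expected */
        size_t k;

        if (c < 0x80) {                     /* single-byte character */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* two-byte sequence */
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {    /* three-byte sequence */
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {    /* four-byte sequence */
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            /* Stray continuation byte, or a 5/6-byte lead byte that only
               the older RFC 2279 permitted. */
            return 0;
        }

        if (len - i <= extra)               /* sequence is truncated */
            return 0;

        for (k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)        /* not a continuation byte */
                return 0;
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)                       /* overlong encoding */
            return 0;
        if (cp > 0x10FFFF)                  /* beyond the RFC 3629 range */
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogate code points */
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

A program could run a check of this kind once on its input and then pass PCRE_NO_UTF8_CHECK on later calls, on the assumption that the strings are not modified in between.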
203     General comments about UTF-8 mode     General comments about UTF-8 mode
204    
205         1.  An  unbraced  hexadecimal  escape sequence (such as \xb3) matches a         1. An unbraced hexadecimal escape sequence (such  as  \xb3)  matches  a
206         two-byte UTF-8 character if the value is greater than 127.         two-byte UTF-8 character if the value is greater than 127.
207    
208         2. Octal numbers up to \777 are recognized, and  match  two-byte  UTF-8         2.  Octal  numbers  up to \777 are recognized, and match two-byte UTF-8
209         characters for values greater than \177.         characters for values greater than \177.
210    
211         3.  Repeat quantifiers apply to complete UTF-8 characters, not to indi-         3. Repeat quantifiers apply to complete UTF-8 characters, not to  indi-
212         vidual bytes, for example: \x{100}{3}.         vidual bytes, for example: \x{100}{3}.
213    
214         4. The dot metacharacter matches one UTF-8 character instead of a  sin-         4.  The dot metacharacter matches one UTF-8 character instead of a sin-
215         gle byte.         gle byte.
216    
217         5.  The  escape sequence \C can be used to match a single byte in UTF-8         5. The escape sequence \C can be used to match a single byte  in  UTF-8
218         mode, but its use can lead to some strange effects.  This  facility  is         mode,  but  its  use can lead to some strange effects. This facility is
219         not available in the alternative matching function, pcre_dfa_exec().         not available in the alternative matching function, pcre_dfa_exec().
220    
221         6.  The  character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly         6. The character escapes \b, \B, \d, \D, \s, \S, \w, and  \W  correctly
222         test characters of any code value, but the characters that PCRE  recog-         test  characters of any code value, but the characters that PCRE recog-
223         nizes  as  digits,  spaces,  or  word characters remain the same set as         nizes as digits, spaces, or word characters  remain  the  same  set  as
224         before, all with values less than 256. This remains true even when PCRE         before, all with values less than 256. This remains true even when PCRE
225         includes  Unicode  property support, because to do otherwise would slow         includes Unicode property support, because to do otherwise  would  slow
226         down PCRE in many common cases. If you really want to test for a  wider         down  PCRE in many common cases. If you really want to test for a wider
227         sense  of,  say,  "digit",  you must use Unicode property tests such as         sense of, say, "digit", you must use Unicode  property  tests  such  as
228         \p{Nd}. Note that this also applies to \b, because  it  is  defined  in         \p{Nd}.  Note  that  this  also applies to \b, because it is defined in
229         terms of \w and \W.         terms of \w and \W.
230    
231         7.  Similarly,  characters that match the POSIX named character classes         7. Similarly, characters that match the POSIX named  character  classes
232         are all low-valued characters.         are all low-valued characters.
233    
234         8. However, the Perl 5.10 horizontal and vertical  whitespace  matching         8.  However,  the Perl 5.10 horizontal and vertical whitespace matching
235         escapes (\h, \H, \v, and \V) do match all the appropriate Unicode char-         escapes (\h, \H, \v, and \V) do match all the appropriate Unicode char-
236         acters.         acters.
237    
238         9. Case-insensitive matching applies only to  characters  whose  values         9.  Case-insensitive  matching  applies only to characters whose values
239         are  less than 128, unless PCRE is built with Unicode property support.         are less than 128, unless PCRE is built with Unicode property  support.
240         Even when Unicode property support is available, PCRE  still  uses  its         Even  when  Unicode  property support is available, PCRE still uses its
241         own  character  tables when checking the case of low-valued characters,         own character tables when checking the case of  low-valued  characters,
242         so as not to degrade performance.  The Unicode property information  is         so  as not to degrade performance.  The Unicode property information is
243         used only for characters with higher values. Even when Unicode property         used only for characters with higher values. Even when Unicode property
244         support is available, PCRE supports case-insensitive matching only when         support is available, PCRE supports case-insensitive matching only when
245         there  is  a  one-to-one  mapping between a letter's cases. There are a         there is a one-to-one mapping between a letter's  cases.  There  are  a
246         small number of many-to-one mappings in Unicode;  these  are  not  sup-         small  number  of  many-to-one  mappings in Unicode; these are not sup-
247         ported by PCRE.         ported by PCRE.
248    
249    
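Items 1 and 2 in the list above say that values greater than 127 (such as \xb3, or octal values up to \777) correspond to two-byte UTF-8 characters. The following sketch in C shows why, by encoding a code point by hand. The function utf8_encode is a hypothetical helper for illustration and is not part of PCRE. It is restricted to the RFC 3629 range, which is an assumption made here for consistency with the validity rules described earlier.

```c
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
   Encodes code point cp as UTF-8 into out (room for 4 bytes) and returns
   the number of bytes written, or 0 if cp is a surrogate or is greater
   than 0x10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                        /* 0 to 127: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 128 to 2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are excluded */
        return 0;
    if (cp < 0x10000) {                     /* three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, the value 0xb3 encodes as the two bytes 0xC2 0xB3, and octal 777 (decimal 511) encodes as 0xC7 0xBF, so both fall in the two-byte range that items 1 and 2 describe.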
# Line 252  AUTHOR Line 253  AUTHOR
253         University Computing Service         University Computing Service
254         Cambridge CB2 3QH, England.         Cambridge CB2 3QH, England.
255    
256         Putting  an actual email address here seems to have been a spam magnet,         Putting an actual email address here seems to have been a spam  magnet,
257         so I've taken it away. If you want to email me, use  my  two  initials,         so  I've  taken  it away. If you want to email me, use my two initials,
258         followed by the two digits 10, at the domain cam.ac.uk.         followed by the two digits 10, at the domain cam.ac.uk.
259    
260    
261  REVISION  REVISION
262    
263         Last updated: 18 March 2009         Last updated: 11 April 2009
264         Copyright (c) 1997-2009 University of Cambridge.         Copyright (c) 1997-2009 University of Cambridge.
265  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
266    
# Line 1133  COMPILING A PATTERN Line 1134  COMPILING A PATTERN
1134    
1135         The options argument contains various bit settings that affect the com-         The options argument contains various bit settings that affect the com-
1136         pilation. It should be zero if no options are required.  The  available         pilation. It should be zero if no options are required.  The  available
1137         options  are  described  below. Some of them, in particular, those that         options  are  described  below. Some of them (in particular, those that
1138         are compatible with Perl, can also be set and  unset  from  within  the         are compatible with Perl, but also some others) can  also  be  set  and
1139         pattern  (see  the  detailed  description in the pcrepattern documenta-         unset  from  within  the  pattern  (see the detailed description in the
1140         tion). For these options, the contents of the options  argument  speci-         pcrepattern documentation). For those options that can be different  in
1141         fies  their initial settings at the start of compilation and execution.         different  parts  of  the pattern, the contents of the options argument
1142         The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at  the  time         specifies their initial settings at the start of compilation and execu-
1143         of matching as well as at compile time.         tion.  The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at the
1144           time of matching as well as at compile time.
1145    
1146         If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,         If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
1147         if compilation of a pattern fails,  pcre_compile()  returns  NULL,  and         if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
1148         sets the variable pointed to by errptr to point to a textual error mes-         sets the variable pointed to by errptr to point to a textual error mes-
1149         sage. This is a static string that is part of the library. You must not         sage. This is a static string that is part of the library. You must not
1150         try to free it. The offset from the start of the pattern to the charac-         try to free it. The offset from the start of the pattern to the charac-
1151         ter where the error was discovered is placed in the variable pointed to         ter where the error was discovered is placed in the variable pointed to
1152         by  erroffset,  which must not be NULL. If it is, an immediate error is         by erroffset, which must not be NULL. If it is, an immediate  error  is
1153         given.         given.
1154    
1155         If pcre_compile2() is used instead of pcre_compile(),  and  the  error-         If  pcre_compile2()  is  used instead of pcre_compile(), and the error-
1156         codeptr  argument is not NULL, a non-zero error code number is returned         codeptr argument is not NULL, a non-zero error code number is  returned
1157         via this argument in the event of an error. This is in addition to  the         via  this argument in the event of an error. This is in addition to the
1158         textual error message. Error codes and messages are listed below.         textual error message. Error codes and messages are listed below.
1159    
1160         If  the  final  argument, tableptr, is NULL, PCRE uses a default set of         If the final argument, tableptr, is NULL, PCRE uses a  default  set  of
1161         character tables that are  built  when  PCRE  is  compiled,  using  the         character  tables  that  are  built  when  PCRE  is compiled, using the
1162         default  C  locale.  Otherwise, tableptr must be an address that is the         default C locale. Otherwise, tableptr must be an address  that  is  the
1163         result of a call to pcre_maketables(). This value is  stored  with  the         result  of  a  call to pcre_maketables(). This value is stored with the
1164         compiled  pattern,  and used again by pcre_exec(), unless another table         compiled pattern, and used again by pcre_exec(), unless  another  table
1165         pointer is passed to it. For more discussion, see the section on locale         pointer is passed to it. For more discussion, see the section on locale
1166         support below.         support below.
1167    
1168         This  code  fragment  shows a typical straightforward call to pcre_com-         This code fragment shows a typical straightforward  call  to  pcre_com-
1169         pile():         pile():
1170    
1171           pcre *re;           pcre *re;
# Line 1176  COMPILING A PATTERN Line 1178  COMPILING A PATTERN
1178             &erroffset,       /* for error offset */             &erroffset,       /* for error offset */
1179             NULL);            /* use default character tables */             NULL);            /* use default character tables */
1180    
1181         The following names for option bits are defined in  the  pcre.h  header         The  following  names  for option bits are defined in the pcre.h header
1182         file:         file:
1183    
1184           PCRE_ANCHORED           PCRE_ANCHORED
1185    
1186         If this bit is set, the pattern is forced to be "anchored", that is, it         If this bit is set, the pattern is forced to be "anchored", that is, it
1187         is constrained to match only at the first matching point in the  string         is  constrained to match only at the first matching point in the string
1188         that  is being searched (the "subject string"). This effect can also be         that is being searched (the "subject string"). This effect can also  be
1189         achieved by appropriate constructs in the pattern itself, which is  the         achieved  by appropriate constructs in the pattern itself, which is the
1190         only way to do it in Perl.         only way to do it in Perl.
1191    
1192           PCRE_AUTO_CALLOUT           PCRE_AUTO_CALLOUT
1193    
1194         If this bit is set, pcre_compile() automatically inserts callout items,         If this bit is set, pcre_compile() automatically inserts callout items,
1195         all with number 255, before each pattern item. For  discussion  of  the         all  with  number  255, before each pattern item. For discussion of the
1196         callout facility, see the pcrecallout documentation.         callout facility, see the pcrecallout documentation.
1197    
1198           PCRE_BSR_ANYCRLF           PCRE_BSR_ANYCRLF
1199           PCRE_BSR_UNICODE           PCRE_BSR_UNICODE
1200    
1201         These options (which are mutually exclusive) control what the \R escape         These options (which are mutually exclusive) control what the \R escape
1202         sequence matches. The choice is either to match only CR, LF,  or  CRLF,         sequence  matches.  The choice is either to match only CR, LF, or CRLF,
1203         or to match any Unicode newline sequence. The default is specified when         or to match any Unicode newline sequence. The default is specified when
1204         PCRE is built. It can be overridden from within the pattern, or by set-         PCRE is built. It can be overridden from within the pattern, or by set-
1205         ting an option when a compiled pattern is matched.         ting an option when a compiled pattern is matched.
1206    
1207           PCRE_CASELESS           PCRE_CASELESS
1208    
1209         If  this  bit is set, letters in the pattern match both upper and lower         If this bit is set, letters in the pattern match both upper  and  lower
1210         case letters. It is equivalent to Perl's  /i  option,  and  it  can  be         case  letters.  It  is  equivalent  to  Perl's /i option, and it can be
1211         changed  within a pattern by a (?i) option setting. In UTF-8 mode, PCRE         changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE
1212         always understands the concept of case for characters whose values  are         always  understands the concept of case for characters whose values are
1213         less  than 128, so caseless matching is always possible. For characters         less than 128, so caseless matching is always possible. For  characters
1214         with higher values, the concept of case is supported if  PCRE  is  com-         with  higher  values,  the concept of case is supported if PCRE is com-
1215         piled  with Unicode property support, but not otherwise. If you want to         piled with Unicode property support, but not otherwise. If you want  to
1216         use caseless matching for characters 128 and  above,  you  must  ensure         use  caseless  matching  for  characters 128 and above, you must ensure
1217         that  PCRE  is  compiled  with Unicode property support as well as with         that PCRE is compiled with Unicode property support  as  well  as  with
1218         UTF-8 support.         UTF-8 support.
1219    
1220           PCRE_DOLLAR_ENDONLY           PCRE_DOLLAR_ENDONLY
1221    
1222         If this bit is set, a dollar metacharacter in the pattern matches  only         If  this bit is set, a dollar metacharacter in the pattern matches only
1223         at  the  end  of the subject string. Without this option, a dollar also         at the end of the subject string. Without this option,  a  dollar  also
1224         matches immediately before a newline at the end of the string (but  not         matches  immediately before a newline at the end of the string (but not
1225         before  any  other newlines). The PCRE_DOLLAR_ENDONLY option is ignored         before any other newlines). The PCRE_DOLLAR_ENDONLY option  is  ignored
1226         if PCRE_MULTILINE is set.  There is no equivalent  to  this  option  in         if  PCRE_MULTILINE  is  set.   There is no equivalent to this option in
1227         Perl, and no way to set it within a pattern.         Perl, and no way to set it within a pattern.
1228    
1229           PCRE_DOTALL           PCRE_DOTALL
1230    
1231         If this bit is set, a dot metacharacter in the pattern matches all         If this bit is set, a dot metacharacter in the pattern matches all
1232         characters, including those that indicate newline. Without it, a dot does         characters, including those that indicate newline. Without it, a dot does
1233         not  match  when  the  current position is at a newline. This option is         not match when the current position is at a  newline.  This  option  is
1234         equivalent to Perl's /s option, and it can be changed within a  pattern         equivalent  to Perl's /s option, and it can be changed within a pattern
1235         by  a (?s) option setting. A negative class such as [^a] always matches         by a (?s) option setting. A negative class such as [^a] always  matches
1236         newline characters, independent of the setting of this option.         newline characters, independent of the setting of this option.
1237    
1238           PCRE_DUPNAMES           PCRE_DUPNAMES
1239    
1240         If this bit is set, names used to identify capturing  subpatterns  need         If  this  bit is set, names used to identify capturing subpatterns need
1241         not be unique. This can be helpful for certain types of pattern when it         not be unique. This can be helpful for certain types of pattern when it
1242         is known that only one instance of the named  subpattern  can  ever  be         is  known  that  only  one instance of the named subpattern can ever be
1243         matched.  There  are  more details of named subpatterns below; see also         matched. There are more details of named subpatterns  below;  see  also
1244         the pcrepattern documentation.         the pcrepattern documentation.
1245    
1246           PCRE_EXTENDED           PCRE_EXTENDED
1247    
1248         If this bit is set, whitespace  data  characters  in  the  pattern  are         If  this  bit  is  set,  whitespace  data characters in the pattern are
1249         totally ignored except when escaped or inside a character class. White-         totally ignored except when escaped or inside a character class. White-
1250         space does not include the VT character (code 11). In addition, charac-         space does not include the VT character (code 11). In addition, charac-
1251         ters between an unescaped # outside a character class and the next new-         ters between an unescaped # outside a character class and the next new-
1252         line, inclusive, are also ignored. This  is  equivalent  to  Perl's  /x         line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x
1253         option,  and  it  can be changed within a pattern by a (?x) option set-         option, and it can be changed within a pattern by a  (?x)  option  set-
1254         ting.         ting.
1255    
1256         This option makes it possible to include  comments  inside  complicated         This  option  makes  it possible to include comments inside complicated
1257         patterns.   Note,  however,  that this applies only to data characters.         patterns.  Note, however, that this applies only  to  data  characters.
1258         Whitespace  characters  may  never  appear  within  special   character         Whitespace   characters  may  never  appear  within  special  character
1259         sequences  in  a  pattern,  for  example  within the sequence (?( which         sequences in a pattern, for  example  within  the  sequence  (?(  which
1260         introduces a conditional subpattern.         introduces a conditional subpattern.
1261    
1262           PCRE_EXTRA           PCRE_EXTRA
1263    
1264         This option was invented in order to turn on  additional  functionality         This  option  was invented in order to turn on additional functionality
1265         of  PCRE  that  is  incompatible with Perl, but it is currently of very         of PCRE that is incompatible with Perl, but it  is  currently  of  very
1266         little use. When set, any backslash in a pattern that is followed by  a         little  use. When set, any backslash in a pattern that is followed by a
1267         letter  that  has  no  special  meaning causes an error, thus reserving         letter that has no special meaning  causes  an  error,  thus  reserving
1268         these combinations for future expansion. By  default,  as  in  Perl,  a         these  combinations  for  future  expansion.  By default, as in Perl, a
1269         backslash  followed by a letter with no special meaning is treated as a         backslash followed by a letter with no special meaning is treated as  a
1270         literal. (Perl can, however, be persuaded to give a warning for  this.)         literal.  (Perl can, however, be persuaded to give a warning for this.)
1271         There  are  at  present no other features controlled by this option. It         There are at present no other features controlled by  this  option.  It
1272         can also be set by a (?X) option setting within a pattern.         can also be set by a (?X) option setting within a pattern.
1273    
1274           PCRE_FIRSTLINE           PCRE_FIRSTLINE
1275    
1276         If this option is set, an  unanchored  pattern  is  required  to  match         If  this  option  is  set,  an  unanchored pattern is required to match
1277         before  or  at  the  first  newline  in  the subject string, though the         before or at the first  newline  in  the  subject  string,  though  the
1278         matched text may continue over the newline.         matched text may continue over the newline.
1279    
1280           PCRE_JAVASCRIPT_COMPAT           PCRE_JAVASCRIPT_COMPAT
1281    
1282         If this option is set, PCRE's behaviour is changed in some ways so that         If this option is set, PCRE's behaviour is changed in some ways so that
1283         it  is  compatible with JavaScript rather than Perl. The changes are as         it is compatible with JavaScript rather than Perl. The changes  are  as
1284         follows:         follows:
1285    
1286         (1) A lone closing square bracket in a pattern  causes  a  compile-time         (1)  A  lone  closing square bracket in a pattern causes a compile-time
1287         error,  because this is illegal in JavaScript (by default it is treated         error, because this is illegal in JavaScript (by default it is  treated
1288         as a data character). Thus, the pattern AB]CD becomes illegal when this         as a data character). Thus, the pattern AB]CD becomes illegal when this
1289         option is set.         option is set.
1290    
1291         (2)  At run time, a back reference to an unset subpattern group matches         (2) At run time, a back reference to an unset subpattern group  matches
1292         an empty string (by default this causes the current  matching  alterna-         an  empty  string (by default this causes the current matching alterna-
1293         tive  to  fail). A pattern such as (\1)(a) succeeds when this option is         tive to fail). A pattern such as (\1)(a) succeeds when this  option  is
1294         set (assuming it can find an "a" in the subject), whereas it  fails  by         set  (assuming  it can find an "a" in the subject), whereas it fails by
1295         default, for Perl compatibility.         default, for Perl compatibility.
1296    
1297           PCRE_MULTILINE           PCRE_MULTILINE
1298    
1299         By  default,  PCRE  treats the subject string as consisting of a single         By default, PCRE treats the subject string as consisting  of  a  single
1300         line of characters (even if it actually contains newlines). The  "start         line  of characters (even if it actually contains newlines). The "start
1301         of  line"  metacharacter  (^)  matches only at the start of the string,         of line" metacharacter (^) matches only at the  start  of  the  string,
1302         while the "end of line" metacharacter ($) matches only at  the  end  of         while  the  "end  of line" metacharacter ($) matches only at the end of
1303         the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY         the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
1304         is set). This is the same as Perl.         is set). This is the same as Perl.
1305    
1306         When PCRE_MULTILINE is set, the "start of line" and "end of line"         When PCRE_MULTILINE is set, the "start of line" and "end of line"
1307         constructs  match  immediately following or immediately before internal         constructs match immediately following or immediately  before  internal
1308         newlines in the subject string, respectively, as well as  at  the  very         newlines  in  the  subject string, respectively, as well as at the very
1309         start  and  end.  This is equivalent to Perl's /m option, and it can be         start and end. This is equivalent to Perl's /m option, and  it  can  be
1310         changed within a pattern by a (?m) option setting. If there are no new-         changed within a pattern by a (?m) option setting. If there are no new-
1311         lines  in  a  subject string, or no occurrences of ^ or $ in a pattern,         lines in a subject string, or no occurrences of ^ or $  in  a  pattern,
1312         setting PCRE_MULTILINE has no effect.         setting PCRE_MULTILINE has no effect.
1313    
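The PCRE_MULTILINE rules above can be sketched as two small predicates. This is an illustration only, not PCRE's implementation: the function names are hypothetical, the newline is assumed to be a single LF, and PCRE_DOLLAR_ENDONLY is assumed to be unset.

```c
#include <assert.h>
#include <stddef.h>

/* Can "^" match at offset pos of the subject s[0..len)?
   Hypothetical helper; assumes LF is the newline sequence. */
int caret_can_match(const char *s, size_t len, size_t pos, int multiline)
{
    assert(s != NULL && pos <= len);
    if (pos == 0)
        return 1;                       /* very start of the subject */
    /* With multiline set, "^" also matches immediately after an
       internal newline, i.e. one that is not the last character. */
    return multiline && pos < len && s[pos - 1] == '\n';
}

/* Can "$" match at offset pos of the subject s[0..len)?
   Hypothetical helper; assumes LF and no PCRE_DOLLAR_ENDONLY. */
int dollar_can_match(const char *s, size_t len, size_t pos, int multiline)
{
    assert(s != NULL && pos <= len);
    if (pos == len)
        return 1;                       /* very end of the subject */
    if (s[pos] != '\n')
        return 0;
    /* Before a terminating newline in either mode; before an
       internal newline only when multiline is set. */
    return pos == len - 1 || multiline;
}
```

For the subject "ab\ncd", offset 3 is a valid position for "^" only when the multiline argument is non-zero, and offset 2 is a valid position for "$" under the same condition.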
1314           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
# Line 1315  COMPILING A PATTERN Line 1317  COMPILING A PATTERN
1317           PCRE_NEWLINE_ANYCRLF           PCRE_NEWLINE_ANYCRLF
1318           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
1319    
1320         These options override the default newline definition that  was  chosen         These  options  override the default newline definition that was chosen
1321         when  PCRE  was built. Setting the first or the second specifies that a         when PCRE was built. Setting the first or the second specifies  that  a
1322         newline is indicated by a single character (CR  or  LF,  respectively).         newline  is  indicated  by a single character (CR or LF, respectively).
1323         Setting  PCRE_NEWLINE_CRLF specifies that a newline is indicated by the         Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the
1324         two-character CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF  specifies         two-character  CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF specifies
1325         that any of the three preceding sequences should be recognized. Setting         that any of the three preceding sequences should be recognized. Setting
1326         PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should  be         PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be
1327         recognized. The Unicode newline sequences are the three just mentioned,         recognized. The Unicode newline sequences are the three just mentioned,
1328         plus the single characters VT (vertical  tab,  U+000B),  FF  (formfeed,         plus  the  single  characters  VT (vertical tab, U+000B), FF (formfeed,
1329         U+000C),  NEL  (next line, U+0085), LS (line separator, U+2028), and PS         U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS
1330         (paragraph separator, U+2029). The last  two  are  recognized  only  in         (paragraph  separator,  U+2029).  The  last  two are recognized only in
1331         UTF-8 mode.         UTF-8 mode.
1332    
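As a rough sketch of what "any Unicode newline sequence" means at the byte level, the function below reports the length of a newline sequence at the start of a buffer. It is written from the list above and is not PCRE's own code; the function name and the utf8 flag are hypothetical. In UTF-8 mode NEL, LS and PS are assumed to appear in their encoded forms (C2 85, E2 80 A8, E2 80 A9).

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the newline sequence at p, or 0 if there is none.
   avail is the number of readable bytes at p; utf8 selects UTF-8 mode. */
size_t any_newline_length(const unsigned char *p, size_t avail, int utf8)
{
    assert(p != NULL);
    if (avail == 0)
        return 0;
    switch (p[0]) {
    case 0x0D:                          /* CR, or CR LF as one sequence */
        return (avail >= 2 && p[1] == 0x0A) ? 2 : 1;
    case 0x0A:                          /* LF */
    case 0x0B:                          /* VT */
    case 0x0C:                          /* FF */
        return 1;
    case 0x85:                          /* NEL as a single byte */
        return utf8 ? 0 : 1;
    case 0xC2:                          /* NEL encoded in UTF-8 */
        return (utf8 && avail >= 2 && p[1] == 0x85) ? 2 : 0;
    case 0xE2:                          /* LS or PS, UTF-8 mode only */
        if (utf8 && avail >= 3 && p[1] == 0x80 &&
            (p[2] == 0xA8 || p[2] == 0xA9))
            return 3;
        return 0;
    default:
        return 0;
    }
}
```

Note that CR LF is counted as one two-byte sequence rather than two separate newlines.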
1333         The  newline  setting  in  the  options  word  uses three bits that are         The newline setting in the  options  word  uses  three  bits  that  are
1334         treated as a number, giving eight possibilities. Currently only six are         treated as a number, giving eight possibilities. Currently only six are
1335         used  (default  plus the five values above). This means that if you set         used (default plus the five values above). This means that if  you  set
1336         more than one newline option, the combination may or may not be  sensi-         more  than one newline option, the combination may or may not be sensi-
1337         ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to         ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
1338         PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers  and         PCRE_NEWLINE_CRLF,  but other combinations may yield unused numbers and
1339         cause an error.         cause an error.
1340    
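The three-bit arithmetic described above can be checked with a few lines of C. The EX_ constants are local stand-ins so that the sketch compiles without pcre.h; their values are what the PCRE_NEWLINE_xxx macros are believed to be in this release and should be verified against your own header. The helper names are hypothetical.

```c
#include <assert.h>

/* Local copies of the newline option values (assumed; check pcre.h). */
#define EX_NEWLINE_CR       0x00100000
#define EX_NEWLINE_LF       0x00200000
#define EX_NEWLINE_CRLF     0x00300000
#define EX_NEWLINE_ANY      0x00400000
#define EX_NEWLINE_ANYCRLF  0x00500000
#define EX_NEWLINE_MASK     0x00700000  /* the three newline bits */

/* Returns 1 if the newline bits of options hold one of the six
   values currently in use (default plus the five settings). */
int newline_bits_in_use(int options)
{
    switch (options & EX_NEWLINE_MASK) {
    case 0:                             /* build-time default */
    case EX_NEWLINE_CR:
    case EX_NEWLINE_LF:
    case EX_NEWLINE_CRLF:
    case EX_NEWLINE_ANY:
    case EX_NEWLINE_ANYCRLF:
        return 1;
    default:
        return 0;                       /* unused number */
    }
}

/* ORs two newline options together and insists the result is usable. */
int combine_newline_options(int a, int b)
{
    int combined = (a | b) & EX_NEWLINE_MASK;
    assert(newline_bits_in_use(combined));
    return combined;
}
```

With these values, CR combined with LF gives the CRLF number, while LF combined with ANY gives a number that is not in use.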
1341         The  only time that a line break is specially recognized when compiling         The only time that a line break is specially recognized when  compiling
1342         a pattern is if PCRE_EXTENDED is set, and  an  unescaped  #  outside  a         a  pattern  is  if  PCRE_EXTENDED  is set, and an unescaped # outside a
1343         character  class  is  encountered.  This indicates a comment that lasts         character class is encountered. This indicates  a  comment  that  lasts
1344         until after the next line break sequence. In other circumstances,  line         until  after the next line break sequence. In other circumstances, line
1345         break   sequences   are   treated  as  literal  data,  except  that  in         break  sequences  are  treated  as  literal  data,   except   that   in
1346         PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters         PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters
1347         and are therefore ignored.         and are therefore ignored.
1348    
# Line 1350  COMPILING A PATTERN Line 1352  COMPILING A PATTERN
1352           PCRE_NO_AUTO_CAPTURE           PCRE_NO_AUTO_CAPTURE
1353    
1354         If this option is set, it disables the use of numbered capturing paren-         If this option is set, it disables the use of numbered capturing paren-
1355         theses  in the pattern. Any opening parenthesis that is not followed by         theses in the pattern. Any opening parenthesis that is not followed  by
1356         ? behaves as if it were followed by ?: but named parentheses can  still         ?  behaves as if it were followed by ?: but named parentheses can still
1357         be  used  for  capturing  (and  they acquire numbers in the usual way).         be used for capturing (and they acquire  numbers  in  the  usual  way).
1358         There is no equivalent of this option in Perl.         There is no equivalent of this option in Perl.
1359    
1360           PCRE_UNGREEDY           PCRE_UNGREEDY
1361    
1362         This option inverts the "greediness" of the quantifiers  so  that  they         This  option  inverts  the "greediness" of the quantifiers so that they
1363         are  not greedy by default, but become greedy if followed by "?". It is         are not greedy by default, but become greedy if followed by "?". It  is
1364         not compatible with Perl. It can also be set by a (?U)  option  setting         not  compatible  with Perl. It can also be set by a (?U) option setting
1365         within the pattern.         within the pattern.
1366    
1367           PCRE_UTF8           PCRE_UTF8
1368    
1369         This  option  causes PCRE to regard both the pattern and the subject as         This option causes PCRE to regard both the pattern and the  subject  as
1370         strings of UTF-8 characters instead of single-byte  character  strings.         strings  of  UTF-8 characters instead of single-byte character strings.
1371         However,  it is available only when PCRE is built to include UTF-8 sup-         However, it is available only when PCRE is built to include UTF-8  sup-
1372         port. If not, the use of this option provokes an error. Details of  how         port.  If not, the use of this option provokes an error. Details of how
1373         this  option  changes the behaviour of PCRE are given in the section on         this option changes the behaviour of PCRE are given in the  section  on
1374         UTF-8 support in the main pcre page.         UTF-8 support in the main pcre page.
1375    
1376           PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
1377    
1378         When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is         When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
1379         automatically  checked.  There  is  a  discussion about the validity of         automatically checked. There is a  discussion  about  the  validity  of
1380         UTF-8 strings in the main pcre page. If an invalid  UTF-8  sequence  of         UTF-8  strings  in  the main pcre page. If an invalid UTF-8 sequence of
1381         bytes  is  found,  pcre_compile() returns an error. If you already know         bytes is found, pcre_compile() returns an error. If  you  already  know
1382         that your pattern is valid, and you want to skip this check for perfor-         that your pattern is valid, and you want to skip this check for perfor-
1383         mance  reasons,  you  can set the PCRE_NO_UTF8_CHECK option. When it is         mance reasons, you can set the PCRE_NO_UTF8_CHECK option.  When  it  is
1384         set, the effect of passing an invalid UTF-8  string  as  a  pattern  is         set,  the  effect  of  passing  an invalid UTF-8 string as a pattern is
1385         undefined.  It  may  cause your program to crash. Note that this option         undefined. It may cause your program to crash. Note  that  this  option
1386         can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress  the         can  also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the
1387         UTF-8 validity checking of subject strings.         UTF-8 validity checking of subject strings.
1388    
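If you intend to set PCRE_NO_UTF8_CHECK, you need some other assurance that the string is valid. One option is to validate it once yourself, along the lines of the sketch below. The function name is hypothetical, and the check follows RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF), which may be stricter than the rules PCRE itself applies; see the main pcre page for those.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8 per RFC 3629, else 0. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;               /* decoded code point */
        size_t extra, k;                /* continuation bytes expected */

        if (c < 0x80) { i++; continue; }            /* ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return 0;                  /* stray continuation or bad lead */

        if (len - i <= extra)
            return 0;                   /* sequence is truncated */
        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong encodings and out-of-range values. */
        if (extra == 1 && cp < 0x80) return 0;
        if (extra == 2 && cp < 0x800) return 0;
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates */

        i += extra + 1;
    }
    return 1;
}
```

A caller would run this once on a pattern or subject and pass PCRE_NO_UTF8_CHECK only when it returns 1.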
1389    
1390  COMPILATION ERROR CODES  COMPILATION ERROR CODES
1391    
1392         The  following  table  lists  the  error  codes that may be returned by         The following table lists the error  codes  that  may  be  returned  by
1393         pcre_compile2(), along with the error messages that may be returned  by         pcre_compile2(),  along with the error messages that may be returned by
1394         both  compiling functions. As PCRE has developed, some error codes have         both compiling functions. As PCRE has developed, some error codes  have
1395         fallen out of use. To avoid confusion, they have not been re-used.         fallen out of use. To avoid confusion, they have not been re-used.
1396    
1397            0  no error            0  no error
# Line 1445  COMPILATION ERROR CODES Line 1447  COMPILATION ERROR CODES
1447           50  [this code is not in use]           50  [this code is not in use]
1448           51  octal value is greater than \377 (not in UTF-8 mode)           51  octal value is greater than \377 (not in UTF-8 mode)
1449           52  internal error: overran compiling workspace           52  internal error: overran compiling workspace
1450           53  internal  error:  previously-checked  referenced  subpattern  not           53   internal  error:  previously-checked  referenced  subpattern not
1451         found         found
1452           54  DEFINE group contains more than one branch           54  DEFINE group contains more than one branch
1453           55  repeating a DEFINE group is not allowed           55  repeating a DEFINE group is not allowed
# Line 1460  COMPILATION ERROR CODES Line 1462  COMPILATION ERROR CODES
1462           63  digit expected after (?+           63  digit expected after (?+
1463           64  ] is an invalid data character in JavaScript compatibility mode           64  ] is an invalid data character in JavaScript compatibility mode
1464    
1465         The  numbers  32  and 10000 in errors 48 and 49 are defaults; different         The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different
1466         values may be used if the limits were changed when PCRE was built.         values may be used if the limits were changed when PCRE was built.
1467    
1468    
# Line 1469  STUDYING A PATTERN Line 1471  STUDYING A PATTERN
1471         pcre_extra *pcre_study(const pcre *code, int options,         pcre_extra *pcre_study(const pcre *code, int options,
1472              const char **errptr);              const char **errptr);
1473    
1474         If a compiled pattern is going to be used several times,  it  is  worth         If  a  compiled  pattern is going to be used several times, it is worth
1475         spending more time analyzing it in order to speed up the time taken for         spending more time analyzing it in order to speed up the time taken for
1476         matching. The function pcre_study() takes a pointer to a compiled  pat-         matching.  The function pcre_study() takes a pointer to a compiled pat-
1477         tern as its first argument. If studying the pattern produces additional         tern as its first argument. If studying the pattern produces additional
1478         information that will help speed up matching,  pcre_study()  returns  a         information  that  will  help speed up matching, pcre_study() returns a
1479         pointer  to a pcre_extra block, in which the study_data field points to         pointer to a pcre_extra block, in which the study_data field points  to
1480         the results of the study.         the results of the study.
1481    
1482         The  returned  value  from  pcre_study()  can  be  passed  directly  to         The  returned  value  from  pcre_study()  can  be  passed  directly  to
1483         pcre_exec().  However,  a  pcre_extra  block also contains other fields         pcre_exec(). However, a pcre_extra block  also  contains  other  fields
1484         that can be set by the caller before the block  is  passed;  these  are         that  can  be  set  by the caller before the block is passed; these are
1485         described below in the section on matching a pattern.         described below in the section on matching a pattern.
1486    
1487         If  studying  the  pattern  does not produce any additional information,         If studying the pattern does not  produce  any  additional  information,
1488         pcre_study() returns NULL. In that circumstance, if the calling program         pcre_study() returns NULL. In that circumstance, if the calling program
1489         wants  to  pass  any of the other fields to pcre_exec(), it must set up         wants to pass any of the other fields to pcre_exec(), it  must  set  up
1490         its own pcre_extra block.         its own pcre_extra block.
1491    
1492         The second argument of pcre_study() contains option bits.  At  present,         The  second  argument of pcre_study() contains option bits. At present,
1493         no options are defined, and this argument should always be zero.         no options are defined, and this argument should always be zero.
1494    
1495         The  third argument for pcre_study() is a pointer for an error message.         The third argument for pcre_study() is a pointer for an error  message.
1496         If studying succeeds (even if no data is  returned),  the  variable  it         If  studying  succeeds  (even  if no data is returned), the variable it
1497         points  to  is  set  to NULL. Otherwise it is set to point to a textual         points to is set to NULL. Otherwise it is set to  point  to  a  textual
1498         error message. This is a static string that is part of the library. You         error message. This is a static string that is part of the library. You
1499         must  not  try  to  free it. You should test the error pointer for NULL         must not try to free it. You should test the  error  pointer  for  NULL
1500         after calling pcre_study(), to be sure that it has run successfully.         after calling pcre_study(), to be sure that it has run successfully.
1501    
1502         This is a typical call to pcre_study():         This is a typical call to pcre_study():
# Line 1506  STUDYING A PATTERN Line 1508  STUDYING A PATTERN
1508             &error);        /* set to NULL or points to a message */             &error);        /* set to NULL or points to a message */
1509    
1510         At present, studying a pattern is useful only for non-anchored patterns         At present, studying a pattern is useful only for non-anchored patterns
1511         that  do not have a single fixed starting character. A bitmap of possi-         that do not have a single fixed starting character. A bitmap of  possi-
1512         ble starting bytes is created.         ble starting bytes is created.
1513    
1514    
1515  LOCALE SUPPORT  LOCALE SUPPORT
1516    
1517         PCRE handles caseless matching, and determines whether  characters  are         PCRE  handles  caseless matching, and determines whether characters are
1518         letters,  digits, or whatever, by reference to a set of tables, indexed         letters, digits, or whatever, by reference to a set of tables,  indexed
1519         by character value. When running in UTF-8 mode, this  applies  only  to         by  character  value.  When running in UTF-8 mode, this applies only to
1520         characters  with  codes  less than 128. Higher-valued codes never match         characters with codes less than 128. Higher-valued  codes  never  match
1521         escapes such as \w or \d, but can be tested with \p if  PCRE  is  built         escapes  such  as  \w or \d, but can be tested with \p if PCRE is built
1522         with  Unicode  character property support. The use of locales with Uni-         with Unicode character property support. The use of locales  with  Uni-
1523         code is discouraged. If you are handling characters with codes  greater         code  is discouraged. If you are handling characters with codes greater
1524         than  128, you should either use UTF-8 and Unicode, or use locales, but         than 128, you should either use UTF-8 and Unicode, or use locales,  but
1525         not try to mix the two.         not try to mix the two.
1526    
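The idea of a table "indexed by character value" that depends on the locale can be illustrated without PCRE at all. The sketch below fills a 256-entry letter table from the C library's isalpha() under whatever LC_CTYPE locale is current. It is only an analogy: the function name is hypothetical and the layout is not that of PCRE's real tables.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>   /* callers use setlocale() before building */

/* Fills table[c] with 1 if byte value c is a letter in the current
   LC_CTYPE locale, else 0. Hypothetical helper for illustration. */
void build_letter_table(unsigned char table[256])
{
    int c;

    assert(table != NULL);
    for (c = 0; c < 256; c++)
        table[c] = (unsigned char)(isalpha(c) != 0);
}
```

Calling setlocale(LC_CTYPE, "C") before build_letter_table() marks only the ASCII letters. Under a single-byte French locale, some byte values above 127 would typically be marked as well.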
1527         PCRE contains an internal set of tables that are used  when  the  final         PCRE  contains  an  internal set of tables that are used when the final
1528         argument  of  pcre_compile()  is  NULL.  These  are sufficient for many         argument of pcre_compile() is  NULL.  These  are  sufficient  for  many
1529         applications.  Normally, the internal tables recognize only ASCII char-         applications.  Normally, the internal tables recognize only ASCII char-
1530         acters. However, when PCRE is built, it is possible to cause the inter-         acters. However, when PCRE is built, it is possible to cause the inter-
1531         nal tables to be rebuilt in the default "C" locale of the local system,         nal tables to be rebuilt in the default "C" locale of the local system,
1532         which may cause them to be different.         which may cause them to be different.
1533    
1534         The  internal tables can always be overridden by tables supplied by the         The internal tables can always be overridden by tables supplied by  the
1535         application that calls PCRE. These may be created in a different locale         application that calls PCRE. These may be created in a different locale
1536         from  the  default.  As more and more applications change to using Uni-         from the default. As more and more applications change  to  using  Uni-
1537         code, the need for this locale support is expected to die away.         code, the need for this locale support is expected to die away.
1538    
1539         External tables are built by calling  the  pcre_maketables()  function,         External  tables  are  built by calling the pcre_maketables() function,
1540         which  has no arguments, in the relevant locale. The result can then be         which has no arguments, in the relevant locale. The result can then  be
1541         passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For         passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For
1542         example,  to  build  and use tables that are appropriate for the French         example, to build and use tables that are appropriate  for  the  French
1543         locale (where accented characters with  values  greater  than  128  are         locale  (where  accented  characters  with  values greater than 128 are
1544         treated as letters), the following code could be used:         treated as letters), the following code could be used:
1545    
1546           setlocale(LC_CTYPE, "fr_FR");           setlocale(LC_CTYPE, "fr_FR");
1547           tables = pcre_maketables();           tables = pcre_maketables();
1548           re = pcre_compile(..., tables);           re = pcre_compile(..., tables);
1549    
1550         The  locale  name "fr_FR" is used on Linux and other Unix-like systems;         The locale name "fr_FR" is used on Linux and other  Unix-like  systems;
1551         if you are using Windows, the name for the French locale is "french".         if you are using Windows, the name for the French locale is "french".
1552    
1553         When pcre_maketables() runs, the tables are built  in  memory  that  is         When  pcre_maketables()  runs,  the  tables are built in memory that is
1554         obtained  via  pcre_malloc. It is the caller's responsibility to ensure         obtained via pcre_malloc. It is the caller's responsibility  to  ensure
1555         that the memory containing the tables remains available for as long  as         that  the memory containing the tables remains available for as long as
1556         it is needed.         it is needed.
1557    
1558         The pointer that is passed to pcre_compile() is saved with the compiled         The pointer that is passed to pcre_compile() is saved with the compiled
1559         pattern, and the same tables are used via this pointer by  pcre_study()         pattern,  and the same tables are used via this pointer by pcre_study()
1560         and normally also by pcre_exec(). Thus, by default, for any single pat-         and normally also by pcre_exec(). Thus, by default, for any single pat-
1561         tern, compilation, studying and matching all happen in the same locale,         tern, compilation, studying and matching all happen in the same locale,
1562         but different patterns can be compiled in different locales.         but different patterns can be compiled in different locales.
1563    
1564         It  is  possible to pass a table pointer or NULL (indicating the use of         It is possible to pass a table pointer or NULL (indicating the  use  of
1565         the internal tables) to pcre_exec(). Although  not  intended  for  this         the  internal  tables)  to  pcre_exec(). Although not intended for this
1566         purpose,  this facility could be used to match a pattern in a different         purpose, this facility could be used to match a pattern in a  different
1567         locale from the one in which it was compiled. Passing table pointers at         locale from the one in which it was compiled. Passing table pointers at
1568         run time is discussed below in the section on matching a pattern.         run time is discussed below in the section on matching a pattern.
1569    
# Line 1571  INFORMATION ABOUT A PATTERN Line 1573  INFORMATION ABOUT A PATTERN
1573         int pcre_fullinfo(const pcre *code, const pcre_extra *extra,         int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
1574              int what, void *where);              int what, void *where);
1575    
1576         The  pcre_fullinfo() function returns information about a compiled pat-         The pcre_fullinfo() function returns information about a compiled  pat-
1577         tern. It replaces the obsolete pcre_info() function, which is neverthe-         tern. It replaces the obsolete pcre_info() function, which is neverthe-
1578         less retained for backwards compatibility (and is documented below).         less retained for backwards compatibility (and is documented below).
1579    
1580         The  first  argument  for  pcre_fullinfo() is a pointer to the compiled         The first argument for pcre_fullinfo() is a  pointer  to  the  compiled
1581         pattern. The second argument is the result of pcre_study(), or NULL  if         pattern.  The second argument is the result of pcre_study(), or NULL if
1582         the  pattern  was not studied. The third argument specifies which piece         the pattern was not studied. The third argument specifies  which  piece
1583         of information is required, and the fourth argument is a pointer  to  a         of  information  is required, and the fourth argument is a pointer to a
1584         variable  to  receive  the  data. The yield of the function is zero for         variable to receive the data. The yield of the  function  is  zero  for
1585         success, or one of the following negative numbers:         success, or one of the following negative numbers:
1586    
1587           PCRE_ERROR_NULL       the argument code was NULL           PCRE_ERROR_NULL       the argument code was NULL
# Line 1587  INFORMATION ABOUT A PATTERN Line 1589  INFORMATION ABOUT A PATTERN
1589           PCRE_ERROR_BADMAGIC   the "magic number" was not found           PCRE_ERROR_BADMAGIC   the "magic number" was not found
1590           PCRE_ERROR_BADOPTION  the value of what was invalid           PCRE_ERROR_BADOPTION  the value of what was invalid
1591    
1592         The "magic number" is placed at the start of each compiled  pattern  as         The  "magic  number" is placed at the start of each compiled pattern as
1593         a  simple check against passing an arbitrary memory pointer. Here is a         a simple check against passing an arbitrary memory pointer. Here is  a
1594         typical call of pcre_fullinfo(), to obtain the length of  the  compiled         typical  call  of pcre_fullinfo(), to obtain the length of the compiled
1595         pattern:         pattern:
1596    
1597           int rc;           int rc;
# Line 1600  INFORMATION ABOUT A PATTERN Line 1602  INFORMATION ABOUT A PATTERN
1602             PCRE_INFO_SIZE,   /* what is required */             PCRE_INFO_SIZE,   /* what is required */
1603             &length);         /* where to put the data */             &length);         /* where to put the data */
1604    
1605         The  possible  values for the third argument are defined in pcre.h, and         The possible values for the third argument are defined in  pcre.h,  and
1606         are as follows:         are as follows:
1607    
1608           PCRE_INFO_BACKREFMAX           PCRE_INFO_BACKREFMAX
1609    
1610         Return the number of the highest back reference  in  the  pattern.  The         Return  the  number  of  the highest back reference in the pattern. The
1611         fourth  argument  should  point to an int variable. Zero is returned if         fourth argument should point to an int variable. Zero  is  returned  if
1612         there are no back references.         there are no back references.
1613    
1614           PCRE_INFO_CAPTURECOUNT           PCRE_INFO_CAPTURECOUNT
1615    
1616         Return the number of capturing subpatterns in the pattern.  The  fourth         Return  the  number of capturing subpatterns in the pattern. The fourth
1617         argument should point to an int variable.         argument should point to an int variable.
1618    
1619           PCRE_INFO_DEFAULT_TABLES           PCRE_INFO_DEFAULT_TABLES
1620    
1621         Return  a pointer to the internal default character tables within PCRE.         Return a pointer to the internal default character tables within  PCRE.
1622         The fourth argument should point to an unsigned char *  variable.  This         The  fourth  argument should point to an unsigned char * variable. This
1623         information call is provided for internal use by the pcre_study() func-         information call is provided for internal use by the pcre_study() func-
1624         tion. External callers can cause PCRE to use  its  internal  tables  by         tion.  External  callers  can  cause PCRE to use its internal tables by
1625         passing a NULL table pointer.         passing a NULL table pointer.
1626    
1627           PCRE_INFO_FIRSTBYTE           PCRE_INFO_FIRSTBYTE
1628    
1629         Return  information  about  the first byte of any matched string, for a         Return information about the first byte of any matched  string,  for  a
1630         non-anchored pattern. The fourth argument should point to an int  vari-         non-anchored  pattern. The fourth argument should point to an int vari-
1631         able.  (This option used to be called PCRE_INFO_FIRSTCHAR; the old name         able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old  name
1632         is still recognized for backwards compatibility.)         is still recognized for backwards compatibility.)
1633    
1634         If there is a fixed first byte, for example, from  a  pattern  such  as         If  there  is  a  fixed first byte, for example, from a pattern such as
1635         (cat|cow|coyote), its value is returned. Otherwise, if either         (cat|cow|coyote), its value is returned. Otherwise, if either
1636    
1637         (a)  the pattern was compiled with the PCRE_MULTILINE option, and every         (a) the pattern was compiled with the PCRE_MULTILINE option, and  every
1638         branch starts with "^", or         branch starts with "^", or
1639    
1640         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not         (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
1641         set (if it were set, the pattern would be anchored),         set (if it were set, the pattern would be anchored),
1642    
1643         -1  is  returned, indicating that the pattern matches only at the start         -1 is returned, indicating that the pattern matches only at  the  start
1644         of a subject string or after any newline within the  string.  Otherwise         of  a  subject string or after any newline within the string. Otherwise
1645         -2 is returned. For anchored patterns, -2 is returned.         -2 is returned. For anchored patterns, -2 is returned.
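
A minimal sketch of how a caller might interpret the three kinds of value described above. The enum and the function name are hypothetical helpers for illustration and are not part of PCRE; the int passed in is assumed to be the one filled in by a PCRE_INFO_FIRSTBYTE request.

```c
/* Hypothetical helper, not part of PCRE: classify the int obtained
   from a PCRE_INFO_FIRSTBYTE request. */
typedef enum {
    FIRST_FIXED_BYTE,     /* value >= 0: a fixed first byte was found   */
    FIRST_AT_LINE_START,  /* value == -1: matches only at the start of
                             the subject or after a newline             */
    FIRST_UNKNOWN         /* value == -2: nothing useful is known, or
                             the pattern is anchored                    */
} first_kind;

first_kind classify_firstbyte(int value)
{
    if (value >= 0)
        return FIRST_FIXED_BYTE;
    if (value == -1)
        return FIRST_AT_LINE_START;
    return FIRST_UNKNOWN;
}
```

A caller would typically branch on the result before deciding whether any cheap pre-scan of the subject is worthwhile.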
1646    
1647           PCRE_INFO_FIRSTTABLE           PCRE_INFO_FIRSTTABLE
1648    
1649         If  the pattern was studied, and this resulted in the construction of a         If the pattern was studied, and this resulted in the construction of  a
1650         256-bit table indicating a fixed set of bytes for the first byte in any         256-bit table indicating a fixed set of bytes for the first byte in any
1651         matching  string, a pointer to the table is returned. Otherwise NULL is         matching string, a pointer to the table is returned. Otherwise NULL  is
1652         returned. The fourth argument should point to an unsigned char *  vari-         returned.  The fourth argument should point to an unsigned char * vari-
1653         able.         able.
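
A 256-bit table occupies 32 bytes, one bit per possible byte value. The sketch below shows one way such a table could be tested. The bit layout (bit c & 7 of byte c / 8) is an assumption made for this example and is not stated in the text above, and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: mark byte value c in a 32-byte bit table.
   ASSUMPTION: bit (c & 7) of table[c / 8] stands for byte value c. */
void first_table_mark(unsigned char *table, unsigned char c)
{
    table[c / 8] |= (unsigned char)(1u << (c & 7));
}

/* Hypothetical helper: return 1 if byte value c is marked in the
   table, otherwise 0. A NULL table (no table was built) is treated
   as "no restriction known", so every byte is allowed. */
int first_table_allows(const unsigned char *table, unsigned char c)
{
    if (table == NULL)
        return 1;
    return (table[c / 8] & (1u << (c & 7))) != 0;
}
```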
1654    
1655           PCRE_INFO_HASCRORLF           PCRE_INFO_HASCRORLF
1656    
1657         Return  1  if  the  pattern  contains any explicit matches for CR or LF         Return 1 if the pattern contains any explicit  matches  for  CR  or  LF
1658         characters, otherwise 0. The fourth argument should  point  to  an  int         characters,  otherwise  0.  The  fourth argument should point to an int
1659         variable.  An explicit match is either a literal CR or LF character, or         variable. An explicit match is either a literal CR or LF character,  or
1660         \r or \n.         \r or \n.
1661    
1662           PCRE_INFO_JCHANGED           PCRE_INFO_JCHANGED
1663    
1664         Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,         Return  1  if  the (?J) or (?-J) option setting is used in the pattern,
1665         otherwise  0. The fourth argument should point to an int variable. (?J)         otherwise 0. The fourth argument should point to an int variable.  (?J)
1666         and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.         and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.
1667    
1668           PCRE_INFO_LASTLITERAL           PCRE_INFO_LASTLITERAL
1669    
1670         Return the value of the rightmost literal byte that must exist  in  any         Return  the  value of the rightmost literal byte that must exist in any
1671         matched  string,  other  than  at  its  start,  if such a byte has been         matched string, other than at its  start,  if  such  a  byte  has  been
1672         recorded. The fourth argument should point to an int variable. If there         recorded. The fourth argument should point to an int variable. If there
1673         is  no such byte, -1 is returned. For anchored patterns, a last literal         is no such byte, -1 is returned. For anchored patterns, a last  literal
1674         byte is recorded only if it follows something of variable  length.  For         byte  is  recorded only if it follows something of variable length. For
1675         example, for the pattern /^a\d+z\d+/ the returned value is "z", but for         example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
1676         /^a\dz\d/ the returned value is -1.         /^a\dz\d/ the returned value is -1.
1677    
# Line 1677  INFORMATION ABOUT A PATTERN Line 1679  INFORMATION ABOUT A PATTERN
1679           PCRE_INFO_NAMEENTRYSIZE           PCRE_INFO_NAMEENTRYSIZE
1680           PCRE_INFO_NAMETABLE           PCRE_INFO_NAMETABLE
1681    
1682         PCRE supports the use of named as well as numbered capturing  parenthe-         PCRE  supports the use of named as well as numbered capturing parenthe-
1683         ses.  The names are just an additional way of identifying the parenthe-         ses. The names are just an additional way of identifying the  parenthe-
1684         ses, which still acquire numbers. Several convenience functions such as         ses, which still acquire numbers. Several convenience functions such as
1685         pcre_get_named_substring()  are  provided  for extracting captured sub-         pcre_get_named_substring() are provided for  extracting  captured  sub-
1686         strings by name. It is also possible to extract the data  directly,  by         strings  by  name. It is also possible to extract the data directly, by
1687         first  converting  the  name to a number in order to access the correct         first converting the name to a number in order to  access  the  correct
1688         pointers in the output vector (described with pcre_exec() below). To do         pointers in the output vector (described with pcre_exec() below). To do
1689         the  conversion,  you  need  to  use  the  name-to-number map, which is         the conversion, you need  to  use  the  name-to-number  map,  which  is
1690         described by these three values.         described by these three values.
1691    
1692         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
1693         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
1694         of each entry; both of these  return  an  int  value.  The  entry  size         of  each  entry;  both  of  these  return  an int value. The entry size
1695         depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns         depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns
1696         a pointer to the first entry of the table  (a  pointer  to  char).  The         a  pointer  to  the  first  entry of the table (a pointer to char). The
1697         first two bytes of each entry are the number of the capturing parenthe-         first two bytes of each entry are the number of the capturing parenthe-
1698         sis, most significant byte first. The rest of the entry is  the  corre-         sis,  most  significant byte first. The rest of the entry is the corre-
1699         sponding  name,  zero  terminated. The names are in alphabetical order.         sponding name, zero terminated. The names are  in  alphabetical  order.
1700         When PCRE_DUPNAMES is set, duplicate names are in order of their paren-         When PCRE_DUPNAMES is set, duplicate names are in order of their paren-
1701         theses  numbers.  For  example,  consider the following pattern (assume         theses numbers. For example, consider  the  following  pattern  (assume
1702         PCRE_EXTENDED is  set,  so  white  space  -  including  newlines  -  is         PCRE_EXTENDED  is  set,  so  white  space  -  including  newlines  - is
1703         ignored):         ignored):
1704    
1705           (?<date> (?<year>(\d\d)?\d\d) -           (?<date> (?<year>(\d\d)?\d\d) -
1706           (?<month>\d\d) - (?<day>\d\d) )           (?<month>\d\d) - (?<day>\d\d) )
1707    
1708         There  are  four  named subpatterns, so the table has four entries, and         There are four named subpatterns, so the table has  four  entries,  and
1709         each entry in the table is eight bytes long. The table is  as  follows,         each  entry  in the table is eight bytes long. The table is as follows,
1710         with non-printing bytes shown in hexadecimal, and undefined bytes shown         with non-printing bytes shown in hexadecimal, and undefined bytes shown

1711         as ??:         as ??:
1712    
# Line 1713  INFORMATION ABOUT A PATTERN Line 1715  INFORMATION ABOUT A PATTERN
1715           00 04 m  o  n  t  h  00           00 04 m  o  n  t  h  00
1716           00 02 y  e  a  r  00 ??           00 02 y  e  a  r  00 ??
1717    
1718         When writing code to extract data  from  named  subpatterns  using  the         When  writing  code  to  extract  data from named subpatterns using the
1719         name-to-number  map,  remember that the length of the entries is likely         name-to-number map, remember that the length of the entries  is  likely
1720         to be different for each compiled pattern.         to be different for each compiled pattern.
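
The lookup described above can be sketched as a linear scan over the fixed-size entries. The function name and the sample table are hypothetical. The sample bytes follow the layout described for the date pattern (numbers derived from the order of its opening parentheses), with the undefined bytes filled with zero. In a real program the three inputs would come from the PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE requests.

```c
#include <string.h>

/* Illustrative table for the date pattern: four entries of eight
   bytes each, in alphabetical order of name. */
static const unsigned char demo_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0, 0,
    0, 5, 'd', 'a', 'y', 0, 0, 0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0, 0
};

/* Hypothetical helper: return the parenthesis number recorded for
   name, or -1 if the name is not in the table. */
int name_to_number(const unsigned char *table, int name_count,
                   int entry_size, const char *name)
{
    int i;
    for (i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * entry_size;
        /* Bytes 0-1: group number, most significant byte first.
           Bytes 2 onwards: the zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

Because the entry size is passed in rather than hard-coded, the same function works for any compiled pattern. A binary search would also be possible, since the names are in alphabetical order.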
1721    
1722           PCRE_INFO_OKPARTIAL           PCRE_INFO_OKPARTIAL
1723    
1724         Return 1 if the pattern can be used for partial matching, otherwise  0.         Return  1 if the pattern can be used for partial matching, otherwise 0.
1725         The  fourth  argument  should point to an int variable. The pcrepartial         The fourth argument should point to an int  variable.  The  pcrepartial
1726         documentation lists the restrictions that apply to patterns  when  par-         documentation  lists  the restrictions that apply to patterns when par-
1727         tial matching is used.         tial matching is used.
1728    
1729           PCRE_INFO_OPTIONS           PCRE_INFO_OPTIONS
1730    
1731         Return  a  copy of the options with which the pattern was compiled. The         Return a copy of the options with which the pattern was  compiled.  The
1732         fourth argument should point to an unsigned long  int  variable.  These         fourth  argument  should  point to an unsigned long int variable. These
1733         option bits are those specified in the call to pcre_compile(), modified         option bits are those specified in the call to pcre_compile(), modified
1734         by any top-level option settings at the start of the pattern itself. In         by any top-level option settings at the start of the pattern itself. In
1735         other  words,  they are the options that will be in force when matching         other words, they are the options that will be in force  when  matching
1736         starts. For example, if the pattern /(?im)abc(?-i)d/ is  compiled  with         starts.  For  example, if the pattern /(?im)abc(?-i)d/ is compiled with
1737         the  PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,         the PCRE_EXTENDED option, the result is PCRE_CASELESS,  PCRE_MULTILINE,
1738         and PCRE_EXTENDED.         and PCRE_EXTENDED.
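
The /(?im)abc(?-i)d/ example can be reproduced with a toy model of how option settings at the very start of a pattern modify the compile-time options. This is a simplified sketch and not PCRE's parser: it handles only the letters i, m, s, and x, and the DEMO_ constants are stand-ins invented for this example (real code would use the PCRE_ option constants from pcre.h).

```c
/* Stand-in option bits for this sketch only. */
enum {
    DEMO_CASELESS  = 0x01,
    DEMO_MULTILINE = 0x02,
    DEMO_DOTALL    = 0x04,
    DEMO_EXTENDED  = 0x08
};

static unsigned long demo_flag_for(char c)
{
    switch (c) {
    case 'i': return DEMO_CASELESS;
    case 'm': return DEMO_MULTILINE;
    case 's': return DEMO_DOTALL;
    case 'x': return DEMO_EXTENDED;
    default:  return 0;
    }
}

/* Apply any (?...) option groups that appear at the very start of
   the pattern; settings later in the pattern are ignored, because
   they are not in force when matching starts. */
unsigned long demo_effective_options(unsigned long compile_opts,
                                     const char *pattern)
{
    unsigned long opts = compile_opts;
    const char *p = pattern;

    while (p[0] == '(' && p[1] == '?') {
        unsigned long set = 0, unset = 0;
        int negate = 0;
        const char *q;

        for (q = p + 2; *q != '\0' && *q != ')'; q++) {
            unsigned long bit;
            if (*q == '-' && !negate) {
                negate = 1;          /* letters after '-' are unset */
                continue;
            }
            bit = demo_flag_for(*q);
            if (bit == 0)
                return opts;         /* not a plain option group */
            if (negate)
                unset |= bit;
            else
                set |= bit;
        }
        if (*q != ')')
            return opts;             /* unterminated group */
        opts = (opts | set) & ~unset;
        p = q + 1;
    }
    return opts;
}
```

With DEMO_EXTENDED passed in and the pattern "(?im)abc(?-i)d", the result has the caseless, multiline, and extended bits set; the later (?-i) has no effect on the reported value.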
1739    
1740         A pattern is automatically anchored by PCRE if  all  of  its  top-level         A  pattern  is  automatically  anchored by PCRE if all of its top-level
1741         alternatives begin with one of the following:         alternatives begin with one of the following:
1742    
1743           ^     unless PCRE_MULTILINE is set           ^     unless PCRE_MULTILINE is set
# Line 1749  INFORMATION ABOUT A PATTERN Line 1751  INFORMATION ABOUT A PATTERN
1751    
1752           PCRE_INFO_SIZE           PCRE_INFO_SIZE
1753    
1754         Return the size of the compiled pattern, that is, the  value  that  was         Return  the  size  of the compiled pattern, that is, the value that was
1755         passed as the argument to pcre_malloc() when PCRE was getting memory in         passed as the argument to pcre_malloc() when PCRE was getting memory in
1756         which to place the compiled data. The fourth argument should point to a         which to place the compiled data. The fourth argument should point to a
1757         size_t variable.         size_t variable.
# Line 1757  INFORMATION ABOUT A PATTERN Line 1759  INFORMATION ABOUT A PATTERN
1759           PCRE_INFO_STUDYSIZE           PCRE_INFO_STUDYSIZE
1760    
1761         Return the size of the data block pointed to by the study_data field in         Return the size of the data block pointed to by the study_data field in
1762         a pcre_extra block. That is,  it  is  the  value  that  was  passed  to         a  pcre_extra  block.  That  is,  it  is  the  value that was passed to
1763         pcre_malloc() when PCRE was getting memory into which to place the data         pcre_malloc() when PCRE was getting memory into which to place the data
1764         created by pcre_study(). The fourth argument should point to  a  size_t         created  by  pcre_study(). The fourth argument should point to a size_t
1765         variable.         variable.
1766    
1767    
# Line 1767  OBSOLETE INFO FUNCTION Line 1769  OBSOLETE INFO FUNCTION
1769    
1770         int pcre_info(const pcre *code, int *optptr, int *firstcharptr);         int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
1771    
1772         The  pcre_info()  function is now obsolete because its interface is too         The pcre_info() function is now obsolete because its interface  is  too
1773         restrictive to return all the available data about a compiled  pattern.         restrictive  to return all the available data about a compiled pattern.
1774         New   programs   should  use  pcre_fullinfo()  instead.  The  yield  of         New  programs  should  use  pcre_fullinfo()  instead.  The   yield   of
1775         pcre_info() is the number of capturing subpatterns, or one of the  fol-         pcre_info()  is the number of capturing subpatterns, or one of the fol-
1776         lowing negative numbers:         lowing negative numbers:
1777    
1778           PCRE_ERROR_NULL       the argument code was NULL           PCRE_ERROR_NULL       the argument code was NULL
1779           PCRE_ERROR_BADMAGIC   the "magic number" was not found           PCRE_ERROR_BADMAGIC   the "magic number" was not found
1780    
1781         If  the  optptr  argument is not NULL, a copy of the options with which         If the optptr argument is not NULL, a copy of the  options  with  which
1782         the pattern was compiled is placed in the integer  it  points  to  (see         the  pattern  was  compiled  is placed in the integer it points to (see
1783         PCRE_INFO_OPTIONS above).         PCRE_INFO_OPTIONS above).
1784    
1785         If  the  pattern  is  not anchored and the firstcharptr argument is not         If the pattern is not anchored and the  firstcharptr  argument  is  not
1786         NULL, it is used to pass back information about the first character  of         NULL,  it is used to pass back information about the first character of
1787         any matched string (see PCRE_INFO_FIRSTBYTE above).         any matched string (see PCRE_INFO_FIRSTBYTE above).
1788    
1789    
# Line 1789  REFERENCE COUNTS Line 1791  REFERENCE COUNTS
1791    
1792         int pcre_refcount(pcre *code, int adjust);         int pcre_refcount(pcre *code, int adjust);
1793    
1794         The  pcre_refcount()  function is used to maintain a reference count in         The pcre_refcount() function is used to maintain a reference  count  in
1795         the data block that contains a compiled pattern. It is provided for the         the data block that contains a compiled pattern. It is provided for the
1796         benefit  of  applications  that  operate  in an object-oriented manner,         benefit of applications that  operate  in  an  object-oriented  manner,
1797         where different parts of the application may be using the same compiled         where different parts of the application may be using the same compiled
1798         pattern, but you want to free the block when they are all done.         pattern, but you want to free the block when they are all done.
1799    
1800         When a pattern is compiled, the reference count field is initialized to         When a pattern is compiled, the reference count field is initialized to
1801         zero.  It is changed only by calling this function, whose action is  to         zero.   It is changed only by calling this function, whose action is to
1802         add  the  adjust  value  (which may be positive or negative) to it. The         add the adjust value (which may be positive or  negative)  to  it.  The
1803         yield of the function is the new value. However, the value of the count         yield of the function is the new value. However, the value of the count
1804         is  constrained to lie between 0 and 65535, inclusive. If the new value         is constrained to lie between 0 and 65535, inclusive. If the new  value
1805         is outside these limits, it is forced to the appropriate limit value.         is outside these limits, it is forced to the appropriate limit value.
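
The clamping rule can be modelled in a few lines. This is a sketch of the arithmetic described above, not PCRE's own code, and the function name is hypothetical.

```c
/* Model of the documented rule: add adjust to the current count and
   force the result into the range 0..65535. */
int model_refcount_adjust(int current, int adjust)
{
    long long next = (long long)current + adjust;  /* avoid int overflow */
    if (next < 0)
        next = 0;
    if (next > 65535)
        next = 65535;
    return (int)next;
}
```

One consequence of the clamping is that increments beyond 65535 are lost, so an application that relies on the count reaching zero must keep the number of holders below that limit.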
1806    
1807         Except when it is zero, the reference count is not correctly  preserved         Except  when it is zero, the reference count is not correctly preserved
1808         if  a  pattern  is  compiled on one host and then transferred to a host         if a pattern is compiled on one host and then  transferred  to  a  host
1809         whose byte-order is different. (This seems a highly unlikely scenario.)         whose byte-order is different. (This seems a highly unlikely scenario.)
1810    
1811    
# Line 1813  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1815  MATCHING A PATTERN: THE TRADITIONAL FUNC
1815              const char *subject, int length, int startoffset,              const char *subject, int length, int startoffset,
1816              int options, int *ovector, int ovecsize);              int options, int *ovector, int ovecsize);
1817    
1818         The function pcre_exec() is called to match a subject string against  a         The  function pcre_exec() is called to match a subject string against a
1819         compiled  pattern, which is passed in the code argument. If the pattern         compiled pattern, which is passed in the code argument. If the  pattern
1820         has been studied, the result of the study should be passed in the extra         has been studied, the result of the study should be passed in the extra
1821         argument.  This  function is the main matching facility of the library,         argument. This function is the main matching facility of  the  library,
1822         and it operates in a Perl-like manner. For specialist use there is also         and it operates in a Perl-like manner. For specialist use there is also
1823         an  alternative matching function, which is described below in the sec-         an alternative matching function, which is described below in the  sec-
1824         tion about the pcre_dfa_exec() function.         tion about the pcre_dfa_exec() function.
1825    
1826         In most applications, the pattern will have been compiled (and  option-         In  most applications, the pattern will have been compiled (and option-
1827         ally  studied)  in the same process that calls pcre_exec(). However, it         ally studied) in the same process that calls pcre_exec().  However,  it
1828         is possible to save compiled patterns and study data, and then use them         is possible to save compiled patterns and study data, and then use them
1829         later  in  different processes, possibly even on different hosts. For a         later in different processes, possibly even on different hosts.  For  a
1830         discussion about this, see the pcreprecompile documentation.         discussion about this, see the pcreprecompile documentation.
1831    
1832         Here is an example of a simple call to pcre_exec():         Here is an example of a simple call to pcre_exec():
# Line 1843  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1845  MATCHING A PATTERN: THE TRADITIONAL FUNC
1845    
1846     Extra data for pcre_exec()     Extra data for pcre_exec()
1847    
1848         If the extra argument is not NULL, it must point to a  pcre_extra  data         If  the  extra argument is not NULL, it must point to a pcre_extra data
1849         block.  The pcre_study() function returns such a block (when it doesn't         block. The pcre_study() function returns such a block (when it  doesn't
1850         return NULL), but you can also create one for yourself, and pass  addi-         return  NULL), but you can also create one for yourself, and pass addi-
1851         tional  information  in it. The pcre_extra block contains the following         tional information in it. The pcre_extra block contains  the  following
1852         fields (not necessarily in this order):         fields (not necessarily in this order):
1853    
1854           unsigned long int flags;           unsigned long int flags;
# Line 1856  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1858  MATCHING A PATTERN: THE TRADITIONAL FUNC
1858           void *callout_data;           void *callout_data;
1859           const unsigned char *tables;           const unsigned char *tables;
1860    
1861         The flags field is a bitmap that specifies which of  the  other  fields         The  flags  field  is a bitmap that specifies which of the other fields
1862         are set. The flag bits are:         are set. The flag bits are:
1863    
1864           PCRE_EXTRA_STUDY_DATA           PCRE_EXTRA_STUDY_DATA
# Line 1865  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1867  MATCHING A PATTERN: THE TRADITIONAL FUNC
1867           PCRE_EXTRA_CALLOUT_DATA           PCRE_EXTRA_CALLOUT_DATA
1868           PCRE_EXTRA_TABLES           PCRE_EXTRA_TABLES
1869    
1870         Other  flag  bits should be set to zero. The study_data field is set in         Other flag bits should be set to zero. The study_data field is  set  in
1871         the pcre_extra block that is returned by  pcre_study(),  together  with         the  pcre_extra  block  that is returned by pcre_study(), together with
1872         the appropriate flag bit. You should not set this yourself, but you may         the appropriate flag bit. You should not set this yourself, but you may
1873         add to the block by setting the other fields  and  their  corresponding         add  to  the  block by setting the other fields and their corresponding
1874         flag bits.         flag bits.
1875    
1876         The match_limit field provides a means of preventing PCRE from using up         The match_limit field provides a means of preventing PCRE from using up
1877         a vast amount of resources when running patterns that are not going  to         a  vast amount of resources when running patterns that are not going to
1878         match,  but  which  have  a very large number of possibilities in their         match, but which have a very large number  of  possibilities  in  their
1879         search trees. The classic  example  is  the  use  of  nested  unlimited         search  trees.  The  classic  example  is  the  use of nested unlimited
1880         repeats.         repeats.
1881    
1882         Internally,  PCRE uses a function called match() which it calls repeat-         Internally, PCRE uses a function called match() which it calls  repeat-
1883         edly (sometimes recursively). The limit set by match_limit  is  imposed         edly  (sometimes  recursively). The limit set by match_limit is imposed
1884         on  the  number  of times this function is called during a match, which         on the number of times this function is called during  a  match,  which
1885         has the effect of limiting the amount of  backtracking  that  can  take         has  the  effect  of  limiting the amount of backtracking that can take
1886         place. For patterns that are not anchored, the count restarts from zero         place. For patterns that are not anchored, the count restarts from zero
1887         for each position in the subject string.         for each position in the subject string.
1888    
1889         The default value for the limit can be set  when  PCRE  is  built;  the         The  default  value  for  the  limit can be set when PCRE is built; the
1890         default  default  is 10 million, which handles all but the most extreme         default default is 10 million, which handles all but the  most  extreme
1891         cases. You can override the default  by  supplying pcre_exec()  with  a         cases. You  can  override  the  default by supplying pcre_exec() with a
1892         pcre_extra     block    in    which    match_limit    is    set,    and         pcre_extra    block    in    which    match_limit    is    set,     and
1893         PCRE_EXTRA_MATCH_LIMIT is set in the  flags  field.  If  the  limit  is         PCRE_EXTRA_MATCH_LIMIT  is  set  in  the  flags  field. If the limit is
1894         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
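
To show why a limit on calls is useful, here is a toy backtracking matcher for the single pattern (a+)*b, a nested unlimited repeat. It is a self-contained illustration and not PCRE's match() function; the names toy_match and call_budget are invented for this sketch. Against a subject of n "a" characters with no "b", the number of calls doubles with each extra character.

```c
#include <stddef.h>

typedef struct {
    unsigned long calls;  /* calls made so far             */
    unsigned long limit;  /* maximum number of calls allowed */
} call_budget;

/* Toy matcher for (a+)*b, anchored at s.
   Returns 1 on a match, 0 on no match, and -1 if the call budget
   is exceeded (analogous to a match-limit error). */
int toy_match(const char *s, call_budget *budget)
{
    size_t len;

    if (++budget->calls > budget->limit)
        return -1;
    if (*s == 'b')
        return 1;

    /* Try every possible length for the next group of a's, then
       recurse on the remainder of the subject. */
    for (len = 1; s[len - 1] == 'a'; len++) {
        int r = toy_match(s + len, budget);
        if (r != 0)
            return r;      /* a match, or the limit was hit */
    }
    return 0;
}
```

For ten "a" characters the toy makes 1024 calls before reporting no match; with a budget of 100 calls it gives up early and returns -1 instead.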
1895    
1896         The  match_limit_recursion field is similar to match_limit, but instead         The match_limit_recursion field is similar to match_limit, but  instead
1897         of limiting the total number of times that match() is called, it limits         of limiting the total number of times that match() is called, it limits
1898         the  depth  of  recursion. The recursion depth is a smaller number than         the depth of recursion. The recursion depth is a  smaller  number  than
1899         the total number of calls, because not all calls to match() are  recur-         the  total number of calls, because not all calls to match() are recur-
1900         sive.  This limit is of use only if it is set smaller than match_limit.         sive.  This limit is of use only if it is set smaller than match_limit.
1901    
1902         Limiting  the  recursion  depth  limits the amount of stack that can be         Limiting the recursion depth limits the amount of  stack  that  can  be
1903         used, or, when PCRE has been compiled to use memory on the heap instead         used, or, when PCRE has been compiled to use memory on the heap instead
1904         of the stack, the amount of heap memory that can be used.         of the stack, the amount of heap memory that can be used.
1905    
1906         The  default  value  for  match_limit_recursion can be set when PCRE is         The default value for match_limit_recursion can be  set  when  PCRE  is
1907         built; the default default  is  the  same  value  as  the  default  for         built;  the  default  default  is  the  same  value  as the default for
1908         match_limit. You can override the default by supplying pcre_exec() with         match_limit. You can override the default by supplying pcre_exec() with
1909         a  pcre_extra  block  in  which  match_limit_recursion  is   set,   and         a   pcre_extra   block  in  which  match_limit_recursion  is  set,  and
1910         PCRE_EXTRA_MATCH_LIMIT_RECURSION  is  set  in  the  flags field. If the         PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in  the  flags  field.  If  the
1911         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
1912    
1913         The pcre_callout field is used in conjunction with the  "callout"  fea-         The  pcre_callout  field is used in conjunction with the "callout" fea-
1914         ture, which is described in the pcrecallout documentation.         ture, which is described in the pcrecallout documentation.
1915    
1916         The  tables  field  is  used  to  pass  a  character  tables pointer to         The tables field  is  used  to  pass  a  character  tables  pointer  to
1917         pcre_exec(); this overrides the value that is stored with the  compiled         pcre_exec();  this overrides the value that is stored with the compiled
1918         pattern.  A  non-NULL value is stored with the compiled pattern only if         pattern. A non-NULL value is stored with the compiled pattern  only  if
1919         custom tables were supplied to pcre_compile() via  its  tableptr  argu-         custom  tables  were  supplied to pcre_compile() via its tableptr argu-
1920         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
1921         PCRE's internal tables to be used. This facility is  helpful  when  re-         PCRE's  internal  tables  to be used. This facility is helpful when re-
1922         using  patterns  that  have been saved after compiling with an external         using patterns that have been saved after compiling  with  an  external
1923         set of tables, because the external tables  might  be  at  a  different         set  of  tables,  because  the  external tables might be at a different
1924         address  when  pcre_exec() is called. See the pcreprecompile documenta-         address when pcre_exec() is called. See the  pcreprecompile  documenta-
1925         tion for a discussion of saving compiled patterns for later use.         tion for a discussion of saving compiled patterns for later use.
1926    
1927     Option bits for pcre_exec()     Option bits for pcre_exec()
1928    
1929         The unused bits of the options argument for pcre_exec() must  be  zero.         The  unused  bits of the options argument for pcre_exec() must be zero.
1930         The  only  bits  that  may  be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx,         The only bits that may  be  set  are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,
1931         PCRE_NOTBOL,   PCRE_NOTEOL,   PCRE_NOTEMPTY,    PCRE_NO_START_OPTIMIZE,         PCRE_NOTBOL,    PCRE_NOTEOL,   PCRE_NOTEMPTY,   PCRE_NO_START_OPTIMIZE,
1932         PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.         PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.
1933    
1934           PCRE_ANCHORED           PCRE_ANCHORED
1935    
1936         The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first         The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first
1937         matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or         matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or
1938         turned  out to be anchored by virtue of its contents, it cannot be made         turned out to be anchored by virtue of its contents, it cannot be  made
1939         unanchored at matching time.         unanchored at matching time.
1940    
1941           PCRE_BSR_ANYCRLF           PCRE_BSR_ANYCRLF
1942           PCRE_BSR_UNICODE           PCRE_BSR_UNICODE
1943    
1944         These options (which are mutually exclusive) control what the \R escape         These options (which are mutually exclusive) control what the \R escape
1945         sequence  matches.  The choice is either to match only CR, LF, or CRLF,         sequence matches. The choice is either to match only CR, LF,  or  CRLF,
1946         or to match any Unicode newline sequence. These  options  override  the         or  to  match  any Unicode newline sequence. These options override the
1947         choice that was made or defaulted when the pattern was compiled.         choice that was made or defaulted when the pattern was compiled.
1948    
1949           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
# Line 1950  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1952  MATCHING A PATTERN: THE TRADITIONAL FUNC
1952           PCRE_NEWLINE_ANYCRLF           PCRE_NEWLINE_ANYCRLF
1953           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
1954    
1955         These  options  override  the  newline  definition  that  was chosen or         These options override  the  newline  definition  that  was  chosen  or
1956         defaulted when the pattern was compiled. For details, see the  descrip-         defaulted  when the pattern was compiled. For details, see the descrip-
1957         tion  of  pcre_compile()  above.  During  matching,  the newline choice         tion of pcre_compile()  above.  During  matching,  the  newline  choice
1958         affects the behaviour of the dot, circumflex,  and  dollar  metacharac-         affects  the  behaviour  of the dot, circumflex, and dollar metacharac-
1959         ters.  It may also alter the way the match position is advanced after a         ters. It may also alter the way the match position is advanced after  a
1960         match failure for an unanchored pattern.         match failure for an unanchored pattern.
1961    
1962         When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY  is         When  PCRE_NEWLINE_CRLF,  PCRE_NEWLINE_ANYCRLF,  or PCRE_NEWLINE_ANY is
1963         set,  and a match attempt for an unanchored pattern fails when the cur-         set, and a match attempt for an unanchored pattern fails when the  cur-
1964         rent position is at a  CRLF  sequence,  and  the  pattern  contains  no         rent  position  is  at  a  CRLF  sequence,  and the pattern contains no
1965         explicit  matches  for  CR  or  LF  characters,  the  match position is         explicit matches for  CR  or  LF  characters,  the  match  position  is
1966         advanced by two characters instead of one, in other words, to after the         advanced by two characters instead of one, in other words, to after the
1967         CRLF.         CRLF.
1968    
1969         The above rule is a compromise that makes the most common cases work as         The above rule is a compromise that makes the most common cases work as
1970         expected. For example, if the  pattern  is  .+A  (and  the  PCRE_DOTALL         expected.  For  example,  if  the  pattern  is .+A (and the PCRE_DOTALL
1971         option is not set), it does not match the string "\r\nA" because, after         option is not set), it does not match the string "\r\nA" because, after
1972         failing at the start, it skips both the CR and the LF before  retrying.         failing  at the start, it skips both the CR and the LF before retrying.
1973         However,  the  pattern  [\r\n]A does match that string, because it con-         However, the pattern [\r\n]A does match that string,  because  it  con-
1974         tains an explicit CR or LF reference, and so advances only by one char-         tains an explicit CR or LF reference, and so advances only by one char-
1975         acter after the first failure.         acter after the first failure.
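The advance rule described above can be sketched as a small C function. This is only an illustration under stated assumptions: the helper name and its two flag parameters are hypothetical and are not part of the PCRE API, and the sketch works in byte mode only (it ignores multi-byte UTF-8 characters).

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API. Returns the offset at
   which the next match attempt would start after a failure at pos.
   crlf_is_newline: nonzero when PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF
                    or PCRE_NEWLINE_ANY is in force.
   explicit_cr_or_lf: nonzero when the pattern contains a literal CR or
                    LF, or a \r or \n escape. */
static size_t next_start_after_failure(const char *subject, size_t length,
                                       size_t pos, int crlf_is_newline,
                                       int explicit_cr_or_lf)
{
    if (crlf_is_newline && !explicit_cr_or_lf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;   /* step over the whole CRLF */
    return pos + 1;       /* ordinary advance by one character */
}
```

For the subject "\r\nA", a pattern such as .+A (no explicit CR or LF) is retried at offset 2, whereas [\r\n]A is retried at offset 1.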
1976    
1977         An explicit match for CR or LF is either a literal appearance of one of         An explicit match for CR or LF is either a literal appearance of one of
1978         those characters, or one of the \r or  \n  escape  sequences.  Implicit         those  characters,  or  one  of the \r or \n escape sequences. Implicit
1979         matches  such  as [^X] do not count, nor does \s (which includes CR and         matches such as [^X] do not count, nor does \s (which includes  CR  and
1980         LF in the characters that it matches).         LF in the characters that it matches).
1981    
1982         Notwithstanding the above, anomalous effects may still occur when  CRLF         Notwithstanding  the above, anomalous effects may still occur when CRLF
1983         is a valid newline sequence and explicit \r or \n escapes appear in the         is a valid newline sequence and explicit \r or \n escapes appear in the
1984         pattern.         pattern.
1985    
1986           PCRE_NOTBOL           PCRE_NOTBOL
1987    
1988         This option specifies that the first character of the subject string is not         This option specifies that the first character of the subject string is not
1989         the  beginning  of  a  line, so the circumflex metacharacter should not         the beginning of a line, so the  circumflex  metacharacter  should  not
1990         match before it. Setting this without PCRE_MULTILINE (at compile  time)         match  before it. Setting this without PCRE_MULTILINE (at compile time)
1991         causes  circumflex  never to match. This option affects only the behav-         causes circumflex never to match. This option affects only  the  behav-
1992         iour of the circumflex metacharacter. It does not affect \A.         iour of the circumflex metacharacter. It does not affect \A.
1993    
1994           PCRE_NOTEOL           PCRE_NOTEOL
1995    
1996         This option specifies that the end of the subject string is not the end         This option specifies that the end of the subject string is not the end
1997         of  a line, so the dollar metacharacter should not match it nor (except         of a line, so the dollar metacharacter should not match it nor  (except
1998         in multiline mode) a newline immediately before it. Setting this  with-         in  multiline mode) a newline immediately before it. Setting this with-
1999         out PCRE_MULTILINE (at compile time) causes dollar never to match. This         out PCRE_MULTILINE (at compile time) causes dollar never to match. This
2000         option affects only the behaviour of the dollar metacharacter. It  does         option  affects only the behaviour of the dollar metacharacter. It does
2001         not affect \Z or \z.         not affect \Z or \z.
2002    
2003           PCRE_NOTEMPTY           PCRE_NOTEMPTY
2004    
2005         An empty string is not considered to be a valid match if this option is         An empty string is not considered to be a valid match if this option is
2006         set. If there are alternatives in the pattern, they are tried.  If  all         set.  If  there are alternatives in the pattern, they are tried. If all
2007         the  alternatives  match  the empty string, the entire match fails. For         the alternatives match the empty string, the entire  match  fails.  For
2008         example, if the pattern         example, if the pattern
2009    
2010           a?b?           a?b?
2011    
2012         is applied to a string not beginning with "a" or "b",  it  matches  the         is  applied  to  a string not beginning with "a" or "b", it matches the
2013         empty  string at the start of the subject. With PCRE_NOTEMPTY set, this         empty string at the start of the subject. With PCRE_NOTEMPTY set,  this
2014         match is not valid, so PCRE searches further into the string for occur-         match is not valid, so PCRE searches further into the string for occur-
2015         rences of "a" or "b".         rences of "a" or "b".
2016    
2017         Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-         Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-
2018         cial case of a pattern match of the empty  string  within  its  split()         cial  case  of  a  pattern match of the empty string within its split()
2019         function,  and  when  using  the /g modifier. It is possible to emulate         function, and when using the /g modifier. It  is  possible  to  emulate
2020         Perl's behaviour after matching a null string by first trying the match         Perl's behaviour after matching a null string by first trying the match
2021         again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then         again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
2022         if that fails by advancing the starting offset (see below)  and  trying         if  that  fails by advancing the starting offset (see below) and trying
2023         an ordinary match again. There is some code that demonstrates how to do         an ordinary match again. There is some code that demonstrates how to do
2024         this in the pcredemo.c sample program.         this in the pcredemo.c sample program.
2025    
2026           PCRE_NO_START_OPTIMIZE           PCRE_NO_START_OPTIMIZE
2027    
2028         There are a number of optimizations that pcre_exec() uses at the  start         There  are a number of optimizations that pcre_exec() uses at the start
2029         of  a  match,  in  order to speed up the process. For example, if it is         of a match, in order to speed up the process. For  example,  if  it  is
2030         known that a match must start with a specific  character,  it  searches         known  that  a  match must start with a specific character, it searches
2031         the subject for that character, and fails immediately if it cannot find         the subject for that character, and fails immediately if it cannot find
2032         it, without actually running the main matching function. When  callouts         it,  without actually running the main matching function. When callouts
2033         are  in  use,  these  optimizations  can cause them to be skipped. This         are in use, these optimizations can cause  them  to  be  skipped.  This
2034         option disables the "start-up" optimizations,  causing  performance  to         option  disables  the  "start-up" optimizations, causing performance to
2035         suffer, but ensuring that the callouts do occur.         suffer, but ensuring that the callouts do occur.
2036    
2037           PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
2038    
2039         When PCRE_UTF8 is set at compile time, the validity of the subject as a         When PCRE_UTF8 is set at compile time, the validity of the subject as a
2040         UTF-8 string is automatically checked when pcre_exec() is  subsequently         UTF-8  string is automatically checked when pcre_exec() is subsequently
2041         called.   The  value  of  startoffset is also checked to ensure that it         called.  The value of startoffset is also checked  to  ensure  that  it
2042         points to the start of a UTF-8 character. There is a  discussion  about         points  to  the start of a UTF-8 character. There is a discussion about
2043         the  validity  of  UTF-8 strings in the section on UTF-8 support in the         the validity of UTF-8 strings in the section on UTF-8  support  in  the
2044         main pcre page. If  an  invalid  UTF-8  sequence  of  bytes  is  found,         main  pcre  page.  If  an  invalid  UTF-8  sequence  of bytes is found,
2045         pcre_exec()  returns  the error PCRE_ERROR_BADUTF8. If startoffset con-         pcre_exec() returns the error PCRE_ERROR_BADUTF8. If  startoffset  con-
2046         tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.         tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.
2047    
2048         If you already know that your subject is valid, and you  want  to  skip         If  you  already  know that your subject is valid, and you want to skip
2049         these    checks    for   performance   reasons,   you   can   set   the         these   checks   for   performance   reasons,   you   can    set    the
2050         PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might  want  to         PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
2051         do  this  for the second and subsequent calls to pcre_exec() if you are         do this for the second and subsequent calls to pcre_exec() if  you  are
2052         making repeated calls to find all  the  matches  in  a  single  subject         making  repeated  calls  to  find  all  the matches in a single subject
2053         string.  However,  you  should  be  sure  that the value of startoffset         string. However, you should be  sure  that  the  value  of  startoffset
2054         points to the start of a UTF-8 character.  When  PCRE_NO_UTF8_CHECK  is         points  to  the  start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
2055         set,  the  effect of passing an invalid UTF-8 string as a subject, or a         set, the effect of passing an invalid UTF-8 string as a subject,  or  a
2056         value of startoffset that does not point to the start of a UTF-8  char-         value  of startoffset that does not point to the start of a UTF-8 char-
2057         acter, is undefined. Your program may crash.         acter, is undefined. Your program may crash.
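Before setting PCRE_NO_UTF8_CHECK, a caller may want to confirm that startoffset lands on a character boundary. The following is a minimal sketch, assuming the subject is already known to be valid UTF-8; the helper name is hypothetical and is not a PCRE function. It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API. Returns nonzero if
   offset can be the start of a character in a subject that is already
   known to be valid UTF-8. */
static int is_utf8_char_start(const unsigned char *subject, size_t length,
                              size_t offset)
{
    if (offset > length) return 0;   /* beyond the end of the subject */
    if (offset == length) return 1;  /* end of subject: nothing is split */
    /* A continuation byte (10xxxxxx) lies in the middle of a character. */
    return (subject[offset] & 0xC0) != 0x80;
}
```

This checks only the offset. It does not validate the subject as a whole, so it is not a replacement for the check that PCRE_NO_UTF8_CHECK disables.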
2058    
2059           PCRE_PARTIAL           PCRE_PARTIAL
2060    
2061         This  option  turns  on  the  partial  matching feature. If the subject         This option turns on the  partial  matching  feature.  If  the  subject
2062         string fails to match the pattern, but at some point during the  match-         string  fails to match the pattern, but at some point during the match-
2063         ing  process  the  end of the subject was reached (that is, the subject         ing process the end of the subject was reached (that  is,  the  subject
2064         partially matches the pattern and the failure to  match  occurred  only         partially  matches  the  pattern and the failure to match occurred only
2065         because  there were not enough subject characters), pcre_exec() returns         because there were not enough subject characters), pcre_exec()  returns
2066         PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL  is         PCRE_ERROR_PARTIAL  instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is
2067         used,  there  are restrictions on what may appear in the pattern. These         used, there are restrictions on what may appear in the  pattern.  These
2068         are discussed in the pcrepartial documentation.         are discussed in the pcrepartial documentation.
2069    
2070     The string to be matched by pcre_exec()     The string to be matched by pcre_exec()
2071    
2072         The subject string is passed to pcre_exec() as a pointer in subject,  a         The  subject string is passed to pcre_exec() as a pointer in subject, a
2073         length (in bytes) in length, and a starting byte offset in startoffset.         length (in bytes) in length, and a starting byte offset in startoffset.
2074         In UTF-8 mode, the byte offset must point to the start of a UTF-8 char-         In UTF-8 mode, the byte offset must point to the start of a UTF-8 char-
2075         acter.  Unlike  the pattern string, the subject may contain binary zero         acter. Unlike the pattern string, the subject may contain  binary  zero
2076         bytes. When the starting offset is zero, the search for a match  starts         bytes.  When the starting offset is zero, the search for a match starts
2077         at  the  beginning  of  the subject, and this is by far the most common         at the beginning of the subject, and this is by  far  the  most  common
2078         case.         case.
2079    
2080         A non-zero starting offset is useful when searching for  another  match         A  non-zero  starting offset is useful when searching for another match
2081         in  the same subject by calling pcre_exec() again after a previous suc-         in the same subject by calling pcre_exec() again after a previous  suc-
2082         cess.  Setting startoffset differs from just passing over  a  shortened         cess.   Setting  startoffset differs from just passing over a shortened
2083         string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins         string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins
2084         with any kind of lookbehind. For example, consider the pattern         with any kind of lookbehind. For example, consider the pattern
2085    
2086           \Biss\B           \Biss\B
2087    
2088         which finds occurrences of "iss" in the middle of  words.  (\B  matches         which  finds  occurrences  of "iss" in the middle of words. (\B matches
2089         only  if  the  current position in the subject is not a word boundary.)         only if the current position in the subject is not  a  word  boundary.)
2090         When applied to the string "Mississippi" the first call  to  pcre_exec()         When  applied  to the string "Mississippi" the first call to pcre_exec()
2091         finds  the  first  occurrence. If pcre_exec() is called again with just         finds the first occurrence. If pcre_exec() is called  again  with  just
2092         the remainder of the subject,  namely  "issippi",  it  does  not  match,         the  remainder  of  the  subject,  namely  "issippi", it does not match,
2093         because \B is always false at the start of the subject, which is deemed         because \B is always false at the start of the subject, which is deemed
2094         to be a word boundary. However, if pcre_exec()  is  passed  the  entire         to  be  a  word  boundary. However, if pcre_exec() is passed the entire
2095         string again, but with startoffset set to 4, it finds the second occur-         string again, but with startoffset set to 4, it finds the second occur-
2096         rence of "iss" because it is able to look behind the starting point  to         rence  of "iss" because it is able to look behind the starting point to
2097         discover that it is preceded by a letter.         discover that it is preceded by a letter.
2098    
2099         If  a  non-zero starting offset is passed when the pattern is anchored,         If a non-zero starting offset is passed when the pattern  is  anchored,
2100         one attempt to match at the given offset is made. This can only succeed         one attempt to match at the given offset is made. This can only succeed
2101         if  the  pattern  does  not require the match to be at the start of the         if the pattern does not require the match to be at  the  start  of  the
2102         subject.         subject.
2103    
2104     How pcre_exec() returns captured substrings     How pcre_exec() returns captured substrings
2105    
2106         In general, a pattern matches a certain portion of the subject, and  in         In  general, a pattern matches a certain portion of the subject, and in
2107         addition,  further  substrings  from  the  subject may be picked out by         addition, further substrings from the subject  may  be  picked  out  by
2108         parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,         parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,
2109         this  is  called "capturing" in what follows, and the phrase "capturing         this is called "capturing" in what follows, and the  phrase  "capturing
2110         subpattern" is used for a fragment of a pattern that picks out  a  sub-         subpattern"  is  used for a fragment of a pattern that picks out a sub-
2111         string.  PCRE  supports several other kinds of parenthesized subpattern         string. PCRE supports several other kinds of  parenthesized  subpattern
2112         that do not cause substrings to be captured.         that do not cause substrings to be captured.
2113    
2114         Captured substrings are returned to the caller via a vector of integers         Captured substrings are returned to the caller via a vector of integers
2115         whose  address is passed in ovector. The number of elements in the vec-         whose address is passed in ovector. The number of elements in the  vec-
2116         tor is passed in ovecsize, which must be a non-negative  number.  Note:         tor  is  passed in ovecsize, which must be a non-negative number. Note:
2117         this argument is NOT the size of ovector in bytes.         this argument is NOT the size of ovector in bytes.
2118    
2119         The  first  two-thirds of the vector is used to pass back captured sub-         The first two-thirds of the vector is used to pass back  captured  sub-
2120         strings, each substring using a pair of integers. The  remaining  third         strings,  each  substring using a pair of integers. The remaining third
2121         of  the  vector is used as workspace by pcre_exec() while matching cap-         of the vector is used as workspace by pcre_exec() while  matching  cap-
2122         turing subpatterns, and is not available for passing back  information.         turing  subpatterns, and is not available for passing back information.
2123         The  number passed in ovecsize should always be a multiple of three. If         The number passed in ovecsize should always be a multiple of three.  If
2124         it is not, it is rounded down.         it is not, it is rounded down.
2125    
2126         When a match is successful, information about  captured  substrings  is         When  a  match  is successful, information about captured substrings is
2127         returned  in  pairs  of integers, starting at the beginning of ovector,         returned in pairs of integers, starting at the  beginning  of  ovector,
2128         and continuing up to two-thirds of its length at the  most.  The  first         and  continuing  up  to two-thirds of its length at the most. The first
2129         element  of  each pair is set to the byte offset of the first character         element of each pair is set to the byte offset of the  first  character
2130         in a substring, and the second is set to the byte offset of  the  first         in  a  substring, and the second is set to the byte offset of the first
2131         character  after  the end of a substring. Note: these values are always         character after the end of a substring. Note: these values  are  always
2132         byte offsets, even in UTF-8 mode. They are not character counts.         byte offsets, even in UTF-8 mode. They are not character counts.
2133    
2134         The first pair of integers, ovector[0]  and  ovector[1],  identify  the         The  first  pair  of  integers, ovector[0] and ovector[1], identify the
2135         portion  of  the subject string matched by the entire pattern. The next         portion of the subject string matched by the entire pattern.  The  next
2136         pair is used for the first capturing subpattern, and so on.  The  value         pair  is  used for the first capturing subpattern, and so on. The value
2137         returned by pcre_exec() is one more than the highest numbered pair that         returned by pcre_exec() is one more than the highest numbered pair that
2138         has been set.  For example, if two substrings have been  captured,  the         has  been  set.  For example, if two substrings have been captured, the
2139         returned  value is 3. If there are no capturing subpatterns, the return         returned value is 3. If there are no capturing subpatterns, the  return
2140         value from a successful match is 1, indicating that just the first pair         value from a successful match is 1, indicating that just the first pair
2141         of offsets has been set.         of offsets has been set.
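As an illustration of how the offset pairs and the return value fit together, here is a sketch of a function that copies one captured substring out of the subject. The function name is hypothetical. PCRE itself supplies convenience functions for this purpose, so this is meant only to show the arithmetic on ovector, not to replace them.

```c
#include <string.h>

/* Hypothetical helper, not part of the PCRE API. rc is the positive
   value returned by pcre_exec(). Copies substring number n (0 is the
   whole match) into buf as a NUL-terminated string. Returns its length
   in bytes, or -1 if pair n was not set or buf is too small. */
static int copy_capture(const char *subject, const int *ovector, int rc,
                        int n, char *buf, int bufsize)
{
    int start, end, len;

    if (n < 0 || n >= rc) return -1;          /* pair n is beyond the highest set */
    start = ovector[2 * n];                   /* byte offset of first character */
    end = ovector[2 * n + 1];                 /* byte offset just past the end */
    if (start < 0 || end < start) return -1;  /* defensive: not a usable pair */
    len = end - start;
    if (len + 1 > bufsize) return -1;         /* no room for the text plus NUL */
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

Because the offsets are byte offsets, the returned length is a byte count even in UTF-8 mode.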
2142    
2143         If a capturing subpattern is matched repeatedly, it is the last portion         If a capturing subpattern is matched repeatedly, it is the last portion
2144         of the string that it matched that is returned.         of the string that it matched that is returned.
2145    
2146         If the vector is too small to hold all the captured substring  offsets,         If  the vector is too small to hold all the captured substring offsets,
2147         it is used as far as possible (up to two-thirds of its length), and the         it is used as far as possible (up to two-thirds of its length), and the
2148         function returns a value of zero. If the substring offsets are  not  of         function  returns  a value of zero. If the substring offsets are not of
2149         interest,  pcre_exec()  may  be  called with ovector passed as NULL and         interest, pcre_exec() may be called with ovector  passed  as  NULL  and
2150         ovecsize as zero. However, if the pattern contains back references  and         ovecsize  as zero. However, if the pattern contains back references and
2151         the  ovector is not big enough to remember the related substrings, PCRE         the ovector is not big enough to remember the related substrings,  PCRE
2152         has to get additional memory for use during matching. Thus it  is  usu-         has  to  get additional memory for use during matching. Thus it is usu-
2153         ally advisable to supply an ovector.         ally advisable to supply an ovector.
2154    
2155         The  pcre_info()  function  can  be used to find out how many capturing         The pcre_info() function can be used to find  out  how  many  capturing
2156         subpatterns there are in a compiled  pattern.  The  smallest  size  for         subpatterns  there  are  in  a  compiled pattern. The smallest size for
2157         ovector  that  will allow for n captured substrings, in addition to the         ovector that will allow for n captured substrings, in addition  to  the
2158         offsets of the substring matched by the whole pattern, is (n+1)*3.         offsets of the substring matched by the whole pattern, is (n+1)*3.
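The sizing rule can be written down directly. The two helpers below are hypothetical names used for illustration only. The first applies the (n+1)*3 formula, and the second does the reverse calculation, taking into account that ovecsize is rounded down to a multiple of three.

```c
/* Hypothetical helpers, not part of the PCRE API. */

/* Smallest ovecsize that holds the whole match plus n captured substrings. */
static int ovector_size_for(int n)
{
    return (n + 1) * 3;
}

/* Number of captured substrings (beyond the whole match) that fit in a
   vector of ovecsize integers. Integer division discards any remainder,
   which mirrors the rounding down to a multiple of three. */
static int captures_that_fit(int ovecsize)
{
    int pairs = ovecsize / 3;      /* one pair per three integers */
    return pairs > 0 ? pairs - 1 : 0;
}
```

For example, a pattern with two capturing subpatterns needs a vector of at least 9 integers, and a vector of 32 integers behaves like one of 30.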
2159    
2160         It is possible for capturing subpattern number n+1 to match  some  part         It  is  possible for capturing subpattern number n+1 to match some part
2161         of the subject when subpattern n has not been used at all. For example,         of the subject when subpattern n has not been used at all. For example,
2162         if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the         if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
2163         return from the function is 4, and subpatterns 1 and 3 are matched, but         return from the function is 4, and subpatterns 1 and 3 are matched, but
2164         2 is not. When this happens, both values in  the  offset  pairs  corre-         2  is  not.  When  this happens, both values in the offset pairs corre-
2165         sponding to unused subpatterns are set to -1.         sponding to unused subpatterns are set to -1.
2166    
2167         Offset  values  that correspond to unused subpatterns at the end of the         Offset values that correspond to unused subpatterns at the end  of  the
2168         expression are also set to -1. For example,  if  the  string  "abc"  is         expression  are  also  set  to  -1. For example, if the string "abc" is
2169         matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not         matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
2170         matched. The return from the function is 2, because  the  highest  used         matched.  The  return  from the function is 2, because the highest used
2171         capturing subpattern number is 1. However, you can refer to the offsets         capturing subpattern number is 1. However, you can refer to the offsets
2172         for the second and third capturing subpatterns if  you  wish  (assuming         for  the  second  and third capturing subpatterns if you wish (assuming
2173         the vector is large enough, of course).         the vector is large enough, of course).
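A caller that walks the offset pairs therefore has to test for -1 before using them. The sketch below uses a hypothetical helper name and assumes that the vector passed to pcre_exec() was large enough for every subpattern in the pattern.

```c
/* Hypothetical helper, not part of the PCRE API. Returns nonzero if
   capturing subpattern n took part in the match. ovecsize is the number
   of integers in ovector, as passed to pcre_exec(). */
static int capture_is_set(const int *ovector, int ovecsize, int n)
{
    if (n < 0 || n >= ovecsize / 3) return 0;  /* pair n does not fit in the vector */
    /* Both offsets of an unused subpattern are -1. */
    return ovector[2 * n] >= 0 && ovector[2 * n + 1] >= 0;
}
```

With "abc" matched against (a|(z))(bc), the pairs are (0,3), (0,1), (-1,-1) and (1,3), so the helper reports subpatterns 1 and 3 as set and subpattern 2 as unset.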
2174    
2175         Some  convenience  functions  are  provided for extracting the captured         Some convenience functions are provided  for  extracting  the  captured
2176         substrings as separate strings. These are described below.         substrings as separate strings. These are described below.
2177    
2178     Error return values from pcre_exec()     Error return values from pcre_exec()
2179    
2180         If pcre_exec() fails, it returns a negative number. The  following  are         If  pcre_exec()  fails, it returns a negative number. The following are
2181         defined in the header file:         defined in the header file:
2182    
2183           PCRE_ERROR_NOMATCH        (-1)           PCRE_ERROR_NOMATCH        (-1)
# Line 2184  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2186  MATCHING A PATTERN: THE TRADITIONAL FUNC
2186    
2187           PCRE_ERROR_NULL           (-2)           PCRE_ERROR_NULL           (-2)
2188    
2189         Either  code  or  subject  was  passed as NULL, or ovector was NULL and         Either code or subject was passed as NULL,  or  ovector  was  NULL  and
2190         ovecsize was not zero.         ovecsize was not zero.
2191    
2192           PCRE_ERROR_BADOPTION      (-3)           PCRE_ERROR_BADOPTION      (-3)
# Line 2193  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2195  MATCHING A PATTERN: THE TRADITIONAL FUNC
2195    
2196           PCRE_ERROR_BADMAGIC       (-4)           PCRE_ERROR_BADMAGIC       (-4)
2197    
2198         PCRE stores a 4-byte "magic number" at the start of the compiled  code,         PCRE  stores a 4-byte "magic number" at the start of the compiled code,
2199         to catch the case when it is passed a junk pointer and to detect when a         to catch the case when it is passed a junk pointer and to detect when a
2200         pattern that was compiled in an environment of one endianness is run in         pattern that was compiled in an environment of one endianness is run in
2201         an  environment  with the other endianness. This is the error that PCRE         an environment with the other endianness. This is the error  that  PCRE
2202         gives when the magic number is not present.         gives when the magic number is not present.
2203    
2204           PCRE_ERROR_UNKNOWN_OPCODE (-5)           PCRE_ERROR_UNKNOWN_OPCODE (-5)
2205    
2206         While running the pattern match, an unknown item was encountered in the         While running the pattern match, an unknown item was encountered in the
2207         compiled  pattern.  This  error  could be caused by a bug in PCRE or by         compiled pattern. This error could be caused by a bug  in  PCRE  or  by
2208         overwriting of the compiled pattern.         overwriting of the compiled pattern.
2209    
2210           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2211    
2212         If a pattern contains back references, but the ovector that  is  passed         If  a  pattern contains back references, but the ovector that is passed
2213         to pcre_exec() is not big enough to remember the referenced substrings,         to pcre_exec() is not big enough to remember the referenced substrings,
2214         PCRE gets a block of memory at the start of matching to  use  for  this         PCRE  gets  a  block of memory at the start of matching to use for this
2215         purpose.  If the call via pcre_malloc() fails, this error is given. The         purpose. If the call via pcre_malloc() fails, this error is given.  The
2216         memory is automatically freed at the end of matching.         memory is automatically freed at the end of matching.
2217    
2218           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
2219    
2220         This error is used by the pcre_copy_substring(),  pcre_get_substring(),         This  error is used by the pcre_copy_substring(), pcre_get_substring(),
2221         and  pcre_get_substring_list()  functions  (see  below).  It  is  never         and  pcre_get_substring_list()  functions  (see  below).  It  is  never
2222         returned by pcre_exec().         returned by pcre_exec().
2223    
2224           PCRE_ERROR_MATCHLIMIT     (-8)           PCRE_ERROR_MATCHLIMIT     (-8)
2225    
2226         The backtracking limit, as specified by  the  match_limit  field  in  a         The  backtracking  limit,  as  specified  by the match_limit field in a
2227         pcre_extra  structure  (or  defaulted) was reached. See the description         pcre_extra structure (or defaulted) was reached.  See  the  description
2228         above.         above.
2229    
2230           PCRE_ERROR_CALLOUT        (-9)           PCRE_ERROR_CALLOUT        (-9)
2231    
2232         This error is never generated by pcre_exec() itself. It is provided for         This error is never generated by pcre_exec() itself. It is provided for
2233         use  by  callout functions that want to yield a distinctive error code.         use by callout functions that want to yield a distinctive  error  code.
2234         See the pcrecallout documentation for details.         See the pcrecallout documentation for details.
2235    
2236           PCRE_ERROR_BADUTF8        (-10)           PCRE_ERROR_BADUTF8        (-10)
2237    
2238         A string that contains an invalid UTF-8 byte sequence was passed  as  a         A  string  that contains an invalid UTF-8 byte sequence was passed as a
2239         subject.         subject.
2240    
2241           PCRE_ERROR_BADUTF8_OFFSET (-11)           PCRE_ERROR_BADUTF8_OFFSET (-11)
2242    
2243         The UTF-8 byte sequence that was passed as a subject was valid, but the         The UTF-8 byte sequence that was passed as a subject was valid, but the
2244         value of startoffset did not point to the beginning of a UTF-8  charac-         value  of startoffset did not point to the beginning of a UTF-8 charac-
2245         ter.         ter.
2246    
2247           PCRE_ERROR_PARTIAL        (-12)           PCRE_ERROR_PARTIAL        (-12)
2248    
2249         The  subject  string did not match, but it did match partially. See the         The subject string did not match, but it did match partially.  See  the
2250         pcrepartial documentation for details of partial matching.         pcrepartial documentation for details of partial matching.
2251    
2252           PCRE_ERROR_BADPARTIAL     (-13)           PCRE_ERROR_BADPARTIAL     (-13)
2253    
2254         The PCRE_PARTIAL option was used with  a  compiled  pattern  containing         The  PCRE_PARTIAL  option  was  used with a compiled pattern containing
2255         items  that are not supported for partial matching. See the pcrepartial         items that are not supported for partial matching. See the  pcrepartial
2256         documentation for details of partial matching.         documentation for details of partial matching.
2257    
2258           PCRE_ERROR_INTERNAL       (-14)           PCRE_ERROR_INTERNAL       (-14)
2259    
2260         An unexpected internal error has occurred. This error could  be  caused         An  unexpected  internal error has occurred. This error could be caused
2261         by a bug in PCRE or by overwriting of the compiled pattern.         by a bug in PCRE or by overwriting of the compiled pattern.
2262    
2263           PCRE_ERROR_BADCOUNT       (-15)           PCRE_ERROR_BADCOUNT       (-15)
# Line 2265  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2267  MATCHING A PATTERN: THE TRADITIONAL FUNC
2267           PCRE_ERROR_RECURSIONLIMIT (-21)           PCRE_ERROR_RECURSIONLIMIT (-21)
2268    
2269         The internal recursion limit, as specified by the match_limit_recursion         The internal recursion limit, as specified by the match_limit_recursion
2270         field in a pcre_extra structure (or defaulted)  was  reached.  See  the         field  in  a  pcre_extra  structure (or defaulted) was reached. See the
2271         description above.         description above.
2272    
2273           PCRE_ERROR_BADNEWLINE     (-23)           PCRE_ERROR_BADNEWLINE     (-23)
# Line 2288  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER Line 2290  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
2290         int pcre_get_substring_list(const char *subject,         int pcre_get_substring_list(const char *subject,
2291              int *ovector, int stringcount, const char ***listptr);              int *ovector, int stringcount, const char ***listptr);
2292    
2293         Captured  substrings  can  be  accessed  directly  by using the offsets         Captured substrings can be  accessed  directly  by  using  the  offsets
2294         returned by pcre_exec() in  ovector.  For  convenience,  the  functions         returned  by  pcre_exec()  in  ovector.  For convenience, the functions
2295         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-
2296         string_list() are provided for extracting captured substrings  as  new,         string_list()  are  provided for extracting captured substrings as new,
2297         separate,  zero-terminated strings. These functions identify substrings         separate, zero-terminated strings. These functions identify  substrings
2298         by number. The next section describes functions  for  extracting  named         by  number.  The  next section describes functions for extracting named
2299         substrings.         substrings.
2300    
2301         A  substring that contains a binary zero is correctly extracted and has         A substring that contains a binary zero is correctly extracted and  has
2302         a further zero added on the end, but the result is not, of course, a  C         a  further zero added on the end, but the result is not, of course, a C
2303         string.   However,  you  can  process such a string by referring to the         string.  However, you can process such a string  by  referring  to  the
2304         length that is  returned  by  pcre_copy_substring()  and  pcre_get_sub-         length  that  is  returned  by  pcre_copy_substring() and pcre_get_sub-
2305         string().  Unfortunately, the interface to pcre_get_substring_list() is         string().  Unfortunately, the interface to pcre_get_substring_list() is
2306         not adequate for handling strings containing binary zeros, because  the         not  adequate for handling strings containing binary zeros, because the
2307         end of the final string is not independently indicated.         end of the final string is not independently indicated.
2308    
2309         The  first  three  arguments  are the same for all three of these func-         The first three arguments are the same for all  three  of  these  func-
2310         tions: subject is the subject string that has  just  been  successfully         tions:  subject  is  the subject string that has just been successfully
2311         matched, ovector is a pointer to the vector of integer offsets that was         matched, ovector is a pointer to the vector of integer offsets that was
2312         passed to pcre_exec(), and stringcount is the number of substrings that         passed to pcre_exec(), and stringcount is the number of substrings that
2313         were  captured  by  the match, including the substring that matched the         were captured by the match, including the substring  that  matched  the
2314         entire regular expression. This is the value returned by pcre_exec() if         entire regular expression. This is the value returned by pcre_exec() if
2315         it  is greater than zero. If pcre_exec() returned zero, indicating that         it is greater than zero. If pcre_exec() returned zero, indicating  that
2316         it ran out of space in ovector, the value passed as stringcount  should         it  ran out of space in ovector, the value passed as stringcount should
2317         be the number of elements in the vector divided by three.         be the number of elements in the vector divided by three.
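
The rule for choosing stringcount can be sketched as a small helper. This is only an illustration: the name effective_stringcount and its treatment of negative return values are assumptions of this sketch, not part of the PCRE API.

```c
#include <assert.h>

/* Hypothetical helper (not a PCRE function): derive the stringcount
   argument from the value returned by pcre_exec() and the number of
   elements in ovector. */
int effective_stringcount(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;            /* rc is the number of captured substrings */
    if (rc == 0)
        return ovecsize / 3;  /* ovector was too small: use its capacity */
    return 0;                 /* negative rc: no match, nothing to extract */
}
```

With a 30-element vector, a return value of zero from pcre_exec() therefore gives a stringcount of 10.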
2318    
2319         The  functions pcre_copy_substring() and pcre_get_substring() extract a         The functions pcre_copy_substring() and pcre_get_substring() extract  a
2320         single substring, whose number is given as  stringnumber.  A  value  of         single  substring,  whose  number  is given as stringnumber. A value of
2321         zero  extracts  the  substring that matched the entire pattern, whereas         zero extracts the substring that matched the  entire  pattern,  whereas
2322         higher values  extract  the  captured  substrings.  For  pcre_copy_sub-         higher  values  extract  the  captured  substrings.  For pcre_copy_sub-
2323         string(),  the  string  is  placed  in buffer, whose length is given by         string(), the string is placed in buffer,  whose  length  is  given  by
2324         buffersize, while for pcre_get_substring() a new  block  of  memory  is         buffersize,  while  for  pcre_get_substring()  a new block of memory is
2325         obtained  via  pcre_malloc,  and its address is returned via stringptr.         obtained via pcre_malloc, and its address is  returned  via  stringptr.
2326         The yield of the function is the length of the  string,  not  including         The  yield  of  the function is the length of the string, not including
2327         the terminating zero, or one of these error codes:         the terminating zero, or one of these error codes:
2328    
2329           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2330    
2331         The  buffer  was too small for pcre_copy_substring(), or the attempt to         The buffer was too small for pcre_copy_substring(), or the  attempt  to
2332         get memory failed for pcre_get_substring().         get memory failed for pcre_get_substring().
2333    
2334           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
2335    
2336         There is no substring whose number is stringnumber.         There is no substring whose number is stringnumber.
2337    
2338         The pcre_get_substring_list()  function  extracts  all  available  sub-         The  pcre_get_substring_list()  function  extracts  all  available sub-
2339         strings  and  builds  a list of pointers to them. All this is done in a         strings and builds a list of pointers to them. All this is  done  in  a
2340         single block of memory that is obtained via pcre_malloc. The address of         single block of memory that is obtained via pcre_malloc. The address of
2341         the  memory  block  is returned via listptr, which is also the start of         the memory block is returned via listptr, which is also  the  start  of
2342         the list of string pointers. The end of the list is marked  by  a  NULL         the  list  of  string pointers. The end of the list is marked by a NULL
2343         pointer.  The  yield  of  the function is zero if all went well, or the         pointer. The yield of the function is zero if all  went  well,  or  the
2344         error code         error code
2345    
2346           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2347    
2348         if the attempt to get the memory block failed.         if the attempt to get the memory block failed.
2349    
2350         When any of these functions encounter a substring that is unset,  which         When  any of these functions encounter a substring that is unset, which
2351         can  happen  when  capturing subpattern number n+1 matches some part of         can happen when capturing subpattern number n+1 matches  some  part  of
2352         the subject, but subpattern n has not been used at all, they return  an         the  subject, but subpattern n has not been used at all, they return an
2353         empty string. This can be distinguished from a genuine zero-length sub-         empty string. This can be distinguished from a genuine zero-length sub-
2354         string by inspecting the appropriate offset in ovector, which is  nega-         string  by inspecting the appropriate offset in ovector, which is nega-
2355         tive for unset substrings.         tive for unset substrings.
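
As an illustration of working with the offsets directly, the following sketch copies one capture out of the subject and reports unset captures separately instead of returning an empty string. The function copy_capture and its return values (-1 and -2) are hypothetical and belong to this sketch only; they are not PCRE functions or PCRE error codes. The ovector contents are assumed to be those left by a successful pcre_exec() call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy capture n from subject into buf using the
   offset pairs in ovector. Returns the length copied, -1 if there is no
   such capture or it is unset (negative offset), or -2 if buf is too
   small. */
int copy_capture(const char *subject, const int *ovector, int stringcount,
                 int n, char *buf, int bufsize)
{
    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (n < 0 || n >= stringcount)
        return -1;                      /* no capture with this number */

    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    if (start < 0 || end < 0)
        return -1;                      /* capture exists but is unset */

    int len = end - start;
    if (len + 1 > bufsize)
        return -2;                      /* no room for the string and its zero */

    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

Because the length is returned separately, a caller can still handle a capture that contains binary zeros, as discussed above.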
2356    
2357         The  two convenience functions pcre_free_substring() and pcre_free_sub-         The two convenience functions pcre_free_substring() and  pcre_free_sub-
2358         string_list() can be used to free the memory  returned  by  a  previous         string_list()  can  be  used  to free the memory returned by a previous
2359         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-
2360         tively. They do nothing more than  call  the  function  pointed  to  by         tively.  They  do  nothing  more  than  call the function pointed to by
2361         pcre_free,  which  of course could be called directly from a C program.         pcre_free, which of course could be called directly from a  C  program.
2362         However, PCRE is used in some situations where it is linked via a  spe-         However,  PCRE is used in some situations where it is linked via a spe-
2363         cial   interface  to  another  programming  language  that  cannot  use         cial  interface  to  another  programming  language  that  cannot   use
2364         pcre_free directly; it is for these cases that the functions  are  pro-         pcre_free  directly;  it is for these cases that the functions are pro-
2365         vided.         vided.
2366    
2367    
# Line 2378  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 2380  EXTRACTING CAPTURED SUBSTRINGS BY NAME
2380              int stringcount, const char *stringname,              int stringcount, const char *stringname,
2381              const char **stringptr);              const char **stringptr);
2382    
2383         To extract a substring by name, you first have to find the associated         To extract a substring by name, you first have to find the associated
2384         number. For example, for this pattern         number. For example, for this pattern
2385    
2386           (a+)b(?<xxx>\d+)...           (a+)b(?<xxx>\d+)...
# Line 2387  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 2389  EXTRACTING CAPTURED SUBSTRINGS BY NAME
2389         be unique (PCRE_DUPNAMES was not set), you can find the number from the         be unique (PCRE_DUPNAMES was not set), you can find the number from the
2390         name by calling pcre_get_stringnumber(). The first argument is the com-         name by calling pcre_get_stringnumber(). The first argument is the com-
2391         piled pattern, and the second is the name. The yield of the function is         piled pattern, and the second is the name. The yield of the function is
2392         the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if  there  is  no         the  subpattern  number,  or PCRE_ERROR_NOSUBSTRING (-7) if there is no
2393         subpattern of that name.         subpattern of that name.
2394    
2395         Given the number, you can extract the substring directly, or use one of         Given the number, you can extract the substring directly, or use one of
2396         the functions described in the previous section. For convenience, there         the functions described in the previous section. For convenience, there
2397         are also two functions that do the whole job.         are also two functions that do the whole job.
2398    
2399         Most    of    the    arguments   of   pcre_copy_named_substring()   and         Most   of   the   arguments    of    pcre_copy_named_substring()    and
2400         pcre_get_named_substring() are the same  as  those  for  the  similarly         pcre_get_named_substring()  are  the  same  as  those for the similarly
2401         named  functions  that extract by number. As these are described in the         named functions that extract by number. As these are described  in  the
2402         previous section, they are not re-described here. There  are  just  two         previous  section,  they  are not re-described here. There are just two
2403         differences:         differences:
2404    
2405         First,  instead  of a substring number, a substring name is given. Sec-         First, instead of a substring number, a substring name is  given.  Sec-
2406         ond, there is an extra argument, given at the start, which is a pointer         ond, there is an extra argument, given at the start, which is a pointer
2407         to  the compiled pattern. This is needed in order to gain access to the         to the compiled pattern. This is needed in order to gain access to  the
2408         name-to-number translation table.         name-to-number translation table.
2409    
2410         These functions call pcre_get_stringnumber(), and if it succeeds,  they         These  functions call pcre_get_stringnumber(), and if it succeeds, they
2411         then  call  pcre_copy_substring() or pcre_get_substring(), as appropri-         then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-
2412         ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate  names,  the         ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the
2413         behaviour may not be what you want (see the next section).         behaviour may not be what you want (see the next section).
2414    
2415         Warning:  If the pattern uses the "(?|" feature to set up multiple sub-         Warning: If the pattern uses the "(?|" feature to set up multiple  sub-
2416         patterns with the same number, you  cannot  use  names  to  distinguish         patterns  with  the  same  number,  you cannot use names to distinguish
2417         them, because names are not included in the compiled code. The matching         them, because names are not included in the compiled code. The matching
2418         process uses only numbers.         process uses only numbers.
2419    
# Line 2421  DUPLICATE SUBPATTERN NAMES Line 2423  DUPLICATE SUBPATTERN NAMES
2423         int pcre_get_stringtable_entries(const pcre *code,         int pcre_get_stringtable_entries(const pcre *code,
2424              const char *name, char **first, char **last);              const char *name, char **first, char **last);
2425    
2426         When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for         When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for
2427         subpatterns  are  not  required  to  be unique. Normally, patterns with         subpatterns are not required to  be  unique.  Normally,  patterns  with
2428         duplicate names are such that in any one match, only one of  the  named         duplicate  names  are such that in any one match, only one of the named
2429         subpatterns  participates. An example is shown in the pcrepattern docu-         subpatterns participates. An example is shown in the pcrepattern  docu-
2430         mentation.         mentation.
2431    
2432         When   duplicates   are   present,   pcre_copy_named_substring()    and         When    duplicates   are   present,   pcre_copy_named_substring()   and
2433         pcre_get_named_substring()  return the first substring corresponding to         pcre_get_named_substring() return the first substring corresponding  to
2434         the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING         the  given  name  that  is set. If none are set, PCRE_ERROR_NOSUBSTRING
2435         (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()         (-7) is returned; no  data  is  returned.  The  pcre_get_stringnumber()
2436         function returns one of the numbers that are associated with the  name,         function  returns one of the numbers that are associated with the name,
2437         but it is not defined which it is.         but it is not defined which it is.
2438    
2439         If  you want to get full details of all captured substrings for a given         If you want to get full details of all captured substrings for a  given
2440         name, you must use  the  pcre_get_stringtable_entries()  function.  The         name,  you  must  use  the pcre_get_stringtable_entries() function. The
2441         first argument is the compiled pattern, and the second is the name. The         first argument is the compiled pattern, and the second is the name. The
2442         third and fourth are pointers to variables which  are  updated  by  the         third  and  fourth  are  pointers to variables which are updated by the
2443         function. After it has run, they point to the first and last entries in         function. After it has run, they point to the first and last entries in
2444         the name-to-number table  for  the  given  name.  The  function  itself         the  name-to-number  table  for  the  given  name.  The function itself
2445         returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if         returns the length of each entry,  or  PCRE_ERROR_NOSUBSTRING  (-7)  if
2446         there are none. The format of the table is described above in the  sec-         there  are none. The format of the table is described above in the sec-
2447         tion  entitled  Information  about  a  pattern.  Given all the relevant         tion entitled Information about a  pattern.   Given  all  the  relevant
2448         entries for the name, you can extract each of their numbers, and  hence         entries  for the name, you can extract each of their numbers, and hence
2449         the captured data, if any.         the captured data, if any.
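
Walking the entries between first and last might look like the sketch below. It assumes the 8-bit name table layout, in which each entry starts with a two-byte group number (most significant byte first) followed by the zero-terminated name; check this against the section on information about a pattern before relying on it. The function collect_group_numbers is hypothetical and is not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: gather the group numbers of all name-table entries
   from first to last inclusive. entry_size is the per-entry length that
   pcre_get_stringtable_entries() returns. Assumed layout of each entry:
   two bytes of group number, most significant first, then the name. */
int collect_group_numbers(const char *first, const char *last,
                          int entry_size, int *out, int max_out)
{
    assert(first != NULL && last != NULL && out != NULL && entry_size > 2);
    int count = 0;
    for (const char *entry = first; entry <= last; entry += entry_size) {
        if (count == max_out)
            break;                                   /* output array is full */
        const unsigned char *p = (const unsigned char *)entry;
        out[count++] = (p[0] << 8) | p[1];           /* big-endian number */
    }
    return count;
}
```

Each number collected this way can then be used to index ovector and find which of the duplicate-named groups actually captured something.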
2450    
2451    
2452  FINDING ALL POSSIBLE MATCHES  FINDING ALL POSSIBLE MATCHES
2453    
2454         The  traditional  matching  function  uses a similar algorithm to Perl,         The traditional matching function uses a  similar  algorithm  to  Perl,
2455         which stops when it finds the first match, starting at a given point in         which stops when it finds the first match, starting at a given point in
2456         the  subject.  If you want to find all possible matches, or the longest         the subject. If you want to find all possible matches, or  the  longest
2457         possible match, consider using the alternative matching  function  (see         possible  match,  consider using the alternative matching function (see
2458         below)  instead.  If you cannot use the alternative function, but still         below) instead. If you cannot use the alternative function,  but  still
2459         need to find all possible matches, you can kludge it up by  making  use         need  to  find all possible matches, you can kludge it up by making use
2460         of the callout facility, which is described in the pcrecallout documen-         of the callout facility, which is described in the pcrecallout documen-
2461         tation.         tation.
2462    
2463         What you have to do is to insert a callout right at the end of the pat-         What you have to do is to insert a callout right at the end of the pat-
2464         tern.   When your callout function is called, extract and save the cur-         tern.  When your callout function is called, extract and save the  cur-
2465         rent matched substring. Then return  1,  which  forces  pcre_exec()  to         rent  matched  substring.  Then  return  1, which forces pcre_exec() to
2466         backtrack  and  try other alternatives. Ultimately, when it runs out of         backtrack and try other alternatives. Ultimately, when it runs  out  of
2467         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
2468    
2469    
# Line 2472  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2474  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2474              int options, int *ovector, int ovecsize,              int options, int *ovector, int ovecsize,
2475              int *workspace, int wscount);              int *workspace, int wscount);
2476    
2477         The function pcre_dfa_exec()  is  called  to  match  a  subject  string         The  function  pcre_dfa_exec()  is  called  to  match  a subject string
2478         against  a  compiled pattern, using a matching algorithm that scans the         against a compiled pattern, using a matching algorithm that  scans  the
2479         subject string just once, and does not backtrack.  This  has  different         subject  string  just  once, and does not backtrack. This has different
2480         characteristics  to  the  normal  algorithm, and is not compatible with         characteristics to the normal algorithm, and  is  not  compatible  with
2481         Perl. Some of the features of PCRE patterns are not  supported.  Never-         Perl.  Some  of the features of PCRE patterns are not supported. Never-
2482         theless,  there are times when this kind of matching can be useful. For         theless, there are times when this kind of matching can be useful.  For
2483         a discussion of the two matching algorithms, see the pcrematching docu-         a discussion of the two matching algorithms, see the pcrematching docu-
2484         mentation.         mentation.
2485    
2486         The  arguments  for  the  pcre_dfa_exec()  function are the same as for         The arguments for the pcre_dfa_exec() function  are  the  same  as  for
2487         pcre_exec(), plus two extras. The ovector argument is used in a differ-         pcre_exec(), plus two extras. The ovector argument is used in a differ-
2488         ent  way,  and  this is described below. The other common arguments are         ent way, and this is described below. The other  common  arguments  are
2489         used in the same way as for pcre_exec(), so their  description  is  not         used  in  the  same way as for pcre_exec(), so their description is not
2490         repeated here.         repeated here.
2491    
2492         The  two  additional  arguments provide workspace for the function. The         The two additional arguments provide workspace for  the  function.  The
2493         workspace vector should contain at least 20 elements. It  is  used  for         workspace  vector  should  contain at least 20 elements. It is used for
2494         keeping  track  of  multiple  paths  through  the  pattern  tree.  More         keeping  track  of  multiple  paths  through  the  pattern  tree.  More
2495         workspace will be needed for patterns and subjects where  there  are  a         workspace  will  be  needed for patterns and subjects where there are a
2496         lot of potential matches.         lot of potential matches.
2497    
2498         Here is an example of a simple call to pcre_dfa_exec():         Here is an example of a simple call to pcre_dfa_exec():
# Line 2512  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2514  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2514    
2515     Option bits for pcre_dfa_exec()     Option bits for pcre_dfa_exec()
2516    
2517         The  unused  bits  of  the options argument for pcre_dfa_exec() must be         The unused bits of the options argument  for  pcre_dfa_exec()  must  be
2518         zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-         zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
2519         LINE_xxx,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK,         LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,  PCRE_NO_UTF8_CHECK,
2520         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
2521         three of these are the same as for pcre_exec(), so their description is         three of these are the same as for pcre_exec(), so their description is
2522         not repeated here.         not repeated here.
2523    
2524           PCRE_PARTIAL           PCRE_PARTIAL
2525    
2526         This has the same general effect as it does for  pcre_exec(),  but  the         This  has  the  same general effect as it does for pcre_exec(), but the
2527         details   are   slightly   different.  When  PCRE_PARTIAL  is  set  for         details  are  slightly  different.  When  PCRE_PARTIAL   is   set   for
2528         pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is  converted  into         pcre_dfa_exec(),  the  return code PCRE_ERROR_NOMATCH is converted into
2529         PCRE_ERROR_PARTIAL  if  the  end  of the subject is reached, there have         PCRE_ERROR_PARTIAL if the end of the subject  is  reached,  there  have
2530         been no complete matches, but there is still at least one matching pos-         been no complete matches, but there is still at least one matching pos-
2531         sibility.  The portion of the string that provided the partial match is         sibility. The portion of the string that provided the partial match  is
2532         set as the first matching string.         set as the first matching string.
2533    
2534           PCRE_DFA_SHORTEST           PCRE_DFA_SHORTEST
2535    
2536         Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to         Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to
2537         stop as soon as it has found one match. Because of the way the alterna-         stop as soon as it has found one match. Because of the way the alterna-
2538         tive algorithm works, this is necessarily the shortest  possible  match         tive  algorithm  works, this is necessarily the shortest possible match
2539         at the first possible matching point in the subject string.         at the first possible matching point in the subject string.
2540    
2541           PCRE_DFA_RESTART           PCRE_DFA_RESTART
2542    
2543         When  pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option, and         When pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option,  and
2544         returns a partial match, it is possible to call it  again,  with  addi-         returns  a  partial  match, it is possible to call it again, with addi-
2545         tional  subject  characters,  and have it continue with the same match.         tional subject characters, and have it continue with  the  same  match.
2546         The PCRE_DFA_RESTART option requests this action; when it is  set,  the         The  PCRE_DFA_RESTART  option requests this action; when it is set, the
2547         workspace  and wscount arguments must reference the same vector as before         workspace and wscount arguments must reference the same vector as  before
2548         because data about the match so far is left in  them  after  a  partial         because  data  about  the  match so far is left in them after a partial
2549         match.  There  is  more  discussion of this facility in the pcrepartial         match. There is more discussion of this  facility  in  the  pcrepartial
2550         documentation.         documentation.
2551    
2552     Successful returns from pcre_dfa_exec()     Successful returns from pcre_dfa_exec()
2553    
2554         When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-         When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-
2555         string in the subject. Note, however, that all the matches from one run         string in the subject. Note, however, that all the matches from one run
2556         of the function start at the same point in  the  subject.  The  shorter         of  the  function  start  at the same point in the subject. The shorter
2557         matches  are all initial substrings of the longer matches. For example,         matches are all initial substrings of the longer matches. For  example,
2558         if the pattern         if the pattern
2559    
2560           <.*>           <.*>
# Line 2567  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2569  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2569           <something> <something else>           <something> <something else>
2570           <something> <something else> <something further>           <something> <something else> <something further>
2571    
2572         On success, the yield of the function is a number  greater  than  zero,         On  success,  the  yield of the function is a number greater than zero,
2573         which  is  the  number of matched substrings. The substrings themselves         which is the number of matched substrings.  The  substrings  themselves
2574         are returned in ovector. Each string uses two elements;  the  first  is         are  returned  in  ovector. Each string uses two elements; the first is
2575         the  offset  to  the start, and the second is the offset to the end. In         the offset to the start, and the second is the offset to  the  end.  In
2576         fact, all the strings have the same start  offset.  (Space  could  have         fact,  all  the  strings  have the same start offset. (Space could have
2577         been  saved by giving this only once, but it was decided to retain some         been saved by giving this only once, but it was decided to retain  some
2578         compatibility with the way pcre_exec() returns data,  even  though  the         compatibility  with  the  way pcre_exec() returns data, even though the
2579         meaning of the strings is different.)         meaning of the strings is different.)
2580    
2581         The strings are returned in reverse order of length; that is, the long-         The strings are returned in reverse order of length; that is, the long-
2582         est matching string is given first. If there were too many  matches  to         est  matching  string is given first. If there were too many matches to
2583         fit  into ovector, the yield of the function is zero, and the vector is         fit into ovector, the yield of the function is zero, and the vector  is
2584         filled with the longest matches.         filled with the longest matches.
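
The return-value rules above can be sketched in C as follows. This is a minimal illustration, not part of PCRE: the helper names dfa_pair_count() and dfa_copy_match() are hypothetical, and the code works only on the values that pcre_dfa_exec() hands back (its return code and the filled ovector), so it does not call the library itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of usable (start, end) pairs in ovector after pcre_dfa_exec().
   rc is the function's return value; ovecsize is the number of int
   elements in ovector. Hypothetical helper, not a PCRE function. */
static int dfa_pair_count(int rc, int ovecsize)
{
    if (rc > 0) return rc;             /* rc is the number of matches */
    if (rc == 0) return ovecsize / 2;  /* too many matches: vector is full */
    return 0;                          /* negative: no match or an error */
}

/* Copy match number i (0 is the longest) into buf as a NUL-terminated
   string. Returns the match length, or -1 if it does not fit in buf. */
static int dfa_copy_match(const char *subject, const int *ovector, int i,
                          char *buf, size_t bufsize)
{
    int start, len;

    assert(subject != NULL && ovector != NULL && buf != NULL && i >= 0);
    start = ovector[2 * i];            /* first element: offset to start */
    len = ovector[2 * i + 1] - start;  /* second element: offset to end */
    if (len < 0 || (size_t)len >= bufsize) return -1;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

A caller would typically loop from 0 to dfa_pair_count(rc, ovecsize) - 1; because the longest match comes first, index 0 is the one to use when only a single result is wanted.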
2585    
2586     Error returns from pcre_dfa_exec()     Error returns from pcre_dfa_exec()
2587    
2588         The pcre_dfa_exec() function returns a negative number when  it  fails.         The  pcre_dfa_exec()  function returns a negative number when it fails.
2589         Many  of  the  errors  are  the  same as for pcre_exec(), and these are         Many of the errors are the same  as  for  pcre_exec(),  and  these  are
2590         described above.  There are in addition the following errors  that  are         described  above.   There are in addition the following errors that are
2591         specific to pcre_dfa_exec():         specific to pcre_dfa_exec():
2592    
2593           PCRE_ERROR_DFA_UITEM      (-16)           PCRE_ERROR_DFA_UITEM      (-16)
2594    
2595         This  return is given if pcre_dfa_exec() encounters an item in the pat-         This return is given if pcre_dfa_exec() encounters an item in the  pat-
2596         tern that it does not support, for instance, the use of \C  or  a  back         tern  that  it  does not support, for instance, the use of \C or a back
2597         reference.         reference.
2598    
2599           PCRE_ERROR_DFA_UCOND      (-17)           PCRE_ERROR_DFA_UCOND      (-17)
2600    
2601         This  return  is  given  if pcre_dfa_exec() encounters a condition item         This return is given if pcre_dfa_exec()  encounters  a  condition  item
2602         that uses a back reference for the condition, or a test  for  recursion         that  uses  a back reference for the condition, or a test for recursion
2603         in a specific group. These are not supported.         in a specific group. These are not supported.
2604    
2605           PCRE_ERROR_DFA_UMLIMIT    (-18)           PCRE_ERROR_DFA_UMLIMIT    (-18)
2606    
2607         This  return  is given if pcre_dfa_exec() is called with an extra block         This return is given if pcre_dfa_exec() is called with an  extra  block
2608         that contains a setting of the match_limit field. This is not supported         that contains a setting of the match_limit field. This is not supported
2609         (it is meaningless).         (it is meaningless).
2610    
2611           PCRE_ERROR_DFA_WSSIZE     (-19)           PCRE_ERROR_DFA_WSSIZE     (-19)
2612    
2613         This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the         This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
2614         workspace vector.         workspace vector.
2615    
2616           PCRE_ERROR_DFA_RECURSE    (-20)           PCRE_ERROR_DFA_RECURSE    (-20)
2617    
2618         When a recursive subpattern is processed, the matching  function  calls         When  a  recursive subpattern is processed, the matching function calls
2619         itself  recursively,  using  private vectors for ovector and workspace.         itself recursively, using private vectors for  ovector  and  workspace.
2620         This error is given if the output vector  is  not  large  enough.  This         This  error  is  given  if  the output vector is not large enough. This
2621         should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
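
One way an application might turn the five DFA-specific codes listed above into diagnostics is sketched below. The numeric values are copied from this section; the DEMO_ names and both helper functions are hypothetical. Real code should include pcre.h and use its PCRE_ERROR_DFA_* macros instead.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the list above; the DEMO_ names are placeholders
   for the PCRE_ERROR_DFA_* macros that pcre.h provides. */
enum {
    DEMO_DFA_UITEM   = -16,
    DEMO_DFA_UCOND   = -17,
    DEMO_DFA_UMLIMIT = -18,
    DEMO_DFA_WSSIZE  = -19,
    DEMO_DFA_RECURSE = -20
};

/* Short description of a DFA-specific error, or NULL for any other
   negative return (those are shared with pcre_exec()). */
static const char *dfa_error_text(int rc)
{
    assert(rc < 0);  /* only negative returns are errors */
    switch (rc) {
    case DEMO_DFA_UITEM:   return "pattern item not supported by DFA matching";
    case DEMO_DFA_UCOND:   return "condition type not supported by DFA matching";
    case DEMO_DFA_UMLIMIT: return "match_limit set in the extra block";
    case DEMO_DFA_WSSIZE:  return "workspace vector too small";
    case DEMO_DFA_RECURSE: return "internal recursion output vector too small";
    default:               return NULL;
    }
}

/* 1 if the error comes from the pattern itself, so that a bigger
   workspace or a different extra block cannot help. */
static int dfa_error_is_pattern_fault(int rc)
{
    return rc == DEMO_DFA_UITEM || rc == DEMO_DFA_UCOND;
}
```

Separating the pattern faults from the others lets a caller decide, for instance, to fall back to pcre_exec() for the first group and to retry with a larger workspace for PCRE_ERROR_DFA_WSSIZE.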
2622    
2623    
2624  SEE ALSO  SEE ALSO
2625    
2626         pcrebuild(3),  pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepar-         pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),  pcrepar-
2627         tial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).         tial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).
2628    
2629    
# Line 2634  AUTHOR Line 2636  AUTHOR
2636    
2637  REVISION  REVISION
2638    
2639         Last updated: 17 March 2009         Last updated: 11 April 2009
2640         Copyright (c) 1997-2009 University of Cambridge.         Copyright (c) 1997-2009 University of Cambridge.
2641  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2642    
# Line 2983  PCRE REGULAR EXPRESSION DETAILS Line 2985  PCRE REGULAR EXPRESSION DETAILS
2985         The original operation of PCRE was on strings of  one-byte  characters.         The original operation of PCRE was on strings of  one-byte  characters.
2986         However,  there is now also support for UTF-8 character strings. To use         However,  there is now also support for UTF-8 character strings. To use
2987         this, you must build PCRE to  include  UTF-8  support,  and  then  call         this, you must build PCRE to  include  UTF-8  support,  and  then  call
2988         pcre_compile()  with  the  PCRE_UTF8  option.  How this affects pattern         pcre_compile()  with  the  PCRE_UTF8  option.  There  is also a special
2989         matching is mentioned in several places below. There is also a  summary         sequence that can be given at the start of a pattern:
2990         of  UTF-8  features  in  the  section on UTF-8 support in the main pcre  
2991         page.           (*UTF8)
2992    
2993           Starting a pattern with this sequence  is  equivalent  to  setting  the
2994           PCRE_UTF8  option.  This  feature  is  not Perl-compatible. How setting
2995           UTF-8 mode affects pattern matching  is  mentioned  in  several  places
2996           below.  There  is  also  a  summary of UTF-8 features in the section on
2997           UTF-8 support in the main pcre page.
2998    
2999         The remainder of this document discusses the  patterns  that  are  sup-         The remainder of this document discusses the  patterns  that  are  sup-
3000         ported  by  PCRE when its main matching function, pcre_exec(), is used.         ported  by  PCRE when its main matching function, pcre_exec(), is used.
# Line 3832  INTERNAL OPTION SETTING Line 3840  INTERNAL OPTION SETTING
3840         can  be changed in the same way as the Perl-compatible options by using         can  be changed in the same way as the Perl-compatible options by using
3841         the characters J, U and X respectively.         the characters J, U and X respectively.
3842    
3843         When an option change occurs at top level (that is, not inside  subpat-         When one of these option changes occurs at  top  level  (that  is,  not
3844         tern  parentheses),  the change applies to the remainder of the pattern         inside  subpattern parentheses), the change applies to the remainder of
3845         that follows.  If the change is placed right at the start of a pattern,         the pattern that follows. If the change is placed right at the start of
3846         PCRE extracts it into the global options (and it will therefore show up         a pattern, PCRE extracts it into the global options (and it will there-
3847         in data extracted by the pcre_fullinfo() function).         fore show up in data extracted by the pcre_fullinfo() function).
3848    
3849         An option change within a subpattern (see below for  a  description  of         An option change within a subpattern (see below for  a  description  of
3850         subpatterns) affects only that part of the current pattern that follows         subpatterns) affects only that part of the current pattern that follows
# Line 3859  INTERNAL OPTION SETTING Line 3867  INTERNAL OPTION SETTING
3867    
3868         Note:  There  are  other  PCRE-specific  options that can be set by the         Note:  There  are  other  PCRE-specific  options that can be set by the
3869         application when the compile or match functions  are  called.  In  some         application when the compile or match functions  are  called.  In  some
3870         cases  the  pattern  can  contain special leading sequences to override         cases the pattern can contain special leading sequences such as (*CRLF)
3871         what the application has set or what has been  defaulted.  Details  are         to override what the application has set or what  has  been  defaulted.
3872         given in the section entitled "Newline sequences" above.         Details  are  given  in the section entitled "Newline sequences" above.
3873           There is also the (*UTF8) leading sequence that  can  be  used  to  set
3874           UTF-8 mode; this is equivalent to setting the PCRE_UTF8 option.
3875    
3876    
3877  SUBPATTERNS  SUBPATTERNS
# Line 5021  AUTHOR Line 5031  AUTHOR
5031    
5032  REVISION  REVISION
5033    
5034         Last updated: 18 March 2009         Last updated: 11 April 2009
5035         Copyright (c) 1997-2009 University of Cambridge.         Copyright (c) 1997-2009 University of Cambridge.
5036  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
5037    
# Line 5134  GENERAL CATEGORY PROPERTY CODES FOR \p a Line 5144  GENERAL CATEGORY PROPERTY CODES FOR \p a
5144  SCRIPT NAMES FOR \p AND \P  SCRIPT NAMES FOR \p AND \P
5145    
5146         Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,         Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,
5147         Buhid,  Canadian_Aboriginal,  Cherokee,  Common,   Coptic,   Cuneiform,         Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cu-
5148         Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,         neiform,  Cypriot,  Cyrillic,  Deseret, Devanagari, Ethiopic, Georgian,
5149         Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,  Hira-         Glagolitic, Gothic, Greek, Gujarati, Gurmukhi,  Han,  Hangul,  Hanunoo,
5150         gana,  Inherited,  Kannada,  Katakana,  Kharoshthi,  Khmer, Lao, Latin,         Hebrew,  Hiragana,  Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi,
5151         Limbu,  Linear_B,  Malayalam,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,         Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian,  Malayalam,
5152         Ogham,  Old_Italic,  Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician,         Mongolian,  Myanmar,  New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian,
5153         Runic,  Shavian,  Sinhala,  Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa,         Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurash-
5154         Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.         tra,  Shavian, Sinhala,  Sundanese, Syloti_Nagri, Syriac, Tagalog, Tag-
5155           banwa,  Tai_Le,  Tamil,  Telugu,  Thaana,  Thai,   Tibetan,   Tifinagh,
5156           Ugaritic, Vai, Yi.
5157    
5158    
5159  CHARACTER CLASSES  CHARACTER CLASSES
# Line 5193  QUANTIFIERS Line 5205  QUANTIFIERS
5205    
5206  ANCHORS AND SIMPLE ASSERTIONS  ANCHORS AND SIMPLE ASSERTIONS
5207    
5208           \b          word boundary           \b          word boundary (only ASCII letters recognized)
5209           \B          not a word boundary           \B          not a word boundary
5210           ^           start of subject           ^           start of subject
5211                        also after internal newline in multiline mode                        also after internal newline in multiline mode
# Line 5219  ALTERNATION Line 5231  ALTERNATION
5231    
5232  CAPTURING  CAPTURING
5233    
5234           (...)          capturing group           (...)           capturing group
5235           (?<name>...)   named capturing group (Perl)           (?<name>...)    named capturing group (Perl)
5236           (?'name'...)   named capturing group (Perl)           (?'name'...)    named capturing group (Perl)
5237           (?P<name>...)  named capturing group (Python)           (?P<name>...)   named capturing group (Python)
5238           (?:...)        non-capturing group           (?:...)         non-capturing group
5239           (?|...)        non-capturing group; reset group numbers for           (?|...)         non-capturing group; reset group numbers for
5240                           capturing groups in each alternative                            capturing groups in each alternative
5241    
5242    
5243  ATOMIC GROUPS  ATOMIC GROUPS
5244    
5245           (?>...)        atomic, non-capturing group           (?>...)         atomic, non-capturing group
5246    
5247    
5248  COMMENT  COMMENT
5249    
5250           (?#....)       comment (not nestable)           (?#....)        comment (not nestable)
5251    
5252    
5253  OPTION SETTING  OPTION SETTING
5254    
5255           (?i)           caseless           (?i)            caseless
5256           (?J)           allow duplicate names           (?J)            allow duplicate names
5257           (?m)           multiline           (?m)            multiline
5258           (?s)           single line (dotall)           (?s)            single line (dotall)
5259           (?U)           default ungreedy (lazy)           (?U)            default ungreedy (lazy)
5260           (?x)           extended (ignore white space)           (?x)            extended (ignore white space)
5261           (?-...)        unset option(s)           (?-...)         unset option(s)
5262    
5263           The following is recognized only at the start of a pattern or after one
5264           of the newline-setting options with similar syntax:
5265    
5266             (*UTF8)         set UTF-8 mode
5267    
5268    
5269  LOOKAHEAD AND LOOKBEHIND ASSERTIONS  LOOKAHEAD AND LOOKBEHIND ASSERTIONS
5270    
5271           (?=...)        positive look ahead           (?=...)         positive look ahead
5272           (?!...)        negative look ahead           (?!...)         negative look ahead
5273           (?<=...)       positive look behind           (?<=...)        positive look behind
5274           (?<!...)       negative look behind           (?<!...)        negative look behind
5275    
5276         Each top-level branch of a look behind must be of a fixed length.         Each top-level branch of a look behind must be of a fixed length.
5277    
5278    
5279  BACKREFERENCES  BACKREFERENCES
5280    
5281           \n             reference by number (can be ambiguous)           \n              reference by number (can be ambiguous)
5282           \gn            reference by number           \gn             reference by number
5283           \g{n}          reference by number           \g{n}           reference by number
5284           \g{-n}         relative reference by number           \g{-n}          relative reference by number
5285           \k<name>       reference by name (Perl)           \k<name>        reference by name (Perl)
5286           \k'name'       reference by name (Perl)           \k'name'        reference by name (Perl)
5287           \g{name}       reference by name (Perl)           \g{name}        reference by name (Perl)
5288           \k{name}       reference by name (.NET)           \k{name}        reference by name (.NET)
5289           (?P=name)      reference by name (Python)           (?P=name)       reference by name (Python)
5290    
5291    
5292  SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)  SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)
5293    
5294           (?R)           recurse whole pattern           (?R)            recurse whole pattern
5295           (?n)           call subpattern by absolute number           (?n)            call subpattern by absolute number
5296           (?+n)          call subpattern by relative number           (?+n)           call subpattern by relative number
5297           (?-n)          call subpattern by relative number           (?-n)           call subpattern by relative number
5298           (?&name)       call subpattern by name (Perl)           (?&name)        call subpattern by name (Perl)
5299           (?P>name)      call subpattern by name (Python)           (?P>name)       call subpattern by name (Python)
5300           \g<name>       call subpattern by name (Oniguruma)           \g<name>        call subpattern by name (Oniguruma)
5301           \g'name'       call subpattern by name (Oniguruma)           \g'name'        call subpattern by name (Oniguruma)
5302           \g<n>          call subpattern by absolute number (Oniguruma)           \g<n>           call subpattern by absolute number (Oniguruma)
5303           \g'n'          call subpattern by absolute number (Oniguruma)           \g'n'           call subpattern by absolute number (Oniguruma)
5304           \g<+n>         call subpattern by relative number (PCRE extension)           \g<+n>          call subpattern by relative number (PCRE extension)
5305           \g'+n'         call subpattern by relative number (PCRE extension)           \g'+n'          call subpattern by relative number (PCRE extension)
5306           \g<-n>         call subpattern by relative number (PCRE extension)           \g<-n>          call subpattern by relative number (PCRE extension)
5307           \g'-n'         call subpattern by relative number (PCRE extension)           \g'-n'          call subpattern by relative number (PCRE extension)
5308    
5309    
5310  CONDITIONAL PATTERNS  CONDITIONAL PATTERNS
# Line 5295  CONDITIONAL PATTERNS Line 5312  CONDITIONAL PATTERNS
5312           (?(condition)yes-pattern)           (?(condition)yes-pattern)
5313           (?(condition)yes-pattern|no-pattern)           (?(condition)yes-pattern|no-pattern)
5314    
5315           (?(n)...       absolute reference condition           (?(n)...        absolute reference condition
5316           (?(+n)...      relative reference condition           (?(+n)...       relative reference condition
5317           (?(-n)...      relative reference condition           (?(-n)...       relative reference condition
5318           (?(<name>)...  named reference condition (Perl)           (?(<name>)...   named reference condition (Perl)
5319           (?('name')...  named reference condition (Perl)           (?('name')...   named reference condition (Perl)
5320           (?(name)...    named reference condition (PCRE)           (?(name)...     named reference condition (PCRE)
5321           (?(R)...       overall recursion condition           (?(R)...        overall recursion condition
5322           (?(Rn)...      specific group recursion condition           (?(Rn)...       specific group recursion condition
5323           (?(R&name)...  specific recursion condition           (?(R&name)...   specific recursion condition
5324           (?(DEFINE)...  define subpattern for reference           (?(DEFINE)...   define subpattern for reference
5325           (?(assert)...  assertion condition           (?(assert)...   assertion condition
5326    
5327    
5328  BACKTRACKING CONTROL  BACKTRACKING CONTROL
5329    
5330         The following act immediately they are reached:         The following act immediately they are reached:
5331    
5332           (*ACCEPT)      force successful match           (*ACCEPT)       force successful match
5333           (*FAIL)        force backtrack; synonym (*F)           (*FAIL)         force backtrack; synonym (*F)
5334    
5335         The following act only when a subsequent match failure causes  a  back-         The  following  act only when a subsequent match failure causes a back-
5336         track to reach them. They all force a match failure, but they differ in         track to reach them. They all force a match failure, but they differ in
5337         what happens afterwards. Those that advance the start-of-match point do         what happens afterwards. Those that advance the start-of-match point do
5338         so only if the pattern is not anchored.         so only if the pattern is not anchored.
5339    
5340           (*COMMIT)      overall failure, no advance of starting point           (*COMMIT)       overall failure, no advance of starting point
5341           (*PRUNE)       advance to next starting character           (*PRUNE)        advance to next starting character
5342           (*SKIP)        advance start to current matching position           (*SKIP)         advance start to current matching position
5343           (*THEN)        local failure, backtrack to next alternation           (*THEN)         local failure, backtrack to next alternation
5344    
5345    
5346  NEWLINE CONVENTIONS  NEWLINE CONVENTIONS
5347    
5348         These  are  recognized only at the very start of the pattern or after a         These are recognized only at the very start of the pattern or  after  a
5349         (*BSR_...) option.         (*BSR_...) or (*UTF8) option.
5350    
5351           (*CR)           (*CR)           carriage return only
5352           (*LF)           (*LF)           linefeed only
5353           (*CRLF)           (*CRLF)         carriage return followed by linefeed
5354           (*ANYCRLF)           (*ANYCRLF)      all three of the above
5355           (*ANY)           (*ANY)          any Unicode newline sequence
5356    
5357    
5358  WHAT \R MATCHES  WHAT \R MATCHES
5359    
5360         These are recognized only at the very start of the pattern or  after  a         These  are  recognized only at the very start of the pattern or after a
5361         (*...) option that sets the newline convention.         (*...) option that sets the newline convention or UTF-8 mode.
5362    
5363           (*BSR_ANYCRLF)           (*BSR_ANYCRLF)  CR, LF, or CRLF
5364           (*BSR_UNICODE)           (*BSR_UNICODE)  any Unicode newline sequence
5365    
5366    
5367  CALLOUTS  CALLOUTS
# Line 5367  AUTHOR Line 5384  AUTHOR
5384    
5385  REVISION  REVISION
5386    
5387         Last updated: 09 April 2008         Last updated: 11 April 2009
5388         Copyright (c) 1997-2008 University of Cambridge.         Copyright (c) 1997-2009 University of Cambridge.
5389  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
5390    
5391    
