Diff of /code/trunk/doc/pcre.txt

revision 123 by ph10, Mon Mar 12 15:19:06 2007 UTC revision 185 by ph10, Tue Jun 19 13:39:46 2007 UTC
# Line 72  USER DOCUMENTATION Line 72  USER DOCUMENTATION
72         of searching. The sections are as follows:         of searching. The sections are as follows:
73    
74           pcre              this document           pcre              this document
75             pcre-config       show PCRE installation configuration information
76           pcreapi           details of PCRE's native C API           pcreapi           details of PCRE's native C API
77           pcrebuild         options for building PCRE           pcrebuild         options for building PCRE
78           pcrecallout       details of the callout feature           pcrecallout       details of the callout feature
# Line 196  UTF-8 AND UNICODE PROPERTY SUPPORT Line 197  UTF-8 AND UNICODE PROPERTY SUPPORT
197         8. Similarly, characters that match the POSIX named  character  classes         8. Similarly, characters that match the POSIX named  character  classes
198         are all low-valued characters.         are all low-valued characters.
199    
200         9.  Case-insensitive  matching  applies only to characters whose values         9.  However,  the Perl 5.10 horizontal and vertical whitespace matching
201           escapes (\h, \H, \v, and \V) do match all the appropriate Unicode char-
202           acters.
203    
204           10.  Case-insensitive  matching applies only to characters whose values
205         are less than 128, unless PCRE is built with Unicode property  support.         are less than 128, unless PCRE is built with Unicode property  support.
206         Even  when  Unicode  property support is available, PCRE still uses its         Even  when  Unicode  property support is available, PCRE still uses its
207         own character tables when checking the case of  low-valued  characters,         own character tables when checking the case of  low-valued  characters,
# Line 215  AUTHOR Line 220  AUTHOR
220         Cambridge CB2 3QH, England.         Cambridge CB2 3QH, England.
221    
222         Putting an actual email address here seems to have been a spam  magnet,         Putting an actual email address here seems to have been a spam  magnet,
223         so I've taken it away. If you want to email me, use my initial and sur-         so  I've  taken  it away. If you want to email me, use my two initials,
224         name, separated by a dot, at the domain ucs.cam.ac.uk.         followed by the two digits 10, at the domain cam.ac.uk.
225    
226    
227  REVISION  REVISION
228    
229         Last updated: 06 March 2007         Last updated: 13 June 2007
230         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
231  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
232    
# Line 244  PCRE BUILD-TIME OPTIONS Line 249  PCRE BUILD-TIME OPTIONS
249    
250           ./configure --help           ./configure --help
251    
252         The following sections describe certain options whose names begin  with         The following sections include  descriptions  of  options  whose  names
253         --enable  or  --disable. These settings specify changes to the defaults         begin with --enable or --disable. These settings specify changes to the
254         for the configure command. Because of the  way  that  configure  works,         defaults for the configure command. Because of the way  that  configure
255         --enable  and  --disable  always  come  in  pairs, so the complementary         works,  --enable  and --disable always come in pairs, so the complemen-
256         option always exists as well, but as it specifies the  default,  it  is         tary option always exists as well, but as it specifies the default,  it
257         not described.         is not described.
258    
259    
260  C++ SUPPORT  C++ SUPPORT
# Line 288  UNICODE CHARACTER PROPERTY SUPPORT Line 293  UNICODE CHARACTER PROPERTY SUPPORT
293         to the configure command. This implies UTF-8 support, even if you  have         to the configure command. This implies UTF-8 support, even if you  have
294         not explicitly requested it.         not explicitly requested it.
295    
296         Including  Unicode  property  support  adds around 90K of tables to the         Including  Unicode  property  support  adds around 30K of tables to the
297         PCRE library, approximately doubling its size. Only the  general  cate-         PCRE library. Only the general category properties such as  Lu  and  Nd
298         gory  properties  such as Lu and Nd are supported. Details are given in         are supported. Details are given in the pcrepattern documentation.
        the pcrepattern documentation.  
299    
300    
301  CODE VALUE OF NEWLINE  CODE VALUE OF NEWLINE
302    
303         By default, PCRE interprets character 10 (linefeed, LF)  as  indicating         By  default,  PCRE interprets character 10 (linefeed, LF) as indicating
304         the  end  of  a line. This is the normal newline character on Unix-like         the end of a line. This is the normal newline  character  on  Unix-like
305         systems. You can compile PCRE to use character 13 (carriage return, CR)         systems. You can compile PCRE to use character 13 (carriage return, CR)
306         instead, by adding         instead, by adding
307    
308           --enable-newline-is-cr           --enable-newline-is-cr
309    
310         to  the  configure  command.  There  is  also  a --enable-newline-is-lf         to the  configure  command.  There  is  also  a  --enable-newline-is-lf
311         option, which explicitly specifies linefeed as the newline character.         option, which explicitly specifies linefeed as the newline character.
312    
313         Alternatively, you can specify that line endings are to be indicated by         Alternatively, you can specify that line endings are to be indicated by
# Line 313  CODE VALUE OF NEWLINE Line 317  CODE VALUE OF NEWLINE
317    
318         to the configure command. There is a fourth option, specified by         to the configure command. There is a fourth option, specified by
319    
320             --enable-newline-is-anycrlf
321    
322           which causes PCRE to recognize any of the three sequences  CR,  LF,  or
323           CRLF as indicating a line ending. Finally, a fifth option, specified by
324    
325           --enable-newline-is-any           --enable-newline-is-any
326    
327         which causes PCRE to recognize any Unicode newline sequence.         causes PCRE to recognize any Unicode newline sequence.
328    
329         Whatever  line  ending convention is selected when PCRE is built can be         Whatever line ending convention is selected when PCRE is built  can  be
330         overridden when the library functions are called. At build time  it  is         overridden  when  the library functions are called. At build time it is
331         conventional to use the standard for your operating system.         conventional to use the standard for your operating system.
332    
333    
334  BUILDING SHARED AND STATIC LIBRARIES  BUILDING SHARED AND STATIC LIBRARIES
335    
336         The  PCRE building process uses libtool to build both shared and static         The PCRE building process uses libtool to build both shared and  static
337         Unix libraries by default. You can suppress one of these by adding  one         Unix  libraries by default. You can suppress one of these by adding one
338         of         of
339    
340           --disable-shared           --disable-shared
# Line 337  BUILDING SHARED AND STATIC LIBRARIES Line 346  BUILDING SHARED AND STATIC LIBRARIES
346  POSIX MALLOC USAGE  POSIX MALLOC USAGE
347    
348         When PCRE is called through the POSIX interface (see the pcreposix doc-         When PCRE is called through the POSIX interface (see the pcreposix doc-
349         umentation), additional working storage is  required  for  holding  the         umentation),  additional  working  storage  is required for holding the
350         pointers  to capturing substrings, because PCRE requires three integers         pointers to capturing substrings, because PCRE requires three  integers
351         per substring, whereas the POSIX interface provides only  two.  If  the         per  substring,  whereas  the POSIX interface provides only two. If the
352         number of expected substrings is small, the wrapper function uses space         number of expected substrings is small, the wrapper function uses space
353         on the stack, because this is faster than using malloc() for each call.         on the stack, because this is faster than using malloc() for each call.
354         The default threshold above which the stack is no longer used is 10; it         The default threshold above which the stack is no longer used is 10; it
# Line 352  POSIX MALLOC USAGE Line 361  POSIX MALLOC USAGE
361    
362  HANDLING VERY LARGE PATTERNS  HANDLING VERY LARGE PATTERNS
363    
364         Within a compiled pattern, offset values are used  to  point  from  one         Within  a  compiled  pattern,  offset values are used to point from one
365         part  to another (for example, from an opening parenthesis to an alter-         part to another (for example, from an opening parenthesis to an  alter-
366         nation metacharacter). By default, two-byte values are used  for  these         nation  metacharacter).  By default, two-byte values are used for these
367         offsets,  leading  to  a  maximum size for a compiled pattern of around         offsets, leading to a maximum size for a  compiled  pattern  of  around
368         64K. This is sufficient to handle all but the most  gigantic  patterns.         64K.  This  is sufficient to handle all but the most gigantic patterns.
369         Nevertheless,  some  people do want to process enormous patterns, so it         Nevertheless, some people do want to process enormous patterns,  so  it
370         is possible to compile PCRE to use three-byte or four-byte  offsets  by         is  possible  to compile PCRE to use three-byte or four-byte offsets by
371         adding a setting such as         adding a setting such as
372    
373           --with-link-size=3           --with-link-size=3
374    
375         to  the  configure  command.  The value given must be 2, 3, or 4. Using         to the configure command. The value given must be 2,  3,  or  4.  Using
376         longer offsets slows down the operation of PCRE because it has to  load         longer  offsets slows down the operation of PCRE because it has to load
377         additional bytes when handling them.         additional bytes when handling them.
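The "around 64K" figure follows from the width of the offsets. As a rough illustration only (the function name is hypothetical and this is arithmetic, not PCRE code), the largest offset that fits in a given link size can be computed like this:

```c
#include <stdint.h>

/* Largest offset representable with a given link size (in bytes).
   Illustrative arithmetic only; max_link_offset is a hypothetical helper,
   not part of PCRE. Returns 0 for sizes outside the documented range 2..4. */
uint64_t max_link_offset(int link_size)
{
    if (link_size < 2 || link_size > 4)
        return 0;
    return ((uint64_t)1 << (8 * link_size)) - 1;
}
```

With two-byte offsets this gives 65535, which matches the limit of around 64K quoted above; three-byte offsets raise it to roughly 16 million.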
378    
        If  you  build  PCRE with an increased link size, test 2 (and test 5 if  
        you are using UTF-8) will fail. Part of the output of these tests is  a  
        representation  of the compiled pattern, and this changes with the link  
        size.  
   
379    
380  AVOIDING EXCESSIVE STACK USAGE  AVOIDING EXCESSIVE STACK USAGE
381    
# Line 390  AVOIDING EXCESSIVE STACK USAGE Line 394  AVOIDING EXCESSIVE STACK USAGE
394    
395         to  the  configure  command. With this configuration, PCRE will use the         to  the  configure  command. With this configuration, PCRE will use the
396         pcre_stack_malloc and pcre_stack_free variables to call memory  manage-         pcre_stack_malloc and pcre_stack_free variables to call memory  manage-
397         ment  functions.  Separate  functions are provided because the usage is         ment  functions. By default these point to malloc() and free(), but you
398         very predictable: the block sizes requested are always  the  same,  and         can replace the pointers so that your own functions are used.
399         the  blocks  are always freed in reverse order. A calling program might  
400         be able to implement optimized functions that perform better  than  the         Separate functions are  provided  rather  than  using  pcre_malloc  and
401         standard  malloc()  and  free()  functions.  PCRE  runs noticeably more         pcre_free  because  the  usage  is  very  predictable:  the block sizes
402         slowly when built in this way. This option affects only the pcre_exec()         requested are always the same, and  the  blocks  are  always  freed  in
403         function; it is not relevant for the pcre_dfa_exec() function.         reverse  order.  A calling program might be able to implement optimized
404           functions that perform better  than  malloc()  and  free().  PCRE  runs
405           noticeably more slowly when built in this way. This option affects only
406           the  pcre_exec()  function;  it  is  not  relevant  for  the
407           pcre_dfa_exec() function.
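Because the block sizes are always the same and blocks are freed in reverse order, a replacement allocator can simply recycle freed blocks. The following is a minimal sketch under those assumptions; the names frame_alloc and frame_free are hypothetical, and the code is not thread-safe and never returns memory to the system.

```c
#include <stddef.h>
#include <stdlib.h>

/* A freed block is reused as a list node, so no extra storage is needed. */
typedef struct free_node {
    struct free_node *next;
} free_node;

static free_node *free_list = NULL;  /* recycled blocks, most recent first */
static size_t block_size = 0;        /* size fixed by the first request */

/* Hypothetical replacement for malloc(): assumes every request has the
   same size, and refuses (returns NULL) if a different size is seen. */
void *frame_alloc(size_t size)
{
    if (size < sizeof(free_node))
        size = sizeof(free_node);
    if (block_size == 0)
        block_size = size;
    if (size != block_size)
        return NULL;
    if (free_list != NULL) {
        free_node *n = free_list;    /* reuse the most recently freed block */
        free_list = n->next;
        return n;
    }
    return malloc(size);
}

/* Hypothetical replacement for free(): keeps the block for reuse. */
void frame_free(void *p)
{
    if (p == NULL)
        return;
    free_node *n = p;
    n->next = free_list;
    free_list = n;
}
```

In a PCRE built this way, such functions would be installed by assigning them to pcre_stack_malloc and pcre_stack_free before calling pcre_exec(). Whether this is actually faster than malloc() and free() depends on the C library, so it should be measured.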
408    
409    
410  LIMITING PCRE RESOURCE USAGE  LIMITING PCRE RESOURCE USAGE
# Line 429  LIMITING PCRE RESOURCE USAGE Line 437  LIMITING PCRE RESOURCE USAGE
437         time.         time.
438    
439    
440    CREATING CHARACTER TABLES AT BUILD TIME
441    
442           PCRE uses fixed tables for processing characters whose code values  are
443           less  than 256. By default, PCRE is built with a set of tables that are
444           distributed in the file pcre_chartables.c.dist. These  tables  are  for
445           ASCII codes only. If you add
446    
447             --enable-rebuild-chartables
448    
449           to  the  configure  command, the distributed tables are no longer used.
450           Instead, a program called dftables is compiled and  run.  This  outputs
451           the source for a new set of tables, created in the default locale of your
452           C runtime system. (This method of replacing the tables does not work if
453           you  are cross compiling, because dftables is run on the local host. If
454           you need to create alternative tables when cross  compiling,  you  will
455           have to do so "by hand".)
456    
457    
458  USING EBCDIC CODE  USING EBCDIC CODE
459    
460         PCRE assumes by default that it will run in an  environment  where  the         PCRE  assumes  by  default that it will run in an environment where the
461         character  code  is  ASCII  (or Unicode, which is a superset of ASCII).         character code is ASCII (or Unicode, which is  a  superset  of  ASCII).
462         PCRE can, however, be compiled to  run  in  an  EBCDIC  environment  by         PCRE  can,  however,  be  compiled  to  run in an EBCDIC environment by
463         adding         adding
464    
465           --enable-ebcdic           --enable-ebcdic
466    
467         to the configure command.         to the configure command. This setting implies --enable-rebuild-charta-
468           bles.
469    
470    
471  SEE ALSO  SEE ALSO
# Line 455  AUTHOR Line 482  AUTHOR
482    
483  REVISION  REVISION
484    
485         Last updated: 06 March 2007         Last updated: 05 June 2007
486         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
487  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
488    
# Line 508  REGULAR EXPRESSIONS AS TREES Line 535  REGULAR EXPRESSIONS AS TREES
535    
536  THE STANDARD MATCHING ALGORITHM  THE STANDARD MATCHING ALGORITHM
537    
538         In the terminology of Jeffrey Friedl's book Mastering  Regular  Expres-         In the terminology of Jeffrey Friedl's book "Mastering Regular  Expres-
539         sions,  the  standard  algorithm  is  an "NFA algorithm". It conducts a         sions",  the  standard  algorithm  is an "NFA algorithm". It conducts a
540         depth-first search of the pattern tree. That is, it  proceeds  along  a         depth-first search of the pattern tree. That is, it  proceeds  along  a
541         single path through the tree, checking that the subject matches what is         single path through the tree, checking that the subject matches what is
542         required. When there is a mismatch, the algorithm  tries  any  alterna-         required. When there is a mismatch, the algorithm  tries  any  alterna-
# Line 591  THE ALTERNATIVE MATCHING ALGORITHM Line 618  THE ALTERNATIVE MATCHING ALGORITHM
618         ence  as  the  condition or test for a specific group recursion are not         ence  as  the  condition or test for a specific group recursion are not
619         supported.         supported.
620    
621         5. Callouts are supported, but the value of the  capture_top  field  is         5. Because many paths through the tree may be  active,  the  \K  escape
622           sequence, which resets the start of the match when encountered (but may
623           be on some paths and not on others), is not  supported.  It  causes  an
624           error if encountered.
625    
626           6.  Callouts  are  supported, but the value of the capture_top field is
627         always 1, and the value of the capture_last field is always -1.         always 1, and the value of the capture_last field is always -1.
628    
629         6.  The \C escape sequence, which (in the standard algorithm) matches a         7.  The \C escape sequence, which (in the standard algorithm) matches a
630         single byte, even in UTF-8 mode, is not supported because the  alterna-         single  byte, even in UTF-8 mode, is not supported because the alterna-
631         tive  algorithm  moves  through  the  subject string one character at a         tive algorithm moves through the subject  string  one  character  at  a
632         time, for all active paths through the tree.         time, for all active paths through the tree.
633    
634    
635  ADVANTAGES OF THE ALTERNATIVE ALGORITHM  ADVANTAGES OF THE ALTERNATIVE ALGORITHM
636    
637         Using the alternative matching algorithm provides the following  advan-         Using  the alternative matching algorithm provides the following advan-
638         tages:         tages:
639    
640         1. All possible matches (at a single point in the subject) are automat-         1. All possible matches (at a single point in the subject) are automat-
641         ically found, and in particular, the longest match is  found.  To  find         ically  found,  and  in particular, the longest match is found. To find
642         more than one match using the standard algorithm, you have to do kludgy         more than one match using the standard algorithm, you have to do kludgy
643         things with callouts.         things with callouts.
644    
645         2. There is much better support for partial matching. The  restrictions         2.  There is much better support for partial matching. The restrictions
646         on  the content of the pattern that apply when using the standard algo-         on the content of the pattern that apply when using the standard  algo-
647         rithm for partial matching do not apply to the  alternative  algorithm.         rithm  for  partial matching do not apply to the alternative algorithm.
648         For  non-anchored patterns, the starting position of a partial match is         For non-anchored patterns, the starting position of a partial match  is
649         available.         available.
650    
651         3. Because the alternative algorithm  scans  the  subject  string  just         3.  Because  the  alternative  algorithm  scans the subject string just
652         once,  and  never  needs to backtrack, it is possible to pass very long         once, and never needs to backtrack, it is possible to  pass  very  long
653         subject strings to the matching function in  several  pieces,  checking         subject  strings  to  the matching function in several pieces, checking
654         for partial matching each time.         for partial matching each time.
655    
656    
# Line 626  DISADVANTAGES OF THE ALTERNATIVE ALGORIT Line 658  DISADVANTAGES OF THE ALTERNATIVE ALGORIT
658    
659         The alternative algorithm suffers from a number of disadvantages:         The alternative algorithm suffers from a number of disadvantages:
660    
661         1.  It  is  substantially  slower  than the standard algorithm. This is         1. It is substantially slower than  the  standard  algorithm.  This  is
662         partly because it has to search for all possible matches, but  is  also         partly  because  it has to search for all possible matches, but is also
663         because it is less susceptible to optimization.         because it is less susceptible to optimization.
664    
665         2. Capturing parentheses and back references are not supported.         2. Capturing parentheses and back references are not supported.
# Line 645  AUTHOR Line 677  AUTHOR
677    
678  REVISION  REVISION
679    
680         Last updated: 06 March 2007         Last updated: 29 May 2007
681         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
682  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
683    
# Line 828  PCRE API OVERVIEW Line 860  PCRE API OVERVIEW
860    
861  NEWLINES  NEWLINES
862    
863         PCRE  supports four different conventions for indicating line breaks in         PCRE  supports five different conventions for indicating line breaks in
864         strings: a single CR (carriage return) character, a  single  LF  (line-         strings: a single CR (carriage return) character, a  single  LF  (line-
865         feed)  character,  the two-character sequence CRLF, or any Unicode new-         feed) character, the two-character sequence CRLF, any of the three pre-
866         line sequence.  The Unicode newline sequences are the three  just  men-         ceding, or any Unicode newline sequence. The Unicode newline  sequences
867         tioned, plus the single characters VT (vertical tab, U+000B), FF (form-         are  the  three just mentioned, plus the single characters VT (vertical
868         feed, U+000C), NEL (next line, U+0085), LS  (line  separator,  U+2028),         tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS  (line
869         and PS (paragraph separator, U+2029).         separator, U+2028), and PS (paragraph separator, U+2029).
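The conventions listed above can be sketched as two small tests. This is an illustration with hypothetical function names, not PCRE's own code: the first checks a code point against the single-character Unicode newlines (elsewhere the document notes that LS and PS are recognized only in UTF-8 mode), and the second measures a CR, LF, or CRLF sequence at the start of a byte buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if code point c is one of the single-character Unicode newlines
   listed in the text. CRLF is a two-character sequence and is handled
   separately below. Hypothetical helper, not a PCRE function. */
bool is_unicode_newline_char(uint32_t c)
{
    switch (c) {
    case 0x000A: /* LF  (linefeed) */
    case 0x000B: /* VT  (vertical tab) */
    case 0x000C: /* FF  (formfeed) */
    case 0x000D: /* CR  (carriage return) */
    case 0x0085: /* NEL (next line) */
    case 0x2028: /* LS  (line separator) */
    case 0x2029: /* PS  (paragraph separator) */
        return true;
    default:
        return false;
    }
}

/* Length in bytes of a CR, LF, or CRLF newline at the start of s,
   or 0 if s does not start with one. CRLF is preferred over a lone CR. */
size_t anycrlf_length(const char *s, size_t len)
{
    if (len == 0)
        return 0;
    if (s[0] == '\r')
        return (len > 1 && s[1] == '\n') ? 2 : 1;
    if (s[0] == '\n')
        return 1;
    return 0;
}
```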
870    
871         Each  of  the first three conventions is used by at least one operating         Each  of  the first three conventions is used by at least one operating
872         system as its standard newline sequence. When PCRE is built, a  default         system as its standard newline sequence. When PCRE is built, a  default
# Line 868  SAVING PRECOMPILED PATTERNS FOR LATER US Line 900  SAVING PRECOMPILED PATTERNS FOR LATER US
900         The compiled form of a regular expression can be saved and re-used at a         The compiled form of a regular expression can be saved and re-used at a
901         later time, possibly by a different program, and even on a  host  other         later time, possibly by a different program, and even on a  host  other
902         than  the  one  on  which  it  was  compiled.  Details are given in the         than  the  one  on  which  it  was  compiled.  Details are given in the
903         pcreprecompile documentation.         pcreprecompile documentation. However, compiling a  regular  expression
904           with  one version of PCRE for use with a different version is not guar-
905           anteed to work and may cause crashes.
906    
907    
908  CHECKING BUILD-TIME OPTIONS  CHECKING BUILD-TIME OPTIONS
# Line 899  CHECKING BUILD-TIME OPTIONS Line 933  CHECKING BUILD-TIME OPTIONS
933    
934         The output is an integer whose value specifies  the  default  character         The output is an integer whose value specifies  the  default  character
935         sequence  that is recognized as meaning "newline". The four values that         sequence  that is recognized as meaning "newline". The five values that
936         are supported are: 10 for LF, 13 for CR, 3338 for CRLF, and -1 for ANY.         are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
937         The default should normally be the standard sequence for your operating         and  -1  for  ANY. The default should normally be the standard sequence
938         system.         for your operating system.
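The value 3338 is the two character codes packed together: 13 (CR) shifted left by eight bits, plus 10 (LF). A small decoding sketch, with a hypothetical function name, might look like this:

```c
#include <stddef.h>

/* Map the integer described above to a readable name.
   newline_config_name is a hypothetical helper, not part of PCRE.
   Returns NULL for values that the text does not list. */
const char *newline_config_name(int value)
{
    switch (value) {
    case 10:             return "LF";
    case 13:             return "CR";
    case (13 << 8) | 10: return "CRLF";     /* 3338: CR in the high byte, LF in the low */
    case -2:             return "ANYCRLF";
    case -1:             return "ANY";
    default:             return NULL;
    }
}
```

The integer itself would be obtained from pcre_config() with the PCRE_CONFIG_NEWLINE selector, passing the address of an int as the second argument.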
939    
940           PCRE_CONFIG_LINK_SIZE           PCRE_CONFIG_LINK_SIZE
941    
# Line 1125  COMPILING A PATTERN Line 1159  COMPILING A PATTERN
1159           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
1160           PCRE_NEWLINE_LF           PCRE_NEWLINE_LF
1161           PCRE_NEWLINE_CRLF           PCRE_NEWLINE_CRLF
1162             PCRE_NEWLINE_ANYCRLF
1163           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
1164    
1165         These  options  override the default newline definition that was chosen         These  options  override the default newline definition that was chosen
1166         when PCRE was built. Setting the first or the second specifies  that  a         when PCRE was built. Setting the first or the second specifies  that  a
1167         newline  is  indicated  by a single character (CR or LF, respectively).         newline  is  indicated  by a single character (CR or LF, respectively).
1168         Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the         Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the
1169         two-character  CRLF  sequence.  Setting PCRE_NEWLINE_ANY specifies that         two-character  CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF specifies
1170         any Unicode newline sequence should be recognized. The Unicode  newline         that any of the three preceding sequences should be recognized. Setting
1171         sequences  are  the three just mentioned, plus the single characters VT         PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be
1172         (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085),         recognized. The Unicode newline sequences are the three just mentioned,
1173         LS  (line separator, U+2028), and PS (paragraph separator, U+2029). The         plus  the  single  characters  VT (vertical tab, U+000B), FF (formfeed,
1174         last two are recognized only in UTF-8 mode.         U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS
1175           (paragraph  separator,  U+2029).  The  last  two are recognized only in
1176           UTF-8 mode.
1177    
1178         The newline setting in the  options  word  uses  three  bits  that  are         The newline setting in the  options  word  uses  three  bits  that  are
1179         treated  as  a  number, giving eight possibilities. Currently only five         treated as a number, giving eight possibilities. Currently only six are
1180         are used (default plus the four values above). This means that  if  you         used (default plus the five values above). This means that if  you  set
1181         set  more  than  one  newline option, the combination may or may not be         more  than one newline option, the combination may or may not be sensi-
1182         sensible. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is  equiva-         ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
1183         lent  to PCRE_NEWLINE_CRLF, but other combinations yield unused numbers         PCRE_NEWLINE_CRLF,  but other combinations may yield unused numbers and
1184         and cause an error.         cause an error.
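The way combined newline options collapse into a single three-bit number can be sketched as follows. The EX_ constants are local stand-ins whose values are assumed to mirror the PCRE_NEWLINE_ definitions in pcre.h of this period; check your own header rather than relying on them. The mask name and the function are hypothetical.

```c
#include <stdbool.h>

/* Assumed values, chosen to mirror pcre.h; verify against your header. */
#define EX_NEWLINE_CR      0x00100000UL
#define EX_NEWLINE_LF      0x00200000UL
#define EX_NEWLINE_CRLF    0x00300000UL
#define EX_NEWLINE_ANY     0x00400000UL
#define EX_NEWLINE_ANYCRLF 0x00500000UL
#define EX_NEWLINE_MASK    0x00700000UL   /* the three newline bits (hypothetical name) */

/* True if the newline bits of an options word form one of the six
   numbers in use: the default (0) or one of the five settings. */
bool newline_bits_in_use(unsigned long options)
{
    switch (options & EX_NEWLINE_MASK) {
    case 0:
    case EX_NEWLINE_CR:
    case EX_NEWLINE_LF:
    case EX_NEWLINE_CRLF:
    case EX_NEWLINE_ANY:
    case EX_NEWLINE_ANYCRLF:
        return true;
    default:
        return false;
    }
}
```

Under these assumed values, CR combined with LF gives the same number as CRLF, whereas LF combined with ANY gives a number that is not in use.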
1185    
1186         The only time that a line break is specially recognized when  compiling         The only time that a line break is specially recognized when  compiling
1187         a  pattern  is  if  PCRE_EXTENDED  is set, and an unescaped # outside a         a  pattern  is  if  PCRE_EXTENDED  is set, and an unescaped # outside a
# Line 1230  COMPILATION ERROR CODES Line 1267  COMPILATION ERROR CODES
1267           26  malformed number or name after (?(           26  malformed number or name after (?(
1268           27  conditional group contains more than two branches           27  conditional group contains more than two branches
1269           28  assertion expected after (?(           28  assertion expected after (?(
1270           29  (?R or (?digits must be followed by )           29  (?R or (?[+-]digits must be followed by )
1271           30  unknown POSIX class name           30  unknown POSIX class name
1272           31  POSIX collating elements are not supported           31  POSIX collating elements are not supported
1273           32  this version of PCRE is not compiled with PCRE_UTF8 support           32  this version of PCRE is not compiled with PCRE_UTF8 support
# Line 1259  COMPILATION ERROR CODES Line 1296  COMPILATION ERROR CODES
1296           54  DEFINE group contains more than one branch           54  DEFINE group contains more than one branch
1297           55  repeating a DEFINE group is not allowed           55  repeating a DEFINE group is not allowed
1298           56  inconsistent NEWLINE options           56  inconsistent NEWLINE options
1299             57  \g is not followed by a braced name or an optionally braced
1300                   non-zero number
1301             58  (?+ or (?- or (?(+ or (?(- must be followed by a non-zero number
1302    
1303    
1304  STUDYING A PATTERN  STUDYING A PATTERN
# Line 1310  STUDYING A PATTERN Line 1350  STUDYING A PATTERN
1350  LOCALE SUPPORT  LOCALE SUPPORT
1351    
1352         PCRE handles caseless matching, and determines whether  characters  are         PCRE handles caseless matching, and determines whether  characters  are
1353         letters  digits,  or whatever, by reference to a set of tables, indexed         letters,  digits, or whatever, by reference to a set of tables, indexed
1354         by character value. When running in UTF-8 mode, this  applies  only  to         by character value. When running in UTF-8 mode, this  applies  only  to
1355         characters  with  codes  less than 128. Higher-valued codes never match         characters  with  codes  less than 128. Higher-valued codes never match
1356         escapes such as \w or \d, but can be tested with \p if  PCRE  is  built         escapes such as \w or \d, but can be tested with \p if  PCRE  is  built
1357         with  Unicode  character property support. The use of locales with Uni-         with  Unicode  character property support. The use of locales with Uni-
1358         code is discouraged.         code is discouraged. If you are handling characters with codes  greater
1359           than  128, you should either use UTF-8 and Unicode, or use locales, but
1360         An internal set of tables is created in the default C locale when  PCRE         not try to mix the two.
1361         is  built.  This  is  used when the final argument of pcre_compile() is  
1362         NULL, and is sufficient for many applications. An  alternative  set  of         PCRE contains an internal set of tables that are used  when  the  final
1363         tables  can,  however, be supplied. These may be created in a different         argument  of  pcre_compile()  is  NULL.  These  are sufficient for many
1364         locale from the default. As more and more applications change to  using         applications.  Normally, the internal tables recognize only ASCII char-
1365         Unicode, the need for this locale support is expected to die away.         acters. However, when PCRE is built, it is possible to cause the inter-
1366           nal tables to be rebuilt in the default "C" locale of the local system,
1367         External  tables  are  built by calling the pcre_maketables() function,         which may cause them to be different.
1368         which has no arguments, in the relevant locale. The result can then  be  
1369         passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For         The  internal tables can always be overridden by tables supplied by the
1370         example, to build and use tables that are appropriate  for  the  French         application that calls PCRE. These may be created in a different locale
1371         locale  (where  accented  characters  with  values greater than 128 are         from  the  default.  As more and more applications change to using Uni-
1372           code, the need for this locale support is expected to die away.
1373    
1374           External tables are built by calling  the  pcre_maketables()  function,
1375           which  has no arguments, in the relevant locale. The result can then be
1376           passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
1377           example,  to  build  and use tables that are appropriate for the French
1378           locale (where accented characters with  values  greater  than  128  are
1379         treated as letters), the following code could be used:         treated as letters), the following code could be used:
1380    
1381           setlocale(LC_CTYPE, "fr_FR");           setlocale(LC_CTYPE, "fr_FR");
1382           tables = pcre_maketables();           tables = pcre_maketables();
1383           re = pcre_compile(..., tables);           re = pcre_compile(..., tables);
1384    
1385           The  locale  name "fr_FR" is used on Linux and other Unix-like systems;
1386           if you are using Windows, the name for the French locale is "french".
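The idea of tables "indexed by character value" can be illustrated with a simplified model. The sketch below is not PCRE's implementation, and build_letter_table() and is_letter() are hypothetical names. It assumes only that a table is filled once by asking the C library how the current LC_CTYPE locale classifies each byte value, after which lookups no longer depend on the locale.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: sets table[c] to 1 if the
   current LC_CTYPE locale classifies byte value c as a letter, else 0. */
void build_letter_table(unsigned char table[256])
{
    assert(table != NULL);
    for (int c = 0; c < 256; c++)
        table[c] = isalpha(c) ? 1 : 0;
}

/* Lookup by character value; no locale call is made at this point. */
int is_letter(const unsigned char table[256], unsigned char c)
{
    return table[c];
}
```

If setlocale(LC_CTYPE, "fr_FR") succeeds before build_letter_table() is called, and that locale uses a single-byte encoding, accented characters with values above 128 may be marked as letters. In the "C" locale only the ASCII letters are.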
1387    
1388         When pcre_maketables() runs, the tables are built  in  memory  that  is         When pcre_maketables() runs, the tables are built  in  memory  that  is
1389         obtained  via  pcre_malloc. It is the caller's responsibility to ensure         obtained  via  pcre_malloc. It is the caller's responsibility to ensure
1390         that the memory containing the tables remains available for as long  as         that the memory containing the tables remains available for as long  as
# Line 1437  INFORMATION ABOUT A PATTERN Line 1487  INFORMATION ABOUT A PATTERN
1487         returned. The fourth argument should point to an unsigned char *  vari-         returned. The fourth argument should point to an unsigned char *  vari-
1488         able.         able.
1489    
1490             PCRE_INFO_JCHANGED
1491    
1492           Return  1  if the (?J) option setting is used in the pattern, otherwise
1493           0. The fourth argument should point to an int variable. The (?J) inter-
1494           nal option setting changes the local PCRE_DUPNAMES option.
1495    
1496           PCRE_INFO_LASTLITERAL           PCRE_INFO_LASTLITERAL
1497    
1498         Return  the  value of the rightmost literal byte that must exist in any         Return  the  value of the rightmost literal byte that must exist in any
# Line 1491  INFORMATION ABOUT A PATTERN Line 1547  INFORMATION ABOUT A PATTERN
1547         name-to-number map, remember that the length of the entries  is  likely         name-to-number map, remember that the length of the entries  is  likely
1548         to be different for each compiled pattern.         to be different for each compiled pattern.
1549    
1550             PCRE_INFO_OKPARTIAL
1551    
1552           Return  1 if the pattern can be used for partial matching, otherwise 0.
1553           The fourth argument should point to an int  variable.  The  pcrepartial
1554           documentation  lists  the restrictions that apply to patterns when par-
1555           tial matching is used.
1556    
1557           PCRE_INFO_OPTIONS           PCRE_INFO_OPTIONS
1558    
1559         Return  a  copy of the options with which the pattern was compiled. The         Return a copy of the options with which the pattern was  compiled.  The
1560         fourth argument should point to an unsigned long  int  variable.  These         fourth  argument  should  point to an unsigned long int variable. These
1561         option bits are those specified in the call to pcre_compile(), modified         option bits are those specified in the call to pcre_compile(), modified
1562         by any top-level option settings within the pattern itself.         by any top-level option settings within the pattern itself.
1563    
1564         A pattern is automatically anchored by PCRE if  all  of  its  top-level         A  pattern  is  automatically  anchored by PCRE if all of its top-level
1565         alternatives begin with one of the following:         alternatives begin with one of the following:
1566    
1567           ^     unless PCRE_MULTILINE is set           ^     unless PCRE_MULTILINE is set
# Line 1512  INFORMATION ABOUT A PATTERN Line 1575  INFORMATION ABOUT A PATTERN
1575    
1576           PCRE_INFO_SIZE           PCRE_INFO_SIZE
1577    
1578         Return the size of the compiled pattern, that is, the  value  that  was         Return  the  size  of the compiled pattern, that is, the value that was
1579         passed as the argument to pcre_malloc() when PCRE was getting memory in         passed as the argument to pcre_malloc() when PCRE was getting memory in
1580         which to place the compiled data. The fourth argument should point to a         which to place the compiled data. The fourth argument should point to a
1581         size_t variable.         size_t variable.
# Line 1520  INFORMATION ABOUT A PATTERN Line 1583  INFORMATION ABOUT A PATTERN
1583           PCRE_INFO_STUDYSIZE           PCRE_INFO_STUDYSIZE
1584    
1585         Return the size of the data block pointed to by the study_data field in         Return the size of the data block pointed to by the study_data field in
1586         a pcre_extra block. That is,  it  is  the  value  that  was  passed  to         a  pcre_extra  block.  That  is,  it  is  the  value that was passed to
1587         pcre_malloc() when PCRE was getting memory into which to place the data         pcre_malloc() when PCRE was getting memory into which to place the data
1588         created by pcre_study(). The fourth argument should point to  a  size_t         created  by  pcre_study(). The fourth argument should point to a size_t
1589         variable.         variable.
1590    
1591    
# Line 1530  OBSOLETE INFO FUNCTION Line 1593  OBSOLETE INFO FUNCTION
1593    
1594         int pcre_info(const pcre *code, int *optptr, int *firstcharptr);         int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
1595    
1596         The  pcre_info()  function is now obsolete because its interface is too         The pcre_info() function is now obsolete because its interface  is  too
1597         restrictive to return all the available data about a compiled  pattern.         restrictive  to return all the available data about a compiled pattern.
1598         New   programs   should  use  pcre_fullinfo()  instead.  The  yield  of         New  programs  should  use  pcre_fullinfo()  instead.  The   yield   of
1599         pcre_info() is the number of capturing subpatterns, or one of the  fol-         pcre_info()  is the number of capturing subpatterns, or one of the fol-
1600         lowing negative numbers:         lowing negative numbers:
1601    
1602           PCRE_ERROR_NULL       the argument code was NULL           PCRE_ERROR_NULL       the argument code was NULL
1603           PCRE_ERROR_BADMAGIC   the "magic number" was not found           PCRE_ERROR_BADMAGIC   the "magic number" was not found
1604    
1605         If  the  optptr  argument is not NULL, a copy of the options with which         If the optptr argument is not NULL, a copy of the  options  with  which
1606         the pattern was compiled is placed in the integer  it  points  to  (see         the  pattern  was  compiled  is placed in the integer it points to (see
1607         PCRE_INFO_OPTIONS above).         PCRE_INFO_OPTIONS above).
1608    
1609         If  the  pattern  is  not anchored and the firstcharptr argument is not         If the pattern is not anchored and the  firstcharptr  argument  is  not
1610         NULL, it is used to pass back information about the first character  of         NULL,  it is used to pass back information about the first character of
1611         any matched string (see PCRE_INFO_FIRSTBYTE above).         any matched string (see PCRE_INFO_FIRSTBYTE above).
1612    
1613    
# Line 1552  REFERENCE COUNTS Line 1615  REFERENCE COUNTS
1615    
1616         int pcre_refcount(pcre *code, int adjust);         int pcre_refcount(pcre *code, int adjust);
1617    
1618         The  pcre_refcount()  function is used to maintain a reference count in         The pcre_refcount() function is used to maintain a reference  count  in
1619         the data block that contains a compiled pattern. It is provided for the         the data block that contains a compiled pattern. It is provided for the
1620         benefit  of  applications  that  operate  in an object-oriented manner,         benefit of applications that  operate  in  an  object-oriented  manner,
1621         where different parts of the application may be using the same compiled         where different parts of the application may be using the same compiled
1622         pattern, but you want to free the block when they are all done.         pattern, but you want to free the block when they are all done.
1623    
1624         When a pattern is compiled, the reference count field is initialized to         When a pattern is compiled, the reference count field is initialized to
1625         zero.  It is changed only by calling this function, whose action is  to         zero.   It is changed only by calling this function, whose action is to
1626         add  the  adjust  value  (which may be positive or negative) to it. The         add the adjust value (which may be positive or  negative)  to  it.  The
1627         yield of the function is the new value. However, the value of the count         yield of the function is the new value. However, the value of the count
1628         is  constrained to lie between 0 and 65535, inclusive. If the new value         is constrained to lie between 0 and 65535, inclusive. If the new  value
1629         is outside these limits, it is forced to the appropriate limit value.         is outside these limits, it is forced to the appropriate limit value.
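The clamping rule described above can be written out as a small model. This is a sketch of the documented behaviour only, not PCRE source code, and model_refcount() is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical model of the documented rule: add adjust to the current
   count, then force the result into the range 0..65535. */
int model_refcount(int current, int adjust)
{
    assert(current >= 0 && current <= 65535);
    long long value = (long long)current + adjust;
    if (value < 0)
        value = 0;          /* forced up to the lower limit */
    if (value > 65535)
        value = 65535;      /* forced down to the upper limit */
    return (int)value;
}
```

Under this model, a typical application would adjust the count by +1 when a new user takes a reference and by -1 when it lets go, and free the block when the returned value reaches zero.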
1630    
1631         Except when it is zero, the reference count is not correctly  preserved         Except  when it is zero, the reference count is not correctly preserved
1632         if  a  pattern  is  compiled on one host and then transferred to a host         if a pattern is compiled on one host and then  transferred  to  a  host
1633         whose byte-order is different. (This seems a highly unlikely scenario.)         whose byte-order is different. (This seems a highly unlikely scenario.)
1634    
1635    
# Line 1576  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1639  MATCHING A PATTERN: THE TRADITIONAL FUNC
1639              const char *subject, int length, int startoffset,              const char *subject, int length, int startoffset,
1640              int options, int *ovector, int ovecsize);              int options, int *ovector, int ovecsize);
1641    
1642         The  function pcre_exec() is called to match a subject string against a         The function pcre_exec() is called to match a subject string against  a
1643         compiled pattern, which is passed in the code argument. If the  pattern         compiled  pattern, which is passed in the code argument. If the pattern
1644         has been studied, the result of the study should be passed in the extra         has been studied, the result of the study should be passed in the extra
1645         argument. This function is the main matching facility of  the  library,         argument.  This  function is the main matching facility of the library,
1646         and it operates in a Perl-like manner. For specialist use there is also         and it operates in a Perl-like manner. For specialist use there is also
1647         an alternative matching function, which is described below in the  sec-         an  alternative matching function, which is described below in the sec-
1648         tion about the pcre_dfa_exec() function.         tion about the pcre_dfa_exec() function.
1649    
1650         In  most applications, the pattern will have been compiled (and option-         In most applications, the pattern will have been compiled (and  option-
1651         ally studied) in the same process that calls pcre_exec().  However,  it         ally  studied)  in the same process that calls pcre_exec(). However, it
1652         is possible to save compiled patterns and study data, and then use them         is possible to save compiled patterns and study data, and then use them
1653         later in different processes, possibly even on different hosts.  For  a         later  in  different processes, possibly even on different hosts. For a
1654         discussion about this, see the pcreprecompile documentation.         discussion about this, see the pcreprecompile documentation.
1655    
1656         Here is an example of a simple call to pcre_exec():         Here is an example of a simple call to pcre_exec():
# Line 1606  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1669  MATCHING A PATTERN: THE TRADITIONAL FUNC
1669    
1670     Extra data for pcre_exec()     Extra data for pcre_exec()
1671    
1672         If  the  extra argument is not NULL, it must point to a pcre_extra data         If the extra argument is not NULL, it must point to a  pcre_extra  data
1673         block. The pcre_study() function returns such a block (when it  doesn't         block.  The pcre_study() function returns such a block (when it doesn't
1674         return  NULL), but you can also create one for yourself, and pass addi-         return NULL), but you can also create one for yourself, and pass  addi-
1675         tional information in it. The pcre_extra block contains  the  following         tional  information  in it. The pcre_extra block contains the following
1676         fields (not necessarily in this order):         fields (not necessarily in this order):
1677    
1678           unsigned long int flags;           unsigned long int flags;
# Line 1619  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1682  MATCHING A PATTERN: THE TRADITIONAL FUNC
1682           void *callout_data;           void *callout_data;
1683           const unsigned char *tables;           const unsigned char *tables;
1684    
1685         The  flags  field  is a bitmap that specifies which of the other fields         The flags field is a bitmap that specifies which of  the  other  fields
1686         are set. The flag bits are:         are set. The flag bits are:
1687    
1688           PCRE_EXTRA_STUDY_DATA           PCRE_EXTRA_STUDY_DATA
# Line 1628  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1691  MATCHING A PATTERN: THE TRADITIONAL FUNC
1691           PCRE_EXTRA_CALLOUT_DATA           PCRE_EXTRA_CALLOUT_DATA
1692           PCRE_EXTRA_TABLES           PCRE_EXTRA_TABLES
1693    
1694         Other flag bits should be set to zero. The study_data field is  set  in         Other  flag  bits should be set to zero. The study_data field is set in
1695         the  pcre_extra  block  that is returned by pcre_study(), together with         the pcre_extra block that is returned by  pcre_study(),  together  with
1696         the appropriate flag bit. You should not set this yourself, but you may         the appropriate flag bit. You should not set this yourself, but you may
1697         add  to  the  block by setting the other fields and their corresponding         add to the block by setting the other fields  and  their  corresponding
1698         flag bits.         flag bits.
1699    
1700         The match_limit field provides a means of preventing PCRE from using up         The match_limit field provides a means of preventing PCRE from using up
1701         a  vast amount of resources when running patterns that are not going to         a vast amount of resources when running patterns that are not going  to
1702         match, but which have a very large number  of  possibilities  in  their         match,  but  which  have  a very large number of possibilities in their
1703         search  trees.  The  classic  example  is  the  use of nested unlimited         search trees. The classic  example  is  the  use  of  nested  unlimited
1704         repeats.         repeats.
1705    
1706         Internally, PCRE uses a function called match() which it calls  repeat-         Internally,  PCRE uses a function called match() which it calls repeat-
1707         edly  (sometimes  recursively). The limit set by match_limit is imposed         edly (sometimes recursively). The limit set by match_limit  is  imposed
1708         on the number of times this function is called during  a  match,  which         on  the  number  of times this function is called during a match, which
1709         has  the  effect  of  limiting the amount of backtracking that can take         has the effect of limiting the amount of  backtracking  that  can  take
1710         place. For patterns that are not anchored, the count restarts from zero         place. For patterns that are not anchored, the count restarts from zero
1711         for each position in the subject string.         for each position in the subject string.
1712    
1713         The  default  value  for  the  limit can be set when PCRE is built; the         The default value for the limit can be set  when  PCRE  is  built;  the
1714         default default is 10 million, which handles all but the  most  extreme         default  default  is 10 million, which handles all but the most extreme
1715         cases.  You  can  override  the  default by supplying pcre_exec() with a         cases. You can override the default  by  supplying  pcre_exec()  with  a
1716         pcre_extra    block    in    which    match_limit    is    set,     and         pcre_extra     block    in    which    match_limit    is    set,    and
1717         PCRE_EXTRA_MATCH_LIMIT  is  set  in  the  flags  field. If the limit is         PCRE_EXTRA_MATCH_LIMIT is set in the  flags  field.  If  the  limit  is
1718         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
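To see why a limit on calls is useful, consider a toy backtracking matcher for the fixed pattern (a+)+b, which contains nested unlimited repeats. The sketch below is not PCRE's match() function. The names toy_state and toy_match() are hypothetical, and the return value -1 merely stands in for PCRE_ERROR_MATCHLIMIT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for the toy matcher; the names are not PCRE's. */
struct toy_state {
    unsigned long calls;   /* number of calls made so far */
    unsigned long limit;   /* maximum number of calls allowed */
};

/* Toy backtracking matcher for (a+)+b, anchored at s. Returns 1 on a
   match, 0 on no match, and -1 if the call limit is exceeded. */
int toy_match(const char *s, struct toy_state *st)
{
    assert(s != NULL && st != NULL);
    if (++st->calls > st->limit)
        return -1;

    size_t run = 0;                 /* length of the run of 'a' at s */
    while (s[run] == 'a')
        run++;

    /* Try each length for this repetition of the group, longest first. */
    for (size_t len = run; len >= 1; len--) {
        if (s[len] == 'b')
            return 1;
        int r = toy_match(s + len, st);   /* another repetition */
        if (r != 0)
            return r;               /* a match, or limit exceeded */
    }
    return 0;
}
```

In this toy, a subject of 20 "a" characters followed by "c" cannot match, and proving that takes 2^20 calls, because the number of calls doubles with each extra "a". A limit of 1000 stops the search early, whereas the subject "aaab" matches on the first call.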
1719    
1720         The match_limit_recursion field is similar to match_limit, but  instead         The  match_limit_recursion field is similar to match_limit, but instead
1721         of limiting the total number of times that match() is called, it limits         of limiting the total number of times that match() is called, it limits
1722         the depth of recursion. The recursion depth is a  smaller  number  than         the  depth  of  recursion. The recursion depth is a smaller number than
1723         the  total number of calls, because not all calls to match() are recur-         the total number of calls, because not all calls to match() are  recur-
1724         sive.  This limit is of use only if it is set smaller than match_limit.         sive.  This limit is of use only if it is set smaller than match_limit.
1725    
1726         Limiting  the  recursion  depth  limits the amount of stack that can be         Limiting the recursion depth limits the amount of  stack  that  can  be
1727         used, or, when PCRE has been compiled to use memory on the heap instead         used, or, when PCRE has been compiled to use memory on the heap instead
1728         of the stack, the amount of heap memory that can be used.         of the stack, the amount of heap memory that can be used.
1729    
1730         The  default  value  for  match_limit_recursion can be set when PCRE is         The default value for match_limit_recursion can be  set  when  PCRE  is
1731         built; the default default  is  the  same  value  as  the  default  for         built;  the  default  default  is  the  same  value  as the default for
1732         match_limit.  You can override the default by supplying pcre_exec() with         match_limit. You can override the default by supplying pcre_exec()  with
1733         a  pcre_extra  block  in  which  match_limit_recursion  is   set,   and         a   pcre_extra   block  in  which  match_limit_recursion  is  set,  and
1734         PCRE_EXTRA_MATCH_LIMIT_RECURSION  is  set  in  the  flags field. If the         PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in  the  flags  field.  If  the
1735         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
1736    
1737         The pcre_callout field is used in conjunction with the  "callout"  fea-         The  pcre_callout  field is used in conjunction with the "callout" fea-
1738         ture, which is described in the pcrecallout documentation.         ture, which is described in the pcrecallout documentation.
1739    
1740         The  tables  field  is  used  to  pass  a  character  tables pointer to         The tables field  is  used  to  pass  a  character  tables  pointer  to
1741         pcre_exec(); this overrides the value that is stored with the  compiled         pcre_exec();  this overrides the value that is stored with the compiled
1742         pattern.  A  non-NULL value is stored with the compiled pattern only if         pattern. A non-NULL value is stored with the compiled pattern  only  if
1743         custom tables were supplied to pcre_compile() via  its  tableptr  argu-         custom  tables  were  supplied to pcre_compile() via its tableptr argu-
1744         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
1745         PCRE's internal tables to be used. This facility is  helpful  when  re-         PCRE's  internal  tables  to be used. This facility is helpful when re-
1746         using  patterns  that  have been saved after compiling with an external         using patterns that have been saved after compiling  with  an  external
1747         set of tables, because the external tables  might  be  at  a  different         set  of  tables,  because  the  external tables might be at a different
1748         address  when  pcre_exec() is called. See the pcreprecompile documenta-         address when pcre_exec() is called. See the  pcreprecompile  documenta-
1749         tion for a discussion of saving compiled patterns for later use.         tion for a discussion of saving compiled patterns for later use.
1750    
1751     Option bits for pcre_exec()     Option bits for pcre_exec()
1752    
1753         The unused bits of the options argument for pcre_exec() must  be  zero.         The  unused  bits of the options argument for pcre_exec() must be zero.
1754         The  only  bits  that  may  be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx,         The only bits that may  be  set  are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,
1755         PCRE_NOTBOL,   PCRE_NOTEOL,   PCRE_NOTEMPTY,   PCRE_NO_UTF8_CHECK   and         PCRE_NOTBOL,   PCRE_NOTEOL,   PCRE_NOTEMPTY,   PCRE_NO_UTF8_CHECK   and
1756         PCRE_PARTIAL.         PCRE_PARTIAL.
1757    
1758           PCRE_ANCHORED           PCRE_ANCHORED
1759    
1760         The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first         The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first
1761         matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or         matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or
1762         turned  out to be anchored by virtue of its contents, it cannot be made         turned out to be anchored by virtue of its contents, it cannot be  made
1763         unanchored at matching time.         unanchored at matching time.
1764    
1765           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
1766           PCRE_NEWLINE_LF           PCRE_NEWLINE_LF
1767           PCRE_NEWLINE_CRLF           PCRE_NEWLINE_CRLF
1768             PCRE_NEWLINE_ANYCRLF
1769           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
1770    
1771         These options override  the  newline  definition  that  was  chosen  or         These  options  override  the  newline  definition  that  was chosen or
1772         defaulted  when the pattern was compiled. For details, see the descrip-         defaulted when the pattern was compiled. For details, see the  descrip-
1773         tion of pcre_compile()  above.  During  matching,  the  newline  choice         tion  of  pcre_compile()  above.  During  matching,  the newline choice
1774         affects  the  behaviour  of the dot, circumflex, and dollar metacharac-         affects the behaviour of the dot, circumflex,  and  dollar  metacharac-
1775         ters. It may also alter the way the match position is advanced after  a         ters.  It may also alter the way the match position is advanced after a
1776         match  failure  for  an  unanchored  pattern. When PCRE_NEWLINE_CRLF or         match  failure  for  an  unanchored  pattern.  When  PCRE_NEWLINE_CRLF,
1777         PCRE_NEWLINE_ANY is set, and a match attempt  fails  when  the  current         PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY is set, and a match attempt
1778         position  is  at a CRLF sequence, the match position is advanced by two         fails when the current position is at a CRLF sequence, the match  posi-
1779         characters instead of one, in other words, to after the CRLF.         tion  is  advanced by two characters instead of one, in other words, to
1780           after the CRLF.
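The rule for advancing the match position can be sketched as a small function. This is a model of the documented behaviour, not PCRE source code. The name next_start_offset() and its crlf_is_newline parameter are hypothetical, and the sketch counts bytes, so it ignores UTF-8 mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper modelling the documented rule. crlf_is_newline is
   nonzero when CRLF counts as a newline, as it does when
   PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set. */
size_t next_start_offset(const char *subject, size_t length,
                         size_t offset, int crlf_is_newline)
{
    assert(subject != NULL && offset < length);
    if (crlf_is_newline &&
        offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;   /* skip the whole CRLF pair */
    return offset + 1;       /* ordinary one-character advance */
}
```

For the subject "ab\r\ncd", a failed attempt at offset 2 would next be retried at offset 4 when CRLF is a newline, and at offset 3 otherwise.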
1781    
1782           PCRE_NOTBOL           PCRE_NOTBOL
1783    
# Line 2132  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 2197  EXTRACTING CAPTURED SUBSTRINGS BY NAME
2197    
2198         These  functions call pcre_get_stringnumber(), and if it succeeds, they         These  functions call pcre_get_stringnumber(), and if it succeeds, they
2199         then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-         then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-
2200         ate.         ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the
2201           behaviour may not be what you want (see the next section).
2202    
2203    
2204  DUPLICATE SUBPATTERN NAMES  DUPLICATE SUBPATTERN NAMES
# Line 2140  DUPLICATE SUBPATTERN NAMES Line 2206  DUPLICATE SUBPATTERN NAMES
2206         int pcre_get_stringtable_entries(const pcre *code,         int pcre_get_stringtable_entries(const pcre *code,
2207              const char *name, char **first, char **last);              const char *name, char **first, char **last);
2208    
2209         When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for         When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for
2210         subpatterns are not required to  be  unique.  Normally,  patterns  with         subpatterns  are  not  required  to  be unique. Normally, patterns with
2211         duplicate  names  are such that in any one match, only one of the named         duplicate names are such that in any one match, only one of  the  named
2212         subpatterns participates. An example is shown in the pcrepattern  docu-         subpatterns  participates. An example is shown in the pcrepattern docu-
2213         mentation. When duplicates are present, pcre_copy_named_substring() and         mentation. When duplicates are present, pcre_copy_named_substring() and
2214         pcre_get_named_substring() return the first substring corresponding  to         pcre_get_named_substring()  return the first substring corresponding to
2215         the  given  name  that  is  set.  If  none  are set, an empty string is         the given name that is set.  If  none  are  set,  an  empty  string  is
2216         returned.  The pcre_get_stringnumber() function returns one of the num-         returned.  The pcre_get_stringnumber() function returns one of the num-
2217         bers  that are associated with the name, but it is not defined which it         bers that are associated with the name, but it is not defined which  it
2218         is.         is.
2219    
2220         If you want to get full details of all captured substrings for a  given         If  you want to get full details of all captured substrings for a given
2221         name,  you  must  use  the pcre_get_stringtable_entries() function. The         name, you must use  the  pcre_get_stringtable_entries()  function.  The
2222         first argument is the compiled pattern, and the second is the name. The         first argument is the compiled pattern, and the second is the name. The
2223         third  and  fourth  are  pointers to variables which are updated by the         third and fourth are pointers to variables which  are  updated  by  the
2224         function. After it has run, they point to the first and last entries in         function. After it has run, they point to the first and last entries in
2225         the  name-to-number  table  for  the  given  name.  The function itself         the name-to-number table  for  the  given  name.  The  function  itself
2226         returns the length of each entry,  or  PCRE_ERROR_NOSUBSTRING  (-7)  if         returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
2227         there  are none. The format of the table is described above in the sec-         there are none. The format of the table is described above in the  sec-
2228         tion entitled Information about a  pattern.   Given  all  the  relevant         tion  entitled  Information  about  a  pattern.  Given all the relevant
2229         entries  for the name, you can extract each of their numbers, and hence         entries for the name, you can extract each of their numbers, and  hence
2230         the captured data, if any.         the captured data, if any.
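Extracting the numbers from the entries can be sketched as follows. The helper collect_group_numbers() is hypothetical and is not part of PCRE. It assumes the table layout given in the section on information about a pattern: each entry begins with a two-byte group number, most significant byte first, followed by the zero-terminated name. The first and last pointers and the entry length are taken to be the values produced by pcre_get_stringtable_entries().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walks the entries from first to last inclusive,
   stepping by entry_size, and stores each group number in numbers[].
   Returns how many numbers were stored (at most max_numbers). */
int collect_group_numbers(const char *first, const char *last,
                          int entry_size, int *numbers, int max_numbers)
{
    assert(first != NULL && last != NULL && numbers != NULL);
    assert(entry_size > 2 && first <= last);

    int count = 0;
    for (const char *entry = first;
         entry <= last && count < max_numbers;
         entry += entry_size) {
        const unsigned char *p = (const unsigned char *)entry;
        numbers[count++] = (p[0] << 8) | p[1];   /* big-endian number */
    }
    return count;
}
```

Each number obtained this way can then be used to look up the corresponding pair of offsets in the ovector that pcre_exec() filled in, to find out whether that group was set.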
2231    
2232    
2233  FINDING ALL POSSIBLE MATCHES  FINDING ALL POSSIBLE MATCHES
2234    
2235         The traditional matching function uses a  similar  algorithm  to  Perl,         The  traditional  matching  function  uses a similar algorithm to Perl,
2236         which stops when it finds the first match, starting at a given point in         which stops when it finds the first match, starting at a given point in
2237         the subject. If you want to find all possible matches, or  the  longest         the  subject.  If you want to find all possible matches, or the longest
2238         possible  match,  consider using the alternative matching function (see         possible match, consider using the alternative matching  function  (see
2239         below) instead. If you cannot use the alternative function,  but  still         below)  instead.  If you cannot use the alternative function, but still
2240         need  to  find all possible matches, you can kludge it up by making use         need to find all possible matches, you can kludge it up by  making  use
2241         of the callout facility, which is described in the pcrecallout documen-         of the callout facility, which is described in the pcrecallout documen-
2242         tation.         tation.
2243    
2244         What you have to do is to insert a callout right at the end of the pat-         What you have to do is to insert a callout right at the end of the pat-
2245         tern.  When your callout function is called, extract and save the  cur-         tern.   When your callout function is called, extract and save the cur-
2246         rent  matched  substring.  Then  return  1, which forces pcre_exec() to         rent matched substring. Then return  1,  which  forces  pcre_exec()  to
2247         backtrack and try other alternatives. Ultimately, when it runs  out  of         backtrack  and  try other alternatives. Ultimately, when it runs out of
2248         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
2249    
2250    
# Line 2189  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2255  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2255              int options, int *ovector, int ovecsize,              int options, int *ovector, int ovecsize,
2256              int *workspace, int wscount);              int *workspace, int wscount);
2257    
2258         The  function  pcre_dfa_exec()  is  called  to  match  a subject string         The function pcre_dfa_exec()  is  called  to  match  a  subject  string
2259         against a compiled pattern, using a matching algorithm that  scans  the         against  a  compiled pattern, using a matching algorithm that scans the
2260         subject  string  just  once, and does not backtrack. This has different         subject string just once, and does not backtrack.  This  has  different
2261         characteristics to the normal algorithm, and  is  not  compatible  with         characteristics  to  the  normal  algorithm, and is not compatible with
2262         Perl.  Some  of the features of PCRE patterns are not supported. Never-         Perl. Some of the features of PCRE patterns are not  supported.  Never-
2263         theless, there are times when this kind of matching can be useful.  For         theless,  there are times when this kind of matching can be useful. For
2264         a discussion of the two matching algorithms, see the pcrematching docu-         a discussion of the two matching algorithms, see the pcrematching docu-
2265         mentation.         mentation.
2266    
2267         The arguments for the pcre_dfa_exec() function  are  the  same  as  for         The  arguments  for  the  pcre_dfa_exec()  function are the same as for
2268         pcre_exec(), plus two extras. The ovector argument is used in a differ-         pcre_exec(), plus two extras. The ovector argument is used in a differ-
2269         ent way, and this is described below. The other  common  arguments  are         ent  way,  and  this is described below. The other common arguments are
2270         used  in  the  same way as for pcre_exec(), so their description is not         used in the same way as for pcre_exec(), so their  description  is  not
2271         repeated here.         repeated here.
2272    
2273         The two additional arguments provide workspace for  the  function.  The         The  two  additional  arguments provide workspace for the function. The
2274         workspace  vector  should  contain at least 20 elements. It is used for         workspace vector should contain at least 20 elements. It  is  used  for
2275         keeping  track  of  multiple  paths  through  the  pattern  tree.  More         keeping  track  of  multiple  paths  through  the  pattern  tree.  More
2276         workspace  will  be  needed for patterns and subjects where there are a         workspace will be needed for patterns and subjects where  there  are  a
2277         lot of potential matches.         lot of potential matches.
2278    
2279         Here is an example of a simple call to pcre_dfa_exec():         Here is an example of a simple call to pcre_dfa_exec():
# Line 2229  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2295  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2295    
2296     Option bits for pcre_dfa_exec()     Option bits for pcre_dfa_exec()
2297    
2298         The unused bits of the options argument  for  pcre_dfa_exec()  must  be         The  unused  bits  of  the options argument for pcre_dfa_exec() must be
2299         zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-         zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-
2300         LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,  PCRE_NO_UTF8_CHECK,         LINE_xxx,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK,
2301         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
2302         three of these are the same as for pcre_exec(), so their description is         three of these are the same as for pcre_exec(), so their description is
2303         not repeated here.         not repeated here.
2304    
2305           PCRE_PARTIAL           PCRE_PARTIAL
2306    
2307         This  has  the  same general effect as it does for pcre_exec(), but the         This has the same general effect as it does for  pcre_exec(),  but  the
2308         details  are  slightly  different.  When  PCRE_PARTIAL   is   set   for         details   are   slightly   different.  When  PCRE_PARTIAL  is  set  for
2309         pcre_dfa_exec(),  the  return code PCRE_ERROR_NOMATCH is converted into         pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is  converted  into
2310         PCRE_ERROR_PARTIAL if the end of the subject  is  reached,  there  have         PCRE_ERROR_PARTIAL  if  the  end  of the subject is reached, there have
2311         been no complete matches, but there is still at least one matching pos-         been no complete matches, but there is still at least one matching pos-
2312         sibility. The portion of the string that provided the partial match  is         sibility.  The portion of the string that provided the partial match is
2313         set as the first matching string.         set as the first matching string.
2314    
2315           PCRE_DFA_SHORTEST           PCRE_DFA_SHORTEST
2316    
2317         Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to         Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to
2318         stop as soon as it has found one match. Because of the way the alterna-         stop as soon as it has found one match. Because of the way the alterna-
2319         tive  algorithm  works, this is necessarily the shortest possible match         tive algorithm works, this is necessarily the shortest  possible  match
2320         at the first possible matching point in the subject string.         at the first possible matching point in the subject string.
2321    
2322           PCRE_DFA_RESTART           PCRE_DFA_RESTART
2323    
2324         When pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option,  and         When  pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option, and
2325         returns  a  partial  match, it is possible to call it again, with addi-         returns a partial match, it is possible to call it  again,  with  addi-
2326         tional subject characters, and have it continue with  the  same  match.         tional  subject  characters,  and have it continue with the same match.
2327         The  PCRE_DFA_RESTART  option requests this action; when it is set, the         The PCRE_DFA_RESTART option requests this action; when it is  set,  the
2328         workspace and wscount options must reference the same vector as  before         workspace  and wscount options must reference the same vector as before
2329         because  data  about  the  match so far is left in them after a partial         because data about the match so far is left in  them  after  a  partial
2330         match. There is more discussion of this  facility  in  the  pcrepartial         match.  There  is  more  discussion of this facility in the pcrepartial
2331         documentation.         documentation.
2332    
2333     Successful returns from pcre_dfa_exec()     Successful returns from pcre_dfa_exec()
2334    
2335         When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-         When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-
2336         string in the subject. Note, however, that all the matches from one run         string in the subject. Note, however, that all the matches from one run
2337         of  the  function  start  at the same point in the subject. The shorter         of the function start at the same point in  the  subject.  The  shorter
2338         matches are all initial substrings of the longer matches. For  example,         matches  are all initial substrings of the longer matches. For example,
2339         if the pattern         if the pattern
2340    
2341           <.*>           <.*>
# Line 2284  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2350  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2350           <something> <something else>           <something> <something else>
2351           <something> <something else> <something further>           <something> <something else> <something further>
2352    
2353         On  success,  the  yield of the function is a number greater than zero,         On success, the yield of the function is a number  greater  than  zero,
2354         which is the number of matched substrings.  The  substrings  themselves         which  is  the  number of matched substrings. The substrings themselves
2355         are  returned  in  ovector. Each string uses two elements; the first is         are returned in ovector. Each string uses two elements;  the  first  is
2356         the offset to the start, and the second is the offset to  the  end.  In         the  offset  to  the start, and the second is the offset to the end. In
2357         fact,  all  the  strings  have the same start offset. (Space could have         fact, all the strings have the same start  offset.  (Space  could  have
2358         been saved by giving this only once, but it was decided to retain  some         been  saved by giving this only once, but it was decided to retain some
2359         compatibility  with  the  way pcre_exec() returns data, even though the         compatibility with the way pcre_exec() returns data,  even  though  the
2360         meaning of the strings is different.)         meaning of the strings is different.)
2361    
2362         The strings are returned in reverse order of length; that is, the long-         The strings are returned in reverse order of length; that is, the long-
2363         est  matching  string is given first. If there were too many matches to         est matching string is given first. If there were too many  matches  to
2364         fit into ovector, the yield of the function is zero, and the vector  is         fit  into ovector, the yield of the function is zero, and the vector is
2365         filled with the longest matches.         filled with the longest matches.
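
The layout of the returned offsets can be illustrated without calling PCRE at all. The following is a minimal sketch, assuming an ovector that has already been filled in the way the text describes (start and end offset pairs, longest match first). The helper names copy_dfa_match() and print_dfa_matches() are hypothetical and are not part of the PCRE API.

```c
#include <stdio.h>
#include <string.h>

/* Copy match number `index` out of a pcre_dfa_exec()-style result.
 * ovector holds (start, end) offset pairs, longest match first.
 * Returns the match length, or -1 if it does not fit in buf.
 * (Hypothetical helper, not a PCRE function.) */
int copy_dfa_match(const char *subject, const int *ovector,
                   int index, char *buf, size_t bufsize)
{
    int start = ovector[2 * index];      /* first element: start offset */
    int end = ovector[2 * index + 1];    /* second element: end offset  */
    int len = end - start;

    if (len < 0 || (size_t)len + 1 > bufsize)
        return -1;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}

/* Print every match from a successful call, where rc > 0 is the
 * number of matched substrings. */
void print_dfa_matches(const char *subject, const int *ovector, int rc)
{
    char buf[256];
    int i;

    for (i = 0; i < rc; i++) {
        if (copy_dfa_match(subject, ovector, i, buf, sizeof buf) >= 0)
            printf("match %d: %s\n", i, buf);
    }
}
```

A caller would pass the yield of pcre_dfa_exec() as rc when it is greater than zero. A yield of zero means that ovector was too small, so the number of pairs that were filled has to be derived from ovecsize instead.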
2366    
2367     Error returns from pcre_dfa_exec()     Error returns from pcre_dfa_exec()
2368    
2369         The  pcre_dfa_exec()  function returns a negative number when it fails.         The pcre_dfa_exec() function returns a negative number when  it  fails.
2370         Many of the errors are the same  as  for  pcre_exec(),  and  these  are         Many  of  the  errors  are  the  same as for pcre_exec(), and these are
2371         described  above.   There are in addition the following errors that are         described above.  There are in addition the following errors  that  are
2372         specific to pcre_dfa_exec():         specific to pcre_dfa_exec():
2373    
2374           PCRE_ERROR_DFA_UITEM      (-16)           PCRE_ERROR_DFA_UITEM      (-16)
2375    
2376         This return is given if pcre_dfa_exec() encounters an item in the  pat-         This  return is given if pcre_dfa_exec() encounters an item in the pat-
2377         tern  that  it  does not support, for instance, the use of \C or a back         tern that it does not support, for instance, the use of \C  or  a  back
2378         reference.         reference.
2379    
2380           PCRE_ERROR_DFA_UCOND      (-17)           PCRE_ERROR_DFA_UCOND      (-17)
2381    
2382         This return is given if pcre_dfa_exec()  encounters  a  condition  item         This  return  is  given  if pcre_dfa_exec() encounters a condition item
2383         that  uses  a back reference for the condition, or a test for recursion         that uses a back reference for the condition, or a test  for  recursion
2384         in a specific group. These are not supported.         in a specific group. These are not supported.
2385    
2386           PCRE_ERROR_DFA_UMLIMIT    (-18)           PCRE_ERROR_DFA_UMLIMIT    (-18)
2387    
2388         This return is given if pcre_dfa_exec() is called with an  extra  block         This  return  is given if pcre_dfa_exec() is called with an extra block
2389         that contains a setting of the match_limit field. This is not supported         that contains a setting of the match_limit field. This is not supported
2390         (it is meaningless).         (it is meaningless).
2391    
2392           PCRE_ERROR_DFA_WSSIZE     (-19)           PCRE_ERROR_DFA_WSSIZE     (-19)
2393    
2394         This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the         This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
2395         workspace vector.         workspace vector.
2396    
2397           PCRE_ERROR_DFA_RECURSE    (-20)           PCRE_ERROR_DFA_RECURSE    (-20)
2398    
2399         When  a  recursive subpattern is processed, the matching function calls         When a recursive subpattern is processed, the matching  function  calls
2400         itself recursively, using private vectors for  ovector  and  workspace.         itself  recursively,  using  private vectors for ovector and workspace.
2401         This  error  is  given  if  the output vector is not large enough. This         This error is given if the output vector  is  not  large  enough.  This
2402         should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
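
As an illustration, the error numbers listed above can be turned into short diagnostic messages. This is only a sketch: the numeric values are the ones given in the text, while the enum names and the function dfa_error_text() are hypothetical and chosen so that they do not clash with the macros that pcre.h itself defines.

```c
#include <stddef.h>

/* Numeric values as listed in the text above. Real programs should use
 * the PCRE_ERROR_DFA_xxx macros from pcre.h instead of these local names. */
enum {
    DFA_ERR_UITEM   = -16,
    DFA_ERR_UCOND   = -17,
    DFA_ERR_UMLIMIT = -18,
    DFA_ERR_WSSIZE  = -19,
    DFA_ERR_RECURSE = -20
};

/* Return a short description for an error specific to pcre_dfa_exec(),
 * or NULL for any other return value. (Hypothetical helper.) */
const char *dfa_error_text(int rc)
{
    switch (rc) {
    case DFA_ERR_UITEM:
        return "unsupported item in pattern";
    case DFA_ERR_UCOND:
        return "unsupported condition in pattern";
    case DFA_ERR_UMLIMIT:
        return "match_limit set in extra block";
    case DFA_ERR_WSSIZE:
        return "workspace vector too small";
    case DFA_ERR_RECURSE:
        return "output vector too small during recursion";
    default:
        return NULL;
    }
}
```

Of these, only the workspace error can be addressed at run time, by retrying with a larger workspace vector. The others generally indicate that the pattern or the extra block is unsuitable for pcre_dfa_exec().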
2403    
2404    
2405  SEE ALSO  SEE ALSO
2406    
2407         pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),  pcrepar-         pcrebuild(3),  pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepar-
2408         tial(3),  pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).         tial(3), pcreposix(3), pcreprecompile(3), pcresample(3),  pcrestack(3).
2409    
2410    
2411  AUTHOR  AUTHOR
# Line 2351  AUTHOR Line 2417  AUTHOR
2417    
2418  REVISION  REVISION
2419    
2420         Last updated: 06 March 2007         Last updated: 13 June 2007
2421         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
2422  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2423    
# Line 2379  PCRE CALLOUTS Line 2445  PCRE CALLOUTS
2445         default value is zero.  For  example,  this  pattern  has  two  callout         default value is zero.  For  example,  this  pattern  has  two  callout
2446         points:         points:
2447    
2448           (?C1)eabc(?C2)def           (?C1)abc(?C2)def
2449    
2450         If  the  PCRE_AUTO_CALLOUT  option  bit  is  set when pcre_compile() is         If  the  PCRE_AUTO_CALLOUT  option  bit  is  set when pcre_compile() is
2451         called, PCRE automatically  inserts  callouts,  all  with  number  255,         called, PCRE automatically  inserts  callouts,  all  with  number  255,
# Line 2454  THE CALLOUT INTERFACE Line 2520  THE CALLOUT INTERFACE
2520         The subject and subject_length fields contain copies of the values that         The subject and subject_length fields contain copies of the values that
2521         were passed to pcre_exec().         were passed to pcre_exec().
2522    
2523         The start_match field contains the offset within the subject  at  which         The start_match field normally contains the offset within  the  subject
2524         the  current match attempt started. If the pattern is not anchored, the         at  which  the  current  match  attempt started. However, if the escape
2525         callout function may be called several times from the same point in the         sequence \K has been encountered, this value is changed to reflect  the
2526         pattern for different starting points in the subject.         modified  starting  point.  If the pattern is not anchored, the callout
2527           function may be called several times from the same point in the pattern
2528           for different starting points in the subject.
2529    
2530         The  current_position  field  contains the offset within the subject of         The  current_position  field  contains the offset within the subject of
2531         the current match pointer.         the current match pointer.
# Line 2520  AUTHOR Line 2588  AUTHOR
2588    
2589  REVISION  REVISION
2590    
2591         Last updated: 06 March 2007         Last updated: 29 May 2007
2592         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
2593  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2594    
# Line 2536  DIFFERENCES BETWEEN PCRE AND PERL Line 2604  DIFFERENCES BETWEEN PCRE AND PERL
2604    
2605         This  document describes the differences in the ways that PCRE and Perl         This  document describes the differences in the ways that PCRE and Perl
2606         handle regular expressions. The differences described here  are  mainly         handle regular expressions. The differences described here  are  mainly
2607         with  respect  to  Perl 5.8, though PCRE version 7.0 contains some fea-         with  respect  to  Perl 5.8, though PCRE versions 7.0 and later contain
2608         tures that are expected to be in the forthcoming Perl 5.10.         some features that are expected to be in the forthcoming Perl 5.10.
2609    
2610         1. PCRE has only a subset of Perl's UTF-8 and Unicode support.  Details         1. PCRE has only a subset of Perl's UTF-8 and Unicode support.  Details
2611         of  what  it does have are given in the section on UTF-8 support in the         of  what  it does have are given in the section on UTF-8 support in the
# Line 2615  DIFFERENCES BETWEEN PCRE AND PERL Line 2683  DIFFERENCES BETWEEN PCRE AND PERL
2683         meta-character matches only at the very end of the string.         meta-character matches only at the very end of the string.
2684    
2685         (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-         (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
2686         cial  meaning  is  faulted.  Otherwise,  like  Perl,  the  backslash is         cial meaning is faulted. Otherwise, like Perl, the backslash is quietly
2687         ignored. (Perl can be made to issue a warning.)         ignored.  (Perl can be made to issue a warning.)
2688    
2689         (d) If PCRE_UNGREEDY is set, the greediness of the  repetition  quanti-         (d) If PCRE_UNGREEDY is set, the greediness of the  repetition  quanti-
2690         fiers is inverted, that is, by default they are not greedy, but if fol-         fiers is inverted, that is, by default they are not greedy, but if fol-
# Line 2648  AUTHOR Line 2716  AUTHOR
2716    
2717  REVISION  REVISION
2718    
2719         Last updated: 06 March 2007         Last updated: 13 June 2007
2720         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
2721  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2722    
# Line 2681  PCRE REGULAR EXPRESSION DETAILS Line 2749  PCRE REGULAR EXPRESSION DETAILS
2749         ported  by  PCRE when its main matching function, pcre_exec(), is used.         ported  by  PCRE when its main matching function, pcre_exec(), is used.
2750         From  release  6.0,   PCRE   offers   a   second   matching   function,         From  release  6.0,   PCRE   offers   a   second   matching   function,
2751         pcre_dfa_exec(),  which matches using a different algorithm that is not         pcre_dfa_exec(),  which matches using a different algorithm that is not
2752         Perl-compatible. The advantages and disadvantages  of  the  alternative         Perl-compatible. Some of the features discussed below are not available
2753         function, and how it differs from the normal function, are discussed in         when  pcre_dfa_exec()  is used. The advantages and disadvantages of the
2754         the pcrematching page.         alternative function, and how it differs from the normal function,  are
2755           discussed in the pcrematching page.
2756    
2757    
2758  CHARACTERS AND METACHARACTERS  CHARACTERS AND METACHARACTERS
2759    
2760         A regular expression is a pattern that is  matched  against  a  subject         A  regular  expression  is  a pattern that is matched against a subject
2761         string  from  left  to right. Most characters stand for themselves in a         string from left to right. Most characters stand for  themselves  in  a
2762         pattern, and match the corresponding characters in the  subject.  As  a         pattern,  and  match  the corresponding characters in the subject. As a
2763         trivial example, the pattern         trivial example, the pattern
2764    
2765           The quick brown fox           The quick brown fox
2766    
2767         matches a portion of a subject string that is identical to itself. When         matches a portion of a subject string that is identical to itself. When
2768         caseless matching is specified (the PCRE_CASELESS option), letters  are         caseless  matching is specified (the PCRE_CASELESS option), letters are
2769         matched  independently  of case. In UTF-8 mode, PCRE always understands         matched independently of case. In UTF-8 mode, PCRE  always  understands
2770         the concept of case for characters whose values are less than  128,  so         the  concept  of case for characters whose values are less than 128, so
2771         caseless  matching  is always possible. For characters with higher val-         caseless matching is always possible. For characters with  higher  val-
2772         ues, the concept of case is supported if PCRE is compiled with  Unicode         ues,  the concept of case is supported if PCRE is compiled with Unicode
2773         property  support,  but  not  otherwise.   If  you want to use caseless         property support, but not otherwise.   If  you  want  to  use  caseless
2774         matching for characters 128 and above, you must  ensure  that  PCRE  is         matching  for  characters  128  and above, you must ensure that PCRE is
2775         compiled with Unicode property support as well as with UTF-8 support.         compiled with Unicode property support as well as with UTF-8 support.
2776    
2777         The  power  of  regular  expressions  comes from the ability to include         The power of regular expressions comes  from  the  ability  to  include
2778         alternatives and repetitions in the pattern. These are encoded  in  the         alternatives  and  repetitions in the pattern. These are encoded in the
2779         pattern by the use of metacharacters, which do not stand for themselves         pattern by the use of metacharacters, which do not stand for themselves
2780         but instead are interpreted in some special way.         but instead are interpreted in some special way.
2781    
2782         There are two different sets of metacharacters: those that  are  recog-         There  are  two different sets of metacharacters: those that are recog-
2783         nized  anywhere in the pattern except within square brackets, and those         nized anywhere in the pattern except within square brackets, and  those
2784         that are recognized within square brackets.  Outside  square  brackets,         that  are  recognized  within square brackets. Outside square brackets,
2785         the metacharacters are as follows:         the metacharacters are as follows:
2786    
2787           \      general escape character with several uses           \      general escape character with several uses
# Line 2731  CHARACTERS AND METACHARACTERS Line 2800  CHARACTERS AND METACHARACTERS
2800                  also "possessive quantifier"                  also "possessive quantifier"
2801           {      start min/max quantifier           {      start min/max quantifier
2802    
2803         Part  of  a  pattern  that is in square brackets is called a "character         Part of a pattern that is in square brackets  is  called  a  "character
2804         class". In a character class the only metacharacters are:         class". In a character class the only metacharacters are:
2805    
2806           \      general escape character           \      general escape character
# Line 2741  CHARACTERS AND METACHARACTERS Line 2810  CHARACTERS AND METACHARACTERS
2810                    syntax)                    syntax)
2811           ]      terminates the character class           ]      terminates the character class
2812    
2813         The following sections describe the use of each of the  metacharacters.         The  following sections describe the use of each of the metacharacters.
2814    
2815    
2816  BACKSLASH  BACKSLASH
2817    
2818         The backslash character has several uses. Firstly, if it is followed by         The backslash character has several uses. Firstly, if it is followed by
2819         a non-alphanumeric character, it takes away any  special  meaning  that         a  non-alphanumeric  character,  it takes away any special meaning that
2820         character  may  have.  This  use  of  backslash  as an escape character         character may have. This  use  of  backslash  as  an  escape  character
2821         applies both inside and outside character classes.         applies both inside and outside character classes.
2822    
2823         For example, if you want to match a * character, you write  \*  in  the         For  example,  if  you want to match a * character, you write \* in the
2824         pattern.   This  escaping  action  applies whether or not the following         pattern.  This escaping action applies whether  or  not  the  following
2825         character would otherwise be interpreted as a metacharacter, so  it  is         character  would  otherwise be interpreted as a metacharacter, so it is
2826         always  safe  to  precede  a non-alphanumeric with backslash to specify         always safe to precede a non-alphanumeric  with  backslash  to  specify
2827         that it stands for itself. In particular, if you want to match a  back-         that  it stands for itself. In particular, if you want to match a back-
2828         slash, you write \\.         slash, you write \\.
2829    
2830         If  a  pattern is compiled with the PCRE_EXTENDED option, whitespace in         If a pattern is compiled with the PCRE_EXTENDED option,  whitespace  in
2831         the pattern (other than in a character class) and characters between  a         the  pattern (other than in a character class) and characters between a
2832         # outside a character class and the next newline are ignored. An escap-         # outside a character class and the next newline are ignored. An escap-
2833         ing backslash can be used to include a whitespace  or  #  character  as         ing  backslash  can  be  used to include a whitespace or # character as
2834         part of the pattern.         part of the pattern.
2835    
2836         If  you  want  to remove the special meaning from a sequence of charac-         If you want to remove the special meaning from a  sequence  of  charac-
2837         ters, you can do so by putting them between \Q and \E. This is  differ-         ters,  you can do so by putting them between \Q and \E. This is differ-
2838         ent  from  Perl  in  that  $  and  @ are handled as literals in \Q...\E         ent from Perl in that $ and  @  are  handled  as  literals  in  \Q...\E
2839         sequences in PCRE, whereas in Perl, $ and @ cause  variable  interpola-         sequences  in  PCRE, whereas in Perl, $ and @ cause variable interpola-
2840         tion. Note the following examples:         tion. Note the following examples:
2841    
2842           Pattern            PCRE matches   Perl matches           Pattern            PCRE matches   Perl matches
# Line 2777  BACKSLASH Line 2846  BACKSLASH
2846           \Qabc\$xyz\E       abc\$xyz       abc\$xyz           \Qabc\$xyz\E       abc\$xyz       abc\$xyz
2847           \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz           \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
2848    
2849         The  \Q...\E  sequence  is recognized both inside and outside character         The \Q...\E sequence is recognized both inside  and  outside  character
2850         classes.         classes.
2851    
2852     Non-printing characters     Non-printing characters
2853    
2854         A second use of backslash provides a way of encoding non-printing char-         A second use of backslash provides a way of encoding non-printing char-
2855         acters  in patterns in a visible manner. There is no restriction on the         acters in patterns in a visible manner. There is no restriction on  the
2856         appearance of non-printing characters, apart from the binary zero  that         appearance  of non-printing characters, apart from the binary zero that
2857         terminates  a  pattern,  but  when  a pattern is being prepared by text         terminates a pattern, but when a pattern  is  being  prepared  by  text
2858         editing, it is usually easier  to  use  one  of  the  following  escape         editing,  it  is  usually  easier  to  use  one of the following escape
2859         sequences than the binary character it represents:         sequences than the binary character it represents:
2860    
2861           \a        alarm, that is, the BEL character (hex 07)           \a        alarm, that is, the BEL character (hex 07)
# Line 2800  BACKSLASH Line 2869  BACKSLASH
2869           \xhh      character with hex code hh           \xhh      character with hex code hh
2870           \x{hhh..} character with hex code hhh..           \x{hhh..} character with hex code hhh..
2871    
2872         The  precise  effect of \cx is as follows: if x is a lower case letter,         The precise effect of \cx is as follows: if x is a lower  case  letter,
2873         it is converted to upper case. Then bit 6 of the character (hex 40)  is         it  is converted to upper case. Then bit 6 of the character (hex 40) is
2874         inverted.   Thus  \cz becomes hex 1A, but \c{ becomes hex 3B, while \c;         inverted.  Thus \cz becomes hex 1A, but \c{ becomes hex 3B,  while  \c;
2875         becomes hex 7B.         becomes hex 7B.
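
The \cx rule is simple enough to express in a few lines of C. The sketch below assumes an ASCII character set, and the function name control_escape() is hypothetical. It is not taken from the PCRE source.

```c
/* Compute the character that \cx stands for, following the rule in the
 * text: lower case letters are converted to upper case, then bit 6
 * (hex 40) is inverted. Assumes ASCII. (Hypothetical helper.) */
int control_escape(int x)
{
    if (x >= 'a' && x <= 'z')
        x = x - 'a' + 'A';
    return x ^ 0x40;
}
```

With this function, control_escape('z') yields hex 1A, control_escape('{') yields hex 3B, and control_escape(';') yields hex 7B, in agreement with the examples above.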
2876    
2877         After \x, from zero to two hexadecimal digits are read (letters can  be         After  \x, from zero to two hexadecimal digits are read (letters can be
2878         in  upper  or  lower case). Any number of hexadecimal digits may appear         in upper or lower case). Any number of hexadecimal  digits  may  appear
2879         between \x{ and }, but the value of the character  code  must  be  less         between  \x{  and  },  but the value of the character code must be less
2880         than 256 in non-UTF-8 mode, and less than 2**31 in UTF-8 mode (that is,         than 256 in non-UTF-8 mode, and less than 2**31 in UTF-8 mode (that is,
2881         the maximum hexadecimal value is 7FFFFFFF). If  characters  other  than         the  maximum  hexadecimal  value is 7FFFFFFF). If characters other than
2882         hexadecimal  digits  appear between \x{ and }, or if there is no termi-         hexadecimal digits appear between \x{ and }, or if there is  no  termi-
2883         nating }, this form of escape is not recognized.  Instead, the  initial         nating  }, this form of escape is not recognized.  Instead, the initial
2884         \x will be interpreted as a basic hexadecimal escape, with no following         \x will be interpreted as a basic hexadecimal escape, with no following
2885         digits, giving a character whose value is zero.         digits, giving a character whose value is zero.
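
       The fallback behaviour for a malformed \x{...} can be sketched in
       Python as follows. This is only an illustration of the rules stated
       above, not PCRE's own code: read_hex_escape is a hypothetical helper,
       the range limits (256, or 2**31 in UTF-8 mode) are not checked, and
       treating empty braces as "not recognized" is an assumption.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def read_hex_escape(pattern: str, pos: int):
    """pattern[pos:] must start with a backslash followed by 'x'.

    Returns (character value, index of the first unconsumed character).
    """
    i = pos + 2  # skip the backslash and the 'x'
    if pattern.startswith("{", i):
        close = pattern.find("}", i)
        body = pattern[i + 1:close] if close != -1 else ""
        # The braced form is recognized only if it is terminated and
        # contains nothing but hexadecimal digits.
        if body and all(c in HEX_DIGITS for c in body):
            return int(body, 16), close + 1
    # Basic form: from zero to two hexadecimal digits are read.
    end = i
    while end < len(pattern) and end - i < 2 and pattern[end] in HEX_DIGITS:
        end += 1
    value = int(pattern[i:end], 16) if end > i else 0
    return value, end
```

       For the input \x{zz} the sketch returns the value zero and leaves the
       brace and everything after it to be read as ordinary pattern text.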
2886    
2887         Characters whose value is less than 256 can be defined by either of the         Characters whose value is less than 256 can be defined by either of the
2888         two  syntaxes  for  \x. There is no difference in the way they are han-         two syntaxes for \x. There is no difference in the way  they  are  han-
2889         dled. For example, \xdc is exactly the same as \x{dc}.         dled. For example, \xdc is exactly the same as \x{dc}.
2890    
2891         After \0 up to two further octal digits are read. If  there  are  fewer         After  \0  up  to two further octal digits are read. If there are fewer
2892         than  two  digits,  just  those  that  are  present  are used. Thus the         than two digits, just  those  that  are  present  are  used.  Thus  the
2893         sequence \0\x\07 specifies two binary zeros followed by a BEL character         sequence \0\x\07 specifies two binary zeros followed by a BEL character
2894         (code  value 7). Make sure you supply two digits after the initial zero         (code value 7). Make sure you supply two digits after the initial  zero
2895         if the pattern character that follows is itself an octal digit.         if the pattern character that follows is itself an octal digit.
2896    
2897         The handling of a backslash followed by a digit other than 0 is compli-         The handling of a backslash followed by a digit other than 0 is compli-
2898         cated.  Outside a character class, PCRE reads it and any following dig-         cated.  Outside a character class, PCRE reads it and any following dig-
2899         its as a decimal number. If the number is less than  10,  or  if  there         its  as  a  decimal  number. If the number is less than 10, or if there
2900         have been at least that many previous capturing left parentheses in the         have been at least that many previous capturing left parentheses in the
2901         expression, the entire  sequence  is  taken  as  a  back  reference.  A         expression,  the  entire  sequence  is  taken  as  a  back reference. A
2902         description  of how this works is given later, following the discussion         description of how this works is given later, following the  discussion
2903         of parenthesized subpatterns.         of parenthesized subpatterns.
2904    
2905         Inside a character class, or if the decimal number is  greater  than  9         Inside  a  character  class, or if the decimal number is greater than 9
2906         and  there have not been that many capturing subpatterns, PCRE re-reads         and there have not been that many capturing subpatterns, PCRE  re-reads
2907         up to three octal digits following the backslash, and uses them to gen-         up to three octal digits following the backslash, and uses them to gen-
2908         erate  a data character. Any subsequent digits stand for themselves. In         erate a data character. Any subsequent digits stand for themselves.  In
2909         non-UTF-8 mode, the value of a character specified  in  octal  must  be         non-UTF-8  mode,  the  value  of a character specified in octal must be
2910         less  than  \400.  In  UTF-8 mode, values up to \777 are permitted. For         less than \400. In UTF-8 mode, values up to  \777  are  permitted.  For
2911         example:         example:
2912    
2913           \040   is another way of writing a space           \040   is another way of writing a space
# Line 2856  BACKSLASH Line 2925  BACKSLASH
2925           \81    is either a back reference, or a binary zero           \81    is either a back reference, or a binary zero
2926                     followed by the two characters "8" and "1"                     followed by the two characters "8" and "1"
2927    
2928         Note that octal values of 100 or greater must not be  introduced  by  a         Note  that  octal  values of 100 or greater must not be introduced by a
2929         leading zero, because no more than three octal digits are ever read.         leading zero, because no more than three octal digits are ever read.
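
       The decision between a back reference and an octal character code can
       be sketched in Python as below. The function and its return shape are
       hypothetical, chosen for illustration; the sketch assumes the first
       digit is not 0 and does not enforce the \400 limit that applies in
       non-UTF-8 mode.

```python
def interpret_backslash_digits(digits: str, captures_so_far: int,
                               in_class: bool = False):
    """digits: the run of decimal digits that follows the backslash.

    Returns ("backref", number, "") or ("octal", value, literal_rest).
    """
    if not in_class:
        number = int(digits)
        # Less than 10, or at least that many previous capturing left
        # parentheses: the entire sequence is a back reference.
        if number < 10 or number <= captures_so_far:
            return ("backref", number, "")
    # Otherwise re-read up to three octal digits; any subsequent
    # digits stand for themselves.
    n = 0
    while n < 3 and n < len(digits) and digits[n] in "01234567":
        n += 1
    value = int(digits[:n], 8) if n else 0
    return ("octal", value, digits[n:])
```

       For example, with no capturing parentheses so far, the digits "81"
       yield a binary zero followed by the literal characters "8" and "1",
       as in the last entry of the list above.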
2930    
2931         All the sequences that define a single character value can be used both         All the sequences that define a single character value can be used both
2932         inside and outside character classes. In addition, inside  a  character         inside  and  outside character classes. In addition, inside a character
2933         class,  the  sequence \b is interpreted as the backspace character (hex         class, the sequence \b is interpreted as the backspace  character  (hex
2934         08), and the sequences \R and \X are interpreted as the characters  "R"         08),  and the sequences \R and \X are interpreted as the characters "R"
2935         and  "X", respectively. Outside a character class, these sequences have         and "X", respectively. Outside a character class, these sequences  have
2936         different meanings (see below).         different meanings (see below).
2937    
2938     Absolute and relative back references     Absolute and relative back references
2939    
2940         The sequence \g followed by a positive or negative  number,  optionally         The  sequence  \g followed by a positive or negative number, optionally
2941         enclosed  in  braces,  is  an absolute or relative back reference. Back         enclosed in braces, is an absolute or relative back reference. A  named
2942         references are discussed later, following the discussion  of  parenthe-         back  reference can be coded as \g{name}. Back references are discussed
2943         sized subpatterns.         later, following the discussion of parenthesized subpatterns.
2944    
2945     Generic character types     Generic character types
2946    
# Line 2880  BACKSLASH Line 2949  BACKSLASH
2949    
2950           \d     any decimal digit           \d     any decimal digit
2951           \D     any character that is not a decimal digit           \D     any character that is not a decimal digit
2952             \h     any horizontal whitespace character
2953             \H     any character that is not a horizontal whitespace character
2954           \s     any whitespace character           \s     any whitespace character
2955           \S     any character that is not a whitespace character           \S     any character that is not a whitespace character
2956             \v     any vertical whitespace character
2957             \V     any character that is not a vertical whitespace character
2958           \w     any "word" character           \w     any "word" character
2959           \W     any "non-word" character           \W     any "non-word" character
2960    
2961         Each pair of escape sequences partitions the complete set of characters         Each pair of escape sequences partitions the complete set of characters
2962         into  two disjoint sets. Any given character matches one, and only one,         into two disjoint sets. Any given character matches one, and only  one,
2963         of each pair.         of each pair.
2964    
2965         These character type sequences can appear both inside and outside char-         These character type sequences can appear both inside and outside char-
2966         acter  classes.  They each match one character of the appropriate type.         acter classes. They each match one character of the  appropriate  type.
2967         If the current matching point is at the end of the subject string,  all         If  the current matching point is at the end of the subject string, all
2968         of them fail, since there is no character to match.         of them fail, since there is no character to match.
2969    
2970         For  compatibility  with Perl, \s does not match the VT character (code         For compatibility with Perl, \s does not match the VT  character  (code
2971         11).  This makes it different from the POSIX "space" class. The  \s         11).   This makes it different from the POSIX "space" class. The \s
2972         characters  are  HT (9), LF (10), FF (12), CR (13), and space (32). (If         characters are HT (9), LF (10), FF (12), CR (13), and  space  (32).  If
2973         "use locale;" is included in a Perl script, \s may match the VT charac-         "use locale;" is included in a Perl script, \s may match the VT charac-
2974         ter. In PCRE, it never does.)         ter. In PCRE, it never does.
   
        A "word" character is an underscore or any character less than 256 that  
        is a letter or digit. The definition of  letters  and  digits  is  con-  
        trolled  by PCRE's low-valued character tables, and may vary if locale-  
        specific matching is taking place (see "Locale support" in the  pcreapi  
        page).  For  example,  in  the  "fr_FR" (French) locale, some character  
        codes greater than 128 are used for accented  letters,  and  these  are  
        matched by \w.  
2975    
2976         In  UTF-8 mode, characters with values greater than 128 never match \d,         In UTF-8 mode, characters with values greater than 128 never match  \d,
2977         \s, or \w, and always match \D, \S, and \W. This is true even when Uni-         \s, or \w, and always match \D, \S, and \W. This is true even when Uni-
2978         code  character  property support is available. The use of locales with         code character property support is available.  These  sequences  retain
2979         Unicode is discouraged.         their original meanings from before UTF-8 support was available, mainly
2980           for efficiency reasons.
2981    
2982           The sequences \h, \H, \v, and \V are Perl 5.10 features. In contrast to
2983           the  other  sequences, these do match certain high-valued codepoints in
2984           UTF-8 mode.  The horizontal space characters are:
2985    
2986             U+0009     Horizontal tab
2987             U+0020     Space
2988             U+00A0     Non-break space
2989             U+1680     Ogham space mark
2990             U+180E     Mongolian vowel separator
2991             U+2000     En quad
2992             U+2001     Em quad
2993             U+2002     En space
2994             U+2003     Em space
2995             U+2004     Three-per-em space
2996             U+2005     Four-per-em space
2997             U+2006     Six-per-em space
2998             U+2007     Figure space
2999             U+2008     Punctuation space
3000             U+2009     Thin space
3001             U+200A     Hair space
3002             U+202F     Narrow no-break space
3003             U+205F     Medium mathematical space
3004             U+3000     Ideographic space
3005    
3006           The vertical space characters are:
3007    
3008             U+000A     Linefeed
3009             U+000B     Vertical tab
3010             U+000C     Formfeed
3011             U+000D     Carriage return
3012             U+0085     Next line
3013             U+2028     Line separator
3014             U+2029     Paragraph separator
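
       The two lists above can be written down as plain sets of code points.
       The Python sketch below does this and adds two hypothetical predicates
       that mimic \h and \v for a single character; it is a transcription of
       the tables, not a call into PCRE.

```python
# Code points from the horizontal space table (U+2000..U+200A is a range).
HORIZONTAL_SPACE = {
    0x0009, 0x0020, 0x00A0, 0x1680, 0x180E,
    *range(0x2000, 0x200B),
    0x202F, 0x205F, 0x3000,
}

# Code points from the vertical space table.
VERTICAL_SPACE = {0x000A, 0x000B, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029}

def matches_h(ch: str) -> bool:
    """True if the single character ch would be matched by \\h."""
    return ord(ch) in HORIZONTAL_SPACE

def matches_v(ch: str) -> bool:
    """True if the single character ch would be matched by \\v."""
    return ord(ch) in VERTICAL_SPACE
```

       \H and \V are simply the negations of these predicates, which is why
       each pair partitions the complete set of characters.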
3015    
3016           A "word" character is an underscore or any character less than 256 that
3017           is  a  letter  or  digit.  The definition of letters and digits is con-
3018           trolled by PCRE's low-valued character tables, and may vary if  locale-
3019           specific  matching is taking place (see "Locale support" in the pcreapi
3020           page). For example, in a French locale such  as  "fr_FR"  in  Unix-like
3021           systems,  or "french" in Windows, some character codes greater than 128
3022           are used for accented letters, and these are matched by \w. The use  of
3023           locales with Unicode is discouraged.
3024    
3025     Newline sequences     Newline sequences
3026    
3027         Outside a character class, the escape sequence \R matches  any  Unicode         Outside  a  character class, the escape sequence \R matches any Unicode
3028         newline sequence. This is an extension to Perl. In non-UTF-8 mode \R is         newline sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R  is
3029         equivalent to the following:         equivalent to the following:
3030    
3031           (?>\r\n|\n|\x0b|\f|\r|\x85)           (?>\r\n|\n|\x0b|\f|\r|\x85)
3032    
3033         This is an example of an "atomic group", details  of  which  are  given         This  is  an  example  of an "atomic group", details of which are given
3034         below.  This particular group matches either the two-character sequence         below.  This particular group matches either the two-character sequence
3035         CR followed by LF, or  one  of  the  single  characters  LF  (linefeed,         CR  followed  by  LF,  or  one  of  the single characters LF (linefeed,
3036         U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage         U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage
3037         return, U+000D), or NEL (next line, U+0085). The two-character sequence         return, U+000D), or NEL (next line, U+0085). The two-character sequence
3038         is treated as a single unit that cannot be split.         is treated as a single unit that cannot be split.
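
       The non-UTF-8 equivalent of \R can be tried out in other regular
       expression engines. The sketch below uses Python's standard re module,
       which is not PCRE; it substitutes a plain non-capturing group for the
       atomic group (an assumption made for portability across Python
       versions), which is adequate for a simple scan like this one.

```python
import re

# Same alternatives as the pattern in the text; CRLF is listed first so
# that the two-character sequence is preferred over a lone CR.
NEWLINE_SEQUENCE = re.compile(r"(?:\r\n|\n|\x0b|\f|\r|\x85)")

def newline_sequences(subject: str):
    """Return every newline sequence found in subject, in order."""
    return NEWLINE_SEQUENCE.findall(subject)
```

       Note that a CR immediately followed by LF is reported once, as a
       single two-character sequence, rather than as two separate newlines.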
3039    
3040         In  UTF-8  mode, two additional characters whose codepoints are greater         In UTF-8 mode, two additional characters whose codepoints  are  greater
3041         than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-         than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
3042         rator,  U+2029).   Unicode character property support is not needed for         rator, U+2029).  Unicode character property support is not  needed  for
3043         these characters to be recognized.         these characters to be recognized.
3044    
3045         Inside a character class, \R matches the letter "R".         Inside a character class, \R matches the letter "R".
# Line 2938  BACKSLASH Line 3047  BACKSLASH
3047     Unicode character properties     Unicode character properties
3048    
3049         When PCRE is built with Unicode character property support, three addi-         When PCRE is built with Unicode character property support, three addi-
3050         tional  escape  sequences  to  match character properties are available         tional escape sequences that match characters with specific  properties
3051         when UTF-8 mode is selected. They are:         are  available.   When not in UTF-8 mode, these sequences are of course
3052           limited to testing characters whose codepoints are less than  256,  but
3053           they do work in this mode.  The extra escape sequences are:
3054    
3055           \p{xx}   a character with the xx property           \p{xx}   a character with the xx property
3056           \P{xx}   a character without the xx property           \P{xx}   a character without the xx property
3057           \X       an extended Unicode sequence           \X       an extended Unicode sequence
3058    
3059         The property names represented by xx above are limited to  the  Unicode         The  property  names represented by xx above are limited to the Unicode
3060         script names, the general category properties, and "Any", which matches         script names, the general category properties, and "Any", which matches
3061         any character (including newline). Other properties such as "InMusical-         any character (including newline). Other properties such as "InMusical-
3062         Symbols"  are  not  currently supported by PCRE. Note that \P{Any} does         Symbols" are not currently supported by PCRE. Note  that  \P{Any}  does
3063         not match any characters, so always causes a match failure.         not match any characters, so always causes a match failure.
3064    
3065         Sets of Unicode characters are defined as belonging to certain scripts.         Sets of Unicode characters are defined as belonging to certain scripts.
3066         A  character from one of these sets can be matched using a script name.         A character from one of these sets can be matched using a script  name.
3067         For example:         For example:
3068    
3069           \p{Greek}           \p{Greek}
3070           \P{Han}           \P{Han}
3071    
3072         Those that are not part of an identified script are lumped together  as         Those  that are not part of an identified script are lumped together as
3073         "Common". The current list of scripts is:         "Common". The current list of scripts is:
3074    
3075         Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,         Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,
3076         Buhid,  Canadian_Aboriginal,  Cherokee,  Common,   Coptic,   Cuneiform,         Buhid,   Canadian_Aboriginal,   Cherokee,  Common,  Coptic,  Cuneiform,
3077         Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,         Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,
3078         Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,  Hira-         Gothic,  Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-
3079         gana,  Inherited,  Kannada,  Katakana,  Kharoshthi,  Khmer, Lao, Latin,         gana, Inherited, Kannada,  Katakana,  Kharoshthi,  Khmer,  Lao,  Latin,
3080         Limbu,  Linear_B,  Malayalam,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,         Limbu,  Linear_B,  Malayalam,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,
3081         Ogham,  Old_Italic,  Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician,         Ogham, Old_Italic, Old_Persian, Oriya, Osmanya,  Phags_Pa,  Phoenician,
3082         Runic,  Shavian,  Sinhala,  Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa,         Runic,  Shavian,  Sinhala,  Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa,
3083         Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.         Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.
3084    
3085         Each  character has exactly one general category property, specified by         Each character has exactly one general category property, specified  by
3086         a two-letter abbreviation. For compatibility with Perl, negation can be         a two-letter abbreviation. For compatibility with Perl, negation can be
3087         specified  by  including a circumflex between the opening brace and the         specified by including a circumflex between the opening brace  and  the
3088         property name. For example, \p{^Lu} is the same as \P{Lu}.         property name. For example, \p{^Lu} is the same as \P{Lu}.
3089    
3090         If only one letter is specified with \p or \P, it includes all the gen-         If only one letter is specified with \p or \P, it includes all the gen-
3091         eral  category properties that start with that letter. In this case, in         eral category properties that start with that letter. In this case,  in
3092         the absence of negation, the curly brackets in the escape sequence  are         the  absence of negation, the curly brackets in the escape sequence are
3093         optional; these two examples have the same effect:         optional; these two examples have the same effect:
3094    
3095           \p{L}           \p{L}
# Line 3030  BACKSLASH Line 3141  BACKSLASH
3141           Zp    Paragraph separator           Zp    Paragraph separator
3142           Zs    Space separator           Zs    Space separator
3143    
3144         The  special property L& is also supported: it matches a character that         The special property L& is also supported: it matches a character  that
3145         has the Lu, Ll, or Lt property, in other words, a letter  that  is  not         has  the  Lu,  Ll, or Lt property, in other words, a letter that is not
3146         classified as a modifier or "other".         classified as a modifier or "other".
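
       The meaning of L& can be illustrated with Python's unicodedata module,
       which reports the general category of a character. This is a sketch of
       the definition only; has_l_amp is a hypothetical name, and the
       categories come from the Unicode data bundled with the Python
       interpreter, which may be a different Unicode version from the one
       used by PCRE.

```python
import unicodedata

def has_l_amp(ch: str) -> bool:
    """True if ch has the Lu, Ll, or Lt general category property."""
    return unicodedata.category(ch) in {"Lu", "Ll", "Lt"}
```

       Modifier letters (Lm) and other letters (Lo) are letters according to
       \p{L}, but are rejected by this test.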
3147    
3148         The  long  synonyms  for  these  properties that Perl supports (such as         The long synonyms for these properties  that  Perl  supports  (such  as
3149         \p{Letter}) are not supported by PCRE, nor is it  permitted  to  prefix         \p{Letter})  are  not  supported by PCRE, nor is it permitted to prefix
3150         any of these properties with "Is".         any of these properties with "Is".
3151    
3152         No character that is in the Unicode table has the Cn (unassigned) prop-         No character that is in the Unicode table has the Cn (unassigned) prop-
3153         erty.  Instead, this property is assumed for any code point that is not         erty.  Instead, this property is assumed for any code point that is not
3154         in the Unicode table.         in the Unicode table.
3155    
3156         Specifying  caseless  matching  does not affect these escape sequences.         Specifying caseless matching does not affect  these  escape  sequences.
3157         For example, \p{Lu} always matches only upper case letters.         For example, \p{Lu} always matches only upper case letters.
3158    
3159         The \X escape matches any number of Unicode  characters  that  form  an         The  \X  escape  matches  any number of Unicode characters that form an
3160         extended Unicode sequence. \X is equivalent to         extended Unicode sequence. \X is equivalent to
3161    
3162           (?>\PM\pM*)           (?>\PM\pM*)
3163    
3164         That  is,  it matches a character without the "mark" property, followed         That is, it matches a character without the "mark"  property,  followed
3165         by zero or more characters with the "mark"  property,  and  treats  the         by  zero  or  more  characters with the "mark" property, and treats the
3166         sequence  as  an  atomic group (see below).  Characters with the "mark"         sequence as an atomic group (see below).  Characters  with  the  "mark"
3167         property are typically accents that affect the preceding character.         property  are  typically  accents  that affect the preceding character.
3168           None of them have codepoints less than 256, so  in  non-UTF-8  mode  \X
3169           matches any one character.
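
       The behaviour of \X can be sketched without a regular expression
       engine at all. The Python code below follows the (?>\PM\pM*)
       definition directly, using unicodedata to test for the "mark"
       property; the helper names are hypothetical and the Unicode version is
       whatever the Python interpreter provides.

```python
import unicodedata

def is_mark(ch: str) -> bool:
    """True if ch has one of the "mark" general categories (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def match_extended_sequence(subject: str, pos: int):
    """Emulate \\X at position pos.

    Returns the index just past the matched sequence, or None if there is
    no match (end of subject, or the character at pos is itself a mark).
    """
    if pos >= len(subject) or is_mark(subject[pos]):
        return None
    end = pos + 1
    # Greedily absorb following marks; being atomic, nothing is given back.
    while end < len(subject) and is_mark(subject[end]):
        end += 1
    return end
```

       For the subject "e" followed by U+0301 (combining acute accent) and
       then "a", the first sequence covers two characters and the second
       covers one.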
3170    
3171         Matching characters by Unicode property is not fast, because  PCRE  has         Matching  characters  by Unicode property is not fast, because PCRE has
3172         to  search  a  structure  that  contains data for over fifteen thousand         to search a structure that contains  data  for  over  fifteen  thousand
3173         characters. That is why the traditional escape sequences such as \d and         characters. That is why the traditional escape sequences such as \d and
3174         \w do not use Unicode properties in PCRE.         \w do not use Unicode properties in PCRE.
3175    
3176       Resetting the match start
3177    
3178           The escape sequence \K, which is a Perl 5.10 feature, causes any previ-
3179           ously  matched  characters  not  to  be  included  in the final matched
3180           sequence. For example, the pattern:
3181    
3182             foo\Kbar
3183    
3184           matches "foobar", but reports that it has matched "bar".  This  feature
3185           is  similar  to  a lookbehind assertion (described below).  However, in
3186           this case, the part of the subject before the real match does not  have
3187           to  be of fixed length, as lookbehind assertions do. The use of \K does
3188           not interfere with the setting of captured  substrings.   For  example,
3189           when the pattern
3190    
3191             (foo)\Kbar
3192    
3193           matches "foobar", the first substring is still set to "foo".
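
       For comparison, the two examples above can be approximated in an
       engine without \K. The sketch below uses Python's standard re module,
       which does not implement \K: the first function uses a fixed-length
       lookbehind, and the second reports a second capturing group as "the
       match". Both function names are hypothetical, and neither is an exact
       equivalent of \K.

```python
import re

def match_after_foo(subject: str):
    """Approximate foo\\Kbar with a lookbehind; return the reported text."""
    m = re.search(r"(?<=foo)bar", subject)
    return m.group(0) if m else None

def match_with_capture(subject: str):
    """Approximate (foo)\\Kbar; return (reported text, first substring)."""
    m = re.search(r"(foo)(bar)", subject)
    # Group 2 plays the role of the text reported after \K;
    # group 1 is still set, as described in the text.
    return (m.group(2), m.group(1)) if m else None
```

       The lookbehind version works here only because "foo" has a fixed
       length; \K has no such restriction.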
3194    
3195     Simple assertions     Simple assertions
3196    
3197         The  final use of backslash is for certain simple assertions. An asser-         The  final use of backslash is for certain simple assertions. An asser-
# Line 3275  SQUARE BRACKETS AND CHARACTER CLASSES Line 3407  SQUARE BRACKETS AND CHARACTER CLASSES
3407         If a range that includes letters is used when caseless matching is set,         If a range that includes letters is used when caseless matching is set,
3408         it matches the letters in either case. For example, [W-c] is equivalent         it matches the letters in either case. For example, [W-c] is equivalent
3409         to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if         to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if
3410         character tables for the "fr_FR" locale are in use, [\xc8-\xcb] matches         character tables for a French locale are in  use,  [\xc8-\xcb]  matches
3411         accented  E  characters in both cases. In UTF-8 mode, PCRE supports the         accented  E  characters in both cases. In UTF-8 mode, PCRE supports the
3412         concept of case for characters with values greater than 128  only  when         concept of case for characters with values greater than 128  only  when
3413         it is compiled with Unicode property support.         it is compiled with Unicode property support.
# Line 3460  SUBPATTERNS Line 3592  SUBPATTERNS
3592         "Saturday".         "Saturday".
3593    
3594    
3595    DUPLICATE SUBPATTERN NUMBERS
3596    
3597           Perl 5.10 introduced a feature whereby each alternative in a subpattern
3598           uses the same numbers for its capturing parentheses. Such a  subpattern
3599           starts  with (?| and is itself a non-capturing subpattern. For example,
3600           consider this pattern:
3601    
3602             (?|(Sat)ur|(Sun))day
3603    
3604           Because the two alternatives are inside a (?| group, both sets of  cap-
3605           turing  parentheses  are  numbered one. Thus, when the pattern matches,
3606           you can look at captured substring number  one,  whichever  alternative
3607           matched.  This  construct  is useful when you want to capture part, but
3608           not all, of one of a number of alternatives. Inside a (?| group, paren-
3609           theses  are  numbered as usual, but the number is reset at the start of
3610           each branch. The numbers of any capturing buffers that follow the  sub-
3611           pattern  start after the highest number used in any branch. The follow-
3612           ing example is taken from the Perl documentation.  The  numbers  under-
3613           neath show in which buffer the captured content will be stored.
3614    
3615             # before  ---------------branch-reset----------- after
3616             / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
3617             # 1            2         2  3        2     3     4
3618    
3619           A  backreference  or  a  recursive call to a numbered subpattern always
3620           refers to the first one in the pattern with the given number.
3621    
3622           An alternative approach to using this "branch reset" feature is to  use
3623           duplicate named subpatterns, as described in the next section.
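
       To see what the (?| group saves, consider the same match in an engine
       without branch reset. The sketch below uses Python's standard re
       module, where the two sets of parentheses in the Saturday/Sunday
       example are numbered one and two, so the caller has to inspect both.
       The function name is hypothetical.

```python
import re

# Without branch reset, (Sat) is group 1 and (Sun) is group 2.
WITHOUT_RESET = re.compile(r"(?:(Sat)ur|(Sun))day")

def day_prefix(subject: str):
    """Return the captured prefix, whichever alternative matched."""
    m = WITHOUT_RESET.fullmatch(subject)
    if m is None:
        return None
    # Exactly one of the two groups is set after a successful match.
    return m.group(1) or m.group(2)
```

       With (?|(Sat)ur|(Sun))day in PCRE, the final step is unnecessary:
       captured substring number one holds the result in either case.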
3624    
3625    
3626  NAMED SUBPATTERNS  NAMED SUBPATTERNS
3627    
3628         Identifying  capturing  parentheses  by number is simple, but it can be         Identifying  capturing  parentheses  by number is simple, but it can be
# Line 3499  NAMED SUBPATTERNS Line 3662  NAMED SUBPATTERNS
3662           (?<DN>Sat)(?:urday)?           (?<DN>Sat)(?:urday)?
3663    
3664         There  are  five capturing substrings, but only one is ever set after a         There  are  five capturing substrings, but only one is ever set after a
3665         match.  The convenience  function  for  extracting  the  data  by  name         match.  (An alternative way of solving this problem is to use a "branch
3666         returns  the  substring  for  the first (and in this example, the only)         reset" subpattern, as described in the previous section.)
3667         subpattern of that name that matched.  This  saves  searching  to  find  
3668         which  numbered  subpattern  it  was. If you make a reference to a non-         The  convenience  function  for extracting the data by name returns the
3669         unique named subpattern from elsewhere in the  pattern,  the  one  that         substring for the first (and in this example, the only)  subpattern  of
3670         corresponds  to  the  lowest number is used. For further details of the         that  name  that  matched.  This saves searching to find which numbered
3671         interfaces for handling named subpatterns, see the  pcreapi  documenta-         subpattern it was. If you make a reference to a non-unique  named  sub-
3672         tion.         pattern  from elsewhere in the pattern, the one that corresponds to the
3673           lowest number is used. For further details of the interfaces  for  han-
3674           dling named subpatterns, see the pcreapi documentation.
3675    
3676    
3677  REPETITION  REPETITION
# Line 3821  BACK REFERENCES Line 3986  BACK REFERENCES
3986         matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the         matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the
3987         original capturing subpattern is matched caselessly.         original capturing subpattern is matched caselessly.
3988    
3989         Back references to named subpatterns use the Perl  syntax  \k<name>  or         There are several different ways of writing back  references  to  named
3990         \k'name'  or  the  Python  syntax (?P=name). We could rewrite the above         subpatterns.  The  .NET syntax \k{name} and the Perl syntax \k<name> or
3991         example in either of the following ways:         \k'name' are supported, as is the Python syntax (?P=name). Perl  5.10's
3992           unified back reference syntax, in which \g can be used for both numeric
3993           and named references, is also supported. We  could  rewrite  the  above
3994           example in any of the following ways:
3995    
3996           (?<p1>(?i)rah)\s+\k<p1>           (?<p1>(?i)rah)\s+\k<p1>
3997             (?'p1'(?i)rah)\s+\k{p1}
3998           (?P<p1>(?i)rah)\s+(?P=p1)           (?P<p1>(?i)rah)\s+(?P=p1)
3999             (?<p1>(?i)rah)\s+\g{p1}
4000    
4001         A subpattern that is referenced by  name  may  appear  in  the  pattern         A  subpattern  that  is  referenced  by  name may appear in the pattern
4002         before or after the reference.         before or after the reference.
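
       The Python-syntax form of the example can be run in Python's own re
       module, as sketched below. One adjustment is an assumption forced by
       that module rather than by PCRE: recent Python versions reject an
       unscoped (?i) in the middle of a pattern, so the scoped form (?i:...)
       is used inside the group instead.

```python
import re

# The named group is matched caselessly; the back reference is not.
RAH = re.compile(r"(?P<p1>(?i:rah))\s+(?P=p1)")

def is_repeated(subject: str) -> bool:
    """True if subject is the same spelling of "rah" twice."""
    return RAH.fullmatch(subject) is not None
```

       As with the numbered example earlier, "rah rah" and "RAH RAH" are
       accepted but "RAH rah" is not, because the back reference must match
       the captured text exactly.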
4003    
4004         There  may be more than one back reference to the same subpattern. If a         There may be more than one back reference to the same subpattern. If  a
4005         subpattern has not actually been used in a particular match,  any  back         subpattern  has  not actually been used in a particular match, any back
4006         references to it always fail. For example, the pattern         references to it always fail. For example, the pattern
4007    
4008           (a|(bc))\2           (a|(bc))\2
4009    
4010         always  fails if it starts to match "a" rather than "bc". Because there         always fails if it starts to match "a" rather than "bc". Because  there
4011         may be many capturing parentheses in a pattern,  all  digits  following         may  be  many  capturing parentheses in a pattern, all digits following
4012         the  backslash  are taken as part of a potential back reference number.         the backslash are taken as part of a potential back  reference  number.
4013         If the pattern continues with a digit character, some delimiter must be         If the pattern continues with a digit character, some delimiter must be
4014         used  to  terminate  the back reference. If the PCRE_EXTENDED option is         used to terminate the back reference. If the  PCRE_EXTENDED  option  is
4015         set, this can be whitespace.  Otherwise an  empty  comment  (see  "Com-         set,  this  can  be  whitespace.  Otherwise an empty comment (see "Com-
4016         ments" below) can be used.         ments" below) can be used.
4017    
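A minimal sketch of the "unused subpattern" rule, again using Python's re module as a stand-in (an assumption; it treats a back reference to an unset group the same way in this case):

```python
import re

# Group 2 is set only when the "bc" alternative is taken.
pattern = re.compile(r"(a|(bc))\2")

took_bc = pattern.fullmatch("bcbc")  # group 2 = "bc", so \2 can match "bc"
took_a = pattern.fullmatch("a")      # group 2 unset, so \2 fails
took_aa = pattern.fullmatch("aa")    # still fails: \2 never refers to "a"

print(bool(took_bc), bool(took_a), bool(took_aa))
```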
4018         A  back reference that occurs inside the parentheses to which it refers         A back reference that occurs inside the parentheses to which it  refers
4019         fails when the subpattern is first used, so, for example,  (a\1)  never         fails  when  the subpattern is first used, so, for example, (a\1) never
4020         matches.   However,  such references can be useful inside repeated sub-         matches.  However, such references can be useful inside  repeated  sub-
4021         patterns. For example, the pattern         patterns. For example, the pattern
4022    
4023           (a|b\1)+           (a|b\1)+
4024    
4025         matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-         matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
4026         ation  of  the  subpattern,  the  back  reference matches the character         ation of the subpattern,  the  back  reference  matches  the  character
4027         string corresponding to the previous iteration. In order  for  this  to         string  corresponding  to  the previous iteration. In order for this to
4028         work,  the  pattern must be such that the first iteration does not need         work, the pattern must be such that the first iteration does  not  need
4029         to match the back reference. This can be done using alternation, as  in         to  match the back reference. This can be done using alternation, as in
4030         the example above, or by a quantifier with a minimum of zero.         the example above, or by a quantifier with a minimum of zero.
4031    
4032    
4033  ASSERTIONS  ASSERTIONS
4034    
4035         An  assertion  is  a  test on the characters following or preceding the         An assertion is a test on the characters  following  or  preceding  the
4036         current matching point that does not actually consume  any  characters.         current  matching  point that does not actually consume any characters.
4037         The  simple  assertions  coded  as  \b, \B, \A, \G, \Z, \z, ^ and $ are         The simple assertions coded as \b, \B, \A, \G, \Z,  \z,  ^  and  $  are
4038         described above.         described above.
4039    
4040         More complicated assertions are coded as  subpatterns.  There  are  two         More  complicated  assertions  are  coded as subpatterns. There are two
4041         kinds:  those  that  look  ahead of the current position in the subject         kinds: those that look ahead of the current  position  in  the  subject
4042         string, and those that look  behind  it.  An  assertion  subpattern  is         string,  and  those  that  look  behind  it. An assertion subpattern is
4043         matched  in  the  normal way, except that it does not cause the current         matched in the normal way, except that it does not  cause  the  current
4044         matching position to be changed.         matching position to be changed.
4045    
4046         Assertion subpatterns are not capturing subpatterns,  and  may  not  be         Assertion  subpatterns  are  not  capturing subpatterns, and may not be
4047         repeated,  because  it  makes no sense to assert the same thing several         repeated, because it makes no sense to assert the  same  thing  several
4048         times. If any kind of assertion contains capturing  subpatterns  within         times.  If  any kind of assertion contains capturing subpatterns within
4049         it,  these are counted for the purposes of numbering the capturing sub-         it, these are counted for the purposes of numbering the capturing  sub-
4050         patterns in the whole pattern.  However, substring capturing is carried         patterns in the whole pattern.  However, substring capturing is carried
4051         out  only  for  positive assertions, because it does not make sense for         out only for positive assertions, because it does not  make  sense  for
4052         negative assertions.         negative assertions.
4053    
4054     Lookahead assertions     Lookahead assertions
# Line 3888  ASSERTIONS Line 4058  ASSERTIONS
4058    
4059           \w+(?=;)           \w+(?=;)
4060    
4061         matches  a word followed by a semicolon, but does not include the semi-         matches a word followed by a semicolon, but does not include the  semi-
4062         colon in the match, and         colon in the match, and
4063    
4064           foo(?!bar)           foo(?!bar)
4065    
4066         matches any occurrence of "foo" that is not  followed  by  "bar".  Note         matches  any  occurrence  of  "foo" that is not followed by "bar". Note
4067         that the apparently similar pattern         that the apparently similar pattern
4068    
4069           (?!foo)bar           (?!foo)bar
4070    
4071         does  not  find  an  occurrence  of "bar" that is preceded by something         does not find an occurrence of "bar"  that  is  preceded  by  something
4072         other than "foo"; it finds any occurrence of "bar" whatsoever,  because         other  than "foo"; it finds any occurrence of "bar" whatsoever, because
4073         the assertion (?!foo) is always true when the next three characters are         the assertion (?!foo) is always true when the next three characters are
4074         "bar". A lookbehind assertion is needed to achieve the other effect.         "bar". A lookbehind assertion is needed to achieve the other effect.
4075    
4076         If you want to force a matching failure at some point in a pattern, the         If you want to force a matching failure at some point in a pattern, the
4077         most  convenient  way  to  do  it  is with (?!) because an empty string         most convenient way to do it is  with  (?!)  because  an  empty  string
4078         always matches, so an assertion that requires there not to be an  empty         always  matches, so an assertion that requires there not to be an empty
4079         string must always fail.         string must always fail.
4080    
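The lookahead examples above can be checked with a short sketch. Python's re module is used here as an assumption of convenience, since it accepts the same lookahead syntax; the subject strings are made up for illustration.

```python
import re

# \w+(?=;) matches the word but leaves the semicolon out of the match.
word = re.search(r"\w+(?=;)", "alpha; beta").group()

# foo(?!bar) rejects the "foo" in "foobar" and finds the one in "foobaz".
foo_pos = re.search(r"foo(?!bar)", "foobar foobaz").start()

# (?!foo)bar still finds "bar" right after "foo": at the point where "bar"
# begins, the next three characters are "bar", so (?!foo) is true.
bar_pos = re.search(r"(?!foo)bar", "foobar").start()

# (?!) can never succeed, so any path that reaches it fails.
forced_failure = re.search(r"a(?!)", "aaa")

print(word, foo_pos, bar_pos, forced_failure)
```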
4081     Lookbehind assertions     Lookbehind assertions
4082    
4083         Lookbehind  assertions start with (?<= for positive assertions and (?<!         Lookbehind assertions start with (?<= for positive assertions and  (?<!
4084         for negative assertions. For example,         for negative assertions. For example,
4085    
4086           (?<!foo)bar           (?<!foo)bar
4087    
4088         does find an occurrence of "bar" that is not  preceded  by  "foo".  The         does  find  an  occurrence  of "bar" that is not preceded by "foo". The
4089         contents  of  a  lookbehind  assertion are restricted such that all the         contents of a lookbehind assertion are restricted  such  that  all  the
4090         strings it matches must have a fixed length. However, if there are sev-         strings it matches must have a fixed length. However, if there are sev-
4091         eral  top-level  alternatives,  they  do  not all have to have the same         eral top-level alternatives, they do not all  have  to  have  the  same
4092         fixed length. Thus         fixed length. Thus
4093    
4094           (?<=bullock|donkey)           (?<=bullock|donkey)
# Line 3927  ASSERTIONS Line 4097  ASSERTIONS
4097    
4098           (?<!dogs?|cats?)           (?<!dogs?|cats?)
4099    
4100         causes an error at compile time. Branches that match  different  length         causes  an  error at compile time. Branches that match different length
4101         strings  are permitted only at the top level of a lookbehind assertion.         strings are permitted only at the top level of a lookbehind  assertion.
4102         This is an extension compared with  Perl  (at  least  for  5.8),  which         This  is  an  extension  compared  with  Perl (at least for 5.8), which
4103         requires  all branches to match the same length of string. An assertion         requires all branches to match the same length of string. An  assertion
4104         such as         such as
4105    
4106           (?<=ab(c|de))           (?<=ab(c|de))
4107    
4108         is not permitted, because its single top-level  branch  can  match  two         is  not  permitted,  because  its single top-level branch can match two
4109         different  lengths,  but  it is acceptable if rewritten to use two top-         different lengths, but it is acceptable if rewritten to  use  two  top-
4110         level branches:         level branches:
4111    
4112           (?<=abc|abde)           (?<=abc|abde)
4113    
4114         The implementation of lookbehind assertions is, for  each  alternative,         In some cases, the Perl 5.10 escape sequence \K (see above) can be used
4115         to  temporarily  move the current position back by the fixed length and         instead of a lookbehind assertion; this is not restricted to  a  fixed-
4116           length.
4117    
4118           The  implementation  of lookbehind assertions is, for each alternative,
4119           to temporarily move the current position back by the fixed  length  and
4120         then try to match. If there are insufficient characters before the cur-         then try to match. If there are insufficient characters before the cur-
4121         rent position, the assertion fails.         rent position, the assertion fails.
4122    
4123         PCRE does not allow the \C escape (which matches a single byte in UTF-8         PCRE does not allow the \C escape (which matches a single byte in UTF-8
4124         mode) to appear in lookbehind assertions, because it makes it  impossi-         mode)  to appear in lookbehind assertions, because it makes it impossi-
4125         ble  to  calculate the length of the lookbehind. The \X and \R escapes,         ble to calculate the length of the lookbehind. The \X and  \R  escapes,
4126         which can match different numbers of bytes, are also not permitted.         which can match different numbers of bytes, are also not permitted.
4127    
4128         Possessive quantifiers can  be  used  in  conjunction  with  lookbehind         Possessive  quantifiers  can  be  used  in  conjunction with lookbehind
4129         assertions  to  specify  efficient  matching  at the end of the subject         assertions to specify efficient matching at  the  end  of  the  subject
4130         string. Consider a simple pattern such as         string. Consider a simple pattern such as
4131    
4132           abcd$           abcd$
4133    
4134         when applied to a long string that does  not  match.  Because  matching         when  applied  to  a  long string that does not match. Because matching
4135         proceeds from left to right, PCRE will look for each "a" in the subject         proceeds from left to right, PCRE will look for each "a" in the subject
4136         and then see if what follows matches the rest of the  pattern.  If  the         and  then  see  if what follows matches the rest of the pattern. If the
4137         pattern is specified as         pattern is specified as
4138    
4139           ^.*abcd$           ^.*abcd$
4140    
4141         the  initial .* matches the entire string at first, but when this fails         the initial .* matches the entire string at first, but when this  fails
4142         (because there is no following "a"), it backtracks to match all but the         (because there is no following "a"), it backtracks to match all but the
4143         last  character,  then all but the last two characters, and so on. Once         last character, then all but the last two characters, and so  on.  Once
4144         again the search for "a" covers the entire string, from right to  left,         again  the search for "a" covers the entire string, from right to left,
4145         so we are no better off. However, if the pattern is written as         so we are no better off. However, if the pattern is written as
4146    
4147           ^.*+(?<=abcd)           ^.*+(?<=abcd)
4148    
4149         there  can  be  no backtracking for the .*+ item; it can match only the         there can be no backtracking for the .*+ item; it can  match  only  the
4150         entire string. The subsequent lookbehind assertion does a  single  test         entire  string.  The subsequent lookbehind assertion does a single test
4151         on  the last four characters. If it fails, the match fails immediately.         on the last four characters. If it fails, the match fails  immediately.
4152         For long strings, this approach makes a significant difference  to  the         For  long  strings, this approach makes a significant difference to the
4153         processing time.         processing time.
4154    
4155     Using multiple assertions     Using multiple assertions
# Line 3984  ASSERTIONS Line 4158  ASSERTIONS
4158    
4159           (?<=\d{3})(?<!999)foo           (?<=\d{3})(?<!999)foo
4160    
4161         matches  "foo" preceded by three digits that are not "999". Notice that         matches "foo" preceded by three digits that are not "999". Notice  that
4162         each of the assertions is applied independently at the  same  point  in         each  of  the  assertions is applied independently at the same point in
4163         the  subject  string.  First  there  is a check that the previous three         the subject string. First there is a  check  that  the  previous  three
4164         characters are all digits, and then there is  a  check  that  the  same         characters  are  all  digits,  and  then there is a check that the same
4165         three characters are not "999".  This pattern does not match "foo"         three characters are not "999".  This pattern does not match "foo"
4166         preceded by six characters, the first three of which are digits and the last         preceded by six characters, the first three of which are digits and the last
4167         three  of  which  are not "999". For example, it doesn't match "123abc-         three of which are not "999". For example, it  doesn't  match  "123abc-
4168         foo". A pattern to do that is         foo". A pattern to do that is
4169    
4170           (?<=\d{3}...)(?<!999)foo           (?<=\d{3}...)(?<!999)foo
4171    
4172         This time the first assertion looks at the  preceding  six  characters,         This  time  the  first assertion looks at the preceding six characters,
4173         checking that the first three are digits, and then the second assertion         checking that the first three are digits, and then the second assertion
4174         checks that the preceding three characters are not "999".         checks that the preceding three characters are not "999".
4175    
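The difference between the two patterns can be seen in a small sketch. Python's re module is assumed here only because it accepts these fixed-length lookbehinds unchanged; the helper name found() is hypothetical.

```python
import re

three_back = re.compile(r"(?<=\d{3})(?<!999)foo")
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")

def found(pattern, subject):
    """Return True if the pattern matches anywhere in the subject."""
    return pattern.search(subject) is not None

# Both assertions of the first pattern inspect the same three characters.
a1 = found(three_back, "123foo")     # digits, and not "999"
a2 = found(three_back, "999foo")     # digits, but "999"
a3 = found(three_back, "123abcfoo")  # "abc" are not digits

# The second pattern looks six characters back for the digits.
b1 = found(six_back, "123abcfoo")    # "123" then "abc"
b2 = found(six_back, "123999foo")    # last three are "999"

print(a1, a2, a3, b1, b2)
```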
# Line 4003  ASSERTIONS Line 4177  ASSERTIONS
4177    
4178           (?<=(?<!foo)bar)baz           (?<=(?<!foo)bar)baz
4179    
4180         matches an occurrence of "baz" that is preceded by "bar" which in  turn         matches  an occurrence of "baz" that is preceded by "bar" which in turn
4181         is not preceded by "foo", while         is not preceded by "foo", while
4182    
4183           (?<=\d{3}(?!999)...)foo           (?<=\d{3}(?!999)...)foo
4184    
4185         is  another pattern that matches "foo" preceded by three digits and any         is another pattern that matches "foo" preceded by three digits and  any
4186         three characters that are not "999".         three characters that are not "999".
4187    
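A sketch of the two nested forms, under the assumption that Python's re module treats assertions nested inside a fixed-length lookbehind in the same way as described here. The subject strings are illustrative.

```python
import re

# "baz" preceded by "bar", which in turn is not preceded by "foo".
nested_behind = re.compile(r"(?<=(?<!foo)bar)baz")
n1 = nested_behind.search("xxxbarbaz") is not None
n2 = nested_behind.search("foobarbaz") is not None

# "foo" preceded by three digits and then three characters that are not
# "999"; the lookahead is tested partway through the lookbehind.
ahead_inside = re.compile(r"(?<=\d{3}(?!999)...)foo")
m1 = ahead_inside.search("123abcfoo") is not None
m2 = ahead_inside.search("123999foo") is not None

print(n1, n2, m1, m2)
```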
4188    
4189  CONDITIONAL SUBPATTERNS  CONDITIONAL SUBPATTERNS
4190    
4191         It is possible to cause the matching process to obey a subpattern  con-         It  is possible to cause the matching process to obey a subpattern con-
4192         ditionally  or to choose between two alternative subpatterns, depending         ditionally or to choose between two alternative subpatterns,  depending
4193         on the result of an assertion, or whether a previous capturing  subpat-         on  the result of an assertion, or whether a previous capturing subpat-
4194         tern  matched  or not. The two possible forms of conditional subpattern         tern matched or not. The two possible forms of  conditional  subpattern
4195         are         are
4196    
4197           (?(condition)yes-pattern)           (?(condition)yes-pattern)
4198           (?(condition)yes-pattern|no-pattern)           (?(condition)yes-pattern|no-pattern)
4199    
4200         If the condition is satisfied, the yes-pattern is used;  otherwise  the         If  the  condition is satisfied, the yes-pattern is used; otherwise the
4201         no-pattern  (if  present)  is used. If there are more than two alterna-         no-pattern (if present) is used. If there are more  than  two  alterna-
4202         tives in the subpattern, a compile-time error occurs.         tives in the subpattern, a compile-time error occurs.
4203    
4204         There are four kinds of condition: references  to  subpatterns,  refer-         There  are  four  kinds of condition: references to subpatterns, refer-
4205         ences to recursion, a pseudo-condition called DEFINE, and assertions.         ences to recursion, a pseudo-condition called DEFINE, and assertions.
4206    
4207     Checking for a used subpattern by number     Checking for a used subpattern by number
4208    
4209         If  the  text between the parentheses consists of a sequence of digits,         If the text between the parentheses consists of a sequence  of  digits,
4210         the condition is true if the capturing subpattern of  that  number  has         the  condition  is  true if the capturing subpattern of that number has
4211         previously matched.         previously matched. An alternative notation is to  precede  the  digits
4212           with a plus or minus sign. In this case, the subpattern number is rela-
4213           tive rather than absolute.  The most recently opened parentheses can be
4214           referenced  by  (?(-1),  the  next most recent by (?(-2), and so on. In
4215           looping constructs it can also make sense to refer to subsequent groups
4216           with constructs such as (?(+2).
4217    
4218         Consider  the  following  pattern, which contains non-significant white         Consider  the  following  pattern, which contains non-significant white
4219         space to make it more readable (assume the PCRE_EXTENDED option) and to         space to make it more readable (assume the PCRE_EXTENDED option) and to
# Line 4053  CONDITIONAL SUBPATTERNS Line 4232  CONDITIONAL SUBPATTERNS
4232         other  words,  this  pattern  matches  a  sequence  of non-parentheses,         other  words,  this  pattern  matches  a  sequence  of non-parentheses,
4233         optionally enclosed in parentheses.         optionally enclosed in parentheses.
4234    
4235           If you were embedding this pattern in a larger one,  you  could  use  a
4236           relative reference:
4237    
4238             ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
4239    
4240           This  makes  the  fragment independent of the parentheses in the larger
4241           pattern.
4242    
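The conditional fragment can be exercised with a short sketch. Python's re module is used as an assumption of convenience: it supports the absolute form (?(1)...) but has no relative form such as (?(-1)...), so only the absolute numbering is shown. re.VERBOSE stands in for PCRE_EXTENDED.

```python
import re

# Optional "(" in group 1, a run of non-parentheses, and a ")" that is
# required only if group 1 took part in the match.
pattern = re.compile(r"( \( )? [^()]+ (?(1) \) )", re.VERBOSE)

subjects = ["abc", "(abc)", "(abc", "abc)"]
results = {s: pattern.fullmatch(s) is not None for s in subjects}

print(results)
```

With fullmatch(), the unbalanced subjects are rejected: an opening parenthesis with no close fails the condition, and a stray closing parenthesis is left unmatched.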
4243     Checking for a used subpattern by name     Checking for a used subpattern by name
4244    
4245         Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a         Perl uses the syntax (?(<name>)...) or (?('name')...)  to  test  for  a
# Line 4194  RECURSIVE PATTERNS Line 4381  RECURSIVE PATTERNS
4381           ( \( ( (?>[^()]+) | (?1) )* \) )           ( \( ( (?>[^()]+) | (?1) )* \) )
4382    
4383         We  have  put the pattern into parentheses, and caused the recursion to         We  have  put the pattern into parentheses, and caused the recursion to
4384         refer to them instead of the whole pattern. In a larger pattern,  keep-         refer to them instead of the whole pattern.
4385         ing  track  of parenthesis numbers can be tricky. It may be more conve-  
4386         nient to use named parentheses instead. The Perl  syntax  for  this  is         In a larger pattern,  keeping  track  of  parenthesis  numbers  can  be
4387         (?&name);  PCRE's  earlier syntax (?P>name) is also supported. We could         tricky.  This is made easier by the use of relative references. (A Perl
4388         rewrite the above example as follows:         5.10 feature.)  Instead of (?1) in the  pattern  above  you  can  write
4389           (?-2) to refer to the second most recently opened parentheses preceding
4390           the recursion. In other  words,  a  negative  number  counts  capturing
4391           parentheses leftwards from the point at which it is encountered.
4392    
4393           It  is  also  possible  to refer to subsequently opened parentheses, by
4394           writing references such as (?+2). However, these  cannot  be  recursive
4395           because  the  reference  is  not inside the parentheses that are refer-
4396           enced. They are always "subroutine" calls, as  described  in  the  next
4397           section.
4398    
4399           An  alternative  approach is to use named parentheses instead. The Perl
4400           syntax for this is (?&name); PCRE's earlier syntax  (?P>name)  is  also
4401           supported. We could rewrite the above example as follows:
4402    
4403           (?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )           (?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )
4404    
4405         If there is more than one subpattern with the same name,  the  earliest         If  there  is more than one subpattern with the same name, the earliest
4406         one  is used. This particular example pattern contains nested unlimited         one is used.
4407         repeats, and so the use of atomic grouping for matching strings of non-  
4408         parentheses  is  important when applying the pattern to strings that do         This particular example pattern that we have been looking  at  contains
4409         not match. For example, when this pattern is applied to         nested  unlimited repeats, and so the use of atomic grouping for match-
4410           ing strings of non-parentheses is important when applying  the  pattern
4411           to strings that do not match. For example, when this pattern is applied
4412           to
4413    
4414           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
4415    
# Line 4256  SUBPATTERNS AS SUBROUTINES Line 4459  SUBPATTERNS AS SUBROUTINES
4459         If the syntax for a recursive subpattern reference (either by number or         If the syntax for a recursive subpattern reference (either by number or
4460         by  name)  is used outside the parentheses to which it refers, it oper-         by  name)  is used outside the parentheses to which it refers, it oper-
4461         ates like a subroutine in a programming language. The "called"  subpat-         ates like a subroutine in a programming language. The "called"  subpat-
4462         tern  may  be defined before or after the reference. An earlier example         tern may be defined before or after the reference. A numbered reference
4463         pointed out that the pattern         can be absolute or relative, as in these examples:
4464    
4465             (...(absolute)...)...(?2)...
4466             (...(relative)...)...(?-1)...
4467             (...(?+1)...(relative)...
4468    
4469           An earlier example pointed out that the pattern
4470    
4471           (sens|respons)e and \1ibility           (sens|respons)e and \1ibility
4472    
# Line 4279  SUBPATTERNS AS SUBROUTINES Line 4488  SUBPATTERNS AS SUBROUTINES
4488         case-independence are fixed when the subpattern is defined. They cannot         case-independence are fixed when the subpattern is defined. They cannot
4489         be changed for different calls. For example, consider this pattern:         be changed for different calls. For example, consider this pattern:
4490    
4491           (abc)(?i:(?1))           (abc)(?i:(?-1))
4492    
4493         It matches "abcabc". It does not match "abcABC" because the  change  of         It matches "abcabc". It does not match "abcABC" because the  change  of
4494         processing option does not affect the called subpattern.         processing option does not affect the called subpattern.
# Line 4334  AUTHOR Line 4543  AUTHOR
4543    
4544  REVISION  REVISION
4545    
4546         Last updated: 06 March 2007         Last updated: 19 June 2007
4547         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
4548  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
4549    
# Line 4415  RESTRICTED PATTERNS FOR PCRE_PARTIAL Line 4624  RESTRICTED PATTERNS FOR PCRE_PARTIAL
4624    
4625         If PCRE_PARTIAL is set for a pattern  that  does  not  conform  to  the         If PCRE_PARTIAL is set for a pattern  that  does  not  conform  to  the
4626         restrictions,  pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL         restrictions,  pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL
4627         (-13).         (-13).  You can use the PCRE_INFO_OKPARTIAL call to pcre_fullinfo()  to
4628           find out if a compiled pattern can be used for partial matching.
4629    
4630    
4631  EXAMPLE OF PARTIAL MATCHING USING PCRETEST  EXAMPLE OF PARTIAL MATCHING USING PCRETEST
4632    
4633         If the escape sequence \P is present  in  a  pcretest  data  line,  the         If  the  escape  sequence  \P  is  present in a pcretest data line, the
4634         PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that         PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that
4635         uses the date example quoted above:         uses the date example quoted above:
4636    
# Line 4437  EXAMPLE OF PARTIAL MATCHING USING PCRETE Line 4647  EXAMPLE OF PARTIAL MATCHING USING PCRETE
4647           data> j\P           data> j\P
4648           No match           No match
4649    
4650         The first data string is matched  completely,  so  pcretest  shows  the         The  first  data  string  is  matched completely, so pcretest shows the
4651         matched  substrings.  The  remaining four strings do not match the com-         matched substrings. The remaining four strings do not  match  the  com-
4652         plete pattern, but the first two are partial matches.  The  same  test,         plete  pattern,  but  the first two are partial matches. The same test,
4653         using  pcre_dfa_exec()  matching  (by means of the \D escape sequence),         using pcre_dfa_exec() matching (by means of the  \D  escape  sequence),
4654         produces the following output:         produces the following output:
4655    
4656             re> /^?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$/             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
4657           data> 25jun04\P\D           data> 25jun04\P\D
4658            0: 25jun04            0: 25jun04
4659           data> 23dec3\P\D           data> 23dec3\P\D
# Line 4455  EXAMPLE OF PARTIAL MATCHING USING PCRETE Line 4665  EXAMPLE OF PARTIAL MATCHING USING PCRETE
4665           data> j\P\D           data> j\P\D
4666           No match           No match
4667    
4668         Notice that in this case the portion of the string that was matched  is         Notice  that in this case the portion of the string that was matched is
4669         made available.         made available.
4670    
4671    
4672  MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()  MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()
4673    
4674         When a partial match has been found using pcre_dfa_exec(), it is possi-         When a partial match has been found using pcre_dfa_exec(), it is possi-
4675         ble to continue the match by  providing  additional  subject  data  and         ble  to  continue  the  match  by providing additional subject data and
4676         calling  pcre_dfa_exec()  again  with the same compiled regular expres-         calling pcre_dfa_exec() again with the same  compiled  regular  expres-
4677         sion, this time setting the PCRE_DFA_RESTART option. You must also pass         sion, this time setting the PCRE_DFA_RESTART option. You must also pass
4678         the  same working space as before, because this is where details of the         the same working space as before, because this is where details of  the
4679         previous partial match are stored. Here is an example  using  pcretest,         previous  partial  match are stored. Here is an example using pcretest,
4680         using the \R escape sequence to set the PCRE_DFA_RESTART option (\P and         using the \R escape sequence to set the PCRE_DFA_RESTART option (\P and
4681         \D are as above):         \D are as above):
4682    
4683             re> /^?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$/             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
4684           data> 23ja\P\D           data> 23ja\P\D
4685           Partial match: 23ja           Partial match: 23ja
4686           data> n05\R\D           data> n05\R\D
4687            0: n05            0: n05
4688    
4689         The first call has "23ja" as the subject, and requests  partial  match-         The  first  call has "23ja" as the subject, and requests partial match-
4690         ing;  the  second  call  has  "n05"  as  the  subject for the continued         ing; the second call  has  "n05"  as  the  subject  for  the  continued
4691         (restarted) match.  Notice that when the match is  complete,  only  the         (restarted)  match.   Notice  that when the match is complete, only the
4692         last  part  is  shown;  PCRE  does not retain the previously partially-         last part is shown; PCRE does  not  retain  the  previously  partially-
4693         matched string. It is up to the calling program to do that if it  needs         matched  string. It is up to the calling program to do that if it needs
4694         to.         to.
4695    
4696         You  can  set  PCRE_PARTIAL  with  PCRE_DFA_RESTART to continue partial         You can set PCRE_PARTIAL  with  PCRE_DFA_RESTART  to  continue  partial
4697         matching over multiple segments. This facility can be used to pass very         matching over multiple segments. This facility can be used to pass very
4698         long  subject  strings to pcre_dfa_exec(). However, some care is needed         long subject strings to pcre_dfa_exec(). However, some care  is  needed
4699         for certain types of pattern.         for certain types of pattern.
4700    
4701         1. If the pattern contains tests for the beginning or end  of  a  line,         1.  If  the  pattern contains tests for the beginning or end of a line,
4702         you  need  to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropri-         you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options,  as  appropri-
4703         ate, when the subject string for any call does not contain  the  begin-         ate,  when  the subject string for any call does not contain the begin-
4704         ning or end of a line.         ning or end of a line.
4705    
4706         2.  If  the  pattern contains backward assertions (including \b or \B),         2. If the pattern contains backward assertions (including  \b  or  \B),
4707         you need to arrange for some overlap in the subject  strings  to  allow         you  need  to  arrange for some overlap in the subject strings to allow
4708         for  this.  For  example, you could pass the subject in chunks that are         for this. For example, you could pass the subject in  chunks  that  are
4709         500 bytes long, but in a buffer of 700 bytes, with the starting  offset         500  bytes long, but in a buffer of 700 bytes, with the starting offset
4710         set to 200 and the previous 200 bytes at the start of the buffer.         set to 200 and the previous 200 bytes at the start of the buffer.
4711    
4712         3.  Matching a subject string that is split into multiple segments does         3. Matching a subject string that is split into multiple segments  does
4713         not always produce exactly the same result as matching over one  single         not  always produce exactly the same result as matching over one single
4714         long  string.   The  difference arises when there are multiple matching         long string.  The difference arises when there  are  multiple  matching
4715         possibilities, because a partial match result is given only when  there         possibilities,  because a partial match result is given only when there
4716         are  no  completed  matches  in a call to fBpcre_dfa_exec(). This means         are no completed matches in a call to pcre_dfa_exec(). This means  that
4717         that as soon as the shortest match has been found,  continuation  to  a         as  soon  as  the  shortest match has been found, continuation to a new
4718         new  subject  segment  is  no  longer possible.  Consider this pcretest         subject segment is no longer possible.  Consider this pcretest example:
        example:  
4719    
4720             re> /dog(sbody)?/             re> /dog(sbody)?/
4721           data> do\P\D           data> do\P\D
# Line 4517  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe Line 4726  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe
4726            0: dogsbody            0: dogsbody
4727            1: dog            1: dog
4728    
4729         The pattern matches the words "dog" or "dogsbody". When the subject  is         The  pattern matches the words "dog" or "dogsbody". When the subject is
4730         presented  in  several  parts  ("do" and "gsb" being the first two) the         presented in several parts ("do" and "gsb" being  the  first  two)  the
4731         match stops when "dog" has been found, and it is not possible  to  con-         match  stops  when "dog" has been found, and it is not possible to con-
4732         tinue.  On  the  other  hand,  if  "dogsbody"  is presented as a single         tinue. On the other hand,  if  "dogsbody"  is  presented  as  a  single
4733         string, both matches are found.         string, both matches are found.
4734    
4735         Because of this phenomenon, it does not usually make  sense  to  end  a         Because  of  this  phenomenon,  it does not usually make sense to end a
4736         pattern that is going to be matched in this way with a variable repeat.         pattern that is going to be matched in this way with a variable repeat.
4737    
4738         4. Patterns that contain alternatives at the top level which do not all         4. Patterns that contain alternatives at the top level which do not all
# Line 4532  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe Line 4741  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe
4741    
4742           1234|3789           1234|3789
4743    
4744         If the first part of the subject is "ABC123", a partial  match  of  the         If  the  first  part of the subject is "ABC123", a partial match of the
4745         first  alternative  is found at offset 3. There is no partial match for         first alternative is found at offset 3. There is no partial  match  for
4746         the second alternative, because such a match does not start at the same         the second alternative, because such a match does not start at the same
4747         point  in  the  subject  string. Attempting to continue with the string         point in the subject string. Attempting to  continue  with  the  string
4748         "789" does not yield a match because only those alternatives that match         "789" does not yield a match because only those alternatives that match
4749         at  one point in the subject are remembered. The problem arises because         at one point in the subject are remembered. The problem arises  because
4750         the start of the second alternative matches within the  first  alterna-         the  start  of the second alternative matches within the first alterna-
4751         tive. There is no problem with anchored patterns or patterns such as:         tive. There is no problem with anchored patterns or patterns such as:
4752    
4753           1234|ABCD           1234|ABCD
# Line 4555  AUTHOR Line 4764  AUTHOR
4764    
4765  REVISION  REVISION
4766    
4767         Last updated: 06 March 2007         Last updated: 04 June 2007
4768         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
4769  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
4770    
# Line 4580  SAVING AND RE-USING PRECOMPILED PCRE PAT Line 4789  SAVING AND RE-USING PRECOMPILED PCRE PAT
4789         ent  host  and  run them there. This works even if the new host has the         ent  host  and  run them there. This works even if the new host has the
4790         opposite endianness to the one on which  the  patterns  were  compiled.         opposite endianness to the one on which  the  patterns  were  compiled.
4791         There  may  be a small performance penalty, but it should be insignifi-         There  may  be a small performance penalty, but it should be insignifi-
4792         cant.         cant. However, compiling regular expressions with one version  of  PCRE
4793           for  use  with  a  different  version is not guaranteed to work and may
4794           cause crashes.
4795    
4796    
4797  SAVING A COMPILED PATTERN  SAVING A COMPILED PATTERN
# Line 4663  RE-USING A PRECOMPILED PATTERN Line 4874  RE-USING A PRECOMPILED PATTERN
4874    
4875  COMPATIBILITY WITH DIFFERENT PCRE RELEASES  COMPATIBILITY WITH DIFFERENT PCRE RELEASES
4876    
4877         The layout of the control block that is at the start of the  data  that         In general, it is safest to  recompile  all  saved  patterns  when  you
4878         makes  up  a  compiled pattern was changed for release 5.0. If you have         update  to  a new PCRE release, though not all updates actually require
4879         any saved patterns that were compiled with  previous  releases  (not  a         this. Recompiling is definitely needed for release 7.2.
        facility  that  was  previously advertised), you will have to recompile  
        them for release 5.0 and above.  
   
        If you have any saved patterns in UTF-8 mode that use  \p  or  \P  that  
        were  compiled  with any release up to and including 6.4, you will have  
        to recompile them for release 6.5 and above.  
   
        All saved patterns from earlier releases must be recompiled for release  
        7.0  or  higher,  because  there was an internal reorganization at that  
        release.  
4880    
4881    
4882  AUTHOR  AUTHOR
# Line 4687  AUTHOR Line 4888  AUTHOR
4888    
4889  REVISION  REVISION
4890    
4891         Last updated: 06 March 2007         Last updated: 13 June 2007
4892         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
4893  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
4894    
# Line 5155  MATCHING INTERFACE Line 5356  MATCHING INTERFACE
5356         return false (because the empty string is not a valid number):         return false (because the empty string is not a valid number):
5357    
5358            int number;            int number;
5359            pcrecpp::RE::FullMatch("abc", "[a-z]+(\d+)?", &number);            pcrecpp::RE::FullMatch("abc", "[a-z]+(\\d+)?", &number);
5360    
5361         The matching interface supports at most 16 arguments per call.  If  you         The matching interface supports at most 16 arguments per call.  If  you
5362         need    more,    consider    using    the    more   general   interface         need    more,    consider    using    the    more   general   interface
# Line 5422  PCRE SAMPLE PROGRAM Line 5623  PCRE SAMPLE PROGRAM
5623         bility  of  matching an empty string. Comments in the code explain what         bility  of  matching an empty string. Comments in the code explain what
5624         is going on.         is going on.
5625    
5626         If PCRE is installed in the standard include  and  library  directories         The demonstration program is automatically built if you use  "./config-
5627         for  your  system, you should be able to compile the demonstration pro-         ure;make"  to  build PCRE. Otherwise, if PCRE is installed in the stan-
5628         gram using this command:         dard include and library directories for your  system,  you  should  be
5629           able to compile the demonstration program using this command:
5630    
5631           gcc -o pcredemo pcredemo.c -lpcre           gcc -o pcredemo pcredemo.c -lpcre
5632    
5633         If PCRE is installed elsewhere, you may need to add additional  options         If  PCRE is installed elsewhere, you may need to add additional options
5634         to  the  command line. For example, on a Unix-like system that has PCRE         to the command line. For example, on a Unix-like system that  has  PCRE
5635         installed in /usr/local, you  can  compile  the  demonstration  program         installed  in  /usr/local,  you  can  compile the demonstration program
5636         using a command like this:         using a command like this:
5637    
5638           gcc -o pcredemo -I/usr/local/include pcredemo.c \           gcc -o pcredemo -I/usr/local/include pcredemo.c \
5639               -L/usr/local/lib -lpcre               -L/usr/local/lib -lpcre
5640    
5641         Once  you  have  compiled the demonstration program, you can run simple         Once you have compiled the demonstration program, you  can  run  simple
5642         tests like this:         tests like this:
5643    
5644           ./pcredemo 'cat|dog' 'the cat sat on the mat'           ./pcredemo 'cat|dog' 'the cat sat on the mat'
5645           ./pcredemo -g 'cat|dog' 'the dog sat on the cat'           ./pcredemo -g 'cat|dog' 'the dog sat on the cat'
5646    
5647         Note that there is a  much  more  comprehensive  test  program,  called         Note  that  there  is  a  much  more comprehensive test program, called
5648         pcretest,  which  supports  many  more  facilities  for testing regular         pcretest, which supports  many  more  facilities  for  testing  regular
5649         expressions and the PCRE library. The pcredemo program is provided as a         expressions and the PCRE library. The pcredemo program is provided as a
5650         simple coding example.         simple coding example.
5651    
# Line 5451  PCRE SAMPLE PROGRAM Line 5653  PCRE SAMPLE PROGRAM
5653         the standard library directory, you may get an error like this when you         the standard library directory, you may get an error like this when you
5654         try to run pcredemo:         try to run pcredemo:
5655    
5656           ld.so.1:  a.out:  fatal:  libpcre.so.0:  open failed: No such file or           ld.so.1: a.out: fatal: libpcre.so.0: open failed:  No  such  file  or
5657         directory         directory
5658    
5659         This is caused by the way shared library support works  on  those  sys-         This  is  caused  by the way shared library support works on those sys-
5660         tems. You need to add         tems. You need to add
5661    
5662           -R/usr/local/lib           -R/usr/local/lib
# Line 5471  AUTHOR Line 5673  AUTHOR
5673    
5674  REVISION  REVISION
5675    
5676         Last updated: 06 March 2007         Last updated: 13 June 2007
5677         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
5678  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
5679  PCRESTACK(3)                                                      PCRESTACK(3)  PCRESTACK(3)                                                      PCRESTACK(3)
# Line 5541  PCRE DISCUSSION OF STACK USAGE Line 5743  PCRE DISCUSSION OF STACK USAGE
5743         In environments where stack memory is constrained, you  might  want  to         In environments where stack memory is constrained, you  might  want  to
5744         compile  PCRE to use heap memory instead of stack for remembering back-         compile  PCRE to use heap memory instead of stack for remembering back-
5745         up points. This makes it run a lot more slowly, however. Details of how         up points. This makes it run a lot more slowly, however. Details of how
5746         to do this are given in the pcrebuild documentation.         to do this are given in the pcrebuild documentation. When built in this
5747           way, instead of using the stack, PCRE obtains and frees memory by call-
5748         In  Unix-like environments, there is not often a problem with the stack         ing  the  functions  that  are  pointed to by the pcre_stack_malloc and
5749         unless very long strings are involved,  though  the  default  limit  on         pcre_stack_free variables. By default,  these  point  to  malloc()  and
5750         stack  size  varies  from system to system. Values from 8Mb to 64Mb are         free(),  but you can replace the pointers to cause PCRE to use your own
5751           functions. Since the block sizes are always the same,  and  are  always
5752           freed in reverse order, it may be possible to implement customized mem-
5753           ory handlers that are more efficient than the standard functions.
5754    
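The observation above (equal-sized blocks, freed in reverse order) suggests that a small cache of freed blocks can avoid most calls to malloc(). The following is a minimal sketch of that idea, not PCRE's own code: the names cached_stack_malloc, cached_stack_free and cached_stack_release are hypothetical, the handlers are not thread-safe, and they have an effect only when PCRE has been built to use the heap as described in pcrebuild.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of each block; the union keeps user data aligned. */
typedef union block_header {
    struct {
        union block_header *next;  /* next cached block while on the free list */
        size_t size;               /* usable size of this block */
    } info;
    long double align_ld;
    long long align_ll;
} block_header;

/* Freed blocks, most recently freed first (hypothetical, single-threaded). */
static block_header *free_list = NULL;

/* Reuse the most recently freed block if it is big enough, else malloc(). */
void *cached_stack_malloc(size_t size)
{
    block_header *h = free_list;
    if (h != NULL && h->info.size >= size) {
        free_list = h->info.next;
    } else {
        h = malloc(sizeof(block_header) + size);
        if (h == NULL) return NULL;
        h->info.size = size;
    }
    h->info.next = NULL;
    return h + 1;                  /* user data starts after the header */
}

/* Keep the block for reuse instead of returning it to the system. */
void cached_stack_free(void *block)
{
    block_header *h;
    if (block == NULL) return;
    h = (block_header *)block - 1;
    assert(h != free_list);        /* catches an immediate double free */
    h->info.next = free_list;
    free_list = h;
}

/* Return all cached blocks to the system, for example at program exit. */
void cached_stack_release(void)
{
    while (free_list != NULL) {
        block_header *next = free_list->info.next;
        free(free_list);
        free_list = next;
    }
}

/* With libpcre linked in, the handlers would be installed before matching:
     pcre_stack_malloc = cached_stack_malloc;
     pcre_stack_free   = cached_stack_free;                                  */
```

Because blocks are released in reverse order of allocation, the block at the head of the list is normally the one that the next request will reuse, so a deep recursion followed by another deep recursion costs no further calls to malloc().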
5755           In Unix-like environments, there is not often a problem with the  stack
5756           unless  very  long  strings  are  involved, though the default limit on
5757           stack size varies from system to system. Values from 8Mb  to  64Mb  are
5758         common. You can find your default limit by running the command:         common. You can find your default limit by running the command:
5759    
5760           ulimit -s           ulimit -s
5761    
5762         Unfortunately, the effect of running out of  stack  is  often  SIGSEGV,         Unfortunately,  the  effect  of  running out of stack is often SIGSEGV,
5763         though  sometimes  a more explicit error message is given. You can nor-         though sometimes a more explicit error message is given. You  can  nor-
5764         mally increase the limit on stack size by code such as this:         mally increase the limit on stack size by code such as this:
5765    
5766           struct rlimit rlim;           struct rlimit rlim;
# Line 5559  PCRE DISCUSSION OF STACK USAGE Line 5768  PCRE DISCUSSION OF STACK USAGE
5768           rlim.rlim_cur = 100*1024*1024;           rlim.rlim_cur = 100*1024*1024;
5769           setrlimit(RLIMIT_STACK, &rlim);           setrlimit(RLIMIT_STACK, &rlim);
5770    
5771         This reads the current limits (soft and hard) using  getrlimit(),  then         This  reads  the current limits (soft and hard) using getrlimit(), then
5772         attempts  to  increase  the  soft limit to 100Mb using setrlimit(). You         attempts to increase the soft limit to  100Mb  using  setrlimit().  You
5773         must do this before calling pcre_exec().         must do this before calling pcre_exec().
5774    
5775         PCRE has an internal counter that can be used to  limit  the  depth  of         PCRE  has  an  internal  counter that can be used to limit the depth of
5776         recursion,  and  thus cause pcre_exec() to give an error code before it         recursion, and thus cause pcre_exec() to give an error code  before  it
5777         runs out of stack. By default, the limit is very  large,  and  unlikely         runs  out  of  stack. By default, the limit is very large, and unlikely
5778         ever  to operate. It can be changed when PCRE is built, and it can also         ever to operate. It can be changed when PCRE is built, and it can  also
5779         be set when pcre_exec() is called. For details of these interfaces, see         be set when pcre_exec() is called. For details of these interfaces, see
5780         the pcrebuild and pcreapi documentation.         the pcrebuild and pcreapi documentation.
5781    
5782         As a very rough rule of thumb, you should reckon on about 500 bytes per         As a very rough rule of thumb, you should reckon on about 500 bytes per
5783         recursion. Thus, if you want to limit your  stack  usage  to  8Mb,  you         recursion.  Thus,  if  you  want  to limit your stack usage to 8Mb, you
5784         should  set  the  limit at 16000 recursions. A 64Mb stack, on the other         should set the limit at 16000 recursions. A 64Mb stack,  on  the  other
5785         hand, can support around 128000 recursions. The pcretest  test  program         hand,  can  support around 128000 recursions. The pcretest test program
5786         has a command line option (-S) that can be used to increase the size of         has a command line option (-S) that can be used to increase the size of
5787         its stack.         its stack.
5788    
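The rule of thumb above is simple division, and can be sketched as follows. The helper name recursion_limit_for_stack is hypothetical, the figure of 500 bytes per recursion is only the rough estimate quoted in the text, and the stack sizes are taken as round decimal numbers (8Mb as 8,000,000 bytes) to reproduce the figures given.

```c
#include <assert.h>

/* Hypothetical helper: the deepest recursion that fits in stack_bytes,
   given an estimated stack cost per recursion. */
unsigned long recursion_limit_for_stack(unsigned long stack_bytes,
                                        unsigned long bytes_per_recursion)
{
    assert(bytes_per_recursion > 0);
    return stack_bytes / bytes_per_recursion;
}

/* With libpcre available, the result would typically be passed to
   pcre_exec() through a pcre_extra block (see pcreapi for the
   authoritative description):

     pcre_extra extra;
     memset(&extra, 0, sizeof(extra));
     extra.flags = PCRE_EXTRA_MATCH_LIMIT_RECURSION;
     extra.match_limit_recursion =
       recursion_limit_for_stack(8000000UL, 500UL);    (16000)
*/
```

In practice it is prudent to choose a limit somewhat below the computed value, because the calling program's own frames also occupy the stack and the per-recursion cost varies with the compiler and the platform.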
# Line 5587  AUTHOR Line 5796  AUTHOR
5796    
5797  REVISION  REVISION
5798    
5799         Last updated: 12 March 2007         Last updated: 05 June 2007
5800         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
5801  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
5802    
