Diff of /code/trunk/doc/pcre.txt

revision 41 by nigel, Sat Feb 24 21:39:17 2007 UTC revision 43 by nigel, Sat Feb 24 21:39:21 2007 UTC
# Line 30  SYNOPSIS Line 30  SYNOPSIS
30    
31       const unsigned char *pcre_maketables(void);       const unsigned char *pcre_maketables(void);
32    
33         int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
34              int what, void *where);
35    
36       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
37    
38       char *pcre_version(void);       char *pcre_version(void);
# Line 46  DESCRIPTION Line 49  DESCRIPTION
49       lar  expression  pattern  matching using the same syntax and       lar  expression  pattern  matching using the same syntax and
50       semantics as Perl  5,  with  just  a  few  differences  (see       semantics as Perl  5,  with  just  a  few  differences  (see
51       below).  The  current  implementation  corresponds  to  Perl       below).  The  current  implementation  corresponds  to  Perl
52       5.005.       5.005, with some additional features from the Perl  develop-
53         ment release.
54    
55       PCRE has its own native API,  which  is  described  in  this       PCRE has its own native API,  which  is  described  in  this
56       document.  There  is  also  a  set of wrapper functions that       document.  There  is  also  a  set of wrapper functions that
57       correspond to the POSIX API.  These  are  described  in  the       correspond to the POSIX regular expression API.   These  are
58       pcreposix documentation.       described in the pcreposix documentation.
59    
60       The native API function prototypes are defined in the header       The native API function prototypes are defined in the header
61       file  pcre.h,  and  on  Unix  systems  the library itself is       file  pcre.h,  and  on  Unix  systems  the library itself is
62       called libpcre.a, so can be accessed by adding -lpcre to the       called libpcre.a, so can be accessed by adding -lpcre to the
63       command for linking an application which calls it.       command  for  linking  an  application  which  calls it. The
64         header file defines the macros PCRE_MAJOR and PCRE_MINOR  to
65         contain the major and minor release numbers for the library.
66         Applications can use these to include support for  different
67         releases.
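
A minimal sketch of how an application might use these macros for conditional compilation. In a real program PCRE_MAJOR and PCRE_MINOR come from pcre.h; the fallback values below are stand-ins so that the fragment compiles on its own. PCRE_AT_LEAST, HAVE_NEWER_CALLS, release_at_least() and the 3.0 threshold are all hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* In real code: #include <pcre.h>, which defines PCRE_MAJOR and PCRE_MINOR. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 3   /* stand-in value, for illustration only */
#define PCRE_MINOR 0   /* stand-in value, for illustration only */
#endif

/* Hypothetical helper: true if the header describes release maj.min or later. */
#define PCRE_AT_LEAST(maj, min) \
  (PCRE_MAJOR > (maj) || (PCRE_MAJOR == (maj) && PCRE_MINOR >= (min)))

/* Compile-time switch an application could test with #if. */
#if PCRE_AT_LEAST(3, 0)
#define HAVE_NEWER_CALLS 1   /* hypothetical feature switch */
#else
#define HAVE_NEWER_CALLS 0
#endif

/* Run-time form of the same comparison. */
int release_at_least(int need_major, int need_minor)
{
  assert(need_major >= 0 && need_minor >= 0);  /* release numbers are not negative */
  return PCRE_AT_LEAST(need_major, need_minor);
}
```

Headers that predate these macros do not define them at all, so an #ifndef PCRE_MAJOR test may also be needed to recognize such releases.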
68    
69       The functions pcre_compile(), pcre_study(), and  pcre_exec()       The functions pcre_compile(), pcre_study(), and  pcre_exec()
70       are  used  for  compiling  and matching regular expressions,       are  used  for  compiling  and matching regular expressions,
# Line 66  DESCRIPTION Line 75  DESCRIPTION
75       to build a set of character tables in the current locale for       to build a set of character tables in the current locale for
76       passing to pcre_compile().       passing to pcre_compile().
77    
78       The function pcre_info() is used  to  find  out  information       The function pcre_fullinfo() is used to find out information
79       about  a compiled pattern, while the function pcre_version()       about a compiled pattern; pcre_info() is an obsolete version
80       returns a pointer to a string containing the version of PCRE       which returns only some of the available information, but is
81       and its date of release.       retained   for   backwards   compatibility.    The  function
82         pcre_version() returns a pointer to a string containing  the
83         version of PCRE and its date of release.
84    
85       The global variables  pcre_malloc  and  pcre_free  initially       The global variables  pcre_malloc  and  pcre_free  initially
86       contain the entry points of the standard malloc() and free()       contain the entry points of the standard malloc() and free()
# Line 92  MULTI-THREADING Line 103  MULTI-THREADING
103    
104    
105    
106    
107  COMPILING A PATTERN  COMPILING A PATTERN
108       The function pcre_compile() is called to compile  a  pattern       The function pcre_compile() is called to compile  a  pattern
109       into  an internal form. The pattern is a C string terminated       into  an internal form. The pattern is a C string terminated
# Line 187  COMPILING A PATTERN Line 199  COMPILING A PATTERN
199    
200         PCRE_EXTRA         PCRE_EXTRA
201    
202       This option turns on additional functionality of  PCRE  that       This option was invented in  order  to  turn  on  additional
203       is  incompatible  with Perl. Any backslash in a pattern that       functionality of PCRE that is incompatible with Perl, but it
204       is followed by a letter that has no special  meaning  causes       is currently of very little use. When set, any backslash  in
205       an  error,  thus  reserving  these  combinations  for future       a  pattern  that is followed by a letter that has no special
206       expansion. By default, as in Perl, a backslash followed by a       meaning causes an error, thus reserving  these  combinations
207       letter  with  no  special  meaning  is treated as a literal.       for  future  expansion.  By default, as in Perl, a backslash
208       There are at present no other features  controlled  by  this       followed by a letter with no special meaning is treated as a
209       option.       literal.  There  are at present no other features controlled
210         by this option. It can also be set by a (?X) option  setting
211         within a pattern.
212    
213         PCRE_MULTILINE         PCRE_MULTILINE
214    
# Line 207  COMPILING A PATTERN Line 221  COMPILING A PATTERN
221       PCRE_DOLLAR_ENDONLY is set). This is the same as Perl.       PCRE_DOLLAR_ENDONLY is set). This is the same as Perl.
222    
223       When PCRE_MULTILINE is set, the "start of line" and  "end       When PCRE_MULTILINE is set, the "start of line" and  "end
224       of   line"   constructs   match   immediately  following  or       of  line"  constructs match immediately following or immedi-
225       immediately  before  any  newline  in  the  subject  string,       ately before any newline  in  the  subject  string,  respec-
226       respectively,  as well as at the very start and end. This is       tively,  as  well  as  at  the  very  start and end. This is
227       equivalent to Perl's /m option. If there are no "\n" charac-       equivalent to Perl's /m option. If there are no "\n" charac-
228       ters  in  a subject string, or no occurrences of ^ or $ in a       ters  in  a subject string, or no occurrences of ^ or $ in a
229       pattern, setting PCRE_MULTILINE has no effect.       pattern, setting PCRE_MULTILINE has no effect.
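
The positions described above can be sketched as two small predicates. This is an illustration of the rule as stated in the text, not PCRE's own code; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* 1 if offset pos is where a "start of line" construct can match when
   PCRE_MULTILINE is set: the very start, or immediately after a "\n". */
int is_line_start(const char *subject, size_t length, size_t pos)
{
  assert(subject != NULL && pos <= length);
  return pos == 0 || subject[pos - 1] == '\n';
}

/* 1 if offset pos is where an "end of line" construct can match when
   PCRE_MULTILINE is set: the very end, or immediately before a "\n". */
int is_line_end(const char *subject, size_t length, size_t pos)
{
  assert(subject != NULL && pos <= length);
  return pos == length || subject[pos] == '\n';
}
```

For the subject "ab\ncd", the line starts are offsets 0 and 3, and the line ends are offsets 2 and 5.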
# Line 284  LOCALE SUPPORT Line 298  LOCALE SUPPORT
298    
299    
300  INFORMATION ABOUT A PATTERN  INFORMATION ABOUT A PATTERN
301       The pcre_info() function returns information  about  a  com-       The pcre_fullinfo() function  returns  information  about  a
302       piled pattern.  Its yield is the number of capturing subpat-       compiled pattern. It replaces the obsolete pcre_info() func-
303       terns, or one of the following negative numbers:       tion, which is nevertheless retained for backwards
304         compatibility (and is documented below).
305    
306         The first argument for pcre_fullinfo() is a pointer  to  the
307         compiled  pattern.  The  second  argument  is  the result of
308         pcre_study(), or NULL if the pattern was  not  studied.  The
309         third  argument  specifies  which  piece  of  information is
310         required, while the fourth argument is a pointer to a  vari-
311         able  to receive the data. The yield of the function is zero
312         for success, or one of the following negative numbers:
313    
314         PCRE_ERROR_NULL       the argument code was NULL         PCRE_ERROR_NULL       the argument code was NULL
315                                 the argument where was NULL
316         PCRE_ERROR_BADMAGIC   the "magic number" was not found         PCRE_ERROR_BADMAGIC   the "magic number" was not found
317           PCRE_ERROR_BADOPTION  the value of what was invalid
318    
319       If the optptr argument is not NULL, a copy  of  the  options       The possible values for the third argument  are  defined  in
320       with which the pattern was compiled is placed in the integer       pcre.h, and are as follows:
321       it points to. These option bits are those specified  in  the  
322           PCRE_INFO_OPTIONS
323    
324         Return a copy of the options with which the pattern was
325         compiled. The fourth argument should point to an unsigned long
326         int variable. These option bits are those specified  in  the
327       call  to  pcre_compile(),  modified  by any top-level option       call  to  pcre_compile(),  modified  by any top-level option
328       settings  within  the   pattern   itself,   and   with   the       settings  within  the   pattern   itself,   and   with   the
329       PCRE_ANCHORED  bit  set  if  the form of the pattern implies       PCRE_ANCHORED  bit  forcibly  set if the form of the pattern
330       that it can match only at the start of a subject string.       implies that it can match only at the  start  of  a  subject
331         string.
332    
333       If the pattern is not anchored and the firstcharptr argument         PCRE_INFO_SIZE
334       is  not  NULL, it is used to pass back information about the  
335       first character of any matched string. If there is  a  fixed       Return the size of the compiled pattern, that is, the  value
336       first    character,    e.g.   from   a   pattern   such   as       that  was  passed as the argument to pcre_malloc() when PCRE
337         was getting memory in which to place the compiled data.  The
338         fourth argument should point to a size_t variable.
339    
340           PCRE_INFO_CAPTURECOUNT
341    
342         Return the number of capturing subpatterns in  the  pattern.
343         The fourth argument should point to an int variable.
344    
345           PCRE_INFO_BACKREFMAX
346    
347         Return the number of the highest back reference in the  pat-
348         tern.  The  fourth argument should point to an int variable.
349         Zero is returned if there are no back references.
350    
351           PCRE_INFO_FIRSTCHAR
352    
353         Return information about the first character of any  matched
354         string,  for  a  non-anchored  pattern.  If there is a fixed
355         first   character,   e.g.   from   a   pattern    such    as
356       (cat|cow|coyote), then it is returned in the integer pointed       (cat|cow|coyote), then it is returned in the integer pointed
357       to by firstcharptr. Otherwise, if either       to by where. Otherwise, if either
358    
359       (a) the pattern was compiled with the PCRE_MULTILINE option,       (a) the pattern was compiled with the PCRE_MULTILINE option,
360       and every branch starts with "^", or       and every branch starts with "^", or
# Line 312  INFORMATION ABOUT A PATTERN Line 362  INFORMATION ABOUT A PATTERN
362       (b) every  branch  of  the  pattern  starts  with  ".*"  and       (b) every  branch  of  the  pattern  starts  with  ".*"  and
363       PCRE_DOTALL is not set (if it were set, the pattern would be       PCRE_DOTALL is not set (if it were set, the pattern would be
364       anchored),       anchored),
365    
366       then -1 is returned, indicating  that  the  pattern  matches       then -1 is returned, indicating  that  the  pattern  matches
367       only  at  the  start  of  a subject string or after any "\n"       only  at  the  start  of  a subject string or after any "\n"
368       within the string. Otherwise -2 is returned.       within the string. Otherwise -2 is  returned.  For  anchored
369         patterns, -2 is returned.
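
A caller has to distinguish three cases in the value stored for PCRE_INFO_FIRSTCHAR. A minimal sketch of that decoding follows; the enum and function names are hypothetical, and the value is assumed to have been obtained already.

```c
#include <assert.h>

enum first_kind {
  FIRST_FIXED,       /* value is the fixed first character */
  FIRST_LINE_START,  /* -1: matches only at start of subject or after "\n" */
  FIRST_UNKNOWN      /* -2: nothing known, or the pattern is anchored */
};

/* Classifies a value as described for PCRE_INFO_FIRSTCHAR. */
enum first_kind classify_firstchar(int value)
{
  assert(value >= -2);  /* the text documents only -2, -1 and character values */
  if (value >= 0) return FIRST_FIXED;
  return (value == -1) ? FIRST_LINE_START : FIRST_UNKNOWN;
}
```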
370    
371           PCRE_INFO_FIRSTTABLE
372    
373         If the pattern was studied, and this resulted  in  the  con-
374         struction of a 256-bit table indicating a fixed set of char-
375         acters for the first character in  any  matching  string,  a
376         pointer   to  the  table  is  returned.  Otherwise  NULL  is
377         returned. The fourth argument should point  to  an  unsigned
378         char * variable.
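
A 256-bit table occupies 32 bytes, one bit per possible byte value. The sketch below shows how such a table can be tested. The bit layout is an ASSUMPTION (bit c % 8 of byte c / 8 stands for character c) because this section does not state it, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Tests whether byte value c is marked in a 256-bit (32-byte) table.
   ASSUMPTION: bit (c % 8) of byte (c / 8) represents character c. */
int table_has_char(const unsigned char *table, unsigned char c)
{
  assert(table != NULL);
  return (table[c / 8] & (1u << (c % 8))) != 0;
}

/* Marks c in a table; needed only to build a table for experimenting. */
void table_set_char(unsigned char *table, unsigned char c)
{
  assert(table != NULL);
  table[c / 8] |= (unsigned char)(1u << (c % 8));
}
```

A matcher holding such a table can skip any starting position in the subject whose character is not marked.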
379    
380           PCRE_INFO_LASTLITERAL
381    
382         For a non-anchored pattern, return the value of  the  right-
383         most  literal  character  which  must  exist  in any matched
384         string, other than at its start. The fourth argument  should
385         point  to an int variable. If there is no such character, or
386         if the pattern is anchored, -1 is returned. For example, for
387         the pattern /a\d+z\d+/ the returned value is 'z'.
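
One use an application could make of this value is a cheap pre-check before attempting a match: if the required character is absent from the subject, no match is possible. The sketch below is a conservative version that searches the whole subject; the function name is hypothetical, and last_literal is assumed to be a value already obtained as described above.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a full match attempt is worth making, 0 if the subject can
   be rejected at once. last_literal is a character code, or -1 if no
   required literal is known. */
int worth_matching(const char *subject, size_t length, int last_literal)
{
  assert(subject != NULL);
  if (last_literal < 0)
    return 1;  /* nothing known: a match attempt is the only way to tell */
  return memchr(subject, last_literal, length) != NULL;
}
```

For /a\d+z\d+/, where the value is 'z', the subject "a1234" is rejected without running the matcher.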
388    
389         The pcre_info() function is now obsolete because its  inter-
390         face  is  too  restrictive  to return all the available data
391         about  a  compiled  pattern.   New   programs   should   use
392         pcre_fullinfo()  instead.  The  yield  of pcre_info() is the
393         number of capturing subpatterns, or  one  of  the  following
394         negative numbers:
395    
396           PCRE_ERROR_NULL       the argument code was NULL
397           PCRE_ERROR_BADMAGIC   the "magic number" was not found
398    
399         If the optptr argument is not NULL, a copy  of  the  options
400         with which the pattern was compiled is placed in the integer
401         it points to (see PCRE_INFO_OPTIONS above).
402    
403         If the pattern is not anchored and the firstcharptr argument
404         is  not  NULL, it is used to pass back information about the
405         first    character    of    any    matched    string    (see
406         PCRE_INFO_FIRSTCHAR above).
407    
408    
409    
# Line 640  DIFFERENCES FROM PERL Line 729  DIFFERENCES FROM PERL
729       6. The Perl \G assertion is  not  supported  as  it  is  not       6. The Perl \G assertion is  not  supported  as  it  is  not
730       relevant to single pattern matches.       relevant to single pattern matches.
731    
732       7. Fairly obviously, PCRE does  not  support  the  (?{code})       7. Fairly obviously, PCRE does not support the (?{code}) and
733       construction.       (?p{code})  constructions. However, there is some experimen-
734         tal support for recursive patterns using the  non-Perl  item
735         (?R).
736       8. There are at the time of writing some  oddities  in  Perl       8. There are at the time of writing some  oddities  in  Perl
737       5.005_02  concerned  with  the  settings of captured strings       5.005_02  concerned  with  the  settings of captured strings
738       when part of a pattern is repeated.  For  example,  matching       when part of a pattern is repeated.  For  example,  matching
# Line 675  DIFFERENCES FROM PERL Line 765  DIFFERENCES FROM PERL
765       (c) If PCRE_EXTRA is set, a backslash followed by  a  letter       (c) If PCRE_EXTRA is set, a backslash followed by  a  letter
766       with no special meaning is faulted.       with no special meaning is faulted.
767    
768       (d)  If  PCRE_UNGREEDY  is  set,  the  greediness   of   the       (d) If PCRE_UNGREEDY is set, the greediness of  the  repeti-
769       repetition quantifiers is inverted, that is, by default they       tion  quantifiers  is inverted, that is, by default they are
770       are not greedy, but if followed by a question mark they are.       not greedy, but if followed by a question mark they are.
771    
772       (e) PCRE_ANCHORED can be used to force a pattern to be tried       (e) PCRE_ANCHORED can be used to force a pattern to be tried
773       only at the start of the subject.       only at the start of the subject.
# Line 685  DIFFERENCES FROM PERL Line 775  DIFFERENCES FROM PERL
775       (f) The PCRE_NOTBOL, PCRE_NOTEOL, and PCRE_NOTEMPTY  options       (f) The PCRE_NOTBOL, PCRE_NOTEOL, and PCRE_NOTEMPTY  options
776       for pcre_exec() have no Perl equivalents.       for pcre_exec() have no Perl equivalents.
777    
778         (g) The (?R) construct allows for recursive pattern matching
779         (Perl  5.6 can do this using the (?p{code}) construct, which
780         PCRE cannot of course support.)
781    
782    
783    
784  REGULAR EXPRESSION DETAILS  REGULAR EXPRESSION DETAILS
785       The syntax and semantics of  the  regular  expressions  sup-       The syntax and semantics of  the  regular  expressions  sup-
786       ported  by PCRE are described below. Regular expressions are       ported  by PCRE are described below. Regular expressions are
787       also described in the Perl documentation and in a number  of       also described in the Perl documentation and in a number  of
788    
789       other  books,  some  of which have copious examples. Jeffrey       other  books,  some  of which have copious examples. Jeffrey
790       Friedl's  "Mastering  Regular  Expressions",  published   by       Friedl's  "Mastering  Regular  Expressions",  published   by
791       O'Reilly  (ISBN 1-56592-257-3), covers them in great detail.       O'Reilly  (ISBN  1-56592-257),  covers them in great detail.
792       The description here is intended as reference documentation.       The description here is intended as reference documentation.
793    
794       A regular expression is a pattern that is matched against  a       A regular expression is a pattern that is matched against  a
# Line 780  BACKSLASH Line 875  BACKSLASH
875         \f     formfeed (hex 0C)         \f     formfeed (hex 0C)
876         \n     newline (hex 0A)         \n     newline (hex 0A)
877         \r     carriage return (hex 0D)         \r     carriage return (hex 0D)
878           \t     tab (hex 09)
             tab (hex 09)  
879         \xhh   character with hex code hh         \xhh   character with hex code hh
880         \ddd   character with octal code ddd, or backreference         \ddd   character with octal code ddd, or backreference
881    
# Line 833  BACKSLASH Line 927  BACKSLASH
927       Note that octal values of 100 or greater must not be  intro-       Note that octal values of 100 or greater must not be  intro-
928       duced  by  a  leading zero, because no more than three octal       duced  by  a  leading zero, because no more than three octal
929       digits are ever read.       digits are ever read.
930    
931       All the sequences that define a single  byte  value  can  be       All the sequences that define a single  byte  value  can  be
932       used both inside and outside character classes. In addition,       used both inside and outside character classes. In addition,
933       inside a character class, the sequence "\b"  is  interpreted       inside a character class, the sequence "\b"  is  interpreted
# Line 885  BACKSLASH Line 980  BACKSLASH
980       These assertions may not appear in  character  classes  (but       These assertions may not appear in  character  classes  (but
981       note that "\b" has a different meaning, namely the backspace       note that "\b" has a different meaning, namely the backspace
982       character, inside a character class).       character, inside a character class).
983    
984       A word boundary is a position in the  subject  string  where       A word boundary is a position in the  subject  string  where
985       the current character and the previous character do not both       the current character and the previous character do not both
986       match \w or \W (i.e. one matches \w and  the  other  matches       match \w or \W (i.e. one matches \w and  the  other  matches
# Line 1046  SQUARE BRACKETS Line 1142  SQUARE BRACKETS
1142    
1143    
1144    
1145    POSIX CHARACTER CLASSES
1146         Perl 5.6 (not yet released at the time of writing) is  going
1147         to  support  the POSIX notation for character classes, which
1148         uses names enclosed by  [:  and  :]   within  the  enclosing
1149         square brackets. PCRE supports this notation. For example,
1150    
1151           [01[:alpha:]%]
1152    
1153         matches "0", "1", any alphabetic character, or "%". The sup-
1154         ported class names are
1155    
1156           alnum    letters and digits
1157           alpha    letters
1158           ascii    character codes 0 - 127
1159           cntrl    control characters
1160           digit    decimal digits (same as \d)
1161           graph    printing characters, excluding space
1162           lower    lower case letters
1163           print    printing characters, including space
1164           punct    printing characters, excluding letters and digits
1165           space    white space (same as \s)
1166           upper    upper case letters
1167           word     "word" characters (same as \w)
1168           xdigit   hexadecimal digits
1169    
1170         The names "ascii" and "word" are  Perl  extensions.  Another
1171         Perl  extension is negation, which is indicated by a ^ char-
1172         acter after the colon. For example,
1173    
1174           [12[:^digit:]]
1175    
1176         matches "1", "2", or any non-digit.  PCRE  (and  Perl)  also
1177         recognize the POSIX syntax [.ch.] and [=ch=] where "ch" is a
1178         "collating element", but these are  not  supported,  and  an
1179         error is given if they are encountered.
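
The two example classes above can be read as the following single-character tests. This sketch uses the <ctype.h> functions, whose results depend on the C locale; PCRE itself consults its own character tables, so this is an illustration of the meaning rather than of the implementation. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Equivalent of the class [01[:alpha:]%] for a single byte value. */
int in_class_01_alpha_percent(int c)
{
  assert(c >= 0 && c <= 255);  /* ctype functions need a byte value */
  return c == '0' || c == '1' || c == '%' || isalpha(c) != 0;
}

/* Equivalent of the class [12[:^digit:]]: "1", "2", or any non-digit. */
int in_class_12_nondigit(int c)
{
  assert(c >= 0 && c <= 255);
  return c == '1' || c == '2' || !isdigit(c);
}
```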
1180    
1181    
1182    
1183  VERTICAL BAR  VERTICAL BAR
1184       Vertical bar characters are  used  to  separate  alternative       Vertical bar characters are  used  to  separate  alternative
1185       patterns. For example, the pattern       patterns. For example, the pattern
# Line 1197  REPETITION Line 1331  REPETITION
1331       Repetition is specified by quantifiers, which can follow any       Repetition is specified by quantifiers, which can follow any
1332       of the following items:       of the following items:
1333    
   
1334         a single character, possibly escaped         a single character, possibly escaped
1335         the . metacharacter         the . metacharacter
1336         a character class         a character class
# Line 1384  BACK REFERENCES Line 1517  BACK REFERENCES
1517       A back reference that occurs inside the parentheses to which       A back reference that occurs inside the parentheses to which
1518       it  refers  fails when the subpattern is first used, so, for       it  refers  fails when the subpattern is first used, so, for
1519       example, (a\1) never matches.  However, such references  can       example, (a\1) never matches.  However, such references  can
1520       be useful inside repeated subpatterns. For example, the pat-       be  useful  inside  repeated  subpatterns.  For example, the
1521       tern       pattern
1522    
1523         (a|b\1)+         (a|b\1)+
1524    
# Line 1407  ASSERTIONS Line 1540  ASSERTIONS
1540       cated assertions are coded as  subpatterns.  There  are  two       cated assertions are coded as  subpatterns.  There  are  two
1541       kinds:  those that look ahead of the current position in the       kinds:  those that look ahead of the current position in the
1542       subject string, and those that look behind it.       subject string, and those that look behind it.
1543    
1544       An assertion subpattern is matched in the normal way, except       An assertion subpattern is matched in the normal way, except
1545       that  it  does not cause the current matching position to be       that  it  does not cause the current matching position to be
1546       changed. Lookahead assertions start with  (?=  for  positive       changed. Lookahead assertions start with  (?=  for  positive
# Line 1572  ONCE-ONLY SUBPATTERNS Line 1706  ONCE-ONLY SUBPATTERNS
1706    
1707         abcd$         abcd$
1708    
1709       when applied to a long  string  which  does  not  match  it.       when applied to a long string which does not match.  Because
1710       Because matching proceeds from left to right, PCRE will look       matching  proceeds  from  left  to right, PCRE will look for
1711       for each "a" in the subject and then  see  if  what  follows       each "a" in the subject and then see if what follows matches
1712       matches the rest of the pattern. If the pattern is specified       the rest of the pattern. If the pattern is specified as
      as  
1713    
1714         ^.*abcd$         ^.*abcd$
1715    
1716       then the initial .* matches the entire string at first,  but       then the initial .* matches the entire string at first,  but
1717       when  this  fails,  it  backtracks to match all but the last       when  this  fails  (because  there  is no following "a"), it
1718       character, then all but the last two characters, and so  on.       backtracks to match all but the last character, then all but
1719       Once again the search for "a" covers the entire string, from       the  last  two  characters, and so on. Once again the search
1720       right to left, so we are no better off. However, if the pat-       for "a" covers the entire string, from right to left, so  we
1721       tern is written as       are no better off. However, if the pattern is written as
1722    
1723         ^(?>.*)(?<=abcd)         ^(?>.*)(?<=abcd)
1724    
# Line 1596  ONCE-ONLY SUBPATTERNS Line 1729  ONCE-ONLY SUBPATTERNS
1729       this approach makes a significant difference to the process-       this approach makes a significant difference to the process-
1730       ing time.       ing time.
1731    
1732         When a pattern contains an unlimited repeat inside a subpat-
1733         tern  that  can  itself  be  repeated an unlimited number of
1734         times, the use of a once-only subpattern is the only way  to
1735         avoid  some  failing matches taking a very long time indeed.
1736         The pattern
1737    
1738           (\D+|<\d+>)*[!?]
1739    
1740         matches an unlimited number of substrings that  either  con-
1741         sist  of  non-digits,  or digits enclosed in <>, followed by
1742         either ! or ?. When it matches, it runs quickly. However, if
1743         it is applied to
1744    
1745           aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1746    
1747         it takes a long  time  before  reporting  failure.  This  is
1748         because the string can be divided between the two repeats in
1749         a large number of ways, and all have to be tried. (The exam-
1750         ple  used  [!?]  rather  than a single character at the end,
1751         because both PCRE and Perl have an optimization that  allows
1752         for  fast  failure  when  a  single  character is used. They
1753         remember the last single character that is  required  for  a
1754         match,  and  fail early if it is not present in the string.)
1755         If the pattern is changed to
1756    
1757           ((?>\D+)|<\d+>)*[!?]
1758    
1759         sequences of non-digits cannot be broken, and  failure  hap-
1760         pens quickly.
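
To see why the unprotected pattern is slow, count the ways a run of n non-digits can be cut into consecutive non-empty pieces, each piece being one iteration of \D+ inside the outer repeat. The sketch below computes that count, which doubles with every extra character (it equals 2 to the power n-1). It is an illustration of the growth only: the function name is hypothetical, and the exact number of attempts a real matcher makes may differ.

```c
#include <assert.h>

/* Number of ways to cut a run of n characters into consecutive non-empty
   pieces, i.e. the ways (\D+)* could divide a run of n non-digits. */
unsigned long long count_divisions(unsigned n)
{
  unsigned long long ways[64] = {0};
  unsigned k, j;
  assert(n >= 1 && n < 64);    /* keeps the result within 64 bits */
  ways[0] = 1;                 /* the empty prefix can be cut in one way */
  for (k = 1; k <= n; k++)
    for (j = 0; j < k; j++)
      ways[k] += ways[j];      /* the last piece is characters j..k-1 */
  return ways[n];
}
```

With the once-only form (?>\D+), each run is taken whole, so only one division is ever considered.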
1761    
1762    
1763    
1764  CONDITIONAL SUBPATTERNS  CONDITIONAL SUBPATTERNS
# Line 1668  COMMENTS Line 1831  COMMENTS
1831    
1832    
1833    
1834    RECURSIVE PATTERNS
1835         Consider the problem of matching a  string  in  parentheses,
1836         allowing  for  unlimited nested parentheses. Without the use
1837         of recursion, the best that can be done is to use a  pattern
1838         that  matches  up  to some fixed depth of nesting. It is not
1839         possible to handle an arbitrary nesting depth. Perl 5.6  has
1840         provided   an  experimental  facility  that  allows  regular
1841         expressions to recurse (amongst other things). It does  this
1842         by  interpolating  Perl  code in the expression at run time,
1843         and the code can refer to the expression itself. A Perl pat-
1844         tern  to  solve  the parentheses problem can be created like
1845         this:
1846    
1847           $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;
1848    
1849         The (?p{...}) item interpolates Perl code at run  time,  and
1850         in  this  case refers recursively to the pattern in which it
1851         appears. Obviously, PCRE cannot support the interpolation of
1852         Perl  code.  Instead,  the special item (?R) is provided for
1853         the specific case of recursion. This PCRE pattern solves the
1854         parentheses  problem (assume the PCRE_EXTENDED option is set
1855         so that white space is ignored):
1856    
1857           \( ( (?>[^()]+) | (?R) )* \)
1858    
1859         First it matches an opening parenthesis. Then it matches any
1860         number  of substrings which can either be a sequence of non-
1861         parentheses, or a recursive  match  of  the  pattern  itself
1862         (i.e. a correctly parenthesized substring). Finally there is
1863         a closing parenthesis.
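
The three steps just described correspond closely to a hand-written recursive function. The sketch below is such a function, given only to make the structure of the pattern concrete; it is not how PCRE implements (?R), and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Tries to match one correctly parenthesized substring starting at s[pos].
   Returns the offset just past the closing parenthesis, or 0 on failure
   (0 is never a valid end offset, as a match is at least two characters). */
size_t match_parens(const char *s, size_t len, size_t pos)
{
  assert(s != NULL && pos <= len);
  if (pos >= len || s[pos] != '(')
    return 0;                                /* \( : opening parenthesis */
  pos++;
  while (pos < len && s[pos] != ')') {
    if (s[pos] == '(') {                     /* (?R) : recursive match */
      size_t end = match_parens(s, len, pos);
      if (end == 0)
        return 0;
      pos = end;
    } else {
      /* (?>[^()]+) : take the whole run and never give any of it back */
      while (pos < len && s[pos] != '(' && s[pos] != ')')
        pos++;
    }
  }
  if (pos >= len)
    return 0;                                /* no closing parenthesis */
  return pos + 1;                            /* \) : closing parenthesis */
}
```

Applied at offset 0 of "(ab(cd)ef)" the function returns 10, the length of the whole string; applied to "(a(b)" it returns 0 because the outer parenthesis is never closed.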
1864    
1865         This particular example pattern  contains  nested  unlimited
1866         repeats, and so the use of a once-only subpattern for match-
1867         ing strings of non-parentheses is  important  when  applying
1868         the  pattern to strings that do not match. For example, when
1869         it is applied to
1870    
1871           (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
1872    
1873         it yields "no match" quickly. However, if a  once-only  sub-
1874         pattern  is  not  used,  the match runs for a very long time
1875         indeed because there are so many different ways the + and  *
1876         repeats  can carve up the subject, and all have to be tested
1877         before failure can be reported.
1878    
1879         The values set for any capturing subpatterns are those  from
1880         the outermost level of the recursion at which the subpattern
1881         value is set. If the pattern above is matched against
1882    
1883           (ab(cd)ef)
1884    
1885         the value for the capturing parentheses is  "ef",  which  is
1886         the  last  value  taken  on  at the top level. If additional
1887         parentheses are added, giving
1888    
1889           \( ( ( (?>[^()]+) | (?R) )* ) \)
1890              ^                        ^
1891              ^                        ^
1892         then the string they capture is "ab(cd)ef", the contents of the top level parentheses. If
1893         there are more than 15 capturing parentheses in  a  pattern,
1894         PCRE  has  to  obtain  extra  memory  to store data during a
1895         recursion, which it does by using  pcre_malloc,  freeing  it
1896         via  pcre_free  afterwards. If no memory can be obtained, it
1897         saves data for the first 15 capturing parentheses  only,  as
1898         there is no way to give an out-of-memory error from within a
1899         recursion.
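
The fallback behaviour described above (use extra memory when available, otherwise keep only the first 15 values) can be sketched as follows. This is a simplified illustration, not PCRE's code: each capture is reduced to a single int, the allocator is passed in as a parameter standing in for pcre_malloc, and every name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FIXED_SAVE 15   /* values that can be saved without extra memory */

/* Saves count values before a recursion. When count exceeds FIXED_SAVE it
   tries to obtain memory for all of them; if that fails it keeps only the
   first FIXED_SAVE in the caller's fixed buffer. Returns the number saved
   and sets *saved to the buffer used (a heap buffer must be freed later). */
size_t save_captures(const int *values, size_t count,
                     int fixed[FIXED_SAVE], int **saved,
                     void *(*alloc)(size_t))
{
  assert(values != NULL && fixed != NULL && saved != NULL && alloc != NULL);
  *saved = fixed;
  if (count > FIXED_SAVE) {
    int *extra = alloc(count * sizeof *extra);
    if (extra != NULL)
      *saved = extra;
    else
      count = FIXED_SAVE;   /* degrade quietly: no error can be reported */
  }
  memcpy(*saved, values, count * sizeof **saved);
  return count;
}

/* Allocator that always fails, to exercise the fallback path. */
void *failing_alloc(size_t size)
{
  (void)size;
  return NULL;
}
```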
1900    
1901    
1902    
1903  PERFORMANCE  PERFORMANCE
1904       Certain items that may appear in patterns are more efficient       Certain items that may appear in patterns are more efficient
1905       than  others.  It is more efficient to use a character class       than  others.  It is more efficient to use a character class
# Line 1742  AUTHOR Line 1974  AUTHOR
1974       Cambridge CB2 3QG, England.       Cambridge CB2 3QG, England.
1975       Phone: +44 1223 334714       Phone: +44 1223 334714
1976    
1977       Last updated: 29 July 1999       Last updated: 27 January 2000
1978       Copyright (c) 1997-1999 University of Cambridge.       Copyright (c) 1997-2000 University of Cambridge.
