Diff of /code/trunk/doc/pcre.txt

revision 902 by ph10, Sun Jan 15 15:44:47 2012 UTC revision 903 by ph10, Sat Jan 21 16:37:17 2012 UTC
# Line 138  REVISION Line 138  REVISION
138         Last updated: 10 January 2012         Last updated: 10 January 2012
139         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
140  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
141    
142    
143  PCRE(3)                                                                PCRE(3)  PCRE(3)                                                                PCRE(3)
144    
145    
# Line 463  REVISION Line 463  REVISION
463         Last updated: 08 January 2012         Last updated: 08 January 2012
464         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
465  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
466    
467    
468  PCREBUILD(3)                                                      PCREBUILD(3)  PCREBUILD(3)                                                      PCREBUILD(3)
469    
470    
# Line 859  REVISION Line 859  REVISION
859         Last updated: 07 January 2012         Last updated: 07 January 2012
860         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
861  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
862    
863    
864  PCREMATCHING(3)                                                PCREMATCHING(3)  PCREMATCHING(3)                                                PCREMATCHING(3)
865    
866    
# Line 1066  REVISION Line 1066  REVISION
1066         Last updated: 08 January 2012         Last updated: 08 January 2012
1067         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
1068  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
1069    
1070    
1071  PCREAPI(3)                                                          PCREAPI(3)  PCREAPI(3)                                                          PCREAPI(3)
1072    
1073    
# Line 1405  CHECKING BUILD-TIME OPTIONS Line 1405  CHECKING BUILD-TIME OPTIONS
1405         The output is an integer that is set to one if support for just-in-time         The output is an integer that is set to one if support for just-in-time
1406         compiling is available; otherwise it is set to zero.         compiling is available; otherwise it is set to zero.
1407    
1408             PCRE_CONFIG_JITTARGET
1409    
1410           The  output is a pointer to a zero-terminated "const char *" string. If
1411           JIT support is available, the string contains the name of the architec-
1412           ture  for  which the JIT compiler is configured, for example "x86 32bit
1413           (little endian + unaligned)". If JIT  support  is  not  available,  the
1414           result is NULL.
1415    
1416           PCRE_CONFIG_NEWLINE           PCRE_CONFIG_NEWLINE
1417    
1418         The  output  is  an integer whose value specifies the default character         The  output  is  an integer whose value specifies the default character
# Line 3255  FINDING ALL POSSIBLE MATCHES Line 3263  FINDING ALL POSSIBLE MATCHES
3263         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
3264    
3265    
3266    OBTAINING AN ESTIMATE OF STACK USAGE
3267    
3268           Matching  certain  patterns  using pcre_exec() can use a lot of process
3269           stack, which in certain environments can be  rather  limited  in  size.
3270           Some  users  find it helpful to have an estimate of the amount of stack
3271           that is used by pcre_exec(), to help  them  set  recursion  limits,  as
3272           described  in  the pcrestack documentation. The estimate that is output
3273           by pcretest when called with the -m and -C options is obtained by call-
3274           ing  pcre_exec with the values NULL, NULL, NULL, -999, and -999 for its
3275           first five arguments.
3276    
3277           Normally, if  its  first  argument  is  NULL,  pcre_exec()  immediately
3278           returns  the negative error code PCRE_ERROR_NULL, but with this special
3279           combination of arguments, it returns instead a  negative  number  whose
3280           absolute  value  is the approximate stack frame size in bytes. (A nega-
3281           tive number is used so that it is clear that no  match  has  happened.)
3282           The  value  is  approximate  because  in some cases, recursive calls to
3283           pcre_exec() occur when there are one or two additional variables on the
3284           stack.
3285    
3286           If  PCRE  has  been  compiled  to use the heap instead of the stack for
3287           recursion, the value returned  is  the  size  of  each  block  that  is
3288           obtained from the heap.
3289    
3290    
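The stack estimate described in the added section can be turned into a starting value for a recursion limit. The following is a minimal sketch, not code from PCRE: the helper name recursion_limit_from_probe and its parameters are hypothetical, and it assumes the caller has already obtained the negative probe value from the special pcre_exec() call with NULL, NULL, NULL, -999, -999 as its first five arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
 *
 * probe_result: the value returned by the special pcre_exec() call
 *               described above; it is negative, and its absolute value
 *               approximates the size of one stack frame in bytes.
 * stack_bytes:  the amount of stack the application is prepared to let
 *               matching consume (an assumption supplied by the caller).
 *
 * Returns how many frames of that size fit into stack_bytes, or 0 when
 * the probe value is not negative and so cannot be a frame-size estimate.
 */
static unsigned long recursion_limit_from_probe(int probe_result,
                                                size_t stack_bytes)
{
    unsigned long long frame;

    if (probe_result >= 0)
        return 0;

    /* Widen before negating so that INT_MIN cannot overflow. */
    frame = (unsigned long long)(-(long long)probe_result);
    assert(frame > 0);

    return (unsigned long)(stack_bytes / frame);
}
```

Because the frame size is only approximate, a caller would probably want to budget less than the full stack, and then store the result in the match_limit_recursion field of a pcre_extra block as the pcrestack documentation describes.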
3291  MATCHING A PATTERN: THE ALTERNATIVE FUNCTION  MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
3292    
3293         int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,         int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
# Line 3436  AUTHOR Line 3469  AUTHOR
3469    
3470  REVISION  REVISION
3471    
3472         Last updated: 07 January 2012         Last updated: 21 January 2012
3473         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3474  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3475    
3476    
3477  PCRECALLOUT(3)                                                  PCRECALLOUT(3)  PCRECALLOUT(3)                                                  PCRECALLOUT(3)
3478    
3479    
# Line 3638  REVISION Line 3671  REVISION
3671         Last updated: 08 January 2012         Last updated: 08 January 2012
3672         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3673  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3674    
3675    
3676  PCRECOMPAT(3)                                                    PCRECOMPAT(3)  PCRECOMPAT(3)                                                    PCRECOMPAT(3)
3677    
3678    
# Line 3813  REVISION Line 3846  REVISION
3846         Last updated: 08 January 2012         Last updated: 08 January 2012
3847         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
3848  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
3849    
3850    
3851  PCREPATTERN(3)                                                  PCREPATTERN(3)  PCREPATTERN(3)                                                  PCREPATTERN(3)
3852    
3853    
# Line 6418  REVISION Line 6451  REVISION
6451         Last updated: 09 January 2012         Last updated: 09 January 2012
6452         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
6453  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
6454    
6455    
6456  PCRESYNTAX(3)                                                    PCRESYNTAX(3)  PCRESYNTAX(3)                                                    PCRESYNTAX(3)
6457    
6458    
# Line 6794  REVISION Line 6827  REVISION
6827         Last updated: 10 January 2012         Last updated: 10 January 2012
6828         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
6829  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
6830    
6831    
6832  PCREUNICODE(3)                                                  PCREUNICODE(3)  PCREUNICODE(3)                                                  PCREUNICODE(3)
6833    
6834    
# Line 6992  REVISION Line 7025  REVISION
7025         Last updated: 13 January 2012         Last updated: 13 January 2012
7026         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7027  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7028    
7029    
7030  PCREJIT(3)                                                          PCREJIT(3)  PCREJIT(3)                                                          PCREJIT(3)
7031    
7032    
# Line 7348  REVISION Line 7381  REVISION
7381         Last updated: 08 January 2012         Last updated: 08 January 2012
7382         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7383  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7384    
7385    
7386  PCREPARTIAL(3)                                                  PCREPARTIAL(3)  PCREPARTIAL(3)                                                  PCREPARTIAL(3)
7387    
7388    
# Line 7469  PARTIAL MATCHING USING pcre_exec() OR pc Line 7502  PARTIAL MATCHING USING pcre_exec() OR pc
7502         plete  match.  For  this reason, the assumption is made that the end of         plete  match.  For  this reason, the assumption is made that the end of
7503         the supplied subject string may not be the true end  of  the  available         the supplied subject string may not be the true end  of  the  available
7504         data, and so, if \z, \Z, \b, \B, or $ are encountered at the end of the         data, and so, if \z, \Z, \b, \B, or $ are encountered at the end of the
7505         subject, the result is PCRE_ERROR_PARTIAL.         subject, the result is PCRE_ERROR_PARTIAL, provided that at  least  one
7506           character in the subject has been inspected.
7507    
7508         Setting PCRE_PARTIAL_HARD also affects the way UTF-8 and UTF-16 subject         Setting PCRE_PARTIAL_HARD also affects the way UTF-8 and UTF-16 subject
7509         strings  are checked for validity. Normally, an invalid sequence causes         strings are checked for validity. Normally, an invalid sequence  causes
7510         the error PCRE_ERROR_BADUTF8 or PCRE_ERROR_BADUTF16.  However,  in  the         the  error  PCRE_ERROR_BADUTF8  or PCRE_ERROR_BADUTF16. However, in the
7511         special  case  of  a  truncated  character  at  the end of the subject,         special case of a truncated  character  at  the  end  of  the  subject,
7512         PCRE_ERROR_SHORTUTF8  or   PCRE_ERROR_SHORTUTF16   is   returned   when         PCRE_ERROR_SHORTUTF8   or   PCRE_ERROR_SHORTUTF16   is   returned  when
7513         PCRE_PARTIAL_HARD is set.         PCRE_PARTIAL_HARD is set.
7514    
7515     Comparing hard and soft partial matching     Comparing hard and soft partial matching
7516    
7517         The  difference  between the two partial matching options can be illus-         The difference between the two partial matching options can  be  illus-
7518         trated by a pattern such as:         trated by a pattern such as:
7519    
7520           /dog(sbody)?/           /dog(sbody)?/
7521    
7522         This matches either "dog" or "dogsbody", greedily (that is, it  prefers         This  matches either "dog" or "dogsbody", greedily (that is, it prefers
7523         the  longer  string  if  possible). If it is matched against the string         the longer string if possible). If it is  matched  against  the  string
7524         "dog" with PCRE_PARTIAL_SOFT, it yields a  complete  match  for  "dog".         "dog"  with  PCRE_PARTIAL_SOFT,  it  yields a complete match for "dog".
7525         However, if PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL.         However, if PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL.
7526         On the other hand, if the pattern is made ungreedy the result  is  dif-         On  the  other hand, if the pattern is made ungreedy the result is dif-
7527         ferent:         ferent:
7528    
7529           /dog(sbody)??/           /dog(sbody)??/
7530    
7531         In  this  case  the  result  is always a complete match because that is         In this case the result is always a  complete  match  because  that  is
7532         found first, and matching never  continues  after  finding  a  complete         found  first,  and  matching  never  continues after finding a complete
7533         match. It might be easier to follow this explanation by thinking of the         match. It might be easier to follow this explanation by thinking of the
7534         two patterns like this:         two patterns like this:
7535    
7536           /dog(sbody)?/    is the same as  /dogsbody|dog/           /dog(sbody)?/    is the same as  /dogsbody|dog/
7537           /dog(sbody)??/   is the same as  /dog|dogsbody/           /dog(sbody)??/   is the same as  /dog|dogsbody/
7538    
7539         The second pattern will never match "dogsbody", because it will  always         The  second pattern will never match "dogsbody", because it will always
7540         find the shorter match first.         find the shorter match first.
7541    
7542    
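The comparison of hard and soft partial matching above can be imitated with a toy model. This is only a sketch under stated assumptions: it does not call PCRE, every name in it (toy_match, toy_result, toy_mode) is hypothetical, it handles only literal alternatives anchored at the start of the subject, and it stands in for /dog(sbody)?/ and /dog(sbody)??/ by using the equivalent ordered alternations /dogsbody|dog/ and /dog|dogsbody/ given in the text.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for a complete match, PCRE_ERROR_PARTIAL and
 * PCRE_ERROR_NOMATCH; the values are not those used by PCRE. */
typedef enum { TOY_NOMATCH, TOY_PARTIAL, TOY_MATCH } toy_result;
typedef enum { TOY_SOFT, TOY_HARD } toy_mode;

/* Try each literal alternative in order against the start of the subject,
 * in the way a backtracking matcher tries /a|b/ from left to right. */
static toy_result toy_match(const char *const *alts, size_t n_alts,
                            const char *subject, toy_mode mode)
{
    size_t slen = strlen(subject);
    int saw_partial = 0;
    size_t i;

    for (i = 0; i < n_alts; i++) {
        size_t alen = strlen(alts[i]);
        size_t common = slen < alen ? slen : alen;

        if (memcmp(alts[i], subject, common) != 0)
            continue;               /* mismatch before either string ended */

        if (alen <= slen)
            return TOY_MATCH;       /* complete match: matching stops here */

        /* The subject ran out while the alternative needed more input. */
        if (common > 0) {           /* at least one character was inspected */
            if (mode == TOY_HARD)
                return TOY_PARTIAL; /* hard: the partial match wins at once */
            saw_partial = 1;        /* soft: remember it and keep trying */
        }
    }
    return saw_partial ? TOY_PARTIAL : TOY_NOMATCH;
}
```

With the subject "dog", the greedy ordering {"dogsbody", "dog"} gives a complete match in soft mode and a partial match in hard mode, while the ungreedy ordering {"dog", "dogsbody"} gives a complete match in both modes, which is the behaviour the text describes for the standard matching functions.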
7543  PARTIAL MATCHING USING pcre_dfa_exec() OR pcre16_dfa_exec()  PARTIAL MATCHING USING pcre_dfa_exec() OR pcre16_dfa_exec()
7544    
7545         The DFA functions move along the subject string character by character,         The DFA functions move along the subject string character by character,
7546         without backtracking, searching for  all  possible  matches  simultane-         without  backtracking,  searching  for  all possible matches simultane-
7547         ously.  If the end of the subject is reached before the end of the pat-         ously. If the end of the subject is reached before the end of the  pat-
7548         tern, there is the possibility of a partial match, again provided  that         tern,  there is the possibility of a partial match, again provided that
7549         at least one character has been inspected.         at least one character has been inspected.
7550    
7551         When  PCRE_PARTIAL_SOFT  is set, PCRE_ERROR_PARTIAL is returned only if         When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned  only  if
7552         there have been no complete matches. Otherwise,  the  complete  matches         there  have  been  no complete matches. Otherwise, the complete matches
7553         are  returned.   However,  if PCRE_PARTIAL_HARD is set, a partial match         are returned.  However, if PCRE_PARTIAL_HARD is set,  a  partial  match
7554         takes precedence over any complete matches. The portion of  the  string         takes  precedence  over any complete matches. The portion of the string
7555         that  was  inspected when the longest partial match was found is set as         that was inspected when the longest partial match was found is  set  as
7556         the first matching string, provided there are at least two slots in the         the first matching string, provided there are at least two slots in the
7557         offsets vector.         offsets vector.
7558    
7559         Because  the  DFA functions always search for all possible matches, and         Because the DFA functions always search for all possible  matches,  and
7560         there is no difference between greedy and  ungreedy  repetition,  their         there  is  no  difference between greedy and ungreedy repetition, their
7561         behaviour  is  different  from  the  standard  functions when PCRE_PAR-         behaviour is different  from  the  standard  functions  when  PCRE_PAR-
7562         TIAL_HARD is  set.  Consider  the  string  "dog"  matched  against  the         TIAL_HARD  is  set.  Consider  the  string  "dog"  matched  against the
7563         ungreedy pattern shown above:         ungreedy pattern shown above:
7564    
7565           /dog(sbody)??/           /dog(sbody)??/
7566    
7567         Whereas  the  standard functions stop as soon as they find the complete         Whereas the standard functions stop as soon as they find  the  complete
7568         match for "dog", the DFA functions also  find  the  partial  match  for         match  for  "dog",  the  DFA  functions also find the partial match for
7569         "dogsbody", and so return that when PCRE_PARTIAL_HARD is set.         "dogsbody", and so return that when PCRE_PARTIAL_HARD is set.
7570    
7571    
7572  PARTIAL MATCHING AND WORD BOUNDARIES  PARTIAL MATCHING AND WORD BOUNDARIES
7573    
7574         If a pattern ends with one of the sequences \b or \B, which test for word         If a pattern ends with one of the sequences \b or \B, which test for word
7575         boundaries, partial matching with PCRE_PARTIAL_SOFT can  give  counter-         boundaries,  partial  matching with PCRE_PARTIAL_SOFT can give counter-
7576         intuitive results. Consider this pattern:         intuitive results. Consider this pattern:
7577    
7578           /\bcat\b/           /\bcat\b/
7579    
7580         This matches "cat", provided there is a word boundary at either end. If         This matches "cat", provided there is a word boundary at either end. If
7581         the subject string is "the cat", the comparison of the final "t" with a         the subject string is "the cat", the comparison of the final "t" with a
7582         following  character  cannot  take  place, so a partial match is found.         following character cannot take place, so a  partial  match  is  found.
7583         However, normal matching carries on, and \b matches at the end  of  the         However,  normal  matching carries on, and \b matches at the end of the
7584         subject  when  the  last  character is a letter, so a complete match is         subject when the last character is a letter, so  a  complete  match  is
7585         found.  The  result,  therefore,  is  not   PCRE_ERROR_PARTIAL.   Using         found.   The   result,  therefore,  is  not  PCRE_ERROR_PARTIAL.  Using
7586         PCRE_PARTIAL_HARD  in  this case does yield PCRE_ERROR_PARTIAL, because         PCRE_PARTIAL_HARD in this case does yield  PCRE_ERROR_PARTIAL,  because
7587         then the partial match takes precedence.         then the partial match takes precedence.
7588    
7589    
7590  FORMERLY RESTRICTED PATTERNS  FORMERLY RESTRICTED PATTERNS
7591    
7592         For releases of PCRE prior to 8.00, because of the way certain internal         For releases of PCRE prior to 8.00, because of the way certain internal
7593         optimizations   were  implemented  in  the  pcre_exec()  function,  the         optimizations  were  implemented  in  the  pcre_exec()  function,   the
7594         PCRE_PARTIAL option (predecessor of  PCRE_PARTIAL_SOFT)  could  not  be         PCRE_PARTIAL  option  (predecessor  of  PCRE_PARTIAL_SOFT) could not be
7595         used  with all patterns. From release 8.00 onwards, the restrictions no         used with all patterns. From release 8.00 onwards, the restrictions  no
7596         longer apply, and partial matching can be requested  for  any  pat-         longer  apply,  and  partial  matching can be requested for any pat-
7597         tern.         tern.
7598    
7599         Items that were formerly restricted were repeated single characters and         Items that were formerly restricted were repeated single characters and
7600         repeated metasequences. If PCRE_PARTIAL was set for a pattern that  did         repeated  metasequences. If PCRE_PARTIAL was set for a pattern that did
7601         not  conform  to  the restrictions, pcre_exec() returned the error code         not conform to the restrictions, pcre_exec() returned  the  error  code
7602         PCRE_ERROR_BADPARTIAL (-13). This error code is no longer in  use.  The         PCRE_ERROR_BADPARTIAL  (-13).  This error code is no longer in use. The
7603         PCRE_INFO_OKPARTIAL  call  to pcre_fullinfo() to find out if a compiled         PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to find out if  a  compiled
7604         pattern can be used for partial matching now always returns 1.         pattern can be used for partial matching now always returns 1.
7605    
7606    
7607  EXAMPLE OF PARTIAL MATCHING USING PCRETEST  EXAMPLE OF PARTIAL MATCHING USING PCRETEST
7608    
7609         If the escape sequence \P is present  in  a  pcretest  data  line,  the         If  the  escape  sequence  \P  is  present in a pcretest data line, the
7610         PCRE_PARTIAL_SOFT  option  is  used  for  the  match.  Here is a run of         PCRE_PARTIAL_SOFT option is used for  the  match.  Here  is  a  run  of
7611         pcretest that uses the date example quoted above:         pcretest that uses the date example quoted above:
7612    
7613             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
# Line 7589  EXAMPLE OF PARTIAL MATCHING USING PCRETE Line 7623  EXAMPLE OF PARTIAL MATCHING USING PCRETE
7623           data> j\P           data> j\P
7624           No match           No match
7625    
7626         The first data string is matched  completely,  so  pcretest  shows  the         The  first  data  string  is  matched completely, so pcretest shows the
7627         matched  substrings.  The  remaining four strings do not match the com-         matched substrings. The remaining four strings do not  match  the  com-
7628         plete pattern, but the first two are partial matches. Similar output is         plete pattern, but the first two are partial matches. Similar output is
7629         obtained if DFA matching is used.         obtained if DFA matching is used.
7630    
7631         If  the escape sequence \P is present more than once in a pcretest data         If the escape sequence \P is present more than once in a pcretest  data
7632         line, the PCRE_PARTIAL_HARD option is set for the match.         line, the PCRE_PARTIAL_HARD option is set for the match.
7633    
7634    
7635  MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre16_dfa_exec()  MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre16_dfa_exec()
7636    
7637         When a partial match has been found using a DFA matching  function,  it         When  a  partial match has been found using a DFA matching function, it
7638         is  possible to continue the match by providing additional subject data         is possible to continue the match by providing additional subject  data
7639         and calling the function again with the same compiled  regular  expres-         and  calling  the function again with the same compiled regular expres-
7640         sion,  this time setting the PCRE_DFA_RESTART option. You must pass the         sion, this time setting the PCRE_DFA_RESTART option. You must pass  the
7641         same working space as before, because this is where details of the pre-         same working space as before, because this is where details of the pre-
7642         vious  partial  match  are  stored.  Here is an example using pcretest,         vious partial match are stored. Here  is  an  example  using  pcretest,
7643         using the \R escape sequence to set  the  PCRE_DFA_RESTART  option  (\D         using  the  \R  escape  sequence to set the PCRE_DFA_RESTART option (\D
7644         specifies the use of the DFA matching function):         specifies the use of the DFA matching function):
7645    
7646             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
# Line 7615  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe Line 7649  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe
7649           data> n05\R\D           data> n05\R\D
7650            0: n05            0: n05
7651    
7652         The  first  call has "23ja" as the subject, and requests partial match-         The first call has "23ja" as the subject, and requests  partial  match-
7653         ing; the second call  has  "n05"  as  the  subject  for  the  continued         ing;  the  second  call  has  "n05"  as  the  subject for the continued
7654         (restarted)  match.   Notice  that when the match is complete, only the         (restarted) match.  Notice that when the match is  complete,  only  the
7655         last part is shown; PCRE does  not  retain  the  previously  partially-         last  part  is  shown;  PCRE  does not retain the previously partially-
7656         matched  string. It is up to the calling program to do that if it needs         matched string. It is up to the calling program to do that if it  needs
7657         to.         to.
7658    
7659         You can set the PCRE_PARTIAL_SOFT  or  PCRE_PARTIAL_HARD  options  with         You  can  set  the  PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with
7660         PCRE_DFA_RESTART  to  continue partial matching over multiple segments.         PCRE_DFA_RESTART to continue partial matching over  multiple  segments.
7661         This facility can be used to pass very long subject strings to the  DFA         This  facility can be used to pass very long subject strings to the DFA
7662         matching functions.         matching functions.
7663    
7664    
7665  MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre16_exec()  MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre16_exec()
7666    
7667         From  release 8.00, the standard matching functions can also be used to         From release 8.00, the standard matching functions can also be used  to
7668         do multi-segment matching. Unlike the DFA functions, it is not possible         do multi-segment matching. Unlike the DFA functions, it is not possible
7669         to  restart the previous match with a new segment of data. Instead, new         to restart the previous match with a new segment of data. Instead,  new
7670         data must be added to the previous subject string, and the entire match         data must be added to the previous subject string, and the entire match
7671         re-run,  starting from the point where the partial match occurred. Ear-         re-run, starting from the point where the partial match occurred.  Ear-
7672         lier data can be discarded.         lier data can be discarded.
7673    
7674         It is best to use PCRE_PARTIAL_HARD in this situation, because it  does         It  is best to use PCRE_PARTIAL_HARD in this situation, because it does
7675         not  treat the end of a segment as the end of the subject when matching         not treat the end of a segment as the end of the subject when  matching
7676         \z, \Z, \b, \B, and $. Consider  an  unanchored  pattern  that  matches         \z,  \Z,  \b,  \B,  and  $. Consider an unanchored pattern that matches
7677         dates:         dates:
7678    
7679             re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/             re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
7680           data> The date is 23ja\P\P           data> The date is 23ja\P\P
7681           Partial match: 23ja           Partial match: 23ja
7682    
7683         At  this stage, an application could discard the text preceding "23ja",         At this stage, an application could discard the text preceding  "23ja",
7684         add on text from the next  segment,  and  call  the  matching  function         add  on  text  from  the  next  segment, and call the matching function
7685         again.  Unlike  the  DFA  matching functions the entire matching string         again. Unlike the DFA matching functions  the  entire  matching  string
7686         must always be available, and the complete matching process occurs  for         must  always be available, and the complete matching process occurs for
7687         each call, so more memory and more processing time is needed.         each call, so more memory and more processing time is needed.
7688    
7689         Note:  If  the pattern contains lookbehind assertions, or \K, or starts         Note: If the pattern contains lookbehind assertions, or \K,  or  starts
7690         with \b or \B, the string that is returned for a partial match includes         with \b or \B, the string that is returned for a partial match includes
7691         characters  that  precede  the partially matched string itself, because         characters that precede the partially matched  string  itself,  because
7692         these must be retained when adding on more characters for a  subsequent         these  must be retained when adding on more characters for a subsequent
7693         matching attempt.         matching attempt.
7694    
7695    
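The buffer handling described for multi-segment matching with pcre_exec() (discard the text before the partial match, append the next segment, match again) can be sketched as follows. The helper carry_over is hypothetical and is not part of PCRE; it assumes the caller already knows the offset at which the string returned for the partial match begins, for example from the offsets vector filled in when PCRE_ERROR_PARTIAL is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the PCRE API.
 *
 * buf, cap:      subject buffer and its capacity in bytes
 * len:           in: bytes currently held in buf; out: bytes after the call
 * partial_start: offset in buf where the string returned for the partial
 *                match begins (this already includes any earlier characters
 *                that lookbehind, \K, \b or \B need to see)
 * segment:       the next chunk of subject data, seg_len bytes long
 *
 * Returns 0 on success, or -1 (leaving buf unchanged) if the retained text
 * plus the new segment would not fit into buf.
 */
static int carry_over(char *buf, size_t cap, size_t *len,
                      size_t partial_start,
                      const char *segment, size_t seg_len)
{
    size_t kept;

    assert(*len <= cap);
    assert(partial_start <= *len);

    kept = *len - partial_start;
    if (seg_len > cap - kept)
        return -1;

    memmove(buf, buf + partial_start, kept);  /* drop the earlier data */
    memcpy(buf + kept, segment, seg_len);     /* append the new segment */
    *len = kept + seg_len;
    return 0;
}
```

For the pcretest run above, a buffer holding "The date is 23ja" with a partial match starting at offset 12, followed by the segment "n05", leaves "23jan05" in the buffer; the application would then call the matching function again on those 7 bytes.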
# Line 7665  ISSUES WITH MULTI-SEGMENT MATCHING Line 7699  ISSUES WITH MULTI-SEGMENT MATCHING
7699         whichever matching function is used.         whichever matching function is used.
7700    
7701         1. If the pattern contains a test for the beginning of a line, you need         1. If the pattern contains a test for the beginning of a line, you need
7702         to  pass  the  PCRE_NOTBOL  option when the subject string for any call         to pass the PCRE_NOTBOL option when the subject  string  for  any  call
7703         does not start at the beginning of a line. There is also a  PCRE_NOTEOL         does not start at the beginning of a line. There is also a  PCRE_NOTEOL
7704         option, but in practice when doing multi-segment matching you should be         option, but in practice when doing multi-segment matching you should be
7705         using PCRE_PARTIAL_HARD, which includes the effect of PCRE_NOTEOL.         using PCRE_PARTIAL_HARD, which includes the effect of PCRE_NOTEOL.
7706    
7707         2. Lookbehind assertions at the start of a pattern are catered  for  in         2.  Lookbehind  assertions at the start of a pattern are catered for in
7708         the  offsets that are returned for a partial match. However, in theory,         the offsets that are returned for a partial match. However, in  theory,
7709         a lookbehind assertion later in the pattern could require even  earlier         a  lookbehind assertion later in the pattern could require even earlier
7710         characters  to  be inspected, and it might not have been reached when a         characters to be inspected, and it might not have been reached  when  a
7711         partial match occurs. This is probably an extremely unlikely case;  you         partial  match occurs. This is probably an extremely unlikely case; you
7712         could  guard  against  it to a certain extent by always including extra         could guard against it to a certain extent by  always  including  extra
7713         characters at the start.         characters at the start.
7714    
7715         3. Matching a subject string that is split into multiple  segments  may         3.  Matching  a subject string that is split into multiple segments may
7716         not  always produce exactly the same result as matching over one single         not always produce exactly the same result as matching over one  single
7717         long string, especially when PCRE_PARTIAL_SOFT  is  used.  The  section         long  string,  especially  when  PCRE_PARTIAL_SOFT is used. The section
7718         "Partial  Matching  and  Word Boundaries" above describes an issue that         "Partial Matching and Word Boundaries" above describes  an  issue  that
7719         arises if the pattern ends with \b or \B. Another  kind  of  difference         arises  if  the  pattern ends with \b or \B. Another kind of difference
7720         may  occur when there are multiple matching possibilities, because (for         may occur when there are multiple matching possibilities, because  (for
7721         PCRE_PARTIAL_SOFT) a partial match result is given only when there  are         PCRE_PARTIAL_SOFT)  a partial match result is given only when there are
7722         no completed matches. This means that as soon as the shortest match has         no completed matches. This means that as soon as the shortest match has
7723         been found, continuation to a new subject segment is no  longer  possi-         been  found,  continuation to a new subject segment is no longer possi-
7724         ble. Consider again this pcretest example:         ble. Consider again this pcretest example:
7725    
7726             re> /dog(sbody)?/             re> /dog(sbody)?/
# Line 7700  ISSUES WITH MULTI-SEGMENT MATCHING Line 7734  ISSUES WITH MULTI-SEGMENT MATCHING
7734            0: dogsbody            0: dogsbody
7735            1: dog            1: dog
7736    
7737         The  first  data  line passes the string "dogsb" to a standard matching         The first data line passes the string "dogsb" to  a  standard  matching
7738         function, setting the PCRE_PARTIAL_SOFT option. Although the string  is         function,  setting the PCRE_PARTIAL_SOFT option. Although the string is
7739         a  partial  match for "dogsbody", the result is not PCRE_ERROR_PARTIAL,         a partial match for "dogsbody", the result is  not  PCRE_ERROR_PARTIAL,
7740         because the shorter string "dog" is a complete match.  Similarly,  when         because  the  shorter string "dog" is a complete match. Similarly, when
7741         the  subject  is  presented to a DFA matching function in several parts         the subject is presented to a DFA matching function  in  several  parts
7742         ("do" and "gsb" being the first two) the match  stops  when  "dog"  has         ("do"  and  "gsb"  being  the first two) the match stops when "dog" has
7743         been  found, and it is not possible to continue.  On the other hand, if         been found, and it is not possible to continue.  On the other hand,  if
7744         "dogsbody" is presented as a single string,  a  DFA  matching  function         "dogsbody"  is  presented  as  a single string, a DFA matching function
7745         finds both matches.         finds both matches.
7746    
7747         Because  of  these  problems,  it is best to use PCRE_PARTIAL_HARD when         Because of these problems, it is best  to  use  PCRE_PARTIAL_HARD  when
7748         matching multi-segment data. The example  above  then  behaves  differ-         matching  multi-segment  data.  The  example above then behaves differ-
7749         ently:         ently:
7750    
7751             re> /dog(sbody)?/             re> /dog(sbody)?/
# Line 7723  ISSUES WITH MULTI-SEGMENT MATCHING Line 7757  ISSUES WITH MULTI-SEGMENT MATCHING
7757           Partial match: gsb           Partial match: gsb
7758    
7759         4. Patterns that contain alternatives at the top level which do not all         4. Patterns that contain alternatives at the top level which do not all
7760         start with the  same  pattern  item  may  not  work  as  expected  when         start  with  the  same  pattern  item  may  not  work  as expected when
7761         PCRE_DFA_RESTART is used. For example, consider this pattern:         PCRE_DFA_RESTART is used. For example, consider this pattern:
7762    
7763           1234|3789           1234|3789
7764    
7765         If  the  first  part of the subject is "ABC123", a partial match of the         If the first part of the subject is "ABC123", a partial  match  of  the
7766         first alternative is found at offset 3. There is no partial  match  for         first  alternative  is found at offset 3. There is no partial match for
7767         the second alternative, because such a match does not start at the same         the second alternative, because such a match does not start at the same
7768         point in the subject string. Attempting to  continue  with  the  string         point  in  the  subject  string. Attempting to continue with the string
7769         "7890"  does  not  yield  a  match because only those alternatives that         "7890" does not yield a match  because  only  those  alternatives  that
7770         match at one point in the subject are remembered.  The  problem  arises         match  at  one  point in the subject are remembered. The problem arises
7771         because  the  start  of the second alternative matches within the first         because the start of the second alternative matches  within  the  first
7772         alternative. There is no problem with  anchored  patterns  or  patterns         alternative.  There  is  no  problem with anchored patterns or patterns
7773         such as:         such as:
7774    
7775           1234|ABCD           1234|ABCD
7776    
7777         where  no  string can be a partial match for both alternatives. This is         where no string can be a partial match for both alternatives.  This  is
7778         not a problem if a standard matching  function  is  used,  because  the         not  a  problem  if  a  standard matching function is used, because the
7779         entire match has to be rerun each time:         entire match has to be rerun each time:
7780    
7781             re> /1234|3789/             re> /1234|3789/
# Line 7751  ISSUES WITH MULTI-SEGMENT MATCHING Line 7785  ISSUES WITH MULTI-SEGMENT MATCHING
7785            0: 3789            0: 3789
7786    
7787         Of course, instead of using PCRE_DFA_RESTART, the same technique of re-         Of course, instead of using PCRE_DFA_RESTART, the same technique of re-
7788         running the entire match can also be used with the DFA  matching  func-         running  the  entire match can also be used with the DFA matching func-
7789         tions.  Another  possibility  is to work with two buffers. If a partial         tions. Another possibility is to work with two buffers.  If  a  partial
7790         match at offset n in the first buffer is followed by  "no  match"  when         match  at  offset  n in the first buffer is followed by "no match" when
7791         PCRE_DFA_RESTART  is  used on the second buffer, you can then try a new         PCRE_DFA_RESTART is used on the second buffer, you can then try  a  new
7792         match starting at offset n+1 in the first buffer.         match starting at offset n+1 in the first buffer.
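
The two-buffer retry described above can be sketched in C. This is only an illustration under stated assumptions: try_at(), scan() and retry_after() are hypothetical helpers, and the matcher is a toy that compares literal alternatives. It is not the PCRE API, and it only mimics the "partial match at offset n, then retry from n+1" logic for the pattern 1234|3789.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum outcome { NO_MATCH, PARTIAL, FULL };

/* Toy stand-in for a matching function (NOT the PCRE API): tests each
   literal alternative at position pos of subj[0..len). */
static enum outcome try_at(const char *const *alts, size_t nalts,
                           const char *subj, size_t len, size_t pos)
{
    enum outcome best = NO_MATCH;
    assert(pos <= len);
    for (size_t i = 0; i < nalts; i++) {
        size_t alen = strlen(alts[i]);
        size_t avail = len - pos;
        size_t n = alen < avail ? alen : avail;
        if (memcmp(alts[i], subj + pos, n) != 0)
            continue;                 /* mismatch inside the available data */
        if (alen <= avail)
            return FULL;              /* the whole alternative fits */
        if (n > 0)
            best = PARTIAL;           /* subject ended mid-alternative */
    }
    return best;
}

/* Scan subj from offset start. Return the first offset that gives a
   partial or full match and store its kind in *kind, or return -1. */
static long scan(const char *const *alts, size_t nalts,
                 const char *subj, size_t start, enum outcome *kind)
{
    size_t len = strlen(subj);
    for (size_t pos = start; pos < len; pos++) {
        *kind = try_at(alts, nalts, subj, len, pos);
        if (*kind != NO_MATCH)
            return (long)pos;
    }
    *kind = NO_MATCH;
    return -1;
}

/* Two-buffer retry: a partial match at offset n in first was not completed
   by second, so rescan first+second starting at n+1. The caller supplies
   joined, which must hold both strings plus the terminating NUL. Returns
   the offset of a full match in the joined data, or -1. */
static long retry_after(const char *const *alts, size_t nalts,
                        const char *first, const char *second,
                        size_t n, char *joined, size_t cap)
{
    enum outcome kind;
    assert(strlen(first) + strlen(second) < cap);
    strcpy(joined, first);
    strcat(joined, second);
    long at = scan(alts, nalts, joined, n + 1, &kind);
    return kind == FULL ? at : -1;
}
```

With the alternatives "1234" and "3789", scan() on "ABC123" reports a partial match at offset 3. That partial match cannot be completed by "7890", and retry_after() with n = 3 then finds the full match "3789" at offset 5 of the joined data.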
7793    
7794    
# Line 7767  AUTHOR Line 7801  AUTHOR
7801    
7802  REVISION  REVISION
7803    
7804         Last updated: 08 January 2012         Last updated: 21 January 2012
7805         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7806  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7807    
7808    
7809  PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)  PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)
7810    
7811    
# Line 7905  REVISION Line 7939  REVISION
7939         Last updated: 10 January 2012         Last updated: 10 January 2012
7940         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
7941  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
7942    
7943    
7944  PCREPERFORM(3)                                                  PCREPERFORM(3)  PCREPERFORM(3)                                                  PCREPERFORM(3)
7945    
7946    
# Line 8075  REVISION Line 8109  REVISION
8109         Last updated: 09 January 2012         Last updated: 09 January 2012
8110         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8111  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8112    
8113    
8114  PCREPOSIX(3)                                                      PCREPOSIX(3)  PCREPOSIX(3)                                                      PCREPOSIX(3)
8115    
8116    
# Line 8339  REVISION Line 8373  REVISION
8373         Last updated: 09 January 2012         Last updated: 09 January 2012
8374         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8375  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8376    
8377    
8378  PCRECPP(3)                                                          PCRECPP(3)  PCRECPP(3)                                                          PCRECPP(3)
8379    
8380    
# Line 8681  REVISION Line 8715  REVISION
8715    
8716         Last updated: 08 January 2012         Last updated: 08 January 2012
8717  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8718    
8719    
8720  PCRESAMPLE(3)                                                    PCRESAMPLE(3)  PCRESAMPLE(3)                                                    PCRESAMPLE(3)
8721    
8722    
# Line 8825  REVISION Line 8859  REVISION
8859         Last updated: 08 January 2012         Last updated: 08 January 2012
8860         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
8861  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
8862    
8863    
8864  PCRESTACK(3)                                                      PCRESTACK(3)  PCRESTACK(3)                                                      PCRESTACK(3)
8865    
8866    
# Line 8944  PCRE DISCUSSION OF STACK USAGE Line 8978  PCRE DISCUSSION OF STACK USAGE
8978         subject string. This is done by calling pcre[16]_exec() repeatedly with         subject string. This is done by calling pcre[16]_exec() repeatedly with
8979         different limits.         different limits.
8980    
8981       Obtaining an estimate of stack usage
8982    
8983           The  actual  amount  of  stack used per recursion can vary quite a lot,
8984           depending on the compiler that was used to build PCRE and the optimiza-
8985           tion or debugging options that were set for it. The rule of thumb value
8986           of 500 bytes mentioned above may be larger  or  smaller  than  what  is
8987           actually needed. A better approximation can be obtained by running this
8988           command:
8989    
8990             pcretest -m -C
8991    
8992           The -C option causes pcretest to output information about  the  options
8993           with which PCRE was compiled. When -m is also given (before -C), infor-
8994           mation about stack use is given in a line like this:
8995    
8996             Match recursion uses stack: approximate frame size = 640 bytes
8997    
8998           The value is approximate because some recursions need a bit more (up to
8999           perhaps 16 more bytes).
9000    
9001           If  the  above  command  is given when PCRE is compiled to use the heap
9002           instead of the stack for recursion, the value that  is  output  is  the
9003           size of each block that is obtained from the heap.
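
As a rough illustration of how the reported frame size might be used, the following C sketch estimates how many recursion levels fit in a given amount of stack. The function name max_recursion_depth(), the reserve parameter, and the fixed 16-byte allowance are assumptions made for this example and are not part of PCRE. The result is only as good as the approximate figure printed by pcretest.

```c
#include <assert.h>
#include <stddef.h>

/* Allowance added to the reported frame size, because some recursions
   need a bit more (the text above suggests up to perhaps 16 bytes). */
#define FRAME_SLACK_BYTES 16

/* Estimate how many recursion levels fit in stack_bytes after setting
   aside reserve_bytes for the rest of the program. frame_bytes is the
   approximate frame size printed by "pcretest -m -C". */
static unsigned long max_recursion_depth(size_t stack_bytes,
                                         size_t reserve_bytes,
                                         size_t frame_bytes)
{
    assert(frame_bytes > 0);
    if (reserve_bytes >= stack_bytes)
        return 0;                     /* nothing left for recursion */
    return (unsigned long)((stack_bytes - reserve_bytes) /
                           (frame_bytes + FRAME_SLACK_BYTES));
}
```

For example, with an 8 MiB stack, 1 MiB held back, and a reported frame size of 640 bytes, the estimate is 11189 levels. A figure like this could serve as a starting point for a recursion limit, to be confirmed by testing.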
9004    
9005     Changing stack size in Unix-like systems     Changing stack size in Unix-like systems
9006    
9007         In  Unix-like environments, there is not often a problem with the stack         In  Unix-like environments, there is not often a problem with the stack
# Line 8983  AUTHOR Line 9041  AUTHOR
9041    
9042  REVISION  REVISION
9043    
9044         Last updated: 10 January 2012         Last updated: 21 January 2012
9045         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
9046  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
9047    
9048    
