Diff of /code/trunk/doc/pcre.txt

revision 358 by ph10, Wed Jul 9 11:03:07 2008 UTC | revision 371 by ph10, Mon Aug 25 18:28:05 2008 UTC
# Line 2047  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2047  MATCHING A PATTERN: THE TRADITIONAL FUNC
2047     The string to be matched by pcre_exec()     The string to be matched by pcre_exec()
2048    
2049         The  subject string is passed to pcre_exec() as a pointer in subject, a         The  subject string is passed to pcre_exec() as a pointer in subject, a
2050         length in length, and a starting byte offset in startoffset.  In  UTF-8         length (in bytes) in length, and a starting byte offset in startoffset.
2051         mode,  the  byte  offset  must point to the start of a UTF-8 character.         In UTF-8 mode, the byte offset must point to the start of a UTF-8 char-
2052         Unlike the pattern string, the subject may contain binary  zero  bytes.         acter. Unlike the pattern string, the subject may contain  binary  zero
2053         When  the starting offset is zero, the search for a match starts at the         bytes.  When the starting offset is zero, the search for a match starts
2054         beginning of the subject, and this is by far the most common case.         at the beginning of the subject, and this is by  far  the  most  common
2055           case.
2056         A non-zero starting offset is useful when searching for  another  match  
2057         in  the same subject by calling pcre_exec() again after a previous suc-         A  non-zero  starting offset is useful when searching for another match
2058         cess.  Setting startoffset differs from just passing over  a  shortened         in the same subject by calling pcre_exec() again after a previous  suc-
2059         string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins         cess.   Setting  startoffset differs from just passing over a shortened
2060           string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins
2061         with any kind of lookbehind. For example, consider the pattern         with any kind of lookbehind. For example, consider the pattern
2062    
2063           \Biss\B           \Biss\B
2064    
2065         which finds occurrences of "iss" in the middle of  words.  (\B  matches         which  finds  occurrences  of "iss" in the middle of words. (\B matches
2066         only  if  the  current position in the subject is not a word boundary.)         only if the current position in the subject is not  a  word  boundary.)
2067         When applied to the string "Mississipi" the first call  to  pcre_exec()         When  applied  to the string "Mississipi" the first call to pcre_exec()
2068         finds  the  first  occurrence. If pcre_exec() is called again with just         finds the first occurrence. If pcre_exec() is called  again  with  just
2069         the remainder of the subject,  namely  "issipi",  it  does  not  match,         the  remainder  of  the  subject,  namely  "issipi", it does not match,
2070         because \B is always false at the start of the subject, which is deemed         because \B is always false at the start of the subject, which is deemed
2071         to be a word boundary. However, if pcre_exec()  is  passed  the  entire         to  be  a  word  boundary. However, if pcre_exec() is passed the entire
2072         string again, but with startoffset set to 4, it finds the second occur-         string again, but with startoffset set to 4, it finds the second occur-
2073         rence of "iss" because it is able to look behind the starting point  to         rence  of "iss" because it is able to look behind the starting point to
2074         discover that it is preceded by a letter.         discover that it is preceded by a letter.
2075    
2076         If  a  non-zero starting offset is passed when the pattern is anchored,         If a non-zero starting offset is passed when the pattern  is  anchored,
2077         one attempt to match at the given offset is made. This can only succeed         one attempt to match at the given offset is made. This can only succeed
2078         if  the  pattern  does  not require the match to be at the start of the         if the pattern does not require the match to be at  the  start  of  the
2079         subject.         subject.
2080    
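Note: the startoffset behaviour described in this part of the text can be illustrated without PCRE itself. What follows is a minimal sketch in plain C, not PCRE code. The function find_inner_iss() and its helpers are hypothetical names introduced here; the function hand-codes the pattern \Biss\B and, like pcre_exec() with a non-zero startoffset, is allowed to inspect the byte before the starting point. It assumes ASCII word characters only.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: a word character in the \w sense, ASCII only. */
static int is_word(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* \B holds at pos when the characters on both sides are of the same kind
   (both word or both non-word). Positions outside the subject count as
   non-word, so the start of a subject that begins with a letter is a
   word boundary and \B is false there. */
static int not_boundary(const char *subject, int length, int pos)
{
    int before = pos > 0 && is_word((unsigned char)subject[pos - 1]);
    int after = pos < length && is_word((unsigned char)subject[pos]);
    return before == after;
}

/* Hand-coded \Biss\B. Returns the byte offset of the first match at or
   after startoffset, or -1 if there is none. The byte at startoffset - 1
   is still visible to the \B test, which is the point of the sketch. */
int find_inner_iss(const char *subject, int length, int startoffset)
{
    int i;
    assert(subject != NULL && startoffset >= 0 && startoffset <= length);
    for (i = startoffset; i + 3 <= length; i++) {
        if (memcmp(subject + i, "iss", 3) == 0 &&
            not_boundary(subject, length, i) &&
            not_boundary(subject, length, i + 3))
            return i;
    }
    return -1;
}
```

Called on the full subject with startoffset 4, the sketch finds the second "iss"; called on the shortened string "issipi" with startoffset 0, it finds nothing, because the start of the subject is treated as a word boundary.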
2081     How pcre_exec() returns captured substrings     How pcre_exec() returns captured substrings
2082    
2083         In general, a pattern matches a certain portion of the subject, and  in         In  general, a pattern matches a certain portion of the subject, and in
2084         addition,  further  substrings  from  the  subject may be picked out by         addition, further substrings from the subject  may  be  picked  out  by
2085         parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,         parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,
2086         this  is  called "capturing" in what follows, and the phrase "capturing         this is called "capturing" in what follows, and the  phrase  "capturing
2087         subpattern" is used for a fragment of a pattern that picks out  a  sub-         subpattern"  is  used for a fragment of a pattern that picks out a sub-
2088         string.  PCRE  supports several other kinds of parenthesized subpattern         string. PCRE supports several other kinds of  parenthesized  subpattern
2089         that do not cause substrings to be captured.         that do not cause substrings to be captured.
2090    
2091         Captured substrings are returned to the caller via a vector of  integer         Captured substrings are returned to the caller via a vector of integers
2092         offsets  whose  address is passed in ovector. The number of elements in         whose address is passed in ovector. The number of elements in the  vec-
2093         the vector is passed in ovecsize, which must be a non-negative  number.         tor  is  passed in ovecsize, which must be a non-negative number. Note:
2094         Note: this argument is NOT the size of ovector in bytes.         this argument is NOT the size of ovector in bytes.
2095    
2096         The  first  two-thirds of the vector is used to pass back captured sub-         The first two-thirds of the vector is used to pass back  captured  sub-
2097         strings, each substring using a pair of integers. The  remaining  third         strings,  each  substring using a pair of integers. The remaining third
2098         of  the  vector is used as workspace by pcre_exec() while matching cap-         of the vector is used as workspace by pcre_exec() while  matching  cap-
2099         turing subpatterns, and is not available for passing back  information.         turing  subpatterns, and is not available for passing back information.
2100         The  length passed in ovecsize should always be a multiple of three. If         The number passed in ovecsize should always be a multiple of three.  If
2101         it is not, it is rounded down.         it is not, it is rounded down.
2102    
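Note: the two-thirds rule described here reduces to simple integer arithmetic. The sketch below is an illustration only; usable_pairs() and ovecsize_for() are hypothetical names, not PCRE functions. It assumes, as the text states, that ovecsize is rounded down to a multiple of three and that each substring uses one pair of integers out of the first two-thirds of the vector.

```c
#include <assert.h>

/* Number of offset pairs that fit in a vector of ovecsize integers:
   one pair of integers (plus one workspace integer) per three elements,
   with any remainder ignored. */
int usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}

/* Smallest ovecsize that leaves room for the whole-match pair plus
   one pair for each of the given number of capturing subpatterns. */
int ovecsize_for(int captures)
{
    assert(captures >= 0);
    return (captures + 1) * 3;
}
```

For instance, a vector of 31 integers holds the same 10 pairs as a vector of 30, and a pattern with two capturing subpatterns needs at least 9 integers.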
2103         When a match is successful, information about  captured  substrings  is         When  a  match  is successful, information about captured substrings is
2104         returned  in  pairs  of integers, starting at the beginning of ovector,         returned in pairs of integers, starting at the  beginning  of  ovector,
2105         and continuing up to two-thirds of its length at the  most.  The  first         and  continuing  up  to two-thirds of its length at the most. The first
2106         element of a pair is set to the offset of the first character in a sub-         element of each pair is set to the byte offset of the  first  character
2107         string, and the second is set to the  offset  of  the  first  character         in  a  substring, and the second is set to the byte offset of the first
2108         after  the  end  of  a  substring. The first pair, ovector[0] and ovec-         character after the end of a substring. Note: these values  are  always
2109         tor[1], identify the portion of  the  subject  string  matched  by  the         byte offsets, even in UTF-8 mode. They are not character counts.
2110         entire  pattern.  The next pair is used for the first capturing subpat-  
2111         tern, and so on. The value returned by pcre_exec() is one more than the         The  first  pair  of  integers, ovector[0] and ovector[1], identify the
2112         highest numbered pair that has been set. For example, if two substrings         portion of the subject string matched by the entire pattern.  The  next
2113         have been captured, the returned value is 3. If there are no  capturing         pair  is  used for the first capturing subpattern, and so on. The value
2114         subpatterns,  the return value from a successful match is 1, indicating         returned by pcre_exec() is one more than the highest numbered pair that
2115         that just the first pair of offsets has been set.         has  been  set.  For example, if two substrings have been captured, the
2116           returned value is 3. If there are no capturing subpatterns, the  return
2117           value from a successful match is 1, indicating that just the first pair
2118           of offsets has been set.
2119    
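Note: the pair layout and the byte-offset rule added in revision 371 can be shown with a short sketch. This is plain C that does not call PCRE; copy_pair() and utf8_chars_before() are hypothetical helpers introduced for illustration. The sketch assumes the subject is valid UTF-8 and that a pair that was not set holds negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the substring described by pair n of ovector
   into buf as a zero-terminated string. rc is the value pcre_exec()
   returned, that is, one more than the highest numbered pair set.
   Returns the length in bytes, or -1 if the pair is not available or
   buf is too small. */
int copy_pair(const char *subject, const int *ovector, int rc, int n,
              char *buf, int bufsize)
{
    int start, end, len;
    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (n < 0 || n >= rc)
        return -1;
    start = ovector[2 * n];      /* byte offset of first character */
    end = ovector[2 * n + 1];    /* byte offset just past the end */
    if (start < 0 || end < start)
        return -1;               /* assumed marker for an unset pair */
    len = end - start;
    if (len >= bufsize)
        return -1;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}

/* Convert a byte offset into a character count for a UTF-8 subject by
   counting the bytes that are not continuation bytes (10xxxxxx). */
int utf8_chars_before(const char *subject, int byte_offset)
{
    int i, count = 0;
    for (i = 0; i < byte_offset; i++)
        if (((unsigned char)subject[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

With a subject containing a two-byte character before the match, the value in ovector[0] is larger than the number of characters that precede the match; a caller that wants a character position has to convert it, for example as utf8_chars_before() does.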
2120         If a capturing subpattern is matched repeatedly, it is the last portion         If a capturing subpattern is matched repeatedly, it is the last portion
2121         of the string that it matched that is returned.         of the string that it matched that is returned.
2122    
2123         If  the vector is too small to hold all the captured substring offsets,         If  the vector is too small to hold all the captured substring offsets,
2124         it is used as far as possible (up to two-thirds of its length), and the         it is used as far as possible (up to two-thirds of its length), and the
2125         function  returns a value of zero. In particular, if the substring off-         function  returns  a value of zero. If the substring offsets are not of
2126         sets are not of interest, pcre_exec() may be called with ovector passed         interest, pcre_exec() may be called with ovector  passed  as  NULL  and
2127         as  NULL  and  ovecsize  as zero. However, if the pattern contains back         ovecsize  as zero. However, if the pattern contains back references and
2128         references and the ovector is not big enough to  remember  the  related         the ovector is not big enough to remember the related substrings,  PCRE
2129         substrings,  PCRE has to get additional memory for use during matching.         has  to  get additional memory for use during matching. Thus it is usu-
2130         Thus it is usually advisable to supply an ovector.         ally advisable to supply an ovector.
2131    
2132         The pcre_info() function can be used to find  out  how  many  capturing         The pcre_info() function can be used to find  out  how  many  capturing
2133         subpatterns  there  are  in  a  compiled pattern. The smallest size for         subpatterns  there  are  in  a  compiled pattern. The smallest size for
# Line 2604  AUTHOR Line 2608  AUTHOR
2608    
2609  REVISION  REVISION
2610    
2611         Last updated: 12 April 2008         Last updated: 24 August 2008
2612         Copyright (c) 1997-2008 University of Cambridge.         Copyright (c) 1997-2008 University of Cambridge.
2613  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2614    
