Diff of /code/trunk/doc/pcre.txt

revision 148 by ph10, Mon Apr 16 13:25:10 2007 UTC revision 150 by ph10, Tue Apr 17 08:22:40 2007 UTC
# Line 312  CODE VALUE OF NEWLINE Line 312  CODE VALUE OF NEWLINE
312    
313         to the configure command. There is a fourth option, specified by         to the configure command. There is a fourth option, specified by
314    
315             --enable-newline-is-anycrlf
316    
317           which causes PCRE to recognize any of the three sequences  CR,  LF,  or
318           CRLF as indicating a line ending. Finally, a fifth option, specified by
319    
320           --enable-newline-is-any           --enable-newline-is-any
321    
322         which causes PCRE to recognize any Unicode newline sequence.         causes PCRE to recognize any Unicode newline sequence.
323    
324         Whatever line ending convention is selected when PCRE is built  can  be         Whatever line ending convention is selected when PCRE is built  can  be
325         overridden  when  the library functions are called. At build time it is         overridden  when  the library functions are called. At build time it is
# Line 468  AUTHOR Line 473  AUTHOR
473    
474  REVISION  REVISION
475    
476         Last updated: 20 March 2007         Last updated: 16 April 2007
477         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
478  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
479    
# Line 841  PCRE API OVERVIEW Line 846  PCRE API OVERVIEW
846    
847  NEWLINES  NEWLINES
848    
849         PCRE  supports four different conventions for indicating line breaks in         PCRE  supports five different conventions for indicating line breaks in
850         strings: a single CR (carriage return) character, a  single  LF  (line-         strings: a single CR (carriage return) character, a  single  LF  (line-
851         feed)  character,  the two-character sequence CRLF, or any Unicode new-         feed) character, the two-character sequence CRLF, any of the three pre-
852         line sequence.  The Unicode newline sequences are the three  just  men-         ceding, or any Unicode newline sequence. The Unicode newline  sequences
853         tioned, plus the single characters VT (vertical tab, U+000B), FF (form-         are  the  three just mentioned, plus the single characters VT (vertical
854         feed, U+000C), NEL (next line, U+0085), LS  (line  separator,  U+2028),         tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS  (line
855         and PS (paragraph separator, U+2029).         separator, U+2028), and PS (paragraph separator, U+2029).
856    
857         Each  of  the first three conventions is used by at least one operating         Each  of  the first three conventions is used by at least one operating
858         system as its standard newline sequence. When PCRE is built, a  default         system as its standard newline sequence. When PCRE is built, a  default
# Line 912  CHECKING BUILD-TIME OPTIONS Line 917  CHECKING BUILD-TIME OPTIONS
917    
918         The output is an integer whose value specifies  the  default  character         The output is an integer whose value specifies  the  default  character
919         sequence  that is recognized as meaning "newline". The four values that         sequence  that is recognized as meaning "newline". The four values that
920         are supported are: 10 for LF, 13 for CR, 3338 for CRLF, and -1 for ANY.         are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
921         The default should normally be the standard sequence for your operating         and  -1  for  ANY. The default should normally be the standard sequence
922         system.         for your operating system.
923    
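The integer values listed above can be turned into readable labels with a small lookup. The following is only a sketch: the function name newline_config_name is hypothetical, and it assumes the caller has already obtained the integer (for example from the newline build-time query described in this section).

```c
#include <stddef.h>

/* Map the integer reported for the default newline setting to a label.
   The numeric values are the ones listed in the text; the function name
   is hypothetical. Returns NULL for a value that is not listed. */
const char *newline_config_name(int value)
{
    switch (value) {
    case 10:   return "LF";
    case 13:   return "CR";
    case 3338: return "CRLF";     /* (13 << 8) | 10, i.e. CR followed by LF */
    case -2:   return "ANYCRLF";
    case -1:   return "ANY";
    default:   return NULL;
    }
}
```

Returning NULL for unlisted values lets a caller notice a setting introduced by a later release instead of silently mislabelling it.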
924           PCRE_CONFIG_LINK_SIZE           PCRE_CONFIG_LINK_SIZE
925    
# Line 1138  COMPILING A PATTERN Line 1143  COMPILING A PATTERN
1143           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
1144           PCRE_NEWLINE_LF           PCRE_NEWLINE_LF
1145           PCRE_NEWLINE_CRLF           PCRE_NEWLINE_CRLF
1146             PCRE_NEWLINE_ANYCRLF
1147           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
1148    
1149         These  options  override the default newline definition that was chosen         These  options  override the default newline definition that was chosen
1150         when PCRE was built. Setting the first or the second specifies  that  a         when PCRE was built. Setting the first or the second specifies  that  a
1151         newline  is  indicated  by a single character (CR or LF, respectively).         newline  is  indicated  by a single character (CR or LF, respectively).
1152         Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the         Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the
1153         two-character  CRLF  sequence.  Setting PCRE_NEWLINE_ANY specifies that         two-character  CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF specifies
1154         any Unicode newline sequence should be recognized. The Unicode  newline         that any of the three preceding sequences should be recognized. Setting
1155         sequences  are  the three just mentioned, plus the single characters VT         PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be
1156         (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085),         recognized. The Unicode newline sequences are the three just mentioned,
1157         LS  (line separator, U+2028), and PS (paragraph separator, U+2029). The         plus  the  single  characters  VT (vertical tab, U+000B), FF (formfeed,
1158         last two are recognized only in UTF-8 mode.         U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS
1159           (paragraph  separator,  U+2029).  The  last  two are recognized only in
1160           UTF-8 mode.
1161    
1162         The newline setting in the  options  word  uses  three  bits  that  are         The newline setting in the  options  word  uses  three  bits  that  are
1163         treated  as  a  number, giving eight possibilities. Currently only five         treated as a number, giving eight possibilities. Currently only six are
1164         are used (default plus the four values above). This means that  if  you         used (default plus the five values above). This means that if  you  set
1165         set  more  than  one  newline option, the combination may or may not be         more  than one newline option, the combination may or may not be sensi-
1166         sensible. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is  equiva-         ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
1167         lent  to PCRE_NEWLINE_CRLF, but other combinations yield unused numbers         PCRE_NEWLINE_CRLF,  but other combinations may yield unused numbers and
1168         and cause an error.         cause an error.
1169    
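The "three bits treated as a number" behaviour described above can be illustrated with a small check. This is a sketch under stated assumptions: the bit values are the ones believed to be defined in pcre.h for the 7.x series (verify against your installed header), and the NL_ names and the function newline_bits_valid are local to this example, not part of the PCRE API.

```c
/* Assumed bit values (as in pcre.h for PCRE 7.x; check your own header).
   The NL_ prefix is local to this sketch. */
#define NL_CR       0x00100000UL
#define NL_LF       0x00200000UL
#define NL_CRLF     0x00300000UL
#define NL_ANY      0x00400000UL
#define NL_ANYCRLF  0x00500000UL
#define NL_MASK     0x00700000UL   /* the three newline bits */

/* Return 1 if the newline bits of an options word form one of the six
   values in use (0 meaning "use the build-time default"), else 0. */
int newline_bits_valid(unsigned long options)
{
    switch (options & NL_MASK) {
    case 0:
    case NL_CR:
    case NL_LF:
    case NL_CRLF:
    case NL_ANY:
    case NL_ANYCRLF:
        return 1;
    default:
        return 0;   /* unused number: such a combination causes an error */
    }
}
```

With these values, NL_CR | NL_LF equals NL_CRLF and is accepted, whereas NL_LF | NL_ANY yields an unused number and is rejected.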
1170         The only time that a line break is specially recognized when  compiling         The only time that a line break is specially recognized when  compiling
1171         a  pattern  is  if  PCRE_EXTENDED  is set, and an unescaped # outside a         a  pattern  is  if  PCRE_EXTENDED  is set, and an unescaped # outside a
# Line 1725  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1733  MATCHING A PATTERN: THE TRADITIONAL FUNC
1733           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
1734           PCRE_NEWLINE_LF           PCRE_NEWLINE_LF
1735           PCRE_NEWLINE_CRLF           PCRE_NEWLINE_CRLF
1736             PCRE_NEWLINE_ANYCRLF
1737           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
1738    
1739         These options override  the  newline  definition  that  was  chosen  or         These options override  the  newline  definition  that  was  chosen  or
# Line 1732  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1741  MATCHING A PATTERN: THE TRADITIONAL FUNC
1741         tion of pcre_compile()  above.  During  matching,  the  newline  choice         tion of pcre_compile()  above.  During  matching,  the  newline  choice
1742         affects  the  behaviour  of the dot, circumflex, and dollar metacharac-         affects  the  behaviour  of the dot, circumflex, and dollar metacharac-
1743         ters. It may also alter the way the match position is advanced after  a         ters. It may also alter the way the match position is advanced after  a
1744         match  failure  for  an  unanchored  pattern. When PCRE_NEWLINE_CRLF or         match  failure  for  an  unanchored  pattern.  When  PCRE_NEWLINE_CRLF,
1745         PCRE_NEWLINE_ANY is set, and a match attempt  fails  when  the  current         PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set, and a  match  attempt
1746         position  is  at a CRLF sequence, the match position is advanced by two         fails  when the current position is at a CRLF sequence, the match posi-
1747         characters instead of one, in other words, to after the CRLF.         tion is advanced by two characters instead of one, in other  words,  to
1748           after the CRLF.
1749    
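The advance rule just described can be sketched as follows. This is not PCRE's internal code: next_start_offset is a hypothetical helper, it works byte-wise (no UTF-8 handling), and crlf_is_newline stands for "one of PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is in force".

```c
#include <assert.h>
#include <stddef.h>

/* Offset at which the next match attempt starts after an attempt at pos
   has failed, for an unanchored pattern. Byte-wise sketch only. */
size_t next_start_offset(const char *subject, size_t length,
                         size_t pos, int crlf_is_newline)
{
    assert(subject != NULL && pos < length);

    /* At a CRLF pair, skip both characters so the next attempt does not
       start between the CR and the LF. */
    if (crlf_is_newline && pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;

    return pos + 1;
}
```

For the subject "a\r\nb" and a failed attempt at offset 1, the sketch returns 3 when CRLF counts as a newline and 2 otherwise.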
1750           PCRE_NOTBOL           PCRE_NOTBOL
1751    
1752         This option specifies that first character of the subject string is not         This option specifies that first character of the subject string is not
1753         the  beginning  of  a  line, so the circumflex metacharacter should not         the beginning of a line, so the  circumflex  metacharacter  should  not
1754         match before it. Setting this without PCRE_MULTILINE (at compile  time)         match  before it. Setting this without PCRE_MULTILINE (at compile time)
1755         causes  circumflex  never to match. This option affects only the behav-         causes circumflex never to match. This option affects only  the  behav-
1756         iour of the circumflex metacharacter. It does not affect \A.         iour of the circumflex metacharacter. It does not affect \A.
1757    
1758           PCRE_NOTEOL           PCRE_NOTEOL
1759    
1760         This option specifies that the end of the subject string is not the end         This option specifies that the end of the subject string is not the end
1761         of  a line, so the dollar metacharacter should not match it nor (except         of a line, so the dollar metacharacter should not match it nor  (except
1762         in multiline mode) a newline immediately before it. Setting this  with-         in  multiline mode) a newline immediately before it. Setting this with-
1763         out PCRE_MULTILINE (at compile time) causes dollar never to match. This         out PCRE_MULTILINE (at compile time) causes dollar never to match. This
1764         option affects only the behaviour of the dollar metacharacter. It  does         option  affects only the behaviour of the dollar metacharacter. It does
1765         not affect \Z or \z.         not affect \Z or \z.
1766    
1767           PCRE_NOTEMPTY           PCRE_NOTEMPTY
1768    
1769         An empty string is not considered to be a valid match if this option is         An empty string is not considered to be a valid match if this option is
1770         set. If there are alternatives in the pattern, they are tried.  If  all         set.  If  there are alternatives in the pattern, they are tried. If all
1771         the  alternatives  match  the empty string, the entire match fails. For         the alternatives match the empty string, the entire  match  fails.  For
1772         example, if the pattern         example, if the pattern
1773    
1774           a?b?           a?b?
1775    
1776         is applied to a string not beginning with "a" or "b",  it  matches  the         is  applied  to  a string not beginning with "a" or "b", it matches the
1777         empty  string at the start of the subject. With PCRE_NOTEMPTY set, this         empty string at the start of the subject. With PCRE_NOTEMPTY set,  this
1778         match is not valid, so PCRE searches further into the string for occur-         match is not valid, so PCRE searches further into the string for occur-
1779         rences of "a" or "b".         rences of "a" or "b".
1780    
1781         Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-         Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-
1782         cial case of a pattern match of the empty  string  within  its  split()         cial  case  of  a  pattern match of the empty string within its split()
1783         function,  and  when  using  the /g modifier. It is possible to emulate         function, and when using the /g modifier. It  is  possible  to  emulate
1784         Perl's behaviour after matching a null string by first trying the match         Perl's behaviour after matching a null string by first trying the match
1785         again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then         again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
1786         if that fails by advancing the starting offset (see below)  and  trying         if  that  fails by advancing the starting offset (see below) and trying
1787         an ordinary match again. There is some code that demonstrates how to do         an ordinary match again. There is some code that demonstrates how to do
1788         this in the pcredemo.c sample program.         this in the pcredemo.c sample program.
1789    
1790           PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
1791    
1792         When PCRE_UTF8 is set at compile time, the validity of the subject as a         When PCRE_UTF8 is set at compile time, the validity of the subject as a
1793         UTF-8  string is automatically checked when pcre_exec() is subsequently         UTF-8 string is automatically checked when pcre_exec() is  subsequently
1794         called.  The value of startoffset is also checked  to  ensure  that  it         called.   The  value  of  startoffset is also checked to ensure that it
1795         points  to the start of a UTF-8 character. If an invalid UTF-8 sequence         points to the start of a UTF-8 character. If an invalid UTF-8  sequence
1796         of bytes is found, pcre_exec() returns the error PCRE_ERROR_BADUTF8. If         of bytes is found, pcre_exec() returns the error PCRE_ERROR_BADUTF8. If
1797         startoffset  contains  an  invalid  value, PCRE_ERROR_BADUTF8_OFFSET is         startoffset contains an  invalid  value,  PCRE_ERROR_BADUTF8_OFFSET  is
1798         returned.         returned.
1799    
1800         If you already know that your subject is valid, and you  want  to  skip         If  you  already  know that your subject is valid, and you want to skip
1801         these    checks    for   performance   reasons,   you   can   set   the         these   checks   for   performance   reasons,   you   can    set    the
1802         PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might  want  to         PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
1803         do  this  for the second and subsequent calls to pcre_exec() if you are         do this for the second and subsequent calls to pcre_exec() if  you  are
1804         making repeated calls to find all  the  matches  in  a  single  subject         making  repeated  calls  to  find  all  the matches in a single subject
1805         string.  However,  you  should  be  sure  that the value of startoffset         string. However, you should be  sure  that  the  value  of  startoffset
1806         points to the start of a UTF-8 character.  When  PCRE_NO_UTF8_CHECK  is         points  to  the  start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
1807         set,  the  effect of passing an invalid UTF-8 string as a subject, or a         set, the effect of passing an invalid UTF-8 string as a subject,  or  a
1808         value of startoffset that does not point to the start of a UTF-8  char-         value  of startoffset that does not point to the start of a UTF-8 char-
1809         acter, is undefined. Your program may crash.         acter, is undefined. Your program may crash.
1810    
1811           PCRE_PARTIAL           PCRE_PARTIAL
1812    
1813         This  option  turns  on  the  partial  matching feature. If the subject         This option turns on the  partial  matching  feature.  If  the  subject
1814         string fails to match the pattern, but at some point during the  match-         string  fails to match the pattern, but at some point during the match-
1815         ing  process  the  end of the subject was reached (that is, the subject         ing process the end of the subject was reached (that  is,  the  subject
1816         partially matches the pattern and the failure to  match  occurred  only         partially  matches  the  pattern and the failure to match occurred only
1817         because  there were not enough subject characters), pcre_exec() returns         because there were not enough subject characters), pcre_exec()  returns
1818         PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL  is         PCRE_ERROR_PARTIAL  instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is
1819         used,  there  are restrictions on what may appear in the pattern. These         used, there are restrictions on what may appear in the  pattern.  These
1820         are discussed in the pcrepartial documentation.         are discussed in the pcrepartial documentation.
1821    
1822     The string to be matched by pcre_exec()     The string to be matched by pcre_exec()
1823    
1824         The subject string is passed to pcre_exec() as a pointer in subject,  a         The  subject string is passed to pcre_exec() as a pointer in subject, a
1825         length  in  length, and a starting byte offset in startoffset. In UTF-8         length in length, and a starting byte offset in startoffset.  In  UTF-8
1826         mode, the byte offset must point to the start  of  a  UTF-8  character.         mode,  the  byte  offset  must point to the start of a UTF-8 character.
1827         Unlike  the  pattern string, the subject may contain binary zero bytes.         Unlike the pattern string, the subject may contain binary  zero  bytes.
1828         When the starting offset is zero, the search for a match starts at  the         When  the starting offset is zero, the search for a match starts at the
1829         beginning of the subject, and this is by far the most common case.         beginning of the subject, and this is by far the most common case.
1830    
1831         A  non-zero  starting offset is useful when searching for another match         A non-zero starting offset is useful when searching for  another  match
1832         in the same subject by calling pcre_exec() again after a previous  suc-         in  the same subject by calling pcre_exec() again after a previous suc-
1833         cess.   Setting  startoffset differs from just passing over a shortened         cess.  Setting startoffset differs from just passing over  a  shortened
1834         string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins         string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
1835         with any kind of lookbehind. For example, consider the pattern         with any kind of lookbehind. For example, consider the pattern
1836    
1837           \Biss\B           \Biss\B
1838    
1839         which  finds  occurrences  of "iss" in the middle of words. (\B matches         which finds occurrences of "iss" in the middle of  words.  (\B  matches
1840         only if the current position in the subject is not  a  word  boundary.)         only  if  the  current position in the subject is not a word boundary.)
1841         When  applied  to the string "Mississipi" the first call to pcre_exec()         When applied to the string "Mississipi" the first call  to  pcre_exec()
1842         finds the first occurrence. If pcre_exec() is called  again  with  just         finds  the  first  occurrence. If pcre_exec() is called again with just
1843         the  remainder  of  the  subject,  namely  "issipi", it does not match,         the remainder of the subject,  namely  "issipi",  it  does  not  match,
1844         because \B is always false at the start of the subject, which is deemed         because \B is always false at the start of the subject, which is deemed
1845         to  be  a  word  boundary. However, if pcre_exec() is passed the entire         to be a word boundary. However, if pcre_exec()  is  passed  the  entire
1846         string again, but with startoffset set to 4, it finds the second occur-         string again, but with startoffset set to 4, it finds the second occur-
1847         rence  of "iss" because it is able to look behind the starting point to         rence of "iss" because it is able to look behind the starting point  to
1848         discover that it is preceded by a letter.         discover that it is preceded by a letter.
1849    
1850         If a non-zero starting offset is passed when the pattern  is  anchored,         If  a  non-zero starting offset is passed when the pattern is anchored,
1851         one attempt to match at the given offset is made. This can only succeed         one attempt to match at the given offset is made. This can only succeed
1852         if the pattern does not require the match to be at  the  start  of  the         if  the  pattern  does  not require the match to be at the start of the
1853         subject.         subject.
1854    
1855     How pcre_exec() returns captured substrings     How pcre_exec() returns captured substrings
1856    
1857         In  general, a pattern matches a certain portion of the subject, and in         In general, a pattern matches a certain portion of the subject, and  in
1858         addition, further substrings from the subject  may  be  picked  out  by         addition,  further  substrings  from  the  subject may be picked out by
1859         parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,         parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
1860         this is called "capturing" in what follows, and the  phrase  "capturing         this  is  called "capturing" in what follows, and the phrase "capturing
1861         subpattern"  is  used for a fragment of a pattern that picks out a sub-         subpattern" is used for a fragment of a pattern that picks out  a  sub-
1862         string. PCRE supports several other kinds of  parenthesized  subpattern         string.  PCRE  supports several other kinds of parenthesized subpattern
1863         that do not cause substrings to be captured.         that do not cause substrings to be captured.
1864    
1865         Captured  substrings are returned to the caller via a vector of integer         Captured substrings are returned to the caller via a vector of  integer
1866         offsets whose address is passed in ovector. The number of  elements  in         offsets  whose  address is passed in ovector. The number of elements in
1867         the  vector is passed in ovecsize, which must be a non-negative number.         the vector is passed in ovecsize, which must be a non-negative  number.
1868         Note: this argument is NOT the size of ovector in bytes.         Note: this argument is NOT the size of ovector in bytes.
1869    
1870         The first two-thirds of the vector is used to pass back  captured  sub-         The  first  two-thirds of the vector is used to pass back captured sub-
1871         strings,  each  substring using a pair of integers. The remaining third         strings, each substring using a pair of integers. The  remaining  third
1872         of the vector is used as workspace by pcre_exec() while  matching  cap-         of  the  vector is used as workspace by pcre_exec() while matching cap-
1873         turing  subpatterns, and is not available for passing back information.         turing subpatterns, and is not available for passing back  information.
1874         The length passed in ovecsize should always be a multiple of three.  If         The  length passed in ovecsize should always be a multiple of three. If
1875         it is not, it is rounded down.         it is not, it is rounded down.
1876    
1877         When  a  match  is successful, information about captured substrings is         When a match is successful, information about  captured  substrings  is
1878         returned in pairs of integers, starting at the  beginning  of  ovector,         returned  in  pairs  of integers, starting at the beginning of ovector,
1879         and  continuing  up  to two-thirds of its length at the most. The first         and continuing up to two-thirds of its length at the  most.  The  first
1880         element of a pair is set to the offset of the first character in a sub-         element of a pair is set to the offset of the first character in a sub-
1881         string,  and  the  second  is  set to the offset of the first character         string, and the second is set to the  offset  of  the  first  character
1882         after the end of a substring. The  first  pair,  ovector[0]  and  ovec-         after  the  end  of  a  substring. The first pair, ovector[0] and ovec-
1883         tor[1],  identify  the  portion  of  the  subject string matched by the         tor[1], identify the portion of  the  subject  string  matched  by  the
1884         entire pattern. The next pair is used for the first  capturing  subpat-         entire  pattern.  The next pair is used for the first capturing subpat-
1885         tern, and so on. The value returned by pcre_exec() is one more than the         tern, and so on. The value returned by pcre_exec() is one more than the
1886         highest numbered pair that has been set. For example, if two substrings         highest numbered pair that has been set. For example, if two substrings
1887         have  been captured, the returned value is 3. If there are no capturing         have been captured, the returned value is 3. If there are no  capturing
1888         subpatterns, the return value from a successful match is 1,  indicating         subpatterns,  the return value from a successful match is 1, indicating
1889         that just the first pair of offsets has been set.         that just the first pair of offsets has been set.
1890    
1891         If a capturing subpattern is matched repeatedly, it is the last portion         If a capturing subpattern is matched repeatedly, it is the last portion
1892         of the string that it matched that is returned.         of the string that it matched that is returned.
1893    
1894         If the vector is too small to hold all the captured substring  offsets,         If  the vector is too small to hold all the captured substring offsets,
1895         it is used as far as possible (up to two-thirds of its length), and the         it is used as far as possible (up to two-thirds of its length), and the
1896         function returns a value of zero. In particular, if the substring  off-         function  returns a value of zero. In particular, if the substring off-
1897         sets are not of interest, pcre_exec() may be called with ovector passed         sets are not of interest, pcre_exec() may be called with ovector passed
1898         as NULL and ovecsize as zero. However, if  the  pattern  contains  back         as  NULL  and  ovecsize  as zero. However, if the pattern contains back
1899         references  and  the  ovector is not big enough to remember the related         references and the ovector is not big enough to  remember  the  related
1900         substrings, PCRE has to get additional memory for use during  matching.         substrings,  PCRE has to get additional memory for use during matching.
1901         Thus it is usually advisable to supply an ovector.         Thus it is usually advisable to supply an ovector.
1902    
1903         The  pcre_info()  function  can  be used to find out how many capturing         The pcre_info() function can be used to find  out  how  many  capturing
1904         subpatterns there are in a compiled  pattern.  The  smallest  size  for         subpatterns  there  are  in  a  compiled pattern. The smallest size for
1905         ovector  that  will allow for n captured substrings, in addition to the         ovector that will allow for n captured substrings, in addition  to  the
1906         offsets of the substring matched by the whole pattern, is (n+1)*3.         offsets of the substring matched by the whole pattern, is (n+1)*3.
1907    
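The sizing rules above reduce to two small calculations. The helper names below are hypothetical; the arithmetic follows the text: (n+1)*3 elements for n captured substrings, and only the first two-thirds of the vector (rounded down to a multiple of three) available for offset pairs.

```c
#include <assert.h>

/* Smallest ovecsize that allows for n captured substrings in addition to
   the pair for the whole match. */
int ovector_size_for(int n)
{
    assert(n >= 0);
    return (n + 1) * 3;
}

/* Number of offset pairs that fit in a vector of ovecsize elements.
   Two-thirds of the elements hold pairs, so one third of the element
   count is the pair count; integer division also performs the rounding
   down to a multiple of three. */
int usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

For example, a pattern with two capturing subpatterns needs a vector of at least 9 integers, and a vector of 10 integers still provides only 3 pairs.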
1908         It is possible for capturing subpattern number n+1 to match  some  part         It  is  possible for capturing subpattern number n+1 to match some part
1909         of the subject when subpattern n has not been used at all. For example,         of the subject when subpattern n has not been used at all. For example,
1910         if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the         if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
1911         return from the function is 4, and subpatterns 1 and 3 are matched, but         return from the function is 4, and subpatterns 1 and 3 are matched, but
1912         2 is not. When this happens, both values in  the  offset  pairs  corre-         2  is  not.  When  this happens, both values in the offset pairs corre-
1913         sponding to unused subpatterns are set to -1.         sponding to unused subpatterns are set to -1.
1914    
1915         Offset  values  that correspond to unused subpatterns at the end of the         Offset values that correspond to unused subpatterns at the end  of  the
1916         expression are also set to -1. For example,  if  the  string  "abc"  is         expression  are  also  set  to  -1. For example, if the string "abc" is
1917         matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not         matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
1918         matched. The return from the function is 2, because  the  highest  used         matched.  The  return  from the function is 2, because the highest used
1919         capturing subpattern number is 1. However, you can refer to the offsets         capturing subpattern number is 1. However, you can refer to the offsets
1920         for the second and third capturing subpatterns if  you  wish  (assuming         for  the  second  and third capturing subpatterns if you wish (assuming
1921         the vector is large enough, of course).         the vector is large enough, of course).
1922    
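Because unused subpatterns are reported as -1 in both elements of their pair, a caller can test a pair before using it. The following sketch assumes the vector was filled in by a successful match; group_span is a hypothetical helper, not one of the convenience functions mentioned below.

```c
#include <assert.h>
#include <stddef.h>

/* If capturing subpattern `group` took part in the match, store its start
   offset and length and return 1. Return 0 if the pair is unset (-1) or
   lies outside the part of the vector used for offset pairs. */
int group_span(const int *ovector, int ovecsize, int group,
               int *start, int *len)
{
    assert(ovector != NULL && start != NULL && len != NULL && group >= 0);

    if (group >= ovecsize / 3)       /* only the first two-thirds hold pairs */
        return 0;
    if (ovector[2 * group] < 0)      /* -1 marks an unused subpattern */
        return 0;

    *start = ovector[2 * group];
    *len   = ovector[2 * group + 1] - ovector[2 * group];
    return 1;
}
```

For "abc" matched against (a|(z))(bc), the offsets described in the text are {0,3, 0,1, -1,-1, 1,3}: subpattern 1 spans 1 character from offset 0, subpattern 2 is unset, and subpattern 3 spans 2 characters from offset 1.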
1923         Some  convenience  functions  are  provided for extracting the captured         Some convenience functions are provided  for  extracting  the  captured
1924         substrings as separate strings. These are described below.         substrings as separate strings. These are described below.
1925    
1926     Error return values from pcre_exec()     Error return values from pcre_exec()
1927    
1928         If pcre_exec() fails, it returns a negative number. The  following  are         If  pcre_exec()  fails, it returns a negative number. The following are
1929         defined in the header file:         defined in the header file:
1930    
1931           PCRE_ERROR_NOMATCH        (-1)           PCRE_ERROR_NOMATCH        (-1)
# Line 1924  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1934  MATCHING A PATTERN: THE TRADITIONAL FUNC
1934    
1935           PCRE_ERROR_NULL           (-2)           PCRE_ERROR_NULL           (-2)
1936    
1937         Either  code  or  subject  was  passed as NULL, or ovector was NULL and         Either code or subject was passed as NULL,  or  ovector  was  NULL  and
1938         ovecsize was not zero.         ovecsize was not zero.
1939    
1940           PCRE_ERROR_BADOPTION      (-3)           PCRE_ERROR_BADOPTION      (-3)
# Line 1933  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1943  MATCHING A PATTERN: THE TRADITIONAL FUNC
1943    
1944           PCRE_ERROR_BADMAGIC       (-4)           PCRE_ERROR_BADMAGIC       (-4)
1945    
1946         PCRE stores a 4-byte "magic number" at the start of the compiled  code,         PCRE  stores a 4-byte "magic number" at the start of the compiled code,
1947         to catch the case when it is passed a junk pointer and to detect when a         to catch the case when it is passed a junk pointer and to detect when a
1948         pattern that was compiled in an environment of one endianness is run in         pattern that was compiled in an environment of one endianness is run in
1949         an  environment  with the other endianness. This is the error that PCRE         an environment with the other endianness. This is the error  that  PCRE
1950         gives when the magic number is not present.         gives when the magic number is not present.
1951    
1952           PCRE_ERROR_UNKNOWN_OPCODE (-5)           PCRE_ERROR_UNKNOWN_OPCODE (-5)
1953    
1954         While running the pattern match, an unknown item was encountered in the         While running the pattern match, an unknown item was encountered in the
1955         compiled  pattern.  This  error  could be caused by a bug in PCRE or by         compiled pattern. This error could be caused by a bug  in  PCRE  or  by
1956         overwriting of the compiled pattern.         overwriting of the compiled pattern.
1957    
1958           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
1959    
1960         If a pattern contains back references, but the ovector that  is  passed         If  a  pattern contains back references, but the ovector that is passed
1961         to pcre_exec() is not big enough to remember the referenced substrings,         to pcre_exec() is not big enough to remember the referenced substrings,
1962         PCRE gets a block of memory at the start of matching to  use  for  this         PCRE  gets  a  block of memory at the start of matching to use for this
1963         purpose.  If the call via pcre_malloc() fails, this error is given. The         purpose. If the call via pcre_malloc() fails, this error is given.  The
1964         memory is automatically freed at the end of matching.         memory is automatically freed at the end of matching.
1965    
1966           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
1967    
1968         This error is used by the pcre_copy_substring(),  pcre_get_substring(),         This  error is used by the pcre_copy_substring(), pcre_get_substring(),
1969         and  pcre_get_substring_list()  functions  (see  below).  It  is  never         and  pcre_get_substring_list()  functions  (see  below).  It  is  never
1970         returned by pcre_exec().         returned by pcre_exec().
1971    
1972           PCRE_ERROR_MATCHLIMIT     (-8)           PCRE_ERROR_MATCHLIMIT     (-8)
1973    
1974         The backtracking limit, as specified by  the  match_limit  field  in  a         The  backtracking  limit,  as  specified  by the match_limit field in a
1975         pcre_extra  structure  (or  defaulted) was reached. See the description         pcre_extra structure (or defaulted) was reached.  See  the  description
1976         above.         above.
1977    
1978           PCRE_ERROR_CALLOUT        (-9)           PCRE_ERROR_CALLOUT        (-9)
1979    
1980         This error is never generated by pcre_exec() itself. It is provided for         This error is never generated by pcre_exec() itself. It is provided for
1981         use  by  callout functions that want to yield a distinctive error code.         use by callout functions that want to yield a distinctive  error  code.
1982         See the pcrecallout documentation for details.         See the pcrecallout documentation for details.
1983    
1984           PCRE_ERROR_BADUTF8        (-10)           PCRE_ERROR_BADUTF8        (-10)
1985    
1986         A string that contains an invalid UTF-8 byte sequence was passed  as  a         A  string  that contains an invalid UTF-8 byte sequence was passed as a
1987         subject.         subject.
1988    
1989           PCRE_ERROR_BADUTF8_OFFSET (-11)           PCRE_ERROR_BADUTF8_OFFSET (-11)
1990    
1991         The UTF-8 byte sequence that was passed as a subject was valid, but the         The UTF-8 byte sequence that was passed as a subject was valid, but the
1992         value of startoffset did not point to the beginning of a UTF-8  charac-         value  of startoffset did not point to the beginning of a UTF-8 charac-
1993         ter.         ter.
1994    
1995           PCRE_ERROR_PARTIAL        (-12)           PCRE_ERROR_PARTIAL        (-12)
1996    
1997         The  subject  string did not match, but it did match partially. See the         The subject string did not match, but it did match partially.  See  the
1998         pcrepartial documentation for details of partial matching.         pcrepartial documentation for details of partial matching.
1999    
2000           PCRE_ERROR_BADPARTIAL     (-13)           PCRE_ERROR_BADPARTIAL     (-13)
2001    
2002         The PCRE_PARTIAL option was used with  a  compiled  pattern  containing         The  PCRE_PARTIAL  option  was  used with a compiled pattern containing
2003         items  that are not supported for partial matching. See the pcrepartial         items that are not supported for partial matching. See the  pcrepartial
2004         documentation for details of partial matching.         documentation for details of partial matching.
2005    
2006           PCRE_ERROR_INTERNAL       (-14)           PCRE_ERROR_INTERNAL       (-14)
2007    
2008         An unexpected internal error has occurred. This error could  be  caused         An  unexpected  internal error has occurred. This error could be caused
2009         by a bug in PCRE or by overwriting of the compiled pattern.         by a bug in PCRE or by overwriting of the compiled pattern.
2010    
2011           PCRE_ERROR_BADCOUNT       (-15)           PCRE_ERROR_BADCOUNT       (-15)
2012    
2013         This  error is given if the value of the ovecsize argument is negative.         This error is given if the value of the ovecsize argument is  negative.
2014    
2015           PCRE_ERROR_RECURSIONLIMIT (-21)           PCRE_ERROR_RECURSIONLIMIT (-21)
2016    
2017         The internal recursion limit, as specified by the match_limit_recursion         The internal recursion limit, as specified by the match_limit_recursion
2018         field  in  a  pcre_extra  structure (or defaulted) was reached. See the         field in a pcre_extra structure (or defaulted)  was  reached.  See  the
2019         description above.         description above.
2020    
2021           PCRE_ERROR_NULLWSLIMIT    (-22)           PCRE_ERROR_NULLWSLIMIT    (-22)
2022    
2023         When a group that can match an empty  substring  is  repeated  with  an         When  a  group  that  can  match an empty substring is repeated with an
2024         unbounded  upper  limit, the subject position at the start of the group         unbounded upper limit, the subject position at the start of  the  group
2025         must be remembered, so that a test for an empty string can be made when         must be remembered, so that a test for an empty string can be made when
2026         the  end  of the group is reached. Some workspace is required for this;         the end of the group is reached. Some workspace is required  for  this;
2027         if it runs out, this error is given.         if it runs out, this error is given.
2028    
2029           PCRE_ERROR_BADNEWLINE     (-23)           PCRE_ERROR_BADNEWLINE     (-23)
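
When logging a failed call it can help to turn the numeric code back into its name. The sketch below does that for the codes visible in this part of the document only; exec_error_name() is a hypothetical helper, and the numeric values are copied from the list above rather than taken from pcre.h, which a real program would include instead.

```c
#include <assert.h>
#include <stddef.h>

/* Codes and names copied from the list above. This table is an
   illustration, not an exhaustive copy of pcre.h. */
struct exec_error {
    int code;
    const char *name;
};

static const struct exec_error exec_errors[] = {
    {  -1, "PCRE_ERROR_NOMATCH" },
    {  -2, "PCRE_ERROR_NULL" },
    {  -3, "PCRE_ERROR_BADOPTION" },
    {  -4, "PCRE_ERROR_BADMAGIC" },
    {  -5, "PCRE_ERROR_UNKNOWN_OPCODE" },
    {  -6, "PCRE_ERROR_NOMEMORY" },
    {  -7, "PCRE_ERROR_NOSUBSTRING" },
    {  -8, "PCRE_ERROR_MATCHLIMIT" },
    {  -9, "PCRE_ERROR_CALLOUT" },
    { -10, "PCRE_ERROR_BADUTF8" },
    { -11, "PCRE_ERROR_BADUTF8_OFFSET" },
    { -12, "PCRE_ERROR_PARTIAL" },
    { -13, "PCRE_ERROR_BADPARTIAL" },
    { -14, "PCRE_ERROR_INTERNAL" },
    { -15, "PCRE_ERROR_BADCOUNT" },
    { -21, "PCRE_ERROR_RECURSIONLIMIT" },
    { -22, "PCRE_ERROR_NULLWSLIMIT" },
    { -23, "PCRE_ERROR_BADNEWLINE" }
};

/* Hypothetical helper: name for a negative return value, or NULL if
   the code is not in the table above. Call it only when rc < 0. */
static const char *exec_error_name(int rc)
{
    size_t i;

    assert(rc < 0);
    for (i = 0; i < sizeof exec_errors / sizeof exec_errors[0]; i++)
        if (exec_errors[i].code == rc)
            return exec_errors[i].name;
    return NULL;
}
```

A NULL result means only that the code is not listed in this excerpt, not that the value is invalid.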
# Line 2036  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER Line 2046  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
2046         int pcre_get_substring_list(const char *subject,         int pcre_get_substring_list(const char *subject,
2047              int *ovector, int stringcount, const char ***listptr);              int *ovector, int stringcount, const char ***listptr);
2048    
2049         Captured substrings can be  accessed  directly  by  using  the  offsets         Captured  substrings  can  be  accessed  directly  by using the offsets
2050         returned  by  pcre_exec()  in  ovector.  For convenience, the functions         returned by pcre_exec() in  ovector.  For  convenience,  the  functions
2051         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-
2052         string_list()  are  provided for extracting captured substrings as new,         string_list() are provided for extracting captured substrings  as  new,
2053         separate, zero-terminated strings. These functions identify  substrings         separate,  zero-terminated strings. These functions identify substrings
2054         by  number.  The  next section describes functions for extracting named         by number. The next section describes functions  for  extracting  named
2055         substrings.         substrings.
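
To make "accessed directly by using the offsets" concrete, here is a minimal sketch of copying one capture out of the subject by hand. copy_capture() is a hypothetical stand-in written for illustration; it is not the library's pcre_copy_substring(), although it follows the behaviour this section describes (length returned, -6 for a buffer that is too small, -7 for a number that is out of range, empty string for an unset capture).

```c
#include <assert.h>
#include <string.h>

/* Error values as listed in this document; a real program would use
   the PCRE_ERROR_xxx names from pcre.h instead. */
#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Hypothetical helper: copy capture number n from subject into buffer
   using the offset pair in ovector, add a terminating zero, and return
   the length (excluding the terminator) or a negative error value. */
static int copy_capture(const char *subject, const int *ovector,
                        int stringcount, int n,
                        char *buffer, int buffersize)
{
    int start, length;

    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (n < 0 || n >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    start = ovector[2 * n];
    length = ovector[2 * n + 1] - start;
    if (start < 0) {            /* unset capture: treat as empty */
        start = 0;
        length = 0;
    }
    if (length + 1 > buffersize)
        return SKETCH_ERROR_NOMEMORY;

    /* memcpy, not strcpy, so embedded binary zeros are preserved. */
    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

Because the length is returned separately, the caller can still process a capture that contains binary zeros, which a plain C string function could not do.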
2056    
2057         A substring that contains a binary zero is correctly extracted and  has         A  substring that contains a binary zero is correctly extracted and has
2058         a  further zero added on the end, but the result is not, of course, a C         a further zero added on the end, but the result is not, of course, a  C
2059         string.  However, you can process such a string  by  referring  to  the         string.   However,  you  can  process such a string by referring to the
2060         length  that  is  returned  by  pcre_copy_substring() and pcre_get_sub-         length that is  returned  by  pcre_copy_substring()  and  pcre_get_sub-
2061         string().  Unfortunately, the interface to pcre_get_substring_list() is         string().  Unfortunately, the interface to pcre_get_substring_list() is
2062         not  adequate for handling strings containing binary zeros, because the         not adequate for handling strings containing binary zeros, because  the
2063         end of the final string is not independently indicated.         end of the final string is not independently indicated.
2064    
2065         The first three arguments are the same for all  three  of  these  func-         The  first  three  arguments  are the same for all three of these func-
2066         tions:  subject  is  the subject string that has just been successfully         tions: subject is the subject string that has  just  been  successfully
2067         matched, ovector is a pointer to the vector of integer offsets that was         matched, ovector is a pointer to the vector of integer offsets that was
2068         passed to pcre_exec(), and stringcount is the number of substrings that         passed to pcre_exec(), and stringcount is the number of substrings that
2069         were captured by the match, including the substring  that  matched  the         were  captured  by  the match, including the substring that matched the
2070         entire regular expression. This is the value returned by pcre_exec() if         entire regular expression. This is the value returned by pcre_exec() if
2071         it is greater than zero. If pcre_exec() returned zero, indicating  that         it  is greater than zero. If pcre_exec() returned zero, indicating that
2072         it  ran out of space in ovector, the value passed as stringcount should         it ran out of space in ovector, the value passed as stringcount  should
2073         be the number of elements in the vector divided by three.         be the number of elements in the vector divided by three.
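
The rule for choosing stringcount can be written as a small function. This is a sketch under the assumptions stated in the paragraph above; stringcount_for() is a hypothetical name, and returning 0 for a negative result is a choice made here to signal that there is nothing to extract.

```c
#include <assert.h>

/* Hypothetical helper: the value to pass as stringcount, given the
   return value rc of pcre_exec() and the ovecsize that was passed to
   it (the number of elements in the vector). */
static int stringcount_for(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;              /* normal case: use the return value */
    if (rc == 0)
        return ovecsize / 3;    /* ovector was too small */
    return 0;                   /* error or no match */
}
```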
2074    
2075         The functions pcre_copy_substring() and pcre_get_substring() extract  a         The  functions pcre_copy_substring() and pcre_get_substring() extract a
2076         single  substring,  whose  number  is given as stringnumber. A value of         single substring, whose number is given as  stringnumber.  A  value  of
2077         zero extracts the substring that matched the  entire  pattern,  whereas         zero  extracts  the  substring that matched the entire pattern, whereas
2078         higher  values  extract  the  captured  substrings.  For pcre_copy_sub-         higher values  extract  the  captured  substrings.  For  pcre_copy_sub-
2079         string(), the string is placed in buffer,  whose  length  is  given  by         string(),  the  string  is  placed  in buffer, whose length is given by
2080         buffersize,  while  for  pcre_get_substring()  a new block of memory is         buffersize, while for pcre_get_substring() a new  block  of  memory  is
2081         obtained via pcre_malloc, and its address is  returned  via  stringptr.         obtained  via  pcre_malloc,  and its address is returned via stringptr.
2082         The  yield  of  the function is the length of the string, not including         The yield of the function is the length of the  string,  not  including
2083         the terminating zero, or one of these error codes:         the terminating zero, or one of these error codes:
2084    
2085           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2086    
2087         The buffer was too small for pcre_copy_substring(), or the  attempt  to         The  buffer  was too small for pcre_copy_substring(), or the attempt to
2088         get memory failed for pcre_get_substring().         get memory failed for pcre_get_substring().
2089    
2090           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
2091    
2092         There is no substring whose number is stringnumber.         There is no substring whose number is stringnumber.
2093    
2094         The  pcre_get_substring_list()  function  extracts  all  available sub-         The pcre_get_substring_list()  function  extracts  all  available  sub-
2095         strings and builds a list of pointers to them. All this is  done  in  a         strings  and  builds  a list of pointers to them. All this is done in a
2096         single block of memory that is obtained via pcre_malloc. The address of         single block of memory that is obtained via pcre_malloc. The address of
2097         the memory block is returned via listptr, which is also  the  start  of         the  memory  block  is returned via listptr, which is also the start of
2098         the  list  of  string pointers. The end of the list is marked by a NULL         the list of string pointers. The end of the list is marked  by  a  NULL
2099         pointer. The yield of the function is zero if all  went  well,  or  the         pointer.  The  yield  of  the function is zero if all went well, or the
2100         error code         error code
2101    
2102           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2103    
2104         if the attempt to get the memory block failed.         if the attempt to get the memory block failed.
2105    
2106         When any of these functions encounters a substring that is unset, which         When any of these functions encounters a substring that is unset, which
2107         can happen when capturing subpattern number n+1 matches  some  part  of         can  happen  when  capturing subpattern number n+1 matches some part of
2108         the  subject, but subpattern n has not been used at all, they return an         the subject, but subpattern n has not been used at all, they return  an
2109         empty string. This can be distinguished from a genuine zero-length sub-         empty string. This can be distinguished from a genuine zero-length sub-
2110         string  by inspecting the appropriate offset in ovector, which is nega-         string by inspecting the appropriate offset in ovector, which is  nega-
2111         tive for unset substrings.         tive for unset substrings.
2112    
2113         The two convenience functions pcre_free_substring() and  pcre_free_sub-         The  two convenience functions pcre_free_substring() and pcre_free_sub-
2114         string_list()  can  be  used  to free the memory returned by a previous         string_list() can be used to free the memory  returned  by  a  previous
2115         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-
2116         tively.  They  do  nothing  more  than  call the function pointed to by         tively. They do nothing more than  call  the  function  pointed  to  by
2117         pcre_free, which of course could be called directly from a  C  program.         pcre_free,  which  of course could be called directly from a C program.
2118         However,  PCRE is used in some situations where it is linked via a spe-         However, PCRE is used in some situations where it is linked via a  spe-
2119         cial  interface  to  another  programming  language  that  cannot   use         cial   interface  to  another  programming  language  that  cannot  use
2120         pcre_free  directly;  it is for these cases that the functions are pro-         pcre_free directly; it is for these cases that the functions  are  pro-
2121         vided.         vided.
2122    
2123    
# Line 2126  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 2136  EXTRACTING CAPTURED SUBSTRINGS BY NAME
2136              int stringcount, const char *stringname,              int stringcount, const char *stringname,
2137              const char **stringptr);              const char **stringptr);
2138    
2139         To extract a substring by name, you first have to find the associated         To extract a substring by name, you first have to find the associated
2140         number. For example, for this pattern         number. For example, for this pattern
2141    
2142           (a+)b(?<xxx>\d+)...           (a+)b(?<xxx>\d+)...
# Line 2135  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 2145  EXTRACTING CAPTURED SUBSTRINGS BY NAME
2145         be unique (PCRE_DUPNAMES was not set), you can find the number from the         be unique (PCRE_DUPNAMES was not set), you can find the number from the
2146         name by calling pcre_get_stringnumber(). The first argument is the com-         name by calling pcre_get_stringnumber(). The first argument is the com-
2147         piled pattern, and the second is the name. The yield of the function is         piled pattern, and the second is the name. The yield of the function is
2148         the  subpattern  number,  or PCRE_ERROR_NOSUBSTRING (-7) if there is no         the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if  there  is  no
2149         subpattern of that name.         subpattern of that name.
2150    
2151         Given the number, you can extract the substring directly, or use one of         Given the number, you can extract the substring directly, or use one of
2152         the functions described in the previous section. For convenience, there         the functions described in the previous section. For convenience, there
2153         are also two functions that do the whole job.         are also two functions that do the whole job.
2154    
2155         Most   of   the   arguments    of    pcre_copy_named_substring()    and         Most    of    the    arguments   of   pcre_copy_named_substring()   and
2156         pcre_get_named_substring()  are  the  same  as  those for the similarly         pcre_get_named_substring() are the same  as  those  for  the  similarly
2157         named functions that extract by number. As these are described  in  the         named  functions  that extract by number. As these are described in the
2158         previous  section,  they  are not re-described here. There are just two         previous section, they are not re-described here. There  are  just  two
2159         differences:         differences:
2160    
2161         First, instead of a substring number, a substring name is  given.  Sec-         First,  instead  of a substring number, a substring name is given. Sec-
2162         ond, there is an extra argument, given at the start, which is a pointer         ond, there is an extra argument, given at the start, which is a pointer
2163         to the compiled pattern. This is needed in order to gain access to  the         to  the compiled pattern. This is needed in order to gain access to the
2164         name-to-number translation table.         name-to-number translation table.
2165    
2166         These  functions call pcre_get_stringnumber(), and if it succeeds, they         These functions call pcre_get_stringnumber(), and if it succeeds,  they
2167         then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-         then  call  pcre_copy_substring() or pcre_get_substring(), as appropri-
2168         ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the         ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate  names,  the
2169         behaviour may not be what you want (see the next section).         behaviour may not be what you want (see the next section).
2170    
2171    
# Line 2164  DUPLICATE SUBPATTERN NAMES Line 2174  DUPLICATE SUBPATTERN NAMES
2174         int pcre_get_stringtable_entries(const pcre *code,         int pcre_get_stringtable_entries(const pcre *code,
2175              const char *name, char **first, char **last);              const char *name, char **first, char **last);
2176    
2177         When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for         When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for
2178         subpatterns  are  not  required  to  be unique. Normally, patterns with         subpatterns are not required to  be  unique.  Normally,  patterns  with
2179         duplicate names are such that in any one match, only one of  the  named         duplicate  names  are such that in any one match, only one of the named
2180         subpatterns  participates. An example is shown in the pcrepattern docu-         subpatterns participates. An example is shown in the pcrepattern  docu-
2181         mentation. When duplicates are present, pcre_copy_named_substring() and         mentation. When duplicates are present, pcre_copy_named_substring() and
2182         pcre_get_named_substring()  return the first substring corresponding to         pcre_get_named_substring() return the first substring corresponding  to
2183         the given name that is set.  If  none  are  set,  an  empty  string  is         the  given  name  that  is  set.  If  none  are set, an empty string is
2184         returned.  The pcre_get_stringnumber() function returns one of the num-         returned.  The pcre_get_stringnumber() function returns one of the num-
2185         bers that are associated with the name, but it is not defined which  it         bers  that are associated with the name, but it is not defined which it
2186         is.         is.
2187    
2188         If  you want to get full details of all captured substrings for a given         If you want to get full details of all captured substrings for a  given
2189         name, you must use  the  pcre_get_stringtable_entries()  function.  The         name,  you  must  use  the pcre_get_stringtable_entries() function. The
2190         first argument is the compiled pattern, and the second is the name. The         first argument is the compiled pattern, and the second is the name. The
2191         third and fourth are pointers to variables which  are  updated  by  the         third  and  fourth  are  pointers to variables which are updated by the
2192         function. After it has run, they point to the first and last entries in         function. After it has run, they point to the first and last entries in
2193         the name-to-number table  for  the  given  name.  The  function  itself         the  name-to-number  table  for  the  given  name.  The function itself
2194         returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if         returns the length of each entry,  or  PCRE_ERROR_NOSUBSTRING  (-7)  if
2195         there are none. The format of the table is described above in the  sec-         there  are none. The format of the table is described above in the sec-
2196         tion  entitled  Information  about  a  pattern.  Given all the relevant         tion entitled Information about a  pattern.   Given  all  the  relevant
2197         entries for the name, you can extract each of their numbers, and  hence         entries  for the name, you can extract each of their numbers, and hence
2198         the captured data, if any.         the captured data, if any.
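
Walking the entries between first and last might look like the sketch below. It assumes the 8-bit name-table layout from the section on information about a pattern, which is not reproduced in this excerpt: each entry starts with the two-byte subpattern number, most significant byte first, followed by the zero-terminated name. collect_group_numbers() is a hypothetical helper, not a PCRE function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: visit every table entry from first to last
   (inclusive), entry_size bytes apart, and store each subpattern
   number in numbers[]. At most max numbers are stored; the count
   stored is returned. first, last and entry_size are expected to
   come from a successful pcre_get_stringtable_entries() call. */
static int collect_group_numbers(const char *first, const char *last,
                                 int entry_size, int *numbers, int max)
{
    const unsigned char *p;
    int count = 0;

    assert(first != NULL && last != NULL && numbers != NULL);
    assert(entry_size > 2);
    for (p = (const unsigned char *)first;
         p <= (const unsigned char *)last && count < max;
         p += entry_size)
        numbers[count++] = (p[0] << 8) | p[1];  /* MSB first */
    return count;
}
```

Each number collected this way can then be checked against ovector to see which of the duplicate-named subpatterns actually took part in the match.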
2199    
2200    
2201  FINDING ALL POSSIBLE MATCHES  FINDING ALL POSSIBLE MATCHES
2202    
2203         The  traditional  matching  function  uses a similar algorithm to Perl,         The traditional matching function uses a  similar  algorithm  to  Perl,
2204         which stops when it finds the first match, starting at a given point in         which stops when it finds the first match, starting at a given point in
2205         the  subject.  If you want to find all possible matches, or the longest         the subject. If you want to find all possible matches, or  the  longest
2206         possible match, consider using the alternative matching  function  (see         possible  match,  consider using the alternative matching function (see
2207         below)  instead.  If you cannot use the alternative function, but still         below) instead. If you cannot use the alternative function,  but  still
2208         need to find all possible matches, you can kludge it up by  making  use         need  to  find all possible matches, you can kludge it up by making use
2209         of the callout facility, which is described in the pcrecallout documen-         of the callout facility, which is described in the pcrecallout documen-
2210         tation.         tation.
2211    
2212         What you have to do is to insert a callout right at the end of the pat-         What you have to do is to insert a callout right at the end of the pat-
2213         tern.   When your callout function is called, extract and save the cur-         tern.  When your callout function is called, extract and save the  cur-
2214         rent matched substring. Then return  1,  which  forces  pcre_exec()  to         rent  matched  substring.  Then  return  1, which forces pcre_exec() to
2215         backtrack  and  try other alternatives. Ultimately, when it runs out of         backtrack and try other alternatives. Ultimately, when it runs  out  of
2216         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
2217    
2218    
# Line 2213  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2223  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2223              int options, int *ovector, int ovecsize,              int options, int *ovector, int ovecsize,
2224              int *workspace, int wscount);              int *workspace, int wscount);
2225    
2226         The function pcre_dfa_exec()  is  called  to  match  a  subject  string         The  function  pcre_dfa_exec()  is  called  to  match  a subject string
2227         against  a  compiled pattern, using a matching algorithm that scans the         against a compiled pattern, using a matching algorithm that  scans  the
2228         subject string just once, and does not backtrack.  This  has  different         subject  string  just  once, and does not backtrack. This has different
2229         characteristics  to  the  normal  algorithm, and is not compatible with         characteristics to the normal algorithm, and  is  not  compatible  with
2230         Perl. Some of the features of PCRE patterns are not  supported.  Never-         Perl.  Some  of the features of PCRE patterns are not supported. Never-
2231         theless,  there are times when this kind of matching can be useful. For         theless, there are times when this kind of matching can be useful.  For
2232         a discussion of the two matching algorithms, see the pcrematching docu-         a discussion of the two matching algorithms, see the pcrematching docu-
2233         mentation.         mentation.
2234    
2235         The  arguments  for  the  pcre_dfa_exec()  function are the same as for         The arguments for the pcre_dfa_exec() function  are  the  same  as  for
2236         pcre_exec(), plus two extras. The ovector argument is used in a differ-         pcre_exec(), plus two extras. The ovector argument is used in a differ-
2237         ent  way,  and  this is described below. The other common arguments are         ent way, and this is described below. The other  common  arguments  are
2238         used in the same way as for pcre_exec(), so their  description  is  not         used  in  the  same way as for pcre_exec(), so their description is not
2239         repeated here.         repeated here.
2240    
2241         The  two  additional  arguments provide workspace for the function. The         The two additional arguments provide workspace for  the  function.  The
2242         workspace vector should contain at least 20 elements. It  is  used  for         workspace  vector  should  contain at least 20 elements. It is used for
2243         keeping  track  of  multiple  paths  through  the  pattern  tree.  More         keeping  track  of  multiple  paths  through  the  pattern  tree.  More
2244         workspace will be needed for patterns and subjects where  there  are  a         workspace  will  be  needed for patterns and subjects where there are a
2245         lot of potential matches.         lot of potential matches.
2246    
2247         Here is an example of a simple call to pcre_dfa_exec():         Here is an example of a simple call to pcre_dfa_exec():
# Line 2253  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2263  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2263    
2264     Option bits for pcre_dfa_exec()     Option bits for pcre_dfa_exec()
2265    
2266         The  unused  bits  of  the options argument for pcre_dfa_exec() must be         The unused bits of the options argument  for  pcre_dfa_exec()  must  be
2267         zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-         zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
2268         LINE_xxx,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK,         LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,  PCRE_NO_UTF8_CHECK,
2269         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
2270         three of these are the same as for pcre_exec(), so their description is         three of these are the same as for pcre_exec(), so their description is
2271         not repeated here.         not repeated here.
2272    
2273           PCRE_PARTIAL           PCRE_PARTIAL
2274    
2275         This has the same general effect as it does for  pcre_exec(),  but  the         This  has  the  same general effect as it does for pcre_exec(), but the
2276         details   are   slightly   different.  When  PCRE_PARTIAL  is  set  for         details  are  slightly  different.  When  PCRE_PARTIAL   is   set   for
2277         pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is  converted  into         pcre_dfa_exec(),  the  return code PCRE_ERROR_NOMATCH is converted into
2278         PCRE_ERROR_PARTIAL  if  the  end  of the subject is reached, there have         PCRE_ERROR_PARTIAL if the end of the subject  is  reached,  there  have
2279         been no complete matches, but there is still at least one matching pos-         been no complete matches, but there is still at least one matching pos-
2280         sibility.  The portion of the string that provided the partial match is         sibility. The portion of the string that provided the partial match  is
2281         set as the first matching string.         set as the first matching string.
2282    
2283           PCRE_DFA_SHORTEST           PCRE_DFA_SHORTEST
2284    
2285         Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to         Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to
2286         stop as soon as it has found one match. Because of the way the alterna-         stop as soon as it has found one match. Because of the way the alterna-
2287         tive algorithm works, this is necessarily the shortest  possible  match         tive  algorithm  works, this is necessarily the shortest possible match
2288         at the first possible matching point in the subject string.         at the first possible matching point in the subject string.
2289    
2290           PCRE_DFA_RESTART           PCRE_DFA_RESTART
2291    
2292         When  pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option, and         When pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option,  and
2293         returns a partial match, it is possible to call it  again,  with  addi-         returns  a  partial  match, it is possible to call it again, with addi-
2294         tional  subject  characters,  and have it continue with the same match.         tional subject characters, and have it continue with  the  same  match.
2295         The PCRE_DFA_RESTART option requests this action; when it is  set,  the         The  PCRE_DFA_RESTART  option requests this action; when it is set, the
2296         workspace  and wscount arguments must reference the same vector as before         workspace and wscount arguments must reference the same vector as  before
2297         because data about the match so far is left in  them  after  a  partial         because  data  about  the  match so far is left in them after a partial
2298         match.  There  is  more  discussion of this facility in the pcrepartial         match. There is more discussion of this  facility  in  the  pcrepartial
2299         documentation.         documentation.
2300    
2301     Successful returns from pcre_dfa_exec()     Successful returns from pcre_dfa_exec()
2302    
2303         When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-         When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-
2304         string in the subject. Note, however, that all the matches from one run         string in the subject. Note, however, that all the matches from one run
2305         of the function start at the same point in  the  subject.  The  shorter         of  the  function  start  at the same point in the subject. The shorter
2306         matches  are all initial substrings of the longer matches. For example,         matches are all initial substrings of the longer matches. For  example,
2307         if the pattern         if the pattern
2308    
2309           <.*>           <.*>
# Line 2308  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2318  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2318           <something> <something else>           <something> <something else>
2319           <something> <something else> <something further>           <something> <something else> <something further>
2320    
2321         On success, the yield of the function is a number  greater  than  zero,         On  success,  the  yield of the function is a number greater than zero,
2322         which  is  the  number of matched substrings. The substrings themselves         which is the number of matched substrings.  The  substrings  themselves
2323         are returned in ovector. Each string uses two elements;  the  first  is         are  returned  in  ovector. Each string uses two elements; the first is
2324         the  offset  to  the start, and the second is the offset to the end. In         the offset to the start, and the second is the offset to  the  end.  In
2325         fact, all the strings have the same start  offset.  (Space  could  have         fact,  all  the  strings  have the same start offset. (Space could have
2326         been  saved by giving this only once, but it was decided to retain some         been saved by giving this only once, but it was decided to retain  some
2327         compatibility with the way pcre_exec() returns data,  even  though  the         compatibility  with  the  way pcre_exec() returns data, even though the
2328         meaning of the strings is different.)         meaning of the strings is different.)
2329    
2330         The strings are returned in reverse order of length; that is, the long-         The strings are returned in reverse order of length; that is, the long-
2331         est matching string is given first. If there were too many  matches  to         est  matching  string is given first. If there were too many matches to
2332         fit  into ovector, the yield of the function is zero, and the vector is         fit into ovector, the yield of the function is zero, and the vector  is
2333         filled with the longest matches.         filled with the longest matches.
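
       The layout described above can be read back with a few lines of C. The
       sketch below is illustrative only: the helper names dfa_match_count(),
       dfa_match_length() and dfa_print_matches() are hypothetical and are not
       part of PCRE. It assumes that rc is the value returned by
       pcre_dfa_exec(), that ovecsize is the number of int elements in ovector,
       and that a zero return means every complete pair in the vector holds a
       match. A PCRE_ERROR_PARTIAL return is not handled here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a PCRE function): how many (start, end) pairs
   in ovector are meaningful after a call that returned rc. */
int dfa_match_count(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;              /* every match fitted into ovector */
    if (rc == 0)
        return ovecsize / 2;    /* assumption: overflow, vector holds the longest matches */
    return 0;                   /* negative yield: error or no match */
}

/* Length of the i-th match; pair i occupies elements 2*i and 2*i+1. */
int dfa_match_length(const int *ovector, int i)
{
    return ovector[2 * i + 1] - ovector[2 * i];
}

/* Print the matches in the order they are stored, longest first. */
void dfa_print_matches(const char *subject, const int *ovector,
                       int rc, int ovecsize)
{
    int i;
    int n = dfa_match_count(rc, ovecsize);

    for (i = 0; i < n; i++)
        printf("%2d: %.*s\n", i, dfa_match_length(ovector, i),
               subject + ovector[2 * i]);
}
```

       A caller would typically pass the same subject, ovector and ovecsize
       that were given to pcre_dfa_exec(), together with its return value.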
2334    
2335     Error returns from pcre_dfa_exec()     Error returns from pcre_dfa_exec()
2336    
2337         The pcre_dfa_exec() function returns a negative number when  it  fails.         The  pcre_dfa_exec()  function returns a negative number when it fails.
2338         Many  of  the  errors  are  the  same as for pcre_exec(), and these are         Many of the errors are the same  as  for  pcre_exec(),  and  these  are
2339         described above.  There are in addition the following errors  that  are         described  above.   There are in addition the following errors that are
2340         specific to pcre_dfa_exec():         specific to pcre_dfa_exec():
2341    
2342           PCRE_ERROR_DFA_UITEM      (-16)           PCRE_ERROR_DFA_UITEM      (-16)
2343    
2344         This  return is given if pcre_dfa_exec() encounters an item in the pat-         This return is given if pcre_dfa_exec() encounters an item in the  pat-
2345         tern that it does not support, for instance, the use of \C  or  a  back         tern  that  it  does not support, for instance, the use of \C or a back
2346         reference.         reference.
2347    
2348           PCRE_ERROR_DFA_UCOND      (-17)           PCRE_ERROR_DFA_UCOND      (-17)
2349    
2350         This  return  is  given  if pcre_dfa_exec() encounters a condition item         This return is given if pcre_dfa_exec()  encounters  a  condition  item
2351         that uses a back reference for the condition, or a test  for  recursion         that  uses  a back reference for the condition, or a test for recursion
2352         in a specific group. These are not supported.         in a specific group. These are not supported.
2353    
2354           PCRE_ERROR_DFA_UMLIMIT    (-18)           PCRE_ERROR_DFA_UMLIMIT    (-18)
2355    
2356         This  return  is given if pcre_dfa_exec() is called with an extra block         This return is given if pcre_dfa_exec() is called with an  extra  block
2357         that contains a setting of the match_limit field. This is not supported         that contains a setting of the match_limit field. This is not supported
2358         (it is meaningless).         (it is meaningless).
2359    
2360           PCRE_ERROR_DFA_WSSIZE     (-19)           PCRE_ERROR_DFA_WSSIZE     (-19)
2361    
2362         This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the         This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
2363         workspace vector.         workspace vector.
2364    
2365           PCRE_ERROR_DFA_RECURSE    (-20)           PCRE_ERROR_DFA_RECURSE    (-20)
2366    
2367         When a recursive subpattern is processed, the matching  function  calls         When  a  recursive subpattern is processed, the matching function calls
2368         itself  recursively,  using  private vectors for ovector and workspace.         itself recursively, using private vectors for  ovector  and  workspace.
2369         This error is given if the output vector  is  not  large  enough.  This         This  error  is  given  if  the output vector is not large enough. This
2370         should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
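
       For diagnostics it can be convenient to turn these codes into text. The
       following is a minimal sketch, not part of PCRE: dfa_error_text() and
       the SKETCH_ constants are hypothetical names, and the numeric values are
       copied from the list above. A real program would include pcre.h and use
       the PCRE_ERROR_DFA_xxx names instead.

```c
#include <assert.h>
#include <stddef.h>

/* Values as listed in the text above; real code should use the
   PCRE_ERROR_DFA_xxx definitions from pcre.h. */
enum {
    SKETCH_DFA_UITEM   = -16,
    SKETCH_DFA_UCOND   = -17,
    SKETCH_DFA_UMLIMIT = -18,
    SKETCH_DFA_WSSIZE  = -19,
    SKETCH_DFA_RECURSE = -20
};

/* Hypothetical helper: describe an error that is specific to
   pcre_dfa_exec(). Returns NULL for any other negative code, which the
   caller should treat as one of the errors shared with pcre_exec(). */
const char *dfa_error_text(int rc)
{
    assert(rc < 0);   /* only negative yields are errors */

    switch (rc) {
    case SKETCH_DFA_UITEM:
        return "unsupported item in pattern (for example \\C or a back reference)";
    case SKETCH_DFA_UCOND:
        return "unsupported condition (back reference or recursion test)";
    case SKETCH_DFA_UMLIMIT:
        return "match_limit set in extra block (not supported)";
    case SKETCH_DFA_WSSIZE:
        return "workspace vector too small";
    case SKETCH_DFA_RECURSE:
        return "output vector too small while processing recursion";
    default:
        return NULL;
    }
}
```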
2371    
2372    
2373  SEE ALSO  SEE ALSO
2374    
2375         pcrebuild(3),  pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepar-         pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),  pcrepar-
2376         tial(3), pcreposix(3), pcreprecompile(3), pcresample(3),  pcrestack(3).         tial(3),  pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).
2377    
2378    
2379  AUTHOR  AUTHOR
# Line 2375  AUTHOR Line 2385  AUTHOR
2385    
2386  REVISION  REVISION
2387    
2388         Last updated: 06 March 2007         Last updated: 16 April 2007
2389         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
2390  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2391    
