Diff of /code/trunk/doc/pcre.txt

revision 242 by ph10, Tue Sep 11 11:15:33 2007 UTC revision 243 by ph10, Thu Sep 13 09:28:14 2007 UTC
# Line 1565  INFORMATION ABOUT A PATTERN Line 1565  INFORMATION ABOUT A PATTERN
1565    
1566         Return  1  if  the  pattern  contains any explicit matches for CR or LF         Return  1  if  the  pattern  contains any explicit matches for CR or LF
1567         characters, otherwise 0. The fourth argument should  point  to  an  int         characters, otherwise 0. The fourth argument should  point  to  an  int
1568         variable.         variable.  An explicit match is either a literal CR or LF character, or
1569           \r or \n.
1570    
1571           PCRE_INFO_JCHANGED           PCRE_INFO_JCHANGED
1572    
1573         Return  1  if the (?J) option setting is used in the pattern, otherwise         Return 1 if the (?J) option setting is used in the  pattern,  otherwise
1574         0. The fourth argument should point to an int variable. The (?J) inter-         0. The fourth argument should point to an int variable. The (?J) inter-
1575         nal option setting changes the local PCRE_DUPNAMES option.         nal option setting changes the local PCRE_DUPNAMES option.
1576    
1577           PCRE_INFO_LASTLITERAL           PCRE_INFO_LASTLITERAL
1578    
1579         Return  the  value of the rightmost literal byte that must exist in any         Return the value of the rightmost literal byte that must exist  in  any
1580         matched string, other than at its  start,  if  such  a  byte  has  been         matched  string,  other  than  at  its  start,  if such a byte has been
1581         recorded. The fourth argument should point to an int variable. If there         recorded. The fourth argument should point to an int variable. If there
1582         is no such byte, -1 is returned. For anchored patterns, a last  literal         is  no such byte, -1 is returned. For anchored patterns, a last literal
1583         byte  is  recorded only if it follows something of variable length. For         byte is recorded only if it follows something of variable  length.  For
1584         example, for the pattern /^a\d+z\d+/ the returned value is "z", but for         example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
1585         /^a\dz\d/ the returned value is -1.         /^a\dz\d/ the returned value is -1.
1586    
# Line 1587  INFORMATION ABOUT A PATTERN Line 1588  INFORMATION ABOUT A PATTERN
1588           PCRE_INFO_NAMEENTRYSIZE           PCRE_INFO_NAMEENTRYSIZE
1589           PCRE_INFO_NAMETABLE           PCRE_INFO_NAMETABLE
1590    
1591         PCRE  supports the use of named as well as numbered capturing parenthe-         PCRE supports the use of named as well as numbered capturing  parenthe-
1592         ses. The names are just an additional way of identifying the  parenthe-         ses.  The names are just an additional way of identifying the parenthe-
1593         ses, which still acquire numbers. Several convenience functions such as         ses, which still acquire numbers. Several convenience functions such as
1594         pcre_get_named_substring() are provided for  extracting  captured  sub-         pcre_get_named_substring()  are  provided  for extracting captured sub-
1595         strings  by  name. It is also possible to extract the data directly, by         strings by name. It is also possible to extract the data  directly,  by
1596         first converting the name to a number in order to  access  the  correct         first  converting  the  name to a number in order to access the correct
1597         pointers in the output vector (described with pcre_exec() below). To do         pointers in the output vector (described with pcre_exec() below). To do
1598         the conversion, you need  to  use  the  name-to-number  map,  which  is         the  conversion,  you  need  to  use  the  name-to-number map, which is
1599         described by these three values.         described by these three values.
1600    
1601         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT         The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
1602         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size         gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
1603         of  each  entry;  both  of  these  return  an int value. The entry size         of each entry; both of these  return  an  int  value.  The  entry  size
1604         depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns         depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns
1605         a  pointer  to  the  first  entry of the table (a pointer to char). The         a pointer to the first entry of the table  (a  pointer  to  char).  The
1606         first two bytes of each entry are the number of the capturing parenthe-         first two bytes of each entry are the number of the capturing parenthe-
1607         sis,  most  significant byte first. The rest of the entry is the corre-         sis, most significant byte first. The rest of the entry is  the  corre-
1608         sponding name, zero terminated. The names are  in  alphabetical  order.         sponding  name,  zero  terminated. The names are in alphabetical order.
1609         When PCRE_DUPNAMES is set, duplicate names are in order of their paren-         When PCRE_DUPNAMES is set, duplicate names are in order of their paren-
1610         theses numbers. For example, consider  the  following  pattern  (assume         theses  numbers.  For  example,  consider the following pattern (assume
1611         PCRE_EXTENDED  is  set,  so  white  space  -  including  newlines  - is         PCRE_EXTENDED is  set,  so  white  space  -  including  newlines  -  is
1612         ignored):         ignored):
1613    
1614           (?<date> (?<year>(\d\d)?\d\d) -           (?<date> (?<year>(\d\d)?\d\d) -
1615           (?<month>\d\d) - (?<day>\d\d) )           (?<month>\d\d) - (?<day>\d\d) )
1616    
1617         There are four named subpatterns, so the table has  four  entries,  and         There  are  four  named subpatterns, so the table has four entries, and
1618         each  entry  in the table is eight bytes long. The table is as follows,         each entry in the table is eight bytes long. The table is  as  follows,
1619         with non-printing bytes shown in hexadecimal, and undefined bytes shown         with non-printing bytes shown in hexadecimal, and undefined bytes shown
1620         as ??:         as ??:
1621    
# Line 1623  INFORMATION ABOUT A PATTERN Line 1624  INFORMATION ABOUT A PATTERN
1624           00 04 m  o  n  t  h  00           00 04 m  o  n  t  h  00
1625           00 02 y  e  a  r  00 ??           00 02 y  e  a  r  00 ??
1626    
1627         When  writing  code  to  extract  data from named subpatterns using the         When writing code to extract data  from  named  subpatterns  using  the
1628         name-to-number map, remember that the length of the entries  is  likely         name-to-number  map,  remember that the length of the entries is likely
1629         to be different for each compiled pattern.         to be different for each compiled pattern.
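
A minimal sketch of the lookup described above, written against a hand-built copy of the table rather than a real compiled pattern. The function name lookup_group_number and the array example_table are hypothetical. The "date" and "day" rows are derived from the group numbering of the example pattern, and the undefined "??" bytes are set to zero here for convenience. In a real program the three inputs would come from pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up a group name in a name table laid out as described above:
 * each entry is entry_size bytes, starting with the group number as two
 * bytes (most significant first), followed by the zero-terminated name.
 * Returns the group number, or -1 if the name is not in the table. */
int lookup_group_number(const unsigned char *table, int count,
                        int entry_size, const char *name)
{
    assert(table != NULL && name != NULL && entry_size > 2);
    for (int i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* Hand-built table for the date pattern: four entries of eight bytes,
 * in alphabetical order. Padding bytes are zero in this sketch only. */
const unsigned char example_table[4 * 8] = {
    0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
    0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00,
};
```

Calling lookup_group_number(example_table, 4, 8, "month") yields the group number 4. The entry size is passed in rather than hard-coded because it differs between compiled patterns. A linear scan is used for simplicity; since the names are sorted, a binary search would also work.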
1630    
1631           PCRE_INFO_OKPARTIAL           PCRE_INFO_OKPARTIAL
1632    
1633         Return  1 if the pattern can be used for partial matching, otherwise 0.         Return 1 if the pattern can be used for partial matching, otherwise  0.
1634         The fourth argument should point to an int  variable.  The  pcrepartial         The  fourth  argument  should point to an int variable. The pcrepartial
1635         documentation  lists  the restrictions that apply to patterns when par-         documentation lists the restrictions that apply to patterns  when  par-
1636         tial matching is used.         tial matching is used.
1637    
1638           PCRE_INFO_OPTIONS           PCRE_INFO_OPTIONS
1639    
1640         Return a copy of the options with which the pattern was  compiled.  The         Return  a  copy of the options with which the pattern was compiled. The
1641         fourth  argument  should  point to an unsigned long int variable. These         fourth argument should point to an unsigned long  int  variable.  These
1642         option bits are those specified in the call to pcre_compile(), modified         option bits are those specified in the call to pcre_compile(), modified
1643         by any top-level option settings at the start of the pattern itself. In         by any top-level option settings at the start of the pattern itself. In
1644         other words, they are the options that will be in force  when  matching         other  words,  they are the options that will be in force when matching
1645         starts.  For  example, if the pattern /(?im)abc(?-i)d/ is compiled with         starts. For example, if the pattern /(?im)abc(?-i)d/ is  compiled  with
1646         the PCRE_EXTENDED option, the result is PCRE_CASELESS,  PCRE_MULTILINE,         the  PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
1647         and PCRE_EXTENDED.         and PCRE_EXTENDED.
1648    
1649         A  pattern  is  automatically  anchored by PCRE if all of its top-level         A pattern is automatically anchored by PCRE if  all  of  its  top-level
1650         alternatives begin with one of the following:         alternatives begin with one of the following:
1651    
1652           ^     unless PCRE_MULTILINE is set           ^     unless PCRE_MULTILINE is set
# Line 1659  INFORMATION ABOUT A PATTERN Line 1660  INFORMATION ABOUT A PATTERN
1660    
1661           PCRE_INFO_SIZE           PCRE_INFO_SIZE
1662    
1663         Return  the  size  of the compiled pattern, that is, the value that was         Return the size of the compiled pattern, that is, the  value  that  was
1664         passed as the argument to pcre_malloc() when PCRE was getting memory in         passed as the argument to pcre_malloc() when PCRE was getting memory in
1665         which to place the compiled data. The fourth argument should point to a         which to place the compiled data. The fourth argument should point to a
1666         size_t variable.         size_t variable.
# Line 1667  INFORMATION ABOUT A PATTERN Line 1668  INFORMATION ABOUT A PATTERN
1668           PCRE_INFO_STUDYSIZE           PCRE_INFO_STUDYSIZE
1669    
1670         Return the size of the data block pointed to by the study_data field in         Return the size of the data block pointed to by the study_data field in
1671         a  pcre_extra  block.  That  is,  it  is  the  value that was passed to         a pcre_extra block. That is,  it  is  the  value  that  was  passed  to
1672         pcre_malloc() when PCRE was getting memory into which to place the data         pcre_malloc() when PCRE was getting memory into which to place the data
1673         created  by  pcre_study(). The fourth argument should point to a size_t         created by pcre_study(). The fourth argument should point to  a  size_t
1674         variable.         variable.
1675    
1676    
# Line 1677  OBSOLETE INFO FUNCTION Line 1678  OBSOLETE INFO FUNCTION
1678    
1679         int pcre_info(const pcre *code, int *optptr, int *firstcharptr);         int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
1680    
1681         The pcre_info() function is now obsolete because its interface  is  too         The  pcre_info()  function is now obsolete because its interface is too
1682         restrictive  to return all the available data about a compiled pattern.         restrictive to return all the available data about a compiled  pattern.
1683         New  programs  should  use  pcre_fullinfo()  instead.  The   yield   of         New   programs   should  use  pcre_fullinfo()  instead.  The  yield  of
1684         pcre_info()  is the number of capturing subpatterns, or one of the fol-         pcre_info() is the number of capturing subpatterns, or one of the  fol-
1685         lowing negative numbers:         lowing negative numbers:
1686    
1687           PCRE_ERROR_NULL       the argument code was NULL           PCRE_ERROR_NULL       the argument code was NULL
1688           PCRE_ERROR_BADMAGIC   the "magic number" was not found           PCRE_ERROR_BADMAGIC   the "magic number" was not found
1689    
1690         If the optptr argument is not NULL, a copy of the  options  with  which         If  the  optptr  argument is not NULL, a copy of the options with which
1691         the  pattern  was  compiled  is placed in the integer it points to (see         the pattern was compiled is placed in the integer  it  points  to  (see
1692         PCRE_INFO_OPTIONS above).         PCRE_INFO_OPTIONS above).
1693    
1694         If the pattern is not anchored and the  firstcharptr  argument  is  not         If  the  pattern  is  not anchored and the firstcharptr argument is not
1695         NULL,  it is used to pass back information about the first character of         NULL, it is used to pass back information about the first character  of
1696         any matched string (see PCRE_INFO_FIRSTBYTE above).         any matched string (see PCRE_INFO_FIRSTBYTE above).
1697    
1698    
# Line 1699  REFERENCE COUNTS Line 1700  REFERENCE COUNTS
1700    
1701         int pcre_refcount(pcre *code, int adjust);         int pcre_refcount(pcre *code, int adjust);
1702    
1703         The pcre_refcount() function is used to maintain a reference  count  in         The  pcre_refcount()  function is used to maintain a reference count in
1704         the data block that contains a compiled pattern. It is provided for the         the data block that contains a compiled pattern. It is provided for the
1705         benefit of applications that  operate  in  an  object-oriented  manner,         benefit  of  applications  that  operate  in an object-oriented manner,
1706         where different parts of the application may be using the same compiled         where different parts of the application may be using the same compiled
1707         pattern, but you want to free the block when they are all done.         pattern, but you want to free the block when they are all done.
1708    
1709         When a pattern is compiled, the reference count field is initialized to         When a pattern is compiled, the reference count field is initialized to
1710         zero.   It is changed only by calling this function, whose action is to         zero.  It is changed only by calling this function, whose action is  to
1711         add the adjust value (which may be positive or  negative)  to  it.  The         add  the  adjust  value  (which may be positive or negative) to it. The
1712         yield of the function is the new value. However, the value of the count         yield of the function is the new value. However, the value of the count
1713         is constrained to lie between 0 and 65535, inclusive. If the new  value         is  constrained to lie between 0 and 65535, inclusive. If the new value
1714         is outside these limits, it is forced to the appropriate limit value.         is outside these limits, it is forced to the appropriate limit value.
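
The clamping rule above can be modelled in a few lines. This is only an illustration of the arithmetic as the text describes it, not PCRE's own source; the name adjust_refcount_model is hypothetical.

```c
#include <assert.h>

/* Model of the documented behaviour: add adjust to the current count and
 * force the result into the range 0..65535. Returns the new count. */
int adjust_refcount_model(int current, int adjust)
{
    assert(current >= 0 && current <= 65535);
    long long updated = (long long)current + adjust; /* avoid int overflow */
    if (updated < 0)
        updated = 0;
    if (updated > 65535)
        updated = 65535;
    return (int)updated;
}
```

For example, adjust_refcount_model(1, -5) gives 0 and adjust_refcount_model(65535, 10) gives 65535. An application that frees the block when the count reaches zero should therefore keep its increments and decrements balanced, because a count forced to a limit no longer reflects the true number of users.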
1715    
1716         Except  when it is zero, the reference count is not correctly preserved         Except when it is zero, the reference count is not correctly  preserved
1717         if a pattern is compiled on one host and then  transferred  to  a  host         if  a  pattern  is  compiled on one host and then transferred to a host
1718         whose byte-order is different. (This seems a highly unlikely scenario.)         whose byte-order is different. (This seems a highly unlikely scenario.)
1719    
1720    
# Line 1723  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1724  MATCHING A PATTERN: THE TRADITIONAL FUNC
1724              const char *subject, int length, int startoffset,              const char *subject, int length, int startoffset,
1725              int options, int *ovector, int ovecsize);              int options, int *ovector, int ovecsize);
1726    
1727         The function pcre_exec() is called to match a subject string against  a         The  function pcre_exec() is called to match a subject string against a
1728         compiled  pattern, which is passed in the code argument. If the pattern         compiled pattern, which is passed in the code argument. If the  pattern
1729         has been studied, the result of the study should be passed in the extra         has been studied, the result of the study should be passed in the extra
1730         argument.  This  function is the main matching facility of the library,         argument. This function is the main matching facility of  the  library,
1731         and it operates in a Perl-like manner. For specialist use there is also         and it operates in a Perl-like manner. For specialist use there is also
1732         an  alternative matching function, which is described below in the sec-         an alternative matching function, which is described below in the  sec-
1733         tion about the pcre_dfa_exec() function.         tion about the pcre_dfa_exec() function.
1734    
1735         In most applications, the pattern will have been compiled (and  option-         In  most applications, the pattern will have been compiled (and option-
1736         ally  studied)  in the same process that calls pcre_exec(). However, it         ally studied) in the same process that calls pcre_exec().  However,  it
1737         is possible to save compiled patterns and study data, and then use them         is possible to save compiled patterns and study data, and then use them
1738         later  in  different processes, possibly even on different hosts. For a         later in different processes, possibly even on different hosts.  For  a
1739         discussion about this, see the pcreprecompile documentation.         discussion about this, see the pcreprecompile documentation.
1740    
1741         Here is an example of a simple call to pcre_exec():         Here is an example of a simple call to pcre_exec():
# Line 1753  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1754  MATCHING A PATTERN: THE TRADITIONAL FUNC
1754    
1755     Extra data for pcre_exec()     Extra data for pcre_exec()
1756    
1757         If the extra argument is not NULL, it must point to a  pcre_extra  data         If  the  extra argument is not NULL, it must point to a pcre_extra data
1758         block.  The pcre_study() function returns such a block (when it doesn't         block. The pcre_study() function returns such a block (when it  doesn't
1759         return NULL), but you can also create one for yourself, and pass  addi-         return  NULL), but you can also create one for yourself, and pass addi-
1760         tional  information  in it. The pcre_extra block contains the following         tional information in it. The pcre_extra block contains  the  following
1761         fields (not necessarily in this order):         fields (not necessarily in this order):
1762    
1763           unsigned long int flags;           unsigned long int flags;
# Line 1766  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1767  MATCHING A PATTERN: THE TRADITIONAL FUNC
1767           void *callout_data;           void *callout_data;
1768           const unsigned char *tables;           const unsigned char *tables;
1769    
1770         The flags field is a bitmap that specifies which of  the  other  fields         The  flags  field  is a bitmap that specifies which of the other fields
1771         are set. The flag bits are:         are set. The flag bits are:
1772    
1773           PCRE_EXTRA_STUDY_DATA           PCRE_EXTRA_STUDY_DATA
# Line 1775  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1776  MATCHING A PATTERN: THE TRADITIONAL FUNC
1776           PCRE_EXTRA_CALLOUT_DATA           PCRE_EXTRA_CALLOUT_DATA
1777           PCRE_EXTRA_TABLES           PCRE_EXTRA_TABLES
1778    
1779         Other  flag  bits should be set to zero. The study_data field is set in         Other flag bits should be set to zero. The study_data field is  set  in
1780         the pcre_extra block that is returned by  pcre_study(),  together  with         the  pcre_extra  block  that is returned by pcre_study(), together with
1781         the appropriate flag bit. You should not set this yourself, but you may         the appropriate flag bit. You should not set this yourself, but you may
1782         add to the block by setting the other fields  and  their  corresponding         add  to  the  block by setting the other fields and their corresponding
1783         flag bits.         flag bits.
1784    
1785         The match_limit field provides a means of preventing PCRE from using up         The match_limit field provides a means of preventing PCRE from using up
1786         a vast amount of resources when running patterns that are not going  to         a  vast amount of resources when running patterns that are not going to
1787         match,  but  which  have  a very large number of possibilities in their         match, but which have a very large number  of  possibilities  in  their
1788         search trees. The classic  example  is  the  use  of  nested  unlimited         search  trees.  The  classic  example  is  the  use of nested unlimited
1789         repeats.         repeats.
1790    
1791         Internally,  PCRE uses a function called match() which it calls repeat-         Internally, PCRE uses a function called match() which it calls  repeat-
1792         edly (sometimes recursively). The limit set by match_limit  is  imposed         edly  (sometimes  recursively). The limit set by match_limit is imposed
1793         on  the  number  of times this function is called during a match, which         on the number of times this function is called during  a  match,  which
1794         has the effect of limiting the amount of  backtracking  that  can  take         has  the  effect  of  limiting the amount of backtracking that can take
1795         place. For patterns that are not anchored, the count restarts from zero         place. For patterns that are not anchored, the count restarts from zero
1796         for each position in the subject string.         for each position in the subject string.
1797    
1798         The default value for the limit can be set  when  PCRE  is  built;  the         The  default  value  for  the  limit can be set when PCRE is built; the
1799         default  default  is 10 million, which handles all but the most extreme         default default is 10 million, which handles all but the  most  extreme
1800         cases. You can override the default  by  supplying pcre_exec()  with  a         cases.  You  can  override  the default by supplying pcre_exec() with a
1801         pcre_extra     block    in    which    match_limit    is    set,    and         pcre_extra    block    in    which    match_limit    is    set,     and
1802         PCRE_EXTRA_MATCH_LIMIT is set in the  flags  field.  If  the  limit  is         PCRE_EXTRA_MATCH_LIMIT  is  set  in  the  flags  field. If the limit is
1803         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.         exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
1804    
1805         The  match_limit_recursion field is similar to match_limit, but instead         The match_limit_recursion field is similar to match_limit, but  instead
1806         of limiting the total number of times that match() is called, it limits         of limiting the total number of times that match() is called, it limits
1807         the  depth  of  recursion. The recursion depth is a smaller number than         the depth of recursion. The recursion depth is a  smaller  number  than
1808         the total number of calls, because not all calls to match() are  recur-         the  total number of calls, because not all calls to match() are recur-
1809         sive.  This limit is of use only if it is set smaller than match_limit.         sive.  This limit is of use only if it is set smaller than match_limit.
1810    
1811         Limiting the recursion depth limits the amount of  stack  that  can  be         Limiting  the  recursion  depth  limits the amount of stack that can be
1812         used, or, when PCRE has been compiled to use memory on the heap instead         used, or, when PCRE has been compiled to use memory on the heap instead
1813         of the stack, the amount of heap memory that can be used.         of the stack, the amount of heap memory that can be used.
1814    
1815         The default value for match_limit_recursion can be  set  when  PCRE  is         The  default  value  for  match_limit_recursion can be set when PCRE is
1816         built;  the  default  default  is  the  same  value  as the default for         built; the default default  is  the  same  value  as  the  default  for
1817         match_limit. You can override the default by supplying pcre_exec() with         match_limit. You can override the default by supplying pcre_exec() with
1818         a   pcre_extra   block  in  which  match_limit_recursion  is  set,  and         a  pcre_extra  block  in  which  match_limit_recursion  is   set,   and
1819         PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in  the  flags  field.  If  the         PCRE_EXTRA_MATCH_LIMIT_RECURSION  is  set  in  the  flags field. If the
1820         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.         limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
1821    
1822         The  pcre_callout  field is used in conjunction with the "callout" fea-         The pcre_callout field is used in conjunction with the  "callout"  fea-
1823         ture, which is described in the pcrecallout documentation.         ture, which is described in the pcrecallout documentation.
1824    
1825         The tables field  is  used  to  pass  a  character  tables  pointer  to         The  tables  field  is  used  to  pass  a  character  tables pointer to
1826         pcre_exec();  this overrides the value that is stored with the compiled         pcre_exec(); this overrides the value that is stored with the  compiled
1827         pattern. A non-NULL value is stored with the compiled pattern  only  if         pattern.  A  non-NULL value is stored with the compiled pattern only if
1828         custom  tables  were  supplied to pcre_compile() via its tableptr argu-         custom tables were supplied to pcre_compile() via  its  tableptr  argu-
1829         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces         ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
1830         PCRE's  internal  tables  to be used. This facility is helpful when re-         PCRE's internal tables to be used. This facility is  helpful  when  re-
1831         using patterns that have been saved after compiling  with  an  external         using  patterns  that  have been saved after compiling with an external
1832         set  of  tables,  because  the  external tables might be at a different         set of tables, because the external tables  might  be  at  a  different
1833         address when pcre_exec() is called. See the  pcreprecompile  documenta-         address  when  pcre_exec() is called. See the pcreprecompile documenta-
1834         tion for a discussion of saving compiled patterns for later use.         tion for a discussion of saving compiled patterns for later use.
1835    
1836     Option bits for pcre_exec()     Option bits for pcre_exec()
1837    
1838         The  unused  bits of the options argument for pcre_exec() must be zero.         The unused bits of the options argument for pcre_exec() must  be  zero.
1839         The only bits that may  be  set  are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,         The  only  bits  that  may  be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx,
1840         PCRE_NOTBOL,   PCRE_NOTEOL,   PCRE_NOTEMPTY,   PCRE_NO_UTF8_CHECK   and         PCRE_NOTBOL,   PCRE_NOTEOL,   PCRE_NOTEMPTY,   PCRE_NO_UTF8_CHECK   and
1841         PCRE_PARTIAL.         PCRE_PARTIAL.
1842    
1843           PCRE_ANCHORED           PCRE_ANCHORED
1844    
1845         The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first         The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first
1846         matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or         matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or
1847         turned out to be anchored by virtue of its contents, it cannot be  made         turned  out to be anchored by virtue of its contents, it cannot be made
1848         unanchored at matching time.        unanchored at matching time.
1849    
1850           PCRE_BSR_ANYCRLF           PCRE_BSR_ANYCRLF
1851           PCRE_BSR_UNICODE           PCRE_BSR_UNICODE
1852    
1853         These options (which are mutually exclusive) control what the \R escape         These options (which are mutually exclusive) control what the \R escape
1854         sequence matches. The choice is either to match only CR, LF,  or  CRLF,         sequence  matches.  The choice is either to match only CR, LF, or CRLF,
1855         or  to  match  any Unicode newline sequence. These options override the         or to match any Unicode newline sequence. These  options  override  the
1856         choice that was made or defaulted when the pattern was compiled.         choice that was made or defaulted when the pattern was compiled.
1857    
1858           PCRE_NEWLINE_CR           PCRE_NEWLINE_CR
# Line 1860  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 1861  MATCHING A PATTERN: THE TRADITIONAL FUNC
1861           PCRE_NEWLINE_ANYCRLF           PCRE_NEWLINE_ANYCRLF
1862           PCRE_NEWLINE_ANY           PCRE_NEWLINE_ANY
1863    
1864         These options override  the  newline  definition  that  was  chosen  or         These  options  override  the  newline  definition  that  was chosen or
1865         defaulted  when the pattern was compiled. For details, see the descrip-         defaulted when the pattern was compiled. For details, see the  descrip-
1866         tion of pcre_compile()  above.  During  matching,  the  newline  choice         tion  of  pcre_compile()  above.  During  matching,  the newline choice
1867         affects  the  behaviour  of the dot, circumflex, and dollar metacharac-         affects the behaviour of the dot, circumflex,  and  dollar  metacharac-
1868         ters. It may also alter the way the match position is advanced after  a         ters.  It may also alter the way the match position is advanced after a
1869         match failure for an unanchored pattern.         match failure for an unanchored pattern.
1870    
1871         When  PCRE_NEWLINE_CRLF,  PCRE_NEWLINE_ANYCRLF,  or PCRE_NEWLINE_ANY is         When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY  is
1872         set, and a match attempt for an unanchored pattern fails when the  cur-         set,  and a match attempt for an unanchored pattern fails when the cur-
1873         rent  position  is  at  a  CRLF  sequence,  and the pattern contains no         rent position is at a  CRLF  sequence,  and  the  pattern  contains  no
1874         explicit matches for  CR  or  LF  characters,  the  match  position  is         explicit  matches  for  CR  or  LF  characters,  the  match position is
1875         advanced by two characters instead of one, in other words, to after the         advanced by two characters instead of one, in other words, to after the
1876         CRLF.         CRLF.
1877    
1878         The above rule is a compromise that makes the most common cases work as         The above rule is a compromise that makes the most common cases work as
1879         expected.  For  example,  if  the  pattern  is .+A (and the PCRE_DOTALL         expected. For example, if the  pattern  is  .+A  (and  the  PCRE_DOTALL
1880         option is not set), it does not match the string "\r\nA" because, after         option is not set), it does not match the string "\r\nA" because, after
1881         failing  at the start, it skips both the CR and the LF before retrying.         failing at the start, it skips both the CR and the LF before  retrying.
1882         However, the pattern [\r\n]A does match that string,  because  it  con-         However,  the  pattern  [\r\n]A does match that string, because it con-
1883         tains an explicit CR or LF reference, and so advances only by one char-         tains an explicit CR or LF reference, and so advances only by one char-
1884         acter after the first failure.         acter after the first failure.
1885    
1886         An explicit match for CR or LF is either a literal appearance of one of         An explicit match for CR or LF is either a literal appearance of one of
1887         those  characters,  or  one  of the \r or \n escape sequences. Implicit         those characters, or one of the \r or  \n  escape  sequences.  Implicit
1888         matches such as [^X] do not count, nor does \s (which includes  CR  and         matches  such  as [^X] do not count, nor does \s (which includes CR and
1889         LF in the characters that it matches).         LF in the characters that it matches).
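
The advance rule described above can be sketched as a small standalone function. This is only an illustration of the rule as documented, not PCRE's internal code: the name next_attempt_offset and its two flag parameters are hypothetical, and the sketch assumes a non-UTF-8 subject, so that one character is one byte.

```c
/* Hypothetical helper, not part of the PCRE API.
 *
 * pos               offset at which an unanchored match attempt just failed
 * crlf_is_newline   nonzero if CRLF is a valid newline sequence
 *                   (PCRE_NEWLINE_CRLF, _ANYCRLF, or _ANY in effect)
 * has_explicit_crlf nonzero if the pattern contains a literal CR or LF,
 *                   or a \r or \n escape
 *
 * Returns the offset at which the next match attempt should start.
 */
int next_attempt_offset(const char *subject, int length, int pos,
                        int crlf_is_newline, int has_explicit_crlf)
{
    /* Is the failing position exactly at a CR that is followed by LF? */
    int at_crlf = pos + 1 < length &&
                  subject[pos] == '\r' && subject[pos + 1] == '\n';

    if (crlf_is_newline && !has_explicit_crlf && at_crlf)
        return pos + 2;   /* skip the whole CRLF sequence */

    return pos + 1;       /* ordinary advance by one character */
}
```

For the subject "\r\nA", a failure at offset 0 moves the next attempt to offset 2 for a pattern such as .+A, but only to offset 1 for [\r\n]A, which is why the second pattern can still match.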
1890    
1891         Notwithstanding  the above, anomalous effects may still occur when CRLF         Notwithstanding the above, anomalous effects may still occur when  CRLF
1892         is a valid newline sequence and explicit \r or \n escapes appear in the         is a valid newline sequence and explicit \r or \n escapes appear in the
1893         pattern.         pattern.
1894    
1895           PCRE_NOTBOL           PCRE_NOTBOL
1896    
1897         This option specifies that the first character of the subject string is not         This option specifies that the first character of the subject string is not
1898         the beginning of a line, so the  circumflex  metacharacter  should  not         the  beginning  of  a  line, so the circumflex metacharacter should not
1899         match  before it. Setting this without PCRE_MULTILINE (at compile time)         match before it. Setting this without PCRE_MULTILINE (at compile  time)
1900         causes circumflex never to match. This option affects only  the  behav-         causes  circumflex  never to match. This option affects only the behav-
1901         iour of the circumflex metacharacter. It does not affect \A.         iour of the circumflex metacharacter. It does not affect \A.
1902    
1903           PCRE_NOTEOL           PCRE_NOTEOL
1904    
1905         This option specifies that the end of the subject string is not the end         This option specifies that the end of the subject string is not the end
1906         of a line, so the dollar metacharacter should not match it nor  (except         of  a line, so the dollar metacharacter should not match it nor (except
1907         in  multiline mode) a newline immediately before it. Setting this with-         in multiline mode) a newline immediately before it. Setting this  with-
1908         out PCRE_MULTILINE (at compile time) causes dollar never to match. This         out PCRE_MULTILINE (at compile time) causes dollar never to match. This
1909         option  affects only the behaviour of the dollar metacharacter. It does         option affects only the behaviour of the dollar metacharacter. It  does
1910         not affect \Z or \z.         not affect \Z or \z.
1911    
1912           PCRE_NOTEMPTY           PCRE_NOTEMPTY
1913    
1914         An empty string is not considered to be a valid match if this option is         An empty string is not considered to be a valid match if this option is
1915         set.  If  there are alternatives in the pattern, they are tried. If all         set. If there are alternatives in the pattern, they are tried.  If  all
1916         the alternatives match the empty string, the entire  match  fails.  For         the  alternatives  match  the empty string, the entire match fails. For
1917         example, if the pattern         example, if the pattern
1918    
1919           a?b?           a?b?
1920    
1921         is  applied  to  a string not beginning with "a" or "b", it matches the         is applied to a string not beginning with "a" or "b",  it  matches  the
1922         empty string at the start of the subject. With PCRE_NOTEMPTY set,  this         empty  string at the start of the subject. With PCRE_NOTEMPTY set, this
1923         match is not valid, so PCRE searches further into the string for occur-         match is not valid, so PCRE searches further into the string for occur-
1924         rences of "a" or "b".         rences of "a" or "b".
1925    
1926         Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-         Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-
1927         cial  case  of  a  pattern match of the empty string within its split()         cial case of a pattern match of the empty  string  within  its  split()
1928         function, and when using the /g modifier. It  is  possible  to  emulate         function,  and  when  using  the /g modifier. It is possible to emulate
1929         Perl's behaviour after matching a null string by first trying the match         Perl's behaviour after matching a null string by first trying the match
1930         again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then         again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
1931         if  that  fails by advancing the starting offset (see below) and trying         if that fails by advancing the starting offset (see below)  and  trying
1932         an ordinary match again. There is some code that demonstrates how to do         an ordinary match again. There is some code that demonstrates how to do
1933         this in the pcredemo.c sample program.         this in the pcredemo.c sample program.
1934    
1935           PCRE_NO_UTF8_CHECK           PCRE_NO_UTF8_CHECK
1936    
1937         When PCRE_UTF8 is set at compile time, the validity of the subject as a         When PCRE_UTF8 is set at compile time, the validity of the subject as a
1938         UTF-8 string is automatically checked when pcre_exec() is  subsequently         UTF-8  string is automatically checked when pcre_exec() is subsequently
1939         called.   The  value  of  startoffset is also checked to ensure that it         called.  The value of startoffset is also checked  to  ensure  that  it
1940         points to the start of a UTF-8 character. There is a  discussion  about         points  to  the start of a UTF-8 character. There is a discussion about
1941         the  validity  of  UTF-8 strings in the section on UTF-8 support in the         the validity of UTF-8 strings in the section on UTF-8  support  in  the
1942         main pcre page. If  an  invalid  UTF-8  sequence  of  bytes  is  found,         main  pcre  page.  If  an  invalid  UTF-8  sequence  of bytes is found,
1943         pcre_exec()  returns  the error PCRE_ERROR_BADUTF8. If startoffset con-         pcre_exec() returns the error PCRE_ERROR_BADUTF8. If  startoffset  con-
1944         tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.         tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.
1945    
1946         If you already know that your subject is valid, and you  want  to  skip         If  you  already  know that your subject is valid, and you want to skip
1947         these    checks    for   performance   reasons,   you   can   set   the         these   checks   for   performance   reasons,   you   can    set    the
1948         PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might  want  to         PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
1949         do  this  for the second and subsequent calls to pcre_exec() if you are         do this for the second and subsequent calls to pcre_exec() if  you  are
1950         making repeated calls to find all  the  matches  in  a  single  subject         making  repeated  calls  to  find  all  the matches in a single subject
1951         string.  However,  you  should  be  sure  that the value of startoffset         string. However, you should be  sure  that  the  value  of  startoffset
1952         points to the start of a UTF-8 character.  When  PCRE_NO_UTF8_CHECK  is         points  to  the  start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
1953         set,  the  effect of passing an invalid UTF-8 string as a subject, or a         set, the effect of passing an invalid UTF-8 string as a subject,  or  a
1954         value of startoffset that does not point to the start of a UTF-8  char-         value  of startoffset that does not point to the start of a UTF-8 char-
1955         acter, is undefined. Your program may crash.         acter, is undefined. Your program may crash.
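
Before setting PCRE_NO_UTF8_CHECK, a caller may want to confirm for itself that startoffset lies on a character boundary. The following is a minimal sketch, assuming the subject has already been validated as UTF-8. The function name is hypothetical, and treating the end of the subject as an acceptable position is an assumption of this sketch rather than something stated above.

```c
/* Hypothetical helper, not part of the PCRE API.
 *
 * In valid UTF-8, every byte that continues a multi-byte character has the
 * bit pattern 10xxxxxx. Any other byte starts a character.
 *
 * Returns nonzero if offset is at the start of a character (or at the end
 * of the subject), zero otherwise.
 */
int is_utf8_char_start(const char *subject, int length, int offset)
{
    if (offset < 0 || offset > length)
        return 0;                       /* outside the subject */
    if (offset == length)
        return 1;                       /* end of subject: assumed acceptable */
    return ((unsigned char)subject[offset] & 0xC0) != 0x80;
}
```

This test only inspects a single byte. It says nothing about whether the subject as a whole is valid UTF-8, so it does not replace the full check that pcre_exec() performs when PCRE_NO_UTF8_CHECK is not set.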
1956    
1957           PCRE_PARTIAL           PCRE_PARTIAL
1958    
1959         This  option  turns  on  the  partial  matching feature. If the subject         This option turns on the  partial  matching  feature.  If  the  subject
1960         string fails to match the pattern, but at some point during the  match-         string  fails to match the pattern, but at some point during the match-
1961         ing  process  the  end of the subject was reached (that is, the subject         ing process the end of the subject was reached (that  is,  the  subject
1962         partially matches the pattern and the failure to  match  occurred  only         partially  matches  the  pattern and the failure to match occurred only
1963         because  there were not enough subject characters), pcre_exec() returns         because there were not enough subject characters), pcre_exec()  returns
1964         PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL  is         PCRE_ERROR_PARTIAL  instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is
1965         used,  there  are restrictions on what may appear in the pattern. These         used, there are restrictions on what may appear in the  pattern.  These
1966         are discussed in the pcrepartial documentation.         are discussed in the pcrepartial documentation.
1967    
1968     The string to be matched by pcre_exec()     The string to be matched by pcre_exec()
1969    
1970         The subject string is passed to pcre_exec() as a pointer in subject,  a         The  subject string is passed to pcre_exec() as a pointer in subject, a
1971         length  in  length, and a starting byte offset in startoffset. In UTF-8         length in length, and a starting byte offset in startoffset.  In  UTF-8
1972         mode, the byte offset must point to the start  of  a  UTF-8  character.         mode,  the  byte  offset  must point to the start of a UTF-8 character.
1973         Unlike  the  pattern string, the subject may contain binary zero bytes.         Unlike the pattern string, the subject may contain binary  zero  bytes.
1974         When the starting offset is zero, the search for a match starts at  the         When  the starting offset is zero, the search for a match starts at the
1975         beginning of the subject, and this is by far the most common case.         beginning of the subject, and this is by far the most common case.
1976    
1977         A  non-zero  starting offset is useful when searching for another match         A non-zero starting offset is useful when searching for  another  match
1978         in the same subject by calling pcre_exec() again after a previous  suc-         in  the same subject by calling pcre_exec() again after a previous suc-
1979         cess.   Setting  startoffset differs from just passing over a shortened         cess.  Setting startoffset differs from just passing over  a  shortened
1980         string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins         string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
1981         with any kind of lookbehind. For example, consider the pattern         with any kind of lookbehind. For example, consider the pattern
1982    
1983           \Biss\B           \Biss\B
1984    
1985         which  finds  occurrences  of "iss" in the middle of words. (\B matches         which finds occurrences of "iss" in the middle of  words.  (\B  matches
1986         only if the current position in the subject is not  a  word  boundary.)         only  if  the  current position in the subject is not a word boundary.)
1987         When  applied  to the string "Mississipi" the first call to pcre_exec()         When applied to the string "Mississipi" the first call  to  pcre_exec()
1988         finds the first occurrence. If pcre_exec() is called  again  with  just         finds  the  first  occurrence. If pcre_exec() is called again with just
1989         the  remainder  of  the  subject,  namely  "issipi", it does not match,         the remainder of the subject,  namely  "issipi",  it  does  not  match,
1990         because \B is always false at the start of the subject, which is deemed         because \B is always false at the start of the subject, which is deemed
1991         to  be  a  word  boundary. However, if pcre_exec() is passed the entire         to be a word boundary. However, if pcre_exec()  is  passed  the  entire
1992         string again, but with startoffset set to 4, it finds the second occur-         string again, but with startoffset set to 4, it finds the second occur-
1993         rence  of "iss" because it is able to look behind the starting point to         rence of "iss" because it is able to look behind the starting point  to
1994         discover that it is preceded by a letter.         discover that it is preceded by a letter.
1995    
1996         If a non-zero starting offset is passed when the pattern  is  anchored,         If  a  non-zero starting offset is passed when the pattern is anchored,
1997         one attempt to match at the given offset is made. This can only succeed         one attempt to match at the given offset is made. This can only succeed
1998         if the pattern does not require the match to be at  the  start  of  the         if  the  pattern  does  not require the match to be at the start of the
1999         subject.         subject.
2000    
2001     How pcre_exec() returns captured substrings     How pcre_exec() returns captured substrings
2002    
2003         In  general, a pattern matches a certain portion of the subject, and in         In general, a pattern matches a certain portion of the subject, and  in
2004         addition, further substrings from the subject  may  be  picked  out  by         addition,  further  substrings  from  the  subject may be picked out by
2005         parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,         parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
2006         this is called "capturing" in what follows, and the  phrase  "capturing         this  is  called "capturing" in what follows, and the phrase "capturing
2007         subpattern"  is  used for a fragment of a pattern that picks out a sub-         subpattern" is used for a fragment of a pattern that picks out  a  sub-
2008         string. PCRE supports several other kinds of  parenthesized  subpattern         string.  PCRE  supports several other kinds of parenthesized subpattern
2009         that do not cause substrings to be captured.         that do not cause substrings to be captured.
2010    
2011         Captured  substrings are returned to the caller via a vector of integer         Captured substrings are returned to the caller via a vector of  integer
2012         offsets whose address is passed in ovector. The number of  elements  in         offsets  whose  address is passed in ovector. The number of elements in
2013         the  vector is passed in ovecsize, which must be a non-negative number.         the vector is passed in ovecsize, which must be a non-negative  number.
2014         Note: this argument is NOT the size of ovector in bytes.         Note: this argument is NOT the size of ovector in bytes.
2015    
2016         The first two-thirds of the vector is used to pass back  captured  sub-         The  first  two-thirds of the vector is used to pass back captured sub-
2017         strings,  each  substring using a pair of integers. The remaining third         strings, each substring using a pair of integers. The  remaining  third
2018         of the vector is used as workspace by pcre_exec() while  matching  cap-         of  the  vector is used as workspace by pcre_exec() while matching cap-
2019         turing  subpatterns, and is not available for passing back information.         turing subpatterns, and is not available for passing back  information.
2020         The length passed in ovecsize should always be a multiple of three.  If         The  length passed in ovecsize should always be a multiple of three. If
2021         it is not, it is rounded down.         it is not, it is rounded down.
2022    
2023         When  a  match  is successful, information about captured substrings is         When a match is successful, information about  captured  substrings  is
2024         returned in pairs of integers, starting at the  beginning  of  ovector,         returned  in  pairs  of integers, starting at the beginning of ovector,
2025         and  continuing  up  to two-thirds of its length at the most. The first         and continuing up to two-thirds of its length at the  most.  The  first
2026         element of a pair is set to the offset of the first character in a sub-         element of a pair is set to the offset of the first character in a sub-
2027         string,  and  the  second  is  set to the offset of the first character         string, and the second is set to the  offset  of  the  first  character
2028         after the end of a substring. The  first  pair,  ovector[0]  and  ovec-         after  the  end  of  a  substring. The first pair, ovector[0] and ovec-
2029         tor[1],  identify  the  portion  of  the  subject string matched by the         tor[1], identify the portion of  the  subject  string  matched  by  the
2030         entire pattern. The next pair is used for the first  capturing  subpat-         entire  pattern.  The next pair is used for the first capturing subpat-
2031         tern, and so on. The value returned by pcre_exec() is one more than the         tern, and so on. The value returned by pcre_exec() is one more than the
2032         highest numbered pair that has been set. For example, if two substrings         highest numbered pair that has been set. For example, if two substrings
2033         have  been captured, the returned value is 3. If there are no capturing         have been captured, the returned value is 3. If there are no  capturing
2034         subpatterns, the return value from a successful match is 1,  indicating         subpatterns,  the return value from a successful match is 1, indicating
2035         that just the first pair of offsets has been set.         that just the first pair of offsets has been set.
2036    
2037         If a capturing subpattern is matched repeatedly, it is the last portion         If a capturing subpattern is matched repeatedly, it is the last portion
2038         of the string that it matched that is returned.         of the string that it matched that is returned.
2039    
2040         If the vector is too small to hold all the captured substring  offsets,         If  the vector is too small to hold all the captured substring offsets,
2041         it is used as far as possible (up to two-thirds of its length), and the         it is used as far as possible (up to two-thirds of its length), and the
2042         function returns a value of zero. In particular, if the substring  off-         function  returns a value of zero. In particular, if the substring off-
2043         sets are not of interest, pcre_exec() may be called with ovector passed         sets are not of interest, pcre_exec() may be called with ovector passed
2044         as NULL and ovecsize as zero. However, if  the  pattern  contains  back         as  NULL  and  ovecsize  as zero. However, if the pattern contains back
2045         references  and  the  ovector is not big enough to remember the related         references and the ovector is not big enough to  remember  the  related
2046         substrings, PCRE has to get additional memory for use during  matching.         substrings,  PCRE has to get additional memory for use during matching.
2047         Thus it is usually advisable to supply an ovector.         Thus it is usually advisable to supply an ovector.
2048    
2049         The  pcre_info()  function  can  be used to find out how many capturing         The pcre_info() function can be used to find  out  how  many  capturing
2050         subpatterns there are in a compiled  pattern.  The  smallest  size  for         subpatterns  there  are  in  a  compiled pattern. The smallest size for
2051         ovector  that  will allow for n captured substrings, in addition to the         ovector that will allow for n captured substrings, in addition  to  the
2052         offsets of the substring matched by the whole pattern, is (n+1)*3.         offsets of the substring matched by the whole pattern, is (n+1)*3.
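
The sizing arithmetic above can be written down directly. The helpers below are hypothetical names used only for illustration. They restate three facts from the text: the minimum size for n capturing subpatterns is (n+1)*3, ovecsize is rounded down to a multiple of three, and only the first two-thirds of the vector carries offset pairs.

```c
/* Hypothetical helpers, not part of the PCRE API. */

/* Smallest ovecsize that holds the whole-pattern pair plus n captures. */
int ovector_size_for(int n)
{
    return (n + 1) * 3;
}

/* The ovecsize value effectively used: rounded down to a multiple of 3. */
int usable_ovecsize(int ovecsize)
{
    return ovecsize - ovecsize % 3;
}

/* Number of offset pairs that fit in the first two-thirds of the vector.
 * Each pair takes two ints and one further int per pair is workspace. */
int max_pairs(int ovecsize)
{
    return ovecsize / 3;
}
```

For example, a pattern with two capturing subpatterns needs an ovector of at least 9 ints, while a vector declared with 10 ints is treated as if it had 9 and can return 3 pairs.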
2053    
2054         It is possible for capturing subpattern number n+1 to match  some  part         It  is  possible for capturing subpattern number n+1 to match some part
2055         of the subject when subpattern n has not been used at all. For example,         of the subject when subpattern n has not been used at all. For example,
2056         if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the         if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
2057         return from the function is 4, and subpatterns 1 and 3 are matched, but         return from the function is 4, and subpatterns 1 and 3 are matched, but
2058         2 is not. When this happens, both values in  the  offset  pairs  corre-         2  is  not.  When  this happens, both values in the offset pairs corre-
2059         sponding to unused subpatterns are set to -1.         sponding to unused subpatterns are set to -1.
2060    
2061         Offset  values  that correspond to unused subpatterns at the end of the         Offset values that correspond to unused subpatterns at the end  of  the
2062         expression are also set to -1. For example,  if  the  string  "abc"  is         expression  are  also  set  to  -1. For example, if the string "abc" is
2063         matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not         matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
2064         matched. The return from the function is 2, because  the  highest  used         matched.  The  return  from the function is 2, because the highest used
2065         capturing subpattern number is 1. However, you can refer to the offsets         capturing subpattern number is 1. However, you can refer to the offsets
2066         for the second and third capturing subpatterns if  you  wish  (assuming         for  the  second  and third capturing subpatterns if you wish (assuming
2067         the vector is large enough, of course).         the vector is large enough, of course).
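
Reading a capture directly from the offset pairs, including the -1 convention for unused subpatterns, might look like the sketch below. The function copy_capture and its return codes are hypothetical and are not the library's pcre_copy_substring(). The sketch assumes ovector was filled by a successful match against subject and that ovecsize is the value that was passed to pcre_exec().

```c
#include <string.h>

/* Hypothetical helper, not part of the PCRE API.
 *
 * Copies capture number n (0 = whole match) into buf as a zero-terminated
 * string. Returns the substring length, -1 if the subpattern was not used
 * or its pair lies outside the vector, or -2 if buf is too small.
 */
int copy_capture(const char *subject, const int *ovector, int ovecsize,
                 int n, char *buf, int bufsize)
{
    int start, end, len;

    /* Only the first two-thirds of the vector holds pairs. */
    if (n < 0 || n >= ovecsize / 3)
        return -1;

    start = ovector[2 * n];       /* offset of first character */
    end   = ovector[2 * n + 1];   /* offset just past the last character */

    if (start < 0 || end < start)
        return -1;                /* both values are -1 for unused pairs */

    len = end - start;
    if (len >= bufsize)
        return -2;                /* no room for the text plus terminator */

    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

With the subject "abc" and the pattern (a|(z))(bc), the vector begins 0,3, 0,1, -1,-1, 1,3, so captures 1 and 3 yield "a" and "bc" and capture 2 is reported as unused.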
2068    
2069         Some  convenience  functions  are  provided for extracting the captured         Some convenience functions are provided  for  extracting  the  captured
2070         substrings as separate strings. These are described below.         substrings as separate strings. These are described below.
2071    
2072     Error return values from pcre_exec()     Error return values from pcre_exec()
2073    
2074         If pcre_exec() fails, it returns a negative number. The  following  are         If  pcre_exec()  fails, it returns a negative number. The following are
2075         defined in the header file:         defined in the header file:
2076    
2077           PCRE_ERROR_NOMATCH        (-1)           PCRE_ERROR_NOMATCH        (-1)
# Line 2079  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2080  MATCHING A PATTERN: THE TRADITIONAL FUNC
2080    
2081           PCRE_ERROR_NULL           (-2)           PCRE_ERROR_NULL           (-2)
2082    
2083         Either  code  or  subject  was  passed as NULL, or ovector was NULL and         Either code or subject was passed as NULL,  or  ovector  was  NULL  and
2084         ovecsize was not zero.         ovecsize was not zero.
2085    
2086           PCRE_ERROR_BADOPTION      (-3)           PCRE_ERROR_BADOPTION      (-3)
# Line 2088  MATCHING A PATTERN: THE TRADITIONAL FUNC Line 2089  MATCHING A PATTERN: THE TRADITIONAL FUNC
2089    
2090           PCRE_ERROR_BADMAGIC       (-4)           PCRE_ERROR_BADMAGIC       (-4)
2091    
2092         PCRE stores a 4-byte "magic number" at the start of the compiled  code,         PCRE  stores a 4-byte "magic number" at the start of the compiled code,
2093         to catch the case when it is passed a junk pointer and to detect when a         to catch the case when it is passed a junk pointer and to detect when a
2094         pattern that was compiled in an environment of one endianness is run in         pattern that was compiled in an environment of one endianness is run in
2095         an  environment  with the other endianness. This is the error that PCRE         an environment with the other endianness. This is the error  that  PCRE
2096         gives when the magic number is not present.         gives when the magic number is not present.
2097    
2098           PCRE_ERROR_UNKNOWN_OPCODE (-5)           PCRE_ERROR_UNKNOWN_OPCODE (-5)
2099    
2100         While running the pattern match, an unknown item was encountered in the         While running the pattern match, an unknown item was encountered in the
2101         compiled  pattern.  This  error  could be caused by a bug in PCRE or by         compiled pattern. This error could be caused by a bug  in  PCRE  or  by
2102         overwriting of the compiled pattern.         overwriting of the compiled pattern.
2103    
2104           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2105    
2106         If a pattern contains back references, but the ovector that  is  passed         If  a  pattern contains back references, but the ovector that is passed
2107         to pcre_exec() is not big enough to remember the referenced substrings,         to pcre_exec() is not big enough to remember the referenced substrings,
2108         PCRE gets a block of memory at the start of matching to  use  for  this         PCRE  gets  a  block of memory at the start of matching to use for this
2109         purpose.  If the call via pcre_malloc() fails, this error is given. The         purpose. If the call via pcre_malloc() fails, this error is given.  The
2110         memory is automatically freed at the end of matching.         memory is automatically freed at the end of matching.
2111    
2112           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
2113    
2114         This error is used by the pcre_copy_substring(),  pcre_get_substring(),         This  error is used by the pcre_copy_substring(), pcre_get_substring(),
2115         and  pcre_get_substring_list()  functions  (see  below).  It  is  never         and  pcre_get_substring_list()  functions  (see  below).  It  is  never
2116         returned by pcre_exec().         returned by pcre_exec().
2117    
2118           PCRE_ERROR_MATCHLIMIT     (-8)           PCRE_ERROR_MATCHLIMIT     (-8)
2119    
2120         The backtracking limit, as specified by  the  match_limit  field  in  a         The  backtracking  limit,  as  specified  by the match_limit field in a
2121         pcre_extra  structure  (or  defaulted) was reached. See the description         pcre_extra structure (or defaulted) was reached.  See  the  description
2122         above.         above.
2123    
2124           PCRE_ERROR_CALLOUT        (-9)           PCRE_ERROR_CALLOUT        (-9)
2125    
2126         This error is never generated by pcre_exec() itself. It is provided for         This error is never generated by pcre_exec() itself. It is provided for
2127         use  by  callout functions that want to yield a distinctive error code.         use by callout functions that want to yield a distinctive  error  code.
2128         See the pcrecallout documentation for details.         See the pcrecallout documentation for details.
2129    
2130           PCRE_ERROR_BADUTF8        (-10)           PCRE_ERROR_BADUTF8        (-10)
2131    
2132         A string that contains an invalid UTF-8 byte sequence was passed  as  a         A  string  that contains an invalid UTF-8 byte sequence was passed as a
2133         subject.         subject.
2134    
2135           PCRE_ERROR_BADUTF8_OFFSET (-11)           PCRE_ERROR_BADUTF8_OFFSET (-11)
2136    
2137         The UTF-8 byte sequence that was passed as a subject was valid, but the         The UTF-8 byte sequence that was passed as a subject was valid, but the
2138         value of startoffset did not point to the beginning of a UTF-8  charac-         value  of startoffset did not point to the beginning of a UTF-8 charac-
2139         ter.         ter.
2140    
2141           PCRE_ERROR_PARTIAL        (-12)           PCRE_ERROR_PARTIAL        (-12)
2142    
2143         The  subject  string did not match, but it did match partially. See the         The subject string did not match, but it did match partially.  See  the
2144         pcrepartial documentation for details of partial matching.         pcrepartial documentation for details of partial matching.
2145    
2146           PCRE_ERROR_BADPARTIAL     (-13)           PCRE_ERROR_BADPARTIAL     (-13)
2147    
2148         The PCRE_PARTIAL option was used with  a  compiled  pattern  containing         The  PCRE_PARTIAL  option  was  used with a compiled pattern containing
2149         items  that are not supported for partial matching. See the pcrepartial         items that are not supported for partial matching. See the  pcrepartial
2150         documentation for details of partial matching.         documentation for details of partial matching.
2151    
2152           PCRE_ERROR_INTERNAL       (-14)           PCRE_ERROR_INTERNAL       (-14)
2153    
2154         An unexpected internal error has occurred. This error could  be  caused         An  unexpected  internal error has occurred. This error could be caused
2155         by a bug in PCRE or by overwriting of the compiled pattern.         by a bug in PCRE or by overwriting of the compiled pattern.
2156    
2157           PCRE_ERROR_BADCOUNT       (-15)           PCRE_ERROR_BADCOUNT       (-15)
2158    
2159         This  error is given if the value of the ovecsize argument is negative.         This error is given if the value of the ovecsize argument is  negative.
2160    
2161           PCRE_ERROR_RECURSIONLIMIT (-21)           PCRE_ERROR_RECURSIONLIMIT (-21)
2162    
2163         The internal recursion limit, as specified by the match_limit_recursion         The internal recursion limit, as specified by the match_limit_recursion
2164         field  in  a  pcre_extra  structure (or defaulted) was reached. See the         field in a pcre_extra structure (or defaulted)  was  reached.  See  the
2165         description above.         description above.
2166    
2167           PCRE_ERROR_BADNEWLINE     (-23)           PCRE_ERROR_BADNEWLINE     (-23)
# Line 2183  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER Line 2184  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
2184         int pcre_get_substring_list(const char *subject,         int pcre_get_substring_list(const char *subject,
2185              int *ovector, int stringcount, const char ***listptr);              int *ovector, int stringcount, const char ***listptr);
2186    
2187         Captured substrings can be  accessed  directly  by  using  the  offsets         Captured  substrings  can  be  accessed  directly  by using the offsets
2188         returned  by  pcre_exec()  in  ovector.  For convenience, the functions         returned by pcre_exec() in  ovector.  For  convenience,  the  functions
2189         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-         pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-
2190         string_list()  are  provided for extracting captured substrings as new,         string_list() are provided for extracting captured substrings  as  new,
2191         separate, zero-terminated strings. These functions identify  substrings         separate,  zero-terminated strings. These functions identify substrings
2192         by  number.  The  next section describes functions for extracting named         by number. The next section describes functions  for  extracting  named
2193         substrings.         substrings.
2194    
2195         A substring that contains a binary zero is correctly extracted and  has         A  substring that contains a binary zero is correctly extracted and has
2196         a  further zero added on the end, but the result is not, of course, a C         a further zero added on the end, but the result is not, of course, a  C
2197         string.  However, you can process such a string  by  referring  to  the         string.   However,  you  can  process such a string by referring to the
2198         length  that  is  returned  by  pcre_copy_substring() and pcre_get_sub-         length that is  returned  by  pcre_copy_substring()  and  pcre_get_sub-
2199         string().  Unfortunately, the interface to pcre_get_substring_list() is         string().  Unfortunately, the interface to pcre_get_substring_list() is
2200         not  adequate for handling strings containing binary zeros, because the         not adequate for handling strings containing binary zeros, because  the
2201         end of the final string is not independently indicated.         end of the final string is not independently indicated.
2202    
2203         The first three arguments are the same for all  three  of  these  func-         The  first  three  arguments  are the same for all three of these func-
2204         tions:  subject  is  the subject string that has just been successfully         tions: subject is the subject string that has  just  been  successfully
2205         matched, ovector is a pointer to the vector of integer offsets that was         matched, ovector is a pointer to the vector of integer offsets that was
2206         passed to pcre_exec(), and stringcount is the number of substrings that         passed to pcre_exec(), and stringcount is the number of substrings that
2207         were captured by the match, including the substring  that  matched  the         were  captured  by  the match, including the substring that matched the
2208         entire regular expression. This is the value returned by pcre_exec() if         entire regular expression. This is the value returned by pcre_exec() if
2209         it is greater than zero. If pcre_exec() returned zero, indicating  that         it  is greater than zero. If pcre_exec() returned zero, indicating that
2210         it  ran out of space in ovector, the value passed as stringcount should         it ran out of space in ovector, the value passed as stringcount  should
2211         be the number of elements in the vector divided by three.         be the number of elements in the vector divided by three.
2212    
2213         The functions pcre_copy_substring() and pcre_get_substring() extract  a         The  functions pcre_copy_substring() and pcre_get_substring() extract a
2214         single  substring,  whose  number  is given as stringnumber. A value of         single substring, whose number is given as  stringnumber.  A  value  of
2215         zero extracts the substring that matched the  entire  pattern,  whereas         zero  extracts  the  substring that matched the entire pattern, whereas
2216         higher  values  extract  the  captured  substrings.  For pcre_copy_sub-         higher values  extract  the  captured  substrings.  For  pcre_copy_sub-
2217         string(), the string is placed in buffer,  whose  length  is  given  by         string(),  the  string  is  placed  in buffer, whose length is given by
2218         buffersize,  while  for  pcre_get_substring()  a new block of memory is         buffersize, while for pcre_get_substring() a new  block  of  memory  is
2219         obtained via pcre_malloc, and its address is  returned  via  stringptr.         obtained  via  pcre_malloc,  and its address is returned via stringptr.
2220         The  yield  of  the function is the length of the string, not including         The yield of the function is the length of the  string,  not  including
2221         the terminating zero, or one of these error codes:         the terminating zero, or one of these error codes:
2222    
2223           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2224    
2225         The buffer was too small for pcre_copy_substring(), or the  attempt  to         The  buffer  was too small for pcre_copy_substring(), or the attempt to
2226         get memory failed for pcre_get_substring().         get memory failed for pcre_get_substring().
2227    
2228           PCRE_ERROR_NOSUBSTRING    (-7)           PCRE_ERROR_NOSUBSTRING    (-7)
2229    
2230         There is no substring whose number is stringnumber.         There is no substring whose number is stringnumber.
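
       As an illustration of the direct access mentioned at the start of this
       section, the following is a minimal sketch of extracting a substring by
       number from the offsets in ovector. It assumes the layout described for
       pcre_exec(), in which substring n occupies ovector[2*n] (start) and
       ovector[2*n+1] (end). The names demo_stringcount, demo_copy_by_offsets
       and the DEMO_ constants are hypothetical; this is not PCRE's own code,
       only an imitation of the return conventions documented above.

```c
#include <assert.h>
#include <string.h>

/* Error values as listed in this document (names are hypothetical). */
#define DEMO_ERROR_NOMEMORY    (-6)
#define DEMO_ERROR_NOSUBSTRING (-7)

/* Value to pass as stringcount: the yield of pcre_exec() if it is greater
   than zero, otherwise (ovector was too small) the vector size over three. */
static int demo_stringcount(int rc, int ovecsize)
{
    return (rc > 0) ? rc : ovecsize / 3;
}

/* Copy captured substring n into buffer as a zero-terminated string.
   Returns the length, not including the terminating zero, or an error. */
static int demo_copy_by_offsets(const char *subject, const int *ovector,
                                int stringcount, int n,
                                char *buffer, int buffersize)
{
    int start, length;

    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (n < 0 || n >= stringcount) return DEMO_ERROR_NOSUBSTRING;

    start = ovector[2 * n];
    length = ovector[2 * n + 1] - start;
    if (start < 0) {            /* unset substring: treat as empty */
        start = 0;
        length = 0;
    }
    if (buffersize < length + 1) return DEMO_ERROR_NOMEMORY;

    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

       Because the length is returned separately, a caller can still process
       a substring that contains binary zeros, as discussed above.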
2231    
2232         The  pcre_get_substring_list()  function  extracts  all  available sub-         The pcre_get_substring_list()  function  extracts  all  available  sub-
2233         strings and builds a list of pointers to them. All this is  done  in  a         strings  and  builds  a list of pointers to them. All this is done in a
2234         single block of memory that is obtained via pcre_malloc. The address of         single block of memory that is obtained via pcre_malloc. The address of
2235         the memory block is returned via listptr, which is also  the  start  of         the  memory  block  is returned via listptr, which is also the start of
2236         the  list  of  string pointers. The end of the list is marked by a NULL         the list of string pointers. The end of the list is marked  by  a  NULL
2237         pointer. The yield of the function is zero if all  went  well,  or  the         pointer.  The  yield  of  the function is zero if all went well, or the
2238         error code         error code
2239    
2240           PCRE_ERROR_NOMEMORY       (-6)           PCRE_ERROR_NOMEMORY       (-6)
2241    
2242         if the attempt to get the memory block failed.         if the attempt to get the memory block failed.
2243    
2244         When  any of these functions encounters a substring that is unset, which         When any of these functions encounters a substring that is unset,  which
2245         can happen when capturing subpattern number n+1 matches  some  part  of         can  happen  when  capturing subpattern number n+1 matches some part of
2246         the  subject, but subpattern n has not been used at all, they return an         the subject, but subpattern n has not been used at all, they return  an
2247         empty string. This can be distinguished from a genuine zero-length sub-         empty string. This can be distinguished from a genuine zero-length sub-
2248         string  by inspecting the appropriate offset in ovector, which is nega-         string by inspecting the appropriate offset in ovector, which is  nega-
2249         tive for unset substrings.         tive for unset substrings.
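
       The test described in the previous paragraph can be sketched as below.
       The helper name demo_substring_is_unset is hypothetical, and the sketch
       assumes the same ovector layout as pcre_exec() uses (two offsets per
       substring, start first).

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if captured substring n is unset (its start offset is
   negative), or 0 if it is set, possibly to a zero-length string. */
static int demo_substring_is_unset(const int *ovector, int n)
{
    assert(ovector != NULL && n >= 0);
    return ovector[2 * n] < 0;
}
```

       For example, if the pattern (a)|(b) is matched against the subject
       "b", subpattern 2 is set but subpattern 1 is not, so the helper yields
       1 for n = 1 and 0 for n = 2.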
2250    
2251         The two convenience functions pcre_free_substring() and  pcre_free_sub-         The  two convenience functions pcre_free_substring() and pcre_free_sub-
2252         string_list()  can  be  used  to free the memory returned by a previous         string_list() can be used to free the memory  returned  by  a  previous
2253         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-         call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-
2254         tively.  They  do  nothing  more  than  call the function pointed to by         tively. They do nothing more than  call  the  function  pointed  to  by
2255         pcre_free, which of course could be called directly from a  C  program.         pcre_free,  which  of course could be called directly from a C program.
2256         However,  PCRE is used in some situations where it is linked via a spe-         However, PCRE is used in some situations where it is linked via a  spe-
2257         cial  interface  to  another  programming  language  that  cannot   use         cial   interface  to  another  programming  language  that  cannot  use
2258         pcre_free  directly;  it is for these cases that the functions are pro-         pcre_free directly; it is for these cases that the functions  are  pro-
2259         vided.         vided.
2260    
2261    
# Line 2273  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 2274  EXTRACTING CAPTURED SUBSTRINGS BY NAME
2274              int stringcount, const char *stringname,              int stringcount, const char *stringname,
2275              const char **stringptr);              const char **stringptr);
2276    
2277         To extract a substring by name, you first have to find the  associated         To  extract a substring by name, you first have to find the associated
2278         number. For example, for this pattern         number. For example, for this pattern
2279    
2280           (a+)b(?<xxx>\d+)...           (a+)b(?<xxx>\d+)...
# Line 2282  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 2283  EXTRACTING CAPTURED SUBSTRINGS BY NAME
2283         be unique (PCRE_DUPNAMES was not set), you can find the number from the         be unique (PCRE_DUPNAMES was not set), you can find the number from the
2284         name by calling pcre_get_stringnumber(). The first argument is the com-         name by calling pcre_get_stringnumber(). The first argument is the com-
2285         piled pattern, and the second is the name. The yield of the function is         piled pattern, and the second is the name. The yield of the function is
2286         the  subpattern  number,  or PCRE_ERROR_NOSUBSTRING (-7) if there is no         the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if  there  is  no
2287         subpattern of that name.         subpattern of that name.
2288    
2289         Given the number, you can extract the substring directly, or use one of         Given the number, you can extract the substring directly, or use one of
2290         the functions described in the previous section. For convenience, there         the functions described in the previous section. For convenience, there
2291         are also two functions that do the whole job.         are also two functions that do the whole job.
2292    
2293         Most   of   the   arguments    of    pcre_copy_named_substring()    and         Most    of    the    arguments   of   pcre_copy_named_substring()   and
2294         pcre_get_named_substring()  are  the  same  as  those for the similarly         pcre_get_named_substring() are the same  as  those  for  the  similarly
2295         named functions that extract by number. As these are described  in  the         named  functions  that extract by number. As these are described in the
2296         previous  section,  they  are not re-described here. There are just two         previous section, they are not re-described here. There  are  just  two
2297         differences:         differences:
2298    
2299         First, instead of a substring number, a substring name is  given.  Sec-         First,  instead  of a substring number, a substring name is given. Sec-
2300         ond, there is an extra argument, given at the start, which is a pointer         ond, there is an extra argument, given at the start, which is a pointer
2301         to the compiled pattern. This is needed in order to gain access to  the         to  the compiled pattern. This is needed in order to gain access to the
2302         name-to-number translation table.         name-to-number translation table.
2303    
2304         These  functions call pcre_get_stringnumber(), and if it succeeds, they         These functions call pcre_get_stringnumber(), and if it succeeds,  they
2305         then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-         then  call  pcre_copy_substring() or pcre_get_substring(), as appropri-
2306         ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the         ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate  names,  the
2307         behaviour may not be what you want (see the next section).         behaviour may not be what you want (see the next section).
2308    
2309    
# Line 2311  DUPLICATE SUBPATTERN NAMES Line 2312  DUPLICATE SUBPATTERN NAMES
2312         int pcre_get_stringtable_entries(const pcre *code,         int pcre_get_stringtable_entries(const pcre *code,
2313              const char *name, char **first, char **last);              const char *name, char **first, char **last);
2314    
2315         When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for         When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for
2316         subpatterns  are  not  required  to  be unique. Normally, patterns with         subpatterns are not required to  be  unique.  Normally,  patterns  with
2317         duplicate names are such that in any one match, only one of  the  named         duplicate  names  are such that in any one match, only one of the named
2318         subpatterns  participates. An example is shown in the pcrepattern docu-         subpatterns participates. An example is shown in the pcrepattern  docu-
2319         mentation.         mentation.
2320    
2321         When   duplicates   are   present,   pcre_copy_named_substring()    and         When    duplicates   are   present,   pcre_copy_named_substring()   and
2322         pcre_get_named_substring()  return the first substring corresponding to         pcre_get_named_substring() return the first substring corresponding  to
2323         the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING         the  given  name  that  is set. If none are set, PCRE_ERROR_NOSUBSTRING
2324         (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()         (-7) is returned; no  data  is  returned.  The  pcre_get_stringnumber()
2325         function returns one of the numbers that are associated with the  name,         function  returns one of the numbers that are associated with the name,
2326         but it is not defined which it is.         but it is not defined which it is.
2327    
2328         If  you want to get full details of all captured substrings for a given         If you want to get full details of all captured substrings for a  given
2329         name, you must use  the  pcre_get_stringtable_entries()  function.  The         name,  you  must  use  the pcre_get_stringtable_entries() function. The
2330         first argument is the compiled pattern, and the second is the name. The         first argument is the compiled pattern, and the second is the name. The
2331         third and fourth are pointers to variables which  are  updated  by  the         third  and  fourth  are  pointers to variables which are updated by the
2332         function. After it has run, they point to the first and last entries in         function. After it has run, they point to the first and last entries in
2333         the name-to-number table  for  the  given  name.  The  function  itself         the  name-to-number  table  for  the  given  name.  The function itself
2334         returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if         returns the length of each entry,  or  PCRE_ERROR_NOSUBSTRING  (-7)  if
2335         there are none. The format of the table is described above in the  sec-         there  are none. The format of the table is described above in the sec-
2336         tion  entitled  Information  about  a  pattern.  Given all the relevant         tion entitled Information about a  pattern.   Given  all  the  relevant
2337         entries for the name, you can extract each of their numbers, and  hence         entries  for the name, you can extract each of their numbers, and hence
2338         the captured data, if any.         the captured data, if any.
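
       The last step above, extracting the numbers from the table entries,
       might look like the following sketch. It relies on the table format
       given in the section entitled Information about a pattern: the first
       two bytes of each entry hold the subpattern number, most significant
       byte first, and the zero-terminated name follows. The function name
       demo_collect_group_numbers and its numbers/max parameters are
       hypothetical. The first, last and entrysize values are assumed to come
       from a successful call to pcre_get_stringtable_entries().

```c
#include <assert.h>
#include <stddef.h>

/* Walk the name-table entries from first to last inclusive, storing up to
   max subpattern numbers in numbers[]. Returns the number of entries seen. */
static int demo_collect_group_numbers(const char *first, const char *last,
                                      int entrysize, int *numbers, int max)
{
    int count = 0;
    const char *entry;

    assert(first != NULL && last != NULL && entrysize > 2);
    for (entry = first; entry <= last; entry += entrysize) {
        const unsigned char *p = (const unsigned char *)entry;
        if (count < max)
            numbers[count] = (p[0] << 8) | p[1];  /* big-endian number */
        count++;
    }
    return count;
}
```

       Each number obtained this way can then be checked against ovector to
       see whether that particular subpattern was set in the match.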
2339    
2340    
2341  FINDING ALL POSSIBLE MATCHES  FINDING ALL POSSIBLE MATCHES
2342    
2343         The  traditional  matching  function  uses a similar algorithm to Perl,         The traditional matching function uses a  similar  algorithm  to  Perl,
2344         which stops when it finds the first match, starting at a given point in         which stops when it finds the first match, starting at a given point in
2345         the  subject.  If you want to find all possible matches, or the longest         the subject. If you want to find all possible matches, or  the  longest
2346         possible match, consider using the alternative matching  function  (see         possible  match,  consider using the alternative matching function (see
2347         below)  instead.  If you cannot use the alternative function, but still         below) instead. If you cannot use the alternative function,  but  still
2348         need to find all possible matches, you can kludge it up by  making  use         need  to  find all possible matches, you can kludge it up by making use
2349         of the callout facility, which is described in the pcrecallout documen-         of the callout facility, which is described in the pcrecallout documen-
2350         tation.         tation.
2351    
2352         What you have to do is to insert a callout right at the end of the pat-         What you have to do is to insert a callout right at the end of the pat-
2353         tern.   When your callout function is called, extract and save the cur-         tern.  When your callout function is called, extract and save the  cur-
2354         rent matched substring. Then return  1,  which  forces  pcre_exec()  to         rent  matched  substring.  Then  return  1, which forces pcre_exec() to
2355         backtrack  and  try other alternatives. Ultimately, when it runs out of         backtrack and try other alternatives. Ultimately, when it runs  out  of
2356         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
2357    
2358    
# Line 2362  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2363  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2363              int options, int *ovector, int ovecsize,              int options, int *ovector, int ovecsize,
2364              int *workspace, int wscount);              int *workspace, int wscount);
2365    
2366         The function pcre_dfa_exec()  is  called  to  match  a  subject  string         The  function  pcre_dfa_exec()  is  called  to  match  a subject string
2367         against  a  compiled pattern, using a matching algorithm that scans the         against a compiled pattern, using a matching algorithm that  scans  the
2368         subject string just once, and does not backtrack.  This  has  different         subject  string  just  once, and does not backtrack. This has different
2369         characteristics  to  the  normal  algorithm, and is not compatible with         characteristics to the normal algorithm, and  is  not  compatible  with
2370         Perl. Some of the features of PCRE patterns are not  supported.  Never-         Perl.  Some  of the features of PCRE patterns are not supported. Never-
2371         theless,  there are times when this kind of matching can be useful. For         theless, there are times when this kind of matching can be useful.  For
2372         a discussion of the two matching algorithms, see the pcrematching docu-         a discussion of the two matching algorithms, see the pcrematching docu-
2373         mentation.         mentation.
2374    
2375         The  arguments  for  the  pcre_dfa_exec()  function are the same as for         The arguments for the pcre_dfa_exec() function  are  the  same  as  for
2376         pcre_exec(), plus two extras. The ovector argument is used in a differ-         pcre_exec(), plus two extras. The ovector argument is used in a differ-
2377         ent  way,  and  this is described below. The other common arguments are         ent way, and this is described below. The other  common  arguments  are
2378         used in the same way as for pcre_exec(), so their  description  is  not         used  in  the  same way as for pcre_exec(), so their description is not
2379         repeated here.         repeated here.
2380    
2381         The  two  additional  arguments provide workspace for the function. The         The two additional arguments provide workspace for  the  function.  The
2382         workspace vector should contain at least 20 elements. It  is  used  for         workspace  vector  should  contain at least 20 elements. It is used for
2383         keeping  track  of  multiple  paths  through  the  pattern  tree.  More         keeping  track  of  multiple  paths  through  the  pattern  tree.  More
2384         workspace will be needed for patterns and subjects where  there  are  a         workspace  will  be  needed for patterns and subjects where there are a
2385         lot of potential matches.         lot of potential matches.
2386    
2387         Here is an example of a simple call to pcre_dfa_exec():         Here is an example of a simple call to pcre_dfa_exec():
# Line 2402  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2403  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2403    
2404     Option bits for pcre_dfa_exec()     Option bits for pcre_dfa_exec()
2405    
2406         The  unused  bits  of  the options argument for pcre_dfa_exec() must be         The unused bits of the options argument  for  pcre_dfa_exec()  must  be
2407         zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-         zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
2408         LINE_xxx,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK,         LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,  PCRE_NO_UTF8_CHECK,
2409         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
2410         three of these are the same as for pcre_exec(), so their description is         three of these are the same as for pcre_exec(), so their description is
2411         not repeated here.         not repeated here.
2412    
2413           PCRE_PARTIAL           PCRE_PARTIAL
2414    
2415         This has the same general effect as it does for  pcre_exec(),  but  the         This  has  the  same general effect as it does for pcre_exec(), but the
2416         details   are   slightly   different.  When  PCRE_PARTIAL  is  set  for         details  are  slightly  different.  When  PCRE_PARTIAL   is   set   for
2417         pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is  converted  into         pcre_dfa_exec(),  the  return code PCRE_ERROR_NOMATCH is converted into
2418         PCRE_ERROR_PARTIAL  if  the  end  of the subject is reached, there have         PCRE_ERROR_PARTIAL if the end of the subject  is  reached,  there  have
2419         been no complete matches, but there is still at least one matching pos-         been no complete matches, but there is still at least one matching pos-
2420         sibility.  The portion of the string that provided the partial match is         sibility. The portion of the string that provided the partial match  is
2421         set as the first matching string.         set as the first matching string.
2422    
2423           PCRE_DFA_SHORTEST           PCRE_DFA_SHORTEST
2424    
2425         Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to         Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to
2426         stop as soon as it has found one match. Because of the way the alterna-         stop as soon as it has found one match. Because of the way the alterna-
2427         tive algorithm works, this is necessarily the shortest  possible  match         tive  algorithm  works, this is necessarily the shortest possible match
2428         at the first possible matching point in the subject string.         at the first possible matching point in the subject string.
2429    
2430           PCRE_DFA_RESTART           PCRE_DFA_RESTART
2431    
2432         When  pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option, and         When pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option,  and
2433         returns a partial match, it is possible to call it  again,  with  addi-         returns  a  partial  match, it is possible to call it again, with addi-
2434         tional  subject  characters,  and have it continue with the same match.         tional subject characters, and have it continue with  the  same  match.
2435         The PCRE_DFA_RESTART option requests this action; when it is  set,  the         The  PCRE_DFA_RESTART  option requests this action; when it is set, the
2436         workspace  and wscount arguments must reference the same vector as before         workspace and wscount arguments must reference the same vector as  before
2437         because data about the match so far is left in  them  after  a  partial         because  data  about  the  match so far is left in them after a partial
2438         match.  There  is  more  discussion of this facility in the pcrepartial         match. There is more discussion of this  facility  in  the  pcrepartial
2439         documentation.         documentation.
2440    
2441     Successful returns from pcre_dfa_exec()     Successful returns from pcre_dfa_exec()
2442    
2443         When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-         When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-
2444         string in the subject. Note, however, that all the matches from one run         string in the subject. Note, however, that all the matches from one run
2445         of the function start at the same point in  the  subject.  The  shorter         of  the  function  start  at the same point in the subject. The shorter
2446         matches  are all initial substrings of the longer matches. For example,         matches are all initial substrings of the longer matches. For  example,
2447         if the pattern         if the pattern
2448    
2449           <.*>           <.*>
# Line 2457  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2458  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2458           <something> <something else>           <something> <something else>
2459           <something> <something else> <something further>           <something> <something else> <something further>
2460    
2461         On success, the yield of the function is a number  greater  than  zero,         On  success,  the  yield of the function is a number greater than zero,
2462         which  is  the  number of matched substrings. The substrings themselves         which is the number of matched substrings.  The  substrings  themselves
2463         are returned in ovector. Each string uses two elements;  the  first  is         are  returned  in  ovector. Each string uses two elements; the first is
2464         the  offset  to  the start, and the second is the offset to the end. In         the offset to the start, and the second is the offset to  the  end.  In
2465         fact, all the strings have the same start  offset.  (Space  could  have         fact,  all  the  strings  have the same start offset. (Space could have
2466         been  saved by giving this only once, but it was decided to retain some         been saved by giving this only once, but it was decided to retain  some
2467         compatibility with the way pcre_exec() returns data,  even  though  the         compatibility  with  the  way pcre_exec() returns data, even though the
2468         meaning of the strings is different.)         meaning of the strings is different.)
2469    
2470         The strings are returned in reverse order of length; that is, the long-         The strings are returned in reverse order of length; that is, the long-
2471         est matching string is given first. If there were too many  matches  to         est  matching  string is given first. If there were too many matches to
2472         fit  into ovector, the yield of the function is zero, and the vector is         fit into ovector, the yield of the function is zero, and the vector  is
2473         filled with the longest matches.         filled with the longest matches.
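
       A minimal sketch of reading these results follows. The helper names
       are hypothetical. Treating a zero yield as "ovecsize/2 strings are
       present" is an assumption drawn from the statement that each string
       uses two elements and that the vector is filled with the longest
       matches.

```c
#include <assert.h>
#include <stddef.h>

/* Number of matched strings available in ovector, given the yield rc of
   pcre_dfa_exec() (rc >= 0) and the size of the vector in elements. */
static int demo_dfa_match_count(int rc, int ovecsize)
{
    assert(rc >= 0 && ovecsize >= 0);
    return (rc > 0) ? rc : ovecsize / 2;
}

/* Length of the i-th matched string; i = 0 is the longest match. */
static int demo_dfa_match_length(const int *ovector, int i)
{
    assert(ovector != NULL && i >= 0);
    return ovector[2 * i + 1] - ovector[2 * i];
}
```

       Since the longest match comes first, the shortest one is at index
       demo_dfa_match_count(rc, ovecsize) - 1.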
2474    
2475     Error returns from pcre_dfa_exec()     Error returns from pcre_dfa_exec()
2476    
2477         The pcre_dfa_exec() function returns a negative number when  it  fails.         The  pcre_dfa_exec()  function returns a negative number when it fails.
2478         Many  of  the  errors  are  the  same as for pcre_exec(), and these are         Many of the errors are the same  as  for  pcre_exec(),  and  these  are
2479         described above.  There are in addition the following errors  that  are         described  above.   There are in addition the following errors that are
2480         specific to pcre_dfa_exec():         specific to pcre_dfa_exec():
2481    
2482           PCRE_ERROR_DFA_UITEM      (-16)           PCRE_ERROR_DFA_UITEM      (-16)
2483    
2484         This  return is given if pcre_dfa_exec() encounters an item in the pat-         This return is given if pcre_dfa_exec() encounters an item in the  pat-
2485         tern that it does not support, for instance, the use of \C  or  a  back         tern  that  it  does not support, for instance, the use of \C or a back
2486         reference.         reference.
2487    
2488           PCRE_ERROR_DFA_UCOND      (-17)           PCRE_ERROR_DFA_UCOND      (-17)
2489    
2490         This  return  is  given  if pcre_dfa_exec() encounters a condition item         This return is given if pcre_dfa_exec()  encounters  a  condition  item
2491         that uses a back reference for the condition, or a test  for  recursion         that  uses  a back reference for the condition, or a test for recursion
2492         in a specific group. These are not supported.         in a specific group. These are not supported.
2493    
2494           PCRE_ERROR_DFA_UMLIMIT    (-18)           PCRE_ERROR_DFA_UMLIMIT    (-18)
2495    
2496         This  return  is given if pcre_dfa_exec() is called with an extra block         This return is given if pcre_dfa_exec() is called with an  extra  block
2497         that contains a setting of the match_limit field. This is not supported         that contains a setting of the match_limit field. This is not supported
2498         (it is meaningless).         (it is meaningless).
2499    
2500           PCRE_ERROR_DFA_WSSIZE     (-19)           PCRE_ERROR_DFA_WSSIZE     (-19)
2501    
2502         This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the         This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
2503         workspace vector.         workspace vector.
2504    
2505           PCRE_ERROR_DFA_RECURSE    (-20)           PCRE_ERROR_DFA_RECURSE    (-20)
2506    
2507         When a recursive subpattern is processed, the matching  function  calls         When  a  recursive subpattern is processed, the matching function calls
2508         itself  recursively,  using  private vectors for ovector and workspace.         itself recursively, using private vectors for  ovector  and  workspace.
2509         This error is given if the output vector  is  not  large  enough.  This         This  error  is  given  if  the output vector is not large enough. This
2510         should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
2511    
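The five pcre_dfa_exec()-specific codes listed above form a contiguous range, so a caller can turn them into diagnostics with a small lookup table. The following is only a sketch under stated assumptions: the DEMO_ constants and demo_dfa_error_text() are hypothetical names made up for illustration (real code would include pcre.h and use the PCRE_ERROR_DFA_* names), and the message strings are paraphrases of the descriptions above, not text produced by the library.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the values listed above. These DEMO_ names are
   hypothetical; real code should use the PCRE_ERROR_DFA_* constants
   from pcre.h instead of redefining them. */
enum {
    DEMO_ERROR_DFA_UITEM   = -16,
    DEMO_ERROR_DFA_UCOND   = -17,
    DEMO_ERROR_DFA_UMLIMIT = -18,
    DEMO_ERROR_DFA_WSSIZE  = -19,
    DEMO_ERROR_DFA_RECURSE = -20
};

/* Hypothetical helper: returns a short description for one of the
   DFA-specific codes, or NULL for any other return value. */
const char *demo_dfa_error_text(int rc)
{
    static const char *const text[] = {
        "unsupported pattern item (for example \\C or a back reference)",
        "unsupported condition (back reference or recursion test)",
        "match_limit set in the extra block",
        "workspace vector too small",
        "output vector too small during recursion"
    };

    /* The table must cover all five codes, -16 down to -20. */
    assert(sizeof text / sizeof text[0] == 5);

    if (rc > DEMO_ERROR_DFA_UITEM || rc < DEMO_ERROR_DFA_RECURSE)
        return NULL;

    /* -16 maps to index 0, -20 maps to index 4. */
    return text[DEMO_ERROR_DFA_UITEM - rc];
}
```

A caller would typically test the return value of pcre_dfa_exec() for a negative result first, then fall back to its own handling of the general error codes when this helper returns NULL.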
2512    
2513  SEE ALSO  SEE ALSO
2514    
2515         pcrebuild(3),  pcrecallout(3), pcrecpp(3)(3), pcrematching(3), pcrepar-         pcrebuild(3), pcrecallout(3), pcrecpp(3)(3), pcrematching(3),  pcrepar-
2516         tial(3), pcreposix(3), pcreprecompile(3), pcresample(3),  pcrestack(3).         tial(3),  pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).
2517    
2518    
2519  AUTHOR  AUTHOR
# Line 3675  VERTICAL BAR Line 3676  VERTICAL BAR
3676  INTERNAL OPTION SETTING  INTERNAL OPTION SETTING
3677    
3678         The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and         The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
3679         PCRE_EXTENDED options can be changed  from  within  the  pattern  by  a         PCRE_EXTENDED options (which are Perl-compatible) can be  changed  from
3680         sequence  of  Perl  option  letters  enclosed between "(?" and ")". The         within  the  pattern  by  a  sequence  of  Perl option letters enclosed
3681         option letters are         between "(?" and ")".  The option letters are
3682    
3683           i  for PCRE_CASELESS           i  for PCRE_CASELESS
3684           m  for PCRE_MULTILINE           m  for PCRE_MULTILINE
# Line 3691  INTERNAL OPTION SETTING Line 3692  INTERNAL OPTION SETTING
3692         is also permitted. If a  letter  appears  both  before  and  after  the         is also permitted. If a  letter  appears  both  before  and  after  the
3693         hyphen, the option is unset.         hyphen, the option is unset.
3694    
3695           The  PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA
3696           can be changed in the same way as the Perl-compatible options by  using
3697           the characters J, U and X respectively.
3698    
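To make the letter rules concrete, here is a minimal sketch of how an item such as (?im-sx) or (?J) splits into options that are set and options that are unset. It does not call PCRE and is not how the library is implemented; the DEMO_ flag values and both function names are hypothetical, and only the plain "(?" ... ")" form is handled. A letter that appears on both sides of the hyphen ends up unset, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values for this sketch only; they are not claimed
   to match the option bits defined in pcre.h. */
enum {
    DEMO_CASELESS  = 1 << 0,   /* i */
    DEMO_MULTILINE = 1 << 1,   /* m */
    DEMO_DOTALL    = 1 << 2,   /* s */
    DEMO_EXTENDED  = 1 << 3,   /* x */
    DEMO_DUPNAMES  = 1 << 4,   /* J */
    DEMO_UNGREEDY  = 1 << 5,   /* U */
    DEMO_EXTRA     = 1 << 6    /* X */
};

/* Map one option letter to its flag; 0 means "not an option letter". */
static unsigned demo_flag_for_letter(char c)
{
    switch (c) {
    case 'i': return DEMO_CASELESS;
    case 'm': return DEMO_MULTILINE;
    case 's': return DEMO_DOTALL;
    case 'x': return DEMO_EXTENDED;
    case 'J': return DEMO_DUPNAMES;
    case 'U': return DEMO_UNGREEDY;
    case 'X': return DEMO_EXTRA;
    default:  return 0;
    }
}

/* Parse an item of the form "(?letters)" or "(?letters-letters)".
   Returns 0 and fills *set and *unset on success, -1 if malformed. */
int demo_parse_option_setting(const char *item, unsigned *set, unsigned *unset)
{
    unsigned on = 0, off = 0;
    int after_hyphen = 0;
    const char *p;

    assert(item != NULL && set != NULL && unset != NULL);

    if (item[0] != '(' || item[1] != '?')
        return -1;

    for (p = item + 2; *p != ')'; p++) {
        unsigned flag;

        if (*p == '-' && !after_hyphen) {
            after_hyphen = 1;
            continue;
        }
        flag = demo_flag_for_letter(*p);
        if (flag == 0)
            return -1;   /* unknown letter, second hyphen, or end of string */
        if (after_hyphen)
            off |= flag;
        else
            on |= flag;
    }
    if (p[1] != '\0')
        return -1;       /* trailing text after the closing parenthesis */

    *set = on & ~off;    /* a letter on both sides of the hyphen is unset */
    *unset = off;
    return 0;
}
```

For example, parsing "(?im-sx)" yields DEMO_CASELESS and DEMO_MULTILINE in the set mask and DEMO_DOTALL and DEMO_EXTENDED in the unset mask, while "(?i-i)" leaves the set mask empty.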
3699         When  an option change occurs at top level (that is, not inside subpat-         When  an option change occurs at top level (that is, not inside subpat-
3700         tern parentheses), the change applies to the remainder of  the  pattern         tern parentheses), the change applies to the remainder of  the  pattern
3701         that follows.  If the change is placed right at the start of a pattern,         that follows.  If the change is placed right at the start of a pattern,
# Line 3716  INTERNAL OPTION SETTING Line 3721  INTERNAL OPTION SETTING
3721         the  effects  of option settings happen at compile time. There would be         the  effects  of option settings happen at compile time. There would be
3722         some very weird behaviour otherwise.         some very weird behaviour otherwise.
3723    
        The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and  PCRE_EXTRA  
        can  be changed in the same way as the Perl-compatible options by using  
        the characters J, U and X respectively.  
   
3724    
3725  SUBPATTERNS  SUBPATTERNS
3726    
# Line 4718  CALLOUTS Line 4719  CALLOUTS
4719         is given in the pcrecallout documentation.         is given in the pcrecallout documentation.
4720    
4721    
4722  BACTRACKING CONTROL  BACKTRACKING CONTROL
4723    
4724         Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",         Perl 5.10 introduced a number of "Special Backtracking Control  Verbs",
4725         which are described in the Perl documentation as "experimental and sub-         which are described in the Perl documentation as "experimental and sub-
