Diff of /code/trunk/doc/pcre.html

revision 43 by nigel, Sat Feb 24 21:39:21 2007 UTC revision 47 by nigel, Sat Feb 24 21:39:29 2007 UTC
# Line 430  no back references. Line 430  no back references.
430  <P>  <P>
431  Return information about the first character of any matched string, for a  Return information about the first character of any matched string, for a
432  non-anchored pattern. If there is a fixed first character, e.g. from a pattern  non-anchored pattern. If there is a fixed first character, e.g. from a pattern
433  such as (cat|cow|coyote), then it is returned in the integer pointed to by  such as (cat|cow|coyote), it is returned in the integer pointed to by
434  <I>where</I>. Otherwise, if either  <I>where</I>. Otherwise, if either
435  </P>  </P>
436  <P>  <P>
# Line 442  starts with "^", or Line 442  starts with "^", or
442  (if it were set, the pattern would be anchored),  (if it were set, the pattern would be anchored),
443  </P>  </P>
444  <P>  <P>
445  then -1 is returned, indicating that the pattern matches only at the  -1 is returned, indicating that the pattern matches only at the start of a
446  start of a subject string or after any "\n" within the string. Otherwise -2 is  subject string or after any "\n" within the string. Otherwise -2 is returned.
447  returned. For anchored patterns, -2 is returned.  For anchored patterns, -2 is returned.
448  </P>  </P>
449  <P>  <P>
450  <PRE>  <PRE>
# Line 734  is a pointer to the vector of integer of Line 734  is a pointer to the vector of integer of
734  were captured by the match, including the substring that matched the entire  were captured by the match, including the substring that matched the entire
735  regular expression. This is the value returned by <B>pcre_exec</B> if it  regular expression. This is the value returned by <B>pcre_exec</B> if it
736  is greater than zero. If <B>pcre_exec()</B> returned zero, indicating that it  is greater than zero. If <B>pcre_exec()</B> returned zero, indicating that it
737  ran out of space in <I>ovector</I>, then the value passed as  ran out of space in <I>ovector</I>, the value passed as <I>stringcount</I> should
738  <I>stringcount</I> should be the size of the vector divided by three.  be the size of the vector divided by three.
739  </P>  </P>
740  <P>  <P>
741  The functions <B>pcre_copy_substring()</B> and <B>pcre_get_substring()</B>  The functions <B>pcre_copy_substring()</B> and <B>pcre_get_substring()</B>
# Line 857  patterns using the non-Perl item (?R). Line 857  patterns using the non-Perl item (?R).
857  with the settings of captured strings when part of a pattern is repeated. For  with the settings of captured strings when part of a pattern is repeated. For
858  example, matching "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value  example, matching "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
859  "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset. However, if  "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset. However, if
860  the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) get set.  the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) are set.
861  </P>  </P>
862  <P>  <P>
863  In Perl 5.004 $2 is set in both cases, and that is also true of PCRE. If in the  In Perl 5.004 $2 is set in both cases, and that is also true of PCRE. If in the
# Line 1186  end of the subject in both modes, and if Line 1186  end of the subject in both modes, and if
1186  <P>  <P>
1187  Outside a character class, a dot in the pattern matches any one character in  Outside a character class, a dot in the pattern matches any one character in
1188  the subject, including a non-printing character, but not (by default) newline.  the subject, including a non-printing character, but not (by default) newline.
1189  If the PCRE_DOTALL option is set, then dots match newlines as well. The  If the PCRE_DOTALL option is set, dots match newlines as well. The handling of
1190  handling of dot is entirely independent of the handling of circumflex and  dot is entirely independent of the handling of circumflex and dollar, the only
1191  dollar, the only relationship being that they both involve newline characters.  relationship being that they both involve newline characters. Dot has no
1192  Dot has no special meaning in a character class.  special meaning in a character class.
1193  </P>  </P>
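The dot behaviour described in this paragraph can be illustrated with a minimal sketch. It uses Python's `re` module rather than PCRE itself, on the assumption that `re.DOTALL` and `re.MULTILINE` behave like PCRE_DOTALL and PCRE_MULTILINE for these simple cases; the subject strings are made up for illustration.

```python
import re

# Assumption: Python's re.DOTALL stands in for PCRE_DOTALL here.
subject = "a\nb"

# By default, dot does not match a newline, so "a.b" cannot span the "\n".
default_match = re.search(r"a.b", subject)

# With DOTALL set, dot matches the newline as well.
dotall_match = re.search(r"a.b", subject, re.DOTALL)

# Dot handling is independent of circumflex handling: DOTALL alone does not
# make "^" match after a newline, whereas MULTILINE does.
starts_dotall = re.findall(r"^.", "x\ny", re.DOTALL)
starts_multiline = re.findall(r"^.", "x\ny", re.MULTILINE)

# Inside a character class, dot has no special meaning; it is a literal ".".
literal_dots = re.findall(r"[.]", "a.b\nc")
```

Only the flag changes between the first two searches, which is why the sketch keeps the pattern and subject fixed.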
1194  <LI><A NAME="SEC17" HREF="#TOC1">SQUARE BRACKETS</A>  <LI><A NAME="SEC17" HREF="#TOC1">SQUARE BRACKETS</A>
1195  <P>  <P>
# Line 1580  fails, because it matches the entire str Line 1580  fails, because it matches the entire str
1580  item.  item.
1581  </P>  </P>
1582  <P>  <P>
1583  However, if a quantifier is followed by a question mark, then it ceases to be  However, if a quantifier is followed by a question mark, it ceases to be
1584  greedy, and instead matches the minimum number of times possible, so the  greedy, and instead matches the minimum number of times possible, so the
1585  pattern  pattern
1586  </P>  </P>
# Line 1605  which matches one digit by preference, b Line 1605  which matches one digit by preference, b
1605  way the rest of the pattern matches.  way the rest of the pattern matches.
1606  </P>  </P>
1607  <P>  <P>
1608  If the PCRE_UNGREEDY option is set (an option which is not available in Perl)  If the PCRE_UNGREEDY option is set (an option which is not available in Perl),
1609  then the quantifiers are not greedy by default, but individual ones can be made  the quantifiers are not greedy by default, but individual ones can be made
1610  greedy by following them with a question mark. In other words, it inverts the  greedy by following them with a question mark. In other words, it inverts the
1611  default behaviour.  default behaviour.
1612  </P>  </P>
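The difference between greedy and non-greedy quantifiers can be sketched as follows. This is an illustration in Python's `re` module, not PCRE; the subject strings are invented, and Python's `re` appears to have no counterpart to PCRE_UNGREEDY, so only the explicit question-mark form is shown.

```python
import re

# Hypothetical subject string, chosen only to show the contrast.
subject = "<b>one</b><b>two</b>"

# Greedy: ".*" takes as much as possible, so the match runs to the last "</b>".
greedy = re.search(r"<b>.*</b>", subject).group(0)

# Non-greedy: ".*?" takes as little as possible, stopping at the first "</b>".
lazy = re.search(r"<b>.*?</b>", subject).group(0)

# "\d??\d" matches one digit by preference ...
one_digit = re.match(r"\d??\d", "12").group(0)

# ... but matches two if that is the only way the rest of the pattern
# (here, the requirement to consume the whole string) can succeed.
two_digits = re.fullmatch(r"\d??\d", "12").group(0)
```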
# Line 1617  compiled pattern, in proportion to the s Line 1617  compiled pattern, in proportion to the s
1617  </P>  </P>
1618  <P>  <P>
1619  If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent  If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent
1620  to Perl's /s) is set, thus allowing the . to match newlines, then the pattern  to Perl's /s) is set, thus allowing the . to match newlines, the pattern is
1621  is implicitly anchored, because whatever follows will be tried against every  implicitly anchored, because whatever follows will be tried against every
1622  character position in the subject string, so there is no point in retrying the  character position in the subject string, so there is no point in retrying the
1623  overall match at any position after the first. PCRE treats such a pattern as  overall match at any position after the first. PCRE treats such a pattern as
1624  though it were preceded by \A. In cases where it is known that the subject  though it were preceded by \A. In cases where it is known that the subject
# Line 1677  itself. So the pattern Line 1677  itself. So the pattern
1677  <P>  <P>
1678  matches "sense and sensibility" and "response and responsibility", but not  matches "sense and sensibility" and "response and responsibility", but not
1679  "sense and responsibility". If caseful matching is in force at the time of the  "sense and responsibility". If caseful matching is in force at the time of the
1680  back reference, then the case of letters is relevant. For example,  back reference, the case of letters is relevant. For example,
1681  </P>  </P>
1682  <P>  <P>
1683  <PRE>  <PRE>
# Line 1690  capturing subpattern is matched caseless Line 1690  capturing subpattern is matched caseless
1690  </P>  </P>
1691  <P>  <P>
1692  There may be more than one back reference to the same subpattern. If a  There may be more than one back reference to the same subpattern. If a
1693  subpattern has not actually been used in a particular match, then any back  subpattern has not actually been used in a particular match, any back
1694  references to it always fail. For example, the pattern  references to it always fail. For example, the pattern
1695  </P>  </P>
1696  <P>  <P>
# Line 1702  references to it always fail. For exampl Line 1702  references to it always fail. For exampl
1702  always fails if it starts to match "a" rather than "bc". Because there may be  always fails if it starts to match "a" rather than "bc". Because there may be
1703  up to 99 back references, all digits following the backslash are taken  up to 99 back references, all digits following the backslash are taken
1704  as part of a potential back reference number. If the pattern continues with a  as part of a potential back reference number. If the pattern continues with a
1705  digit character, then some delimiter must be used to terminate the back  digit character, some delimiter must be used to terminate the back reference.
1706  reference. If the PCRE_EXTENDED option is set, this can be whitespace.  If the PCRE_EXTENDED option is set, this can be whitespace. Otherwise an empty
1707  Otherwise an empty comment can be used.  comment can be used.
1708  </P>  </P>
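The two points made in this paragraph can be sketched in Python's `re` module, used here as a stand-in for PCRE. The pattern `(a|(bc))\2` is an assumed reconstruction from the description ("always fails if it starts to match "a" rather than "bc""), and `re.VERBOSE` is assumed to play the role of PCRE_EXTENDED.

```python
import re

# Assumed example pattern: group 2 is set only when the "bc" alternative is taken.
unused_group = re.compile(r"(a|(bc))\2")

# Group 2 took part in the match, so the back reference can succeed.
bc_match = unused_group.fullmatch("bcbc")

# Group 2 was never used, so the back reference to it always fails.
a_match = unused_group.search("aa")

# A back reference followed by a literal digit needs a delimiter, otherwise
# the digits would be read as one longer reference number.
# Option 1: an empty comment terminates the reference.
with_comment = re.fullmatch(r"(a)\1(?#)0", "aa0")

# Option 2: in extended (verbose) mode, whitespace terminates the reference.
with_space = re.fullmatch(r"(a) \1 0", "aa0", re.VERBOSE)
```

Both delimiter forms leave the matched text unchanged; they only stop the digit from being absorbed into the reference number.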
1709  <P>  <P>
1710  A back reference that occurs inside the parentheses to which it refers fails  A back reference that occurs inside the parentheses to which it refers fails
# Line 1836  Several assertions (of any sort) may occ Line 1836  Several assertions (of any sort) may occ
1836  matches "foo" preceded by three digits that are not "999". Notice that each of  matches "foo" preceded by three digits that are not "999". Notice that each of
1837  the assertions is applied independently at the same point in the subject  the assertions is applied independently at the same point in the subject
1838  string. First there is a check that the previous three characters are all  string. First there is a check that the previous three characters are all
1839  digits, then there is a check that the same three characters are not "999".  digits, and then there is a check that the same three characters are not "999".
1840  This pattern does <I>not</I> match "foo" preceded by six characters, the first  This pattern does <I>not</I> match "foo" preceded by six characters, the first
1841  of which are digits and the last three of which are not "999". For example, it  of which are digits and the last three of which are not "999". For example, it
1842  doesn't match "123abcfoo". A pattern to do that is  doesn't match "123abcfoo". A pattern to do that is
# Line 1957  what follows matches the rest of the pat Line 1957  what follows matches the rest of the pat
1957  </PRE>  </PRE>
1958  </P>  </P>
1959  <P>  <P>
1960  then the initial .* matches the entire string at first, but when this fails  the initial .* matches the entire string at first, but when this fails (because
1961  (because there is no following "a"), it backtracks to match all but the last  there is no following "a"), it backtracks to match all but the last character,
1962  character, then all but the last two characters, and so on. Once again the  then all but the last two characters, and so on. Once again the search for "a"
1963  search for "a" covers the entire string, from right to left, so we are no  covers the entire string, from right to left, so we are no better off. However,
1964  better off. However, if the pattern is written as  if the pattern is written as
1965  </P>  </P>
1966  <P>  <P>
1967  <PRE>  <PRE>
# Line 1969  better off. However, if the pattern is w Line 1969  better off. However, if the pattern is w
1969  </PRE>  </PRE>
1970  </P>  </P>
1971  <P>  <P>
1972  then there can be no backtracking for the .* item; it can match only the entire  there can be no backtracking for the .* item; it can match only the entire
1973  string. The subsequent lookbehind assertion does a single test on the last four  string. The subsequent lookbehind assertion does a single test on the last four
1974  characters. If it fails, the match fails immediately. For long strings, this  characters. If it fails, the match fails immediately. For long strings, this
1975  approach makes a significant difference to the processing time.  approach makes a significant difference to the processing time.
# Line 2032  subpattern, a compile-time error occurs. Line 2032  subpattern, a compile-time error occurs.
2032  </P>  </P>
2033  <P>  <P>
2034  There are two kinds of condition. If the text between the parentheses consists  There are two kinds of condition. If the text between the parentheses consists
2035  of a sequence of digits, then the condition is satisfied if the capturing  of a sequence of digits, the condition is satisfied if the capturing subpattern
2036  subpattern of that number has previously matched. Consider the following  of that number has previously matched. Consider the following pattern, which
2037  pattern, which contains non-significant white space to make it more readable  contains non-significant white space to make it more readable (assume the
2038  (assume the PCRE_EXTENDED option) and to divide it into three parts for ease  PCRE_EXTENDED option) and to divide it into three parts for ease of discussion:
 of discussion:  
2039  </P>  </P>
2040  <P>  <P>
2041  <PRE>  <PRE>
# Line 2157  on at the top level. If additional paren Line 2156  on at the top level. If additional paren
2156       ^                        ^       ^                        ^
2157       ^                        ^       ^                        ^
2158  </PRE>  </PRE>
2159  then the string they capture is "ab(cd)ef", the contents of the top level  the string they capture is "ab(cd)ef", the contents of the top level
2160  parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE  parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
2161  has to obtain extra memory to store data during a recursion, which it does by  has to obtain extra memory to store data during a recursion, which it does by
2162  using <B>pcre_malloc</B>, freeing it via <B>pcre_free</B> afterwards. If no  using <B>pcre_malloc</B>, freeing it via <B>pcre_free</B> afterwards. If no
