# Diff of /code/trunk/doc/html/pcrematching.html

revision 453 by ph10, Fri Sep 18 19:12:35 2009 UTC
revision 461 by ph10, Mon Oct 5 10:59:35 2009 UTC
# Line 96  traditional finite state machine (it kee Line 96  traditional finite state machine (it kee
96  simultaneously).  simultaneously).
97  </P>  </P>
98  <P>  <P>
99    Although the general principle of this matching algorithm is that it scans the
100    subject string only once, without backtracking, there is one exception: when a
101    lookaround assertion is encountered, the characters following or preceding the
102    current point have to be independently inspected.
103    </P>
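The lookaround exception can be pictured with a minimal C sketch. It is a toy under stated assumptions, not PCRE's implementation. It uses a hypothetical `Automaton` whose states fit in a 32-bit mask, ordinary edges that consume one character, and `Lookahead` edges that consume nothing. When a live path reaches a lookahead edge, `apply_lookaheads` compares the characters after the current point independently, while the main scan index in `longest_match` only ever moves forward. Lookbehind, which would inspect preceding characters instead, is left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy representation, not PCRE's internal format. */
typedef struct { int from; int ch; int to; } Transition;          /* consumes ch */
typedef struct { int from; const char *lit; int to; } Lookahead;  /* consumes nothing */

typedef struct {
    const Transition *trans; size_t ntrans;
    const Lookahead *looks;  size_t nlooks;
    int start;               /* start state, 0..31 */
    uint32_t accepting;      /* bitmask of accepting states */
} Automaton;

/* Follow lookahead edges from the live paths at position pos. Each test
   inspects subject[pos..] independently; pos itself is never changed. */
static uint32_t apply_lookaheads(const Automaton *a, uint32_t active,
                                 const char *subject, size_t length, size_t pos)
{
    uint32_t prev;
    do {
        prev = active;
        for (size_t k = 0; k < a->nlooks; k++) {
            const Lookahead *lk = &a->looks[k];
            size_t n = strlen(lk->lit);
            if (((active >> lk->from) & 1u) && n <= length - pos &&
                memcmp(subject + pos, lk->lit, n) == 0)
                active |= UINT32_C(1) << lk->to;
        }
    } while (active != prev);   /* repeat until no new state is reached */
    return active;
}

/* One forward scan from offset 0, all paths in parallel.
   Returns the end offset of the longest match, or -1 if there is none. */
long longest_match(const Automaton *a, const char *subject, size_t length)
{
    assert(a->start >= 0 && a->start < 32);
    uint32_t active = UINT32_C(1) << a->start;
    long best = -1;
    for (size_t i = 0; ; i++) {
        active = apply_lookaheads(a, active, subject, length, i);
        if (active & a->accepting)
            best = (long)i;
        if (i == length || active == 0)
            break;
        uint32_t next = 0;
        for (size_t t = 0; t < a->ntrans; t++) {
            const Transition *tr = &a->trans[t];
            if (((active >> tr->from) & 1u) && tr->ch == (unsigned char)subject[i])
                next |= UINT32_C(1) << tr->to;
        }
        active = next;
    }
    return best;
}
```

For a toy version of `ab(?=cd)` (edges for `a` and `b`, then a lookahead edge for "cd" into the accepting state), `longest_match` on "abcd" returns 2: the assertion looks at "cd" without consuming it. On "abce" it returns -1.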
104    <P>
105  The scan continues until either the end of the subject is reached, or there are  The scan continues until either the end of the subject is reached, or there are
106  no more unterminated paths. At this point, terminated paths represent the  no more unterminated paths. At this point, terminated paths represent the
107  different matching possibilities (if there are none, the match has failed).  different matching possibilities (if there are none, the match has failed).
108  Thus, if there is more than one possible match, this algorithm finds all of  Thus, if there is more than one possible match, this algorithm finds all of
109  them, and in particular, it finds the longest. In PCRE, there is an option to  them, and in particular, it finds the longest. There is an option to stop the
110  stop the algorithm after the first match (which is necessarily the shortest)  algorithm after the first match (which is necessarily the shortest) is found.
has been found.
111  </P>  </P>
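The all-paths behaviour described in the paragraph above can be sketched in C. This is a minimal illustration under stated assumptions: a hypothetical toy automaton given as a `Transition` table, at most 32 states held in a bitmask, and an `ANY_CHAR` wildcard that, unlike PCRE's dot, also matches newline. `match_all_paths` and its `stop_at_first` flag are invented names that only mimic the behaviour described; they are not PCRE's interface. Every live path advances one character at a time, and each time a path reaches an accepting state the current offset is recorded, so the recorded ends run from shortest to longest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_CHAR (-1)  /* hypothetical marker: edge matches any byte */

/* One edge of a toy automaton: from --ch--> to (hypothetical representation). */
typedef struct { int from; int ch; int to; } Transition;

typedef struct {
    const Transition *trans;
    size_t ntrans;
    int start;            /* start state, 0..31 */
    uint32_t accepting;   /* bitmask of accepting states */
} Automaton;

/* Advance every path in parallel, starting only at start_offset. Each time
   some path is in an accepting state, the offset is stored as a match end,
   so (if max_ends is large enough) ends[0] is the shortest match and
   ends[count-1] the longest. With stop_at_first set, the scan stops at the
   first (shortest) match. Returns the number of ends stored. */
size_t match_all_paths(const Automaton *a, const char *subject, size_t length,
                       size_t start_offset, int stop_at_first,
                       size_t *ends, size_t max_ends)
{
    assert(a->start >= 0 && a->start < 32 && start_offset <= length);
    uint32_t active = UINT32_C(1) << a->start;   /* set of live paths */
    size_t count = 0;

    for (size_t i = start_offset; ; i++) {
        if ((active & a->accepting) && count < max_ends) {
            ends[count++] = i;
            if (stop_at_first)
                break;
        }
        /* Stop at the end of the subject or when no unterminated path is left. */
        if (i == length || active == 0)
            break;
        uint32_t next = 0;
        unsigned char c = (unsigned char)subject[i];
        for (size_t t = 0; t < a->ntrans; t++) {
            const Transition *tr = &a->trans[t];
            if (((active >> tr->from) & 1u) && (tr->ch == ANY_CHAR || tr->ch == c))
                next |= UINT32_C(1) << tr->to;
        }
        active = next;
    }
    return count;
}
```

With edges `0 -'<'-> 1`, `1 -any-> 1` and `1 -'>'-> 2` (a toy version of `<.*>`), scanning "<something> <something else> <something further>" from offset 0 records match ends 11, 28 and 48; with `stop_at_first` set, only 11 is recorded. Matches that start later are never considered: finding them would take another call with a larger `start_offset`.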
112  <P>  <P>
113  Note that all the matches that are found start at the same point in the  Note that all the matches that are found start at the same point in the
# Line 116  character of the subject. The algorithm Line 121  character of the subject. The algorithm
121  matches that start at later positions.  matches that start at later positions.
122  </P>  </P>
123  <P>  <P>
Although the general principle of this matching algorithm is that it scans the
subject string only once, without backtracking, there is one exception: when a
lookbehind assertion is encountered, the preceding characters have to be
re-inspected.
</P>
<P>
124  There are a number of features of PCRE regular expressions that are not  There are a number of features of PCRE regular expressions that are not
125  supported by the alternative matching algorithm. They are as follows:  supported by the alternative matching algorithm. They are as follows:
126  </P>  </P>
# Line 186  callouts. Line 185  callouts.
185  2. Because the alternative algorithm scans the subject string just once, and  2. Because the alternative algorithm scans the subject string just once, and
186  never needs to backtrack, it is possible to pass very long subject strings to  never needs to backtrack, it is possible to pass very long subject strings to
187  the matching function in several pieces, checking for partial matching each  the matching function in several pieces, checking for partial matching each
188  time.  time. The
189    <a href="pcrepartial.html"><b>pcrepartial</b></a>
190    documentation gives details of partial matching.
191  </P>  </P>
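Because paths only ever move forward, the state that must survive between pieces of the subject is small. The following minimal C sketch is a hypothetical toy, not PCRE's API (the real interface is the one the pcrepartial documentation describes). It keeps only the set of live paths and an absolute offset in a `Stream` struct, and `stream_feed` reports a complete match, a partial match that needs more input, or failure. The names, the 32-state bitmask, the `ANY_CHAR` wildcard and the stop-at-first-match behaviour are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_CHAR (-1)  /* hypothetical marker: edge matches any byte */

/* One edge of a toy automaton: from --ch--> to (hypothetical representation). */
typedef struct { int from; int ch; int to; } Transition;

/* Everything carried from one piece to the next: the live paths and the
   absolute offset reached. No earlier text is kept, because the scan never
   backtracks into it. */
typedef struct {
    const Transition *trans;
    size_t ntrans;
    uint32_t accepting;   /* bitmask of accepting states */
    uint32_t active;      /* bitmask of live (unterminated) paths */
    size_t offset;        /* absolute offset of the next character */
} Stream;

void stream_init(Stream *s, const Transition *trans, size_t ntrans,
                 int start, uint32_t accepting)
{
    assert(start >= 0 && start < 32);
    s->trans = trans;
    s->ntrans = ntrans;
    s->accepting = accepting;
    s->active = UINT32_C(1) << start;
    s->offset = 0;
}

/* Feed the next piece. Returns 1 and sets *match_end when a path reaches an
   accepting state (the first, shortest match); 0 when paths are still live
   at the end of the piece (a partial match: more input is needed); -1 when
   every path has died. */
int stream_feed(Stream *s, const char *piece, size_t len, size_t *match_end)
{
    for (size_t i = 0; i < len && s->active != 0; i++) {
        unsigned char c = (unsigned char)piece[i];
        uint32_t next = 0;
        for (size_t t = 0; t < s->ntrans; t++) {
            const Transition *tr = &s->trans[t];
            if (((s->active >> tr->from) & 1u) && (tr->ch == ANY_CHAR || tr->ch == c))
                next |= UINT32_C(1) << tr->to;
        }
        s->active = next;
        s->offset++;
        if (s->active & s->accepting) {
            *match_end = s->offset;
            return 1;
        }
    }
    return s->active != 0 ? 0 : -1;
}
```

With a toy `<.*>` automaton (edges `0 -'<'-> 1`, `1 -any-> 1`, `1 -'>'-> 2`), feeding "<some" returns 0 (partial), and then feeding "thing> tail" returns 1 with a match end of 11, an offset that spans both pieces.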
192  <br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>  <br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
193  <P>  <P>
# Line 215  Cambridge CB2 3QH, England. Line 216  Cambridge CB2 3QH, England.
216  </P>  </P>
217  <br><a name="SEC8" href="#TOC1">REVISION</a><br>  <br><a name="SEC8" href="#TOC1">REVISION</a><br>
218  <P>  <P>
219  Last updated: 05 September 2009  Last updated: 29 September 2009
220  <br>  <br>
221  Copyright &copy; 1997-2009 University of Cambridge.  Copyright &copy; 1997-2009 University of Cambridge.
222  <br>  <br>
