# Diff of /code/trunk/doc/pcrematching.3

revision 455 by ph10, Sat Sep 5 10:20:44 2009 UTC revision 456 by ph10, Fri Oct 2 08:53:31 2009 UTC
# Line 74  this is a kind of "DFA algorithm", thoug Line 74  this is a kind of "DFA algorithm", thoug
74  traditional finite state machine (it keeps multiple states active  traditional finite state machine (it keeps multiple states active
75  simultaneously).  simultaneously).
76  .P  .P
77    Although the general principle of this matching algorithm is that it scans the
78    subject string only once, without backtracking, there is one exception: when a
79    lookaround assertion is encountered, the characters following or preceding the
80    current point have to be independently inspected.
81    .P
82  The scan continues until either the end of the subject is reached, or there are  The scan continues until either the end of the subject is reached, or there are
83  no more unterminated paths. At this point, terminated paths represent the  no more unterminated paths. At this point, terminated paths represent the
84  different matching possibilities (if there are none, the match has failed).  different matching possibilities (if there are none, the match has failed).
85  Thus, if there is more than one possible match, this algorithm finds all of  Thus, if there is more than one possible match, this algorithm finds all of
86  them, and in particular, it finds the longest. In PCRE, there is an option to  them, and in particular, it finds the longest. There is an option to stop the
87  stop the algorithm after the first match (which is necessarily the shortest)  algorithm after the first match (which is necessarily the shortest) is found.
has been found.
88  .P  .P
89  Note that all the matches that are found start at the same point in the  Note that all the matches that are found start at the same point in the
90  subject. If the pattern  subject. If the pattern
# Line 92  the three strings "cat", "cater", and "c Line 96  the three strings "cat", "cater", and "c
96  character of the subject. The algorithm does not automatically move on to find  character of the subject. The algorithm does not automatically move on to find
97  matches that start at later positions.  matches that start at later positions.
98  .P  .P
Although the general principle of this matching algorithm is that it scans the
subject string only once, without backtracking, there is one exception: when a
lookbehind assertion is encountered, the preceding characters have to be
re-inspected.
.P
99  There are a number of features of PCRE regular expressions that are not  There are a number of features of PCRE regular expressions that are not
100  supported by the alternative matching algorithm. They are as follows:  supported by the alternative matching algorithm. They are as follows:
101  .P  .P
# Line 152  callouts. Line 151  callouts.
151  2. Because the alternative algorithm scans the subject string just once, and  2. Because the alternative algorithm scans the subject string just once, and
152  never needs to backtrack, it is possible to pass very long subject strings to  never needs to backtrack, it is possible to pass very long subject strings to
153  the matching function in several pieces, checking for partial matching each  the matching function in several pieces, checking for partial matching each
154  time.  time. The
155    .\" HREF
156    \fBpcrepartial\fP
157    .\"
158    documentation gives details of partial matching.
159    .
160  .  .
161  .SH "DISADVANTAGES OF THE ALTERNATIVE ALGORITHM"  .SH "DISADVANTAGES OF THE ALTERNATIVE ALGORITHM"
162  .rs  .rs
# Line 183  Cambridge CB2 3QH, England. Line 187  Cambridge CB2 3QH, England.
187  .rs  .rs
188  .sp  .sp
189  .nf  .nf
190  Last updated: 05 September 2009  Last updated: 29 September 2009
191  Copyright (c) 1997-2009 University of Cambridge.  Copyright (c) 1997-2009 University of Cambridge.
192  .fi  .fi

Legend:
 Removed from v.455 changed lines Added in v.456