41 |
.SH "THE STANDARD MATCHING ALGORITHM" |
.SH "THE STANDARD MATCHING ALGORITHM" |
42 |
.rs |
.rs |
43 |
.sp |
.sp |
44 |
In the terminology of Jeffrey Friedl's book \fIMastering Regular |
In the terminology of Jeffrey Friedl's book "Mastering Regular |
45 |
Expressions\fP, the standard algorithm is an "NFA algorithm". It conducts a |
Expressions", the standard algorithm is an "NFA algorithm". It conducts a |
46 |
depth-first search of the pattern tree. That is, it proceeds along a single |
depth-first search of the pattern tree. That is, it proceeds along a single |
47 |
path through the tree, checking that the subject matches what is required. When |
path through the tree, checking that the subject matches what is required. When |
48 |
there is a mismatch, the algorithm tries any alternatives at the current point, |
there is a mismatch, the algorithm tries any alternatives at the current point, |
119 |
4. For the same reason, conditional expressions that use a backreference as the |
4. For the same reason, conditional expressions that use a backreference as the |
120 |
condition or test for a specific group recursion are not supported. |
condition or test for a specific group recursion are not supported. |
121 |
.P |
.P |
122 |
5. Callouts are supported, but the value of the \fIcapture_top\fP field is |
5. Because many paths through the tree may be active, the \eK escape sequence, |
123 |
|
which resets the start of the match when encountered (but may be on some paths |
124 |
|
and not on others), is not supported. It causes an error if encountered. |
125 |
|
.P |
126 |
|
6. Callouts are supported, but the value of the \fIcapture_top\fP field is |
127 |
always 1, and the value of the \fIcapture_last\fP field is always -1. |
always 1, and the value of the \fIcapture_last\fP field is always -1. |
128 |
.P |
.P |
129 |
6. |
7. |
130 |
The \eC escape sequence, which (in the standard algorithm) matches a single |
The \eC escape sequence, which (in the standard algorithm) matches a single |
131 |
byte, even in UTF-8 mode, is not supported because the alternative algorithm |
byte, even in UTF-8 mode, is not supported because the alternative algorithm |
132 |
moves through the subject string one character at a time, for all active paths |
moves through the subject string one character at a time, for all active paths |
165 |
.P |
.P |
166 |
3. Although atomic groups are supported, their use does not provide the |
3. Although atomic groups are supported, their use does not provide the |
167 |
performance advantage that it does for the standard algorithm. |
performance advantage that it does for the standard algorithm. |
168 |
.P |
. |
169 |
.in 0 |
. |
170 |
Last updated: 24 November 2006 |
.SH AUTHOR |
171 |
.br |
.rs |
172 |
Copyright (c) 1997-2006 University of Cambridge. |
.sp |
173 |
|
.nf |
174 |
|
Philip Hazel |
175 |
|
University Computing Service |
176 |
|
Cambridge CB2 3QH, England. |
177 |
|
.fi |
178 |
|
. |
179 |
|
. |
180 |
|
.SH REVISION |
181 |
|
.rs |
182 |
|
.sp |
183 |
|
.nf |
184 |
|
Last updated: 29 May 2007 |
185 |
|
Copyright (c) 1997-2007 University of Cambridge. |
186 |
|
.fi |