1090 |
|
|
1091 |
(?<=ab(c|de)) |
(?<=ab(c|de)) |
1092 |
|
|
1093 |
is not permitted, because its single branch can match two different lengths, |
is not permitted, because its single top-level branch can match two different |
1094 |
but it is acceptable if rewritten to use two branches: |
lengths, but it is acceptable if rewritten to use two top-level branches: |
1095 |
|
|
1096 |
(?<=abc|abde) |
(?<=abc|abde) |
1097 |
|
|
1098 |
The implementation of lookbehind assertions is, for each alternative, to |
The implementation of lookbehind assertions is, for each alternative, to |
1099 |
temporarily move the current position back by the fixed width and then try to |
temporarily move the current position back by the fixed width and then try to |
1100 |
match. If there are insufficient characters before the current position, the |
match. If there are insufficient characters before the current position, the |
1101 |
match is deemed to fail. |
match is deemed to fail. Lookbehinds in conjunction with once-only subpatterns |
1102 |
|
can be particularly useful for matching at the ends of strings; an example is |
1103 |
|
given at the end of the section on once-only subpatterns. |
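
The stepping-back procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PCRE's actual code: the function name `lookbehind_matches` is hypothetical, and each top-level alternative is assumed to be a plain literal string so that its fixed width is simply its length.

```python
def lookbehind_matches(subject, pos, alternatives):
    """Return True if any fixed-length alternative ends exactly at `pos`.

    `alternatives` is a hypothetical list of literal strings, one per
    top-level branch, e.g. ["abc", "abde"] for (?<=abc|abde).
    """
    for alt in alternatives:
        start = pos - len(alt)   # move back by this branch's fixed width
        if start < 0:            # insufficient characters before pos:
            continue             # this branch is deemed to fail
        if subject[start:pos] == alt:
            return True
    return False
```

Because each branch is tried with its own width, "abc" and "abde" can share one assertion even though their lengths differ.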
1104 |
|
|
1105 |
Assertions can be nested in any combination. For example, |
Several assertions (of any sort) may occur in succession. For example, |
1106 |
|
|
1107 |
|
(?<=\\d{3})(?<!999)foo |
1108 |
|
|
1109 |
|
matches "foo" preceded by three digits that are not "999". Furthermore, |
1110 |
|
assertions can be nested in any combination. For example, |
1111 |
|
|
1112 |
(?<=(?<!foo)bar)baz |
(?<=(?<!foo)bar)baz |
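
The two patterns above can be tried out with Python's `re` module, which accepts the same lookbehind syntax for these fixed-width cases. This is a small sketch for experimentation only; the helper name `found` is hypothetical, and Python's engine is assumed to behave like PCRE for these particular patterns.

```python
import re

# Successive assertions: three digits before "foo", and those digits are not "999".
successive = re.compile(r"(?<=\d{3})(?<!999)foo")

# Nested assertions: "baz" preceded by "bar", which in turn is not preceded by "foo".
nested = re.compile(r"(?<=(?<!foo)bar)baz")

def found(pattern, subject):
    """Report whether the compiled pattern matches anywhere in subject."""
    return pattern.search(subject) is not None
```

For instance, `found(successive, "123foo")` can be compared with `found(successive, "999foo")`, and `found(nested, "xyzbarbaz")` with `found(nested, "foobarbaz")`.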
1113 |
|
|
1164 |
This construction can of course contain arbitrarily complicated subpatterns, |
This construction can of course contain arbitrarily complicated subpatterns, |
1165 |
and it can be nested. |
and it can be nested. |
1166 |
|
|
1167 |
|
Once-only subpatterns can be used in conjunction with lookbehind assertions to |
1168 |
|
specify efficient matching at the end of the subject string. Consider a simple |
1169 |
|
pattern such as |
1170 |
|
|
1171 |
|
abcd$ |
1172 |
|
|
1173 |
|
when applied to a long string which does not match it. Because matching |
1174 |
|
proceeds from left to right, PCRE will look for each "a" in the subject and |
1175 |
|
then see if what follows matches the rest of the pattern. If the pattern is |
1176 |
|
specified as |
1177 |
|
|
1178 |
|
.*abcd$ |
1179 |
|
|
1180 |
|
then the initial .* matches the entire string at first, but when this fails, it |
1181 |
|
backtracks to match all but the last character, then all but the last two |
1182 |
|
characters, and so on. Once again the search for "a" covers the entire string, |
1183 |
|
from right to left, so we are no better off. However, if the pattern is written |
1184 |
|
as |
1185 |
|
|
1186 |
|
(?>.*)(?<=abcd) |
1187 |
|
|
1188 |
|
then there can be no backtracking for the .* item; it can match only the entire |
1189 |
|
string. The subsequent lookbehind assertion does a single test on the last four |
1190 |
|
characters. If it fails, the match fails immediately. For long strings, this |
1191 |
|
approach makes a significant difference to the processing time. |
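
As a rough illustration of the final pattern, the following Python sketch tests whether a subject ends in "abcd". Several assumptions apply: Python's `re` module only accepts the `(?>...)` syntax from version 3.11, so on older versions the sketch falls back to a common emulation (a lookahead that captures `.*`, followed by a backreference); the helper name `ends_with_abcd` is hypothetical; and subjects are assumed to contain no newlines, since `.` does not match a newline by default.

```python
import re

try:
    # Once-only (atomic) group followed by a lookbehind, as in the text.
    END_ABCD = re.compile(r"(?>.*)(?<=abcd)")
except re.error:
    # Fallback for Python versions without (?>...): the lookahead grabs the
    # whole remainder once, and \1 consumes it without giving anything back.
    END_ABCD = re.compile(r"(?=(.*))\1(?<=abcd)")

def ends_with_abcd(subject):
    """Anchor at the start, swallow the string, then check the last four characters."""
    return END_ABCD.match(subject) is not None
```

The sketch uses `match` rather than `search` so that only one attempt is made, at the start of the subject, which mirrors the single lookbehind test described above.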
1192 |
|
|
1193 |
|
|
1194 |
.SH CONDITIONAL SUBPATTERNS |
.SH CONDITIONAL SUBPATTERNS |
1195 |
It is possible to cause the matching process to obey a subpattern |
It is possible to cause the matching process to obey a subpattern |
1269 |
.br |
.br |
1270 |
Phone: +44 1223 334714 |
Phone: +44 1223 334714 |
1271 |
|
|
1272 |
Copyright (c) 1998 University of Cambridge. |
Copyright (c) 1997-1999 University of Cambridge. |