165 |
</P> |
</P> |
166 |
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br> |
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHING AND WORD BOUNDARIES</a><br> |
167 |
<P> |
<P> |
168 |
If a pattern ends with one of sequences \w or \W, which test for word |
If a pattern ends with one of sequences \b or \B, which test for word |
169 |
boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive |
boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive |
170 |
results. Consider this pattern: |
results. Consider this pattern: |
171 |
<pre> |
<pre> |
269 |
data> The date is 23ja\P |
data> The date is 23ja\P |
270 |
Partial match: 23ja |
Partial match: 23ja |
271 |
</pre> |
</pre> |
272 |
The this stage, an application could discard the text preceding "23ja", add on |
At this stage, an application could discard the text preceding "23ja", add on |
273 |
text from the next segment, and call <b>pcre_exec()</b> again. Unlike |
text from the next segment, and call <b>pcre_exec()</b> again. Unlike |
274 |
<b>pcre_dfa_exec()</b>, the entire matching string must always be available, and |
<b>pcre_dfa_exec()</b>, the entire matching string must always be available, and |
275 |
the complete matching process occurs for each call, so more memory and more |
the complete matching process occurs for each call, so more memory and more |
347 |
<P> |
<P> |
348 |
4. Patterns that contain alternatives at the top level which do not all |
4. Patterns that contain alternatives at the top level which do not all |
349 |
start with the same pattern item may not work as expected when |
start with the same pattern item may not work as expected when |
350 |
<b>pcre_dfa_exec()</b> is used. For example, consider this pattern: |
PCRE_DFA_RESTART is used with <b>pcre_dfa_exec()</b>. For example, consider this |
351 |
|
pattern: |
352 |
<pre> |
<pre> |
353 |
1234|3789 |
1234|3789 |
354 |
</pre> |
</pre> |
364 |
1234|ABCD |
1234|ABCD |
365 |
</pre> |
</pre> |
366 |
where no string can be a partial match for both alternatives. This is not a |
where no string can be a partial match for both alternatives. This is not a |
367 |
problem if \fPpcre_exec()\fP is used, because the entire match has to be rerun |
problem if <b>pcre_exec()</b> is used, because the entire match has to be rerun |
368 |
each time: |
each time: |
369 |
<pre> |
<pre> |
370 |
re> /1234|3789/ |
re> /1234|3789/ |
372 |
Partial match: 123 |
Partial match: 123 |
373 |
data> 1237890 |
data> 1237890 |
374 |
0: 3789 |
0: 3789 |
375 |
|
</pre> |
376 |
</PRE> |
Of course, instead of using PCRE_DFA_RESTART, the same technique of re-running |
377 |
|
the entire match can also be used with <b>pcre_dfa_exec()</b>. Another |
378 |
|
possibility is to work with two buffers. If a partial match at offset <i>n</i> |
379 |
|
in the first buffer is followed by "no match" when PCRE_DFA_RESTART is used on |
380 |
|
the second buffer, you can then try a new match starting at offset <i>n+1</i> in |
381 |
|
the first buffer. |
382 |
</P> |
</P> |
383 |
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br> |
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br> |
384 |
<P> |
<P> |
391 |
</P> |
</P> |
392 |
<br><a name="SEC11" href="#TOC1">REVISION</a><br> |
<br><a name="SEC11" href="#TOC1">REVISION</a><br> |
393 |
<P> |
<P> |
394 |
Last updated: 29 September 2009 |
Last updated: 19 October 2009 |
395 |
<br> |
<br> |
396 |
Copyright © 1997-2009 University of Cambridge. |
Copyright © 1997-2009 University of Cambridge. |
397 |
<br> |
<br> |