
Diff of /code/trunk/doc/html/pcrepartial.html


revision 953 by ph10, Fri Feb 24 12:05:54 2012 UTC  |  revision 954 by ph10, Sat Mar 31 18:09:26 2012 UTC
# Line 327  treat the end of a segment as the end of Line 327  treat the end of a segment as the end of
327  </pre>  </pre>
328  At this stage, an application could discard the text preceding "23ja", add on  At this stage, an application could discard the text preceding "23ja", add on
329  text from the next segment, and call the matching function again. Unlike the  text from the next segment, and call the matching function again. Unlike the
330  DFA matching functions the entire matching string must always be available, and  DFA matching functions, the entire matching string must always be available,
331  the complete matching process occurs for each call, so more memory and more  and the complete matching process occurs for each call, so more memory and more
332  processing time is needed.  processing time is needed.
333  </P>  </P>
334  <P>  <P>
# Line 336  processing time is needed. Line 336  processing time is needed.
336  with \b or \B, the string that is returned for a partial match includes  with \b or \B, the string that is returned for a partial match includes
337  characters that precede the partially matched string itself, because these must  characters that precede the partially matched string itself, because these must
338  be retained when adding on more characters for a subsequent matching attempt.  be retained when adding on more characters for a subsequent matching attempt.
339    However, in some cases you may need to retain even earlier characters, as
340    discussed in the next section.
341  </P>  </P>
342  <br><a name="SEC9" href="#TOC1">ISSUES WITH MULTI-SEGMENT MATCHING</a><br>  <br><a name="SEC9" href="#TOC1">ISSUES WITH MULTI-SEGMENT MATCHING</a><br>
343  <P>  <P>
# Line 350  doing multi-segment matching you should Line 352  doing multi-segment matching you should
352  includes the effect of PCRE_NOTEOL.  includes the effect of PCRE_NOTEOL.
353  </P>  </P>
354  <P>  <P>
355  2. Lookbehind assertions at the start of a pattern are catered for in the  2. Lookbehind assertions that have already been obeyed are catered for in the
356  offsets that are returned for a partial match. However, in theory, a lookbehind  offsets that are returned for a partial match. However a lookbehind assertion
357  assertion later in the pattern could require even earlier characters to be  later in the pattern could require even earlier characters to be inspected. You
358  inspected, and it might not have been reached when a partial match occurs. This  can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
359  is probably an extremely unlikely case; you could guard against it to a certain  <b>pcre_fullinfo()</b> or <b>pcre16_fullinfo()</b> functions to obtain the length
360  extent by always including extra characters at the start.  of the largest lookbehind in the pattern. This length is given in characters,
361    not bytes. If you always retain at least that many characters before the
362    partially matched string, all should be well. (Of course, near the start of the
363    subject, fewer characters may be present; in that case all characters should be
364    retained.)
365  </P>  </P>
366  <P>  <P>
367  3. Matching a subject string that is split into multiple segments may not  3. Because a partial match must always contain at least one character, what
368    might be considered a partial match of an empty string actually gives a "no
369    match" result. For example:
370    <pre>
371        re&#62; /c(?&#60;=abc)x/
372      data&#62; ab\P
373      No match
374    </pre>
375    If the next segment begins "cx", a match should be found, but this will only
376    happen if characters from the previous segment are retained. For this reason, a
377    "no match" result should be interpreted as "partial match of an empty string"
378    when the pattern contains lookbehinds.
379    </P>
380    <P>
381    4. Matching a subject string that is split into multiple segments may not
382  always produce exactly the same result as matching over one single long string,  always produce exactly the same result as matching over one single long string,
383  especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and  especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and
384  Word Boundaries" above describes an issue that arises if the pattern ends with  Word Boundaries" above describes an issue that arises if the pattern ends with
# Line 400  multi-segment data. The example above th Line 420  multi-segment data. The example above th
420    data&#62; gsb\R\P\P\D    data&#62; gsb\R\P\P\D
421    Partial match: gsb    Partial match: gsb
422  </pre>  </pre>
423  4. Patterns that contain alternatives at the top level which do not all start  5. Patterns that contain alternatives at the top level which do not all start
424  with the same pattern item may not work as expected when PCRE_DFA_RESTART is  with the same pattern item may not work as expected when PCRE_DFA_RESTART is
425  used. For example, consider this pattern:  used. For example, consider this pattern:
426  <pre>  <pre>
# Line 445  Cambridge CB2 3QH, England. Line 465  Cambridge CB2 3QH, England.
465  </P>  </P>
466  <br><a name="SEC11" href="#TOC1">REVISION</a><br>  <br><a name="SEC11" href="#TOC1">REVISION</a><br>
467  <P>  <P>
468  Last updated: 18 February 2012  Last updated: 24 February 2012
469  <br>  <br>
470  Copyright &copy; 1997-2012 University of Cambridge.  Copyright &copy; 1997-2012 University of Cambridge.
471  <br>  <br>
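
Note on item 2 of the revised text: the new wording advises retaining at least PCRE_INFO_MAXLOOKBEHIND characters before the partially matched string, counted in characters rather than bytes, and retaining everything when fewer characters are available. The following is a minimal sketch of that bookkeeping, not code from PCRE itself. The helper name `retain_offset` and its `utf8` flag are hypothetical; the sketch assumes an 8-bit subject buffer, and that `max_lookbehind` was obtained beforehand from pcre_fullinfo() with PCRE_INFO_MAXLOOKBEHIND.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
 *
 * Given the byte offset at which a partial match starts, step back
 * max_lookbehind characters and return the byte offset from which the
 * subject should be kept for the next matching attempt.
 *
 * If utf8 is nonzero the subject is assumed to be valid UTF-8, so one
 * character may occupy several bytes; otherwise one character is one byte.
 * Near the start of the subject fewer characters exist, in which case the
 * result is 0 (keep everything), as the documentation recommends.
 */
size_t retain_offset(const char *subject, size_t partial_start,
                     int max_lookbehind, int utf8)
{
    size_t pos = partial_start;

    assert(subject != NULL);

    while (max_lookbehind-- > 0 && pos > 0) {
        pos--;                      /* move back one byte */
        if (utf8) {
            /* Skip UTF-8 continuation bytes (bit pattern 10xxxxxx) so that
               pos lands on the first byte of the previous character. */
            while (pos > 0 && ((unsigned char)subject[pos] & 0xC0) == 0x80)
                pos--;
        }
    }
    return pos;
}
```

For the pattern /c(?&#60;=abc)x/ from item 3, the largest lookbehind is three characters. If a segment "ab" yields "No match" and the application treats that as a partial match of an empty string at offset 2, the helper returns 0, so the whole of "ab" is carried over and prepended to the next segment. Whether to treat "No match" this way is the application's decision, as item 3 describes.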
