
Diff of /code/trunk/doc/pcrepartial.3


revision 931 by ph10, Sat Feb 18 18:45:55 2012 UTC vs. revision 932 by ph10, Fri Feb 24 18:54:43 2012 UTC
# Line 302 (v.931) / Line 302 (v.932): treat the end of a segment as the end of
 .sp
 At this stage, an application could discard the text preceding "23ja", add on
 text from the next segment, and call the matching function again. Unlike the
-DFA matching functions the entire matching string must always be available, and
-the complete matching process occurs for each call, so more memory and more
+DFA matching functions, the entire matching string must always be available,
+and the complete matching process occurs for each call, so more memory and more
 processing time is needed.
 .P
 \fBNote:\fP If the pattern contains lookbehind assertions, or \eK, or starts
 with \eb or \eB, the string that is returned for a partial match includes
 characters that precede the partially matched string itself, because these must
 be retained when adding on more characters for a subsequent matching attempt.
+However, in some cases you may need to retain even earlier characters, as
+discussed in the next section.
 .
 .
 .SH "ISSUES WITH MULTI-SEGMENT MATCHING"
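The first paragraph of this hunk describes the restart procedure for the non-DFA matcher: discard the text preceding the partial match, add on the next segment, and call the matching function again. Below is a minimal sketch of the buffer handling only, assuming the subject is kept in one fixed-size byte buffer. `shift_and_append` is a hypothetical helper, not part of the PCRE API, and the matching call itself is left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the PCRE API).
 *
 * buf holds *len bytes of subject text and has room for cap bytes.
 * keep_from is the offset of the first byte that must survive into the
 * next matching attempt, normally the start offset reported for the
 * partial match. Bytes before keep_from are discarded, the kept tail is
 * moved to the front of buf, and the next segment is appended after it.
 * The result is not NUL-terminated; pcre_exec() is given an explicit
 * subject length.
 *
 * Returns 0 on success, or -1 if keep_from is out of range or the
 * combined text would not fit.
 */
static int shift_and_append(char *buf, size_t *len, size_t cap,
                            size_t keep_from,
                            const char *seg, size_t seg_len)
{
    size_t kept;

    if (keep_from > *len)
        return -1;
    kept = *len - keep_from;
    if (kept > cap || seg_len > cap - kept)
        return -1;

    /* The kept tail and its destination may overlap, so use memmove. */
    memmove(buf, buf + keep_from, kept);
    memcpy(buf + kept, seg, seg_len);
    *len = kept + seg_len;
    return 0;
}
```

For example, if buf holds "The date is 23ja" and the partial match starts at offset 12, shift_and_append(buf, &len, sizeof buf, 12, "n06", 3) leaves the 7 bytes "23jan06" at the front of buf, ready to be passed, with its explicit length, to the next matching call. The next segment text here is purely illustrative.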
# Line 324 (v.931) / Line 326 (v.932): beginning of a line. There is also a PCR
 doing multi-segment matching you should be using PCRE_PARTIAL_HARD, which
 includes the effect of PCRE_NOTEOL.
 .P
-2. Lookbehind assertions at the start of a pattern are catered for in the
-offsets that are returned for a partial match. However, in theory, a lookbehind
-assertion later in the pattern could require even earlier characters to be
-inspected, and it might not have been reached when a partial match occurs. This
-is probably an extremely unlikely case; you could guard against it to a certain
-extent by always including extra characters at the start.
+2. Lookbehind assertions that have already been obeyed are catered for in the
+offsets that are returned for a partial match. However a lookbehind assertion
+later in the pattern could require even earlier characters to be inspected. You
+can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the
+\fBpcre_fullinfo()\fP or \fBpcre16_fullinfo()\fP functions to obtain the length
+of the largest lookbehind in the pattern. This length is given in characters,
+not bytes. If you always retain at least that many characters before the
+partially matched string, all should be well. (Of course, near the start of the
+subject, fewer characters may be present; in that case all characters should be
+retained.)
 .P
-3. Matching a subject string that is split into multiple segments may not
+3. Because a partial match must always contain at least one character, what
+might be considered a partial match of an empty string actually gives a "no
+match" result. For example:
+.sp
+    re> /c(?<=abc)x/
+  data> ab\eP
+  No match
+.sp
+If the next segment begins "cx", a match should be found, but this will only
+happen if characters from the previous segment are retained. For this reason, a
+"no match" result should be interpreted as "partial match of an empty string"
+when the pattern contains lookbehinds.
+.P
+4. Matching a subject string that is split into multiple segments may not
 always produce exactly the same result as matching over one single long string,
 especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and
 Word Boundaries" above describes an issue that arises if the pattern ends with
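Items 2 and 3 added in this revision combine into one retention rule: keep at least the maximum lookbehind length, counted in characters, before a partial match, and treat "no match" from a pattern with lookbehinds as a partial match of an empty string at the end of the segment. Below is a sketch of that rule in C, assuming a UTF-8 subject. `back_utf8_chars`, `retain_from` and `enum seg_result` are hypothetical names, and the maximum lookbehind is passed in as a parameter instead of being obtained from \fBpcre_fullinfo()\fP with PCRE_INFO_MAXLOOKBEHIND, so the code compiles without libpcre.

```c
#include <stddef.h>

/* Hypothetical helper: step back up to nchars UTF-8 characters from byte
   offset 'from', never going before the start of the subject. Because the
   lookbehind length is in characters, not bytes, continuation bytes
   (10xxxxxx) are skipped so that each step lands on a lead byte. In
   non-UTF 8-bit mode one character is one byte and plain subtraction
   would do; 16-bit (pcre16) subjects are not handled here. */
static size_t back_utf8_chars(const unsigned char *subject, size_t from,
                              int nchars)
{
    size_t off = from;

    while (nchars > 0 && off > 0) {
        off--;
        while (off > 0 && (subject[off] & 0xC0) == 0x80)
            off--;
        nchars--;
    }
    return off;
}

/* Hypothetical result type for the last matching attempt on a segment. */
enum seg_result { SEG_PARTIAL, SEG_NOMATCH };

/* Hypothetical policy helper: return the offset in the current segment
   from which text should be carried into the next matching attempt.
   max_lookbehind is assumed to be the value PCRE_INFO_MAXLOOKBEHIND
   reports for the pattern. */
static size_t retain_from(const unsigned char *subject, size_t seg_len,
                          enum seg_result result, size_t partial_start,
                          int max_lookbehind)
{
    size_t start;

    if (result == SEG_PARTIAL)
        start = partial_start;
    else if (max_lookbehind > 0)
        start = seg_len;   /* "no match": partial match of an empty string at the end */
    else
        return seg_len;    /* no lookbehinds: nothing needs to be kept */

    /* Near the start of the subject this keeps everything that is there. */
    return back_utf8_chars(subject, start, max_lookbehind);
}
```

With the pattern /c(?<=abc)x/ from item 3, whose lookbehind is three characters long, a "no match" on the segment "ab" gives retain_from(...) == 0, so both characters are carried over and a following segment beginning "cx" can complete the match.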
# Line 372 (v.931) / Line 391 (v.932): multi-segment data. The example above th
   data> gsb\eR\eP\eP\eD
   Partial match: gsb
 .sp
-4. Patterns that contain alternatives at the top level which do not all start
+5. Patterns that contain alternatives at the top level which do not all start
 with the same pattern item may not work as expected when PCRE_DFA_RESTART is
 used. For example, consider this pattern:
 .sp
# Line 421 (v.931) / Line 440 (v.932): Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 18 February 2012
+Last updated: 24 February 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi
