
Diff of /code/trunk/doc/pcrepartial.3

revision 578 by ph10, Wed Nov 17 17:55:57 2010 UTC
revision 579 by ph10, Wed Nov 24 17:39:25 2010 UTC
# Line 49  subject string is reached successfully,
more characters are needed. However, at least one character in the subject must
have been inspected. This character need not form part of the final matched
string; lookbehind assertions and the \eK escape sequence provide ways of
inspecting characters before the start of a matched substring. The requirement
for inspecting at least one character exists because an empty string can always
be matched; without such a restriction there would always be a partial match of
an empty string at the end of the subject.
.P
If there are at least two slots in the offsets vector when \fBpcre_exec()\fP
returns with a partial match, the first slot is set to the offset of the
earliest character that was inspected when the partial match was found. For
convenience, the second offset points to the end of the subject so that a
substring can easily be identified.
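The offset rules above can be modelled with a toy matcher for a single literal pattern. This is a sketch only: toy_literal_exec() and the TOY_* result codes are hypothetical names, not part of the PCRE API, and the naive literal scan stands in for PCRE's real matcher. A complete match is reported if one exists; otherwise, if the subject ends part-way through the pattern, a partial match is reported with the first offset at the earliest inspected character and the second at the end of the subject. Because candidate start positions stop short of the end of the subject, at least one character is always inspected, so no empty partial match is ever reported at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch; they are not PCRE's values. */
enum toy_result { TOY_NOMATCH, TOY_MATCH, TOY_PARTIAL };

/* Naive search for the literal `pat` in `subject`.
   Complete match: ovector[0..1] = start and end of the match.
   Partial match (the subject ends part-way through `pat`):
   ovector[0] = earliest inspected character, ovector[1] = end of subject. */
static enum toy_result toy_literal_exec(const char *pat, const char *subject,
                                        size_t ovector[2])
{
    size_t plen = strlen(pat), slen = strlen(subject);
    size_t partial_start = slen;          /* slen means "none seen yet" */

    assert(plen > 0);
    /* Start positions stop before slen, so a partial match always
       inspects at least one subject character; an empty "match" at the
       very end of the subject is never reported. */
    for (size_t start = 0; start < slen; start++) {
        size_t k = 0;
        while (k < plen && start + k < slen && subject[start + k] == pat[k])
            k++;
        if (k == plen) {                  /* whole pattern matched */
            ovector[0] = start;
            ovector[1] = start + plen;
            return TOY_MATCH;
        }
        if (start + k == slen && partial_start == slen)
            partial_start = start;        /* ran off the end mid-pattern */
    }
    if (partial_start < slen) {
        ovector[0] = partial_start;
        ovector[1] = slen;
        return TOY_PARTIAL;
    }
    return TOY_NOMATCH;
}
```

With the pattern "abc", the subject "xxab" gives a partial match with offsets 2 and 4, whereas "xxabc" gives a complete match with offsets 2 and 5.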
.P
For the majority of patterns, the first offset identifies the start of the
partially matched string. However, for patterns that contain lookbehind
# Line 73  string is "xyzabc12", the offsets after
with extra characters added to the subject.
.P
What happens when a partial match is identified depends on which of the two
partial matching options is set.
.
.
.SS "PCRE_PARTIAL_SOFT with pcre_exec()"
# Line 84  the partial match is remembered, but mat
alternatives in the pattern are tried. If no complete match can be found,
\fBpcre_exec()\fP returns PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH.
.P
This option is "soft" because it prefers a complete match over a partial match.
All the various matching items in a pattern behave as if the subject string is
potentially complete. For example, \ez, \eZ, and $ match at the end of the
subject, as normal, and for \eb and \eB the end of the subject is treated as a
non-alphanumeric character.
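The treatment of the end of the subject can be sketched as a pair of assertion checks with a flag that selects soft or hard behaviour. The function names and the three-valued result are hypothetical and chosen for illustration; this is not how PCRE is organized internally. Word characters are taken to be letters, digits, and underscore. With hard set to 0, the checks follow the soft rules above. The non-zero branch models the PCRE_PARTIAL_HARD rule, under which these items yield a partial result at the end of the subject.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical three-valued outcome for this sketch; not a PCRE type. */
enum assert_result { ASSERT_FAIL, ASSERT_OK, ASSERT_PARTIAL };

/* Word characters: letters, digits and underscore. */
static int is_word(unsigned char c) { return isalnum(c) || c == '_'; }

/* \b at offset pos in subject s of length len.  Soft (hard == 0): the end
   of the subject counts as a non-word character.  Hard (hard != 0): the
   end of the subject may not be the end of the data, so the outcome is
   partial. */
static enum assert_result word_boundary_at(const char *s, size_t len,
                                           size_t pos, int hard)
{
    assert(pos <= len);
    if (pos == len && hard)
        return ASSERT_PARTIAL;
    int before = pos > 0 && is_word((unsigned char)s[pos - 1]);
    int after = pos < len && is_word((unsigned char)s[pos]);
    return before != after ? ASSERT_OK : ASSERT_FAIL;
}

/* \z at offset pos: succeeds at the end of the subject when soft,
   becomes partial there when hard, and fails anywhere else. */
static enum assert_result end_of_subject_at(size_t len, size_t pos, int hard)
{
    assert(pos <= len);
    if (pos < len)
        return ASSERT_FAIL;
    return hard ? ASSERT_PARTIAL : ASSERT_OK;
}
```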
.P
If there is more than one partial match, the first one that was found provides
# Line 108  matches the second alternative.)
.sp
If PCRE_PARTIAL_HARD is set for \fBpcre_exec()\fP, it returns
PCRE_ERROR_PARTIAL as soon as a partial match is found, without continuing to
search for possible complete matches. This option is "hard" because it prefers
an earlier partial match over a later complete match. For this reason, the
assumption is made that the end of the supplied subject string may not be the
true end of the available data, and so, if \ez, \eZ, \eb, \eB, or $ are
encountered at the end of the subject, the result is PCRE_ERROR_PARTIAL.
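The different decision order of the two options can be sketched with a toy matcher that tries a list of literal alternatives, in order, at each starting position. toy_alt_exec() and its result codes are hypothetical, and literal alternatives stand in for a compiled pattern; only the order of decisions is meant to mirror the text. In soft mode the first partial match is remembered while the search continues for a complete match. In hard mode the first partial match ends the search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch; they are not PCRE's values. */
enum toy_result { TOY_NOMATCH, TOY_MATCH, TOY_PARTIAL };

/* Try each literal alternative in turn at each start position.
   hard == 0 models PCRE_PARTIAL_SOFT: remember the first partial match
   but keep looking for a complete one.  hard != 0 models
   PCRE_PARTIAL_HARD: return as soon as any partial match is found. */
static enum toy_result toy_alt_exec(const char *const *alts, size_t nalts,
                                    const char *subject, int hard,
                                    size_t ovector[2])
{
    size_t slen = strlen(subject);
    int have_partial = 0;

    for (size_t start = 0; start < slen; start++) {
        for (size_t a = 0; a < nalts; a++) {
            size_t plen = strlen(alts[a]), k = 0;
            assert(plen > 0);
            while (k < plen && start + k < slen &&
                   subject[start + k] == alts[a][k])
                k++;
            if (k == plen) {              /* complete match */
                ovector[0] = start;
                ovector[1] = start + plen;
                return TOY_MATCH;
            }
            if (start + k == slen && !have_partial) {
                have_partial = 1;         /* first partial match found */
                ovector[0] = start;
                ovector[1] = slen;
                if (hard)
                    return TOY_PARTIAL;
            }
        }
    }
    return have_partial ? TOY_PARTIAL : TOY_NOMATCH;
}
```

With the alternatives "dogsbody" and "dog" and the subject "dogsb", soft mode goes on to find the complete match "dog" (offsets 0 and 3). Hard mode stops at the partial match of the first alternative (offsets 0 and 5).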
.P
Setting PCRE_PARTIAL_HARD also affects the way \fBpcre_exec()\fP checks UTF-8
subject strings for validity. Normally, an invalid UTF-8 sequence causes the
error PCRE_ERROR_BADUTF8. However, in the special case of a truncated UTF-8
character at the end of the subject, PCRE_ERROR_SHORTUTF8 is returned when
PCRE_PARTIAL_HARD is set.
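The distinction between the two UTF-8 errors can be sketched as a validator whose verdict changes only for a character cut off by the end of the buffer. check_utf8() and the UTF8_* codes are hypothetical names, not PCRE functions or error values. The check is deliberately simplified: it examines only lead and continuation bytes, so unlike a full validator it does not reject overlong encodings, surrogates, or code points above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, named after (not equal to)
   PCRE_ERROR_BADUTF8 and PCRE_ERROR_SHORTUTF8. */
enum utf8_status { UTF8_OK, UTF8_BAD, UTF8_SHORT };

/* Simplified UTF-8 check of s[0..len).  A character truncated by the end
   of the buffer is UTF8_SHORT when partial_hard is set and UTF8_BAD
   otherwise; every other problem this check detects is UTF8_BAD. */
static enum utf8_status check_utf8(const unsigned char *s, size_t len,
                                   int partial_hard)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;                      /* continuation bytes expected */

        if (c < 0x80)
            need = 0;
        else if ((c & 0xE0) == 0xC0)
            need = 1;
        else if ((c & 0xF0) == 0xE0)
            need = 2;
        else if ((c & 0xF8) == 0xF0)
            need = 3;
        else
            return UTF8_BAD;              /* stray continuation or bad lead byte */

        for (size_t j = 1; j <= need; j++) {
            if (i + j >= len)             /* data ends mid-character */
                return partial_hard ? UTF8_SHORT : UTF8_BAD;
            if ((s[i + j] & 0xC0) != 0x80)
                return UTF8_BAD;          /* not a continuation byte */
        }
        i += need + 1;
    }
    return UTF8_OK;
}
```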
.
.
# Line 280  From release 8.00, \fBpcre_exec()\fP can
matching. Unlike \fBpcre_dfa_exec()\fP, it is not possible to restart the
previous match with a new segment of data. Instead, new data must be added to
the previous subject string, and the entire match re-run, starting from the
point where the partial match occurred. Earlier data can be discarded. It is
best to use PCRE_PARTIAL_HARD in this situation, because it does not treat the
end of a segment as the end of the subject when matching \ez, \eZ, \eb, \eB,
and $. Consider an unanchored pattern that matches dates:
.sp
# Line 309  whichever matching function is used.
.P
1. If the pattern contains a test for the beginning of a line, you need to pass
the PCRE_NOTBOL option when the subject string for any call does not start at
the beginning of a line. There is also a PCRE_NOTEOL option, but in practice when
doing multi-segment matching you should be using PCRE_PARTIAL_HARD, which
includes the effect of PCRE_NOTEOL.
.P
2. Lookbehind assertions at the start of a pattern are catered for in the
