
Diff of /code/trunk/doc/html/pcrepartial.html


revision 578 by ph10, Wed Nov 17 17:55:57 2010 UTC vs. revision 579 by ph10, Wed Nov 24 17:39:25 2010 UTC
# Line 73  subject string is reached successfully,
73  more characters are needed. However, at least one character in the subject must
74  have been inspected. This character need not form part of the final matched
75  string; lookbehind assertions and the \K escape sequence provide ways of
76  inspecting characters before the start of a matched substring. The requirement
77  for inspecting at least one character exists because an empty string can always
78  be matched; without such a restriction there would always be a partial match of
79  an empty string at the end of the subject.
80  </P>
81  <P>
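The rule that at least one subject character must be inspected can be illustrated without PCRE. The sketch below is a hypothetical toy that handles only a fixed literal pattern, not a regular expression; `toy_match` and the `TOY_*` codes are illustrative names, not part of the PCRE API. It reports a partial match only when a non-empty tail of the subject is a prefix of the pattern, so the empty string at the end of the subject never qualifies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this toy; they are not PCRE's values. */
enum toy_result { TOY_NOMATCH, TOY_MATCH, TOY_PARTIAL };

/* Toy matcher for a fixed literal pattern (not a regex engine).
   On TOY_MATCH or TOY_PARTIAL, *start receives the offset of the
   earliest subject character that took part in the (partial) match. */
static enum toy_result toy_match(const char *pattern, const char *subject,
                                 size_t *start)
{
    size_t plen = strlen(pattern);
    size_t slen = strlen(subject);

    /* A complete match anywhere in the subject is preferred. */
    const char *hit = strstr(subject, pattern);
    if (hit != NULL) {
        *start = (size_t)(hit - subject);
        return TOY_MATCH;
    }

    /* Otherwise look for the longest non-empty tail of the subject that is
       a prefix of the pattern (that is, the earliest-starting partial match).
       The loop stops at a tail length of 1, so an empty tail, which would
       always "match", is never reported. plen >= 1 here, because an empty
       pattern was already matched by strstr() above. */
    for (size_t tail = (slen < plen ? slen : plen - 1); tail >= 1; tail--) {
        if (memcmp(subject + slen - tail, pattern, tail) == 0) {
            *start = slen - tail;
            return TOY_PARTIAL;
        }
    }
    return TOY_NOMATCH;
}
```

For example, with the pattern `abc`, the subject `xxab` gives a partial match starting at offset 2, while `xxx` gives no match at all, even though an empty string trivially matches at its end.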
# Line 83  If there are at least two slots in the o
83  returns with a partial match, the first slot is set to the offset of the
84  earliest character that was inspected when the partial match was found. For
85  convenience, the second offset points to the end of the subject so that a
86  substring can easily be identified.
87  </P>
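Because the second offset is set to the end of the subject, the partially matched text is simply the slice between the two offsets. A minimal sketch, assuming `ovector` was filled by a `pcre_exec()` call that returned PCRE_ERROR_PARTIAL; `copy_partial` is a hypothetical helper name. As described above, the slice can include characters that were only inspected by a lookbehind assertion.

```c
#include <stddef.h>
#include <string.h>

/* Copies the partially matched substring described by ovector[0] and
   ovector[1] into buf and NUL-terminates it. Returns the number of bytes
   copied, or -1 if the offsets are inconsistent or buf is too small. */
static int copy_partial(const char *subject, int subject_length,
                        const int *ovector, char *buf, size_t bufsize)
{
    int start = ovector[0];   /* earliest character inspected */
    int end = ovector[1];     /* end of the subject for a partial match */

    if (start < 0 || end < start || end > subject_length)
        return -1;
    size_t n = (size_t)(end - start);
    if (n + 1 > bufsize)      /* leave room for the terminating NUL */
        return -1;
    memcpy(buf, subject + start, n);
    buf[n] = '\0';
    return (int)n;
}
```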
88  <P>
89  For the majority of patterns, the first offset identifies the start of the
# Line 100  with extra characters added to the subje
100  </P>
101  <P>
102  What happens when a partial match is identified depends on which of the two
103  partial matching options are set.
104  </P>
105  <br><b>
106  PCRE_PARTIAL_SOFT with pcre_exec()
# Line 112  alternatives in the pattern are tried. I
112  <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH.
113  </P>
114  <P>
115  This option is "soft" because it prefers a complete match over a partial match.
116  All the various matching items in a pattern behave as if the subject string is
117  potentially complete. For example, \z, \Z, and $ match at the end of the
118  subject, as normal, and for \b and \B the end of the subject is treated as a
119  non-alphanumeric.
120  </P>
121  <P>
# Line 137  PCRE_PARTIAL_HARD with pcre_exec()
137  <P>
138  If PCRE_PARTIAL_HARD is set for <b>pcre_exec()</b>, it returns
139  PCRE_ERROR_PARTIAL as soon as a partial match is found, without continuing to
140  search for possible complete matches. This option is "hard" because it prefers
141  an earlier partial match over a later complete match. For this reason, the
142  assumption is made that the end of the supplied subject string may not be the
143  true end of the available data, and so, if \z, \Z, \b, \B, or $ are
144  encountered at the end of the subject, the result is PCRE_ERROR_PARTIAL.
145  </P>
146  <P>
147  Setting PCRE_PARTIAL_HARD also affects the way <b>pcre_exec()</b> checks UTF-8
148  subject strings for validity. Normally, an invalid UTF-8 sequence causes the
149  error PCRE_ERROR_BADUTF8. However, in the special case of a truncated UTF-8
150  character at the end of the subject, PCRE_ERROR_SHORTUTF8 is returned when
151  PCRE_PARTIAL_HARD is set.
152  </P>
153  <br><b>
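When data arrives in segments, a multi-byte UTF-8 character can be split across a segment boundary, which is the truncated case that PCRE_ERROR_SHORTUTF8 reports. The hedged sketch below counts how many trailing bytes form an incomplete character, so an application could hold them back until more data arrives. `utf8_incomplete_tail` is a hypothetical helper, not a PCRE function, and it examines only lead and continuation byte patterns; full validation is still left to `pcre_exec()`.

```c
#include <stddef.h>

/* Returns the number of bytes at the end of buf[0..len) that start a UTF-8
   character whose continuation bytes are missing, or 0 if the buffer ends
   on a character boundary. Only lead/continuation byte patterns are
   examined; other invalid sequences are left to the regex library. */
static size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
    /* The longest UTF-8 sequence is 4 bytes, so the lead byte of a
       truncated character is at most 3 bytes back from the end. */
    for (size_t back = 1; back <= 3 && back <= len; back++) {
        unsigned char c = buf[len - back];
        if ((c & 0xC0) == 0x80)
            continue;                       /* continuation byte */
        size_t need;
        if ((c & 0xE0) == 0xC0)      need = 2;
        else if ((c & 0xF0) == 0xE0) need = 3;
        else if ((c & 0xF8) == 0xF0) need = 4;
        else                         return 0;  /* ASCII or invalid lead */
        return back < need ? back : 0;      /* fewer bytes than needed? */
    }
    return 0;
}
```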
# Line 304  From release 8.00, <b>pcre_exec()</b> ca
304  matching. Unlike <b>pcre_dfa_exec()</b>, it is not possible to restart the
305  previous match with a new segment of data. Instead, new data must be added to
306  the previous subject string, and the entire match re-run, starting from the
307  point where the partial match occurred. Earlier data can be discarded. It is
308  best to use PCRE_PARTIAL_HARD in this situation, because it does not treat the
309  end of a segment as the end of the subject when matching \z, \Z, \b, \B,
310  and $. Consider an unanchored pattern that matches dates:
311  <pre>
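The buffer handling this implies can be sketched in plain C. After a partial match, only the data from the first offset onwards needs to be kept, because that offset marks the earliest character the matcher inspected; the next segment is appended and the match is re-run over the combined data. The fixed 256-byte `struct seg_buffer` and `seg_buffer_advance` are hypothetical choices for this sketch, not part of PCRE.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size buffer holding the data still of interest. */
struct seg_buffer {
    char data[256];
    size_t len;
};

/* After pcre_exec() returns PCRE_ERROR_PARTIAL with ovector[0] == keep_from,
   drops everything before keep_from and appends the next segment.
   Returns 0 on success, or -1 if keep_from is out of range or the result
   would not fit. The caller then re-runs the match on data[0..len). */
static int seg_buffer_advance(struct seg_buffer *b, size_t keep_from,
                              const char *next, size_t next_len)
{
    if (keep_from > b->len)
        return -1;
    size_t kept = b->len - keep_from;
    if (kept + next_len > sizeof b->data)
        return -1;
    memmove(b->data, b->data + keep_from, kept);  /* regions may overlap */
    memcpy(b->data + kept, next, next_len);
    b->len = kept + next_len;
    return 0;
}
```

For instance, if a date pattern found a partial match starting at offset 9 of `Today is 23ja`, keeping from offset 9 and appending `n06 and more` leaves `23jan06 and more` ready for the next `pcre_exec()` call.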
# Line 333  whichever matching function is used.
333  <P>
334  1. If the pattern contains a test for the beginning of a line, you need to pass
335  the PCRE_NOTBOL option when the subject string for any call does start at the
336  beginning of a line. There is also a PCRE_NOTEOL option, but in practice when
337  doing multi-segment matching you should be using PCRE_PARTIAL_HARD, which
338  includes the effect of PCRE_NOTEOL.
339  </P>
340  <P>
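PCRE_NOTBOL tells `pcre_exec()` that the first character of the subject is not the beginning of a line, so a circumflex cannot match before it. When earlier data is discarded between calls, the application therefore has to track whether the retained text still begins a line. The sketch below makes that decision before the discard happens; `retained_text_at_bol` is a hypothetical helper, and when it returns false the caller would OR PCRE_NOTBOL into the options for the next call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decides whether the text kept for the next pcre_exec() call will begin
   at the start of a line. buf/len hold the current subject, keep_from is
   the offset where the retained data will begin, and subject_at_bol says
   whether buf[0] itself was at the start of a line. A false result means
   the next call should include PCRE_NOTBOL in its options. */
static bool retained_text_at_bol(const char *buf, size_t len,
                                 size_t keep_from, bool subject_at_bol)
{
    if (keep_from == 0)
        return subject_at_bol;   /* nothing discarded: status unchanged */
    if (keep_from > len)
        return false;            /* invalid offset: suppress ^ to be safe */
    return buf[keep_from - 1] == '\n';
}
```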
