Diff of /code/trunk/doc/pcrepartial.3

revision 79 by nigel, Sat Feb 24 21:40:52 2007 UTC revision 172 by ph10, Tue Jun 5 10:40:13 2007 UTC
# Line 71  envisaged for this facility, this is not Line 71  envisaged for this facility, this is not
71  .P  .P
72  If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,  If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
73  \fBpcre_exec()\fP returns the error code PCRE_ERROR_BADPARTIAL (-13).  \fBpcre_exec()\fP returns the error code PCRE_ERROR_BADPARTIAL (-13).
74    You can use the PCRE_INFO_OKPARTIAL call to \fBpcre_fullinfo()\fP to find out
75    if a compiled pattern can be used for partial matching.
76  .  .
77  .  .
78  .SH "EXAMPLE OF PARTIAL MATCHING USING PCRETEST"  .SH "EXAMPLE OF PARTIAL MATCHING USING PCRETEST"
# Line 81  PCRE_PARTIAL flag is used for the match. Line 83  PCRE_PARTIAL flag is used for the match.
83  uses the date example quoted above:  uses the date example quoted above:
84  .sp  .sp
85      re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/      re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
86    data> 25jun04\P    data> 25jun04\eP
87     0: 25jun04     0: 25jun04
88     1: jun     1: jun
89    data> 25dec3\P    data> 25dec3\eP
90    Partial match    Partial match
91    data> 3ju\P    data> 3ju\eP
92    Partial match    Partial match
93    data> 3juj\P    data> 3juj\eP
94    No match    No match
95    data> j\P    data> j\eP
96    No match    No match
97  .sp  .sp
98  The first data string is matched completely, so \fBpcretest\fP shows the  The first data string is matched completely, so \fBpcretest\fP shows the
99  matched substrings. The remaining four strings do not match the complete  matched substrings. The remaining four strings do not match the complete
100  pattern, but the first two are partial matches. The same test, using DFA  pattern, but the first two are partial matches. The same test, using
101  matching (by means of the \eD escape sequence), produces the following output:  \fBpcre_dfa_exec()\fP matching (by means of the \eD escape sequence), produces
102    the following output:
103  .sp  .sp
104      re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/      re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
105    data> 25jun04\eP\eD    data> 25jun04\eP\eD
106     0: 25jun04     0: 25jun04
107    data> 23dec3\eP\eD    data> 23dec3\eP\eD
# Line 119  available. Line 122  available.
122  .sp  .sp
123  When a partial match has been found using \fBpcre_dfa_exec()\fP, it is possible  When a partial match has been found using \fBpcre_dfa_exec()\fP, it is possible
124  to continue the match by providing additional subject data and calling  to continue the match by providing additional subject data and calling
125  \fBpcre_dfa_exec()\fP again with the PCRE_DFA_RESTART option and the same  \fBpcre_dfa_exec()\fP again with the same compiled regular expression, this
126  working space (where details of the previous partial match are stored). Here is  time setting the PCRE_DFA_RESTART option. You must also pass the same working
127  an example using \fBpcretest\fP, where the \eR escape sequence sets the  space as before, because this is where details of the previous partial match
128  PCRE_DFA_RESTART option and the \eD escape sequence requests the use of  are stored. Here is an example using \fBpcretest\fP, using the \eR escape
129  \fBpcre_dfa_exec()\fP:  sequence to set the PCRE_DFA_RESTART option (\eP and \eD are as above):
130  .sp  .sp
131      re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/      re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
132    data> 23ja\eP\eD    data> 23ja\eP\eD
133    Partial match: 23ja    Partial match: 23ja
134    data> n05\eR\eD    data> n05\eR\eD
# Line 137  Notice that when the match is complete, Line 140  Notice that when the match is complete,
140  not retain the previously partially-matched string. It is up to the calling  not retain the previously partially-matched string. It is up to the calling
141  program to do that if it needs to.  program to do that if it needs to.
142  .P  .P
143  This facility can be used to pass very long subject strings to  You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial matching
144  \fBpcre_dfa_exec()\fP. However, some care is needed for certain types of  over multiple segments. This facility can be used to pass very long subject
145  pattern.  strings to \fBpcre_dfa_exec()\fP. However, some care is needed for certain
146    types of pattern.
147  .P  .P
148  1. If the pattern contains tests for the beginning or end of a line, you need  1. If the pattern contains tests for the beginning or end of a line, you need
149  to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the  to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
# Line 147  subject string for any call does not con Line 151  subject string for any call does not con
151  .P  .P
152  2. If the pattern contains backward assertions (including \eb or \eB), you need  2. If the pattern contains backward assertions (including \eb or \eB), you need
153  to arrange for some overlap in the subject strings to allow for this. For  to arrange for some overlap in the subject strings to allow for this. For
154  example, you could pass the subject in chunks that were 500 bytes long, but in  example, you could pass the subject in chunks that are 500 bytes long, but in
155  a buffer of 700 bytes, with the starting offset set to 200 and the previous 200  a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
156  bytes at the start of the buffer.  bytes at the start of the buffer.
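
The overlap scheme in item 2 can be sketched in plain C. This is only an illustration of the buffer handling, using the 500/700/200 figures from the text; the names load_chunk, OVERLAP, CHUNK and BUF_SIZE are hypothetical and are not part of PCRE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sizes from the example in the text; every name here is hypothetical. */
enum { OVERLAP = 200, CHUNK = 500, BUF_SIZE = OVERLAP + CHUNK };

/*
 * Load the next chunk into buf so that the last OVERLAP bytes of the data
 * previously held in buf sit in front of it.
 *
 *   buf        buffer of BUF_SIZE bytes
 *   used       number of valid bytes currently in buf (0 before the first chunk)
 *   chunk      the new data
 *   chunk_len  its length, at most CHUNK
 *   start      receives the offset at which the new data begins
 *
 * Returns the new number of valid bytes in buf.
 */
static size_t load_chunk(char *buf, size_t used,
                         const char *chunk, size_t chunk_len,
                         size_t *start)
{
    size_t keep = used < OVERLAP ? used : OVERLAP;

    assert(used <= BUF_SIZE);
    assert(chunk_len <= CHUNK);

    /* Slide the tail of the old contents to the front; the regions may overlap. */
    memmove(buf, buf + (used - keep), keep);
    /* Append the new chunk directly after the retained context. */
    memcpy(buf + keep, chunk, chunk_len);

    *start = keep;
    return keep + chunk_len;
}
```

In a real caller, the returned count would be passed as the subject length and the value stored in start as the starting offset of the next \fBpcre_dfa_exec()\fP call, so that backward assertions can look at the retained bytes.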
157  .P  .P
# Line 155  bytes at the start of the buffer. Line 159  bytes at the start of the buffer.
159  always produce exactly the same result as matching over one single long string.  always produce exactly the same result as matching over one single long string.
160  The difference arises when there are multiple matching possibilities, because a  The difference arises when there are multiple matching possibilities, because a
161  partial match result is given only when there are no completed matches in a  partial match result is given only when there are no completed matches in a
162  call to fBpcre_dfa_exec()\fP. This means that as soon as the shortest match has  call to \fBpcre_dfa_exec()\fP. This means that as soon as the shortest match has
163  been found, continuation to a new subject segment is no longer possible.  been found, continuation to a new subject segment is no longer possible.
164  Consider this \fBpcretest\fP example:  Consider this \fBpcretest\fP example:
165  .sp  .sp
# Line 175  hand, if "dogsbody" is presented as a si Line 179  hand, if "dogsbody" is presented as a si
179  .P  .P
180  Because of this phenomenon, it does not usually make sense to end a pattern  Because of this phenomenon, it does not usually make sense to end a pattern
181  that is going to be matched in this way with a variable repeat.  that is going to be matched in this way with a variable repeat.
182    .P
183    4. Patterns that contain alternatives at the top level which do not all
184    start with the same pattern item may not work as expected. For example,
185    consider this pattern:
186    .sp
187      1234|3789
188    .sp
189    If the first part of the subject is "ABC123", a partial match of the first
190    alternative is found at offset 3. There is no partial match for the second
191    alternative, because such a match does not start at the same point in the
192    subject string. Attempting to continue with the string "789" does not yield a
193    match because only those alternatives that match at one point in the subject
194    are remembered. The problem arises because the start of the second alternative
195    matches within the first alternative. There is no problem with anchored
196    patterns or patterns such as:
197    .sp
198      1234|ABCD
199    .sp
200    where no string can be a partial match for both alternatives.
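
The behaviour described in item 4 can be reproduced with a toy model. The sketch below does not call PCRE; it assumes the alternatives are plain literal strings and mimics only the rule stated above, that the alternatives partially matching at one start point are the only ones remembered. The names partial_state, first_segment and next_segment are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NO_MATCH = 0, PARTIAL = 1, COMPLETE = 2 };
enum { MAX_ALTS = 8 };

/* Toy restart state: one start offset, plus how many characters of each
   alternative have been consumed there (0 means "not live"). */
struct partial_state {
    size_t offset;
    size_t consumed[MAX_ALTS];
};

/* Scan the first segment, trying each start position in turn. */
static int first_segment(const char **alts, size_t n,
                         const char *subject, struct partial_state *st)
{
    size_t len = strlen(subject), start, i;

    assert(n <= MAX_ALTS);
    memset(st, 0, sizeof *st);
    for (start = 0; start < len; start++) {
        const char *rest = subject + start;
        size_t rest_len = len - start;
        int any = 0;

        for (i = 0; i < n; i++) {
            size_t alt_len = strlen(alts[i]);
            st->consumed[i] = 0;
            if (rest_len >= alt_len) {
                if (strncmp(rest, alts[i], alt_len) == 0) {
                    st->offset = start;
                    return COMPLETE;
                }
            } else if (strncmp(rest, alts[i], rest_len) == 0) {
                st->consumed[i] = rest_len;   /* ran out of subject mid-alternative */
                any = 1;
            }
        }
        if (any) {
            st->offset = start;   /* later start points are never examined */
            return PARTIAL;
        }
    }
    return NO_MATCH;
}

/* Continue with a new segment, using only the remembered alternatives. */
static int next_segment(const char **alts, size_t n,
                        const char *segment, struct partial_state *st)
{
    size_t seg_len = strlen(segment), i;
    int result = NO_MATCH;

    for (i = 0; i < n; i++) {
        const char *tail;
        size_t tail_len;

        if (st->consumed[i] == 0)
            continue;
        tail = alts[i] + st->consumed[i];
        tail_len = strlen(tail);
        if (seg_len >= tail_len) {
            if (strncmp(segment, tail, tail_len) == 0)
                return COMPLETE;
            st->consumed[i] = 0;
        } else if (strncmp(segment, tail, seg_len) == 0) {
            st->consumed[i] += seg_len;
            result = PARTIAL;
        } else {
            st->consumed[i] = 0;
        }
    }
    return result;
}
```

With the alternatives "1234" and "3789", calling first_segment() on "ABC123" remembers only the first alternative at offset 3, so next_segment() with "789" fails, whereas first_segment() on the single string "ABC123789" finds "3789".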
201  .  .
202  .  .
203  .P  .SH AUTHOR
204  .in 0  .rs
205  Last updated: 28 February 2005  .sp
206  .br  .nf
207  Copyright (c) 1997-2005 University of Cambridge.  Philip Hazel
208    University Computing Service
209    Cambridge CB2 3QH, England.
210    .fi
211    .
212    .
213    .SH REVISION
214    .rs
215    .sp
216    .nf
217    Last updated: 04 June 2007
218    Copyright (c) 1997-2007 University of Cambridge.
219    .fi
