Diff of /code/trunk/doc/pcrepartial.3

Revision 79 (old) by nigel, Sat Feb 24 21:40:52 2007 UTC, compared with revision 155 (new) by ph10, Tue Apr 24 13:36:11 2007 UTC

@@ -81,24 +81,25 @@ PCRE_PARTIAL flag is used for the match.
 uses the date example quoted above:
 .sp
     re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
-  data> 25jun04\P
+  data> 25jun04\eP
    0: 25jun04
    1: jun
-  data> 25dec3\P
+  data> 25dec3\eP
   Partial match
-  data> 3ju\P
+  data> 3ju\eP
   Partial match
-  data> 3juj\P
+  data> 3juj\eP
   No match
-  data> j\P
+  data> j\eP
   No match
 .sp
 The first data string is matched completely, so \fBpcretest\fP shows the
 matched substrings. The remaining four strings do not match the complete
-pattern, but the first two are partial matches. The same test, using DFA
-matching (by means of the \eD escape sequence), produces the following output:
+pattern, but the first two are partial matches. The same test, using
+\fBpcre_dfa_exec()\fP matching (by means of the \eD escape sequence), produces
+the following output:
 .sp
-    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+    re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
   data> 25jun04\eP\eD
    0: 25jun04
   data> 23dec3\eP\eD
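The pcretest session above sorts each subject into a complete match, a partial match, or no match. As a rough illustration of what "partial" means for this one anchored pattern, the sketch below hand-codes the check in C: a subject counts as partial when it is a non-empty prefix of some string the pattern would match in full, but is not itself a full match. This is a hypothetical stand-in written for this note, not how PCRE implements PCRE_PARTIAL; the names classify_date, try_variant and month_prefix are invented here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The three outcomes pcretest reports. */
enum match_result { NO_MATCH, PARTIAL_MATCH, FULL_MATCH };

static const char *const months[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec"
};

/* Non-zero if the first len (<= 3) bytes of s begin some month name. */
static int month_prefix(const char *s, size_t len)
{
    for (int m = 0; m < 12; m++)
        if (strncmp(months[m], s, len) == 0)
            return 1;
    return 0;
}

/* Check s[0..n) against one shape of the pattern: `lead` digits (1 or 2),
   a month name, then two digits that must end the subject. */
static enum match_result try_variant(const char *s, size_t n, size_t lead)
{
    size_t total = lead + 3 + 2;
    if (n > total)
        return NO_MATCH;                    /* too long for $ to match */
    for (size_t i = 0; i < n && i < lead; i++)
        if (!isdigit((unsigned char)s[i]))
            return NO_MATCH;
    if (n > lead && !month_prefix(s + lead, n - lead < 3 ? n - lead : 3))
        return NO_MATCH;
    for (size_t i = lead + 3; i < n; i++)
        if (!isdigit((unsigned char)s[i]))
            return NO_MATCH;
    /* Everything seen so far fits; running out early means "partial". */
    return n == total ? FULL_MATCH : PARTIAL_MATCH;
}

/* Classify a subject against ^\d?\d(jan|...|dec)\d\d$ */
enum match_result classify_date(const char *s)
{
    size_t n = strlen(s);
    enum match_result one = try_variant(s, n, 1);   /* optional digit absent */
    enum match_result two = try_variant(s, n, 2);   /* optional digit present */
    if (one == FULL_MATCH || two == FULL_MATCH)
        return FULL_MATCH;
    if (n > 0 && (one == PARTIAL_MATCH || two == PARTIAL_MATCH))
        return PARTIAL_MATCH;
    return NO_MATCH;
}
```

With the subjects from the session, classify_date returns FULL_MATCH for "25jun04", PARTIAL_MATCH for "25dec3" and "3ju", and NO_MATCH for "3juj" and "j", which lines up with the pcretest output.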
@@ -119,13 +120,13 @@ available.
 .sp
 When a partial match has been found using \fBpcre_dfa_exec()\fP, it is possible
 to continue the match by providing additional subject data and calling
-\fBpcre_dfa_exec()\fP again with the PCRE_DFA_RESTART option and the same
-working space (where details of the previous partial match are stored). Here is
-an example using \fBpcretest\fP, where the \eR escape sequence sets the
-PCRE_DFA_RESTART option and the \eD escape sequence requests the use of
-\fBpcre_dfa_exec()\fP:
+\fBpcre_dfa_exec()\fP again with the same compiled regular expression, this
+time setting the PCRE_DFA_RESTART option. You must also pass the same working
+space as before, because this is where details of the previous partial match
+are stored. Here is an example using \fBpcretest\fP, using the \eR escape
+sequence to set the PCRE_DFA_RESTART option (\eP and \eD are as above):
 .sp
-    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+    re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
   data> 23ja\eP\eD
   Partial match: 23ja
   data> n05\eR\eD
@@ -137,9 +138,10 @@ Notice that when the match is complete,
 not retain the previously partially-matched string. It is up to the calling
 program to do that if it needs to.
 .P
-This facility can be used to pass very long subject strings to
-\fBpcre_dfa_exec()\fP. However, some care is needed for certain types of
-pattern.
+You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial matching
+over multiple segments. This facility can be used to pass very long subject
+strings to \fBpcre_dfa_exec()\fP. However, some care is needed for certain
+types of pattern.
 .P
 1. If the pattern contains tests for the beginning or end of a line, you need
 to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
@@ -147,7 +149,7 @@ subject string for any call does not con
 .P
 2. If the pattern contains backward assertions (including \eb or \eB), you need
 to arrange for some overlap in the subject strings to allow for this. For
-example, you could pass the subject in chunks that were 500 bytes long, but in
+example, you could pass the subject in chunks that are 500 bytes long, but in
 a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
 bytes at the start of the buffer.
 .P
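The overlap arithmetic in item 2 (500-byte chunks inside a 700-byte buffer, starting offset 200) can be sketched with the standard library alone. fill_window, CHUNK and OVERLAP are illustrative names. The caller would then pass buf, the returned length, and *start_offset to pcre_dfa_exec(), so that backward assertions can look at the preceding bytes.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK   500   /* new bytes handed over per call (item 2's example) */
#define OVERLAP 200   /* earlier bytes kept for backward assertions */

/* Copy the window for the chunk starting at `pos` (pos <= text_len) into buf,
   which must hold OVERLAP + CHUNK bytes. Up to OVERLAP bytes preceding the
   chunk go first; *start_offset is set to where the chunk itself begins.
   Returns the number of bytes now in buf (the subject length to pass). */
size_t fill_window(const char *text, size_t text_len, size_t pos,
                   char *buf, size_t *start_offset)
{
    size_t keep = pos < OVERLAP ? pos : OVERLAP;   /* history bytes */
    size_t rest = text_len - pos;
    size_t take = rest < CHUNK ? rest : CHUNK;     /* new bytes */
    memcpy(buf, text + (pos - keep), keep + take);
    *start_offset = keep;
    return keep + take;
}
```

For the first chunk there is no history, so the offset is 0. Every later full chunk yields a 700-byte subject with a starting offset of 200, as in the text.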
@@ -155,7 +157,7 @@ bytes at the start of the buffer.
 always produce exactly the same result as matching over one single long string.
 The difference arises when there are multiple matching possibilities, because a
 partial match result is given only when there are no completed matches in a
-call to fBpcre_dfa_exec()\fP. This means that as soon as the shortest match has
+call to \fBpcre_dfa_exec()\fP. This means that as soon as the shortest match has
 been found, continuation to a new subject segment is no longer possible.
 Consider this \fBpcretest\fP example:
 .sp
@@ -175,10 +177,41 @@ hand, if "dogsbody" is presented as a si
 .P
 Because of this phenomenon, it does not usually make sense to end a pattern
 that is going to be matched in this way with a variable repeat.
+.P
+4. Patterns that contain alternatives at the top level which do not all
+start with the same pattern item may not work as expected. For example,
+consider this pattern:
+.sp
+  1234|3789
+.sp
+If the first part of the subject is "ABC123", a partial match of the first
+alternative is found at offset 3. There is no partial match for the second
+alternative, because such a match does not start at the same point in the
+subject string. Attempting to continue with the string "789" does not yield a
+match because only those alternatives that match at one point in the subject
+are remembered. The problem arises because the start of the second alternative
+matches within the first alternative. There is no problem with anchored
+patterns or patterns such as:
+.sp
+  1234|ABCD
+.sp
+where no string can be a partial match for both alternatives.
 .
 .
-.P
-.in 0
-Last updated: 28 February 2005
-.br
-Copyright (c) 1997-2005 University of Cambridge.
+.SH AUTHOR
+.rs
+.sp
+.nf
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 06 March 2007
+Copyright (c) 1997-2007 University of Cambridge.
+.fi
