Contents of /code/trunk/doc/html/pcrecallout.html

Revision 903
Sat Jan 21 16:37:17 2012 UTC by ph10
File MIME type: text/html
File size: 9763 byte(s)
Source file tidies for 8.30-RC1 release; fix Makefile.am bugs for building 
symbolic links to man pages.
1 <html>
2 <head>
3 <title>pcrecallout specification</title>
4 </head>
5 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6 <h1>pcrecallout man page</h1>
7 <p>
8 Return to the <a href="index.html">PCRE index page</a>.
9 </p>
10 <p>
11 This page is part of the PCRE HTML documentation. It was generated automatically
12 from the original man page. If there is any nonsense in it, please consult the
13 man page, in case the conversion went wrong.
14 <br>
15 <ul>
16 <li><a name="TOC1" href="#SEC1">PCRE CALLOUTS</a>
17 <li><a name="TOC2" href="#SEC2">MISSING CALLOUTS</a>
18 <li><a name="TOC3" href="#SEC3">THE CALLOUT INTERFACE</a>
19 <li><a name="TOC4" href="#SEC4">RETURN VALUES</a>
20 <li><a name="TOC5" href="#SEC5">AUTHOR</a>
21 <li><a name="TOC6" href="#SEC6">REVISION</a>
22 </ul>
23 <br><a name="SEC1" href="#TOC1">PCRE CALLOUTS</a><br>
24 <P>
25 <b>int (*pcre_callout)(pcre_callout_block *);</b>
26 </P>
27 <P>
28 <b>int (*pcre16_callout)(pcre16_callout_block *);</b>
29 </P>
30 <P>
31 PCRE provides a feature called "callout", which is a means of temporarily
32 passing control to the caller of PCRE in the middle of pattern matching. The
33 caller of PCRE provides an external function by putting its entry point in the
34 global variable <i>pcre_callout</i> (<i>pcre16_callout</i> for the 16-bit
35 library). By default, this variable contains NULL, which disables all calling
36 out.
37 </P>
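<P>
As a rough illustration of the mechanism just described, the following C sketch
models a global function pointer that is NULL by default and is called only
when it has been set. The names <i>demo_callout</i>, <i>demo_callout_block</i>,
<i>demo_reach_callout_point</i> and <i>count_callouts</i> are stand-ins
invented for this example so that it compiles without the PCRE headers. With
the real library you would include pcre.h and assign your function to
<i>pcre_callout</i> instead.
</P>

```c
#include <stddef.h>

/* Stand-in for pcre_callout_block, reduced to one field (hypothetical). */
typedef struct demo_callout_block {
  int callout_number;
} demo_callout_block;

/* Stand-in for the global pcre_callout variable: NULL disables calling out. */
int (*demo_callout)(demo_callout_block *) = NULL;

/* Models what a matcher does at a callout point: call the external
   function only if one has been set, otherwise carry on (return 0). */
int demo_reach_callout_point(int number)
{
  demo_callout_block cb;
  cb.callout_number = number;
  if (demo_callout == NULL) return 0;
  return demo_callout(&cb);
}

/* Example external function: counts how often it is entered. */
static int demo_calls = 0;

int count_callouts(demo_callout_block *cb)
{
  (void)cb;
  demo_calls++;
  return 0;   /* zero lets matching proceed */
}
```

<P>
Because the variable is global, a function stored in it is seen by every match
in the process until the variable is reset to NULL.
</P>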
38 <P>
39 Within a regular expression, (?C) indicates the points at which the external
40 function is to be called. Different callout points can be identified by putting
41 a number less than 256 after the letter C. The default value is zero.
42 For example, this pattern has two callout points:
43 <pre>
44 (?C1)abc(?C2)def
45 </pre>
46 If the PCRE_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE
47 automatically inserts callouts, all with number 255, before each item in the
48 pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
49 <pre>
50 A(\d{2}|--)
51 </pre>
52 it is processed as if it were
53 <br>
54 <br>
55 (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
56 <br>
57 <br>
58 Notice that there is a callout before and after each parenthesis and
59 alternation bar. Automatic callouts can be used for tracking the progress of
60 pattern matching. The
61 <a href="pcretest.html"><b>pcretest</b></a>
62 command has an option that sets automatic callouts; when it is used, the output
63 indicates how the pattern is matched. This is useful information when you are
64 trying to optimize the performance of a particular pattern.
65 </P>
66 <P>
67 The use of callouts in a pattern makes it ineligible for optimization by the
68 just-in-time compiler. Studying such a pattern with the PCRE_STUDY_JIT_COMPILE
69 option always fails.
70 </P>
71 <br><a name="SEC2" href="#TOC1">MISSING CALLOUTS</a><br>
72 <P>
73 You should be aware that, because of optimizations in the way PCRE matches
74 patterns by default, callouts sometimes do not happen. For example, if the
75 pattern is
76 <pre>
77 ab(?C4)cd
78 </pre>
79 PCRE knows that any matching string must contain the letter "d". If the subject
80 string is "abyz", the lack of "d" means that matching doesn't ever start, and
81 the callout is never reached. However, with "abyd", though the result is still
82 no match, the callout is obeyed.
83 </P>
84 <P>
85 If the pattern is studied, PCRE knows the minimum length of a matching string,
86 and will immediately give a "no match" return without actually running a match
87 if the subject is not long enough, or, for unanchored patterns, if it has
88 been scanned far enough.
89 </P>
90 <P>
91 You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE
92 option to the matching function, or by starting the pattern with
93 (*NO_START_OPT). This slows down the matching process, but does ensure that
94 callouts such as the example above are obeyed.
95 </P>
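<P>
If the pattern text is built at run time, the (*NO_START_OPT) prefix can be
added with ordinary string handling. The helper below is only a sketch, and its
name <i>with_no_start_opt</i> is hypothetical. It returns a newly allocated
string that the caller must free, and it does not call PCRE itself.
</P>

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of pattern with "(*NO_START_OPT)" in
   front, or NULL if allocation fails. The caller frees the result.
   The helper name is hypothetical. */
char *with_no_start_opt(const char *pattern)
{
  static const char prefix[] = "(*NO_START_OPT)";
  size_t plen = sizeof(prefix) - 1;
  size_t len = strlen(pattern);
  char *out = malloc(plen + len + 1);
  if (out == NULL) return NULL;
  memcpy(out, prefix, plen);
  memcpy(out + plen, pattern, len + 1);  /* includes the terminating zero */
  return out;
}
```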
96 <br><a name="SEC3" href="#TOC1">THE CALLOUT INTERFACE</a><br>
97 <P>
98 During matching, when PCRE reaches a callout point, the external function
99 defined by <i>pcre_callout</i> or <i>pcre16_callout</i> is called (if it is set).
100 This applies to both normal and DFA matching. The only argument to the callout
101 function is a pointer to a <b>pcre_callout</b> or <b>pcre16_callout</b> block.
102 These structures contain the following fields:
103 <pre>
104 int <i>version</i>;
105 int <i>callout_number</i>;
106 int *<i>offset_vector</i>;
107 const char *<i>subject</i>; (8-bit version)
108 PCRE_SPTR16 <i>subject</i>; (16-bit version)
109 int <i>subject_length</i>;
110 int <i>start_match</i>;
111 int <i>current_position</i>;
112 int <i>capture_top</i>;
113 int <i>capture_last</i>;
114 void *<i>callout_data</i>;
115 int <i>pattern_position</i>;
116 int <i>next_item_length</i>;
117 const unsigned char *<i>mark</i>; (8-bit version)
118 const PCRE_UCHAR16 *<i>mark</i>; (16-bit version)
119 </pre>
120 The <i>version</i> field is an integer containing the version number of the
121 block format. The initial version was 0; the current version is 2. The version
122 number will change again in future if additional fields are added, but the
123 intention is never to remove any of the existing fields.
124 </P>
125 <P>
126 The <i>callout_number</i> field contains the number of the callout, as compiled
127 into the pattern (that is, the number after ?C for manual callouts, and 255 for
128 automatically generated callouts).
129 </P>
130 <P>
131 The <i>offset_vector</i> field is a pointer to the vector of offsets that was
132 passed by the caller to the matching function. When <b>pcre_exec()</b> or
133 <b>pcre16_exec()</b> is used, the contents can be inspected, in order to extract
134 substrings that have been matched so far, in the same way as for extracting
135 substrings after a match has completed. For the DFA matching functions, this
136 field is not useful.
137 </P>
138 <P>
139 The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
140 that were passed to the matching function.
141 </P>
142 <P>
143 The <i>start_match</i> field normally contains the offset within the subject at
144 which the current match attempt started. However, if the escape sequence \K
145 has been encountered, this value is changed to reflect the modified starting
146 point. If the pattern is not anchored, the callout function may be called
147 several times from the same point in the pattern for different starting points
148 in the subject.
149 </P>
150 <P>
151 The <i>current_position</i> field contains the offset within the subject of the
152 current match pointer.
153 </P>
154 <P>
155 When <b>pcre_exec()</b> or <b>pcre16_exec()</b> is used, the
156 <i>capture_top</i> field contains one more than the number of the highest
157 numbered captured substring so far. If no substrings have been captured, the
158 value of <i>capture_top</i> is one. This is always the case when the DFA
159 functions are used, because they do not support captured substrings.
160 </P>
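<P>
The following sketch shows one way a callout function might walk the offset
vector to list the substrings captured so far, using <i>capture_top</i> as the
upper bound. The struct <i>demo_callout_block</i> is a stand-in containing only
the fields used here, so that the example compiles without pcre.h, and
<i>print_captures_so_far</i> is a hypothetical name. The loop starts at group 1
on the assumption that the overall match (group 0) is not yet final while
matching is still in progress.
</P>

```c
#include <stdio.h>

/* Stand-in for pcre_callout_block (8-bit), reduced to the fields used. */
typedef struct demo_callout_block {
  int *offset_vector;
  const char *subject;
  int capture_top;
} demo_callout_block;

/* Prints each substring captured so far and returns how many were set.
   Starts at group 1 on the assumption that group 0 is not yet final. */
int print_captures_so_far(const demo_callout_block *cb)
{
  int n, count = 0;
  for (n = 1; n < cb->capture_top; n++) {
    int start = cb->offset_vector[2*n];
    int end = cb->offset_vector[2*n + 1];
    if (start < 0 || end < start) continue;   /* group n is unset */
    printf("%d: %.*s\n", n, end - start, cb->subject + start);
    count++;
  }
  return count;
}
```

<P>
With the DFA functions <i>capture_top</i> is always one, so the loop body never
runs.
</P>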
161 <P>
162 The <i>capture_last</i> field contains the number of the most recently captured
163 substring. If no substrings have been captured, its value is -1. This is always
164 the case for the DFA matching functions.
165 </P>
166 <P>
167 The <i>callout_data</i> field contains a value that is passed to a matching
168 function specifically so that it can be passed back in callouts. It is passed
169 in the <i>callout_data</i> field of a <b>pcre_extra</b> or <b>pcre16_extra</b>
170 data structure. If no such data was passed, the value of <i>callout_data</i> in
171 a callout block is NULL. There is a description of the <b>pcre_extra</b>
172 structure in the
173 <a href="pcreapi.html"><b>pcreapi</b></a>
174 documentation.
175 </P>
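<P>
A common use of <i>callout_data</i> is to carry per-match state into the
callout function without resorting to further global variables. The sketch
below counts how often each callout number is reached. The types
<i>demo_callout_block</i> and <i>match_stats</i> and the function
<i>tally_callout</i> are assumptions made for this example. With the real
library the pointer would be placed in the <i>callout_data</i> field of a
<b>pcre_extra</b> block before matching.
</P>

```c
#include <stddef.h>

/* Stand-in for pcre_callout_block, reduced to the fields used. */
typedef struct demo_callout_block {
  int callout_number;
  void *callout_data;
} demo_callout_block;

/* Hypothetical per-match state supplied by the caller. */
typedef struct match_stats {
  int hits[256];   /* one counter per possible callout number */
} match_stats;

/* Increments the counter for this callout number, if state was supplied. */
int tally_callout(demo_callout_block *cb)
{
  match_stats *stats = (match_stats *)cb->callout_data;
  if (stats != NULL && cb->callout_number >= 0 && cb->callout_number < 256)
    stats->hits[cb->callout_number]++;
  return 0;   /* never interferes with matching */
}
```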
176 <P>
177 The <i>pattern_position</i> field is present from version 1 of the callout
178 structure. It contains the offset to the next item to be matched in the pattern
179 string.
180 </P>
181 <P>
182 The <i>next_item_length</i> field is present from version 1 of the callout
183 structure. It contains the length of the next item to be matched in the pattern
184 string. When the callout immediately precedes an alternation bar, a closing
185 parenthesis, or the end of the pattern, the length is zero. When the callout
186 precedes an opening parenthesis, the length is that of the entire subpattern.
187 </P>
188 <P>
189 The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
190 help in distinguishing between different automatic callouts, which all have the
191 same callout number. However, they are set for all callouts.
192 </P>
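<P>
The callout block does not contain the pattern string itself, so a callout
function that wants to show the next item has to obtain the pattern some other
way. The sketch below assumes the caller passed a pointer to the pattern
through <i>callout_data</i>. That convention, the stand-in struct
<i>demo_callout_block</i> and the function name <i>next_item_text</i> are all
choices made for this example, not part of PCRE.
</P>

```c
#include <string.h>

/* Stand-in for pcre_callout_block, reduced to the fields used. */
typedef struct demo_callout_block {
  void *callout_data;       /* assumed here to point at the pattern string */
  int pattern_position;
  int next_item_length;
} demo_callout_block;

/* Copies the next pattern item into buf (size bufsize) and returns its
   length, or -1 if it does not fit or no pattern was supplied. */
int next_item_text(const demo_callout_block *cb, char *buf, size_t bufsize)
{
  const char *pattern = (const char *)cb->callout_data;
  int len = cb->next_item_length;
  if (pattern == NULL || len < 0 || (size_t)len >= bufsize) return -1;
  memcpy(buf, pattern + cb->pattern_position, (size_t)len);
  buf[len] = '\0';
  return len;
}
```

<P>
A zero-length result corresponds to a callout that precedes an alternation bar,
a closing parenthesis, or the end of the pattern.
</P>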
193 <P>
194 The <i>mark</i> field is present from version 2 of the callout structure. In
195 callouts from <b>pcre_exec()</b> or <b>pcre16_exec()</b> it contains a pointer to
196 the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
197 (*THEN) item in the match, or NULL if no such items have been passed. Instances
198 of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
199 callouts from the DFA matching functions this field always contains NULL.
200 </P>
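<P>
Because <i>pattern_position</i> and <i>next_item_length</i> exist only from
version 1 of the block and <i>mark</i> only from version 2, a callout function
that may be used with different PCRE releases can check <i>version</i> before
reading them. The accessors below are a minimal sketch of that check. The
struct <i>demo_callout_block</i> is a stand-in with only the relevant fields,
and the names <i>safe_pattern_position</i> and <i>safe_mark</i> are
hypothetical.
</P>

```c
#include <stddef.h>

/* Stand-in for pcre_callout_block with only the fields used here. */
typedef struct demo_callout_block {
  int version;
  int pattern_position;        /* valid from version 1 */
  const unsigned char *mark;   /* valid from version 2 */
} demo_callout_block;

/* Returns the pattern offset, or -1 if the block is too old to have one. */
int safe_pattern_position(const demo_callout_block *cb)
{
  return (cb->version >= 1) ? cb->pattern_position : -1;
}

/* Returns the mark name, or NULL if the block is too old or no mark
   has been passed. */
const unsigned char *safe_mark(const demo_callout_block *cb)
{
  return (cb->version >= 2) ? cb->mark : NULL;
}
```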
201 <br><a name="SEC4" href="#TOC1">RETURN VALUES</a><br>
202 <P>
203 The external callout function returns an integer to PCRE. If the value is zero,
204 matching proceeds as normal. If the value is greater than zero, matching fails
205 at the current point, but the testing of other matching possibilities goes
206 ahead, just as if a lookahead assertion had failed. If the value is less than
207 zero, the match is abandoned, and the matching function returns the negative value.
208 </P>
209 <P>
210 Negative values should normally be chosen from the set of PCRE_ERROR_xxx
211 values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
212 The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
213 it will never be used by PCRE itself.
214 </P>
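<P>
The three kinds of return value can be illustrated with a small policy
function. This is a sketch under stated assumptions: <i>demo_callout_block</i>
is a stand-in struct, <i>callout_policy</i> and <i>policy_callout</i> are
hypothetical names, and DEMO_ERROR_CALLOUT is defined locally as -9, which is
believed to be the value of PCRE_ERROR_CALLOUT in pcre.h. With the real header
available, use the PCRE macro itself.
</P>

```c
#include <stddef.h>

/* Stand-in for pcre_callout_block, reduced to the fields used. */
typedef struct demo_callout_block {
  int callout_number;
  int current_position;
  void *callout_data;
} demo_callout_block;

/* Believed to equal PCRE_ERROR_CALLOUT in pcre.h; redefined only so the
   sketch compiles on its own. */
#define DEMO_ERROR_CALLOUT (-9)

/* Hypothetical policy passed through callout_data. */
typedef struct callout_policy {
  int budget;         /* callouts still allowed before giving up */
  int min_position;   /* callout 1 rejects positions before this */
} callout_policy;

int policy_callout(demo_callout_block *cb)
{
  callout_policy *p = (callout_policy *)cb->callout_data;
  if (p == NULL) return 0;
  if (p->budget-- <= 0) return DEMO_ERROR_CALLOUT;  /* abandon the match */
  if (cb->callout_number == 1 && cb->current_position < p->min_position)
    return 1;     /* fail here, let other possibilities be tried */
  return 0;       /* proceed as normal */
}
```

<P>
Returning a positive value only rejects the current path, so the callout may
well be entered again at a later position. The budget is what bounds the total
work.
</P>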
215 <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
216 <P>
217 Philip Hazel
218 <br>
219 University Computing Service
220 <br>
221 Cambridge CB2 3QH, England.
222 <br>
223 </P>
224 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
225 <P>
226 Last updated: 08 January 2012
227 <br>
228 Copyright &copy; 1997-2012 University of Cambridge.
229 <br>
230 <p>
231 Return to the <a href="index.html">PCRE index page</a>.
232 </p>

