18 |
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a> |
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a> |
19 |
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a> |
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a> |
20 |
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a> |
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a> |
21 |
<li><a name="TOC6" href="#SEC6">SCANNING TEXT INCREMENTALLY</a> |
<li><a name="TOC6" href="#SEC6">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a> |
22 |
<li><a name="TOC7" href="#SEC7">PARSING HEX/OCTAL/C-RADIX NUMBERS</a> |
<li><a name="TOC7" href="#SEC7">SCANNING TEXT INCREMENTALLY</a> |
23 |
<li><a name="TOC8" href="#SEC8">REPLACING PARTS OF STRINGS</a> |
<li><a name="TOC8" href="#SEC8">PARSING HEX/OCTAL/C-RADIX NUMBERS</a> |
24 |
<li><a name="TOC9" href="#SEC9">AUTHOR</a> |
<li><a name="TOC9" href="#SEC9">REPLACING PARTS OF STRINGS</a> |
25 |
|
<li><a name="TOC10" href="#SEC10">AUTHOR</a> |
26 |
</ul> |
</ul> |
27 |
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br> |
28 |
<P> |
<P> |
32 |
</P> |
</P> |
33 |
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br> |
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br> |
34 |
<P> |
<P> |
35 |
The C++ wrapper for PCRE was provided by Google Inc. This brief man page was |
The C++ wrapper for PCRE was provided by Google Inc. Some additional |
36 |
constructed from the notes in the <i>pcrecpp.h</i> file, which should be |
functionality was added by Giuseppe Maxia. This brief man page was constructed |
37 |
consulted for further details. |
from the notes in the <i>pcrecpp.h</i> file, which should be consulted for |
38 |
|
further details. |
39 |
</P> |
</P> |
40 |
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br> |
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br> |
41 |
<P> |
<P> |
150 |
--enable-utf8 flag. |
--enable-utf8 flag. |
151 |
</PRE> |
</PRE> |
152 |
</P> |
</P> |
153 |
<br><a name="SEC6" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br> |
<br><a name="SEC6" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br> |
154 |
|
<P> |
155 |
|
PCRE defines some modifiers to change the behavior of the regular expression |
156 |
|
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to |
157 |
|
pass such modifiers to a RE class. Currently, the following modifiers are |
158 |
|
supported: |
159 |
|
<pre> |
160 |
|
modifier              description                Perl equivalent |
161 |

|
162 |

PCRE_CASELESS         case-insensitive match     /i |
163 |

PCRE_MULTILINE        multiple lines match       /m |
164 |

PCRE_DOTALL           dot matches newlines       /s |
165 |

PCRE_DOLLAR_ENDONLY   $ matches only at end      N/A |
166 |

PCRE_EXTRA            strict escape parsing      N/A |
167 |

PCRE_EXTENDED         ignore whitespace          /x |
168 |

PCRE_UTF8             handles UTF8 chars         built-in |
169 |

PCRE_UNGREEDY         reverses * and *?          N/A |
170 |

PCRE_NO_AUTO_CAPTURE  disables capturing parens  N/A (*) |
171 |
|
</pre> |
172 |
|
(*) Both Perl and PCRE allow non-capturing parentheses by means of the |
173 |
|
"?:" modifier within the pattern itself. For example, (?:ab|cd) does not |
174 |
|
capture, while (ab|cd) does. |
175 |
|
</P> |
176 |
|
<P> |
177 |
|
For a full account of how each modifier works, please check the |
178 |
|
PCRE API reference page. |
179 |
|
</P> |
180 |
|
<P> |
181 |
|
For each modifier, there are two member functions whose names are made |
182 |
|
out of the modifier in lowercase, without the "PCRE_" prefix. For |
183 |
|
instance, PCRE_CASELESS is handled by |
184 |
|
<pre> |
185 |
|
bool caseless() |
186 |
|
</pre> |
187 |
|
which returns true if the modifier is set, and |
188 |
|
<pre> |
189 |
|
RE_Options & set_caseless(bool) |
190 |
|
</pre> |
191 |
|
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be |
192 |
|
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member |
193 |
|
functions. Setting <i>match_limit</i> to a non-zero value will limit the |
194 |
|
execution of pcre to keep it from doing bad things like blowing the stack or |
195 |
|
taking an eternity to return a result. A value of 5000 is good enough to stop |
196 |
|
stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables |
197 |
|
match limiting. |
198 |
|
</P> |
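<P>
The getter/setter pairing described above can be pictured with a small
stand-in class. This is only a sketch: the <i>sketch::Options</i> class, its
bit values, and its <i>set_flag()</i> helper are assumptions made for
illustration and are not taken from <i>pcrecpp.h</i>. It shows one way a
boolean getter, a reference-returning setter, and a match limit where zero
means "no limit" could fit together.
</P>

```cpp
#include <cassert>

// Hypothetical stand-in for illustration only; NOT the real pcrecpp::RE_Options.
namespace sketch {

// Illustrative bit values (assumed here, not PCRE's actual constants).
const int kCaseless  = 0x01;
const int kMultiline = 0x02;

class Options {
 public:
  // All flags start off; a match limit of zero means "no limit".
  Options() : flags_(0), match_limit_(0) {}

  // Getter: reports whether the modifier is currently set.
  bool caseless() const { return (flags_ & kCaseless) != 0; }
  // Setter: sets or unsets the modifier and returns the object itself.
  Options& set_caseless(bool on) { return set_flag(kCaseless, on); }

  bool multiline() const { return (flags_ & kMultiline) != 0; }
  Options& set_multiline(bool on) { return set_flag(kMultiline, on); }

  int match_limit() const { return match_limit_; }
  Options& set_match_limit(int limit) {
    match_limit_ = limit;
    return *this;
  }

 private:
  // Hypothetical helper: turns a single bit on or off.
  Options& set_flag(int bit, bool on) {
    if (on) {
      flags_ |= bit;
    } else {
      flags_ &= ~bit;
    }
    return *this;
  }

  int flags_;
  int match_limit_;
};

}  // namespace sketch
```

<P>
With the real wrapper, the corresponding calls would be made on an
<i>RE_Options</i> object, which is then passed to a RE constructor.
</P>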
199 |
|
<P> |
200 |
|
Normally, to pass one or more modifiers to a RE class, you declare |
201 |
|
a <i>RE_Options</i> object, set the appropriate options, and pass this |
202 |
|
object to a RE constructor. Example: |
203 |
|
<pre> |
204 |
|
RE_Options opt; |
205 |
|
opt.set_caseless(true); |
206 |
|
if (RE("HELLO", opt).PartialMatch("hello world")) ... |
207 |
|
</pre> |
208 |
|
RE_Options has two constructors. The default constructor takes no arguments and |
209 |

creates a set of flags that are all off. The second constructor takes an |
210 |

<i>option_flags</i> argument, which eases the transfer of legacy code from C programs. |
211 |
|
This lets you do |
212 |
|
<pre> |
213 |
|
RE(pattern, |
214 |
|
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str); |
215 |
|
</pre> |
216 |
|
However, new code is better off doing |
217 |
|
<pre> |
218 |
|
RE(pattern, |
219 |
|
RE_Options().set_caseless(true).set_multiline(true)) |
220 |
|
.PartialMatch(str); |
221 |
|
</pre> |
222 |
|
If you are going to pass one of the most used modifiers, there are some |
223 |
|
convenience functions that return a RE_Options object with the |
224 |
|
appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>, |
225 |
|
<b>MULTILINE()</b>, <b>DOTALL()</b>, and <b>EXTENDED()</b>. |
226 |
|
</P> |
227 |
|
<P> |
228 |
|
If you need to set several options at once, and you don't want to go through |
229 |
|
the pains of declaring a RE_Options object and setting several options, there |
230 |
|
is a parallel method that gives you such ability on the fly. You can chain |
231 |
|
several <b>set_xxxxx()</b> member functions, since each of them returns a |
232 |
|
reference to its class object. For example, to pass PCRE_CASELESS, |
233 |
|
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write: |
234 |
|
<pre> |
235 |
|
RE(" ^ xyz \\s+ .* blah$", |
236 |
|
RE_Options() |
237 |
|
.set_caseless(true) |
238 |
|
.set_extended(true) |
239 |
|
.set_multiline(true)).PartialMatch(sometext); |
240 |
|
|
241 |
|
</PRE> |
242 |
|
</P> |
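<P>
Chaining works because each setter returns a reference to the object it was
called on, so the next call applies to the same object. The sketch below uses
a stand-in class to show this, and to show that a flags constructor and a
chain of setters can produce the same state. The <i>chain_sketch</i>
namespace, the flag values, and the <i>all_options()</i> and
<i>flags_seen()</i> names are assumptions made for this example, not a copy
of <i>pcrecpp.h</i>.
</P>

```cpp
#include <cassert>

// Hypothetical stand-in for illustration only; NOT the real pcrecpp::RE_Options.
namespace chain_sketch {

// Illustrative flag values (assumed here, not PCRE's actual constants).
enum Flag { kCaseless = 1, kMultiline = 2, kExtended = 4 };

class Options {
 public:
  Options() : flags_(0) {}                                    // all flags off
  explicit Options(int option_flags) : flags_(option_flags) {}  // legacy style

  // Each setter returns *this, which is what makes chaining possible.
  Options& set_caseless(bool on)  { return set(kCaseless, on); }
  Options& set_multiline(bool on) { return set(kMultiline, on); }
  Options& set_extended(bool on)  { return set(kExtended, on); }

  // Returns the combined flag bits.
  int all_options() const { return flags_; }

 private:
  Options& set(int bit, bool on) {
    flags_ = on ? (flags_ | bit) : (flags_ & ~bit);
    return *this;
  }

  int flags_;
};

// Takes the options by const reference, so a chained temporary such as
// Options().set_caseless(true).set_extended(true) can be passed directly.
inline int flags_seen(const Options& opt) { return opt.all_options(); }

}  // namespace chain_sketch
```

<P>
In this sketch, <i>Options(kCaseless | kMultiline)</i> and
<i>Options().set_caseless(true).set_multiline(true)</i> end up holding the
same bits; the chained form names each option explicitly.
</P>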
243 |
|
<br><a name="SEC7" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br> |
244 |
<P> |
<P> |
245 |
The "Consume" operation may be useful if you want to repeatedly |
The "Consume" operation may be useful if you want to repeatedly |
246 |
match regular expressions at the front of a string and skip over |
match regular expressions at the front of a string and skip over |
273 |
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word) |
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word) |
274 |
</PRE> |
</PRE> |
275 |
</P> |
</P> |
276 |
<br><a name="SEC7" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br> |
<br><a name="SEC8" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br> |
277 |
<P> |
<P> |
278 |
By default, if you pass a pointer to a numeric value, the |
By default, if you pass a pointer to a numeric value, the |
279 |
corresponding text is interpreted as a base-10 number. You can |
corresponding text is interpreted as a base-10 number. You can |
291 |
</pre> |
</pre> |
292 |
will leave 64 in a, b, c, and d. |
will leave 64 in a, b, c, and d. |
293 |
</P> |
</P> |
294 |
<br><a name="SEC8" href="#TOC1">REPLACING PARTS OF STRINGS</a><br> |
<br><a name="SEC9" href="#TOC1">REPLACING PARTS OF STRINGS</a><br> |
295 |
<P> |
<P> |
296 |
You can replace the first match of "pattern" in "str" with "rewrite". |
You can replace the first match of "pattern" in "str" with "rewrite". |
297 |
Within "rewrite", backslash-escaped digits (\1 to \9) can be |
Within "rewrite", backslash-escaped digits (\1 to \9) can be |
323 |
occurred and the extraction happened successfully; if no match occurs, the |
occurred and the extraction happened successfully; if no match occurs, the |
324 |
string is left unaffected. |
string is left unaffected. |
325 |
</P> |
</P> |
326 |
<br><a name="SEC9" href="#TOC1">AUTHOR</a><br> |
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br> |
327 |
<P> |
<P> |
328 |
The C++ wrapper was contributed by Google Inc. |
The C++ wrapper was contributed by Google Inc. |
329 |
<br> |
<br> |