11 |
.SH DESCRIPTION |
.SH DESCRIPTION |
12 |
.rs |
.rs |
13 |
.sp |
.sp |
14 |
The C++ wrapper for PCRE was provided by Google Inc. This brief man page was |
The C++ wrapper for PCRE was provided by Google Inc. Some additional |
15 |
constructed from the notes in the \fIpcrecpp.h\fP file, which should be |
functionality was added by Giuseppe Maxia. This brief man page was constructed |
16 |
consulted for further details. |
from the notes in the \fIpcrecpp.h\fP file, which should be consulted for |
17 |
|
further details. |
18 |
. |
. |
19 |
. |
. |
20 |
.SH "MATCHING INTERFACE" |
.SH "MATCHING INTERFACE" |
131 |
--enable-utf8 flag. |
--enable-utf8 flag. |
132 |
. |
. |
133 |
. |
. |
134 |
|
.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE" |
135 |
|
.rs |
136 |
|
.sp |
137 |
|
PCRE defines some modifiers to change the behavior of the regular expression |
138 |
|
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to |
139 |
|
pass such modifiers to an RE class. Currently, the following modifiers are |
140 |
|
supported: |
141 |
|
.sp |
142 |
|
modifier description Perl equivalent |
143 |
|
.sp |
144 |
|
PCRE_CASELESS case insensitive match /i |
145 |
|
PCRE_MULTILINE multiple lines match /m |
146 |
|
PCRE_DOTALL dot matches newlines /s |
147 |
|
PCRE_DOLLAR_ENDONLY $ matches only at end N/A |
148 |
|
PCRE_EXTRA strict escape parsing N/A |
149 |
|
PCRE_EXTENDED ignore whitespace /x |
150 |
|
PCRE_UTF8 handles UTF-8 chars built-in |
151 |
|
PCRE_UNGREEDY reverses * and *? N/A |
152 |
|
PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*) |
153 |
|
.sp |
154 |
|
(*) Both Perl and PCRE allow non-capturing parentheses by means of the |
155 |
|
"?:" modifier within the pattern itself, e.g. (?:ab|cd) does not |
156 |
|
capture, while (ab|cd) does. |
157 |
|
.P |
158 |
|
For a full account of how each modifier works, see the |
159 |
|
PCRE API reference page. |
160 |
|
.P |
161 |
|
For each modifier, there are two member functions whose names are made |
162 |
|
out of the modifier in lowercase, without the "PCRE_" prefix. For |
163 |
|
instance, PCRE_CASELESS is handled by |
164 |
|
.sp |
165 |
|
bool caseless() |
166 |
|
.sp |
167 |
|
which returns true if the modifier is set, and |
168 |
|
.sp |
169 |
|
RE_Options & set_caseless(bool) |
170 |
|
.sp |
171 |
|
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be |
172 |
|
accessed through the \fBset_match_limit()\fR and \fBmatch_limit()\fR member |
173 |
|
functions. Setting \fImatch_limit\fR to a non-zero value will limit the |
174 |
|
execution of PCRE to keep it from doing bad things like blowing the stack or |
175 |
|
taking an eternity to return a result. A value of 5000 is good enough to stop |
176 |
|
stack blowup in a 2MB thread stack. Setting \fImatch_limit\fR to zero disables |
177 |
|
match limiting. Alternatively, you can call \fBmatch_limit_recursion()\fP |
178 |
|
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE |
179 |
|
recurses. \fBmatch_limit()\fP limits the number of internal match() calls PCRE makes; |
180 |
|
\fBmatch_limit_recursion()\fP limits the depth of internal recursion, and |
181 |
|
therefore the amount of stack that is used. |
182 |
|
.P |
183 |
|
Normally, to pass one or more modifiers to an RE class, you declare |
184 |
|
a \fIRE_Options\fR object, set the appropriate options, and pass this |
185 |
|
object to an RE constructor. Example: |
186 |
|
.sp |
187 |
|
RE_Options opt; |
188 |
|
opt.set_caseless(true); |
189 |
|
if (RE("HELLO", opt).PartialMatch("hello world")) ... |
190 |
|
.sp |
191 |
|
RE_Options has two constructors. The default constructor takes no arguments and |
192 |
|
creates a set of flags that are all off. The second constructor's parameter |
193 |
|
\fIoption_flags\fR is to facilitate transfer of legacy code from C programs. |
194 |
|
This lets you do |
195 |
|
.sp |
196 |
|
RE(pattern, |
197 |
|
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str); |
198 |
|
.sp |
199 |
|
However, new code is better off doing |
200 |
|
.sp |
201 |
|
RE(pattern, |
202 |
|
RE_Options().set_caseless(true).set_multiline(true)) |
203 |
|
.PartialMatch(str); |
204 |
|
.sp |
205 |
|
If you are going to pass one of the most used modifiers, there are some |
206 |
|
convenience functions that return an RE_Options object with the |
207 |
|
appropriate modifier already set: \fBCASELESS()\fR, \fBUTF8()\fR, |
208 |
|
\fBMULTILINE()\fR, \fBDOTALL()\fR, and \fBEXTENDED()\fR. |
209 |
|
.P |
210 |
|
If you need to set several options at once, and you don't want to go through |
211 |
|
the trouble of declaring an RE_Options object and setting several options, there |
212 |
|
is a parallel method that gives you this ability on the fly. You can chain |
213 |
|
several \fBset_xxxxx()\fR member functions, since each of them returns a |
214 |
|
reference to its class object. For example, to pass PCRE_CASELESS, |
215 |
|
PCRE_EXTENDED, and PCRE_MULTILINE to an RE with one statement, you may write: |
216 |
|
.sp |
217 |
|
RE(" ^ xyz \e\es+ .* blah$", |
218 |
|
RE_Options() |
219 |
|
.set_caseless(true) |
220 |
|
.set_extended(true) |
221 |
|
.set_multiline(true)).PartialMatch(sometext); |
222 |
|
.sp |
223 |
|
. |
224 |
|
. |
225 |
.SH "SCANNING TEXT INCREMENTALLY" |
.SH "SCANNING TEXT INCREMENTALLY" |
226 |
.rs |
.rs |
227 |
.sp |
.sp |