
Diff of /code/trunk/doc/html/pcrecpp.html


revision 77 by nigel, Sat Feb 24 21:40:45 2007 UTC
revision 81 by nigel, Sat Feb 24 21:40:59 2007 UTC
# Line 18 (v.77), line 18 (v.81): man page, in case the conversion went wr
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a>
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a>
<li><a name="TOC6" href="#SEC6">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
<li><a name="TOC7" href="#SEC7">SCANNING TEXT INCREMENTALLY</a>
<li><a name="TOC8" href="#SEC8">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
<li><a name="TOC9" href="#SEC9">REPLACING PARTS OF STRINGS</a>
<li><a name="TOC10" href="#SEC10">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
<P>
# Line 31 (v.77), line 32 (v.81): man page, in case the conversion went wr
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
The C++ wrapper for PCRE was provided by Google Inc. Some additional
functionality was added by Giuseppe Maxia. This brief man page was constructed
from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
further details.
</P>
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
<P>
# Line 148 (v.77), line 150 (v.81): NOTE: The UTF8 flag is ignored if pcre w
      --enable-utf8 flag.
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
<P>
PCRE defines a number of modifiers that change the behavior of the regular
expression engine. The C++ wrapper defines an auxiliary class, RE_Options, as a
vehicle for passing such modifiers to an RE object. Currently, the following
modifiers are supported:
<pre>
   modifier              description                 Perl equivalent

   PCRE_CASELESS         case-insensitive match      /i
   PCRE_MULTILINE        ^ and $ match at newlines   /m
   PCRE_DOTALL           dot matches newlines        /s
   PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
   PCRE_EXTRA            strict escape parsing       N/A
   PCRE_EXTENDED         ignore pattern whitespace   /x
   PCRE_UTF8             handles UTF-8 chars         built-in
   PCRE_UNGREEDY         reverses * and *?           N/A
   PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
</pre>
(*) Both Perl and PCRE allow non-capturing parentheses by means of the "?:"
syntax within the pattern itself; e.g. (?:ab|cd) does not capture, while
(ab|cd) does.
</P>
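<P>
Each modifier in the table is an independent flag. Because of this, several modifiers can be packed into one integer and each can be tested on its own. The sketch below shows that idea using hypothetical stand-in constants (kCaseless, kMultiline, kDotall, kExtended) and a hypothetical describe() helper. Real code would use the PCRE_* constants from pcre.h; their numeric values are not assumed here.
</P>

```cpp
#include <string>

// Hypothetical stand-ins for four PCRE_* modifiers: one distinct bit each.
constexpr int kCaseless = 1 << 0;
constexpr int kMultiline = 1 << 1;
constexpr int kDotall = 1 << 2;
constexpr int kExtended = 1 << 3;

// Returns the names of the modifiers whose bits are set in `flags`,
// separated by spaces, in the order of the table above.
std::string describe(int flags) {
  struct Entry { int bit; const char* name; };
  const Entry table[] = {{kCaseless, "caseless"}, {kMultiline, "multiline"},
                         {kDotall, "dotall"}, {kExtended, "extended"}};
  std::string out;
  for (const Entry& e : table) {
    if ((flags & e.bit) == 0) continue;  // modifier not requested
    if (!out.empty()) out += ' ';
    out += e.name;
  }
  return out;
}
```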
<P>
For a full account of how each modifier works, please check the
PCRE API reference page.
</P>
<P>
For each modifier, there are two member functions whose names are formed from
the modifier name in lowercase, without the "PCRE_" prefix. For instance,
PCRE_CASELESS is handled by
<pre>
  bool caseless()
</pre>
which returns true if the modifier is set, and
<pre>
  RE_Options & set_caseless(bool)
</pre>
which sets or unsets the modifier. Moreover, PCRE_CONFIG_MATCH_LIMIT can be
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
functions. Setting <i>match_limit</i> to a non-zero value limits the execution
of pcre, to keep it from overflowing the stack or taking an excessively long
time to return a result. A value of 5000 is enough to prevent stack overflow in
a 2MB thread stack. Setting <i>match_limit</i> to zero disables match limiting.
</P>
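<P>
Below is a minimal sketch of how a getter/setter pair like caseless()/set_caseless() can be built on top of a flag word. It also includes a stored match limit, where 0 means "no limit" by the convention described above. OptionsSketch and its kCaseless bit are hypothetical; this is not the actual RE_Options source, which is declared in pcrecpp.h.
</P>

```cpp
// Hypothetical, simplified model of the accessor pattern described above.
// It is not the real pcrecpp::RE_Options.
class OptionsSketch {
 public:
  static constexpr int kCaseless = 0x1;  // stand-in for the PCRE_CASELESS bit

  // Getter: true if the caseless bit is set.
  bool caseless() const { return (flags_ & kCaseless) != 0; }

  // Setter: sets or clears the bit, then returns *this so calls can chain.
  OptionsSketch& set_caseless(bool on) {
    if (on) {
      flags_ |= kCaseless;
    } else {
      flags_ &= ~kCaseless;
    }
    return *this;
  }

  // Match limit: stored as given; by the convention in the text, 0 = no limit.
  int match_limit() const { return match_limit_; }
  OptionsSketch& set_match_limit(int limit) {
    match_limit_ = limit;
    return *this;
  }

 private:
  int flags_ = 0;        // all modifiers off by default
  int match_limit_ = 0;  // limiting disabled by default
};
```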
<P>
Normally, to pass one or more modifiers to an RE object, you declare
an <i>RE_Options</i> object, set the appropriate options, and pass this
object to an RE constructor. Example:
<pre>
   RE_Options opt;
   opt.set_caseless(true);
   if (RE("HELLO", opt).PartialMatch("hello world")) ...
</pre>
RE_Options has two constructors. The default constructor takes no arguments and
creates a set of options with all modifiers turned off. The other constructor
takes an integer <i>option_flags</i> argument, which eases porting legacy C code
that already combines PCRE_* flags. This lets you write
<pre>
   RE(pattern,
     RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
</pre>
However, new code is better off using the named setters:
<pre>
   RE(pattern,
     RE_Options().set_caseless(true).set_multiline(true))
       .PartialMatch(str);
</pre>
If you are going to pass one of the most commonly used modifiers, there are some
convenience functions that return an RE_Options object with the appropriate
modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>, <b>MULTILINE()</b>,
<b>DOTALL()</b>, and <b>EXTENDED()</b>.
</P>
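<P>
A small self-contained model can check the equivalence between three styles: the legacy flag constructor, the chained setters, and a CASELESS()-style helper. FlagSet, kCaselessBit, kMultilineBit, all_options() and CaselessSketch() are hypothetical names chosen for this sketch, not pcrecpp declarations. The bit values are arbitrary stand-ins for PCRE_CASELESS and PCRE_MULTILINE.
</P>

```cpp
// Hypothetical stand-in bits for PCRE_CASELESS and PCRE_MULTILINE.
constexpr int kCaselessBit = 0x1;
constexpr int kMultilineBit = 0x2;

class FlagSet {
 public:
  FlagSet() = default;  // all modifiers off
  // Legacy style: accept an already-combined integer of flags.
  explicit FlagSet(int option_flags) : flags_(option_flags) {}

  FlagSet& set_caseless(bool on) { return set(kCaselessBit, on); }
  FlagSet& set_multiline(bool on) { return set(kMultilineBit, on); }
  int all_options() const { return flags_; }

 private:
  // Sets or clears one bit and returns *this to allow chaining.
  FlagSet& set(int bit, bool on) {
    flags_ = on ? (flags_ | bit) : (flags_ & ~bit);
    return *this;
  }
  int flags_ = 0;
};

// In the spirit of CASELESS(): options with one modifier preset, which the
// caller can refine further by chaining setters.
inline FlagSet CaselessSketch() { return FlagSet(kCaselessBit); }
```

<P>
In this sketch the integer constructor is marked explicit, so a bare integer cannot silently convert into an options object. The setter style, by contrast, names each modifier at the call site.
</P>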
<P>
If you need to set several options at once and don't want to declare a named
RE_Options object and set each option in a separate statement, you can chain
several <b>set_xxxxx()</b> member functions on a temporary object, since each of
them returns a reference to the object it was called on. For example, to pass
PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to an RE object in one
statement, you may write:
<pre>
   RE(" ^ xyz \\s+ .* blah$",
     RE_Options()
       .set_caseless(true)
       .set_extended(true)
       .set_multiline(true)).PartialMatch(sometext);
</pre>
</P>
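<P>
The pattern in the example above depends on PCRE_EXTENDED. With that modifier, unescaped whitespace in the pattern is ignored, so the pattern written as the C++ literal " ^ xyz \\s+ .* blah$" is treated like ^xyz\s+.*blah$. The following simplified model of that rule is for illustration only. strip_extended_whitespace() is a hypothetical helper, and unlike real PCRE it does not skip # comments.
</P>

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Simplified model of PCRE_EXTENDED (/x): unescaped whitespace outside
// character classes is dropped. "#" comments are NOT handled here.
std::string strip_extended_whitespace(const std::string& pattern) {
  std::string out;
  bool in_class = false;  // inside [...], whitespace is kept literally
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    char c = pattern[i];
    if (c == '\\' && i + 1 < pattern.size()) {  // keep escapes verbatim
      out += c;
      out += pattern[++i];
      continue;
    }
    if (c == '[') {
      in_class = true;
    } else if (c == ']') {
      in_class = false;
    }
    if (!in_class && std::isspace(static_cast<unsigned char>(c))) continue;
    out += c;
  }
  return out;
}
```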
<br><a name="SEC7" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
<P>
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
# Line 181 (v.77), line 273 (v.81): could extract all words from a string by
  pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
</PRE>
</P>
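<P>
The word-extraction loop above follows a simple cycle: search, capture, advance past the match, and repeat. The sketch below reproduces that cycle with std::regex instead of pcrecpp, so it compiles without the PCRE library. find_and_consume() and extract_words() are hypothetical helpers. Note that std::regex uses the ECMAScript grammar by default, which is not identical to PCRE's.
</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical analogue of FindAndConsume: find `re` anywhere in `input`,
// store capture group 1 in *word, then drop everything up to the match end.
bool find_and_consume(std::string& input, const std::regex& re,
                      std::string* word) {
  std::smatch m;
  if (!std::regex_search(input, m, re)) return false;
  *word = m[1].str();
  input.erase(0, static_cast<std::size_t>(m.position(0) + m.length(0)));
  return true;
}

// Collects every \w+ run; (\w+) never matches empty text, so the loop ends.
std::vector<std::string> extract_words(std::string input) {
  const std::regex word_re("(\\w+)");
  std::vector<std::string> words;
  std::string word;
  while (find_and_consume(input, word_re, &word)) words.push_back(word);
  return words;
}
```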
<br><a name="SEC8" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
<P>
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
# Line 199 (v.77), line 291 (v.81): prefixes, but defaults to base-10.
</pre>
will leave 64 in a, b, c, and d.
</P>
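<P>
The radix rules can be illustrated with the standard library alone. The base argument of std::stoi follows strtol: base 8 reads octal, base 16 reads hex, and base 0 detects C-style "0x" and leading-"0" prefixes, otherwise falling back to base 10. This is only an analogy for the hex, octal and C-radix parsing described here, not pcrecpp's implementation. parse_fields() and its four space-separated fields are hypothetical.
</P>

```cpp
#include <array>
#include <sstream>
#include <string>

// Reads four space-separated fields as octal, hex, C-radix and C-radix.
// std::stoi bases: 8 = octal, 16 = hex, 0 = detect prefix (else base 10).
std::array<int, 4> parse_fields(const std::string& text) {
  std::istringstream in(text);
  std::string f[4];
  in >> f[0] >> f[1] >> f[2] >> f[3];
  return {std::stoi(f[0], nullptr, 8), std::stoi(f[1], nullptr, 16),
          std::stoi(f[2], nullptr, 0), std::stoi(f[3], nullptr, 0)};
}
```

<P>
With the input "100 40 0100 0x40", all four fields evaluate to 64.
</P>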
<br><a name="SEC9" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
<P>
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\1 to \9) can be
# Line 231 (v.77), line 323 (v.81): The non-matching portions of "text" are
occurred and the extraction happened successfully; if no match occurs, the
string is left unaffected.
</P>
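<P>
First-match replacement with \1 to \9 substitution can be sketched with std::regex. replace_first() and expand_rewrite() are hypothetical helpers, not pcrecpp's own Replace(). std::regex uses the ECMAScript grammar by default, which differs from PCRE in places. In this sketch, a backslash that is not followed by a digit from 1 to 9 is copied unchanged.
</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Expands \1..\9 in `rewrite` using the capture groups in `m`.
// Unmatched or out-of-range groups expand to nothing.
std::string expand_rewrite(const std::string& rewrite, const std::smatch& m) {
  std::string out;
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    char c = rewrite[i];
    if (c == '\\' && i + 1 < rewrite.size() && rewrite[i + 1] >= '1' &&
        rewrite[i + 1] <= '9') {
      std::size_t group = static_cast<std::size_t>(rewrite[i + 1] - '0');
      if (group < m.size()) out += m[group].str();
      ++i;  // skip the digit
    } else {
      out += c;
    }
  }
  return out;
}

// Replaces the first match of `pattern` in *str; returns false and leaves
// *str untouched when there is no match.
bool replace_first(const std::regex& pattern, const std::string& rewrite,
                   std::string* str) {
  std::smatch m;
  if (!std::regex_search(*str, m, pattern)) return false;
  std::string replacement = expand_rewrite(rewrite, m);
  str->replace(static_cast<std::size_t>(m.position(0)),
               static_cast<std::size_t>(m.length(0)), replacement);
  return true;
}
```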
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
<P>
The C++ wrapper was contributed by Google Inc.
<br>
