Diff of /code/trunk/doc/html/pcrecpp.html

revision 77 by nigel, Sat Feb 24 21:40:45 2007 UTC revision 99 by ph10, Tue Mar 6 12:27:42 2007 UTC
# Line 7  Line 7 
7  <p>  <p>
8  Return to the <a href="index.html">PCRE index page</a>.  Return to the <a href="index.html">PCRE index page</a>.
9  </p>  </p>
10  <p>  <p>
11  This page is part of the PCRE HTML documentation. It was generated automatically  This page is part of the PCRE HTML documentation. It was generated automatically
12  from the original man page. If there is any nonsense in it, please consult the  from the original man page. If there is any nonsense in it, please consult the
13  man page, in case the conversion went wrong.  man page, in case the conversion went wrong.
14  <br>  <br>
15  <ul>  <ul>
16  <li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>  <li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
17  <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>  <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
18  <li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>  <li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
19  <li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a>  <li><a name="TOC4" href="#SEC4">QUOTING METACHARACTERS</a>
20  <li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a>  <li><a name="TOC5" href="#SEC5">PARTIAL MATCHES</a>
21  <li><a name="TOC6" href="#SEC6">SCANNING TEXT INCREMENTALLY</a>  <li><a name="TOC6" href="#SEC6">UTF-8 AND THE MATCHING INTERFACE</a>
22  <li><a name="TOC7" href="#SEC7">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>  <li><a name="TOC7" href="#SEC7">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
23  <li><a name="TOC8" href="#SEC8">REPLACING PARTS OF STRINGS</a>  <li><a name="TOC8" href="#SEC8">SCANNING TEXT INCREMENTALLY</a>
24  <li><a name="TOC9" href="#SEC9">AUTHOR</a>  <li><a name="TOC9" href="#SEC9">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
25    <li><a name="TOC10" href="#SEC10">REPLACING PARTS OF STRINGS</a>
26    <li><a name="TOC11" href="#SEC11">AUTHOR</a>
27    <li><a name="TOC12" href="#SEC12">REVISION</a>
28  </ul>  </ul>
29  <br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>  <br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
30  <P>  <P>
31  <b>#include &#60;pcrecpp.h&#62;</b>  <b>#include &#60;pcrecpp.h&#62;</b>
32  </P>  </P>
 <P>  
 </P>  
33  <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>  <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
34  <P>  <P>
35  The C++ wrapper for PCRE was provided by Google Inc. This brief man page was  The C++ wrapper for PCRE was provided by Google Inc. Some additional
36  constructed from the notes in the <i>pcrecpp.h</i> file, which should be  functionality was added by Giuseppe Maxia. This brief man page was constructed
37  consulted for further details.  from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
38    further details.
39  </P>  </P>
40  <br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>  <br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
41  <P>  <P>
# Line 103  The function returns true iff all of the Line 105  The function returns true iff all of the
105       number of sub-patterns, "i"th captured sub-pattern is       number of sub-patterns, "i"th captured sub-pattern is
106       ignored.       ignored.
107  </pre>  </pre>
108    CAVEAT: An optional sub-pattern that does not exist in the matched
109    string is assigned the empty string. Therefore, the following will
110    return false (because the empty string is not a valid number):
111    <pre>
112       int number;
113       pcrecpp::RE::FullMatch("abc", "[a-z]+(\\d+)?", &number);
114    </pre>
115  The matching interface supports at most 16 arguments per call.  The matching interface supports at most 16 arguments per call.
116  If you need more, consider using the more general interface  If you need more, consider using the more general interface
117  <b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for  <b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
118  <b>DoMatch</b>.  <b>DoMatch</b>.
119  </P>  </P>
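The caveat above can be reproduced outside PCRE. The following is a minimal sketch that uses std::regex (ECMAScript syntax) rather than pcrecpp; the helper name full_match_int is hypothetical and is not part of the wrapper. It only illustrates why an optional group that did not take part in the match yields an empty string, and why that string then fails numeric conversion.

```cpp
#include <cerrno>
#include <cstdlib>
#include <regex>
#include <string>

// Hypothetical helper (an assumption for illustration, not the pcrecpp API):
// full-match `text` against `pattern` and parse capture group 1 as a
// base-10 int. Returns false if the match or the conversion fails.
bool full_match_int(const std::string& text, const std::string& pattern,
                    int* out) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_match(text, m, re)) return false;
    if (m.size() < 2) return false;  // the pattern has no capture group

    // A group that did not participate in the match yields "".
    const std::string digits = m[1].str();
    if (digits.empty()) return false;  // "" is not a valid number

    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(digits.c_str(), &end, 10);
    if (*end != '\0' || errno == ERANGE) return false;
    *out = static_cast<int>(value);
    return true;
}
```

With this sketch, full_match_int("abc", "[a-z]+(\\d+)?", &number) returns false and leaves number untouched, while the same pattern applied to "abc42" succeeds. A caller that wants to accept the missing group could capture into a string instead and convert it only when it is non-empty.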
120  <br><a name="SEC4" href="#TOC1">PARTIAL MATCHES</a><br>  <br><a name="SEC4" href="#TOC1">QUOTING METACHARACTERS</a><br>
121    <P>
122    You can use the "QuoteMeta" operation to insert backslashes before all
123    potentially meaningful characters in a string. The returned string, used as a
124    regular expression, will exactly match the original string.
125    <pre>
126      Example:
127         string quoted = RE::QuoteMeta(unquoted);
128    </pre>
129    Note that it's legal to escape a character even if it has no special meaning in
130    a regular expression -- so this function does that. (This also makes it
131    identical to the perl function of the same name; see "perldoc -f quotemeta".)
132    For example, "1.5-2.0?" becomes "1\.5\-2\.0\?".
133    </P>
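As a rough illustration of the quoting rule described above, here is a self-contained sketch. The function name quote_meta_sketch is hypothetical, and the choice of which bytes to leave unescaped (ASCII letters, digits, and underscore) is an assumption; the real QuoteMeta in pcrecpp may treat some bytes, such as NUL or non-ASCII bytes, differently.

```cpp
#include <string>

// Hypothetical sketch, not the pcrecpp implementation: put a backslash in
// front of every byte that is not an ASCII letter, digit, or underscore.
std::string quote_meta_sketch(const std::string& unquoted) {
    std::string result;
    result.reserve(unquoted.size() * 2);  // worst case: every byte escaped
    for (unsigned char c : unquoted) {
        const bool is_word = (c >= 'a' && c <= 'z') ||
                             (c >= 'A' && c <= 'Z') ||
                             (c >= '0' && c <= '9') || c == '_';
        if (!is_word) result += '\\';
        result += static_cast<char>(c);
    }
    return result;
}
```

Applied to the example from the text, quote_meta_sketch("1.5-2.0?") yields the string 1\.5\-2\.0\? while a string made up only of word characters is returned unchanged.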
134    <br><a name="SEC5" href="#TOC1">PARTIAL MATCHES</a><br>
135  <P>  <P>
136  You can use the "PartialMatch" operation when you want the pattern  You can use the "PartialMatch" operation when you want the pattern
137  to match any substring of the text.  to match any substring of the text.
# Line 123  to match any substring of the text. Line 146  to match any substring of the text.
146       assert(number == 100);       assert(number == 100);
147  </PRE>  </PRE>
148  </P>  </P>
149  <br><a name="SEC5" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>  <br><a name="SEC6" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
150  <P>  <P>
151  By default, pattern and text are plain text, one byte per character. The UTF8  By default, pattern and text are plain text, one byte per character. The UTF8
152  flag, passed to the constructor, causes both pattern and string to be treated  flag, passed to the constructor, causes both pattern and string to be treated
# Line 148  NOTE: The UTF8 flag is ignored if pcre w Line 171  NOTE: The UTF8 flag is ignored if pcre w
171        --enable-utf8 flag.        --enable-utf8 flag.
172  </PRE>  </PRE>
173  </P>  </P>
174  <br><a name="SEC6" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>  <br><a name="SEC7" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
175    <P>
176    PCRE defines some modifiers to change the behavior of the regular expression
177    engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
178    pass such modifiers to a RE class. Currently, the following modifiers are
179    supported:
180    <pre>
181       modifier              description               Perl equivalent
182    
183       PCRE_CASELESS         case insensitive match      /i
184       PCRE_MULTILINE        multiple lines match        /m
185       PCRE_DOTALL           dot matches newlines        /s
186       PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
187       PCRE_EXTRA            strict escape parsing       N/A
188       PCRE_EXTENDED         ignore whitespace           /x
189       PCRE_UTF8             handles UTF8 chars          built-in
190       PCRE_UNGREEDY         reverses * and *?           N/A
191       PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
192    </pre>
193    (*) Both Perl and PCRE allow non-capturing parentheses by means of the
194    "?:" modifier within the pattern itself. For example, (?:ab|cd) does not
195    capture, while (ab|cd) does.
196    </P>
197    <P>
198    For a full account of how each modifier works, please check the
199    PCRE API reference page.
200    </P>
201    <P>
202    For each modifier, there are two member functions whose names are derived
203    from the modifier name in lowercase, without the "PCRE_" prefix. For
204    instance, PCRE_CASELESS is handled by
205    <pre>
206      bool caseless()
207    </pre>
208    which returns true if the modifier is set, and
209    <pre>
210      RE_Options & set_caseless(bool)
211    </pre>
212    which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
213    accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
214    functions. Setting <i>match_limit</i> to a non-zero value will limit the
215    execution of pcre to keep it from doing bad things like blowing the stack or
216    taking an eternity to return a result. A value of 5000 is good enough to stop
217    stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
218    match limiting. Alternatively, you can call <b>match_limit_recursion()</b>
219    which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
220    recurses. <b>match_limit()</b> limits the number of matches PCRE does;
221    <b>match_limit_recursion()</b> limits the depth of internal recursion, and
222    therefore the amount of stack that is used.
223    </P>
224    <P>
225    Normally, to pass one or more modifiers to a RE class, you declare
226    a <i>RE_Options</i> object, set the appropriate options, and pass this
227    object to a RE constructor. Example:
228    <pre>
229       RE_Options opt;
230       opt.set_caseless(true);
231       if (RE("HELLO", opt).PartialMatch("hello world")) ...
232    </pre>
233    RE_Options has two constructors. The default constructor takes no arguments and
234    creates a set of flags that are all off. The second constructor takes an
235    <i>option_flags</i> argument, which eases the transfer of legacy code from C programs.
236    This lets you do
237    <pre>
238       RE(pattern,
239         RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
240    </pre>
241    However, new code is better off doing
242    <pre>
243       RE(pattern,
244         RE_Options().set_caseless(true).set_multiline(true))
245           .PartialMatch(str);
246    </pre>
247    If you only need one of the most commonly used modifiers, there are some
248    convenience functions that return a RE_Options object with the
249    appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
250    <b>MULTILINE()</b>, <b>DOTALL()</b>, and <b>EXTENDED()</b>.
251    </P>
252    <P>
253    If you need to set several options at once, and you don't want to go through
254    the pain of declaring a RE_Options object and setting each option separately,
255    you can set them on the fly. You can chain
256    several <b>set_xxxxx()</b> member functions, since each of them returns a
257    reference to its class object. For example, to pass PCRE_CASELESS,
258    PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
259    <pre>
260       RE(" ^ xyz \\s+ .* blah$",
261         RE_Options()
262           .set_caseless(true)
263           .set_extended(true)
264           .set_multiline(true)).PartialMatch(sometext);
265    
266    </PRE>
267    </P>
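The chaining described above works because every setter returns a reference to the object it was called on. The sketch below shows that pattern in isolation. The class name OptionsSketch, its bit constants, and the all_flags() accessor are hypothetical; this is not the RE_Options source, and the bit values are illustrative rather than the PCRE constants.

```cpp
// Hypothetical stand-in for an options class with chainable setters.
class OptionsSketch {
 public:
  // Illustrative bit values only; an assumption, not PCRE's constants.
  enum : unsigned {
    kCaseless = 1u << 0,
    kMultiline = 1u << 1,
    kExtended = 1u << 2
  };

  // Getter: reports whether the flag is currently set.
  bool caseless() const { return (flags_ & kCaseless) != 0; }
  bool multiline() const { return (flags_ & kMultiline) != 0; }
  bool extended() const { return (flags_ & kExtended) != 0; }

  // Setter: sets or clears the flag and returns *this to allow chaining.
  OptionsSketch& set_caseless(bool on) { return set(kCaseless, on); }
  OptionsSketch& set_multiline(bool on) { return set(kMultiline, on); }
  OptionsSketch& set_extended(bool on) { return set(kExtended, on); }

  unsigned all_flags() const { return flags_; }

 private:
  OptionsSketch& set(unsigned bit, bool on) {
    if (on) {
      flags_ |= bit;
    } else {
      flags_ &= ~bit;
    }
    return *this;
  }

  unsigned flags_ = 0;  // all flags off by default
};
```

With this sketch, OptionsSketch().set_caseless(true).set_extended(true).set_multiline(true) builds a fully configured temporary in a single expression, which is the same shape as the RE_Options call shown in the text.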
268    <br><a name="SEC8" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
269  <P>  <P>
270  The "Consume" operation may be useful if you want to repeatedly  The "Consume" operation may be useful if you want to repeatedly
271  match regular expressions at the front of a string and skip over  match regular expressions at the front of a string and skip over
# Line 181  could extract all words from a string by Line 298  could extract all words from a string by
298    pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)    pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
299  </PRE>  </PRE>
300  </P>  </P>
301  <br><a name="SEC7" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>  <br><a name="SEC9" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
302  <P>  <P>
303  By default, if you pass a pointer to a numeric value, the  By default, if you pass a pointer to a numeric value, the
304  corresponding text is interpreted as a base-10 number. You can  corresponding text is interpreted as a base-10 number. You can
# Line 199  prefixes, but defaults to base-10. Line 316  prefixes, but defaults to base-10.
316  </pre>  </pre>
317  will leave 64 in a, b, c, and d.  will leave 64 in a, b, c, and d.
318  </P>  </P>
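The radix rules in this section resemble the behaviour of the standard std::strtol function, which makes for a convenient way to check the arithmetic. The following sketch is only an analogy: the helper parse_with_base is hypothetical, and nothing here shows how pcrecpp itself converts captured text.

```cpp
#include <cstdlib>

// Hypothetical helper: parse `text` in the given base with std::strtol.
// Base 8 reads octal digits, base 16 reads hexadecimal digits (an optional
// "0x" prefix is accepted), and base 0 infers the radix from a C-style
// prefix: "0x" means hex, a leading "0" means octal, otherwise decimal.
long parse_with_base(const char* text, int base) {
    return std::strtol(text, nullptr, base);
}
```

Under these rules, "100" read as octal, "0x40" read as hex, and both "0100" and "64" read with prefix detection all evaluate to 64.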
319  <br><a name="SEC8" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>  <br><a name="SEC10" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
320  <P>  <P>
321  You can replace the first match of "pattern" in "str" with "rewrite".  You can replace the first match of "pattern" in "str" with "rewrite".
322  Within "rewrite", backslash-escaped digits (\1 to \9) can be  Within "rewrite", backslash-escaped digits (\1 to \9) can be
# Line 231  The non-matching portions of "text" are Line 348  The non-matching portions of "text" are
348  occurred and the extraction happened successfully;  if no match occurs, the  occurred and the extraction happened successfully;  if no match occurs, the
349  string is left unaffected.  string is left unaffected.
350  </P>  </P>
351  <br><a name="SEC9" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC11" href="#TOC1">AUTHOR</a><br>
352  <P>  <P>
353  The C++ wrapper was contributed by Google Inc.  The C++ wrapper was contributed by Google Inc.
354  <br>  <br>
355  Copyright &copy; 2005 Google Inc.  Copyright &copy; 2006 Google Inc.
356    <br>
357    </P>
358    <br><a name="SEC12" href="#TOC1">REVISION</a><br>
359    <P>
360    Last updated: 06 March 2007
361    <br>
362  <p>  <p>
363  Return to the <a href="index.html">PCRE index page</a>.  Return to the <a href="index.html">PCRE index page</a>.
364  </p>  </p>
