Contents of /code/trunk/doc/html/pcrecpp.html

Revision 77
Sat Feb 24 21:40:45 2007 UTC (14 years, 1 month ago) by nigel
File MIME type: text/html
File size: 8797 byte(s)
Load pcre-6.0 into code/trunk.
<html>
<head>
<title>pcrecpp specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecpp man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a>
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a>
<li><a name="TOC6" href="#SEC6">SCANNING TEXT INCREMENTALLY</a>
<li><a name="TOC7" href="#SEC7">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
<li><a name="TOC8" href="#SEC8">REPLACING PARTS OF STRINGS</a>
<li><a name="TOC9" href="#SEC9">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
<P>
<b>#include &#60;pcrecpp.h&#62;</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
The C++ wrapper for PCRE was provided by Google Inc. This brief man page was
constructed from the notes in the <i>pcrecpp.h</i> file, which should be
consulted for further details.
</P>
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
<P>
The "FullMatch" operation checks that the supplied text matches the supplied
pattern exactly. If pointer arguments are supplied, it copies the sub-strings
matched by sub-patterns into them.
<pre>
  Example: successful match:
    pcrecpp::RE re("h.*o");
    re.FullMatch("hello");

  Example: unsuccessful match (requires full match):
    pcrecpp::RE re("e");
    !re.FullMatch("hello");

  Example: creating a temporary RE object:
    pcrecpp::RE("h.*o").FullMatch("hello");
</pre>
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. As the examples above show, you can either store
the RE object explicitly in a variable or use a temporary RE object. The
examples below use one form or the other arbitrarily; either could correctly be
used for any of them.
</P>
<P>
You must supply extra pointer arguments to extract matched subpieces.
<pre>
  Example: extracts "ruby" into "s" and 1234 into "i":
    int i;
    string s;
    pcrecpp::RE re("(\\w+):(\\d+)");
    re.FullMatch("ruby:1234", &s, &i);

  Example: does not try to extract any extra sub-patterns:
    re.FullMatch("ruby:1234", &s);

  Example: does not try to extract into NULL:
    re.FullMatch("ruby:1234", NULL, &i);

  Example: integer overflow causes failure:
    !re.FullMatch("ruby:1234567891234", NULL, &i);

  Example: fails because there aren't enough sub-patterns:
    !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);

  Example: fails because string cannot be stored in integer:
    !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
</pre>
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
<pre>
  string        (matched piece is copied to string)
  StringPiece   (StringPiece is mutated to point to matched piece)
  T             (where "bool T::ParseFrom(const char*, int)" exists)
  NULL          (the corresponding matched sub-pattern is not copied)
</pre>
The function returns true iff all of the following conditions are satisfied:
<pre>
  a. "text" matches "pattern" exactly;

  b. The number of matched sub-patterns is &#62;= the number of supplied
     pointers;

  c. The "i"th argument has a suitable type for holding the
     string captured as the "i"th sub-pattern. If you pass in
     NULL for the "i"th argument, or pass fewer arguments than
     the number of sub-patterns, the "i"th captured sub-pattern is
     ignored.
</pre>
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature of
<b>DoMatch</b>.
</P>
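<P>
The three conditions above can be illustrated without PCRE at all. The sketch
below is not pcrecpp code: it uses std::regex from the C++ standard library as
a stand-in, and the names Arg, parse_int and full_match_extract are
hypothetical helpers invented for this illustration. It handles only string
and int targets and ignores the 16-argument limit.
</P>

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <initializer_list>
#include <regex>
#include <string>

// Hypothetical argument holder: a string target, an int target, or NULL.
struct Arg {
    std::string* str = nullptr;
    int* num = nullptr;
    Arg(std::string* s) : str(s) {}
    Arg(int* n) : num(n) {}
    Arg(std::nullptr_t) {}
};

// Parses a whole base-10 integer; fails on junk or on overflow of int.
bool parse_int(const std::string& piece, int* out) {
    if (piece.empty()) return false;
    errno = 0;
    char* end = nullptr;
    const long v = std::strtol(piece.c_str(), &end, 10);
    if (errno == ERANGE || *end != '\0') return false;
    if (v < INT_MIN || v > INT_MAX) return false;
    *out = static_cast<int>(v);
    return true;
}

bool full_match_extract(const std::string& text, const std::string& pattern,
                        std::initializer_list<Arg> args) {
    std::smatch m;
    // (a) the whole text must match the pattern
    if (!std::regex_match(text, m, std::regex(pattern))) return false;
    // (b) there must be at least as many sub-patterns as pointers
    if (m.size() - 1 < args.size()) return false;
    std::size_t group = 1;
    for (const Arg& arg : args) {
        const std::string piece = m[group++].str();
        if (arg.str) {
            *arg.str = piece;
        } else if (arg.num) {
            // (c) the captured text must fit the target type
            if (!parse_int(piece, arg.num)) return false;
        }  // a NULL argument skips this sub-pattern
    }
    return true;
}

// Usage, mirroring the FullMatch examples in this section.
void full_match_extract_examples() {
    std::string s;
    int i = 0;
    assert(full_match_extract("ruby:1234", "(\\w+):(\\d+)", {&s, &i}));
    assert(s == "ruby" && i == 1234);
    assert(!full_match_extract("ruby:1234", "\\w+:\\d+", {&s}));
}
```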
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHES</a><br>
<P>
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
<pre>
  Example: simple search for a string:
    pcrecpp::RE("ell").PartialMatch("hello");

  Example: find first number in a string:
    int number;
    pcrecpp::RE re("(\\d+)");
    re.PartialMatch("x*100 + 20", &number);
    assert(number == 100);
</PRE>
</P>
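<P>
For comparison, the closest standard-library counterpart to an unanchored
match is std::regex_search. The following is a sketch using std::regex rather
than pcrecpp; first_number is a hypothetical helper name.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Finds the first run of digits anywhere in `text` and stores it in *number.
// std::stoi may throw std::out_of_range if the digits do not fit in an int.
bool first_number(const std::string& text, int* number) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex("(\\d+)"))) return false;
    *number = std::stoi(m[1].str());
    return true;
}

// Usage, mirroring the PartialMatch example above.
void first_number_example() {
    int number = 0;
    assert(first_number("x*100 + 20", &number));
    assert(number == 100);
}
```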
<br><a name="SEC5" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
<P>
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and string to be treated
as UTF-8 text, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF-8 text. For example, "." will match one byte normally, but with UTF8 set it
may match all the bytes of a multi-byte character (up to four in standard
UTF-8).
<pre>
  Example:
    pcrecpp::RE_Options options;
    options.set_utf8();
    pcrecpp::RE re(utf8_pattern, options);
    re.FullMatch(utf8_string);

  Example: using the convenience function UTF8():
    pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
    re.FullMatch(utf8_string);
</pre>
NOTE: The UTF8 flag is ignored if PCRE was not configured with the
--enable-utf8 flag.
</P>
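<P>
To see why the flag changes what "." can match, it helps to look at how bytes
and characters differ in UTF-8. The sketch below is independent of PCRE and
uses hypothetical helper names; it inspects only lead-byte bit patterns and
does not validate its input.
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by lead byte `b`,
// judged from its high bits only; 0 for a continuation or other byte.
std::size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

// Counts characters rather than bytes, assuming well-formed UTF-8.
std::size_t utf8_char_count(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++count;  // continuation bytes are not counted
    }
    return count;
}

// Usage: U+00E9 is one character but two bytes in UTF-8.
void utf8_example() {
    const std::string e_acute = "\xC3\xA9";
    assert(e_acute.size() == 2);
    assert(utf8_char_count(e_acute) == 1);
}
```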
<br><a name="SEC6" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
<P>
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
<pre>
  Example: read lines of the form "var = value" from a string.
    string contents = ...;                 // Fill string somehow
    pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

    string var;
    int value;
    pcrecpp::RE re("(\\w+) = (\\d+)\n");
    while (re.Consume(&input, &var, &value)) {
      ...;
    }
</pre>
Each successful call to "Consume" will set "var" and "value", and also
advance "input" so it points past the matched text.
</P>
<P>
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
<pre>
  pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
</PRE>
</P>
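<P>
The same anchored, advance-as-you-go loop can be sketched with the standard
library, where the match_continuous flag forces each attempt to start exactly
at the current position. This uses std::regex, not pcrecpp, and
parse_assignments is a hypothetical helper name.
</P>

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Reads leading lines of the form "var = value\n" and stops at the first
// line that does not fit. std::stoi may throw if a value overflows int.
std::map<std::string, int> parse_assignments(const std::string& contents) {
    static const std::regex re("(\\w+) = (\\d+)\n");
    std::map<std::string, int> result;
    std::string::const_iterator pos = contents.begin();
    std::smatch m;
    // match_continuous anchors the match at `pos`, as Consume does.
    while (std::regex_search(pos, contents.end(), m, re,
                             std::regex_constants::match_continuous)) {
        result[m[1].str()] = std::stoi(m[2].str());
        pos = m[0].second;  // advance past the matched text
    }
    return result;
}

// Usage: the third line stops the scan, so "c" is never read.
void parse_assignments_example() {
    std::map<std::string, int> vars =
        parse_assignments("a = 1\nbb = 22\nstop here\nc = 3\n");
    assert(vars.size() == 2);
    assert(vars["a"] == 1 && vars["bb"] == 22);
}
```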
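<P>
Dropping the anchoring gives the unanchored variant of the loop. Again this
is a standard-library sketch rather than pcrecpp code, with all_words as a
hypothetical helper name.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects every run of word characters, searching forward from the end
// of the previous match each time.
std::vector<std::string> all_words(const std::string& text) {
    static const std::regex re("(\\w+)");
    std::vector<std::string> words;
    std::string::const_iterator pos = text.cbegin();
    std::smatch m;
    while (std::regex_search(pos, text.cend(), m, re)) {
        words.push_back(m[1].str());
        pos = m[0].second;  // continue after the matched word
    }
    return words;
}

// Usage: punctuation and spaces between words are skipped.
void all_words_example() {
    const std::vector<std::string> expected = {"hello", "big", "world"};
    assert(all_words("hello, big world!") == expected);
}
```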
<br><a name="SEC7" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
<P>
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
<pre>
  Example:
    int a, b, c, d;
    pcrecpp::RE re("(.*) (.*) (.*) (.*)");
    re.FullMatch("100 40 0100 0x40",
                 pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                 pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
</pre>
will leave 64 in a, b, c, and d.
</P>
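<P>
The arithmetic behind that result can be checked with std::strtol, whose base
argument behaves much like the three operators: 8 for octal, 16 for hex, and 0
for C-style prefix detection. This is only an analogy; how pcrecpp converts
numbers internally is not stated here, and parse_int_radix is a hypothetical
helper name.
</P>

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Parses the whole of `piece` as an int in the given base.
// Base 0 accepts "0" (octal) and "0x" (hex) prefixes, else decimal.
bool parse_int_radix(const std::string& piece, int base, int* out) {
    errno = 0;
    char* end = nullptr;
    const long v = std::strtol(piece.c_str(), &end, base);
    if (errno == ERANGE || end == piece.c_str() || *end != '\0') return false;
    if (v < INT_MIN || v > INT_MAX) return false;
    *out = static_cast<int>(v);
    return true;
}

// Usage: the four pieces of "100 40 0100 0x40" all come out as 64.
void parse_int_radix_example() {
    int a = 0, b = 0, c = 0, d = 0;
    assert(parse_int_radix("100", 8, &a));    // like Octal()
    assert(parse_int_radix("40", 16, &b));    // like Hex()
    assert(parse_int_radix("0100", 0, &c));   // like CRadix(), octal prefix
    assert(parse_int_radix("0x40", 0, &d));   // like CRadix(), hex prefix
    assert(a == 64 && b == 64 && c == 64 && d == 64);
}
```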
<br><a name="SEC8" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
<P>
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\1 to \9) can be
used to insert the text matching the corresponding parenthesized group
from the pattern. \0 in "rewrite" refers to the entire matching
text. For example:
<pre>
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").Replace("d", &s);
</pre>
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
</P>
<P>
<b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
<pre>
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").GlobalReplace("d", &s);
</pre>
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
</P>
<P>
<b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. It returns true iff a match
occurred and the extraction happened successfully; if no match occurs, the
string is left unaffected.
</P>
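<P>
A first-match replacement with the same true/false result can be sketched
with std::regex_replace and its format_first_only flag. This is a
standard-library analogue, not pcrecpp: note that std::regex rewrite strings
use $1 to $9 for groups where pcrecpp uses \1 to \9. The name replace_first is
hypothetical.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replaces only the first match of `pattern` in *str; false if no match.
bool replace_first(const std::string& pattern, const std::string& rewrite,
                   std::string* str) {
    const std::regex re(pattern);
    if (!std::regex_search(*str, re)) return false;
    *str = std::regex_replace(*str, re, rewrite,
                              std::regex_constants::format_first_only);
    return true;
}

// Usage, mirroring the Replace example above.
void replace_first_example() {
    std::string s = "yabba dabba doo";
    assert(replace_first("b+", "d", &s));
    assert(s == "yada dabba doo");
}
```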
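<P>
Replacing every occurrence and reporting the count could look like the
following with std::regex, which replaces all matches by default. As before
this is an analogue rather than pcrecpp code, and global_replace is a
hypothetical helper name.
</P>

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Replaces every match of `pattern` in *str and returns how many there were.
int global_replace(const std::string& pattern, const std::string& rewrite,
                   std::string* str) {
    const std::regex re(pattern);
    // Count the non-overlapping matches before rewriting the string.
    const auto count = std::distance(
        std::sregex_iterator(str->cbegin(), str->cend(), re),
        std::sregex_iterator());
    if (count > 0) *str = std::regex_replace(*str, re, rewrite);
    return static_cast<int>(count);
}

// Usage, mirroring the GlobalReplace example above.
void global_replace_example() {
    std::string s = "yabba dabba doo";
    assert(global_replace("b+", "d", &s) == 2);
    assert(s == "yada dada doo");
}
```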
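<P>
The extract-and-rewrite behaviour corresponds roughly to formatting a single
match result with std::regex. The sketch below uses the standard library's
match_results::format with $1-style references; it is not pcrecpp code, and
the function name extract and its argument order are chosen for illustration
only.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// On a match, writes `rewrite` with group substitutions into *out and
// discards the unmatched text; on no match, leaves *out untouched.
bool extract(const std::string& pattern, const std::string& rewrite,
             const std::string& text, std::string* out) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return false;
    *out = m.format(rewrite);
    return true;
}

// Usage: only the rewritten match survives, not the surrounding words.
void extract_example() {
    std::string out;
    assert(extract("(\\w+)@(\\w+)", "$2!$1", "mail bob@example now", &out));
    assert(out == "example!bob");
}
```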
<br><a name="SEC9" href="#TOC1">AUTHOR</a><br>
<P>
The C++ wrapper was contributed by Google Inc.
<br>
Copyright &copy; 2005 Google Inc.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
