<html>
<head>
<title>pcrecpp specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecpp man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a>
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a>
<li><a name="TOC6" href="#SEC6">SCANNING TEXT INCREMENTALLY</a>
<li><a name="TOC7" href="#SEC7">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
<li><a name="TOC8" href="#SEC8">REPLACING PARTS OF STRINGS</a>
<li><a name="TOC9" href="#SEC9">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
<P>
<b>#include &lt;pcrecpp.h&gt;</b>
</P>
<P>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
The C++ wrapper for PCRE was provided by Google Inc. This brief man page was
constructed from the notes in the <i>pcrecpp.h</i> file, which should be
consulted for further details.
</P>
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
<P>
The "FullMatch" operation checks that supplied text matches a supplied pattern
exactly. If pointer arguments are supplied, it copies matched sub-strings that
match sub-patterns into them.
<pre>
  Example: successful match
     pcrecpp::RE re("h.*o");
     re.FullMatch("hello");

  Example: unsuccessful match (requires full match):
     pcrecpp::RE re("e");
     !re.FullMatch("hello");

  Example: creating a temporary RE object:
     pcrecpp::RE("h.*o").FullMatch("hello");
</pre>
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. You can, as in the different examples above, store
the RE object explicitly in a variable or use a temporary RE object. The
examples below use one mode or the other arbitrarily. Either could correctly be
used for any of these examples.
</P>
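<P>
As a rough illustration of "exactly", the sketch below uses the C++ standard library rather than pcrecpp: <b>std::regex_match</b> succeeds only if the pattern matches the entire text, much like "FullMatch", while <b>std::regex_search</b> accepts a match anywhere. The helper names are hypothetical, and std::regex uses ECMAScript syntax rather than PCRE syntax, although the simple patterns shown here behave the same in both.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Stand-in for FullMatch semantics (not pcrecpp): std::regex_match succeeds
// only if the regex matches the entire text.
bool full_match_like(const std::string& pattern, const std::string& text) {
  return std::regex_match(text, std::regex(pattern));
}

// Stand-in for PartialMatch semantics: std::regex_search succeeds if the
// regex matches any substring of the text.
bool search_like(const std::string& pattern, const std::string& text) {
  return std::regex_search(text, std::regex(pattern));
}
```

With these helpers, full_match_like("e", "hello") is false, mirroring the "unsuccessful match" example, while search_like("e", "hello") is true.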
<P>
You must supply extra pointer arguments to extract matched subpieces.
<pre>
  Example: extracts "ruby" into "s" and 1234 into "i"
     int i;
     string s;
     pcrecpp::RE re("(\\w+):(\\d+)");
     re.FullMatch("ruby:1234", &s, &i);

  Example: does not try to extract any extra sub-patterns
     re.FullMatch("ruby:1234", &s);

  Example: does not try to extract into NULL
     re.FullMatch("ruby:1234", NULL, &i);

  Example: integer overflow causes failure
     !re.FullMatch("ruby:1234567891234", NULL, &i);

  Example: fails because there aren't enough sub-patterns:
     !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);

  Example: fails because a string cannot be stored in an integer
     !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
</pre>
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
<pre>
   string        (matched piece is copied to string)
   StringPiece   (StringPiece is mutated to point to matched piece)
   T             (where "bool T::ParseFrom(const char*, int)" exists)
   NULL          (the corresponding matched sub-pattern is not copied)
</pre>
The function returns true iff all of the following conditions are satisfied:
<pre>
  a. "text" matches "pattern" exactly;

  b. The number of matched sub-patterns is at least the number of
     supplied pointers;

  c. The "i"th argument has a suitable type for holding the
     string captured as the "i"th sub-pattern. If you pass in
     NULL for the "i"th argument, or pass fewer arguments than
     the number of sub-patterns, the "i"th captured sub-pattern
     is ignored.
</pre>
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature of
<b>DoMatch</b>.
</P>
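<P>
The conditions above can be sketched with the standard library, again as an analogy rather than pcrecpp's actual implementation. The hypothetical <b>full_match_groups</b> checks conditions a and b by requiring a whole-text match and enough capture groups (std::regex reports the group count through <b>mark_count()</b>), and the hypothetical <b>parse_int</b> shows one way condition c could be enforced for an int target, rejecting text that is not a complete integer or does not fit in an int.
</P>

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <regex>
#include <string>
#include <vector>

// Sketch of conditions (a) and (b), with std::regex standing in for PCRE.
// Copies the first out->size() captured groups into *out.
bool full_match_groups(const std::string& pattern, const std::string& text,
                       std::vector<std::string>* out) {
  const std::regex re(pattern);
  // (b): the pattern must have at least as many groups as requested outputs.
  if (re.mark_count() < out->size()) return false;
  std::smatch m;
  // (a): the whole text must match, not just a substring of it.
  if (!std::regex_match(text, m, re)) return false;
  for (std::size_t k = 0; k < out->size(); ++k) (*out)[k] = m[k + 1].str();
  return true;
}

// Sketch of condition (c) for an int target: the captured text must parse
// completely as a base-10 number and the value must fit in an int.
bool parse_int(const std::string& digits, int* value) {
  if (digits.empty()) return false;
  errno = 0;
  char* end = nullptr;
  const long v = std::strtol(digits.c_str(), &end, 10);
  if (errno == ERANGE || *end != '\0') return false;  // overflow or non-digits
  if (v < INT_MIN || v > INT_MAX) return false;       // fits in long, not int
  *value = static_cast<int>(v);
  return true;
}
```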
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHES</a><br>
<P>
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
<pre>
  Example: simple search for a string:
     pcrecpp::RE("ell").PartialMatch("hello");

  Example: find first number in a string:
     int number;
     pcrecpp::RE re("(\\d+)");
     re.PartialMatch("x*100 + 20", &number);
     assert(number == 100);
</PRE>
</P>
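<P>
For comparison, the "find first number" example can be approximated with <b>std::regex_search</b>, which, like "PartialMatch", accepts a match anywhere in the text and reports the leftmost one. This is a standard-library sketch, not pcrecpp code; <b>first_number</b> is a hypothetical name, and it converts with <b>std::stoi</b>, which throws on overflow instead of returning false.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Stand-in for PartialMatch with one int argument: std::regex_search finds the
// leftmost match anywhere in the text. Returns false if nothing matches.
// Overflow handling is omitted: std::stoi throws std::out_of_range instead.
bool first_number(const std::string& text, int* number) {
  static const std::regex re("(\\d+)");
  std::smatch m;
  if (!std::regex_search(text, m, re)) return false;
  *number = std::stoi(m[1].str());
  return true;
}
```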
<br><a name="SEC5" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
<P>
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and text to be treated
as UTF-8 text, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF-8 text. For example, "." normally matches a single byte, but with UTF8 set
it matches a whole character, which may span several bytes.
<pre>
  Example:
     pcrecpp::RE_Options options;
     options.set_utf8(true);
     pcrecpp::RE re(utf8_pattern, options);
     re.FullMatch(utf8_string);

  Example: using the convenience function UTF8():
     pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
     re.FullMatch(utf8_string);
</pre>
NOTE: The UTF8 flag is ignored if pcre was not configured with the
<b>--enable-utf8</b> flag.
</P>
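<P>
To see why the flag matters, note that UTF-8 encodes each character in one to four bytes, and the first byte of a sequence determines its length. The standalone sketch below (hypothetical helper names, independent of pcrecpp, and not a full UTF-8 validator) decodes those lengths: "café" occupies five bytes but contains four characters, so in byte mode "." could match just one of the two bytes of "é".
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence that starts with `lead`, or 0 for
// bytes that never start a sequence (continuation bytes 10xxxxxx, 0xF8-0xFF).
// This only inspects the lead byte; it does not validate the whole sequence.
std::size_t utf8_sequence_length(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;
}

// Counts characters by stepping over whole sequences. A stray byte is counted
// as one character and skipped, so the loop always advances.
std::size_t utf8_char_count(const std::string& s) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < s.size(); ++count) {
    const std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
    i += (len == 0 ? 1 : len);
  }
  return count;
}
```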
<br><a name="SEC6" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
<P>
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
<pre>
  Example: read lines of the form "var = value" from a string.
     string contents = ...;                 // Fill string somehow
     pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

     string var;
     int value;
     pcrecpp::RE re("(\\w+) = (\\d+)\n");
     while (re.Consume(&input, &var, &value)) {
       ...;
     }
</pre>
Each successful call to "Consume" will set "var" and "value", and also
advance "input" so it points past the matched text.
</P>
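<P>
The same front-anchored loop can be sketched with the standard library alone: the <b>match_continuous</b> flag makes <b>std::regex_search</b> match only at the current start position, and a byte offset stands in for the StringPiece. The names <b>consume_front</b> and <b>parse_assignments</b> are hypothetical, and this is not how pcrecpp implements "Consume".
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for Consume: match `re` only at offset *pos (match_continuous
// anchors the match there) and, on success, advance *pos past the match.
bool consume_front(const std::string& input, std::size_t* pos,
                   const std::regex& re, std::smatch* m) {
  const auto start = input.begin() + static_cast<std::string::difference_type>(*pos);
  if (!std::regex_search(start, input.end(), *m, re,
                         std::regex_constants::match_continuous)) {
    return false;
  }
  *pos += static_cast<std::size_t>(m->length(0));
  return true;
}

// Reads "var = value" lines from the front of `contents`, stopping at the
// first line that does not match.
std::vector<std::pair<std::string, int>> parse_assignments(const std::string& contents) {
  static const std::regex re("(\\w+) = (\\d+)\n");
  std::vector<std::pair<std::string, int>> result;
  std::size_t pos = 0;
  std::smatch m;
  while (consume_front(contents, &pos, re, &m)) {
    result.emplace_back(m[1].str(), std::stoi(m[2].str()));
  }
  return result;
}
```

In this sketch a pattern that can match the empty string would never advance the offset, so the loop would not terminate; the "var = value" pattern always consumes at least one line.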
<P>
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
<pre>
  pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
</PRE>
</P>
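<P>
A matching sketch for "FindAndConsume", again using std::regex as a stand-in with hypothetical helper names: without <b>match_continuous</b> the match may start anywhere after the current position, and the position then advances to the end of the match, so any skipped text is discarded along with the match itself.
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Stand-in for FindAndConsume: search anywhere from offset *pos onward
// (no anchoring) and advance *pos to just past the end of the match.
bool find_and_consume(const std::string& input, std::size_t* pos,
                      const std::regex& re, std::smatch* m) {
  const auto start = input.begin() + static_cast<std::string::difference_type>(*pos);
  if (!std::regex_search(start, input.end(), *m, re)) return false;
  // position(0) is measured from `start`, so add it before the match length.
  *pos += static_cast<std::size_t>(m->position(0) + m->length(0));
  return true;
}

// Collects every word, like repeatedly calling FindAndConsume with "(\\w+)".
std::vector<std::string> all_words(const std::string& text) {
  static const std::regex re("(\\w+)");
  std::vector<std::string> words;
  std::size_t pos = 0;
  std::smatch m;
  while (find_and_consume(text, &pos, re, &m)) words.push_back(m[1].str());
  return words;
}
```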
<br><a name="SEC7" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
<P>
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
<pre>
  Example:
    int a, b, c, d;
    pcrecpp::RE re("(.*) (.*) (.*) (.*)");
    re.FullMatch("100 40 0100 0x40",
                 pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                 pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
</pre>
will leave 64 in a, b, c, and d.
</P>
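<P>
The C library's <b>strtol</b> offers the same choice of bases, and base 0 applies the C prefix rules that "CRadix" is described as using. The sketch below (hypothetical <b>parse_in_base</b>, not part of pcrecpp) reproduces the four conversions from the example and also rejects text with trailing characters or values that overflow a long.
</P>

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Parses all of `text` as an integer in `base`. Base 0 follows the C rules:
// a "0x"/"0X" prefix means hexadecimal, a leading "0" means octal, and
// anything else is decimal.
bool parse_in_base(const std::string& text, int base, long* value) {
  if (text.empty()) return false;
  errno = 0;
  char* end = nullptr;
  const long v = std::strtol(text.c_str(), &end, base);
  if (errno == ERANGE || *end != '\0') return false;  // overflow or trailing junk
  *value = v;
  return true;
}
```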
<br><a name="SEC8" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
<P>
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\1 to \9) can be
used to insert the text matching the corresponding parenthesized group
from the pattern. \0 in "rewrite" refers to the entire matching
text. For example:
<pre>
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").Replace("d", &s);
</pre>
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
</P>
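<P>
The standard library offers a comparable operation in <b>std::regex_replace</b> with the <b>format_first_only</b> flag. In the sketch below (hypothetical <b>replace_first</b>, not pcrecpp code), note that std::regex writes group references as "$1" and the whole match as "$&amp;", whereas pcrecpp's rewrite strings use "\1" and "\0".
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Stand-in for Replace: rewrite only the first match of `pattern` in *str.
// The rewrite uses std::regex syntax ("$1", "$&"), not pcrecpp's "\1", "\0".
// Returns true if a match was found and replaced; *str is unchanged otherwise.
bool replace_first(const std::string& pattern, const std::string& rewrite,
                   std::string* str) {
  const std::regex re(pattern);
  if (!std::regex_search(*str, re)) return false;
  *str = std::regex_replace(*str, re, rewrite,
                            std::regex_constants::format_first_only);
  return true;
}
```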
<P>
<b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
<pre>
  string s = "yabba dabba doo";
  pcrecpp::RE("b+").GlobalReplace("d", &s);
</pre>
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
</P>
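<P>
A standard-library sketch of the same behaviour, with the hypothetical name <b>replace_all</b>: <b>std::regex_replace</b> without flags replaces every non-overlapping match, working from the original string so inserted text is never re-matched, and counting the matches with <b>std::sregex_iterator</b> gives the return value.
</P>

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Stand-in for GlobalReplace: replace every non-overlapping match in *str and
// return how many replacements were made. regex_replace scans the original
// text, so rewritten text is never matched again.
int replace_all(const std::string& pattern, const std::string& rewrite,
                std::string* str) {
  const std::regex re(pattern);
  const auto count = std::distance(
      std::sregex_iterator(str->cbegin(), str->cend(), re),
      std::sregex_iterator());
  *str = std::regex_replace(*str, re, rewrite);
  return static_cast<int>(count);
}
```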
<P>
<b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. Returns true iff a match
occurred and the extraction happened successfully; if no match occurs, "out"
is left unchanged.
</P>
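<P>
As a sketch of these semantics using std::regex instead of pcrecpp, the hypothetical <b>extract</b> below searches for the pattern, expands the rewrite with <b>std::match_results::format</b> (which uses "$1" rather than "\1"), and writes only that expansion to "out", leaving "out" untouched when there is no match.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Stand-in for Extract: if `pattern` matches somewhere in `text`, store the
// expanded rewrite in *out and discard the rest of the text. The rewrite uses
// std::regex format syntax ("$1"), not pcrecpp's "\1".
// On no match, returns false and leaves *out unchanged.
bool extract(const std::string& pattern, const std::string& rewrite,
             const std::string& text, std::string* out) {
  std::smatch m;
  if (!std::regex_search(text, m, std::regex(pattern))) return false;
  *out = m.format(rewrite);
  return true;
}
```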
<br><a name="SEC9" href="#TOC1">AUTHOR</a><br>
<P>
The C++ wrapper was contributed by Google Inc.
<br>
Copyright © 2005 Google Inc.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>