Revision 211, Thu Aug 9 09:52:43 2007 UTC, by ph10
Update UTF-8 validity check and documentation.
<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
<li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
<li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
<li><a name="TOC11" href="#SEC11">ALTERNATION</a>
<li><a name="TOC12" href="#SEC12">CAPTURING</a>
<li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
<li><a name="TOC14" href="#SEC14">COMMENT</a>
<li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
<li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
<li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
<li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>
<li><a name="TOC21" href="#SEC21">CALLOUTS</a>
<li><a name="TOC22" href="#SEC22">SEE ALSO</a>
<li><a name="TOC23" href="#SEC23">AUTHOR</a>
<li><a name="TOC24" href="#SEC24">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains just a quick-reference summary of the
syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
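<P>
The two quoting forms can be tried out with a short sketch. It uses Python's
<b>re</b> module as a stand-in for PCRE, which is an assumption of this example
and not something this page prescribes. Python has no \Q...\E, so
<b>re.escape()</b> plays a similar role here.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE in this sketch.
literal = "a.b*c"

# \x form: a backslash before a non-alphanumeric character makes it literal.
escaped_by_hand = re.fullmatch(r"a\.b\*c", literal)

# Python lacks \Q...\E; re.escape() quotes a whole string in a similar spirit.
escaped_whole = re.fullmatch(re.escape(literal), literal)

# Unquoted, "." and "*" keep their special meaning, so this does not match.
unescaped = re.fullmatch(literal, literal)

print(bool(escaped_by_hand), bool(escaped_whole), bool(unescaped))
```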
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any character
  \e         escape (hex 1B)
  \f         formfeed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \ddd       character with octal code ddd, or backreference
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</PRE>
</P>
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one byte, even in UTF-8 mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal whitespace character
  \H         a character that is not a horizontal whitespace character
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a whitespace character
  \S         a character that is not a whitespace character
  \v         a vertical whitespace character
  \V         a character that is not a vertical whitespace character
  \w         a "word" character
  \W         a "non-word" character
  \X         an extended Unicode sequence
</pre>
In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
</P>
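<P>
A few of these types can be exercised as follows. The sketch assumes Python's
<b>re</b> module as a substitute for PCRE; its <b>re.ASCII</b> flag is used to
mimic the ASCII-only behaviour noted above. Types such as \h, \R, \p and \X are
not available in that module and are left out.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE; re.ASCII limits
# \d, \s and \w to ASCII characters.
text = "id 42, tab\there"

digits = re.findall(r"\d+", text, re.ASCII)   # runs of decimal digits
words = re.findall(r"\w+", text, re.ASCII)    # runs of "word" characters
chunks = re.findall(r"\S+", text, re.ASCII)   # runs of non-whitespace

# U+0663 is a non-ASCII decimal digit, so an ASCII-only \d skips it.
ascii_only = re.findall(r"\d", "\u0663", re.ASCII)

# "." skips newline unless dotall mode is on.
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", re.DOTALL)

print(digits, words, chunks, ascii_only, dot_default, dot_all)
```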
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&amp;         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic,
Armenian,
Balinese,
Bengali,
Bopomofo,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Inherited,
Kannada,
Katakana,
Kharoshthi,
Khmer,
Lao,
Latin,
Limbu,
Linear_B,
Malayalam,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Runic,
Shavian,
Sinhala,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Yi.
</P>
<br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       whitespace
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters. You can use
\Q...\E inside a character class.
</P>
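<P>
The basic class forms behave as sketched below. Python's <b>re</b> module is
assumed as a stand-in for PCRE; it does not recognise POSIX names such as
[[:xdigit:]], so the equivalent set is spelled out by hand.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE in this sketch.
vowels = re.findall(r"[aeiou]", "regular")      # positive class
non_lower = re.findall(r"[^a-z]", "a1-b2")      # negative class
# Hand-written equivalent of the POSIX xdigit set, built from ranges.
hex_runs = re.findall(r"[0-9A-Fa-f]+", "0x1F")

print(vowels, non_lower, hex_runs)
```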
<br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
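<P>
The difference between greedy and lazy quantifiers is easiest to see on a small
subject. This is a minimal sketch that assumes Python's <b>re</b> module in
place of PCRE; possessive forms are omitted because older Python versions do not
accept them.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE in this sketch.
subject = "<b>bold</b>"

# Greedy: .+ takes as much as it can, then backs off to the last ">".
greedy = re.search(r"<.+>", subject).group()
# Lazy: .+? takes as little as it can, stopping at the first ">".
lazy = re.search(r"<.+?>", subject).group()

# {n,m} prefers the upper bound when greedy and the lower bound when lazy.
bounded_greedy = re.findall(r"\d{2,3}", "1 22 333 4444")
bounded_lazy = re.findall(r"\d{2,3}?", "1 22 333 4444")

print(greedy, lazy, bounded_greedy, bounded_lazy)
```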
<br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
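<P>
The effect of multiline mode on ^ and $ can be sketched as below, assuming
Python's <b>re</b> module as a substitute for PCRE. The \Z, \z and \G anchors
are left out because that module treats them differently or does not provide
them.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE in this sketch.
subject = "first line\nsecond line\n"

start_default = re.findall(r"^\w+", subject)          # start of subject only
start_multiline = re.findall(r"^\w+", subject, re.M)  # also after each newline
end_default = re.findall(r"\w+$", subject)            # before the final newline
end_multiline = re.findall(r"\w+$", subject, re.M)    # before every newline
absolute_start = re.findall(r"\A\w+", subject, re.M)  # \A ignores multiline

# \b requires a word boundary on both sides, so "inline" is skipped.
whole_words = re.findall(r"\bline\b", "line inline line")

print(start_default, start_multiline, end_default, end_multiline)
```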
<br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</PRE>
</P>
<br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
<br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capturing group
  (?&#60;name&#62;...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P&#60;name&#62;...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
</PRE>
</P>
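<P>
The sketch below shows how numbered, named and non-capturing groups differ in
what they report. It assumes Python's <b>re</b> module in place of PCRE, so only
the Python-style (?P&#60;name&#62;...) spelling is used, and the branch-reset
group (?|...) is not covered.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE in this sketch.
m = re.match(r"(?P<year>\d{4})-(\d{2})-(?:\d{2})", "2007-08-09")

year = m.group("year")   # named group, also reachable as group 1
month = m.group(2)       # plain capturing group
groups = m.groups()      # the (?:...) group is matched but not captured

print(year, month, groups)
```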
<br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&#62;...)         atomic, non-capturing group
</PRE>
</P>
<br><a name="SEC14" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
</PRE>
</P>
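<P>
Four of these options can be tried inline as sketched here, with Python's
<b>re</b> module assumed as a stand-in for PCRE. That module has no (?J) or (?U)
and expects inline options at the very start of the pattern, so the examples
are written that way.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE in this sketch.
caseless = re.fullmatch(r"(?i)pcre", "PCRE")      # (?i): ignore case
dotall = re.fullmatch(r"(?s)a.b", "a\nb")         # (?s): "." matches newline
no_dotall = re.fullmatch(r"a.b", "a\nb")          # without it, no match
multiline = re.findall(r"(?m)^\w+", "a\nb")       # (?m): ^ after each newline
# (?x): white space and "#" comments in the pattern are ignored.
extended = re.fullmatch(r"(?x) \d{4} - \d{2}  # year-month", "2007-08")

print(bool(caseless), bool(dotall), bool(no_dotall), multiline, bool(extended))
```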
<br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?&#60;=...)        positive look behind
  (?&#60;!...)        negative look behind
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
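<P>
Each of the four assertions tests the text around the match without consuming
it. The following sketch assumes Python's <b>re</b> module as a substitute for
PCRE and uses only fixed-length look behinds, which both engines accept.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE in this sketch.
files = "index.html pcresyntax.html notes.txt"
# Positive look ahead: names that are followed by ".html".
html_names = re.findall(r"\w+(?=\.html)", files)

# Negative look ahead: numbers not followed by another digit or "%".
plain_numbers = re.findall(r"\d+(?!\d|%)", "10% 20 30%")

prices = "$15 and 20 and $7"
# Positive look behind: digits preceded by "$".
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)
# Negative look behind: whole numbers not preceded by "$".
bare_amounts = re.findall(r"(?<!\$)\b\d+", prices)

print(html_names, plain_numbers, dollar_amounts, bare_amounts)
```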
<br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g{-n}          relative reference by number
  \k&#60;name&#62;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
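<P>
A backreference matches the same text that an earlier group captured. The
sketch below assumes Python's <b>re</b> module in place of PCRE, so it covers
only the \n and (?P=name) spellings; the \g and \k forms are not available
there.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE in this sketch.
# \1 must repeat whatever group 1 captured, which finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best").group()

# (?P=q) must repeat the opening quote captured by the group named q.
quoted = re.compile(r"(?P<q>['\"]).*?(?P=q)")
balanced = quoted.fullmatch("'quoted'")
mismatched = quoted.fullmatch("'quoted\"")

print(doubled, bool(balanced), bool(mismatched))
```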
<br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&amp;name)        call subpattern by name (Perl)
  (?P&#62;name)       call subpattern by name (Python)
</PRE>
</P>
<br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(&#60;name&#62;)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&amp;name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
</PRE>
</P>
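<P>
An absolute reference condition picks a branch depending on whether a numbered
group took part in the match. This minimal sketch assumes Python's <b>re</b>
module as a stand-in for PCRE; it supports the (?(n)...) and (?(name)...) forms
but not the recursion, DEFINE or assertion conditions.
</P>

```python
import re

# Assumption: Python's re module stands in for PCRE in this sketch.
# Group 1 is an optional "("; the condition demands ")" only if it matched.
number = re.compile(r"(\()?\d+(?(1)\))")

wrapped = number.fullmatch("(42)")
bare = number.fullmatch("42")
unclosed = number.fullmatch("(42")
unopened = number.fullmatch("42)")

print(bool(wrapped), bool(bare), bool(unclosed), bool(unopened))
```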
<br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act as soon as they are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*SKIP)         advance start to current matching position
  (*THEN)         local failure, backtrack to next alternation
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)      callout
  (?Cn)     callout with data n
</PRE>
</P>
<br><a name="SEC22" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC23" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC24" href="#TOC1">REVISION</a><br>
<P>
Last updated: 08 August 2007
<br>
Copyright &copy; 1997-2007 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
