Contents of /code/trunk/doc/html/pcresyntax.html

Revision 507
Wed Mar 10 16:08:01 2010 UTC by ph10
File MIME type: text/html
File size: 13981 byte(s)
Tidies for 8.02-RC1 release.
1 <html>
2 <head>
3 <title>pcresyntax specification</title>
4 </head>
5 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6 <h1>pcresyntax man page</h1>
7 <p>
8 Return to the <a href="index.html">PCRE index page</a>.
9 </p>
10 <p>
11 This page is part of the PCRE HTML documentation. It was generated automatically
12 from the original man page. If there is any nonsense in it, please consult the
13 man page, in case the conversion went wrong.
14 <br>
15 <ul>
16 <li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
17 <li><a name="TOC2" href="#SEC2">QUOTING</a>
18 <li><a name="TOC3" href="#SEC3">CHARACTERS</a>
19 <li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
20 <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
21 <li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
22 <li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
23 <li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
24 <li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
25 <li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
26 <li><a name="TOC11" href="#SEC11">ALTERNATION</a>
27 <li><a name="TOC12" href="#SEC12">CAPTURING</a>
28 <li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
29 <li><a name="TOC14" href="#SEC14">COMMENT</a>
30 <li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
31 <li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
32 <li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
33 <li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
34 <li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
35 <li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>
36 <li><a name="TOC21" href="#SEC21">NEWLINE CONVENTIONS</a>
37 <li><a name="TOC22" href="#SEC22">WHAT \R MATCHES</a>
38 <li><a name="TOC23" href="#SEC23">CALLOUTS</a>
39 <li><a name="TOC24" href="#SEC24">SEE ALSO</a>
40 <li><a name="TOC25" href="#SEC25">AUTHOR</a>
41 <li><a name="TOC26" href="#SEC26">REVISION</a>
42 </ul>
43 <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
44 <P>
45 The full syntax and semantics of the regular expressions that are supported by
46 PCRE are described in the
47 <a href="pcrepattern.html"><b>pcrepattern</b></a>
48 documentation. This document contains just a quick-reference summary of the
49 syntax.
50 </P>
51 <br><a name="SEC2" href="#TOC1">QUOTING</a><br>
52 <P>
53 <pre>
54 \x where x is non-alphanumeric is a literal x
55 \Q...\E treat enclosed characters as literal
56 </PRE>
57 </P>
58 <br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
59 <P>
60 <pre>
61 \a alarm, that is, the BEL character (hex 07)
62 \cx "control-x", where x is any character
63 \e escape (hex 1B)
64 \f formfeed (hex 0C)
65 \n newline (hex 0A)
66 \r carriage return (hex 0D)
67 \t tab (hex 09)
68 \ddd character with octal code ddd, or backreference
69 \xhh character with hex code hh
70 \x{hhh..} character with hex code hhh..
71 </PRE>
72 </P>
73 <br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
74 <P>
75 <pre>
76 . any character except newline;
77 in dotall mode, any character whatsoever
78 \C one byte, even in UTF-8 mode (best avoided)
79 \d a decimal digit
80 \D a character that is not a decimal digit
81 \h a horizontal whitespace character
82 \H a character that is not a horizontal whitespace character
83 \p{<i>xx</i>} a character with the <i>xx</i> property
84 \P{<i>xx</i>} a character without the <i>xx</i> property
85 \R a newline sequence
86 \s a whitespace character
87 \S a character that is not a whitespace character
88 \v a vertical whitespace character
89 \V a character that is not a vertical whitespace character
90 \w a "word" character
91 \W a "non-word" character
92 \X an extended Unicode sequence
93 </pre>
94 In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
95 </P>
96 <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
97 <P>
98 <pre>
99 C Other
100 Cc Control
101 Cf Format
102 Cn Unassigned
103 Co Private use
104 Cs Surrogate
105
106 L Letter
107 Ll Lower case letter
108 Lm Modifier letter
109 Lo Other letter
110 Lt Title case letter
111 Lu Upper case letter
112 L& Ll, Lu, or Lt
113
114 M Mark
115 Mc Spacing mark
116 Me Enclosing mark
117 Mn Non-spacing mark
118
119 N Number
120 Nd Decimal number
121 Nl Letter number
122 No Other number
123
124 P Punctuation
125 Pc Connector punctuation
126 Pd Dash punctuation
127 Pe Close punctuation
128 Pf Final punctuation
129 Pi Initial punctuation
130 Po Other punctuation
131 Ps Open punctuation
132
133 S Symbol
134 Sc Currency symbol
135 Sk Modifier symbol
136 Sm Mathematical symbol
137 So Other symbol
138
139 Z Separator
140 Zl Line separator
141 Zp Paragraph separator
142 Zs Space separator
143 </PRE>
144 </P>
145 <br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
146 <P>
147 Arabic,
148 Armenian,
149 Avestan,
150 Balinese,
151 Bamum,
152 Bengali,
153 Bopomofo,
154 Braille,
155 Buginese,
156 Buhid,
157 Canadian_Aboriginal,
158 Carian,
159 Cham,
160 Cherokee,
161 Common,
162 Coptic,
163 Cuneiform,
164 Cypriot,
165 Cyrillic,
166 Deseret,
167 Devanagari,
168 Egyptian_Hieroglyphs,
169 Ethiopic,
170 Georgian,
171 Glagolitic,
172 Gothic,
173 Greek,
174 Gujarati,
175 Gurmukhi,
176 Han,
177 Hangul,
178 Hanunoo,
179 Hebrew,
180 Hiragana,
181 Imperial_Aramaic,
182 Inherited,
183 Inscriptional_Pahlavi,
184 Inscriptional_Parthian,
185 Javanese,
186 Kaithi,
187 Kannada,
188 Katakana,
189 Kayah_Li,
190 Kharoshthi,
191 Khmer,
192 Lao,
193 Latin,
194 Lepcha,
195 Limbu,
196 Linear_B,
197 Lisu,
198 Lycian,
199 Lydian,
200 Malayalam,
201 Meetei_Mayek,
202 Mongolian,
203 Myanmar,
204 New_Tai_Lue,
205 Nko,
206 Ogham,
207 Old_Italic,
208 Old_Persian,
209 Old_South_Arabian,
210 Old_Turkic,
211 Ol_Chiki,
212 Oriya,
213 Osmanya,
214 Phags_Pa,
215 Phoenician,
216 Rejang,
217 Runic,
218 Samaritan,
219 Saurashtra,
220 Shavian,
221 Sinhala,
222 Sundanese,
223 Syloti_Nagri,
224 Syriac,
225 Tagalog,
226 Tagbanwa,
227 Tai_Le,
228 Tai_Tham,
229 Tai_Viet,
230 Tamil,
231 Telugu,
232 Thaana,
233 Thai,
234 Tibetan,
235 Tifinagh,
236 Ugaritic,
237 Vai,
238 Yi.
239 </P>
240 <br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
241 <P>
242 <pre>
243 [...] positive character class
244 [^...] negative character class
245 [x-y] range (can be used for hex characters)
246 [[:xxx:]] positive POSIX named set
247 [[:^xxx:]] negative POSIX named set
248
249 alnum alphanumeric
250 alpha alphabetic
251 ascii 0-127
252 blank space or tab
253 cntrl control character
254 digit decimal digit
255 graph printing, excluding space
256 lower lower case letter
257 print printing, including space
258 punct printing, excluding alphanumeric
259 space whitespace
260 upper upper case letter
261 word same as \w
262 xdigit hexadecimal digit
263 </pre>
264 In PCRE, POSIX character set names recognize only ASCII characters. You can use
265 \Q...\E inside a character class.
266 </P>
267 <br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
268 <P>
269 <pre>
270 ? 0 or 1, greedy
271 ?+ 0 or 1, possessive
272 ?? 0 or 1, lazy
273 * 0 or more, greedy
274 *+ 0 or more, possessive
275 *? 0 or more, lazy
276 + 1 or more, greedy
277 ++ 1 or more, possessive
278 +? 1 or more, lazy
279 {n} exactly n
280 {n,m} at least n, no more than m, greedy
281 {n,m}+ at least n, no more than m, possessive
282 {n,m}? at least n, no more than m, lazy
283 {n,} n or more, greedy
284 {n,}+ n or more, possessive
285 {n,}? n or more, lazy
286 </PRE>
287 </P>
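<P>
The difference between the greedy and lazy forms in the table above can be
sketched with Python's standard <b>re</b> module. This is an assumption made for
illustration: <b>re</b> is not PCRE, but it accepts the same syntax for the
quantifiers used here. The possessive forms are left out because support for
them in <b>re</b> depends on the Python version. The subject strings and
variable names are hypothetical.
</P>

```python
import re

subject = "aXbXb"

# Greedy: .* takes as much as it can, then gives back only what is needed.
greedy = re.search(r"a.*b", subject).group()

# Lazy: .*? takes as little as it can, extending only when forced to.
lazy = re.search(r"a.*?b", subject).group()

# Bounded repetition: {2,3} prefers three digits, {2,3}? prefers two.
bounded_greedy = re.match(r"\d{2,3}", "12345").group()
bounded_lazy = re.match(r"\d{2,3}?", "12345").group()

print(greedy, lazy, bounded_greedy, bounded_lazy)
```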
288 <br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
289 <P>
290 <pre>
291 \b word boundary (only ASCII letters recognized)
292 \B not a word boundary
293 ^ start of subject
294 also after internal newline in multiline mode
295 \A start of subject
296 $ end of subject
297 also before newline at end of subject
298 also before internal newline in multiline mode
299 \Z end of subject
300 also before newline at end of subject
301 \z end of subject
302 \G first matching position in subject
303 </PRE>
304 </P>
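<P>
A minimal sketch of how ^, $ and \A behave, again using Python's <b>re</b>
module as a stand-in for PCRE (an assumption; the two engines agree on the
cases shown). \Z is deliberately not shown, because Python gives it the meaning
that PCRE gives to \z. The subject string is hypothetical.
</P>

```python
import re

subject = "one\ntwo\n"

# Default mode: ^ matches only at the start of the subject.
starts_default = re.findall(r"^\w+", subject)

# Multiline mode: ^ also matches after each internal newline.
starts_multiline = re.findall(r"(?m)^\w+", subject)

# \A stays tied to the start of the subject even in multiline mode.
starts_anchored = re.findall(r"(?m)\A\w+", subject)

# $ matches at the very end, and also before a newline at the end.
end_before_newline = re.search(r"two$", subject) is not None

print(starts_default, starts_multiline, starts_anchored, end_before_newline)
```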
305 <br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
306 <P>
307 <pre>
308 \K reset start of match
309 </PRE>
310 </P>
311 <br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
312 <P>
313 <pre>
314 expr|expr|expr...
315 </PRE>
316 </P>
317 <br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
318 <P>
319 <pre>
320 (...) capturing group
321 (?&#60;name&#62;...) named capturing group (Perl)
322 (?'name'...) named capturing group (Perl)
323 (?P&#60;name&#62;...) named capturing group (Python)
324 (?:...) non-capturing group
325 (?|...) non-capturing group; reset group numbers for
326 capturing groups in each alternative
327 </PRE>
328 </P>
329 <br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
330 <P>
331 <pre>
332 (?&#62;...) atomic, non-capturing group
333 </PRE>
334 </P>
335 <br><a name="SEC14" href="#TOC1">COMMENT</a><br>
336 <P>
337 <pre>
338 (?#....) comment (not nestable)
339 </PRE>
340 </P>
341 <br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
342 <P>
343 <pre>
344 (?i) caseless
345 (?J) allow duplicate names
346 (?m) multiline
347 (?s) single line (dotall)
348 (?U) default ungreedy (lazy)
349 (?x) extended (ignore white space)
350 (?-...) unset option(s)
351 </pre>
352 The following is recognized only at the start of a pattern or after one of the
353 newline-setting options with similar syntax:
354 <pre>
355 (*UTF8) set UTF-8 mode
356 </PRE>
357 </P>
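<P>
The inline option letters in this section can be tried out as follows. The
sketch uses Python's <b>re</b> module rather than PCRE (an assumption for
illustration). Only (?i), (?s) and (?x) are shown, each at the very start of the
pattern, since that is the placement both engines accept. The patterns and
variable names are hypothetical.
</P>

```python
import re

# (?i) makes the whole pattern caseless.
caseless = re.fullmatch(r"(?i)pcre", "PcRe") is not None

# Without (?s) the dot does not match a newline; with it, the dot matches anything.
dot_default = re.fullmatch(r"a.b", "a\nb") is not None
dot_all = re.fullmatch(r"(?s)a.b", "a\nb") is not None

# (?x) ignores white space inside the pattern.
extended = re.fullmatch(r"(?x) \d+ - \d+ ", "10-20") is not None

print(caseless, dot_default, dot_all, extended)
```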
358 <br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
359 <P>
360 <pre>
361 (?=...) positive look ahead
362 (?!...) negative look ahead
363 (?&#60;=...) positive look behind
364 (?&#60;!...) negative look behind
365 </pre>
366 Each top-level branch of a look behind must be of a fixed length.
367 </P>
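<P>
The assertions above test the text around the current position without
consuming it. The sketch below uses Python's <b>re</b> module as a stand-in for
PCRE (an assumption). Note that Python is stricter than the rule stated above:
it requires the whole look behind to have a fixed length, not just each
top-level branch. The subject strings are hypothetical.
</P>

```python
import re

prices = "10 USD, 20 EUR, 30 USD"

# Positive look ahead: digits followed by " USD"; the suffix is not part of the match.
followed = re.findall(r"\d+(?= USD)", prices)

# Negative look ahead: digits not followed by another digit or by " USD".
not_followed = re.findall(r"\d+(?!\d| USD)", prices)

# Positive look behind: digits preceded by "#".
preceded = re.findall(r"(?<=#)\d+", "#7 and 8 and #9")

# A look behind of variable length is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=a+)b")
    variable_length_accepted = True
except re.error:
    variable_length_accepted = False

print(followed, not_followed, preceded, variable_length_accepted)
```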
368 <br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
369 <P>
370 <pre>
371 \n reference by number (can be ambiguous)
372 \gn reference by number
373 \g{n} reference by number
374 \g{-n} relative reference by number
375 \k&#60;name&#62; reference by name (Perl)
376 \k'name' reference by name (Perl)
377 \g{name} reference by name (Perl)
378 \k{name} reference by name (.NET)
379 (?P=name) reference by name (Python)
380 </PRE>
381 </P>
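<P>
A backreference matches the text that a group actually captured, not the
group's pattern. A small sketch of the numbered form, using Python's <b>re</b>
module in place of PCRE (an assumption; the \g and \k forms in the table are not
shown because <b>re</b> does not accept them in patterns). The subject strings
are hypothetical.
</P>

```python
import re

# \1 must repeat exactly what group 1 captured, so this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best").group(1)

# Group 1 captures "ab", so \1 matches a second "ab" but not "cd".
same_text = re.search(r"^(\w\w)-\1$", "ab-ab") is not None
other_text = re.search(r"^(\w\w)-\1$", "ab-cd") is not None

print(doubled, same_text, other_text)
```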
382 <br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
383 <P>
384 <pre>
385 (?R) recurse whole pattern
386 (?n) call subpattern by absolute number
387 (?+n) call subpattern by relative number
388 (?-n) call subpattern by relative number
389 (?&name) call subpattern by name (Perl)
390 (?P&#62;name) call subpattern by name (Python)
391 \g&#60;name&#62; call subpattern by name (Oniguruma)
392 \g'name' call subpattern by name (Oniguruma)
393 \g&#60;n&#62; call subpattern by absolute number (Oniguruma)
394 \g'n' call subpattern by absolute number (Oniguruma)
395 \g&#60;+n&#62; call subpattern by relative number (PCRE extension)
396 \g'+n' call subpattern by relative number (PCRE extension)
397 \g&#60;-n&#62; call subpattern by relative number (PCRE extension)
398 \g'-n' call subpattern by relative number (PCRE extension)
399 </PRE>
400 </P>
401 <br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
402 <P>
403 <pre>
404 (?(condition)yes-pattern)
405 (?(condition)yes-pattern|no-pattern)
406
407 (?(n)... absolute reference condition
408 (?(+n)... relative reference condition
409 (?(-n)... relative reference condition
410 (?(&#60;name&#62;)... named reference condition (Perl)
411 (?('name')... named reference condition (Perl)
412 (?(name)... named reference condition (PCRE)
413 (?(R)... overall recursion condition
414 (?(Rn)... specific group recursion condition
415 (?(R&name)... specific recursion condition
416 (?(DEFINE)... define subpattern for reference
417 (?(assert)... assertion condition
418 </PRE>
419 </P>
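<P>
The absolute reference condition can be sketched as follows, using Python's
<b>re</b> module as a stand-in for PCRE (an assumption; of the conditions listed
above, <b>re</b> supports only the numbered and named reference forms). The
first pattern accepts a number that is either bare or enclosed in a matching
pair of parentheses. The patterns and variable names are hypothetical.
</P>

```python
import re

# Group 1 optionally captures "("; the condition (?(1)...) then demands ")"
# only if group 1 took part in the match.
number = re.compile(r"(\()?\d+(?(1)\))")
results = {s: number.fullmatch(s) is not None for s in ["42", "(42)", "(42", "42)"]}

# Two-branch form: "b" is required after "a", otherwise "c" is required.
two_branch = re.compile(r"(a)?(?(1)b|c)")
branch_results = {s: two_branch.fullmatch(s) is not None for s in ["ab", "c", "ac", "b"]}

print(results)
print(branch_results)
```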
420 <br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>
421 <P>
422 The following act as soon as they are reached:
423 <pre>
424 (*ACCEPT) force successful match
425 (*FAIL) force backtrack; synonym (*F)
426 </pre>
427 The following act only when a subsequent match failure causes a backtrack to
428 reach them. They all force a match failure, but they differ in what happens
429 afterwards. Those that advance the start-of-match point do so only if the
430 pattern is not anchored.
431 <pre>
432 (*COMMIT) overall failure, no advance of starting point
433 (*PRUNE) advance to next starting character
434 (*SKIP) advance start to current matching position
435 (*THEN) local failure, backtrack to next alternation
436 </PRE>
437 </P>
438 <br><a name="SEC21" href="#TOC1">NEWLINE CONVENTIONS</a><br>
439 <P>
440 These are recognized only at the very start of the pattern or after a
441 (*BSR_...) or (*UTF8) option.
442 <pre>
443 (*CR) carriage return only
444 (*LF) linefeed only
445 (*CRLF) carriage return followed by linefeed
446 (*ANYCRLF) all three of the above
447 (*ANY) any Unicode newline sequence
448 </PRE>
449 </P>
450 <br><a name="SEC22" href="#TOC1">WHAT \R MATCHES</a><br>
451 <P>
452 These are recognized only at the very start of the pattern or after a
453 (*...) option that sets the newline convention or UTF-8 mode.
454 <pre>
455 (*BSR_ANYCRLF) CR, LF, or CRLF
456 (*BSR_UNICODE) any Unicode newline sequence
457 </PRE>
458 </P>
459 <br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
460 <P>
461 <pre>
462 (?C) callout
463 (?Cn) callout with data n
464 </PRE>
465 </P>
466 <br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
467 <P>
468 <b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
469 <b>pcrematching</b>(3), <b>pcre</b>(3).
470 </P>
471 <br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
472 <P>
473 Philip Hazel
474 <br>
475 University Computing Service
476 <br>
477 Cambridge CB2 3QH, England.
478 <br>
479 </P>
480 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
481 <P>
482 Last updated: 01 March 2010
483 <br>
484 Copyright &copy; 1997-2010 University of Cambridge.
485 <br>
486 <p>
487 Return to the <a href="index.html">PCRE index page</a>.
488 </p>
