<!-- Revision 227, Tue Aug 21 15:00:15 2007 UTC, by ph10: Add (*CR) etc. -->
<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
<li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
<li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
<li><a name="TOC11" href="#SEC11">ALTERNATION</a>
<li><a name="TOC12" href="#SEC12">CAPTURING</a>
<li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
<li><a name="TOC14" href="#SEC14">COMMENT</a>
<li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
<li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
<li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
<li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>
<li><a name="TOC21" href="#SEC21">NEWLINE CONVENTIONS</a>
<li><a name="TOC22" href="#SEC22">CALLOUTS</a>
<li><a name="TOC23" href="#SEC23">SEE ALSO</a>
<li><a name="TOC24" href="#SEC24">AUTHOR</a>
<li><a name="TOC25" href="#SEC25">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains just a quick-reference summary of the
syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</pre>
</P>
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any character
  \e         escape (hex 1B)
  \f         formfeed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \ddd       character with octal code ddd, or backreference
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</pre>
</P>
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one byte, even in UTF-8 mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal whitespace character
  \H         a character that is not a horizontal whitespace character
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a whitespace character
  \S         a character that is not a whitespace character
  \v         a vertical whitespace character
  \V         a character that is not a vertical whitespace character
  \w         a "word" character
  \W         a "non-word" character
  \X         an extended Unicode sequence
</pre>
In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
</P>
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&amp;         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</pre>
</P>
<br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic,
Armenian,
Balinese,
Bengali,
Bopomofo,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Inherited,
Kannada,
Katakana,
Kharoshthi,
Khmer,
Lao,
Latin,
Limbu,
Linear_B,
Malayalam,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Runic,
Shavian,
Sinhala,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Yi.
</P>
<br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       whitespace
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters. You can use
\Q...\E inside a character class.
</P>
<br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</pre>
</P>
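<P>
The difference between the greedy and lazy forms listed above can be seen in a
minimal sketch. It assumes Python's standard <b>re</b> module as a stand-in for
PCRE (an assumption made for illustration; PCRE itself is a C library), and
relies only on quantifier forms that the two engines share. The possessive
forms are left out because their availability depends on the engine version.
The subject strings and variable names are invented for the example.
</P>

```python
import re

subject = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as much as it can, then gives back only what it must.
greedy = re.search(r"<.+>", subject).group(0)

# Lazy: .+? takes as little as it can, extending only when forced to.
lazy = re.search(r"<.+?>", subject).group(0)

# Counted repetition: {2,3} is greedy, {2,3}? is lazy.
counted_greedy = re.search(r"a{2,3}", "aaaa").group(0)
counted_lazy = re.search(r"a{2,3}?", "aaaa").group(0)

print(greedy)          # <b>bold</b> and <i>italic</i>
print(lazy)            # <b>
print(counted_greedy)  # aaa
print(counted_lazy)    # aa
```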
<br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</pre>
</P>
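<P>
The effect of multiline mode on ^ and the behaviour of $ before a final newline
can be sketched as follows. The example assumes Python's standard <b>re</b>
module as a stand-in for PCRE, with LF as the newline; \Z and \z are not shown
because Python defines \Z differently from the table above. The subject strings
and variable names are invented for the example.
</P>

```python
import re

subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
starts_default = re.findall(r"^\w+", subject)

# With (?m), ^ also matches after each internal newline.
starts_multiline = re.findall(r"(?m)^\w+", subject)

# $ matches before a newline that ends the subject, even without (?m).
end_before_final_newline = re.search(r"two$", subject) is not None

# \b is a zero-width word boundary, so the "cat" inside "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat!")

print(starts_default)            # ['one']
print(starts_multiline)          # ['one', 'two']
print(end_before_final_newline)  # True
print(whole_words)               # ['cat', 'cat']
```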
<br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</pre>
</P>
<br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</pre>
</P>
<br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)          capturing group
  (?&#60;name&#62;...)   named capturing group (Perl)
  (?'name'...)   named capturing group (Perl)
  (?P&#60;name&#62;...)  named capturing group (Python)
  (?:...)        non-capturing group
  (?|...)        non-capturing group; reset group numbers for
                  capturing groups in each alternative
</pre>
</P>
<br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&#62;...)         atomic, non-capturing group
</pre>
</P>
<br><a name="SEC14" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)       comment (not nestable)
</pre>
</P>
<br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)           caseless
  (?J)           allow duplicate names
  (?m)           multiline
  (?s)           single line (dotall)
  (?U)           default ungreedy (lazy)
  (?x)           extended (ignore white space)
  (?-...)        unset option(s)
</pre>
</P>
<br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)        positive lookahead
  (?!...)        negative lookahead
  (?&#60;=...)       positive lookbehind
  (?&#60;!...)       negative lookbehind
</pre>
Each top-level branch of a lookbehind must be of a fixed length.
</P>
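<P>
The fixed-length rule for lookbehind can be illustrated with a minimal sketch.
It assumes Python's standard <b>re</b> module as a stand-in for PCRE; both
engines reject a lookbehind that contains an unbounded repeat, although the
exact error reporting differs. The subject strings and variable names are
invented for the example.
</P>

```python
import re

prices = "cost: $42, code: 42"

# Positive lookbehind: digits only when preceded by a literal "$".
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookahead: "foo" only when it is not followed by "bar".
not_before_bar = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# A lookbehind whose length is not fixed is rejected when the pattern compiles.
try:
    re.compile(r"(?<=\$+)\d+")
    variable_length_rejected = False
except re.error:
    variable_length_rejected = True

print(after_dollar)              # ['42']
print(not_before_bar)            # [7]
print(variable_length_rejected)  # True
```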
<br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n             reference by number (can be ambiguous)
  \gn            reference by number
  \g{n}          reference by number
  \g{-n}         relative reference by number
  \k&#60;name&#62;       reference by name (Perl)
  \k'name'       reference by name (Perl)
  \g{name}       reference by name (Perl)
  \k{name}       reference by name (.NET)
  (?P=name)      reference by name (Python)
</pre>
</P>
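<P>
Two of the backreference forms above, reference by number and the Python-style
reference by name, can be sketched as follows. The example assumes Python's
standard <b>re</b> module as a stand-in for PCRE, so only the forms that both
engines accept are shown. The subject strings and variable names are invented
for the example.
</P>

```python
import re

# \1 refers back to whatever text group 1 actually matched.
doubled = re.search(r"\b(\w+)\s+\1\b", "it was was fine")

# Python-style names: (?P<name>...) captures, (?P=name) refers back to it.
quoted = re.search(r"(?P<quote>['\"]).*?(?P=quote)", "say 'hi' or \"bye\"")

print(doubled.group(1))  # was
print(quoted.group(0))   # 'hi'
```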
<br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)           recurse whole pattern
  (?n)           call subpattern by absolute number
  (?+n)          call subpattern by relative number
  (?-n)          call subpattern by relative number
  (?&amp;name)       call subpattern by name (Perl)
  (?P&#62;name)      call subpattern by name (Python)
</pre>
</P>
<br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...          absolute reference condition
  (?(+n)...         relative reference condition
  (?(-n)...         relative reference condition
  (?(&#60;name&#62;)...     named reference condition (Perl)
  (?('name')...     named reference condition (Perl)
  (?(name)...       named reference condition (PCRE)
  (?(R)...          overall recursion condition
  (?(Rn)...         specific group recursion condition
  (?(R&amp;name)...     specific recursion condition
  (?(DEFINE)...     define subpattern for reference
  (?(assert)...     assertion condition
</pre>
</P>
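<P>
The absolute reference condition can be sketched with a pattern that requires
a closing parenthesis only when an opening one was matched. The example
assumes Python's standard <b>re</b> module as a stand-in for PCRE; the
recursion, DEFINE and assertion conditions listed above are not covered here.
The pattern and the test strings are invented for the example.
</P>

```python
import re

# Group 1 optionally matches "("; the condition (?(1)\)) then demands ")"
# only if group 1 took part in the match.
balanced = re.compile(r"^(\()?\d+(?(1)\))$")

candidates = ["123", "(123)", "(123", "123)"]
results = {s: balanced.match(s) is not None for s in candidates}

for text, ok in results.items():
    print(text, ok)
```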
<br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act as soon as they are reached:
<pre>
  (*ACCEPT)      force successful match
  (*FAIL)        force backtrack; synonym (*F)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)      overall failure, no advance of starting point
  (*PRUNE)       advance to next starting character
  (*SKIP)        advance start to current matching position
  (*THEN)        local failure, backtrack to next alternation
</pre>
</P>
<br><a name="SEC21" href="#TOC1">NEWLINE CONVENTIONS</a><br>
<P>
These are recognized only at the very start of a pattern.
<pre>
  (*CR)
  (*LF)
  (*CRLF)
  (*ANYCRLF)
  (*ANY)
</pre>
</P>
<br><a name="SEC22" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)      callout
  (?Cn)     callout with data n
</pre>
</P>
<br><a name="SEC23" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC24" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC25" href="#TOC1">REVISION</a><br>
<P>
Last updated: 21 August 2007
<br>
Copyright &copy; 1997-2007 University of Cambridge.
<br>
</P>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
</body>
</html>
