Contents of /code/trunk/doc/html/pcresyntax.html

Revision 208
Mon Aug 6 15:23:29 2007 UTC (13 years, 2 months ago) by ph10
File MIME type: text/html
File size: 11300 byte(s)
Added a pcresyntax man page; tidied some others.
<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
<li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
<li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
<li><a name="TOC11" href="#SEC11">ALTERNATION</a>
<li><a name="TOC12" href="#SEC12">CAPTURING</a>
<li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
<li><a name="TOC14" href="#SEC14">COMMENT</a>
<li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
<li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
<li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
<li><a name="TOC20" href="#SEC20">CALLOUTS</a>
<li><a name="TOC21" href="#SEC21">SEE ALSO</a>
<li><a name="TOC22" href="#SEC22">AUTHOR</a>
<li><a name="TOC23" href="#SEC23">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains just a quick-reference summary of the
syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any character
  \e         escape (hex 1B)
  \f         formfeed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \ddd       character with octal code ddd, or backreference
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</PRE>
</P>
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one byte, even in UTF-8 mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal whitespace character
  \H         a character that is not a horizontal whitespace character
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a whitespace character
  \S         a character that is not a whitespace character
  \v         a vertical whitespace character
  \V         a character that is not a vertical whitespace character
  \w         a "word" character
  \W         a "non-word" character
  \X         an extended Unicode sequence
</pre>
In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
</P>
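<P>
The ASCII-only note above can be illustrated with a small sketch. It uses
Python's <b>re</b> module purely as a stand-in, not PCRE itself; the sample
text and variable names are assumptions made for the illustration. Python's
digit escape is Unicode-aware by default for text patterns, and its ASCII flag
narrows it to the behaviour this summary describes for PCRE.
</P>

```python
import re

# Two ASCII digits, a space, then two Arabic-Indic digits (U+0664, U+0662).
text = "42 \u0664\u0662"

# Python default for str patterns: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", text)

# re.ASCII restricts \d, \s, \w (and \D, \S, \W) to ASCII characters,
# which mirrors the ASCII-only rule stated in the summary for PCRE.
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

print(unicode_digits)
print(ascii_digits)
```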
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid,
Canadian_Aboriginal, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic,
Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati,
Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada,
Katakana, Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam,
Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian, Oriya,
Osmanya, Phags_Pa, Phoenician, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac,
Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh,
Ugaritic, Yi.
</P>
<br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       whitespace
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters. You can use
\Q...\E inside a character class.
</P>
<br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
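<P>
The difference between greedy and lazy quantifiers is easiest to see on a
subject with more than one possible end point. The following is a minimal
sketch using Python's <b>re</b> module as a stand-in for PCRE; the subject
strings and variable names are illustrative assumptions. Possessive quantifiers
are left out because older Python versions do not accept them.
</P>

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as it can, then backtracks to the last ">".
greedy = re.match(r"<.+>", subject).group()

# Lazy: .+? takes as little as it can, stopping at the first ">".
lazy = re.match(r"<.+?>", subject).group()

# {n} is exact; {n,m} is a bounded range.
exact = re.fullmatch(r"\d{3}", "123")
too_long = re.fullmatch(r"\d{2,4}", "12345")

print(greedy, lazy, exact is not None, too_long is None)
```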
<br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
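<P>
The effect of multiline mode on the circumflex and dollar anchors can be
sketched as follows. Python's <b>re</b> module is used as a stand-in for PCRE,
and the subject strings are assumptions chosen for the illustration. The
end-of-subject escapes are deliberately avoided here, because Python's upper
case Z escape matches only at the very end and so does not behave like the one
listed above.
</P>

```python
import re

subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
start_default = re.findall(r"^\w+", subject)
# In multiline mode, ^ also matches after each internal newline.
start_multiline = re.findall(r"^\w+", subject, flags=re.MULTILINE)

# Without multiline mode, $ matches at the end, or before a final newline.
end_default = re.findall(r"\w+$", subject)
# In multiline mode, $ also matches before each internal newline.
end_multiline = re.findall(r"\w+$", subject, flags=re.MULTILINE)

# \A is unaffected by multiline mode.
start_absolute = re.findall(r"\A\w+", subject, flags=re.MULTILINE)

# \b matches only where a word starts or ends.
t_words = re.findall(r"\bt\w*", "two teas, at ten")

print(start_default, start_multiline, end_default, end_multiline)
print(start_absolute, t_words)
```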
<br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</PRE>
</P>
<br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
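<P>
Alternatives are tried from left to right, and the first one that lets the
overall match succeed is used, even when a later alternative would match more
text. A minimal sketch, using Python's <b>re</b> module as a stand-in for PCRE
with made-up subject strings:
</P>

```python
import re

# The first alternative "a" succeeds, so "ab" is never tried.
first_wins = re.match(r"a|ab", "ab").group()

# Reordering the alternatives changes what is matched.
longer_first = re.match(r"ab|a", "ab").group()

# Any one of several alternatives may match the whole subject.
is_pet = re.fullmatch(r"cat|dog|bird", "dog") is not None

print(first_wins, longer_first, is_pet)
```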
<br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)             capturing group
  (?&#60;name&#62;...)      named capturing group (Perl)
  (?'name'...)      named capturing group (Perl)
  (?P&#60;name&#62;...)     named capturing group (Python)
  (?:...)           non-capturing group
  (?|...)           non-capturing group; reset group numbers for
                    capturing groups in each alternative
</PRE>
</P>
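<P>
The following sketch shows how capturing, named, and non-capturing groups
differ in what they record. It uses Python's <b>re</b> module as a stand-in, so
only the Python-style named group and the plain forms appear; the date-like
subject and the group name are assumptions made for the illustration.
</P>

```python
import re

# One named group, one numbered group, and one non-capturing group.
pattern = re.compile(r"(?P<year>\d{4})-(\d{2})(?:-\d{2})?")

m = pattern.fullmatch("2007-08-06")

# The non-capturing group takes part in the match but records nothing,
# so only two groups are reported.
numbered = m.groups()
named = m.groupdict()

print(numbered)
print(named)
```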
<br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&#62;...)         atomic, non-capturing group
</PRE>
</P>
<br><a name="SEC14" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)       caseless
  (?J)       allow duplicate names
  (?m)       multiline
  (?s)       single line (dotall)
  (?U)       default ungreedy (lazy)
  (?x)       extended (ignore white space)
  (?-...)    unset option(s)
</PRE>
</P>
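<P>
Four of these option letters can be tried out with Python's <b>re</b> module,
used here only as a stand-in for PCRE. Python has no counterpart for the J and
U letters, and it expects global option settings at the start of the pattern,
so the sketch is limited to that case. The subject strings are assumptions.
</P>

```python
import re

# (?i) caseless: letters match regardless of case.
caseless = re.fullmatch(r"(?i)pcre", "PCRE") is not None

# (?s) dotall: the dot also matches a newline.
dot_default = re.fullmatch(r"a.b", "a\nb") is not None
dot_all = re.fullmatch(r"(?s)a.b", "a\nb") is not None

# (?m) multiline: ^ also matches after an internal newline.
line_starts = re.findall(r"(?m)^\w+", "one\ntwo")

# (?x) extended: white space in the pattern is ignored.
extended = re.fullmatch(r"(?x) \d+ \. \d+", "3.14") is not None

print(caseless, dot_default, dot_all, line_starts, extended)
```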
<br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)     positive look ahead
  (?!...)     negative look ahead
  (?&#60;=...)    positive look behind
  (?&#60;!...)    negative look behind
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
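<P>
Assertions test the text around the current position without consuming it. The
four forms can be sketched with Python's <b>re</b> module, used as a stand-in
for PCRE; the subject strings are assumptions. Every look behind below is a
single fixed-length branch, which satisfies the rule stated above as well as
Python's stricter one.
</P>

```python
import re

prices = "10 dollars, 20 euros"
costs = "cost $30 and 40"

# Positive look ahead: digits that are followed by " dollars".
ahead_pos = re.findall(r"\d+(?= dollars)", prices)

# Negative look ahead: whole numbers not followed by " dollars".
ahead_neg = re.findall(r"\b\d+\b(?! dollars)", prices)

# Positive look behind: digits preceded by a dollar sign.
behind_pos = re.findall(r"(?<=\$)\d+", costs)

# Negative look behind: numbers not preceded by a dollar sign.
behind_neg = re.findall(r"(?<!\$)\b\d+", costs)

print(ahead_pos, ahead_neg, behind_pos, behind_neg)
```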
<br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n          reference by number (can be ambiguous)
  \gn         reference by number
  \g{n}       reference by number
  \g{-n}      relative reference by number
  \k&#60;name&#62;    reference by name (Perl)
  \k'name'    reference by name (Perl)
  \g{name}    reference by name (Perl)
  \k{name}    reference by name (.NET)
  (?P=name)   reference by name (Python)
</PRE>
</P>
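<P>
A backreference matches the same text that an earlier group captured, not the
group's pattern. The sketch below uses Python's <b>re</b> module as a stand-in
for PCRE, so only the plain numbered form and the Python-style named form are
shown; the subject strings and the group name are assumptions.
</P>

```python
import re

# Numbered reference: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")

# Named reference: the closing quote must be the same as the opening one.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now")

# Mismatched quotes do not satisfy the backreference.
mismatched = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi\" now")

print(doubled.group(), doubled.group(1), quoted.group(), mismatched)
```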
<br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)        recurse whole pattern
  (?n)        call subpattern by absolute number
  (?+n)       call subpattern by relative number
  (?-n)       call subpattern by relative number
  (?&name)    call subpattern by name (Perl)
  (?P&#62;name)   call subpattern by name (Python)
</PRE>
</P>
<br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(&#60;name&#62;)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
</PRE>
</P>
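<P>
The absolute reference condition can be sketched with Python's <b>re</b>
module, used as a stand-in for PCRE, which accepts the same numbered form. The
patterns, the helper name <b>is_balanced</b>, and the subject strings are
assumptions made for the illustration: a closing bracket is required only when
an opening bracket was captured by group 1.
</P>

```python
import re

# Yes-pattern only: if group 1 matched "<", a ">" must follow the word.
balanced = re.compile(r"(<)?\w+(?(1)>)")

# Yes-pattern and no-pattern: ">" after "<", otherwise a ";" is required.
terminated = re.compile(r"(<)?\w+(?(1)>|;)")


def is_balanced(subject):
    """Return True if the whole subject matches the yes-only pattern."""
    return balanced.fullmatch(subject) is not None


results = [is_balanced(s) for s in ("<abc>", "abc", "<abc", "abc>")]
with_else = [terminated.fullmatch(s) is not None
             for s in ("<abc>", "abc;", "<abc;")]

print(results)
print(with_else)
```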
<br><a name="SEC20" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)        callout
  (?Cn)       callout with data n
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC22" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC23" href="#TOC1">REVISION</a><br>
<P>
Last updated: 06 August 2007
<br>
Copyright &copy; 1997-2007 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>