.\" Revision 517 of /code/trunk/doc/pcresyntax.3, Wed May 5 10:44:20 2010 UTC, by ph10.
.\" Log message: Add new special properties Xan, Xps, Xsp, Xwd to help with \w etc.
.TH PCRESYNTAX 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. This document contains just a quick-reference summary of the
syntax.
.
.
.SH "QUOTING"
.rs
.sp
  \ex         where x is non-alphanumeric is a literal x
  \eQ...\eE    treat enclosed characters as literal
.
.
.SH "CHARACTERS"
.rs
.sp
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any character
  \ee         escape (hex 1B)
  \ef         formfeed (hex 0C)
  \en         newline (hex 0A)
  \er         carriage return (hex 0D)
  \et         tab (hex 09)
  \eddd       character with octal code ddd, or backreference
  \exhh       character with hex code hh
  \ex{hhh..}  character with hex code hhh..
.
.
.SH "CHARACTER TYPES"
.rs
.sp
  .          any character except newline;
               in dotall mode, any character whatsoever
  \eC         one byte, even in UTF-8 mode (best avoided)
  \ed         a decimal digit
  \eD         a character that is not a decimal digit
  \eh         a horizontal whitespace character
  \eH         a character that is not a horizontal whitespace character
  \eN         a character that is not a newline
  \ep{\fIxx\fP}     a character with the \fIxx\fP property
  \eP{\fIxx\fP}     a character without the \fIxx\fP property
  \eR         a newline sequence
  \es         a whitespace character
  \eS         a character that is not a whitespace character
  \ev         a vertical whitespace character
  \eV         a character that is not a vertical whitespace character
  \ew         a "word" character
  \eW         a "non-word" character
  \eX         an extended Unicode sequence
.sp
In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.
.
.
.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate
.sp
  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt
.sp
  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark
.sp
  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number
.sp
  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation
.sp
  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol
.sp
  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
.
.
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
.rs
.sp
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xwd        Perl word: property Xan or underscore
.
.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Bengali,
Bopomofo,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Meetei_Mayek,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Shavian,
Sinhala,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
.
.
.SH "CHARACTER CLASSES"
.rs
.sp
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
.sp
  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       whitespace
  upper       upper case letter
  word        same as \ew
  xdigit      hexadecimal digit
.sp
In PCRE, POSIX character set names recognize only ASCII characters. You can use
\eQ...\eE inside a character class.
.
.
.SH "QUANTIFIERS"
.rs
.sp
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
.
.
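The difference between greedy and lazy quantifiers listed above is easiest to see on a small subject string. The following is a minimal sketch that assumes Python's re module as a stand-in, because it shares this subset of the quantifier syntax; it is not PCRE itself, and the sample strings are made up for illustration. Possessive quantifiers are left out because older Python versions do not accept them.

```python
import re

# Hypothetical sample subject, chosen only to show the contrast.
text = "<b>bold</b> and <i>italic</i>"

# Greedy "+" takes as much as possible: one match from the first "<" to the last ">".
greedy = re.findall(r"<.+>", text)

# Lazy "+?" takes as little as possible: each tag is matched separately.
lazy = re.findall(r"<.+?>", text)

# {n,m} bounds the repeat count: whole numbers of two or three digits only.
bounded = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")

print(greedy)
print(lazy)
print(bounded)
```

The greedy pattern yields a single match covering the whole string, while the lazy one yields the four tags individually.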
.SH "ANCHORS AND SIMPLE ASSERTIONS"
.rs
.sp
  \eb          word boundary (only ASCII letters recognized)
  \eB          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \eA          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \eZ          end of subject
               also before newline at end of subject
  \ez          end of subject
  \eG          first matching position in subject
.
.
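The effect of multiline mode on the circumflex and dollar anchors can be sketched as follows. This assumes Python's re module as a stand-in, since its circumflex, dollar, and (?m) behave like the entries above for this simple case; it is not PCRE, and the subject string is made up.

```python
import re

# Hypothetical three-line subject with a trailing newline.
subject = "alpha\nbeta\ngamma\n"

# Default mode: "^" matches only at the start of the subject.
default_starts = re.findall(r"^\w+", subject)

# Multiline mode: "^" also matches after each internal newline.
multiline_starts = re.findall(r"(?m)^\w+", subject)

# Default mode: "$" matches at the end, and also before a final newline.
default_ends = re.findall(r"\w+$", subject)

# Multiline mode: "$" also matches before each internal newline.
multiline_ends = re.findall(r"(?m)\w+$", subject)

print(default_starts, multiline_starts)
print(default_ends, multiline_ends)
```

One difference to keep in mind when reading this sketch: Python's upper case Z escape matches only at the very end of the string, which corresponds to the lower case z entry in the table above rather than to the upper case one.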
.SH "MATCH POINT RESET"
.rs
.sp
  \eK          reset start of match
.
.
.SH "ALTERNATION"
.rs
.sp
  expr|expr|expr...
.
.
.SH "CAPTURING"
.rs
.sp
  (...)           capturing group
  (?<name>...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P<name>...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
.
.
.SH "ATOMIC GROUPS"
.rs
.sp
  (?>...)         atomic, non-capturing group
.
.
.
.
.SH "COMMENT"
.rs
.sp
  (?#....)        comment (not nestable)
.
.
.SH "OPTION SETTING"
.rs
.sp
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
.sp
The following is recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
  (*UTF8)         set UTF-8 mode
.
.
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs
.sp
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?<=...)        positive look behind
  (?<!...)        negative look behind
.sp
Each top-level branch of a look behind must be of a fixed length.
.
.
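The four assertions above test the text around the current position without consuming it. The following is a minimal sketch that assumes Python's re module as a stand-in, since it accepts the same four forms; it is not PCRE, and the sample string is hypothetical. Python is stricter than the rule stated above in that it requires the whole look behind to have one fixed width, not just each top-level branch.

```python
import re

# Hypothetical sample subject.
prices = "cost: $40, weight: 12kg, tax: $7"

# Positive look behind: digits immediately preceded by a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)

# Negative look behind: whole numbers not preceded by a dollar sign.
plain_numbers = re.findall(r"(?<!\$)\b\d+", prices)

# Positive look ahead: digits immediately followed by "kg".
kilograms = re.findall(r"\d+(?=kg)", prices)

# Negative look ahead: labels ending in ":" that do not start with "tax".
other_labels = re.findall(r"\b(?!tax)\w+:", prices)

# A variable-length look behind is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(dollar_amounts, plain_numbers, kilograms, other_labels, variable_width_ok)
```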
.SH "BACKREFERENCES"
.rs
.sp
  \en              reference by number (can be ambiguous)
  \egn             reference by number
  \eg{n}           reference by number
  \eg{-n}          relative reference by number
  \ek<name>        reference by name (Perl)
  \ek'name'        reference by name (Perl)
  \eg{name}        reference by name (Perl)
  \ek{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
.
.
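The Python-style named group and named backreference listed above can be tried directly in Python's re module, which is where that syntax originates. The sketch below uses only the Python forms and the plain numbered backreference; the Perl and .NET spellings are not accepted by Python, and the sample strings are hypothetical.

```python
import re

# A named group "word" followed by a backreference to it by name:
# this finds an immediately repeated word.
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")
match = doubled.search("this is is a test")
repeated_word = match.group("word") if match else None
matched_text = match.group(0) if match else None

# Reference by number: a character followed by itself.
# findall returns the contents of group 1 for each match.
double_letters = re.findall(r"(\w)\1", "bookkeeper")

print(repeated_word, matched_text, double_letters)
```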
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
.rs
.sp
  (?R)          recurse whole pattern
  (?n)          call subpattern by absolute number
  (?+n)         call subpattern by relative number
  (?-n)         call subpattern by relative number
  (?&name)      call subpattern by name (Perl)
  (?P>name)     call subpattern by name (Python)
  \eg<name>      call subpattern by name (Oniguruma)
  \eg'name'      call subpattern by name (Oniguruma)
  \eg<n>         call subpattern by absolute number (Oniguruma)
  \eg'n'         call subpattern by absolute number (Oniguruma)
  \eg<+n>        call subpattern by relative number (PCRE extension)
  \eg'+n'        call subpattern by relative number (PCRE extension)
  \eg<-n>        call subpattern by relative number (PCRE extension)
  \eg'-n'        call subpattern by relative number (PCRE extension)
.
.
.SH "CONDITIONAL PATTERNS"
.rs
.sp
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
.sp
  (?(n)...          absolute reference condition
  (?(+n)...         relative reference condition
  (?(-n)...         relative reference condition
  (?(<name>)...     named reference condition (Perl)
  (?('name')...     named reference condition (Perl)
  (?(name)...       named reference condition (PCRE)
  (?(R)...          overall recursion condition
  (?(Rn)...         specific group recursion condition
  (?(R&name)...     specific recursion condition
  (?(DEFINE)...     define subpattern for reference
  (?(assert)...     assertion condition
.
.
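A reference condition makes part of a pattern depend on whether an earlier group took part in the match. The following is a minimal sketch that assumes Python's re module as a stand-in: it accepts the absolute-number form and the bare-name form shown above, but not the relative, recursion, DEFINE, or assertion conditions. The pattern and inputs are hypothetical.

```python
import re

# Group 1 optionally matches "<"; the condition (?(1)>) requires a
# closing ">" only when group 1 was set.
by_number = re.compile(r"^(<)?\w+(?(1)>)$")

# The same idea with a named group and a named reference condition.
by_name = re.compile(r"^(?P<open><)?\w+(?(open)>)$")

samples = ["tag", "<tag>", "<tag", "tag>"]
number_results = [bool(by_number.match(s)) for s in samples]
name_results = [bool(by_name.match(s)) for s in samples]

print(number_results)
print(name_results)
```

Only the balanced forms are accepted: the bare word and the fully bracketed word match, while the two half-bracketed inputs do not.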
.SH "BACKTRACKING CONTROL"
.rs
.sp
The following act immediately they are reached:
.sp
  (*ACCEPT)     force successful match
  (*FAIL)       force backtrack; synonym (*F)
.sp
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
.sp
  (*COMMIT)     overall failure, no advance of starting point
  (*PRUNE)      advance to next starting character
  (*SKIP)       advance start to current matching position
  (*THEN)       local failure, backtrack to next alternation
.
.
.SH "NEWLINE CONVENTIONS"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*BSR_...) or (*UTF8) option.
.sp
  (*CR)         carriage return only
  (*LF)         linefeed only
  (*CRLF)       carriage return followed by linefeed
  (*ANYCRLF)    all three of the above
  (*ANY)        any Unicode newline sequence
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or UTF-8 mode.
.sp
  (*BSR_ANYCRLF)   CR, LF, or CRLF
  (*BSR_UNICODE)   any Unicode newline sequence
.
.
.SH "CALLOUTS"
.rs
.sp
  (?C)      callout
  (?Cn)     callout with data n
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
\fBpcrematching\fP(3), \fBpcre\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 05 May 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi
