.\" Revision 1314, Fri Apr 26 10:44:13 2013 UTC, by ph10: Documentation updates.
.TH PCRESYNTAX 3 "26 April 2013" "PCRE 8.33"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
.rs
.sp
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. This document contains a quick-reference summary of the syntax.
.
.
.SH "QUOTING"
.rs
.sp
  \ex         where x is non-alphanumeric is a literal x
  \eQ...\eE    treat enclosed characters as literal
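.sp
Python's re module follows the same rule for a backslash placed before a
non-alphanumeric character, so the sketch below (Python, not PCRE itself) can
illustrate it. Python has no \eQ...\eE; its standard-library function
re.escape() quotes a whole string instead, which serves a similar purpose.
The variable names are illustrative only.

```python
import re

# A backslash before a non-alphanumeric character makes it literal:
# "1\.5" needs a real dot in the middle, while "1.5" accepts any character.
literal_dot = re.compile(r"1\.5")
any_char = re.compile(r"1.5")

print(bool(literal_dot.fullmatch("1.5")))  # True
print(bool(literal_dot.fullmatch("1x5")))  # False
print(bool(any_char.fullmatch("1x5")))     # True

# Python has no \Q...\E; re.escape() backslash-quotes special characters.
user_text = "a+b (c)"
quoted = re.compile(re.escape(user_text))
print(bool(quoted.fullmatch("a+b (c)")))   # True
print(bool(quoted.fullmatch("aab c")))     # False
```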
.
.
.SH "CHARACTERS"
.rs
.sp
  \ea         alarm, that is, the BEL character (hex 07)
  \ecx        "control-x", where x is any ASCII character
  \ee         escape (hex 1B)
  \ef         form feed (hex 0C)
  \en         newline (hex 0A)
  \er         carriage return (hex 0D)
  \et         tab (hex 09)
  \eddd       character with octal code ddd, or backreference
  \exhh       character with hex code hh
  \ex{hhh..}  character with hex code hhh..
.
.
.SH "CHARACTER TYPES"
.rs
.sp
  .          any character except newline;
             in dotall mode, any character whatsoever
  \eC         one data unit, even in UTF mode (best avoided)
  \ed         a decimal digit
  \eD         a character that is not a decimal digit
  \eh         a horizontal white space character
  \eH         a character that is not a horizontal white space character
  \eN         a character that is not a newline
  \ep{\fIxx\fP}     a character with the \fIxx\fP property
  \eP{\fIxx\fP}     a character without the \fIxx\fP property
  \eR         a newline sequence
  \es         a white space character
  \eS         a character that is not a white space character
  \ev         a vertical white space character
  \eV         a character that is not a vertical white space character
  \ew         a "word" character
  \eW         a "non-word" character
  \eX         a Unicode extended grapheme cluster
.sp
In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
characters, even in a UTF mode. However, this can be changed by setting the
PCRE_UCP option.
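.sp
Python's re module has the opposite default: for str patterns, \ed, \es and
\ew are Unicode-aware unless the re.ASCII flag is passed. The sketch below
(Python, not PCRE) uses that flag to mimic PCRE's default and the unflagged
pattern to approximate the effect of PCRE_UCP; the two engines' Unicode tables
are not guaranteed to agree in every case. Variable names are illustrative.

```python
import re

word = "caf\u00e9"  # "café" with a precomposed e-acute

# With re.ASCII, \w covers only [A-Za-z0-9_], like PCRE without PCRE_UCP.
ascii_only = re.fullmatch(r"\w+", word, re.ASCII)

# Without the flag, Python's \w is Unicode-aware, roughly like PCRE_UCP.
unicode_aware = re.fullmatch(r"\w+", word)

print(ascii_only)             # None: the e-acute is not an ASCII word character
print(unicode_aware.group())  # café
```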
.
.
.SH "GENERAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate
.sp
  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&         Ll, Lu, or Lt
.sp
  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark
.sp
  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number
.sp
  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation
.sp
  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol
.sp
  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
.
.
.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep AND \eP"
.rs
.sp
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, FF, CR
  Xuc        Universally-named character: one that can be
             represented by a Universal Character Name
  Xwd        Perl word: property Xan or underscore
.
.
.SH "SCRIPT NAMES FOR \ep AND \eP"
.rs
.sp
Arabic,
Armenian,
Avestan,
Balinese,
Bamum,
Batak,
Bengali,
Bopomofo,
Brahmi,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Chakma,
Cham,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Egyptian_Hieroglyphs,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Imperial_Aramaic,
Inherited,
Inscriptional_Pahlavi,
Inscriptional_Parthian,
Javanese,
Kaithi,
Kannada,
Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Lao,
Latin,
Lepcha,
Limbu,
Linear_B,
Lisu,
Lycian,
Lydian,
Malayalam,
Mandaic,
Meetei_Mayek,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Sinhala,
Sora_Sompeng,
Sundanese,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tai_Tham,
Tai_Viet,
Takri,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Vai,
Yi.
.
.
.SH "CHARACTER CLASSES"
.rs
.sp
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set
.sp
  alnum      alphanumeric
  alpha      alphabetic
  ascii      0-127
  blank      space or tab
  cntrl      control character
  digit      decimal digit
  graph      printing, excluding space
  lower      lower case letter
  print      printing, including space
  punct      printing, excluding alphanumeric
  space      white space
  upper      upper case letter
  word       same as \ew
  xdigit     hexadecimal digit
.sp
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\eQ...\eE inside a character class.
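.sp
Python's re module shares the plain bracket forms (positive and negative
classes and ranges, including ranges whose ends are written as \exhh escapes)
but not the POSIX [[:xxx:]] names, so the sketch below (Python, not PCRE)
shows only the shared forms. Variable names are illustrative.

```python
import re

text = "ABCDEF-xyz-123"

# Range given by hex escapes: \x41-\x43 is the range A-C.
hex_run = re.findall(r"[\x41-\x43]+", text)

# Negative class: runs of characters that are not lower case ASCII letters.
non_lower = re.findall(r"[^a-z]+", text)

print(hex_run)    # ['ABC']
print(non_lower)  # ['ABCDEF-', '-123']
```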
.
.
.SH "QUANTIFIERS"
.rs
.sp
  ?          0 or 1, greedy
  ?+         0 or 1, possessive
  ??         0 or 1, lazy
  *          0 or more, greedy
  *+         0 or more, possessive
  *?         0 or more, lazy
  +          1 or more, greedy
  ++         1 or more, possessive
  +?         1 or more, lazy
  {n}        exactly n
  {n,m}      at least n, no more than m, greedy
  {n,m}+     at least n, no more than m, possessive
  {n,m}?     at least n, no more than m, lazy
  {n,}       n or more, greedy
  {n,}+      n or more, possessive
  {n,}?      n or more, lazy
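.sp
The difference between greedy and lazy repetition can be seen with Python's re
module, which uses the same greedy and lazy forms. The sketch below is Python,
not PCRE, and leaves out the possessive forms because Python supports them
only from version 3.11. Variable names are illustrative.

```python
import re

html = "<b>bold</b>"

# Greedy: .* takes as much as it can, then backs off to the last ">".
greedy = re.match(r"<.*>", html).group()

# Lazy: .*? takes as little as it can, stopping at the first ">".
lazy = re.match(r"<.*?>", html).group()

# {n,m} ranges take the same lazy "?" suffix.
digits_greedy = re.match(r"\d{2,4}", "123456").group()
digits_lazy = re.match(r"\d{2,4}?", "123456").group()

print(greedy)         # <b>bold</b>
print(lazy)           # <b>
print(digits_greedy)  # 1234
print(digits_lazy)    # 12
```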
.
.
.SH "ANCHORS AND SIMPLE ASSERTIONS"
.rs
.sp
  \eb         word boundary
  \eB         not a word boundary
  ^          start of subject
             also after internal newline in multiline mode
  \eA         start of subject
  $          end of subject
             also before newline at end of subject
             also before internal newline in multiline mode
  \eZ         end of subject
             also before newline at end of subject
  \ez         end of subject
  \eG         first matching position in subject
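.sp
Python's $ behaves like PCRE's $, but Python's \eZ corresponds to PCRE's \ez
(very end only), not to PCRE's \eZ; Python has no \eG. The sketch below
(Python, not PCRE) shows the end-of-subject distinction and multiline mode.
Variable names are illustrative.

```python
import re

subject = "line one\nline two\n"

# $ matches at the very end or just before a final newline.
dollar_hit = re.search(r"two$", subject) is not None

# Python's \Z matches only at the very end, like PCRE's \z.
end_only_hit = re.search(r"two\Z", subject) is not None

# In multiline mode, ^ and $ also match around internal newlines.
lines = re.findall(r"^line \w+$", subject, re.MULTILINE)

# \b marks a word boundary.
words = re.findall(r"\bline\b", subject)

print(dollar_hit, end_only_hit)  # True False
print(lines)                     # ['line one', 'line two']
print(words)                     # ['line', 'line']
```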
.
.
.SH "MATCH POINT RESET"
.rs
.sp
  \eK         reset start of match
.
.
.SH "ALTERNATION"
.rs
.sp
  expr|expr|expr...
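.sp
Alternatives are tried from left to right, and the first one that lets the
overall match succeed is used; it need not be the longest. Python's re module
is also a Perl-style backtracking engine with this behaviour, so the sketch
below (Python, not PCRE) can illustrate it. Variable names are illustrative.

```python
import re

# The first alternative that succeeds wins, even if a later one is longer.
first_wins = re.match(r"cat|category", "category").group()
reordered = re.match(r"category|cat", "category").group()

# If the rest of the match fails, backtracking tries the next alternative:
# fullmatch() rejects "cat" here, so "category" is tried and succeeds.
forced = re.fullmatch(r"cat|category", "category").group()

print(first_wins)  # cat
print(reordered)   # category
print(forced)      # category
```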
.
.
.SH "CAPTURING"
.rs
.sp
  (...)          capturing group
  (?<name>...)   named capturing group (Perl)
  (?'name'...)   named capturing group (Perl)
  (?P<name>...)  named capturing group (Python)
  (?:...)        non-capturing group
  (?|...)        non-capturing group; reset group numbers for
                 capturing groups in each alternative
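.sp
Python's re module writes named groups as (?P<name>...), the form listed above
as (Python), and has no equivalent of (?|...). The sketch below (Python, not
PCRE) shows that named groups are also numbered and that (?:...) does not
create a group. Variable names are illustrative.

```python
import re

date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date.fullmatch("2013-04-26")

print(m.group("year"), m.group("month"), m.group("day"))  # 2013 04 26
print(m.group(1))   # 2013: named groups are numbered as well

# (?:...) groups without capturing, so only (c) gets a number here.
m2 = re.fullmatch(r"(?:ab)+(c)", "ababc")
print(m2.groups())  # ('c',)
```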
.
.
.SH "ATOMIC GROUPS"
.rs
.sp
  (?>...)    atomic, non-capturing group
.
.
.SH "COMMENT"
.rs
.sp
  (?#....)   comment (not nestable)
.
.
.SH "OPTION SETTING"
.rs
.sp
  (?i)       caseless
  (?J)       allow duplicate names
  (?m)       multiline
  (?s)       single line (dotall)
  (?U)       default ungreedy (lazy)
  (?x)       extended (ignore white space)
  (?-...)    unset option(s)
.sp
The following are recognized only at the start of a pattern or after one of the
newline-setting options with similar syntax:
.sp
  (*LIMIT_MATCH=d)      set the match limit to d (decimal number)
  (*LIMIT_RECURSION=d)  set the recursion limit to d (decimal number)
  (*NO_START_OPT)       no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)               set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)              set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UTF32)              set UTF-32 mode: 32-bit library (PCRE_UTF32)
  (*UTF)                set appropriate UTF mode for the library in use
  (*UCP)                set PCRE_UCP (use Unicode properties for \ed etc.)
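.sp
Python's re module accepts the (?i), (?s), (?m) and (?x) letters with the same
meaning, but not (?J), (?U) or the (*...) settings. PCRE lets an option letter
appear part-way through a pattern; recent Python versions require such global
flags at the very start, so the sketch below (Python, not PCRE) keeps them
there. Variable names are illustrative.

```python
import re

# (?i) at the start of the pattern makes the whole match caseless.
caseless = bool(re.fullmatch(r"(?i)pcre", "PCRE"))

# (?s) lets "." match a newline as well (dotall mode).
dotall = bool(re.fullmatch(r"(?s)a.b", "a\nb"))
no_dotall = bool(re.fullmatch(r"a.b", "a\nb"))

# (?x) ignores unescaped white space and treats # as starting a comment.
phone = re.compile(r"""(?x)
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
""")

print(caseless, dotall, no_dotall)        # True True False
print(bool(phone.fullmatch("555-1234")))  # True
```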
.
.
.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
.rs
.sp
  (?=...)    positive look ahead
  (?!...)    negative look ahead
  (?<=...)   positive look behind
  (?<!...)   negative look behind
.sp
Each top-level branch of a look behind must be of a fixed length.
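.sp
The sketch below uses Python's re module, whose lookahead and lookbehind
syntax matches the table above; it is Python, not PCRE. One difference is
worth knowing: PCRE lets the top-level branches of a lookbehind have different
fixed lengths, as in (?<=bullock|donkey), whereas Python requires a single
width for the whole assertion and rejects that pattern. Variable names are
illustrative.

```python
import re

prices = "apple $3, pear $12, plum 7"

# Positive lookbehind: digits preceded by "$" (the "$" is not consumed).
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookahead: whole words that are not followed by " $".
not_priced = re.findall(r"\b[a-z]+\b(?! \$)", prices)

# Branches of different lengths: accepted by PCRE, rejected by Python.
try:
    re.compile(r"(?<=bullock|donkey)cart")
    mixed_widths_ok = True
except re.error:
    mixed_widths_ok = False

print(after_dollar)     # ['3', '12']
print(not_priced)       # ['plum']
print(mixed_widths_ok)  # False
```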
.
.
.SH "BACKREFERENCES"
.rs
.sp
  \en         reference by number (can be ambiguous)
  \egn        reference by number
  \eg{n}      reference by number
  \eg{-n}     relative reference by number
  \ek<name>   reference by name (Perl)
  \ek'name'   reference by name (Perl)
  \eg{name}   reference by name (Perl)
  \ek{name}   reference by name (.NET)
  (?P=name)  reference by name (Python)
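.sp
Python's re module understands the plain numeric form and (?P=name) with the
same meaning, so the sketch below (Python, not PCRE) uses those two; the other
spellings in the table are not shown. Variable names are illustrative.

```python
import re

# \1 must match the same text that group 1 captured: a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(1))                 # is

# (?P=name): the closing marker must be the same character as the opening one.
emph = re.compile(r"(?P<mark>[*_])\w+(?P=mark)")
print(bool(emph.fullmatch("*bold*")))   # True
print(bool(emph.fullmatch("_it_")))     # True
print(bool(emph.fullmatch("*mixed_")))  # False
```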
.
.
.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
.rs
.sp
  (?R)       recurse whole pattern
  (?n)       call subpattern by absolute number
  (?+n)      call subpattern by relative number
  (?-n)      call subpattern by relative number
  (?&name)   call subpattern by name (Perl)
  (?P>name)  call subpattern by name (Python)
  \eg<name>   call subpattern by name (Oniguruma)
  \eg'name'   call subpattern by name (Oniguruma)
  \eg<n>      call subpattern by absolute number (Oniguruma)
  \eg'n'      call subpattern by absolute number (Oniguruma)
  \eg<+n>     call subpattern by relative number (PCRE extension)
  \eg'+n'     call subpattern by relative number (PCRE extension)
  \eg<-n>     call subpattern by relative number (PCRE extension)
  \eg'-n'     call subpattern by relative number (PCRE extension)
.
.
.SH "CONDITIONAL PATTERNS"
.rs
.sp
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)
.sp
  (?(n)...       absolute reference condition
  (?(+n)...      relative reference condition
  (?(-n)...      relative reference condition
  (?(<name>)...  named reference condition (Perl)
  (?('name')...  named reference condition (Perl)
  (?(name)...    named reference condition (PCRE)
  (?(R)...       overall recursion condition
  (?(Rn)...      specific group recursion condition
  (?(R&name)...  specific recursion condition
  (?(DEFINE)...  define subpattern for reference
  (?(assert)...  assertion condition
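.sp
Python's re module supports conditions that test whether a numbered or named
group has matched, so the sketch below (Python, not PCRE) uses it for the
(?(n)...) and (?(name)...) forms; the recursion, DEFINE and assertion
conditions have no Python equivalent. Variable names are illustrative.

```python
import re

# (?(1)>) demands a closing ">" only if group 1 (an opening "<") matched.
tag = re.compile(r"(<)?\w+(?(1)>)")
for s in ["<abc>", "abc", "<abc", "abc>"]:
    print(s, bool(tag.fullmatch(s)))
# <abc> True
# abc True
# <abc False
# abc> False

# Named condition with a no-pattern: "(x)" or "x;" is accepted.
item = re.compile(r"(?P<open>\()?\w+(?(open)\)|;)")
print(bool(item.fullmatch("(x)")), bool(item.fullmatch("x;")))  # True True
print(bool(item.fullmatch("(x;")))                               # False
```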
.
.
.SH "BACKTRACKING CONTROL"
.rs
.sp
The following act as soon as they are reached:
.sp
  (*ACCEPT)      force successful match
  (*FAIL)        force backtrack; synonym (*F)
  (*MARK:NAME)   set name to be passed back; synonym (*:NAME)
.sp
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
.sp
  (*COMMIT)      overall failure, no advance of starting point
  (*PRUNE)       advance to next starting character
  (*PRUNE:NAME)  equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)        advance to current matching position
  (*SKIP:NAME)   advance to position corresponding to an earlier
                 (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)        local failure, backtrack to next alternation
  (*THEN:NAME)   equivalent to (*MARK:NAME)(*THEN)
.
.
.SH "NEWLINE CONVENTIONS"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.
.sp
  (*CR)       carriage return only
  (*LF)       linefeed only
  (*CRLF)     carriage return followed by linefeed
  (*ANYCRLF)  all three of the above
  (*ANY)      any Unicode newline sequence
.
.
.SH "WHAT \eR MATCHES"
.rs
.sp
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention or a UTF or UCP mode.
.sp
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
.
.
.SH "CALLOUTS"
.rs
.sp
  (?C)       callout
  (?Cn)      callout with data n
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3),
\fBpcrematching\fP(3), \fBpcre\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 26 April 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
