
Diff of /code/trunk/doc/pcresyntax.3


Old: revision 231 by ph10, Tue Sep 11 11:15:33 2007 UTC
New: revision 1011 by ph10, Sat Aug 25 11:36:15 2012 UTC
# Line 1  Line 1 
1  .TH PCRESYNTAX 3  .TH PCRESYNTAX 3 "10 January 2012" "PCRE 8.30"
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"  .SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY"
# Line 9  PCRE are described in the Line 9  PCRE are described in the
9  .\" HREF  .\" HREF
10  \fBpcrepattern\fP  \fBpcrepattern\fP
11  .\"  .\"
12  documentation. This document contains just a quick-reference summary of the  documentation. This document contains a quick-reference summary of the syntax.
 syntax.  
13  .  .
14  .  .
15  .SH "QUOTING"  .SH "QUOTING"
# Line 24  syntax. Line 23  syntax.
23  .rs  .rs
24  .sp  .sp
25    \ea         alarm, that is, the BEL character (hex 07)    \ea         alarm, that is, the BEL character (hex 07)
26    \ecx        "control-x", where x is any character    \ecx        "control-x", where x is any ASCII character
27    \ee         escape (hex 1B)    \ee         escape (hex 1B)
28    \ef         formfeed (hex 0C)    \ef         form feed (hex 0C)
29    \en         newline (hex 0A)    \en         newline (hex 0A)
30    \er         carriage return (hex 0D)    \er         carriage return (hex 0D)
31    \et         tab (hex 09)    \et         tab (hex 09)
# Line 40  syntax. Line 39  syntax.
39  .sp  .sp
40    .          any character except newline;    .          any character except newline;
41                 in dotall mode, any character whatsoever                 in dotall mode, any character whatsoever
42    \eC         one byte, even in UTF-8 mode (best avoided)    \eC         one data unit, even in UTF mode (best avoided)
43    \ed         a decimal digit    \ed         a decimal digit
44    \eD         a character that is not a decimal digit    \eD         a character that is not a decimal digit
45    \eh         a horizontal whitespace character    \eh         a horizontal white space character
46    \eH         a character that is not a horizontal whitespace character    \eH         a character that is not a horizontal white space character
47      \eN         a character that is not a newline
48    \ep{\fIxx\fP}     a character with the \fIxx\fP property    \ep{\fIxx\fP}     a character with the \fIxx\fP property
49    \eP{\fIxx\fP}     a character without the \fIxx\fP property    \eP{\fIxx\fP}     a character without the \fIxx\fP property
50    \eR         a newline sequence    \eR         a newline sequence
51    \es         a whitespace character    \es         a white space character
52    \eS         a character that is not a whitespace character    \eS         a character that is not a white space character
53    \ev         a vertical whitespace character    \ev         a vertical white space character
54    \eV         a character that is not a vertical whitespace character    \eV         a character that is not a vertical white space character
55    \ew         a "word" character    \ew         a "word" character
56    \eW         a "non-word" character    \eW         a "non-word" character
57    \eX         an extended Unicode sequence    \eX         a Unicode extended grapheme cluster
58  .sp  .sp
59  In PCRE, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII characters.  In PCRE, by default, \ed, \eD, \es, \eS, \ew, and \eW recognize only ASCII
60    characters, even in a UTF mode. However, this can be changed by setting the
61    PCRE_UCP option.
62  .  .
63  .  .
64  .SH "GENERAL CATEGORY PROPERTY CODES FOR \ep and \eP"  .SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP"
65  .rs  .rs
66  .sp  .sp
67    C          Other    C          Other
# Line 108  In PCRE, \ed, \eD, \es, \eS, \ew, and \e Line 110  In PCRE, \ed, \eD, \es, \eS, \ew, and \e
110    Zs         Space separator    Zs         Space separator
111  .  .
112  .  .
113    .SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP"
114    .rs
115    .sp
116      Xan        Alphanumeric: union of properties L and N
117      Xps        POSIX space: property Z or tab, NL, VT, FF, CR
118      Xsp        Perl space: property Z or tab, NL, FF, CR
119      Xwd        Perl word: property Xan or underscore
120    .
121    .
122  .SH "SCRIPT NAMES FOR \ep AND \eP"  .SH "SCRIPT NAMES FOR \ep AND \eP"
123  .rs  .rs
124  .sp  .sp
125  Arabic,  Arabic,
126  Armenian,  Armenian,
127    Avestan,
128  Balinese,  Balinese,
129    Bamum,
130    Batak,
131  Bengali,  Bengali,
132  Bopomofo,  Bopomofo,
133    Brahmi,
134  Braille,  Braille,
135  Buginese,  Buginese,
136  Buhid,  Buhid,
137  Canadian_Aboriginal,  Canadian_Aboriginal,
138    Carian,
139    Chakma,
140    Cham,
141  Cherokee,  Cherokee,
142  Common,  Common,
143  Coptic,  Coptic,
# Line 128  Cypriot, Line 146  Cypriot,
146  Cyrillic,  Cyrillic,
147  Deseret,  Deseret,
148  Devanagari,  Devanagari,
149    Egyptian_Hieroglyphs,
150  Ethiopic,  Ethiopic,
151  Georgian,  Georgian,
152  Glagolitic,  Glagolitic,
# Line 140  Hangul, Line 159  Hangul,
159  Hanunoo,  Hanunoo,
160  Hebrew,  Hebrew,
161  Hiragana,  Hiragana,
162    Imperial_Aramaic,
163  Inherited,  Inherited,
164    Inscriptional_Pahlavi,
165    Inscriptional_Parthian,
166    Javanese,
167    Kaithi,
168  Kannada,  Kannada,
169  Katakana,  Katakana,
170    Kayah_Li,
171  Kharoshthi,  Kharoshthi,
172  Khmer,  Khmer,
173  Lao,  Lao,
174  Latin,  Latin,
175    Lepcha,
176  Limbu,  Limbu,
177  Linear_B,  Linear_B,
178    Lisu,
179    Lycian,
180    Lydian,
181  Malayalam,  Malayalam,
182    Mandaic,
183    Meetei_Mayek,
184    Meroitic_Cursive,
185    Meroitic_Hieroglyphs,
186    Miao,
187  Mongolian,  Mongolian,
188  Myanmar,  Myanmar,
189  New_Tai_Lue,  New_Tai_Lue,
# Line 157  Nko, Line 191  Nko,
191  Ogham,  Ogham,
192  Old_Italic,  Old_Italic,
193  Old_Persian,  Old_Persian,
194    Old_South_Arabian,
195    Old_Turkic,
196    Ol_Chiki,
197  Oriya,  Oriya,
198  Osmanya,  Osmanya,
199  Phags_Pa,  Phags_Pa,
200  Phoenician,  Phoenician,
201    Rejang,
202  Runic,  Runic,
203    Samaritan,
204    Saurashtra,
205    Sharada,
206  Shavian,  Shavian,
207  Sinhala,  Sinhala,
208    Sora_Sompeng,
209    Sundanese,
210  Syloti_Nagri,  Syloti_Nagri,
211  Syriac,  Syriac,
212  Tagalog,  Tagalog,
213  Tagbanwa,  Tagbanwa,
214  Tai_Le,  Tai_Le,
215    Tai_Tham,
216    Tai_Viet,
217    Takri,
218  Tamil,  Tamil,
219  Telugu,  Telugu,
220  Thaana,  Thaana,
# Line 176  Thai, Line 222  Thai,
222  Tibetan,  Tibetan,
223  Tifinagh,  Tifinagh,
224  Ugaritic,  Ugaritic,
225    Vai,
226  Yi.  Yi.
227  .  .
228  .  .
# Line 186  Yi. Line 233  Yi.
233    [^...]      negative character class    [^...]      negative character class
234    [x-y]       range (can be used for hex characters)    [x-y]       range (can be used for hex characters)
235    [[:xxx:]]   positive POSIX named set    [[:xxx:]]   positive POSIX named set
236    [[^:xxx:]]  negative POSIX named set    [[:^xxx:]]  negative POSIX named set
237  .sp  .sp
238    alnum       alphanumeric    alnum       alphanumeric
239    alpha       alphabetic    alpha       alphabetic
# Line 198  Yi. Line 245  Yi.
245    lower       lower case letter    lower       lower case letter
246    print       printing, including space    print       printing, including space
247    punct       printing, excluding alphanumeric    punct       printing, excluding alphanumeric
248    space       whitespace    space       white space
249    upper       upper case letter    upper       upper case letter
250    word        same as \ew    word        same as \ew
251    xdigit      hexadecimal digit    xdigit      hexadecimal digit
252  .sp  .sp
253  In PCRE, POSIX character set names recognize only ASCII characters. You can use  In PCRE, POSIX character set names recognize only ASCII characters by default,
254    but some of them use Unicode properties if PCRE_UCP is set. You can use
255  \eQ...\eE inside a character class.  \eQ...\eE inside a character class.
256  .  .
257  .  .
# Line 260  In PCRE, POSIX character set names recog Line 308  In PCRE, POSIX character set names recog
308  .SH "CAPTURING"  .SH "CAPTURING"
309  .rs  .rs
310  .sp  .sp
311    (...)          capturing group    (...)           capturing group
312    (?<name>...)   named capturing group (Perl)    (?<name>...)    named capturing group (Perl)
313    (?'name'...)   named capturing group (Perl)    (?'name'...)    named capturing group (Perl)
314    (?P<name>...)  named capturing group (Python)    (?P<name>...)   named capturing group (Python)
315    (?:...)        non-capturing group    (?:...)         non-capturing group
316    (?|...)        non-capturing group; reset group numbers for    (?|...)         non-capturing group; reset group numbers for
317                    capturing groups in each alternative                     capturing groups in each alternative
318  .  .
319  .  .
320  .SH "ATOMIC GROUPS"  .SH "ATOMIC GROUPS"
321  .rs  .rs
322  .sp  .sp
323    (?>...)        atomic, non-capturing group    (?>...)         atomic, non-capturing group
324  .  .
325  .  .
326  .  .
# Line 280  In PCRE, POSIX character set names recog Line 328  In PCRE, POSIX character set names recog
328  .SH "COMMENT"  .SH "COMMENT"
329  .rs  .rs
330  .sp  .sp
331    (?#....)       comment (not nestable)    (?#....)        comment (not nestable)
332  .  .
333  .  .
334  .SH "OPTION SETTING"  .SH "OPTION SETTING"
335  .rs  .rs
336  .sp  .sp
337    (?i)           caseless    (?i)            caseless
338    (?J)           allow duplicate names    (?J)            allow duplicate names
339    (?m)           multiline    (?m)            multiline
340    (?s)           single line (dotall)    (?s)            single line (dotall)
341    (?U)           default ungreedy (lazy)    (?U)            default ungreedy (lazy)
342    (?x)           extended (ignore white space)    (?x)            extended (ignore white space)
343    (?-...)        unset option(s)    (?-...)         unset option(s)
344    .sp
345    The following are recognized only at the start of a pattern or after one of the
346    newline-setting options with similar syntax:
347    .sp
348      (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
349      (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
350      (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
351      (*UCP)          set PCRE_UCP (use Unicode properties for \ed etc)
352  .  .
353  .  .
354  .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"  .SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS"
355  .rs  .rs
356  .sp  .sp
357    (?=...)        positive look ahead    (?=...)         positive look ahead
358    (?!...)        negative look ahead    (?!...)         negative look ahead
359    (?<=...)       positive look behind    (?<=...)        positive look behind
360    (?<!...)       negative look behind    (?<!...)        negative look behind
361  .sp  .sp
362  Each top-level branch of a look behind must be of a fixed length.  Each top-level branch of a look behind must be of a fixed length.
363    .
364    .
365  .SH "BACKREFERENCES"  .SH "BACKREFERENCES"
366  .rs  .rs
367  .sp  .sp
368    \en             reference by number (can be ambiguous)    \en              reference by number (can be ambiguous)
369    \egn            reference by number    \egn             reference by number
370    \eg{n}          reference by number    \eg{n}           reference by number
371    \eg{-n}         relative reference by number    \eg{-n}          relative reference by number
372    \ek<name>       reference by name (Perl)    \ek<name>        reference by name (Perl)
373    \ek'name'       reference by name (Perl)    \ek'name'        reference by name (Perl)
374    \eg{name}       reference by name (Perl)    \eg{name}        reference by name (Perl)
375    \ek{name}       reference by name (.NET)    \ek{name}        reference by name (.NET)
376    (?P=name)      reference by name (Python)    (?P=name)       reference by name (Python)
377  .  .
378  .  .
379  .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"  .SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)"
380  .rs  .rs
381  .sp  .sp
382    (?R)           recurse whole pattern    (?R)            recurse whole pattern
383    (?n)           call subpattern by absolute number    (?n)            call subpattern by absolute number
384    (?+n)          call subpattern by relative number    (?+n)           call subpattern by relative number
385    (?-n)          call subpattern by relative number    (?-n)           call subpattern by relative number
386    (?&name)       call subpattern by name (Perl)    (?&name)        call subpattern by name (Perl)
387    (?P>name)      call subpattern by name (Python)    (?P>name)       call subpattern by name (Python)
388      \eg<name>        call subpattern by name (Oniguruma)
389      \eg'name'        call subpattern by name (Oniguruma)
390      \eg<n>           call subpattern by absolute number (Oniguruma)
391      \eg'n'           call subpattern by absolute number (Oniguruma)
392      \eg<+n>          call subpattern by relative number (PCRE extension)
393      \eg'+n'          call subpattern by relative number (PCRE extension)
394      \eg<-n>          call subpattern by relative number (PCRE extension)
395      \eg'-n'          call subpattern by relative number (PCRE extension)
396  .  .
397  .  .
398  .SH "CONDITIONAL PATTERNS"  .SH "CONDITIONAL PATTERNS"
# Line 335  Each top-level branch of a look behind m Line 401  Each top-level branch of a look behind m
401    (?(condition)yes-pattern)    (?(condition)yes-pattern)
402    (?(condition)yes-pattern|no-pattern)    (?(condition)yes-pattern|no-pattern)
403  .sp  .sp
404    (?(n)...       absolute reference condition    (?(n)...        absolute reference condition
405    (?(+n)...      relative reference condition    (?(+n)...       relative reference condition
406    (?(-n)...      relative reference condition    (?(-n)...       relative reference condition
407    (?(<name>)...  named reference condition (Perl)    (?(<name>)...   named reference condition (Perl)
408    (?('name')...  named reference condition (Perl)    (?('name')...   named reference condition (Perl)
409    (?(name)...    named reference condition (PCRE)    (?(name)...     named reference condition (PCRE)
410    (?(R)...       overall recursion condition    (?(R)...        overall recursion condition
411    (?(Rn)...      specific group recursion condition    (?(Rn)...       specific group recursion condition
412    (?(R&name)...  specific recursion condition    (?(R&name)...   specific recursion condition
413    (?(DEFINE)...  define subpattern for reference    (?(DEFINE)...   define subpattern for reference
414    (?(assert)...  assertion condition    (?(assert)...   assertion condition
415  .  .
416  .  .
417  .SH "BACKTRACKING CONTROL"  .SH "BACKTRACKING CONTROL"
# Line 353  Each top-level branch of a look behind m Line 419  Each top-level branch of a look behind m
419  .sp  .sp
420  The following act immediately they are reached:  The following act immediately they are reached:
421  .sp  .sp
422    (*ACCEPT)      force successful match    (*ACCEPT)       force successful match
423    (*FAIL)        force backtrack; synonym (*F)    (*FAIL)         force backtrack; synonym (*F)
424      (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
425  .sp  .sp
426  The following act only when a subsequent match failure causes a backtrack to  The following act only when a subsequent match failure causes a backtrack to
427  reach them. They all force a match failure, but they differ in what happens  reach them. They all force a match failure, but they differ in what happens
428  afterwards. Those that advance the start-of-match point do so only if the  afterwards. Those that advance the start-of-match point do so only if the
429  pattern is not anchored.  pattern is not anchored.
430  .sp  .sp
431    (*COMMIT)      overall failure, no advance of starting point    (*COMMIT)       overall failure, no advance of starting point
432    (*PRUNE)       advance to next starting character    (*PRUNE)        advance to next starting character
433    (*SKIP)        advance start to current matching position    (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
434    (*THEN)        local failure, backtrack to next alternation    (*SKIP)         advance to current matching position
435      (*SKIP:NAME)    advance to position corresponding to an earlier
436                      (*MARK:NAME); if not found, the (*SKIP) is ignored
437      (*THEN)         local failure, backtrack to next alternation
438      (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
439  .  .
440  .  .
441  .SH "NEWLINE CONVENTIONS"  .SH "NEWLINE CONVENTIONS"
442  .rs  .rs
443  .sp  .sp
444  These are recognized only at the very start of a pattern.  These are recognized only at the very start of the pattern or after a
445    (*BSR_...), (*UTF8), (*UTF16) or (*UCP) option.
446  .sp  .sp
447    (*CR)    (*CR)           carriage return only
448    (*LF)    (*LF)           linefeed only
449    (*CRLF)    (*CRLF)         carriage return followed by linefeed
450    (*ANYCRLF)    (*ANYCRLF)      all three of the above
451    (*ANY)    (*ANY)          any Unicode newline sequence
452  .  .
453  .  .
454  .SH "WHAT \eR MATCHES"  .SH "WHAT \eR MATCHES"
455  .rs  .rs
456  .sp  .sp
457  These are recognized only at the very start of a pattern.  These are recognized only at the very start of the pattern or after a
458    (*...) option that sets the newline convention or a UTF or UCP mode.
459  .sp  .sp
460    (*BSR_ANYCRLF)    (*BSR_ANYCRLF)  CR, LF, or CRLF
461    (*BSR_UNICODE)    (*BSR_UNICODE)  any Unicode newline sequence
462  .  .
463  .  .
464  .SH "CALLOUTS"  .SH "CALLOUTS"
# Line 416  Cambridge CB2 3QH, England. Line 489  Cambridge CB2 3QH, England.
489  .rs  .rs
490  .sp  .sp
491  .nf  .nf
492  Last updated: 11 September 2007  Last updated: 25 August 2012
493  Copyright (c) 1997-2007 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
494  .fi  .fi
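The escape table in this diff changes the description of \ecx (written \cx in a pattern, since \e renders as a backslash in troff) so that x must be an ASCII character. The man page does not give the mapping itself. The sketch below assumes the rule as we understand it from the PCRE 8.x pcrepattern documentation: a lower case letter is first converted to upper case, and then bit 6 (hex 40) is inverted. `control_char` is a hypothetical helper for illustration, not part of any PCRE API.

```python
def control_char(x: str) -> str:
    """Model of the \\cx escape for an ASCII character x (sketch only).

    Assumed rule: upper-case a lower case letter, then invert bit 6 (0x40).
    """
    # Mirror the "any ASCII character" restriction in the new table entry.
    if len(x) != 1 or ord(x) > 127:
        raise ValueError("\\c must be followed by a single ASCII character")
    code = ord(x.upper()) if "a" <= x <= "z" else ord(x)
    return chr(code ^ 0x40)


if __name__ == "__main__":
    for x in "azA[{;?":
        print(f"\\c{x} -> 0x{ord(control_char(x)):02X}")
```

Under this rule \ca and \cA both give hex 01, \c[ gives hex 1B (the same character as \ee), and \c{ and \c; map to each other (hex 3B and hex 7B).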

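The "PCRE SPECIAL CATEGORY PROPERTIES" section added in revision 1011 defines Xan, Xps, Xsp and Xwd in terms of general categories plus a few named control characters. The following minimal Python model covers only those four definitions. It assumes Python's `unicodedata` tables as a stand-in for the Unicode tables compiled into PCRE. The two may be different Unicode versions, so edge cases can differ. The `is_*` function names are hypothetical.

```python
import unicodedata

# Control characters named explicitly in the Xps and Xsp definitions.
POSIX_SPACE_EXTRAS = {"\t", "\n", "\x0b", "\x0c", "\r"}  # tab, NL, VT, FF, CR
PERL_SPACE_EXTRAS = {"\t", "\n", "\x0c", "\r"}           # tab, NL, FF, CR (no VT)


def major_category(ch: str) -> str:
    """Return the one-letter major general category, e.g. 'L' for 'Lu'."""
    return unicodedata.category(ch)[0]


def is_xan(ch: str) -> bool:
    """Xan (alphanumeric): union of properties L and N."""
    return major_category(ch) in ("L", "N")


def is_xps(ch: str) -> bool:
    """Xps (POSIX space): property Z or tab, NL, VT, FF, CR."""
    return major_category(ch) == "Z" or ch in POSIX_SPACE_EXTRAS


def is_xsp(ch: str) -> bool:
    """Xsp (Perl space): property Z or tab, NL, FF, CR."""
    return major_category(ch) == "Z" or ch in PERL_SPACE_EXTRAS


def is_xwd(ch: str) -> bool:
    """Xwd (Perl word): property Xan or underscore."""
    return is_xan(ch) or ch == "_"
```

The only difference between the two space properties shows up at VT (hex 0B): it is an Xps character but not an Xsp character. U+00A0 NO-BREAK SPACE satisfies both because its general category is Zs.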

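The "NEWLINE CONVENTIONS" table lists which sequences end a line under each (*...) setting. The (*BSR_ANYCRLF) and (*BSR_UNICODE) settings for \eR use the same two sets as (*ANYCRLF) and (*ANY). The Python sketch below is a hypothetical model, not PCRE code. For (*ANY) it assumes UTF mode, where the Unicode newline sequences are CR, LF, CRLF, VT, FF, NEL (U+0085), LS (U+2028) and PS (U+2029). The names `CONVENTIONS`, `newline_at` and `split_lines` are illustrative only.

```python
# Newline sequences recognized under each convention (assumed UTF mode for ANY).
# CRLF is listed first wherever it appears so that it is consumed as one
# two-character newline rather than as a CR followed by an LF.
CONVENTIONS = {
    "CR": ("\r",),
    "LF": ("\n",),
    "CRLF": ("\r\n",),
    "ANYCRLF": ("\r\n", "\r", "\n"),
    "ANY": ("\r\n", "\r", "\n", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"),
}


def newline_at(text: str, pos: int, convention: str) -> int:
    """Return the length of the newline sequence starting at text[pos], or 0."""
    for seq in CONVENTIONS[convention]:
        if text.startswith(seq, pos):
            return len(seq)
    return 0


def split_lines(text: str, convention: str) -> list[str]:
    """Split text into lines as delimited by the given newline convention."""
    lines, start, pos = [], 0, 0
    while pos < len(text):
        n = newline_at(text, pos, convention)
        if n:
            lines.append(text[start:pos])  # line content, without the newline
            pos += n
            start = pos
        else:
            pos += 1
    lines.append(text[start:])  # trailing text after the last newline
    return lines
```

For example, `split_lines("a\r\nb\rc\nd", "ANYCRLF")` yields four lines. Under (*CR), LF is not a line ending, so the same text splits into three pieces.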