Diff of /code/trunk/doc/html/pcresyntax.html

revision 517 by ph10, Wed Mar 10 16:08:01 2010 UTC
revision 518 by ph10, Tue May 18 15:47:01 2010 UTC
# Line 17  man page, in case the conversion went wr Line 17  man page, in case the conversion went wr
17  <li><a name="TOC2" href="#SEC2">QUOTING</a>  <li><a name="TOC2" href="#SEC2">QUOTING</a>
18  <li><a name="TOC3" href="#SEC3">CHARACTERS</a>  <li><a name="TOC3" href="#SEC3">CHARACTERS</a>
19  <li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>  <li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
20  <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>  <li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
21  <li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>  <li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
22  <li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>  <li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
23  <li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>  <li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
24  <li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>  <li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
25  <li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>  <li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
26  <li><a name="TOC11" href="#SEC11">ALTERNATION</a>  <li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
27  <li><a name="TOC12" href="#SEC12">CAPTURING</a>  <li><a name="TOC12" href="#SEC12">ALTERNATION</a>
28  <li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>  <li><a name="TOC13" href="#SEC13">CAPTURING</a>
29  <li><a name="TOC14" href="#SEC14">COMMENT</a>  <li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
30  <li><a name="TOC15" href="#SEC15">OPTION SETTING</a>  <li><a name="TOC15" href="#SEC15">COMMENT</a>
31  <li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>  <li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
32  <li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>  <li><a name="TOC17" href="#SEC17">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
33  <li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>  <li><a name="TOC18" href="#SEC18">BACKREFERENCES</a>
34  <li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>  <li><a name="TOC19" href="#SEC19">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
35  <li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>  <li><a name="TOC20" href="#SEC20">CONDITIONAL PATTERNS</a>
36  <li><a name="TOC21" href="#SEC21">NEWLINE CONVENTIONS</a>  <li><a name="TOC21" href="#SEC21">BACKTRACKING CONTROL</a>
37  <li><a name="TOC22" href="#SEC22">WHAT \R MATCHES</a>  <li><a name="TOC22" href="#SEC22">NEWLINE CONVENTIONS</a>
38  <li><a name="TOC23" href="#SEC23">CALLOUTS</a>  <li><a name="TOC23" href="#SEC23">WHAT \R MATCHES</a>
39  <li><a name="TOC24" href="#SEC24">SEE ALSO</a>  <li><a name="TOC24" href="#SEC24">CALLOUTS</a>
40  <li><a name="TOC25" href="#SEC25">AUTHOR</a>  <li><a name="TOC25" href="#SEC25">SEE ALSO</a>
41  <li><a name="TOC26" href="#SEC26">REVISION</a>  <li><a name="TOC26" href="#SEC26">AUTHOR</a>
42    <li><a name="TOC27" href="#SEC27">REVISION</a>
43  </ul>  </ul>
44  <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>  <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
45  <P>  <P>
# Line 80  syntax. Line 81  syntax.
81    \D         a character that is not a decimal digit    \D         a character that is not a decimal digit
82    \h         a horizontal whitespace character    \h         a horizontal whitespace character
83    \H         a character that is not a horizontal whitespace character    \H         a character that is not a horizontal whitespace character
84      \N         a character that is not a newline
85    \p{<i>xx</i>}     a character with the <i>xx</i> property    \p{<i>xx</i>}     a character with the <i>xx</i> property
86    \P{<i>xx</i>}     a character without the <i>xx</i> property    \P{<i>xx</i>}     a character without the <i>xx</i> property
87    \R         a newline sequence    \R         a newline sequence
# Line 93  syntax. Line 95  syntax.
95  </pre>  </pre>
96  In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.  In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
97  </P>  </P>
98  <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>  <br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
99  <P>  <P>
100  <pre>  <pre>
101    C          Other    C          Other
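
The context line in the hunk above says that \d, \D, \s, \S, \w, and \W recognize only ASCII characters. The sketch below illustrates what that restriction means in practice. It uses Python's `re` module with the `re.ASCII` flag as a stand-in, which is an assumption made for illustration: it is not PCRE, and the helper name `matches` is hypothetical.

```python
import re

# Illustration only: this is Python's re module, not PCRE.
# re.ASCII restricts \d, \s, and \w to ASCII, which mirrors the
# behaviour the man page describes for PCRE at this revision.
ARABIC_INDIC_THREE = "\u0663"  # a decimal digit outside ASCII
E_ACUTE = "\u00e9"             # a letter outside ASCII


def matches(pattern, text, ascii_only):
    """Return True if the whole of text matches pattern."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(pattern, text, flags) is not None


# With the ASCII restriction, non-ASCII digits and letters are rejected.
ascii_digit = matches(r"\d", ARABIC_INDIC_THREE, ascii_only=True)
ascii_word = matches(r"\w", E_ACUTE, ascii_only=True)

# Without it, Python falls back to Unicode-aware matching.
unicode_digit = matches(r"\d", ARABIC_INDIC_THREE, ascii_only=False)
unicode_word = matches(r"\w", E_ACUTE, ascii_only=False)
```

Under the ASCII restriction, \D and \W consequently match those same non-ASCII characters, since they are the complements of \d and \w.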
# Line 142  In PCRE, \d, \D, \s, \S, \w, and \W reco Line 144  In PCRE, \d, \D, \s, \S, \w, and \W reco
144    Zs         Space separator    Zs         Space separator
145  </PRE>  </PRE>
146  </P>  </P>
147  <br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>  <br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
148    <P>
149    <pre>
150      Xan        Alphanumeric: union of properties L and N
151      Xps        POSIX space: property Z or tab, NL, VT, FF, CR
152      Xsp        Perl space: property Z or tab, NL, FF, CR
153      Xwd        Perl word: property Xan or underscore
154    </PRE>
155    </P>
156    <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
157  <P>  <P>
158  Arabic,  Arabic,
159  Armenian,  Armenian,
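
The four special properties added in v.518 (Xan, Xps, Xsp, Xwd) are defined in terms of Unicode general categories plus a few control characters. The sketch below restates those definitions in Python using the standard `unicodedata` module. It is a reading of the table text only, not PCRE's implementation, and the function names (`is_xan`, `is_xps`, `is_xsp`, `is_xwd`) are hypothetical.

```python
import unicodedata

# Extra characters listed in the table: tab, NL, VT, FF, CR for Xps;
# the same set without VT for Xsp.
POSIX_SPACE_EXTRAS = {"\t", "\n", "\v", "\f", "\r"}
PERL_SPACE_EXTRAS = {"\t", "\n", "\f", "\r"}


def has_major_category(ch, majors):
    """True if the first letter of ch's general category is in majors."""
    return unicodedata.category(ch)[0] in majors


def is_xan(ch):
    """Xan: alphanumeric, the union of properties L and N."""
    return has_major_category(ch, "LN")


def is_xps(ch):
    """Xps: POSIX space, property Z or tab, NL, VT, FF, CR."""
    return has_major_category(ch, "Z") or ch in POSIX_SPACE_EXTRAS


def is_xsp(ch):
    """Xsp: Perl space, property Z or tab, NL, FF, CR (no VT)."""
    return has_major_category(ch, "Z") or ch in PERL_SPACE_EXTRAS


def is_xwd(ch):
    """Xwd: Perl word, property Xan or underscore."""
    return is_xan(ch) or ch == "_"
```

Written this way, the only difference between Xps and Xsp is the vertical tab, and the only difference between Xan and Xwd is the underscore.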
# Line 237  Ugaritic, Line 248  Ugaritic,
248  Vai,  Vai,
249  Yi.  Yi.
250  </P>  </P>
251  <br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>  <br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
252  <P>  <P>
253  <pre>  <pre>
254    [...]       positive character class    [...]       positive character class
# Line 264  Yi. Line 275  Yi.
275  In PCRE, POSIX character set names recognize only ASCII characters. You can use  In PCRE, POSIX character set names recognize only ASCII characters. You can use
276  \Q...\E inside a character class.  \Q...\E inside a character class.
277  </P>  </P>
278  <br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>  <br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
279  <P>  <P>
280  <pre>  <pre>
281    ?           0 or 1, greedy    ?           0 or 1, greedy
# Line 285  In PCRE, POSIX character set names recog Line 296  In PCRE, POSIX character set names recog
296    {n,}?       n or more, lazy    {n,}?       n or more, lazy
297  </PRE>  </PRE>
298  </P>  </P>
299  <br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>  <br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
300  <P>  <P>
301  <pre>  <pre>
302    \b          word boundary (only ASCII letters recognized)    \b          word boundary (only ASCII letters recognized)
# Line 302  In PCRE, POSIX character set names recog Line 313  In PCRE, POSIX character set names recog
313    \G          first matching position in subject    \G          first matching position in subject
314  </PRE>  </PRE>
315  </P>  </P>
316  <br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>  <br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
317  <P>  <P>
318  <pre>  <pre>
319    \K          reset start of match    \K          reset start of match
320  </PRE>  </PRE>
321  </P>  </P>
322  <br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>  <br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
323  <P>  <P>
324  <pre>  <pre>
325    expr|expr|expr...    expr|expr|expr...
326  </PRE>  </PRE>
327  </P>  </P>
328  <br><a name="SEC12" href="#TOC1">CAPTURING</a><br>  <br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
329  <P>  <P>
330  <pre>  <pre>
331    (...)           capturing group    (...)           capturing group
# Line 326  In PCRE, POSIX character set names recog Line 337  In PCRE, POSIX character set names recog
337                     capturing groups in each alternative                     capturing groups in each alternative
338  </PRE>  </PRE>
339  </P>  </P>
340  <br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>  <br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
341  <P>  <P>
342  <pre>  <pre>
343    (?&#62;...)         atomic, non-capturing group    (?&#62;...)         atomic, non-capturing group
344  </PRE>  </PRE>
345  </P>  </P>
346  <br><a name="SEC14" href="#TOC1">COMMENT</a><br>  <br><a name="SEC15" href="#TOC1">COMMENT</a><br>
347  <P>  <P>
348  <pre>  <pre>
349    (?#....)        comment (not nestable)    (?#....)        comment (not nestable)
350  </PRE>  </PRE>
351  </P>  </P>
352  <br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>  <br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
353  <P>  <P>
354  <pre>  <pre>
355    (?i)            caseless    (?i)            caseless
# Line 355  newline-setting options with similar syn Line 366  newline-setting options with similar syn
366    (*UTF8)         set UTF-8 mode    (*UTF8)         set UTF-8 mode
367  </PRE>  </PRE>
368  </P>  </P>
369  <br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>  <br><a name="SEC17" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
370  <P>  <P>
371  <pre>  <pre>
372    (?=...)         positive look ahead    (?=...)         positive look ahead
# Line 365  newline-setting options with similar syn Line 376  newline-setting options with similar syn
376  </pre>  </pre>
377  Each top-level branch of a look behind must be of a fixed length.  Each top-level branch of a look behind must be of a fixed length.
378  </P>  </P>
379  <br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>  <br><a name="SEC18" href="#TOC1">BACKREFERENCES</a><br>
380  <P>  <P>
381  <pre>  <pre>
382    \n              reference by number (can be ambiguous)    \n              reference by number (can be ambiguous)
# Line 379  Each top-level branch of a look behind m Line 390  Each top-level branch of a look behind m
390    (?P=name)       reference by name (Python)    (?P=name)       reference by name (Python)
391  </PRE>  </PRE>
392  </P>  </P>
393  <br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>  <br><a name="SEC19" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
394  <P>  <P>
395  <pre>  <pre>
396    (?R)            recurse whole pattern    (?R)            recurse whole pattern
# Line 398  Each top-level branch of a look behind m Line 409  Each top-level branch of a look behind m
409    \g'-n'          call subpattern by relative number (PCRE extension)    \g'-n'          call subpattern by relative number (PCRE extension)
410  </PRE>  </PRE>
411  </P>  </P>
412  <br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>  <br><a name="SEC20" href="#TOC1">CONDITIONAL PATTERNS</a><br>
413  <P>  <P>
414  <pre>  <pre>
415    (?(condition)yes-pattern)    (?(condition)yes-pattern)
# Line 417  Each top-level branch of a look behind m Line 428  Each top-level branch of a look behind m
428    (?(assert)...   assertion condition    (?(assert)...   assertion condition
429  </PRE>  </PRE>
430  </P>  </P>
431  <br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>  <br><a name="SEC21" href="#TOC1">BACKTRACKING CONTROL</a><br>
432  <P>  <P>
433  The following act immediately they are reached:  The following act immediately they are reached:
434  <pre>  <pre>
# Line 435  pattern is not anchored. Line 446  pattern is not anchored.
446    (*THEN)         local failure, backtrack to next alternation    (*THEN)         local failure, backtrack to next alternation
447  </PRE>  </PRE>
448  </P>  </P>
449  <br><a name="SEC21" href="#TOC1">NEWLINE CONVENTIONS</a><br>  <br><a name="SEC22" href="#TOC1">NEWLINE CONVENTIONS</a><br>
450  <P>  <P>
451  These are recognized only at the very start of the pattern or after a  These are recognized only at the very start of the pattern or after a
452  (*BSR_...) or (*UTF8) option.  (*BSR_...) or (*UTF8) option.
# Line 447  These are recognized only at the very st Line 458  These are recognized only at the very st
458    (*ANY)          any Unicode newline sequence    (*ANY)          any Unicode newline sequence
459  </PRE>  </PRE>
460  </P>  </P>
461  <br><a name="SEC22" href="#TOC1">WHAT \R MATCHES</a><br>  <br><a name="SEC23" href="#TOC1">WHAT \R MATCHES</a><br>
462  <P>  <P>
463  These are recognized only at the very start of the pattern or after a  These are recognized only at the very start of the pattern or after a
464  (*...) option that sets the newline convention or UTF-8 mode.  (*...) option that sets the newline convention or UTF-8 mode.
# Line 456  These are recognized only at the very st Line 467  These are recognized only at the very st
467    (*BSR_UNICODE)  any Unicode newline sequence    (*BSR_UNICODE)  any Unicode newline sequence
468  </PRE>  </PRE>
469  </P>  </P>
470  <br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>  <br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
471  <P>  <P>
472  <pre>  <pre>
473    (?C)      callout    (?C)      callout
474    (?Cn)     callout with data n    (?Cn)     callout with data n
475  </PRE>  </PRE>
476  </P>  </P>
477  <br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>  <br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
478  <P>  <P>
479  <b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),  <b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
480  <b>pcrematching</b>(3), <b>pcre</b>(3).  <b>pcrematching</b>(3), <b>pcre</b>(3).
481  </P>  </P>
482  <br><a name="SEC25" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
483  <P>  <P>
484  Philip Hazel  Philip Hazel
485  <br>  <br>
# Line 477  University Computing Service Line 488  University Computing Service
488  Cambridge CB2 3QH, England.  Cambridge CB2 3QH, England.
489  <br>  <br>
490  </P>  </P>
491  <br><a name="SEC26" href="#TOC1">REVISION</a><br>  <br><a name="SEC27" href="#TOC1">REVISION</a><br>
492  <P>  <P>
493  Last updated: 01 March 2010  Last updated: 05 May 2010
494  <br>  <br>
495  Copyright &copy; 1997-2010 University of Cambridge.  Copyright &copy; 1997-2010 University of Cambridge.
496  <br>  <br>
