Diff of /code/trunk/doc/pcrepattern.3

revision 251 by ph10, Mon Sep 17 10:33:48 2007 UTC  revision 394 by ph10, Wed Mar 18 16:38:23 2009 UTC
# Line 9  are described in detail below. There is Line 9  are described in detail below. There is
9  .\" HREF  .\" HREF
10  \fBpcresyntax\fP  \fBpcresyntax\fP
11  .\"  .\"
12  page. Perl's regular expressions are described in its own documentation, and  page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE
13    also supports some alternative regular expression syntax (which does not
14    conflict with the Perl syntax) in order to provide some compatibility with
15    regular expressions in Python, .NET, and Oniguruma.
16    .P
17    Perl's regular expressions are described in its own documentation, and
18  regular expressions in general are covered in a number of books, some of which  regular expressions in general are covered in a number of books, some of which
19  have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",  have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",
20  published by O'Reilly, covers regular expressions in great detail. This  published by O'Reilly, covers regular expressions in great detail. This
# Line 310  parenthesized subpatterns. Line 315  parenthesized subpatterns.
315  .\"  .\"
316  .  .
317  .  .
318    .SS "Absolute and relative subroutine calls"
319    .rs
320    .sp
321    For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or
322    a number enclosed either in angle brackets or single quotes, is an alternative
323    syntax for referencing a subpattern as a "subroutine". Details are discussed
324    .\" HTML <a href="#onigurumasubroutines">
325    .\" </a>
326    later.
327    .\"
328    Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
329    synonymous. The former is a back reference; the latter is a subroutine call.
330    .
331    .
332  .SS "Generic character types"  .SS "Generic character types"
333  .rs  .rs
334  .sp  .sp
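Note on the hunk above: the added text distinguishes a back reference (\g{...}) from a subroutine call (\g<...>). The sketch below illustrates that difference in Python, under a stated assumption: Python's re module has no subroutine-call syntax, so the call is emulated by repeating the group's pattern by hand. The names backref, subroutine_like, and matches are hypothetical and not part of PCRE.

```python
import re

# Back reference: \1 must repeat the exact text that group 1 captured.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Emulated subroutine call (assumption): Python's re cannot call a group,
# so the pattern of group 1 is written out again as a non-capturing group.
# A real PCRE subroutine call re-runs the group's pattern in the same way.
subroutine_like = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")


def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """Return True if the whole of text is matched by pattern."""
    return pattern.fullmatch(text) is not None
```

With these definitions the back-reference form accepts "sense and sensibility" but rejects "sense and responsibility", while the emulated subroutine form accepts both, because it re-matches the alternation instead of the captured text.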
# Line 345  In UTF-8 mode, characters with values gr Line 364  In UTF-8 mode, characters with values gr
364  \ew, and always match \eD, \eS, and \eW. This is true even when Unicode  \ew, and always match \eD, \eS, and \eW. This is true even when Unicode
365  character property support is available. These sequences retain their original  character property support is available. These sequences retain their original
366  meanings from before UTF-8 support was available, mainly for efficiency  meanings from before UTF-8 support was available, mainly for efficiency
367  reasons.  reasons. Note that this also affects \eb, because it is defined in terms of \ew
368    and \eW.
369  .P  .P
370  The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the  The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the
371  other sequences, these do match certain high-valued codepoints in UTF-8 mode.  other sequences, these do match certain high-valued codepoints in UTF-8 mode.
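Note on the hunk above: the added sentence says that \b inherits the behaviour of \w and \W. The following is only an analogy in Python, not PCRE itself: Python's re.ASCII flag is assumed here to stand in for a \w that never matches high-valued characters. The helper name ends_word_after is hypothetical.

```python
import re


def ends_word_after(prefix: str, text: str, ascii_only: bool) -> bool:
    """Return True if a word boundary directly follows prefix at the start of text.

    With ascii_only=True, \\w covers only [a-zA-Z0-9_], so a non-ASCII
    letter counts as a non-word character and \\b can match before it.
    """
    flags = re.ASCII if ascii_only else 0
    return re.match(re.escape(prefix) + r"\b", text, flags) is not None
```

For the text "caf\u00e9", a boundary is reported after "caf" when \w is ASCII-only, and none is reported when \w is Unicode-aware. This is the same kind of knock-on effect that the added sentence describes for \b.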
# Line 1043  has set or what has been defaulted. Deta Line 1063  has set or what has been defaulted. Deta
1063  .\" </a>  .\" </a>
1064  "Newline sequences"  "Newline sequences"
1065  .\"  .\"
1066  above.  above.
1067  .  .
1068  .  .
1069  .\" HTML <a name="subpattern"></a>  .\" HTML <a name="subpattern"></a>
# Line 1192  details of the interfaces for handling n Line 1212  details of the interfaces for handling n
1212  \fBpcreapi\fP  \fBpcreapi\fP
1213  .\"  .\"
1214  documentation.  documentation.
1215    .P
1216    \fBWarning:\fP You cannot use different names to distinguish between two
1217    subpatterns with the same number (see the previous section) because PCRE uses
1218    only the numbers when matching.
1219  .  .
1220  .  .
1221  .SH REPETITION  .SH REPETITION
# Line 1240  support is available, \eX{3} matches thr Line 1264  support is available, \eX{3} matches thr
1264  which may be several bytes long (and they may be of different lengths).  which may be several bytes long (and they may be of different lengths).
1265  .P  .P
1266  The quantifier {0} is permitted, causing the expression to behave as if the  The quantifier {0} is permitted, causing the expression to behave as if the
1267  previous item and the quantifier were not present.  previous item and the quantifier were not present. This may be useful for
1268    subpatterns that are referenced as
1269    .\" HTML <a href="#subpatternsassubroutines">
1270    .\" </a>
1271    subroutines
1272    .\"
1273    from elsewhere in the pattern. Items other than subpatterns that have a {0}
1274    quantifier are omitted from the compiled pattern.
1275  .P  .P
1276  For convenience, the three most common quantifiers have single-character  For convenience, the three most common quantifiers have single-character
1277  abbreviations:  abbreviations:
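Note on the {0} paragraph in the hunk above: the rule that an item with a {0} quantifier behaves as if it were absent can be checked with a small sketch. It uses Python's re module as a stand-in (an assumption; the subroutine use case in the added text cannot be shown this way, because re has no subroutine calls). The names same_matches and SAMPLES are hypothetical.

```python
import re

# Strings used to compare the two patterns (illustrative only).
SAMPLES = ["ac", "abc", "abbc", "a", "c", ""]


def same_matches(pattern_a: str, pattern_b: str, samples: list) -> bool:
    """Return True if both patterns accept exactly the same sample strings."""
    return all(
        (re.fullmatch(pattern_a, s) is None) == (re.fullmatch(pattern_b, s) is None)
        for s in samples
    )
```

Calling same_matches(r"ab{0}c", r"ac", SAMPLES) compares a pattern containing b{0} with the same pattern with that item removed.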
# Line 2023  It matches "abcabc". It does not match " Line 2054  It matches "abcabc". It does not match "
2054  processing option does not affect the called subpattern.  processing option does not affect the called subpattern.
2055  .  .
2056  .  .
2057    .\" HTML <a name="onigurumasubroutines"></a>
2058    .SH "ONIGURUMA SUBROUTINE SYNTAX"
2059    .rs
2060    .sp
2061    For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or
2062    a number enclosed either in angle brackets or single quotes, is an alternative
2063    syntax for referencing a subpattern as a subroutine, possibly recursively. Here
2064    are two of the examples used above, rewritten using this syntax:
2065    .sp
2066      (?<pn> \e( ( (?>[^()]+) | \eg<pn> )* \e) )
2067      (sens|respons)e and \eg'1'ibility
2068    .sp
2069    PCRE supports an extension to Oniguruma: if a number is preceded by a
2070    plus or a minus sign it is taken as a relative reference. For example:
2071    .sp
2072      (abc)(?i:\eg<-1>)
2073    .sp
2074    Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
2075    synonymous. The former is a back reference; the latter is a subroutine call.
2076    .
2077    .
2078  .SH CALLOUTS  .SH CALLOUTS
2079  .rs  .rs
2080  .sp  .sp
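Note on the Oniguruma hunk above: the added text says that a number preceded by a plus or minus sign is taken as a relative reference. The sketch below shows one way such a reference could be turned into an absolute group number. It assumes the counting rule used for relative references elsewhere in PCRE (-1 is the most recently opened capturing group, +1 is the next one to be opened). The function name resolve_relative and its parameters are hypothetical and are not PCRE's implementation.

```python
def resolve_relative(ref: str, groups_opened: int) -> int:
    """Turn a reference such as "-1", "+2" or "3" into an absolute group number.

    groups_opened is the number of capturing parentheses opened before the
    point in the pattern where the reference appears.
    """
    if ref[:1] in ("-", "+"):
        offset = int(ref[1:])
        if offset == 0:
            raise ValueError("a relative reference must be non-zero")
        if ref[0] == "-":
            # -1 is the most recently opened group, -2 the one before it.
            absolute = groups_opened - offset + 1
        else:
            # +1 is the next group to be opened after the reference.
            absolute = groups_opened + offset
    else:
        absolute = int(ref)
    if absolute < 1:
        raise ValueError("reference points before the first group")
    return absolute
```

In (abc)(?i:\g<-1>) one capturing group has been opened before the reference, since (?i: does not capture, so resolve_relative("-1", 1) gives group 1.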
# Line 2068  or removal in a future version of Perl". Line 2120  or removal in a future version of Perl".
2120  production code should be noted to avoid problems during upgrades." The same  production code should be noted to avoid problems during upgrades." The same
2121  remarks apply to the PCRE features described in this section.  remarks apply to the PCRE features described in this section.
2122  .P  .P
2123  Since these verbs are specifically related to backtracking, they can be used  Since these verbs are specifically related to backtracking, most of them can be
2124  only when the pattern is to be matched using \fBpcre_exec()\fP, which uses a  used only when the pattern is to be matched using \fBpcre_exec()\fP, which uses
2125  backtracking algorithm. They cause an error if encountered by  a backtracking algorithm. With the exception of (*FAIL), which behaves like a
2126    failing negative assertion, they cause an error if encountered by
2127  \fBpcre_dfa_exec()\fP.  \fBpcre_dfa_exec()\fP.
2128  .P  .P
2129  The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
# Line 2192  Cambridge CB2 3QH, England. Line 2245  Cambridge CB2 3QH, England.
2245  .rs  .rs
2246  .sp  .sp
2247  .nf  .nf
2248  Last updated: 17 September 2007  Last updated: 18 March 2009
2249  Copyright (c) 1997-2007 University of Cambridge.  Copyright (c) 1997-2009 University of Cambridge.
2250  .fi  .fi
