Diff of /code/trunk/doc/pcrepattern.3

revision 235 by ph10, Tue Sep 11 11:39:14 2007 UTC revision 385 by ph10, Sun Mar 8 16:56:58 2009 UTC
# Line 9  are described in detail below. There is Line 9  are described in detail below. There is
9  .\" HREF  .\" HREF
10  \fBpcresyntax\fP  \fBpcresyntax\fP
11  .\"  .\"
12  page. Perl's regular expressions are described in its own documentation, and  page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE
13    also supports some alternative regular expression syntax (which does not
14    conflict with the Perl syntax) in order to provide some compatibility with
15    regular expressions in Python, .NET, and Oniguruma.
16    .P
17    Perl's regular expressions are described in its own documentation, and
18  regular expressions in general are covered in a number of books, some of which  regular expressions in general are covered in a number of books, some of which
19  have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",  have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",
20  published by O'Reilly, covers regular expressions in great detail. This  published by O'Reilly, covers regular expressions in great detail. This
# Line 89  this can be changed; see the description Line 94  this can be changed; see the description
94  .\" </a>  .\" </a>
95  "Newline sequences"  "Newline sequences"
96  .\"  .\"
97  below.  below. A change of \eR setting can be combined with a change of newline
98    convention.
99  .  .
100  .  .
101  .SH "CHARACTERS AND METACHARACTERS"  .SH "CHARACTERS AND METACHARACTERS"
# Line 309  parenthesized subpatterns. Line 315  parenthesized subpatterns.
315  .\"  .\"
316  .  .
317  .  .
318    .SS "Absolute and relative subroutine calls"
319    .rs
320    .sp
321    For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or
322    a number enclosed either in angle brackets or single quotes, is an alternative
323    syntax for referencing a subpattern as a "subroutine". Details are discussed
324    .\" HTML <a href="#onigurumasubroutines">
325    .\" </a>
326    later.
327    .\"
328    Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
329    synonymous. The former is a back reference; the latter is a subroutine call.
330    .
331    .
332  .SS "Generic character types"  .SS "Generic character types"
333  .rs  .rs
334  .sp  .sp
# Line 426  recognized. Line 446  recognized.
446  .P  .P
447  It is possible to restrict \eR to match only CR, LF, or CRLF (instead of the  It is possible to restrict \eR to match only CR, LF, or CRLF (instead of the
448  complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF  complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF
449  either at compile time or when the pattern is matched. This can be made the  either at compile time or when the pattern is matched. (BSR is an abbreviation
450  default when PCRE is built; if this is the case, the other behaviour can be  for "backslash R".) This can be made the default when PCRE is built; if this is
451  requested via the PCRE_BSR_UNICODE option. It is also possible to specify these  the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option.
452  settings by starting a pattern string with one of the following sequences:  It is also possible to specify these settings by starting a pattern string with
453    one of the following sequences:
454  .sp  .sp
455    (*BSR_ANYCRLF)   CR, LF, or CRLF only    (*BSR_ANYCRLF)   CR, LF, or CRLF only
456    (*BSR_UNICODE)   any Unicode newline sequence    (*BSR_UNICODE)   any Unicode newline sequence
# Line 438  These override the default and the optio Line 459  These override the default and the optio
459  they can be overridden by options given to \fBpcre_exec()\fP. Note that these  they can be overridden by options given to \fBpcre_exec()\fP. Note that these
460  special settings, which are not Perl-compatible, are recognized only at the  special settings, which are not Perl-compatible, are recognized only at the
461  very start of a pattern, and that they must be in upper case. If more than one  very start of a pattern, and that they must be in upper case. If more than one
462  of them is present, the last one is used.  of them is present, the last one is used. They can be combined with a change of
463  .P  newline convention, for example, a pattern can start with:
464    .sp
465      (*ANY)(*BSR_ANYCRLF)
466    .sp
467  Inside a character class, \eR matches the letter "R".  Inside a character class, \eR matches the letter "R".
468  .  .
469  .  .
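
To make the difference between the two \eR settings concrete, here is a minimal sketch in Python. It is only an emulation: Python's standard re module has no \eR escape and does not recognize (*BSR_ANYCRLF) or (*BSR_UNICODE), so the two alternations below are hand-written approximations, and the names BSR_ANYCRLF, BSR_UNICODE, and split_lines are hypothetical. PCRE's real \eR is also atomic, which this sketch does not reproduce.

```python
import re

# Hypothetical stand-ins for the two \R settings. Python's re has no \R,
# so each setting is approximated by an explicit alternation.
# CRLF is listed first so that it is consumed as one unit when possible.
BSR_ANYCRLF = r"(?:\r\n|\r|\n)"
BSR_UNICODE = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"


def split_lines(text, newline_pattern):
    """Split text at every match of the given newline pattern."""
    return re.split(newline_pattern, text)


sample = "a\r\nb\nc\rd\u2028e"
anycrlf_parts = split_lines(sample, BSR_ANYCRLF)   # U+2028 is not a line ending here
unicode_parts = split_lines(sample, BSR_UNICODE)   # U+2028 is a line ending here
```

Under the restricted setting the Unicode line separator stays inside the last field; under the Unicode setting it splits the text like any other line ending.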
# Line 1029  matches "ab", "aB", "c", and "C", even t Line 1053  matches "ab", "aB", "c", and "C", even t
1053  branch is abandoned before the option setting. This is because the effects of  branch is abandoned before the option setting. This is because the effects of
1054  option settings happen at compile time. There would be some very weird  option settings happen at compile time. There would be some very weird
1055  behaviour otherwise.  behaviour otherwise.
1056    .P
1057    \fBNote:\fP There are other PCRE-specific options that can be set by the
1058    application when the compile or match functions are called. In some cases the
1059    pattern can contain special leading sequences to override what the application
1060    has set or what has been defaulted. Details are given in the section entitled
1061    .\" HTML <a href="#newlineseq">
1062    .\" </a>
1063    "Newline sequences"
1064    .\"
1065    above.
1066  .  .
1067  .  .
1068  .\" HTML <a name="subpattern"></a>  .\" HTML <a name="subpattern"></a>
# Line 1177  details of the interfaces for handling n Line 1211  details of the interfaces for handling n
1211  \fBpcreapi\fP  \fBpcreapi\fP
1212  .\"  .\"
1213  documentation.  documentation.
1214    .P
1215    \fBWarning:\fP You cannot use different names to distinguish between two
1216    subpatterns with the same number (see the previous section) because PCRE uses
1217    only the numbers when matching.
1218  .  .
1219  .  .
1220  .SH REPETITION  .SH REPETITION
# Line 1225  support is available, \eX{3} matches thr Line 1263  support is available, \eX{3} matches thr
1263  which may be several bytes long (and they may be of different lengths).  which may be several bytes long (and they may be of different lengths).
1264  .P  .P
1265  The quantifier {0} is permitted, causing the expression to behave as if the  The quantifier {0} is permitted, causing the expression to behave as if the
1266  previous item and the quantifier were not present.  previous item and the quantifier were not present. This may be useful for
1267    subpatterns that are referenced as
1268    .\" HTML <a href="#subpatternsassubroutines">
1269    .\" </a>
1270    subroutines
1271    .\"
1272    from elsewhere in the pattern. Items other than subpatterns that have a {0}
1273    quantifier are omitted from the compiled pattern.
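
The effect of a {0} quantifier can be illustrated with a short sketch. It uses Python's standard re module, which accepts the same quantifier, rather than PCRE itself, so it shows only the "as if not present" behaviour. The subroutine use mentioned above is PCRE-specific and is not shown. The names pattern and matches are hypothetical.

```python
import re

# (bc){0} contributes nothing to the match, so this behaves like "ad".
pattern = re.compile(r"a(bc){0}d")


def matches(subject):
    """Return True if the whole subject string matches the pattern."""
    return pattern.fullmatch(subject) is not None


ad_matches = matches("ad")      # the quantified group is skipped
abcd_matches = matches("abcd")  # "bc" is never allowed to appear
```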
1274  .P  .P
1275  For convenience, the three most common quantifiers have single-character  For convenience, the three most common quantifiers have single-character
1276  abbreviations:  abbreviations:
# Line 2008  It matches "abcabc". It does not match " Line 2053  It matches "abcabc". It does not match "
2053  processing option does not affect the called subpattern.  processing option does not affect the called subpattern.
2054  .  .
2055  .  .
2056    .\" HTML <a name="onigurumasubroutines"></a>
2057    .SH "ONIGURUMA SUBROUTINE SYNTAX"
2058    .rs
2059    .sp
2060    For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or
2061    a number enclosed either in angle brackets or single quotes, is an alternative
2062    syntax for referencing a subpattern as a subroutine, possibly recursively. Here
2063    are two of the examples used above, rewritten using this syntax:
2064    .sp
2065      (?<pn> \e( ( (?>[^()]+) | \eg<pn> )* \e) )
2066      (sens|respons)e and \eg'1'ibility
2067    .sp
2068    PCRE supports an extension to Oniguruma: if a number is preceded by a
2069    plus or a minus sign it is taken as a relative reference. For example:
2070    .sp
2071      (abc)(?i:\eg<-1>)
2072    .sp
2073    Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP
2074    synonymous. The former is a back reference; the latter is a subroutine call.
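
The distinction between a back reference and a subroutine call can be sketched as follows. Python's standard re module supports back references but has no subroutine-call syntax, so the subroutine case is emulated here by writing the group's pattern out a second time. This is an assumption-laden illustration of the semantics, not PCRE's own behaviour, and the names backref and subroutine_like are hypothetical.

```python
import re

# Back reference: \1 must repeat the exact text that group 1 captured.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Subroutine call, emulated: the group's pattern is re-applied, so the
# second occurrence may match a different alternative from the first.
subroutine_like = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")

mixed = "sense and responsibility"
backref_matches_mixed = backref.fullmatch(mixed) is not None
subroutine_matches_mixed = subroutine_like.fullmatch(mixed) is not None
```

Only the subroutine-style pattern accepts the mixed phrase, because it re-runs the subpattern instead of requiring the previously captured text.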
2075    .
2076    .
2077  .SH CALLOUTS  .SH CALLOUTS
2078  .rs  .rs
2079  .sp  .sp
# Line 2053  or removal in a future version of Perl". Line 2119  or removal in a future version of Perl".
2119  production code should be noted to avoid problems during upgrades." The same  production code should be noted to avoid problems during upgrades." The same
2120  remarks apply to the PCRE features described in this section.  remarks apply to the PCRE features described in this section.
2121  .P  .P
2122  Since these verbs are specifically related to backtracking, they can be used  Since these verbs are specifically related to backtracking, most of them can be
2123  only when the pattern is to be matched using \fBpcre_exec()\fP, which uses a  used only when the pattern is to be matched using \fBpcre_exec()\fP, which uses
2124  backtracking algorithm. They cause an error if encountered by  a backtracking algorithm. With the exception of (*FAIL), which behaves like a
2125    failing negative assertion, they cause an error if encountered by
2126  \fBpcre_dfa_exec()\fP.  \fBpcre_dfa_exec()\fP.
2127  .P  .P
2128  The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
# Line 2177  Cambridge CB2 3QH, England. Line 2244  Cambridge CB2 3QH, England.
2244  .rs  .rs
2245  .sp  .sp
2246  .nf  .nf
2247  Last updated: 11 September 2007  Last updated: 08 March 2009
2248  Copyright (c) 1997-2007 University of Cambridge.  Copyright (c) 1997-2009 University of Cambridge.
2249  .fi  .fi
