Diff of /code/trunk/doc/pcrepattern.3

revision 385 by ph10, Sun Mar 8 16:56:58 2009 UTC | revision 447 by ph10, Tue Sep 15 18:17:54 2009 UTC
# Line 23  description of PCRE's regular expression Line 23  description of PCRE's regular expression
23  The original operation of PCRE was on strings of one-byte characters. However,  The original operation of PCRE was on strings of one-byte characters. However,
24  there is now also support for UTF-8 character strings. To use this, you must  there is now also support for UTF-8 character strings. To use this, you must
25  build PCRE to include UTF-8 support, and then call \fBpcre_compile()\fP with  build PCRE to include UTF-8 support, and then call \fBpcre_compile()\fP with
26  the PCRE_UTF8 option. How this affects pattern matching is mentioned in several  the PCRE_UTF8 option. There is also a special sequence that can be given at the
27  places below. There is also a summary of UTF-8 features in the  start of a pattern:
28    .sp
29      (*UTF8)
30    .sp
31    Starting a pattern with this sequence is equivalent to setting the PCRE_UTF8
32    option. This feature is not Perl-compatible. How setting UTF-8 mode affects
33    pattern matching is mentioned in several places below. There is also a summary
34    of UTF-8 features in the
35  .\" HTML <a href="pcre.html#utf8support">  .\" HTML <a href="pcre.html#utf8support">
36  .\" </a>  .\" </a>
37  section on UTF-8 support  section on UTF-8 support
# Line 364  In UTF-8 mode, characters with values gr Line 371  In UTF-8 mode, characters with values gr
371  \ew, and always match \eD, \eS, and \eW. This is true even when Unicode  \ew, and always match \eD, \eS, and \eW. This is true even when Unicode
372  character property support is available. These sequences retain their original  character property support is available. These sequences retain their original
373  meanings from before UTF-8 support was available, mainly for efficiency  meanings from before UTF-8 support was available, mainly for efficiency
374  reasons.  reasons. Note that this also affects \eb, because it is defined in terms of \ew
375    and \eW.
376  .P  .P
377  The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the  The sequences \eh, \eH, \ev, and \eV are Perl 5.10 features. In contrast to the
378  other sequences, these do match certain high-valued codepoints in UTF-8 mode.  other sequences, these do match certain high-valued codepoints in UTF-8 mode.
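
The note about \eb added in revision 447 follows from its definition: a word boundary is a position with a \ew character on one side and a \eW character (or the end of the subject) on the other, so any character that \ew refuses to match becomes a boundary. The sketch below does not use PCRE. As a rough analogy only, it assumes that Python's standard re module with the re.ASCII flag is an acceptable stand-in, since that flag likewise restricts \w and \b to ASCII characters.

```python
import re

# "caf" followed by U+00E9 (e with acute accent), a character above 128.
text = "caf\u00e9"

# ASCII-only \w: the accented letter is a non-word character, so a word
# boundary exists between "f" and the accented letter and "caf" matches.
ascii_match = re.search(r"\bcaf\b", text, re.ASCII)

# Unicode-aware \w (Python's default for str patterns): the accented letter
# is a word character, so there is no boundary after "f" and nothing matches.
unicode_match = re.search(r"\bcaf\b", text)

print(ascii_match is not None)
print(unicode_match is None)
```

The first search corresponds to the behaviour the man page describes for high-valued characters; the second shows what a property-aware \ew would do instead.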
# Line 1031  The PCRE-specific options PCRE_DUPNAMES, Line 1039  The PCRE-specific options PCRE_DUPNAMES,
1039  changed in the same way as the Perl-compatible options by using the characters  changed in the same way as the Perl-compatible options by using the characters
1040  J, U and X respectively.  J, U and X respectively.
1041  .P  .P
1042  When an option change occurs at top level (that is, not inside subpattern  When one of these option changes occurs at top level (that is, not inside
1043  parentheses), the change applies to the remainder of the pattern that follows.  subpattern parentheses), the change applies to the remainder of the pattern
1044  If the change is placed right at the start of a pattern, PCRE extracts it into  that follows. If the change is placed right at the start of a pattern, PCRE
1045  the global options (and it will therefore show up in data extracted by the  extracts it into the global options (and it will therefore show up in data
1046  \fBpcre_fullinfo()\fP function).  extracted by the \fBpcre_fullinfo()\fP function).
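
The extraction of a leading option setting into the global options can be pictured with a small sketch. This is an analogy rather than PCRE code: it assumes Python's standard re module, where the compiled pattern's flags attribute plays a role loosely comparable to the options reported by \fBpcre_fullinfo()\fP. The PCRE-specific letters J, U and X have no counterpart here, so the Perl-compatible (?i) is used.

```python
import re

# The same pattern compiled with and without a leading inline option.
leading = re.compile(r"(?i)abc")
plain = re.compile(r"abc")

# The leading (?i) is folded into the compiled pattern's global flags.
has_flag_leading = bool(leading.flags & re.IGNORECASE)
has_flag_plain = bool(plain.flags & re.IGNORECASE)

# The option applies to the whole remainder of the pattern.
matches_upper = leading.match("ABC") is not None

print(has_flag_leading, has_flag_plain, matches_upper)
```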
1047  .P  .P
1048  An option change within a subpattern (see below for a description of  An option change within a subpattern (see below for a description of
1049  subpatterns) affects only that part of the current pattern that follows it, so  subpatterns) affects only that part of the current pattern that follows it, so
# Line 1056  behaviour otherwise. Line 1064  behaviour otherwise.
1064  .P  .P
1065  \fBNote:\fP There are other PCRE-specific options that can be set by the  \fBNote:\fP There are other PCRE-specific options that can be set by the
1066  application when the compile or match functions are called. In some cases the  application when the compile or match functions are called. In some cases the
1067  pattern can contain special leading sequences to override what the application  pattern can contain special leading sequences such as (*CRLF) to override what
1068  has set or what has been defaulted. Details are given in the section entitled  the application has set or what has been defaulted. Details are given in the
1069    section entitled
1070  .\" HTML <a href="#newlineseq">  .\" HTML <a href="#newlineseq">
1071  .\" </a>  .\" </a>
1072  "Newline sequences"  "Newline sequences"
1073  .\"  .\"
1074  above.  above. There is also the (*UTF8) leading sequence that can be used to set UTF-8
1075    mode; this is equivalent to setting the PCRE_UTF8 option.
1076  .  .
1077  .  .
1078  .\" HTML <a name="subpattern"></a>  .\" HTML <a name="subpattern"></a>
# Line 2125  a backtracking algorithm. With the excep Line 2135  a backtracking algorithm. With the excep
2135  failing negative assertion, they cause an error if encountered by  failing negative assertion, they cause an error if encountered by
2136  \fBpcre_dfa_exec()\fP.  \fBpcre_dfa_exec()\fP.
2137  .P  .P
2138    If any of these verbs are used in an assertion subpattern, their effect is
2139    confined to that subpattern; it does not extend to the surrounding pattern.
2140    Note that assertion subpatterns are processed as anchored at the point where
2141    they are tested.
2142    .P
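
The remark that assertion subpatterns are processed as anchored can be illustrated without the backtracking verbs themselves, which Python does not support. The following is a minimal sketch, assuming Python's standard re module as a stand-in: a lookahead is tried only at the position where it is tested and does not scan forward through the subject.

```python
import re

subject = "xab"

# The lookahead is tried only at the position right after "x". The next
# character there is "a", so the assertion fails and the search finds nothing,
# even though a "b" occurs later in the subject.
anchored = re.search(r"x(?=b)", subject)

# Scanning has to be requested explicitly inside the assertion.
scanning = re.search(r"x(?=.*b)", subject)

print(anchored is None)
print(scanning is not None)
```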
2143  The new verbs make use of what was previously invalid syntax: an opening  The new verbs make use of what was previously invalid syntax: an opening
2144  parenthesis followed by an asterisk. In Perl, they are generally of the form  parenthesis followed by an asterisk. In Perl, they are generally of the form
2145  (*VERB:ARG) but PCRE does not support the use of arguments, so its general  (*VERB:ARG) but PCRE does not support the use of arguments, so its general
# Line 2140  The following verbs act as soon as they Line 2155  The following verbs act as soon as they
2155  .sp  .sp
2156  This verb causes the match to end successfully, skipping the remainder of the  This verb causes the match to end successfully, skipping the remainder of the
2157  pattern. When inside a recursion, only the innermost pattern is ended  pattern. When inside a recursion, only the innermost pattern is ended
2158  immediately. PCRE differs from Perl in what happens if the (*ACCEPT) is inside  immediately. If the (*ACCEPT) is inside capturing parentheses, the data so far
2159  capturing parentheses. In Perl, the data so far is captured: in PCRE no data is  is captured. (This feature was added to PCRE at release 8.00.) For example:
 captured. For example:  
2160  .sp  .sp
2161    A(A|B(*ACCEPT)|C)D    A((?:A|B(*ACCEPT)|C)D)
2162  .sp  .sp
2163  This matches "AB", "AAD", or "ACD", but when it matches "AB", no data is  This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by
2164  captured.  the outer parentheses.
2165  .sp  .sp
2166    (*FAIL) or (*F)    (*FAIL) or (*F)
2167  .sp  .sp
# Line 2244  Cambridge CB2 3QH, England. Line 2258  Cambridge CB2 3QH, England.
2258  .rs  .rs
2259  .sp  .sp
2260  .nf  .nf
2261  Last updated: 08 March 2009  Last updated: 15 September 2009
2262  Copyright (c) 1997-2009 University of Cambridge.  Copyright (c) 1997-2009 University of Cambridge.
2263  .fi  .fi
