
Diff of /code/trunk/doc/pcrepattern.3

revision 572 by ph10, Wed Nov 17 17:55:57 2010 UTC revision 576 by ph10, Sun Nov 21 18:45:10 2010 UTC
# Line 52  such as \ed and \ew to use Unicode prope Line 52  such as \ed and \ew to use Unicode prope
52  instead of recognizing only characters with codes less than 128 via a lookup  instead of recognizing only characters with codes less than 128 via a lookup
53  table.  table.
54  .P  .P
55    If a pattern starts with (*NO_START_OPT), it has the same effect as setting the
56    PCRE_NO_START_OPTIMIZE option either at compile or matching time. There are
57    also some more of these special sequences that are concerned with the handling
58    of newlines; they are described below.
59    .P
60  The remainder of this document discusses the patterns that are supported by  The remainder of this document discusses the patterns that are supported by
61  PCRE when its main matching function, \fBpcre_exec()\fP, is used.  PCRE when its main matching function, \fBpcre_exec()\fP, is used.
62  From release 6.0, PCRE offers a second matching function,  From release 6.0, PCRE offers a second matching function,
# Line 182  The following sections describe the use Line 187  The following sections describe the use
187  .rs  .rs
188  .sp  .sp
189  The backslash character has several uses. Firstly, if it is followed by a  The backslash character has several uses. Firstly, if it is followed by a
190  non-alphanumeric character, it takes away any special meaning that character  character that is not a number or a letter, it takes away any special meaning
191  may have. This use of backslash as an escape character applies both inside and  that character may have. This use of backslash as an escape character applies
192  outside character classes.  both inside and outside character classes.
193  .P  .P
194  For example, if you want to match a * character, you write \e* in the pattern.  For example, if you want to match a * character, you write \e* in the pattern.
195  This escaping action applies whether or not the following character would  This escaping action applies whether or not the following character would
# Line 192  otherwise be interpreted as a metacharac Line 197  otherwise be interpreted as a metacharac
197  non-alphanumeric with backslash to specify that it stands for itself. In  non-alphanumeric with backslash to specify that it stands for itself. In
198  particular, if you want to match a backslash, you write \e\e.  particular, if you want to match a backslash, you write \e\e.
199  .P  .P
200    In UTF-8 mode, only ASCII numbers and letters have any special meaning after a
201    backslash. All other characters (in particular, those whose codepoints are
202    greater than 127) are treated as literals.
203    .P
204  If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the  If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the
205  pattern (other than in a character class) and characters between a # outside  pattern (other than in a character class) and characters between a # outside
206  a character class and the next newline are ignored. An escaping backslash can  a character class and the next newline are ignored. An escaping backslash can
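Note on the hunk above: the rule that a backslash before any non-alphanumeric character makes it stand for itself, together with the new statement that characters above 127 are literals, suggests a simple way to quote arbitrary text for use in a pattern. The following is a minimal sketch under those assumptions. `quote_literal` is a hypothetical helper, not a PCRE function, and it assumes the default "C" locale for `isalnum()`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of the PCRE API).
 * Copies src to dst, putting a backslash in front of every ASCII
 * character that is not a letter or a digit. Bytes above 127 are
 * copied unchanged, so multi-byte UTF-8 sequences are not split.
 * Returns the length written (excluding the terminating NUL), or -1
 * if dst_size is too small. */
int quote_literal(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int escape = c < 128 && !isalnum(c);

        /* Room needed: this character, an optional backslash, and the NUL. */
        if (n + (escape ? 2u : 1u) + 1u > dst_size)
            return -1;
        if (escape)
            dst[n++] = '\\';
        dst[n++] = (char)c;
    }
    if (n + 1u > dst_size)
        return -1;
    dst[n] = '\0';
    return (int)n;
}
```

With this sketch, the text `a*b` becomes `a\*b`. Because whitespace and `#` are escaped as well, the result should also remain literal when PCRE_EXTENDED is set.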
# Line 225  but when a pattern is being prepared by Line 234  but when a pattern is being prepared by
234  one of the following escape sequences than the binary character it represents:  one of the following escape sequences than the binary character it represents:
235  .sp  .sp
236    \ea        alarm, that is, the BEL character (hex 07)    \ea        alarm, that is, the BEL character (hex 07)
237    \ecx       "control-x", where x is any character    \ecx       "control-x", where x is any ASCII character
238    \ee        escape (hex 1B)    \ee        escape (hex 1B)
239    \ef        formfeed (hex 0C)    \ef        formfeed (hex 0C)
240    \en        linefeed (hex 0A)    \en        linefeed (hex 0A)
# Line 237  one of the following escape sequences th Line 246  one of the following escape sequences th
246  .sp  .sp
247  The precise effect of \ecx is as follows: if x is a lower case letter, it  The precise effect of \ecx is as follows: if x is a lower case letter, it
248  is converted to upper case. Then bit 6 of the character (hex 40) is inverted.  is converted to upper case. Then bit 6 of the character (hex 40) is inverted.
249  Thus \ecz becomes hex 1A, but \ec{ becomes hex 3B, while \ec; becomes hex  Thus \ecz becomes hex 1A (z is 7A), but \ec{ becomes hex 3B ({ is 7B), while
250  7B.  \ec; becomes hex 7B (; is 3B). If the byte following \ec has a value greater
251    than 127, a compile-time error occurs. This locks out non-ASCII characters in
252    both byte mode and UTF-8 mode. (When PCRE is compiled in EBCDIC mode, all byte
253    values are valid. A lower case letter is converted to upper case, and then the
254    0xc0 bits are flipped.)
255  .P  .P
256  After \ex, from zero to two hexadecimal digits are read (letters can be in  After \ex, from zero to two hexadecimal digits are read (letters can be in
257  upper or lower case). Any number of hexadecimal digits may appear between \ex{  upper or lower case). Any number of hexadecimal digits may appear between \ex{
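Note on the hunk above: the \ecx rule described in revision 576 (fold lower case to upper case, then invert the hex 40 bit, and reject bytes above 127) can be sketched in a few lines. This is an illustration of the documented ASCII-mode behaviour only, not PCRE's own code. The function names are hypothetical, and the EBCDIC variant is not modelled.

```c
#include <assert.h>

/* Hypothetical illustration of the documented \cx rule in ASCII mode.
 * Returns the character value that \cx denotes, or -1 where the text
 * says a compile-time error occurs (byte value greater than 127). */
int control_escape_value(unsigned char x)
{
    if (x > 127)
        return -1;
    if (x >= 'a' && x <= 'z')
        x = (unsigned char)(x - 'a' + 'A');  /* lower case to upper case */
    return x ^ 0x40;                         /* invert bit 6 (hex 40) */
}

/* The three examples given in the text. */
void check_control_escape_examples(void)
{
    assert(control_escape_value('z') == 0x1A);  /* z is 7A, Z is 5A */
    assert(control_escape_value('{') == 0x3B);  /* { is 7B */
    assert(control_escape_value(';') == 0x7B);  /* ; is 3B */
}
```

Because the operation is an XOR, it is its own inverse for non-letters: `{` maps to `;` and `;` maps back to `{`, which matches the pair of examples in the text.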
# Line 1044  characters in both cases. In UTF-8 mode, Line 1057  characters in both cases. In UTF-8 mode,
1057  characters with values greater than 128 only when it is compiled with Unicode  characters with values greater than 128 only when it is compiled with Unicode
1058  property support.  property support.
1059  .P  .P
1060  The character types \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev, \eV, \ew, and  The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev,
1061  \eW may also appear in a character class, and add the characters that they  \eV, \ew, and \eW may appear in a character class, and add the characters that
1062  match to the class. For example, [\edABCDEF] matches any hexadecimal digit. A  they match to the class. For example, [\edABCDEF] matches any hexadecimal
1063  circumflex can conveniently be used with the upper case character types to  digit. In UTF-8 mode, the PCRE_UCP option affects the meanings of \ed, \es, \ew
1064    and their upper case partners, just as it does when they appear outside a
1065    character class, as described in the section entitled
1066    .\" HTML <a href="#genericchartypes">
1067    .\" </a>
1068    "Generic character types"
1069    .\"
1070    above. The escape sequence \eb has a different meaning inside a character
1071    class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX
1072    are not special inside a character class. Like any other unrecognized escape
1073    sequences, they are treated as the literal characters "B", "N", "R", and "X" by
1074    default, but cause an error if the PCRE_EXTRA option is set.
1075    .P
1076    A circumflex can conveniently be used with the upper case character types to
1077  specify a more restricted set of characters than the matching lower case type.  specify a more restricted set of characters than the matching lower case type.
1078  For example, the class [^\eW_] matches any letter or digit, but not underscore.  For example, the class [^\eW_] matches any letter or digit, but not underscore,
1079    whereas [\ew] includes underscore. A positive character class should be read as
1080    "something OR something OR ..." and a negative class as "NOT something AND NOT
1081    something AND NOT ...".
1082  .P  .P
1083  The only metacharacters that are recognized in character classes are backslash,  The only metacharacters that are recognized in character classes are backslash,
1084  hyphen (only where it can be interpreted as specifying a range), circumflex  hyphen (only where it can be interpreted as specifying a range), circumflex
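Note on the hunk above: the added sentence about reading a positive class as "something OR something" and a negative class as "NOT something AND NOT something" can be made concrete with plain predicates. The sketch below assumes ASCII input, the default "C" locale, and no PCRE_UCP, so that \ew means letter, digit, or underscore. All function names are hypothetical, and matching is modelled as case-sensitive.

```c
#include <assert.h>
#include <ctype.h>

/* \w under the stated assumptions: ASCII letter, digit, or underscore. */
static int is_word(unsigned char c)
{
    return c < 128 && (isalnum(c) || c == '_');
}

/* [\w] : a positive class with a single item. */
int in_class_w(unsigned char c)
{
    return is_word(c);
}

/* [^\W_] : NOT \W AND NOT underscore, i.e. a letter or digit only. */
int in_class_not_W_underscore(unsigned char c)
{
    return !(!is_word(c)) && !(c == '_');
}

/* [\dABCDEF] : digit OR A OR B OR ... OR F. Lower case a-f would
 * need caseless matching or explicit listing in the class. */
int in_class_hex_upper(unsigned char c)
{
    return c < 128 && (isdigit(c) || (c >= 'A' && c <= 'F'));
}

/* Checks that mirror the examples in the text. */
void check_class_examples(void)
{
    assert(in_class_w('_') == 1);
    assert(in_class_not_W_underscore('_') == 0);
    assert(in_class_not_W_underscore('a') == 1);
    assert(in_class_hex_upper('7') == 1);
}
```

The double negation in `in_class_not_W_underscore` is deliberate: it follows the "NOT ... AND NOT ..." reading term by term rather than simplifying it.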
# Line 2718  Cambridge CB2 3QH, England. Line 2747  Cambridge CB2 3QH, England.
2747  .rs  .rs
2748  .sp  .sp
2749  .nf  .nf
2750  Last updated: 17 November 2010  Last updated: 21 November 2010
2751  Copyright (c) 1997-2010 University of Cambridge.  Copyright (c) 1997-2010 University of Cambridge.
2752  .fi  .fi
