26
.\"
.\"
27
page.
page.
28
.P
.P
29

The remainder of this document discusses the patterns that are supported by
30

PCRE when its main matching function, \fBpcre_exec()\fP, is used.
31

From release 6.0, PCRE offers a second matching function,
32

\fBpcre_dfa_exec()\fP, which matches using a different algorithm that is not
33

Perl-compatible. The advantages and disadvantages of the alternative function,
34

and how it differs from the normal function, are discussed in the
35

.\" HREF
36

\fBpcrematching\fP
37

.\"
38

page.
39

.P
40 |
A regular expression is a pattern that is matched against a subject string from |
A regular expression is a pattern that is matched against a subject string from |
41 |
left to right. Most characters stand for themselves in a pattern, and match the |
left to right. Most characters stand for themselves in a pattern, and match the |
42 |
corresponding characters in the subject. As a trivial example, the pattern |
corresponding characters in the subject. As a trivial example, the pattern |
43 |
.sp |
.sp |
44 |
The quick brown fox |
The quick brown fox |
45 |
.sp |
.sp |
46 |
matches a portion of a subject string that is identical to itself. The power of |
matches a portion of a subject string that is identical to itself. When |
47 |
regular expressions comes from the ability to include alternatives and |
caseless matching is specified (the PCRE_CASELESS option), letters are matched |
48 |
repetitions in the pattern. These are encoded in the pattern by the use of |
independently of case. In UTF-8 mode, PCRE always understands the concept of |
49 |
|
case for characters whose values are less than 128, so caseless matching is |
50 |
|
always possible. For characters with higher values, the concept of case is |
51 |
|
supported if PCRE is compiled with Unicode property support, but not otherwise. |
52 |
|
If you want to use caseless matching for characters 128 and above, you must |
53 |
|
ensure that PCRE is compiled with Unicode property support as well as with |
54 |
|
UTF-8 support. |
55 |
|
.P |
56 |
|
The power of regular expressions comes from the ability to include alternatives |
57 |
|
and repetitions in the pattern. These are encoded in the pattern by the use of |
58 |
\fImetacharacters\fP, which do not stand for themselves but instead are |
\fImetacharacters\fP, which do not stand for themselves but instead are |
59 |
interpreted in some special way. |
interpreted in some special way. |
60 |
.P |
.P |
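The behaviour described above (literal matching, caseless matching, and metacharacters for alternatives and repetitions) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in re module as a stand-in, which is not PCRE and does not call pcre_exec(), and re.IGNORECASE is assumed here to play the role that PCRE_CASELESS plays in PCRE. The helper name find_span is hypothetical.

```python
import re

def find_span(pattern, subject, caseless=False):
    """Return the (start, end) span of the first match, or None.

    Hypothetical helper: re.IGNORECASE stands in for PCRE_CASELESS.
    """
    flags = re.IGNORECASE if caseless else 0
    m = re.search(pattern, subject, flags)
    return m.span() if m else None

subject = "Watch the quick brown fox jump"
literal = "The quick brown fox"

# Caseful: the capital "T" in the pattern has no counterpart in the subject.
caseful = find_span(literal, subject)

# Caseless: letters are matched independently of case.
caseless = find_span(literal, subject, caseless=True)

# Metacharacters: "|" encodes alternatives, "+" encodes repetition.
alternation = find_span("quick (brown|red) fox", subject)
repetition = find_span("o+", "fooox")

print(caseful, caseless, alternation, repetition)
```

Only ASCII letters are used here, so the sketch says nothing about characters with values of 128 and above, where PCRE's behaviour depends on how it was compiled.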
547 |
When caseless matching is set, any letters in a class represent both their |
When caseless matching is set, any letters in a class represent both their |
548 |
upper case and lower case versions, so for example, a caseless [aeiou] matches |
upper case and lower case versions, so for example, a caseless [aeiou] matches |
549 |
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a |
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a |
550 |
caseful version would. When running in UTF-8 mode, PCRE supports the concept of |
caseful version would. In UTF-8 mode, PCRE always understands the concept of |
551 |
case for characters with values greater than 128 only when it is compiled with |
case for characters whose values are less than 128, so caseless matching is |
552 |
Unicode property support. |
always possible. For characters with higher values, the concept of case is |
553 |
|
supported if PCRE is compiled with Unicode property support, but not otherwise. |
554 |
|
If you want to use caseless matching for characters 128 and above, you must |
555 |
|
ensure that PCRE is compiled with Unicode property support as well as with |
556 |
|
UTF-8 support. |
557 |
.P |
.P |
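The effect of caseless matching on character classes, including negated ones, can be checked with a small sketch. As an assumption for illustration, it again uses Python's re module with re.IGNORECASE in place of PCRE with PCRE_CASELESS; for the ASCII letters shown, the behaviour is expected to correspond to the description above. The helper name class_matches is hypothetical.

```python
import re

def class_matches(char_class, ch, caseless=False):
    """Return True if the single character ch matches the class.

    Hypothetical helper: re.IGNORECASE stands in for PCRE_CASELESS.
    """
    flags = re.IGNORECASE if caseless else 0
    return re.fullmatch(char_class, ch, flags) is not None

# A caseless [aeiou] matches "A" as well as "a".
vowel_caseless = class_matches("[aeiou]", "A", caseless=True)

# A caseful [aeiou] does not match "A".
vowel_caseful = class_matches("[aeiou]", "A")

# A caseless [^aeiou] does not match "A", whereas a caseful version would.
negated_caseless = class_matches("[^aeiou]", "A", caseless=True)
negated_caseful = class_matches("[^aeiou]", "A")

print(vowel_caseless, vowel_caseful, negated_caseless, negated_caseful)
```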
558 |
The newline character is never treated in any special way in character classes, |
The newline character is never treated in any special way in character classes, |
559 |
whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class |
whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class |
1475 |
documentation. |
documentation. |
1476 |
.P |
.P |
1477 |
.in 0 |
.in 0 |
1478 |
Last updated: 09 September 2004 |
Last updated: 28 February 2005 |
1479 |
.br |
.br |
1480 |
Copyright (c) 1997-2004 University of Cambridge. |
Copyright (c) 1997-2005 University of Cambridge. |