--- code/trunk/doc/html/pcrepattern.html 2007/02/24 21:40:37 75
+++ code/trunk/doc/html/pcrepattern.html 2007/02/24 21:40:45 77
@@ -55,15 +55,35 @@
page.

+The remainder of this document discusses the patterns that are supported by
+PCRE when its main matching function, pcre_exec(), is used.
+From release 6.0, PCRE offers a second matching function,
+pcre_dfa_exec(), which matches using a different algorithm that is not
+Perl-compatible. The advantages and disadvantages of the alternative function,
+and how it differs from the normal function, are discussed in the
+pcrematching
+page.
+

+

A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern, and match the corresponding characters in the subject. As a trivial example, the pattern

   The quick brown fox
 
-matches a portion of a subject string that is identical to itself. The power of
-regular expressions comes from the ability to include alternatives and
-repetitions in the pattern. These are encoded in the pattern by the use of
+matches a portion of a subject string that is identical to itself. When
+caseless matching is specified (the PCRE_CASELESS option), letters are matched
+independently of case. In UTF-8 mode, PCRE always understands the concept of
+case for characters whose values are less than 128, so caseless matching is
+always possible. For characters with higher values, the concept of case is
+supported if PCRE is compiled with Unicode property support, but not otherwise.
+If you want to use caseless matching for characters 128 and above, you must
+ensure that PCRE is compiled with Unicode property support as well as with
+UTF-8 support.
+
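As a rough illustration of the literal and caseless matching described in the added text, here is a minimal sketch that uses Python's built-in `re` module rather than PCRE itself. It assumes that `re.IGNORECASE` plays the role of PCRE_CASELESS; the helper `find_span` and the subject strings are made up for the example.

```python
import re

PATTERN = "The quick brown fox"  # the literal pattern from the text


def find_span(subject, caseless=False):
    """Return the (start, end) span of the first match, or None."""
    # re.IGNORECASE stands in for PCRE_CASELESS in this sketch.
    flags = re.IGNORECASE if caseless else 0
    m = re.search(PATTERN, subject, flags)
    return m.span() if m else None


# Caseful: the pattern matches only an identical portion of the subject.
caseful_hit = find_span("See: The quick brown fox jumps")
caseful_miss = find_span("see: the QUICK brown FOX jumps")

# Caseless: letters are matched independently of case.
caseless_hit = find_span("see: the QUICK brown FOX jumps", caseless=True)
```

The sketch sticks to ASCII subjects, where case is always understood; Python's handling of characters 128 and above does not depend on a build option, so it says nothing about how a particular PCRE build behaves there.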

+

+The power of regular expressions comes from the ability to include alternatives
+and repetitions in the pattern. These are encoded in the pattern by the use of
metacharacters, which do not stand for themselves but instead are interpreted in some special way.

@@ -536,9 +556,13 @@
When caseless matching is set, any letters in a class represent both their upper case and lower case versions, so for example, a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
-caseful version would. When running in UTF-8 mode, PCRE supports the concept of
-case for characters with values greater than 128 only when it is compiled with
-Unicode property support.
+caseful version would. In UTF-8 mode, PCRE always understands the concept of
+case for characters whose values are less than 128, so caseless matching is
+always possible. For characters with higher values, the concept of case is
+supported if PCRE is compiled with Unicode property support, but not otherwise.
+If you want to use caseless matching for characters 128 and above, you must
+ensure that PCRE is compiled with Unicode property support as well as with
+UTF-8 support.
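The character class behaviour described in this hunk can be checked with a small sketch. It uses Python's `re` module as a stand-in for PCRE, so it is only an analogy: `re.IGNORECASE` is assumed to correspond to caseless matching, and `class_matches` is a hypothetical helper, not part of any PCRE API.

```python
import re


def class_matches(char_class, ch, caseless=False):
    """Report whether the single character ch matches char_class."""
    flags = re.IGNORECASE if caseless else 0
    return re.fullmatch(char_class, ch, flags) is not None


# A caseless [aeiou] matches "A" as well as "a".
caseless_positive = class_matches("[aeiou]", "A", caseless=True)
caseful_positive = class_matches("[aeiou]", "A")

# A caseless [^aeiou] does not match "A", whereas a caseful version would.
caseless_negated = class_matches("[^aeiou]", "A", caseless=True)
caseful_negated = class_matches("[^aeiou]", "A")
```

Only ASCII letters are used here, since that is the range where the text says case is always understood.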

The newline character is never treated in any special way in character classes,
@@ -1462,9 +1486,9 @@
documentation.

-Last updated: 09 September 2004
+Last updated: 28 February 2005
-Copyright © 1997-2004 University of Cambridge.
+Copyright © 1997-2005 University of Cambridge.

Return to the PCRE index page.