--- code/trunk/doc/html/pcrecompat.html 2007/02/24 21:40:30 73 +++ code/trunk/doc/html/pcrecompat.html 2007/02/24 21:41:42 93 @@ -3,21 +3,27 @@ pcrecompat specification -This HTML document has been generated automatically from the original man page. -If there is any nonsense in it, please consult the man page, in case the -conversion went wrong.
- -
DIFFERENCES FROM PERL
+

pcrecompat man page

+

+Return to the PCRE index page. +

+

+This page is part of the PCRE HTML documentation. It was generated automatically +from the original man page. If there is any nonsense in it, please consult the +man page, in case the conversion went wrong. +
+
+DIFFERENCES BETWEEN PCRE AND PERL +

This document describes the differences in the ways that PCRE and Perl handle -regular expressions. The differences described here are with respect to Perl -5.8. +regular expressions. The differences described here are mainly with respect to +Perl 5.8, though PCRE version 7.0 contains some features that are expected to +be in the forthcoming Perl 5.10.

-1. PCRE does not have full UTF-8 support. Details of what it does have are -given in the +1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what +it does have are given in the section on UTF-8 support in the main pcre @@ -39,98 +45,106 @@

4. Though binary zero characters are supported in the subject string, they are not allowed in a pattern string because it is passed as a normal C string, -terminated by zero. The escape sequence "\0" can be used in the pattern to +terminated by zero. The escape sequence \0 can be used in the pattern to represent a binary zero.

5. The following Perl escape sequences are not supported: \l, \u, \L, -\U, \P, \p, \N, and \X. In fact these are implemented by Perl's general -string-handling and are not part of its pattern matching engine. If any of -these are encountered by PCRE, an error is generated. +\U, and \N. In fact these are implemented by Perl's general string-handling +and are not part of its pattern matching engine. If any of these are +encountered by PCRE, an error is generated. +

+

+6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is +built with Unicode character property support. The properties that can be +tested with \p and \P are limited to the general category properties such as +Lu and Nd, script names such as Greek or Han, and the derived properties Any +and L&.

-6. PCRE does support the \Q...\E escape for quoting substrings. Characters in +7. PCRE does support the \Q...\E escape for quoting substrings. Characters in between are treated as literals. This is slightly different from Perl in that $ and @ are also handled as literals inside the quotes. In Perl, they cause variable interpolation (but of course PCRE does not have variables). Note the following examples: -

-

     Pattern            PCRE matches      Perl matches
-
-

-

-

-    \Qabc$xyz\E        abc$xyz           abc followed by the
-                                           contents of $xyz
+
+    \Qabc$xyz\E        abc$xyz           abc followed by the contents of $xyz
     \Qabc\$xyz\E       abc\$xyz          abc\$xyz
     \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
-
+ +The \Q...\E sequence is recognized both inside and outside character classes.

-The \Q...\E sequence is recognized both inside and outside character classes. +8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) +constructions. However, there is support for recursive patterns. This is not +available in Perl 5.8, but will be in Perl 5.10. Also, the PCRE "callout" +feature allows an external function to be called during pattern matching. See +the +pcrecallout +documentation for details.

-7. Fairly obviously, PCRE does not support the (?{code}) and (?p{code}) -constructions. However, there is some experimental support for recursive -patterns using the non-Perl items (?R), (?number) and (?P>name). Also, the PCRE -"callout" feature allows an external function to be called during pattern -matching. +9. Subpatterns that are called recursively or as "subroutines" are always +treated as atomic groups in PCRE. This is like Python, but unlike Perl.

-8. There are some differences that are concerned with the settings of captured +10. There are some differences that are concerned with the settings of captured strings when part of a pattern is repeated. For example, matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".

-9. PCRE provides some extensions to the Perl regular expression facilities: -

-

+11. PCRE provides some extensions to the Perl regular expression facilities. +Perl 5.10 will include new features that are not in earlier versions, some of +which (such as named parentheses) have been in PCRE for some time. This list is +with respect to Perl 5.10: +
+
(a) Although lookbehind assertions must match fixed length strings, each alternative branch of a lookbehind assertion can match a different length of string. Perl requires them all to have the same length. -

-

+
+
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $ meta-character matches only at the very end of the string. -

-

-© If PCRE_EXTRA is set, a backslash followed by a letter with no special -meaning is faulted. -

-

+
+
+(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special +meaning is faulted. Otherwise, like Perl, the backslash is ignored. (Perl can +be made to issue a warning.) +
+
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is inverted, that is, by default they are not greedy, but if followed by a question mark they are. -

-

-(e) PCRE_ANCHORED can be used to force a pattern to be tried only at the first -matching position in the subject string. -

-

+
+
+(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried +only at the first matching position in the subject string. +
+
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no Perl equivalents. +
+
+(g) The callout facility is PCRE-specific. +
+
+(h) The partial matching facility is PCRE-specific. +
+
+(i) Patterns compiled by PCRE can be saved and re-used at a later time, even on +different hosts that have the other endianness. +
+
+(j) The alternative matching function (pcre_dfa_exec()) matches in a +different way and is not Perl-compatible.

-(g) The (?R), (?number), and (?P>name) constructs allows for recursive pattern -matching (Perl can do this using the (?p{code}) construct, which PCRE cannot -support.) -

-

-(h) PCRE supports named capturing substrings, using the Python syntax. -

-

-(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java -package. -

-

-(j) The (R) condition, for testing recursion, is a PCRE extension. -

-

-(k) The callout facility is PCRE-specific. -

-

-Last updated: 09 December 2003 +Last updated: 28 November 2006
-Copyright © 1997-2003 University of Cambridge. +Copyright © 1997-2006 University of Cambridge. +

+Return to the PCRE index page. +