--- code/trunk/doc/html/pcre.html 2007/02/24 21:40:37 75
+++ code/trunk/doc/html/pcre.html 2007/02/24 21:40:45 77
@@ -23,16 +23,27 @@

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few
-differences. The current implementation of PCRE (release 5.x) corresponds
+differences. The current implementation of PCRE (release 6.x) corresponds
approximately with Perl 5.8, including support for UTF-8 encoded strings and Unicode general category properties. However, this support has to be explicitly enabled; it is not the default.

+In addition to the Perl-compatible matching function, PCRE also contains an
+alternative matching function that matches the same compiled patterns in a
+different way. In certain circumstances, the alternative function has some
+advantages. For a discussion of the two matching algorithms, see the
+pcrematching
+page.
+

+

PCRE is written in C and released as a C library. A number of people have
-written wrappers and interfaces of various kinds. A C++ class is included in
-these contributions, which can be found in the Contrib directory at the
-primary FTP site, which is:
+written wrappers and interfaces of various kinds. In particular, Google Inc.
+have provided a comprehensive C++ wrapper. This is now included as part of the
+PCRE distribution. The
+pcrecpp
+page has details of this interface. Other people's contributions can be found
+in the Contrib directory at the primary FTP site, which is:
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre

@@ -53,6 +64,12 @@
page. Documentation about building PCRE for various operating systems can be found in the README file in the source distribution.

+

+The library contains a number of undocumented internal functions and data
+tables that are used by more than one of the exported external functions, but
+which are not intended for use by external callers. Their names all begin with
+"_pcre_", which hopefully will not provoke any name clashes.
+


USER DOCUMENTATION

The user documentation for PCRE comprises a number of different sections. In
@@ -62,21 +79,23 @@
follows:

   pcre              this document
-  pcreapi           details of PCRE's native API
+  pcreapi           details of PCRE's native C API
   pcrebuild         options for building PCRE
   pcrecallout       details of the callout feature
   pcrecompat        discussion of Perl compatibility
+  pcrecpp           details of the C++ wrapper
   pcregrep          description of the pcregrep command
+  pcrematching      discussion of the two matching algorithms
   pcrepartial       details of the partial matching facility
   pcrepattern       syntax and semantics of supported regular expressions
   pcreperform       discussion of performance issues
-  pcreposix         the POSIX-compatible API
+  pcreposix         the POSIX-compatible C API
   pcreprecompile    details of saving and re-using precompiled patterns
   pcresample        discussion of the sample program
   pcretest          description of the pcretest testing command
 
In addition, in the "man" and HTML formats, there is a short page for each
-library function, listing its arguments and results.
+C library function, listing its arguments and results.


LIMITATIONS

@@ -104,9 +123,10 @@

The maximum length of a subject string is the largest positive number that an
-integer variable can hold. However, PCRE uses recursion to handle subpatterns
-and indefinite repetition. This means that the available stack space may limit
-the size of a subject string that can be processed by certain patterns.
+integer variable can hold. However, when using the traditional matching
+function, PCRE uses recursion to handle subpatterns and indefinite repetition.
+This means that the available stack space may limit the size of a subject
+string that can be processed by certain patterns.
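
The stack-space limitation described in the hunk above can be illustrated with a toy recursive matcher. This is a minimal sketch under stated assumptions, not PCRE's code: match_star() and recursion_depth_for() are hypothetical names introduced here, and the matcher handles only one repeated character. It shows that a matcher which recurses once per repeated character needs a call depth that grows with the subject length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy matcher, NOT part of PCRE. It matches the pattern "c*"
   at the start of s by recursing once for every character it consumes, and
   records the deepest call level reached in *max_depth. Returns the number
   of characters matched. */
size_t match_star(const char *s, char c, size_t depth, size_t *max_depth)
{
    assert(s != NULL && max_depth != NULL);
    if (depth > *max_depth)
        *max_depth = depth;
    if (*s == '\0' || *s != c)
        return 0;                       /* the repetition stops here */
    /* The addition happens after the call returns, so every consumed
       character keeps one stack frame alive until the match finishes. */
    return 1 + match_star(s + 1, c, depth + 1, max_depth);
}

/* Deepest call level the toy matcher needs for "c*" against subject. */
size_t recursion_depth_for(const char *subject, char c)
{
    size_t max_depth = 0;
    (void)match_star(subject, c, 1, &max_depth);
    return max_depth;
}
```

For a subject that starts with n matching characters, the sketch reaches depth n + 1, so a long enough subject can exhaust a fixed-size stack even though the pattern itself is tiny.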


UTF-8 AND UNICODE PROPERTY SUPPORT

@@ -174,7 +194,8 @@

6. The escape sequence \C can be used to match a single byte in UTF-8 mode,
-but its use can lead to some strange effects.
+but its use can lead to some strange effects. This facility is not available in
+the alternative matching function, pcre_dfa_exec().

7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
@@ -199,16 +220,19 @@
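
Item 6 above says that \C matches a single byte in UTF-8 mode and that this can have strange effects. One likely reason is that a UTF-8 character may occupy several bytes, so consuming one byte can leave the match position in the middle of a character. The sketch below is a standalone illustration, not PCRE code: utf8_seq_len() and bytes_left_after_one_byte() are hypothetical helpers, and the lead-byte ranges follow the modern four-byte UTF-8 definition, which is an assumption and may differ from what a given PCRE release accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of PCRE. Given the first byte of a UTF-8
   sequence, returns how many bytes the encoded character occupies, or 0 if
   the byte cannot start a character (a continuation byte or an invalid lead
   byte). Only the lead byte is inspected; the rest is not validated. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                           /* ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;                           /* two-byte sequence */
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;                           /* three-byte sequence */
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;                           /* four-byte sequence */
    return 0;
}

/* Number of bytes of the first character of s that remain after a
   single-byte step (such as \C) has consumed its first byte. A nonzero
   result means the next match position lies inside a character. */
size_t bytes_left_after_one_byte(const char *s)
{
    size_t len;
    assert(s != NULL);
    len = utf8_seq_len((unsigned char)s[0]);
    return len > 0 ? len - 1 : 0;
}
```

For example, the two-byte encoding of U+00E9 (bytes 0xC3 0xA9) leaves one byte behind after a single-byte step, whereas an ASCII character leaves none.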


AUTHOR

-Philip Hazel <ph10@cam.ac.uk>
+Philip Hazel
University Computing Service,
Cambridge CB2 3QG, England.
+

+

+Putting an actual email address here seems to have been a spam magnet, so I've
+taken it away. If you want to email me, use my initial and surname, separated
+by a dot, at the domain ucs.cam.ac.uk.
+Last updated: 07 March 2005
-Phone: +44 1223 334714
-Last updated: 09 September 2004
-
-Copyright © 1997-2004 University of Cambridge.
+Copyright © 1997-2005 University of Cambridge.

Return to the PCRE index page.