Diff of /code/trunk/README


revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC revision 99 by ph10, Tue Mar 6 12:27:42 2007 UTC
# Line 7  The latest release of PCRE is always ava Line 7  The latest release of PCRE is always ava
7    
8  Please read the NEWS file if you are upgrading from a previous release.  Please read the NEWS file if you are upgrading from a previous release.
9    
10  PCRE has its own native API, but a set of "wrapper" functions that are based on  
11  the POSIX API are also supplied in the library libpcreposix. Note that this  The PCRE APIs
12  just provides a POSIX calling interface to PCRE: the regular expressions  -------------
13  themselves still follow Perl syntax and semantics. The header file  
14  for the POSIX-style functions is called pcreposix.h. The official POSIX name is  PCRE is written in C, and it has its own API. The distribution now includes a
15  regex.h, but I didn't want to risk possible problems with existing files of  set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
16  that name by distributing it that way. To use it with an existing program that  for details).
17  uses the POSIX API, it will have to be renamed or pointed at by a link.  
18    Also included are a set of C wrapper functions that are based on the POSIX
19    API. These end up in the library called libpcreposix. Note that this just
20    provides a POSIX calling interface to PCRE: the regular expressions themselves
21    still follow Perl syntax and semantics. The header file for the POSIX-style
22    functions is called pcreposix.h. The official POSIX name is regex.h, but I
23    didn't want to risk possible problems with existing files of that name by
24    distributing it that way. To use it with an existing program that uses the
25    POSIX API, it will have to be renamed or pointed at by a link.
26    
27  If you are using the POSIX interface to PCRE and there is already a POSIX regex  If you are using the POSIX interface to PCRE and there is already a POSIX regex
28  library installed on your system, you must take care when linking programs to  library installed on your system, you must take care when linking programs to
# Line 26  Documentation for PCRE Line 34  Documentation for PCRE
34  ----------------------  ----------------------
35    
36  If you install PCRE in the normal way, you will end up with an installed set of  If you install PCRE in the normal way, you will end up with an installed set of
37  man pages whose names all start with "pcre". The one that is called "pcre"  man pages whose names all start with "pcre". The one that is just called "pcre"
38  lists all the others. In addition to these man pages, the PCRE documentation is  lists all the others. In addition to these man pages, the PCRE documentation is
39  supplied in two other forms; however, as there is no standard place to install  supplied in two other forms; however, as there is no standard place to install
40  them, they are left in the doc directory of the unpacked source distribution.  them, they are left in the doc directory of the unpacked source distribution.
# Line 60  others are pointers to URLs containing r Line 68  others are pointers to URLs containing r
68  Building PCRE on a Unix-like system  Building PCRE on a Unix-like system
69  -----------------------------------  -----------------------------------
70    
71    If you are using HP's ANSI C++ compiler (aCC), please see the special note
72    in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
73    
74  To build PCRE on a Unix-like system, first run the "configure" command from the  To build PCRE on a Unix-like system, first run the "configure" command from the
75  PCRE distribution directory, with your current directory set to the directory  PCRE distribution directory, with your current directory set to the directory
76  where you want the files to be created. This command is a standard GNU  where you want the files to be created. This command is a standard GNU
# Line 83  into /source/pcre/pcre-xxx, but you want Line 94  into /source/pcre/pcre-xxx, but you want
94  cd /build/pcre/pcre-xxx  cd /build/pcre/pcre-xxx
95  /source/pcre/pcre-xxx/configure  /source/pcre/pcre-xxx/configure
96    
97    PCRE is written in C and is normally compiled as a C library. However, it is
98    possible to build it as a C++ library, though the provided building apparatus
99    does not have any features to support this.
100    
101  There are some optional features that can be included or omitted from the PCRE  There are some optional features that can be included or omitted from the PCRE
102  library. You can read more about them in the pcrebuild man page.  library. You can read more about them in the pcrebuild man page.
103    
104    . If you want to suppress the building of the C++ wrapper library, you can add
105      --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
106      it will try to find a C++ compiler and C++ header files, and if it succeeds, it
107      will try to build the C++ wrapper.
108    
109  . If you want to make use of the support for UTF-8 character strings in PCRE,  . If you want to make use of the support for UTF-8 character strings in PCRE,
110    you must add --enable-utf8 to the "configure" command. Without it, the code    you must add --enable-utf8 to the "configure" command. Without it, the code
111    for handling UTF-8 is not included in the library. (Even when included, it    for handling UTF-8 is not included in the library. (Even when included, it
# Line 94  library. You can read more about them in Line 114  library. You can read more about them in
114  . If, in addition to support for UTF-8 character strings, you want to include  . If, in addition to support for UTF-8 character strings, you want to include
115    support for the \P, \p, and \X sequences that recognize Unicode character    support for the \P, \p, and \X sequences that recognize Unicode character
116    properties, you must add --enable-unicode-properties to the "configure"    properties, you must add --enable-unicode-properties to the "configure"
117    command. This adds about 90K to the size of the library (in the form of a    command. This adds about 30K to the size of the library (in the form of a
118    property table); only the basic two-letter properties such as Lu are    property table); only the basic two-letter properties such as Lu are
119    supported.    supported.
120    
121  . You can build PCRE to recognized CR or NL as the newline character, instead  . You can build PCRE to recognize either CR or LF or the sequence CRLF or any
122    of whatever your compiler uses for "\n", by adding --newline-is-cr or    of the Unicode newline sequences as indicating the end of a line. Whatever
123    --newline-is-nl to the "configure" command, respectively. Only do this if you    you specify at build time is the default; the caller of PCRE can change the
124    really understand what you are doing. On traditional Unix-like systems, the    selection at run time. The default newline indicator is a single LF character
125    newline character is NL.    (the Unix standard). You can specify the default newline indicator by adding
126      --newline-is-cr or --newline-is-lf or --newline-is-crlf or --newline-is-any
127      to the "configure" command.
128    
129      If you specify --newline-is-cr or --newline-is-crlf, some of the standard
130      tests will fail, because the lines in the test files end with LF. Even if
131      the files are edited to change the line endings, there are likely to be some
132      failures. With --newline-is-any, many tests should succeed, but there may be
133      some failures.
134    
135  . When called via the POSIX interface, PCRE uses malloc() to get additional  . When called via the POSIX interface, PCRE uses malloc() to get additional
136    storage for processing capturing parentheses if there are more than 10 of    storage for processing capturing parentheses if there are more than 10 of
# Line 112  library. You can read more about them in Line 140  library. You can read more about them in
140    
141    on the "configure" command.    on the "configure" command.
142    
143  . PCRE has a counter which can be set to limit the amount of resources it uses.  . PCRE has a counter that can be set to limit the amount of resources it uses.
144    If the limit is exceeded during a match, the match fails. The default is ten    If the limit is exceeded during a match, the match fails. The default is ten
145    million. You can change the default by setting, for example,    million. You can change the default by setting, for example,
146    
# Line 122  library. You can read more about them in Line 150  library. You can read more about them in
150    pcre_exec() can supply their own value. There is discussion on the pcreapi    pcre_exec() can supply their own value. There is discussion on the pcreapi
151    man page.    man page.
152    
153    . There is a separate counter that limits the depth of recursive function calls
154      during a matching process. This also has a default of ten million, which is
155      essentially "unlimited". You can change the default by setting, for example,
156    
157      --with-match-limit-recursion=500000
158    
159      Recursive function calls use up the runtime stack; running out of stack can
160      cause programs to crash in strange ways. There is a discussion about stack
161      sizes in the pcrestack man page.
162    
163  . The default maximum compiled pattern size is around 64K. You can increase  . The default maximum compiled pattern size is around 64K. You can increase
164    this by adding --with-link-size=3 to the "configure" command. You can    this by adding --with-link-size=3 to the "configure" command. You can
165    increase it even more by setting --with-link-size=4, but this is unlikely    increase it even more by setting --with-link-size=4, but this is unlikely
# Line 130  library. You can read more about them in Line 168  library. You can read more about them in
168    is a representation of the compiled pattern, and this changes with the link    is a representation of the compiled pattern, and this changes with the link
169    size.    size.
170    
171  . You can build PCRE so that its match() function does not call itself  . You can build PCRE so that its internal match() function that is called from
172    recursively. Instead, it uses blocks of data from the heap via special    pcre_exec() does not call itself recursively. Instead, it uses blocks of data
173    functions pcre_stack_malloc() and pcre_stack_free() to save data that would    from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
174    otherwise be saved on the stack. To build PCRE like this, use    to save data that would otherwise be saved on the stack. To build PCRE like
175      this, use
176    
177    --disable-stack-for-recursion    --disable-stack-for-recursion
178    
179    on the "configure" command. PCRE runs more slowly in this mode, but it may be    on the "configure" command. PCRE runs more slowly in this mode, but it may be
180    necessary in environments with limited stack sizes.    necessary in environments with limited stack sizes. This applies only to the
181      pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
182      use deeply nested recursion.
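
As an illustration (not text from either revision of the README), the relationship between --with-link-size and the maximum compiled pattern size can be sketched with simple arithmetic. This assumes the limit is the largest offset that fits in the configured number of bytes, and that the default link size is 2, which matches the "around 64K" figure above. The function name max_pattern_bytes is made up for this sketch.

```shell
#!/bin/sh
# Rough upper bound on compiled pattern size for a given --with-link-size.
# Assumption: the bound is 2^(8 * link_size) bytes; the real limit may be
# somewhat lower. Link size 4 needs shell arithmetic wider than 32 bits.
max_pattern_bytes() {
  echo $(( 1 << (8 * $1) ))
}

for size in 2 3 4; do
  echo "--with-link-size=$size: about $(max_pattern_bytes "$size") bytes"
done
```

Under that assumption, moving from the default to a link size of 3 raises the bound from 64K to 16M, which is why a link size of 4 is unlikely to be needed.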
183    
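Several of the options described above can be combined on one "configure" command line. The following is a minimal sketch, not part of the README itself: it only prints the command so that it can be reviewed first. The source path is the example path used elsewhere in this README, and the helper name configure_command is hypothetical.

```shell
#!/bin/sh
# Assemble a "configure" command line from options described in this README.
# SRC is the README's example source directory; adjust it for your system.
SRC=/source/pcre/pcre-xxx

configure_command() {
  # Print the command instead of running it, so it can be checked first.
  echo "$SRC/configure $*"
}

configure_command \
  --enable-utf8 \
  --enable-unicode-properties \
  --newline-is-lf \
  --with-match-limit-recursion=500000 \
  --with-link-size=3
```

To actually configure the build, run the printed command from the directory where the files should be created.
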
184    The "configure" script builds eight files for the basic C library:
185    
186    . Makefile is the makefile that builds the library
187    . config.h contains build-time configuration options for the library
188    . pcre-config is a script that shows the settings of "configure" options
189    . libpcre.pc is data for the pkg-config command
190    . libtool is a script that builds shared and/or static libraries
191    . RunTest is a script for running tests on the library
192    . RunGrepTest is a script for running tests on the pcregrep command
193    
194  The "configure" script builds seven files:  In addition, if a C++ compiler is found, the following are also built:
195    
196  . pcre.h is build by copying pcre.in and making substitutions  . pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
197  . Makefile is built by copying Makefile.in and making substitutions.  . pcre_stringpiece.h is the header for the C++ "stringpiece" functions
198  . config.h is built by copying config.in and making substitutions.  
199  . pcre-config is built by copying pcre-config.in and making substitutions.  The "configure" script also creates config.status, which is an executable
200  . libpcre.pc is data for the pkg-config command, built from libpcre.pc.in  script that can be run to recreate the configuration, and config.log, which
201  . libtool is a script that builds shared and/or static libraries  contains compiler output from tests that "configure" runs.
 . RunTest is a script for running tests  
202    
203  Once "configure" has run, you can run "make". It builds two libraries called  Once "configure" has run, you can run "make". It builds two libraries, called
204  libpcre and libpcreposix, a test program called pcretest, and the pcregrep  libpcre and libpcreposix, a test program called pcretest, and the pcregrep
205  command. You can use "make install" to copy these, the public header files  command. If a C++ compiler was found on your system, it also builds the C++
206  pcre.h and pcreposix.h, and the man pages to appropriate live directories on  wrapper library, which is called libpcrecpp, and some test programs called
207  your system, in the normal way.  pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
208    
209    The command "make test" runs all the appropriate tests. Details of the PCRE
210    tests are given in a separate section of this document, below.
211    
212    You can use "make install" to copy the libraries, the public header files
213    pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
214    the C++ wrapper was built), and the man pages to appropriate live directories
215    on your system, in the normal way.
216    
217    If you want to remove PCRE from your system, you can run "make uninstall".
218    This removes all the files that "make install" installed. However, it does not
219    remove any directories, because these are often shared with other programs.
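
The build, test, and install steps described above can be chained so that a failure stops the sequence. This is a sketch only, not text from the README: the run and build_and_install helpers and the DRY_RUN variable are invented for illustration, and by default the script just prints the commands.

```shell
#!/bin/sh
# Build, test, and install PCRE, stopping at the first step that fails.
# DRY_RUN defaults to 1 here, so the commands are only printed;
# set DRY_RUN=0 to run them for real from the build directory.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_and_install() {
  run make &&
  run make test &&
  run make install
}

build_and_install
```

Running "make test" before "make install" means a library that fails its tests is never copied to the live directories.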
220    
221    
222  Retrieving configuration information on Unix-like systems  Retrieving configuration information on Unix-like systems
# Line 187  pkgconfig. Line 249  pkgconfig.
249  Shared libraries on Unix-like systems  Shared libraries on Unix-like systems
250  -------------------------------------  -------------------------------------
251    
252  The default distribution builds PCRE as two shared libraries and two static  The default distribution builds PCRE as shared libraries and static libraries,
253  libraries, as long as the operating system supports shared libraries. Shared  as long as the operating system supports shared libraries. Shared library
254  library support relies on the "libtool" script which is built as part of the  support relies on the "libtool" script which is built as part of the
255  "configure" process.  "configure" process.
256    
257  The libtool script is used to compile and link both shared and static  The libtool script is used to compile and link both shared and static
# Line 218  order to cross-compile PCRE for some oth Line 280  order to cross-compile PCRE for some oth
280  process, the dftables.c source file is compiled *and run* on the local host, in  process, the dftables.c source file is compiled *and run* on the local host, in
281  order to generate the default character tables (the chartables.c file). It  order to generate the default character tables (the chartables.c file). It
282  therefore needs to be compiled with the local compiler, not the cross compiler.  therefore needs to be compiled with the local compiler, not the cross compiler.
283  You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD)  You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
284    there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
285  when calling the "configure" command. If they are not specified, they default  when calling the "configure" command. If they are not specified, they default
286  to the values of CC and CFLAGS.  to the values of CC and CFLAGS.
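
A cross-compiling "configure" call might look like the sketch below, which is an illustration rather than README text. The compiler names (gcc as the local compiler, arm-linux-gcc as the cross compiler) and the host triplet are hypothetical placeholders; --host is the standard autoconf option for naming the target system. The helper only prints the command.

```shell
#!/bin/sh
# Print a cross-compiling "configure" command.
# Arguments: local (build) compiler, cross compiler, host triplet.
# All three values used below are placeholders; substitute your own.
cross_configure_command() {
  echo "CC_FOR_BUILD=$1 CC=$2 ./configure --host=$3"
}

cross_configure_command gcc arm-linux-gcc arm-linux
```

Here CC_FOR_BUILD names the compiler used for dftables.c, which must run on the local host, while CC names the compiler used for the library itself.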
287    
288    
289    Using HP's ANSI C++ compiler (aCC)
290    ----------------------------------
291    
292    Unless C++ support is disabled by specifying the "--disable-cpp" option of the
293    "configure" script, you *must* include the "-AA" option in the CXXFLAGS
294    environment variable in order for the C++ components to compile correctly.
295    
296    Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
297    needed libraries fail to get included when specifying the "-AA" compiler
298    option. If you experience unresolved symbols when linking the C++ programs,
299    use the workaround of specifying the following environment variable prior to
300    running the "configure" script:
301    
302      CXXLDFLAGS="-lstd_v2 -lCsup_v2"
303    
304    
305  Building on non-Unix systems  Building on non-Unix systems
306  ----------------------------  ----------------------------
307    
# Line 232  PCRE in the same way as for Unix systems Line 311  PCRE in the same way as for Unix systems
311    
312  PCRE has been compiled on Windows systems and on Macintoshes, but I don't know  PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
313  the details because I don't use those systems. It should be straightforward to  the details because I don't use those systems. It should be straightforward to
314  build PCRE on any system that has a Standard C compiler, because it uses only  build PCRE on any system that has a Standard C compiler and library, because it
315  Standard C functions.  uses only Standard C functions.
316    
317    
318  Testing PCRE  Testing PCRE
319  ------------  ------------
320    
321  To test PCRE on a Unix system, run the RunTest script that is created by the  To test PCRE on a Unix system, run the RunTest script that is created by the
322  configuring process. (This can also be run by "make runtest", "make check", or  configuring process. There is also a script called RunGrepTest that tests the
323  "make test".) For other systems, see the instructions in NON-UNIX-USE.  options of the pcregrep command. If the C++ wrapper library is built, three
324    test programs called pcrecpp_unittest, pcre_scanner_unittest, and
325  The script runs the pcretest test program (which is documented in its own man  pcre_stringpiece_unittest are provided.
326  page) on each of the testinput files (in the testdata directory) in turn,  
327  and compares the output with the contents of the corresponding testoutput file.  Both the scripts and all the program tests are run if you obey "make runtest",
328  A file called testtry is used to hold the main output from pcretest  "make check", or "make test". For other systems, see the instructions in
329    NON-UNIX-USE.
330    
331    The RunTest script runs the pcretest test program (which is documented in its
332    own man page) on each of the testinput files (in the testdata directory) in
333    turn, and compares the output with the contents of the corresponding testoutput
334    files. A file called testtry is used to hold the main output from pcretest
335  (testsavedregex is also used as a working file). To run pcretest on just one of  (testsavedregex is also used as a working file). To run pcretest on just one of
336  the test files, give its number as an argument to RunTest, for example:  the test files, give its number as an argument to RunTest, for example:
337    
338    RunTest 2    RunTest 2
339    
340  The first file can also be fed directly into the perltest script to check that  The first test file can also be fed directly into the perltest script to check
341  Perl gives the same results. The only difference you should see is in the first  that Perl gives the same results. The only difference you should see is in the
342  few lines, where the Perl version is given instead of the PCRE version.  first few lines, where the Perl version is given instead of the PCRE version.
343    
344  The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),  The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
345  pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error  pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
# Line 294  commented in the script, can be be used. Line 379  commented in the script, can be be used.
379  The fifth test checks error handling with UTF-8 encoding, and internal UTF-8  The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
380  features of PCRE that are not relevant to Perl.  features of PCRE that are not relevant to Perl.
381    
382  The sixth and final test checks the support for Unicode character properties.  The sixth test checks the support for Unicode character properties. It is
383  It is not run automatically unless PCRE is built with Unicode property support.  not run automatically unless PCRE is built with Unicode property support. To do
384  To do this you must set --enable-unicode-properties when running "configure".  this you must set --enable-unicode-properties when running "configure".
385    
386    The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
387    matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
388    property support, respectively. The eighth and ninth tests are not run
389    automatically unless PCRE is built with the relevant support.
390    
391    
392  Character tables  Character tables
# Line 348  The distribution should contain the foll Line 438  The distribution should contain the foll
438    
439    dftables.c            auxiliary program for building chartables.c    dftables.c            auxiliary program for building chartables.c
440    
   get.c                 )  
   maketables.c          )  
   study.c               ) source of the functions  
   pcre.c                )   in the library  
441    pcreposix.c           )    pcreposix.c           )
442    printint.c            )    pcre_compile.c        )
443      pcre_config.c         )
444      pcre_dfa_exec.c       )
445      pcre_exec.c           )
446      pcre_fullinfo.c       )
447      pcre_get.c            ) sources for the functions in the library,
448      pcre_globals.c        )   and some internal functions that they use
449      pcre_info.c           )
450      pcre_maketables.c     )
451      pcre_newline.c        )
452      pcre_ord2utf8.c       )
453      pcre_refcount.c       )
454      pcre_study.c          )
455      pcre_tables.c         )
456      pcre_try_flipped.c    )
457      pcre_ucp_searchfuncs.c)
458      pcre_valid_utf8.c     )
459      pcre_version.c        )
460      pcre_xclass.c         )
461    
462    ucp.c                 )    pcre_printint.src     ) debugging function that is #included in pcretest, and
463    ucp.h                 ) source for the code that is used for                          )   can also be #included in pcre_compile()
   ucpinternal.h         )   Unicode property handling  
   ucptable.c            )  
   ucptypetable.c        )  
464    
465    pcre.in               "source" for the header for the external API; pcre.h    pcre.h                the public PCRE header file
                           is built from this by "configure"  
466    pcreposix.h           header for the external POSIX wrapper API    pcreposix.h           header for the external POSIX wrapper API
467    internal.h            header for internal use    pcre_internal.h       header for internal use
468      ucp.h                 ) headers concerned with
469      ucpinternal.h         )   Unicode property handling
470      ucptable.h            ) (this one is the data table)
471    config.in             template for config.h, which is built by configure    config.in             template for config.h, which is built by configure
472    
473      pcrecpp.h             the header file for the C++ wrapper
474      pcrecpparg.h.in       "source" for another C++ header file
475      pcrecpp.cc            )
476      pcre_scanner.cc       ) source for the C++ wrapper library
477    
478      pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the
479                              C++ stringpiece functions
480      pcre_stringpiece.cc   source for the C++ stringpiece functions
481    
482  (B) Auxiliary files:  (B) Auxiliary files:
483    
484    AUTHORS               information about the author of PCRE    AUTHORS               information about the author of PCRE
# Line 379  The distribution should contain the foll Line 491  The distribution should contain the foll
491    NON-UNIX-USE          notes on building PCRE on non-Unix systems    NON-UNIX-USE          notes on building PCRE on non-Unix systems
492    README                this file    README                this file
493    RunTest.in            template for a Unix shell script for running tests    RunTest.in            template for a Unix shell script for running tests
494      RunGrepTest.in        template for a Unix shell script for pcregrep tests
495    config.guess          ) files used by libtool,    config.guess          ) files used by libtool,
496    config.sub            )   used only when building a shared library    config.sub            )   used only when building a shared library
497      config.h.in           "source" for the config.h header file
498    configure             a configuring shell script (built by autoconf)    configure             a configuring shell script (built by autoconf)
499    configure.in          the autoconf input used to build configure    configure.ac          the autoconf input used to build configure
500    doc/Tech.Notes        notes on the encoding    doc/Tech.Notes        notes on the encoding
501    doc/*.3               man page sources for the PCRE functions    doc/*.3               man page sources for the PCRE functions
502    doc/*.1               man page sources for pcregrep and pcretest    doc/*.1               man page sources for pcregrep and pcretest
# Line 396  The distribution should contain the foll Line 510  The distribution should contain the foll
510    mkinstalldirs         script for making install directories    mkinstalldirs         script for making install directories
511    pcretest.c            comprehensive test program    pcretest.c            comprehensive test program
512    pcredemo.c            simple demonstration of coding calls to PCRE    pcredemo.c            simple demonstration of coding calls to PCRE
513    perltest              Perl test program    perltest.pl           Perl test program
514    pcregrep.c            source of a grep utility that uses PCRE    pcregrep.c            source of a grep utility that uses PCRE
515    pcre-config.in        source of script which retains PCRE information    pcre-config.in        source of script which retains PCRE information
516    testdata/testinput1   test data, compatible with Perl    pcrecpp_unittest.c           )
517    testdata/testinput2   test data for error messages and non-Perl things    pcre_scanner_unittest.c      ) test programs for the C++ wrapper
518    testdata/testinput3   test data for locale-specific tests    pcre_stringpiece_unittest.c  )
519    testdata/testinput4   test data for UTF-8 tests compatible with Perl    testdata/testinput*   test data for main library tests
520    testdata/testinput5   test data for other UTF-8 tests    testdata/testoutput*  expected test results
521    testdata/testinput6   test data for Unicode property support tests    testdata/grep*        input and output for pcregrep tests
   testdata/testoutput1  test results corresponding to testinput1  
   testdata/testoutput2  test results corresponding to testinput2  
   testdata/testoutput3  test results corresponding to testinput3  
   testdata/testoutput4  test results corresponding to testinput4  
   testdata/testoutput5  test results corresponding to testinput5  
   testdata/testoutput6  test results corresponding to testinput6  
522    
523  (C) Auxiliary files for Win32 DLL  (C) Auxiliary files for Win32 DLL
524    
   dll.mk  
525    libpcre.def    libpcre.def
526    libpcreposix.def    libpcreposix.def
   pcre.def  
527    
528  (D) Auxiliary file for VPASCAL  (D) Auxiliary file for VPASCAL
529    
530    makevp.bat    makevp.bat
531    
532  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel
533  September 2004  Email local part: ph10
534    Email domain: cam.ac.uk
535    March 2007
