Diff of /code/trunk/README

revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC revision 93 by nigel, Sat Feb 24 21:41:42 2007 UTC
# Line 7  The latest release of PCRE is always ava Line 7  The latest release of PCRE is always ava
7    
8  Please read the NEWS file if you are upgrading from a previous release.  Please read the NEWS file if you are upgrading from a previous release.
9    
10  PCRE has its own native API, but a set of "wrapper" functions that are based on  
11  the POSIX API are also supplied in the library libpcreposix. Note that this  The PCRE APIs
12  just provides a POSIX calling interface to PCRE: the regular expressions  -------------
13  themselves still follow Perl syntax and semantics. The header file  
14  for the POSIX-style functions is called pcreposix.h. The official POSIX name is  PCRE is written in C, and it has its own API. The distribution now includes a
15  regex.h, but I didn't want to risk possible problems with existing files of  set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
16  that name by distributing it that way. To use it with an existing program that  for details).
17  uses the POSIX API, it will have to be renamed or pointed at by a link.  
18    Also included are a set of C wrapper functions that are based on the POSIX
19    API. These end up in the library called libpcreposix. Note that this just
20    provides a POSIX calling interface to PCRE: the regular expressions themselves
21    still follow Perl syntax and semantics. The header file for the POSIX-style
22    functions is called pcreposix.h. The official POSIX name is regex.h, but I
23    didn't want to risk possible problems with existing files of that name by
24    distributing it that way. To use it with an existing program that uses the
25    POSIX API, it will have to be renamed or pointed at by a link.
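
A minimal sketch of a program written against the POSIX calling interface described above. It uses only the standard regcomp(), regexec(), and regfree() calls; the helper name posix_match is hypothetical. As written it includes the system's regex.h so that it compiles anywhere; the assumption is that, to go through PCRE instead, one would include pcreposix.h and link with libpcreposix, at which point the pattern would be interpreted with Perl syntax and semantics.

```c
#include <regex.h>    /* assumption: replace with "pcreposix.h" to use PCRE's wrapper */
#include <stddef.h>

/* Hypothetical helper, not part of PCRE.
   Returns 1 if subject matches pattern, 0 if it does not,
   and -1 if the pattern fails to compile. */
int posix_match(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is wanted, no captured substrings */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);                 /* always release the compiled pattern */
    return rc == 0 ? 1 : 0;
}
```

A call such as posix_match("^[a-z]+[0-9]+$", "pcre7") uses a pattern that means the same thing in POSIX extended syntax and in Perl syntax, so the result should not depend on which library ends up being linked.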
26    
27  If you are using the POSIX interface to PCRE and there is already a POSIX regex  If you are using the POSIX interface to PCRE and there is already a POSIX regex
28  library installed on your system, you must take care when linking programs to  library installed on your system, you must take care when linking programs to
# Line 26  Documentation for PCRE Line 34  Documentation for PCRE
34  ----------------------  ----------------------
35    
36  If you install PCRE in the normal way, you will end up with an installed set of  If you install PCRE in the normal way, you will end up with an installed set of
37  man pages whose names all start with "pcre". The one that is called "pcre"  man pages whose names all start with "pcre". The one that is just called "pcre"
38  lists all the others. In addition to these man pages, the PCRE documentation is  lists all the others. In addition to these man pages, the PCRE documentation is
39  supplied in two other forms; however, as there is no standard place to install  supplied in two other forms; however, as there is no standard place to install
40  them, they are left in the doc directory of the unpacked source distribution.  them, they are left in the doc directory of the unpacked source distribution.
# Line 60  others are pointers to URLs containing r Line 68  others are pointers to URLs containing r
68  Building PCRE on a Unix-like system  Building PCRE on a Unix-like system
69  -----------------------------------  -----------------------------------
70    
71    If you are using HP's ANSI C++ compiler (aCC), please see the special note
72    in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
73    
74  To build PCRE on a Unix-like system, first run the "configure" command from the  To build PCRE on a Unix-like system, first run the "configure" command from the
75  PCRE distribution directory, with your current directory set to the directory  PCRE distribution directory, with your current directory set to the directory
76  where you want the files to be created. This command is a standard GNU  where you want the files to be created. This command is a standard GNU
# Line 83  into /source/pcre/pcre-xxx, but you want Line 94  into /source/pcre/pcre-xxx, but you want
94  cd /build/pcre/pcre-xxx  cd /build/pcre/pcre-xxx
95  /source/pcre/pcre-xxx/configure  /source/pcre/pcre-xxx/configure
96    
97    PCRE is written in C and is normally compiled as a C library. However, it is
98    possible to build it as a C++ library, though the provided building apparatus
99    does not have any features to support this.
100    
101  There are some optional features that can be included or omitted from the PCRE  There are some optional features that can be included or omitted from the PCRE
102  library. You can read more about them in the pcrebuild man page.  library. You can read more about them in the pcrebuild man page.
103    
104    . If you want to suppress the building of the C++ wrapper library, you can add
105      --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
106      it will try to find a C++ compiler and C++ header files, and if it succeeds, it
107      will try to build the C++ wrapper.
108    
109  . If you want to make use of the support for UTF-8 character strings in PCRE,  . If you want to make use of the support for UTF-8 character strings in PCRE,
110    you must add --enable-utf8 to the "configure" command. Without it, the code    you must add --enable-utf8 to the "configure" command. Without it, the code
111    for handling UTF-8 is not included in the library. (Even when included, it    for handling UTF-8 is not included in the library. (Even when included, it
# Line 94  library. You can read more about them in Line 114  library. You can read more about them in
114  . If, in addition to support for UTF-8 character strings, you want to include  . If, in addition to support for UTF-8 character strings, you want to include
115    support for the \P, \p, and \X sequences that recognize Unicode character    support for the \P, \p, and \X sequences that recognize Unicode character
116    properties, you must add --enable-unicode-properties to the "configure"    properties, you must add --enable-unicode-properties to the "configure"
117    command. This adds about 90K to the size of the library (in the form of a    command. This adds about 30K to the size of the library (in the form of a
118    property table); only the basic two-letter properties such as Lu are    property table); only the basic two-letter properties such as Lu are
119    supported.    supported.
120    
121  . You can build PCRE to recognize CR or NL as the newline character, instead  . You can build PCRE to recognize either CR or LF or the sequence CRLF or any
122    of whatever your compiler uses for "\n", by adding --newline-is-cr or    of the Unicode newline sequences as indicating the end of a line. Whatever
123    --newline-is-nl to the "configure" command, respectively. Only do this if you    you specify at build time is the default; the caller of PCRE can change the
124    really understand what you are doing. On traditional Unix-like systems, the    selection at run time. The default newline indicator is a single LF character
125    newline character is NL.    (the Unix standard). You can specify the default newline indicator by adding
126      --newline-is-cr or --newline-is-lf or --newline-is-crlf or --newline-is-any
127      to the "configure" command, respectively.
128    
129  . When called via the POSIX interface, PCRE uses malloc() to get additional  . When called via the POSIX interface, PCRE uses malloc() to get additional
130    storage for processing capturing parentheses if there are more than 10 of    storage for processing capturing parentheses if there are more than 10 of
# Line 112  library. You can read more about them in Line 134  library. You can read more about them in
134    
135    on the "configure" command.    on the "configure" command.
136    
137  . PCRE has a counter which can be set to limit the amount of resources it uses.  . PCRE has a counter that can be set to limit the amount of resources it uses.
138    If the limit is exceeded during a match, the match fails. The default is ten    If the limit is exceeded during a match, the match fails. The default is ten
139    million. You can change the default by setting, for example,    million. You can change the default by setting, for example,
140    
# Line 122  library. You can read more about them in Line 144  library. You can read more about them in
144    pcre_exec() can supply their own value. There is discussion on the pcreapi    pcre_exec() can supply their own value. There is discussion on the pcreapi
145    man page.    man page.
146    
147    . There is a separate counter that limits the depth of recursive function calls
148      during a matching process. This also has a default of ten million, which is
149      essentially "unlimited". You can change the default by setting, for example,
150    
151      --with-match-limit-recursion=500000
152    
153      Recursive function calls use up the runtime stack; running out of stack can
154      cause programs to crash in strange ways. There is a discussion about stack
155      sizes in the pcrestack man page.
156    
157  . The default maximum compiled pattern size is around 64K. You can increase  . The default maximum compiled pattern size is around 64K. You can increase
158    this by adding --with-link-size=3 to the "configure" command. You can    this by adding --with-link-size=3 to the "configure" command. You can
159    increase it even more by setting --with-link-size=4, but this is unlikely    increase it even more by setting --with-link-size=4, but this is unlikely
# Line 130  library. You can read more about them in Line 162  library. You can read more about them in
162    is a representation of the compiled pattern, and this changes with the link    is a representation of the compiled pattern, and this changes with the link
163    size.    size.
164    
165  . You can build PCRE so that its match() function does not call itself  . You can build PCRE so that its internal match() function that is called from
166    recursively. Instead, it uses blocks of data from the heap via special    pcre_exec() does not call itself recursively. Instead, it uses blocks of data
167    functions pcre_stack_malloc() and pcre_stack_free() to save data that would    from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
168    otherwise be saved on the stack. To build PCRE like this, use    to save data that would otherwise be saved on the stack. To build PCRE like
169      this, use
170    
171    --disable-stack-for-recursion    --disable-stack-for-recursion
172    
173    on the "configure" command. PCRE runs more slowly in this mode, but it may be    on the "configure" command. PCRE runs more slowly in this mode, but it may be
174    necessary in environments with limited stack sizes.    necessary in environments with limited stack sizes. This applies only to the
175      pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
176      use deeply nested recursion.
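
The idea behind --disable-stack-for-recursion can be illustrated with a toy matcher. This is a sketch only: it is not PCRE's match() function, it handles just a glob-like pattern ('*' for any run of characters, '?' for any one character), and plain malloc() and free() stand in for pcre_stack_malloc() and pcre_stack_free(). The names frame and heap_match are hypothetical. Each backtracking point that a recursive matcher would keep on the runtime stack is instead saved in a heap block.

```c
#include <stdlib.h>

/* One saved backtracking point: where to resume in pattern and subject. */
typedef struct frame {
    const char *pat;
    const char *sub;
    struct frame *prev;
} frame;

/* Hypothetical helper, not PCRE code.
   Returns 1 on a match, 0 on no match, -1 if memory runs out. */
int heap_match(const char *pat, const char *sub)
{
    frame *top = NULL;   /* most recent saved backtracking point */
    frame *f;
    int result;

    for (;;) {
        if (*pat == '*') {
            /* Remember the alternative "let '*' absorb one more character"
               on the heap, then first try '*' matching nothing. */
            if (*sub != '\0') {
                f = malloc(sizeof *f);
                if (f == NULL) { result = -1; break; }
                f->pat = pat;
                f->sub = sub + 1;
                f->prev = top;
                top = f;
            }
            pat++;
        } else if (*pat == '\0' && *sub == '\0') {
            result = 1;                      /* both exhausted: match */
            break;
        } else if (*pat != '\0' && *sub != '\0' &&
                   (*pat == '?' || *pat == *sub)) {
            pat++;                           /* one character matched */
            sub++;
        } else if (top != NULL) {
            f = top;                         /* mismatch: backtrack */
            pat = f->pat;
            sub = f->sub;
            top = f->prev;
            free(f);
        } else {
            result = 0;                      /* nothing left to try */
            break;
        }
    }

    while (top != NULL) {                    /* release unused frames */
        f = top;
        top = f->prev;
        free(f);
    }
    return result;
}
```

The stack depth of heap_match() stays constant however many backtracking points accumulate, at the cost of a heap allocation for each one. That is the same trade-off the text describes: slower, but usable where the stack is small.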
177    
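
For the --with-link-size option in the list above, the "around 64K" figure can be checked with a little arithmetic. The sketch assumes, as the pcrebuild documentation describes it, that the link size is the number of bytes used to store an internal offset and that the default is 2. The function name max_link_offset is hypothetical, and the value it returns is only the bound implied by the field width, not a limit taken from the PCRE source.

```c
#include <assert.h>

/* Hypothetical helper: largest offset that fits in a link of
   link_size bytes (2, 3, or 4). */
unsigned long max_link_offset(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);

    /* Start from a 4-byte all-ones value and drop the unused bytes;
       this avoids shifting by the full width of a 32-bit type. */
    return 0xFFFFFFFFUL >> (8 * (4 - link_size));
}
```

With a link size of 2 this gives 65535, which agrees with the 64K default; a link size of 3 gives 16777215 and a link size of 4 gives 4294967295.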
178    The "configure" script builds seven files for the basic C library:
179    
180    . Makefile is the makefile that builds the library
181    . config.h contains build-time configuration options for the library
182    . pcre-config is a script that shows the settings of "configure" options
183    . libpcre.pc is data for the pkg-config command
184    . libtool is a script that builds shared and/or static libraries
185    . RunTest is a script for running tests on the library
186    . RunGrepTest is a script for running tests on the pcregrep command
187    
188  The "configure" script builds seven files:  In addition, if a C++ compiler is found, the following are also built:
189    
190  . pcre.h is built by copying pcre.in and making substitutions  . pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
191  . Makefile is built by copying Makefile.in and making substitutions.  . pcre_stringpiece.h is the header for the C++ "stringpiece" functions
 . config.h is built by copying config.in and making substitutions.  
 . pcre-config is built by copying pcre-config.in and making substitutions.  
 . libpcre.pc is data for the pkg-config command, built from libpcre.pc.in  
 . libtool is a script that builds shared and/or static libraries  
 . RunTest is a script for running tests  
192    
193  Once "configure" has run, you can run "make". It builds two libraries called  The "configure" script also creates config.status, which is an executable
194    script that can be run to recreate the configuration, and config.log, which
195    contains compiler output from tests that "configure" runs.
196    
197    Once "configure" has run, you can run "make". It builds two libraries, called
198  libpcre and libpcreposix, a test program called pcretest, and the pcregrep  libpcre and libpcreposix, a test program called pcretest, and the pcregrep
199  command. You can use "make install" to copy these, the public header files  command. If a C++ compiler was found on your system, it also builds the C++
200  pcre.h and pcreposix.h, and the man pages to appropriate live directories on  wrapper library, which is called libpcrecpp, and some test programs called
201  your system, in the normal way.  pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
202    
203    The command "make test" runs all the appropriate tests. Details of the PCRE
204    tests are given in a separate section of this document, below.
205    
206    You can use "make install" to copy the libraries, the public header files
207    pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
208    the C++ wrapper was built), and the man pages to appropriate live directories
209    on your system, in the normal way.
210    
211    If you want to remove PCRE from your system, you can run "make uninstall".
212    This removes all the files that "make install" installed. However, it does not
213    remove any directories, because these are often shared with other programs.
214    
215    
216  Retrieving configuration information on Unix-like systems  Retrieving configuration information on Unix-like systems
# Line 187  pkgconfig. Line 243  pkgconfig.
243  Shared libraries on Unix-like systems  Shared libraries on Unix-like systems
244  -------------------------------------  -------------------------------------
245    
246  The default distribution builds PCRE as two shared libraries and two static  The default distribution builds PCRE as shared libraries and static libraries,
247  libraries, as long as the operating system supports shared libraries. Shared  as long as the operating system supports shared libraries. Shared library
248  library support relies on the "libtool" script which is built as part of the  support relies on the "libtool" script which is built as part of the
249  "configure" process.  "configure" process.
250    
251  The libtool script is used to compile and link both shared and static  The libtool script is used to compile and link both shared and static
# Line 218  order to cross-compile PCRE for some oth Line 274  order to cross-compile PCRE for some oth
274  process, the dftables.c source file is compiled *and run* on the local host, in  process, the dftables.c source file is compiled *and run* on the local host, in
275  order to generate the default character tables (the chartables.c file). It  order to generate the default character tables (the chartables.c file). It
276  therefore needs to be compiled with the local compiler, not the cross compiler.  therefore needs to be compiled with the local compiler, not the cross compiler.
277  You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD)  You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
278    there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
279  when calling the "configure" command. If they are not specified, they default  when calling the "configure" command. If they are not specified, they default
280  to the values of CC and CFLAGS.  to the values of CC and CFLAGS.
281    
282    
283    Using HP's ANSI C++ compiler (aCC)
284    ----------------------------------
285    
286    Unless C++ support is disabled by specifying the "--disable-cpp" option of the
287    "configure" script, you *must* include the "-AA" option in the CXXFLAGS
288    environment variable in order for the C++ components to compile correctly.
289    
290    Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
291    needed libraries fail to get included when specifying the "-AA" compiler
292    option. If you experience unresolved symbols when linking the C++ programs,
293    use the workaround of specifying the following environment variable prior to
294    running the "configure" script:
295    
296      CXXLDFLAGS="-lstd_v2 -lCsup_v2"
297    
298    
299  Building on non-Unix systems  Building on non-Unix systems
300  ----------------------------  ----------------------------
301    
# Line 232  PCRE in the same way as for Unix systems Line 305  PCRE in the same way as for Unix systems
305    
306  PCRE has been compiled on Windows systems and on Macintoshes, but I don't know  PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
307  the details because I don't use those systems. It should be straightforward to  the details because I don't use those systems. It should be straightforward to
308  build PCRE on any system that has a Standard C compiler, because it uses only  build PCRE on any system that has a Standard C compiler and library, because it
309  Standard C functions.  uses only Standard C functions.
310    
311    
312  Testing PCRE  Testing PCRE
313  ------------  ------------
314    
315  To test PCRE on a Unix system, run the RunTest script that is created by the  To test PCRE on a Unix system, run the RunTest script that is created by the
316  configuring process. (This can also be run by "make runtest", "make check", or  configuring process. There is also a script called RunGrepTest that tests the
317  "make test".) For other systems, see the instructions in NON-UNIX-USE.  options of the pcregrep command. If the C++ wrapper library is built, three
318    test programs called pcrecpp_unittest, pcre_scanner_unittest, and
319  The script runs the pcretest test program (which is documented in its own man  pcre_stringpiece_unittest are provided.
320  page) on each of the testinput files (in the testdata directory) in turn,  
321  and compares the output with the contents of the corresponding testoutput file.  Both the scripts and all the program tests are run if you obey "make runtest",
322  A file called testtry is used to hold the main output from pcretest  "make check", or "make test". For other systems, see the instructions in
323    NON-UNIX-USE.
324    
325    The RunTest script runs the pcretest test program (which is documented in its
326    own man page) on each of the testinput files (in the testdata directory) in
327    turn, and compares the output with the contents of the corresponding testoutput
328    files. A file called testtry is used to hold the main output from pcretest
329  (testsavedregex is also used as a working file). To run pcretest on just one of  (testsavedregex is also used as a working file). To run pcretest on just one of
330  the test files, give its number as an argument to RunTest, for example:  the test files, give its number as an argument to RunTest, for example:
331    
332    RunTest 2    RunTest 2
333    
334  The first file can also be fed directly into the perltest script to check that  The first test file can also be fed directly into the perltest script to check
335  Perl gives the same results. The only difference you should see is in the first  that Perl gives the same results. The only difference you should see is in the
336  few lines, where the Perl version is given instead of the PCRE version.  first few lines, where the Perl version is given instead of the PCRE version.
337    
338  The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),  The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
339  pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error  pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
# Line 294  commented in the script, can be be used. Line 373  commented in the script, can be be used.
373  The fifth test checks error handling with UTF-8 encoding, and internal UTF-8  The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
374  features of PCRE that are not relevant to Perl.  features of PCRE that are not relevant to Perl.
375    
376  The sixth and final test checks the support for Unicode character properties.  The sixth test checks the support for Unicode character properties. It is
377  It is not run automatically unless PCRE is built with Unicode property support.  not run automatically unless PCRE is built with Unicode property support. To do
378  To do this you must set --enable-unicode-properties when running "configure".  this you must set --enable-unicode-properties when running "configure".
379    
380    The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
381    matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
382    property support, respectively. The eighth and ninth tests are not run
383    automatically unless PCRE is built with the relevant support.
384    
385    
386  Character tables  Character tables
# Line 348  The distribution should contain the foll Line 432  The distribution should contain the foll
432    
433    dftables.c            auxiliary program for building chartables.c    dftables.c            auxiliary program for building chartables.c
434    
   get.c                 )  
   maketables.c          )  
   study.c               ) source of the functions  
   pcre.c                )   in the library  
435    pcreposix.c           )    pcreposix.c           )
436    printint.c            )    pcre_compile.c        )
437      pcre_config.c         )
438    ucp.c                 )    pcre_dfa_exec.c       )
439    ucp.h                 ) source for the code that is used for    pcre_exec.c           )
440    ucpinternal.h         )   Unicode property handling    pcre_fullinfo.c       )
441      pcre_get.c            ) sources for the functions in the library,
442      pcre_globals.c        )   and some internal functions that they use
443      pcre_info.c           )
444      pcre_maketables.c     )
445      pcre_newline.c        )
446      pcre_ord2utf8.c       )
447      pcre_refcount.c       )
448      pcre_study.c          )
449      pcre_tables.c         )
450      pcre_try_flipped.c    )
451      pcre_ucp_searchfuncs.c)
452      pcre_valid_utf8.c     )
453      pcre_version.c        )
454      pcre_xclass.c         )
455    ucptable.c            )    ucptable.c            )
   ucptypetable.c        )  
456    
457    pcre.in               "source" for the header for the external API; pcre.h    pcre_printint.src     ) debugging function that is #included in pcretest, and
458                            is built from this by "configure"                          )   can also be #included in pcre_compile()
459    
460      pcre.h                the public PCRE header file
461    pcreposix.h           header for the external POSIX wrapper API    pcreposix.h           header for the external POSIX wrapper API
462    internal.h            header for internal use    pcre_internal.h       header for internal use
463      ucp.h                 ) headers concerned with
464      ucpinternal.h         )   Unicode property handling
465    config.in             template for config.h, which is built by configure    config.in             template for config.h, which is built by configure
466    
467      pcrecpp.h             the header file for the C++ wrapper
468      pcrecpparg.h.in       "source" for another C++ header file
469      pcrecpp.cc            )
470      pcre_scanner.cc       ) source for the C++ wrapper library
471    
472      pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the
473                              C++ stringpiece functions
474      pcre_stringpiece.cc   source for the C++ stringpiece functions
475    
476  (B) Auxiliary files:  (B) Auxiliary files:
477    
478    AUTHORS               information about the author of PCRE    AUTHORS               information about the author of PCRE
# Line 379  The distribution should contain the foll Line 485  The distribution should contain the foll
485    NON-UNIX-USE          notes on building PCRE on non-Unix systems    NON-UNIX-USE          notes on building PCRE on non-Unix systems
486    README                this file    README                this file
487    RunTest.in            template for a Unix shell script for running tests    RunTest.in            template for a Unix shell script for running tests
488      RunGrepTest.in        template for a Unix shell script for pcregrep tests
489    config.guess          ) files used by libtool,    config.guess          ) files used by libtool,
490    config.sub            )   used only when building a shared library    config.sub            )   used only when building a shared library
491      config.h.in           "source" for the config.h header file
492    configure             a configuring shell script (built by autoconf)    configure             a configuring shell script (built by autoconf)
493    configure.in          the autoconf input used to build configure    configure.ac          the autoconf input used to build configure
494    doc/Tech.Notes        notes on the encoding    doc/Tech.Notes        notes on the encoding
495    doc/*.3               man page sources for the PCRE functions    doc/*.3               man page sources for the PCRE functions
496    doc/*.1               man page sources for pcregrep and pcretest    doc/*.1               man page sources for pcregrep and pcretest
# Line 399  The distribution should contain the foll Line 507  The distribution should contain the foll
507    perltest              Perl test program    perltest              Perl test program
508    pcregrep.c            source of a grep utility that uses PCRE    pcregrep.c            source of a grep utility that uses PCRE
509    pcre-config.in        source of script which retains PCRE information    pcre-config.in        source of script which retains PCRE information
510    testdata/testinput1   test data, compatible with Perl    pcrecpp_unittest.c           )
511    testdata/testinput2   test data for error messages and non-Perl things    pcre_scanner_unittest.c      ) test programs for the C++ wrapper
512    testdata/testinput3   test data for locale-specific tests    pcre_stringpiece_unittest.c  )
513    testdata/testinput4   test data for UTF-8 tests compatible with Perl    testdata/testinput*   test data for main library tests
514    testdata/testinput5   test data for other UTF-8 tests    testdata/testoutput*  expected test results
515    testdata/testinput6   test data for Unicode property support tests    testdata/grep*        input and output for pcregrep tests
   testdata/testoutput1  test results corresponding to testinput1  
   testdata/testoutput2  test results corresponding to testinput2  
   testdata/testoutput3  test results corresponding to testinput3  
   testdata/testoutput4  test results corresponding to testinput4  
   testdata/testoutput5  test results corresponding to testinput5  
   testdata/testoutput6  test results corresponding to testinput6  
516    
517  (C) Auxiliary files for Win32 DLL  (C) Auxiliary files for Win32 DLL
518    
   dll.mk  
519    libpcre.def    libpcre.def
520    libpcreposix.def    libpcreposix.def
   pcre.def  
521    
522  (D) Auxiliary file for VPASCAL  (D) Auxiliary file for VPASCAL
523    
524    makevp.bat    makevp.bat
525    
526  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel
527  September 2004  Email local part: ph10
528    Email domain: cam.ac.uk
529    November 2006
