Diff of /code/trunk/README

revision 73 by nigel, Sat Feb 24 21:40:30 2007 UTC revision 83 by nigel, Sat Feb 24 21:41:06 2007 UTC
# Line 7  The latest release of PCRE is always ava Line 7  The latest release of PCRE is always ava
7    
8  Please read the NEWS file if you are upgrading from a previous release.  Please read the NEWS file if you are upgrading from a previous release.
9    
10  PCRE has its own native API, but a set of "wrapper" functions that are based on  
11  the POSIX API are also supplied in the library libpcreposix. Note that this  The PCRE APIs
12  just provides a POSIX calling interface to PCRE: the regular expressions  -------------
13  themselves still follow Perl syntax and semantics. The header file  
14  for the POSIX-style functions is called pcreposix.h. The official POSIX name is  PCRE is written in C, and it has its own API. The distribution now includes a
15  regex.h, but I didn't want to risk possible problems with existing files of  set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
16  that name by distributing it that way. To use it with an existing program that  for details).
17  uses the POSIX API, it will have to be renamed or pointed at by a link.  
18    Also included are a set of C wrapper functions that are based on the POSIX
19    API. These end up in the library called libpcreposix. Note that this just
20    provides a POSIX calling interface to PCRE: the regular expressions themselves
21    still follow Perl syntax and semantics. The header file for the POSIX-style
22    functions is called pcreposix.h. The official POSIX name is regex.h, but I
23    didn't want to risk possible problems with existing files of that name by
24    distributing it that way. To use it with an existing program that uses the
25    POSIX API, it will have to be renamed or pointed at by a link.
26    
27  If you are using the POSIX interface to PCRE and there is already a POSIX regex  If you are using the POSIX interface to PCRE and there is already a POSIX regex
28  library installed on your system, you must take care when linking programs to  library installed on your system, you must take care when linking programs to
# Line 22  ensure that they link with PCRE's libpcr Line 30  ensure that they link with PCRE's libpcr
30  up the "real" POSIX functions of the same name.  up the "real" POSIX functions of the same name.
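
One way to have pcreposix.h "pointed at by a link", as described above, is to keep a private include directory that holds a regex.h symbolic link. The following is a minimal sketch only: the function name make_regex_shim and both directory arguments are hypothetical and are not part of the PCRE distribution.

```shell
#!/bin/sh
# Hypothetical helper: make_regex_shim INCLUDE_DIR SHIM_DIR
# Creates SHIM_DIR/regex.h as a symbolic link to INCLUDE_DIR/pcreposix.h, so
# that "#include <regex.h>" finds PCRE's POSIX-style header when SHIM_DIR is
# placed on the compiler's include path with -I.
# INCLUDE_DIR should be an absolute path, so that the link stays valid.
make_regex_shim() {
    include_dir=$1
    shim_dir=$2
    if [ ! -f "$include_dir/pcreposix.h" ]; then
        echo "pcreposix.h not found in $include_dir" >&2
        return 1
    fi
    mkdir -p "$shim_dir" || return 1
    ln -sf "$include_dir/pcreposix.h" "$shim_dir/regex.h"
}
```

A program could then be compiled with the shim directory given via -I and linked with -lpcreposix -lpcre, so that PCRE's functions are found before any system regex library.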
31    
32    
33    Documentation for PCRE
34    ----------------------
35    
36    If you install PCRE in the normal way, you will end up with an installed set of
37    man pages whose names all start with "pcre". The one that is called "pcre"
38    lists all the others. In addition to these man pages, the PCRE documentation is
39    supplied in two other forms; however, as there is no standard place to install
40    them, they are left in the doc directory of the unpacked source distribution.
41    These forms are:
42    
43      1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
44         first of these is a concatenation of the text forms of all the section 3
45         man pages except those that summarize individual functions. The other two
46         are the text forms of the section 1 man pages for the pcregrep and
47         pcretest commands. Text forms are provided for ease of scanning with text
48         editors or similar tools.
49    
50      2. A subdirectory called doc/html contains all the documentation in HTML
51         form, hyperlinked in various ways, and rooted in a file called
52         doc/index.html.
53    
54    
55  Contributions by users of PCRE  Contributions by users of PCRE
56  ------------------------------  ------------------------------
57    
# Line 46  INSTALL. Line 76  INSTALL.
76    
77  Most commonly, people build PCRE within its own distribution directory, and in  Most commonly, people build PCRE within its own distribution directory, and in
78  this case, on many systems, just running "./configure" is sufficient, but the  this case, on many systems, just running "./configure" is sufficient, but the
79  usual methods of changing standard defaults are available. For example,  usual methods of changing standard defaults are available. For example:
80    
81  CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local  CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
82    
# Line 64  cd /build/pcre/pcre-xxx Line 94  cd /build/pcre/pcre-xxx
94  There are some optional features that can be included or omitted from the PCRE  There are some optional features that can be included or omitted from the PCRE
95  library. You can read more about them in the pcrebuild man page.  library. You can read more about them in the pcrebuild man page.
96    
97    . If you want to suppress the building of the C++ wrapper library, you can add
98      --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
99      it will try to find a C++ compiler and C++ header files, and if it succeeds, it
100      will try to build the C++ wrapper.
101    
102  . If you want to make use of the support for UTF-8 character strings in PCRE,  . If you want to make use of the support for UTF-8 character strings in PCRE,
103    you must add --enable-utf8 to the "configure" command. Without it, the code    you must add --enable-utf8 to the "configure" command. Without it, the code
104    for handling UTF-8 is not included in the library. (Even when included, it    for handling UTF-8 is not included in the library. (Even when included, it
105    still has to be enabled by an option at run time.)    still has to be enabled by an option at run time.)
106    
107    . If, in addition to support for UTF-8 character strings, you want to include
108      support for the \P, \p, and \X sequences that recognize Unicode character
109      properties, you must add --enable-unicode-properties to the "configure"
110      command. This adds about 90K to the size of the library (in the form of a
111      property table); only the basic two-letter properties such as Lu are
112      supported.
113    
114  . You can build PCRE to recognize CR or NL as the newline character, instead  . You can build PCRE to recognize CR or NL as the newline character, instead
115    of whatever your compiler uses for "\n", by adding --newline-is-cr or    of whatever your compiler uses for "\n", by adding --newline-is-cr or
116    --newline-is-nl to the "configure" command, respectively. Only do this if you    --newline-is-nl to the "configure" command, respectively. Only do this if you
# Line 83  library. You can read more about them in Line 125  library. You can read more about them in
125    
126    on the "configure" command.    on the "configure" command.
127    
128  . PCRE has a counter which can be set to limit the amount of resources it uses.  . PCRE has a counter that can be set to limit the amount of resources it uses.
129    If the limit is exceeded during a match, the match fails. The default is ten    If the limit is exceeded during a match, the match fails. The default is ten
130    million. You can change the default by setting, for example,    million. You can change the default by setting, for example,
131    
# Line 101  library. You can read more about them in Line 143  library. You can read more about them in
143    is a representation of the compiled pattern, and this changes with the link    is a representation of the compiled pattern, and this changes with the link
144    size.    size.
145    
146  . You can build PCRE so that its match() function does not call itself  . You can build PCRE so that its internal match() function that is called from
147    recursively. Instead, it uses blocks of data from the heap via special    pcre_exec() does not call itself recursively. Instead, it uses blocks of data
148    functions pcre_stack_malloc() and pcre_stack_free() to save data that would    from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
149    otherwise be saved on the stack. To build PCRE like this, use    to save data that would otherwise be saved on the stack. To build PCRE like
150      this, use
151    
152    --disable-stack-for-recursion    --disable-stack-for-recursion
153    
154    on the "configure" command. PCRE runs more slowly in this mode, but it may be    on the "configure" command. PCRE runs more slowly in this mode, but it may be
155    necessary in environments with limited stack sizes.    necessary in environments with limited stack sizes. This applies only to the
156      pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
157      use deeply nested recursion.
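
The optional features listed above can be combined on one "configure" command line. The sketch below only assembles and prints such a command; it does not run it. The function name pcre_configure_cmd and the short feature names (utf8, ucp, nocpp, heap) are invented for illustration, while the --enable and --disable options are the ones described in the list.

```shell
#!/bin/sh
# Hypothetical helper: prints a "configure" command line for a chosen set of
# optional features. The feature keywords are invented for this sketch.
pcre_configure_cmd() {
    cmd="./configure"
    for feature in "$@"; do
        case $feature in
            utf8)  cmd="$cmd --enable-utf8" ;;
            ucp)   cmd="$cmd --enable-unicode-properties" ;;
            nocpp) cmd="$cmd --disable-cpp" ;;
            heap)  cmd="$cmd --disable-stack-for-recursion" ;;
            *)     echo "unknown feature: $feature" >&2; return 1 ;;
        esac
    done
    echo "$cmd"
}

# A UTF-8 build without the C++ wrapper.
# prints: ./configure --enable-utf8 --disable-cpp
pcre_configure_cmd utf8 nocpp
```

Because Unicode property support is described as an addition to UTF-8 support, a call such as "pcre_configure_cmd utf8 ucp" passes both keywords.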
158    
159    The "configure" script builds eight files for the basic C library:
160    
161    . pcre.h is the header file for C programs that call PCRE
162    . Makefile is the makefile that builds the library
163    . config.h contains build-time configuration options for the library
164    . pcre-config is a script that shows the settings of "configure" options
165    . libpcre.pc is data for the pkg-config command
166    . libtool is a script that builds shared and/or static libraries
167    . RunTest is a script for running tests on the library
168    . RunGrepTest is a script for running tests on the pcregrep command
169    
170  The "configure" script builds five files:  In addition, if a C++ compiler is found, the following are also built:
171    
172  . libtool is a script that builds shared and/or static libraries  . pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
173  . Makefile is built by copying Makefile.in and making substitutions.  . pcre_stringpiece.h is the header for the C++ "stringpiece" functions
174  . config.h is built by copying config.in and making substitutions.  
175  . pcre-config is built by copying pcre-config.in and making substitutions.  The "configure" script also creates config.status, which is an executable
176  . RunTest is a script for running tests  script that can be run to recreate the configuration, and config.log, which
177    contains compiler output from tests that "configure" runs.
178    
179  Once "configure" has run, you can run "make". It builds two libraries called  Once "configure" has run, you can run "make". It builds two libraries, called
180  libpcre and libpcreposix, a test program called pcretest, and the pcregrep  libpcre and libpcreposix, a test program called pcretest, and the pcregrep
181  command. You can use "make install" to copy these, the public header files  command. If a C++ compiler was found on your system, it also builds the C++
182  pcre.h and pcreposix.h, and the man pages to appropriate live directories on  wrapper library, which is called libpcrecpp, and some test programs called
183  your system, in the normal way.  pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
184    
185    The command "make test" runs all the appropriate tests. Details of the PCRE
186    tests are given in a separate section of this document, below.
187    
188    You can use "make install" to copy the libraries, the public header files
189    pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
190    the C++ wrapper was built), and the man pages to appropriate live directories
191    on your system, in the normal way.
192    
193    If you want to remove PCRE from your system, you can run "make uninstall".
194    This removes all the files that "make install" installed. However, it does not
195    remove any directories, because these are often shared with other programs.
196    
197    
198    Retrieving configuration information on Unix-like systems
199    ---------------------------------------------------------
200    
201  Running "make install" also installs the command pcre-config, which can be used  Running "make install" also installs the command pcre-config, which can be used
202  to recall information about the PCRE configuration and installation. For  to recall information about the PCRE configuration and installation. For
203  example,  example:
204    
205    pcre-config --version    pcre-config --version
206    
207  prints the version number, and  prints the version number, and
208    
209   pcre-config --libs    pcre-config --libs
210    
211  outputs information about where the library is installed. This command can be  outputs information about where the library is installed. This command can be
212  included in makefiles for programs that use PCRE, saving the programmer from  included in makefiles for programs that use PCRE, saving the programmer from
213  having to remember too many details.  having to remember too many details.
214    
215    The pkg-config command is another system for saving and retrieving information
216    about installed libraries. Instead of separate commands for each library, a
217    single command is used. For example:
218    
219      pkg-config --cflags pcre
220    
221    The data is held in *.pc files that are installed in a directory called
222    pkgconfig.
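
A build script can try both mechanisms in turn. The following sketch assumes that at least one of pcre-config or pkg-config may be on the PATH, and that the pkg-config package name is "pcre" as in the example above; the function name pcre_link_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints linker flags for PCRE, preferring pcre-config
# and falling back to pkg-config. Prints nothing and returns 1 if neither
# command is available.
pcre_link_flags() {
    if command -v pcre-config >/dev/null 2>&1; then
        pcre-config --libs
    elif command -v pkg-config >/dev/null 2>&1; then
        pkg-config --libs pcre
    else
        return 1
    fi
}
```

In a makefile the same idea is usually written inline, by passing the output of one of these commands to the link step.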
223    
224    
225  Shared libraries on Unix-like systems  Shared libraries on Unix-like systems
226  -------------------------------------  -------------------------------------
227    
228  The default distribution builds PCRE as two shared libraries and two static  The default distribution builds PCRE as shared libraries and static libraries,
229  libraries, as long as the operating system supports shared libraries. Shared  as long as the operating system supports shared libraries. Shared library
230  library support relies on the "libtool" script which is built as part of the  support relies on the "libtool" script which is built as part of the
231  "configure" process.  "configure" process.
232    
233  The libtool script is used to compile and link both shared and static  The libtool script is used to compile and link both shared and static
# Line 158  installed themselves. However, the versi Line 240  installed themselves. However, the versi
240  use the uninstalled libraries.  use the uninstalled libraries.
241    
242  To build PCRE using static libraries only you must use --disable-shared when  To build PCRE using static libraries only you must use --disable-shared when
243  configuring it. For example  configuring it. For example:
244    
245  ./configure --prefix=/usr/gnu --disable-shared  ./configure --prefix=/usr/gnu --disable-shared
246    
# Line 174  order to cross-compile PCRE for some oth Line 256  order to cross-compile PCRE for some oth
256  process, the dftables.c source file is compiled *and run* on the local host, in  process, the dftables.c source file is compiled *and run* on the local host, in
257  order to generate the default character tables (the chartables.c file). It  order to generate the default character tables (the chartables.c file). It
258  therefore needs to be compiled with the local compiler, not the cross compiler.  therefore needs to be compiled with the local compiler, not the cross compiler.
259  You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD)  You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
260    there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
261  when calling the "configure" command. If they are not specified, they default  when calling the "configure" command. If they are not specified, they default
262  to the values of CC and CFLAGS.  to the values of CC and CFLAGS.
263    
# Line 196  Testing PCRE Line 279  Testing PCRE
279  ------------  ------------
280    
281  To test PCRE on a Unix system, run the RunTest script that is created by the  To test PCRE on a Unix system, run the RunTest script that is created by the
282  configuring process. (This can also be run by "make runtest", "make check", or  configuring process. There is also a script called RunGrepTest that tests the
283  "make test".) For other systems, see the instructions in NON-UNIX-USE.  options of the pcregrep command. If the C++ wrapper library is built, three
284    test programs called pcrecpp_unittest, pcre_scanner_unittest, and
285  The script runs the pcretest test program (which is documented in its own man  pcre_stringpiece_unittest are provided.
286  page) on each of the testinput files (in the testdata directory) in turn,  
287  and compares the output with the contents of the corresponding testoutput file.  Both the scripts and all the program tests are run if you obey "make runtest",
288  A file called testtry is used to hold the output from pcretest. To run pcretest  "make check", or "make test". For other systems, see the instructions in
289  on just one of the test files, give its number as an argument to RunTest, for  NON-UNIX-USE.
290  example:  
291    The RunTest script runs the pcretest test program (which is documented in its
292    own man page) on each of the testinput files (in the testdata directory) in
293    turn, and compares the output with the contents of the corresponding testoutput
294    file. A file called testtry is used to hold the main output from pcretest
295    (testsavedregex is also used as a working file). To run pcretest on just one of
296    the test files, give its number as an argument to RunTest, for example:
297    
298    RunTest 2    RunTest 2
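
The compare step described above can be illustrated as follows. This is not the code of the RunTest script itself, only a sketch of the idea: the function name run_one_test is hypothetical, and the test program is assumed to take an input file and an output file as its two arguments.

```shell
#!/bin/sh
# Hypothetical helper: run_one_test PROGRAM NUMBER [TESTDATA_DIR]
# Runs PROGRAM on testinput<NUMBER>, writing its output to the working file
# "testtry" in the current directory, then compares testtry with
# testoutput<NUMBER>. TESTDATA_DIR defaults to "testdata".
run_one_test() {
    prog=$1
    num=$2
    dir=${3:-testdata}
    "$prog" "$dir/testinput$num" testtry || return 1
    if cmp -s "$dir/testoutput$num" testtry; then
        echo "Test $num: OK"
    else
        echo "Test $num: output differs from $dir/testoutput$num"
        return 1
    fi
}
```

With a built pcretest in the current directory, a call might look like "run_one_test ./pcretest 2".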
299    
# Line 247  running "configure". This file can be al Line 336  running "configure". This file can be al
336  provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,  provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
337  commented in the script, can be used.)  commented in the script, can be used.)
338    
339  The fifth and final file tests error handling with UTF-8 encoding, and internal  The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
340  UTF-8 features of PCRE that are not relevant to Perl.  features of PCRE that are not relevant to Perl.
341    
342    The sixth test checks the support for Unicode character properties. It is
343    not run automatically unless PCRE is built with Unicode property support. To do
344    this you must set --enable-unicode-properties when running "configure".
345    
346    The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
347    matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
348    property support, respectively. The eighth and ninth tests are not run
349    automatically unless PCRE is built with the relevant support.
350    
351    
352  Character tables  Character tables
353  ----------------  ----------------
354    
355  PCRE uses four tables for manipulating and identifying characters. The final  PCRE uses four tables for manipulating and identifying characters whose values
356  argument of the pcre_compile() function is a pointer to a block of memory  are less than 256. The final argument of the pcre_compile() function is a
357  containing the concatenated tables. A call to pcre_maketables() can be used to  pointer to a block of memory containing the concatenated tables. A call to
358  generate a set of tables in the current locale. If the final argument for  pcre_maketables() can be used to generate a set of tables in the current
359  pcre_compile() is passed as NULL, a set of default tables that is built into  locale. If the final argument for pcre_compile() is passed as NULL, a set of
360  the binary is used.  default tables that is built into the binary is used.
361    
362  The source file called chartables.c contains the default set of tables. This is  The source file called chartables.c contains the default set of tables. This is
363  not supplied in the distribution, but is built by the program dftables  not supplied in the distribution, but is built by the program dftables
# Line 299  The distribution should contain the foll Line 397  The distribution should contain the foll
397      headers:      headers:
398    
399    dftables.c            auxiliary program for building chartables.c    dftables.c            auxiliary program for building chartables.c
400    get.c                 )  
   maketables.c          )  
   study.c               ) source of  
   pcre.c                )   the functions  
401    pcreposix.c           )    pcreposix.c           )
402    printint.c            )    pcre_compile.c        )
403      pcre_config.c         )
404      pcre_dfa_exec.c       )
405      pcre_exec.c           )
406      pcre_fullinfo.c       )
407      pcre_get.c            ) sources for the functions in the library,
408      pcre_globals.c        )   and some internal functions that they use
409      pcre_info.c           )
410      pcre_maketables.c     )
411      pcre_ord2utf8.c       )
412      pcre_printint.c       )
413      pcre_study.c          )
414      pcre_tables.c         )
415      pcre_try_flipped.c    )
416      pcre_ucp_findchar.c   )
417      pcre_valid_utf8.c     )
418      pcre_version.c        )
419      pcre_xclass.c         )
420    
421      ucp_findchar.c        )
422      ucp.h                 ) source for the code that is used for
423      ucpinternal.h         )   Unicode property handling
424      ucptable.c            )
425      ucptypetable.c        )
426    
427    pcre.in               "source" for the header for the external API; pcre.h    pcre.in               "source" for the header for the external API; pcre.h
428                            is built from this by "configure"                            is built from this by "configure"
429    pcreposix.h           header for the external POSIX wrapper API    pcreposix.h           header for the external POSIX wrapper API
430    internal.h            header for internal use    pcre_internal.h       header for internal use
431    config.in             template for config.h, which is built by configure    config.in             template for config.h, which is built by configure
432    
433      pcrecpp.h.in          "source" for the header file for the C++ wrapper
434      pcrecpp.cc            )
435      pcre_scanner.cc       ) source for the C++ wrapper library
436    
437      pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the
438                              C++ stringpiece functions
439      pcre_stringpiece.cc   source for the C++ stringpiece functions
440    
441  (B) Auxiliary files:  (B) Auxiliary files:
442    
443    AUTHORS               information about the author of PCRE    AUTHORS               information about the author of PCRE
# Line 323  The distribution should contain the foll Line 450  The distribution should contain the foll
450    NON-UNIX-USE          notes on building PCRE on non-Unix systems    NON-UNIX-USE          notes on building PCRE on non-Unix systems
451    README                this file    README                this file
452    RunTest.in            template for a Unix shell script for running tests    RunTest.in            template for a Unix shell script for running tests
453      RunGrepTest.in        template for a Unix shell script for pcregrep tests
454    config.guess          ) files used by libtool,    config.guess          ) files used by libtool,
455    config.sub            )   used only when building a shared library    config.sub            )   used only when building a shared library
456    configure             a configuring shell script (built by autoconf)    configure             a configuring shell script (built by autoconf)
# Line 335  The distribution should contain the foll Line 463  The distribution should contain the foll
463    doc/pcretest.txt      plain text documentation of test program    doc/pcretest.txt      plain text documentation of test program
464    doc/perltest.txt      plain text documentation of Perl test program    doc/perltest.txt      plain text documentation of Perl test program
465    install-sh            a shell script for installing files    install-sh            a shell script for installing files
466      libpcre.pc.in         "source" for libpcre.pc for pkg-config
467    ltmain.sh             file used to build a libtool script    ltmain.sh             file used to build a libtool script
468      mkinstalldirs         script for making install directories
469    pcretest.c            comprehensive test program    pcretest.c            comprehensive test program
470    pcredemo.c            simple demonstration of coding calls to PCRE    pcredemo.c            simple demonstration of coding calls to PCRE
471    perltest              Perl test program    perltest              Perl test program
472    pcregrep.c            source of a grep utility that uses PCRE    pcregrep.c            source of a grep utility that uses PCRE
473    pcre-config.in        source of script which retains PCRE information    pcre-config.in        source of script which retains PCRE information
474    testdata/testinput1   test data, compatible with Perl    pcrecpp_unittest.c           )
475    testdata/testinput2   test data for error messages and non-Perl things    pcre_scanner_unittest.c      ) test programs for the C++ wrapper
476    testdata/testinput3   test data for locale-specific tests    pcre_stringpiece_unittest.c  )
477    testdata/testinput4   test data for UTF-8 tests compatible with Perl    testdata/testinput*   test data for main library tests
478    testdata/testinput5   test data for other UTF-8 tests    testdata/testoutput*  expected test results
479    testdata/testoutput1  test results corresponding to testinput1    testdata/grep*        input and output for pcregrep tests
   testdata/testoutput2  test results corresponding to testinput2  
   testdata/testoutput3  test results corresponding to testinput3  
   testdata/testoutput4  test results corresponding to testinput4  
   testdata/testoutput5  test results corresponding to testinput5  
480    
481  (C) Auxiliary files for Win32 DLL  (C) Auxiliary files for Win32 DLL
482    
483    dll.mk    libpcre.def
484      libpcreposix.def
485    pcre.def    pcre.def
486    
487  (D) Auxiliary file for VPASCAL  (D) Auxiliary file for VPASCAL
488    
489    makevp.bat    makevp.bat
490    
491  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel
492  December 2003  Email local part: ph10
493    Email domain: cam.ac.uk
494    August 2005
