Diff of /[pcre]/code/trunk/README

Revision 840 by ph10 (Fri Dec 30 19:32:50 2011 UTC) compared with revision 936 by ph10 (Sat Feb 25 17:02:23 2012 UTC)

# Line 34 (r840) / Line 34 (r936) | The contents of this README file are:
 The PCRE APIs
 -------------

 PCRE is written in C, and it has its own API. There are two sets of functions,
 one for the 8-bit library, which processes strings of bytes, and one for the
 16-bit library, which processes strings of 16-bit values. The distribution also
 includes a set of C++ wrapper functions (see the pcrecpp man page for details),
 courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
 C++.

 In addition, there is a set of C wrapper functions (again, just for the 8-bit
 library) that are based on the POSIX regular expression API (see the pcreposix
 man page). These end up in the library called libpcreposix. Note that this just
 provides a POSIX calling interface to PCRE; the regular expressions themselves
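
As an illustration of the POSIX calling interface described above, here is a minimal C sketch. It uses the regcomp(), regexec() and regfree() calls from the system's <regex.h>, so it compiles without PCRE installed. The assumption is that a program using the PCRE wrapper would include pcreposix.h instead and link with libpcreposix (and libpcre). The system library treats patterns as POSIX extended regular expressions rather than PCRE syntax, so the example pattern avoids anything PCRE-specific. The helper name first_group() is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the text matched by capture group 1 of
   `pattern` in `subject` into `out` (NUL-terminated).
   Returns 1 on a match, 0 on no match, and -1 if the pattern fails to
   compile, group 1 did not participate, or `out` is too small. */
int first_group(const char *pattern, const char *subject,
                char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[2];            /* m[0] = whole match, m[1] = group 1 */
    int rc;

    if (outsize == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;              /* regerror() could describe the failure */

    rc = regexec(&re, subject, 2, m, 0);
    regfree(&re);               /* m[] is our own array, still valid */
    if (rc == REG_NOMATCH)
        return 0;
    if (rc != 0 || m[1].rm_so < 0)
        return -1;

    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= outsize)
        return -1;
    memcpy(out, subject + m[1].rm_so, len);
    out[len] = '\0';
    return 1;
}
```

For example, first_group("([0-9]+)-([0-9]+)", "call 555-1234 now", buf, sizeof buf) should return 1 and leave "555" in buf.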

# Line 171 (r840) / Line 171 (r936) | library. They are also documented in the
   --disable-static

   (See also "Shared libraries on Unix-like systems" below.)

 . By default, only the 8-bit library is built. If you add --enable-pcre16 to
   the "configure" command, the 16-bit library is also built. If you want only
   the 16-bit library, use "./configure --enable-pcre16 --disable-pcre8".

 . If you are building the 8-bit library and want to suppress the building of
   the C++ wrapper library, you can add --disable-cpp to the "configure"
   command. Otherwise, when "configure" is run without --disable-pcre8, it will
   try to find a C++ compiler and C++ header files, and if it succeeds, it will
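
The library-selection rules above can be modelled as a small decision function. This is a sketch under stated assumptions, not part of PCRE or its configure script. The struct and function names are hypothetical. The text is cut off at this point in the diff, so the rule that the C++ wrapper is built whenever a C++ compiler and headers are found is also an assumption.

```c
#include <stdbool.h>

/* Hypothetical model of the configure choices described above. */
struct build_request {
    bool enable_pcre16;   /* --enable-pcre16 given */
    bool disable_pcre8;   /* --disable-pcre8 given */
    bool disable_cpp;     /* --disable-cpp given */
    bool have_cxx;        /* a C++ compiler and headers were found */
};

struct build_result {
    bool lib8;            /* 8-bit library */
    bool lib16;           /* 16-bit library */
    bool cpp;             /* C++ wrapper library */
};

struct build_result plan_build(struct build_request r)
{
    struct build_result out;

    out.lib8 = !r.disable_pcre8;   /* 8-bit is built by default */
    out.lib16 = r.enable_pcre16;   /* 16-bit only on request */
    /* The C++ wrapper calls the 8-bit library, so (in this sketch) it
       needs that library, must not be suppressed, and needs a C++
       toolchain. */
    out.cpp = out.lib8 && !r.disable_cpp && r.have_cxx;
    return out;
}
```

With no options and a working C++ compiler, the sketch plans the 8-bit library plus the C++ wrapper. "--enable-pcre16 --disable-pcre8" yields only the 16-bit library.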

# Line 195 (r840) / Line 195 (r936) | library. They are also documented in the
   the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
   you must add --enable-utf to the "configure" command. Without it, the code
   for handling UTF-8 and UTF-16 is not included in the relevant library. Even
-  when --enable-utf included, the use of UTF encoding still has to be enabled
-  by an option at run time. When PCRE is compiled with this option, its input
-  can only either be ASCII or UTF-8/16, even when running on EBCDIC platforms.
-  It is not possible to use both --enable-utf and --enable-ebcdic at the same
-  time.
+  when --enable-utf is included, the use of a UTF encoding still has to be
+  enabled by an option at run time. When PCRE is compiled with this option, its
+  input can only either be ASCII or UTF-8/16, even when running on EBCDIC
+  platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
+  the same time.

-. The option --enable-utf8 is retained for backwards compatibility with earlier
-  releases that did not support 16-bit character strings. It is synonymous with
-  --enable-utf. It is not possible to configure one library with UTF support
-  and the other without in the same configuration.
+. There are no separate options for enabling UTF-8 and UTF-16 independently
+  because that would allow ridiculous settings such as requesting UTF-16
+  support while building only the 8-bit library. However, the option
+  --enable-utf8 is retained for backwards compatibility with earlier releases
+  that did not support 16-bit character strings. It is synonymous with
+  --enable-utf. It is not possible to configure one library with UTF support
+  and the other without in the same configuration.

 . If, in addition to support for UTF-8/16 character strings, you want to
   include support for the \P, \p, and \X sequences that recognize Unicode
   character properties, you must add --enable-unicode-properties to the
   "configure" command. This adds about 30K to the size of the library (in the
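
The UTF rules in this hunk can be expressed as code:

- one switch covers both UTF-8 and UTF-16;
- --enable-utf8 is accepted as a synonym;
- --enable-utf cannot be combined with --enable-ebcdic;
- a UTF mode is in effect only when it was compiled in and also requested at run time.

The sketch below is a hypothetical model of those rules. None of its identifiers come from PCRE, and it does not cover --enable-unicode-properties.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the UTF-related configure rules. */
struct utf_options {
    bool utf;       /* --enable-utf, or its synonym --enable-utf8 */
    bool ebcdic;    /* --enable-ebcdic */
};

/* Record one option; returns false for options this sketch ignores. */
bool apply_option(struct utf_options *o, const char *opt)
{
    if (strcmp(opt, "--enable-utf") == 0 || strcmp(opt, "--enable-utf8") == 0) {
        o->utf = true;          /* one switch for both UTF-8 and UTF-16 */
        return true;
    }
    if (strcmp(opt, "--enable-ebcdic") == 0) {
        o->ebcdic = true;
        return true;
    }
    return false;
}

/* Returns NULL if the combination is allowed, else an error message. */
const char *check_utf_options(const struct utf_options *o)
{
    if (o->utf && o->ebcdic)
        return "--enable-utf and --enable-ebcdic cannot be used together";
    return NULL;
}

/* UTF handling applies only if compiled in AND requested at run time. */
bool utf_mode_active(const struct utf_options *o, bool runtime_utf)
{
    return o->utf && runtime_utf;
}
```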

# Line 264 (r840) / Line 267 (r936) | library. They are also documented in the
   sizes in the pcrestack man page.

 . The default maximum compiled pattern size is around 64K. You can increase
   this by adding --with-link-size=3 to the "configure" command. In the 8-bit
   library, PCRE then uses three bytes instead of two for offsets to different
   parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
   the same as --with-link-size=4, which (in both libraries) uses four-byte
   offsets. Increasing the internal link size reduces performance.

 . You can build PCRE so that its internal match() function that is called from
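
The "around 64K" figure follows from the offset width: two bytes can hold at most 2^16 - 1 = 65535. The sketch below computes the largest value an offset of a given link size can represent. It also applies the rule that the 16-bit library treats a link size of 3 as 4. The helper names are hypothetical, and the limits PCRE actually enforces may be somewhat lower than these raw ranges.

```c
#include <stdint.h>

/* Hypothetical helpers; not part of PCRE. */

/* Bytes used per internal offset for a requested --with-link-size.
   In the 16-bit library a request for 3 is treated as 4. */
int effective_link_size(int requested, int code_unit_bits)
{
    if (code_unit_bits == 16 && requested == 3)
        return 4;
    return requested;
}

/* Largest value an offset of `link_size` bytes can hold: 2^(8n) - 1. */
uint64_t max_offset(int link_size)
{
    return (UINT64_C(1) << (8 * link_size)) - 1;
}
```

So link size 2 gives 65535, link size 3 (8-bit library only) gives 16777215, and link size 4 gives 4294967295.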

# Line 305 (r840) / Line 308 (r936) | library. They are also documented in the
   when PCRE is built this way, it always operates in EBCDIC. It cannot support
   both EBCDIC and UTF-8/16.

 . The pcregrep program currently supports only 8-bit data files, and so
   requires the 8-bit PCRE library. It is possible to compile pcregrep to use
   libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
   specifying one or both of
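
For pcregrep's compressed-file support, one plausible way to decide how to open each input is by file-name suffix. The sketch below assumes that rule. It treats a compressed name as plain data when the matching library support was not compiled in. The enum and function names are hypothetical, and pcregrep's real logic may differ.

```c
#include <string.h>

/* Hypothetical classification of input files by name suffix. */
enum input_kind { INPUT_PLAIN, INPUT_GZIP, INPUT_BZIP2 };

/* True if `name` is longer than `suffix` and ends with it. */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n > s && strcmp(name + n - s, suffix) == 0;
}

/* .gz needs libz support and .bz2 needs libbz2 support; anything else is
   read as plain data in this sketch. */
enum input_kind classify_input(const char *name, int have_libz, int have_libbz2)
{
    if (have_libz && has_suffix(name, ".gz"))
        return INPUT_GZIP;
    if (have_libbz2 && has_suffix(name, ".bz2"))
        return INPUT_BZIP2;
    return INPUT_PLAIN;
}
```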

# Line 323 (r840) / Line 326 (r936) | library. They are also documented in the
   The default value is 20K.

 . It is possible to compile pcretest so that it links with the libreadline
-  library, by specifying
+  or libedit libraries, by specifying, respectively,

-  --enable-pcretest-libreadline
+  --enable-pcretest-libreadline or --enable-pcretest-libedit

   If this is done, when pcretest's input is from a terminal, it reads it using
   the readline() function. This provides line-editing and history facilities.
   Note that libreadline is GPL-licenced, so if you distribute a binary of
-  pcretest linked in this way, there may be licensing issues.
+  pcretest linked in this way, there may be licensing issues. These can be
+  avoided by linking with libedit instead.

-  Setting this option causes the -lreadline option to be added to the pcretest
+  Enabling libreadline causes the -lreadline option to be added to the pcretest
   build. In many operating environments with a system-installed readline
   library this is sufficient. However, in some environments (e.g. if an
   unmodified distribution version of readline is in use), it may be necessary
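
The terminal-only use of readline() can be sketched as follows. The macro USE_READLINE_SKETCH is hypothetical; pcretest uses its own configuration macros. When the macro is defined and the program is linked with -lreadline, terminal input on stdin goes through readline() and add_history(). In every other case the function falls back to fgets(). libedit may need a different header path, so that variant is left as an assumption.

```c
#include <stdio.h>

#ifdef USE_READLINE_SKETCH      /* hypothetical; also link with -lreadline */
#include <stdlib.h>
#include <unistd.h>
#include <readline/readline.h>
#include <readline/history.h>
#endif

/* Read one line into buf (NUL-terminated, trailing newline kept).
   Returns buf, or NULL at end of input. Long lines are truncated. */
char *read_input_line(FILE *f, const char *prompt, char *buf, size_t size)
{
#ifdef USE_READLINE_SKETCH
    /* readline() reads stdin, so use it only for an interactive stdin. */
    if (f == stdin && isatty(fileno(stdin))) {
        char *line = readline(prompt);   /* malloc'd, no newline */
        if (line == NULL)
            return NULL;
        if (*line != '\0')
            add_history(line);           /* enables history recall */
        snprintf(buf, size, "%s\n", line);
        free(line);
        return buf;
    }
#else
    (void)prompt;                        /* unused without readline */
#endif
    /* Not a terminal, or no readline support: plain buffered input. */
    return fgets(buf, (int)size, f);
}
```

A caller would typically loop with while (read_input_line(stdin, "> ", buf, sizeof buf) != NULL).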

# Line 397 (r840) / Line 401 (r936) | system. The following are installed (fil
     pcre-config

   Libraries (lib):
     libpcre16     (if 16-bit support is enabled)
     libpcre       (if 8-bit support is enabled)
     libpcreposix  (if 8-bit support is enabled)
     libpcrecpp    (if 8-bit and C++ support is enabled)

   Configuration information (lib/pkgconfig):
     libpcre16.pc
     libpcre.pc
     libpcreposix.pc
     libpcrecpp.pc (if C++ support is enabled)

# Line 592 (r840) / Line 596 (r936) | tests that are marked "never study" (see
 done). If JIT support is available, the non-DFA tests are run a third time,
 this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.

 When both 8-bit and 16-bit support is enabled, the entire set of tests is run
 twice, once for each library. If you want to run just one set of tests, call
 RunTest with either the -8 or -16 option.

-RunTest uses a file called testtry to hold the main output from pcretest
-(testsavedregex is also used as a working file). To run pcretest on just one or
-more specific test files, give their numbers as arguments to RunTest, for
-example:
+RunTest uses a file called testtry to hold the main output from pcretest.
+Other files whose names begin with "test" are used as working files in some
+tests. To run pcretest on just one or more specific test files, give their
+numbers as arguments to RunTest, for example:

   RunTest 2 7 11

 The first test file can be fed directly into the perltest.pl script to check
 that Perl gives the same results. The only difference you should see is in the
 first few lines, where the Perl version is given instead of the PCRE version.
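
The effect of the -8 and -16 options on which libraries are tested can be summarised in a few lines of C. This is a hypothetical model, not RunTest itself, which is a shell script. How RunTest reacts to a flag naming a library that was not built is not described here; this sketch simply returns an empty set.

```c
#include <string.h>

/* Hypothetical bit flags for the libraries a test run covers. */
enum { RUN_8 = 1, RUN_16 = 2 };

/* flag is "-8", "-16", or NULL (no option). Returns a bit mask of the
   libraries whose test sets would run. */
int libraries_to_test(int have8, int have16, const char *flag)
{
    int available = (have8 ? RUN_8 : 0) | (have16 ? RUN_16 : 0);

    if (flag == NULL)
        return available;            /* every built library, in turn */
    if (strcmp(flag, "-8") == 0)
        return available & RUN_8;
    if (strcmp(flag, "-16") == 0)
        return available & RUN_16;
    return 0;                        /* unknown option: nothing here */
}
```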

# Line 658 (r840) / Line 662 (r936) | The twelfth test is run only when JIT su
 test is run only when JIT support is not available. They test some JIT-specific
 features such as information output from pcretest about JIT compilation.

 The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and
 the seventeenth, eighteenth, and nineteenth tests are run only in 16-bit mode.
 These are tests that generate different output in the two modes. They are for
 general cases, UTF-8/16 support, and Unicode property support, respectively.

 The twentieth test is run only in 16-bit mode. It tests some specific 16-bit
 features of the DFA matching engine.

+The twenty-first and twenty-second tests are run only in 16-bit mode, when the
+link size is set to 2. They test reloading pre-compiled patterns.
+
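
The per-test conditions listed in this part of the file can be collected into one predicate. The sketch is hypothetical. The condition for the twelfth test is partly cut off in this view and is assumed to be "JIT support available". Tests not described here are treated as always applicable, which is a simplification.

```c
#include <stdbool.h>

/* Hypothetical: would test `n` run for a library of `bits` width (8 or 16),
   given JIT availability and the configured link size? */
bool test_applies(int n, int bits, bool jit, int link_size)
{
    switch (n) {
    case 12:
        return jit;                          /* assumed: JIT available */
    case 13:
        return !jit;                         /* JIT not available */
    case 14: case 15: case 16:
        return bits == 8;                    /* 8-bit-only output */
    case 17: case 18: case 19: case 20:
        return bits == 16;                   /* 16-bit-only tests */
    case 21: case 22:
        return bits == 16 && link_size == 2; /* pre-compiled reloads */
    default:
        return true;                         /* simplification */
    }
}
```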

 Character tables
 ----------------

# Line 724 (r840) / Line 731 (r936) | will cause PCRE to malfunction.
 File manifest
 -------------

 The distribution should contain the files listed below. Where a file name is
 given as pcre[16]_xxx it means that there are two files, one with the name
 pcre_xxx and the other with the name pcre16_xxx.
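
The pcre[16]_xxx naming convention is mechanical enough to expand in code. This hypothetical helper, which is not part of the distribution, turns a manifest entry into its two file names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: expand a manifest name such as "pcre[16]_xxx.c".
   Returns 2 and fills out8/out16 if the name contains "[16]",
   returns 1 and copies the name unchanged to out8 if it does not,
   and returns -1 if a buffer of `size` bytes is too small. */
int expand_manifest_name(const char *name, char *out8, char *out16,
                         size_t size)
{
    const char *mark = strstr(name, "[16]");

    if (mark == NULL) {
        if ((size_t)snprintf(out8, size, "%s", name) >= size)
            return -1;
        return 1;
    }
    int prefix = (int)(mark - name);   /* text before "[16]" */
    const char *rest = mark + 4;       /* text after "[16]" */
    if ((size_t)snprintf(out8, size, "%.*s%s", prefix, name, rest) >= size ||
        (size_t)snprintf(out16, size, "%.*s16%s", prefix, name, rest) >= size)
        return -1;
    return 2;
}
```

For example, "pcre[16]_printint.c" expands to "pcre_printint.c" and "pcre16_printint.c".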

 (A) Source files of the PCRE library functions and their headers:

# Line 761 (r840) / Line 768 (r936) | pcre_xxx and the other with the name pcr
   pcre16_ord2utf16.c      )
   pcre16_utf16_utils.c    )
   pcre16_valid_utf16.c    )

   pcre[16]_printint.c     ) debugging function that is used by pcretest,
                           )   and can also be #included in pcre_compile()

   pcre.h.in               template for pcre.h when built by "configure"
   pcreposix.h             header for the external POSIX wrapper API
   pcre_internal.h         header for internal use
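
Judging by its name, pcre16_ord2utf16.c converts a character value (an ordinal) to UTF-16. Its full description lies outside this excerpt. The following is an independent sketch of that conversion, not PCRE's code, and the function name is hypothetical. Values from U+10000 upward are split into a high and a low surrogate. For example, U+1F600 becomes 0xD83D 0xDE00.

```c
#include <stdint.h>

/* Encode one code point as UTF-16. Writes 1 or 2 code units to out and
   returns the count, or returns 0 for values that cannot be encoded
   (surrogate code points and values above 0x10FFFF). */
int ord_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                /* surrogates are invalid */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: one unit */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```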

# Line 843 (r840) / Line 850 (r936) | pcre_xxx and the other with the name pcr
   testdata/testinput*     test data for main library tests
   testdata/testoutput*    expected test results
   testdata/grep*          input and output for pcregrep tests
   testdata/*              other supporting test files

 (D) Auxiliary files for cmake support

# Line 874 (r840) / Line 881 (r936) | pcre_xxx and the other with the name pcr
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 30 December 2011
+Last updated: 25 February 2012
