Diff of /code/trunk/README

revision 1031 by ph10, Sat Sep 8 15:59:01 2012 UTC | revision 1055 by chpe, Tue Oct 16 15:53:30 2012 UTC
# Line 35  The contents of this README file are: Line 35  The contents of this README file are:
35  The PCRE APIs  The PCRE APIs
36  -------------  -------------
37    
38  PCRE is written in C, and it has its own API. There are two sets of functions,  PCRE is written in C, and it has its own API. There are three sets of functions,
39  one for the 8-bit library, which processes strings of bytes, and one for the  one for the 8-bit library, which processes strings of bytes, one for the
40  16-bit library, which processes strings of 16-bit values. The distribution also  16-bit library, which processes strings of 16-bit values, and one for the 32-bit
41    library, which processes strings of 32-bit values. The distribution also
42  includes a set of C++ wrapper functions (see the pcrecpp man page for details),  includes a set of C++ wrapper functions (see the pcrecpp man page for details),
43  courtesy of Google Inc., which can be used to call the 8-bit PCRE library from  courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
44  C++.  C++.
# Line 183  library. They are also documented in the Line 184  library. They are also documented in the
184    (See also "Shared libraries on Unix-like systems" below.)    (See also "Shared libraries on Unix-like systems" below.)
185    
186  . By default, only the 8-bit library is built. If you add --enable-pcre16 to  . By default, only the 8-bit library is built. If you add --enable-pcre16 to
187    the "configure" command, the 16-bit library is also built. If you want only    the "configure" command, the 16-bit library is also built. If you add
188    the 16-bit library, use "./configure --enable-pcre16 --disable-pcre8".    --enable-pcre32 to the "configure" command, the 32-bit library is also built.
189      If you want only the 16-bit or 32-bit library, add --disable-pcre8 to disable
190      building the 8-bit library.
191    
192  . If you are building the 8-bit library and want to suppress the building of  . If you are building the 8-bit library and want to suppress the building of
193    the C++ wrapper library, you can add --disable-cpp to the "configure"    the C++ wrapper library, you can add --disable-cpp to the "configure"
# Line 203  library. They are also documented in the Line 206  library. They are also documented in the
206    
207  . If you want to make use of the support for UTF-8 Unicode character strings in  . If you want to make use of the support for UTF-8 Unicode character strings in
208    the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,    the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
209    you must add --enable-utf to the "configure" command. Without it, the code    or UTF-32 Unicode character strings in the 32-bit library, you must add
210    for handling UTF-8 and UTF-16 is not included in the relevant library. Even    --enable-utf to the "configure" command. Without it, the code for handling
211      UTF-8, UTF-16 and UTF-32 is not included in the relevant library. Even
212    when --enable-utf is included, the use of a UTF encoding still has to be    when --enable-utf is included, the use of a UTF encoding still has to be
213    enabled by an option at run time. When PCRE is compiled with this option, its    enabled by an option at run time. When PCRE is compiled with this option, its
214    input can only either be ASCII or UTF-8/16, even when running on EBCDIC    input can only either be ASCII or UTF-8/16/32, even when running on EBCDIC
215    platforms. It is not possible to use both --enable-utf and --enable-ebcdic at    platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
216    the same time.    the same time.
217    
218  . There are no separate options for enabling UTF-8 and UTF-16 independently  . There are no separate options for enabling UTF-8, UTF-16 and UTF-32
219    because that would allow ridiculous settings such as requesting UTF-16    independently because that would allow ridiculous settings such as requesting
220    support while building only the 8-bit library. However, the option    UTF-16 support while building only the 8-bit library. However, the option
221    --enable-utf8 is retained for backwards compatibility with earlier releases    --enable-utf8 is retained for backwards compatibility with earlier releases
222    that did not support 16-bit character strings. It is synonymous with    that did not support 16-bit or 32-bit character strings. It is synonymous with
223    --enable-utf. It is not possible to configure one library with UTF support    --enable-utf. It is not possible to configure one library with UTF support
224    and the other without in the same configuration.    and the other without in the same configuration.
225    
226  . If, in addition to support for UTF-8/16 character strings, you want to  . If, in addition to support for UTF-8/16/32 character strings, you want to
227    include support for the \P, \p, and \X sequences that recognize Unicode    include support for the \P, \p, and \X sequences that recognize Unicode
228    character properties, you must add --enable-unicode-properties to the    character properties, you must add --enable-unicode-properties to the
229    "configure" command. This adds about 30K to the size of the library (in the    "configure" command. This adds about 30K to the size of the library (in the
# Line 281  library. They are also documented in the Line 285  library. They are also documented in the
285    library, PCRE then uses three bytes instead of two for offsets to different    library, PCRE then uses three bytes instead of two for offsets to different
286    parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is    parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
287    the same as --with-link-size=4, which (in both libraries) uses four-byte    the same as --with-link-size=4, which (in both libraries) uses four-byte
288    offsets. Increasing the internal link size reduces performance.    offsets. Increasing the internal link size reduces performance. In the 32-bit
289      library, the only supported link size is 4.
290    
291  . You can build PCRE so that its internal match() function that is called from  . You can build PCRE so that its internal match() function that is called from
292    pcre_exec() does not call itself recursively. Instead, it uses memory blocks    pcre_exec() does not call itself recursively. Instead, it uses memory blocks
# Line 316  library. They are also documented in the Line 321  library. They are also documented in the
321    
322    This automatically implies --enable-rebuild-chartables (see above). However,    This automatically implies --enable-rebuild-chartables (see above). However,
323    when PCRE is built this way, it always operates in EBCDIC. It cannot support    when PCRE is built this way, it always operates in EBCDIC. It cannot support
324    both EBCDIC and UTF-8/16. There is a second option, --enable-ebcdic-nl25,    both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
325    which specifies that the code value for the EBCDIC NL character is 0x25    which specifies that the code value for the EBCDIC NL character is 0x25
326    instead of the default 0x15.    instead of the default 0x15.
327    
# Line 368  The "configure" script builds the follow Line 373  The "configure" script builds the follow
373                           that were set for "configure"                           that were set for "configure"
374  . libpcre.pc         ) data for the pkg-config command  . libpcre.pc         ) data for the pkg-config command
375  . libpcre16.pc       )  . libpcre16.pc       )
376    . libpcre32.pc       )
377  . libpcreposix.pc    )  . libpcreposix.pc    )
378  . libtool              script that builds shared and/or static libraries  . libtool              script that builds shared and/or static libraries
379    
# Line 387  The "configure" script also creates conf Line 393  The "configure" script also creates conf
393  script that can be run to recreate the configuration, and config.log, which  script that can be run to recreate the configuration, and config.log, which
394  contains compiler output from tests that "configure" runs.  contains compiler output from tests that "configure" runs.
395    
396  Once "configure" has run, you can run "make". This builds either or both of the  Once "configure" has run, you can run "make". This builds the libraries
397  libraries libpcre and libpcre16, and a test program called pcretest. If you  libpcre, libpcre16 and/or libpcre32, and a test program called pcretest. If you
398  enabled JIT support with --enable-jit, a test program called pcre_jit_test is  enabled JIT support with --enable-jit, a test program called pcre_jit_test is
399  built as well.  built as well.
400    
# Line 412  system. The following are installed (fil Line 418  system. The following are installed (fil
418    
419    Libraries (lib):    Libraries (lib):
420      libpcre16     (if 16-bit support is enabled)      libpcre16     (if 16-bit support is enabled)
421        libpcre32     (if 32-bit support is enabled)
422      libpcre       (if 8-bit support is enabled)      libpcre       (if 8-bit support is enabled)
423      libpcreposix  (if 8-bit support is enabled)      libpcreposix  (if 8-bit support is enabled)
424      libpcrecpp    (if 8-bit and C++ support is enabled)      libpcrecpp    (if 8-bit and C++ support is enabled)
425    
426    Configuration information (lib/pkgconfig):    Configuration information (lib/pkgconfig):
427      libpcre16.pc      libpcre16.pc
428        libpcre32.pc
429      libpcre.pc      libpcre.pc
430      libpcreposix.pc      libpcreposix.pc
431      libpcrecpp.pc (if C++ support is enabled)      libpcrecpp.pc (if C++ support is enabled)
# Line 598  The RunTest script runs the pcretest tes Line 606  The RunTest script runs the pcretest tes
606  own man page) on each of the relevant testinput files in the testdata  own man page) on each of the relevant testinput files in the testdata
607  directory, and compares the output with the contents of the corresponding  directory, and compares the output with the contents of the corresponding
608  testoutput files. Some tests are relevant only when certain build-time options  testoutput files. Some tests are relevant only when certain build-time options
609  were selected. For example, the tests for UTF-8/16 support are run only if  were selected. For example, the tests for UTF-8/16/32 support are run only if
610  --enable-utf was used. RunTest outputs a comment when it skips a test.  --enable-utf was used. RunTest outputs a comment when it skips a test.
611    
612  Many of the tests that are not skipped are run up to three times. The second  Many of the tests that are not skipped are run up to three times. The second
# Line 607  tests that are marked "never study" (see Line 615  tests that are marked "never study" (see
615  done). If JIT support is available, the non-DFA tests are run a third time,  done). If JIT support is available, the non-DFA tests are run a third time,
616  this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.  this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.
617    
618  When both 8-bit and 16-bit support is enabled, the entire set of tests is run  The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
619  twice, once for each library. If you want to run just one set of tests, call  libraries that are enabled. If you want to run just one set of tests, call
620  RunTest with either the -8 or -16 option.  RunTest with either the -8, -16 or -32 option.
621    
622  RunTest uses a file called testtry to hold the main output from pcretest.  RunTest uses a file called testtry to hold the main output from pcretest.
623  Other files whose names begin with "test" are used as working files in some  Other files whose names begin with "test" are used as working files in some
# Line 660  RunTest.bat. The version of RunTest.bat Line 668  RunTest.bat. The version of RunTest.bat
668  Windows versions of test 2. More info on using RunTest.bat is included in the  Windows versions of test 2. More info on using RunTest.bat is included in the
669  document entitled NON-UNIX-USE.]  document entitled NON-UNIX-USE.]
670    
671  The fourth and fifth tests check the UTF-8/16 support and error handling and  The fourth and fifth tests check the UTF-8/16/32 support and error handling and
672  internal UTF features of PCRE that are not relevant to Perl, respectively. The  internal UTF features of PCRE that are not relevant to Perl, respectively. The
673  sixth and seventh tests do the same for Unicode character properties support.  sixth and seventh tests do the same for Unicode character properties support.
674    
675  The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative  The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative
676  matching function, in non-UTF-8/16 mode, UTF-8/16 mode, and UTF-8/16 mode with  matching function, in non-UTF-8/16/32 mode, UTF-8/16/32 mode, and UTF-8/16/32
677  Unicode property support, respectively.  mode with Unicode property support, respectively.
678    
679  The eleventh test checks some internal offsets and code size features; it is  The eleventh test checks some internal offsets and code size features; it is
680  run only when the default "link size" of 2 is set (in other cases the sizes  run only when the default "link size" of 2 is set (in other cases the sizes
# Line 677  test is run only when JIT support is not Line 685  test is run only when JIT support is not
685  features such as information output from pcretest about JIT compilation.  features such as information output from pcretest about JIT compilation.
686    
687  The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and  The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and
688  the seventeenth, eighteenth, and nineteenth tests are run only in 16-bit mode.  the seventeenth, eighteenth, and nineteenth tests are run only in 16/32-bit mode.
689  These are tests that generate different output in the two modes. They are for  These are tests that generate different output in the two modes. They are for
690  general cases, UTF-8/16 support, and Unicode property support, respectively.  general cases, UTF-8/16/32 support, and Unicode property support, respectively.
691    
692  The twentieth test is run only in 16-bit mode. It tests some specific 16-bit  The twentieth test is run only in 16/32-bit mode. It tests some specific
693  features of the DFA matching engine.  16/32-bit features of the DFA matching engine.
694    
695  The twenty-first and twenty-second tests are run only in 16-bit mode, when the  The twenty-first and twenty-second tests are run only in 16/32-bit mode, when the
696  link size is set to 2. They test reloading pre-compiled patterns.  link size is set to 2 for the 16-bit library. They test reloading pre-compiled patterns.
697    
698    The twenty-third and twenty-fourth tests are run only in 16-bit mode. They are for
699    general cases, and UTF-16 support, respectively.
700    
701    The twenty-fifth and twenty-sixth tests are run only in 32-bit mode. They are for
702    general cases, and UTF-32 support, respectively.
703    
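As a reading aid for the revised test descriptions above, the mapping from test number to library width can be sketched as a small shell function. This is only an illustration: test_widths is a hypothetical helper and not part of RunTest, it ignores the build-time conditions (JIT, link size, UTF support) that also decide whether a test runs, and the entry for tests 1 to 13 is an assumption based on the statement that the general test set runs once per enabled library.

```shell
#!/bin/sh
# Sketch: print the library widths that a numbered test applies to,
# following the revised README wording. Hypothetical helper, not RunTest.
test_widths() {
    case "$1" in
        [1-9]|1[0-3])      echo "8 16 32" ;;  # assumption: run for every enabled width
        14|15|16)          echo "8" ;;        # 8-bit mode only
        17|18|19|20|21|22) echo "16 32" ;;    # 16/32-bit mode
        23|24)             echo "16" ;;       # 16-bit mode only
        25|26)             echo "32" ;;       # 32-bit mode only
        *)                 echo "unknown test: $1" >&2; return 1 ;;
    esac
}

# Example: list the widths for a few of the tests touched by this change.
for t in 14 20 24 26; do
    printf 'test %s: %s\n' "$t" "$(test_widths "$t")"
done
```

A table like this is easier to check against the prose than the prose alone, but the README text remains the reference.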
704  Character tables  Character tables
705  ----------------  ----------------
# Line 746  File manifest Line 759  File manifest
759  -------------  -------------
760    
761  The distribution should contain the files listed below. Where a file name is  The distribution should contain the files listed below. Where a file name is
762  given as pcre[16]_xxx it means that there are two files, one with the name  given as pcre[16|32]_xxx it means that there are three files, one with the name
763  pcre_xxx and the other with the name pcre16_xxx.  pcre_xxx, one with the name pcre16_xxx, and a third with the name pcre32_xxx.
764    
765  (A) Source files of the PCRE library functions and their headers:  (A) Source files of the PCRE library functions and their headers:
766    
# Line 758  pcre_xxx and the other with the name pcr Line 771  pcre_xxx and the other with the name pcr
771                              coding; used, unless --enable-rebuild-chartables is                              coding; used, unless --enable-rebuild-chartables is
772                              specified, by copying to pcre[16]_chartables.c                              specified, by copying to pcre[16]_chartables.c
773    
774    pcreposix.c             )    pcreposix.c                )
775    pcre[16]_byte_order.c   )    pcre[16|32]_byte_order.c   )
776    pcre[16]_compile.c      )    pcre[16|32]_compile.c      )
777    pcre[16]_config.c       )    pcre[16|32]_config.c       )
778    pcre[16]_dfa_exec.c     )    pcre[16|32]_dfa_exec.c     )
779    pcre[16]_exec.c         )    pcre[16|32]_exec.c         )
780    pcre[16]_fullinfo.c     )    pcre[16|32]_fullinfo.c     )
781    pcre[16]_get.c          ) sources for the functions in the library,    pcre[16|32]_get.c          ) sources for the functions in the library,
782    pcre[16]_globals.c      )   and some internal functions that they use    pcre[16|32]_globals.c      )   and some internal functions that they use
783    pcre[16]_jit_compile.c  )    pcre[16|32]_jit_compile.c  )
784    pcre[16]_maketables.c   )    pcre[16|32]_maketables.c   )
785    pcre[16]_newline.c      )    pcre[16|32]_newline.c      )
786    pcre[16]_refcount.c     )    pcre[16|32]_refcount.c     )
787    pcre[16]_string_utils.c )    pcre[16|32]_string_utils.c )
788    pcre[16]_study.c        )    pcre[16|32]_study.c        )
789    pcre[16]_tables.c       )    pcre[16|32]_tables.c       )
790    pcre[16]_ucd.c          )    pcre[16|32]_ucd.c          )
791    pcre[16]_version.c      )    pcre[16|32]_version.c      )
792    pcre[16]_xclass.c       )    pcre[16|32]_xclass.c       )
793    pcre_ord2utf8.c         )    pcre_ord2utf8.c            )
794    pcre_valid_utf8.c       )    pcre_valid_utf8.c          )
795    pcre16_ord2utf16.c      )    pcre16_ord2utf16.c         )
796    pcre16_utf16_utils.c    )    pcre16_utf16_utils.c       )
797    pcre16_valid_utf16.c    )    pcre16_valid_utf16.c       )
798      pcre32_utf32_utils.c       )
799      pcre32_valid_utf32.c       )
800    
801    pcre[16]_printint.c     ) debugging function that is used by pcretest,    pcre[16|32]_printint.c     ) debugging function that is used by pcretest,
802                            )   and can also be #included in pcre_compile()                               )   and can also be #included in pcre_compile()
803    
804    pcre.h.in               template for pcre.h when built by "configure"    pcre.h.in               template for pcre.h when built by "configure"
805    pcreposix.h             header for the external POSIX wrapper API    pcreposix.h             header for the external POSIX wrapper API
# Line 849  pcre_xxx and the other with the name pcr Line 864  pcre_xxx and the other with the name pcr
864    doc/perltest.txt        plain text documentation of Perl test program    doc/perltest.txt        plain text documentation of Perl test program
865    install-sh              a shell script for installing files    install-sh              a shell script for installing files
866    libpcre16.pc.in         template for libpcre16.pc for pkg-config    libpcre16.pc.in         template for libpcre16.pc for pkg-config
867      libpcre32.pc.in         template for libpcre32.pc for pkg-config
868    libpcre.pc.in           template for libpcre.pc for pkg-config    libpcre.pc.in           template for libpcre.pc for pkg-config
869    libpcreposix.pc.in      template for libpcreposix.pc for pkg-config    libpcreposix.pc.in      template for libpcreposix.pc for pkg-config
870    libpcrecpp.pc.in        template for libpcrecpp.pc for pkg-config    libpcrecpp.pc.in        template for libpcrecpp.pc for pkg-config
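
The configure switches discussed in this change (--enable-pcre16, --enable-pcre32 and --disable-pcre8) combine as sketched below. The function only prints a command line and does not run "configure"; pcre_configure_cmd is a hypothetical helper name used for illustration, and the only flags it emits are the three named in the README text.

```shell
#!/bin/sh
# Sketch: print (not run) a "configure" command line for the requested
# PCRE library widths. Hypothetical helper, not part of the distribution.
pcre_configure_cmd() {
    # With no arguments, keep the default build (8-bit library only).
    [ "$#" -gt 0 ] || { echo "./configure"; return 0; }
    want8=no
    flags=""
    for width in "$@"; do
        case "$width" in
            8)  want8=yes ;;
            16) flags="$flags --enable-pcre16" ;;
            32) flags="$flags --enable-pcre32" ;;
            *)  echo "unsupported width: $width" >&2; return 1 ;;
        esac
    done
    # The 8-bit library is built by default, so it must be switched off
    # explicitly when only the wider libraries are wanted.
    [ "$want8" = yes ] || flags="$flags --disable-pcre8"
    echo "./configure$flags"
}

pcre_configure_cmd 8 16 32   # all three libraries
pcre_configure_cmd 32        # 32-bit library only
```

Other options from the README, such as --enable-utf, would be appended to the printed command in the same way.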
