Diff of /code/trunk/README

revision 77 by nigel, Sat Feb 24 21:40:45 2007 UTC revision 101 by ph10, Tue Mar 6 15:19:44 2007 UTC
# Line 34  Documentation for PCRE Line 34  Documentation for PCRE
34  ----------------------  ----------------------
35    
36  If you install PCRE in the normal way, you will end up with an installed set of  If you install PCRE in the normal way, you will end up with an installed set of
37  man pages whose names all start with "pcre". The one that is called "pcre"  man pages whose names all start with "pcre". The one that is just called "pcre"
38  lists all the others. In addition to these man pages, the PCRE documentation is  lists all the others. In addition to these man pages, the PCRE documentation is
39  supplied in two other forms; however, as there is no standard place to install  supplied in two other forms; however, as there is no standard place to install
40  them, they are left in the doc directory of the unpacked source distribution.  them, they are left in the doc directory of the unpacked source distribution.
# Line 65  Windows systems (I myself do not use Win Line 65  Windows systems (I myself do not use Win
65  others are pointers to URLs containing relevant files.  others are pointers to URLs containing relevant files.
66    
67    
68    Building on non-Unix systems
69    ----------------------------
70    
71    For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
72    the system supports the use of "configure" and "make" you may be able to build
73    PCRE in the same way as for Unix systems.
74    
75    PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
76    the details because I don't use those systems. It should be straightforward to
77    build PCRE on any system that has a Standard C compiler and library, because it
78    uses only Standard C functions.
79    
80    
81  Building PCRE on a Unix-like system  Building PCRE on a Unix-like system
82  -----------------------------------  -----------------------------------
83    
84    If you are using HP's ANSI C++ compiler (aCC), please see the special note
85    in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
86    
87  To build PCRE on a Unix-like system, first run the "configure" command from the  To build PCRE on a Unix-like system, first run the "configure" command from the
88  PCRE distribution directory, with your current directory set to the directory  PCRE distribution directory, with your current directory set to the directory
89  where you want the files to be created. This command is a standard GNU  where you want the files to be created. This command is a standard GNU
# Line 91  into /source/pcre/pcre-xxx, but you want Line 107  into /source/pcre/pcre-xxx, but you want
107  cd /build/pcre/pcre-xxx  cd /build/pcre/pcre-xxx
108  /source/pcre/pcre-xxx/configure  /source/pcre/pcre-xxx/configure
109    
110    PCRE is written in C and is normally compiled as a C library. However, it is
111    possible to build it as a C++ library, though the provided building apparatus
112    does not have any features to support this.
113    
114  There are some optional features that can be included or omitted from the PCRE  There are some optional features that can be included or omitted from the PCRE
115  library. You can read more about them in the pcrebuild man page.  library. You can read more about them in the pcrebuild man page.
116    
117    . If you want to suppress the building of the C++ wrapper library, you can add
118      --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
119      it will try to find a C++ compiler and C++ header files, and if it succeeds, it
120      will try to build the C++ wrapper.
121    
122  . If you want to make use of the support for UTF-8 character strings in PCRE,  . If you want to make use of the support for UTF-8 character strings in PCRE,
123    you must add --enable-utf8 to the "configure" command. Without it, the code    you must add --enable-utf8 to the "configure" command. Without it, the code
124    for handling UTF-8 is not included in the library. (Even when included, it    for handling UTF-8 is not included in the library. (Even when included, it
# Line 102  library. You can read more about them in Line 127  library. You can read more about them in
127  . If, in addition to support for UTF-8 character strings, you want to include  . If, in addition to support for UTF-8 character strings, you want to include
128    support for the \P, \p, and \X sequences that recognize Unicode character    support for the \P, \p, and \X sequences that recognize Unicode character
129    properties, you must add --enable-unicode-properties to the "configure"    properties, you must add --enable-unicode-properties to the "configure"
130    command. This adds about 90K to the size of the library (in the form of a    command. This adds about 30K to the size of the library (in the form of a
131    property table); only the basic two-letter properties such as Lu are    property table); only the basic two-letter properties such as Lu are
132    supported.    supported.
133    
134  . You can build PCRE to recognized CR or NL as the newline character, instead  . You can build PCRE to recognize either CR or LF or the sequence CRLF or any
135    of whatever your compiler uses for "\n", by adding --newline-is-cr or    of the Unicode newline sequences as indicating the end of a line. Whatever
136    --newline-is-nl to the "configure" command, respectively. Only do this if you    you specify at build time is the default; the caller of PCRE can change the
137    really understand what you are doing. On traditional Unix-like systems, the    selection at run time. The default newline indicator is a single LF character
138    newline character is NL.    (the Unix standard). You can specify the default newline indicator by adding
139      --newline-is-cr or --newline-is-lf or --newline-is-crlf or --newline-is-any
140      to the "configure" command.
141    
142      If you specify --newline-is-cr or --newline-is-crlf, some of the standard
143      tests will fail, because the lines in the test files end with LF. Even if
144      the files are edited to change the line endings, there are likely to be some
145      failures. With --newline-is-any, many tests should succeed, but there may be
146      some failures.
147    
148  . When called via the POSIX interface, PCRE uses malloc() to get additional  . When called via the POSIX interface, PCRE uses malloc() to get additional
149    storage for processing capturing parentheses if there are more than 10 of    storage for processing capturing parentheses if there are more than 10 of
# Line 130  library. You can read more about them in Line 163  library. You can read more about them in
163    pcre_exec() can supply their own value. There is discussion on the pcreapi    pcre_exec() can supply their own value. There is discussion on the pcreapi
164    man page.    man page.
165    
166    . There is a separate counter that limits the depth of recursive function calls
167      during a matching process. This also has a default of ten million, which is
168      essentially "unlimited". You can change the default by setting, for example,
169    
170      --with-match-limit-recursion=500000
171    
172      Recursive function calls use up the runtime stack; running out of stack can
173      cause programs to crash in strange ways. There is a discussion about stack
174      sizes in the pcrestack man page.
175    
176  . The default maximum compiled pattern size is around 64K. You can increase  . The default maximum compiled pattern size is around 64K. You can increase
177    this by adding --with-link-size=3 to the "configure" command. You can    this by adding --with-link-size=3 to the "configure" command. You can
178    increase it even more by setting --with-link-size=4, but this is unlikely    increase it even more by setting --with-link-size=4, but this is unlikely
# Line 153  library. You can read more about them in Line 196  library. You can read more about them in
196    
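As a rough illustration of how the options described above combine, the sketch below assembles a "configure" command line and prints it rather than running it. The option names are taken from this revision of the README. SRC_DIR and pcre_configure_cmd are illustrative names, not part of PCRE, and the particular selection of options is only an assumption about what a build might want.

```shell
# Sketch only: print a "configure" command line for review; nothing is run.
# SRC_DIR and pcre_configure_cmd are hypothetical names, not part of PCRE.
SRC_DIR="${SRC_DIR:-/source/pcre/pcre-xxx}"

pcre_configure_cmd() {
  # $1: default newline convention, one of cr, lf, crlf, any
  case "$1" in
    cr|lf|crlf|any) ;;
    *) echo "unknown newline setting: $1" >&2; return 1 ;;
  esac
  # UTF-8 support, Unicode properties, the chosen newline default,
  # a lower recursion limit, and a larger link size.
  echo "$SRC_DIR/configure" \
    --enable-utf8 \
    --enable-unicode-properties \
    "--newline-is-$1" \
    --with-match-limit-recursion=500000 \
    --with-link-size=3
}

# Print the command for a build that accepts any Unicode newline sequence.
pcre_configure_cmd any
```

Once the printed line looks right, it can be pasted into a shell whose current directory is the build directory (for example /build/pcre/pcre-xxx). Adding --disable-cpp to the list would suppress the C++ wrapper.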
197  The "configure" script builds eight files for the basic C library:  The "configure" script builds eight files for the basic C library:
198    
 . pcre.h is the header file for C programs that call PCRE  
199  . Makefile is the makefile that builds the library  . Makefile is the makefile that builds the library
200  . config.h contains build-time configuration options for the library  . config.h contains build-time configuration options for the library
201  . pcre-config is a script that shows the settings of "configure" options  . pcre-config is a script that shows the settings of "configure" options
# Line 257  when calling the "configure" command. If Line 299  when calling the "configure" command. If
299  to the values of CC and CFLAGS.  to the values of CC and CFLAGS.
300    
301    
302  Building on non-Unix systems  Using HP's ANSI C++ compiler (aCC)
303  ----------------------------  ----------------------------------
304    
305  For a non-Unix system, read the comments in the file NON-UNIX-USE, though if  Unless C++ support is disabled by specifying the "--disable-cpp" option of the
306  the system supports the use of "configure" and "make" you may be able to build  "configure" script, you *must* include the "-AA" option in the CXXFLAGS
307  PCRE in the same way as for Unix systems.  environment variable in order for the C++ components to compile correctly.
308    
309    Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
310    needed libraries fail to get included when specifying the "-AA" compiler
311    option. If you experience unresolved symbols when linking the C++ programs,
312    use the workaround of specifying the following environment variable prior to
313    running the "configure" script:
314    
315  PCRE has been compiled on Windows systems and on Macintoshes, but I don't know    CXXLDFLAGS="-lstd_v2 -lCsup_v2"
 the details because I don't use those systems. It should be straightforward to  
 build PCRE on any system that has a Standard C compiler, because it uses only  
 Standard C functions.  
316    
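The aCC settings above could be prepared in the shell along the following lines before "configure" is run. This is a minimal sketch: setup_acc_env and its "link-workaround" argument are hypothetical names introduced here, and only the "-AA" flag and the CXXLDFLAGS value come from the text.

```shell
# Sketch only: export the environment described for HP's aCC compiler.
# setup_acc_env and "link-workaround" are hypothetical, not part of PCRE.
setup_acc_env() {
  # Append -AA to any existing CXXFLAGS instead of overwriting them.
  CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }-AA"
  export CXXFLAGS
  # Apply the PA-RISC linker workaround only when explicitly requested.
  if [ "${1:-}" = "link-workaround" ]; then
    CXXLDFLAGS="-lstd_v2 -lCsup_v2"
    export CXXLDFLAGS
  fi
}

setup_acc_env link-workaround
echo "CXXFLAGS=$CXXFLAGS"
echo "CXXLDFLAGS=$CXXLDFLAGS"
```

Calling setup_acc_env with no argument sets only CXXFLAGS, which matches the case where the C++ programs link without unresolved symbols.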
317    
318  Testing PCRE  Testing PCRE
# Line 286  NON-UNIX-USE. Line 331  NON-UNIX-USE.
331  The RunTest script runs the pcretest test program (which is documented in its  The RunTest script runs the pcretest test program (which is documented in its
332  own man page) on each of the testinput files (in the testdata directory) in  own man page) on each of the testinput files (in the testdata directory) in
333  turn, and compares the output with the contents of the corresponding testoutput  turn, and compares the output with the contents of the corresponding testoutput
334  file. A file called testtry is used to hold the main output from pcretest  files. A file called testtry is used to hold the main output from pcretest
335  (testsavedregex is also used as a working file). To run pcretest on just one of  (testsavedregex is also used as a working file). To run pcretest on just one of
336  the test files, give its number as an argument to RunTest, for example:  the test files, give its number as an argument to RunTest, for example:
337    
338    RunTest 2    RunTest 2
339    
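To run several individual test files in sequence, a small wrapper such as the one sketched below might be used. It assumes that RunTest lives in the current directory and exits with a non-zero status when a test fails, which should be checked against the script itself. run_selected_tests and the RUNTEST variable are hypothetical names introduced for this example.

```shell
# Sketch only: run RunTest once per test number, stopping at the first failure.
# run_selected_tests and RUNTEST are hypothetical, not part of PCRE.
run_selected_tests() {
  # RUNTEST may point at the script; it defaults to ./RunTest (an assumption).
  runtest="${RUNTEST:-./RunTest}"
  for n in "$@"; do
    "$runtest" "$n" || { echo "test $n failed" >&2; return 1; }
  done
}

# Example invocation, to be run from the directory that contains RunTest:
#   run_selected_tests 1 2 3
```

Running the tests one at a time like this makes it easier to see which testinput file produced a difference.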
340  The first file can also be fed directly into the perltest script to check that  The first test file can also be fed directly into the perltest script to check
341  Perl gives the same results. The only difference you should see is in the first  that Perl gives the same results. The only difference you should see is in the
342  few lines, where the Perl version is given instead of the PCRE version.  first few lines, where the Perl version is given instead of the PCRE version.
343    
344  The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),  The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
345  pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error  pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
# Line 403  The distribution should contain the foll Line 448  The distribution should contain the foll
448    pcre_globals.c        )   and some internal functions that they use    pcre_globals.c        )   and some internal functions that they use
449    pcre_info.c           )    pcre_info.c           )
450    pcre_maketables.c     )    pcre_maketables.c     )
451      pcre_newline.c        )
452    pcre_ord2utf8.c       )    pcre_ord2utf8.c       )
453    pcre_printint.c       )    pcre_refcount.c       )
454    pcre_study.c          )    pcre_study.c          )
455    pcre_tables.c         )    pcre_tables.c         )
456    pcre_try_flipped.c    )    pcre_try_flipped.c    )
457    pcre_ucp_findchar.c   )    pcre_ucp_searchfuncs.c)
458    pcre_valid_utf8.c     )    pcre_valid_utf8.c     )
459    pcre_version.c        )    pcre_version.c        )
460    pcre_xclass.c         )    pcre_xclass.c         )
461    
462    ucp_findchar.c        )    pcre_printint.src     ) debugging function that is #included in pcretest, and
463    ucp.h                 ) source for the code that is used for                          )   can also be #included in pcre_compile()
   ucpinternal.h         )   Unicode property handling  
   ucptable.c            )  
   ucptypetable.c        )  
464    
465    pcre.in               "source" for the header for the external API; pcre.h    pcre.h                the public PCRE header file
                           is built from this by "configure"  
466    pcreposix.h           header for the external POSIX wrapper API    pcreposix.h           header for the external POSIX wrapper API
467    pcre_internal.h       header for internal use    pcre_internal.h       header for internal use
468      ucp.h                 ) headers concerned with
469      ucpinternal.h         )   Unicode property handling
470      ucptable.h            ) (this one is the data table)
471    config.in             template for config.h, which is built by configure    config.in             template for config.h, which is built by configure
472    
473    pcrecpp.h.in          "source" for the header file for the C++ wrapper    pcrecpp.h             the header file for the C++ wrapper
474      pcrecpparg.h.in       "source" for another C++ header file
475    pcrecpp.cc            )    pcrecpp.cc            )
476    pcre_scanner.cc       ) source for the C++ wrapper library    pcre_scanner.cc       ) source for the C++ wrapper library
477    
# Line 448  The distribution should contain the foll Line 494  The distribution should contain the foll
494    RunGrepTest.in        template for a Unix shell script for pcregrep tests    RunGrepTest.in        template for a Unix shell script for pcregrep tests
495    config.guess          ) files used by libtool,    config.guess          ) files used by libtool,
496    config.sub            )   used only when building a shared library    config.sub            )   used only when building a shared library
497      config.h.in           "source" for the config.h header file
498    configure             a configuring shell script (built by autoconf)    configure             a configuring shell script (built by autoconf)
499    configure.in          the autoconf input used to build configure    configure.ac          the autoconf input used to build configure
500    doc/Tech.Notes        notes on the encoding    doc/Tech.Notes        notes on the encoding
501    doc/*.3               man page sources for the PCRE functions    doc/*.3               man page sources for the PCRE functions
502    doc/*.1               man page sources for pcregrep and pcretest    doc/*.1               man page sources for pcregrep and pcretest
# Line 463  The distribution should contain the foll Line 510  The distribution should contain the foll
510    mkinstalldirs         script for making install directories    mkinstalldirs         script for making install directories
511    pcretest.c            comprehensive test program    pcretest.c            comprehensive test program
512    pcredemo.c            simple demonstration of coding calls to PCRE    pcredemo.c            simple demonstration of coding calls to PCRE
513    perltest              Perl test program    perltest.pl           Perl test program
514    pcregrep.c            source of a grep utility that uses PCRE    pcregrep.c            source of a grep utility that uses PCRE
515    pcre-config.in        source of script which retains PCRE information    pcre-config.in        source of script which retains PCRE information
516    pcrecpp_unittest.c           )    pcrecpp_unittest.c           )
# Line 477  The distribution should contain the foll Line 524  The distribution should contain the foll
524    
525    libpcre.def    libpcre.def
526    libpcreposix.def    libpcreposix.def
   pcre.def  
527    
528  (D) Auxiliary file for VPASCAL  (D) Auxiliary file for VPASCAL
529    
# Line 486  The distribution should contain the foll Line 532  The distribution should contain the foll
532  Philip Hazel  Philip Hazel
533  Email local part: ph10  Email local part: ph10
534  Email domain: cam.ac.uk  Email domain: cam.ac.uk
535  June 2005  March 2007
