Contents of /code/trunk/README

Revision 1763
Wed Feb 12 17:37:05 2020 UTC (7 months, 1 week ago) by ph10
File size: 45484 byte(s)
Final tidies and documentation updates for 8.44.
1 README file for PCRE (Perl-compatible regular expression library)
2 -----------------------------------------------------------------
3
4 NOTE: This set of files relates to PCRE releases that use the original API,
5 with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
6 first release of a new API, known as PCRE2, with release numbers starting at
7 10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
8 libraries (now called PCRE1) are still being maintained for bug fixes, but
9 there will be no new development. New projects are advised to use the new PCRE2
10 libraries.
11
12
13 The latest release of PCRE1 is always available in three alternative formats
14 from:
15
16 https://ftp.pcre.org/pub/pcre/pcre-x.xx.tar.gz
17 https://ftp.pcre.org/pub/pcre/pcre-x.xx.tar.bz2
18 https://ftp.pcre.org/pub/pcre/pcre-x.xx.tar.zip
19
20
21 There is a mailing list for discussion about the development of PCRE at
22 pcre-dev@exim.org. You can access the archives and subscribe or manage your
23 subscription here:
24
25 https://lists.exim.org/mailman/listinfo/pcre-dev
26
27 Please read the NEWS file if you are upgrading from a previous release.
28 The contents of this README file are:
29
30 The PCRE APIs
31 Documentation for PCRE
32 Contributions by users of PCRE
33 Building PCRE on non-Unix-like systems
34 Building PCRE without using autotools
35 Building PCRE using autotools
36 Retrieving configuration information
37 Shared libraries
38 Cross-compiling using autotools
39 Using HP's ANSI C++ compiler (aCC)
40 Compiling in Tru64 using native compilers
41 Using Sun's compilers for Solaris
42 Using PCRE from MySQL
43 Making new tarballs
44 Testing PCRE
45 Character tables
46 File manifest
47
48
49 The PCRE APIs
50 -------------
51
52 PCRE is written in C, and it has its own API. There are three sets of
53 functions, one for the 8-bit library, which processes strings of bytes, one for
54 the 16-bit library, which processes strings of 16-bit values, and one for the
55 32-bit library, which processes strings of 32-bit values. The distribution also
56 includes a set of C++ wrapper functions (see the pcrecpp man page for details),
57 courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
58 C++. Other C++ wrappers have been created from time to time. See, for example:
59 https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
60 style to the C API.
61
62 The distribution also contains a set of C wrapper functions (again, just for
63 the 8-bit library) that are based on the POSIX regular expression API (see the
64 pcreposix man page). These end up in the library called libpcreposix. Note that
65 this just provides a POSIX calling interface to PCRE; the regular expressions
66 themselves still follow Perl syntax and semantics. The POSIX API is restricted,
67 and does not give full access to all of PCRE's facilities.
68
69 The header file for the POSIX-style functions is called pcreposix.h. The
70 official POSIX name is regex.h, but I did not want to risk possible problems
71 with existing files of that name by distributing it that way. To use PCRE with
72 an existing program that uses the POSIX API, pcreposix.h will have to be
73 renamed or pointed at by a link.
74
75 If you are using the POSIX interface to PCRE and there is already a POSIX regex
76 library installed on your system, as well as worrying about the regex.h header
77 file (as mentioned above), you must also take care when linking programs to
78 ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
79 up the POSIX functions of the same name from the other library.
80
81 One way of avoiding this confusion is to compile PCRE with the addition of
82 -Dregcomp=PCREregcomp (and similarly for the other POSIX functions) to the
83 compiler flags (CFLAGS if you are using "configure" -- see below). This has the
84 effect of renaming the functions so that the names no longer clash. Of course,
85 you have to do the same thing for your applications, or write them using the
86 new names.
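
As an illustration of the renaming approach just described, the sketch below builds the full set of -D flags. It assumes the four POSIX entry points provided by pcreposix (regcomp, regexec, regerror, regfree) and extends the PCREregcomp naming pattern to the other three; the helper name posix_rename_flags is hypothetical and not part of the distribution.

```shell
# Sketch only: emit -D flags that rename the POSIX entry points by
# prefixing each with "PCRE", following the -Dregcomp=PCREregcomp pattern.
# The names for regexec, regerror and regfree are assumed by analogy.
posix_rename_flags() {
    flags=""
    for fn in regcomp regexec regerror regfree; do
        flags="$flags -D$fn=PCRE$fn"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Possible use from a PCRE source tree (not executed here):
#   CFLAGS="-O2 $(posix_rename_flags)" ./configure
```

The same flags would then have to be passed when compiling any application that calls the renamed functions.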
87
88
89 Documentation for PCRE
90 ----------------------
91
92 If you install PCRE in the normal way on a Unix-like system, you will end up
93 with a set of man pages whose names all start with "pcre". The one that is just
94 called "pcre" lists all the others. In addition to these man pages, the PCRE
95 documentation is supplied in two other forms:
96
97 1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
98 doc/pcretest.txt in the source distribution. The first of these is a
99 concatenation of the text forms of all the section 3 man pages except
100 the listing of pcredemo.c and those that summarize individual functions.
101 The other two are the text forms of the section 1 man pages for the
102 pcregrep and pcretest commands. These text forms are provided for ease of
103 scanning with text editors or similar tools. They are installed in
104 <prefix>/share/doc/pcre, where <prefix> is the installation prefix
105 (defaulting to /usr/local).
106
107 2. A set of files containing all the documentation in HTML form, hyperlinked
108 in various ways, and rooted in a file called index.html, is distributed in
109 doc/html and installed in <prefix>/share/doc/pcre/html.
110
111 Users of PCRE have contributed files containing the documentation for various
112 releases in CHM format. These can be found in the Contrib directory of the FTP
113 site (see next section).
114
115
116 Contributions by users of PCRE
117 ------------------------------
118
119 You can find contributions from PCRE users in the directory
120
121 ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
122
123 There is a README file giving brief descriptions of what they are. Some are
124 complete in themselves; others are pointers to URLs containing relevant files.
125 Some of this material is likely to be well out-of-date. Several of the earlier
126 contributions provided support for compiling PCRE on various flavours of
127 Windows (I myself do not use Windows). Nowadays there is more Windows support
128 in the standard distribution, so these contributions have been archived.
129
130 A PCRE user maintains downloadable Windows binaries of the pcregrep and
131 pcretest programs here:
132
133 http://www.rexegg.com/pcregrep-pcretest.html
134
135
136 Building PCRE on non-Unix-like systems
137 --------------------------------------
138
139 For a non-Unix-like system, please read the comments in the file
140 NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
141 "make" you may be able to build PCRE using autotools in the same way as for
142 many Unix-like systems.
143
144 PCRE can also be configured using the GUI facility provided by CMake's
145 cmake-gui command. This creates Makefiles, solution files, etc. The file
146 NON-AUTOTOOLS-BUILD has information about CMake.
147
148 PCRE has been compiled on many different operating systems. It should be
149 straightforward to build PCRE on any system that has a Standard C compiler and
150 library, because it uses only Standard C functions.
151
152
153 Building PCRE without using autotools
154 -------------------------------------
155
156 The use of autotools (in particular, libtool) is problematic in some
157 environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
158 file for ways of building PCRE without using autotools.
159
160
161 Building PCRE using autotools
162 -----------------------------
163
164 If you are using HP's ANSI C++ compiler (aCC), please see the special note
165 in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
166
167 The following instructions assume the use of the widely used "configure; make;
168 make install" (autotools) process.
169
170 To build PCRE on a system that supports autotools, first run the "configure"
171 command from the PCRE distribution directory, with your current directory set
172 to the directory where you want the files to be created. This command is a
173 standard GNU "autoconf" configuration script, for which generic instructions
174 are supplied in the file INSTALL.
175
176 Most commonly, people build PCRE within its own distribution directory, and in
177 this case, on many systems, just running "./configure" is sufficient. However,
178 the usual methods of changing standard defaults are available. For example:
179
180 CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
181
182 This command specifies that the C compiler should be run with the flags '-O2
183 -Wall' instead of the default, and that "make install" should install PCRE
184 under /opt/local instead of the default /usr/local.
185
186 If you want to build in a different directory, just run "configure" with that
187 directory as current. For example, suppose you have unpacked the PCRE source
188 into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
189
190 cd /build/pcre/pcre-xxx
191 /source/pcre/pcre-xxx/configure
192
193 PCRE is written in C and is normally compiled as a C library. However, it is
194 possible to build it as a C++ library, though the provided building apparatus
195 does not have any features to support this.
196
197 There are some optional features that can be included or omitted from the PCRE
198 library. They are also documented in the pcrebuild man page.
199
200 . By default, both shared and static libraries are built. You can change this
201 by adding one of these options to the "configure" command:
202
203 --disable-shared
204 --disable-static
205
206 (See also "Shared libraries" below.)
207
208 . By default, only the 8-bit library is built. If you add --enable-pcre16 to
209 the "configure" command, the 16-bit library is also built. If you add
210 --enable-pcre32 to the "configure" command, the 32-bit library is also built.
211 If you want only the 16-bit or 32-bit library, use --disable-pcre8 to disable
212 building the 8-bit library.
213
214 . If you are building the 8-bit library and want to suppress the building of
215 the C++ wrapper library, you can add --disable-cpp to the "configure"
216 command. Otherwise, when "configure" is run without --disable-pcre8, it will
217 try to find a C++ compiler and C++ header files, and if it succeeds, it will
218 try to build the C++ wrapper.
219
220 . If you want to include support for just-in-time compiling, which can give
221 large performance improvements on certain platforms, add --enable-jit to the
222 "configure" command. This support is available only for certain hardware
223 architectures. If you try to enable it on an unsupported architecture, there
224 will be a compile time error.
225
226 . When JIT support is enabled, pcregrep automatically makes use of it, unless
227 you add --disable-pcregrep-jit to the "configure" command.
228
229 . If you want to make use of the support for UTF-8 Unicode character strings in
230 the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
231 or UTF-32 Unicode character strings in the 32-bit library, you must add
232 --enable-utf to the "configure" command. Without it, the code for handling
233 UTF-8, UTF-16 and UTF-32 is not included in the relevant library. Even
234 when --enable-utf is included, the use of a UTF encoding still has to be
235 enabled by an option at run time. When PCRE is compiled with this option, its
236 input can only either be ASCII or UTF-8/16/32, even when running on EBCDIC
237 platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
238 the same time.
239
240 . There are no separate options for enabling UTF-8, UTF-16 and UTF-32
241 independently because that would allow ridiculous settings such as requesting
242 UTF-16 support while building only the 8-bit library. However, the option
243 --enable-utf8 is retained for backwards compatibility with earlier releases
244 that did not support 16-bit or 32-bit character strings. It is synonymous with
245 --enable-utf. It is not possible to configure one library with UTF support
246 and the other without in the same configuration.
247
248 . If, in addition to support for UTF-8/16/32 character strings, you want to
249 include support for the \P, \p, and \X sequences that recognize Unicode
250 character properties, you must add --enable-unicode-properties to the
251 "configure" command. This adds about 30K to the size of the library (in the
252 form of a property table); only the basic two-letter properties such as Lu
253 are supported.
254
255 . You can build PCRE to recognize either CR or LF or the sequence CRLF or any
256 of the preceding, or any of the Unicode newline sequences as indicating the
257 end of a line. Whatever you specify at build time is the default; the caller
258 of PCRE can change the selection at run time. The default newline indicator
259 is a single LF character (the Unix standard). You can specify the default
260 newline indicator by adding --enable-newline-is-cr or --enable-newline-is-lf
261 or --enable-newline-is-crlf or --enable-newline-is-anycrlf or
262 --enable-newline-is-any to the "configure" command, respectively.
263
264 If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
265 the standard tests will fail, because the lines in the test files end with
266 LF. Even if the files are edited to change the line endings, there are likely
267 to be some failures. With --enable-newline-is-anycrlf or
268 --enable-newline-is-any, many tests should succeed, but there may be some
269 failures.
270
271 . By default, the sequence \R in a pattern matches any Unicode line ending
272 sequence. This is independent of the option specifying what PCRE considers to
273 be the end of a line (see above). However, the caller of PCRE can restrict \R
274 to match only CR, LF, or CRLF. You can make this the default by adding
275 --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
276
277 . When called via the POSIX interface, PCRE uses malloc() to get additional
278 storage for processing capturing parentheses if there are more than 10 of
279 them in a pattern. You can increase this threshold by setting, for example,
280
281 --with-posix-malloc-threshold=20
282
283 on the "configure" command.
284
285 . PCRE has a counter that limits the depth of nesting of parentheses in a
286 pattern. This limits the amount of system stack that a pattern uses when it
287 is compiled. The default is 250, but you can change it by setting, for
288 example,
289
290 --with-parens-nest-limit=500
291
292 . PCRE has a counter that can be set to limit the amount of resources it uses
293 when matching a pattern. If the limit is exceeded during a match, the match
294 fails. The default is ten million. You can change the default by setting, for
295 example,
296
297 --with-match-limit=500000
298
299 on the "configure" command. This is just the default; individual calls to
300 pcre_exec() can supply their own value. There is more discussion on the
301 pcreapi man page.
302
303 . There is a separate counter that limits the depth of recursive function calls
304 during a matching process. This also has a default of ten million, which is
305 essentially "unlimited". You can change the default by setting, for example,
306
307 --with-match-limit-recursion=500000
308
309 Recursive function calls use up the runtime stack; running out of stack can
310 cause programs to crash in strange ways. There is a discussion about stack
311 sizes in the pcrestack man page.
312
313 . The default maximum compiled pattern size is around 64K. You can increase
314 this by adding --with-link-size=3 to the "configure" command. In the 8-bit
315 library, PCRE then uses three bytes instead of two for offsets to different
316 parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
317 the same as --with-link-size=4, which (in both libraries) uses four-byte
318 offsets. Increasing the internal link size reduces performance. In the 32-bit
319 library, the only supported link size is 4.
320
321 . You can build PCRE so that its internal match() function that is called from
322 pcre_exec() does not call itself recursively. Instead, it uses memory blocks
323 obtained from the heap via the special functions pcre_stack_malloc() and
324 pcre_stack_free() to save data that would otherwise be saved on the stack. To
325 build PCRE like this, use
326
327 --disable-stack-for-recursion
328
329 on the "configure" command. PCRE runs more slowly in this mode, but it may be
330 necessary in environments with limited stack sizes. This applies only to the
331 normal execution of the pcre_exec() function; if JIT support is being
332 successfully used, it is not relevant. Equally, it does not apply to
333 pcre_dfa_exec(), which does not use deeply nested recursion. There is a
334 discussion about stack sizes in the pcrestack man page.
335
336 . For speed, PCRE uses four tables for manipulating and identifying characters
337 whose code point values are less than 256. By default, it uses a set of
338 tables for ASCII encoding that is part of the distribution. If you specify
339
340 --enable-rebuild-chartables
341
342 a program called dftables is compiled and run in the default C locale when
343 you obey "make". It builds a source file called pcre_chartables.c. If you do
344 not specify this option, pcre_chartables.c is created as a copy of
345 pcre_chartables.c.dist. See "Character tables" below for further information.
346
347 . It is possible to compile PCRE for use on systems that use EBCDIC as their
348 character code (as opposed to ASCII/Unicode) by specifying
349
350 --enable-ebcdic
351
352 This automatically implies --enable-rebuild-chartables (see above). However,
353 when PCRE is built this way, it always operates in EBCDIC. It cannot support
354 both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
355 which specifies that the code value for the EBCDIC NL character is 0x25
356 instead of the default 0x15.
357
358 . In environments where valgrind is installed, if you specify
359
360 --enable-valgrind
361
362 PCRE will use valgrind annotations to mark certain memory regions as
363 unaddressable. This allows it to detect invalid memory accesses, and is
364 mostly useful for debugging PCRE itself.
365
366 . In environments where the gcc compiler is used and lcov version 1.6 or above
367 is installed, if you specify
368
369 --enable-coverage
370
371 the build process implements a code coverage report for the test suite. The
372 report is generated by running "make coverage". If ccache is installed on
373 your system, it must be disabled when building PCRE for coverage reporting.
374 You can do this by setting the environment variable CCACHE_DISABLE=1 before
375 running "make" to build PCRE. There is more information about coverage
376 reporting in the "pcrebuild" documentation.
377
378 . The pcregrep program currently supports only 8-bit data files, and so
379 requires the 8-bit PCRE library. It is possible to compile pcregrep to use
380 libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
381 specifying one or both of
382
383 --enable-pcregrep-libz
384 --enable-pcregrep-libbz2
385
386 Of course, the relevant libraries must be installed on your system.
387
388 . The default size (in bytes) of the internal buffer used by pcregrep can be
389 set by, for example:
390
391 --with-pcregrep-bufsize=51200
392
393 The value must be a plain integer. The default is 20480.
394
395 . It is possible to compile pcretest so that it links with the libreadline
396 or libedit libraries, by specifying, respectively,
397
398 --enable-pcretest-libreadline or --enable-pcretest-libedit
399
400 If this is done, when pcretest's input is from a terminal, it reads it using
401 the readline() function. This provides line-editing and history facilities.
402 Note that libreadline is GPL-licenced, so if you distribute a binary of
403 pcretest linked in this way, there may be licensing issues. These can be
404 avoided by linking with libedit (which has a BSD licence) instead.
405
406 Enabling libreadline causes the -lreadline option to be added to the pcretest
407 build. In many operating environments with a system-installed readline
408 library this is sufficient. However, in some environments (e.g. if an
409 unmodified distribution version of readline is in use), it may be necessary
410 to specify something like LIBS="-lncurses" as well. This is because, to quote
411 the readline INSTALL, "Readline uses the termcap functions, but does not link
412 with the termcap or curses library itself, allowing applications which link
413 with readline to choose an appropriate library." If you get error
414 messages about missing functions tgetstr, tgetent, tputs, tgetflag, or tgoto,
415 this is the problem, and linking with the ncurses library should fix it.
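
Several of the options listed above can be combined on one "configure" command, but --enable-utf (or its synonym --enable-utf8) cannot be used together with --enable-ebcdic. The sketch below checks a proposed argument list for that one conflict before "configure" is run. The function name check_configure_args is hypothetical, and the check covers only this single rule, not everything "configure" itself validates.

```shell
# Sketch only: reject an argument list that asks for both UTF and EBCDIC
# support, which the option descriptions above say cannot be combined.
check_configure_args() {
    utf=no
    ebcdic=no
    for arg in "$@"; do
        case $arg in
            --enable-utf|--enable-utf8) utf=yes ;;   # --enable-utf8 is a synonym
            --enable-ebcdic)            ebcdic=yes ;;
        esac
    done
    if [ "$utf" = yes ] && [ "$ebcdic" = yes ]; then
        echo "error: --enable-utf and --enable-ebcdic cannot be combined" >&2
        return 1
    fi
    return 0
}

# Possible use from a PCRE source tree (not executed here):
#   set -- --enable-utf --enable-unicode-properties --enable-jit --enable-pcre16
#   check_configure_args "$@" && ./configure "$@"
```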
416
417 The "configure" script builds the following files for the basic C library:
418
419 . Makefile        the makefile that builds the library
420 . config.h        build-time configuration options for the library
421 . pcre.h          the public PCRE header file
422 . pcre-config     script that shows the building settings such as CFLAGS
423                     that were set for "configure"
424 . libpcre.pc      ) data for the pkg-config command
425 . libpcre16.pc    )
426 . libpcre32.pc    )
427 . libpcreposix.pc )
428 . libtool         script that builds shared and/or static libraries
429
430 Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
431 names config.h.generic and pcre.h.generic. These are provided for those who
432 have to build PCRE without using "configure" or CMake. If you use "configure"
433 or CMake, the .generic versions are not used.
434
435 When building the 8-bit library, if a C++ compiler is found, the following
436 files are also built:
437
438 . libpcrecpp.pc        data for the pkg-config command
439 . pcrecpparg.h         header file for calling PCRE via the C++ wrapper
440 . pcre_stringpiece.h   header for the C++ "stringpiece" functions
441
442 The "configure" script also creates config.status, which is an executable
443 script that can be run to recreate the configuration, and config.log, which
444 contains compiler output from tests that "configure" runs.
445
446 Once "configure" has run, you can run "make". This builds the libraries
447 libpcre, libpcre16 and/or libpcre32, and a test program called pcretest. If you
448 enabled JIT support with --enable-jit, a test program called pcre_jit_test is
449 built as well.
450
451 If the 8-bit library is built, libpcreposix and the pcregrep command are also
452 built, and if a C++ compiler was found on your system, and you did not disable
453 it with --disable-cpp, "make" builds the C++ wrapper library, which is called
454 libpcrecpp, as well as some test programs called pcrecpp_unittest,
455 pcre_scanner_unittest, and pcre_stringpiece_unittest.
456
457 The command "make check" runs all the appropriate tests. Details of the PCRE
458 tests are given below in a separate section of this document.
459
460 You can use "make install" to install PCRE into live directories on your
461 system. The following are installed (file names are all relative to the
462 <prefix> that is set when "configure" is run):
463
464 Commands (bin):
465 pcretest
466 pcregrep (if 8-bit support is enabled)
467 pcre-config
468
469 Libraries (lib):
470 libpcre16 (if 16-bit support is enabled)
471 libpcre32 (if 32-bit support is enabled)
472 libpcre (if 8-bit support is enabled)
473 libpcreposix (if 8-bit support is enabled)
474 libpcrecpp (if 8-bit and C++ support is enabled)
475
476 Configuration information (lib/pkgconfig):
477 libpcre16.pc
478 libpcre32.pc
479 libpcre.pc
480 libpcreposix.pc
481 libpcrecpp.pc (if C++ support is enabled)
482
483 Header files (include):
484 pcre.h
485 pcreposix.h
486 pcre_scanner.h )
487 pcre_stringpiece.h ) if C++ support is enabled
488 pcrecpp.h )
489 pcrecpparg.h )
490
491 Man pages (share/man/man{1,3}):
492 pcregrep.1
493 pcretest.1
494 pcre-config.1
495 pcre.3
496 pcre*.3 (lots more pages, all starting "pcre")
497
498 HTML documentation (share/doc/pcre/html):
499 index.html
500 *.html (lots more pages, hyperlinked from index.html)
501
502 Text file documentation (share/doc/pcre):
503 AUTHORS
504 COPYING
505 ChangeLog
506 LICENCE
507 NEWS
508 README
509 pcre.txt (a concatenation of the man(3) pages)
510 pcretest.txt the pcretest man page
511 pcregrep.txt the pcregrep man page
512 pcre-config.txt the pcre-config man page
513
514 If you want to remove PCRE from your system, you can run "make uninstall".
515 This removes all the files that "make install" installed. However, it does not
516 remove any directories, because these are often shared with other programs.
517
518
519 Retrieving configuration information
520 ------------------------------------
521
522 Running "make install" installs the command pcre-config, which can be used to
523 recall information about the PCRE configuration and installation. For example:
524
525 pcre-config --version
526
527 prints the version number, and
528
529 pcre-config --libs
530
531 outputs information about where the library is installed. This command can be
532 included in makefiles for programs that use PCRE, saving the programmer from
533 having to remember too many details.
534
535 The pkg-config command is another system for saving and retrieving information
536 about installed libraries. Instead of separate commands for each library, a
537 single command is used. For example:
538
539 pkg-config --cflags libpcre
540
541 The data is held in *.pc files that are installed in a directory called
542 <prefix>/lib/pkgconfig.
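
As a sketch of how a build script might use these two commands, the helper below asks pcre-config first and falls back to pkg-config. The function name pcre_query is hypothetical; the pkg-config module name libpcre is taken from the libpcre.pc file that "configure" generates.

```shell
# Sketch only: print the compiler or linker flags for the 8-bit PCRE library.
#   $1 = "cflags" or "libs"
pcre_query() {
    what=$1
    if command -v pcre-config >/dev/null 2>&1; then
        pcre-config "--$what"
    elif command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libpcre; then
        pkg-config "--$what" libpcre
    else
        echo "PCRE does not appear to be installed" >&2
        return 1
    fi
}

# Possible use, with myprog.c standing for any program that includes pcre.h
# (not executed here):
#   cc $(pcre_query cflags) -o myprog myprog.c $(pcre_query libs)
```

The library flags are placed after the source file so that linkers which resolve symbols from left to right can find them.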
543
544
545 Shared libraries
546 ----------------
547
548 The default distribution builds PCRE as shared libraries and static libraries,
549 as long as the operating system supports shared libraries. Shared library
550 support relies on the "libtool" script which is built as part of the
551 "configure" process.
552
553 The libtool script is used to compile and link both shared and static
554 libraries. They are placed in a subdirectory called .libs when they are newly
555 built. The programs pcretest and pcregrep are built to use these uninstalled
556 libraries (by means of wrapper scripts in the case of shared libraries). When
557 you use "make install" to install shared libraries, pcregrep and pcretest are
558 automatically re-built to use the newly installed shared libraries before being
559 installed themselves. However, the versions left in the build directory still
560 use the uninstalled libraries.
561
562 To build PCRE using static libraries only you must use --disable-shared when
563 configuring it. For example:
564
565 ./configure --prefix=/usr/gnu --disable-shared
566
567 Then run "make" in the usual way. Similarly, you can use --disable-static to
568 build only shared libraries.
569
570
571 Cross-compiling using autotools
572 -------------------------------
573
574 You can specify CC and CFLAGS in the normal way to the "configure" command, in
575 order to cross-compile PCRE for some other host. However, you should NOT
576 specify --enable-rebuild-chartables, because if you do, the dftables.c source
577 file is compiled and run on the local host, in order to generate the inbuilt
578 character tables (the pcre_chartables.c file). This will probably not work,
579 because dftables.c needs to be compiled with the local compiler, not the cross
580 compiler.
581
582 When --enable-rebuild-chartables is not specified, pcre_chartables.c is created
583 by making a copy of pcre_chartables.c.dist, which is a default set of tables
584 that assumes ASCII code. Cross-compiling with the default tables should not be
585 a problem.
586
587 If you need to modify the character tables when cross-compiling, you should
588 move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and
589 run it on the local host to make a new version of pcre_chartables.c.dist.
590 Then when you cross-compile PCRE this new version of the tables will be used.
591
592
593 Using HP's ANSI C++ compiler (aCC)
594 ----------------------------------
595
596 Unless C++ support is disabled by specifying the "--disable-cpp" option of the
597 "configure" script, you must include the "-AA" option in the CXXFLAGS
598 environment variable in order for the C++ components to compile correctly.
599
600 Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
601 needed libraries fail to get included when specifying the "-AA" compiler
602 option. If you experience unresolved symbols when linking the C++ programs,
603 use the workaround of specifying the following environment variable prior to
604 running the "configure" script:
605
606 CXXLDFLAGS="-lstd_v2 -lCsup_v2"
607
608
609 Compiling in Tru64 using native compilers
610 -----------------------------------------
611
612 The following error may occur when compiling with native compilers in the Tru64
613 operating system:
614
615 CXX libpcrecpp_la-pcrecpp.lo
616 cxx: Error: /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/iosfwd, line 58: #error
617 directive: "cannot include iosfwd -- define __USE_STD_IOSTREAM to
618 override default - see section 7.1.2 of the C++ Using Guide"
619 #error "cannot include iosfwd -- define __USE_STD_IOSTREAM to override default
620 - see section 7.1.2 of the C++ Using Guide"
621
622 This may be followed by other errors, complaining that 'namespace "std" has no
623 member'. The solution to this is to add the line
624
625 #define __USE_STD_IOSTREAM 1
626
627 to the config.h file.
628
629
630 Using Sun's compilers for Solaris
631 ---------------------------------
632
633 A user reports that the following configurations work on Solaris 9 sparcv9 and
634 Solaris 9 x86 (32-bit):
635
636 Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
637 Solaris 9 x86: ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"
638
639
640 Using PCRE from MySQL
641 ---------------------
642
643 On systems where both PCRE and MySQL are installed, it is possible to make use
644 of PCRE from within MySQL, as an alternative to the built-in pattern matching.
645 There is a web page that tells you how to do this:
646
647 http://www.mysqludf.org/lib_mysqludf_preg/index.php
648
649
650 Making new tarballs
651 -------------------
652
653 The command "make dist" creates three PCRE tarballs, in tar.gz, tar.bz2, and
654 zip formats. The command "make distcheck" does the same, but then does a trial
655 build of the new distribution to ensure that it works.
656
657 If you have modified any of the man page sources in the doc directory, you
658 should first run the PrepareRelease script before making a distribution. This
659 script creates the .txt and HTML forms of the documentation from the man pages.
660
661
662 Testing PCRE
663 ------------
664
665 To test the basic PCRE library on a Unix-like system, run the RunTest script.
666 There is another script called RunGrepTest that tests the options of the
667 pcregrep command. If the C++ wrapper library is built, three test programs
668 called pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest
669 are also built. When JIT support is enabled, another test program called
670 pcre_jit_test is built.
671
672 Both the scripts and all the program tests are run if you obey "make check" or
673 "make test". For other environments, see the instructions in
674 NON-AUTOTOOLS-BUILD.
675
676 The RunTest script runs the pcretest test program (which is documented in its
677 own man page) on each of the relevant testinput files in the testdata
678 directory, and compares the output with the contents of the corresponding
679 testoutput files. RunTest uses a file called testtry to hold the main output
680 from pcretest. Other files whose names begin with "test" are used as working
681 files in some tests.
682
683 Some tests are relevant only when certain build-time options were selected. For
684 example, the tests for UTF-8/16/32 support are run only if --enable-utf was
685 used. RunTest outputs a comment when it skips a test.
686
687 Many of the tests that are not skipped are run up to three times. The second
688 run forces pcre_study() to be called for all patterns except for a few in some
689 tests that are marked "never study" (see the pcretest program for how this is
690 done). If JIT support is available, the non-DFA tests are run a third time,
691 this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.
692 This testing can be suppressed by putting "nojit" on the RunTest command line.
693
694 The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
695 libraries that are enabled. If you want to run just one set of tests, call
696 RunTest with either the -8, -16 or -32 option.
697
698 If valgrind is installed, you can run the tests under it by putting "valgrind"
699 on the RunTest command line. To run pcretest on just one or more specific test
700 files, give their numbers as arguments to RunTest, for example:
701
702 RunTest 2 7 11
703
704 You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
705 end), or a number preceded by ~ to exclude a test. For example:
706
707 RunTest 3-15 ~10
708
709 This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
710 except test 13. Whatever order the arguments are in, the tests are always run
711 in numerical order.
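
The selection rules described above can be sketched in C. This is only an
illustration of the documented behaviour, not the code of the RunTest shell
script; the function name select_tests and its parameters are hypothetical,
and options such as -8 or "valgrind" are not handled.

```c
#include <stdlib.h>

/* Hypothetical helper: mark which tests in 1..max_test are selected by
   RunTest-style test-number arguments. selected[] needs max_test + 1 slots.
   Returns 0 on success, -1 if an argument cannot be parsed. */
int select_tests(int argc, const char *const *argv, int max_test, int *selected)
{
    int any_positive = 0;
    int i, t;

    for (t = 0; t <= max_test; t++) selected[t] = 0;

    /* First pass: plain numbers ("7") and ranges ("3-6", "3-") add tests. */
    for (i = 0; i < argc; i++) {
        const char *arg = argv[i];
        char *end;
        long lo, hi;

        if (arg[0] == '~') continue;          /* exclusions are handled later */
        lo = strtol(arg, &end, 10);
        if (end == arg || lo < 1) return -1;
        hi = lo;
        if (*end == '-') {
            const char *rest = end + 1;
            if (*rest == '\0') {
                hi = max_test;                /* "3-" means 3 to the end */
            } else {
                hi = strtol(rest, &end, 10);
                if (end == rest || *end != '\0') return -1;
            }
        } else if (*end != '\0') {
            return -1;
        }
        any_positive = 1;
        if (hi > max_test) hi = max_test;
        for (t = (lo > max_test) ? max_test + 1 : (int)lo; t <= (int)hi; t++)
            selected[t] = 1;
    }

    /* With no numbers or ranges given, start from the full set of tests. */
    if (!any_positive)
        for (t = 1; t <= max_test; t++) selected[t] = 1;

    /* Second pass: "~N" removes test N from whatever was selected. */
    for (i = 0; i < argc; i++) {
        const char *arg = argv[i];
        char *end;
        long n;

        if (arg[0] != '~') continue;
        n = strtol(arg + 1, &end, 10);
        if (end == arg + 1 || *end != '\0') return -1;
        if (n >= 1 && n <= max_test) selected[n] = 0;
    }
    return 0;
}
```

Because the result is a table indexed by test number, walking it from 1 to
max_test gives the numerical order mentioned above regardless of the order of
the arguments.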
712
713 You can also call RunTest with the single argument "list" to cause it to output
714 a list of tests.
715
716 The first test file can be fed directly into the perltest.pl script to check
717 that Perl gives the same results. The only difference you should see is in the
718 first few lines, where the Perl version is given instead of the PCRE version.
719
720 The second set of tests checks pcre_fullinfo(), pcre_study(),
721 pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
722 detection, and run-time flags that are specific to PCRE, as well as the POSIX
723 wrapper API. It also uses the debugging flags to check some of the internals of
724 pcre_compile().
725
726 If you build PCRE with a locale setting that is not the standard C locale, the
727 character tables may be different (see next paragraph). In some cases, this may
728 cause failures in the second set of tests. For example, in a locale where the
729 isprint() function yields TRUE for characters in the range 128-255, the use of
730 [:ascii:] inside a character class defines a different set of characters, and
731 this shows up in this test as a difference in the compiled code, which is being
732 listed for checking. Where the comparison test output contains [\x00-\x7f] the
733 test will contain [\x00-\xff], and similarly in some other cases. This is not a
734 bug in PCRE.
735
736 The third set of tests checks pcre_maketables(), the facility for building a
737 set of character tables for a specific locale and using them instead of the
738 default tables. The tests make use of the "fr_FR" (French) locale. Before
739 running the test, the script checks for the presence of this locale by running
740 the "locale" command. If that command fails, or if it doesn't include "fr_FR"
741 in the list of available locales, the third test cannot be run, and a comment
742 is output to say why. If running this test produces instances of the error
743
744 ** Failed to set locale "fr_FR"
745
746 in the comparison output, it means that locale is not available on your system,
747 despite being listed by "locale". This does not mean that PCRE is broken.
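
A locale that is listed by "locale" but cannot actually be set can be detected
from C with the standard setlocale() function, which returns NULL when a
request cannot be honoured. The following is a minimal sketch using only the
standard C library; the helper name locale_is_usable is hypothetical and is
not part of PCRE or of its test scripts.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the C library accepts the named locale for LC_CTYPE, and 0
   otherwise. The locale in force on entry is restored before returning. */
int locale_is_usable(const char *name)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);   /* query only */
    int ok;

    /* Copy the name, because a later setlocale() call may overwrite it. */
    if (current == NULL) current = "C";
    strncpy(saved, current, sizeof(saved) - 1);
    saved[sizeof(saved) - 1] = '\0';

    ok = setlocale(LC_CTYPE, name) != NULL;
    setlocale(LC_CTYPE, saved);
    return ok;
}
```

Calling locale_is_usable("fr_FR") before running the third test would show
whether the failure message above is to be expected. Some C libraries accept
almost any locale name, so a result of 1 does not prove that locale-specific
tables will differ from the defaults.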
748
749 [If you are trying to run this test on Windows, you may be able to get it to
750 work by changing "fr_FR" to "french" everywhere it occurs. Alternatively, use
751 RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
752 Windows versions of test 2. More info on using RunTest.bat is included in the
753 document entitled NON-AUTOTOOLS-BUILD.]
754
755 The fourth test checks the UTF-8/16/32 support, and the fifth checks error
756 handling and internal UTF features of PCRE that are not relevant to Perl. The
757 sixth and seventh tests do the same for Unicode character properties support.
758
759 The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative
760 matching function, in non-UTF-8/16/32 mode, UTF-8/16/32 mode, and UTF-8/16/32
761 mode with Unicode property support, respectively.
762
763 The eleventh test checks some internal offsets and code size features; it is
764 run only when the default "link size" of 2 is set (in other cases the sizes
765 change) and when Unicode property support is enabled.
766
767 The twelfth test is run only when JIT support is available, and the thirteenth
768 test is run only when JIT support is not available. They test some JIT-specific
769 features such as information output from pcretest about JIT compilation.
770
771 The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and
772 the seventeenth, eighteenth, and nineteenth tests are run only in 16/32-bit
773 mode. These are tests that generate different output in the two modes. They are
774 for general cases, UTF-8/16/32 support, and Unicode property support,
775 respectively.
776
777 The twentieth test is run only in 16/32-bit mode. It tests some specific
778 16/32-bit features of the DFA matching engine.
779
780 The twenty-first and twenty-second tests are run only in 16/32-bit mode, when
781 the link size is set to 2 for the 16-bit library. They test reloading
782 pre-compiled patterns.
783
784 The twenty-third and twenty-fourth tests are run only in 16-bit mode. They are
785 for general cases, and UTF-16 support, respectively.
786
787 The twenty-fifth and twenty-sixth tests are run only in 32-bit mode. They are
788 for general cases, and UTF-32 support, respectively.
789
790
791 Character tables
792 ----------------
793
794 For speed, PCRE uses four tables for manipulating and identifying characters
795 whose code point values are less than 256. The final argument of the
796 pcre_compile() function is a pointer to a block of memory containing the
797 concatenated tables. A call to pcre_maketables() can be used to generate a set
798 of tables in the current locale. If the final argument for pcre_compile() is
799 passed as NULL, a set of default tables that is built into the binary is used.
800
801 The source file called pcre_chartables.c contains the default set of tables. By
802 default, this is created as a copy of pcre_chartables.c.dist, which contains
803 tables for ASCII coding. However, if --enable-rebuild-chartables is specified
804 for ./configure, a different version of pcre_chartables.c is built by the
805 program dftables (compiled from dftables.c), which uses the ANSI C character
806 handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to
807 build the table sources. This means that the default C locale which is set for
808 your system will control the contents of these default tables. You can change
809 the default tables by editing pcre_chartables.c and then re-building PCRE. If
810 you do this, you should take care to ensure that the file does not get
811 automatically re-generated. The best way to do this is to move
812 pcre_chartables.c.dist out of the way and replace it with your customized
813 tables.
814
815 When the dftables program is run as a result of --enable-rebuild-chartables,
816 it uses the default C locale that is set on your system. It does not pay
817 attention to the LC_xxx environment variables. In other words, it uses the
818 system's default locale rather than whatever the compiling user happens to have
819 set. If you really do want to build a source set of character tables in a
820 locale that is specified by the LC_xxx variables, you can run the dftables
821 program by hand with the -L option. For example:
822
823 ./dftables -L pcre_chartables.c.special
824
825 The first two 256-byte tables provide lower casing and case flipping functions,
826 respectively. The next table consists of three 32-byte bit maps which identify
827 digits, "word" characters, and white space, respectively. These are used when
828 building 32-byte bit maps that represent character classes for code points less
829 than 256.
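
As an illustration of the layout just described, the sketch below builds a
lower casing table, a case flipping table, and three 32-byte bit maps from the
ANSI C character functions. It is not the code of dftables or
pcre_maketables(); the function names are hypothetical, and the choice of
putting code point c at bit (c & 7) of byte c/8 is an assumption made for the
example.

```c
#include <ctype.h>
#include <string.h>

/* Fill two 256-byte tables: lower[] maps each code point to its lower case
   form, and flip[] maps each letter to its other case. */
void build_case_tables(unsigned char *lower, unsigned char *flip)
{
    int i;
    for (i = 0; i < 256; i++) {
        lower[i] = (unsigned char)tolower(i);
        flip[i] = (unsigned char)(islower(i) ? toupper(i) : tolower(i));
    }
}

/* Fill three 32-byte bit maps (256 bits each) that identify digits, "word"
   characters, and white space in the current locale. */
void build_class_bitmaps(unsigned char *digits, unsigned char *words,
                         unsigned char *spaces)
{
    int i;
    memset(digits, 0, 32);
    memset(words, 0, 32);
    memset(spaces, 0, 32);
    for (i = 0; i < 256; i++) {
        unsigned char bit = (unsigned char)(1 << (i & 7));
        if (isdigit(i)) digits[i / 8] |= bit;
        if (isalnum(i) || i == '_') words[i / 8] |= bit;
        if (isspace(i)) spaces[i / 8] |= bit;
    }
}

/* Test whether code point c (0..255) is a member of a 32-byte bit map. */
int bitmap_contains(const unsigned char *map, int c)
{
    return (map[c / 8] >> (c & 7)) & 1;
}
```

A character class for code points below 256 can then be formed by combining
such maps byte by byte, for example by ORing the digit map into a class map.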
830
831 The final 256-byte table has bits indicating various character types, as
832 follows:
833
834 1 white space character
835 2 letter
836 4 decimal digit
837 8 hexadecimal digit
838 16 alphanumeric or '_'
839 128 regular expression metacharacter or binary zero
840
841 You should not alter the set of characters that contain the 128 bit, as that
842 will cause PCRE to malfunction.
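
The bit values listed above can be combined into a 256-byte table along the
following lines. This is a sketch under stated assumptions, not the library's
own source: the CT_ names are hypothetical, and the set of metacharacters in
the string below is an assumption made for the example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical names for the bit values listed in the table above. */
enum {
    CT_SPACE  = 1,    /* white space character */
    CT_LETTER = 2,    /* letter */
    CT_DIGIT  = 4,    /* decimal digit */
    CT_XDIGIT = 8,    /* hexadecimal digit */
    CT_WORD   = 16,   /* alphanumeric or '_' */
    CT_META   = 128   /* regular expression metacharacter or binary zero */
};

/* Fill a 256-byte table of character type bits using the current locale. */
void build_type_table(unsigned char *types)
{
    /* Assumed metacharacter set, for illustration only. */
    static const char meta[] = "\\*+?{^.$|()[";
    int i;

    for (i = 0; i < 256; i++) {
        unsigned char x = 0;
        if (isspace(i)) x |= CT_SPACE;
        if (isalpha(i)) x |= CT_LETTER;
        if (isdigit(i)) x |= CT_DIGIT;
        if (isxdigit(i)) x |= CT_XDIGIT;
        if (isalnum(i) || i == '_') x |= CT_WORD;
        if (i == 0 || strchr(meta, i) != NULL) x |= CT_META;
        types[i] = x;
    }
}
```

With such a table, a test like (types[c] & CT_DIGIT) != 0 replaces a call to
isdigit() with a single table lookup, which is the speed benefit mentioned at
the start of this section.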
843
844
845 File manifest
846 -------------
847
848 The distribution should contain the files listed below. Where a file name is
849 given as pcre[16|32]_xxx it means that there are three files, one with the name
850 pcre_xxx, one with the name pcre16_xxx, and a third with the name pcre32_xxx.
851
852 (A) Source files of the PCRE library functions and their headers:
853
854 dftables.c auxiliary program for building pcre_chartables.c
855 when --enable-rebuild-chartables is specified
856
857 pcre_chartables.c.dist a default set of character tables that assume ASCII
858 coding; used, unless --enable-rebuild-chartables is
859 specified, by copying to pcre_chartables.c
860
861 pcreposix.c )
862 pcre[16|32]_byte_order.c )
863 pcre[16|32]_compile.c )
864 pcre[16|32]_config.c )
865 pcre[16|32]_dfa_exec.c )
866 pcre[16|32]_exec.c )
867 pcre[16|32]_fullinfo.c )
868 pcre[16|32]_get.c ) sources for the functions in the library,
869 pcre[16|32]_globals.c ) and some internal functions that they use
870 pcre[16|32]_jit_compile.c )
871 pcre[16|32]_maketables.c )
872 pcre[16|32]_newline.c )
873 pcre[16|32]_refcount.c )
874 pcre[16|32]_string_utils.c )
875 pcre[16|32]_study.c )
876 pcre[16|32]_tables.c )
877 pcre[16|32]_ucd.c )
878 pcre[16|32]_version.c )
879 pcre[16|32]_xclass.c )
880 pcre_ord2utf8.c )
881 pcre_valid_utf8.c )
882 pcre16_ord2utf16.c )
883 pcre16_utf16_utils.c )
884 pcre16_valid_utf16.c )
885 pcre32_utf32_utils.c )
886 pcre32_valid_utf32.c )
887
888 pcre[16|32]_printint.c ) debugging function that is used by pcretest,
889 ) and can also be #included in pcre_compile()
890
891 pcre.h.in template for pcre.h when built by "configure"
892 pcreposix.h header for the external POSIX wrapper API
893 pcre_internal.h header for internal use
894 sljit/* 16 files that make up the JIT compiler
895 ucp.h header for Unicode property handling
896
897 config.h.in template for config.h, which is built by "configure"
898
899 pcrecpp.h public header file for the C++ wrapper
900 pcrecpparg.h.in template for another C++ header file
901 pcre_scanner.h public header file for C++ scanner functions
902 pcrecpp.cc )
903 pcre_scanner.cc ) source for the C++ wrapper library
904
905 pcre_stringpiece.h.in template for pcre_stringpiece.h, the header for the
906 C++ stringpiece functions
907 pcre_stringpiece.cc source for the C++ stringpiece functions
908
909 (B) Source files for programs that use PCRE:
910
911 pcredemo.c simple demonstration of coding calls to PCRE
912 pcregrep.c source of a grep utility that uses PCRE
913 pcretest.c comprehensive test program
914
915 (C) Auxiliary files:
916
917 132html script to turn "man" pages into HTML
918 AUTHORS information about the author of PCRE
919 ChangeLog log of changes to the code
920 CleanTxt script to clean nroff output for txt man pages
921 Detrail script to remove trailing spaces
922 HACKING some notes about the internals of PCRE
923 INSTALL generic installation instructions
924 LICENCE conditions for the use of PCRE
925 COPYING the same, using GNU's standard name
926 Makefile.in ) template for Unix Makefile, which is built by
927 ) "configure"
928 Makefile.am ) the automake input that was used to create
929 ) Makefile.in
930 NEWS important changes in this release
931 NON-UNIX-USE the previous name for NON-AUTOTOOLS-BUILD
932 NON-AUTOTOOLS-BUILD notes on building PCRE without using autotools
933 PrepareRelease script to make preparations for "make dist"
934 README this file
935 RunTest a Unix shell script for running tests
936 RunGrepTest a Unix shell script for pcregrep tests
937 aclocal.m4 m4 macros (generated by "aclocal")
938 config.guess ) files used by libtool,
939 config.sub ) used only when building a shared library
940 configure a configuring shell script (built by autoconf)
941 configure.ac ) the autoconf input that was used to build
942 ) "configure" and config.h
943 depcomp ) script to find program dependencies, generated by
944 ) automake
945 doc/*.3 man page sources for PCRE
946 doc/*.1 man page sources for pcregrep and pcretest
947 doc/index.html.src the base HTML page
948 doc/html/* HTML documentation
949 doc/pcre.txt plain text version of the man pages
950 doc/pcretest.txt plain text documentation of test program
951 doc/perltest.txt plain text documentation of Perl test program
952 install-sh a shell script for installing files
953 libpcre16.pc.in template for libpcre16.pc for pkg-config
954 libpcre32.pc.in template for libpcre32.pc for pkg-config
955 libpcre.pc.in template for libpcre.pc for pkg-config
956 libpcreposix.pc.in template for libpcreposix.pc for pkg-config
957 libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config
958 ltmain.sh file used to build a libtool script
959 missing ) common stub for a few missing GNU programs while
960 ) installing, generated by automake
961 mkinstalldirs script for making install directories
962 perltest.pl Perl test program
963 pcre-config.in source of script which retains PCRE information
964 pcre_jit_test.c test program for the JIT compiler
965 pcrecpp_unittest.cc )
966 pcre_scanner_unittest.cc ) test programs for the C++ wrapper
967 pcre_stringpiece_unittest.cc )
968 testdata/testinput* test data for main library tests
969 testdata/testoutput* expected test results
970 testdata/grep* input and output for pcregrep tests
971 testdata/* other supporting test files
972
973 (D) Auxiliary files for cmake support
974
975 cmake/COPYING-CMAKE-SCRIPTS
976 cmake/FindPackageHandleStandardArgs.cmake
977 cmake/FindEditline.cmake
978 cmake/FindReadline.cmake
979 CMakeLists.txt
980 config-cmake.h.in
981
982 (E) Auxiliary files for VPASCAL
983
984 makevp.bat
985 makevp_c.txt
986 makevp_l.txt
987 pcregexp.pas
988
989 (F) Auxiliary files for building PCRE "by hand"
990
991 pcre.h.generic ) a version of the public PCRE header file
992 ) for use in non-"configure" environments
993 config.h.generic ) a version of config.h for use in non-"configure"
994 ) environments
995
996 (G) Miscellaneous
997
998 RunTest.bat a script for running tests under Windows
999
1000 Philip Hazel
1001 Email local part: ph10
1002 Email domain: cam.ac.uk
1003 Last updated: 12 February 2020
