30 |
The PCRE APIs |
The PCRE APIs |
31 |
------------- |
------------- |
32 |
|
|
33 |
PCRE is written in C, and it has its own API. The distribution now includes a |
PCRE is written in C, and it has its own API. The distribution also includes a |
34 |
set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page |
set of C++ wrapper functions (see the pcrecpp man page for details), courtesy |
35 |
for details). |
of Google Inc. |
36 |
|
|
37 |
Also included in the distribution are a set of C wrapper functions that are |
In addition, there is a set of C wrapper functions that are based on the POSIX |
38 |
based on the POSIX API. These end up in the library called libpcreposix. Note |
regular expression API (see the pcreposix man page). These end up in the |
39 |
that this just provides a POSIX calling interface to PCRE; the regular |
library called libpcreposix. Note that this just provides a POSIX calling |
40 |
expressions themselves still follow Perl syntax and semantics. The POSIX API is |
interface to PCRE; the regular expressions themselves still follow Perl syntax |
41 |
restricted, and does not give full access to all of PCRE's facilities. |
and semantics. The POSIX API is restricted, and does not give full access to |
42 |
|
all of PCRE's facilities. |
43 |
|
|
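As a rough sketch of the POSIX-style calling sequence described above, the
following C function compiles a pattern and tests one subject string. The
function name posix_style_match() and the USE_PCREPOSIX macro are hypothetical,
introduced only for this illustration. Without the macro, the block falls back
to the system's regex.h so that it stays self-contained. The test pattern is
assumed to be one that means the same in Perl syntax and in POSIX extended
syntax.

```c
#include <stddef.h>

#ifdef USE_PCREPOSIX            /* hypothetical macro, not part of PCRE */
#include <pcreposix.h>          /* PCRE's POSIX-style wrapper header */
#else
#include <regex.h>              /* the system's own POSIX regex functions */
#endif

#ifndef REG_EXTENDED
#define REG_EXTENDED 0          /* assumption: harmless if the header lacks it */
#endif

/* Returns 1 on a match, 0 on no match, -1 on a compile or matching error.
   On a match, the offsets of the whole match are stored via start and end. */
int posix_style_match(const char *pattern, const char *subject,
                      int *start, int *end)
{
  regex_t re;
  regmatch_t m[1];
  int rc;

  if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

  rc = regexec(&re, subject, 1, m, 0);
  if (rc == 0) {
    *start = (int)m[0].rm_so;   /* offset of the first matched character */
    *end = (int)m[0].rm_eo;     /* offset just past the last matched one */
  }

  regfree(&re);                 /* always release the compiled pattern */
  if (rc == 0) return 1;
  return (rc == REG_NOMATCH) ? 0 : -1;
}
```

When the PCRE wrapper is wanted, the same source would be compiled with
USE_PCREPOSIX defined and linked with libpcreposix and libpcre. The pattern is
then interpreted with Perl syntax and semantics.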
44 |
The header file for the POSIX-style functions is called pcreposix.h. The |
The header file for the POSIX-style functions is called pcreposix.h. The |
45 |
official POSIX name is regex.h, but I did not want to risk possible problems |
official POSIX name is regex.h, but I did not want to risk possible problems |
92 |
|
|
93 |
There is a README file giving brief descriptions of what they are. Some are |
There is a README file giving brief descriptions of what they are. Some are |
94 |
complete in themselves; others are pointers to URLs containing relevant files. |
complete in themselves; others are pointers to URLs containing relevant files. |
95 |
Some of this material is likely to be well out-of-date. In particular, several |
Some of this material is likely to be well out-of-date. Several of the earlier |
96 |
of the contributions provide support for compiling PCRE on various flavours of |
contributions provided support for compiling PCRE on various flavours of |
97 |
Windows (I myself do not use Windows), but nowadays there is more Windows |
Windows (I myself do not use Windows). Nowadays there is more Windows support |
98 |
support in the standard distribution. |
in the standard distribution, so these contributions have been archived. |
99 |
|
|
100 |
|
|
101 |
Building PCRE on non-Unix systems |
Building PCRE on non-Unix systems |
148 |
|
|
149 |
. If you want to suppress the building of the C++ wrapper library, you can add |
. If you want to suppress the building of the C++ wrapper library, you can add |
150 |
--disable-cpp to the "configure" command. Otherwise, when "configure" is run, |
--disable-cpp to the "configure" command. Otherwise, when "configure" is run, |
151 |
will try to find a C++ compiler and C++ header files, and if it succeeds, it |
it will try to find a C++ compiler and C++ header files, and if it succeeds, |
152 |
will try to build the C++ wrapper. |
it will try to build the C++ wrapper. |
153 |
|
|
154 |
. If you want to make use of the support for UTF-8 character strings in PCRE, |
. If you want to make use of the support for UTF-8 character strings in PCRE, |
155 |
you must add --enable-utf8 to the "configure" command. Without it, the code |
you must add --enable-utf8 to the "configure" command. Without it, the code |
179 |
|
|
180 |
. When called via the POSIX interface, PCRE uses malloc() to get additional |
. When called via the POSIX interface, PCRE uses malloc() to get additional |
181 |
storage for processing capturing parentheses if there are more than 10 of |
storage for processing capturing parentheses if there are more than 10 of |
182 |
them. You can increase this threshold by setting, for example, |
them in a pattern. You can increase this threshold by setting, for example, |
183 |
|
|
184 |
--with-posix-malloc-threshold=20 |
--with-posix-malloc-threshold=20 |
185 |
|
|
208 |
. The default maximum compiled pattern size is around 64K. You can increase |
. The default maximum compiled pattern size is around 64K. You can increase |
209 |
this by adding --with-link-size=3 to the "configure" command. You can |
this by adding --with-link-size=3 to the "configure" command. You can |
210 |
increase it even more by setting --with-link-size=4, but this is unlikely |
increase it even more by setting --with-link-size=4, but this is unlikely |
211 |
ever to be necessary. |
ever to be necessary. Increasing the internal link size will reduce |
212 |
|
performance. |
213 |
|
|
214 |
. You can build PCRE so that its internal match() function that is called from |
. You can build PCRE so that its internal match() function that is called from |
215 |
pcre_exec() does not call itself recursively. Instead, it uses memory blocks |
pcre_exec() does not call itself recursively. Instead, it uses memory blocks |
225 |
use deeply nested recursion. There is a discussion about stack sizes in the |
use deeply nested recursion. There is a discussion about stack sizes in the |
226 |
pcrestack man page. |
pcrestack man page. |
227 |
|
|
228 |
|
. For speed, PCRE uses four tables for manipulating and identifying characters |
229 |
|
whose code point values are less than 256. By default, it uses a set of |
230 |
|
tables for ASCII encoding that is part of the distribution. If you specify |
231 |
|
|
232 |
|
--enable-rebuild-chartables |
233 |
|
|
234 |
|
a program called dftables is compiled and run in the default C locale when |
235 |
|
you obey "make". It builds a source file called pcre_chartables.c. If you do |
236 |
|
not specify this option, pcre_chartables.c is created as a copy of |
237 |
|
pcre_chartables.c.dist. See "Character tables" below for further information. |
238 |
|
|
239 |
|
. It is possible to compile PCRE for use on systems that use EBCDIC as their |
240 |
|
default character code (as opposed to ASCII) by specifying |
241 |
|
|
242 |
|
--enable-ebcdic |
243 |
|
|
244 |
|
This automatically implies --enable-rebuild-chartables (see above). |
245 |
|
|
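The effect of the link size option mentioned above can be seen with a little
arithmetic, on the assumption that the link size is the number of bytes used
for each internal offset in a compiled pattern and that the default is 2. The
helper link_size_limit() is hypothetical. It only computes how many distinct
offsets fit in that many bytes; the limit actually enforced by the library may
be lower.

```c
/* Number of distinct offsets representable in link_size bytes
   (2 gives 65536, i.e. 64K; 3 gives 16777216, i.e. 16M).
   Returns 0 for sizes outside the 2..4 range discussed here. */
unsigned long long link_size_limit(int link_size)
{
  if (link_size < 2 || link_size > 4) return 0;
  return 1ULL << (8 * link_size);
}
```

This is consistent with the "around 64K" default quoted above, and it suggests
why a link size of 4 is unlikely ever to be needed.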
246 |
The "configure" script builds the following files for the basic C library: |
The "configure" script builds the following files for the basic C library: |
247 |
|
|
248 |
. Makefile is the makefile that builds the library |
. Makefile is the makefile that builds the library |
391 |
------------------------------------ |
------------------------------------ |
392 |
|
|
393 |
You can specify CC and CFLAGS in the normal way to the "configure" command, in |
You can specify CC and CFLAGS in the normal way to the "configure" command, in |
394 |
order to cross-compile PCRE for some other host. However, during the building |
order to cross-compile PCRE for some other host. However, you should NOT |
395 |
process, the dftables.c source file is compiled *and run* on the local host, in |
specify --enable-rebuild-chartables, because if you do, the dftables.c source |
396 |
order to generate the default character tables (the chartables.c file). It |
file is compiled and run on the local host, in order to generate the inbuilt |
397 |
therefore needs to be compiled with the local compiler, not the cross compiler. |
character tables (the pcre_chartables.c file). This will probably not work, |
398 |
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD; |
because dftables.c needs to be compiled with the local compiler, not the cross |
399 |
there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper) |
compiler. |
400 |
when calling the "configure" command. If they are not specified, they default |
|
401 |
to the values of CC and CFLAGS. |
When --enable-rebuild-chartables is not specified, pcre_chartables.c is created |
402 |
|
by making a copy of pcre_chartables.c.dist, which is a default set of tables |
403 |
|
that assumes ASCII code. Cross-compiling with the default tables should not be |
404 |
|
a problem. |
405 |
|
|
406 |
|
If you need to modify the character tables when cross-compiling, you should |
407 |
|
move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and |
408 |
|
run it on the local host to make a new version of pcre_chartables.c.dist. |
409 |
|
Then when you cross-compile PCRE this new version of the tables will be used. |
410 |
|
|
411 |
|
|
412 |
Using HP's ANSI C++ compiler (aCC) |
Using HP's ANSI C++ compiler (aCC) |
518 |
of tables in the current locale. If the final argument for pcre_compile() is |
of tables in the current locale. If the final argument for pcre_compile() is |
519 |
passed as NULL, a set of default tables that is built into the binary is used. |
passed as NULL, a set of default tables that is built into the binary is used. |
520 |
|
|
521 |
The source file called chartables.c contains the default set of tables. This is |
The source file called pcre_chartables.c contains the default set of tables. By |
522 |
not supplied in the distribution, but is built by the program dftables |
default, this is created as a copy of pcre_chartables.c.dist, which contains |
523 |
(compiled from dftables.c), which uses the ANSI C character handling functions |
tables for ASCII coding. However, if --enable-rebuild-chartables is specified |
524 |
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table |
for ./configure, a different version of pcre_chartables.c is built by the |
525 |
sources. This means that the default C locale which is set for your system will |
program dftables (compiled from dftables.c), which uses the ANSI C character |
526 |
control the contents of these default tables. You can change the default tables |
handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to |
527 |
by editing chartables.c and then re-building PCRE. If you do this, you should |
build the table sources. This means that the default C locale which is set for |
528 |
take care to ensure that the file does not get automaticaly re-generated. |
your system will control the contents of these default tables. You can change |
529 |
|
the default tables by editing pcre_chartables.c and then re-building PCRE. If |
530 |
|
you do this, you should take care to ensure that the file does not get |
531 |
|
automatically re-generated. The best way to do this is to move |
532 |
|
pcre_chartables.c.dist out of the way and replace it with your customized |
533 |
|
tables. |
534 |
|
|
535 |
|
When the dftables program is run as a result of --enable-rebuild-chartables, |
536 |
|
it uses the default C locale that is set on your system. It does not pay |
537 |
|
attention to the LC_xxx environment variables. In other words, it uses the |
538 |
|
system's default locale rather than whatever the compiling user happens to have |
539 |
|
set. If you really do want to build a source set of character tables in a |
540 |
|
locale that is specified by the LC_xxx variables, you can run the dftables |
541 |
|
program by hand with the -L option. For example: |
542 |
|
|
543 |
|
./dftables -L pcre_chartables.c.special |
544 |
|
|
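To illustrate how tables of this kind can be derived from the ANSI C character
handling functions, here is a minimal sketch that fills a lower-casing table
and a case-flipping table for code points below 256. The function
build_case_tables() and its use_env_locale parameter are assumptions made for
this example. They are not the actual dftables code, which also builds further
tables.

```c
#include <ctype.h>
#include <locale.h>

/* Fill two 256-entry tables using the ANSI C character functions.
   lower[c] is the lower-case form of c; flip[c] is c with its case swapped.
   If use_env_locale is non-zero, the locale is first taken from the LC_xxx
   environment variables (loosely mirroring the -L option); otherwise the
   program's current locale, which is "C" at startup, is left alone. */
void build_case_tables(unsigned char lower[256], unsigned char flip[256],
                       int use_env_locale)
{
  int c;

  if (use_env_locale) setlocale(LC_ALL, "");

  for (c = 0; c < 256; c++) {
    lower[c] = (unsigned char)tolower(c);
    flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
  }
}
```

Because the results depend on the locale in force when the functions are
called, tables built this way in one locale may differ from those built in
another.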
545 |
The first two 256-byte tables provide lower casing and case flipping functions, |
The first two 256-byte tables provide lower casing and case flipping functions, |
546 |
respectively. The next table consists of three 32-byte bit maps which identify |
respectively. The next table consists of three 32-byte bit maps which identify |
569 |
|
|
570 |
(A) Source files of the PCRE library functions and their headers: |
(A) Source files of the PCRE library functions and their headers: |
571 |
|
|
572 |
dftables.c auxiliary program for building chartables.c |
dftables.c auxiliary program for building pcre_chartables.c |
573 |
|
when --enable-rebuild-chartables is specified |
574 |
|
|
575 |
pcreposix.c ) |
pcre_chartables.c.dist a default set of character tables that assume ASCII |
576 |
pcre_compile.c ) |
coding; used, unless --enable-rebuild-chartables is |
577 |
pcre_config.c ) |
specified, by copying to pcre_chartables.c |
578 |
pcre_dfa_exec.c ) |
|
579 |
pcre_exec.c ) |
pcreposix.c ) |
580 |
pcre_fullinfo.c ) |
pcre_compile.c ) |
581 |
pcre_get.c ) sources for the functions in the library, |
pcre_config.c ) |
582 |
pcre_globals.c ) and some internal functions that they use |
pcre_dfa_exec.c ) |
583 |
pcre_info.c ) |
pcre_exec.c ) |
584 |
pcre_maketables.c ) |
pcre_fullinfo.c ) |
585 |
pcre_newline.c ) |
pcre_get.c ) sources for the functions in the library, |
586 |
pcre_ord2utf8.c ) |
pcre_globals.c ) and some internal functions that they use |
587 |
pcre_refcount.c ) |
pcre_info.c ) |
588 |
pcre_study.c ) |
pcre_maketables.c ) |
589 |
pcre_tables.c ) |
pcre_newline.c ) |
590 |
pcre_try_flipped.c ) |
pcre_ord2utf8.c ) |
591 |
pcre_ucp_searchfuncs.c ) |
pcre_refcount.c ) |
592 |
pcre_valid_utf8.c ) |
pcre_study.c ) |
593 |
pcre_version.c ) |
pcre_tables.c ) |
594 |
pcre_xclass.c ) |
pcre_try_flipped.c ) |
595 |
pcre_printint.src ) debugging function that is #included in pcretest, |
pcre_ucp_searchfuncs.c ) |
596 |
) and can also be #included in pcre_compile() |
pcre_valid_utf8.c ) |
597 |
pcre.h.in template for pcre.h when built by "configure" |
pcre_version.c ) |
598 |
pcreposix.h header for the external POSIX wrapper API |
pcre_xclass.c ) |
599 |
pcre_internal.h header for internal use |
pcre_printint.src ) debugging function that is #included in pcretest, |
600 |
ucp.h ) headers concerned with |
) and can also be #included in pcre_compile() |
601 |
ucpinternal.h ) Unicode property handling |
pcre.h.in template for pcre.h when built by "configure" |
602 |
ucptable.h ) (this one is the data table) |
pcreposix.h header for the external POSIX wrapper API |
603 |
|
pcre_internal.h header for internal use |
604 |
config.h.in template for config.h, which is built by "configure" |
ucp.h ) headers concerned with |
605 |
|
ucpinternal.h ) Unicode property handling |
606 |
pcrecpp.h public header file for the C++ wrapper |
ucptable.h ) (this one is the data table) |
607 |
pcrecpparg.h.in template for another C++ header file |
|
608 |
pcre_scanner.h public header file for C++ scanner functions |
config.h.in template for config.h, which is built by "configure" |
609 |
pcrecpp.cc ) |
|
610 |
pcre_scanner.cc ) source for the C++ wrapper library |
pcrecpp.h public header file for the C++ wrapper |
611 |
|
pcrecpparg.h.in template for another C++ header file |
612 |
pcre_stringpiece.h.in template for pcre_stringpiece.h, the header for the |
pcre_scanner.h public header file for C++ scanner functions |
613 |
C++ stringpiece functions |
pcrecpp.cc ) |
614 |
pcre_stringpiece.cc source for the C++ stringpiece functions |
pcre_scanner.cc ) source for the C++ wrapper library |
615 |
|
|
616 |
|
pcre_stringpiece.h.in template for pcre_stringpiece.h, the header for the |
617 |
|
C++ stringpiece functions |
618 |
|
pcre_stringpiece.cc source for the C++ stringpiece functions |
619 |
|
|
620 |
(B) Source files for programs that use PCRE: |
(B) Source files for programs that use PCRE: |
621 |
|
|
622 |
pcredemo.c simple demonstration of coding calls to PCRE |
pcredemo.c simple demonstration of coding calls to PCRE |
623 |
pcregrep.c source of a grep utility that uses PCRE |
pcregrep.c source of a grep utility that uses PCRE |
624 |
pcretest.c comprehensive test program |
pcretest.c comprehensive test program |
625 |
|
|
626 |
(C) Auxiliary files: |
(C) Auxiliary files: |
627 |
|
|
628 |
132html script to turn "man" pages into HTML |
132html script to turn "man" pages into HTML |
629 |
AUTHORS information about the author of PCRE |
AUTHORS information about the author of PCRE |
630 |
ChangeLog log of changes to the code |
ChangeLog log of changes to the code |
631 |
CleanTxt script to clean nroff output for txt man pages |
CleanTxt script to clean nroff output for txt man pages |
632 |
Detrail script to remove trailing spaces |
Detrail script to remove trailing spaces |
633 |
Index.html the base HTML page |
HACKING some notes about the internals of PCRE |
634 |
INSTALL generic installation instructions |
INSTALL generic installation instructions |
635 |
LICENCE conditions for the use of PCRE |
LICENCE conditions for the use of PCRE |
636 |
COPYING the same, using GNU's standard name |
COPYING the same, using GNU's standard name |
637 |
Makefile.in ) template for Unix Makefile, which is built by |
Makefile.in ) template for Unix Makefile, which is built by |
638 |
) "configure" |
) "configure" |
639 |
Makefile.am ) the automake input that was used to create |
Makefile.am ) the automake input that was used to create |
640 |
) Makefile.in |
) Makefile.in |
641 |
NEWS important changes in this release |
NEWS important changes in this release |
642 |
NON-UNIX-USE notes on building PCRE on non-Unix systems |
NON-UNIX-USE notes on building PCRE on non-Unix systems |
643 |
PrepareRelease script to make preparations for "make dist" |
PrepareRelease script to make preparations for "make dist" |
644 |
README this file |
README this file |
645 |
RunTest.in template for a Unix shell script for running tests |
RunTest.in template for a Unix shell script for running tests |
646 |
RunGrepTest.in template for a Unix shell script for pcregrep tests |
RunGrepTest.in template for a Unix shell script for pcregrep tests |
647 |
aclocal.m4 m4 macros (generated by "aclocal") |
aclocal.m4 m4 macros (generated by "aclocal") |
648 |
config.guess ) files used by libtool, |
config.guess ) files used by libtool, |
649 |
config.sub ) used only when building a shared library |
config.sub ) used only when building a shared library |
650 |
configure a configuring shell script (built by autoconf) |
configure a configuring shell script (built by autoconf) |
651 |
configure.ac ) the autoconf input that was used to build |
configure.ac ) the autoconf input that was used to build |
652 |
) "configure" and config.h |
) "configure" and config.h |
653 |
depcomp ) script to find program dependencies, generated by |
depcomp ) script to find program dependencies, generated by |
654 |
) automake |
) automake |
655 |
doc/*.3 man page sources for the PCRE functions |
doc/*.3 man page sources for the PCRE functions |
656 |
doc/*.1 man page sources for pcregrep and pcretest |
doc/*.1 man page sources for pcregrep and pcretest |
657 |
doc/html/* HTML documentation |
doc/index.html.src the base HTML page |
658 |
doc/pcre.txt plain text version of the man pages |
doc/html/* HTML documentation |
659 |
doc/pcretest.txt plain text documentation of test program |
doc/pcre.txt plain text version of the man pages |
660 |
doc/perltest.txt plain text documentation of Perl test program |
doc/pcretest.txt plain text documentation of test program |
661 |
install-sh a shell script for installing files |
doc/perltest.txt plain text documentation of Perl test program |
662 |
libpcre.pc.in template for libpcre.pc for pkg-config |
install-sh a shell script for installing files |
663 |
libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config |
libpcre.pc.in template for libpcre.pc for pkg-config |
664 |
ltmain.sh file used to build a libtool script |
libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config |
665 |
missing ) common stub for a few missing GNU programs while |
ltmain.sh file used to build a libtool script |
666 |
) installing, generated by automake |
missing ) common stub for a few missing GNU programs while |
667 |
mkinstalldirs script for making install directories |
) installing, generated by automake |
668 |
perltest.pl Perl test program |
mkinstalldirs script for making install directories |
669 |
pcre-config.in source of script which retains PCRE information |
perltest.pl Perl test program |
670 |
|
pcre-config.in source of script which retains PCRE information |
671 |
pcrecpp_unittest.cc ) |
pcrecpp_unittest.cc ) |
672 |
pcre_scanner_unittest.cc ) test programs for the C++ wrapper |
pcre_scanner_unittest.cc ) test programs for the C++ wrapper |
673 |
pcre_stringpiece_unittest.cc ) |
pcre_stringpiece_unittest.cc ) |
674 |
testdata/testinput* test data for main library tests |
testdata/testinput* test data for main library tests |
675 |
testdata/testoutput* expected test results |
testdata/testoutput* expected test results |
676 |
testdata/grep* input and output for pcregrep tests |
testdata/grep* input and output for pcregrep tests |
677 |
|
|
678 |
(D) Auxiliary files for cmake support |
(D) Auxiliary files for cmake support |
679 |
|
|
683 |
(E) Auxiliary files for VPASCAL |
(E) Auxiliary files for VPASCAL |
684 |
|
|
685 |
makevp.bat |
makevp.bat |
686 |
!compile.txt |
makevp-c.txt |
687 |
!linklib.txt |
makevp-l.txt |
688 |
pcregexp.pas |
pcregexp.pas |
689 |
|
|
690 |
(F) Auxiliary files for building PCRE "by hand" |
(F) Auxiliary files for building PCRE "by hand" |
691 |
|
|
692 |
pcre.h.generic ) a version of the public PCRE header file |
pcre.h.generic ) a version of the public PCRE header file |
693 |
) for use in non-"configure" environments |
) for use in non-"configure" environments |
694 |
config.h.generic ) a version of config.h for use in non-"configure" |
config.h.generic ) a version of config.h for use in non-"configure" |
695 |
) environments |
) environments |
696 |
|
|
697 |
(G) Miscellaneous |
(G) Miscellaneous |
698 |
|
|
701 |
Philip Hazel |
Philip Hazel |
702 |
Email local part: ph10 |
Email local part: ph10 |
703 |
Email domain: cam.ac.uk |
Email domain: cam.ac.uk |
704 |
Last updated: March 2007 |
Last updated: 26 March 2007 |