Contents of /code/trunk/doc/pcrebuild.3

Revision 678
Sun Aug 28 15:23:03 2011 UTC by ph10
File size: 13947 byte(s)
Documentation for JIT support.
1 .TH PCREBUILD 3
2 .SH NAME
3 PCRE - Perl-compatible regular expressions
4 .
5 .
6 .SH "PCRE BUILD-TIME OPTIONS"
7 .rs
8 .sp
9 This document describes the optional features of PCRE that can be selected when
10 the library is compiled. It assumes use of the \fBconfigure\fP script, where
11 the optional features are selected or deselected by providing options to
12 \fBconfigure\fP before running the \fBmake\fP command. However, the same
13 options can be selected in both Unix-like and non-Unix-like environments using
14 the GUI facility of \fBcmake-gui\fP if you are using \fBCMake\fP instead of
15 \fBconfigure\fP to build PCRE.
16 .P
17 There is a lot more information about building PCRE in non-Unix-like
18 environments in the file called \fINON_UNIX_USE\fP, which is part of the PCRE
19 distribution. You should consult this file as well as the \fIREADME\fP file if
20 you are building in a non-Unix-like environment.
21 .P
22 The complete list of options for \fBconfigure\fP (which includes the standard
23 ones such as the selection of the installation directory) can be obtained by
24 running
25 .sp
26 ./configure --help
27 .sp
28 The following sections include descriptions of options whose names begin with
29 --enable or --disable. These settings specify changes to the defaults for the
30 \fBconfigure\fP command. Because of the way that \fBconfigure\fP works,
31 --enable and --disable always come in pairs, so the complementary option always
32 exists as well, but as it specifies the default, it is not described.
33 .
34 .
35 .SH "BUILDING SHARED AND STATIC LIBRARIES"
36 .rs
37 .sp
38 The PCRE building process uses \fBlibtool\fP to build both shared and static
39 Unix libraries by default. You can suppress one of these by adding one of
40 .sp
41 --disable-shared
42 --disable-static
43 .sp
44 to the \fBconfigure\fP command, as required.
45 .
46 .
47 .SH "C++ SUPPORT"
48 .rs
49 .sp
50 By default, the \fBconfigure\fP script will search for a C++ compiler and C++
51 header files. If it finds them, it automatically builds the C++ wrapper library
52 for PCRE. You can disable this by adding
53 .sp
54 --disable-cpp
55 .sp
56 to the \fBconfigure\fP command.
57 .
58 .
59 .SH "UTF-8 SUPPORT"
60 .rs
61 .sp
62 To build PCRE with support for UTF-8 Unicode character strings, add
63 .sp
64 --enable-utf8
65 .sp
66 to the \fBconfigure\fP command. Of itself, this does not make PCRE treat
67 strings as UTF-8. As well as compiling PCRE with this option, you also have
68 to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fP
69 or \fBpcre_compile2()\fP functions.
70 .P
71 If you set --enable-utf8 when compiling in an EBCDIC environment, PCRE expects
72 its input to be either ASCII or UTF-8 (depending on the runtime option). It is
73 not possible to support both EBCDIC and UTF-8 codes in the same version of the
74 library. Consequently, --enable-utf8 and --enable-ebcdic are mutually
75 exclusive.
76 .
77 .
78 .SH "UNICODE CHARACTER PROPERTY SUPPORT"
79 .rs
80 .sp
81 UTF-8 support allows PCRE to process character values greater than 255 in the
82 strings that it handles. On its own, however, it does not provide any
83 facilities for accessing the properties of such characters. If you want to be
84 able to use the pattern escapes \eP, \ep, and \eX, which refer to Unicode
85 character properties, you must add
86 .sp
87 --enable-unicode-properties
88 .sp
89 to the \fBconfigure\fP command. This implies UTF-8 support, even if you have
90 not explicitly requested it.
91 .P
92 Including Unicode property support adds around 30K of tables to the PCRE
93 library. Only the general category properties such as \fILu\fP and \fINd\fP are
94 supported. Details are given in the
95 .\" HREF
96 \fBpcrepattern\fP
97 .\"
98 documentation.
99 .
100 .
101 .SH "JUST-IN-TIME COMPILER SUPPORT"
102 .rs
103 .sp
104 Just-in-time compiler support is included in the build by specifying
105 .sp
106 --enable-jit
107 .sp
108 This support is available only for certain hardware architectures. If this
109 option is set for an unsupported architecture, a compile time error occurs.
110 See the
111 .\" HREF
112 \fBpcrejit\fP
113 .\"
114 documentation for a discussion of JIT usage.
115 .
116 .
117 .SH "CODE VALUE OF NEWLINE"
118 .rs
119 .sp
120 By default, PCRE interprets the linefeed (LF) character as indicating the end
121 of a line. This is the normal newline character on Unix-like systems. You can
122 compile PCRE to use carriage return (CR) instead, by adding
123 .sp
124 --enable-newline-is-cr
125 .sp
126 to the \fBconfigure\fP command. There is also a --enable-newline-is-lf option,
127 which explicitly specifies linefeed as the newline character.
128 .sp
129 Alternatively, you can specify that line endings are to be indicated by the two
130 character sequence CRLF. If you want this, add
131 .sp
132 --enable-newline-is-crlf
133 .sp
134 to the \fBconfigure\fP command. There is a fourth option, specified by
135 .sp
136 --enable-newline-is-anycrlf
137 .sp
138 which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
139 indicating a line ending. Finally, a fifth option, specified by
140 .sp
141 --enable-newline-is-any
142 .sp
143 causes PCRE to recognize any Unicode newline sequence.
144 .P
145 Whatever line ending convention is selected when PCRE is built can be
146 overridden when the library functions are called. At build time it is
147 conventional to use the standard for your operating system.
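.P
As an illustration only, the conventions above can be pictured as a function
that reports how many bytes form a line ending at a given position. The
following C sketch is not PCRE's own code: the enum and function names are
hypothetical, and the "any Unicode newline" convention is left out to keep the
example short.

```c
#include <stddef.h>

/* Hypothetical names for four of the conventions selectable at build time. */
enum sketch_newline {
    SKETCH_NL_CR,       /* --enable-newline-is-cr */
    SKETCH_NL_LF,       /* --enable-newline-is-lf (the default) */
    SKETCH_NL_CRLF,     /* --enable-newline-is-crlf */
    SKETCH_NL_ANYCRLF   /* --enable-newline-is-anycrlf */
};

/* Return the length in bytes of the line ending that starts at s[i],
   or 0 if no line ending starts there under the given convention. */
size_t sketch_newline_len(const char *s, size_t len, size_t i,
                          enum sketch_newline kind)
{
    int cr, lf, crlf;

    if (i >= len) return 0;
    cr = (s[i] == '\r');
    lf = (s[i] == '\n');
    crlf = cr && i + 1 < len && s[i + 1] == '\n';

    switch (kind) {
    case SKETCH_NL_CR:      return cr ? 1 : 0;
    case SKETCH_NL_LF:      return lf ? 1 : 0;
    case SKETCH_NL_CRLF:    return crlf ? 2 : 0;
    case SKETCH_NL_ANYCRLF: return crlf ? 2 : (cr || lf) ? 1 : 0;
    }
    return 0;
}
```

Under the CRLF convention a lone CR or LF is ordinary data, whereas ANYCRLF
accepts all three forms and prefers the two-byte sequence when both bytes are
present.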
148 .
149 .
150 .SH "WHAT \eR MATCHES"
151 .rs
152 .sp
153 By default, the sequence \eR in a pattern matches any Unicode newline sequence,
154 whatever has been selected as the line ending sequence. If you specify
155 .sp
156 --enable-bsr-anycrlf
157 .sp
158 the default is changed so that \eR matches only CR, LF, or CRLF. Whatever is
159 selected when PCRE is built can be overridden when the library functions are
160 called.
161 .
162 .
163 .SH "POSIX MALLOC USAGE"
164 .rs
165 .sp
166 When PCRE is called through the POSIX interface (see the
167 .\" HREF
168 \fBpcreposix\fP
169 .\"
170 documentation), additional working storage is required for holding the pointers
171 to capturing substrings, because PCRE requires three integers per substring,
172 whereas the POSIX interface provides only two. If the number of expected
173 substrings is small, the wrapper function uses space on the stack, because this
174 is faster than using \fBmalloc()\fP for each call. The default threshold above
175 which the stack is no longer used is 10; it can be changed by adding a setting
176 such as
177 .sp
178 --with-posix-malloc-threshold=20
179 .sp
180 to the \fBconfigure\fP command.
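.P
The choice between stack and heap described above can be sketched in C as
follows. This is a simplified illustration, not the wrapper's actual source:
the function name, the macro name, and the \fIused_heap\fP output parameter are
hypothetical, and the matching step is replaced by a loop that merely marks
each slot as unset.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stands in for the threshold chosen at configure time (default 10). */
#define SKETCH_THRESHOLD 10

/* Obtain working storage of three ints per expected substring.
   Small requests use a fixed array on the stack; larger ones use malloc().
   Returns 0 on success, -1 if malloc() fails. */
int sketch_prepare_offsets(size_t nmatch, int *used_heap)
{
    int small_vector[SKETCH_THRESHOLD * 3];
    int *vector = small_vector;
    size_t i;

    *used_heap = 0;
    if (nmatch > SKETCH_THRESHOLD) {
        vector = malloc(sizeof(int) * nmatch * 3);
        if (vector == NULL) return -1;
        *used_heap = 1;
    }

    /* A real wrapper would run the match here; the sketch only marks
       every slot as unset. */
    for (i = 0; i < nmatch * 3; i++) vector[i] = -1;

    if (*used_heap) free(vector);
    return 0;
}
```

Raising the threshold enlarges the stack array for every call, so it trades
stack space for fewer calls to \fBmalloc()\fP.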
181 .
182 .
183 .SH "HANDLING VERY LARGE PATTERNS"
184 .rs
185 .sp
186 Within a compiled pattern, offset values are used to point from one part to
187 another (for example, from an opening parenthesis to an alternation
188 metacharacter). By default, two-byte values are used for these offsets, leading
189 to a maximum size for a compiled pattern of around 64K. This is sufficient to
190 handle all but the most gigantic patterns. Nevertheless, some people do want to
191 process truly enormous patterns, so it is possible to compile PCRE to use
192 three-byte or four-byte offsets by adding a setting such as
193 .sp
194 --with-link-size=3
195 .sp
196 to the \fBconfigure\fP command. The value given must be 2, 3, or 4. Using
197 longer offsets slows down the operation of PCRE because it has to load
198 additional bytes when handling them.
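.P
The "around 64K" figure follows from the width of the offset field. The short C
sketch below (the function name is hypothetical) computes the largest value
that fits in a given number of bytes. It gives only the bound implied by the
field width; the limit that PCRE actually enforces may be lower.

```c
/* Largest offset representable in link_size bytes,
   or 0 if link_size is not one of the permitted values 2, 3, or 4. */
unsigned long long sketch_max_offset(int link_size)
{
    if (link_size < 2 || link_size > 4) return 0;
    return (1ULL << (8 * link_size)) - 1;
}
```

For a link size of 2 this yields 65535, which is the 64K bound mentioned above.
A link size of 3 raises the bound to 16777215.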
199 .
200 .
201 .SH "AVOIDING EXCESSIVE STACK USAGE"
202 .rs
203 .sp
204 When matching with the \fBpcre_exec()\fP function, PCRE implements backtracking
205 by making recursive calls to an internal function called \fBmatch()\fP. In
206 environments where the size of the stack is limited, this can severely limit
207 PCRE's operation. (The Unix environment does not usually suffer from this
208 problem, but it may sometimes be necessary to increase the maximum stack size.
209 There is a discussion in the
210 .\" HREF
211 \fBpcrestack\fP
212 .\"
213 documentation.) An alternative approach to recursion that uses memory from the
214 heap to remember data, instead of using recursive function calls, has been
215 implemented to work round the problem of limited stack size. If you want to
216 build a version of PCRE that works this way, add
217 .sp
218 --disable-stack-for-recursion
219 .sp
220 to the \fBconfigure\fP command. With this configuration, PCRE will use the
221 \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
222 management functions. By default these point to \fBmalloc()\fP and
223 \fBfree()\fP, but you can replace the pointers so that your own functions are
224 used instead.
225 .P
226 Separate functions are provided rather than using \fBpcre_malloc\fP and
227 \fBpcre_free\fP because the usage is very predictable: the block sizes
228 requested are always the same, and the blocks are always freed in reverse
229 order. A calling program might be able to implement optimized functions that
230 perform better than \fBmalloc()\fP and \fBfree()\fP. PCRE runs noticeably more
231 slowly when built in this way. This option affects only the \fBpcre_exec()\fP
232 function; it is not relevant for \fBpcre_dfa_exec()\fP.
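.P
Because the blocks are all the same size and are freed in reverse order, a
calling program can keep freed blocks on a list and hand them out again without
calling \fBmalloc()\fP each time. The C sketch below shows one possible pair of
replacement functions. It is an illustration under stated assumptions, not code
from PCRE: the names are hypothetical, the functions are assumed to have the
same shapes as \fBmalloc()\fP and \fBfree()\fP, the code is not thread-safe,
and cached blocks are never returned to the system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of each block. The extra union members only
   force an alignment suitable for whatever is stored after the header. */
typedef union sketch_header {
    struct {
        union sketch_header *next;  /* next cached block */
        size_t size;                /* usable size of this block */
    } info;
    long double align_ld;
    long long align_ll;
    void *align_ptr;
} sketch_header;

static sketch_header *sketch_free_list = NULL;

/* Reuse the most recently freed block if it is big enough;
   otherwise fall back to malloc(). */
void *sketch_stack_malloc(size_t size)
{
    sketch_header *h = sketch_free_list;

    if (h != NULL && h->info.size >= size) {
        sketch_free_list = h->info.next;
    } else {
        h = malloc(sizeof(sketch_header) + size);
        if (h == NULL) return NULL;
        h->info.size = size;
    }
    return h + 1;
}

/* Push the block onto the cache instead of releasing it. */
void sketch_stack_free(void *block)
{
    sketch_header *h;

    if (block == NULL) return;
    h = (sketch_header *)block - 1;
    h->info.next = sketch_free_list;
    sketch_free_list = h;
}
```

In a program linked with a PCRE library built in this way, the two functions
would be installed by assigning them to \fBpcre_stack_malloc\fP and
\fBpcre_stack_free\fP before any matching takes place.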
233 .
234 .
235 .SH "LIMITING PCRE RESOURCE USAGE"
236 .rs
237 .sp
238 Internally, PCRE has a function called \fBmatch()\fP, which it calls repeatedly
239 (sometimes recursively) when matching a pattern with the \fBpcre_exec()\fP
240 function. By controlling the maximum number of times this function may be
241 called during a single matching operation, a limit can be placed on the
242 resources used by a single call to \fBpcre_exec()\fP. The limit can be changed
243 at run time, as described in the
244 .\" HREF
245 \fBpcreapi\fP
246 .\"
247 documentation. The default is 10 million, but this can be changed by adding a
248 setting such as
249 .sp
250 --with-match-limit=500000
251 .sp
252 to the \fBconfigure\fP command. This setting has no effect on the
253 \fBpcre_dfa_exec()\fP matching function.
254 .P
255 In some environments it is desirable to limit the depth of recursive calls of
256 \fBmatch()\fP more strictly than the total number of calls, in order to
257 restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
258 is specified) that is used. A second limit controls this; it defaults to the
259 value that is set for --with-match-limit, which imposes no additional
260 constraints. However, you can set a lower limit by adding, for example,
261 .sp
262 --with-match-limit-recursion=10000
263 .sp
264 to the \fBconfigure\fP command. This value can also be overridden at run time.
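.P
The difference between the two limits can be seen in a toy recursive search.
The C sketch below is not PCRE's \fBmatch()\fP function, and its names and
error codes are hypothetical. It visits every node of a binary tree, counting
the total number of calls against one limit and the current nesting depth
against the other.

```c
enum {
    SKETCH_OK = 0,
    SKETCH_ERR_MATCH_LIMIT = -1,      /* too many calls in total */
    SKETCH_ERR_RECURSION_LIMIT = -2   /* calls nested too deeply */
};

struct sketch_state {
    unsigned long match_limit;      /* cap on the total number of calls */
    unsigned long recursion_limit;  /* cap on the nesting depth */
    unsigned long calls;            /* calls made so far */
};

/* Visit a binary tree with the given number of levels below this node. */
static int sketch_explore(struct sketch_state *st, unsigned long depth,
                          unsigned levels)
{
    int rc;

    if (++st->calls > st->match_limit) return SKETCH_ERR_MATCH_LIMIT;
    if (depth > st->recursion_limit) return SKETCH_ERR_RECURSION_LIMIT;
    if (levels == 0) return SKETCH_OK;

    rc = sketch_explore(st, depth + 1, levels - 1);
    if (rc != SKETCH_OK) return rc;
    return sketch_explore(st, depth + 1, levels - 1);
}

/* Run the toy search under the two limits. */
int sketch_run(unsigned levels, unsigned long match_limit,
               unsigned long recursion_limit)
{
    struct sketch_state st;

    st.match_limit = match_limit;
    st.recursion_limit = recursion_limit;
    st.calls = 0;
    return sketch_explore(&st, 1, levels);
}
```

With three levels the search makes 15 calls but never nests more than 4 deep,
so a small recursion limit can stop a search that is still well inside the
limit on total calls.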
265 .
266 .
267 .SH "CREATING CHARACTER TABLES AT BUILD TIME"
268 .rs
269 .sp
270 PCRE uses fixed tables for processing characters whose code values are less
271 than 256. By default, PCRE is built with a set of tables that are distributed
272 in the file \fIpcre_chartables.c.dist\fP. These tables are for ASCII codes
273 only. If you add
274 .sp
275 --enable-rebuild-chartables
276 .sp
277 to the \fBconfigure\fP command, the distributed tables are no longer used.
278 Instead, a program called \fBdftables\fP is compiled and run. This outputs the
279 source for a new set of tables, created in the default locale of your C runtime
280 system. (This method of replacing the tables does not work if you are cross
281 compiling, because \fBdftables\fP is run on the local host. If you need to
282 create alternative tables when cross compiling, you will have to do so "by
283 hand".)
284 .
285 .
286 .SH "USING EBCDIC CODE"
287 .rs
288 .sp
289 PCRE assumes by default that it will run in an environment where the character
290 code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
291 most computer operating systems. PCRE can, however, be compiled to run in an
292 EBCDIC environment by adding
293 .sp
294 --enable-ebcdic
295 .sp
296 to the \fBconfigure\fP command. This setting implies
297 --enable-rebuild-chartables. You should only use it if you know that you are in
298 an EBCDIC environment (for example, an IBM mainframe operating system). The
299 --enable-ebcdic option is incompatible with --enable-utf8.
300 .
301 .
302 .SH "PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT"
303 .rs
304 .sp
305 By default, \fBpcregrep\fP reads all files as plain text. You can build it so
306 that it recognizes files whose names end in \fB.gz\fP or \fB.bz2\fP, and reads
307 them with \fBlibz\fP or \fBlibbz2\fP, respectively, by adding one or both of
308 .sp
309 --enable-pcregrep-libz
310 --enable-pcregrep-libbz2
311 .sp
312 to the \fBconfigure\fP command. These options naturally require that the
313 relevant libraries are installed on your system. Configuration will fail if
314 they are not.
315 .
316 .
317 .SH "PCREGREP BUFFER SIZE"
318 .rs
319 .sp
320 \fBpcregrep\fP uses an internal buffer to hold a "window" on the file it is
321 scanning, in order to be able to output "before" and "after" lines when it
322 finds a match. The size of the buffer is controlled by a parameter whose
323 default value is 20K. The buffer itself is three times this size, but because
324 of the way it is used for holding "before" lines, the longest line that is
325 guaranteed to be processable is the parameter size. You can change the default
326 parameter value by adding, for example,
327 .sp
328 --with-pcregrep-bufsize=50K
329 .sp
330 to the \fBconfigure\fP command. The caller of \fBpcregrep\fP can, however,
331 override this value by specifying a run-time option.
332 .
333 .
334 .SH "PCRETEST OPTION FOR LIBREADLINE SUPPORT"
335 .rs
336 .sp
337 If you add
338 .sp
339 --enable-pcretest-libreadline
340 .sp
341 to the \fBconfigure\fP command, \fBpcretest\fP is linked with the
342 \fBlibreadline\fP library, and when its input is from a terminal, it reads it
343 using the \fBreadline()\fP function. This provides line-editing and history
344 facilities. Note that \fBlibreadline\fP is GPL-licensed, so if you distribute a
345 binary of \fBpcretest\fP linked in this way, there may be licensing issues.
346 .P
347 Setting this option causes the \fB-lreadline\fP option to be added to the
348 \fBpcretest\fP build. In many operating environments with a system-installed
349 \fBlibreadline\fP this is sufficient. However, in some environments (e.g.
350 if an unmodified distribution version of readline is in use), some extra
351 configuration may be necessary. The INSTALL file for \fBlibreadline\fP says
352 this:
353 .sp
354 "Readline uses the termcap functions, but does not link with the
355 termcap or curses library itself, allowing applications which link
356 with readline the to choose an appropriate library."
357 .sp
358 If your environment has not been set up so that an appropriate library is
359 automatically included, you may need to add something like
360 .sp
361 LIBS="-lncurses"
362 .sp
363 immediately before the \fBconfigure\fP command.
364 .
365 .
366 .SH "SEE ALSO"
367 .rs
368 .sp
369 \fBpcreapi\fP(3), \fBpcre_config\fP(3).
370 .
371 .
372 .SH AUTHOR
373 .rs
374 .sp
375 .nf
376 Philip Hazel
377 University Computing Service
378 Cambridge CB2 3QH, England.
379 .fi
380 .
381 .
382 .SH REVISION
383 .rs
384 .sp
385 .nf
386 Last updated: 27 August 2011
387 Copyright (c) 1997-2011 University of Cambridge.
388 .fi
