Contents of /code/trunk/doc/html/pcrebuild.html

Revision 1221
Sun Nov 11 20:27:03 2012 UTC (8 years, 5 months ago) by ph10
File MIME type: text/html
File size: 21951 byte(s)
File tidies, preparing for 8.32-RC1.
1 <html>
2 <head>
3 <title>pcrebuild specification</title>
4 </head>
5 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6 <h1>pcrebuild man page</h1>
7 <p>
8 Return to the <a href="index.html">PCRE index page</a>.
9 </p>
10 <p>
11 This page is part of the PCRE HTML documentation. It was generated automatically
12 from the original man page. If there is any nonsense in it, please consult the
13 man page, in case the conversion went wrong.
14 <br>
15 <ul>
16 <li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
17 <li><a name="TOC2" href="#SEC2">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
18 <li><a name="TOC3" href="#SEC3">BUILDING SHARED AND STATIC LIBRARIES</a>
19 <li><a name="TOC4" href="#SEC4">C++ SUPPORT</a>
20 <li><a name="TOC5" href="#SEC5">UTF-8, UTF-16 AND UTF-32 SUPPORT</a>
21 <li><a name="TOC6" href="#SEC6">UNICODE CHARACTER PROPERTY SUPPORT</a>
22 <li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
23 <li><a name="TOC8" href="#SEC8">CODE VALUE OF NEWLINE</a>
24 <li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
25 <li><a name="TOC10" href="#SEC10">POSIX MALLOC USAGE</a>
26 <li><a name="TOC11" href="#SEC11">HANDLING VERY LARGE PATTERNS</a>
27 <li><a name="TOC12" href="#SEC12">AVOIDING EXCESSIVE STACK USAGE</a>
28 <li><a name="TOC13" href="#SEC13">LIMITING PCRE RESOURCE USAGE</a>
29 <li><a name="TOC14" href="#SEC14">CREATING CHARACTER TABLES AT BUILD TIME</a>
30 <li><a name="TOC15" href="#SEC15">USING EBCDIC CODE</a>
31 <li><a name="TOC16" href="#SEC16">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
32 <li><a name="TOC17" href="#SEC17">PCREGREP BUFFER SIZE</a>
33 <li><a name="TOC18" href="#SEC18">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
34 <li><a name="TOC19" href="#SEC19">DEBUGGING WITH VALGRIND SUPPORT</a>
35 <li><a name="TOC20" href="#SEC20">CODE COVERAGE REPORTING</a>
36 <li><a name="TOC21" href="#SEC21">SEE ALSO</a>
37 <li><a name="TOC22" href="#SEC22">AUTHOR</a>
38 <li><a name="TOC23" href="#SEC23">REVISION</a>
39 </ul>
40 <br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
41 <P>
42 This document describes the optional features of PCRE that can be selected when
43 the library is compiled. It assumes use of the <b>configure</b> script, where
44 the optional features are selected or deselected by providing options to
45 <b>configure</b> before running the <b>make</b> command. However, the same
46 options can be selected in both Unix-like and non-Unix-like environments using
47 the GUI facility of <b>cmake-gui</b> if you are using <b>CMake</b> instead of
48 <b>configure</b> to build PCRE.
49 </P>
50 <P>
51 There is a lot more information about building PCRE without using
52 <b>configure</b> (including information about using <b>CMake</b> or building "by
53 hand") in the file called <i>NON-AUTOTOOLS-BUILD</i>, which is part of the PCRE
54 distribution. You should consult this file as well as the <i>README</i> file if
55 you are building in a non-Unix-like environment.
56 </P>
57 <P>
58 The complete list of options for <b>configure</b> (which includes the standard
59 ones such as the selection of the installation directory) can be obtained by
60 running
61 <pre>
62 ./configure --help
63 </pre>
64 The following sections include descriptions of options whose names begin with
65 --enable or --disable. These settings specify changes to the defaults for the
66 <b>configure</b> command. Because of the way that <b>configure</b> works,
67 --enable and --disable always come in pairs, so the complementary option always
68 exists as well, but as it specifies the default, it is not described.
69 </P>
70 <br><a name="SEC2" href="#TOC1">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
71 <P>
72 By default, a library called <b>libpcre</b> is built, containing functions that
73 take string arguments contained in vectors of bytes, either as single-byte
74 characters, or interpreted as UTF-8 strings. You can also build a separate
75 library, called <b>libpcre16</b>, in which strings are contained in vectors of
76 16-bit data units and interpreted either as single-unit characters or UTF-16
77 strings, by adding
78 <pre>
79 --enable-pcre16
80 </pre>
81 to the <b>configure</b> command. You can also build a separate
82 library, called <b>libpcre32</b>, in which strings are contained in vectors of
83 32-bit data units and interpreted either as single-unit characters or UTF-32
84 strings, by adding
85 <pre>
86 --enable-pcre32
87 </pre>
88 to the <b>configure</b> command. If you do not want the 8-bit library, add
89 <pre>
90 --disable-pcre8
91 </pre>
92 as well. At least one of the three libraries must be built. Note that the C++
93 and POSIX wrappers are for the 8-bit library only, and that <b>pcregrep</b> is
94 an 8-bit program. None of these are built if you select only the 16-bit or
95 32-bit libraries.
96 </P>
97 <br><a name="SEC3" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
98 <P>
99 The PCRE building process uses <b>libtool</b> to build both shared and static
100 Unix libraries by default. You can suppress one of these by adding one of
101 <pre>
102 --disable-shared
103 --disable-static
104 </pre>
105 to the <b>configure</b> command, as required.
106 </P>
107 <br><a name="SEC4" href="#TOC1">C++ SUPPORT</a><br>
108 <P>
109 By default, if the 8-bit library is being built, the <b>configure</b> script
110 will search for a C++ compiler and C++ header files. If it finds them, it
111 automatically builds the C++ wrapper library (which supports only 8-bit
112 strings). You can disable this by adding
113 <pre>
114 --disable-cpp
115 </pre>
116 to the <b>configure</b> command.
117 </P>
118 <br><a name="SEC5" href="#TOC1">UTF-8, UTF-16 AND UTF-32 SUPPORT</a><br>
119 <P>
120 To build PCRE with support for UTF Unicode character strings, add
121 <pre>
122 --enable-utf
123 </pre>
124 to the <b>configure</b> command. This setting applies to all three libraries,
125 adding support for UTF-8 to the 8-bit library, support for UTF-16 to the 16-bit
126 library, and support for UTF-32 to the 32-bit library. There are no
127 separate options for enabling UTF-8, UTF-16 and UTF-32 independently because
128 that would allow ridiculous settings such as requesting UTF-16 support while
129 building only the 8-bit library. It is not possible to build one library with
130 UTF support and another without in the same configuration. (For backwards
131 compatibility, --enable-utf8 is a synonym of --enable-utf.)
132 </P>
133 <P>
134 Of itself, this setting does not make PCRE treat strings as UTF-8, UTF-16 or
135 UTF-32. As well as compiling PCRE with this option, you also have to set
136 the PCRE_UTF8, PCRE_UTF16 or PCRE_UTF32 option (as appropriate) when you call
137 one of the pattern compiling functions.
138 </P>
139 <P>
140 If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects
141 its input to be either ASCII or UTF-8 (depending on the run-time option). It is
142 not possible to support both EBCDIC and UTF-8 codes in the same version of the
143 library. Consequently, --enable-utf and --enable-ebcdic are mutually
144 exclusive.
145 </P>
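<P>
The rule that --enable-utf and --enable-ebcdic cannot be combined can be checked
before <b>configure</b> is run. The following is a minimal sketch in POSIX
shell, not part of the PCRE distribution; the function name
<b>utf_ebcdic_ok</b> is hypothetical, and only the option names are taken from
this page.
</P>

```shell
# Hypothetical pre-flight check (not shipped with PCRE).
# Succeeds unless the option list combines UTF support
# (--enable-utf or its synonym --enable-utf8) with --enable-ebcdic.
utf_ebcdic_ok() {
    has_utf=0
    has_ebcdic=0
    for opt in "$@"; do
        case "$opt" in
            --enable-utf|--enable-utf8) has_utf=1 ;;
            --enable-ebcdic)            has_ebcdic=1 ;;
        esac
    done
    # The status of this test is the function's return status:
    # it fails only when both flags were seen.
    [ $((has_utf + has_ebcdic)) -lt 2 ]
}
```

<P>
A wrapper script might call, for example,
<b>utf_ebcdic_ok --enable-pcre16 --enable-utf</b> and run <b>configure</b> with
the same options only when the function succeeds.
</P>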
146 <br><a name="SEC6" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
147 <P>
148 UTF support allows the libraries to process character codepoints up to 0x10ffff
149 in the strings that they handle. On its own, however, it does not provide any
150 facilities for accessing the properties of such characters. If you want to be
151 able to use the pattern escapes \P, \p, and \X, which refer to Unicode
152 character properties, you must add
153 <pre>
154 --enable-unicode-properties
155 </pre>
156 to the <b>configure</b> command. This implies UTF support, even if you have
157 not explicitly requested it.
158 </P>
159 <P>
160 Including Unicode property support adds around 30K of tables to the PCRE
161 library. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
162 supported. Details are given in the
163 <a href="pcrepattern.html"><b>pcrepattern</b></a>
164 documentation.
165 </P>
166 <br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
167 <P>
168 Just-in-time compiler support is included in the build by specifying
169 <pre>
170 --enable-jit
171 </pre>
172 This support is available only for certain hardware architectures. If this
173 option is set for an unsupported architecture, a compile time error occurs.
174 See the
175 <a href="pcrejit.html"><b>pcrejit</b></a>
176 documentation for a discussion of JIT usage. When JIT support is enabled,
177 <b>pcregrep</b> automatically makes use of it, unless you add
178 <pre>
179 --disable-pcregrep-jit
180 </pre>
181 to the <b>configure</b> command.
182 </P>
183 <br><a name="SEC8" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
184 <P>
185 By default, PCRE interprets the linefeed (LF) character as indicating the end
186 of a line. This is the normal newline character on Unix-like systems. You can
187 compile PCRE to use carriage return (CR) instead, by adding
188 <pre>
189 --enable-newline-is-cr
190 </pre>
191 to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
192 which explicitly specifies linefeed as the newline character.
193 <br>
194 <br>
195 Alternatively, you can specify that line endings are to be indicated by the two
196 character sequence CRLF. If you want this, add
197 <pre>
198 --enable-newline-is-crlf
199 </pre>
200 to the <b>configure</b> command. There is a fourth option, specified by
201 <pre>
202 --enable-newline-is-anycrlf
203 </pre>
204 which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
205 indicating a line ending. Finally, a fifth option, specified by
206 <pre>
207 --enable-newline-is-any
208 </pre>
209 causes PCRE to recognize any Unicode newline sequence.
210 </P>
211 <P>
212 Whatever line ending convention is selected when PCRE is built can be
213 overridden when the library functions are called. At build time it is
214 conventional to use the standard for your operating system.
215 </P>
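<P>
The five newline settings can be summarized as a mapping from a convention name
to the matching <b>configure</b> option. This is only an illustrative sketch:
the function name <b>newline_flag</b> and the short convention names are
hypothetical, while the option names are the ones listed above.
</P>

```shell
# Hypothetical helper: print the configure option for a newline convention.
# Returns a non-zero status for an unknown convention name.
newline_flag() {
    case "$1" in
        lf)      echo "--enable-newline-is-lf" ;;
        cr)      echo "--enable-newline-is-cr" ;;
        crlf)    echo "--enable-newline-is-crlf" ;;
        anycrlf) echo "--enable-newline-is-anycrlf" ;;
        any)     echo "--enable-newline-is-any" ;;
        *)       return 1 ;;
    esac
}
```

<P>
With this helper, a build script could run, for example,
<b>./configure "$(newline_flag crlf)"</b>.
</P>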
216 <br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
217 <P>
218 By default, the sequence \R in a pattern matches any Unicode newline sequence,
219 whatever has been selected as the line ending sequence. If you specify
220 <pre>
221 --enable-bsr-anycrlf
222 </pre>
223 the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
224 selected when PCRE is built can be overridden when the library functions are
225 called.
226 </P>
227 <br><a name="SEC10" href="#TOC1">POSIX MALLOC USAGE</a><br>
228 <P>
229 When the 8-bit library is called through the POSIX interface (see the
230 <a href="pcreposix.html"><b>pcreposix</b></a>
231 documentation), additional working storage is required for holding the pointers
232 to capturing substrings, because PCRE requires three integers per substring,
233 whereas the POSIX interface provides only two. If the number of expected
234 substrings is small, the wrapper function uses space on the stack, because this
235 is faster than using <b>malloc()</b> for each call. The default threshold above
236 which the stack is no longer used is 10; it can be changed by adding a setting
237 such as
238 <pre>
239 --with-posix-malloc-threshold=20
240 </pre>
241 to the <b>configure</b> command.
242 </P>
243 <br><a name="SEC11" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
244 <P>
245 Within a compiled pattern, offset values are used to point from one part to
246 another (for example, from an opening parenthesis to an alternation
247 metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values
248 are used for these offsets, leading to a maximum size for a compiled pattern of
249 around 64K. This is sufficient to handle all but the most gigantic patterns.
250 Nevertheless, some people do want to process truly enormous patterns, so it is
251 possible to compile PCRE to use three-byte or four-byte offsets by adding a
252 setting such as
253 <pre>
254 --with-link-size=3
255 </pre>
256 to the <b>configure</b> command. The value given must be 2, 3, or 4. For the
257 16-bit library, a value of 3 is rounded up to 4. In these libraries, using
258 longer offsets slows down the operation of PCRE because it has to load
259 additional data when handling them. For the 32-bit library the value is always
260 4 and cannot be overridden; the value of --with-link-size is ignored.
261 </P>
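<P>
The way a requested link size is adjusted for each library can be written out
as a small shell function. This is a sketch of the rules stated above, not code
from the PCRE build system; the function name <b>effective_link_size</b> is
hypothetical, and rejecting out-of-range values for every library width is an
assumption of the sketch.
</P>

```shell
# Hypothetical helper: print the link size that takes effect for a library
# width (8, 16 or 32) and a requested --with-link-size value.
effective_link_size() {
    width=$1
    requested=$2
    # The value given must be 2, 3, or 4 (checked for every width here,
    # which is an assumption of this sketch).
    case "$requested" in
        2|3|4) ;;
        *) return 1 ;;
    esac
    case "$width" in
        8)  echo "$requested" ;;
        16) # A value of 3 is rounded up to 4 for the 16-bit library.
            if [ "$requested" -eq 3 ]; then echo 4; else echo "$requested"; fi ;;
        32) # Always 4 for the 32-bit library; the request is ignored.
            echo 4 ;;
        *)  return 1 ;;
    esac
}
```

<P>
For example, <b>effective_link_size 16 3</b> prints 4, whereas
<b>effective_link_size 8 3</b> prints 3.
</P>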
262 <br><a name="SEC12" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
263 <P>
264 When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
265 by making recursive calls to an internal function called <b>match()</b>. In
266 environments where the size of the stack is limited, this can severely limit
267 PCRE's operation. (The Unix environment does not usually suffer from this
268 problem, but it may sometimes be necessary to increase the maximum stack size.
269 There is a discussion in the
270 <a href="pcrestack.html"><b>pcrestack</b></a>
271 documentation.) An alternative approach to recursion that uses memory from the
272 heap to remember data, instead of using recursive function calls, has been
273 implemented to work round the problem of limited stack size. If you want to
274 build a version of PCRE that works this way, add
275 <pre>
276 --disable-stack-for-recursion
277 </pre>
278 to the <b>configure</b> command. With this configuration, PCRE will use the
279 <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
280 management functions. By default these point to <b>malloc()</b> and
281 <b>free()</b>, but you can replace the pointers so that your own functions are
282 used instead.
283 </P>
284 <P>
285 Separate functions are provided rather than using <b>pcre_malloc</b> and
286 <b>pcre_free</b> because the usage is very predictable: the block sizes
287 requested are always the same, and the blocks are always freed in reverse
288 order. A calling program might be able to implement optimized functions that
289 perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
290 slowly when built in this way. This option affects only the <b>pcre_exec()</b>
291 function; it is not relevant for <b>pcre_dfa_exec()</b>.
292 </P>
293 <br><a name="SEC13" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
294 <P>
295 Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
296 (sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
297 function. By controlling the maximum number of times this function may be
298 called during a single matching operation, a limit can be placed on the
299 resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
300 at run time, as described in the
301 <a href="pcreapi.html"><b>pcreapi</b></a>
302 documentation. The default is 10 million, but this can be changed by adding a
303 setting such as
304 <pre>
305 --with-match-limit=500000
306 </pre>
307 to the <b>configure</b> command. This setting has no effect on the
308 <b>pcre_dfa_exec()</b> matching function.
309 </P>
310 <P>
311 In some environments it is desirable to limit the depth of recursive calls of
312 <b>match()</b> more strictly than the total number of calls, in order to
313 restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
314 is specified) that is used. A second limit controls this; it defaults to the
315 value that is set for --with-match-limit, which imposes no additional
316 constraints. However, you can set a lower limit by adding, for example,
317 <pre>
318 --with-match-limit-recursion=10000
319 </pre>
320 to the <b>configure</b> command. This value can also be overridden at run time.
321 </P>
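<P>
The relationship between the two limits can be illustrated with a short shell
sketch. The function name <b>limit_args</b> is hypothetical; it prints both
options, and when no recursion limit is given it reuses the match limit, which
mirrors the default described above.
</P>

```shell
# Hypothetical helper: print the two limit options for configure.
#   $1 = match limit
#   $2 = recursion limit (optional; falls back to the match limit)
limit_args() {
    match_limit=$1
    recursion_limit=${2:-$match_limit}
    printf '%s %s\n' \
        "--with-match-limit=$match_limit" \
        "--with-match-limit-recursion=$recursion_limit"
}
```

<P>
A build script could then run, for example,
<b>./configure $(limit_args 500000 10000)</b>, leaving the command substitution
unquoted so that the two options are passed as separate arguments.
</P>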
322 <br><a name="SEC14" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
323 <P>
324 PCRE uses fixed tables for processing characters whose code values are less
325 than 256. By default, PCRE is built with a set of tables that are distributed
326 in the file <i>pcre_chartables.c.dist</i>. These tables are for ASCII codes
327 only. If you add
328 <pre>
329 --enable-rebuild-chartables
330 </pre>
331 to the <b>configure</b> command, the distributed tables are no longer used.
332 Instead, a program called <b>dftables</b> is compiled and run. This outputs the
333 source for a new set of tables, created in the default locale of your C run-time
334 system. (This method of replacing the tables does not work if you are cross
335 compiling, because <b>dftables</b> is run on the local host. If you need to
336 create alternative tables when cross compiling, you will have to do so "by
337 hand".)
338 </P>
339 <br><a name="SEC15" href="#TOC1">USING EBCDIC CODE</a><br>
340 <P>
341 PCRE assumes by default that it will run in an environment where the character
342 code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
343 most computer operating systems. PCRE can, however, be compiled to run in an
344 EBCDIC environment by adding
345 <pre>
346 --enable-ebcdic
347 </pre>
348 to the <b>configure</b> command. This setting implies
349 --enable-rebuild-chartables. You should only use it if you know that you are in
350 an EBCDIC environment (for example, an IBM mainframe operating system). The
351 --enable-ebcdic option is incompatible with --enable-utf.
352 </P>
353 <P>
354 The EBCDIC character that corresponds to an ASCII LF is assumed to have the
355 value 0x15 by default. However, in some EBCDIC environments, 0x25 is used. In
356 such an environment you should use
357 <pre>
358 --enable-ebcdic-nl25
359 </pre>
360 as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR has the
361 same value as in ASCII, namely, 0x0d. Whichever of 0x15 and 0x25 is <i>not</i>
362 chosen as LF is made to correspond to the Unicode NEL character (which, in
363 Unicode, is 0x85).
364 </P>
365 <P>
366 The options that select newline behaviour, such as --enable-newline-is-cr,
367 and equivalent run-time options, refer to these character values in an EBCDIC
368 environment.
369 </P>
370 <br><a name="SEC16" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
371 <P>
372 By default, <b>pcregrep</b> reads all files as plain text. You can build it so
373 that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
374 them with <b>libz</b> or <b>libbz2</b>, respectively, by adding one or both of
375 <pre>
376 --enable-pcregrep-libz
377 --enable-pcregrep-libbz2
378 </pre>
379 to the <b>configure</b> command. These options naturally require that the
380 relevant libraries are installed on your system. Configuration will fail if
381 they are not.
382 </P>
383 <br><a name="SEC17" href="#TOC1">PCREGREP BUFFER SIZE</a><br>
384 <P>
385 <b>pcregrep</b> uses an internal buffer to hold a "window" on the file it is
386 scanning, in order to be able to output "before" and "after" lines when it
387 finds a match. The size of the buffer is controlled by a parameter whose
388 default value is 20K. The buffer itself is three times this size, but because
389 of the way it is used for holding "before" lines, the longest line that is
390 guaranteed to be processable is the parameter size. You can change the default
391 parameter value by adding, for example,
392 <pre>
393 --with-pcregrep-bufsize=50K
394 </pre>
395 to the <b>configure</b> command. The caller of <b>pcregrep</b> can, however,
396 override this value by specifying a run-time option.
397 </P>
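<P>
The relationship between the parameter and the buffer can be worked through
with shell arithmetic. This is an illustrative calculation only; it assumes
that "K" means 1024 bytes, and the variable names are hypothetical.
</P>

```shell
# Parameter value in K, as in --with-pcregrep-bufsize=50K.
# Assumption of this sketch: K is 1024 bytes.
param_k=50
param_bytes=$((param_k * 1024))

# The buffer itself is three times the parameter size.
buffer_bytes=$((param_bytes * 3))

echo "--with-pcregrep-bufsize=${param_k}K"
echo "longest line guaranteed to be processable: $param_bytes bytes"
echo "internal buffer size: $buffer_bytes bytes"
```

<P>
Under the same assumption, the default parameter of 20K gives a buffer of 60K.
</P>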
398 <br><a name="SEC18" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
399 <P>
400 If you add
401 <pre>
402 --enable-pcretest-libreadline
403 </pre>
404 to the <b>configure</b> command, <b>pcretest</b> is linked with the
405 <b>libreadline</b> library, and when its input is from a terminal, it reads it
406 using the <b>readline()</b> function. This provides line-editing and history
407 facilities. Note that <b>libreadline</b> is GPL-licensed, so if you distribute a
408 binary of <b>pcretest</b> linked in this way, there may be licensing issues.
409 </P>
410 <P>
411 Setting this option causes the <b>-lreadline</b> option to be added to the
412 <b>pcretest</b> build. In many operating environments with a system-installed
413 <b>libreadline</b> this is sufficient. However, in some environments (e.g.
414 if an unmodified distribution version of readline is in use), some extra
415 configuration may be necessary. The INSTALL file for <b>libreadline</b> says
416 this:
417 <pre>
418 "Readline uses the termcap functions, but does not link with the
419 termcap or curses library itself, allowing applications which link
420 with readline the to choose an appropriate library."
421 </pre>
422 If your environment has not been set up so that an appropriate library is
423 automatically included, you may need to add something like
424 <pre>
425 LIBS="-lncurses"
426 </pre>
427 immediately before the <b>configure</b> command.
428 </P>
429 <br><a name="SEC19" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
430 <P>
431 By adding the
432 <pre>
433 --enable-valgrind
434 </pre>
435 option to the <b>configure</b> command, PCRE will use valgrind annotations
436 to mark certain memory regions as unaddressable. This allows it to detect
437 invalid memory accesses, and is mostly useful for debugging PCRE itself.
438 </P>
439 <br><a name="SEC20" href="#TOC1">CODE COVERAGE REPORTING</a><br>
440 <P>
441 If your C compiler is gcc, you can build a version of PCRE that can generate a
442 code coverage report for its test suite. To enable this, you must install
443 <b>lcov</b> version 1.6 or above. Then specify
444 <pre>
445 --enable-coverage
446 </pre>
447 to the <b>configure</b> command and build PCRE in the usual way.
448 </P>
449 <P>
450 Note that using <b>ccache</b> (a caching C compiler) is incompatible with code
451 coverage reporting. If you have configured <b>ccache</b> to run automatically
452 on your system, you must set the environment variable
453 <pre>
454 CCACHE_DISABLE=1
455 </pre>
456 before running <b>make</b> to build PCRE, so that <b>ccache</b> is not used.
457 </P>
458 <P>
459 When --enable-coverage is used, the following additional targets are added to the
460 <i>Makefile</i>:
461 <pre>
462 make coverage
463 </pre>
464 This creates a fresh coverage report for the PCRE test suite. It is equivalent
465 to running "make coverage-reset", "make coverage-baseline", "make check", and
466 then "make coverage-report".
467 <pre>
468 make coverage-reset
469 </pre>
470 This zeroes the coverage counters, but does nothing else.
471 <pre>
472 make coverage-baseline
473 </pre>
474 This captures baseline coverage information.
475 <pre>
476 make coverage-report
477 </pre>
478 This creates the coverage report.
479 <pre>
480 make coverage-clean-report
481 </pre>
482 This removes the generated coverage report without cleaning the coverage data
483 itself.
484 <pre>
485 make coverage-clean-data
486 </pre>
487 This removes the captured coverage data without removing the coverage files
488 created at compile time (*.gcno).
489 <pre>
490 make coverage-clean
491 </pre>
492 This cleans all coverage data including the generated coverage report. For more
493 information about code coverage, see the <b>gcov</b> and <b>lcov</b>
494 documentation.
495 </P>
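<P>
The sequence that "make coverage" is described as running can be spelled out
step by step. The following shell sketch is a dry run that only prints the
commands; the function name <b>coverage_plan</b> is hypothetical, and in a tree
configured with --enable-coverage the <b>echo</b> would be replaced by the real
<b>make</b> call.
</P>

```shell
# Hypothetical dry run: print the make targets in the order that
# "make coverage" is documented to run them.
coverage_plan() {
    for target in coverage-reset coverage-baseline check coverage-report; do
        echo "make $target"
    done
}

coverage_plan
```

<P>
Running the steps individually can be useful when only part of the cycle needs
to be repeated, for example regenerating the report after further test runs.
</P>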
496 <br><a name="SEC21" href="#TOC1">SEE ALSO</a><br>
497 <P>
498 <b>pcreapi</b>(3), <b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcre_config</b>(3).
499 </P>
500 <br><a name="SEC22" href="#TOC1">AUTHOR</a><br>
501 <P>
502 Philip Hazel
503 <br>
504 University Computing Service
505 <br>
506 Cambridge CB2 3QH, England.
507 <br>
508 </P>
509 <br><a name="SEC23" href="#TOC1">REVISION</a><br>
510 <P>
511 Last updated: 30 October 2012
512 <br>
513 Copyright &copy; 1997-2012 University of Cambridge.
514 <br>
515 <p>
516 Return to the <a href="index.html">PCRE index page</a>.
517 </p>

