271 |
PCRE BUILD-TIME OPTIONS |
PCRE BUILD-TIME OPTIONS |
272 |
|
|
273 |
This document describes the optional features of PCRE that can be |
This document describes the optional features of PCRE that can be |
274 |
selected when the library is compiled. They are all selected, or dese- |
selected when the library is compiled. It assumes use of the configure |
275 |
lected, by providing options to the configure script that is run before |
script, where the optional features are selected or deselected by pro- |
276 |
the make command. The complete list of options for configure (which |
viding options to configure before running the make command. However, |
277 |
includes the standard ones such as the selection of the installation |
the same options can be selected in both Unix-like and non-Unix-like |
278 |
directory) can be obtained by running |
environments using the GUI facility of CMakeSetup if you are using |
279 |
|
CMake instead of configure to build PCRE. |
280 |
|
|
281 |
|
The complete list of options for configure (which includes the standard |
282 |
|
ones such as the selection of the installation directory) can be |
283 |
|
obtained by running |
284 |
|
|
285 |
./configure --help |
./configure --help |
286 |
|
|
287 |
The following sections include descriptions of options whose names |
The following sections include descriptions of options whose names |
288 |
begin with --enable or --disable. These settings specify changes to the |
begin with --enable or --disable. These settings specify changes to the |
289 |
defaults for the configure command. Because of the way that configure |
defaults for the configure command. Because of the way that configure |
290 |
works, --enable and --disable always come in pairs, so the complemen- |
works, --enable and --disable always come in pairs, so the complemen- |
291 |
tary option always exists as well, but as it specifies the default, it |
tary option always exists as well, but as it specifies the default, it |
292 |
is not described. |
is not described. |
293 |
|
|
294 |
|
|
309 |
|
|
310 |
--enable-utf8 |
--enable-utf8 |
311 |
|
|
312 |
to the configure command. Of itself, this does not make PCRE treat |
to the configure command. Of itself, this does not make PCRE treat |
313 |
strings as UTF-8. As well as compiling PCRE with this option, you also |
strings as UTF-8. As well as compiling PCRE with this option, you also |
314 |
have to set the PCRE_UTF8 option when you call the pcre_compile() |
have to set the PCRE_UTF8 option when you call the pcre_compile() |
315 |
function. |
function. |
316 |
|
|
317 |
|
|
318 |
UNICODE CHARACTER PROPERTY SUPPORT |
UNICODE CHARACTER PROPERTY SUPPORT |
319 |
|
|
320 |
UTF-8 support allows PCRE to process character values greater than 255 |
UTF-8 support allows PCRE to process character values greater than 255 |
321 |
in the strings that it handles. On its own, however, it does not pro- |
in the strings that it handles. On its own, however, it does not pro- |
322 |
vide any facilities for accessing the properties of such characters. If |
vide any facilities for accessing the properties of such characters. If |
323 |
you want to be able to use the pattern escapes \P, \p, and \X, which |
you want to be able to use the pattern escapes \P, \p, and \X, which |
324 |
refer to Unicode character properties, you must add |
refer to Unicode character properties, you must add |
325 |
|
|
326 |
--enable-unicode-properties |
--enable-unicode-properties |
327 |
|
|
328 |
to the configure command. This implies UTF-8 support, even if you have |
to the configure command. This implies UTF-8 support, even if you have |
329 |
not explicitly requested it. |
not explicitly requested it. |
330 |
|
|
331 |
Including Unicode property support adds around 30K of tables to the |
Including Unicode property support adds around 30K of tables to the |
332 |
PCRE library. Only the general category properties such as Lu and Nd |
PCRE library. Only the general category properties such as Lu and Nd |
333 |
are supported. Details are given in the pcrepattern documentation. |
are supported. Details are given in the pcrepattern documentation. |
334 |
|
|
335 |
|
|
336 |
CODE VALUE OF NEWLINE |
CODE VALUE OF NEWLINE |
337 |
|
|
338 |
By default, PCRE interprets character 10 (linefeed, LF) as indicating |
By default, PCRE interprets character 10 (linefeed, LF) as indicating |
339 |
the end of a line. This is the normal newline character on Unix-like |
the end of a line. This is the normal newline character on Unix-like |
340 |
systems. You can compile PCRE to use character 13 (carriage return, CR) |
systems. You can compile PCRE to use character 13 (carriage return, CR) |
341 |
instead, by adding |
instead, by adding |
342 |
|
|
343 |
--enable-newline-is-cr |
--enable-newline-is-cr |
344 |
|
|
345 |
to the configure command. There is also a --enable-newline-is-lf |
to the configure command. There is also a --enable-newline-is-lf |
346 |
option, which explicitly specifies linefeed as the newline character. |
option, which explicitly specifies linefeed as the newline character. |
347 |
|
|
348 |
Alternatively, you can specify that line endings are to be indicated by |
Alternatively, you can specify that line endings are to be indicated by |
354 |
|
|
355 |
--enable-newline-is-anycrlf |
--enable-newline-is-anycrlf |
356 |
|
|
357 |
which causes PCRE to recognize any of the three sequences CR, LF, or |
which causes PCRE to recognize any of the three sequences CR, LF, or |
358 |
CRLF as indicating a line ending. Finally, a fifth option, specified by |
CRLF as indicating a line ending. Finally, a fifth option, specified by |
359 |
|
|
360 |
--enable-newline-is-any |
--enable-newline-is-any |
361 |
|
|
362 |
causes PCRE to recognize any Unicode newline sequence. |
causes PCRE to recognize any Unicode newline sequence. |
363 |
|
|
364 |
Whatever line ending convention is selected when PCRE is built can be |
Whatever line ending convention is selected when PCRE is built can be |
365 |
overridden when the library functions are called. At build time it is |
overridden when the library functions are called. At build time it is |
366 |
conventional to use the standard for your operating system. |
conventional to use the standard for your operating system. |
367 |
|
|
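The CR, LF, and CRLF convention described above can be illustrated with a small sketch. This is not PCRE's internal code; the function name anycrlf_newline_length and its signature are assumptions made for the example. It reports how many bytes of line ending start at a given position, checking CRLF first so that the pair counts as a single two-byte ending.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
   Returns the length in bytes of a line ending that starts at s[pos]
   under an "anycrlf"-style convention, or 0 if no line ending starts there. */
size_t anycrlf_newline_length(const char *s, size_t len, size_t pos)
{
    if (pos >= len)
        return 0;
    if (s[pos] == '\r')
        /* CR followed by LF is one two-byte ending; a lone CR is one byte. */
        return (pos + 1 < len && s[pos + 1] == '\n') ? 2 : 1;
    if (s[pos] == '\n')
        return 1;
    return 0;
}
```

A build that selects a single-character convention would, by the same logic, accept only one of these branches.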
368 |
|
|
369 |
WHAT \R MATCHES |
WHAT \R MATCHES |
370 |
|
|
371 |
By default, the sequence \R in a pattern matches any Unicode newline |
By default, the sequence \R in a pattern matches any Unicode newline |
372 |
sequence, whatever has been selected as the line ending sequence. If |
sequence, whatever has been selected as the line ending sequence. If |
373 |
you specify |
you specify |
374 |
|
|
375 |
--enable-bsr-anycrlf |
--enable-bsr-anycrlf |
376 |
|
|
377 |
the default is changed so that \R matches only CR, LF, or CRLF. What- |
the default is changed so that \R matches only CR, LF, or CRLF. What- |
378 |
ever is selected when PCRE is built can be overridden when the library |
ever is selected when PCRE is built can be overridden when the library |
379 |
functions are called. |
functions are called. |
380 |
|
|
381 |
|
|
382 |
BUILDING SHARED AND STATIC LIBRARIES |
BUILDING SHARED AND STATIC LIBRARIES |
383 |
|
|
384 |
The PCRE building process uses libtool to build both shared and static |
The PCRE building process uses libtool to build both shared and static |
385 |
Unix libraries by default. You can suppress one of these by adding one |
Unix libraries by default. You can suppress one of these by adding one |
386 |
of |
of |
387 |
|
|
388 |
--disable-shared |
--disable-shared |
394 |
POSIX MALLOC USAGE |
POSIX MALLOC USAGE |
395 |
|
|
396 |
When PCRE is called through the POSIX interface (see the pcreposix doc- |
When PCRE is called through the POSIX interface (see the pcreposix doc- |
397 |
umentation), additional working storage is required for holding the |
umentation), additional working storage is required for holding the |
398 |
pointers to capturing substrings, because PCRE requires three integers |
pointers to capturing substrings, because PCRE requires three integers |
399 |
per substring, whereas the POSIX interface provides only two. If the |
per substring, whereas the POSIX interface provides only two. If the |
400 |
number of expected substrings is small, the wrapper function uses space |
number of expected substrings is small, the wrapper function uses space |
401 |
on the stack, because this is faster than using malloc() for each call. |
on the stack, because this is faster than using malloc() for each call. |
402 |
The default threshold above which the stack is no longer used is 10; it |
The default threshold above which the stack is no longer used is 10; it |
409 |
|
|
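The stack-or-heap decision described above for the POSIX wrapper might look roughly like the following sketch. It is not the wrapper's actual source: the names SMALL_NMATCH, ovector_ints, and get_ovector are hypothetical, and only the threshold of 10 and the three-integers-per-substring figure come from the text.

```c
#include <stddef.h>
#include <stdlib.h>

/* Threshold quoted in the text; the macro name is an assumption. */
#define SMALL_NMATCH 10

/* PCRE needs three integers of working storage per substring. */
size_t ovector_ints(size_t nmatch)
{
    return nmatch * 3;
}

/* Hypothetical helper: returns the caller's stack buffer when nmatch is
   within the threshold, otherwise a heap block from malloc().
   small_buf must hold at least 3 * SMALL_NMATCH ints.
   *used_heap is set to 1 when the caller must free() the result. */
int *get_ovector(size_t nmatch, int *small_buf, int *used_heap)
{
    if (nmatch <= SMALL_NMATCH) {
        *used_heap = 0;
        return small_buf;
    }
    *used_heap = 1;
    return malloc(ovector_ints(nmatch) * sizeof(int));
}
```

A caller would declare an int array of 3 * SMALL_NMATCH elements on its own stack, pass it as small_buf, and free the returned pointer only when used_heap is set.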
410 |
HANDLING VERY LARGE PATTERNS |
HANDLING VERY LARGE PATTERNS |
411 |
|
|
412 |
Within a compiled pattern, offset values are used to point from one |
Within a compiled pattern, offset values are used to point from one |
413 |
part to another (for example, from an opening parenthesis to an alter- |
part to another (for example, from an opening parenthesis to an alter- |
414 |
nation metacharacter). By default, two-byte values are used for these |
nation metacharacter). By default, two-byte values are used for these |
415 |
offsets, leading to a maximum size for a compiled pattern of around |
offsets, leading to a maximum size for a compiled pattern of around |
416 |
64K. This is sufficient to handle all but the most gigantic patterns. |
64K. This is sufficient to handle all but the most gigantic patterns. |
417 |
Nevertheless, some people do want to process enormous patterns, so it |
Nevertheless, some people do want to process enormous patterns, so it |
418 |
is possible to compile PCRE to use three-byte or four-byte offsets by |
is possible to compile PCRE to use three-byte or four-byte offsets by |
419 |
adding a setting such as |
adding a setting such as |
420 |
|
|
421 |
--with-link-size=3 |
--with-link-size=3 |
422 |
|
|
423 |
to the configure command. The value given must be 2, 3, or 4. Using |
to the configure command. The value given must be 2, 3, or 4. Using |
424 |
longer offsets slows down the operation of PCRE because it has to load |
longer offsets slows down the operation of PCRE because it has to load |
425 |
additional bytes when handling them. |
additional bytes when handling them. |
426 |
|
|
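The relationship between offset width and pattern size is simple arithmetic, sketched below. The function name max_offset_for_link_size is an assumption for this example, and the result is only the raw range that an offset of that width can express; the limit that PCRE actually enforces for a given link size may be lower.

```c
#include <stdint.h>

/* Hypothetical helper: largest value that fits in an offset of
   link_size bytes, that is 2^(8 * link_size) - 1.
   Returns 0 for values outside the permitted set 2, 3, 4. */
uint64_t max_offset_for_link_size(int link_size)
{
    if (link_size < 2 || link_size > 4)
        return 0;
    return (UINT64_C(1) << (8 * link_size)) - 1;
}
```

For a link size of 2 this gives 65535, which is the "around 64K" figure quoted above; a link size of 3 raises the raw range to 16777215.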
427 |
|
|
428 |
AVOIDING EXCESSIVE STACK USAGE |
AVOIDING EXCESSIVE STACK USAGE |
429 |
|
|
430 |
When matching with the pcre_exec() function, PCRE implements backtrack- |
When matching with the pcre_exec() function, PCRE implements backtrack- |
431 |
ing by making recursive calls to an internal function called match(). |
ing by making recursive calls to an internal function called match(). |
432 |
In environments where the size of the stack is limited, this can se- |
In environments where the size of the stack is limited, this can se- |
433 |
verely limit PCRE's operation. (The Unix environment does not usually |
verely limit PCRE's operation. (The Unix environment does not usually |
434 |
suffer from this problem, but it may sometimes be necessary to increase |
suffer from this problem, but it may sometimes be necessary to increase |
435 |
the maximum stack size. There is a discussion in the pcrestack docu- |
the maximum stack size. There is a discussion in the pcrestack docu- |
436 |
mentation.) An alternative approach to recursion that uses memory from |
mentation.) An alternative approach to recursion that uses memory from |
437 |
the heap to remember data, instead of using recursive function calls, |
the heap to remember data, instead of using recursive function calls, |
438 |
has been implemented to work round the problem of limited stack size. |
has been implemented to work round the problem of limited stack size. |
439 |
If you want to build a version of PCRE that works this way, add |
If you want to build a version of PCRE that works this way, add |
440 |
|
|
441 |
--disable-stack-for-recursion |
--disable-stack-for-recursion |
442 |
|
|
443 |
to the configure command. With this configuration, PCRE will use the |
to the configure command. With this configuration, PCRE will use the |
444 |
pcre_stack_malloc and pcre_stack_free variables to call memory manage- |
pcre_stack_malloc and pcre_stack_free variables to call memory manage- |
445 |
ment functions. By default these point to malloc() and free(), but you |
ment functions. By default these point to malloc() and free(), but you |
446 |
can replace the pointers so that your own functions are used. |
can replace the pointers so that your own functions are used. |
447 |
|
|
448 |
Separate functions are provided rather than using pcre_malloc and |
Separate functions are provided rather than using pcre_malloc and |
449 |
pcre_free because the usage is very predictable: the block sizes |
pcre_free because the usage is very predictable: the block sizes |
450 |
requested are always the same, and the blocks are always freed in |
requested are always the same, and the blocks are always freed in |
451 |
reverse order. A calling program might be able to implement optimized |
reverse order. A calling program might be able to implement optimized |
452 |
functions that perform better than malloc() and free(). PCRE runs |
functions that perform better than malloc() and free(). PCRE runs |
453 |
noticeably more slowly when built in this way. This option affects only |
noticeably more slowly when built in this way. This option affects only |
454 |
the pcre_exec() function; it is not relevant for the |
the pcre_exec() function; it is not relevant for the |
455 |
pcre_dfa_exec() function. |
pcre_dfa_exec() function. |
456 |
|
|
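Because the blocks are all the same size and are freed in reverse order, a calling program could keep released blocks in a small last-in, first-out cache instead of returning them to the system. The sketch below shows one way to do that. The names my_stack_malloc, my_stack_free, and CACHE_SLOTS are hypothetical, the code assumes single-threaded use, and it refuses any request whose size differs from the first one rather than mixing sizes in the cache.

```c
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 64              /* assumed cache depth */

static void  *cache[CACHE_SLOTS];   /* released blocks, most recent last */
static size_t cache_count = 0;
static size_t block_size = 0;       /* fixed by the first request */

/* Hands out a cached block when one is available, else calls malloc(). */
void *my_stack_malloc(size_t size)
{
    if (block_size == 0)
        block_size = size;
    if (size != block_size)
        return NULL;                /* sketch only handles one block size */
    if (cache_count > 0)
        return cache[--cache_count];
    return malloc(size);
}

/* Keeps the block for reuse; falls back to free() when the cache is full. */
void my_stack_free(void *block)
{
    if (block == NULL)
        return;
    if (cache_count < CACHE_SLOTS)
        cache[cache_count++] = block;
    else
        free(block);
}
```

In a program linked against a PCRE library built with --disable-stack-for-recursion, these functions would be installed by assigning them to the pcre_stack_malloc and pcre_stack_free variables before matching begins. Whether this outperforms the system allocator depends on the platform and should be measured.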
457 |
|
|
458 |
LIMITING PCRE RESOURCE USAGE |
LIMITING PCRE RESOURCE USAGE |
459 |
|
|
460 |
Internally, PCRE has a function called match(), which it calls repeat- |
Internally, PCRE has a function called match(), which it calls repeat- |
461 |
edly (sometimes recursively) when matching a pattern with the |
edly (sometimes recursively) when matching a pattern with the |
462 |
pcre_exec() function. By controlling the maximum number of times this |
pcre_exec() function. By controlling the maximum number of times this |
463 |
function may be called during a single matching operation, a limit can |
function may be called during a single matching operation, a limit can |
464 |
be placed on the resources used by a single call to pcre_exec(). The |
be placed on the resources used by a single call to pcre_exec(). The |
465 |
limit can be changed at run time, as described in the pcreapi documen- |
limit can be changed at run time, as described in the pcreapi documen- |
466 |
tation. The default is 10 million, but this can be changed by adding a |
tation. The default is 10 million, but this can be changed by adding a |
467 |
setting such as |
setting such as |
468 |
|
|
469 |
--with-match-limit=500000 |
--with-match-limit=500000 |
470 |
|
|
471 |
to the configure command. This setting has no effect on the |
to the configure command. This setting has no effect on the |
472 |
pcre_dfa_exec() matching function. |
pcre_dfa_exec() matching function. |
473 |
|
|
474 |
In some environments it is desirable to limit the depth of recursive |
In some environments it is desirable to limit the depth of recursive |
475 |
calls of match() more strictly than the total number of calls, in order |
calls of match() more strictly than the total number of calls, in order |
476 |
to restrict the maximum amount of stack (or heap, if --disable-stack- |
to restrict the maximum amount of stack (or heap, if --disable-stack- |
477 |
for-recursion is specified) that is used. A second limit controls this; |
for-recursion is specified) that is used. A second limit controls this; |
478 |
it defaults to the value that is set for --with-match-limit, which |
it defaults to the value that is set for --with-match-limit, which |
479 |
imposes no additional constraints. However, you can set a lower limit |
imposes no additional constraints. However, you can set a lower limit |
480 |
by adding, for example, |
by adding, for example, |
481 |
|
|
482 |
--with-match-limit-recursion=10000 |
--with-match-limit-recursion=10000 |
483 |
|
|
484 |
to the configure command. This value can also be overridden at run |
to the configure command. This value can also be overridden at run |
485 |
time. |
time. |
486 |
|
|
487 |
|
|
488 |
CREATING CHARACTER TABLES AT BUILD TIME |
CREATING CHARACTER TABLES AT BUILD TIME |
489 |
|
|
490 |
PCRE uses fixed tables for processing characters whose code values are |
PCRE uses fixed tables for processing characters whose code values are |
491 |
less than 256. By default, PCRE is built with a set of tables that are |
less than 256. By default, PCRE is built with a set of tables that are |
492 |
distributed in the file pcre_chartables.c.dist. These tables are for |
distributed in the file pcre_chartables.c.dist. These tables are for |
493 |
ASCII codes only. If you add |
ASCII codes only. If you add |
494 |
|
|
495 |
--enable-rebuild-chartables |
--enable-rebuild-chartables |
496 |
|
|
497 |
to the configure command, the distributed tables are no longer used. |
to the configure command, the distributed tables are no longer used. |
498 |
Instead, a program called dftables is compiled and run. This outputs |
Instead, a program called dftables is compiled and run. This outputs |
499 |
the source for a new set of tables, created in the default locale of your |
the source for a new set of tables, created in the default locale of your |
500 |
C runtime system. (This method of replacing the tables does not work if |
C runtime system. (This method of replacing the tables does not work if |
501 |
you are cross compiling, because dftables is run on the local host. If |
you are cross compiling, because dftables is run on the local host. If |
502 |
you need to create alternative tables when cross compiling, you will |
you need to create alternative tables when cross compiling, you will |
503 |
have to do so "by hand".) |
have to do so "by hand".) |
504 |
|
|
505 |
|
|
506 |
USING EBCDIC CODE |
USING EBCDIC CODE |
507 |
|
|
508 |
PCRE assumes by default that it will run in an environment where the |
PCRE assumes by default that it will run in an environment where the |
509 |
character code is ASCII (or Unicode, which is a superset of ASCII). |
character code is ASCII (or Unicode, which is a superset of ASCII). |
510 |
This is the case for most computer operating systems. PCRE can, how- |
This is the case for most computer operating systems. PCRE can, how- |
511 |
ever, be compiled to run in an EBCDIC environment by adding |
ever, be compiled to run in an EBCDIC environment by adding |
512 |
|
|
513 |
--enable-ebcdic |
--enable-ebcdic |
514 |
|
|
515 |
to the configure command. This setting implies --enable-rebuild-charta- |
to the configure command. This setting implies --enable-rebuild-charta- |
516 |
bles. You should only use it if you know that you are in an EBCDIC |
bles. You should only use it if you know that you are in an EBCDIC |
517 |
environment (for example, an IBM mainframe operating system). |
environment (for example, an IBM mainframe operating system). |
518 |
|
|
519 |
|
|
531 |
|
|
532 |
REVISION |
REVISION |
533 |
|
|
534 |
Last updated: 11 September 2007 |
Last updated: 21 September 2007 |
535 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
536 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
537 |
|
|
5166 |
|
|
5167 |
NEWLINE CONVENTIONS |
NEWLINE CONVENTIONS |
5168 |
|
|
5169 |
These are recognized only at the very start of a pattern. |
These are recognized only at the very start of the pattern or after a |
5170 |
|
(*BSR_...) option. |
5171 |
|
|
5172 |
(*CR) |
(*CR) |
5173 |
(*LF) |
(*LF) |
5178 |
|
|
5179 |
WHAT \R MATCHES |
WHAT \R MATCHES |
5180 |
|
|
5181 |
These are recognized only at the very start of a pattern. |
These are recognized only at the very start of the pattern or after a |
5182 |
|
(*...) option that sets the newline convention. |
5183 |
|
|
5184 |
(*BSR_ANYCRLF) |
(*BSR_ANYCRLF) |
5185 |
(*BSR_UNICODE) |
(*BSR_UNICODE) |
5205 |
|
|
5206 |
REVISION |
REVISION |
5207 |
|
|
5208 |
Last updated: 11 September 2007 |
Last updated: 21 September 2007 |
5209 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
5210 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
5211 |
|
|