2 |
This file contains a concatenation of the PCRE man pages, converted to plain |
This file contains a concatenation of the PCRE man pages, converted to plain |
3 |
text format for ease of searching with a text editor, or for use on systems |
text format for ease of searching with a text editor, or for use on systems |
4 |
that do not have a man page processor. The small individual files that give |
that do not have a man page processor. The small individual files that give |
5 |
synopses of each function in the library have not been included. There are |
synopses of each function in the library have not been included. Neither has |
6 |
separate text files for the pcregrep and pcretest commands. |
the pcredemo program. There are separate text files for the pcregrep and |
7 |
|
pcretest commands. |
8 |
----------------------------------------------------------------------------- |
----------------------------------------------------------------------------- |
9 |
|
|
10 |
|
|
25 |
tax items, and there is an option for requesting some minor changes |
tax items, and there is an option for requesting some minor changes |
26 |
that give better JavaScript compatibility. |
that give better JavaScript compatibility. |
27 |
|
|
28 |
The current implementation of PCRE (release 7.x) corresponds approxi- |
The current implementation of PCRE (release 8.xx) corresponds approxi- |
29 |
mately with Perl 5.10, including support for UTF-8 encoded strings and |
mately with Perl 5.10, including support for UTF-8 encoded strings and |
30 |
Unicode general category properties. However, UTF-8 and Unicode support |
Unicode general category properties. However, UTF-8 and Unicode support |
31 |
has to be explicitly enabled; it is not the default. The Unicode tables |
has to be explicitly enabled; it is not the default. The Unicode tables |
32 |
correspond to Unicode release 5.0.0. |
correspond to Unicode release 5.1. |
33 |
|
|
34 |
In addition to the Perl-compatible matching function, PCRE contains an |
In addition to the Perl-compatible matching function, PCRE contains an |
35 |
alternative matching function that matches the same compiled patterns |
alternative matching function that matches the same compiled patterns |
72 |
The user documentation for PCRE comprises a number of different sec- |
The user documentation for PCRE comprises a number of different sec- |
73 |
tions. In the "man" format, each of these is a separate "man page". In |
tions. In the "man" format, each of these is a separate "man page". In |
74 |
the HTML format, each is a separate page, linked from the index page. |
the HTML format, each is a separate page, linked from the index page. |
75 |
In the plain text format, all the sections are concatenated, for ease |
In the plain text format, all the sections, except the pcredemo sec- |
76 |
of searching. The sections are as follows: |
tion, are concatenated, for ease of searching. The sections are as fol- |
77 |
|
lows: |
78 |
|
|
79 |
pcre this document |
pcre this document |
80 |
pcre-config show PCRE installation configuration information |
pcre-config show PCRE installation configuration information |
83 |
pcrecallout details of the callout feature |
pcrecallout details of the callout feature |
84 |
pcrecompat discussion of Perl compatibility |
pcrecompat discussion of Perl compatibility |
85 |
pcrecpp details of the C++ wrapper |
pcrecpp details of the C++ wrapper |
86 |
|
pcredemo a demonstration C program that uses PCRE |
87 |
pcregrep description of the pcregrep command |
pcregrep description of the pcregrep command |
88 |
pcrematching discussion of the two matching algorithms |
pcrematching discussion of the two matching algorithms |
89 |
pcrepartial details of the partial matching facility |
pcrepartial details of the partial matching facility |
93 |
pcreperform discussion of performance issues |
pcreperform discussion of performance issues |
94 |
pcreposix the POSIX-compatible C API |
pcreposix the POSIX-compatible C API |
95 |
pcreprecompile details of saving and re-using precompiled patterns |
pcreprecompile details of saving and re-using precompiled patterns |
96 |
pcresample discussion of the sample program |
pcresample discussion of the pcredemo program |
97 |
pcrestack discussion of stack usage |
pcrestack discussion of stack usage |
98 |
pcretest description of the pcretest testing command |
pcretest description of the pcretest testing command |
99 |
|
|
100 |
In addition, in the "man" and HTML formats, there is a short page for |
In addition, in the "man" and HTML formats, there is a short page for |
101 |
each C library function, listing its arguments and results. |
each C library function, listing its arguments and results. |
102 |
|
|
103 |
|
|
104 |
LIMITATIONS |
LIMITATIONS |
105 |
|
|
106 |
There are some size limitations in PCRE but it is hoped that they will |
There are some size limitations in PCRE but it is hoped that they will |
107 |
never in practice be relevant. |
never in practice be relevant. |
108 |
|
|
109 |
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE |
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE |
110 |
is compiled with the default internal linkage size of 2. If you want to |
is compiled with the default internal linkage size of 2. If you want to |
111 |
process regular expressions that are truly enormous, you can compile |
process regular expressions that are truly enormous, you can compile |
112 |
PCRE with an internal linkage size of 3 or 4 (see the README file in |
PCRE with an internal linkage size of 3 or 4 (see the README file in |
113 |
the source distribution and the pcrebuild documentation for details). |
the source distribution and the pcrebuild documentation for details). |
114 |
In these cases the limit is substantially larger. However, the speed |
In these cases the limit is substantially larger. However, the speed |
115 |
of execution is slower. |
of execution is slower. |
116 |
|
|
117 |
All values in repeating quantifiers must be less than 65536. |
All values in repeating quantifiers must be less than 65536. |
122 |
The maximum length of a name for a named subpattern is 32 characters, and |
The maximum length of a name for a named subpattern is 32 characters, and |
123 |
the maximum number of named subpatterns is 10000. |
the maximum number of named subpatterns is 10000. |
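The following sketch shows how those fixed limits could be enforced while a pattern is parsed. The helpers quantifier_ok() and group_name_ok() are hypothetical, not PCRE functions. The rule that a name consists of letters, digits and underscores, and the extra n <= m check, are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Limits quoted in the LIMITATIONS section. */
#define MAX_QUANTIFIER_VALUE 65535   /* "must be less than 65536" */
#define MAX_NAME_LENGTH      32
#define MAX_NAME_COUNT       10000

/* Read a decimal number at *p and advance past it. Returns -1 if there
   are no digits or the value exceeds MAX_QUANTIFIER_VALUE. */
static long read_number(const char **p)
{
    long value = 0;
    int digits = 0;
    while (isdigit((unsigned char)**p)) {
        value = value * 10 + (**p - '0');
        if (value > MAX_QUANTIFIER_VALUE)
            return -1;                       /* already too large: give up */
        (*p)++;
        digits++;
    }
    return digits ? value : -1;
}

/* Return 1 if q is exactly {n}, {n,} or {n,m} with every value below
   65536 and n <= m; otherwise return 0. */
int quantifier_ok(const char *q)
{
    long n, m;
    if (*q++ != '{' || (n = read_number(&q)) < 0)
        return 0;
    if (*q == '}')                           /* {n} */
        return q[1] == '\0';
    if (*q++ != ',')
        return 0;
    if (*q == '}')                           /* {n,}: no upper bound */
        return q[1] == '\0';
    if ((m = read_number(&q)) < 0 || m < n || *q != '}')
        return 0;
    return q[1] == '\0';                     /* {n,m} */
}

/* Return 1 if name is 1..32 letters, digits or underscores and fewer
   than MAX_NAME_COUNT names have already been defined; otherwise 0. */
int group_name_ok(const char *name, int names_so_far)
{
    size_t len = strlen(name);
    if (len == 0 || len > MAX_NAME_LENGTH || names_so_far >= MAX_NAME_COUNT)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!isalnum((unsigned char)name[i]) && name[i] != '_')
            return 0;
    return 1;
}
```

For example, quantifier_ok("{65535}") returns 1, while quantifier_ok("{65536}") returns 0.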
124 |
|
|
125 |
The maximum length of a subject string is the largest positive number |
The maximum length of a subject string is the largest positive number |
126 |
that an integer variable can hold. However, when using the traditional |
that an integer variable can hold. However, when using the traditional |
127 |
matching function, PCRE uses recursion to handle subpatterns and indef- |
matching function, PCRE uses recursion to handle subpatterns and indef- |
128 |
inite repetition. This means that the available stack space may limit |
inite repetition. This means that the available stack space may limit |
129 |
the size of a subject string that can be processed by certain patterns. |
the size of a subject string that can be processed by certain patterns. |
130 |
For a discussion of stack issues, see the pcrestack documentation. |
For a discussion of stack issues, see the pcrestack documentation. |
131 |
|
|
132 |
|
|
133 |
UTF-8 AND UNICODE PROPERTY SUPPORT |
UTF-8 AND UNICODE PROPERTY SUPPORT |
134 |
|
|
135 |
From release 3.3, PCRE has had some support for character strings |
From release 3.3, PCRE has had some support for character strings |
136 |
encoded in the UTF-8 format. For release 4.0 this was greatly extended |
encoded in the UTF-8 format. For release 4.0 this was greatly extended |
137 |
to cover most common requirements, and in release 5.0 additional sup- |
to cover most common requirements, and in release 5.0 additional sup- |
138 |
port for Unicode general category properties was added. |
port for Unicode general category properties was added. |
139 |
|
|
140 |
In order to process UTF-8 strings, you must build PCRE to include UTF-8 |
In order to process UTF-8 strings, you must build PCRE to include UTF-8 |
141 |
support in the code, and, in addition, you must call pcre_compile() |
support in the code, and, in addition, you must call pcre_compile() |
142 |
with the PCRE_UTF8 option flag. When you do this, both the pattern and |
with the PCRE_UTF8 option flag, or the pattern must start with the |
143 |
any subject strings that are matched against it are treated as UTF-8 |
sequence (*UTF8). When either of these is the case, both the pattern |
144 |
strings instead of just strings of bytes. |
and any subject strings that are matched against it are treated as |
145 |
|
UTF-8 strings instead of just strings of bytes. |
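The practical difference between "strings of bytes" and "UTF-8 strings" is that a multi-byte character counts as one character. The sketch below shows that difference, together with the two ways of requesting UTF-8 mode described above. OPT_UTF8 is a hypothetical stand-in for the real PCRE_UTF8 bit, and pattern_requests_utf8() only mimics the rule; it is not how pcre_compile() is written.

```c
#include <stddef.h>
#include <string.h>

#define OPT_UTF8 0x1u  /* hypothetical stand-in for the PCRE_UTF8 option bit */

/* UTF-8 mode is on if the option bit is set or the pattern starts (*UTF8). */
int pattern_requests_utf8(const char *pattern, unsigned options)
{
    return (options & OPT_UTF8) != 0 || strncmp(pattern, "(*UTF8)", 7) == 0;
}

/* Count characters in a UTF-8 string: every byte except a continuation
   byte (binary 10xxxxxx) starts a new character. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

Here "caf\xc3\xa9" ("café") is 5 bytes but 4 characters. With the library itself, passing PCRE_UTF8 to pcre_compile() or starting the pattern with (*UTF8) has the corresponding effect, so that, for instance, a dot matches the two-byte "é" as a single character.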
146 |
|
|
147 |
If you compile PCRE with UTF-8 support, but do not use it at run time, |
If you compile PCRE with UTF-8 support, but do not use it at run time, |
148 |
the library will be a bit bigger, but the additional run time overhead |
the library will be a bit bigger, but the additional run time overhead |
263 |
|
|
264 |
REVISION |
REVISION |
265 |
|
|
266 |
Last updated: 18 March 2009 |
Last updated: 01 September 2009 |
267 |
Copyright (c) 1997-2009 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
268 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
269 |
|
|
270 |
|
|
271 |
PCREBUILD(3) PCREBUILD(3) |
PCREBUILD(3) PCREBUILD(3) |
272 |
|
|
273 |
|
|
282 |
script, where the optional features are selected or deselected by pro- |
script, where the optional features are selected or deselected by pro- |
283 |
viding options to configure before running the make command. However, |
viding options to configure before running the make command. However, |
284 |
the same options can be selected in both Unix-like and non-Unix-like |
the same options can be selected in both Unix-like and non-Unix-like |
285 |
environments using the GUI facility of CMakeSetup if you are using |
environments using the GUI facility of cmake-gui if you are using CMake |
286 |
CMake instead of configure to build PCRE. |
instead of configure to build PCRE. |
287 |
|
|
288 |
|
There is a lot more information about building PCRE in non-Unix-like |
289 |
|
environments in the file called NON_UNIX_USE, which is part of the PCRE |
290 |
|
distribution. You should consult this file as well as the README file |
291 |
|
if you are building in a non-Unix-like environment. |
292 |
|
|
293 |
The complete list of options for configure (which includes the standard |
The complete list of options for configure (which includes the standard |
294 |
ones such as the selection of the installation directory) can be |
ones such as the selection of the installation directory) can be |
295 |
obtained by running |
obtained by running |
296 |
|
|
297 |
./configure --help |
./configure --help |
298 |
|
|
299 |
The following sections include descriptions of options whose names |
The following sections include descriptions of options whose names |
300 |
begin with --enable or --disable. These settings specify changes to the |
begin with --enable or --disable. These settings specify changes to the |
301 |
defaults for the configure command. Because of the way that configure |
defaults for the configure command. Because of the way that configure |
302 |
works, --enable and --disable always come in pairs, so the complemen- |
works, --enable and --disable always come in pairs, so the complemen- |
303 |
tary option always exists as well, but as it specifies the default, it |
tary option always exists as well, but as it specifies the default, it |
304 |
is not described. |
is not described. |
305 |
|
|
306 |
|
|
321 |
|
|
322 |
--enable-utf8 |
--enable-utf8 |
323 |
|
|
324 |
to the configure command. Of itself, this does not make PCRE treat |
to the configure command. Of itself, this does not make PCRE treat |
325 |
strings as UTF-8. As well as compiling PCRE with this option, you also |
strings as UTF-8. As well as compiling PCRE with this option, you also |
326 |
have to set the PCRE_UTF8 option when you call the pcre_compile() |
have to set the PCRE_UTF8 option when you call the pcre_compile() |
327 |
function. |
function. |
328 |
|
|
329 |
If you set --enable-utf8 when compiling in an EBCDIC environment, PCRE |
If you set --enable-utf8 when compiling in an EBCDIC environment, PCRE |
330 |
expects its input to be either ASCII or UTF-8 (depending on the runtime |
expects its input to be either ASCII or UTF-8 (depending on the runtime |
331 |
option). It is not possible to support both EBCDIC and UTF-8 codes in |
option). It is not possible to support both EBCDIC and UTF-8 codes in |
332 |
the same version of the library. Consequently, --enable-utf8 and |
the same version of the library. Consequently, --enable-utf8 and |
333 |
--enable-ebcdic are mutually exclusive. |
--enable-ebcdic are mutually exclusive. |
334 |
|
|
335 |
|
|
336 |
UNICODE CHARACTER PROPERTY SUPPORT |
UNICODE CHARACTER PROPERTY SUPPORT |
337 |
|
|
338 |
UTF-8 support allows PCRE to process character values greater than 255 |
UTF-8 support allows PCRE to process character values greater than 255 |
339 |
in the strings that it handles. On its own, however, it does not pro- |
in the strings that it handles. On its own, however, it does not pro- |
340 |
vide any facilities for accessing the properties of such characters. If |
vide any facilities for accessing the properties of such characters. If |
341 |
you want to be able to use the pattern escapes \P, \p, and \X, which |
you want to be able to use the pattern escapes \P, \p, and \X, which |
342 |
refer to Unicode character properties, you must add |
refer to Unicode character properties, you must add |
343 |
|
|
344 |
--enable-unicode-properties |
--enable-unicode-properties |
345 |
|
|
346 |
to the configure command. This implies UTF-8 support, even if you have |
to the configure command. This implies UTF-8 support, even if you have |
347 |
not explicitly requested it. |
not explicitly requested it. |
348 |
|
|
349 |
Including Unicode property support adds around 30K of tables to the |
Including Unicode property support adds around 30K of tables to the |
350 |
PCRE library. Only the general category properties such as Lu and Nd |
PCRE library. Only the general category properties such as Lu and Nd |
351 |
are supported. Details are given in the pcrepattern documentation. |
are supported. Details are given in the pcrepattern documentation. |
352 |
|
|
353 |
|
|
354 |
CODE VALUE OF NEWLINE |
CODE VALUE OF NEWLINE |
355 |
|
|
356 |
By default, PCRE interprets the linefeed (LF) character as indicating |
By default, PCRE interprets the linefeed (LF) character as indicating |
357 |
the end of a line. This is the normal newline character on Unix-like |
the end of a line. This is the normal newline character on Unix-like |
358 |
systems. You can compile PCRE to use carriage return (CR) instead, by |
systems. You can compile PCRE to use carriage return (CR) instead, by |
359 |
adding |
adding |
360 |
|
|
361 |
--enable-newline-is-cr |
--enable-newline-is-cr |
362 |
|
|
363 |
to the configure command. There is also a --enable-newline-is-lf |
to the configure command. There is also a --enable-newline-is-lf |
364 |
option, which explicitly specifies linefeed as the newline character. |
option, which explicitly specifies linefeed as the newline character. |
365 |
|
|
366 |
Alternatively, you can specify that line endings are to be indicated by |
Alternatively, you can specify that line endings are to be indicated by |
372 |
|
|
373 |
--enable-newline-is-anycrlf |
--enable-newline-is-anycrlf |
374 |
|
|
375 |
which causes PCRE to recognize any of the three sequences CR, LF, or |
which causes PCRE to recognize any of the three sequences CR, LF, or |
376 |
CRLF as indicating a line ending. Finally, a fifth option, specified by |
CRLF as indicating a line ending. Finally, a fifth option, specified by |
377 |
|
|
378 |
--enable-newline-is-any |
--enable-newline-is-any |
379 |
|
|
380 |
causes PCRE to recognize any Unicode newline sequence. |
causes PCRE to recognize any Unicode newline sequence. |
381 |
|
|
382 |
Whatever line ending convention is selected when PCRE is built can be |
Whatever line ending convention is selected when PCRE is built can be |
383 |
overridden when the library functions are called. At build time it is |
overridden when the library functions are called. At build time it is |
384 |
conventional to use the standard for your operating system. |
conventional to use the standard for your operating system. |
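To make the line-ending conventions concrete, here is a minimal sketch that reports whether a line ending starts at a given position and how many characters it occupies. It works on an array of Unicode code points rather than on PCRE's internal data, and the enum newline names are hypothetical. For NL_ANY it uses the Unicode line-ending set LF, VT, FF, CR, CRLF, NEL (U+0085), LS (U+2028) and PS (U+2029).

```c
#include <stddef.h>

/* Hypothetical names for the five conventions. */
enum newline { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* If a line ending starts at s[i], return its length in characters
   (1 or 2); otherwise return 0. s holds n code points. */
size_t newline_length(const unsigned *s, size_t n, size_t i, enum newline nl)
{
    if (i >= n)
        return 0;
    unsigned c = s[i];
    int crlf = (c == 0x0D && i + 1 < n && s[i + 1] == 0x0A);

    switch (nl) {
    case NL_CR:      return c == 0x0D;
    case NL_LF:      return c == 0x0A;
    case NL_CRLF:    return crlf ? 2 : 0;
    case NL_ANYCRLF: return crlf ? 2 : (c == 0x0D || c == 0x0A);
    case NL_ANY:
        if (crlf)
            return 2;
        /* LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR */
        return (c >= 0x0A && c <= 0x0D) || c == 0x85 ||
               c == 0x2028 || c == 0x2029;
    }
    return 0;
}
```

The same subject therefore splits into lines differently depending on the convention: a vertical tab ends a line only under NL_ANY, and a CR followed by LF is one line ending under NL_CRLF, NL_ANYCRLF and NL_ANY.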
385 |
|
|
386 |
|
|
387 |
WHAT \R MATCHES |
WHAT \R MATCHES |
388 |
|
|
389 |
By default, the sequence \R in a pattern matches any Unicode newline |
By default, the sequence \R in a pattern matches any Unicode newline |
390 |
sequence, whatever has been selected as the line ending sequence. If |
sequence, whatever has been selected as the line ending sequence. If |
391 |
you specify |
you specify |
392 |
|
|
393 |
--enable-bsr-anycrlf |
--enable-bsr-anycrlf |
394 |
|
|
395 |
the default is changed so that \R matches only CR, LF, or CRLF. What- |
the default is changed so that \R matches only CR, LF, or CRLF. What- |
396 |
ever is selected when PCRE is built can be overridden when the library |
ever is selected when PCRE is built can be overridden when the library |
397 |
functions are called. |
functions are called. |
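As a sketch of the two behaviours, the function below returns the length of the \R match at a position. It assumes code-point input, and match_bsr() is a hypothetical name. A CR followed by LF is always taken as one two-character unit, in line with \R behaving like the atomic group (?>\r\n|\n|\x0b|\f|\r|\x85) described in the pcrepattern documentation. The sketch also treats LS and PS as always eligible, as they are in UTF-8 mode.

```c
#include <stddef.h>

/* Length of the \R match at s[i], or 0 if there is none. s holds n code
   points; anycrlf_only mimics a build with --enable-bsr-anycrlf. */
size_t match_bsr(const unsigned *s, size_t n, size_t i, int anycrlf_only)
{
    if (i >= n)
        return 0;
    unsigned c = s[i];
    if (c == 0x0D)                         /* CR, taking a following LF */
        return (i + 1 < n && s[i + 1] == 0x0A) ? 2 : 1;
    if (c == 0x0A)
        return 1;
    if (anycrlf_only)
        return 0;
    /* VT, FF, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR */
    return c == 0x0B || c == 0x0C || c == 0x85 || c == 0x2028 || c == 0x2029;
}
```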
398 |
|
|
399 |
|
|
400 |
BUILDING SHARED AND STATIC LIBRARIES |
BUILDING SHARED AND STATIC LIBRARIES |
401 |
|
|
402 |
The PCRE building process uses libtool to build both shared and static |
The PCRE building process uses libtool to build both shared and static |
403 |
Unix libraries by default. You can suppress one of these by adding one |
Unix libraries by default. You can suppress one of these by adding one |
404 |
of |
of |
405 |
|
|
406 |
--disable-shared |
--disable-shared |
412 |
POSIX MALLOC USAGE |
POSIX MALLOC USAGE |
413 |
|
|
414 |
When PCRE is called through the POSIX interface (see the pcreposix doc- |
When PCRE is called through the POSIX interface (see the pcreposix doc- |
415 |
umentation), additional working storage is required for holding the |
umentation), additional working storage is required for holding the |
416 |
pointers to capturing substrings, because PCRE requires three integers |
pointers to capturing substrings, because PCRE requires three integers |
417 |
per substring, whereas the POSIX interface provides only two. If the |
per substring, whereas the POSIX interface provides only two. If the |
418 |
number of expected substrings is small, the wrapper function uses space |
number of expected substrings is small, the wrapper function uses space |
419 |
on the stack, because this is faster than using malloc() for each call. |
on the stack, because this is faster than using malloc() for each call. |
420 |
The default threshold above which the stack is no longer used is 10; it |
The default threshold above which the stack is no longer used is 10; it |
427 |
|
|
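The sketch below shows why a threshold is useful: the wrapper needs three ints per substring internally but hands back only two, so it needs scratch storage on every call. fake_exec() is a hypothetical stand-in for pcre_exec(), match_pair stands in for the POSIX regmatch_t, and PCRE's own pcreposix wrapper may differ in its details.

```c
#include <stddef.h>
#include <stdlib.h>

#define MALLOC_THRESHOLD 10   /* the default threshold quoted above */

/* Stand-in for POSIX regmatch_t: start and end offsets of one substring. */
typedef struct { int so, eo; } match_pair;

/* Hypothetical stand-in for pcre_exec(): fills nmatch (start, end) pairs
   in the first two thirds of ovector; the last third is working space. */
static void fake_exec(int *ovector, size_t nmatch)
{
    for (size_t i = 0; i < nmatch; i++) {
        ovector[2 * i] = (int)(2 * i);
        ovector[2 * i + 1] = (int)(2 * i + 1);
        ovector[2 * nmatch + i] = 0;           /* workspace slot */
    }
}

/* POSIX-style wrapper: 3 ints per substring internally, 2 returned.
   Returns 0 if the stack buffer was used, 1 if the heap was, -1 on
   allocation failure. */
int posix_style_match(match_pair *pmatch, size_t nmatch)
{
    int small[MALLOC_THRESHOLD * 3];           /* no malloc() for small calls */
    int *ovector = small;
    int used_heap = nmatch > MALLOC_THRESHOLD;

    if (used_heap) {
        ovector = malloc(nmatch * 3 * sizeof *ovector);
        if (ovector == NULL)
            return -1;
    }
    fake_exec(ovector, nmatch);
    for (size_t i = 0; i < nmatch; i++) {      /* keep the pairs only */
        pmatch[i].so = ovector[2 * i];
        pmatch[i].eo = ovector[2 * i + 1];
    }
    if (used_heap)
        free(ovector);
    return used_heap;
}
```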
428 |
HANDLING VERY LARGE PATTERNS |
HANDLING VERY LARGE PATTERNS |
429 |
|
|
430 |
Within a compiled pattern, offset values are used to point from one |
Within a compiled pattern, offset values are used to point from one |
431 |
part to another (for example, from an opening parenthesis to an alter- |
part to another (for example, from an opening parenthesis to an alter- |
432 |
nation metacharacter). By default, two-byte values are used for these |
nation metacharacter). By default, two-byte values are used for these |
433 |
offsets, leading to a maximum size for a compiled pattern of around |
offsets, leading to a maximum size for a compiled pattern of around |
434 |
64K. This is sufficient to handle all but the most gigantic patterns. |
64K. This is sufficient to handle all but the most gigantic patterns. |
435 |
Nevertheless, some people do want to process enormous patterns, so it |
Nevertheless, some people do want to process enormous patterns, so it |
436 |
is possible to compile PCRE to use three-byte or four-byte offsets by |
is possible to compile PCRE to use three-byte or four-byte offsets by |
437 |
adding a setting such as |
adding a setting such as |
438 |
|
|
439 |
--with-link-size=3 |
--with-link-size=3 |
440 |
|
|
441 |
to the configure command. The value given must be 2, 3, or 4. Using |
to the configure command. The value given must be 2, 3, or 4. Using |
442 |
longer offsets slows down the operation of PCRE because it has to load |
longer offsets slows down the operation of PCRE because it has to load |
443 |
additional bytes when handling them. |
additional bytes when handling them. |
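The trade-off can be sketched as follows: an offset stored in link_size bytes can reach at most 2^(8*link_size) - 1 (65535 for the default of 2, hence the limit of around 64K), and reading it back costs one byte load per byte of link. The helpers are hypothetical, and storing the most significant byte first is an assumption of this sketch rather than a statement about PCRE's internal layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Store an offset in link_size bytes (2, 3 or 4), most significant first. */
void put_link(unsigned char *code, uint32_t offset, int link_size)
{
    for (int i = link_size - 1; i >= 0; i--) {
        code[i] = (unsigned char)(offset & 0xFF);
        offset >>= 8;
    }
}

/* Read it back: each extra byte of link size is one more load. */
uint32_t get_link(const unsigned char *code, int link_size)
{
    uint32_t offset = 0;
    for (int i = 0; i < link_size; i++)
        offset = (offset << 8) | code[i];
    return offset;
}

/* Largest offset that a link of this size can express. */
uint64_t max_link_offset(int link_size)
{
    return ((uint64_t)1 << (8 * link_size)) - 1;
}
```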
444 |
|
|
445 |
|
|
446 |
AVOIDING EXCESSIVE STACK USAGE |
AVOIDING EXCESSIVE STACK USAGE |
447 |
|
|
448 |
When matching with the pcre_exec() function, PCRE implements backtrack- |
When matching with the pcre_exec() function, PCRE implements backtrack- |
449 |
ing by making recursive calls to an internal function called match(). |
ing by making recursive calls to an internal function called match(). |
450 |
In environments where the size of the stack is limited, this can se- |
In environments where the size of the stack is limited, this can se- |
451 |
verely limit PCRE's operation. (The Unix environment does not usually |
verely limit PCRE's operation. (The Unix environment does not usually |
452 |
suffer from this problem, but it may sometimes be necessary to increase |
suffer from this problem, but it may sometimes be necessary to increase |
453 |
the maximum stack size. There is a discussion in the pcrestack docu- |
the maximum stack size. There is a discussion in the pcrestack docu- |
454 |
mentation.) An alternative approach to recursion that uses memory from |
mentation.) An alternative approach to recursion that uses memory from |
455 |
the heap to remember data, instead of using recursive function calls, |
the heap to remember data, instead of using recursive function calls, |
456 |
has been implemented to work round the problem of limited stack size. |
has been implemented to work round the problem of limited stack size. |
457 |
If you want to build a version of PCRE that works this way, add |
If you want to build a version of PCRE that works this way, add |
458 |
|
|
459 |
--disable-stack-for-recursion |
--disable-stack-for-recursion |
460 |
|
|
461 |
to the configure command. With this configuration, PCRE will use the |
to the configure command. With this configuration, PCRE will use the |
462 |
pcre_stack_malloc and pcre_stack_free variables to call memory manage- |
pcre_stack_malloc and pcre_stack_free variables to call memory manage- |
463 |
ment functions. By default these point to malloc() and free(), but you |
ment functions. By default these point to malloc() and free(), but you |
464 |
can replace the pointers so that your own functions are used. |
can replace the pointers so that your own functions are used. |
465 |
|
|
466 |
Separate functions are provided rather than using pcre_malloc and |
Separate functions are provided rather than using pcre_malloc and |
467 |
pcre_free because the usage is very predictable: the block sizes |
pcre_free because the usage is very predictable: the block sizes |
468 |
requested are always the same, and the blocks are always freed in |
requested are always the same, and the blocks are always freed in |
469 |
reverse order. A calling program might be able to implement optimized |
reverse order. A calling program might be able to implement optimized |
470 |
functions that perform better than malloc() and free(). PCRE runs |
functions that perform better than malloc() and free(). PCRE runs |
471 |
noticeably more slowly when built in this way. This option affects only |
noticeably more slowly when built in this way. This option affects only |
472 |
the pcre_exec() function; it is not relevant for the |
the pcre_exec() function; it is not relevant for the |
473 |
pcre_dfa_exec() function. |
pcre_dfa_exec() function. |
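Because the blocks are always freed in reverse order, a stack-like (LIFO) arena is one way to write such replacement functions. The sketch below is hypothetical: the arena and block-count sizes are arbitrary, it falls back to malloc() when the arena is full, and it is not thread-safe because the arena is a single static buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_BYTES (1u << 20)   /* hypothetical arena size: 1 MiB */
#define MAX_BLOCKS  4096         /* hypothetical cap on live blocks */

static _Alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static size_t block_start[MAX_BLOCKS];  /* offset of each live arena block */
static size_t live_blocks = 0;
static size_t arena_top = 0;            /* first unused byte in the arena */

/* Allocate by bumping an offset; fall back to malloc() when full. */
void *lifo_stack_malloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    if (size > ARENA_BYTES)
        return malloc(size);            /* too big for the arena anyway */
    size_t rounded = (size == 0 ? 1 : size);
    rounded = (rounded + align - 1) / align * align;

    if (live_blocks < MAX_BLOCKS && rounded <= ARENA_BYTES - arena_top) {
        void *p = arena + arena_top;
        block_start[live_blocks++] = arena_top;
        arena_top += rounded;
        return p;
    }
    return malloc(size);
}

/* Release a block. Arena blocks are assumed to be freed in reverse order
   of allocation, so the one being freed is always the most recent. */
void lifo_stack_free(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)arena;

    if (addr >= base && addr < base + ARENA_BYTES)
        arena_top = block_start[--live_blocks];
    else
        free(p);
}
```

With a PCRE built using --disable-stack-for-recursion, these would be installed before calling pcre_exec() by assigning pcre_stack_malloc = lifo_stack_malloc and pcre_stack_free = lifo_stack_free.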
474 |
|
|
475 |
|
|
476 |
LIMITING PCRE RESOURCE USAGE |
LIMITING PCRE RESOURCE USAGE |
477 |
|
|
478 |
Internally, PCRE has a function called match(), which it calls repeat- |
Internally, PCRE has a function called match(), which it calls repeat- |
479 |
edly (sometimes recursively) when matching a pattern with the |
edly (sometimes recursively) when matching a pattern with the |
480 |
pcre_exec() function. By controlling the maximum number of times this |
pcre_exec() function. By controlling the maximum number of times this |
481 |
function may be called during a single matching operation, a limit can |
function may be called during a single matching operation, a limit can |
482 |
be placed on the resources used by a single call to pcre_exec(). The |
be placed on the resources used by a single call to pcre_exec(). The |
483 |
limit can be changed at run time, as described in the pcreapi documen- |
limit can be changed at run time, as described in the pcreapi documen- |
484 |
tation. The default is 10 million, but this can be changed by adding a |
tation. The default is 10 million, but this can be changed by adding a |
485 |
setting such as |
setting such as |
486 |
|
|
487 |
--with-match-limit=500000 |
--with-match-limit=500000 |
488 |
|
|
489 |
to the configure command. This setting has no effect on the |
to the configure command. This setting has no effect on the |
490 |
pcre_dfa_exec() matching function. |
pcre_dfa_exec() matching function. |
491 |
|
|
492 |
In some environments it is desirable to limit the depth of recursive |
In some environments it is desirable to limit the depth of recursive |
493 |
calls of match() more strictly than the total number of calls, in order |
calls of match() more strictly than the total number of calls, in order |
494 |
to restrict the maximum amount of stack (or heap, if --disable-stack- |
to restrict the maximum amount of stack (or heap, if --disable-stack- |
495 |
for-recursion is specified) that is used. A second limit controls this; |
for-recursion is specified) that is used. A second limit controls this; |
496 |
it defaults to the value that is set for --with-match-limit, which |
it defaults to the value that is set for --with-match-limit, which |
497 |
imposes no additional constraints. However, you can set a lower limit |
imposes no additional constraints. However, you can set a lower limit |
498 |
by adding, for example, |
by adding, for example, |
499 |
|
|
500 |
--with-match-limit-recursion=10000 |
--with-match-limit-recursion=10000 |
501 |
|
|
502 |
to the configure command. This value can also be overridden at run |
to the configure command. This value can also be overridden at run |
503 |
time. |
time. |
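The two limits can be illustrated with a toy backtracking matcher. The one below is the small recursive matcher for literals, '.' and 'x*' popularised by Kernighan and Pike, not PCRE's match(). It counts every call (compare the match limit) and tracks the nesting depth (compare the recursion limit), and gives up with -1 when either is exceeded. In the library itself, the run-time counterparts are the match_limit and match_limit_recursion fields of the pcre_extra block, described in the pcreapi documentation.

```c
#include <stddef.h>

/* Counters playing the roles of the two limits. */
typedef struct {
    unsigned long calls, call_limit;   /* like the match limit */
    unsigned depth, depth_limit;       /* like the recursion limit */
    int exceeded;
} match_state;

static int match_here(match_state *st, const char *re, const char *text);

/* c* followed by re: try the shortest run first, then longer ones. */
static int match_star(match_state *st, int c, const char *re, const char *text)
{
    do {
        if (match_here(st, re, text))
            return 1;
        if (st->exceeded)
            return 0;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Anchored match of re (literals, '.', 'x*') at the start of text. */
static int match_here(match_state *st, const char *re, const char *text)
{
    int result = 0;
    if (st->exceeded || ++st->calls > st->call_limit ||
        st->depth >= st->depth_limit) {
        st->exceeded = 1;
        return 0;
    }
    st->depth++;
    if (re[0] == '\0')
        result = 1;
    else if (re[1] == '*')
        result = match_star(st, re[0], re + 2, text);
    else if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        result = match_here(st, re + 1, text + 1);
    st->depth--;
    return result;
}

/* Returns 1 for a match, 0 for no match, -1 if either limit was hit. */
int limited_match(const char *re, const char *text,
                  unsigned long call_limit, unsigned depth_limit)
{
    match_state st = { 0, call_limit, 0, depth_limit, 0 };
    int r = match_here(&st, re, text);
    return st.exceeded ? -1 : r;
}
```

A pattern such as a*a*a*a*a*b applied to a long run of a's with no b makes the number of calls grow rapidly, so a small call limit turns an expensive failure into a quick error result.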
504 |
|
|
505 |
|
|
506 |
CREATING CHARACTER TABLES AT BUILD TIME |
CREATING CHARACTER TABLES AT BUILD TIME |
507 |
|
|
508 |
PCRE uses fixed tables for processing characters whose code values are |
PCRE uses fixed tables for processing characters whose code values are |
509 |
less than 256. By default, PCRE is built with a set of tables that are |
less than 256. By default, PCRE is built with a set of tables that are |
510 |
distributed in the file pcre_chartables.c.dist. These tables are for |
distributed in the file pcre_chartables.c.dist. These tables are for |
511 |
ASCII codes only. If you add |
ASCII codes only. If you add |
512 |
|
|
513 |
--enable-rebuild-chartables |
--enable-rebuild-chartables |
514 |
|
|
515 |
to the configure command, the distributed tables are no longer used. |
to the configure command, the distributed tables are no longer used. |
516 |
Instead, a program called dftables is compiled and run. This outputs |
Instead, a program called dftables is compiled and run. This outputs |
517 |
the source for a new set of tables, created in the default locale of your |
the source for a new set of tables, created in the default locale of your |
518 |
C runtime system. (This method of replacing the tables does not work if |
C runtime system. (This method of replacing the tables does not work if |
519 |
you are cross compiling, because dftables is run on the local host. If |
you are cross compiling, because dftables is run on the local host. If |
520 |
you need to create alternative tables when cross compiling, you will |
you need to create alternative tables when cross compiling, you will |
521 |
have to do so "by hand".) |
have to do so "by hand".) |
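The following sketch illustrates the idea behind this: character types and case flipping are derived from the <ctype.h> functions in whatever locale the C library is currently using, and the result is written out as C source. It is not the dftables program; the CT_ bit names and the table layout are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical bit layout; dftables defines its own. */
enum { CT_SPACE = 0x01, CT_LETTER = 0x02, CT_DIGIT = 0x04, CT_WORD = 0x08 };

/* Classify every code below 256 using the current C locale. */
void build_ctype_table(unsigned char table[256])
{
    for (int c = 0; c < 256; c++) {
        unsigned char bits = 0;
        if (isspace(c)) bits |= CT_SPACE;
        if (isalpha(c)) bits |= CT_LETTER;
        if (isdigit(c)) bits |= CT_DIGIT;
        if (isalnum(c) || c == '_') bits |= CT_WORD;
        table[c] = bits;
    }
}

/* Map each letter to its other case; other codes map to themselves. */
void build_flip_case_table(unsigned char table[256])
{
    for (int c = 0; c < 256; c++)
        table[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
}

/* Emit a table as C source, 16 values per line. */
void print_table(FILE *out, const char *name, const unsigned char table[256])
{
    fprintf(out, "const unsigned char %s[256] = {\n", name);
    for (int c = 0; c < 256; c++)
        fprintf(out, "%3u,%s", (unsigned)table[c], c % 16 == 15 ? "\n" : " ");
    fprintf(out, "};\n");
}
```

A generator like this would typically call setlocale(LC_CTYPE, "") before building the tables. Because the classification happens on the machine that runs the generator, the cross-compiling caveat above applies to this sketch as well.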
522 |
|
|
523 |
|
|
524 |
USING EBCDIC CODE |
USING EBCDIC CODE |
525 |
|
|
526 |
PCRE assumes by default that it will run in an environment where the |
PCRE assumes by default that it will run in an environment where the |
527 |
character code is ASCII (or Unicode, which is a superset of ASCII). |
character code is ASCII (or Unicode, which is a superset of ASCII). |
528 |
This is the case for most computer operating systems. PCRE can, how- |
This is the case for most computer operating systems. PCRE can, how- |
529 |
ever, be compiled to run in an EBCDIC environment by adding |
ever, be compiled to run in an EBCDIC environment by adding |
530 |
|
|
531 |
--enable-ebcdic |
--enable-ebcdic |
532 |
|
|
533 |
to the configure command. This setting implies --enable-rebuild-charta- |
to the configure command. This setting implies --enable-rebuild-charta- |
534 |
bles. You should only use it if you know that you are in an EBCDIC |
bles. You should only use it if you know that you are in an EBCDIC |
535 |
environment (for example, an IBM mainframe operating system). The |
environment (for example, an IBM mainframe operating system). The |
536 |
--enable-ebcdic option is incompatible with --enable-utf8. |
--enable-ebcdic option is incompatible with --enable-utf8. |
537 |
|
|
538 |
|
|
546 |
--enable-pcregrep-libbz2 |
--enable-pcregrep-libbz2 |
547 |
|
|
548 |
to the configure command. These options naturally require that the rel- |
to the configure command. These options naturally require that the rel- |
549 |
evant libraries are installed on your system. Configuration will fail |
evant libraries are installed on your system. Configuration will fail |
550 |
if they are not. |
if they are not. |
551 |
|
|
552 |
|
|
556 |
|
|
557 |
--enable-pcretest-libreadline |
--enable-pcretest-libreadline |
558 |
|
|
559 |
to the configure command, pcretest is linked with the libreadline |
to the configure command, pcretest is linked with the libreadline |
560 |
library, and when its input is from a terminal, it reads it using the |
library, and when its input is from a terminal, it reads it using the |
561 |
readline() function. This provides line-editing and history facilities. |
readline() function. This provides line-editing and history facilities. |
562 |
Note that libreadline is GPL-licenced, so if you distribute a binary of |
Note that libreadline is GPL-licenced, so if you distribute a binary of |
563 |
pcretest linked in this way, there may be licensing issues. |
pcretest linked in this way, there may be licensing issues. |
564 |
|
|
565 |
Setting this option causes the -lreadline option to be added to the |
Setting this option causes the -lreadline option to be added to the |
566 |
pcretest build. In many operating environments with a system-installed |
pcretest build. In many operating environments with a system-installed |
567 |
libreadline this is sufficient. However, in some environments (e.g. if |
libreadline this is sufficient. However, in some environments (e.g. if |
568 |
an unmodified distribution version of readline is in use), some extra |
an unmodified distribution version of readline is in use), some extra |
569 |
configuration may be necessary. The INSTALL file for libreadline says |
configuration may be necessary. The INSTALL file for libreadline says |
570 |
this: |
this: |
571 |
|
|
572 |
"Readline uses the termcap functions, but does not link with the |
"Readline uses the termcap functions, but does not link with the |
573 |
termcap or curses library itself, allowing applications which link |
termcap or curses library itself, allowing applications which link |
574 |
with readline the to choose an appropriate library." |
with readline the to choose an appropriate library." |
575 |
|
|
576 |
If your environment has not been set up so that an appropriate library |
If your environment has not been set up so that an appropriate library |
577 |
is automatically included, you may need to add something like |
is automatically included, you may need to add something like |
578 |
|
|
579 |
LIBS="-lncurses" |
LIBS="-lncurses" |
595 |
|
|
596 |
REVISION |
REVISION |
597 |
|
|
598 |
Last updated: 17 March 2009 |
Last updated: 06 September 2009 |
599 |
Copyright (c) 1997-2009 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
600 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
601 |
|
|
602 |
|
|
603 |
PCREMATCHING(3) PCREMATCHING(3) |
PCREMATCHING(3) PCREMATCHING(3) |
604 |
|
|
605 |
|
|
701 |
at the fourth character of the subject. The algorithm does not automat- |
at the fourth character of the subject. The algorithm does not automat- |
702 |
ically move on to find matches that start at later positions. |
ically move on to find matches that start at later positions. |
703 |
|
|
704 |
|
Although the general principle of this matching algorithm is that it |
705 |
|
scans the subject string only once, without backtracking, there is one |
706 |
|
exception: when a lookbehind assertion is encountered, the preceding |
707 |
|
characters have to be re-inspected. |
708 |
|
|
709 |
There are a number of features of PCRE regular expressions that are not |
There are a number of features of PCRE regular expressions that are not |
710 |
supported by the alternative matching algorithm. They are as follows: |
supported by the alternative matching algorithm. They are as follows: |
711 |
|
|
712 |
1. Because the algorithm finds all possible matches, the greedy or |
1. Because the algorithm finds all possible matches, the greedy or |
713 |
ungreedy nature of repetition quantifiers is not relevant. Greedy and |
ungreedy nature of repetition quantifiers is not relevant. Greedy and |
714 |
ungreedy quantifiers are treated in exactly the same way. However, pos- |
ungreedy quantifiers are treated in exactly the same way. However, pos- |
715 |
sessive quantifiers can make a difference when what follows could also |
sessive quantifiers can make a difference when what follows could also |
716 |
match what is quantified, for example in a pattern like this: |
match what is quantified, for example in a pattern like this: |
717 |
|
|
718 |
^a++\w! |
^a++\w! |
719 |
|
|
720 |
This pattern matches "aaab!" but not "aaa!", which would be matched by |
This pattern matches "aaab!" but not "aaa!", which would be matched by |
721 |
a non-possessive quantifier. Similarly, if an atomic group is present, |
a non-possessive quantifier. Similarly, if an atomic group is present, |
722 |
it is matched as if it were a standalone pattern at the current point, |
it is matched as if it were a standalone pattern at the current point, |
723 |
and the longest match is then "locked in" for the rest of the overall |
and the longest match is then "locked in" for the rest of the overall |
724 |
pattern. |
pattern. |
725 |
|
|
726 |
2. When dealing with multiple paths through the tree simultaneously, it |
2. When dealing with multiple paths through the tree simultaneously, it |
727 |
is not straightforward to keep track of captured substrings for the |
is not straightforward to keep track of captured substrings for the |
728 |
different matching possibilities, and PCRE's implementation of this |
different matching possibilities, and PCRE's implementation of this |
729 |
algorithm does not attempt to do this. This means that no captured sub- |
algorithm does not attempt to do this. This means that no captured sub- |
730 |
strings are available. |
strings are available. |
731 |
|
|
732 |
3. Because no substrings are captured, back references within the pat- |
3. Because no substrings are captured, back references within the pat- |
733 |
tern are not supported, and cause errors if encountered. |
tern are not supported, and cause errors if encountered. |
734 |
|
|
735 |
4. For the same reason, conditional expressions that use a backrefer- |
4. For the same reason, conditional expressions that use a backrefer- |
736 |
ence as the condition or test for a specific group recursion are not |
ence as the condition or test for a specific group recursion are not |
737 |
supported. |
supported. |
738 |
|
|
739 |
5. Because many paths through the tree may be active, the \K escape |
5. Because many paths through the tree may be active, the \K escape |
740 |
sequence, which resets the start of the match when encountered (but may |
sequence, which resets the start of the match when encountered (but may |
741 |
be on some paths and not on others), is not supported. It causes an |
be on some paths and not on others), is not supported. It causes an |
742 |
error if encountered. |
error if encountered. |
743 |
|
|
744 |
6. Callouts are supported, but the value of the capture_top field is |
6. Callouts are supported, but the value of the capture_top field is |
745 |
always 1, and the value of the capture_last field is always -1. |
always 1, and the value of the capture_last field is always -1. |
746 |
|
|
747 |
7. The \C escape sequence, which (in the standard algorithm) matches a |
7. The \C escape sequence, which (in the standard algorithm) matches a |
748 |
single byte, even in UTF-8 mode, is not supported because the alterna- |
single byte, even in UTF-8 mode, is not supported because the alterna- |
749 |
tive algorithm moves through the subject string one character at a |
tive algorithm moves through the subject string one character at a |
750 |
time, for all active paths through the tree. |
time, for all active paths through the tree. |
751 |
|
|
752 |
8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) |
8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) |
753 |
are not supported. (*FAIL) is supported, and behaves like a failing |
are not supported. (*FAIL) is supported, and behaves like a failing |
754 |
negative assertion. |
negative assertion. |
755 |
|
|
756 |
|
|
757 |
ADVANTAGES OF THE ALTERNATIVE ALGORITHM |
ADVANTAGES OF THE ALTERNATIVE ALGORITHM |
758 |
|
|
759 |
Using the alternative matching algorithm provides the following advan- |
Using the alternative matching algorithm provides the following advan- |
760 |
tages: |
tages: |
761 |
|
|
762 |
1. All possible matches (at a single point in the subject) are automat- |
1. All possible matches (at a single point in the subject) are automat- |
763 |
ically found, and in particular, the longest match is found. To find |
ically found, and in particular, the longest match is found. To find |
764 |
more than one match using the standard algorithm, you have to do kludgy |
more than one match using the standard algorithm, you have to do kludgy |
765 |
things with callouts. |
things with callouts. |
766 |
|
|
767 |
2. There is much better support for partial matching. The restrictions |
2. Because the alternative algorithm scans the subject string just |
768 |
on the content of the pattern that apply when using the standard algo- |
once, and never needs to backtrack, it is possible to pass very long |
769 |
rithm for partial matching do not apply to the alternative algorithm. |
subject strings to the matching function in several pieces, checking |
|
For non-anchored patterns, the starting position of a partial match is |
|
|
available. |
|
|
|
|
|
3. Because the alternative algorithm scans the subject string just |
|
|
once, and never needs to backtrack, it is possible to pass very long |
|
|
subject strings to the matching function in several pieces, checking |
|
770 |
for partial matching each time. |
for partial matching each time. |
771 |
|
|
772 |
|
|
774 |
|
|
775 |
The alternative algorithm suffers from a number of disadvantages: |
The alternative algorithm suffers from a number of disadvantages: |
776 |
|
|
777 |
1. It is substantially slower than the standard algorithm. This is |
1. It is substantially slower than the standard algorithm. This is |
778 |
partly because it has to search for all possible matches, but is also |
partly because it has to search for all possible matches, but is also |
779 |
because it is less susceptible to optimization. |
because it is less susceptible to optimization. |
780 |
|
|
781 |
2. Capturing parentheses and back references are not supported. |
2. Capturing parentheses and back references are not supported. |
793 |
|
|
794 |
REVISION |
REVISION |
795 |
|
|
796 |
Last updated: 19 April 2008 |
Last updated: 05 September 2009 |
797 |
Copyright (c) 1997-2008 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
798 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
799 |
|
|
800 |
|
|
801 |
PCREAPI(3) PCREAPI(3) |
PCREAPI(3) PCREAPI(3) |
802 |
|
|
803 |
|
|
905 |
pcre_exec() are used for compiling and matching regular expressions in |
pcre_exec() are used for compiling and matching regular expressions in |
906 |
a Perl-compatible manner. A sample program that demonstrates the sim- |
a Perl-compatible manner. A sample program that demonstrates the sim- |
907 |
plest way of using them is provided in the file called pcredemo.c in |
plest way of using them is provided in the file called pcredemo.c in |
908 |
the source distribution. The pcresample documentation describes how to |
the PCRE source distribution. A listing of this program is given in the |
909 |
compile and run it. |
pcredemo documentation, and the pcresample documentation describes how |
910 |
|
to compile and run it. |
911 |
|
|
912 |
A second matching function, pcre_dfa_exec(), which is not Perl-compati- |
A second matching function, pcre_dfa_exec(), which is not Perl-compati- |
913 |
ble, is also provided. This uses a different algorithm for the match- |
ble, is also provided. This uses a different algorithm for the match- |
914 |
ing. The alternative algorithm finds all possible matches (at a given |
ing. The alternative algorithm finds all possible matches (at a given |
915 |
point in the subject), and scans the subject just once. However, this |
point in the subject), and scans the subject just once (unless there |
916 |
algorithm does not return captured substrings. A description of the two |
are lookbehind assertions). However, this algorithm does not return |
917 |
matching algorithms and their advantages and disadvantages is given in |
captured substrings. A description of the two matching algorithms and |
918 |
the pcrematching documentation. |
their advantages and disadvantages is given in the pcrematching docu- |
919 |
|
mentation. |
920 |
|
|
921 |
In addition to the main compiling and matching functions, there are |
In addition to the main compiling and matching functions, there are |
922 |
convenience functions for extracting captured substrings from a subject |
convenience functions for extracting captured substrings from a subject |
1143 |
|
|
1144 |
The options argument contains various bit settings that affect the com- |
The options argument contains various bit settings that affect the com- |
1145 |
pilation. It should be zero if no options are required. The available |
pilation. It should be zero if no options are required. The available |
1146 |
options are described below. Some of them, in particular, those that |
options are described below. Some of them (in particular, those that |
1147 |
are compatible with Perl, can also be set and unset from within the |
are compatible with Perl, but also some others) can also be set and |
1148 |
pattern (see the detailed description in the pcrepattern documenta- |
unset from within the pattern (see the detailed description in the |
1149 |
tion). For these options, the contents of the options argument speci- |
pcrepattern documentation). For those options that can be different in |
1150 |
fies their initial settings at the start of compilation and execution. |
different parts of the pattern, the contents of the options argument |
1151 |
The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at the time |
specifies their initial settings at the start of compilation and execu- |
1152 |
of matching as well as at compile time. |
tion. The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at the |
1153 |
|
time of matching as well as at compile time. |
1154 |
|
|
1155 |
If errptr is NULL, pcre_compile() returns NULL immediately. Otherwise, |
If errptr is NULL, pcre_compile() returns NULL immediately. Otherwise, |
1156 |
if compilation of a pattern fails, pcre_compile() returns NULL, and |
if compilation of a pattern fails, pcre_compile() returns NULL, and |
1157 |
sets the variable pointed to by errptr to point to a textual error mes- |
sets the variable pointed to by errptr to point to a textual error mes- |
1158 |
sage. This is a static string that is part of the library. You must not |
sage. This is a static string that is part of the library. You must not |
1159 |
try to free it. The offset from the start of the pattern to the charac- |
try to free it. The offset from the start of the pattern to the charac- |
1160 |
ter where the error was discovered is placed in the variable pointed to |
ter where the error was discovered is placed in the variable pointed to |
1161 |
by erroffset, which must not be NULL. If it is, an immediate error is |
by erroffset, which must not be NULL. If it is, an immediate error is |
1162 |
given. |
given. |
1163 |
|
|
1164 |
If pcre_compile2() is used instead of pcre_compile(), and the error- |
If pcre_compile2() is used instead of pcre_compile(), and the error- |
1165 |
codeptr argument is not NULL, a non-zero error code number is returned |
codeptr argument is not NULL, a non-zero error code number is returned |
1166 |
via this argument in the event of an error. This is in addition to the |
via this argument in the event of an error. This is in addition to the |
1167 |
textual error message. Error codes and messages are listed below. |
textual error message. Error codes and messages are listed below. |
1168 |
|
|
1169 |
If the final argument, tableptr, is NULL, PCRE uses a default set of |
If the final argument, tableptr, is NULL, PCRE uses a default set of |
1170 |
character tables that are built when PCRE is compiled, using the |
character tables that are built when PCRE is compiled, using the |
1171 |
default C locale. Otherwise, tableptr must be an address that is the |
default C locale. Otherwise, tableptr must be an address that is the |
1172 |
result of a call to pcre_maketables(). This value is stored with the |
result of a call to pcre_maketables(). This value is stored with the |
1173 |
compiled pattern, and used again by pcre_exec(), unless another table |
compiled pattern, and used again by pcre_exec(), unless another table |
1174 |
pointer is passed to it. For more discussion, see the section on locale |
pointer is passed to it. For more discussion, see the section on locale |
1175 |
support below. |
support below. |
1176 |
|
|
1177 |
This code fragment shows a typical straightforward call to pcre_com- |
This code fragment shows a typical straightforward call to pcre_com- |
1178 |
pile(): |
pile(): |
1179 |
|
|
1180 |
pcre *re; |
pcre *re; |
1187 |
&erroffset, /* for error offset */ |
&erroffset, /* for error offset */ |
1188 |
NULL); /* use default character tables */ |
NULL); /* use default character tables */ |
1189 |
|
|
1190 |
The following names for option bits are defined in the pcre.h header |
The following names for option bits are defined in the pcre.h header |
1191 |
file: |
file: |
1192 |
|
|
1193 |
PCRE_ANCHORED |
PCRE_ANCHORED |
1194 |
|
|
1195 |
If this bit is set, the pattern is forced to be "anchored", that is, it |
If this bit is set, the pattern is forced to be "anchored", that is, it |
1196 |
is constrained to match only at the first matching point in the string |
is constrained to match only at the first matching point in the string |
1197 |
that is being searched (the "subject string"). This effect can also be |
that is being searched (the "subject string"). This effect can also be |
1198 |
achieved by appropriate constructs in the pattern itself, which is the |
achieved by appropriate constructs in the pattern itself, which is the |
1199 |
only way to do it in Perl. |
only way to do it in Perl. |
1200 |
|
|
1201 |
PCRE_AUTO_CALLOUT |
PCRE_AUTO_CALLOUT |
1202 |
|
|
1203 |
If this bit is set, pcre_compile() automatically inserts callout items, |
If this bit is set, pcre_compile() automatically inserts callout items, |
1204 |
all with number 255, before each pattern item. For discussion of the |
all with number 255, before each pattern item. For discussion of the |
1205 |
callout facility, see the pcrecallout documentation. |
callout facility, see the pcrecallout documentation. |
1206 |
|
|
1207 |
PCRE_BSR_ANYCRLF |
PCRE_BSR_ANYCRLF |
1208 |
PCRE_BSR_UNICODE |
PCRE_BSR_UNICODE |
1209 |
|
|
1210 |
These options (which are mutually exclusive) control what the \R escape |
These options (which are mutually exclusive) control what the \R escape |
1211 |
sequence matches. The choice is either to match only CR, LF, or CRLF, |
sequence matches. The choice is either to match only CR, LF, or CRLF, |
1212 |
or to match any Unicode newline sequence. The default is specified when |
or to match any Unicode newline sequence. The default is specified when |
1213 |
PCRE is built. It can be overridden from within the pattern, or by set- |
PCRE is built. It can be overridden from within the pattern, or by set- |
1214 |
ting an option when a compiled pattern is matched. |
ting an option when a compiled pattern is matched. |
1215 |
|
|
1216 |
PCRE_CASELESS |
PCRE_CASELESS |
1217 |
|
|
1218 |
If this bit is set, letters in the pattern match both upper and lower |
If this bit is set, letters in the pattern match both upper and lower |
1219 |
case letters. It is equivalent to Perl's /i option, and it can be |
case letters. It is equivalent to Perl's /i option, and it can be |
1220 |
changed within a pattern by a (?i) option setting. In UTF-8 mode, PCRE |
changed within a pattern by a (?i) option setting. In UTF-8 mode, PCRE |
1221 |
always understands the concept of case for characters whose values are |
always understands the concept of case for characters whose values are |
1222 |
less than 128, so caseless matching is always possible. For characters |
less than 128, so caseless matching is always possible. For characters |
1223 |
with higher values, the concept of case is supported if PCRE is com- |
with higher values, the concept of case is supported if PCRE is com- |
1224 |
piled with Unicode property support, but not otherwise. If you want to |
piled with Unicode property support, but not otherwise. If you want to |
1225 |
use caseless matching for characters 128 and above, you must ensure |
use caseless matching for characters 128 and above, you must ensure |
1226 |
that PCRE is compiled with Unicode property support as well as with |
that PCRE is compiled with Unicode property support as well as with |
1227 |
UTF-8 support. |
UTF-8 support. |
1228 |
|
|
1229 |
PCRE_DOLLAR_ENDONLY |
PCRE_DOLLAR_ENDONLY |
1230 |
|
|
1231 |
If this bit is set, a dollar metacharacter in the pattern matches only |
If this bit is set, a dollar metacharacter in the pattern matches only |
1232 |
at the end of the subject string. Without this option, a dollar also |
at the end of the subject string. Without this option, a dollar also |
1233 |
matches immediately before a newline at the end of the string (but not |
matches immediately before a newline at the end of the string (but not |
1234 |
before any other newlines). The PCRE_DOLLAR_ENDONLY option is ignored |
before any other newlines). The PCRE_DOLLAR_ENDONLY option is ignored |
1235 |
if PCRE_MULTILINE is set. There is no equivalent to this option in |
if PCRE_MULTILINE is set. There is no equivalent to this option in |
1236 |
Perl, and no way to set it within a pattern. |
Perl, and no way to set it within a pattern. |
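As a rough illustration of the rule above, the toy C function below decides whether $ could match at a given offset when PCRE_MULTILINE is not set. It is a sketch, not PCRE's implementation: it assumes LF is the only newline character, and the name dollar_matches_at is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (not PCRE code): can '$' match at offset pos in a subject of
   length len when PCRE_MULTILINE is NOT set? Assumes LF is the newline. */
bool dollar_matches_at(const char *subject, size_t len, size_t pos,
                       bool dollar_endonly)
{
    if (pos == len)
        return true;              /* the very end always matches */
    if (dollar_endonly)
        return false;             /* PCRE_DOLLAR_ENDONLY: end only */
    /* Default: also just before a newline that is the last character. */
    return pos + 1 == len && subject[pos] == '\n';
}
```

For the subject "abc\n", offset 3 (just before the final newline) is accepted by default but rejected when dollar_endonly is true; a newline earlier in the subject never qualifies.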
1237 |
|
|
1238 |
PCRE_DOTALL |
PCRE_DOTALL |
1239 |
|
|
1240 |
If this bit is set, a dot metacharacter in the pattern matches all char- |
If this bit is set, a dot metacharacter in the pattern matches all char- |
1241 |
acters, including those that indicate newline. Without it, a dot does |
acters, including those that indicate newline. Without it, a dot does |
1242 |
not match when the current position is at a newline. This option is |
not match when the current position is at a newline. This option is |
1243 |
equivalent to Perl's /s option, and it can be changed within a pattern |
equivalent to Perl's /s option, and it can be changed within a pattern |
1244 |
by a (?s) option setting. A negative class such as [^a] always matches |
by a (?s) option setting. A negative class such as [^a] always matches |
1245 |
newline characters, independent of the setting of this option. |
newline characters, independent of the setting of this option. |
1246 |
|
|
1247 |
PCRE_DUPNAMES |
PCRE_DUPNAMES |
1248 |
|
|
1249 |
If this bit is set, names used to identify capturing subpatterns need |
If this bit is set, names used to identify capturing subpatterns need |
1250 |
not be unique. This can be helpful for certain types of pattern when it |
not be unique. This can be helpful for certain types of pattern when it |
1251 |
is known that only one instance of the named subpattern can ever be |
is known that only one instance of the named subpattern can ever be |
1252 |
matched. There are more details of named subpatterns below; see also |
matched. There are more details of named subpatterns below; see also |
1253 |
the pcrepattern documentation. |
the pcrepattern documentation. |
1254 |
|
|
1255 |
PCRE_EXTENDED |
PCRE_EXTENDED |
1256 |
|
|
1257 |
If this bit is set, whitespace data characters in the pattern are |
If this bit is set, whitespace data characters in the pattern are |
1258 |
totally ignored except when escaped or inside a character class. White- |
totally ignored except when escaped or inside a character class. White- |
1259 |
space does not include the VT character (code 11). In addition, charac- |
space does not include the VT character (code 11). In addition, charac- |
1260 |
ters between an unescaped # outside a character class and the next new- |
ters between an unescaped # outside a character class and the next new- |
1261 |
line, inclusive, are also ignored. This is equivalent to Perl's /x |
line, inclusive, are also ignored. This is equivalent to Perl's /x |
1262 |
option, and it can be changed within a pattern by a (?x) option set- |
option, and it can be changed within a pattern by a (?x) option set- |
1263 |
ting. |
ting. |
1264 |
|
|
1265 |
This option makes it possible to include comments inside complicated |
This option makes it possible to include comments inside complicated |
1266 |
patterns. Note, however, that this applies only to data characters. |
patterns. Note, however, that this applies only to data characters. |
1267 |
Whitespace characters may never appear within special character |
Whitespace characters may never appear within special character |
1268 |
sequences in a pattern, for example within the sequence (?( which |
sequences in a pattern, for example within the sequence (?( which |
1269 |
introduces a conditional subpattern. |
introduces a conditional subpattern. |
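The sketch below is a toy C model of what PCRE_EXTENDED discards, assuming LF ends a comment. The function strip_extended is hypothetical and is not how PCRE compiles patterns; it does not handle \Q...\E, POSIX classes such as [:alpha:], or the rule about whitespace inside special sequences.

```c
#include <stddef.h>
#include <string.h>

/* Toy model (not PCRE code) of what PCRE_EXTENDED ignores. Copies pattern
   p to out (room for strlen(p) + 1 bytes), dropping unescaped whitespace
   and #-comments that appear outside character classes. */
void strip_extended(const char *p, char *out)
{
    int in_class = 0;
    while (*p != '\0') {
        if (*p == '\\' && p[1] != '\0') {      /* escaped: keep both bytes */
            *out++ = *p++;
            *out++ = *p++;
        } else if (in_class) {                 /* classes are copied as-is */
            if (*p == ']')
                in_class = 0;
            *out++ = *p++;
        } else if (*p == '[') {
            in_class = 1;
            *out++ = *p++;
            if (*p == '^')
                *out++ = *p++;
            if (*p == ']')                     /* leading ] is a literal */
                *out++ = *p++;
        } else if (strchr(" \t\n\r\f", *p) != NULL) {
            p++;                               /* whitespace (not VT) */
        } else if (*p == '#') {
            while (*p != '\0' && *p != '\n')   /* comment up to and */
                p++;                           /* including the newline */
            if (*p == '\n')
                p++;
        } else {
            *out++ = *p++;
        }
    }
    *out = '\0';
}
```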
1270 |
|
|
1271 |
PCRE_EXTRA |
PCRE_EXTRA |
1272 |
|
|
1273 |
This option was invented in order to turn on additional functionality |
This option was invented in order to turn on additional functionality |
1274 |
of PCRE that is incompatible with Perl, but it is currently of very |
of PCRE that is incompatible with Perl, but it is currently of very |
1275 |
little use. When set, any backslash in a pattern that is followed by a |
little use. When set, any backslash in a pattern that is followed by a |
1276 |
letter that has no special meaning causes an error, thus reserving |
letter that has no special meaning causes an error, thus reserving |
1277 |
these combinations for future expansion. By default, as in Perl, a |
these combinations for future expansion. By default, as in Perl, a |
1278 |
backslash followed by a letter with no special meaning is treated as a |
backslash followed by a letter with no special meaning is treated as a |
1279 |
literal. (Perl can, however, be persuaded to give a warning for this.) |
literal. (Perl can, however, be persuaded to give a warning for this.) |
1280 |
There are at present no other features controlled by this option. It |
There are at present no other features controlled by this option. It |
1281 |
can also be set by a (?X) option setting within a pattern. |
can also be set by a (?X) option setting within a pattern. |
1282 |
|
|
1283 |
PCRE_FIRSTLINE |
PCRE_FIRSTLINE |
1284 |
|
|
1285 |
If this option is set, an unanchored pattern is required to match |
If this option is set, an unanchored pattern is required to match |
1286 |
before or at the first newline in the subject string, though the |
before or at the first newline in the subject string, though the |
1287 |
matched text may continue over the newline. |
matched text may continue over the newline. |
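A toy C model of the start-position rule for PCRE_FIRSTLINE, following the wording above and assuming LF newlines, could look like this. The name firstline_start_allowed is hypothetical, and this is not PCRE's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (not PCRE code): with PCRE_FIRSTLINE, a match may start at any
   offset up to and including the first newline; the matched text itself
   may run past that newline. Assumes LF is the newline character. */
bool firstline_start_allowed(const char *subject, size_t len, size_t start)
{
    const char *nl = memchr(subject, '\n', len);
    if (nl == NULL)
        return start <= len;   /* no newline: the whole subject is line one */
    return start <= (size_t)(nl - subject);
}
```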
1288 |
|
|
1289 |
PCRE_JAVASCRIPT_COMPAT |
PCRE_JAVASCRIPT_COMPAT |
1290 |
|
|
1291 |
If this option is set, PCRE's behaviour is changed in some ways so that |
If this option is set, PCRE's behaviour is changed in some ways so that |
1292 |
it is compatible with JavaScript rather than Perl. The changes are as |
it is compatible with JavaScript rather than Perl. The changes are as |
1293 |
follows: |
follows: |
1294 |
|
|
1295 |
(1) A lone closing square bracket in a pattern causes a compile-time |
(1) A lone closing square bracket in a pattern causes a compile-time |
1296 |
error, because this is illegal in JavaScript (by default it is treated |
error, because this is illegal in JavaScript (by default it is treated |
1297 |
as a data character). Thus, the pattern AB]CD becomes illegal when this |
as a data character). Thus, the pattern AB]CD becomes illegal when this |
1298 |
option is set. |
option is set. |
1299 |
|
|
1300 |
(2) At run time, a back reference to an unset subpattern group matches |
(2) At run time, a back reference to an unset subpattern group matches |
1301 |
an empty string (by default this causes the current matching alterna- |
an empty string (by default this causes the current matching alterna- |
1302 |
tive to fail). A pattern such as (\1)(a) succeeds when this option is |
tive to fail). A pattern such as (\1)(a) succeeds when this option is |
1303 |
set (assuming it can find an "a" in the subject), whereas it fails by |
set (assuming it can find an "a" in the subject), whereas it fails by |
1304 |
default, for Perl compatibility. |
default, for Perl compatibility. |
1305 |
|
|
1306 |
PCRE_MULTILINE |
PCRE_MULTILINE |
1307 |
|
|
1308 |
By default, PCRE treats the subject string as consisting of a single |
By default, PCRE treats the subject string as consisting of a single |
1309 |
line of characters (even if it actually contains newlines). The "start |
line of characters (even if it actually contains newlines). The "start |
1310 |
of line" metacharacter (^) matches only at the start of the string, |
of line" metacharacter (^) matches only at the start of the string, |
1311 |
while the "end of line" metacharacter ($) matches only at the end of |
while the "end of line" metacharacter ($) matches only at the end of |
1312 |
the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY |
the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY |
1313 |
is set). This is the same as Perl. |
is set). This is the same as Perl. |
1314 |
|
|
1315 |
When PCRE_MULTILINE is set, the "start of line" and "end of line" |
When PCRE_MULTILINE is set, the "start of line" and "end of line" |
1316 |
constructs match immediately following or immediately before internal |
constructs match immediately following or immediately before internal |
1317 |
newlines in the subject string, respectively, as well as at the very |
newlines in the subject string, respectively, as well as at the very |
1318 |
start and end. This is equivalent to Perl's /m option, and it can be |
start and end. This is equivalent to Perl's /m option, and it can be |
1319 |
changed within a pattern by a (?m) option setting. If there are no new- |
changed within a pattern by a (?m) option setting. If there are no new- |
1320 |
lines in a subject string, or no occurrences of ^ or $ in a pattern, |
lines in a subject string, or no occurrences of ^ or $ in a pattern, |
1321 |
setting PCRE_MULTILINE has no effect. |
setting PCRE_MULTILINE has no effect. |
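Assuming LF is the only newline character, the anchor rules for PCRE_MULTILINE can be sketched as two toy C predicates. The names are hypothetical, and the code illustrates the rules described above rather than PCRE's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (not PCRE code) of the anchors when PCRE_MULTILINE is set. */

/* '^': at the start, or just after an internal newline, that is, a
   newline that is not the final character of the subject. */
bool multiline_circumflex_at(const char *subject, size_t len, size_t pos)
{
    if (pos == 0)
        return true;
    return pos < len && subject[pos - 1] == '\n';
}

/* '$': at the end, or just before any newline. PCRE_DOLLAR_ENDONLY is
   ignored in this mode, so there is no parameter for it. */
bool multiline_dollar_at(const char *subject, size_t len, size_t pos)
{
    return pos == len || (pos < len && subject[pos] == '\n');
}
```

For the subject "ab\ncd\n", ^ can match at offsets 0 and 3 (but not 6, after the final newline), and $ at offsets 2, 5 and 6.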
1322 |
|
|
1323 |
PCRE_NEWLINE_CR |
PCRE_NEWLINE_CR |
1326 |
PCRE_NEWLINE_ANYCRLF |
PCRE_NEWLINE_ANYCRLF |
1327 |
PCRE_NEWLINE_ANY |
PCRE_NEWLINE_ANY |
1328 |
|
|
1329 |
These options override the default newline definition that was chosen |
These options override the default newline definition that was chosen |
1330 |
when PCRE was built. Setting the first or the second specifies that a |
when PCRE was built. Setting the first or the second specifies that a |
1331 |
newline is indicated by a single character (CR or LF, respectively). |
newline is indicated by a single character (CR or LF, respectively). |
1332 |
Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by the |
Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by the |
1333 |
two-character CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies |
two-character CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies |
1334 |
that any of the three preceding sequences should be recognized. Setting |
that any of the three preceding sequences should be recognized. Setting |
1335 |
PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be |
PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be |
1336 |
recognized. The Unicode newline sequences are the three just mentioned, |
recognized. The Unicode newline sequences are the three just mentioned, |
1337 |
plus the single characters VT (vertical tab, U+000B), FF (formfeed, |
plus the single characters VT (vertical tab, U+000B), FF (formfeed, |
1338 |
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS |
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS |
1339 |
(paragraph separator, U+2029). The last two are recognized only in |
(paragraph separator, U+2029). The last two are recognized only in |
1340 |
UTF-8 mode. |
UTF-8 mode. |
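To make the lists above concrete, here is a toy C function that reports the length of the newline sequence at a position. It works on already-decoded code points, which corresponds to UTF-8 mode (the only mode in which LS and PS are recognized). The name newline_length and its any_unicode flag are hypothetical, standing in for the choice between PCRE_NEWLINE_ANYCRLF and PCRE_NEWLINE_ANY.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (not PCRE code): length of the newline sequence starting at
   s[pos], or 0 if there is none. any_unicode = false models
   PCRE_NEWLINE_ANYCRLF; any_unicode = true models PCRE_NEWLINE_ANY. */
size_t newline_length(const uint32_t *s, size_t len, size_t pos,
                      bool any_unicode)
{
    if (pos >= len)
        return 0;
    switch (s[pos]) {
    case 0x0D:                   /* CR, possibly the start of CRLF */
        return (pos + 1 < len && s[pos + 1] == 0x0A) ? 2 : 1;
    case 0x0A:                   /* LF */
        return 1;
    case 0x0B:                   /* VT */
    case 0x0C:                   /* FF */
    case 0x85:                   /* NEL */
    case 0x2028:                 /* LS */
    case 0x2029:                 /* PS */
        return any_unicode ? 1 : 0;
    default:
        return 0;
    }
}
```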
1341 |
|
|
1342 |
The newline setting in the options word uses three bits that are |
The newline setting in the options word uses three bits that are |
1343 |
treated as a number, giving eight possibilities. Currently only six are |
treated as a number, giving eight possibilities. Currently only six are |
1344 |
used (default plus the five values above). This means that if you set |
used (default plus the five values above). This means that if you set |
1345 |
more than one newline option, the combination may or may not be sensi- |
more than one newline option, the combination may or may not be sensi- |
1346 |
ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to |
ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to |
1347 |
PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers and |
PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers and |
1348 |
cause an error. |
cause an error. |
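The C sketch below makes the three-bit newline field concrete. It copies the newline option values from pcre.h into local macros. This is an assumption: the values are those of the 7.x/8.x headers, so check your own pcre.h. The helper newline_field_is_valid() is hypothetical and not part of the PCRE API. The sketch shows that CR | LF yields the CRLF value, while other combinations land on one of the two unused values.

```c
/* Newline option values as defined in pcre.h for 7.x/8.x (copied here as
   an assumption; include the real header in production code). They form
   a three-bit field occupying bits 20-22 of the options word. */
#define NL_CR       0x00100000UL  /* PCRE_NEWLINE_CR      (field value 1) */
#define NL_LF       0x00200000UL  /* PCRE_NEWLINE_LF      (field value 2) */
#define NL_CRLF     0x00300000UL  /* PCRE_NEWLINE_CRLF    (field value 3) */
#define NL_ANY      0x00400000UL  /* PCRE_NEWLINE_ANY     (field value 4) */
#define NL_ANYCRLF  0x00500000UL  /* PCRE_NEWLINE_ANYCRLF (field value 5) */
#define NL_MASK     0x00700000UL  /* all three bits of the field */

/* Hypothetical helper: extract the newline field from an options word and
   report whether it is one of the six values in use (0 means "use the
   build-time default"). Field values 6 and 7 are unused. */
static int newline_field_is_valid(unsigned long options)
{
    unsigned long field = options & NL_MASK;
    return field == 0 || field == NL_CR || field == NL_LF ||
           field == NL_CRLF || field == NL_ANY || field == NL_ANYCRLF;
}
```

Note that CR | ANY happens to equal the ANYCRLF value. A combination can therefore be accepted without meaning what was intended. Setting exactly one newline option avoids the question.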
1349 |
|
|
1350 |
The only time that a line break is specially recognized when compiling |
The only time that a line break is specially recognized when compiling |
1351 |
a pattern is if PCRE_EXTENDED is set, and an unescaped # outside a |
a pattern is if PCRE_EXTENDED is set, and an unescaped # outside a |
1352 |
character class is encountered. This indicates a comment that lasts |
character class is encountered. This indicates a comment that lasts |
1353 |
until after the next line break sequence. In other circumstances, line |
until after the next line break sequence. In other circumstances, line |
1354 |
break sequences are treated as literal data, except that in |
break sequences are treated as literal data, except that in |
1355 |
PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters |
PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters |
1356 |
and are therefore ignored. |
and are therefore ignored. |
1357 |
|
|
1361 |
PCRE_NO_AUTO_CAPTURE |
PCRE_NO_AUTO_CAPTURE |
1362 |
|
|
1363 |
If this option is set, it disables the use of numbered capturing paren- |
If this option is set, it disables the use of numbered capturing paren- |
1364 |
theses in the pattern. Any opening parenthesis that is not followed by |
theses in the pattern. Any opening parenthesis that is not followed by |
1365 |
? behaves as if it were followed by ?: but named parentheses can still |
? behaves as if it were followed by ?: but named parentheses can still |
1366 |
be used for capturing (and they acquire numbers in the usual way). |
be used for capturing (and they acquire numbers in the usual way). |
1367 |
There is no equivalent of this option in Perl. |
There is no equivalent of this option in Perl. |
1368 |
|
|
1369 |
PCRE_UNGREEDY |
PCRE_UNGREEDY |
1370 |
|
|
1371 |
This option inverts the "greediness" of the quantifiers so that they |
This option inverts the "greediness" of the quantifiers so that they |
1372 |
are not greedy by default, but become greedy if followed by "?". It is |
are not greedy by default, but become greedy if followed by "?". It is |
1373 |
not compatible with Perl. It can also be set by a (?U) option setting |
not compatible with Perl. It can also be set by a (?U) option setting |
1374 |
within the pattern. |
within the pattern. |
1375 |
|
|
1376 |
PCRE_UTF8 |
PCRE_UTF8 |
1377 |
|
|
1378 |
This option causes PCRE to regard both the pattern and the subject as |
This option causes PCRE to regard both the pattern and the subject as |
1379 |
strings of UTF-8 characters instead of single-byte character strings. |
strings of UTF-8 characters instead of single-byte character strings. |
1380 |
However, it is available only when PCRE is built to include UTF-8 sup- |
However, it is available only when PCRE is built to include UTF-8 sup- |
1381 |
port. If not, the use of this option provokes an error. Details of how |
port. If not, the use of this option provokes an error. Details of how |
1382 |
this option changes the behaviour of PCRE are given in the section on |
this option changes the behaviour of PCRE are given in the section on |
1383 |
UTF-8 support in the main pcre page. |
UTF-8 support in the main pcre page. |
1384 |
|
|
1385 |
PCRE_NO_UTF8_CHECK |
PCRE_NO_UTF8_CHECK |
1386 |
|
|
1387 |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is |
1388 |
automatically checked. There is a discussion about the validity of |
automatically checked. There is a discussion about the validity of |
1389 |
UTF-8 strings in the main pcre page. If an invalid UTF-8 sequence of |
UTF-8 strings in the main pcre page. If an invalid UTF-8 sequence of |
1390 |
bytes is found, pcre_compile() returns an error. If you already know |
bytes is found, pcre_compile() returns an error. If you already know |
1391 |
that your pattern is valid, and you want to skip this check for perfor- |
that your pattern is valid, and you want to skip this check for perfor- |
1392 |
mance reasons, you can set the PCRE_NO_UTF8_CHECK option. When it is |
mance reasons, you can set the PCRE_NO_UTF8_CHECK option. When it is |
1393 |
set, the effect of passing an invalid UTF-8 string as a pattern is |
set, the effect of passing an invalid UTF-8 string as a pattern is |
1394 |
undefined. It may cause your program to crash. Note that this option |
undefined. It may cause your program to crash. Note that this option |
1395 |
can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the |
can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the |
1396 |
UTF-8 validity checking of subject strings. |
UTF-8 validity checking of subject strings. |
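One way to "already know" that a pattern is valid is to check it once yourself, for example when it is loaded from a configuration file, and then pass PCRE_NO_UTF8_CHECK when compiling it. The sketch below is a hypothetical stand-alone checker, not PCRE's internal routine. It follows the RFC 3629 definition of UTF-8: no overlong forms, no surrogates, and nothing above U+10FFFF. That definition may not match the exact rules of a particular PCRE release in every detail.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): return 1 if the len bytes at s
   form well-formed UTF-8 as defined by RFC 3629, otherwise 0. */
static int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */

        if (c < 0x80) {                      /* ASCII: a single byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) { /* 0xC0/0xC1 would be overlong */
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) { /* 0xF5 and up exceed U+10FFFF */
            n = 3; cp = c & 0x07;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= n)
            return 0;   /* sequence truncated by the end of the string */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* UTF-16 surrogate code point */
        if (cp > 0x10FFFF)
            return 0;   /* beyond the Unicode code space */
        i += n + 1;
    }
    return 1;
}
```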
1397 |
|
|
1398 |
|
|
1399 |
COMPILATION ERROR CODES |
COMPILATION ERROR CODES |
1400 |
|
|
1401 |
The following table lists the error codes that may be returned by |
The following table lists the error codes that may be returned by |
1402 |
pcre_compile2(), along with the error messages that may be returned by |
pcre_compile2(), along with the error messages that may be returned by |
1403 |
both compiling functions. As PCRE has developed, some error codes have |
both compiling functions. As PCRE has developed, some error codes have |
1404 |
fallen out of use. To avoid confusion, they have not been re-used. |
fallen out of use. To avoid confusion, they have not been re-used. |
1405 |
|
|
1406 |
0 no error |
0 no error |
1456 |
50 [this code is not in use] |
50 [this code is not in use] |
1457 |
51 octal value is greater than \377 (not in UTF-8 mode) |
51 octal value is greater than \377 (not in UTF-8 mode) |
1458 |
52 internal error: overran compiling workspace |
52 internal error: overran compiling workspace |
1459 |
53 internal error: previously-checked referenced subpattern not |
53 internal error: previously-checked referenced subpattern not |
1460 |
found |
found |
1461 |
54 DEFINE group contains more than one branch |
54 DEFINE group contains more than one branch |
1462 |
55 repeating a DEFINE group is not allowed |
55 repeating a DEFINE group is not allowed |
1471 |
63 digit expected after (?+ |
63 digit expected after (?+ |
1472 |
64 ] is an invalid data character in JavaScript compatibility mode |
64 ] is an invalid data character in JavaScript compatibility mode |
1473 |
|
|
1474 |
The numbers 32 and 10000 in errors 48 and 49 are defaults; different |
The numbers 32 and 10000 in errors 48 and 49 are defaults; different |
1475 |
values may be used if the limits were changed when PCRE was built. |
values may be used if the limits were changed when PCRE was built. |
1476 |
|
|
1477 |
|
|
1480 |
pcre_extra *pcre_study(const pcre *code, int options, |
pcre_extra *pcre_study(const pcre *code, int options, |
1481 |
const char **errptr); |
const char **errptr); |
1482 |
|
|
1483 |
If a compiled pattern is going to be used several times, it is worth |
If a compiled pattern is going to be used several times, it is worth |
1484 |
spending more time analyzing it in order to speed up the time taken for |
spending more time analyzing it in order to speed up the time taken for |
1485 |
matching. The function pcre_study() takes a pointer to a compiled pat- |
matching. The function pcre_study() takes a pointer to a compiled pat- |
1486 |
tern as its first argument. If studying the pattern produces additional |
tern as its first argument. If studying the pattern produces additional |
1487 |
information that will help speed up matching, pcre_study() returns a |
information that will help speed up matching, pcre_study() returns a |
1488 |
pointer to a pcre_extra block, in which the study_data field points to |
pointer to a pcre_extra block, in which the study_data field points to |
1489 |
the results of the study. |
the results of the study. |
1490 |
|
|
1491 |
The returned value from pcre_study() can be passed directly to |
The returned value from pcre_study() can be passed directly to |
1492 |
pcre_exec(). However, a pcre_extra block also contains other fields |
pcre_exec(). However, a pcre_extra block also contains other fields |
1493 |
that can be set by the caller before the block is passed; these are |
that can be set by the caller before the block is passed; these are |
1494 |
described below in the section on matching a pattern. |
described below in the section on matching a pattern. |
1495 |
|
|
1496 |
If studying the pattern does not produce any additional information, |
If studying the pattern does not produce any additional information, |
1497 |
pcre_study() returns NULL. In that circumstance, if the calling program |
pcre_study() returns NULL. In that circumstance, if the calling program |
1498 |
wants to pass any of the other fields to pcre_exec(), it must set up |
wants to pass any of the other fields to pcre_exec(), it must set up |
1499 |
its own pcre_extra block. |
its own pcre_extra block. |
1500 |
|
|
1501 |
The second argument of pcre_study() contains option bits. At present, |
The second argument of pcre_study() contains option bits. At present, |
1502 |
no options are defined, and this argument should always be zero. |
no options are defined, and this argument should always be zero. |
1503 |
|
|
1504 |
The third argument for pcre_study() is a pointer for an error message. |
The third argument for pcre_study() is a pointer for an error message. |
1505 |
If studying succeeds (even if no data is returned), the variable it |
If studying succeeds (even if no data is returned), the variable it |
1506 |
points to is set to NULL. Otherwise it is set to point to a textual |
points to is set to NULL. Otherwise it is set to point to a textual |
1507 |
error message. This is a static string that is part of the library. You |
error message. This is a static string that is part of the library. You |
1508 |
must not try to free it. You should test the error pointer for NULL |
must not try to free it. You should test the error pointer for NULL |
1509 |
after calling pcre_study(), to be sure that it has run successfully. |
after calling pcre_study(), to be sure that it has run successfully. |
1510 |
|
|
1511 |
This is a typical call to pcre_study(): |
This is a typical call to pcre_study(): |
1517 |
&error); /* set to NULL or points to a message */ |
&error); /* set to NULL or points to a message */ |
1518 |
|
|
1519 |
At present, studying a pattern is useful only for non-anchored patterns |
At present, studying a pattern is useful only for non-anchored patterns |
1520 |
that do not have a single fixed starting character. A bitmap of possi- |
that do not have a single fixed starting character. A bitmap of possi- |
1521 |
ble starting bytes is created. |
ble starting bytes is created. |
1522 |
|
|
1523 |
|
|
1524 |
LOCALE SUPPORT |
LOCALE SUPPORT |
1525 |
|
|
1526 |
PCRE handles caseless matching, and determines whether characters are |
PCRE handles caseless matching, and determines whether characters are |
1527 |
letters, digits, or whatever, by reference to a set of tables, indexed |
letters, digits, or whatever, by reference to a set of tables, indexed |
1528 |
by character value. When running in UTF-8 mode, this applies only to |
by character value. When running in UTF-8 mode, this applies only to |
1529 |
characters with codes less than 128. Higher-valued codes never match |
characters with codes less than 128. Higher-valued codes never match |
1530 |
escapes such as \w or \d, but can be tested with \p if PCRE is built |
escapes such as \w or \d, but can be tested with \p if PCRE is built |
1531 |
with Unicode character property support. The use of locales with Uni- |
with Unicode character property support. The use of locales with Uni- |
1532 |
code is discouraged. If you are handling characters with codes greater |
code is discouraged. If you are handling characters with codes greater |
1533 |
than 128, you should either use UTF-8 and Unicode, or use locales, but |
than 128, you should either use UTF-8 and Unicode, or use locales, but |
1534 |
not try to mix the two. |
not try to mix the two. |
1535 |
|
|
1536 |
PCRE contains an internal set of tables that are used when the final |
PCRE contains an internal set of tables that are used when the final |
1537 |
argument of pcre_compile() is NULL. These are sufficient for many |
argument of pcre_compile() is NULL. These are sufficient for many |
1538 |
applications. Normally, the internal tables recognize only ASCII char- |
applications. Normally, the internal tables recognize only ASCII char- |
1539 |
acters. However, when PCRE is built, it is possible to cause the inter- |
acters. However, when PCRE is built, it is possible to cause the inter- |
1540 |
nal tables to be rebuilt in the default "C" locale of the local system, |
nal tables to be rebuilt in the default "C" locale of the local system, |
1541 |
which may cause them to be different. |
which may cause them to be different. |
1542 |
|
|
1543 |
The internal tables can always be overridden by tables supplied by the |
The internal tables can always be overridden by tables supplied by the |
1544 |
application that calls PCRE. These may be created in a different locale |
application that calls PCRE. These may be created in a different locale |
1545 |
from the default. As more and more applications change to using Uni- |
from the default. As more and more applications change to using Uni- |
1546 |
code, the need for this locale support is expected to die away. |
code, the need for this locale support is expected to die away. |
1547 |
|
|
1548 |
External tables are built by calling the pcre_maketables() function, |
External tables are built by calling the pcre_maketables() function, |
1549 |
which has no arguments, in the relevant locale. The result can then be |
which has no arguments, in the relevant locale. The result can then be |
1550 |
passed to pcre_compile() or pcre_exec() as often as necessary. For |
passed to pcre_compile() or pcre_exec() as often as necessary. For |
1551 |
example, to build and use tables that are appropriate for the French |
example, to build and use tables that are appropriate for the French |
1552 |
locale (where accented characters with values greater than 128 are |
locale (where accented characters with values greater than 128 are |
1553 |
treated as letters), the following code could be used: |
treated as letters), the following code could be used: |
1554 |
|
|
1555 |
setlocale(LC_CTYPE, "fr_FR"); |
setlocale(LC_CTYPE, "fr_FR"); |
1556 |
tables = pcre_maketables(); |
tables = pcre_maketables(); |
1557 |
re = pcre_compile(..., tables); |
re = pcre_compile(..., tables); |
1558 |
|
|
1559 |
The locale name "fr_FR" is used on Linux and other Unix-like systems; |
The locale name "fr_FR" is used on Linux and other Unix-like systems; |
1560 |
if you are using Windows, the name for the French locale is "french". |
if you are using Windows, the name for the French locale is "french". |
1561 |
|
|
1562 |
When pcre_maketables() runs, the tables are built in memory that is |
When pcre_maketables() runs, the tables are built in memory that is |
1563 |
obtained via pcre_malloc. It is the caller's responsibility to ensure |
obtained via pcre_malloc. It is the caller's responsibility to ensure |
1564 |
that the memory containing the tables remains available for as long as |
that the memory containing the tables remains available for as long as |
1565 |
it is needed. |
it is needed. |
1566 |
|
|
1567 |
The pointer that is passed to pcre_compile() is saved with the compiled |
The pointer that is passed to pcre_compile() is saved with the compiled |
1568 |
pattern, and the same tables are used via this pointer by pcre_study() |
pattern, and the same tables are used via this pointer by pcre_study() |
1569 |
and normally also by pcre_exec(). Thus, by default, for any single pat- |
and normally also by pcre_exec(). Thus, by default, for any single pat- |
1570 |
tern, compilation, studying and matching all happen in the same locale, |
tern, compilation, studying and matching all happen in the same locale, |
1571 |
but different patterns can be compiled in different locales. |
but different patterns can be compiled in different locales. |
1572 |
|
|
1573 |
It is possible to pass a table pointer or NULL (indicating the use of |
It is possible to pass a table pointer or NULL (indicating the use of |
1574 |
the internal tables) to pcre_exec(). Although not intended for this |
the internal tables) to pcre_exec(). Although not intended for this |
1575 |
purpose, this facility could be used to match a pattern in a different |
purpose, this facility could be used to match a pattern in a different |
1576 |
locale from the one in which it was compiled. Passing table pointers at |
locale from the one in which it was compiled. Passing table pointers at |
1577 |
run time is discussed below in the section on matching a pattern. |
run time is discussed below in the section on matching a pattern. |
1578 |
|
|
1582 |
int pcre_fullinfo(const pcre *code, const pcre_extra *extra, |
int pcre_fullinfo(const pcre *code, const pcre_extra *extra, |
1583 |
int what, void *where); |
int what, void *where); |
1584 |
|
|
1585 |
The pcre_fullinfo() function returns information about a compiled pat- |
The pcre_fullinfo() function returns information about a compiled pat- |
1586 |
tern. It replaces the obsolete pcre_info() function, which is neverthe- |
tern. It replaces the obsolete pcre_info() function, which is neverthe- |
1587 |
less retained for backwards compatibility (and is documented below). |
less retained for backwards compatibility (and is documented below). |
1588 |
|
|
1589 |
The first argument for pcre_fullinfo() is a pointer to the compiled |
The first argument for pcre_fullinfo() is a pointer to the compiled |
1590 |
pattern. The second argument is the result of pcre_study(), or NULL if |
pattern. The second argument is the result of pcre_study(), or NULL if |
1591 |
the pattern was not studied. The third argument specifies which piece |
the pattern was not studied. The third argument specifies which piece |
1592 |
of information is required, and the fourth argument is a pointer to a |
of information is required, and the fourth argument is a pointer to a |
1593 |
variable to receive the data. The yield of the function is zero for |
variable to receive the data. The yield of the function is zero for |
1594 |
success, or one of the following negative numbers: |
success, or one of the following negative numbers: |
1595 |
|
|
1596 |
PCRE_ERROR_NULL the argument code was NULL |
PCRE_ERROR_NULL the argument code was NULL |
1598 |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
1599 |
PCRE_ERROR_BADOPTION the value of what was invalid |
PCRE_ERROR_BADOPTION the value of what was invalid |
1600 |
|
|
1601 |
The "magic number" is placed at the start of each compiled pattern as |
The "magic number" is placed at the start of each compiled pattern as |
1602 |
a simple check against passing an arbitrary memory pointer. Here is a |
a simple check against passing an arbitrary memory pointer. Here is a |
1603 |
typical call of pcre_fullinfo(), to obtain the length of the compiled |
typical call of pcre_fullinfo(), to obtain the length of the compiled |
1604 |
pattern: |
pattern: |
1605 |
|
|
1606 |
int rc; |
int rc; |
1611 |
PCRE_INFO_SIZE, /* what is required */ |
PCRE_INFO_SIZE, /* what is required */ |
1612 |
&length); /* where to put the data */ |
&length); /* where to put the data */ |
1613 |
|
|
1614 |
The possible values for the third argument are defined in pcre.h, and |
The possible values for the third argument are defined in pcre.h, and |
1615 |
are as follows: |
are as follows: |
1616 |
|
|
1617 |
PCRE_INFO_BACKREFMAX |
PCRE_INFO_BACKREFMAX |
1618 |
|
|
1619 |
Return the number of the highest back reference in the pattern. The |
Return the number of the highest back reference in the pattern. The |
1620 |
fourth argument should point to an int variable. Zero is returned if |
fourth argument should point to an int variable. Zero is returned if |
1621 |
there are no back references. |
there are no back references. |
1622 |
|
|
1623 |
PCRE_INFO_CAPTURECOUNT |
PCRE_INFO_CAPTURECOUNT |
1624 |
|
|
1625 |
Return the number of capturing subpatterns in the pattern. The fourth |
Return the number of capturing subpatterns in the pattern. The fourth |
1626 |
argument should point to an int variable. |
argument should point to an int variable. |
1627 |
|
|
1628 |
PCRE_INFO_DEFAULT_TABLES |
PCRE_INFO_DEFAULT_TABLES |
1629 |
|
|
1630 |
Return a pointer to the internal default character tables within PCRE. |
Return a pointer to the internal default character tables within PCRE. |
1631 |
The fourth argument should point to an unsigned char * variable. This |
The fourth argument should point to an unsigned char * variable. This |
1632 |
information call is provided for internal use by the pcre_study() func- |
information call is provided for internal use by the pcre_study() func- |
1633 |
tion. External callers can cause PCRE to use its internal tables by |
tion. External callers can cause PCRE to use its internal tables by |
1634 |
passing a NULL table pointer. |
passing a NULL table pointer. |
1635 |
|
|
1636 |
PCRE_INFO_FIRSTBYTE |
PCRE_INFO_FIRSTBYTE |
1637 |
|
|
1638 |
Return information about the first byte of any matched string, for a |
Return information about the first byte of any matched string, for a |
1639 |
non-anchored pattern. The fourth argument should point to an int vari- |
non-anchored pattern. The fourth argument should point to an int vari- |
1640 |
able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old name |
able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old name |
1641 |
is still recognized for backwards compatibility.) |
is still recognized for backwards compatibility.) |
1642 |
|
|
1643 |
If there is a fixed first byte, for example, from a pattern such as |
If there is a fixed first byte, for example, from a pattern such as |
1644 |
(cat|cow|coyote), its value is returned. Otherwise, if either |
(cat|cow|coyote), its value is returned. Otherwise, if either |
1645 |
|
|
1646 |
(a) the pattern was compiled with the PCRE_MULTILINE option, and every |
(a) the pattern was compiled with the PCRE_MULTILINE option, and every |
1647 |
branch starts with "^", or |
branch starts with "^", or |
1648 |
|
|
1649 |
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not |
(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not |
1650 |
set (if it were set, the pattern would be anchored), |
set (if it were set, the pattern would be anchored), |
1651 |
|
|
1652 |
-1 is returned, indicating that the pattern matches only at the start |
-1 is returned, indicating that the pattern matches only at the start |
1653 |
of a subject string or after any newline within the string. Otherwise |
of a subject string or after any newline within the string. Otherwise |
1654 |
-2 is returned. For anchored patterns, -2 is returned. |
-2 is returned. For anchored patterns, -2 is returned. |
1655 |
|
|
1656 |
PCRE_INFO_FIRSTTABLE |
PCRE_INFO_FIRSTTABLE |
1657 |
|
|
1658 |
If the pattern was studied, and this resulted in the construction of a |
If the pattern was studied, and this resulted in the construction of a |
1659 |
256-bit table indicating a fixed set of bytes for the first byte in any |
256-bit table indicating a fixed set of bytes for the first byte in any |
1660 |
matching string, a pointer to the table is returned. Otherwise NULL is |
matching string, a pointer to the table is returned. Otherwise NULL is |
1661 |
returned. The fourth argument should point to an unsigned char * vari- |
returned. The fourth argument should point to an unsigned char * vari- |
1662 |
able. |
able. |
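The table returned for PCRE_INFO_FIRSTTABLE has 256 bits, or 32 bytes, with one bit per possible byte value. The C sketch below tests and sets a bit by using byte c / 8 and bit c & 7 within that byte. This appears to be the layout that PCRE's own matcher uses, but treat the bit order as an assumption. Both helpers are hypothetical.

```c
/* Hypothetical helpers for a 256-bit (32-byte) start table. Byte value c
   is represented by bit (c & 7) of table[c / 8]; this bit order is an
   assumption based on how PCRE indexes its own start bitmap. */
static int may_start_match(const unsigned char table[32], unsigned char c)
{
    return (table[c / 8] & (1u << (c & 7))) != 0;
}

/* Mark byte value c as a possible first byte (useful for test fixtures). */
static void mark_start_byte(unsigned char table[32], unsigned char c)
{
    table[c / 8] |= (unsigned char)(1u << (c & 7));
}
```

A non-anchored pattern such as (cat|dog) has no single fixed first byte. It would typically yield a table in which only the bits for 'c' and 'd' are set. A matcher can then skip any subject position whose byte is not marked.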
1663 |
|
|
1664 |
PCRE_INFO_HASCRORLF |
PCRE_INFO_HASCRORLF |
1665 |
|
|
1666 |
Return 1 if the pattern contains any explicit matches for CR or LF |
Return 1 if the pattern contains any explicit matches for CR or LF |
1667 |
characters, otherwise 0. The fourth argument should point to an int |
characters, otherwise 0. The fourth argument should point to an int |
1668 |
variable. An explicit match is either a literal CR or LF character, or |
variable. An explicit match is either a literal CR or LF character, or |
1669 |
\r or \n. |
\r or \n. |
1670 |
|
|
1671 |
PCRE_INFO_JCHANGED |
PCRE_INFO_JCHANGED |
1672 |
|
|
1673 |
Return 1 if the (?J) or (?-J) option setting is used in the pattern, |
Return 1 if the (?J) or (?-J) option setting is used in the pattern, |
1674 |
otherwise 0. The fourth argument should point to an int variable. (?J) |
otherwise 0. The fourth argument should point to an int variable. (?J) |
1675 |
and (?-J) set and unset the local PCRE_DUPNAMES option, respectively. |
and (?-J) set and unset the local PCRE_DUPNAMES option, respectively. |
1676 |
|
|
1677 |
PCRE_INFO_LASTLITERAL |
PCRE_INFO_LASTLITERAL |
1678 |
|
|
1679 |
Return the value of the rightmost literal byte that must exist in any |
Return the value of the rightmost literal byte that must exist in any |
1680 |
matched string, other than at its start, if such a byte has been |
matched string, other than at its start, if such a byte has been |
1681 |
recorded. The fourth argument should point to an int variable. If there |
recorded. The fourth argument should point to an int variable. If there |
1682 |
is no such byte, -1 is returned. For anchored patterns, a last literal |
is no such byte, -1 is returned. For anchored patterns, a last literal |
1683 |
byte is recorded only if it follows something of variable length. For |
byte is recorded only if it follows something of variable length. For |
1684 |
example, for the pattern /^a\d+z\d+/ the returned value is "z", but for |
example, for the pattern /^a\d+z\d+/ the returned value is "z", but for |
1685 |
/^a\dz\d/ the returned value is -1. |
/^a\dz\d/ the returned value is -1. |
1686 |
|
|
1688 |
PCRE_INFO_NAMEENTRYSIZE |
PCRE_INFO_NAMEENTRYSIZE |
1689 |
PCRE_INFO_NAMETABLE |
PCRE_INFO_NAMETABLE |
1690 |
|
|
1691 |
PCRE supports the use of named as well as numbered capturing parenthe- |
PCRE supports the use of named as well as numbered capturing parenthe- |
1692 |
ses. The names are just an additional way of identifying the parenthe- |
ses. The names are just an additional way of identifying the parenthe- |
1693 |
ses, which still acquire numbers. Several convenience functions such as |
ses, which still acquire numbers. Several convenience functions such as |
1694 |
pcre_get_named_substring() are provided for extracting captured sub- |
pcre_get_named_substring() are provided for extracting captured sub- |
1695 |
strings by name. It is also possible to extract the data directly, by |
strings by name. It is also possible to extract the data directly, by |
1696 |
first converting the name to a number in order to access the correct |
first converting the name to a number in order to access the correct |
1697 |
pointers in the output vector (described with pcre_exec() below). To do |
pointers in the output vector (described with pcre_exec() below). To do |
1698 |
the conversion, you need to use the name-to-number map, which is |
the conversion, you need to use the name-to-number map, which is |
1699 |
described by these three values. |
described by these three values. |
1700 |
|
|
1701 |
The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT |
The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT |
1702 |
gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size |
gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size |
1703 |
of each entry; both of these return an int value. The entry size |
of each entry; both of these return an int value. The entry size |
1704 |
depends on the length of the longest name. PCRE_INFO_NAMETABLE returns |
depends on the length of the longest name. PCRE_INFO_NAMETABLE returns |
1705 |
a pointer to the first entry of the table (a pointer to char). The |
a pointer to the first entry of the table (a pointer to char). The |
1706 |
first two bytes of each entry are the number of the capturing parenthe- |
first two bytes of each entry are the number of the capturing parenthe- |
1707 |
sis, most significant byte first. The rest of the entry is the corre- |
sis, most significant byte first. The rest of the entry is the corre- |
1708 |
sponding name, zero terminated. The names are in alphabetical order. |
sponding name, zero terminated. The names are in alphabetical order. |
1709 |
When PCRE_DUPNAMES is set, duplicate names are in order of their paren- |
When PCRE_DUPNAMES is set, duplicate names are in order of their paren- |
1710 |
theses numbers. For example, consider the following pattern (assume |
theses numbers. For example, consider the following pattern (assume |
1711 |
PCRE_EXTENDED is set, so white space - including newlines - is |
PCRE_EXTENDED is set, so white space - including newlines - is |
1712 |
ignored): |
ignored): |
1713 |
|
|
1714 |
(?<date> (?<year>(\d\d)?\d\d) - |
(?<date> (?<year>(\d\d)?\d\d) - |
1715 |
(?<month>\d\d) - (?<day>\d\d) ) |
(?<month>\d\d) - (?<day>\d\d) ) |
1716 |
|
|
1717 |
There are four named subpatterns, so the table has four entries, and |
There are four named subpatterns, so the table has four entries, and |
1718 |
each entry in the table is eight bytes long. The table is as follows, |
each entry in the table is eight bytes long. The table is as follows, |
1719 |
with non-printing bytes shown in hexadecimal, and undefined bytes shown |
with non-printing bytes shown in hexadecimal, and undefined bytes shown |
1720 |
as ??: |
as ??: |
1721 |
|
|
1724 |
00 04 m o n t h 00 |
00 04 m o n t h 00 |
1725 |
00 02 y e a r 00 ?? |
00 02 y e a r 00 ?? |
1726 |
|
|
1727 |
When writing code to extract data from named subpatterns using the |
When writing code to extract data from named subpatterns using the |
1728 |
name-to-number map, remember that the length of the entries is likely |
name-to-number map, remember that the length of the entries is likely |
1729 |
to be different for each compiled pattern. |
to be different for each compiled pattern. |
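The C sketch below illustrates the layout just described by searching a name-to-number table for a given name. The table literal reproduces all four entries for the date pattern above, with the undefined ?? bytes written as zero. The function name_to_number() is a hypothetical helper. In a real program, the table pointer, entry count and entry size would come from pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <string.h>

/* The four entries for the date pattern above: a two-byte group number
   (most significant byte first), then the zero-terminated name. The
   entry size is 8; the undefined "??" padding bytes are written as 0. */
static const unsigned char date_names[] = {
    0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
    0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00,
};

/* Hypothetical helper: return the group number for name, or -1 if the
   name is absent. A linear scan keeps the sketch short; because the
   names are sorted, a binary search would also work. With duplicate
   names, this returns the lowest-numbered group. */
static int name_to_number(const unsigned char *table, int namecount,
                          int entrysize, const char *name)
{
    for (int i = 0; i < namecount; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entrysize;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

A group number n found this way selects the pair ovector[2*n] and ovector[2*n+1] in the output vector described with pcre_exec(). PCRE also provides pcre_get_stringnumber(), which performs this lookup for you.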
1730 |
|
|
1731 |
PCRE_INFO_OKPARTIAL |
PCRE_INFO_OKPARTIAL |
1732 |
|
|
1733 |
Return 1 if the pattern can be used for partial matching, otherwise 0. |
Return 1 if the pattern can be used for partial matching with |
1734 |
The fourth argument should point to an int variable. The pcrepartial |
pcre_exec(), otherwise 0. The fourth argument should point to an int |
1735 |
documentation lists the restrictions that apply to patterns when par- |
variable. From release 8.00, this always returns 1, because the |
1736 |
tial matching is used. |
restrictions that previously applied to partial matching have been |
1737 |
|
lifted. The pcrepartial documentation gives details of partial match- |
1738 |
|
ing. |
1739 |
|
|
1740 |
PCRE_INFO_OPTIONS |
PCRE_INFO_OPTIONS |
1741 |
|
|
1742 |
Return a copy of the options with which the pattern was compiled. The |
Return a copy of the options with which the pattern was compiled. The |
1743 |
fourth argument should point to an unsigned long int variable. These |
fourth argument should point to an unsigned long int variable. These |
1744 |
option bits are those specified in the call to pcre_compile(), modified |
option bits are those specified in the call to pcre_compile(), modified |
1745 |
by any top-level option settings at the start of the pattern itself. In |
by any top-level option settings at the start of the pattern itself. In |
1746 |
other words, they are the options that will be in force when matching |
other words, they are the options that will be in force when matching |
1747 |
starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with |
starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with |
1748 |
the PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE, |
the PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE, |
1749 |
and PCRE_EXTENDED. |
and PCRE_EXTENDED. |
1750 |
|
|
1751 |
A pattern is automatically anchored by PCRE if all of its top-level |
A pattern is automatically anchored by PCRE if all of its top-level |
1752 |
alternatives begin with one of the following: |
alternatives begin with one of the following: |
1753 |
|
|
1754 |
^ unless PCRE_MULTILINE is set |
^ unless PCRE_MULTILINE is set |
1762 |
|
|
1763 |
PCRE_INFO_SIZE |
PCRE_INFO_SIZE |
1764 |
|
|
1765 |
Return the size of the compiled pattern, that is, the value that was |
Return the size of the compiled pattern, that is, the value that was |
1766 |
passed as the argument to pcre_malloc() when PCRE was getting memory in |
passed as the argument to pcre_malloc() when PCRE was getting memory in |
1767 |
which to place the compiled data. The fourth argument should point to a |
which to place the compiled data. The fourth argument should point to a |
1768 |
size_t variable. |
size_t variable. |
1770 |
PCRE_INFO_STUDYSIZE |
PCRE_INFO_STUDYSIZE |
1771 |
|
|
1772 |
Return the size of the data block pointed to by the study_data field in |
Return the size of the data block pointed to by the study_data field in |
1773 |
a pcre_extra block. That is, it is the value that was passed to |
a pcre_extra block. That is, it is the value that was passed to |
1774 |
pcre_malloc() when PCRE was getting memory into which to place the data |
pcre_malloc() when PCRE was getting memory into which to place the data |
1775 |
created by pcre_study(). The fourth argument should point to a size_t |
created by pcre_study(). The fourth argument should point to a size_t |
1776 |
variable. |
variable. |
1777 |
|
|
1778 |
|
|
1780 |
|
|
1781 |
int pcre_info(const pcre *code, int *optptr, int *firstcharptr); |
int pcre_info(const pcre *code, int *optptr, int *firstcharptr); |
1782 |
|
|
1783 |
The pcre_info() function is now obsolete because its interface is too |
The pcre_info() function is now obsolete because its interface is too |
1784 |
restrictive to return all the available data about a compiled pattern. |
restrictive to return all the available data about a compiled pattern. |
1785 |
New programs should use pcre_fullinfo() instead. The yield of |
New programs should use pcre_fullinfo() instead. The yield of |
1786 |
pcre_info() is the number of capturing subpatterns, or one of the fol- |
pcre_info() is the number of capturing subpatterns, or one of the fol- |
1787 |
lowing negative numbers: |
lowing negative numbers: |
1788 |
|
|
1789 |
PCRE_ERROR_NULL the argument code was NULL |
PCRE_ERROR_NULL the argument code was NULL |
1790 |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
1791 |
|
|
1792 |
If the optptr argument is not NULL, a copy of the options with which |
If the optptr argument is not NULL, a copy of the options with which |
1793 |
the pattern was compiled is placed in the integer it points to (see |
the pattern was compiled is placed in the integer it points to (see |
1794 |
PCRE_INFO_OPTIONS above). |
PCRE_INFO_OPTIONS above). |
1795 |
|
|
1796 |
If the pattern is not anchored and the firstcharptr argument is not |
If the pattern is not anchored and the firstcharptr argument is not |
1797 |
NULL, it is used to pass back information about the first character of |
NULL, it is used to pass back information about the first character of |
1798 |
any matched string (see PCRE_INFO_FIRSTBYTE above). |
any matched string (see PCRE_INFO_FIRSTBYTE above). |
1799 |
|
|
1800 |
|
|
1802 |
|
|
1803 |
int pcre_refcount(pcre *code, int adjust); |
int pcre_refcount(pcre *code, int adjust); |
1804 |
|
|
1805 |
The pcre_refcount() function is used to maintain a reference count in |
The pcre_refcount() function is used to maintain a reference count in |
1806 |
the data block that contains a compiled pattern. It is provided for the |
the data block that contains a compiled pattern. It is provided for the |
1807 |
benefit of applications that operate in an object-oriented manner, |
benefit of applications that operate in an object-oriented manner, |
1808 |
where different parts of the application may be using the same compiled |
where different parts of the application may be using the same compiled |
1809 |
pattern, but you want to free the block when they are all done. |
pattern, but you want to free the block when they are all done. |
1810 |
|
|
1811 |
When a pattern is compiled, the reference count field is initialized to |
When a pattern is compiled, the reference count field is initialized to |
1812 |
zero. It is changed only by calling this function, whose action is to |
zero. It is changed only by calling this function, whose action is to |
1813 |
add the adjust value (which may be positive or negative) to it. The |
add the adjust value (which may be positive or negative) to it. The |
1814 |
yield of the function is the new value. However, the value of the count |
yield of the function is the new value. However, the value of the count |
1815 |
is constrained to lie between 0 and 65535, inclusive. If the new value |
is constrained to lie between 0 and 65535, inclusive. If the new value |
1816 |
is outside these limits, it is forced to the appropriate limit value. |
is outside these limits, it is forced to the appropriate limit value. |
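The clamping rule above can be modelled without linking against PCRE. The sketch below uses a hypothetical stand-alone helper, refcount_adjust(), that adds an adjustment to a count and forces the result into the documented range 0 to 65535. It is not PCRE's implementation; a real program would call pcre_refcount() itself.

```c
/* Hypothetical helper (not part of PCRE): models the documented behaviour
   of pcre_refcount(), which adds adjust to the stored count and forces the
   result into the range 0..65535 inclusive. */
#define REFCOUNT_MAX 65535

int refcount_adjust(int current, int adjust)
{
    /* Widen before adding so that extreme adjust values cannot overflow. */
    long long value = (long long)current + (long long)adjust;

    if (value < 0)
        value = 0;              /* below the range: forced to the lower limit */
    else if (value > REFCOUNT_MAX)
        value = REFCOUNT_MAX;   /* above the range: forced to the upper limit */

    return (int)value;
}
```

With the real library, an owner would typically call pcre_refcount(re, 1) when it starts using a shared pattern and pcre_refcount(re, -1) when it stops, freeing the pattern (for example with pcre_free(re)) only when the second call returns 0.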
1817 |
|
|
1818 |
Except when it is zero, the reference count is not correctly preserved |
Except when it is zero, the reference count is not correctly preserved |
1819 |
if a pattern is compiled on one host and then transferred to a host |
if a pattern is compiled on one host and then transferred to a host |
1820 |
whose byte-order is different. (This seems a highly unlikely scenario.) |
whose byte-order is different. (This seems a highly unlikely scenario.) |
1821 |
|
|
1822 |
|
|
1826 |
const char *subject, int length, int startoffset, |
const char *subject, int length, int startoffset, |
1827 |
int options, int *ovector, int ovecsize); |
int options, int *ovector, int ovecsize); |
1828 |
|
|
1829 |
The function pcre_exec() is called to match a subject string against a |
The function pcre_exec() is called to match a subject string against a |
1830 |
compiled pattern, which is passed in the code argument. If the pattern |
compiled pattern, which is passed in the code argument. If the pattern |
1831 |
has been studied, the result of the study should be passed in the extra |
has been studied, the result of the study should be passed in the extra |
1832 |
argument. This function is the main matching facility of the library, |
argument. This function is the main matching facility of the library, |
1833 |
and it operates in a Perl-like manner. For specialist use there is also |
and it operates in a Perl-like manner. For specialist use there is also |
1834 |
an alternative matching function, which is described below in the sec- |
an alternative matching function, which is described below in the sec- |
1835 |
tion about the pcre_dfa_exec() function. |
tion about the pcre_dfa_exec() function. |
1836 |
|
|
1837 |
In most applications, the pattern will have been compiled (and option- |
In most applications, the pattern will have been compiled (and option- |
1838 |
ally studied) in the same process that calls pcre_exec(). However, it |
ally studied) in the same process that calls pcre_exec(). However, it |
1839 |
is possible to save compiled patterns and study data, and then use them |
is possible to save compiled patterns and study data, and then use them |
1840 |
later in different processes, possibly even on different hosts. For a |
later in different processes, possibly even on different hosts. For a |
1841 |
discussion about this, see the pcreprecompile documentation. |
discussion about this, see the pcreprecompile documentation. |
1842 |
|
|
1843 |
Here is an example of a simple call to pcre_exec(): |
Here is an example of a simple call to pcre_exec(): |
1856 |
|
|
1857 |
Extra data for pcre_exec() |
Extra data for pcre_exec() |
1858 |
|
|
1859 |
If the extra argument is not NULL, it must point to a pcre_extra data |
If the extra argument is not NULL, it must point to a pcre_extra data |
1860 |
block. The pcre_study() function returns such a block (when it doesn't |
block. The pcre_study() function returns such a block (when it doesn't |
1861 |
return NULL), but you can also create one for yourself, and pass addi- |
return NULL), but you can also create one for yourself, and pass addi- |
1862 |
tional information in it. The pcre_extra block contains the following |
tional information in it. The pcre_extra block contains the following |
1863 |
fields (not necessarily in this order): |
fields (not necessarily in this order): |
1864 |
|
|
1865 |
unsigned long int flags; |
unsigned long int flags; |
1869 |
void *callout_data; |
void *callout_data; |
1870 |
const unsigned char *tables; |
const unsigned char *tables; |
1871 |
|
|
1872 |
The flags field is a bitmap that specifies which of the other fields |
The flags field is a bitmap that specifies which of the other fields |
1873 |
are set. The flag bits are: |
are set. The flag bits are: |
1874 |
|
|
1875 |
PCRE_EXTRA_STUDY_DATA |
PCRE_EXTRA_STUDY_DATA |
1878 |
PCRE_EXTRA_CALLOUT_DATA |
PCRE_EXTRA_CALLOUT_DATA |
1879 |
PCRE_EXTRA_TABLES |
PCRE_EXTRA_TABLES |
1880 |
|
|
1881 |
Other flag bits should be set to zero. The study_data field is set in |
Other flag bits should be set to zero. The study_data field is set in |
1882 |
the pcre_extra block that is returned by pcre_study(), together with |
the pcre_extra block that is returned by pcre_study(), together with |
1883 |
the appropriate flag bit. You should not set this yourself, but you may |
the appropriate flag bit. You should not set this yourself, but you may |
1884 |
add to the block by setting the other fields and their corresponding |
add to the block by setting the other fields and their corresponding |
1885 |
flag bits. |
flag bits. |
1886 |
|
|
1887 |
The match_limit field provides a means of preventing PCRE from using up |
The match_limit field provides a means of preventing PCRE from using up |
1888 |
a vast amount of resources when running patterns that are not going to |
a vast amount of resources when running patterns that are not going to |
1889 |
match, but which have a very large number of possibilities in their |
match, but which have a very large number of possibilities in their |
1890 |
search trees. The classic example is the use of nested unlimited |
search trees. The classic example is the use of nested unlimited |
1891 |
repeats. |
repeats. |
1892 |
|
|
1893 |
Internally, PCRE uses a function called match() which it calls repeat- |
Internally, PCRE uses a function called match() which it calls repeat- |
1894 |
edly (sometimes recursively). The limit set by match_limit is imposed |
edly (sometimes recursively). The limit set by match_limit is imposed |
1895 |
on the number of times this function is called during a match, which |
on the number of times this function is called during a match, which |
1896 |
has the effect of limiting the amount of backtracking that can take |
has the effect of limiting the amount of backtracking that can take |
1897 |
place. For patterns that are not anchored, the count restarts from zero |
place. For patterns that are not anchored, the count restarts from zero |
1898 |
for each position in the subject string. |
for each position in the subject string. |
1899 |
|
|
1900 |
The default value for the limit can be set when PCRE is built; the |
The default value for the limit can be set when PCRE is built; the |
1901 |
default default is 10 million, which handles all but the most extreme |
default default is 10 million, which handles all but the most extreme |
1902 |
cases. You can override the default by supplying pcre_exec() with a |
cases. You can override the default by supplying pcre_exec() with a |
1903 |
pcre_extra block in which match_limit is set, and |
pcre_extra block in which match_limit is set, and |
1904 |
PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is |
PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is |
1905 |
exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT. |
exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT. |
1906 |
|
|
1907 |
The match_limit_recursion field is similar to match_limit, but instead |
The match_limit_recursion field is similar to match_limit, but instead |
1908 |
of limiting the total number of times that match() is called, it limits |
of limiting the total number of times that match() is called, it limits |
1909 |
the depth of recursion. The recursion depth is a smaller number than |
the depth of recursion. The recursion depth is a smaller number than |
1910 |
the total number of calls, because not all calls to match() are recur- |
the total number of calls, because not all calls to match() are recur- |
1911 |
sive. This limit is of use only if it is set smaller than match_limit. |
sive. This limit is of use only if it is set smaller than match_limit. |
1912 |
|
|
1913 |
Limiting the recursion depth limits the amount of stack that can be |
Limiting the recursion depth limits the amount of stack that can be |
1914 |
used, or, when PCRE has been compiled to use memory on the heap instead |
used, or, when PCRE has been compiled to use memory on the heap instead |
1915 |
of the stack, the amount of heap memory that can be used. |
of the stack, the amount of heap memory that can be used. |
1916 |
|
|
1917 |
The default value for match_limit_recursion can be set when PCRE is |
The default value for match_limit_recursion can be set when PCRE is |
1918 |
built; the default default is the same value as the default for |
built; the default default is the same value as the default for |
1919 |
match_limit. You can override the default by supplying pcre_exec() with |
match_limit. You can override the default by supplying pcre_exec() with |
1920 |
a pcre_extra block in which match_limit_recursion is set, and |
a pcre_extra block in which match_limit_recursion is set, and |
1921 |
PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the |
PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the |
1922 |
limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT. |
limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT. |
1923 |
|
|
1924 |
The pcre_callout field is used in conjunction with the "callout" fea- |
The callout_data field is used in conjunction with the "callout" fea- |
1925 |
ture, which is described in the pcrecallout documentation. |
ture, and is described in the pcrecallout documentation. |
1926 |
|
|
1927 |
The tables field is used to pass a character tables pointer to |
The tables field is used to pass a character tables pointer to |
1928 |
pcre_exec(); this overrides the value that is stored with the compiled |
pcre_exec(); this overrides the value that is stored with the compiled |
1929 |
pattern. A non-NULL value is stored with the compiled pattern only if |
pattern. A non-NULL value is stored with the compiled pattern only if |
1930 |
custom tables were supplied to pcre_compile() via its tableptr argu- |
custom tables were supplied to pcre_compile() via its tableptr argu- |
1931 |
ment. If NULL is passed to pcre_exec() using this mechanism, it forces |
ment. If NULL is passed to pcre_exec() using this mechanism, it forces |
1932 |
PCRE's internal tables to be used. This facility is helpful when re- |
PCRE's internal tables to be used. This facility is helpful when re- |
1933 |
using patterns that have been saved after compiling with an external |
using patterns that have been saved after compiling with an external |
1934 |
set of tables, because the external tables might be at a different |
set of tables, because the external tables might be at a different |
1935 |
address when pcre_exec() is called. See the pcreprecompile documenta- |
address when pcre_exec() is called. See the pcreprecompile documenta- |
1936 |
tion for a discussion of saving compiled patterns for later use. |
tion for a discussion of saving compiled patterns for later use. |
1937 |
|
|
1938 |
Option bits for pcre_exec() |
Option bits for pcre_exec() |
1939 |
|
|
1940 |
The unused bits of the options argument for pcre_exec() must be zero. |
The unused bits of the options argument for pcre_exec() must be zero. |
1941 |
The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, |
The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, |
1942 |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_START_OPTIMIZE, |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, |
1943 |
PCRE_NO_UTF8_CHECK and PCRE_PARTIAL. |
PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and |
1944 |
|
PCRE_PARTIAL_HARD. |
1945 |
|
|
1946 |
PCRE_ANCHORED |
PCRE_ANCHORED |
1947 |
|
|
2021 |
|
|
2022 |
a?b? |
a?b? |
2023 |
|
|
2024 |
is applied to a string not beginning with "a" or "b", it matches the |
is applied to a string not beginning with "a" or "b", it matches an |
2025 |
empty string at the start of the subject. With PCRE_NOTEMPTY set, this |
empty string at the start of the subject. With PCRE_NOTEMPTY set, this |
2026 |
match is not valid, so PCRE searches further into the string for occur- |
match is not valid, so PCRE searches further into the string for occur- |
2027 |
rences of "a" or "b". |
rences of "a" or "b". |
2028 |
|
|
2029 |
Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe- |
PCRE_NOTEMPTY_ATSTART |
2030 |
cial case of a pattern match of the empty string within its split() |
|
2031 |
function, and when using the /g modifier. It is possible to emulate |
This is like PCRE_NOTEMPTY, except that an empty string match that is |
2032 |
Perl's behaviour after matching a null string by first trying the match |
not at the start of the subject is permitted. If the pattern is |
2033 |
again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then |
anchored, such a match can occur only if the pattern contains \K. |
2034 |
if that fails by advancing the starting offset (see below) and trying |
|
2035 |
an ordinary match again. There is some code that demonstrates how to do |
Perl has no direct equivalent of PCRE_NOTEMPTY or |
2036 |
this in the pcredemo.c sample program. |
PCRE_NOTEMPTY_ATSTART, but it does make a special case of a pattern |
2037 |
|
match of the empty string within its split() function, and when using |
2038 |
|
the /g modifier. It is possible to emulate Perl's behaviour after |
2039 |
|
matching a null string by first trying the match again at the same off- |
2040 |
|
set with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then if that |
2041 |
|
fails, by advancing the starting offset (see below) and trying an ordi- |
2042 |
|
nary match again. There is some code that demonstrates how to do this |
2043 |
|
in the pcredemo sample program. |
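The retry strategy described above can be shown without PCRE by substituting a toy matcher. In the sketch below, toy_match() is a hypothetical hand-written matcher for the single pattern a?b? (ASCII only), and TOY_NOTEMPTY_ATSTART and TOY_ANCHORED are hypothetical flags standing in for PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED. The driver, find_all(), follows the steps in the paragraph: after an empty match it retries at the same offset with both flags, and if that fails it advances by one character and tries an ordinary match. With the real library, toy_match() would be replaced by pcre_exec().

```c
#define TOY_NOTEMPTY_ATSTART 0x1  /* hypothetical stand-in for PCRE_NOTEMPTY_ATSTART */
#define TOY_ANCHORED         0x2  /* hypothetical stand-in for PCRE_ANCHORED */

/* Toy matcher for the fixed pattern a?b?. Returns 1 and sets ov[0] and
   ov[1] to the match start and end on success, or 0 for no match. */
static int toy_match(const char *s, int len, int start, int opts, int ov[2])
{
    for (int p = start; p <= len; p++) {
        int e = p;
        if (e < len && s[e] == 'a') e++;   /* greedy a? */
        if (e < len && s[e] == 'b') e++;   /* greedy b? */
        /* If the greedy match is empty, every alternative here is empty
           too, so NOTEMPTY_ATSTART rejects this position when p == start. */
        if (!(e == p && p == start && (opts & TOY_NOTEMPTY_ATSTART))) {
            ov[0] = p;
            ov[1] = e;
            return 1;
        }
        if (opts & TOY_ANCHORED)
            break;                          /* anchored: only try at start */
    }
    return 0;
}

/* Perl-style global matching: stores up to max (start, end) pairs in out
   and returns how many matches were found. */
int find_all(const char *s, int len, int out[][2], int max)
{
    int ov[2];
    int n = 0;

    if (max <= 0 || !toy_match(s, len, 0, 0, ov))
        return 0;
    out[n][0] = ov[0];
    out[n][1] = ov[1];
    n++;

    while (n < max) {
        int opts = 0;
        int start = ov[1];

        if (ov[0] == ov[1]) {
            if (ov[0] == len)
                break;                 /* empty match at the end: finished */
            /* After an empty match, first retry at the same offset,
               requiring a non-empty, anchored match. */
            opts = TOY_NOTEMPTY_ATSTART | TOY_ANCHORED;
        }

        if (!toy_match(s, len, start, opts, ov)) {
            if (opts == 0)
                break;                 /* ordinary match failed: no more */
            /* The retry failed: advance one character (a UTF-8 subject
               would need to skip a whole character) and match normally. */
            ov[0] = start;
            ov[1] = start + 1;
            continue;
        }
        out[n][0] = ov[0];
        out[n][1] = ov[1];
        n++;
    }
    return n;
}
```

For the subject "xab" this yields an empty match at offset 0, then "ab" at offsets 1 to 3, then an empty match at offset 3, which is what Perl's /g produces for the same pattern.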
2044 |
|
|
2045 |
PCRE_NO_START_OPTIMIZE |
PCRE_NO_START_OPTIMIZE |
2046 |
|
|
2047 |
There are a number of optimizations that pcre_exec() uses at the start |
There are a number of optimizations that pcre_exec() uses at the start |
2048 |
of a match, in order to speed up the process. For example, if it is |
of a match, in order to speed up the process. For example, if it is |
2049 |
known that a match must start with a specific character, it searches |
known that a match must start with a specific character, it searches |
2050 |
the subject for that character, and fails immediately if it cannot find |
the subject for that character, and fails immediately if it cannot find |
2051 |
it, without actually running the main matching function. When callouts |
it, without actually running the main matching function. When callouts |
2052 |
are in use, these optimizations can cause them to be skipped. This |
are in use, these optimizations can cause them to be skipped. This |
2053 |
option disables the "start-up" optimizations, causing performance to |
option disables the "start-up" optimizations, causing performance to |
2054 |
suffer, but ensuring that the callouts do occur. |
suffer, but ensuring that the callouts do occur. |
2055 |
|
|
2056 |
PCRE_NO_UTF8_CHECK |
PCRE_NO_UTF8_CHECK |
2057 |
|
|
2058 |
When PCRE_UTF8 is set at compile time, the validity of the subject as a |
When PCRE_UTF8 is set at compile time, the validity of the subject as a |
2059 |
UTF-8 string is automatically checked when pcre_exec() is subsequently |
UTF-8 string is automatically checked when pcre_exec() is subsequently |
2060 |
called. The value of startoffset is also checked to ensure that it |
called. The value of startoffset is also checked to ensure that it |
2061 |
points to the start of a UTF-8 character. There is a discussion about |
points to the start of a UTF-8 character. There is a discussion about |
2062 |
the validity of UTF-8 strings in the section on UTF-8 support in the |
the validity of UTF-8 strings in the section on UTF-8 support in the |
2063 |
main pcre page. If an invalid UTF-8 sequence of bytes is found, |
main pcre page. If an invalid UTF-8 sequence of bytes is found, |
2064 |
pcre_exec() returns the error PCRE_ERROR_BADUTF8. If startoffset con- |
pcre_exec() returns the error PCRE_ERROR_BADUTF8. If startoffset con- |
2065 |
tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned. |
tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned. |
2066 |
|
|
2067 |
If you already know that your subject is valid, and you want to skip |
If you already know that your subject is valid, and you want to skip |
2068 |
these checks for performance reasons, you can set the |
these checks for performance reasons, you can set the |
2069 |
PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might want to |
PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might want to |
2070 |
do this for the second and subsequent calls to pcre_exec() if you are |
do this for the second and subsequent calls to pcre_exec() if you are |
2071 |
making repeated calls to find all the matches in a single subject |
making repeated calls to find all the matches in a single subject |
2072 |
string. However, you should be sure that the value of startoffset |
string. However, you should be sure that the value of startoffset |
2073 |
points to the start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is |
points to the start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is |
2074 |
set, the effect of passing an invalid UTF-8 string as a subject, or a |
set, the effect of passing an invalid UTF-8 string as a subject, or a |
2075 |
value of startoffset that does not point to the start of a UTF-8 char- |
value of startoffset that does not point to the start of a UTF-8 char- |
2076 |
acter, is undefined. Your program may crash. |
acter, is undefined. Your program may crash. |
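The requirement that startoffset point to the start of a UTF-8 character can be checked cheaply, because UTF-8 continuation bytes always have the bit pattern 10xxxxxx. The helpers below are hypothetical (not part of PCRE) and assume the subject is already known to be valid UTF-8; they check only character boundaries, not full validity, so they do not replace the check that PCRE_NO_UTF8_CHECK turns off.

```c
/* Hypothetical helpers for start offsets in a UTF-8 subject that is
   already known to be valid. */

/* Return 1 if offset is a character boundary: either the end of the
   subject or a byte that is not a continuation byte (10xxxxxx). */
int utf8_is_char_start(const char *subject, int length, int offset)
{
    if (offset < 0 || offset > length)
        return 0;
    if (offset == length)
        return 1;
    return ((unsigned char)subject[offset] & 0xC0) != 0x80;
}

/* Return the offset of the character after the one starting at offset,
   skipping any continuation bytes. */
int utf8_next_char_offset(const char *subject, int length, int offset)
{
    if (offset < length)
        offset++;
    while (offset < length && ((unsigned char)subject[offset] & 0xC0) == 0x80)
        offset++;
    return offset;
}
```

In a loop that finds successive matches, the second and later calls could pass PCRE_NO_UTF8_CHECK, using a helper like utf8_next_char_offset() whenever the start offset has to be advanced past an empty match.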
2077 |
|
|
2078 |
PCRE_PARTIAL |
PCRE_PARTIAL_HARD |
2079 |
|
PCRE_PARTIAL_SOFT |
2080 |
|
|
2081 |
This option turns on the partial matching feature. If the subject |
These options turn on the partial matching feature. For backwards com- |
2082 |
string fails to match the pattern, but at some point during the match- |
patibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial |
2083 |
ing process the end of the subject was reached (that is, the subject |
match occurs if the end of the subject string is reached successfully, |
2084 |
partially matches the pattern and the failure to match occurred only |
but there are not enough subject characters to complete the match. If |
2085 |
because there were not enough subject characters), pcre_exec() returns |
this happens when PCRE_PARTIAL_HARD is set, pcre_exec() immediately |
2086 |
PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is |
returns PCRE_ERROR_PARTIAL. Otherwise, if PCRE_PARTIAL_SOFT is set, |
2087 |
used, there are restrictions on what may appear in the pattern. These |
matching continues by testing any other alternatives. Only if they all |
2088 |
are discussed in the pcrepartial documentation. |
fail is PCRE_ERROR_PARTIAL returned (instead of PCRE_ERROR_NOMATCH). |
2089 |
|
The portion of the string that was inspected when the partial match was |
2090 |
|
found is set as the first matching string. There is a more detailed |
2091 |
|
discussion in the pcrepartial documentation. |
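The difference between the two options can be illustrated with a toy model. The sketch below is not PCRE: toy_partial_match() is a hypothetical function that handles only a list of literal alternatives tried in order at a single anchored position, and its TOY_* codes are invented for the example. It applies only the rules quoted above: with hard partial matching, the first alternative that runs off the end of the subject ends the attempt; with soft partial matching, the remaining alternatives are still tried, and a partial result is reported only if none of them matches completely.

```c
#include <string.h>

/* Hypothetical result codes and modes for this sketch (not PCRE's values). */
enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_PARTIAL = 2 };
enum { TOY_PARTIAL_SOFT = 0, TOY_PARTIAL_HARD = 1 };

/* Try each literal alternative in order, anchored at the start of subject.
   An alternative matches partially when every subject character agrees
   with it but the subject ends before the whole literal is matched. */
int toy_partial_match(const char *subject, const char *const *alts,
                      int nalts, int mode)
{
    size_t slen = strlen(subject);
    int saw_partial = 0;

    for (int i = 0; i < nalts; i++) {
        size_t alen = strlen(alts[i]);
        size_t n = alen < slen ? alen : slen;

        if (strncmp(subject, alts[i], n) != 0)
            continue;                  /* mismatch before the end: next one */
        if (alen <= slen)
            return TOY_MATCH;          /* the whole alternative matched */

        /* The end of the subject was reached part-way through. */
        if (mode == TOY_PARTIAL_HARD)
            return TOY_PARTIAL;        /* hard: report immediately */
        saw_partial = 1;               /* soft: remember and keep trying */
    }
    return saw_partial ? TOY_PARTIAL : TOY_NOMATCH;
}
```

With the alternatives "dog" and "do" and the subject "do", the hard mode stops at the partial "dog" and reports a partial match, while the soft mode goes on to find the complete match "do".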
2092 |
|
|
2093 |
The string to be matched by pcre_exec() |
The string to be matched by pcre_exec() |
2094 |
|
|
2274 |
|
|
2275 |
PCRE_ERROR_BADPARTIAL (-13) |
PCRE_ERROR_BADPARTIAL (-13) |
2276 |
|
|
2277 |
The PCRE_PARTIAL option was used with a compiled pattern containing |
This code is no longer in use. It was formerly returned when the |
2278 |
items that are not supported for partial matching. See the pcrepartial |
PCRE_PARTIAL option was used with a compiled pattern containing items |
2279 |
documentation for details of partial matching. |
that were not supported for partial matching. From release 8.00 |
2280 |
|
onwards, there are no restrictions on partial matching. |
2281 |
|
|
2282 |
PCRE_ERROR_INTERNAL (-14) |
PCRE_ERROR_INTERNAL (-14) |
2283 |
|
|
2284 |
An unexpected internal error has occurred. This error could be caused |
An unexpected internal error has occurred. This error could be caused |
2285 |
by a bug in PCRE or by overwriting of the compiled pattern. |
by a bug in PCRE or by overwriting of the compiled pattern. |
2286 |
|
|
2287 |
PCRE_ERROR_BADCOUNT (-15) |
PCRE_ERROR_BADCOUNT (-15) |
2291 |
PCRE_ERROR_RECURSIONLIMIT (-21) |
PCRE_ERROR_RECURSIONLIMIT (-21) |
2292 |
|
|
2293 |
The internal recursion limit, as specified by the match_limit_recursion |
The internal recursion limit, as specified by the match_limit_recursion |
2294 |
field in a pcre_extra structure (or defaulted) was reached. See the |
field in a pcre_extra structure (or defaulted) was reached. See the |
2295 |
description above. |
description above. |
2296 |
|
|
2297 |
PCRE_ERROR_BADNEWLINE (-23) |
PCRE_ERROR_BADNEWLINE (-23) |
2314 |
int pcre_get_substring_list(const char *subject, |
int pcre_get_substring_list(const char *subject, |
2315 |
int *ovector, int stringcount, const char ***listptr); |
int *ovector, int stringcount, const char ***listptr); |
2316 |
|
|
2317 |
Captured substrings can be accessed directly by using the offsets |
Captured substrings can be accessed directly by using the offsets |
2318 |
returned by pcre_exec() in ovector. For convenience, the functions |
returned by pcre_exec() in ovector. For convenience, the functions |
2319 |
pcre_copy_substring(), pcre_get_substring(), and pcre_get_sub- |
pcre_copy_substring(), pcre_get_substring(), and pcre_get_sub- |
2320 |
string_list() are provided for extracting captured substrings as new, |
string_list() are provided for extracting captured substrings as new, |
2321 |
separate, zero-terminated strings. These functions identify substrings |
separate, zero-terminated strings. These functions identify substrings |
2322 |
by number. The next section describes functions for extracting named |
by number. The next section describes functions for extracting named |
2323 |
substrings. |
substrings. |
2324 |
|
|
2325 |
A substring that contains a binary zero is correctly extracted and has |
A substring that contains a binary zero is correctly extracted and has |
2326 |
a further zero added on the end, but the result is not, of course, a C |
a further zero added on the end, but the result is not, of course, a C |
2327 |
string. However, you can process such a string by referring to the |
string. However, you can process such a string by referring to the |
2328 |
length that is returned by pcre_copy_substring() and pcre_get_sub- |
length that is returned by pcre_copy_substring() and pcre_get_sub- |
2329 |
string(). Unfortunately, the interface to pcre_get_substring_list() is |
string(). Unfortunately, the interface to pcre_get_substring_list() is |
2330 |
not adequate for handling strings containing binary zeros, because the |
not adequate for handling strings containing binary zeros, because the |
2331 |
end of the final string is not independently indicated. |
end of the final string is not independently indicated. |
2332 |
|
|
2333 |
The first three arguments are the same for all three of these func- |
The first three arguments are the same for all three of these func- |
2334 |
tions: subject is the subject string that has just been successfully |
tions: subject is the subject string that has just been successfully |
2335 |
matched, ovector is a pointer to the vector of integer offsets that was |
matched, ovector is a pointer to the vector of integer offsets that was |
2336 |
passed to pcre_exec(), and stringcount is the number of substrings that |
passed to pcre_exec(), and stringcount is the number of substrings that |
2337 |
were captured by the match, including the substring that matched the |
were captured by the match, including the substring that matched the |
2338 |
entire regular expression. This is the value returned by pcre_exec() if |
entire regular expression. This is the value returned by pcre_exec() if |
2339 |
it is greater than zero. If pcre_exec() returned zero, indicating that |
it is greater than zero. If pcre_exec() returned zero, indicating that |
2340 |
it ran out of space in ovector, the value passed as stringcount should |
it ran out of space in ovector, the value passed as stringcount should |
2341 |
be the number of elements in the vector divided by three. |
be the number of elements in the vector divided by three. |
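The rule for choosing stringcount reduces to a short expression. In the hypothetical helper below, rc is the value returned by pcre_exec() and ovecsize is the number of elements in ovector; returning 0 for a negative rc is an assumption of this sketch, reflecting that no substrings are available after a failed match or an error.

```c
/* Hypothetical helper: the stringcount value to pass to the substring
   extraction functions after a call to pcre_exec(). */
int substring_count(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;              /* captured substrings, including the whole match */
    if (rc == 0)
        return ovecsize / 3;    /* ovector was too small: use its capacity */
    return 0;                   /* no match or error: nothing to extract */
}
```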
2342 |
|
|
2343 |
The functions pcre_copy_substring() and pcre_get_substring() extract a |
The functions pcre_copy_substring() and pcre_get_substring() extract a |
2344 |
single substring, whose number is given as stringnumber. A value of |
single substring, whose number is given as stringnumber. A value of |
2345 |
zero extracts the substring that matched the entire pattern, whereas |
zero extracts the substring that matched the entire pattern, whereas |
2346 |
higher values extract the captured substrings. For pcre_copy_sub- |
higher values extract the captured substrings. For pcre_copy_sub- |
2347 |
string(), the string is placed in buffer, whose length is given by |
string(), the string is placed in buffer, whose length is given by |
2348 |
buffersize, while for pcre_get_substring() a new block of memory is |
buffersize, while for pcre_get_substring() a new block of memory is |
2349 |
obtained via pcre_malloc, and its address is returned via stringptr. |
obtained via pcre_malloc, and its address is returned via stringptr. |
2350 |
The yield of the function is the length of the string, not including |
The yield of the function is the length of the string, not including |
2351 |
the terminating zero, or one of these error codes: |
the terminating zero, or one of these error codes: |
2352 |
|
|
2353 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
2354 |
|
|
2355 |
The buffer was too small for pcre_copy_substring(), or the attempt to |
The buffer was too small for pcre_copy_substring(), or the attempt to |
2356 |
get memory failed for pcre_get_substring(). |
get memory failed for pcre_get_substring(). |
2357 |
|
|
2358 |
PCRE_ERROR_NOSUBSTRING (-7) |
PCRE_ERROR_NOSUBSTRING (-7) |
2359 |
|
|
2360 |
There is no substring whose number is stringnumber. |
There is no substring whose number is stringnumber. |
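The yields described above can be modelled with a stand-alone function. copy_substring_sketch() below is hypothetical; it mirrors the documented arguments and results of pcre_copy_substring() (the length on success, -6 if buffer is too small, -7 for a bad stringnumber) but is not the library's code. Like the library functions, it returns an empty string for an unset substring, whose offsets in ovector are negative.

```c
#include <string.h>

/* Error values as listed in the documentation above. */
#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Hypothetical model of pcre_copy_substring(): copies substring number
   stringnumber into buffer and adds a terminating zero. */
int copy_substring_sketch(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    int start, yield;

    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    start = ovector[2 * stringnumber];
    yield = ovector[2 * stringnumber + 1] - start;
    if (start < 0)
        yield = 0;                      /* unset substring: empty string */

    if (yield >= buffersize)            /* room is needed for the extra zero */
        return SKETCH_ERROR_NOMEMORY;

    if (yield > 0)                      /* memcpy also copies binary zeros */
        memcpy(buffer, subject + start, (size_t)yield);
    buffer[yield] = 0;
    return yield;
}
```

Because the yield is the length, a caller can still handle a substring that contains a binary zero by using that length instead of strlen().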
2361 |
|
|
2362 |
The pcre_get_substring_list() function extracts all available sub- |
The pcre_get_substring_list() function extracts all available sub- |
2363 |
strings and builds a list of pointers to them. All this is done in a |
strings and builds a list of pointers to them. All this is done in a |
2364 |
single block of memory that is obtained via pcre_malloc. The address of |
single block of memory that is obtained via pcre_malloc. The address of |
2365 |
the memory block is returned via listptr, which is also the start of |
the memory block is returned via listptr, which is also the start of |
2366 |
the list of string pointers. The end of the list is marked by a NULL |
the list of string pointers. The end of the list is marked by a NULL |
2367 |
pointer. The yield of the function is zero if all went well, or the |
pointer. The yield of the function is zero if all went well, or the |
2368 |
error code |
error code |
2369 |
|
|
2370 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
2371 |
|
|
2372 |
if the attempt to get the memory block failed. |
if the attempt to get the memory block failed. |
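The single-block layout can be sketched as follows. An array of stringcount+1 pointers sits at the start of the block, and the zero-terminated copies of the substrings follow it. A single call of free() (pcre_free in PCRE) therefore releases everything.

This is an illustrative re-implementation, not PCRE's code:

- get_substring_list_sketch is a hypothetical name.
- Plain malloc() stands in for pcre_malloc.
- Entries 2n and 2n+1 of ovector are assumed to hold the offsets of substring n.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pcre_get_substring_list(). On success *listptr
   points to a NULL-terminated array of substring pointers that lives in the
   same malloc() block as the substring data. Returns 0, or -6 on failure. */
int get_substring_list_sketch(const char *subject, const int *ovector,
                              int stringcount, const char ***listptr)
{
  size_t size = (size_t)(stringcount + 1) * sizeof(char *);
  char **list;
  char *p;
  int i;

  for (i = 0; i < stringcount; i++) {     /* first pass: total data size */
    int start = ovector[2 * i];
    int len = start < 0 ? 0 : ovector[2 * i + 1] - start;
    size += (size_t)len + 1;              /* data plus terminating zero */
  }

  list = malloc(size);
  if (list == NULL) return -6;            /* PCRE_ERROR_NOMEMORY */

  p = (char *)(list + stringcount + 1);   /* string data follows pointers */
  for (i = 0; i < stringcount; i++) {     /* second pass: copy and link */
    int start = ovector[2 * i];
    int len = start < 0 ? 0 : ovector[2 * i + 1] - start;
    if (len > 0) memcpy(p, subject + start, (size_t)len);
    p[len] = '\0';
    list[i] = p;
    p += len + 1;
  }
  list[stringcount] = NULL;               /* end-of-list marker */
  *listptr = (const char **)list;
  return 0;
}
```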
2373 |
|
|
2374 |
When any of these functions encounter a substring that is unset, which |
When any of these functions encounter a substring that is unset, which |
2375 |
can happen when capturing subpattern number n+1 matches some part of |
can happen when capturing subpattern number n+1 matches some part of |
2376 |
the subject, but subpattern n has not been used at all, they return an |
the subject, but subpattern n has not been used at all, they return an |
2377 |
empty string. This can be distinguished from a genuine zero-length sub- |
empty string. This can be distinguished from a genuine zero-length sub- |
2378 |
string by inspecting the appropriate offset in ovector, which is nega- |
string by inspecting the appropriate offset in ovector, which is nega- |
2379 |
tive for unset substrings. |
tive for unset substrings. |
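A caller that needs to tell the two cases apart can look at the start offset directly. The helper below is a hypothetical illustration, not part of the PCRE API. It assumes that the offsets of substring n are in ovector entries 2n and 2n+1, and that the start offset is negative when the substring is unset.

```c
/* Returns the length of substring n, or -1 if it is unset (or beyond
   stringcount). Unlike the extraction functions, this does not fold
   "unset" into an empty string. */
int substring_length_or_unset(const int *ovector, int stringcount, int n)
{
  if (n < 0 || n >= stringcount) return -1;   /* not set in this match */
  if (ovector[2 * n] < 0) return -1;          /* unset: negative offset */
  return ovector[2 * n + 1] - ovector[2 * n]; /* may legitimately be 0 */
}
```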
2380 |
|
|
2381 |
The two convenience functions pcre_free_substring() and pcre_free_sub- |
The two convenience functions pcre_free_substring() and pcre_free_sub- |
2382 |
string_list() can be used to free the memory returned by a previous |
string_list() can be used to free the memory returned by a previous |
2383 |
call of pcre_get_substring() or pcre_get_substring_list(), respec- |
call of pcre_get_substring() or pcre_get_substring_list(), respec- |
2384 |
tively. They do nothing more than call the function pointed to by |
tively. They do nothing more than call the function pointed to by |
2385 |
pcre_free, which of course could be called directly from a C program. |
pcre_free, which of course could be called directly from a C program. |
2386 |
However, PCRE is used in some situations where it is linked via a spe- |
However, PCRE is used in some situations where it is linked via a spe- |
2387 |
cial interface to another programming language that cannot use |
cial interface to another programming language that cannot use |
2388 |
pcre_free directly; it is for these cases that the functions are pro- |
pcre_free directly; it is for these cases that the functions are pro- |
2389 |
vided. |
vided. |
2390 |
|
|
2391 |
|
|
2404 |
int stringcount, const char *stringname, |
int stringcount, const char *stringname, |
2405 |
const char **stringptr); |
const char **stringptr); |
2406 |
|
|
2407 |
To extract a substring by name, you first have to find the associated num- |
To extract a substring by name, you first have to find the associated num- |
2408 |
ber. For example, for this pattern |
ber. For example, for this pattern |
2409 |
|
|
2410 |
(a+)b(?<xxx>\d+)... |
(a+)b(?<xxx>\d+)... |
2413 |
be unique (PCRE_DUPNAMES was not set), you can find the number from the |
be unique (PCRE_DUPNAMES was not set), you can find the number from the |
2414 |
name by calling pcre_get_stringnumber(). The first argument is the com- |
name by calling pcre_get_stringnumber(). The first argument is the com- |
2415 |
piled pattern, and the second is the name. The yield of the function is |
piled pattern, and the second is the name. The yield of the function is |
2416 |
the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no |
the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no |
2417 |
subpattern of that name. |
subpattern of that name. |
2418 |
|
|
2419 |
Given the number, you can extract the substring directly, or use one of |
Given the number, you can extract the substring directly, or use one of |
2420 |
the functions described in the previous section. For convenience, there |
the functions described in the previous section. For convenience, there |
2421 |
are also two functions that do the whole job. |
are also two functions that do the whole job. |
2422 |
|
|
2423 |
Most of the arguments of pcre_copy_named_substring() and |
Most of the arguments of pcre_copy_named_substring() and |
2424 |
pcre_get_named_substring() are the same as those for the similarly |
pcre_get_named_substring() are the same as those for the similarly |
2425 |
named functions that extract by number. As these are described in the |
named functions that extract by number. As these are described in the |
2426 |
previous section, they are not re-described here. There are just two |
previous section, they are not re-described here. There are just two |
2427 |
differences: |
differences: |
2428 |
|
|
2429 |
First, instead of a substring number, a substring name is given. Sec- |
First, instead of a substring number, a substring name is given. Sec- |
2430 |
ond, there is an extra argument, given at the start, which is a pointer |
ond, there is an extra argument, given at the start, which is a pointer |
2431 |
to the compiled pattern. This is needed in order to gain access to the |
to the compiled pattern. This is needed in order to gain access to the |
2432 |
name-to-number translation table. |
name-to-number translation table. |
2433 |
|
|
2434 |
These functions call pcre_get_stringnumber(), and if it succeeds, they |
These functions call pcre_get_stringnumber(), and if it succeeds, they |
2435 |
then call pcre_copy_substring() or pcre_get_substring(), as appropri- |
then call pcre_copy_substring() or pcre_get_substring(), as appropri- |
2436 |
ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate names, the |
ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate names, the |
2437 |
behaviour may not be what you want (see the next section). |
behaviour may not be what you want (see the next section). |
2438 |
|
|
2439 |
Warning: If the pattern uses the "(?|" feature to set up multiple sub- |
Warning: If the pattern uses the "(?|" feature to set up multiple sub- |
2440 |
patterns with the same number, you cannot use names to distinguish |
patterns with the same number, you cannot use names to distinguish |
2441 |
them, because names are not included in the compiled code. The matching |
them, because names are not included in the compiled code. The matching |
2442 |
process uses only numbers. |
process uses only numbers. |
2443 |
|
|
2447 |
int pcre_get_stringtable_entries(const pcre *code, |
int pcre_get_stringtable_entries(const pcre *code, |
2448 |
const char *name, char **first, char **last); |
const char *name, char **first, char **last); |
2449 |
|
|
2450 |
When a pattern is compiled with the PCRE_DUPNAMES option, names for |
When a pattern is compiled with the PCRE_DUPNAMES option, names for |
2451 |
subpatterns are not required to be unique. Normally, patterns with |
subpatterns are not required to be unique. Normally, patterns with |
2452 |
duplicate names are such that in any one match, only one of the named |
duplicate names are such that in any one match, only one of the named |
2453 |
subpatterns participates. An example is shown in the pcrepattern docu- |
subpatterns participates. An example is shown in the pcrepattern docu- |
2454 |
mentation. |
mentation. |
2455 |
|
|
2456 |
When duplicates are present, pcre_copy_named_substring() and |
When duplicates are present, pcre_copy_named_substring() and |
2457 |
pcre_get_named_substring() return the first substring corresponding to |
pcre_get_named_substring() return the first substring corresponding to |
2458 |
the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING |
the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING |
2459 |
(-7) is returned; no data is returned. The pcre_get_stringnumber() |
(-7) is returned; no data is returned. The pcre_get_stringnumber() |
2460 |
function returns one of the numbers that are associated with the name, |
function returns one of the numbers that are associated with the name, |
2461 |
but it is not defined which it is. |
but it is not defined which it is. |
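The "first set substring" rule can be pictured with a small hypothetical helper. It receives every subpattern number associated with a name (for example, decoded from the name-to-number table) and returns the first one whose start offset in ovector is non-negative. The helper and its inputs are assumptions made for illustration only. "First" here means first in the order the numbers are supplied.

```c
/* Returns the first number in numbers[0..count-1] whose substring is set,
   or -7 (PCRE_ERROR_NOSUBSTRING) if none of them is set. */
int first_set_number(const int *numbers, int count,
                     const int *ovector, int stringcount)
{
  int i;
  for (i = 0; i < count; i++) {
    int n = numbers[i];
    if (n < stringcount && ovector[2 * n] >= 0)
      return n;
  }
  return -7;
}
```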
2462 |
|
|
2463 |
If you want to get full details of all captured substrings for a given |
If you want to get full details of all captured substrings for a given |
2464 |
name, you must use the pcre_get_stringtable_entries() function. The |
name, you must use the pcre_get_stringtable_entries() function. The |
2465 |
first argument is the compiled pattern, and the second is the name. The |
first argument is the compiled pattern, and the second is the name. The |
2466 |
third and fourth are pointers to variables which are updated by the |
third and fourth are pointers to variables which are updated by the |
2467 |
function. After it has run, they point to the first and last entries in |
function. After it has run, they point to the first and last entries in |
2468 |
the name-to-number table for the given name. The function itself |
the name-to-number table for the given name. The function itself |
2469 |
returns the length of each entry, or PCRE_ERROR_NOSUBSTRING (-7) if |
returns the length of each entry, or PCRE_ERROR_NOSUBSTRING (-7) if |
2470 |
there are none. The format of the table is described above in the sec- |
there are none. The format of the table is described above in the sec- |
2471 |
tion entitled Information about a pattern. Given all the relevant |
tion entitled Information about a pattern. Given all the relevant |
2472 |
entries for the name, you can extract each of their numbers, and hence |
entries for the name, you can extract each of their numbers, and hence |
2473 |
the captured data, if any. |
the captured data, if any. |
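To make the table walk concrete, here is a sketch over a hand-built name table. It assumes the layout given in the section on Information about a pattern:

- Entries are fixed-size and sorted by name.
- Each entry starts with the subpattern number in two bytes, most significant first.
- The zero-terminated name follows the number.

stringtable_entries_sketch and entry_number are hypothetical names, not PCRE's implementation.

```c
#include <string.h>

/* Hypothetical stand-in for pcre_get_stringtable_entries(). table holds
   count entries of entrysize bytes each. On success *first and *last point
   at the first and last entries for name and the entry size is returned;
   otherwise -7 (PCRE_ERROR_NOSUBSTRING) is returned. */
int stringtable_entries_sketch(const unsigned char *table, int count,
                               int entrysize, const char *name,
                               const unsigned char **first,
                               const unsigned char **last)
{
  int i, lo = -1, hi = -1;
  for (i = 0; i < count; i++) {
    const unsigned char *entry = table + (size_t)i * entrysize;
    if (strcmp((const char *)(entry + 2), name) == 0) {
      if (lo < 0) lo = i;   /* sorted table: duplicates are adjacent */
      hi = i;
    }
  }
  if (lo < 0) return -7;
  *first = table + (size_t)lo * entrysize;
  *last = table + (size_t)hi * entrysize;
  return entrysize;
}

/* Decodes the subpattern number held in the first two bytes of an entry. */
int entry_number(const unsigned char *entry)
{
  return (entry[0] << 8) | entry[1];
}
```

Every entry from *first to *last inclusive, stepping by the returned size, belongs to the name. A caller can decode each number and then inspect ovector to see which substrings were actually set.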
2474 |
|
|
2475 |
|
|
2476 |
FINDING ALL POSSIBLE MATCHES |
FINDING ALL POSSIBLE MATCHES |
2477 |
|
|
2478 |
The traditional matching function uses a similar algorithm to Perl, |
The traditional matching function uses a similar algorithm to Perl, |
2479 |
which stops when it finds the first match, starting at a given point in |
which stops when it finds the first match, starting at a given point in |
2480 |
the subject. If you want to find all possible matches, or the longest |
the subject. If you want to find all possible matches, or the longest |
2481 |
possible match, consider using the alternative matching function (see |
possible match, consider using the alternative matching function (see |
2482 |
below) instead. If you cannot use the alternative function, but still |
below) instead. If you cannot use the alternative function, but still |
2483 |
need to find all possible matches, you can kludge it up by making use |
need to find all possible matches, you can kludge it up by making use |
2484 |
of the callout facility, which is described in the pcrecallout documen- |
of the callout facility, which is described in the pcrecallout documen- |
2485 |
tation. |
tation. |
2486 |
|
|
2487 |
What you have to do is to insert a callout right at the end of the pat- |
What you have to do is to insert a callout right at the end of the pat- |
2488 |
tern. When your callout function is called, extract and save the cur- |
tern. When your callout function is called, extract and save the cur- |
2489 |
rent matched substring. Then return 1, which forces pcre_exec() to |
rent matched substring. Then return 1, which forces pcre_exec() to |
2490 |
backtrack and try other alternatives. Ultimately, when it runs out of |
backtrack and try other alternatives. Ultimately, when it runs out of |
2491 |
matches, pcre_exec() will yield PCRE_ERROR_NOMATCH. |
matches, pcre_exec() will yield PCRE_ERROR_NOMATCH. |
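The trick can be demonstrated without PCRE at all, using a toy backtracking matcher. It is a variant of the small matcher in Kernighan and Pike's The Practice of Programming, supporting only literals, '.' and '*', anchored at the start of the text.

When the matcher reaches the end of the pattern, it calls a function. This plays the role of a callout placed at the end of a PCRE pattern (written (?C)). The callback records the match and returns 1, so the matcher treats that point as a failure and keeps backtracking until every possibility is exhausted.

All names here are hypothetical. Real code would instead set pcre_callout and read the match from the pcre_callout_block it receives.

```c
/* Called whenever the whole pattern has matched; start and end delimit
   the match. Return 0 to accept it, or 1 to reject it and force the
   matcher to backtrack. */
typedef int (*end_callout)(const char *start, const char *end, void *data);

static int match_here(const char *re, const char *text, const char *start,
                      end_callout cb, void *data);

/* c* : try zero repetitions first, then one more each time round. */
static int match_star(int c, const char *re, const char *text,
                      const char *start, end_callout cb, void *data)
{
  do {
    if (match_here(re, text, start, cb, data)) return 1;
  } while (*text != '\0' && (*text++ == c || c == '.'));
  return 0;
}

static int match_here(const char *re, const char *text, const char *start,
                      end_callout cb, void *data)
{
  if (re[0] == '\0')                      /* end of pattern: the "callout" */
    return cb(start, text, data) == 0;    /* 1 from cb means "fail here" */
  if (re[1] == '*')
    return match_star(re[0], re + 2, text, start, cb, data);
  if (*text != '\0' && (re[0] == '.' || re[0] == *text))
    return match_here(re + 1, text + 1, start, cb, data);
  return 0;
}

/* Anchored match of re at the start of text. Returns 1 if the callback
   accepted some match, 0 when every possibility was rejected. */
int match_anchored(const char *re, const char *text, end_callout cb,
                   void *data)
{
  return match_here(re, text, text, cb, data);
}

/* Collector that records every match length and always rejects. */
struct collected { int lengths[16]; int n; };

int collect_all(const char *start, const char *end, void *data)
{
  struct collected *c = data;
  if (c->n < 16) c->lengths[c->n++] = (int)(end - start);
  return 1;   /* force backtracking, like returning 1 from a PCRE callout */
}
```

Because this toy's star tries the shortest repetition first, matches come out shortest first. With PCRE's greedy quantifiers, a pattern such as a* would report them longest first. Also, pcre_exec() would go on to retry at later starting positions unless the match is anchored.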
2492 |
|
|
2493 |
|
|
2498 |
int options, int *ovector, int ovecsize, |
int options, int *ovector, int ovecsize, |
2499 |
int *workspace, int wscount); |
int *workspace, int wscount); |
2500 |
|
|
2501 |
The function pcre_dfa_exec() is called to match a subject string |
The function pcre_dfa_exec() is called to match a subject string |
2502 |
against a compiled pattern, using a matching algorithm that scans the |
against a compiled pattern, using a matching algorithm that scans the |
2503 |
subject string just once, and does not backtrack. This has different |
subject string just once, and does not backtrack. This has different |
2504 |
characteristics to the normal algorithm, and is not compatible with |
characteristics to the normal algorithm, and is not compatible with |
2505 |
Perl. Some of the features of PCRE patterns are not supported. Never- |
Perl. Some of the features of PCRE patterns are not supported. Never- |
2506 |
theless, there are times when this kind of matching can be useful. For |
theless, there are times when this kind of matching can be useful. For |
2507 |
a discussion of the two matching algorithms, see the pcrematching docu- |
a discussion of the two matching algorithms, and a list of features |
2508 |
mentation. |
that pcre_dfa_exec() does not support, see the pcrematching documenta- |
2509 |
|
tion. |
2510 |
|
|
2511 |
The arguments for the pcre_dfa_exec() function are the same as for |
The arguments for the pcre_dfa_exec() function are the same as for |
2512 |
pcre_exec(), plus two extras. The ovector argument is used in a differ- |
pcre_exec(), plus two extras. The ovector argument is used in a differ- |
2541 |
|
|
2542 |
The unused bits of the options argument for pcre_dfa_exec() must be |
The unused bits of the options argument for pcre_dfa_exec() must be |
2543 |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEW- |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEW- |
2544 |
LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, |
LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, |
2545 |
PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last |
PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, PCRE_PAR- |
2546 |
three of these are the same as for pcre_exec(), so their description is |
TIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last |
2547 |
not repeated here. |
four of these are exactly the same as for pcre_exec(), so their |
2548 |
|
description is not repeated here. |
2549 |
PCRE_PARTIAL |
|
2550 |
|
PCRE_PARTIAL_HARD |
2551 |
This has the same general effect as it does for pcre_exec(), but the |
PCRE_PARTIAL_SOFT |
2552 |
details are slightly different. When PCRE_PARTIAL is set for |
|
2553 |
pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is converted into |
These have the same general effect as they do for pcre_exec(), but the |
2554 |
PCRE_ERROR_PARTIAL if the end of the subject is reached, there have |
details are slightly different. When PCRE_PARTIAL_HARD is set for |
2555 |
been no complete matches, but there is still at least one matching pos- |
pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of the sub- |
2556 |
sibility. The portion of the string that provided the partial match is |
ject is reached and there is still at least one matching possibility |
2557 |
set as the first matching string. |
that requires additional characters. This happens even if some complete |
2558 |
|
matches have also been found. When PCRE_PARTIAL_SOFT is set, the return |
2559 |
|
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end |
2560 |
|
of the subject is reached, there have been no complete matches, but |
2561 |
|
there is still at least one matching possibility. The portion of the |
2562 |
|
string that was inspected when the longest partial match was found is |
2563 |
|
set as the first matching string in both cases. |
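The difference between the two options comes down to a small decision rule, sketched below. The enums and the function are hypothetical and only restate the text above:

- complete_found says whether any complete match was recorded.
- pending_at_end says whether the end of the subject was reached with at least one matching possibility that still needs more characters.

```c
/* Hypothetical outcome codes standing in for a normal match,
   PCRE_ERROR_NOMATCH and PCRE_ERROR_PARTIAL. */
enum dfa_outcome { DFA_MATCH, DFA_NOMATCH, DFA_PARTIAL };
enum partial_mode { PARTIAL_NONE, PARTIAL_SOFT, PARTIAL_HARD };

enum dfa_outcome dfa_result(enum partial_mode mode, int complete_found,
                            int pending_at_end)
{
  /* HARD: a pending possibility at the end wins, even over complete
     matches that have already been found. */
  if (mode == PARTIAL_HARD && pending_at_end)
    return DFA_PARTIAL;
  if (complete_found)
    return DFA_MATCH;
  /* SOFT: NOMATCH becomes PARTIAL only when nothing completed. */
  if (mode == PARTIAL_SOFT && pending_at_end)
    return DFA_PARTIAL;
  return DFA_NOMATCH;
}
```

Under these rules, take a pattern such as dog(house)? with the subject "dog". A complete match exists, but a longer one could still follow. PCRE_PARTIAL_HARD would therefore report a partial match, while PCRE_PARTIAL_SOFT would report the complete one.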
2564 |
|
|
2565 |
PCRE_DFA_SHORTEST |
PCRE_DFA_SHORTEST |
2566 |
|
|
2567 |
Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to |
Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to |
2568 |
stop as soon as it has found one match. Because of the way the alterna- |
stop as soon as it has found one match. Because of the way the alterna- |
2569 |
tive algorithm works, this is necessarily the shortest possible match |
tive algorithm works, this is necessarily the shortest possible match |
2570 |
at the first possible matching point in the subject string. |
at the first possible matching point in the subject string. |
2571 |
|
|
2572 |
PCRE_DFA_RESTART |
PCRE_DFA_RESTART |
2573 |
|
|
2574 |
When pcre_dfa_exec() is called with the PCRE_PARTIAL option, and |
When pcre_dfa_exec() returns a partial match, it is possible to call it |
2575 |
returns a partial match, it is possible to call it again, with addi- |
again, with additional subject characters, and have it continue with |
2576 |
tional subject characters, and have it continue with the same match. |
the same match. The PCRE_DFA_RESTART option requests this action; when |
2577 |
The PCRE_DFA_RESTART option requests this action; when it is set, the |
it is set, the workspace and wscount options must reference the same |
2578 |
workspace and wscount options must reference the same vector as before |
vector as before because data about the match so far is left in them |
2579 |
because data about the match so far is left in them after a partial |
after a partial match. There is more discussion of this facility in the |
2580 |
match. There is more discussion of this facility in the pcrepartial |
pcrepartial documentation. |
|
documentation. |
|
2581 |
|
|
2582 |
Successful returns from pcre_dfa_exec() |
Successful returns from pcre_dfa_exec() |
2583 |
|
|
2666 |
|
|
2667 |
REVISION |
REVISION |
2668 |
|
|
2669 |
Last updated: 17 March 2009 |
Last updated: 11 September 2009 |
2670 |
Copyright (c) 1997-2009 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
2671 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
2672 |
|
|
2673 |
|
|
2674 |
PCRECALLOUT(3) PCRECALLOUT(3) |
PCRECALLOUT(3) PCRECALLOUT(3) |
2675 |
|
|
2676 |
|
|
2845 |
Last updated: 15 March 2009 |
Last updated: 15 March 2009 |
2846 |
Copyright (c) 1997-2009 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
2847 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
2848 |
|
|
2849 |
|
|
2850 |
PCRECOMPAT(3) PCRECOMPAT(3) |
PCRECOMPAT(3) PCRECOMPAT(3) |
2851 |
|
|
2852 |
|
|
2859 |
This document describes the differences in the ways that PCRE and Perl |
This document describes the differences in the ways that PCRE and Perl |
2860 |
handle regular expressions. The differences described here are mainly |
handle regular expressions. The differences described here are mainly |
2861 |
with respect to Perl 5.8, though PCRE versions 7.0 and later contain |
with respect to Perl 5.8, though PCRE versions 7.0 and later contain |
2862 |
some features that are expected to be in the forthcoming Perl 5.10. |
some features that are in Perl 5.10. |
2863 |
|
|
2864 |
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details |
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details |
2865 |
of what it does have are given in the section on UTF-8 support in the |
of what it does have are given in the section on UTF-8 support in the |
2891 |
is built with Unicode character property support. The properties that |
is built with Unicode character property support. The properties that |
2892 |
can be tested with \p and \P are limited to the general category prop- |
can be tested with \p and \P are limited to the general category prop- |
2893 |
erties such as Lu and Nd, script names such as Greek or Han, and the |
erties such as Lu and Nd, script names such as Greek or Han, and the |
2894 |
derived properties Any and L&. |
derived properties Any and L&. PCRE does support the Cs (surrogate) |
2895 |
|
property, which Perl does not; the Perl documentation says "Because |
2896 |
|
Perl hides the need for the user to understand the internal representa- |
2897 |
|
tion of Unicode characters, there is no need to implement the somewhat |
2898 |
|
messy concept of surrogates." |
2899 |
|
|
2900 |
7. PCRE does support the \Q...\E escape for quoting substrings. Charac- |
7. PCRE does support the \Q...\E escape for quoting substrings. Charac- |
2901 |
ters in between are treated as literals. This is slightly different |
ters in between are treated as literals. This is slightly different |
2915 |
|
|
2916 |
8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) |
8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) |
2917 |
constructions. However, there is support for recursive patterns. This |
constructions. However, there is support for recursive patterns. This |
2918 |
is not available in Perl 5.8, but will be in Perl 5.10. Also, the PCRE |
is not available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE |
2919 |
"callout" feature allows an external function to be called during pat- |
"callout" feature allows an external function to be called during pat- |
2920 |
tern matching. See the pcrecallout documentation for details. |
tern matching. See the pcrecallout documentation for details. |
2921 |
|
|
2922 |
9. Subpatterns that are called recursively or as "subroutines" are |
9. Subpatterns that are called recursively or as "subroutines" are |
2923 |
always treated as atomic groups in PCRE. This is like Python, but |
always treated as atomic groups in PCRE. This is like Python, but |
2924 |
unlike Perl. |
unlike Perl. There is a discussion of an example that explains this in |
2925 |
|
more detail in the section on recursion differences from Perl in the |
2926 |
|
pcrepattern page. |
2927 |
|
|
2928 |
10. There are some differences that are concerned with the settings of |
10. There are some differences that are concerned with the settings of |
2929 |
captured strings when part of a pattern is repeated. For example, |
captured strings when part of a pattern is repeated. For example, |
2932 |
|
|
2933 |
11. PCRE does support Perl 5.10's backtracking verbs (*ACCEPT), |
11. PCRE does support Perl 5.10's backtracking verbs (*ACCEPT), |
2934 |
(*FAIL), (*F), (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in |
(*FAIL), (*F), (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in |
2935 |
the forms without an argument. PCRE does not support (*MARK). If |
the forms without an argument. PCRE does not support (*MARK). |
|
(*ACCEPT) is within capturing parentheses, PCRE does not set that cap- |
|
|
ture group; this is different to Perl. |
|
2936 |
|
|
2937 |
12. PCRE provides some extensions to the Perl regular expression facil- |
12. PCRE provides some extensions to the Perl regular expression facil- |
2938 |
ities. Perl 5.10 will include new features that are not in earlier |
ities. Perl 5.10 will include new features that are not in earlier |
2957 |
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be |
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be |
2958 |
tried only at the first matching position in the subject string. |
tried only at the first matching position in the subject string. |
2959 |
|
|
2960 |
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAP- |
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, |
2961 |
TURE options for pcre_exec() have no Perl equivalents. |
and PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no Perl equiva- |
2962 |
|
lents. |
2963 |
|
|
2964 |
(g) The \R escape sequence can be restricted to match only CR, LF, or |
(g) The \R escape sequence can be restricted to match only CR, LF, or |
2965 |
CRLF by the PCRE_BSR_ANYCRLF option. |
CRLF by the PCRE_BSR_ANYCRLF option. |
2966 |
|
|
2967 |
(h) The callout facility is PCRE-specific. |
(h) The callout facility is PCRE-specific. |
2971 |
(j) Patterns compiled by PCRE can be saved and re-used at a later time, |
(j) Patterns compiled by PCRE can be saved and re-used at a later time, |
2972 |
even on different hosts that have the other endianness. |
even on different hosts that have the other endianness. |
2973 |
|
|
2974 |
(k) The alternative matching function (pcre_dfa_exec()) matches in a |
(k) The alternative matching function (pcre_dfa_exec()) matches in a |
2975 |
different way and is not Perl-compatible. |
different way and is not Perl-compatible. |
2976 |
|
|
2977 |
(l) PCRE recognizes some special sequences such as (*CR) at the start |
(l) PCRE recognizes some special sequences such as (*CR) at the start |
2978 |
of a pattern that set overall options that cannot be changed within the |
of a pattern that set overall options that cannot be changed within the |
2979 |
pattern. |
pattern. |
2980 |
|
|
2988 |
|
|
2989 |
REVISION |
REVISION |
2990 |
|
|
2991 |
Last updated: 11 September 2007 |
Last updated: 18 September 2009 |
2992 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
2993 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
2994 |
|
|
2995 |
|
|
2996 |
PCREPATTERN(3) PCREPATTERN(3) |
PCREPATTERN(3) PCREPATTERN(3) |
2997 |
|
|
2998 |
|
|
3020 |
The original operation of PCRE was on strings of one-byte characters. |
The original operation of PCRE was on strings of one-byte characters. |
3021 |
However, there is now also support for UTF-8 character strings. To use |
However, there is now also support for UTF-8 character strings. To use |
3022 |
this, you must build PCRE to include UTF-8 support, and then call |
this, you must build PCRE to include UTF-8 support, and then call |
3023 |
pcre_compile() with the PCRE_UTF8 option. How this affects pattern |
pcre_compile() with the PCRE_UTF8 option. There is also a special |
3024 |
matching is mentioned in several places below. There is also a summary |
sequence that can be given at the start of a pattern: |
3025 |
of UTF-8 features in the section on UTF-8 support in the main pcre |
|
3026 |
page. |
(*UTF8) |
3027 |
|
|
3028 |
|
Starting a pattern with this sequence is equivalent to setting the |
3029 |
|
PCRE_UTF8 option. This feature is not Perl-compatible. How setting |
3030 |
|
UTF-8 mode affects pattern matching is mentioned in several places |
3031 |
|
below. There is also a summary of UTF-8 features in the section on |
3032 |
|
UTF-8 support in the main pcre page. |
3033 |
|
|
3034 |
The remainder of this document discusses the patterns that are sup- |
The remainder of this document discusses the patterns that are sup- |
3035 |
ported by PCRE when its main matching function, pcre_exec(), is used. |
ported by PCRE when its main matching function, pcre_exec(), is used. |
3507 |
U+D800 to U+DFFF. Such characters are not valid in UTF-8 strings (see |
U+D800 to U+DFFF. Such characters are not valid in UTF-8 strings (see |
3508 |
RFC 3629) and so cannot be tested by PCRE, unless UTF-8 validity check- |
RFC 3629) and so cannot be tested by PCRE, unless UTF-8 validity check- |
3509 |
ing has been turned off (see the discussion of PCRE_NO_UTF8_CHECK in |
ing has been turned off (see the discussion of PCRE_NO_UTF8_CHECK in |
3510 |
the pcreapi page). |
the pcreapi page). Perl does not support the Cs property. |
3511 |
|
|
3512 |
The long synonyms for these properties that Perl supports (such as |
The long synonyms for property names that Perl supports (such as |
3513 |
\p{Letter}) are not supported by PCRE, nor is it permitted to prefix |
\p{Letter}) are not supported by PCRE, nor is it permitted to prefix |
3514 |
any of these properties with "Is". |
any of these properties with "Is". |
3515 |
|
|
3875 |
can be changed in the same way as the Perl-compatible options by using |
can be changed in the same way as the Perl-compatible options by using |
3876 |
the characters J, U and X respectively. |
the characters J, U and X respectively. |
3877 |
|
|
3878 |
When an option change occurs at top level (that is, not inside subpat- |
When one of these option changes occurs at top level (that is, not |
3879 |
tern parentheses), the change applies to the remainder of the pattern |
inside subpattern parentheses), the change applies to the remainder of |
3880 |
that follows. If the change is placed right at the start of a pattern, |
the pattern that follows. If the change is placed right at the start of |
3881 |
PCRE extracts it into the global options (and it will therefore show up |
a pattern, PCRE extracts it into the global options (and it will there- |
3882 |
in data extracted by the pcre_fullinfo() function). |
fore show up in data extracted by the pcre_fullinfo() function). |
3883 |
|
|
3884 |
An option change within a subpattern (see below for a description of |
An option change within a subpattern (see below for a description of |
3885 |
subpatterns) affects only that part of the current pattern that follows |
subpatterns) affects only that part of the current pattern that follows |
3902 |
|
|
3903 |
Note: There are other PCRE-specific options that can be set by the |
Note: There are other PCRE-specific options that can be set by the |
3904 |
application when the compile or match functions are called. In some |
application when the compile or match functions are called. In some |
3905 |
cases the pattern can contain special leading sequences to override |
cases the pattern can contain special leading sequences such as (*CRLF) |
3906 |
what the application has set or what has been defaulted. Details are |
to override what the application has set or what has been defaulted. |
3907 |
given in the section entitled "Newline sequences" above. |
Details are given in the section entitled "Newline sequences" above. |
3908 |
|
There is also the (*UTF8) leading sequence that can be used to set |
3909 |
|
UTF-8 mode; this is equivalent to setting the PCRE_UTF8 option. |
3910 |
|
|
3911 |
|
|
3912 |
SUBPATTERNS |
SUBPATTERNS |
4734 |
Obviously, PCRE cannot support the interpolation of Perl code. Instead, |
Obviously, PCRE cannot support the interpolation of Perl code. Instead, |
4735 |
it supports special syntax for recursion of the entire pattern, and |
it supports special syntax for recursion of the entire pattern, and |
4736 |
also for individual subpattern recursion. After its introduction in |
also for individual subpattern recursion. After its introduction in |
4737 |
PCRE and Python, this kind of recursion was introduced into Perl at |
PCRE and Python, this kind of recursion was subsequently introduced |
4738 |
release 5.10. |
into Perl at release 5.10. |
4739 |
|
|
4740 |
A special item that consists of (? followed by a number greater than |
A special item that consists of (? followed by a number greater than |
4741 |
zero and a closing parenthesis is a recursive call of the subpattern of |
zero and a closing parenthesis is a recursive call of the subpattern of |
4744 |
tion.) The special item (?R) or (?0) is a recursive call of the entire |
tion.) The special item (?R) or (?0) is a recursive call of the entire |
4745 |
regular expression. |
regular expression. |
4746 |
|
|
4747 |
In PCRE (like Python, but unlike Perl), a recursive subpattern call is |
This PCRE pattern solves the nested parentheses problem (assume the |
|
always treated as an atomic group. That is, once it has matched some of |
|
|
the subject string, it is never re-entered, even if it contains untried |
|
|
alternatives and there is a subsequent matching failure. |
|
|
|
|
|
This PCRE pattern solves the nested parentheses problem (assume the |
|
4748 |
PCRE_EXTENDED option is set so that white space is ignored): |
PCRE_EXTENDED option is set so that white space is ignored): |
4749 |
|
|
4750 |
\( ( (?>[^()]+) | (?R) )* \) |
\( ( (?>[^()]+) | (?R) )* \) |
4751 |
|
|
4752 |
First it matches an opening parenthesis. Then it matches any number of |
First it matches an opening parenthesis. Then it matches any number of |
4753 |
substrings which can either be a sequence of non-parentheses, or a |
substrings which can either be a sequence of non-parentheses, or a |
4754 |
recursive match of the pattern itself (that is, a correctly parenthe- |
recursive match of the pattern itself (that is, a correctly parenthe- |
4755 |
sized substring). Finally there is a closing parenthesis. |
sized substring). Finally there is a closing parenthesis. |
4756 |
|
|
4757 |
If this were part of a larger pattern, you would not want to recurse |
If this were part of a larger pattern, you would not want to recurse |
4758 |
the entire pattern, so instead you could use this: |
the entire pattern, so instead you could use this: |
4759 |
|
|
4760 |
( \( ( (?>[^()]+) | (?1) )* \) ) |
( \( ( (?>[^()]+) | (?1) )* \) ) |
4761 |
|
|
4762 |
We have put the pattern into parentheses, and caused the recursion to |
We have put the pattern into parentheses, and caused the recursion to |
4763 |
refer to them instead of the whole pattern. |
refer to them instead of the whole pattern. |
4764 |
|
|
4765 |
In a larger pattern, keeping track of parenthesis numbers can be |
In a larger pattern, keeping track of parenthesis numbers can be |
4766 |
tricky. This is made easier by the use of relative references. (A Perl |
tricky. This is made easier by the use of relative references. (A Perl |
4767 |
5.10 feature.) Instead of (?1) in the pattern above you can write |
5.10 feature.) Instead of (?1) in the pattern above you can write |
4768 |
(?-2) to refer to the second most recently opened parentheses preceding |
(?-2) to refer to the second most recently opened parentheses preceding |
4769 |
the recursion. In other words, a negative number counts capturing |
the recursion. In other words, a negative number counts capturing |
4770 |
parentheses leftwards from the point at which it is encountered. |
parentheses leftwards from the point at which it is encountered. |
4771 |
|
|
4772 |
It is also possible to refer to subsequently opened parentheses, by |
It is also possible to refer to subsequently opened parentheses, by |
4773 |
writing references such as (?+2). However, these cannot be recursive |
writing references such as (?+2). However, these cannot be recursive |
4774 |
because the reference is not inside the parentheses that are refer- |
because the reference is not inside the parentheses that are refer- |
4775 |
enced. They are always "subroutine" calls, as described in the next |
enced. They are always "subroutine" calls, as described in the next |
4776 |
section. |
section. |
4777 |
|
|
4778 |
An alternative approach is to use named parentheses instead. The Perl |
An alternative approach is to use named parentheses instead. The Perl |
4779 |
syntax for this is (?&name); PCRE's earlier syntax (?P>name) is also |
syntax for this is (?&name); PCRE's earlier syntax (?P>name) is also |
4780 |
supported. We could rewrite the above example as follows: |
supported. We could rewrite the above example as follows: |
4781 |
|
|
4782 |
(?<pn> \( ( (?>[^()]+) | (?&pn) )* \) ) |
(?<pn> \( ( (?>[^()]+) | (?&pn) )* \) ) |
4783 |
|
|
4784 |
If there is more than one subpattern with the same name, the earliest |
If there is more than one subpattern with the same name, the earliest |
4785 |
one is used. |
one is used. |
4786 |
|
|
4787 |
This particular example pattern that we have been looking at contains |
This particular example pattern that we have been looking at contains |
4788 |
nested unlimited repeats, and so the use of atomic grouping for match- |
nested unlimited repeats, and so the use of atomic grouping for match- |
4789 |
ing strings of non-parentheses is important when applying the pattern |
ing strings of non-parentheses is important when applying the pattern |
4790 |
to strings that do not match. For example, when this pattern is applied |
to strings that do not match. For example, when this pattern is applied |
4791 |
to |
to |
4792 |
|
|
4793 |
(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() |
(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() |
4794 |
|
|
4795 |
it yields "no match" quickly. However, if atomic grouping is not used, |
it yields "no match" quickly. However, if atomic grouping is not used, |
4796 |
the match runs for a very long time indeed because there are so many |
the match runs for a very long time indeed because there are so many |
4797 |
different ways the + and * repeats can carve up the subject, and all |
different ways the + and * repeats can carve up the subject, and all |
4798 |
have to be tested before failure can be reported. |
have to be tested before failure can be reported. |
4799 |
|
|
4800 |
At the end of a match, the values set for any capturing subpatterns are |
At the end of a match, the values set for any capturing subpatterns are |
4801 |
those from the outermost level of the recursion at which the subpattern |
those from the outermost level of the recursion at which the subpattern |
4802 |
value is set. If you want to obtain intermediate values, a callout |
value is set. If you want to obtain intermediate values, a callout |
4803 |
function can be used (see below and the pcrecallout documentation). If |
function can be used (see below and the pcrecallout documentation). If |
4804 |
the pattern above is matched against |
the pattern above is matched against |
4805 |
|
|
4806 |
(ab(cd)ef) |
(ab(cd)ef) |
4807 |
|
|
4808 |
the value for the capturing parentheses is "ef", which is the last |
the value for the capturing parentheses is "ef", which is the last |
4809 |
value taken on at the top level. If additional parentheses are added, |
value taken on at the top level. If additional parentheses are added, |
4810 |
giving |
giving |
4811 |
|
|
4812 |
\( ( ( (?>[^()]+) | (?R) )* ) \) |
\( ( ( (?>[^()]+) | (?R) )* ) \) |
4813 |
^ ^ |
^ ^ |
4814 |
^ ^ |
^ ^ |
4815 |
|
|
4816 |
the string they capture is "ab(cd)ef", the contents of the top level |
the string they capture is "ab(cd)ef", the contents of the top level |
4817 |
parentheses. If there are more than 15 capturing parentheses in a pat- |
parentheses. If there are more than 15 capturing parentheses in a pat- |
4818 |
tern, PCRE has to obtain extra memory to store data during a recursion, |
tern, PCRE has to obtain extra memory to store data during a recursion, |
4819 |
which it does by using pcre_malloc, freeing it via pcre_free after- |
which it does by using pcre_malloc, freeing it via pcre_free after- |
4820 |
wards. If no memory can be obtained, the match fails with the |
wards. If no memory can be obtained, the match fails with the |
4821 |
PCRE_ERROR_NOMEMORY error. |
PCRE_ERROR_NOMEMORY error. |
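As a rough illustration, and not PCRE's own code, the Python sketch below hand-codes the logic of the pattern \( ( (?>[^()]+) | (?R) )* \), anchored at a single starting position. The function name match_parens is hypothetical. Each run of non-parentheses is consumed whole and never given back, which mirrors the atomic group. Because of that, a failing subject such as the long "(aaa...()" string is rejected in one left-to-right pass. The sketch also reports what capturing group 1 would hold. It follows the rule stated above that the value kept is the one from the outermost level at which the group was set.

```python
from typing import Optional, Tuple


def match_parens(s: str, i: int = 0) -> Optional[Tuple[int, Optional[str]]]:
    r"""Hand-coded model of  \( ( (?>[^()]+) | (?R) )* \)  anchored at s[i].

    Returns (end, group1) on success. end is the index just past the closing
    parenthesis, and group1 is the text of the last top-level iteration (what
    capturing group 1 would hold). Returns None on failure.
    """
    if i >= len(s) or s[i] != "(":
        return None                       # \( must match first
    pos, group1 = i + 1, None
    while pos < len(s):
        ch = s[pos]
        if ch == ")":
            return pos + 1, group1        # \) ends the match
        if ch == "(":
            inner = match_parens(s, pos)  # the (?R) alternative
            if inner is None:
                return None               # neither alternative fits here
            # Only the outer-level value of group 1 survives the recursion.
            group1, pos = s[pos:inner[0]], inner[0]
        else:
            # (?>[^()]+): take the whole run and never give any of it back.
            start = pos
            while pos < len(s) and s[pos] not in "()":
                pos += 1
            group1 = s[start:pos]
    return None                           # ran out of subject before \)


subject = "(ab(cd)ef)"
end, last = match_parens(subject)
print(last)                          # ef        (group 1 of the first pattern)
print(subject[1:end - 1])            # ab(cd)ef  (the extra outer group)
print(match_parens("(" + "a" * 50 + "()"))   # None, found without backtracking
```

The extra outer group in the second pattern always spans everything between the outermost parentheses, which is why the sketch can recover it by slicing.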
4822 |
|
|
4823 |
Do not confuse the (?R) item with the condition (R), which tests for |
Do not confuse the (?R) item with the condition (R), which tests for |
4824 |
recursion. Consider this pattern, which matches text in angle brack- |
recursion. Consider this pattern, which matches text in angle brack- |
4825 |
ets, allowing for arbitrary nesting. Only digits are allowed in nested |
ets, allowing for arbitrary nesting. Only digits are allowed in nested |
4826 |
brackets (that is, when recursing), whereas any characters are permit- |
brackets (that is, when recursing), whereas any characters are permit- |
4827 |
ted at the outer level. |
ted at the outer level. |
4828 |
|
|
4829 |
< (?: (?(R) \d++ | [^<>]*+) | (?R)) * > |
< (?: (?(R) \d++ | [^<>]*+) | (?R)) * > |
4830 |
|
|
4831 |
In this pattern, (?(R) is the start of a conditional subpattern, with |
In this pattern, (?(R) is the start of a conditional subpattern, with |
4832 |
two different alternatives for the recursive and non-recursive cases. |
two different alternatives for the recursive and non-recursive cases. |
4833 |
The (?R) item is the actual recursive call. |
The (?R) item is the actual recursive call. |
4834 |
|
|
4835 |
|
Recursion difference from Perl |
4836 |
|
|
4837 |
|
In PCRE (like Python, but unlike Perl), a recursive subpattern call is |
4838 |
|
always treated as an atomic group. That is, once it has matched some of |
4839 |
|
the subject string, it is never re-entered, even if it contains untried |
4840 |
|
alternatives and there is a subsequent matching failure. This can be |
4841 |
|
illustrated by the following pattern, which purports to match a palin- |
4842 |
|
dromic string that contains an odd number of characters (for example, |
4843 |
|
"a", "aba", "abcba", "abcdcba"): |
4844 |
|
|
4845 |
|
^(.|(.)(?1)\2)$ |
4846 |
|
|
4847 |
|
The idea is that it either matches a single character, or two identical |
4848 |
|
characters surrounding a sub-palindrome. In Perl, this pattern works; |
4849 |
|
in PCRE it does not work if the subject string is longer than three characters. |
4850 |
|
Consider the subject string "abcba": |
4851 |
|
|
4852 |
|
At the top level, the first character is matched, but as it is not at |
4853 |
|
the end of the string, the first alternative fails; the second alterna- |
4854 |
|
tive is taken and the recursion kicks in. The recursive call to subpat- |
4855 |
|
tern 1 successfully matches the next character ("b"). (Note that the |
4856 |
|
beginning and end of line tests are not part of the recursion.) |
4857 |
|
|
4858 |
|
Back at the top level, the next character ("c") is compared with what |
4859 |
|
subpattern 2 matched, which was "a". This fails. Because the recursion |
4860 |
|
is treated as an atomic group, there are now no backtracking points, |
4861 |
|
and so the entire match fails. (Perl is able, at this point, to re- |
4862 |
|
enter the recursion and try the second alternative.) However, if the |
4863 |
|
pattern is written with the alternatives in the other order, things are |
4864 |
|
different: |
4865 |
|
|
4866 |
|
^((.)(?1)\2|.)$ |
4867 |
|
|
4868 |
|
This time, the recursing alternative is tried first, and continues to |
4869 |
|
recurse until it runs out of characters, at which point the recursion |
4870 |
|
fails. But this time we do have another alternative to try at the |
4871 |
|
higher level. That is the big difference: in the previous case the |
4872 |
|
remaining alternative is at a deeper recursion level, which PCRE cannot |
4873 |
|
use. |
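The difference can be modelled in a few lines. The Python sketch below is a toy model and not how PCRE is implemented. It treats subpattern 1 as a generator of possible end positions, produced in the order a backtracking matcher would try them. With atomic=True, a recursive (?1) call contributes only its first successful end position, as described above for PCRE. With atomic=False the call can be re-entered for further positions, which is roughly what Perl does. The names group1 and matches, and both flags, are hypothetical. The character captured by (.) is kept local to each level, matching the walk-through above in which "c" is compared with the top-level "a".

```python
from typing import Iterator


def group1(s: str, i: int, atomic: bool, recurse_first: bool) -> Iterator[int]:
    r"""Yield end positions for subpattern 1 starting at s[i], in try order.

    recurse_first=False models (.|(.)(?1)\2); True models ((.)(?1)\2|.).
    """
    def single() -> Iterator[int]:
        if i < len(s):
            yield i + 1                    # the "." alternative

    def recursive() -> Iterator[int]:
        if i >= len(s):
            return
        c = s[i]                           # (.) captures subpattern 2
        ends = group1(s, i + 1, atomic, recurse_first)   # the (?1) call
        if atomic:
            first = next(ends, None)       # PCRE: keep only the first success
            ends = iter([] if first is None else [first])
        for j in ends:
            if j < len(s) and s[j] == c:   # \2 must match the same character
                yield j + 1

    order = (recursive, single) if recurse_first else (single, recursive)
    for alternative in order:
        yield from alternative()


def matches(s: str, atomic: bool, recurse_first: bool) -> bool:
    # ^(...)$: the top level is not a recursive call, so every end position
    # it can produce is tried until one reaches the end of the subject.
    return any(end == len(s) for end in group1(s, 0, atomic, recurse_first))


for s in ("aba", "abcba"):
    print(s,
          matches(s, atomic=True, recurse_first=False),   # PCRE, first pattern
          matches(s, atomic=False, recurse_first=False),  # Perl-like
          matches(s, atomic=True, recurse_first=True))    # PCRE, reordered
# aba True True True
# abcba False True True
```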
4874 |
|
|
4875 |
|
To change the pattern so that it matches all palindromic strings, not just |
4876 |
|
those with an odd number of characters, it is tempting to change the |
4877 |
|
pattern to this: |
4878 |
|
|
4879 |
|
^((.)(?1)\2|.?)$ |
4880 |
|
|
4881 |
|
Again, this works in Perl, but not in PCRE, and for the same reason. |
4882 |
|
When a deeper recursion has matched a single character, it cannot be |
4883 |
|
entered again in order to match an empty string. The solution is to |
4884 |
|
separate the two cases, and write out the odd and even cases as alter- |
4885 |
|
natives at the higher level: |
4886 |
|
|
4887 |
|
^(?:((.)(?1)\2|)|((.)(?3)\4|.))$ |
4888 |
|
|
4889 |
|
If you want to match typical palindromic phrases, the pattern has to |
4890 |
|
ignore all non-word characters, which can be done like this: |
4891 |
|
|
4892 |
|
^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$ |
4893 |
|
|
4894 |
|
If run with the PCRE_CASELESS option, this pattern matches phrases such |
4895 |
|
as "A man, a plan, a canal: Panama!" and it works well in both PCRE and |
4896 |
|
Perl. Note the use of the possessive quantifier *+ to avoid backtrack- |
4897 |
|
ing into sequences of non-word characters. Without this, PCRE takes a |
4898 |
|
great deal longer (ten times or more) to match typical phrases, and |
4899 |
|
Perl takes so long that you think it has gone into a loop. |
4900 |
|
|
4901 |
|
|
4902 |
SUBPATTERNS AS SUBROUTINES |
SUBPATTERNS AS SUBROUTINES |
4903 |
|
|
4904 |
If the syntax for a recursive subpattern reference (either by number or |
If the syntax for a recursive subpattern reference (either by number or |
4905 |
by name) is used outside the parentheses to which it refers, it oper- |
by name) is used outside the parentheses to which it refers, it oper- |
4906 |
ates like a subroutine in a programming language. The "called" subpat- |
ates like a subroutine in a programming language. The "called" subpat- |
4907 |
tern may be defined before or after the reference. A numbered reference |
tern may be defined before or after the reference. A numbered reference |
4908 |
can be absolute or relative, as in these examples: |
can be absolute or relative, as in these examples: |
4909 |
|
|
4915 |
|
|
4916 |
(sens|respons)e and \1ibility |
(sens|respons)e and \1ibility |
4917 |
|
|
4918 |
matches "sense and sensibility" and "response and responsibility", but |
matches "sense and sensibility" and "response and responsibility", but |
4919 |
not "sense and responsibility". If instead the pattern |
not "sense and responsibility". If instead the pattern |
4920 |
|
|
4921 |
(sens|respons)e and (?1)ibility |
(sens|respons)e and (?1)ibility |
4922 |
|
|
4923 |
is used, it does match "sense and responsibility" as well as the other |
is used, it does match "sense and responsibility" as well as the other |
4924 |
two strings. Another example is given in the discussion of DEFINE |
two strings. Another example is given in the discussion of DEFINE |
4925 |
above. |
above. |
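Python's re module does not accept subroutine calls such as (?1). For this particular example, though, the subroutine call behaves like writing the called subpattern out again in place, here as a non-capturing group. That is enough to show the difference from a back reference. This is an illustrative sketch only. In general a subroutine call is not identical to inlining, because it is treated as an atomic group and its processing options are fixed where the subpattern is defined. The variable names are hypothetical.

```python
import re

# \1 must repeat the text that group 1 actually captured.
backref = re.compile(r"(sens|respons)e and \1ibility")
# Stand-in for (?1): repeat the subpattern itself, so either alternative may match.
subroutine_like = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")

for subject in ("sense and sensibility",
                "response and responsibility",
                "sense and responsibility"):
    print(f"{subject!r}: backref={bool(backref.fullmatch(subject))}, "
          f"subroutine-like={bool(subroutine_like.fullmatch(subject))}")
```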
4926 |
|
|
4927 |
Like recursive subpatterns, a "subroutine" call is always treated as an |
Like recursive subpatterns, a "subroutine" call is always treated as an |
4928 |
atomic group. That is, once it has matched some of the subject string, |
atomic group. That is, once it has matched some of the subject string, |
4929 |
it is never re-entered, even if it contains untried alternatives and |
it is never re-entered, even if it contains untried alternatives and |
4930 |
there is a subsequent matching failure. |
there is a subsequent matching failure. |
4931 |
|
|
4932 |
When a subpattern is used as a subroutine, processing options such as |
When a subpattern is used as a subroutine, processing options such as |
4933 |
case-independence are fixed when the subpattern is defined. They cannot |
case-independence are fixed when the subpattern is defined. They cannot |
4934 |
be changed for different calls. For example, consider this pattern: |
be changed for different calls. For example, consider this pattern: |
4935 |
|
|
4936 |
(abc)(?i:(?-1)) |
(abc)(?i:(?-1)) |
4937 |
|
|
4938 |
It matches "abcabc". It does not match "abcABC" because the change of |
It matches "abcabc". It does not match "abcABC" because the change of |
4939 |
processing option does not affect the called subpattern. |
processing option does not affect the called subpattern. |
4940 |
|
|
4941 |
|
|
4942 |
ONIGURUMA SUBROUTINE SYNTAX |
ONIGURUMA SUBROUTINE SYNTAX |
4943 |
|
|
4944 |
For compatibility with Oniguruma, the non-Perl syntax \g followed by a |
For compatibility with Oniguruma, the non-Perl syntax \g followed by a |
4945 |
name or a number enclosed either in angle brackets or single quotes, is |
name or a number enclosed either in angle brackets or single quotes, is |
4946 |
an alternative syntax for referencing a subpattern as a subroutine, |
an alternative syntax for referencing a subpattern as a subroutine, |
4947 |
possibly recursively. Here are two of the examples used above, rewrit- |
possibly recursively. Here are two of the examples used above, rewrit- |
4948 |
ten using this syntax: |
ten using this syntax: |
4949 |
|
|
4950 |
(?<pn> \( ( (?>[^()]+) | \g<pn> )* \) ) |
(?<pn> \( ( (?>[^()]+) | \g<pn> )* \) ) |
4951 |
(sens|respons)e and \g'1'ibility |
(sens|respons)e and \g'1'ibility |
4952 |
|
|
4953 |
PCRE supports an extension to Oniguruma: if a number is preceded by a |
PCRE supports an extension to Oniguruma: if a number is preceded by a |
4954 |
plus or a minus sign it is taken as a relative reference. For example: |
plus or a minus sign it is taken as a relative reference. For example: |
4955 |
|
|
4956 |
(abc)(?i:\g<-1>) |
(abc)(?i:\g<-1>) |
4957 |
|
|
4958 |
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not |
Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not |
4959 |
synonymous. The former is a back reference; the latter is a subroutine |
synonymous. The former is a back reference; the latter is a subroutine |
4960 |
call. |
call. |
4961 |
|
|
4962 |
|
|
4963 |
CALLOUTS |
CALLOUTS |
4964 |
|
|
4965 |
Perl has a feature whereby using the sequence (?{...}) causes arbitrary |
Perl has a feature whereby using the sequence (?{...}) causes arbitrary |
4966 |
Perl code to be obeyed in the middle of matching a regular expression. |
Perl code to be obeyed in the middle of matching a regular expression. |
4967 |
This makes it possible, amongst other things, to extract different sub- |
This makes it possible, amongst other things, to extract different sub- |
4968 |
strings that match the same pair of parentheses when there is a repeti- |
strings that match the same pair of parentheses when there is a repeti- |
4969 |
tion. |
tion. |
4970 |
|
|
4971 |
PCRE provides a similar feature, but of course it cannot obey arbitrary |
PCRE provides a similar feature, but of course it cannot obey arbitrary |
4972 |
Perl code. The feature is called "callout". The caller of PCRE provides |
Perl code. The feature is called "callout". The caller of PCRE provides |
4973 |
an external function by putting its entry point in the global variable |
an external function by putting its entry point in the global variable |
4974 |
pcre_callout. By default, this variable contains NULL, which disables |
pcre_callout. By default, this variable contains NULL, which disables |
4975 |
all calling out. |
all calling out. |
4976 |
|
|
4977 |
Within a regular expression, (?C) indicates the points at which the |
Within a regular expression, (?C) indicates the points at which the |
4978 |
external function is to be called. If you want to identify different |
external function is to be called. If you want to identify different |
4979 |
callout points, you can put a number less than 256 after the letter C. |
callout points, you can put a number less than 256 after the letter C. |
4980 |
The default value is zero. For example, this pattern has two callout |
The default value is zero. For example, this pattern has two callout |
4981 |
points: |
points: |
4982 |
|
|
4983 |
(?C1)abc(?C2)def |
(?C1)abc(?C2)def |
4984 |
|
|
4985 |
If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are |
If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are |
4986 |
automatically installed before each item in the pattern. They are all |
automatically installed before each item in the pattern. They are all |
4987 |
numbered 255. |
numbered 255. |
4988 |
|
|
4989 |
During matching, when PCRE reaches a callout point (and pcre_callout is |
During matching, when PCRE reaches a callout point (and pcre_callout is |
4990 |
set), the external function is called. It is provided with the number |
set), the external function is called. It is provided with the number |
4991 |
of the callout, the position in the pattern, and, optionally, one item |
of the callout, the position in the pattern, and, optionally, one item |
4992 |
of data originally supplied by the caller of pcre_exec(). The callout |
of data originally supplied by the caller of pcre_exec(). The callout |
4993 |
function may cause matching to proceed, to backtrack, or to fail alto- |
function may cause matching to proceed, to backtrack, or to fail alto- |
4994 |
gether. A complete description of the interface to the callout function |
gether. A complete description of the interface to the callout function |
4995 |
is given in the pcrecallout documentation. |
is given in the pcrecallout documentation. |
4996 |
|
|
4997 |
|
|
4998 |
BACKTRACKING CONTROL |
BACKTRACKING CONTROL |
4999 |
|
|
5000 |
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", |
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", |
5001 |
which are described in the Perl documentation as "experimental and sub- |
which are described in the Perl documentation as "experimental and sub- |
5002 |
ject to change or removal in a future version of Perl". It goes on to |
ject to change or removal in a future version of Perl". It goes on to |
5003 |
say: "Their usage in production code should be noted to avoid problems |
say: "Their usage in production code should be noted to avoid problems |
5004 |
during upgrades." The same remarks apply to the PCRE features described |
during upgrades." The same remarks apply to the PCRE features described |
5005 |
in this section. |
in this section. |
5006 |
|
|
5007 |
Since these verbs are specifically related to backtracking, most of |
Since these verbs are specifically related to backtracking, most of |
5008 |
them can be used only when the pattern is to be matched using |
them can be used only when the pattern is to be matched using |
5009 |
pcre_exec(), which uses a backtracking algorithm. With the exception of |
pcre_exec(), which uses a backtracking algorithm. With the exception of |
5010 |
(*FAIL), which behaves like a failing negative assertion, they cause an |
(*FAIL), which behaves like a failing negative assertion, they cause an |
5011 |
error if encountered by pcre_dfa_exec(). |
error if encountered by pcre_dfa_exec(). |
5012 |
|
|
5013 |
|
If any of these verbs are used in an assertion subpattern, their effect |
5014 |
|
is confined to that subpattern; it does not extend to the surrounding |
5015 |
|
pattern. Note that assertion subpatterns are processed as anchored at |
5016 |
|
the point where they are tested. |
5017 |
|
|
5018 |
The new verbs make use of what was previously invalid syntax: an open- |
The new verbs make use of what was previously invalid syntax: an open- |
5019 |
ing parenthesis followed by an asterisk. In Perl, they are generally of |
ing parenthesis followed by an asterisk. In Perl, they are generally of |
5020 |
the form (*VERB:ARG) but PCRE does not support the use of arguments, so |
the form (*VERB:ARG) but PCRE does not support the use of arguments, so |
5029 |
|
|
5030 |
This verb causes the match to end successfully, skipping the remainder |
This verb causes the match to end successfully, skipping the remainder |
5031 |
of the pattern. When inside a recursion, only the innermost pattern is |
of the pattern. When inside a recursion, only the innermost pattern is |
5032 |
ended immediately. PCRE differs from Perl in what happens if the |
ended immediately. If the (*ACCEPT) is inside capturing parentheses, |
5033 |
(*ACCEPT) is inside capturing parentheses. In Perl, the data so far is |
the data so far is captured. (This feature was added to PCRE at release |
5034 |
captured: in PCRE no data is captured. For example: |
8.00.) For example: |
5035 |
|
|
5036 |
A(A|B(*ACCEPT)|C)D |
A((?:A|B(*ACCEPT)|C)D) |
5037 |
|
|
5038 |
This matches "AB", "AAD", or "ACD", but when it matches "AB", no data |
This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap- |
5039 |
is captured. |
tured by the outer parentheses. |
5040 |
|
|
5041 |
(*FAIL) or (*F) |
(*FAIL) or (*F) |
5042 |
|
|
5132 |
|
|
5133 |
REVISION |
REVISION |
5134 |
|
|
5135 |
Last updated: 18 March 2009 |
Last updated: 18 September 2009 |
5136 |
Copyright (c) 1997-2009 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
5137 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
5138 |
|
|
5139 |
|
|
5140 |
PCRESYNTAX(3) PCRESYNTAX(3) |
PCRESYNTAX(3) PCRESYNTAX(3) |
5141 |
|
|
5142 |
|
|
5245 |
SCRIPT NAMES FOR \p AND \P |
SCRIPT NAMES FOR \p AND \P |
5246 |
|
|
5247 |
Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, |
Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, |
5248 |
Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic, Cuneiform, |
Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cu- |
5249 |
Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, |
neiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, |
5250 |
Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira- |
Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, |
5251 |
gana, Inherited, Kannada, Katakana, Kharoshthi, Khmer, Lao, Latin, |
Hebrew, Hiragana, Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi, |
5252 |
Limbu, Linear_B, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko, |
Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian, Malayalam, |
5253 |
Ogham, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician, |
Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian, |
5254 |
Runic, Shavian, Sinhala, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, |
Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurash- |
5255 |
Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi. |
tra, Shavian, Sinhala, Sudanese, Syloti_Nagri, Syriac, Tagalog, Tag- |
5256 |
|
banwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, |
5257 |
|
Ugaritic, Vai, Yi. |
5258 |
|
|
5259 |
|
|
5260 |
CHARACTER CLASSES |
CHARACTER CLASSES |
5306 |
|
|
5307 |
ANCHORS AND SIMPLE ASSERTIONS |
ANCHORS AND SIMPLE ASSERTIONS |
5308 |
|
|
5309 |
\b word boundary |
\b word boundary (only ASCII letters recognized) |
5310 |
\B not a word boundary |
\B not a word boundary |
5311 |
^ start of subject |
^ start of subject |
5312 |
also after internal newline in multiline mode |
also after internal newline in multiline mode |
5332 |
|
|
5333 |
CAPTURING |
CAPTURING |
5334 |
|
|
5335 |
(...) capturing group |
(...) capturing group |
5336 |
(?<name>...) named capturing group (Perl) |
(?<name>...) named capturing group (Perl) |
5337 |
(?'name'...) named capturing group (Perl) |
(?'name'...) named capturing group (Perl) |
5338 |
(?P<name>...) named capturing group (Python) |
(?P<name>...) named capturing group (Python) |
5339 |
(?:...) non-capturing group |
(?:...) non-capturing group |
5340 |
(?|...) non-capturing group; reset group numbers for |
(?|...) non-capturing group; reset group numbers for |
5341 |
capturing groups in each alternative |
capturing groups in each alternative |
5342 |
|
|
5343 |
|
|
5344 |
ATOMIC GROUPS |
ATOMIC GROUPS |
5345 |
|
|
5346 |
(?>...) atomic, non-capturing group |
(?>...) atomic, non-capturing group |
5347 |
|
|
5348 |
|
|
5349 |
COMMENT |
COMMENT |
5350 |
|
|
5351 |
(?#....) comment (not nestable) |
(?#....) comment (not nestable) |
5352 |
|
|
5353 |
|
|
5354 |
OPTION SETTING |
OPTION SETTING |
5355 |
|
|
5356 |
(?i) caseless |
(?i) caseless |
5357 |
(?J) allow duplicate names |
(?J) allow duplicate names |
5358 |
(?m) multiline |
(?m) multiline |
5359 |
(?s) single line (dotall) |
(?s) single line (dotall) |
5360 |
(?U) default ungreedy (lazy) |
(?U) default ungreedy (lazy) |
5361 |
(?x) extended (ignore white space) |
(?x) extended (ignore white space) |
5362 |
(?-...) unset option(s) |
(?-...) unset option(s) |
5363 |
|
|
5364 |
|
The following is recognized only at the start of a pattern or after one |
5365 |
|
of the newline-setting options with similar syntax: |
5366 |
|
|
5367 |
|
(*UTF8) set UTF-8 mode |
5368 |
|
|
5369 |
|
|
5370 |
LOOKAHEAD AND LOOKBEHIND ASSERTIONS |
LOOKAHEAD AND LOOKBEHIND ASSERTIONS |
5371 |
|
|
5372 |
(?=...) positive look ahead |
(?=...) positive look ahead |
5373 |
(?!...) negative look ahead |
(?!...) negative look ahead |
5374 |
(?<=...) positive look behind |
(?<=...) positive look behind |
5375 |
(?<!...) negative look behind |
(?<!...) negative look behind |
5376 |
|
|
5377 |
Each top-level branch of a look behind must be of a fixed length. |
Each top-level branch of a look behind must be of a fixed length. |
5378 |
|
|
5379 |
|
|
5380 |
BACKREFERENCES |
BACKREFERENCES |
5381 |
|
|
5382 |
\n reference by number (can be ambiguous) |
\n reference by number (can be ambiguous) |
5383 |
\gn reference by number |
\gn reference by number |
5384 |
\g{n} reference by number |
\g{n} reference by number |
5385 |
\g{-n} relative reference by number |
\g{-n} relative reference by number |
5386 |
\k<name> reference by name (Perl) |
\k<name> reference by name (Perl) |
5387 |
\k'name' reference by name (Perl) |
\k'name' reference by name (Perl) |
5388 |
\g{name} reference by name (Perl) |
\g{name} reference by name (Perl) |
5389 |
\k{name} reference by name (.NET) |
\k{name} reference by name (.NET) |
5390 |
(?P=name) reference by name (Python) |
(?P=name) reference by name (Python) |
5391 |
|
|
5392 |
|
|
5393 |
SUBROUTINE REFERENCES (POSSIBLY RECURSIVE) |
SUBROUTINE REFERENCES (POSSIBLY RECURSIVE) |
5394 |
|
|
5395 |
(?R) recurse whole pattern |
(?R) recurse whole pattern |
5396 |
(?n) call subpattern by absolute number |
(?n) call subpattern by absolute number |
5397 |
(?+n) call subpattern by relative number |
(?+n) call subpattern by relative number |
5398 |
(?-n) call subpattern by relative number |
(?-n) call subpattern by relative number |
5399 |
(?&name) call subpattern by name (Perl) |
(?&name) call subpattern by name (Perl) |
5400 |
(?P>name) call subpattern by name (Python) |
(?P>name) call subpattern by name (Python) |
5401 |
\g<name> call subpattern by name (Oniguruma) |
\g<name> call subpattern by name (Oniguruma) |
5402 |
\g'name' call subpattern by name (Oniguruma) |
\g'name' call subpattern by name (Oniguruma) |
5403 |
\g<n> call subpattern by absolute number (Oniguruma) |
\g<n> call subpattern by absolute number (Oniguruma) |
5404 |
\g'n' call subpattern by absolute number (Oniguruma) |
\g'n' call subpattern by absolute number (Oniguruma) |
5405 |
\g<+n> call subpattern by relative number (PCRE extension) |
\g<+n> call subpattern by relative number (PCRE extension) |
5406 |
\g'+n' call subpattern by relative number (PCRE extension) |
\g'+n' call subpattern by relative number (PCRE extension) |
5407 |
\g<-n> call subpattern by relative number (PCRE extension) |
\g<-n> call subpattern by relative number (PCRE extension) |
5408 |
\g'-n' call subpattern by relative number (PCRE extension) |
\g'-n' call subpattern by relative number (PCRE extension) |
5409 |
|
|
5410 |
|
|
5411 |
CONDITIONAL PATTERNS |
CONDITIONAL PATTERNS |
5413 |
(?(condition)yes-pattern) |
(?(condition)yes-pattern) |
5414 |
(?(condition)yes-pattern|no-pattern) |
(?(condition)yes-pattern|no-pattern) |
5415 |
|
|
5416 |
(?(n)... absolute reference condition |
(?(n)... absolute reference condition |
5417 |
(?(+n)... relative reference condition |
(?(+n)... relative reference condition |
5418 |
(?(-n)... relative reference condition |
(?(-n)... relative reference condition |
5419 |
(?(<name>)... named reference condition (Perl) |
(?(<name>)... named reference condition (Perl) |
5420 |
(?('name')... named reference condition (Perl) |
(?('name')... named reference condition (Perl) |
5421 |
(?(name)... named reference condition (PCRE) |
(?(name)... named reference condition (PCRE) |
5422 |
(?(R)... overall recursion condition |
(?(R)... overall recursion condition |
5423 |
(?(Rn)... specific group recursion condition |
(?(Rn)... specific group recursion condition |
5424 |
(?(R&name)... specific recursion condition |
(?(R&name)... specific recursion condition |
5425 |
(?(DEFINE)... define subpattern for reference |
(?(DEFINE)... define subpattern for reference |
5426 |
(?(assert)... assertion condition |
(?(assert)... assertion condition |
5427 |
|
|
5428 |
|
|
5429 |
BACKTRACKING CONTROL |
BACKTRACKING CONTROL |
5430 |
|
|
5431 |
The following act immediately they are reached: |
The following act immediately they are reached: |
5432 |
|
|
5433 |
(*ACCEPT) force successful match |
(*ACCEPT) force successful match |
5434 |
(*FAIL) force backtrack; synonym (*F) |
(*FAIL) force backtrack; synonym (*F) |
5435 |
|
|
5436 |
The following act only when a subsequent match failure causes a back- |
The following act only when a subsequent match failure causes a back- |
5437 |
track to reach them. They all force a match failure, but they differ in |
track to reach them. They all force a match failure, but they differ in |
5438 |
what happens afterwards. Those that advance the start-of-match point do |
what happens afterwards. Those that advance the start-of-match point do |
5439 |
so only if the pattern is not anchored. |
so only if the pattern is not anchored. |
5440 |
|
|
5441 |
(*COMMIT) overall failure, no advance of starting point |
(*COMMIT) overall failure, no advance of starting point |
5442 |
(*PRUNE) advance to next starting character |
(*PRUNE) advance to next starting character |
5443 |
(*SKIP) advance start to current matching position |
(*SKIP) advance start to current matching position |
5444 |
(*THEN) local failure, backtrack to next alternation |
(*THEN) local failure, backtrack to next alternation |
5445 |
|
|
5446 |
|
|
5447 |
NEWLINE CONVENTIONS |
NEWLINE CONVENTIONS |
5448 |
|
|
5449 |
These are recognized only at the very start of the pattern or after a |
These are recognized only at the very start of the pattern or after a |
5450 |
(*BSR_...) option. |
(*BSR_...) or (*UTF8) option. |
5451 |
|
|
5452 |
(*CR) |
(*CR) carriage return only |
5453 |
(*LF) |
(*LF) linefeed only |
5454 |
(*CRLF) |
(*CRLF) carriage return followed by linefeed |
5455 |
(*ANYCRLF) |
(*ANYCRLF) all three of the above |
5456 |
(*ANY) |
(*ANY) any Unicode newline sequence |
5457 |
|
|
5458 |
|
|
5459 |
WHAT \R MATCHES |
WHAT \R MATCHES |
5460 |
|
|
5461 |
These are recognized only at the very start of the pattern or after a |
These are recognized only at the very start of the pattern or after a |
5462 |
(*...) option that sets the newline convention. |
(*...) option that sets the newline convention or UTF-8 mode. |
5463 |
|
|
5464 |
(*BSR_ANYCRLF) |
(*BSR_ANYCRLF) CR, LF, or CRLF |
5465 |
(*BSR_UNICODE) |
(*BSR_UNICODE) any Unicode newline sequence |
5466 |
|
|
5467 |
|
|
5468 |
CALLOUTS |
CALLOUTS |
5485 |
|
|
5486 |
REVISION |
REVISION |
5487 |
|
|
5488 |
Last updated: 09 April 2008 |
Last updated: 11 April 2009 |
5489 |
Copyright (c) 1997-2008 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
5490 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
5491 |
|
|
5492 |
|
|
5493 |
PCREPARTIAL(3) PCREPARTIAL(3) |
PCREPARTIAL(3) PCREPARTIAL(3) |
5494 |
|
|
5495 |
|
|
5513 |
|
|
5514 |
If the application sees the user's keystrokes one by one, and can check |
If the application sees the user's keystrokes one by one, and can check |
5515 |
that what has been typed so far is potentially valid, it is able to |
that what has been typed so far is potentially valid, it is able to |
5516 |
raise an error as soon as a mistake is made, possibly beeping and not |
raise an error as soon as a mistake is made, by beeping and not |
5517 |
reflecting the character that has been typed. This immediate feedback |
reflecting the character that has been typed, for example. This immedi- |
5518 |
is likely to be a better user interface than a check that is delayed |
ate feedback is likely to be a better user interface than a check that |
5519 |
until the entire string has been entered. |
is delayed until the entire string has been entered. Partial matching |
5520 |
|
can also sometimes be useful when the subject string is very long and |
5521 |
PCRE supports the concept of partial matching by means of the PCRE_PAR- |
is not all available at once. |
5522 |
TIAL option, which can be set when calling pcre_exec() or |
|
5523 |
pcre_dfa_exec(). When this flag is set for pcre_exec(), the return code |
PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and |
5524 |
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time |
PCRE_PARTIAL_HARD options, which can be set when calling pcre_exec() or |
5525 |
during the matching process the last part of the subject string matched |
pcre_dfa_exec(). For backwards compatibility, PCRE_PARTIAL is a synonym |
5526 |
part of the pattern. Unfortunately, for non-anchored matching, it is |
for PCRE_PARTIAL_SOFT. The essential difference between the two options |
5527 |
not possible to obtain the position of the start of the partial match. |
is whether or not a partial match is preferred to an alternative com- |
5528 |
No captured data is set when PCRE_ERROR_PARTIAL is returned. |
plete match, though the details differ between the two matching func- |
5529 |
|
tions. If both options are set, PCRE_PARTIAL_HARD takes precedence. |
5530 |
When PCRE_PARTIAL is set for pcre_dfa_exec(), the return code |
|
5531 |
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of |
Setting a partial matching option disables one of PCRE's optimizations. |
5532 |
the subject is reached, there have been no complete matches, but there |
PCRE remembers the last literal byte in a pattern, and abandons match- |
5533 |
is still at least one matching possibility. The portion of the string |
ing immediately if such a byte is not present in the subject string. |
5534 |
that provided the partial match is set as the first matching string. |
This optimization cannot be used for a subject string that might match |
5535 |
|
only partially. |
5536 |
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers |
|
5537 |
the last literal byte in a pattern, and abandons matching immediately |
|
5538 |
if such a byte is not present in the subject string. This optimization |
PARTIAL MATCHING USING pcre_exec() |
5539 |
cannot be used for a subject string that might match only partially. |
|
5540 |
|
A partial match occurs during a call to pcre_exec() whenever the end of |
5541 |
|
the subject string is reached successfully, but matching cannot con- |
5542 |
RESTRICTED PATTERNS FOR PCRE_PARTIAL |
tinue because more characters are needed. However, at least one charac- |
5543 |
|
ter must have been matched. (In other words, a partial match can never |
5544 |
Because of the way certain internal optimizations are implemented in |
be an empty string.) |
5545 |
the pcre_exec() function, the PCRE_PARTIAL option cannot be used with |
|
5546 |
all patterns. These restrictions do not apply when pcre_dfa_exec() is |
If PCRE_PARTIAL_SOFT is set, the partial match is remembered, but |
5547 |
used. For pcre_exec(), repeated single characters such as |
matching continues as normal, and other alternatives in the pattern are |
5548 |
|
tried. If no complete match can be found, pcre_exec() returns |
5549 |
a{2,4} |
PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. If there are at least |
5550 |
|
two slots in the offsets vector, the first of them is set to the offset |
5551 |
and repeated single metasequences such as |
of the earliest character that was inspected when the partial match was |
5552 |
|
found. For convenience, the second offset points to the end of the |
5553 |
\d+ |
string so that a substring can easily be extracted. |
5554 |
|
|
5555 |
are not permitted if the maximum number of occurrences is greater than |
For the majority of patterns, the first offset identifies the start of |
5556 |
one. Optional items such as \d? (where the maximum is one) are permit- |
the partially matched string. However, for patterns that contain look- |
5557 |
ted. Quantifiers with any values are permitted after parentheses, so |
behind assertions, or \K, or begin with \b or \B, earlier characters |
5558 |
the invalid examples above can be coded thus: |
have been inspected while carrying out the match. For example: |
5559 |
|
|
5560 |
(a){2,4} |
/(?<=abc)123/ |
5561 |
(\d)+ |
|
5562 |
|
This pattern matches "123", but only if it is preceded by "abc". If the |
5563 |
These constructions run more slowly, but for the kinds of application |
subject string is "xyzabc12", the offsets after a partial match are for |
5564 |
that are envisaged for this facility, this is not felt to be a major |
the substring "abc12", because all these characters are needed if |
5565 |
restriction. |
another match is tried with extra characters added. |
5566 |
|
|
5567 |
If PCRE_PARTIAL is set for a pattern that does not conform to the |
If there is more than one partial match, the first one that was found |
5568 |
restrictions, pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL |
provides the data that is returned. Consider this pattern: |
5569 |
(-13). You can use the PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to |
|
5570 |
find out if a compiled pattern can be used for partial matching. |
/123\w+X|dogY/ |
5571 |
|
|
5572 |
|
If this is matched against the subject string "abc123dog", both alter- |
5573 |
|
natives fail to match, but the end of the subject is reached during |
5574 |
|
matching, so PCRE_ERROR_PARTIAL is returned instead of |
5575 |
|
PCRE_ERROR_NOMATCH. The offsets are set to 3 and 9, identifying |
5576 |
|
"123dog" as the first partial match that was found. (In this example, |
5577 |
|
there are two partial matches, because "dog" on its own partially |
5578 |
|
matches the second alternative.) |
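Because the second offset marks the end of the subject, extracting the partially matched text is a plain copy between the two offsets. A minimal sketch, assuming pcre_exec() has returned PCRE_ERROR_PARTIAL and filled an int offsets vector with at least two slots; copy_partial() is a hypothetical helper, not part of the PCRE API:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the partially matched portion of `subject`,
   described by ovector[0] (start) and ovector[1] (end), into `out` as a
   zero-terminated string. Returns the number of bytes copied, or -1 if
   `out` is too small. */
int copy_partial(const char *subject, const int *ovector,
                 char *out, size_t outsize)
{
  int start = ovector[0];
  int end = ovector[1];
  assert(start >= 0 && end >= start);
  size_t len = (size_t)(end - start);
  if (len + 1 > outsize)            /* leave room for the terminator */
    return -1;
  memcpy(out, subject + start, len);
  out[len] = '\0';
  return (int)len;
}
```

Applied to the subject "abc123dog" with offsets 3 and 9, it copies "123dog".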
5579 |
|
|
5580 |
|
If PCRE_PARTIAL_HARD is set for pcre_exec(), it returns PCRE_ERROR_PAR- |
5581 |
|
TIAL as soon as a partial match is found, without continuing to search |
5582 |
|
for possible complete matches. The difference between the two options |
5583 |
|
can be illustrated by a pattern such as: |
5584 |
|
|
5585 |
|
/dog(sbody)?/ |
5586 |
|
|
5587 |
|
This matches either "dog" or "dogsbody", greedily (that is, it prefers |
5588 |
|
the longer string if possible). If it is matched against the string |
5589 |
|
"dog" with PCRE_PARTIAL_SOFT, it yields a complete match for "dog". |
5590 |
|
However, if PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL. |
5591 |
|
On the other hand, if the pattern is made ungreedy the result is dif- |
5592 |
|
ferent: |
5593 |
|
|
5594 |
|
/dog(sbody)??/ |
5595 |
|
|
5596 |
|
In this case the result is always a complete match because pcre_exec() |
5597 |
|
finds that first, and it never continues after finding a match. It |
5598 |
|
might be easier to follow this explanation by thinking of the two pat- |
5599 |
|
terns like this: |
5600 |
|
|
5601 |
|
/dog(sbody)?/ is the same as /dogsbody|dog/ |
5602 |
|
/dog(sbody)??/ is the same as /dog|dogsbody/ |
5603 |
|
|
5604 |
|
The second pattern will never match "dogsbody" when pcre_exec() is |
5605 |
|
used, because it will always find the shorter match first. |
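The difference between the two options for pcre_exec() can be pictured as a scan over the complete and partial matches in the order the backtracking matcher reaches them. The following is a simplified model for illustration only, not PCRE's implementation; exec_model() and the PM_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for this model; they are not PCRE identifiers. */
typedef enum { PM_EV_COMPLETE, PM_EV_PARTIAL } pm_event;
typedef enum { PM_MATCH, PM_PARTIAL, PM_NOMATCH } pm_result;

/* Model of pcre_exec(): `events` lists, in the order the backtracking
   matcher reaches them, the complete and partial matches it finds.
   A nonzero `hard` models PCRE_PARTIAL_HARD; zero models SOFT. */
pm_result exec_model(const pm_event *events, size_t n, int hard)
{
  assert(events != NULL || n == 0);
  int partial_seen = 0;
  for (size_t i = 0; i < n; i++) {
    if (events[i] == PM_EV_COMPLETE)
      return PM_MATCH;     /* pcre_exec() never continues after a match */
    if (hard)
      return PM_PARTIAL;   /* HARD: report the first partial match at once */
    partial_seen = 1;      /* SOFT: remember it and try other alternatives */
  }
  return partial_seen ? PM_PARTIAL : PM_NOMATCH;
}
```

For /dog(sbody)?/ matched against "dog", the greedy matcher reaches the partial "dogsbody" before the complete "dog", so the sequence is PM_EV_PARTIAL, PM_EV_COMPLETE: the model returns PM_PARTIAL when hard is set and PM_MATCH otherwise. For /dog(sbody)??/ the complete match comes first, so the result is PM_MATCH either way.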
5606 |
|
|
5607 |
|
|
5608 |
|
PARTIAL MATCHING USING pcre_dfa_exec() |
5609 |
|
|
5610 |
|
The pcre_dfa_exec() function moves along the subject string character |
5611 |
|
by character, without backtracking, searching for all possible matches |
5612 |
|
simultaneously. If the end of the subject is reached before the end of |
5613 |
|
the pattern, there is the possibility of a partial match, again pro- |
5614 |
|
vided that at least one character has matched. |
5615 |
|
|
5616 |
|
When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned only if |
5617 |
|
there have been no complete matches. Otherwise, the complete matches |
5618 |
|
are returned. However, if PCRE_PARTIAL_HARD is set, a partial match |
5619 |
|
takes precedence over any complete matches. The portion of the string |
5620 |
|
that was inspected when the longest partial match was found is set as |
5621 |
|
the first matching string, provided there are at least two slots in the |
5622 |
|
offsets vector. |
5623 |
|
|
5624 |
|
Because pcre_dfa_exec() always searches for all possible matches, and |
5625 |
|
there is no difference between greedy and ungreedy repetition, its be- |
5626 |
|
haviour is different from pcre_exec() when PCRE_PARTIAL_HARD is set. Con- |
5627 |
|
sider the string "dog" matched against the ungreedy pattern shown |
5628 |
|
above: |
5629 |
|
|
5630 |
|
/dog(sbody)??/ |
5631 |
|
|
5632 |
|
Whereas pcre_exec() stops as soon as it finds the complete match for |
5633 |
|
"dog", pcre_dfa_exec() also finds the partial match for "dogsbody", and |
5634 |
|
so returns that when PCRE_PARTIAL_HARD is set. |
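pcre_dfa_exec() sees all matches at once, so its choice depends only on whether complete and partial matches exist, not on the order in which they were found. A simplified model of that rule, for illustration only; dfa_model() and the DM_* names are hypothetical:

```c
#include <assert.h>

/* Hypothetical result names for this model; not PCRE identifiers. */
typedef enum { DM_MATCH, DM_PARTIAL, DM_NOMATCH } dfa_result;

/* Model of how pcre_dfa_exec() picks its result after scanning the
   subject. A nonzero `hard` models PCRE_PARTIAL_HARD; zero models SOFT. */
dfa_result dfa_model(int complete_matches, int partial_matches, int hard)
{
  assert(complete_matches >= 0 && partial_matches >= 0);
  /* HARD: a partial match always wins.
     SOFT: a partial match is reported only if nothing completed. */
  if (partial_matches > 0 && (hard || complete_matches == 0))
    return DM_PARTIAL;
  if (complete_matches > 0)
    return DM_MATCH;
  return DM_NOMATCH;
}
```

For "dog" against /dog(sbody)??/ there is one complete match and one partial match, so the model gives DM_PARTIAL under PCRE_PARTIAL_HARD and DM_MATCH under PCRE_PARTIAL_SOFT.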
5635 |
|
|
5636 |
|
|
5637 |
|
PARTIAL MATCHING AND WORD BOUNDARIES |
5638 |
|
|
5639 |
|
If a pattern ends with one of the sequences \b or \B, which test for word |
5640 |
|
boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter- |
5641 |
|
intuitive results. Consider this pattern: |
5642 |
|
|
5643 |
|
/\bcat\b/ |
5644 |
|
|
5645 |
|
This matches "cat", provided there is a word boundary at either end. If |
5646 |
|
the subject string is "the cat", the comparison of the final "t" with a |
5647 |
|
following character cannot take place, so a partial match is found. |
5648 |
|
However, pcre_exec() carries on with normal matching, which matches \b |
5649 |
|
at the end of the subject when the last character is a letter, thus |
5650 |
|
finding a complete match. The result, therefore, is not PCRE_ERROR_PAR- |
5651 |
|
TIAL. The same thing happens with pcre_dfa_exec(), because it also |
5652 |
|
finds the complete match. |
5653 |
|
|
5654 |
|
Using PCRE_PARTIAL_HARD in this case does yield PCRE_ERROR_PARTIAL, |
5655 |
|
because then the partial match takes precedence. |
5656 |
|
|
5657 |
|
|
5658 |
|
FORMERLY RESTRICTED PATTERNS |
5659 |
|
|
5660 |
|
For releases of PCRE prior to 8.00, because of the way certain internal |
5661 |
|
optimizations were implemented in the pcre_exec() function, the |
5662 |
|
PCRE_PARTIAL option (predecessor of PCRE_PARTIAL_SOFT) could not be |
5663 |
|
used with all patterns. From release 8.00 onwards, the restrictions no |
5664 |
|
longer apply, and partial matching with pcre_exec() can be requested |
5665 |
|
for any pattern. |
5666 |
|
|
5667 |
|
Items that were formerly restricted were repeated single characters and |
5668 |
|
repeated metasequences. If PCRE_PARTIAL was set for a pattern that did |
5669 |
|
not conform to the restrictions, pcre_exec() returned the error code |
5670 |
|
PCRE_ERROR_BADPARTIAL (-13). This error code is no longer in use. The |
5671 |
|
PCRE_INFO_OKPARTIAL call to pcre_fullinfo(), which reports whether a |
5672 |
|
compiled pattern can be used for partial matching, now always returns 1. |
5673 |
|
|
5674 |
|
|
5675 |
EXAMPLE OF PARTIAL MATCHING USING PCRETEST |
EXAMPLE OF PARTIAL MATCHING USING PCRETEST |
5676 |
|
|
5677 |
If the escape sequence \P is present in a pcretest data line, the |
If the escape sequence \P is present in a pcretest data line, the |
5678 |
PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that |
PCRE_PARTIAL_SOFT option is used for the match. Here is a run of |
5679 |
uses the date example quoted above: |
pcretest that uses the date example quoted above: |
5680 |
|
|
5681 |
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ |
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ |
5682 |
data> 25jun04\P |
data> 25jun04\P |
5683 |
0: 25jun04 |
0: 25jun04 |
5684 |
1: jun |
1: jun |
5685 |
data> 25dec3\P |
data> 25dec3\P |
5686 |
Partial match |
Partial match: 25dec3 |
5687 |
data> 3ju\P |
data> 3ju\P |
5688 |
Partial match |
Partial match: 3ju |
5689 |
data> 3juj\P |
data> 3juj\P |
5690 |
No match |
No match |
5691 |
data> j\P |
data> j\P |
5692 |
No match |
No match |
5693 |
|
|
5694 |
The first data string is matched completely, so pcretest shows the |
The first data string is matched completely, so pcretest shows the |
5695 |
matched substrings. The remaining four strings do not match the com- |
matched substrings. The remaining four strings do not match the com- |
5696 |
plete pattern, but the first two are partial matches. The same test, |
plete pattern, but the first two are partial matches. Similar output is |
5697 |
using pcre_dfa_exec() matching (by means of the \D escape sequence), |
obtained when pcre_dfa_exec() is used. |
|
produces the following output: |
|
|
|
|
|
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ |
|
|
data> 25jun04\P\D |
|
|
0: 25jun04 |
|
|
data> 23dec3\P\D |
|
|
Partial match: 23dec3 |
|
|
data> 3ju\P\D |
|
|
Partial match: 3ju |
|
|
data> 3juj\P\D |
|
|
No match |
|
|
data> j\P\D |
|
|
No match |
|
5698 |
|
|
5699 |
Notice that in this case the portion of the string that was matched is |
If the escape sequence \P is present more than once in a pcretest data |
5700 |
made available. |
line, the PCRE_PARTIAL_HARD option is set for the match. |
5701 |
|
|
5702 |
|
|
5703 |
MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() |
MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() |
5705 |
When a partial match has been found using pcre_dfa_exec(), it is possi- |
When a partial match has been found using pcre_dfa_exec(), it is possi- |
5706 |
ble to continue the match by providing additional subject data and |
ble to continue the match by providing additional subject data and |
5707 |
calling pcre_dfa_exec() again with the same compiled regular expres- |
calling pcre_dfa_exec() again with the same compiled regular expres- |
5708 |
sion, this time setting the PCRE_DFA_RESTART option. You must also pass |
sion, this time setting the PCRE_DFA_RESTART option. You must pass the |
5709 |
the same working space as before, because this is where details of the |
same working space as before, because this is where details of the pre- |
5710 |
previous partial match are stored. Here is an example using pcretest, |
vious partial match are stored. Here is an example using pcretest, |
5711 |
using the \R escape sequence to set the PCRE_DFA_RESTART option (\P and |
using the \R escape sequence to set the PCRE_DFA_RESTART option (\D |
5712 |
\D are as above): |
specifies the use of pcre_dfa_exec()): |
5713 |
|
|
5714 |
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ |
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ |
5715 |
data> 23ja\P\D |
data> 23ja\P\D |
5724 |
matched string. It is up to the calling program to do that if it needs |
matched string. It is up to the calling program to do that if it needs |
5725 |
to. |
to. |
5726 |
|
|
5727 |
You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial |
You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with |
5728 |
matching over multiple segments. This facility can be used to pass very |
PCRE_DFA_RESTART to continue partial matching over multiple segments. |
5729 |
long subject strings to pcre_dfa_exec(). However, some care is needed |
This facility can be used to pass very long subject strings to |
5730 |
for certain types of pattern. |
pcre_dfa_exec(). |
5731 |
|
|
5732 |
|
|
5733 |
|
MULTI-SEGMENT MATCHING WITH pcre_exec() |
5734 |
|
|
5735 |
|
From release 8.00, pcre_exec() can also be used to do multi-segment |
5736 |
|
matching. Unlike pcre_dfa_exec(), it is not possible to restart the |
5737 |
|
previous match with a new segment of data. Instead, new data must be |
5738 |
|
added to the previous subject string, and the entire match re-run, |
5739 |
|
starting from the point where the partial match occurred. Earlier data |
5740 |
|
can be discarded. Consider an unanchored pattern that matches dates: |
5741 |
|
|
5742 |
|
re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/ |
5743 |
|
data> The date is 23ja\P |
5744 |
|
Partial match: 23ja |
5745 |
|
|
5746 |
|
At this stage, an application could discard the text preceding "23ja", |
5747 |
|
add on text from the next segment, and call pcre_exec() again. Unlike |
5748 |
|
pcre_dfa_exec(), the entire matching string must always be available, |
5749 |
|
and the complete matching process occurs for each call, so more memory |
5750 |
|
and more processing time is needed. |
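The buffer handling for that approach can be sketched as follows. This assumes the previous pcre_exec() call returned PCRE_ERROR_PARTIAL and that keep_from is the first offset it set, which already accounts for any lookbehind, \K, \b or \B context described in the note that follows; next_subject() is a hypothetical helper, and the subsequent pcre_exec() call on the rebuilt buffer is not shown:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for multi-segment matching with pcre_exec().
   buf holds *len bytes of subject text (not necessarily zero-terminated,
   since pcre_exec() takes an explicit length) and has room for cap bytes.
   Bytes before keep_from are discarded, seg is appended, and *len is
   updated. Returns 0 on success, or -1 if the result would not fit. */
int next_subject(char *buf, size_t cap, size_t *len, size_t keep_from,
                 const char *seg, size_t seglen)
{
  assert(keep_from <= *len);
  size_t kept = *len - keep_from;        /* bytes from the partial match on */
  if (kept + seglen > cap)
    return -1;
  memmove(buf, buf + keep_from, kept);   /* source and target may overlap */
  memcpy(buf + kept, seg, seglen);
  *len = kept + seglen;
  return 0;
}
```

With "The date is 23ja" in the buffer, keep_from set to 12 and the next segment "n04 etc", the buffer becomes "23jan04 etc".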
5751 |
|
|
5752 |
|
Note: If the pattern contains lookbehind assertions, or \K, or starts |
5753 |
|
with \b or \B, the string that is returned for a partial match will |
5754 |
|
include characters that precede the partially matched string itself, |
5755 |
|
because these must be retained when adding on more characters for a |
5756 |
|
subsequent matching attempt. |
5757 |
|
|
5758 |
|
|
5759 |
|
ISSUES WITH MULTI-SEGMENT MATCHING |
5760 |
|
|
5761 |
|
Certain types of pattern may give problems with multi-segment matching, |
5762 |
|
whichever matching function is used. |
5763 |
|
|
5764 |
1. If the pattern contains tests for the beginning or end of a line, |
1. If the pattern contains tests for the beginning or end of a line, |
5765 |
you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropri- |
you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropri- |
5766 |
ate, when the subject string for any call does not contain the begin- |
ate, when the subject string for any call does not contain the begin- |
5767 |
ning or end of a line. |
ning or end of a line. |
5768 |
|
|
5769 |
2. If the pattern contains backward assertions (including \b or \B), |
2. Lookbehind assertions at the start of a pattern are catered for in |
5770 |
you need to arrange for some overlap in the subject strings to allow |
the offsets that are returned for a partial match. However, in theory, |
5771 |
for this. For example, you could pass the subject in chunks that are |
a lookbehind assertion later in the pattern could require even earlier |
5772 |
500 bytes long, but in a buffer of 700 bytes, with the starting offset |
characters to be inspected, and it might not have been reached when a |
5773 |
set to 200 and the previous 200 bytes at the start of the buffer. |
partial match occurs. This is probably an extremely unlikely case; you |
5774 |
|
could guard against it to a certain extent by always including extra |
5775 |
|
characters at the start. |
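One way to include extra characters at the start is to carry a fixed overlap from the end of the previous buffer and begin matching after it. A sketch, assuming the subject is re-matched with pcre_exec() and the returned value is passed as its starting offset, so that lookbehind assertions can still inspect the carried-over characters; append_with_overlap() and the overlap size are hypothetical choices:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: keep up to `overlap` trailing bytes of the current
   subject in buf, append chunk after them, and update *len. Returns the
   offset at which the new chunk begins (intended as the starting offset
   for the next match), or -1 if cap is too small. */
long append_with_overlap(char *buf, size_t cap, size_t *len, size_t overlap,
                         const char *chunk, size_t chunklen)
{
  assert(buf != NULL && len != NULL);
  size_t keep = *len < overlap ? *len : overlap;
  if (keep + chunklen > cap)
    return -1;
  memmove(buf, buf + (*len - keep), keep);  /* source and target may overlap */
  memcpy(buf + keep, chunk, chunklen);
  *len = keep + chunklen;
  return (long)keep;
}
```

With an overlap of 3, a buffer holding "abcdef" and a new chunk "XYZ", the buffer becomes "defXYZ" and matching starts at offset 3.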
5776 |
|
|
5777 |
3. Matching a subject string that is split into multiple segments does |
3. Matching a subject string that is split into multiple segments may |
5778 |
not always produce exactly the same result as matching over one single |
not always produce exactly the same result as matching over one single |
5779 |
long string. The difference arises when there are multiple matching |
long string, especially when PCRE_PARTIAL_SOFT is used. The section |
5780 |
possibilities, because a partial match result is given only when there |
"Partial Matching and Word Boundaries" above describes an issue that |
5781 |
are no completed matches in a call to pcre_dfa_exec(). This means that |
arises if the pattern ends with \b or \B. Another kind of difference |
5782 |
as soon as the shortest match has been found, continuation to a new |
may occur when there are multiple matching possibilities, because a |
5783 |
subject segment is no longer possible. Consider this pcretest example: |
partial match result is given only when there are no completed matches. |
5784 |
|
This means that as soon as the shortest match has been found, continua- |
5785 |
|
tion to a new subject segment is no longer possible. Consider again |
5786 |
|
this pcretest example: |
5787 |
|
|
5788 |
re> /dog(sbody)?/ |
re> /dog(sbody)?/ |
5789 |
|
data> dogsb\P |
5790 |
|
0: dog |
5791 |
data> do\P\D |
data> do\P\D |
5792 |
Partial match: do |
Partial match: do |
5793 |
data> gsb\R\P\D |
data> gsb\R\P\D |
5796 |
0: dogsbody |
0: dogsbody |
5797 |
1: dog |
1: dog |
5798 |
|
|
5799 |
The pattern matches the words "dog" or "dogsbody". When the subject is |
The first data line passes the string "dogsb" to pcre_exec(), setting |
5800 |
presented in several parts ("do" and "gsb" being the first two) the |
the PCRE_PARTIAL_SOFT option. Although the string is a partial match |
5801 |
match stops when "dog" has been found, and it is not possible to con- |
for "dogsbody", the result is not PCRE_ERROR_PARTIAL, because the |
5802 |
tinue. On the other hand, if "dogsbody" is presented as a single |
shorter string "dog" is a complete match. Similarly, when the subject |
5803 |
string, both matches are found. |
is presented to pcre_dfa_exec() in several parts ("do" and "gsb" being |
5804 |
|
the first two) the match stops when "dog" has been found, and it is not |
5805 |
|
possible to continue. On the other hand, if "dogsbody" is presented as |
5806 |
|
a single string, pcre_dfa_exec() finds both matches. |
5807 |
|
|
5808 |
|
Because of these problems, it is probably best to use PCRE_PARTIAL_HARD |
5809 |
|
when matching multi-segment data. The example above then behaves dif- |
5810 |
|
ferently: |
5811 |
|
|
5812 |
|
re> /dog(sbody)?/ |
5813 |
|
data> dogsb\P\P |
5814 |
|
Partial match: dogsb |
5815 |
|
data> do\P\D |
5816 |
|
Partial match: do |
5817 |
|
data> gsb\R\P\P\D |
5818 |
|
Partial match: gsb |
5819 |
|
|
|
Because of this phenomenon, it does not usually make sense to end a |
|
|
pattern that is going to be matched in this way with a variable repeat. |
|
5820 |
|
|
5821 |
4. Patterns that contain alternatives at the top level which do not all |
4. Patterns that contain alternatives at the top level which do not all |
5822 |
start with the same pattern item may not work as expected. For example, |
start with the same pattern item may not work as expected when |
5823 |
consider this pattern: |
pcre_dfa_exec() is used. For example, consider this pattern: |
5824 |
|
|
5825 |
1234|3789 |
1234|3789 |
5826 |
|
|
5827 |
If the first part of the subject is "ABC123", a partial match of the |
If the first part of the subject is "ABC123", a partial match of the |
5828 |
first alternative is found at offset 3. There is no partial match for |
first alternative is found at offset 3. There is no partial match for |
5829 |
the second alternative, because such a match does not start at the same |
the second alternative, because such a match does not start at the same |
5830 |
point in the subject string. Attempting to continue with the string |
point in the subject string. Attempting to continue with the string |
5831 |
"789" does not yield a match because only those alternatives that match |
"7890" does not yield a match because only those alternatives that |
5832 |
at one point in the subject are remembered. The problem arises because |
match at one point in the subject are remembered. The problem arises |
5833 |
the start of the second alternative matches within the first alterna- |
because the start of the second alternative matches within the first |
5834 |
tive. There is no problem with anchored patterns or patterns such as: |
alternative. There is no problem with anchored patterns or patterns |
5835 |
|
such as: |
5836 |
|
|
5837 |
1234|ABCD |
1234|ABCD |
5838 |
|
|
5839 |
where no string can be a partial match for both alternatives. |
where no string can be a partial match for both alternatives. This is |
5840 |
|
not a problem if pcre_exec() is used, because the entire match has to |
5841 |
|
be rerun each time: |
5842 |
|
|
5843 |
|
re> /1234|3789/ |
5844 |
|
data> ABC123\P |
5845 |
|
Partial match: 123 |
5846 |
|
data> 1237890 |
5847 |
|
0: 3789 |
5848 |
|
|
5849 |
|
|
5850 |
AUTHOR |
AUTHOR |
5856 |
|
|
5857 |
REVISION |
REVISION |
5858 |
|
|
5859 |
Last updated: 04 June 2007 |
Last updated: 05 September 2009 |
5860 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
5861 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
5862 |
|
|
5863 |
|
|
5864 |
PCREPRECOMPILE(3) PCREPRECOMPILE(3) |
PCREPRECOMPILE(3) PCREPRECOMPILE(3) |
5865 |
|
|
5866 |
|
|
5983 |
Last updated: 13 June 2007 |
Last updated: 13 June 2007 |
5984 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
5985 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
5986 |
|
|
5987 |
|
|
5988 |
PCREPERFORM(3) PCREPERFORM(3) |
PCREPERFORM(3) PCREPERFORM(3) |
5989 |
|
|
5990 |
|
|
6133 |
Last updated: 06 March 2007 |
Last updated: 06 March 2007 |
6134 |
Copyright (c) 1997-2007 University of Cambridge. |
Copyright (c) 1997-2007 University of Cambridge. |
6135 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
6136 |
|
|
6137 |
|
|
6138 |
PCREPOSIX(3) PCREPOSIX(3) |
PCREPOSIX(3) PCREPOSIX(3) |
6139 |
|
|
6140 |
|
|
6178 |
easier to slot in PCRE as a replacement library. Other POSIX options |
easier to slot in PCRE as a replacement library. Other POSIX options |
6179 |
are not even defined. |
are not even defined. |
6180 |
|
|
6181 |
|
There are also some other options that are not defined by POSIX. These |
6182 |
|
have been added at the request of users who want to make use of certain |
6183 |
|
PCRE-specific features via the POSIX calling interface. |
6184 |
|
|
6185 |
When PCRE is called via these functions, it is only the API that is |
When PCRE is called via these functions, it is only the API that is |
6186 |
POSIX-like in style. The syntax and semantics of the regular expres- |
POSIX-like in style. The syntax and semantics of the regular expres- |
6187 |
sions themselves are still those of Perl, subject to the setting of |
sions themselves are still those of Perl, subject to the setting of |
6236 |
ing, the nmatch and pmatch arguments are ignored, and no captured |
ing, the nmatch and pmatch arguments are ignored, and no captured |
6237 |
strings are returned. |
strings are returned. |
6238 |
|
|
6239 |
|
REG_UNGREEDY |
6240 |
|
|
6241 |
|
The PCRE_UNGREEDY option is set when the regular expression is passed |
6242 |
|
for compilation to the native function. Note that REG_UNGREEDY is not |
6243 |
|
part of the POSIX standard. |
6244 |
|
|
6245 |
REG_UTF8 |
REG_UTF8 |
6246 |
|
|
6247 |
The PCRE_UTF8 option is set when the regular expression is passed for |
The PCRE_UTF8 option is set when the regular expression is passed for |
6254 |
semantics. In particular, the way it handles newline characters in the |
semantics. In particular, the way it handles newline characters in the |
6255 |
subject string is the Perl way, not the POSIX way. Note that setting |
subject string is the Perl way, not the POSIX way. Note that setting |
6256 |
PCRE_MULTILINE has only some of the effects specified for REG_NEWLINE. |
PCRE_MULTILINE has only some of the effects specified for REG_NEWLINE. |
6257 |
It does not affect the way newlines are matched by . (they aren't) or |
It does not affect the way newlines are matched by . (they are not) or |
6258 |
by a negative class such as [^a] (they are). |
by a negative class such as [^a] (they are). |
6259 |
|
|
6260 |
The yield of regcomp() is zero on success, and non-zero otherwise. The |
The yield of regcomp() is zero on success, and non-zero otherwise. The |
6262 |
is public: re_nsub contains the number of capturing subpatterns in the |
is public: re_nsub contains the number of capturing subpatterns in the |
6263 |
regular expression. Various error codes are defined in the header file. |
regular expression. Various error codes are defined in the header file. |
6264 |
|
|
6265 |
|
NOTE: If the yield of regcomp() is non-zero, you must not attempt to |
6266 |
|
use the contents of the preg structure. If, for example, you pass it to |
6267 |
|
regexec(), the result is undefined and your program is likely to crash. |
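A wrapper that checks the yield before anything else touches preg might look like this. It is written against the standard POSIX <regex.h> interface and builds with the system C library; since the PCRE functions are intended as a drop-in replacement, using pcreposix.h instead should behave the same, but that is an assumption not exercised here. compile_or_report() is a hypothetical name:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compile `pattern` into *preg. On failure, turn the error code into text
   with regerror() and return it; the caller must then treat *preg as
   unusable (no regexec(), no regfree()). Returns 0 on success. */
int compile_or_report(regex_t *preg, const char *pattern,
                      char *errbuf, size_t errbuf_size)
{
  assert(preg != NULL && pattern != NULL);
  int rc = regcomp(preg, pattern, REG_EXTENDED);
  if (rc != 0) {
    /* Message is truncated to errbuf_size and zero-terminated. */
    regerror(rc, preg, errbuf, errbuf_size);
    return rc;
  }
  if (errbuf_size > 0)
    errbuf[0] = '\0';                    /* no error to report */
  return 0;
}
```

On success the caller owns a valid preg and releases it with regfree() when finished; on failure only the message in errbuf should be used.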
6268 |
|
|
6269 |
|
|
6270 |
MATCHING NEWLINE CHARACTERS |
MATCHING NEWLINE CHARACTERS |
6271 |
|
|
6341 |
matched strings is returned. The nmatch and pmatch arguments of |
matched strings is returned. The nmatch and pmatch arguments of |
6342 |
regexec() are ignored. |
regexec() are ignored. |
6343 |
|
|
6344 |
|
If the value of nmatch is zero, or if the value of pmatch is NULL, no data |
6345 |
|
about any matched strings is returned. |
6346 |
|
|
6347 |
Otherwise, the portion of the string that was matched, and also any cap- |
Otherwise, the portion of the string that was matched, and also any cap- |
6348 |
tured substrings, are returned via the pmatch argument, which points to |
tured substrings, are returned via the pmatch argument, which points to |
6349 |
an array of nmatch structures of type regmatch_t, containing the mem- |
an array of nmatch structures of type regmatch_t, containing the mem- |
6350 |
bers rm_so and rm_eo. These contain the offset to the first character |
bers rm_so and rm_eo. These contain the offset to the first character |
6351 |
of each substring and the offset to the first character after the end |
of each substring and the offset to the first character after the end |
6352 |
of each substring, respectively. The 0th element of the vector relates |
of each substring, respectively. The 0th element of the vector relates |
6353 |
to the entire portion of string that was matched; subsequent elements |
to the entire portion of string that was matched; subsequent elements |
6354 |
relate to the capturing subpatterns of the regular expression. Unused |
relate to the capturing subpatterns of the regular expression. Unused |
6355 |
entries in the array have both structure members set to -1. |
entries in the array have both structure members set to -1. |
6356 |
|
|
6357 |
A successful match yields a zero return; various error codes are |
A successful match yields a zero return; various error codes are |
6358 |
defined in the header file, of which REG_NOMATCH is the "expected" |
defined in the header file, of which REG_NOMATCH is the "expected" |
6359 |
failure code. |
failure code. |
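The pmatch layout can be shown with a minimal sketch, assuming a system <regex.h> with ERE syntax. The function group_offsets() and the MAX_GROUPS limit are hypothetical choices made for this illustration. Element 0 describes the whole match and element i describes capturing subpatterns i. A group that did not take part in the match reports -1 in both members.

```c
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 10   /* hypothetical fixed size for the pmatch array */

/* Hypothetical helper: return 0 and store the rm_so/rm_eo offsets of GROUP
   (-1/-1 if that group was unused), REG_NOMATCH if PATTERN does not match
   SUBJECT, or -1 on a compile error, another regexec() error, or a bad GROUP. */
int group_offsets(const char *pattern, const char *subject, size_t group,
                  regoff_t *so, regoff_t *eo)
{
    regex_t preg;
    regmatch_t pmatch[MAX_GROUPS];
    int rc;

    if (group >= MAX_GROUPS)
        return -1;
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;                      /* never touch preg after a failed compile */

    rc = regexec(&preg, subject, MAX_GROUPS, pmatch, 0);
    regfree(&preg);                     /* pmatch is our own array, still valid */

    if (rc == REG_NOMATCH)
        return REG_NOMATCH;             /* the "expected" failure code */
    if (rc != 0)
        return -1;

    *so = pmatch[group].rm_so;          /* offset of the first character */
    *eo = pmatch[group].rm_eo;          /* offset just past the last character */
    return 0;
}
```

For the subject "mail joe@example now" and the pattern "([a-z]+)@([a-z]+)", group 0 spans offsets 5 to 16 and group 1 spans offsets 5 to 8.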
6360 |
|
|
6361 |
|
|
6362 |
ERROR MESSAGES |
ERROR MESSAGES |
6363 |
|
|
6364 |
The regerror() function maps a non-zero errorcode from either regcomp() |
The regerror() function maps a non-zero errorcode from either regcomp() |
6365 |
or regexec() to a printable message. If preg is not NULL, the error |
or regexec() to a printable message. If preg is not NULL, the error |
6366 |
should have arisen from the use of that structure. A message terminated |
should have arisen from the use of that structure. A message terminated |
6367 |
by a binary zero is placed in errbuf. The length of the message, |
by a binary zero is placed in errbuf. The length of the message, |
6368 |
including the zero, is limited to errbuf_size. The yield of the func- |
including the zero, is limited to errbuf_size. The yield of the func- |
6369 |
tion is the size of buffer needed to hold the whole message. |
tion is the size of buffer needed to hold the whole message. |
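Because regerror() returns the buffer size needed for the whole message, a caller can first ask for the size and then allocate exactly that much. The sketch below assumes a system <regex.h>, and the helper names error_message() and compile_error() are hypothetical. It passes NULL for preg, which the text above permits, so it never reads a preg structure left behind by a failed regcomp().

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed copy of the message for ERRCODE,
   or NULL if memory runs out. The caller must free() the result. */
char *error_message(int errcode, const regex_t *preg)
{
    /* With errbuf_size 0, nothing is copied; the yield is the size needed,
       including the terminating binary zero. */
    size_t needed = regerror(errcode, preg, NULL, 0);
    char *buf = malloc(needed);

    if (buf != NULL)
        regerror(errcode, preg, buf, needed);
    return buf;
}

/* Hypothetical helper: return NULL if PATTERN compiles, otherwise its
   error message (caller frees). */
char *compile_error(const char *pattern)
{
    regex_t preg;
    int rc = regcomp(&preg, pattern, REG_EXTENDED);

    if (rc == 0) {
        regfree(&preg);
        return NULL;
    }
    return error_message(rc, NULL);     /* preg is not used after a failure */
}
```

The exact text of each message depends on the regex implementation.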
6370 |
|
|
6371 |
|
|
6372 |
MEMORY USAGE |
MEMORY USAGE |
6373 |
|
|
6374 |
Compiling a regular expression causes memory to be allocated and asso- |
Compiling a regular expression causes memory to be allocated and asso- |
6375 |
ciated with the preg structure. The function regfree() frees all such |
ciated with the preg structure. The function regfree() frees all such |
6376 |
memory, after which preg may no longer be used as a compiled expres- |
memory, after which preg may no longer be used as a compiled expres- |
6377 |
sion. |
sion. |
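The typical lifecycle compiles the expression once, reuses it for several regexec() calls, and then releases it with regfree(). The sketch below assumes a system <regex.h>. The function count_matching() is hypothetical. REG_NOSUB is the standard POSIX flag for "report only whether a match occurred", so nmatch is zero and pmatch is NULL.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the N strings in SUBJECTS contain
   a match for PATTERN, or return -1 if the pattern fails to compile. */
int count_matching(const char *pattern, const char *const *subjects, size_t n)
{
    regex_t preg;
    int count = 0;

    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* The same compiled expression is reused for every subject. */
    for (size_t i = 0; i < n; i++)
        if (regexec(&preg, subjects[i], 0, NULL, 0) == 0)
            count++;

    regfree(&preg);     /* frees all memory tied to preg; do not use it again */
    return count;
}
```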
6378 |
|
|
6379 |
|
|
6386 |
|
|
6387 |
REVISION |
REVISION |
6388 |
|
|
6389 |
Last updated: 11 March 2009 |
Last updated: 02 September 2009 |
6390 |
Copyright (c) 1997-2009 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
6391 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
6392 |
|
|
6393 |
|
|
6394 |
PCRECPP(3) PCRECPP(3) |
PCRECPP(3) PCRECPP(3) |
6395 |
|
|
6396 |
|
|
6730 |
|
|
6731 |
Last updated: 17 March 2009 |
Last updated: 17 March 2009 |
6732 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
6733 |
|
|
6734 |
|
|
6735 |
PCRESAMPLE(3) PCRESAMPLE(3) |
PCRESAMPLE(3) PCRESAMPLE(3) |
6736 |
|
|
6737 |
|
|
6742 |
PCRE SAMPLE PROGRAM |
PCRE SAMPLE PROGRAM |
6743 |
|
|
6744 |
A simple, complete demonstration program, to get you started with using |
A simple, complete demonstration program, to get you started with using |
6745 |
PCRE, is supplied in the file pcredemo.c in the PCRE distribution. |
PCRE, is supplied in the file pcredemo.c in the PCRE distribution. A |
6746 |
|
listing of this program is given in the pcredemo documentation. If you |
6747 |
|
do not have a copy of the PCRE distribution, you can save this listing |
6748 |
|
to re-create pcredemo.c. |
6749 |
|
|
6750 |
The program compiles the regular expression that is its first argument, |
The program compiles the regular expression that is its first argument, |
6751 |
and matches it against the subject string in its second argument. No |
and matches it against the subject string in its second argument. No |
6752 |
PCRE options are set, and default character tables are used. If match- |
PCRE options are set, and default character tables are used. If match- |
6753 |
ing succeeds, the program outputs the portion of the subject that |
ing succeeds, the program outputs the portion of the subject that |
6754 |
matched, together with the contents of any captured substrings. |
matched, together with the contents of any captured substrings. |
6755 |
|
|
6756 |
If the -g option is given on the command line, the program then goes on |
If the -g option is given on the command line, the program then goes on |
6757 |
to check for further matches of the same regular expression in the same |
to check for further matches of the same regular expression in the same |
6758 |
subject string. The logic is a little bit tricky because of the possi- |
subject string. The logic is a little bit tricky because of the possi- |
6759 |
bility of matching an empty string. Comments in the code explain what |
bility of matching an empty string. Comments in the code explain what |
6760 |
is going on. |
is going on. |
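The empty-string problem behind the -g option can be seen in a simplified sketch. This is not the code in pcredemo.c, which uses the native PCRE API. It assumes a system <regex.h>, and the function count_all_matches() is hypothetical. After an empty match, the loop steps forward one byte. Without that step, it would find the same empty match forever. REG_NOTBOL tells regexec() that the shortened subject does not start at the beginning of a line. Stepping by a single byte is a simplification that is not correct for multibyte UTF-8 subjects.

```c
#include <regex.h>

/* Hypothetical helper: count all non-overlapping matches of PATTERN in
   SUBJECT, or return -1 if the pattern fails to compile. */
int count_all_matches(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m;
    const char *p = subject;    /* start of the not-yet-searched text */
    int eflags = 0;             /* the first search starts at the true start */
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo > m.rm_so) {
            p += m.rm_eo;               /* non-empty match: resume after it */
        } else {
            if (p[m.rm_eo] == '\0')     /* empty match at the very end: stop */
                break;
            p += m.rm_eo + 1;           /* empty match: skip one byte */
        }
        eflags = REG_NOTBOL;            /* later searches start mid-subject */
    }

    regfree(&re);
    return count;
}
```

With the subject "the dog sat on the cat", the pattern "cat|dog" is found twice. The pattern "x*" on "abc" matches the empty string at each of the four positions.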
6761 |
|
|
6762 |
If PCRE is installed in the standard include and library directories |
If PCRE is installed in the standard include and library directories |
6763 |
for your system, you should be able to compile the demonstration pro- |
for your system, you should be able to compile the demonstration pro- |
6764 |
gram using this command: |
gram using this command: |
6765 |
|
|
6766 |
gcc -o pcredemo pcredemo.c -lpcre |
gcc -o pcredemo pcredemo.c -lpcre |
6767 |
|
|
6768 |
If PCRE is installed elsewhere, you may need to add additional options |
If PCRE is installed elsewhere, you may need to add additional options |
6769 |
to the command line. For example, on a Unix-like system that has PCRE |
to the command line. For example, on a Unix-like system that has PCRE |
6770 |
installed in /usr/local, you can compile the demonstration program |
installed in /usr/local, you can compile the demonstration program |
6771 |
using a command like this: |
using a command like this: |
6772 |
|
|
6773 |
gcc -o pcredemo -I/usr/local/include pcredemo.c \ |
gcc -o pcredemo -I/usr/local/include pcredemo.c \ |
6774 |
-L/usr/local/lib -lpcre |
-L/usr/local/lib -lpcre |
6775 |
|
|
6776 |
Once you have compiled the demonstration program, you can run simple |
Once you have compiled the demonstration program, you can run simple |
6777 |
tests like this: |
tests like this: |
6778 |
|
|
6779 |
./pcredemo 'cat|dog' 'the cat sat on the mat' |
./pcredemo 'cat|dog' 'the cat sat on the mat' |
6780 |
./pcredemo -g 'cat|dog' 'the dog sat on the cat' |
./pcredemo -g 'cat|dog' 'the dog sat on the cat' |
6781 |
|
|
6782 |
Note that there is a much more comprehensive test program, called |
Note that there is a much more comprehensive test program, called |
6783 |
pcretest, which supports many more facilities for testing regular |
pcretest, which supports many more facilities for testing regular |
6784 |
expressions and the PCRE library. The pcredemo program is provided as a |
expressions and the PCRE library. The pcredemo program is provided as a |
6785 |
simple coding example. |
simple coding example. |
6786 |
|
|
6787 |
On some operating systems (e.g. Solaris), when PCRE is not installed in |
If PCRE is not installed in the standard library directory, running |
6788 |
the standard library directory, you may get an error like this when you |
pcredemo may produce an error like this on some operating systems |
6789 |
try to run pcredemo: |
(e.g. Solaris): |
6790 |
|
|
6791 |
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or |
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or |
6792 |
directory |
directory |
6793 |
|
|
6794 |
This is caused by the way shared library support works on those sys- |
This is caused by the way shared library support works on those sys- |
6795 |
tems. You need to add |
tems. You need to add |
6796 |
|
|
6797 |
-R/usr/local/lib |
-R/usr/local/lib |
6808 |
|
|
6809 |
REVISION |
REVISION |
6810 |
|
|
6811 |
Last updated: 23 January 2008 |
Last updated: 01 September 2009 |
6812 |
Copyright (c) 1997-2008 University of Cambridge. |
Copyright (c) 1997-2009 University of Cambridge. |
6813 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
6814 |
PCRESTACK(3) PCRESTACK(3) |
PCRESTACK(3) PCRESTACK(3) |
6815 |
|
|
6947 |
Last updated: 09 July 2008 |
Last updated: 09 July 2008 |
6948 |
Copyright (c) 1997-2008 University of Cambridge. |
Copyright (c) 1997-2008 University of Cambridge. |
6949 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
6950 |
|
|
6951 |
|
|