99 |
.B void (*pcre_free)(void *); |
.B void (*pcre_free)(void *); |
100 |
.PP |
.PP |
101 |
.br |
.br |
102 |
|
.B void *(*pcre_stack_malloc)(size_t); |
103 |
|
.PP |
104 |
|
.br |
105 |
|
.B void (*pcre_stack_free)(void *); |
106 |
|
.PP |
107 |
|
.br |
108 |
.B int (*pcre_callout)(pcre_callout_block *); |
.B int (*pcre_callout)(pcre_callout_block *); |
109 |
|
|
110 |
.SH PCRE API |
.SH PCRE API |
153 |
so a calling program can replace them if it wishes to intercept the calls. This |
so a calling program can replace them if it wishes to intercept the calls. This |
154 |
should be done before calling any PCRE functions. |
should be done before calling any PCRE functions. |
155 |
|
|
156 |
|
The global variables \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are also |
157 |
|
indirections to memory management functions. These special functions are used |
158 |
|
only when PCRE is compiled to use the heap for remembering data, instead of |
159 |
|
recursive function calls. This is a non-standard way of building PCRE, for use |
160 |
|
in environments that have limited stacks. Because of the greater use of memory |
161 |
|
management, it runs more slowly. Separate functions are provided so that |
162 |
|
special-purpose external code can be used for this case. When used, these |
163 |
|
functions are always called in a stack-like manner (last obtained, first |
164 |
|
freed), and always for memory blocks of the same size. |
165 |
|
|
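The stack-like, same-size usage described above suits a simple block cache. The following is a minimal sketch, not PCRE code: the names stack_block_alloc, stack_block_free and CACHE_MAX are hypothetical, and the cache is not thread-safe. It assumes the guarantee stated above that every request is for the same size. The two functions have the same signatures as \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR, so a program could assign them to those variables before calling any PCRE functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache of freed blocks, reused in last-in, first-out order. */
#define CACHE_MAX 32

static void  *cache[CACHE_MAX];      /* freed blocks kept for reuse */
static int    cache_count = 0;       /* number of blocks currently cached */
static size_t cache_block_size = 0;  /* size seen on the first request */

/* Same signature as pcre_stack_malloc: void *(*)(size_t). */
static void *stack_block_alloc(size_t size)
{
    if (cache_block_size == 0)
        cache_block_size = size;
    /* Assumption taken from the text: all requests have the same size. */
    assert(size == cache_block_size);

    if (cache_count > 0)
        return cache[--cache_count];  /* reuse the most recently freed block */
    return malloc(size);
}

/* Same signature as pcre_stack_free: void (*)(void *). */
static void stack_block_free(void *block)
{
    if (block == NULL)
        return;
    if (cache_count < CACHE_MAX)
        cache[cache_count++] = block; /* keep it for the next request */
    else
        free(block);                  /* cache is full, return it to the heap */
}
```

Because blocks are released in the reverse of the order in which they were obtained, an array used as a stack is enough; no free list or size bookkeeping per block is needed.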
166 |
The global variable \fBpcre_callout\fR initially contains NULL. It can be set |
The global variable \fBpcre_callout\fR initially contains NULL. It can be set |
167 |
by the caller to a "callout" function, which PCRE will then call at specified |
by the caller to a "callout" function, which PCRE will then call at specified |
168 |
points during a matching operation. Details are given in the \fBpcrecallout\fR |
points during a matching operation. Details are given in the \fBpcrecallout\fR |
172 |
.rs |
.rs |
173 |
.sp |
.sp |
174 |
The PCRE functions can be used in multi-threading applications, with the |
The PCRE functions can be used in multi-threading applications, with the |
175 |
proviso that the memory management functions pointed to by \fBpcre_malloc\fR |
proviso that the memory management functions pointed to by \fBpcre_malloc\fR, |
176 |
and \fBpcre_free\fR, and the callout function pointed to by \fBpcre_callout\fR, |
\fBpcre_free\fR, \fBpcre_stack_malloc\fR, and \fBpcre_stack_free\fR, and the |
177 |
are shared by all threads. |
callout function pointed to by \fBpcre_callout\fR, are shared by all threads. |
178 |
|
|
179 |
The compiled form of a regular expression is not altered during matching, so |
The compiled form of a regular expression is not altered during matching, so |
180 |
the same compiled pattern can safely be used by several threads at once. |
the same compiled pattern can safely be used by several threads at once. |
226 |
internal matching function calls in a \fBpcre_exec()\fR execution. Further |
internal matching function calls in a \fBpcre_exec()\fR execution. Further |
227 |
details are given with \fBpcre_exec()\fR below. |
details are given with \fBpcre_exec()\fR below. |
228 |
|
|
229 |
|
PCRE_CONFIG_STACKRECURSE |
230 |
|
|
231 |
|
The output is an integer that is set to one if internal recursion is |
232 |
|
implemented by recursive function calls that use the stack to remember their |
233 |
|
state. This is the usual way that PCRE is compiled. The output is zero if PCRE |
234 |
|
was compiled to use blocks of data on the heap instead of recursive function |
235 |
|
calls. In this case, \fBpcre_stack_malloc\fR and \fBpcre_stack_free\fR are |
236 |
|
called to manage memory blocks on the heap, thus avoiding the use of the stack. |
237 |
|
|
238 |
.SH COMPILING A PATTERN |
.SH COMPILING A PATTERN |
239 |
.rs |
.rs |
240 |
.sp |
.sp |
736 |
unanchored at matching time. |
unanchored at matching time. |
737 |
|
|
738 |
When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8 |
When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8 |
739 |
string is automatically checked. If an invalid UTF-8 sequence of bytes is |
string is automatically checked, and the value of \fIstartoffset\fR is also |
740 |
found, \fBpcre_exec()\fR returns the error PCRE_ERROR_BADUTF8. If you already |
checked to ensure that it points to the start of a UTF-8 character. If an |
741 |
know that your subject is valid, and you want to skip this check for |
invalid UTF-8 sequence of bytes is found, \fBpcre_exec()\fR returns the error |
742 |
performance reasons, you can set the PCRE_NO_UTF8_CHECK option when calling |
PCRE_ERROR_BADUTF8. If \fIstartoffset\fR contains an invalid value, |
743 |
\fBpcre_exec()\fR. When this option is set, the effect of passing an invalid |
PCRE_ERROR_BADUTF8_OFFSET is returned. |
744 |
UTF-8 string as a subject is undefined. It may cause your program to crash. |
|
745 |
|
If you already know that your subject is valid, and you want to skip these |
746 |
|
checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when |
747 |
|
calling \fBpcre_exec()\fR. You might want to do this for the second and |
748 |
|
subsequent calls to \fBpcre_exec()\fR if you are making repeated calls to find |
749 |
|
all the matches in a single subject string. However, you should be sure that |
750 |
|
the value of \fIstartoffset\fR points to the start of a UTF-8 character. When |
751 |
|
PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a |
752 |
|
subject, or a value of \fIstartoffset\fR that does not point to the start of a |
753 |
|
UTF-8 character, is undefined. Your program may crash. |
754 |
|
|
755 |
There are also three further options that can be set only at matching time: |
There are also three further options that can be set only at matching time: |
756 |
|
|
787 |
below) and trying an ordinary match again. |
below) and trying an ordinary match again. |
788 |
|
|
789 |
The subject string is passed to \fBpcre_exec()\fR as a pointer in |
The subject string is passed to \fBpcre_exec()\fR as a pointer in |
790 |
\fIsubject\fR, a length in \fIlength\fR, and a starting offset in |
\fIsubject\fR, a length in \fIlength\fR, and a starting byte offset in |
791 |
\fIstartoffset\fR. Unlike the pattern string, the subject may contain binary |
\fIstartoffset\fR. Unlike the pattern string, the subject may contain binary |
792 |
zero bytes. When the starting offset is zero, the search for a match starts at |
zero bytes. When the starting offset is zero, the search for a match starts at |
793 |
the beginning of the subject, and this is by far the most common case. |
the beginning of the subject, and this is by far the most common case. |
794 |
|
|
795 |
If the pattern was compiled with the PCRE_UTF8 option, the subject must be a |
If the pattern was compiled with the PCRE_UTF8 option, the subject must be a |
796 |
sequence of bytes that is a valid UTF-8 string. If an invalid UTF-8 string is |
sequence of bytes that is a valid UTF-8 string, and the starting offset must |
797 |
passed, PCRE's behaviour is not defined. |
point to the beginning of a UTF-8 character. If an invalid UTF-8 string or |
798 |
|
offset is passed, an error (either PCRE_ERROR_BADUTF8 or |
799 |
|
PCRE_ERROR_BADUTF8_OFFSET) is returned, unless the option PCRE_NO_UTF8_CHECK is |
800 |
|
set, in which case PCRE's behaviour is not defined. |
801 |
|
|
802 |
A non-zero starting offset is useful when searching for another match in the |
A non-zero starting offset is useful when searching for another match in the |
803 |
same subject by calling \fBpcre_exec()\fR again after a previous success. |
same subject by calling \fBpcre_exec()\fR again after a previous success. |
929 |
use by callout functions that want to yield a distinctive error code. See the |
use by callout functions that want to yield a distinctive error code. See the |
930 |
\fBpcrecallout\fR documentation for details. |
\fBpcrecallout\fR documentation for details. |
931 |
|
|
932 |
PCRE_ERROR_BADUTF8 (-10) |
PCRE_ERROR_BADUTF8 (-10) |
933 |
|
|
934 |
A string that contains an invalid UTF-8 byte sequence was passed as a subject. |
A string that contains an invalid UTF-8 byte sequence was passed as a subject. |
935 |
|
|
936 |
|
PCRE_ERROR_BADUTF8_OFFSET (-11) |
937 |
|
|
938 |
|
The UTF-8 byte sequence that was passed as a subject was valid, but the value |
939 |
|
of \fIstartoffset\fR did not point to the beginning of a UTF-8 character. |
940 |
|
|
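A caller may want to report the two UTF-8 errors separately, since one concerns the subject and the other concerns \fIstartoffset\fR. The following is a minimal sketch, assuming only the numeric values given above. The EX_ names and describe_utf8_error are hypothetical stand-ins so that the fragment compiles on its own; real code would include pcre.h and use PCRE_ERROR_BADUTF8 and PCRE_ERROR_BADUTF8_OFFSET directly.

```c
#include <stddef.h>

/* Stand-ins for the macros defined in pcre.h; values as documented above. */
enum {
    EX_ERROR_BADUTF8        = -10,
    EX_ERROR_BADUTF8_OFFSET = -11
};

/* Maps a pcre_exec() return code to a message for the two UTF-8 errors.
   Returns NULL for any other code so the caller can handle it elsewhere. */
static const char *describe_utf8_error(int rc)
{
    switch (rc) {
    case EX_ERROR_BADUTF8:
        return "subject contains an invalid UTF-8 byte sequence";
    case EX_ERROR_BADUTF8_OFFSET:
        return "startoffset does not point to the start of a UTF-8 character";
    default:
        return NULL;
    }
}
```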
941 |
.SH EXTRACTING CAPTURED SUBSTRINGS BY NUMBER |
.SH EXTRACTING CAPTURED SUBSTRINGS BY NUMBER |
942 |
.rs |
.rs |
943 |
.sp |
.sp |
1077 |
appropriate. |
appropriate. |
1078 |
|
|
1079 |
.in 0 |
.in 0 |
1080 |
Last updated: 20 August 2003 |
Last updated: 09 December 2003 |
1081 |
.br |
.br |
1082 |
Copyright (c) 1997-2003 University of Cambridge. |
Copyright (c) 1997-2003 University of Cambridge. |