8 |
.br |
.br |
9 |
.B pcre *pcre_compile(const char *\fIpattern\fR, int \fIoptions\fR, |
.B pcre *pcre_compile(const char *\fIpattern\fR, int \fIoptions\fR, |
10 |
.ti +5n |
.ti +5n |
11 |
.B const char **\fIerrptr\fR, int *\fIerroffset\fR); |
.B const char **\fIerrptr\fR, int *\fIerroffset\fR, |
12 |
|
.ti +5n |
13 |
|
.B const unsigned char *\fItableptr\fR); |
14 |
|
.PP |
15 |
|
.br |
16 |
|
.B const unsigned char *pcre_maketables(void); |
17 |
.PP |
.PP |
18 |
.br |
.br |
19 |
.B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR, |
.B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR, |
39 |
.PP |
.PP |
40 |
.br |
.br |
41 |
.B void (*pcre_free)(void *); |
.B void (*pcre_free)(void *); |
|
.PP |
|
|
.br |
|
|
.B unsigned char *pcre_cbits[128]; |
|
|
.PP |
|
|
.br |
|
|
.B unsigned char *pcre_ctypes[256]; |
|
|
.PP |
|
|
.br |
|
|
.B unsigned char *pcre_fcc[256]; |
|
|
.PP |
|
|
.br |
|
|
.B unsigned char *pcre_lcc[256]; |
|
42 |
|
|
43 |
|
|
44 |
|
|
53 |
|
|
54 |
The three functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and |
The three functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and |
55 |
\fBpcre_exec()\fR are used for compiling and matching regular expressions. The |
\fBpcre_exec()\fR are used for compiling and matching regular expressions. The |
56 |
function \fBpcre_info()\fR is used to find out information about a compiled |
function \fBpcre_maketables()\fR is used (optionally) to build a set of |
57 |
|
character tables in the current locale for passing to \fBpcre_compile()\fR. |
58 |
|
|
59 |
|
The function \fBpcre_info()\fR is used to find out information about a compiled |
60 |
pattern, while the function \fBpcre_version()\fR returns a pointer to a string |
pattern, while the function \fBpcre_version()\fR returns a pointer to a string |
61 |
containing the version of PCRE and its date of release. |
containing the version of PCRE and its date of release. |
62 |
|
|
66 |
so a calling program can replace them if it wishes to intercept the calls. This |
so a calling program can replace them if it wishes to intercept the calls. This |
67 |
should be done before calling any PCRE functions. |
should be done before calling any PCRE functions. |
68 |
|
|
|
The other global variables are character tables. They are initialized when PCRE |
|
|
is compiled, from source that is generated by reference to the C character type |
|
|
functions, but which a user of PCRE is free to modify. In principle the tables |
|
|
could also be modified at run time. See PCRE's README file for more details. |
|
|
|
|
69 |
|
|
70 |
.SH MULTI-THREADING |
.SH MULTI-THREADING |
71 |
The PCRE functions can be used in multi-threading applications, with the |
The PCRE functions can be used in multi-threading applications, with the |
72 |
proviso that the character tables and the memory management functions pointed |
proviso that the memory management functions pointed to by \fBpcre_malloc\fR |
73 |
to by \fBpcre_malloc\fR and \fBpcre_free\fR are shared by all threads. |
and \fBpcre_free\fR are shared by all threads. |
74 |
|
|
75 |
The compiled form of a regular expression is not altered during matching, so |
The compiled form of a regular expression is not altered during matching, so |
76 |
the same compiled pattern can safely be used by several threads at once. |
the same compiled pattern can safely be used by several threads at once. |
79 |
.SH COMPILING A PATTERN |
.SH COMPILING A PATTERN |
80 |
The function \fBpcre_compile()\fR is called to compile a pattern into an |
The function \fBpcre_compile()\fR is called to compile a pattern into an |
81 |
internal form. The pattern is a C string terminated by a binary zero, and |
internal form. The pattern is a C string terminated by a binary zero, and |
82 |
is passed in the argument \fIpattern\fR. A pointer to the compiled code block |
is passed in the argument \fIpattern\fR. A pointer to a single block of memory |
83 |
is returned. The \fBpcre\fR type is defined for this for convenience, but in |
that is obtained via \fBpcre_malloc\fR is returned. This contains the |
84 |
fact \fBpcre\fR is just a typedef for \fBvoid\fR, since the contents of the |
compiled code and related data. The \fBpcre\fR type is defined for this for |
85 |
block are not defined. |
convenience, but in fact \fBpcre\fR is just a typedef for \fBvoid\fR, since the |
86 |
|
contents of the block are not externally defined. It is up to the caller to |
87 |
|
free the memory when it is no longer required. |
88 |
.PP |
.PP |
89 |
The size of a compiled pattern is roughly proportional to the length of the |
The size of a compiled pattern is roughly proportional to the length of the |
90 |
pattern string, except that each character class (other than those containing |
pattern string, except that each character class (other than those containing |
104 |
If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately. |
If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately. |
105 |
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns |
Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns |
106 |
NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual |
NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual |
107 |
error message. |
error message. The offset from the start of the pattern to the character where |
108 |
|
the error was discovered is placed in the variable pointed to by |
109 |
The offset from the start of the pattern to the character where the error was |
\fIerroffset\fR, which must not be NULL. If it is, an immediate error is given. |
110 |
discovered is placed in the variable pointed to by \fIerroffset\fR, which must |
.PP |
111 |
not be NULL. If it is, an immediate error is given. |
If the final argument, \fItableptr\fR, is NULL, PCRE uses a default set of |
112 |
|
character tables which are built when it is compiled, using the default C |
113 |
|
locale. Otherwise, \fItableptr\fR must be the result of a call to |
114 |
|
\fBpcre_maketables()\fR. See the section on locale support below. |
115 |
.PP |
.PP |
116 |
The following option bits are defined in the header file: |
The following option bits are defined in the header file: |
117 |
|
|
206 |
characters is created. |
characters is created. |
207 |
|
|
208 |
|
|
209 |
|
.SH LOCALE SUPPORT |
210 |
|
PCRE handles caseless matching, and determines whether characters are letters, |
211 |
|
digits, or whatever, by reference to a set of tables. The library contains a |
212 |
|
default set of tables which is created in the default C locale when PCRE is |
213 |
|
compiled. This is used when the final argument of \fBpcre_compile()\fR is NULL, |
214 |
|
and is sufficient for many applications. |
215 |
|
|
216 |
|
An alternative set of tables can, however, be supplied. Such tables are built |
217 |
|
by calling the \fBpcre_maketables()\fR function, which has no arguments, in the |
218 |
|
relevant locale. The result can then be passed to \fBpcre_compile()\fR as often |
219 |
|
as necessary. For example, to build and use tables that are appropriate for the |
220 |
|
French locale (where accented characters with codes greater than 128 are |
221 |
|
treated as letters), the following code could be used: |
222 |
|
|
223 |
|
setlocale(LC_CTYPE, "fr"); |
224 |
|
tables = pcre_maketables(); |
225 |
|
re = pcre_compile(..., tables); |
226 |
|
|
227 |
|
The tables are built in memory that is obtained via \fBpcre_malloc\fR. The |
228 |
|
pointer that is passed to \fBpcre_compile()\fR is saved with the compiled |
229 |
|
pattern, and the same tables are used via this pointer by \fBpcre_study()\fR |
230 |
|
and \fBpcre_exec()\fR. Thus for any single pattern, compilation, studying and |
231 |
|
matching all happen in the same locale, but different patterns can be compiled |
232 |
|
in different locales. It is the caller's responsibility to ensure that the |
233 |
|
memory containing the tables remains available for as long as it is needed. |
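The way locale-dependent tables can be derived from the C character type functions may be sketched as follows. This is only an illustration: the name \fBsketch_maketables()\fR and the 512-byte layout are hypothetical and are not PCRE's actual table format, and plain \fBmalloc()\fR stands in for \fBpcre_malloc\fR.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical layout, NOT PCRE's real format:
   bytes 0..255   map each character to its lower case form,
   bytes 256..511 map each character to its opposite case. */
#define SKETCH_TABLE_SIZE 512

/* Build the tables by consulting the C character type functions,
   which follow whatever locale is current at the time of the call.
   Returns NULL if memory cannot be obtained; the caller frees it. */
unsigned char *sketch_maketables(void)
{
    unsigned char *t = malloc(SKETCH_TABLE_SIZE);
    int c;

    if (t == NULL)
        return NULL;

    for (c = 0; c < 256; c++) {
        t[c] = (unsigned char)tolower(c);
        t[256 + c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
    }
    return t;
}
```

Calling \fBsetlocale()\fR before such a function changes what \fBtolower()\fR and \fBtoupper()\fR return for codes above 127, which is why tables must be built in the locale in which they are to be used.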
234 |
|
|
235 |
|
|
236 |
.SH MATCHING A PATTERN |
.SH MATCHING A PATTERN |
237 |
The function \fBpcre_exec()\fR is called to match a subject string against a |
The function \fBpcre_exec()\fR is called to match a subject string against a |
238 |
pre-compiled pattern, which is passed in the \fIcode\fR argument. If the |
pre-compiled pattern, which is passed in the \fIcode\fR argument. If the |
602 |
two disjoint sets. Any given character matches one, and only one, of each pair. |
two disjoint sets. Any given character matches one, and only one, of each pair. |
603 |
|
|
604 |
A "word" character is any letter or digit or the underscore character, that is, |
A "word" character is any letter or digit or the underscore character, that is, |
605 |
any character which can be part of a Perl "word". These character type |
any character which can be part of a Perl "word". The definition of letters and |
606 |
sequences can appear both inside and outside character classes. They each match |
digits is controlled by PCRE's character tables, and may vary if |
607 |
one character of the appropriate type. If the current matching point is at the |
locale-specific matching is taking place (see "Locale support" above). For example, in |
608 |
end of the subject string, all of them fail, since there is no character to |
the "fr" (French) locale, some character codes greater than 128 are used for |
609 |
match. |
accented letters, and these are matched by \\w. |
610 |
|
|
611 |
|
These character type sequences can appear both inside and outside character |
612 |
|
classes. They each match one character of the appropriate type. If the current |
613 |
|
matching point is at the end of the subject string, all of them fail, since |
614 |
|
there is no character to match. |
615 |
|
|
616 |
The fourth use of backslash is for certain simple assertions. An assertion |
The fourth use of backslash is for certain simple assertions. An assertion |
617 |
specifies a condition that has to be met at a particular point in a match, |
specifies a condition that has to be met at a particular point in a match, |
710 |
still consumes a character from the subject string, and fails if the current |
still consumes a character from the subject string, and fails if the current |
711 |
pointer is at the end of the string. |
pointer is at the end of the string. |
712 |
|
|
713 |
When PCRE_CASELESS is set, any letters in a class represent both their upper |
When caseless matching is set, any letters in a class represent both their |
714 |
case and lower case versions, so for example, a caseless [aeiou] matches "A" as |
upper case and lower case versions, so for example, a caseless [aeiou] matches |
715 |
well as "a", and a caseless [^aeiou] does not match "A", whereas a caseful |
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a |
716 |
version would. |
caseful version would. |
717 |
|
|
718 |
The newline character is never treated in any special way in character classes, |
The newline character is never treated in any special way in character classes, |
719 |
whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class |
whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class |
730 |
range. |
range. |
731 |
|
|
732 |
Ranges operate in ASCII collating sequence. They can also be used for |
Ranges operate in ASCII collating sequence. They can also be used for |
733 |
characters specified numerically, for example [\\000-\\037]. If a range such as |
characters specified numerically, for example [\\000-\\037]. If a range that |
734 |
[W-c] is used when PCRE_CASELESS is set, it matches the letters involved in |
includes letters is used when caseless matching is set, it matches the letters |
735 |
either case, so is equivalent to [][\\^_`wxyzabc], matched caselessly. |
in either case. For example, [W-c] is equivalent to [][\\^_`wxyzabc], matched |
736 |
|
caselessly, and if character tables for the "fr" locale are in use, |
737 |
|
[\\xc8-\\xcb] matches accented E characters in both cases. |
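The effect of caseless matching on a range can be pictured with a small sketch. The function name \fBsketch_caseless_range()\fR is hypothetical and this is not how PCRE itself represents classes; it simply marks every character in the range together with its other-case form, as given by the C character type functions in the current locale.

```c
#include <ctype.h>
#include <string.h>

/* Set set[c] to 1 for every character c that the caseless range
   lo-hi matches, and to 0 for every other character. */
void sketch_caseless_range(unsigned char lo, unsigned char hi,
                           unsigned char set[256])
{
    int c;

    memset(set, 0, 256);
    for (c = lo; c <= hi; c++) {
        set[c] = 1;           /* the character itself */
        set[tolower(c)] = 1;  /* its lower case form, if any */
        set[toupper(c)] = 1;  /* its upper case form, if any */
    }
}
```

For the range W-c in the default C locale this marks W to Z, the six punctuation characters between Z and a, and a to c, and then adds w to z and A to C, which is the same set that [][\\^_`wxyzabc] matches caselessly.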
738 |
|
|
739 |
The character types \\d, \\D, \\s, \\S, \\w, and \\W may also appear in a |
The character types \\d, \\D, \\s, \\S, \\w, and \\W may also appear in a |
740 |
character class, and add the characters that they match to the class. For |
character class, and add the characters that they match to the class. For |