NAME
pcreposix - POSIX API for Perl-compatible regular expressions.


SYNOPSIS
#include <pcreposix.h>

int regcomp(regex_t *preg, const char *pattern,
     int cflags);

int regexec(regex_t *preg, const char *string,
     size_t nmatch, regmatch_t pmatch[], int eflags);

size_t regerror(int errcode, const regex_t *preg,
     char *errbuf, size_t errbuf_size);

void regfree(regex_t *preg);


DESCRIPTION
This set of functions provides a POSIX-style API to the PCRE
regular expression package. See the pcre documentation for a
description of the native API, which contains additional
functionality.

The functions described here are just wrapper functions that
ultimately call the native API. Their prototypes are defined
in the pcreposix.h header file, and on Unix systems the
library itself is called libpcreposix.a, so it can be accessed
by adding -lpcreposix to the command for linking an
application that uses them. Because the POSIX functions call
the native ones, it is also necessary to add -lpcre.

As I am pretty ignorant about POSIX, these functions must be
considered as experimental. I have implemented only those
option bits that can be reasonably mapped to PCRE native
options. Other POSIX options are not even defined. It may be
that it is useful to define, but ignore, other options.
Feedback from more knowledgeable folk may cause this kind of
detail to change.

When PCRE is called via these functions, it is only the API
that is POSIX-like in style. The syntax and semantics of the
regular expressions themselves are still those of Perl,
subject to the setting of various PCRE options, as described
below.

The header for these functions is supplied as pcreposix.h to
avoid any potential clash with other POSIX libraries. It
can, of course, be renamed or aliased as regex.h, which is
the "correct" name. It provides two structure types, regex_t
for compiled internal forms, and regmatch_t for returning
captured substrings. It also defines some constants whose
names start with "REG_"; these are used for setting options
and identifying error codes.


COMPILING A PATTERN
The function regcomp() is called to compile a pattern into
an internal form. The pattern is a C string terminated by a
binary zero, and is passed in the argument pattern. The preg
argument is a pointer to a regex_t structure which is used
as a base for storing information about the compiled
expression.

The argument cflags is either zero, or contains one or more
of the bits defined by the following macros:

REG_ICASE

The PCRE_CASELESS option is set when the expression is
passed for compilation to the native function.

REG_NEWLINE

The PCRE_MULTILINE option is set when the expression is
passed for compilation to the native function.

The yield of regcomp() is zero on success, and non-zero
otherwise. The preg structure is filled in on success, and one
member of the structure is public: re_nsub contains the
number of capturing subpatterns in the regular expression.
Various error codes are defined in the header file.
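
The effect of the two cflags bits can be sketched as follows. This
is an illustration only, not part of the library: try_pattern() and
the USE_PCREPOSIX macro are hypothetical names invented here. So
that the sketch can be built anywhere, it falls back to the system
regex.h unless USE_PCREPOSIX is defined, and it sticks to patterns
that the Perl and basic POSIX syntaxes should treat alike.

```c
#ifdef USE_PCREPOSIX      /* hypothetical opt-in macro, not defined by PCRE */
#include <pcreposix.h>    /* link with -lpcreposix -lpcre */
#else
#include <regex.h>        /* stand-in: the system's POSIX regex header */
#endif

/* Compile `pattern` with `cflags`, try it once against `subject`, and
 * release it. Returns 1 on a match, 0 on no match, and -1 if regcomp()
 * or regexec() reported any other error. */
int try_pattern(const char *pattern, int cflags, const char *subject)
{
    regex_t re;
    regmatch_t pm[1];
    int rc;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;                  /* non-zero yield: compilation failed */

    rc = regexec(&re, subject, 1, pm, 0);
    regfree(&re);                   /* pairs with the successful regcomp() */

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

Under these assumptions, try_pattern("hello", REG_ICASE, "Say HELLO")
reports a match while the same call with cflags set to zero does not,
and try_pattern("^b", REG_NEWLINE, "a\nb") matches at the start of
the second line.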


MATCHING A PATTERN
The function regexec() is called to match a pre-compiled
pattern preg against a given string, which is terminated by
a zero byte, subject to the options in eflags. These can be:

REG_NOTBOL

The PCRE_NOTBOL option is set when calling the underlying
PCRE matching function.

REG_NOTEOL

The PCRE_NOTEOL option is set when calling the underlying
PCRE matching function.

The portion of the string that was matched, and also any
captured substrings, are returned via the pmatch argument,
which points to an array of nmatch structures of type
regmatch_t, containing the members rm_so and rm_eo. These
contain the offset to the first character of each substring
and the offset to the first character after the end of each
substring, respectively. The 0th element of the vector
relates to the entire portion of the string that was matched;
subsequent elements relate to the capturing subpatterns of
the regular expression. Unused entries in the array have
both structure members set to -1.

A successful match yields a zero return; various error codes
are defined in the header file, of which REG_NOMATCH is the
"expected" failure code.
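
A minimal sketch of reading the pmatch offsets, again under stated
assumptions: first_match() and USE_PCREPOSIX are hypothetical names,
the system regex.h is used unless that macro is defined, and the
patterns avoid capturing parentheses because basic POSIX syntax and
Perl syntax write them differently.

```c
#ifdef USE_PCREPOSIX      /* hypothetical opt-in macro, not defined by PCRE */
#include <pcreposix.h>    /* link with -lpcreposix -lpcre */
#else
#include <regex.h>        /* stand-in: the system's POSIX regex header */
#endif

/* Compile `pattern`, match it against `subject` with `eflags`, and leave
 * the offsets in pm[0] (the whole match) and pm[1] (unused by the
 * patterns tried here, so expected to hold -1 in both members).
 * Returns 0 on a match, otherwise the non-zero code that regcomp() or
 * regexec() returned. */
int first_match(const char *pattern, const char *subject, int eflags,
                regmatch_t pm[2])
{
    regex_t re;
    int rc;

    rc = regcomp(&re, pattern, 0);
    if (rc != 0)
        return rc;

    rc = regexec(&re, subject, 2, pm, eflags);
    regfree(&re);
    return rc;
}
```

For the subject "abc 123 def" and the pattern "[0-9][0-9]*", pm[0]
should hold rm_so 4 and rm_eo 7, so the matched text has length
rm_eo - rm_so and starts at subject + rm_so. Passing REG_NOTBOL stops
"^abc" from matching at the start of "abcdef", which yields
REG_NOMATCH.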


ERROR MESSAGES
The regerror() function maps a non-zero error code from
either regcomp() or regexec() to a printable message. If preg
is not NULL, the error should have arisen from the use of that
structure. A message terminated by a binary zero is placed
in errbuf. The length of the message, including the zero, is
limited to errbuf_size. The yield of the function is the
size of buffer needed to hold the whole message.
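
One way to use the yield of regerror() is to compare it with the
size of the buffer that was supplied, which shows whether the message
was truncated. The sketch below assumes this usage; compile_error_text()
and USE_PCREPOSIX are hypothetical names, and the system regex.h is
used unless that macro is defined. The wording of the message differs
between libraries, so the sketch does not depend on it.

```c
#include <stddef.h>
#ifdef USE_PCREPOSIX      /* hypothetical opt-in macro, not defined by PCRE */
#include <pcreposix.h>    /* link with -lpcreposix -lpcre */
#else
#include <regex.h>        /* stand-in: the system's POSIX regex header */
#endif

/* Try to compile `pattern`. On success, store an empty string in `buf`
 * and return 0. On failure, store as much of the message as fits in
 * `size` bytes (zero-terminated) and return the size regerror() says
 * the whole message needs. `size` must be at least 1. */
size_t compile_error_text(const char *pattern, char *buf, size_t size)
{
    regex_t re;
    int rc = regcomp(&re, pattern, 0);

    if (rc == 0) {
        regfree(&re);
        buf[0] = '\0';
        return 0;
    }
    /* preg is the structure whose use gave rise to the error */
    return regerror(rc, &re, buf, size);
}
```

If the returned value is larger than size, the message in buf was cut
short, and the call can be repeated with a buffer of the returned
size.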


STORAGE
Compiling a regular expression causes memory to be allocated
and associated with the preg structure. The function
regfree() frees all such memory, after which preg may no longer
be used as a compiled expression.
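
The pairing of regcomp() and regfree() can be sketched as a loop
that reuses one regex_t for several patterns. This is an
illustration under assumptions, not a prescribed usage:
count_matching() and USE_PCREPOSIX are hypothetical names, and the
system regex.h is used unless that macro is defined.

```c
#include <stddef.h>
#ifdef USE_PCREPOSIX      /* hypothetical opt-in macro, not defined by PCRE */
#include <pcreposix.h>    /* link with -lpcreposix -lpcre */
#else
#include <regex.h>        /* stand-in: the system's POSIX regex header */
#endif

/* Count how many of the `n` patterns match `subject`. Each successful
 * regcomp() is paired with exactly one regfree() before the regex_t is
 * reused for the next pattern. */
int count_matching(const char *const patterns[], size_t n,
                   const char *subject)
{
    regex_t re;
    regmatch_t pm[1];
    int count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (regcomp(&re, patterns[i], 0) != 0)
            continue;   /* failed compile: re is not a compiled
                           expression, so it is not freed here */
        if (regexec(&re, subject, 1, pm, 0) == 0)
            count++;
        regfree(&re);   /* re must be recompiled before further use */
    }
    return count;
}
```

Skipping regfree() after a failed regcomp() is a cautious choice made
for this sketch; the text above only describes freeing memory that a
compilation has associated with preg.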


AUTHOR
Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
New Museums Site,
Cambridge CB2 3QG, England.
Phone: +44 1223 334714

Copyright (c) 1997-1999 University of Cambridge.