.sp
  /caseless/i
.sp
The following table shows additional modifiers for setting PCRE compile-time
options that do not correspond to anything in Perl:
.sp
  \fB/8\fP              PCRE_UTF8
  \fB/?\fP              PCRE_NO_UTF8_CHECK
  \fB/A\fP              PCRE_ANCHORED
  \fB/C\fP              PCRE_AUTO_CALLOUT
  \fB/E\fP              PCRE_DOLLAR_ENDONLY
  \fB/J\fP              PCRE_DUPNAMES
  \fB/N\fP              PCRE_NO_AUTO_CAPTURE
  \fB/U\fP              PCRE_UNGREEDY
  \fB/W\fP              PCRE_UCP
  \fB/X\fP              PCRE_EXTRA
  \fB/<JS>\fP           PCRE_JAVASCRIPT_COMPAT
  \fB/<cr>\fP           PCRE_NEWLINE_CR
  \fB/<bsr_anycrlf>\fP  PCRE_BSR_ANYCRLF
  \fB/<bsr_unicode>\fP  PCRE_BSR_UNICODE
.sp
The modifiers that are enclosed in angle brackets are literal strings as shown,
including the angle brackets, but the letters can be in either case. This
example sets multiline matching with CRLF as the line ending sequence:
.sp
  /^abc/m<crlf>
.sp
As well as turning on the PCRE_UTF8 option, the \fB/8\fP modifier also causes
any non-printing characters in output strings to be printed using the
\ex{hh...} notation if they are valid UTF-8 sequences. Full details of the PCRE
options are given in the
.\" HREF
\fBpcreapi\fP
.\"
documentation.
.
.
.SS "Finding all matches in a string"
There are yet more modifiers for controlling the way \fBpcretest\fP
operates.
.P
The \fB/+\fP modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the remainder of
the subject string. This is useful for tests where the subject contains
The \fB/M\fP modifier causes the size of the memory block used to hold the
compiled pattern to be output.
.P
The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the
expression has been compiled, and the results to be used when the expression
is matched.
.
.
.SS "Using the POSIX wrapper API"
.rs
.sp
The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper
API rather than its native API. When \fB/P\fP is set, the following modifiers
set options for the \fBregcomp()\fP function:
.sp
  /i    REG_ICASE
  /m    REG_NEWLINE
  /N    REG_NOSUB
  /s    REG_DOTALL     )
  /U    REG_UNGREEDY   ) These options are not part of
  /W    REG_UCP        )   the POSIX standard
  /8    REG_UTF8       )
.sp
The \fB/+\fP modifier works as described above. All other modifiers are
ignored.
.
.
.SH "DATA LINES"
.rs
.sp
the call of \fBpcre_exec()\fP for the line in which it appears.
.P
If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper
API to be used, the only option-setting sequences that have any effect are \eB,
\eN, and \eZ, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively,
to be passed to \fBregexec()\fP.
.P
The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
of the \fB/8\fP modifier on the pattern; it is always recognized. There may be
.rs
.sp
.nf
Last updated: 16 May 2010
Copyright (c) 1997-2010 University of Cambridge.
.fi