Behave as if each regex has the \fB/I\fP modifier; information about the
compiled pattern is given after compilation.
.TP 10
\fB-M\fP
Behave as if each data line contains the \eM escape sequence; this causes
PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by
calling \fBpcre_exec()\fP repeatedly with different limits.
.TP 10
\fB-m\fP
Output the size of each compiled pattern after it has been compiled. This is
equivalent to adding \fB/M\fP to each regular expression. For compatibility
stdout, and prompts for each line of input, using "re>" to prompt for regular
expressions, and "data>" to prompt for data lines.
.P
When \fBpcretest\fP is built, a configuration option can specify that it should
be linked with the \fBlibreadline\fP library. When this is done, if the input
is from a terminal, it is read using the \fBreadline()\fP function. This
provides line-editing and history facilities. The output from the \fB-help\fP
option states whether or not \fBreadline()\fP will be used.
.P
The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern.
The following table shows additional modifiers for setting PCRE options that do
not correspond to anything in Perl:
.sp
  \fB/A\fP              PCRE_ANCHORED
  \fB/C\fP              PCRE_AUTO_CALLOUT
  \fB/E\fP              PCRE_DOLLAR_ENDONLY
  \fB/f\fP              PCRE_FIRSTLINE
  \fB/J\fP              PCRE_DUPNAMES
  \fB/N\fP              PCRE_NO_AUTO_CAPTURE
  \fB/U\fP              PCRE_UNGREEDY
  \fB/X\fP              PCRE_EXTRA
  \fB/<JS>\fP           PCRE_JAVASCRIPT_COMPAT
  \fB/<cr>\fP           PCRE_NEWLINE_CR
  \fB/<lf>\fP           PCRE_NEWLINE_LF
  \fB/<crlf>\fP         PCRE_NEWLINE_CRLF
  \fB/<anycrlf>\fP      PCRE_NEWLINE_ANYCRLF
  \fB/<any>\fP          PCRE_NEWLINE_ANY
  \fB/<bsr_anycrlf>\fP  PCRE_BSR_ANYCRLF
  \fB/<bsr_unicode>\fP  PCRE_BSR_UNICODE
.sp
Those specifying line ending sequences are literal strings as shown, but the
letters can be in either case. This example sets multiline matching with CRLF
as the line ending sequence:
.sp
  /^abc/m<crlf>
.sp |
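The case-insensitive recognition of the line-ending names can be illustrated with a short C sketch. It is not the parser that \fBpcretest\fP itself uses: the helpers newline_modifier() and name_matches() are hypothetical, each name is mapped to the option's name as a string rather than to the numeric value defined in pcre.h, and only the five line-ending names are handled, because the case rule above is stated only for those.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Line-ending modifiers from the table above, paired with the name of
   the option each one selects (hypothetical table; the real option
   values are the macros in pcre.h). */
static const struct {
  const char *name;
  const char *option;
} newline_mods[] = {
  { "cr",      "PCRE_NEWLINE_CR" },
  { "lf",      "PCRE_NEWLINE_LF" },
  { "crlf",    "PCRE_NEWLINE_CRLF" },
  { "anycrlf", "PCRE_NEWLINE_ANYCRLF" },
  { "any",     "PCRE_NEWLINE_ANY" },
};

/* Compare the n bytes at s with the whole of name, ignoring case. */
static int name_matches(const char *s, size_t n, const char *name)
{
  size_t i;
  if (strlen(name) != n) return 0;
  for (i = 0; i < n; i++)
    if (tolower((unsigned char)s[i]) != tolower((unsigned char)name[i]))
      return 0;
  return 1;
}

/* s points at the '<' of a modifier such as "<crlf>" or "<CRLF>".
   Returns the matching option name, or NULL if the text between the
   angle brackets is not one of the line-ending names. */
static const char *newline_modifier(const char *s)
{
  const char *gt;
  size_t n, i;

  assert(s != NULL);
  if (s[0] != '<' || (gt = strchr(s, '>')) == NULL) return NULL;
  n = (size_t)(gt - s - 1);
  for (i = 0; i < sizeof newline_mods / sizeof newline_mods[0]; i++)
    if (name_matches(s + 1, n, newline_mods[i].name))
      return newline_mods[i].option;
  return NULL;
}
```

With this sketch, newline_modifier("<CrLf>") returns "PCRE_NEWLINE_CRLF", whereas "<crlfx>" yields NULL because the whole name must match.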
multiple copies of the same substring.
.P
The \fB/B\fP modifier is a debugging feature. It requests that \fBpcretest\fP
output a representation of the compiled byte code after compilation. Normally
this information contains length and offset values; however, if \fB/Z\fP is
also present, this data is replaced by spaces. This is a special feature for
use in the automatic test scripts; it ensures that the same output is generated
for different internal link sizes.
.P
The \fB/L\fP modifier must be followed directly by the name of a locale, for
example,
pattern. If the pattern is studied, the results of that are also output.
.P
The \fB/D\fP modifier is a PCRE debugging feature, and is equivalent to
\fB/BI\fP, that is, both the \fB/B\fP and the \fB/I\fP modifiers.
.P
The \fB/F\fP modifier causes \fBpcretest\fP to flip the byte order of the
fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
\eOdd set the size of the output vector passed to
\fBpcre_exec()\fP to dd (any number of digits)
.\" JOIN
\eP pass the PCRE_PARTIAL_SOFT option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP; if used twice, pass the
PCRE_PARTIAL_HARD option
.\" JOIN
\eQdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd
(any number of digits)
\e<crlf> pass the PCRE_NEWLINE_CRLF option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\e<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\e<any> pass the PCRE_NEWLINE_ANY option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.sp
The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
of the \fB/8\fP modifier on the pattern. It is always recognized. There may be
any number of hexadecimal digits inside the braces. The result is from one to
six bytes, encoded according to the original UTF-8 rules of RFC 2279. This
allows for values in the range 0 to 0x7FFFFFFF. Note that not all of those are
valid Unicode code points, or indeed valid UTF-8 characters according to the
later rules in RFC 3629. |
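The RFC 2279 encoding can be sketched in C as follows, to show how one value turns into one to six bytes. This is not \fBpcretest\fP's own code; the function name utf8_encode_rfc2279() is hypothetical, and the sketch assumes the hexadecimal digits have already been converted to a value no greater than 0x7FFFFFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value (0 to 0x7FFFFFFF) with the original RFC 2279 UTF-8 rules,
   which allow sequences of up to six bytes. Returns the number of bytes
   written to buf. Values that RFC 3629 forbids (surrogates, anything
   above 0x10FFFF) are encoded without complaint. */
static size_t utf8_encode_rfc2279(uint32_t value, unsigned char buf[6])
{
  /* first_value[k] is the smallest value that needs k + 2 bytes */
  static const uint32_t first_value[] =
    { 0x80, 0x800, 0x10000, 0x200000, 0x4000000 };
  /* lead_bits[k] marks the first byte of a (k + 1)-byte sequence */
  static const unsigned char lead_bits[] =
    { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
  size_t len = 1, i;

  assert(value <= 0x7FFFFFFF);
  while (len < 6 && value >= first_value[len - 1]) len++;

  /* Continuation bytes have the form 10xxxxxx and each carries six
     bits; they are filled from the end of the sequence. */
  for (i = len - 1; i > 0; i--) {
    buf[i] = (unsigned char)(0x80 | (value & 0x3F));
    value >>= 6;
  }
  /* Whatever bits remain go into the lead byte. */
  buf[0] = (unsigned char)(lead_bits[len - 1] | value);
  return len;
}
```

For instance, 0x20AC yields the three bytes E2 82 AC, and 0x7FFFFFFF yields the six bytes FD BF BF BF BF BF, a sequence that RFC 3629 no longer accepts.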
.
.
.SH "THE ALTERNATIVE MATCHING FUNCTION"
.P
When a match succeeds, \fBpcretest\fP outputs the list of captured substrings
that \fBpcre_exec()\fP returns, starting with number 0 for the string that
matched the whole pattern. Otherwise, it outputs "No match" when the return is
PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching
substring when \fBpcre_exec()\fP returns PCRE_ERROR_PARTIAL. For any other
return value, it outputs the negative PCRE error number. Here is an example of
an interactive \fBpcretest\fP run.
.sp
  $ pcretest
  PCRE version 7.0 30-Nov-2006
  data> xyz
  No match
.sp
Note that unset capturing substrings that are not followed by one that is set
are not returned by \fBpcre_exec()\fP, and are not shown by \fBpcretest\fP. In
the following example, there are two capturing substrings, but when the first
data line is matched, the second, unset substring is not shown. An "internal"
unset substring is shown as "<unset>", as for the second data line.
.sp
    re> /(a)|(b)/
  data> a
   0: a
   1: a
  data> b
   0: b
   1: <unset>
   2: b
.sp |
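Which substrings appear follows from the values that \fBpcre_exec()\fP returns: the return value is one more than the highest numbered pair that was set, and a pair inside that range that was not set has both offsets equal to -1. A minimal C sketch of turning those ovector pairs into output lines, assuming those conventions, is given here; format_capture() and show_captures() are hypothetical helpers, not part of PCRE or \fBpcretest\fP, and the "%2d: " layout only approximates \fBpcretest\fP's output.

```c
#include <assert.h>
#include <stdio.h>

/* Format captured substring number i of subject into out, using the
   ovector filled in by a successful match. An unset pair has both
   offsets -1. Returns the length of the full formatted line. */
static int format_capture(const char *subject, const int *ovector, int i,
                          char *out, size_t outsize)
{
  int start = ovector[2 * i];
  int end = ovector[2 * i + 1];

  if (start < 0)   /* an "internal" unset substring */
    return snprintf(out, outsize, "%2d: <unset>", i);
  assert(end >= start);
  return snprintf(out, outsize, "%2d: %.*s", i, end - start, subject + start);
}

/* Print every substring that the match reported. rc is the value that
   pcre_exec() returned, so pairs numbered rc or above are never looked
   at; this is why a trailing unset substring is not shown. */
static void show_captures(const char *subject, const int *ovector, int rc)
{
  char line[256];
  int i;

  for (i = 0; i < rc; i++) {
    format_capture(subject, ovector, i, line, sizeof line);
    puts(line);
  }
}
```

For the second data line of the example, show_captures("b", ovector, 3) would print lines 0, 1 and 2, with line 1 showing <unset>; for the first data line the return value is 2, so pair 2 is never examined.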
If the strings contain any non-printing characters, they are output as \e0x
escapes, or as \ex{...} escapes if the \fB/8\fP modifier was present on the
pattern. See below for the definition of non-printing characters. If the
2: tan
.sp
(Using the normal matching function on this data finds only "tang".) The
longest matching string is always given first (and numbered zero). After a
PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the
partially matching substring.
.P
If \fB/g\fP is present on the pattern, the search for further matches resumes
at the end of the longest match. For example:
match with additional subject data by means of the \eR escape sequence. For
example:
.sp
re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
data> 23ja\eP\eD
Partial match: 23ja
data> n05\eR\eD
.rs
.sp
\fBpcre\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3),
\fBpcrepartial\fP(3), \fBpcrepattern\fP(3), \fBpcreprecompile\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 05 September 2009
Copyright (c) 1997-2009 University of Cambridge.
.fi