28 |
</ul> |
</ul> |
29 |
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br> |
30 |
<P> |
<P> |
31 |
<b>pcretest [-C] [-d] [-dfa] [-i] [-m] [-o osize] [-p] [-t] [source]</b> |
<b>pcretest [options] [source] [destination]</b> |
32 |
<b>[destination]</b> |
<br> |
33 |
</P> |
<br> |
|
<P> |
|
34 |
<b>pcretest</b> was written as a test program for the PCRE regular expression |
<b>pcretest</b> was written as a test program for the PCRE regular expression |
35 |
library itself, but it can also be used for experimenting with regular |
library itself, but it can also be used for experimenting with regular |
36 |
expressions. This document describes the features of the test program; for |
expressions. This document describes the features of the test program; for |
83 |
set. |
set. |
84 |
</P> |
</P> |
85 |
<P> |
<P> |
86 |
|
<b>-q</b> |
87 |
|
Do not output the version number of <b>pcretest</b> at the start of execution. |
88 |
|
</P> |
89 |
|
<P> |
90 |
|
<b>-S</b> <i>size</i> |
91 |
|
On Unix-like systems, set the size of the runtime stack to <i>size</i> |
92 |
|
megabytes. |
93 |
|
</P> |
94 |
|
<P> |
95 |
<b>-t</b> |
<b>-t</b> |
96 |
Run each compile, study, and match many times with a timer, and output |
Run each compile, study, and match many times with a timer, and output |
97 |
resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with |
resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with |
113 |
</P> |
</P> |
114 |
<P> |
<P> |
115 |
Each data line is matched separately and independently. If you want to do |
Each data line is matched separately and independently. If you want to do |
116 |
multiple-line matches, you have to use the \n escape sequence in a single line |
multi-line matches, you have to use the \n escape sequence (or \r or \r\n, |
117 |
of input to encode the newline characters. The maximum length of data line is |
depending on the newline setting) in a single line of input to encode the |
118 |
30,000 characters. |
newline characters. There is no limit on the length of data lines; the input |
119 |
|
buffer is automatically extended if it is too small. |
120 |
</P> |
</P> |
121 |
<P> |
<P> |
122 |
An empty line signals the end of the data lines, at which point a new regular |
An empty line signals the end of the data lines, at which point a new regular |
123 |
expression is read. The regular expressions are given enclosed in any |
expression is read. The regular expressions are given enclosed in any |
124 |
non-alphanumeric delimiters other than backslash, for example |
non-alphanumeric delimiters other than backslash, for example: |
125 |
<pre> |
<pre> |
126 |
/(a|bc)x+yz/ |
/(a|bc)x+yz/ |
127 |
</pre> |
</pre> |
168 |
The following table shows additional modifiers for setting PCRE options that do |
The following table shows additional modifiers for setting PCRE options that do |
169 |
not correspond to anything in Perl: |
not correspond to anything in Perl: |
170 |
<pre> |
<pre> |
171 |
<b>/A</b> PCRE_ANCHORED |
<b>/A</b> PCRE_ANCHORED |
172 |
<b>/C</b> PCRE_AUTO_CALLOUT |
<b>/C</b> PCRE_AUTO_CALLOUT |
173 |
<b>/E</b> PCRE_DOLLAR_ENDONLY |
<b>/E</b> PCRE_DOLLAR_ENDONLY |
174 |
<b>/f</b> PCRE_FIRSTLINE |
<b>/f</b> PCRE_FIRSTLINE |
175 |
<b>/N</b> PCRE_NO_AUTO_CAPTURE |
<b>/J</b> PCRE_DUPNAMES |
176 |
<b>/U</b> PCRE_UNGREEDY |
<b>/N</b> PCRE_NO_AUTO_CAPTURE |
177 |
<b>/X</b> PCRE_EXTRA |
<b>/U</b> PCRE_UNGREEDY |
178 |
|
<b>/X</b> PCRE_EXTRA |
179 |
|
<b>/&lt;cr&gt;</b> PCRE_NEWLINE_CR |
180 |
|
<b>/&lt;lf&gt;</b> PCRE_NEWLINE_LF |
181 |
|
<b>/&lt;crlf&gt;</b> PCRE_NEWLINE_CRLF |
182 |
</pre> |
</pre> |
183 |
|
Those specifying line endings are literal strings as shown. Details of the |
184 |
|
meanings of these PCRE options are given in the |
185 |
|
<a href="pcreapi.html"><b>pcreapi</b></a> |
186 |
|
documentation. |
187 |
|
</P> |
188 |
|
<br><b> |
189 |
|
Finding all matches in a string |
190 |
|
</b><br> |
191 |
|
<P> |
192 |
Searching for all possible matches within each subject string can be requested |
Searching for all possible matches within each subject string can be requested |
193 |
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called |
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called |
194 |
again to search the remainder of the subject string. The difference between |
again to search the remainder of the subject string. The difference between |
206 |
match is retried. This imitates the way Perl handles such cases when using the |
match is retried. This imitates the way Perl handles such cases when using the |
207 |
<b>/g</b> modifier or the <b>split()</b> function. |
<b>/g</b> modifier or the <b>split()</b> function. |
208 |
</P> |
</P> |
209 |
|
<br><b> |
210 |
|
Other modifiers |
211 |
|
</b><br> |
212 |
<P> |
<P> |
213 |
There are yet more modifiers for controlling the way <b>pcretest</b> |
There are yet more modifiers for controlling the way <b>pcretest</b> |
214 |
operates. |
operates. |
294 |
\e escape |
\e escape |
295 |
\f formfeed |
\f formfeed |
296 |
\n newline |
\n newline |
297 |
|
\qdd set the PCRE_MATCH_LIMIT limit to dd (any number of digits) |
298 |
\r carriage return |
\r carriage return |
299 |
\t tab |
\t tab |
300 |
\v vertical tab |
\v vertical tab |
301 |
\nnn octal character (up to 3 octal digits) |
\nnn octal character (up to 3 octal digits) |
302 |
\xhh hexadecimal character (up to 2 hex digits) |
\xhh hexadecimal character (up to 2 hex digits) |
303 |
\x{hh...} hexadecimal character, any number of digits in UTF-8 mode |
\x{hh...} hexadecimal character, any number of digits in UTF-8 mode |
304 |
\A pass the PCRE_ANCHORED option to <b>pcre_exec()</b> |
\A pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
305 |
\B pass the PCRE_NOTBOL option to <b>pcre_exec()</b> |
\B pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
306 |
\Cdd call pcre_copy_substring() for substring dd after a successful match (number less than 32) |
\Cdd call pcre_copy_substring() for substring dd after a successful match (number less than 32) |
307 |
\Cname call pcre_copy_named_substring() for substring "name" after a successful match (name termin- |
\Cname call pcre_copy_named_substring() for substring "name" after a successful match (name termin- |
308 |
ated by next non alphanumeric character) |
ated by next non alphanumeric character) |
317 |
\Gname call pcre_get_named_substring() for substring "name" after a successful match (name termin- |
\Gname call pcre_get_named_substring() for substring "name" after a successful match (name termin- |
318 |
ated by next non-alphanumeric character) |
ated by next non-alphanumeric character) |
319 |
\L call pcre_get_substringlist() after a successful match |
\L call pcre_get_substringlist() after a successful match |
320 |
\M discover the minimum MATCH_LIMIT setting |
\M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings |
321 |
\N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> |
\N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
322 |
\Odd set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits) |
\Odd set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits) |
323 |
\P pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
\P pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
324 |
|
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits) |
325 |
\R pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b> |
\R pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b> |
326 |
\S output details of memory get/free calls during matching |
\S output details of memory get/free calls during matching |
327 |
\Z pass the PCRE_NOTEOL option to <b>pcre_exec()</b> |
\Z pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
328 |
\? pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> |
\? pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
329 |
\>dd start the match at offset dd (any number of digits); |
\>dd start the match at offset dd (any number of digits); |
330 |
this sets the <i>startoffset</i> argument for <b>pcre_exec()</b> |
this sets the <i>startoffset</i> argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
331 |
|
\&lt;cr&gt; pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
332 |
|
\&lt;lf&gt; pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
333 |
|
\&lt;crlf&gt; pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> |
334 |
</pre> |
</pre> |
335 |
|
The escapes that specify line endings are literal strings, exactly as shown. |
336 |
A backslash followed by anything else just escapes the anything else. If the |
A backslash followed by anything else just escapes the anything else. If the |
337 |
very last character is a backslash, it is ignored. This gives a way of passing |
very last character is a backslash, it is ignored. This gives a way of passing |
338 |
an empty line as data, since a real empty line terminates the data input. |
an empty line as data, since a real empty line terminates the data input. |
339 |
</P> |
</P> |
340 |
<P> |
<P> |
341 |
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with |
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with |
342 |
different values in the <i>match_limit</i> field of the <b>pcre_extra</b> data |
different values in the <i>match_limit</i> and <i>match_limit_recursion</i> |
343 |
structure, until it finds the minimum number that is needed for |
fields of the <b>pcre_extra</b> data structure, until it finds the minimum |
344 |
<b>pcre_exec()</b> to complete. This number is a measure of the amount of |
numbers for each parameter that allow <b>pcre_exec()</b> to complete. The |
345 |
recursion and backtracking that takes place, and checking it out can be |
<i>match_limit</i> number is a measure of the amount of backtracking that takes |
346 |
instructive. For most simple matches, the number is quite small, but for |
place, and checking it out can be instructive. For most simple matches, the |
347 |
patterns with very large numbers of matching possibilities, it can become large |
number is quite small, but for patterns with very large numbers of matching |
348 |
very quickly with increasing length of subject string. |
possibilities, it can become large very quickly with increasing length of |
349 |
|
subject string. The <i>match_limit_recursion</i> number is a measure of how much |
350 |
|
stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed |
351 |
|
to complete the match attempt. |
352 |
</P> |
</P> |
353 |
<P> |
<P> |
354 |
When \O is used, the value specified may be higher or lower than the size set |
When \O is used, the value specified may be higher or lower than the size set |
357 |
</P> |
</P> |
358 |
<P> |
<P> |
359 |
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper |
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper |
360 |
API to be used, only \B and \Z have any effect, causing REG_NOTBOL and |
API to be used, the only option-setting sequences that have any effect are \B |
361 |
REG_NOTEOL to be passed to <b>regexec()</b> respectively. |
and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to |
362 |
|
<b>regexec()</b>. |
363 |
</P> |
</P> |
364 |
<P> |
<P> |
365 |
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use |
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use |
443 |
<P> |
<P> |
444 |
Note that while patterns can be continued over several lines (a plain ">" |
Note that while patterns can be continued over several lines (a plain ">" |
445 |
prompt is used for continuations), data lines may not. However newlines can be |
prompt is used for continuations), data lines may not. However newlines can be |
446 |
included in data by means of the \n escape. |
included in data by means of the \n escape (or \r or \r\n for those newline |
447 |
|
settings). |
448 |
</P> |
</P> |
449 |
<br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br> |
<br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br> |
450 |
<P> |
<P> |
608 |
Cambridge CB2 3QG, England. |
Cambridge CB2 3QG, England. |
609 |
</P> |
</P> |
610 |
<P> |
<P> |
611 |
Last updated: 28 February 2005 |
Last updated: 29 June 2006 |
612 |
<br> |
<br> |
613 |
Copyright © 1997-2005 University of Cambridge. |
Copyright © 1997-2006 University of Cambridge. |
614 |
<p> |
<p> |
615 |
Return to the <a href="index.html">PCRE index page</a>. |
Return to the <a href="index.html">PCRE index page</a>. |
616 |
</p> |
</p> |