
Diff of /code/trunk/doc/html/pcretest.html

revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC revision 91 by nigel, Sat Feb 24 21:41:34 2007 UTC
# Line 18  man page, in case the conversion went wr Line 18  man page, in case the conversion went wr
18  <li><a name="TOC3" href="#SEC3">DESCRIPTION</a>  <li><a name="TOC3" href="#SEC3">DESCRIPTION</a>
19  <li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>  <li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>
20  <li><a name="TOC5" href="#SEC5">DATA LINES</a>  <li><a name="TOC5" href="#SEC5">DATA LINES</a>
21  <li><a name="TOC6" href="#SEC6">OUTPUT FROM PCRETEST</a>  <li><a name="TOC6" href="#SEC6">THE ALTERNATIVE MATCHING FUNCTION</a>
22  <li><a name="TOC7" href="#SEC7">CALLOUTS</a>  <li><a name="TOC7" href="#SEC7">DEFAULT OUTPUT FROM PCRETEST</a>
23  <li><a name="TOC8" href="#SEC8">SAVING AND RELOADING COMPILED PATTERNS</a>  <li><a name="TOC8" href="#SEC8">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
24  <li><a name="TOC9" href="#SEC9">AUTHOR</a>  <li><a name="TOC9" href="#SEC9">RESTARTING AFTER A PARTIAL MATCH</a>
25    <li><a name="TOC10" href="#SEC10">CALLOUTS</a>
26    <li><a name="TOC11" href="#SEC11">SAVING AND RELOADING COMPILED PATTERNS</a>
27    <li><a name="TOC12" href="#SEC12">AUTHOR</a>
28  </ul>  </ul>
29  <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>  <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
30  <P>  <P>
31  <b>pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source]</b>  <b>pcretest [options] [source] [destination]</b>
32  <b>[destination]</b>  <br>
33  </P>  <br>
 <P>  
34  <b>pcretest</b> was written as a test program for the PCRE regular expression  <b>pcretest</b> was written as a test program for the PCRE regular expression
35  library itself, but it can also be used for experimenting with regular  library itself, but it can also be used for experimenting with regular
36  expressions. This document describes the features of the test program; for  expressions. This document describes the features of the test program; for
# Line 47  about the optional features that are inc Line 49  about the optional features that are inc
49  </P>  </P>
50  <P>  <P>
51  <b>-d</b>  <b>-d</b>
52  Behave as if each regex had the <b>/D</b> (debug) modifier; the internal  Behave as if each regex has the <b>/D</b> (debug) modifier; the internal
53  form is output after compilation.  form is output after compilation.
54  </P>  </P>
55  <P>  <P>
56    <b>-dfa</b>
57    Behave as if each data line contains the \D escape sequence; this causes the
58    alternative matching function, <b>pcre_dfa_exec()</b>, to be used instead of the
59    standard <b>pcre_exec()</b> function (more detail is given below).
60    </P>
61    <P>
62  <b>-i</b>  <b>-i</b>
63  Behave as if each regex had the <b>/I</b> modifier; information about the  Behave as if each regex has the <b>/I</b> modifier; information about the
64  compiled pattern is given after compilation.  compiled pattern is given after compilation.
65  </P>  </P>
66  <P>  <P>
# Line 70  matching calls by including \O in the da Line 78  matching calls by including \O in the da
78  </P>  </P>
79  <P>  <P>
80  <b>-p</b>  <b>-p</b>
81  Behave as if each regex has <b>/P</b> modifier; the POSIX wrapper API is used  Behave as if each regex has the <b>/P</b> modifier; the POSIX wrapper API is
82  to call PCRE. None of the other options has any effect when <b>-p</b> is set.  used to call PCRE. None of the other options has any effect when <b>-p</b> is
83    set.
84    </P>
85    <P>
86    <b>-q</b>
87    Do not output the version number of <b>pcretest</b> at the start of execution.
88    </P>
89    <P>
90    <b>-S</b> <i>size</i>
91    On Unix-like systems, set the size of the runtime stack to <i>size</i>
92    megabytes.
93  </P>  </P>
94  <P>  <P>
95  <b>-t</b>  <b>-t</b>
# Line 95  lines to be matched against the pattern. Line 113  lines to be matched against the pattern.
113  </P>  </P>
114  <P>  <P>
115  Each data line is matched separately and independently. If you want to do  Each data line is matched separately and independently. If you want to do
116  multiple-line matches, you have to use the \n escape sequence in a single line  multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
117  of input to encode the newline characters. The maximum length of data line is  depending on the newline setting) in a single line of input to encode the
118  30,000 characters.  newline characters. There is no limit on the length of data lines; the input
119    buffer is automatically extended if it is too small.
120  </P>  </P>
121  <P>  <P>
122  An empty line signals the end of the data lines, at which point a new regular  An empty line signals the end of the data lines, at which point a new regular
123  expression is read. The regular expressions are given enclosed in any  expression is read. The regular expressions are given enclosed in any
124  non-alphanumeric delimiters other than backslash, for example  non-alphanumeric delimiters other than backslash, for example:
125  <pre>  <pre>
126    /(a|bc)x+yz/    /(a|bc)x+yz/
127  </pre>  </pre>
# Line 149  effect as they do in Perl. For example: Line 168  effect as they do in Perl. For example:
168  The following table shows additional modifiers for setting PCRE options that do  The following table shows additional modifiers for setting PCRE options that do
169  not correspond to anything in Perl:  not correspond to anything in Perl:
170  <pre>  <pre>
171    <b>/A</b>    PCRE_ANCHORED    <b>/A</b>       PCRE_ANCHORED
172    <b>/C</b>    PCRE_AUTO_CALLOUT    <b>/C</b>       PCRE_AUTO_CALLOUT
173    <b>/E</b>    PCRE_DOLLAR_ENDONLY    <b>/E</b>       PCRE_DOLLAR_ENDONLY
174    <b>/N</b>    PCRE_NO_AUTO_CAPTURE    <b>/f</b>       PCRE_FIRSTLINE
175    <b>/U</b>    PCRE_UNGREEDY    <b>/J</b>       PCRE_DUPNAMES
176    <b>/X</b>    PCRE_EXTRA    <b>/N</b>       PCRE_NO_AUTO_CAPTURE
177      <b>/U</b>       PCRE_UNGREEDY
178      <b>/X</b>       PCRE_EXTRA
179      <b>/&#60;cr&#62;</b>    PCRE_NEWLINE_CR
180      <b>/&#60;lf&#62;</b>    PCRE_NEWLINE_LF
181      <b>/&#60;crlf&#62;</b>  PCRE_NEWLINE_CRLF
182  </pre>  </pre>
183    Those specifying line endings are literal strings as shown. Details of the
184    meanings of these PCRE options are given in the
185    <a href="pcreapi.html"><b>pcreapi</b></a>
186    documentation.
187    </P>
188    <br><b>
189    Finding all matches in a string
190    </b><br>
191    <P>
192  Searching for all possible matches within each subject string can be requested  Searching for all possible matches within each subject string can be requested
193  by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called  by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
194  again to search the remainder of the subject string. The difference between  again to search the remainder of the subject string. The difference between
# Line 173  If this second match fails, the start of Line 206  If this second match fails, the start of
206  match is retried. This imitates the way Perl handles such cases when using the  match is retried. This imitates the way Perl handles such cases when using the
207  <b>/g</b> modifier or the <b>split()</b> function.  <b>/g</b> modifier or the <b>split()</b> function.
208  </P>  </P>
209    <br><b>
210    Other modifiers
211    </b><br>
212  <P>  <P>
213  There are yet more modifiers for controlling the way <b>pcretest</b>  There are yet more modifiers for controlling the way <b>pcretest</b>
214  operates.  operates.
# Line 258  recognized: Line 294  recognized:
294    \e         escape    \e         escape
295    \f         formfeed    \f         formfeed
296    \n         newline    \n         newline
297      \qdd       set the PCRE_MATCH_LIMIT limit to dd (any number of digits)
298    \r         carriage return    \r         carriage return
299    \t         tab    \t         tab
300    \v         vertical tab    \v         vertical tab
301    \nnn       octal character (up to 3 octal digits)    \nnn       octal character (up to 3 octal digits)
302    \xhh       hexadecimal character (up to 2 hex digits)    \xhh       hexadecimal character (up to 2 hex digits)
303    \x{hh...}  hexadecimal character, any number of digits in UTF-8 mode    \x{hh...}  hexadecimal character, any number of digits in UTF-8 mode
304    \A         pass the PCRE_ANCHORED option to <b>pcre_exec()</b>    \A         pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
305    \B         pass the PCRE_NOTBOL option to <b>pcre_exec()</b>    \B         pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
306    \Cdd       call pcre_copy_substring() for substring dd after a successful match (number less than 32)    \Cdd       call pcre_copy_substring() for substring dd after a successful match (number less than 32)
307    \Cname     call pcre_copy_named_substring() for substring "name" after a successful match (name termin-    \Cname     call pcre_copy_named_substring() for substring "name" after a successful match (name termin-
308                 ated by next non alphanumeric character)                 ated by next non alphanumeric character)
# Line 274  recognized: Line 311  recognized:
311    \C!n       return 1 instead of 0 when callout number n is reached    \C!n       return 1 instead of 0 when callout number n is reached
312    \C!n!m     return 1 instead of 0 when callout number n is reached for the mth time    \C!n!m     return 1 instead of 0 when callout number n is reached for the mth time
313    \C*n       pass the number n (may be negative) as callout data; this is used as the callout return value    \C*n       pass the number n (may be negative) as callout data; this is used as the callout return value
314      \D         use the <b>pcre_dfa_exec()</b> match function
315      \F         only shortest match for <b>pcre_dfa_exec()</b>
316    \Gdd       call pcre_get_substring() for substring dd after a successful match (number less than 32)    \Gdd       call pcre_get_substring() for substring dd after a successful match (number less than 32)
317    \Gname     call pcre_get_named_substring() for substring "name" after a successful match (name termin-    \Gname     call pcre_get_named_substring() for substring "name" after a successful match (name termin-
318                 ated by next non-alphanumeric character)                 ated by next non-alphanumeric character)
319    \L         call pcre_get_substringlist() after a successful match    \L         call pcre_get_substringlist() after a successful match
320    \M         discover the minimum MATCH_LIMIT setting    \M         discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
321    \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b>    \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
322    \Odd       set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)    \Odd       set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
323    \P         pass the PCRE_PARTIAL option to <b>pcre_exec()</b>    \P         pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
324      \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
325      \R         pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b>
326    \S         output details of memory get/free calls during matching    \S         output details of memory get/free calls during matching
327    \Z         pass the PCRE_NOTEOL option to <b>pcre_exec()</b>    \Z         pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
328    \?         pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b>    \?         pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
329    \&#62;dd       start the match at offset dd (any number of digits);    \&#62;dd       start the match at offset dd (any number of digits);
330                 this sets the <i>startoffset</i> argument for <b>pcre_exec()</b>                 this sets the <i>startoffset</i> argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
331      \&#60;cr&#62;      pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
332      \&#60;lf&#62;      pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
333      \&#60;crlf&#62;    pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
334  </pre>  </pre>
335    The escapes that specify line endings are literal strings, exactly as shown.
336  A backslash followed by anything else just escapes the anything else. If the  A backslash followed by anything else just escapes the anything else. If the
337  very last character is a backslash, it is ignored. This gives a way of passing  very last character is a backslash, it is ignored. This gives a way of passing
338  an empty line as data, since a real empty line terminates the data input.  an empty line as data, since a real empty line terminates the data input.
339  </P>  </P>
340  <P>  <P>
341  If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with  If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
342  different values in the <i>match_limit</i> field of the <b>pcre_extra</b> data  different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
343  structure, until it finds the minimum number that is needed for  fields of the <b>pcre_extra</b> data structure, until it finds the minimum
344  <b>pcre_exec()</b> to complete. This number is a measure of the amount of  numbers for each parameter that allow <b>pcre_exec()</b> to complete. The
345  recursion and backtracking that takes place, and checking it out can be  <i>match_limit</i> number is a measure of the amount of backtracking that takes
346  instructive. For most simple matches, the number is quite small, but for  place, and checking it out can be instructive. For most simple matches, the
347  patterns with very large numbers of matching possibilities, it can become large  number is quite small, but for patterns with very large numbers of matching
348  very quickly with increasing length of subject string.  possibilities, it can become large very quickly with increasing length of
349    subject string. The <i>match_limit_recursion</i> number is a measure of how much
350    stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
351    to complete the match attempt.
352  </P>  </P>
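
The idea behind \M, finding the smallest limit at which a match attempt still completes, can be sketched as a search over limit values. This is only an illustration: the text above does not say how pcretest conducts its search, so the doubling-then-bisecting strategy is an assumption, and `minimum_limit` and `completes` are hypothetical names standing in for repeated calls of pcre_exec() with a given match_limit.

```python
def minimum_limit(completes, upper=10_000_000):
    """Return the smallest limit in [1, upper] for which completes(limit) is true.

    `completes` is a hypothetical callback: it should return True when the
    match attempt finishes within the given limit, False when the limit is hit.
    It is assumed to be monotonic (once it succeeds, larger limits also succeed).
    """
    lo, hi = 1, 1
    # Phase 1: double the limit until the attempt completes.
    while not completes(hi):
        if hi >= upper:
            raise ValueError("no limit up to %d is sufficient" % upper)
        lo = hi + 1
        hi = min(hi * 2, upper)
    # Phase 2: bisect between the last failing and first succeeding values.
    while lo < hi:
        mid = (lo + hi) // 2
        if completes(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Toy stand-in: pretend the match needs a limit of at least 37.
    print(minimum_limit(lambda limit: limit >= 37))
```

The same routine could be run once per parameter (match_limit and match_limit_recursion) with a different callback for each.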
353  <P>  <P>
354  When \O is used, the value specified may be higher or lower than the size set  When \O is used, the value specified may be higher or lower than the size set
# Line 309  the call of <b>pcre_exec()</b> for the l Line 357  the call of <b>pcre_exec()</b> for the l
357  </P>  </P>
358  <P>  <P>
359  If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper  If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
360  API to be used, only \B and \Z have any effect, causing REG_NOTBOL and  API to be used, the only option-setting sequences that have any effect are \B
361  REG_NOTEOL to be passed to <b>regexec()</b> respectively.  and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
362    <b>regexec()</b>.
363  </P>  </P>
364  <P>  <P>
365  The use of \x{hh...} to represent UTF-8 characters is not dependent on the use  The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
# Line 318  of the <b>/8</b> modifier on the pattern Line 367  of the <b>/8</b> modifier on the pattern
367  any number of hexadecimal digits inside the braces. The result is from one to  any number of hexadecimal digits inside the braces. The result is from one to
368  six bytes, encoded according to the UTF-8 rules.  six bytes, encoded according to the UTF-8 rules.
369  </P>  </P>
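
As an illustration of how a \x{hh...} value can become one to six bytes, here is a minimal sketch of the original UTF-8 encoding scheme, which allowed values up to 0x7FFFFFFF. The function name `utf8_encode_legacy` is hypothetical, and this is not pcretest's own code.

```python
def utf8_encode_legacy(cp):
    """Encode an integer code point as 1 to 6 bytes (original UTF-8 scheme)."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value out of range for six-byte UTF-8")
    if cp < 0x80:
        return bytes([cp])  # single byte, plain ASCII
    # (number of bytes, exclusive upper bound, lead-byte prefix)
    table = (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    )
    for nbytes, bound, prefix in table:
        if cp < bound:
            out = []
            # Emit continuation bytes (10xxxxxx), lowest six bits first.
            for _ in range(nbytes - 1):
                out.append(0x80 | (cp & 0x3F))
                cp >>= 6
            # The remaining high bits go into the lead byte.
            out.append(prefix | cp)
            return bytes(reversed(out))


if __name__ == "__main__":
    print(utf8_encode_legacy(0x20AC).hex())  # e282ac
```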
370  <br><a name="SEC6" href="#TOC1">OUTPUT FROM PCRETEST</a><br>  <br><a name="SEC6" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
371    <P>
372    By default, <b>pcretest</b> uses the standard PCRE matching function,
373    <b>pcre_exec()</b>, to match each data line. From release 6.0, PCRE supports an
374    alternative matching function, <b>pcre_dfa_exec()</b>, which operates in a
375    different way, and has some restrictions. The differences between the two
376    functions are described in the
377    <a href="pcrematching.html"><b>pcrematching</b></a>
378    documentation.
379    </P>
380    <P>
381    If a data line contains the \D escape sequence, or if the command line
382    contains the <b>-dfa</b> option, the alternative matching function is called.
383    This function finds all possible matches at a given point. If, however, the \F
384    escape sequence is present in the data line, it stops after the first match is
385    found. This is always the shortest possible match.
386    </P>
387    <br><a name="SEC7" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
388    <P>
389    This section describes the output when the normal matching function,
390    <b>pcre_exec()</b>, is being used.
391    </P>
392  <P>  <P>
393  When a match succeeds, pcretest outputs the list of captured substrings that  When a match succeeds, pcretest outputs the list of captured substrings that
394  <b>pcre_exec()</b> returns, starting with number 0 for the string that matched  <b>pcre_exec()</b> returns, starting with number 0 for the string that matched
395  the whole pattern. Otherwise, it outputs "No match" or "Partial match"  the whole pattern. Otherwise, it outputs "No match" or "Partial match"
396  when <b>pcre_exec()</b> returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,  when <b>pcre_exec()</b> returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
397  respectively, and otherwise the PCRE negative error number. Here is an example  respectively, and otherwise the PCRE negative error number. Here is an example
398  of an interactive pcretest run.  of an interactive <b>pcretest</b> run.
399  <pre>  <pre>
400    $ pcretest    $ pcretest
401    PCRE version 5.00 07-Sep-2004    PCRE version 5.00 07-Sep-2004
# Line 373  parentheses after each string for <b>\C< Line 443  parentheses after each string for <b>\C<
443  <P>  <P>
444  Note that while patterns can be continued over several lines (a plain "&#62;"  Note that while patterns can be continued over several lines (a plain "&#62;"
445  prompt is used for continuations), data lines may not. However newlines can be  prompt is used for continuations), data lines may not. However newlines can be
446  included in data by means of the \n escape.  included in data by means of the \n escape (or \r or \r\n for those newline
447    settings).
448    </P>
449    <br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
450    <P>
451    When the alternative matching function, <b>pcre_dfa_exec()</b>, is used (by
452    means of the \D escape sequence or the <b>-dfa</b> command line option), the
453    output consists of a list of all the matches that start at the first point in
454    the subject where there is at least one match. For example:
455    <pre>
456        re&#62; /(tang|tangerine|tan)/
457      data&#62; yellow tangerine\D
458       0: tangerine
459       1: tang
460       2: tan
461    </pre>
462    (Using the normal matching function on this data finds only "tang".) The
463    longest matching string is always given first (and numbered zero).
464    </P>
465    <P>
466    If <b>/g</b> is present on the pattern, the search for further matches resumes
467    at the end of the longest match. For example:
468    <pre>
469        re&#62; /(tang|tangerine|tan)/g
470      data&#62; yellow tangerine and tangy sultana\D
471       0: tangerine
472       1: tang
473       2: tan
474       0: tang
475       1: tan
476       0: tan
477    </pre>
478    Since the matching function does not support substring capture, the escape
479    sequences that are concerned with captured substrings are not relevant.
480    </P>
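
The output shape described in this section (all matches at the first matching position, longest first, with /g resuming at the end of the longest match) can be mimicked for the simple case of literal alternatives. The following is a toy model only: it does not call PCRE, it handles plain strings rather than regular expressions, and the names `dfa_style_matches` and `dfa_style_global` are hypothetical.

```python
def dfa_style_matches(alternatives, subject, start=0):
    """Find the first position >= start where any alternative matches.

    Returns (position, matches) with matches sorted longest first,
    or None if nothing matches anywhere.
    """
    for pos in range(start, len(subject) + 1):
        found = {alt for alt in alternatives if subject.startswith(alt, pos)}
        if found:
            return pos, sorted(found, key=len, reverse=True)
    return None


def dfa_style_global(alternatives, subject):
    """Mimic /g: after each hit, resume at the end of the longest match."""
    results = []
    pos = 0
    while pos <= len(subject):
        hit = dfa_style_matches(alternatives, subject, pos)
        if hit is None:
            break
        start, found = hit
        results.append(found)
        # Always advance by at least one character to avoid looping forever.
        pos = start + max(len(found[0]), 1)
    return results


if __name__ == "__main__":
    alts = ["tang", "tangerine", "tan"]
    for group in dfa_style_global(alts, "yellow tangerine and tangy sultana"):
        for number, text in enumerate(group):
            print("%2d: %s" % (number, text))
```

Run on the subject from the example above, this prints the same six lines in the same order as the pcretest output shown.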
481    <br><a name="SEC9" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
482    <P>
483    When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
484    indicating that the subject partially matched the pattern, you can restart the
485    match with additional subject data by means of the \R escape sequence. For
486    example:
487    <pre>
488        re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
489      data&#62; 23ja\P\D
490      Partial match: 23ja
491      data&#62; n05\R\D
492       0: n05
493    </pre>
494    For further information about partial matching, see the
495    <a href="pcrepartial.html"><b>pcrepartial</b></a>
496    documentation.
497  </P>  </P>
498  <br><a name="SEC7" href="#TOC1">CALLOUTS</a><br>  <br><a name="SEC10" href="#TOC1">CALLOUTS</a><br>
499  <P>  <P>
500  If the pattern contains any callout requests, <b>pcretest</b>'s callout function  If the pattern contains any callout requests, <b>pcretest</b>'s callout function
501  is called during matching. By default, it displays the callout number, the  is called during matching. This works with both matching functions. By default,
502  start and current positions in the text at the callout time, and the next  the called function displays the callout number, the start and current
503  pattern item to be tested. For example, the output  positions in the text at the callout time, and the next pattern item to be
504    tested. For example, the output
505  <pre>  <pre>
506    ---&#62;pqrabcdef    ---&#62;pqrabcdef
507      0    ^  ^     \d      0    ^  ^     \d
# Line 406  example: Line 527  example:
527     0: E*     0: E*
528  </pre>  </pre>
529  The callout function in <b>pcretest</b> returns zero (carry on matching) by  The callout function in <b>pcretest</b> returns zero (carry on matching) by
530  default, but you can use an \C item in a data line (as described above) to  default, but you can use a \C item in a data line (as described above) to
531  change this.  change this.
532  </P>  </P>
533  <P>  <P>
# Line 416  the Line 537  the
537  <a href="pcrecallout.html"><b>pcrecallout</b></a>  <a href="pcrecallout.html"><b>pcrecallout</b></a>
538  documentation.  documentation.
539  </P>  </P>
540  <br><a name="SEC8" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>  <br><a name="SEC11" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
541  <P>  <P>
542  The facilities described in this section are not available when the POSIX  The facilities described in this section are not available when the POSIX
543  interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is  interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
# Line 478  string using a reloaded pattern is likel Line 599  string using a reloaded pattern is likel
599  Finally, if you attempt to load a file that is not in the correct format, the  Finally, if you attempt to load a file that is not in the correct format, the
600  result is undefined.  result is undefined.
601  </P>  </P>
602  <br><a name="SEC9" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC12" href="#TOC1">AUTHOR</a><br>
603  <P>  <P>
604  Philip Hazel &#60;ph10@cam.ac.uk&#62;  Philip Hazel
605  <br>  <br>
606  University Computing Service,  University Computing Service,
607  <br>  <br>
608  Cambridge CB2 3QG, England.  Cambridge CB2 3QG, England.
609  </P>  </P>
610  <P>  <P>
611  Last updated: 10 September 2004  Last updated: 29 June 2006
612  <br>  <br>
613  Copyright &copy; 1997-2004 University of Cambridge.  Copyright &copy; 1997-2006 University of Cambridge.
614  <p>  <p>
615  Return to the <a href="index.html">PCRE index page</a>.  Return to the <a href="index.html">PCRE index page</a>.
616  </p>  </p>
