Diff of /code/trunk/doc/pcretest.1

revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC revision 91 by nigel, Sat Feb 24 21:41:34 2007 UTC
# Line 4  pcretest - a program for testing Perl-co Line 4  pcretest - a program for testing Perl-co
4  .SH SYNOPSIS  .SH SYNOPSIS
5  .rs  .rs
6  .sp  .sp
7  .B pcretest "[-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source]"  .B pcretest "[options] [source] [destination]"
8  .ti +5n  .sp
 .B "[destination]"  
 .P  
9  \fBpcretest\fP was written as a test program for the PCRE regular expression  \fBpcretest\fP was written as a test program for the PCRE regular expression
10  library itself, but it can also be used for experimenting with regular  library itself, but it can also be used for experimenting with regular
11  expressions. This document describes the features of the test program; for  expressions. This document describes the features of the test program; for
# Line 31  Output the version number of the PCRE li Line 29  Output the version number of the PCRE li
29  about the optional features that are included, and then exit.  about the optional features that are included, and then exit.
30  .TP 10  .TP 10
31  \fB-d\fP  \fB-d\fP
32  Behave as if each regex had the \fB/D\fP (debug) modifier; the internal  Behave as if each regex has the \fB/D\fP (debug) modifier; the internal
33  form is output after compilation.  form is output after compilation.
34  .TP 10  .TP 10
35    \fB-dfa\fP
36    Behave as if each data line contains the \eD escape sequence; this causes the
37    alternative matching function, \fBpcre_dfa_exec()\fP, to be used instead of the
38    standard \fBpcre_exec()\fP function (more detail is given below).
39    .TP 10
40  \fB-i\fP  \fB-i\fP
41  Behave as if each regex had the \fB/I\fP modifier; information about the  Behave as if each regex has the \fB/I\fP modifier; information about the
42  compiled pattern is given after compilation.  compiled pattern is given after compilation.
43  .TP 10  .TP 10
44  \fB-m\fP  \fB-m\fP
# Line 50  for 14 capturing subexpressions. The vec Line 53  for 14 capturing subexpressions. The vec
53  matching calls by including \eO in the data line (see below).  matching calls by including \eO in the data line (see below).
54  .TP 10  .TP 10
55  \fB-p\fP  \fB-p\fP
56  Behave as if each regex has \fB/P\fP modifier; the POSIX wrapper API is used  Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is
57  to call PCRE. None of the other options has any effect when \fB-p\fP is set.  used to call PCRE. None of the other options has any effect when \fB-p\fP is
58    set.
59    .TP 10
60    \fB-q\fP
61    Do not output the version number of \fBpcretest\fP at the start of execution.
62    .TP 10
63    \fB-S\fP \fIsize\fP
64    On Unix-like systems, set the size of the runtime stack to \fIsize\fP
65    megabytes.
66  .TP 10  .TP 10
67  \fB-t\fP  \fB-t\fP
68  Run each compile, study, and match many times with a timer, and output  Run each compile, study, and match many times with a timer, and output
# Line 74  set starts with a regular expression, an Line 85  set starts with a regular expression, an
85  lines to be matched against the pattern.  lines to be matched against the pattern.
86  .P  .P
87  Each data line is matched separately and independently. If you want to do  Each data line is matched separately and independently. If you want to do
88  multiple-line matches, you have to use the \en escape sequence in a single line  multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
89  of input to encode the newline characters. The maximum length of data line is  depending on the newline setting) in a single line of input to encode the
90  30,000 characters.  newline characters. There is no limit on the length of data lines; the input
91    buffer is automatically extended if it is too small.
92  .P  .P
93  An empty line signals the end of the data lines, at which point a new regular  An empty line signals the end of the data lines, at which point a new regular
94  expression is read. The regular expressions are given enclosed in any  expression is read. The regular expressions are given enclosed in any
95  non-alphanumeric delimiters other than backslash, for example  non-alphanumeric delimiters other than backslash, for example:
96  .sp  .sp
97    /(a|bc)x+yz/    /(a|bc)x+yz/
98  .sp  .sp
# Line 128  effect as they do in Perl. For example: Line 140  effect as they do in Perl. For example:
140  The following table shows additional modifiers for setting PCRE options that do  The following table shows additional modifiers for setting PCRE options that do
141  not correspond to anything in Perl:  not correspond to anything in Perl:
142  .sp  .sp
143    \fB/A\fP    PCRE_ANCHORED    \fB/A\fP       PCRE_ANCHORED
144    \fB/C\fP    PCRE_AUTO_CALLOUT    \fB/C\fP       PCRE_AUTO_CALLOUT
145    \fB/E\fP    PCRE_DOLLAR_ENDONLY    \fB/E\fP       PCRE_DOLLAR_ENDONLY
146    \fB/N\fP    PCRE_NO_AUTO_CAPTURE    \fB/f\fP       PCRE_FIRSTLINE
147    \fB/U\fP    PCRE_UNGREEDY    \fB/J\fP       PCRE_DUPNAMES
148    \fB/X\fP    PCRE_EXTRA    \fB/N\fP       PCRE_NO_AUTO_CAPTURE
149      \fB/U\fP       PCRE_UNGREEDY
150      \fB/X\fP       PCRE_EXTRA
151      \fB/<cr>\fP    PCRE_NEWLINE_CR
152      \fB/<lf>\fP    PCRE_NEWLINE_LF
153      \fB/<crlf>\fP  PCRE_NEWLINE_CRLF
154    .sp
155    Those specifying line endings are literal strings as shown. Details of the
156    meanings of these PCRE options are given in the
157    .\" HREF
158    \fBpcreapi\fP
159    .\"
160    documentation.
161    .
162    .
163    .SS "Finding all matches in a string"
164    .rs
165  .sp  .sp
166  Searching for all possible matches within each subject string can be requested  Searching for all possible matches within each subject string can be requested
167  by the \fB/g\fP or \fB/G\fP modifier. After finding a match, PCRE is called  by the \fB/g\fP or \fB/G\fP modifier. After finding a match, PCRE is called
# Line 150  flags set in order to search for another Line 178  flags set in order to search for another
178  If this second match fails, the start offset is advanced by one, and the normal  If this second match fails, the start offset is advanced by one, and the normal
179  match is retried. This imitates the way Perl handles such cases when using the  match is retried. This imitates the way Perl handles such cases when using the
180  \fB/g\fP modifier or the \fBsplit()\fP function.  \fB/g\fP modifier or the \fBsplit()\fP function.
181  .P  .
182    .
183    .SS "Other modifiers"
184    .rs
185    .sp
186  There are yet more modifiers for controlling the way \fBpcretest\fP  There are yet more modifiers for controlling the way \fBpcretest\fP
187  operates.  operates.
188  .P  .P
# Line 227  recognized: Line 259  recognized:
259    \ee         escape    \ee         escape
260    \ef         formfeed    \ef         formfeed
261    \en         newline    \en         newline
262    .\" JOIN
263      \eqdd       set the PCRE_MATCH_LIMIT limit to dd
264                   (any number of digits)
265    \er         carriage return    \er         carriage return
266    \et         tab    \et         tab
267    \ev         vertical tab    \ev         vertical tab
# Line 235  recognized: Line 270  recognized:
270  .\" JOIN  .\" JOIN
271    \ex{hh...}  hexadecimal character, any number of digits    \ex{hh...}  hexadecimal character, any number of digits
272                 in UTF-8 mode                 in UTF-8 mode
273    .\" JOIN
274    \eA         pass the PCRE_ANCHORED option to \fBpcre_exec()\fP    \eA         pass the PCRE_ANCHORED option to \fBpcre_exec()\fP
275                   or \fBpcre_dfa_exec()\fP
276    .\" JOIN
277    \eB         pass the PCRE_NOTBOL option to \fBpcre_exec()\fP    \eB         pass the PCRE_NOTBOL option to \fBpcre_exec()\fP
278                   or \fBpcre_dfa_exec()\fP
279  .\" JOIN  .\" JOIN
280    \eCdd       call pcre_copy_substring() for substring dd    \eCdd       call pcre_copy_substring() for substring dd
281                 after a successful match (number less than 32)                 after a successful match (number less than 32)
# Line 257  recognized: Line 296  recognized:
296  .\" JOIN  .\" JOIN
297    \eC*n       pass the number n (may be negative) as callout    \eC*n       pass the number n (may be negative) as callout
298                 data; this is used as the callout return value                 data; this is used as the callout return value
299      \eD         use the \fBpcre_dfa_exec()\fP match function
300      \eF         only shortest match for \fBpcre_dfa_exec()\fP
301  .\" JOIN  .\" JOIN
302    \eGdd       call pcre_get_substring() for substring dd    \eGdd       call pcre_get_substring() for substring dd
303                 after a successful match (number less than 32)                 after a successful match (number less than 32)
# Line 267  recognized: Line 308  recognized:
308  .\" JOIN  .\" JOIN
309    \eL         call pcre_get_substringlist() after a    \eL         call pcre_get_substringlist() after a
310                 successful match                 successful match
311    \eM         discover the minimum MATCH_LIMIT setting  .\" JOIN
312      \eM         discover the minimum MATCH_LIMIT and
313                   MATCH_LIMIT_RECURSION settings
314    .\" JOIN
315    \eN         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP    \eN         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP
316                   or \fBpcre_dfa_exec()\fP
317  .\" JOIN  .\" JOIN
318    \eOdd       set the size of the output vector passed to    \eOdd       set the size of the output vector passed to
319                 \fBpcre_exec()\fP to dd (any number of digits)                 \fBpcre_exec()\fP to dd (any number of digits)
320    .\" JOIN
321    \eP         pass the PCRE_PARTIAL option to \fBpcre_exec()\fP    \eP         pass the PCRE_PARTIAL option to \fBpcre_exec()\fP
322                   or \fBpcre_dfa_exec()\fP
323    .\" JOIN
324      \eQdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd
325                   (any number of digits)
326      \eR         pass the PCRE_DFA_RESTART option to \fBpcre_dfa_exec()\fP
327    \eS         output details of memory get/free calls during matching    \eS         output details of memory get/free calls during matching
328    .\" JOIN
329    \eZ         pass the PCRE_NOTEOL option to \fBpcre_exec()\fP    \eZ         pass the PCRE_NOTEOL option to \fBpcre_exec()\fP
330                   or \fBpcre_dfa_exec()\fP
331  .\" JOIN  .\" JOIN
332    \e?         pass the PCRE_NO_UTF8_CHECK option to    \e?         pass the PCRE_NO_UTF8_CHECK option to
333                 \fBpcre_exec()\fP                 \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP
334    \e>dd       start the match at offset dd (any number of digits);    \e>dd       start the match at offset dd (any number of digits);
335    .\" JOIN
336                 this sets the \fIstartoffset\fP argument for \fBpcre_exec()\fP                 this sets the \fIstartoffset\fP argument for \fBpcre_exec()\fP
337                   or \fBpcre_dfa_exec()\fP
338    .\" JOIN
339      \e<cr>      pass the PCRE_NEWLINE_CR option to \fBpcre_exec()\fP
340                   or \fBpcre_dfa_exec()\fP
341    .\" JOIN
342      \e<lf>      pass the PCRE_NEWLINE_LF option to \fBpcre_exec()\fP
343                   or \fBpcre_dfa_exec()\fP
344    .\" JOIN
345      \e<crlf>    pass the PCRE_NEWLINE_CRLF option to \fBpcre_exec()\fP
346                   or \fBpcre_dfa_exec()\fP
347  .sp  .sp
348    The escapes that specify line endings are literal strings, exactly as shown.
349  A backslash followed by anything else just escapes the anything else. If the  A backslash followed by anything else just escapes the anything else. If the
350  very last character is a backslash, it is ignored. This gives a way of passing  very last character is a backslash, it is ignored. This gives a way of passing
351  an empty line as data, since a real empty line terminates the data input.  an empty line as data, since a real empty line terminates the data input.
352  .P  .P
353  If \eM is present, \fBpcretest\fP calls \fBpcre_exec()\fP several times, with  If \eM is present, \fBpcretest\fP calls \fBpcre_exec()\fP several times, with
354  different values in the \fImatch_limit\fP field of the \fBpcre_extra\fP data  different values in the \fImatch_limit\fP and \fImatch_limit_recursion\fP
355  structure, until it finds the minimum number that is needed for  fields of the \fBpcre_extra\fP data structure, until it finds the minimum
356  \fBpcre_exec()\fP to complete. This number is a measure of the amount of  numbers for each parameter that allow \fBpcre_exec()\fP to complete. The
357  recursion and backtracking that takes place, and checking it out can be  \fImatch_limit\fP number is a measure of the amount of backtracking that takes
358  instructive. For most simple matches, the number is quite small, but for  place, and checking it out can be instructive. For most simple matches, the
359  patterns with very large numbers of matching possibilities, it can become large  number is quite small, but for patterns with very large numbers of matching
360  very quickly with increasing length of subject string.  possibilities, it can become large very quickly with increasing length of
361    subject string. The \fImatch_limit_recursion\fP number is a measure of how much
362    stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
363    to complete the match attempt.
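The search that \eM performs can be pictured with the following minimal sketch. It is not pcretest's own code: try_match is a hypothetical callback standing in for one call to pcre_exec() with a given match_limit, returning True when the call completes without hitting the limit. The sketch assumes that a call which completes under one limit also completes under any larger limit, and that a limit of zero is never enough.

```python
def minimum_limit(try_match, start=64, ceiling=2**31 - 1):
    """Return the smallest limit for which try_match(limit) is True.

    try_match is a hypothetical stand-in for calling pcre_exec() with the
    given match_limit. It must be monotonic: once True, True for every
    larger limit. Returns None if even `ceiling` is not enough.
    """
    if not try_match(ceiling):
        return None
    # Phase 1: double the candidate until the match completes.
    # `low` is a limit known (or, for 0, assumed) to be too small.
    low, high = 0, start
    while not try_match(high):
        low = high
        high = min(high * 2, ceiling)
    # Phase 2: bisect between the last failure and the first success.
    while high - low > 1:
        mid = (low + high) // 2
        if try_match(mid):
            high = mid
        else:
            low = mid
    return high
```

For instance, minimum_limit(lambda n: n >= 1000) returns 1000 with far fewer than a thousand probes. The same routine could be run a second time for match_limit_recursion.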
364  .P  .P
365  When \eO is used, the value specified may be higher or lower than the size set  When \eO is used, the value specified may be higher or lower than the size set
366  by the \fB-o\fP command line option (or defaulted to 45); \eO applies only to  by the \fB-o\fP command line option (or defaulted to 45); \eO applies only to
367  the call of \fBpcre_exec()\fP for the line in which it appears.  the call of \fBpcre_exec()\fP for the line in which it appears.
368  .P  .P
369  If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper  If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper
370  API to be used, only \eB and \eZ have any effect, causing REG_NOTBOL and  API to be used, the only option-setting sequences that have any effect are \eB
371  REG_NOTEOL to be passed to \fBregexec()\fP respectively.  and \eZ, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
372    \fBregexec()\fP.
373  .P  .P
374  The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use  The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
375  of the \fB/8\fP modifier on the pattern. It is recognized always. There may be  of the \fB/8\fP modifier on the pattern. It is recognized always. There may be
# Line 308  any number of hexadecimal digits inside Line 377  any number of hexadecimal digits inside
377  six bytes, encoded according to the UTF-8 rules.  six bytes, encoded according to the UTF-8 rules.
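To illustrate how a value can occupy from one to six bytes, here is a sketch of the original UTF-8 encoding scheme, which covered values up to 0x7fffffff. The function name is made up for this example and the code is not taken from pcretest.

```python
def utf8_encode_extended(value):
    """Encode an integer (0..0x7fffffff) using the original 1-6 byte UTF-8 scheme."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value out of range for 6-byte UTF-8")
    if value < 0x80:
        return bytes([value])              # single byte, same as ASCII
    # (largest value, leading-byte marker) for the 2..6 byte forms
    forms = [(0x7FF, 0xC0), (0xFFFF, 0xE0), (0x1FFFFF, 0xF0),
             (0x3FFFFFF, 0xF8), (0x7FFFFFFF, 0xFC)]
    for extra, (limit, marker) in enumerate(forms, start=1):
        if value <= limit:
            break
    out = []
    for _ in range(extra):
        out.append(0x80 | (value & 0x3F))  # continuation byte: 10xxxxxx
        value >>= 6
    out.append(marker | value)             # leading byte carries the top bits
    return bytes(reversed(out))
```

Under this scheme a data-line item such as \ex{20ac} corresponds to the three bytes e2 82 ac, while the largest value, 0x7fffffff, needs all six bytes.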
378  .  .
379  .  .
380  .SH "OUTPUT FROM PCRETEST"  .SH "THE ALTERNATIVE MATCHING FUNCTION"
381    .rs
382    .sp
383    By default, \fBpcretest\fP uses the standard PCRE matching function,
384    \fBpcre_exec()\fP, to match each data line. From release 6.0, PCRE supports an
385    alternative matching function, \fBpcre_dfa_exec()\fP, which operates in a
386    different way, and has some restrictions. The differences between the two
387    functions are described in the
388    .\" HREF
389    \fBpcrematching\fP
390    .\"
391    documentation.
392    .P
393    If a data line contains the \eD escape sequence, or if the command line
394    contains the \fB-dfa\fP option, the alternative matching function is called.
395    This function finds all possible matches at a given point. If, however, the \eF
396    escape sequence is present in the data line, it stops after the first match is
397    found. This is always the shortest possible match.
398    .
399    .
400    .SH "DEFAULT OUTPUT FROM PCRETEST"
401  .rs  .rs
402  .sp  .sp
403    This section describes the output when the normal matching function,
404    \fBpcre_exec()\fP, is being used.
405    .P
406  When a match succeeds, pcretest outputs the list of captured substrings that  When a match succeeds, pcretest outputs the list of captured substrings that
407  \fBpcre_exec()\fP returns, starting with number 0 for the string that matched  \fBpcre_exec()\fP returns, starting with number 0 for the string that matched
408  the whole pattern. Otherwise, it outputs "No match" or "Partial match"  the whole pattern. Otherwise, it outputs "No match" or "Partial match"
409  when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,  when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
410  respectively, and otherwise the PCRE negative error number. Here is an example  respectively, and otherwise the PCRE negative error number. Here is an example
411  of an interactive pcretest run.  of an interactive \fBpcretest\fP run.
412  .sp  .sp
413    $ pcretest    $ pcretest
414    PCRE version 5.00 07-Sep-2004    PCRE version 5.00 07-Sep-2004
# Line 362  parentheses after each string for \fB\eC Line 454  parentheses after each string for \fB\eC
454  .P  .P
455  Note that while patterns can be continued over several lines (a plain ">"  Note that while patterns can be continued over several lines (a plain ">"
456  prompt is used for continuations), data lines may not. However newlines can be  prompt is used for continuations), data lines may not. However newlines can be
457  included in data by means of the \en escape.  included in data by means of the \en escape (or \er or \er\en for those newline
458    settings).
459    .
460    .
461    .SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION"
462    .rs
463    .sp
464    When the alternative matching function, \fBpcre_dfa_exec()\fP, is used (by
465    means of the \eD escape sequence or the \fB-dfa\fP command line option), the
466    output consists of a list of all the matches that start at the first point in
467    the subject where there is at least one match. For example:
468    .sp
469        re> /(tang|tangerine|tan)/
470      data> yellow tangerine\eD
471       0: tangerine
472       1: tang
473       2: tan
474    .sp
475    (Using the normal matching function on this data finds only "tang".) The
476    longest matching string is always given first (and numbered zero).
477    .P
478    If \fB/g\fP is present on the pattern, the search for further matches resumes
479    at the end of the longest match. For example:
480    .sp
481        re> /(tang|tangerine|tan)/g
482      data> yellow tangerine and tangy sultana\eD
483       0: tangerine
484       1: tang
485       2: tan
486       0: tang
487       1: tan
488       0: tan
489    .sp
490    Since the matching function does not support substring capture, the escape
491    sequences that are concerned with captured substrings are not relevant.
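The ordering and the /g behaviour shown above can be imitated with a small sketch. It is a simplification under stated assumptions: the pattern is reduced to a list of literal alternatives, plain string comparison replaces the real matching algorithm, and both function names are invented for this example.

```python
def matches_at_first_point(alternatives, subject, start=0):
    """Find the first position at or after `start` where any alternative
    matches. Return (position, matches) with the longest match first,
    or None if nothing matches."""
    for pos in range(start, len(subject) + 1):
        found = [alt for alt in alternatives if subject.startswith(alt, pos)]
        if found:
            return pos, sorted(found, key=len, reverse=True)
    return None


def global_matches(alternatives, subject):
    """Imitate /g: resume the search at the end of the longest match."""
    results, start = [], 0
    while start <= len(subject):
        hit = matches_at_first_point(alternatives, subject, start)
        if hit is None:
            break
        pos, found = hit
        results.append(found)
        # Step past the longest match; move at least one character so an
        # empty alternative cannot cause an endless loop.
        start = pos + max(len(found[0]), 1)
    return results
```

Calling global_matches(["tang", "tangerine", "tan"], "yellow tangerine and tangy sultana") yields three groups, tangerine/tang/tan, then tang/tan, then tan, in the same order as the pcretest output above.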
492    .
493    .
494    .SH "RESTARTING AFTER A PARTIAL MATCH"
495    .rs
496    .sp
497    When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
498    indicating that the subject partially matched the pattern, you can restart the
499    match with additional subject data by means of the \eR escape sequence. For
500    example:
501    .sp
502        re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
503      data> 23ja\eP\eD
504      Partial match: 23ja
505      data> n05\eR\eD
506       0: n05
507    .sp
508    For further information about partial matching, see the
509    .\" HREF
510    \fBpcrepartial\fP
511    .\"
512    documentation.
513  .  .
514  .  .
515  .SH CALLOUTS  .SH CALLOUTS
516  .rs  .rs
517  .sp  .sp
518  If the pattern contains any callout requests, \fBpcretest\fP's callout function  If the pattern contains any callout requests, \fBpcretest\fP's callout function
519  is called during matching. By default, it displays the callout number, the  is called during matching. This works with both matching functions. By default,
520  start and current positions in the text at the callout time, and the next  the called function displays the callout number, the start and current
521  pattern item to be tested. For example, the output  positions in the text at the callout time, and the next pattern item to be
522    tested. For example, the output
523  .sp  .sp
524    --->pqrabcdef    --->pqrabcdef
525      0    ^  ^     \ed      0    ^  ^     \ed
# Line 396  example: Line 544  example:
544     0: E*     0: E*
545  .sp  .sp
546  The callout function in \fBpcretest\fP returns zero (carry on matching) by  The callout function in \fBpcretest\fP returns zero (carry on matching) by
547  default, but you can use an \eC item in a data line (as described above) to  default, but you can use a \eC item in a data line (as described above) to
548  change this.  change this.
549  .P  .P
550  Inserting callouts can be helpful when using \fBpcretest\fP to check  Inserting callouts can be helpful when using \fBpcretest\fP to check
# Line 471  result is undefined. Line 619  result is undefined.
619  .SH AUTHOR  .SH AUTHOR
620  .rs  .rs
621  .sp  .sp
622  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel
623  .br  .br
624  University Computing Service,  University Computing Service,
625  .br  .br
626  Cambridge CB2 3QG, England.  Cambridge CB2 3QG, England.
627  .P  .P
628  .in 0  .in 0
629  Last updated: 10 September 2004  Last updated: 29 June 2006
630  .br  .br
631  Copyright (c) 1997-2004 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
