
Diff of /code/trunk/doc/pcretest.1

revision 579 by ph10, Wed Nov 24 17:39:25 2010 UTC revision 626 by ph10, Wed Jul 20 17:51:54 2011 UTC
# Line 4  pcretest - a program for testing Perl-co Line 4  pcretest - a program for testing Perl-co
4  .SH SYNOPSIS  .SH SYNOPSIS
5  .rs  .rs
6  .sp  .sp
7  .B pcretest "[options] [source] [destination]"  .B pcretest "[options] [input file [output file]]"
8  .sp  .sp
9  \fBpcretest\fP was written as a test program for the PCRE regular expression  \fBpcretest\fP was written as a test program for the PCRE regular expression
10  library itself, but it can also be used for experimenting with regular  library itself, but it can also be used for experimenting with regular
# Line 18  options, see the Line 18  options, see the
18  .\" HREF  .\" HREF
19  \fBpcreapi\fP  \fBpcreapi\fP
20  .\"  .\"
21  documentation.  documentation. The input for \fBpcretest\fP is a sequence of regular expression
22    patterns and strings to be matched, as described below. The output shows the
23    result of each match. Options on the command line and the patterns control PCRE
24    options and exactly what is output.
25  .  .
26  .  .
27  .SH OPTIONS  .SH COMMAND LINE OPTIONS
28  .rs  .rs
29  .TP 10  .TP 10
30  \fB-b\fP  \fB-b\fP
31  Behave as if each regex has the \fB/B\fP (show bytecode) modifier; the internal  Behave as if each pattern has the \fB/B\fP (show byte code) modifier; the
32  form is output after compilation.  internal form is output after compilation.
33  .TP 10  .TP 10
34  \fB-C\fP  \fB-C\fP
35  Output the version number of the PCRE library, and all available information  Output the version number of the PCRE library, and all available information
36  about the optional features that are included, and then exit.  about the optional features that are included, and then exit.
37  .TP 10  .TP 10
38  \fB-d\fP  \fB-d\fP
39  Behave as if each regex has the \fB/D\fP (debug) modifier; the internal  Behave as if each pattern has the \fB/D\fP (debug) modifier; the internal
40  form and information about the compiled pattern is output after compilation;  form and information about the compiled pattern is output after compilation;
41  \fB-d\fP is equivalent to \fB-b -i\fP.  \fB-d\fP is equivalent to \fB-b -i\fP.
42  .TP 10  .TP 10
# Line 46  standard \fBpcre_exec()\fP function (mor Line 49  standard \fBpcre_exec()\fP function (mor
49  Output a brief summary of these options and then exit.  Output a brief summary of these options and then exit.
50  .TP 10  .TP 10
51  \fB-i\fP  \fB-i\fP
52  Behave as if each regex has the \fB/I\fP modifier; information about the  Behave as if each pattern has the \fB/I\fP modifier; information about the
53  compiled pattern is given after compilation.  compiled pattern is given after compilation.
54  .TP 10  .TP 10
55  \fB-M\fP  \fB-M\fP
# Line 56  calling \fBpcre_exec()\fP repeatedly wit Line 59  calling \fBpcre_exec()\fP repeatedly wit
59  .TP 10  .TP 10
60  \fB-m\fP  \fB-m\fP
61  Output the size of each compiled pattern after it has been compiled. This is  Output the size of each compiled pattern after it has been compiled. This is
62  equivalent to adding \fB/M\fP to each regular expression. For compatibility  equivalent to adding \fB/M\fP to each regular expression.
 with earlier versions of pcretest, \fB-s\fP is a synonym for \fB-m\fP.  
63  .TP 10  .TP 10
64  \fB-o\fP \fIosize\fP  \fB-o\fP \fIosize\fP
65  Set the number of elements in the output vector that is used when calling  Set the number of elements in the output vector that is used when calling
# Line 68  changed for individual matching calls by Line 70  changed for individual matching calls by
70  below).  below).
71  .TP 10  .TP 10
72  \fB-p\fP  \fB-p\fP
73  Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is  Behave as if each pattern has the \fB/P\fP modifier; the POSIX wrapper API is
74  used to call PCRE. None of the other options has any effect when \fB-p\fP is  used to call PCRE. None of the other options has any effect when \fB-p\fP is
75  set.  set.
76  .TP 10  .TP 10
# Line 76  set. Line 78  set.
78  Do not output the version number of \fBpcretest\fP at the start of execution.  Do not output the version number of \fBpcretest\fP at the start of execution.
79  .TP 10  .TP 10
80  \fB-S\fP \fIsize\fP  \fB-S\fP \fIsize\fP
81  On Unix-like systems, set the size of the runtime stack to \fIsize\fP  On Unix-like systems, set the size of the run-time stack to \fIsize\fP
82  megabytes.  megabytes.
83  .TP 10  .TP 10
84    \fB-s\fP
85    Behave as if each pattern has the \fB/S\fP modifier; in other words, force each
86    pattern to be studied. If the \fB/I\fP or \fB/D\fP option is present on a
87    pattern (requesting output about the compiled pattern), information about the
88    result of studying is not included when studying is caused only by \fB-s\fP and
89    neither \fB-i\fP nor \fB-d\fP is present on the command line. This behaviour
90    means that the output from tests that are run with and without \fB-s\fP should
91    be identical, except when options that output information about the actual
92    running of a match are set. The \fB-M\fP, \fB-t\fP, and \fB-tm\fP options,
93    which give information about resources used, are likely to produce different
94    output with and without \fB-s\fP. Output may also differ if the \fB/C\fP option
95    is present on an individual pattern. This uses callouts to trace the
96    matching process, and this may be different between studied and non-studied
97    patterns. If the pattern contains (*MARK) items there may also be differences,
98    for the same reason. The \fB-s\fP command line option can be overridden for
99    specific patterns that should never be studied (see the /S option below).
100    .TP 10
101  \fB-t\fP  \fB-t\fP
102  Run each compile, study, and match many times with a timer, and output  Run each compile, study, and match many times with a timer, and output
103  resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with  resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with
# Line 154  pcretest to read the next line as a cont Line 173  pcretest to read the next line as a cont
173  A pattern may be followed by any number of modifiers, which are mostly single  A pattern may be followed by any number of modifiers, which are mostly single
174  characters. Following Perl usage, these are referred to below as, for example,  characters. Following Perl usage, these are referred to below as, for example,
175  "the \fB/i\fP modifier", even though the delimiter of the pattern need not  "the \fB/i\fP modifier", even though the delimiter of the pattern need not
176  always be a slash, and no slash is used when writing modifiers. Whitespace may  always be a slash, and no slash is used when writing modifiers. White space may
177  appear between the final pattern delimiter and the first modifier, and between  appear between the final pattern delimiter and the first modifier, and between
178  the modifiers themselves.  the modifiers themselves.
179  .P  .P
# Line 190  options that do not correspond to anythi Line 209  options that do not correspond to anythi
209    \fB/<bsr_unicode>\fP  PCRE_BSR_UNICODE    \fB/<bsr_unicode>\fP  PCRE_BSR_UNICODE
210  .sp  .sp
211  The modifiers that are enclosed in angle brackets are literal strings as shown,  The modifiers that are enclosed in angle brackets are literal strings as shown,
212  including the angle brackets, but the letters can be in either case. This  including the angle brackets, but the letters within can be in either case.
213  example sets multiline matching with CRLF as the line ending sequence:  This example sets multiline matching with CRLF as the line ending sequence:
214  .sp  .sp
215    /^abc/m<crlf>    /^abc/m<CRLF>
216  .sp  .sp
217  As well as turning on the PCRE_UTF8 option, the \fB/8\fP modifier also causes  As well as turning on the PCRE_UTF8 option, the \fB/8\fP modifier also causes
218  any non-printing characters in output strings to be printed using the  any non-printing characters in output strings to be printed using the
# Line 235  There are yet more modifiers for control Line 254  There are yet more modifiers for control
254  operates.  operates.
255  .P  .P
256  The \fB/+\fP modifier requests that as well as outputting the substring that  The \fB/+\fP modifier requests that as well as outputting the substring that
257  matched the entire pattern, pcretest should in addition output the remainder of  matched the entire pattern, \fBpcretest\fP should in addition output the
258  the subject string. This is useful for tests where the subject contains  remainder of the subject string. This is useful for tests where the subject
259  multiple copies of the same substring.  contains multiple copies of the same substring. If the \fB+\fP modifier appears
260    twice, the same action is taken for captured substrings. In each case the
261    remainder is output on the following line with a plus character following the
262    capture number.
263    .P
264    The \fB/=\fP modifier requests that the values of all potential captured
265    parentheses be output after a match by \fBpcre_exec()\fP. By default, only
266    those up to the highest one actually used in the match are output
267    (corresponding to the return code from \fBpcre_exec()\fP). Values in the
268    offsets vector corresponding to higher numbers should be set to -1, and these
269    are output as "<unset>". This modifier gives a way of checking that this is
270    happening.
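
The difference between the default capture output and the output requested by the \fB/=\fP modifier can be sketched as follows. This is a minimal Python illustration, not code from \fBpcretest\fP: the helper name `format_captures` and its parameters are hypothetical, and it assumes an offsets vector laid out as start/end pairs with -1 marking unset values, and a return code `rc` equal to one more than the highest capture number that was set.

```python
def format_captures(subject, ovector, rc, show_all=False):
    """Render capture pairs roughly the way the text above describes.

    subject  -- the subject string
    ovector  -- flat list of start/end offsets; -1 marks an unset value
    rc       -- assumed return code: highest set capture number plus one
    show_all -- hypothetical stand-in for the /= modifier
    """
    pairs = len(ovector) // 2
    # By default only captures below rc are shown; with show_all, every
    # pair in the vector is shown.
    count = pairs if show_all else rc
    lines = []
    for n in range(count):
        start, end = ovector[2 * n], ovector[2 * n + 1]
        if start < 0:
            lines.append(f"{n:2d}: <unset>")
        else:
            lines.append(f"{n:2d}: {subject[start:end]}")
    return lines


# A pattern with two alternative groups matched against "a":
# group 1 is set, group 2 is not.
ovector = [0, 1, 0, 1, -1, -1]
default_view = format_captures("a", ovector, rc=2)
full_view = format_captures("a", ovector, rc=2, show_all=True)
print("\n".join(default_view))
print("\n".join(full_view))
```

In this sketch the trailing unset group appears only in `full_view`, which is the check the modifier is described as providing.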
271  .P  .P
272  The \fB/B\fP modifier is a debugging feature. It requests that \fBpcretest\fP  The \fB/B\fP modifier is a debugging feature. It requests that \fBpcretest\fP
273  output a representation of the compiled byte code after compilation. Normally  output a representation of the compiled byte code after compilation. Normally
# Line 287  which it appears. Line 317  which it appears.
317  The \fB/M\fP modifier causes the size of memory block used to hold the compiled  The \fB/M\fP modifier causes the size of memory block used to hold the compiled
318  pattern to be output.  pattern to be output.
319  .P  .P
320  The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the  If the \fB/S\fP modifier appears once, it causes \fBpcre_study()\fP to be
321  expression has been compiled, and the results used when the expression is  called after the expression has been compiled, and the results used when the
322  matched.  expression is matched. If \fB/S\fP appears twice, it suppresses studying, even
323    if it was requested externally by the \fB-s\fP command line option. This makes
324    it possible to specify that certain patterns are always studied, and others are
325    never studied, independently of \fB-s\fP. This feature is used in the test
326    files in a few cases where the output is different when the pattern is studied.
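
The interaction between the \fB-s\fP command line option and the \fB/S\fP modifier described above can be summarised as a small decision function. This is a hedged Python sketch, not the logic as written in \fBpcretest\fP: the function name `should_study` and its parameters are hypothetical, and since the text only describes zero, one, or two occurrences of \fB/S\fP, anything else is rejected.

```python
def should_study(cmdline_s, s_count):
    """Decide whether a pattern is studied.

    cmdline_s -- True if the -s command line option was given
    s_count   -- how many times /S appears on the pattern (0, 1, or 2)
    """
    if s_count not in (0, 1, 2):
        # The text only describes zero, one, or two occurrences.
        raise ValueError("only 0, 1, or 2 occurrences of /S are described")
    if s_count == 2:
        return False      # /S twice suppresses studying, even with -s
    if s_count == 1:
        return True       # /S once always requests studying
    return cmdline_s      # otherwise -s alone decides


for cmdline_s in (False, True):
    for s_count in (0, 1, 2):
        print(cmdline_s, s_count, should_study(cmdline_s, s_count))
```

The table printed by the loop shows that only the patterns without \fB/S\fP change behaviour when \fB-s\fP is toggled.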
327  .P  .P
328  The \fB/T\fP modifier must be followed by a single digit. It causes a specific  The \fB/T\fP modifier must be followed by a single digit. It causes a specific
329  set of built-in character tables to be passed to \fBpcre_compile()\fP. It is  set of built-in character tables to be passed to \fBpcre_compile()\fP. It is
# Line 327  ignored. Line 361  ignored.
361  .rs  .rs
362  .sp  .sp
363  Before each data line is passed to \fBpcre_exec()\fP, leading and trailing  Before each data line is passed to \fBpcre_exec()\fP, leading and trailing
364  whitespace is removed, and it is then scanned for \e escapes. Some of these are  white space is removed, and it is then scanned for \e escapes. Some of these
365  pretty esoteric features, intended for checking out some of the more  are pretty esoteric features, intended for checking out some of the more
366  complicated features of PCRE. If you are just testing "ordinary" regular  complicated features of PCRE. If you are just testing "ordinary" regular
367  expressions, you probably don't need any of these. The following escapes are  expressions, you probably don't need any of these. The following escapes are
368  recognized:  recognized:
# Line 336  recognized: Line 370  recognized:
370    \ea         alarm (BEL, \ex07)    \ea         alarm (BEL, \ex07)
371    \eb         backspace (\ex08)    \eb         backspace (\ex08)
372    \ee         escape (\ex27)    \ee         escape (\ex27)
373    \ef         formfeed (\ex0c)    \ef         form feed (\ex0c)
374    \en         newline (\ex0a)    \en         newline (\ex0a)
375  .\" JOIN  .\" JOIN
376    \eqdd       set the PCRE_MATCH_LIMIT limit to dd    \eqdd       set the PCRE_MATCH_LIMIT limit to dd
# Line 507  found. This is always the shortest possi Line 541  found. This is always the shortest possi
541  This section describes the output when the normal matching function,  This section describes the output when the normal matching function,
542  \fBpcre_exec()\fP, is being used.  \fBpcre_exec()\fP, is being used.
543  .P  .P
544  When a match succeeds, pcretest outputs the list of captured substrings that  When a match succeeds, \fBpcretest\fP outputs the list of captured substrings
545  \fBpcre_exec()\fP returns, starting with number 0 for the string that matched  that \fBpcre_exec()\fP returns, starting with number 0 for the string that
546  the whole pattern. Otherwise, it outputs "No match" when the return is  matched the whole pattern. Otherwise, it outputs "No match" when the return is
547  PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching  PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching
548  substring when \fBpcre_exec()\fP returns PCRE_ERROR_PARTIAL. (Note that this is  substring when \fBpcre_exec()\fP returns PCRE_ERROR_PARTIAL. (Note that this is
549  the entire substring that was inspected during the partial match; it may  the entire substring that was inspected during the partial match; it may
550  include characters before the actual match start if a lookbehind assertion,  include characters before the actual match start if a lookbehind assertion,
551  \eK, \eb, or \eB was involved.) For any other returns, it outputs the PCRE  \eK, \eb, or \eB was involved.) For any other return, \fBpcretest\fP outputs
552  negative error number. Here is an example of an interactive \fBpcretest\fP run.  the PCRE negative error number and a short descriptive phrase. If the error is
553    a failed UTF-8 string check, the byte offset of the start of the failing
554    character and the reason code are also output, provided that the size of the
555    output vector is at least two. Here is an example of an interactive
556    \fBpcretest\fP run.
557  .sp  .sp
558    $ pcretest    $ pcretest
559    PCRE version 7.0 30-Nov-2006    PCRE version 8.13 2011-04-30
560  .sp  .sp
561      re> /^abc(\ed+)/      re> /^abc(\ed+)/
562    data> abc123    data> abc123
# Line 527  negative error number. Here is an exampl Line 565  negative error number. Here is an exampl
565    data> xyz    data> xyz
566    No match    No match
567  .sp  .sp
568  Note that unset capturing substrings that are not followed by one that is set  Unset capturing substrings that are not followed by one that is set are not
569  are not returned by \fBpcre_exec()\fP, and are not shown by \fBpcretest\fP. In  returned by \fBpcre_exec()\fP, and are not shown by \fBpcretest\fP. In the
570  the following example, there are two capturing substrings, but when the first  following example, there are two capturing substrings, but when the first data
571  data line is matched, the second, unset substring is not shown. An "internal"  line is matched, the second, unset substring is not shown. An "internal" unset
572  unset substring is shown as "<unset>", as for the second data line.  substring is shown as "<unset>", as for the second data line.
573  .sp  .sp
574      re> /(a)|(b)/      re> /(a)|(b)/
575    data> a    data> a
# Line 565  matching attempts are output in sequence Line 603  matching attempts are output in sequence
603     0: ipp     0: ipp
604     1: pp     1: pp
605  .sp  .sp
606  "No match" is output only if the first match attempt fails.  "No match" is output only if the first match attempt fails. Here is an example
607    of a failure message (the offset 4 that is specified by \e>4 is past the end of
608    the subject string):
609    .sp
610        re> /xyz/
611      data> xyz\>4
612      Error -24 (bad offset value)
613  .P  .P
614  If any of the sequences \fB\eC\fP, \fB\eG\fP, or \fB\eL\fP are present in a  If any of the sequences \fB\eC\fP, \fB\eG\fP, or \fB\eL\fP are present in a
615  data line that is successfully matched, the substrings extracted by the  data line that is successfully matched, the substrings extracted by the
# Line 702  function to distinguish printing and non Line 746  function to distinguish printing and non
746  .rs  .rs
747  .sp  .sp
748  The facilities described in this section are not available when the POSIX  The facilities described in this section are not available when the POSIX
749  inteface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is  interface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is
750  specified.  specified.
751  .P  .P
752  When the POSIX interface is not in use, you can cause \fBpcretest\fP to write a  When the POSIX interface is not in use, you can cause \fBpcretest\fP to write a
# Line 726  exact copy of the compiled pattern. If t Line 770  exact copy of the compiled pattern. If t
770  follows immediately after the compiled pattern. After writing the file,  follows immediately after the compiled pattern. After writing the file,
771  \fBpcretest\fP expects to read a new pattern.  \fBpcretest\fP expects to read a new pattern.
772  .P  .P
773  A saved pattern can be reloaded into \fBpcretest\fP by specifing < and a file  A saved pattern can be reloaded into \fBpcretest\fP by specifying < and a file
774  name instead of a pattern. The name of the file must not contain a < character,  name instead of a pattern. The name of the file must not contain a < character,
775  as otherwise \fBpcretest\fP will interpret the line as a pattern delimited by <  as otherwise \fBpcretest\fP will interpret the line as a pattern delimited by <
776  characters.  characters.
777  For example:  For example:
778  .sp  .sp
779     re> </some/file     re> </some/file
780    Compiled regex loaded from /some/file    Compiled pattern loaded from /some/file
781    No study data    No study data
782  .sp  .sp
783  When the pattern has been loaded, \fBpcretest\fP proceeds to read data lines in  When the pattern has been loaded, \fBpcretest\fP proceeds to read data lines in
# Line 779  Cambridge CB2 3QH, England. Line 823  Cambridge CB2 3QH, England.
823  .rs  .rs
824  .sp  .sp
825  .nf  .nf
826  Last updated: 21 November 2010  Last updated: 20 July 2011
827  Copyright (c) 1997-2010 University of Cambridge.  Copyright (c) 1997-2011 University of Cambridge.
828  .fi  .fi
