Diff of /code/trunk/doc/pcretest.1

revision 53 by nigel, Sat Feb 24 21:39:42 2007 UTC revision 71 by nigel, Sat Feb 24 21:40:24 2007 UTC
# Line 6  pcretest - a program for testing Perl-co Line 6  pcretest - a program for testing Perl-co
6    
7  \fBpcretest\fR was written as a test program for the PCRE regular expression  \fBpcretest\fR was written as a test program for the PCRE regular expression
8  library itself, but it can also be used for experimenting with regular  library itself, but it can also be used for experimenting with regular
9  expressions. This man page describes the features of the test program; for  expressions. This document describes the features of the test program; for
10  details of the regular expressions themselves, see the \fBpcre\fR man page.  details of the regular expressions themselves, see the
11    .\" HREF
12    \fBpcrepattern\fR
13    .\"
14    documentation. For details of PCRE and its options, see the
15    .\" HREF
16    \fBpcreapi\fR
17    .\"
18    documentation.
19    
20  .SH OPTIONS  .SH OPTIONS
21    .rs
22    .sp
23    .TP 10
24    \fB-C\fR
25    Output the version number of the PCRE library, and all available information
26    about the optional features that are included, and then exit.
27  .TP 10  .TP 10
28  \fB-d\fR  \fB-d\fR
29  Behave as if each regex had the \fB/D\fR modifier (see below); the internal  Behave as if each regex had the \fB/D\fR modifier (see below); the internal
# Line 35  Behave as if each regex has \fB/P\fR mod Line 49  Behave as if each regex has \fB/P\fR mod
49  to call PCRE. None of the other options has any effect when \fB-p\fR is set.  to call PCRE. None of the other options has any effect when \fB-p\fR is set.
50  .TP 10  .TP 10
51  \fB-t\fR  \fB-t\fR
52  Run each compile, study, and match 20000 times with a timer, and output  Run each compile, study, and match many times with a timer, and output
53  resulting time per compile or match (in milliseconds). Do not set \fB-t\fR with  resulting time per compile or match (in milliseconds). Do not set \fB-t\fR with
54  \fB-m\fR, because you will then get the size output 20000 times and the timing  \fB-m\fR, because you will then get the size output 20000 times and the timing
55  will be distorted.  will be distorted.
56    
   
57  .SH DESCRIPTION  .SH DESCRIPTION
58    .rs
59    .sp
60  If \fBpcretest\fR is given two filename arguments, it reads from the first and  If \fBpcretest\fR is given two filename arguments, it reads from the first and
61  writes to the second. If it is given only one filename argument, it reads from  writes to the second. If it is given only one filename argument, it reads from
62  that file and writes to stdout. Otherwise, it reads from stdin and writes to  that file and writes to stdout. Otherwise, it reads from stdin and writes to
# Line 51  expressions, and "data>" to prompt for d Line 65  expressions, and "data>" to prompt for d
65    
66  The program handles any number of sets of input on a single input file. Each  The program handles any number of sets of input on a single input file. Each
67  set starts with a regular expression, and continues with any number of data  set starts with a regular expression, and continues with any number of data
68  lines to be matched against the pattern. An empty line signals the end of the  lines to be matched against the pattern.
69  data lines, at which point a new regular expression is read. The regular  
70  expressions are given enclosed in any non-alphameric delimiters other than  Each line is matched separately and independently. If you want to do
71  backslash, for example  multiple-line matches, you have to use the \\n escape sequence in a single line
72    of input to encode the newline characters. The maximum length of a data line is
73    30,000 characters.
74    
75    An empty line signals the end of the data lines, at which point a new regular
76    expression is read. The regular expressions are given enclosed in any
77    non-alphameric delimiters other than backslash, for example
78    
79    /(a|bc)x+yz/    /(a|bc)x+yz/
80    
# Line 81  backslash, because Line 101  backslash, because
101  is interpreted as the first line of a pattern that starts with "abc/", causing  is interpreted as the first line of a pattern that starts with "abc/", causing
102  pcretest to read the next line as a continuation of the regular expression.  pcretest to read the next line as a continuation of the regular expression.
103    
   
104  .SH PATTERN MODIFIERS  .SH PATTERN MODIFIERS
105    .rs
106    .sp
107  The pattern may be followed by \fBi\fR, \fBm\fR, \fBs\fR, or \fBx\fR to set the  The pattern may be followed by \fBi\fR, \fBm\fR, \fBs\fR, or \fBx\fR to set the
108  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options,
109  respectively. For example:  respectively. For example:
# Line 91  respectively. For example: Line 111  respectively. For example:
111    /caseless/i    /caseless/i
112    
113  These modifier letters have the same effect as they do in Perl. There are  These modifier letters have the same effect as they do in Perl. There are
114  others which set PCRE options that do not correspond to anything in Perl:  others that set PCRE options that do not correspond to anything in Perl:
115  \fB/A\fR, \fB/E\fR, and \fB/X\fR set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and  \fB/A\fR, \fB/E\fR, \fB/N\fR, \fB/U\fR, and \fB/X\fR set PCRE_ANCHORED,
116  PCRE_EXTRA respectively.  PCRE_DOLLAR_ENDONLY, PCRE_NO_AUTO_CAPTURE, PCRE_UNGREEDY, and PCRE_EXTRA
117    respectively.
118    
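The modifier letters listed above map one-to-one onto PCRE option names. As a rough illustration only, the table can be written down as a lookup. This is a minimal Python sketch, not pcretest's own code; the names `MODIFIER_OPTIONS` and `options_for` are hypothetical, and modifiers that do not set a compile option (such as \fB/g\fR, \fB/D\fR, or \fB/S\fR) are simply ignored here.

```python
# Modifier letter -> PCRE option name, as listed in the text above.
MODIFIER_OPTIONS = {
    "i": "PCRE_CASELESS",
    "m": "PCRE_MULTILINE",
    "s": "PCRE_DOTALL",
    "x": "PCRE_EXTENDED",
    "A": "PCRE_ANCHORED",
    "E": "PCRE_DOLLAR_ENDONLY",
    "N": "PCRE_NO_AUTO_CAPTURE",
    "U": "PCRE_UNGREEDY",
    "X": "PCRE_EXTRA",
}


def options_for(modifiers):
    """Return the option names selected by a string of modifier letters.

    Hypothetical helper: letters not in the table are skipped, which is a
    simplification (the real program gives several of them other meanings).
    """
    return [MODIFIER_OPTIONS[c] for c in modifiers if c in MODIFIER_OPTIONS]


if __name__ == "__main__":
    print(options_for("iU"))
```

For a pattern written as /caseless/iU, the sketch would report PCRE_CASELESS and PCRE_UNGREEDY.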
119  Searching for all possible matches within each subject string can be requested  Searching for all possible matches within each subject string can be requested
120  by the \fB/g\fR or \fB/G\fR modifier. After finding a match, PCRE is called  by the \fB/g\fR or \fB/G\fR modifier. After finding a match, PCRE is called
# Line 138  studied, the results of that are also ou Line 159  studied, the results of that are also ou
159    
160  The \fB/D\fR modifier is a PCRE debugging feature, which also assumes \fB/I\fR.  The \fB/D\fR modifier is a PCRE debugging feature, which also assumes \fB/I\fR.
161  It causes the internal form of compiled regular expressions to be output after  It causes the internal form of compiled regular expressions to be output after
162  compilation.  compilation. If the pattern was studied, the information returned is also
163    output.
164    
165  The \fB/S\fR modifier causes \fBpcre_study()\fR to be called after the  The \fB/S\fR modifier causes \fBpcre_study()\fR to be called after the
166  expression has been compiled, and the results used when the expression is  expression has been compiled, and the results used when the expression is
# Line 154  present, and REG_NEWLINE is set if \fB/m Line 176  present, and REG_NEWLINE is set if \fB/m
176  force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.  force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
177    
178  The \fB/8\fR modifier causes \fBpcretest\fR to call PCRE with the PCRE_UTF8  The \fB/8\fR modifier causes \fBpcretest\fR to call PCRE with the PCRE_UTF8
179  option set. This turns on the (currently incomplete) support for UTF-8  option set. This turns on support for UTF-8 character handling in PCRE,
180  character handling in PCRE, provided that it was compiled with this support  provided that it was compiled with this support enabled. This modifier also
181  enabled. This modifier also causes any non-printing characters in output  causes any non-printing characters in output strings to be printed using the
182  strings to be printed using the \\x{hh...} notation if they are valid UTF-8  \\x{hh...} notation if they are valid UTF-8 sequences.
183  sequences.  
184    If the \fB/?\fR modifier is used with \fB/8\fR, it causes \fBpcretest\fR to
185    call \fBpcre_compile()\fR with the PCRE_NO_UTF8_CHECK option, to suppress the
186    checking of the string for UTF-8 validity.
187    
188    .SH CALLOUTS
189    .rs
190    .sp
191    If the pattern contains any callout requests, \fBpcretest\fR's callout function
192    will be called. By default, it displays the callout number, and the start and
193    current positions in the text at the callout time. For example, the output
194    
195      --->pqrabcdef
196        0    ^  ^
197    
198    indicates that callout number 0 occurred for a match attempt starting at the
199    fourth character of the subject string, when the pointer was at the seventh
200    character. The callout function returns zero (carry on matching) by default.
201    
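The layout of the callout display above can be reproduced with a few lines of code. The following Python sketch is illustrative only: `format_callout` is a hypothetical name, the positions are assumed to be 0-based offsets into the subject, and the behaviour when both positions coincide (a single caret) is an assumption rather than something the text specifies.

```python
def format_callout(subject, number, start, current):
    """Render a two-line callout display in the style shown above.

    start and current are assumed to be 0-based offsets into subject.
    """
    header = "--->" + subject
    # One marker column per subject character, up to the later position.
    marks = [" "] * (max(start, current) + 1)
    marks[start] = "^"
    marks[current] = "^"
    # The callout number occupies the four columns under the "--->" prefix.
    return header + "\n" + f"{number:3d} " + "".join(marks)


if __name__ == "__main__":
    # Fourth character is offset 3, seventh character is offset 6.
    print(format_callout("pqrabcdef", 0, 3, 6))
```

Run as a script, this prints the same two lines as the example in the text.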
202    Inserting callouts may be helpful when using \fBpcretest\fR to check
203    complicated regular expressions. For further information about callouts, see
204    the
205    .\" HREF
206    \fBpcrecallout\fR
207    .\"
208    documentation.
209    
210    For testing the PCRE library, additional control of callout behaviour is
211    available via escape sequences in the data, as described in the following
212    section. In particular, it is possible to pass in a number as callout data (the
213    default is zero). If the callout function receives a non-zero number, it
214    returns that value instead of zero.
215    
216  .SH DATA LINES  .SH DATA LINES
217    .rs
218    .sp
219  Before each data line is passed to \fBpcre_exec()\fR, leading and trailing  Before each data line is passed to \fBpcre_exec()\fR, leading and trailing
220  whitespace is removed, and it is then scanned for \\ escapes. The following are  whitespace is removed, and it is then scanned for \\ escapes. Some of these are
221    pretty esoteric features, intended for checking out some of the more
222    complicated features of PCRE. If you are just testing "ordinary" regular
223    expressions, you probably don't need any of these. The following escapes are
224  recognized:  recognized:
225    
226    \\a         alarm (= BEL)    \\a         alarm (= BEL)
# Line 177  recognized: Line 233  recognized:
233    \\v         vertical tab    \\v         vertical tab
234    \\nnn       octal character (up to 3 octal digits)    \\nnn       octal character (up to 3 octal digits)
235    \\xhh       hexadecimal character (up to 2 hex digits)    \\xhh       hexadecimal character (up to 2 hex digits)
236    \\x{hh...}  hexadecimal UTF-8 character    \\x{hh...}  hexadecimal character, any number of digits
237                   in UTF-8 mode
238    \\A         pass the PCRE_ANCHORED option to \fBpcre_exec()\fR    \\A         pass the PCRE_ANCHORED option to \fBpcre_exec()\fR
239    \\B         pass the PCRE_NOTBOL option to \fBpcre_exec()\fR    \\B         pass the PCRE_NOTBOL option to \fBpcre_exec()\fR
240    \\Cdd       call pcre_copy_substring() for substring dd    \\Cdd       call pcre_copy_substring() for substring dd
241                  after a successful match (any decimal number                 after a successful match (any decimal number
242                  less than 32)                 less than 32)
243      \\Cname     call pcre_copy_named_substring() for substring
244                   "name" after a successful match (name
245                   terminated by next non-alphanumeric character)
246      \\C+        show the current captured substrings at callout
247                   time
248      \\C-        do not supply a callout function
249      \\C!n       return 1 instead of 0 when callout number n is
250                   reached
251      \\C!n!m     return 1 instead of 0 when callout number n is
252                   reached for the mth time
253      \\C*n       pass the number n (may be negative) as callout
254                   data
255    \\Gdd       call pcre_get_substring() for substring dd    \\Gdd       call pcre_get_substring() for substring dd
256                  after a successful match (any decimal number                 after a successful match (any decimal number
257                  less than 32)                 less than 32)
258      \\Gname     call pcre_get_named_substring() for substring
259                   "name" after a successful match (name termin-
260                   ated by next non-alphanumeric character)
261    \\L         call pcre_get_substringlist() after a    \\L         call pcre_get_substringlist() after a
262                  successful match                 successful match
263      \\M         discover the minimum MATCH_LIMIT setting
264    \\N         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fR    \\N         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fR
265    \\Odd       set the size of the output vector passed to    \\Odd       set the size of the output vector passed to
266                  \fBpcre_exec()\fR to dd (any number of decimal                 \fBpcre_exec()\fR to dd (any number of decimal
267                  digits)                 digits)
268    \\Z         pass the PCRE_NOTEOL option to \fBpcre_exec()\fR    \\Z         pass the PCRE_NOTEOL option to \fBpcre_exec()\fR
269      \\?         pass the PCRE_NO_UTF8_CHECK option to
270                   \fBpcre_exec()\fR
271    
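To make the character escapes concrete, here is a minimal Python sketch of how a data line might be preprocessed. It is not pcretest's implementation. It handles only \\a, \\n, \\v, \\nnn and \\xhh; it treats \\x with no hex digits as character zero (an assumption); and it leaves every other escape, including \\x{hh...} and the option-setting escapes such as \\A, in the output verbatim. The name `decode_data_line` is hypothetical.

```python
SIMPLE_ESCAPES = {"a": "\x07", "n": "\n", "v": "\x0b"}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_data_line(line):
    """Strip surrounding whitespace, then expand a subset of \\ escapes."""
    line = line.strip()
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch != "\\" or i + 1 == len(line):
            out.append(ch)          # ordinary character or trailing backslash
            i += 1
            continue
        nxt = line[i + 1]
        i += 2
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
        elif nxt in OCTAL_DIGITS:
            digits = nxt            # up to 3 octal digits in total
            while len(digits) < 3 and i < len(line) and line[i] in OCTAL_DIGITS:
                digits += line[i]
                i += 1
            out.append(chr(int(digits, 8)))
        elif nxt == "x" and not line.startswith("{", i):
            digits = ""             # up to 2 hex digits
            while len(digits) < 2 and i < len(line) and line[i] in HEX_DIGITS:
                digits += line[i]
                i += 1
            out.append(chr(int(digits, 16)) if digits else "\x00")
        else:
            out.append("\\" + nxt)  # simplification: keep unhandled escapes as-is
    return "".join(out)


if __name__ == "__main__":
    print(repr(decode_data_line("  abc\\ndef  ")))
```

Because unhandled escapes are passed through unchanged, a caller could process the option-setting escapes in a separate pass.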
272    If \\M is present, \fBpcretest\fR calls \fBpcre_exec()\fR several times, with
273    different values in the \fImatch_limit\fR field of the \fBpcre_extra\fR data
274    structure, until it finds the minimum number that is needed for
275    \fBpcre_exec()\fR to complete. This number is a measure of the amount of
276    recursion and backtracking that takes place, and checking it out can be
277    instructive. For most simple matches, the number is quite small, but for
278    patterns with very large numbers of matching possibilities, it can become large
279    very quickly with increasing length of subject string.
280    
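The search that \\M performs can be pictured as finding the smallest limit for which a match attempt completes. The Python sketch below shows one way to do this, by doubling and then bisecting. It is an assumption about strategy, not a description of pcretest's actual code. The callable `completes` is a hypothetical stand-in for "call \fBpcre_exec()\fR with this \fImatch_limit\fR and report whether it finished", and the sketch assumes that once a limit is large enough, every larger limit also works.

```python
def minimum_limit(completes, ceiling=1 << 31):
    """Return the smallest positive limit for which completes(limit) is true.

    completes is a hypothetical predicate, assumed to be monotonic:
    false below some threshold and true at or above it.
    """
    hi = 1
    while not completes(hi):        # grow until a limit is large enough
        if hi >= ceiling:
            raise ValueError("no limit below the ceiling is sufficient")
        hi *= 2
    lo = hi // 2                    # every limit <= lo is known to fail
    while hi - lo > 1:              # bisect between failing lo and working hi
        mid = (lo + hi) // 2
        if completes(mid):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Toy predicate standing in for a real match attempt.
    print(minimum_limit(lambda limit: limit >= 37))
```

The number of trial calls grows only logarithmically with the answer, which matters when the minimum is large.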
281  When \\O is used, it may be higher or lower than the size set by the \fB-O\fR  When \\O is used, it may be higher or lower than the size set by the \fB-O\fR
282  option (or defaulted to 45); \\O applies only to the call of \fBpcre_exec()\fR  option (or defaulted to 45); \\O applies only to the call of \fBpcre_exec()\fR
# Line 212  of the \fB/8\fR modifier on the pattern. Line 295  of the \fB/8\fR modifier on the pattern.
295  any number of hexadecimal digits inside the braces. The result is from one to  any number of hexadecimal digits inside the braces. The result is from one to
296  six bytes, encoded according to the UTF-8 rules.  six bytes, encoded according to the UTF-8 rules.
297    
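The "one to six bytes" figure comes from the original UTF-8 scheme, which covers values up to 0x7FFFFFFF. Python's built-in encoder stops at four bytes, so the following sketch spells the rules out by hand. It is a minimal illustration under that assumption, not code taken from pcretest; `encode_utf8_extended` is a hypothetical name.

```python
def encode_utf8_extended(value):
    """Encode an integer as 1 to 6 bytes using the original UTF-8 rules."""
    if value < 0 or value > 0x7FFFFFFF:
        raise ValueError("value out of range for 6-byte UTF-8")
    if value < 0x80:
        return bytes([value])       # single byte, no continuation bytes
    # Largest value for the 2..6 byte forms, and the matching lead-byte bits.
    limits = [0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    leads = [0xC0, 0xE0, 0xF0, 0xF8, 0xFC]
    for extra, (limit, lead) in enumerate(zip(limits, leads), start=1):
        if value <= limit:
            break
    out = []
    for _ in range(extra):          # continuation bytes, lowest 6 bits first
        out.append(0x80 | (value & 0x3F))
        value >>= 6
    out.append(lead | value)        # remaining high bits go in the lead byte
    return bytes(reversed(out))


if __name__ == "__main__":
    print(encode_utf8_extended(0x100).hex())
```

For values in the range that modern UTF-8 allows, the result agrees with the standard encoding.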
   
298  .SH OUTPUT FROM PCRETEST  .SH OUTPUT FROM PCRETEST
299    .rs
300    .sp
301  When a match succeeds, pcretest outputs the list of captured substrings that  When a match succeeds, pcretest outputs the list of captured substrings that
302  \fBpcre_exec()\fR returns, starting with number 0 for the string that matched  \fBpcre_exec()\fR returns, starting with number 0 for the string that matched
303  the whole pattern. Here is an example of an interactive pcretest run.  the whole pattern. Here is an example of an interactive pcretest run.
304    
305    $ pcretest    $ pcretest
306    PCRE version 2.06 08-Jun-1999    PCRE version 4.00 08-Jan-2003
307    
308      re> /^abc(\\d+)/      re> /^abc(\\d+)/
309    data> abc123    data> abc123
# Line 265  Note that while patterns can be continue Line 348  Note that while patterns can be continue
348  prompt is used for continuations), data lines may not. However, newlines can be  prompt is used for continuations), data lines may not. However, newlines can be
349  included in data by means of the \\n escape.  included in data by means of the \\n escape.
350    
   
351  .SH AUTHOR  .SH AUTHOR
352    .rs
353    .sp
354  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel <ph10@cam.ac.uk>
355  .br  .br
356  University Computing Service,  University Computing Service,
357  .br  .br
 New Museums Site,  
 .br  
358  Cambridge CB2 3QG, England.  Cambridge CB2 3QG, England.
 .br  
 Phone: +44 1223 334714  
359    
360  Last updated: 15 August 2001  .in 0
361    Last updated: 20 August 2003
362  .br  .br
363  Copyright (c) 1997-2001 University of Cambridge.  Copyright (c) 1997-2003 University of Cambridge.
