Diff of /code/trunk/doc/pcretest.txt

revision 53 by nigel, Sat Feb 24 21:39:42 2007 UTC revision 71 by nigel, Sat Feb 24 21:40:24 2007 UTC
# Line 3  NAME Line 3  NAME
3       expressions.       expressions.
4    
5    
   
6  SYNOPSIS  SYNOPSIS
7       pcretest [-d] [-i] [-m] [-o osize] [-p] [-t] [source]  [des-       pcretest [-d] [-i] [-m] [-o osize] [-p] [-t] [source]  [des-
8       tination]       tination]
9    
10       pcretest was written as a test program for the PCRE  regular       pcretest was written as a test program for the PCRE  regular
11       expression  library  itself,  but  it  can  also be used for       expression  library  itself,  but  it  can  also be used for
12       experimenting  with  regular  expressions.  This  man   page       experimenting  with  regular  expressions.   This   document
13       describes  the  features of the test program; for details of       describes  the  features of the test program; for details of
14       the regular expressions themselves, see the pcre man page.       the regular  expressions  themselves,  see  the  pcrepattern
15         documentation.  For details of PCRE and its options, see the
16         pcreapi documentation.
17    
18    
19  OPTIONS  OPTIONS
20    
21    
22         -C        Output the version number of the PCRE library, and
23                   all   available  information  about  the  optional
24                   features that are included, and then exit.
25    
26       -d        Behave as if each regex had the /D  modifier  (see       -d        Behave as if each regex had the /D  modifier  (see
27                 below); the internal form is output after compila-                 below); the internal form is output after compila-
28                 tion.                 tion.
# Line 42  OPTIONS Line 48  OPTIONS
48                 wrapper  API  is  used  to  call PCRE. None of the                 wrapper  API  is  used  to  call PCRE. None of the
49                 other options has any effect when -p is set.                 other options has any effect when -p is set.
50    
51       -t        Run each compile, study,  and  match  20000  times       -t        Run each compile, study, and match many times with
52                 with  a  timer, and output resulting time per com-                 a  timer, and output resulting time per compile or
53                 pile or match (in milliseconds).  Do  not  set  -t                 match (in milliseconds). Do not set  -t  with  -m,
54                 with -m, because you will then get the size output                 because  you  will  then get the size output 20000
55                 20000 times and the timing will be distorted.                 times and the timing will be distorted.
   
56    
57    
58  DESCRIPTION  DESCRIPTION
59    
60       If pcretest is given two filename arguments, it  reads  from       If pcretest is given two filename arguments, it  reads  from
61       the  first and writes to the second. If it is given only one       the  first and writes to the second. If it is given only one
   
   
   
   
 SunOS 5.8                 Last change:                          1  
   
   
   
62       filename argument, it reads from that  file  and  writes  to       filename argument, it reads from that  file  and  writes  to
63       stdout. Otherwise, it reads from stdin and writes to stdout,       stdout. Otherwise, it reads from stdin and writes to stdout,
64       and prompts for each line of input, using  "re>"  to  prompt       and prompts for each line of input, using  "re>"  to  prompt
# Line 70  SunOS 5.8                 Last change: Line 68  SunOS 5.8                 Last change:
68       The program handles any number of sets of input on a  single       The program handles any number of sets of input on a  single
69       input  file.  Each set starts with a regular expression, and       input  file.  Each set starts with a regular expression, and
70       continues with any  number  of  data  lines  to  be  matched       continues with any  number  of  data  lines  to  be  matched
71       against  the  pattern.  An empty line signals the end of the       against the pattern.
72       data lines, at which point a new regular expression is read.  
73       The  regular  expressions  are  given  enclosed  in any non-       Each line is matched separately and  independently.  If  you
74       alphameric delimiters other than backslash, for example       want  to  do  multiple-line  matches, you have to use the \n
75         escape sequence in a single line of input to encode the new-
76         line  characters.  The maximum length of data line is 30,000
77         characters.
78    
79         An empty line signals the end of the data  lines,  at  which
80         point  a new regular expression is read. The regular expres-
81         sions are given enclosed in  any  non-alphameric  delimiters
82         other than backslash, for example
83    
84         /(a|bc)x+yz/         /(a|bc)x+yz/
85    
# Line 104  SunOS 5.8                 Last change: Line 110  SunOS 5.8                 Last change:
110       continuation of the regular expression.       continuation of the regular expression.
111    
112    
   
113  PATTERN MODIFIERS  PATTERN MODIFIERS
114    
115       The pattern may be followed by i, m, s,  or  x  to  set  the       The pattern may be followed by i, m, s,  or  x  to  set  the
116       PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED       PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED
117       options, respectively. For example:       options, respectively. For example:
# Line 113  PATTERN MODIFIERS Line 119  PATTERN MODIFIERS
119         /caseless/i         /caseless/i
120    
121       These modifier letters have the same effect as  they  do  in       These modifier letters have the same effect as  they  do  in
122       Perl.  There  are  others which set PCRE options that do not       Perl.  There  are  others  that set PCRE options that do not
123       correspond  to  anything  in  Perl:   /A,  /E,  and  /X  set       correspond to anything in Perl:  /A, /E, /N, /U, and /X  set
124       PCRE_ANCHORED,  PCRE_DOLLAR_ENDONLY,  and PCRE_EXTRA respec-       PCRE_ANCHORED,   PCRE_DOLLAR_ENDONLY,  PCRE_NO_AUTO_CAPTURE,
125       tively.       PCRE_UNGREEDY, and PCRE_EXTRA respectively.
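
The modifier letters listed above (revision 71 wording) can be summarised as a lookup table. This is a minimal Python sketch for illustration only: the names MODIFIER_OPTIONS and options_for are assumptions, not part of pcretest, and only the letters named in this section are covered.

```python
# Modifier letters and the PCRE options they set, as listed in the text.
MODIFIER_OPTIONS = {
    "i": "PCRE_CASELESS",
    "m": "PCRE_MULTILINE",
    "s": "PCRE_DOTALL",
    "x": "PCRE_EXTENDED",
    "A": "PCRE_ANCHORED",
    "E": "PCRE_DOLLAR_ENDONLY",
    "N": "PCRE_NO_AUTO_CAPTURE",
    "U": "PCRE_UNGREEDY",
    "X": "PCRE_EXTRA",
}


def options_for(modifiers):
    """Return the option names for a string of modifier letters.

    Hypothetical helper: letters not in the table above are ignored.
    """
    return [MODIFIER_OPTIONS[c] for c in modifiers if c in MODIFIER_OPTIONS]


print(options_for("iU"))
```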
126    
127       Searching for  all  possible  matches  within  each  subject       Searching for  all  possible  matches  within  each  subject
128       string  can  be  requested  by  the /g or /G modifier. After       string  can  be  requested  by  the /g or /G modifier. After
# Line 165  PATTERN MODIFIERS Line 171  PATTERN MODIFIERS
171       pcre_fullinfo()  after  compiling an expression, and output-       pcre_fullinfo()  after  compiling an expression, and output-
172       ting the information it gets back. If the  pattern  is  stu-       ting the information it gets back. If the  pattern  is  stu-
173       died, the results of that are also output.       died, the results of that are also output.
174    
175       The /D modifier is a  PCRE  debugging  feature,  which  also       The /D modifier is a  PCRE  debugging  feature,  which  also
176       assumes /I.  It causes the internal form of compiled regular       assumes /I.  It causes the internal form of compiled regular
177       expressions to be output after compilation.       expressions to be output after compilation. If  the  pattern
178         was studied, the information returned is also output.
179    
180       The /S modifier causes pcre_study() to be called  after  the       The /S modifier causes pcre_study() to be called  after  the
181       expression  has been compiled, and the results used when the       expression  has been compiled, and the results used when the
# Line 185  PATTERN MODIFIERS Line 193  PATTERN MODIFIERS
193       REG_NEWLINE is set.       REG_NEWLINE is set.
194    
195       The /8 modifier  causes  pcretest  to  call  PCRE  with  the       The /8 modifier  causes  pcretest  to  call  PCRE  with  the
196       PCRE_UTF8  option  set.  This turns on the (currently incom-       PCRE_UTF8  option set. This turns on support for UTF-8 char-
197       plete) support for UTF-8 character handling  in  PCRE,  pro-       acter handling in PCRE, provided that it was  compiled  with
198       vided  that  it was compiled with this support enabled. This       this  support  enabled.  This  modifier also causes any non-
199       modifier also causes any non-printing characters  in  output       printing characters in output strings to  be  printed  using
200       strings  to  be printed using the \x{hh...} notation if they       the \x{hh...} notation if they are valid UTF-8 sequences.
201       are valid UTF-8 sequences.  
202         If the /? modifier is used with /8, it  causes  pcretest  to
203         call  pcre_compile()  with the PCRE_NO_UTF8_CHECK option, to
204         suppress the checking of the string for UTF-8 validity.
205    
206    
207    CALLOUTS
208    
209         If the pattern contains  any  callout  requests,  pcretest's
210         callout function will be called. By default, it displays the
211         callout number, and the start and current positions  in  the
212         text at the callout time. For example, the output
213    
214           --->pqrabcdef
215             0    ^  ^
216    
217         indicates that callout number 0 occurred for a match attempt
218         starting at the fourth character of the subject string, when
219         the pointer was at the seventh character. The callout  func-
220         tion returns zero (carry on matching) by default.
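
The two-line display described above can be mimicked with a short Python sketch. It is an illustration only, not pcretest's own code: the function name callout_display and its parameters are assumptions, and the offsets are zero-based, so the fourth character is offset 3 and the seventh is offset 6.

```python
def callout_display(subject, number, start, current):
    """Return a two-line callout display in the style shown above.

    Hypothetical helper: `start` and `current` are zero-based offsets
    into `subject`.
    """
    width = max(start, current) + 1
    marks = [" "] * width
    marks[start] = "^"
    marks[current] = "^"
    # The callout number is right-aligned under the "--->" prefix.
    return "--->" + subject + "\n" + "%3d " % number + "".join(marks)


print(callout_display("pqrabcdef", 0, 3, 6))
```

Called with the values from the example, this prints the subject line followed by the callout number and two carets under "a" and "d".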
221    
222         Inserting callouts may be helpful  when  using  pcretest  to
223         check  complicated regular expressions. For further informa-
224         tion about callouts, see the pcrecallout documentation.
225    
226         For testing the PCRE library, additional control of  callout
227         behaviour  is available via escape sequences in the data, as
228         described in the following section.  In  particular,  it  is
229         possible to pass in a number as callout data (the default is
230         zero). If the callout function receives a  non-zero  number,
231         it returns that value instead of zero.
232    
233    
234  DATA LINES  DATA LINES
235    
236       Before each data line is passed to pcre_exec(), leading  and       Before each data line is passed to pcre_exec(), leading  and
237       trailing whitespace is removed, and it is then scanned for \       trailing whitespace is removed, and it is then scanned for \
238       escapes. The following are recognized:       escapes.  Some  of  these  are  pretty  esoteric   features,
239         intended  for  checking  out  some  of  the more complicated
240         features of PCRE. If you are just testing "ordinary" regular
241         expressions,  you probably don't need any of these. The fol-
242         lowing escapes are recognized:
243    
244         \a         alarm (= BEL)         \a         alarm (= BEL)
245         \b         backspace         \b         backspace
# Line 209  DATA LINES Line 251  DATA LINES
251         \v         vertical tab         \v         vertical tab
252         \nnn       octal character (up to 3 octal digits)         \nnn       octal character (up to 3 octal digits)
253         \xhh       hexadecimal character (up to 2 hex digits)         \xhh       hexadecimal character (up to 2 hex digits)
254         \x{hh...}  hexadecimal UTF-8 character         \x{hh...}  hexadecimal character, any number of digits
255                        in UTF-8 mode
256         \A         pass the PCRE_ANCHORED option to pcre_exec()         \A         pass the PCRE_ANCHORED option to pcre_exec()
257         \B         pass the PCRE_NOTBOL option to pcre_exec()         \B         pass the PCRE_NOTBOL option to pcre_exec()
258         \Cdd       call pcre_copy_substring() for substring dd         \Cdd       call pcre_copy_substring() for substring dd
259                       after a successful match (any decimal number                      after a successful match (any decimal number
260                       less than 32)                      less than 32)
261           \Cname     call pcre_copy_named_substring() for substring
262    
263                        "name" after a successful match (name termin-
264                        ated by next non alphanumeric character)
265           \C+        show the current captured substrings at callout
266                        time
267           \C-        do not supply a callout function
268           \C!n       return 1 instead of 0 when callout number n is
269                        reached
270           \C!n!m     return 1 instead of 0 when callout number n is
271                        reached for the mth time
272           \C*n       pass the number n (may be negative) as callout
273                        data
274         \Gdd       call pcre_get_substring() for substring dd         \Gdd       call pcre_get_substring() for substring dd
275                        after a successful match (any decimal number
276                       after a successful match (any decimal number                      less than 32)
277                       less than 32)         \Gname     call pcre_get_named_substring() for substring
278                        "name" after a successful match (name termin-
279                        ated by next non-alphanumeric character)
280         \L         call pcre_get_substringlist() after a         \L         call pcre_get_substringlist() after a
281                       successful match                      successful match
282           \M         discover the minimum MATCH_LIMIT setting
283         \N         pass the PCRE_NOTEMPTY option to pcre_exec()         \N         pass the PCRE_NOTEMPTY option to pcre_exec()
284         \Odd       set the size of the output vector passed to         \Odd       set the size of the output vector passed to
285                       pcre_exec() to dd (any number of decimal                      pcre_exec() to dd (any number of decimal
286                       digits)                      digits)
287         \Z         pass the PCRE_NOTEOL option to pcre_exec()         \Z         pass the PCRE_NOTEOL option to pcre_exec()
288           \?         pass the PCRE_NO_UTF8_CHECK option to
289                        pcre_exec()
290    
291         If \M is present, pcretest calls pcre_exec() several  times,
292         with  different  values  in  the  match_limit  field  of the
293         pcre_extra data structure, until it finds the minimum number
294         that is needed for pcre_exec() to complete. This number is a
295         measure of the amount of  recursion  and  backtracking  that
296         takes  place,  and  checking  it out can be instructive. For
297         most simple matches, the number is quite small, but for pat-
298         terns  with very large numbers of matching possibilities, it
299         can become large very quickly with increasing length of sub-
300         ject string.
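
The text does not say how pcretest chooses the successive match_limit values. One plausible way to find a minimum of this kind is to double the limit until the match completes and then bisect. The Python sketch below shows that approach under stated assumptions: match_completes is a hypothetical stand-in for running pcre_exec() with a given limit, and it is assumed that once a limit is large enough, every larger limit also works.

```python
def minimum_match_limit(match_completes, upper=10_000_000):
    """Find the smallest limit for which match_completes(limit) is True.

    `match_completes` is a hypothetical callable standing in for a match
    attempt with a given match_limit. `upper` is an arbitrary safety cap.
    """
    low, high = 0, 1
    # Grow the limit until the match completes.
    while not match_completes(high):
        if high >= upper:
            raise ValueError("match did not complete within the upper bound")
        low, high = high, min(high * 2, upper)
    # Invariant: `low` fails (or is 0) and `high` succeeds; bisect between them.
    while high - low > 1:
        mid = (low + high) // 2
        if match_completes(mid):
            high = mid
        else:
            low = mid
    return high


# Toy predicate: pretend the match needs a limit of at least 37.
print(minimum_match_limit(lambda limit: limit >= 37))
```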
301    
302       When \O is used, it may be higher or lower than the size set       When \O is used, it may be higher or lower than the size set
303       by  the  -O  option (or defaulted to 45); \O applies only to       by  the  -O  option (or defaulted to 45); \O applies only to
# Line 241  DATA LINES Line 312  DATA LINES
312       API  to  be  used,  only  B,  and Z have any effect, causing       API  to  be  used,  only  B,  and Z have any effect, causing
313       REG_NOTBOL and REG_NOTEOL to be passed to regexec()  respec-       REG_NOTBOL and REG_NOTEOL to be passed to regexec()  respec-
314       tively.       tively.
   
315       The use of \x{hh...} to represent UTF-8  characters  is  not       The use of \x{hh...} to represent UTF-8  characters  is  not
316       dependent  on  the use of the /8 modifier on the pattern. It       dependent  on  the use of the /8 modifier on the pattern. It
317       is recognized always. There may be any number of hexadecimal       is recognized always. There may be any number of hexadecimal
# Line 249  DATA LINES Line 319  DATA LINES
319       bytes, encoded according to the UTF-8 rules.       bytes, encoded according to the UTF-8 rules.
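
As a rough illustration of what \x{hh...} produces, the Python sketch below turns the hexadecimal digits into UTF-8 bytes. The function name expand_hex_escape is an assumption, and Python's encoder accepts only code points up to U+10FFFF and rejects surrogates, so it may be stricter than pcretest itself.

```python
def expand_hex_escape(hex_digits):
    """Encode the code point written as \\x{hex_digits} into UTF-8 bytes.

    Hypothetical helper: `hex_digits` is the text between the braces.
    """
    code_point = int(hex_digits, 16)
    return chr(code_point).encode("utf-8")


for digits in ("41", "e9", "263a"):
    print(digits, expand_hex_escape(digits))
```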
320    
321    
   
322  OUTPUT FROM PCRETEST  OUTPUT FROM PCRETEST
323    
324       When a match succeeds, pcretest outputs the list of captured       When a match succeeds, pcretest outputs the list of captured
325       substrings  that pcre_exec() returns, starting with number 0       substrings  that pcre_exec() returns, starting with number 0
326       for the string that matched the whole pattern.  Here  is  an       for the string that matched the whole pattern.  Here  is  an
327       example of an interactive pcretest run.       example of an interactive pcretest run.
328    
329         $ pcretest         $ pcretest
330         PCRE version 2.06 08-Jun-1999         PCRE version 4.00 08-Jan-2003
331    
332           re> /^abc(\d+)/           re> /^abc(\d+)/
333         data> abc123         data> abc123
# Line 307  OUTPUT FROM PCRETEST Line 377  OUTPUT FROM PCRETEST
377       of the \n escape.       of the \n escape.
378    
379    
   
380  AUTHOR  AUTHOR
381    
382       Philip Hazel <ph10@cam.ac.uk>       Philip Hazel <ph10@cam.ac.uk>
383       University Computing Service,       University Computing Service,
      New Museums Site,  
384       Cambridge CB2 3QG, England.       Cambridge CB2 3QG, England.
      Phone: +44 1223 334714  
385    
386       Last updated: 15 August 2001  Last updated: 20 August 2003
387       Copyright (c) 1997-2001 University of Cambridge.  Copyright (c) 1997-2003 University of Cambridge.

