Diff of /code/trunk/doc/pcretest.txt

revision 878 by ph10, Sun Jan 15 15:44:47 2012 UTC | revision 930 by ph10, Fri Feb 24 12:05:54 2012 UTC
# Line 111  COMMAND LINE OPTIONS Line 111  COMMAND LINE OPTIONS
111                   size megabytes.                   size megabytes.
112    
113         -s or -s+ Behave as if each pattern  has  the  /S  modifier;  in  other         -s or -s+ Behave as if each pattern  has  the  /S  modifier;  in  other
114                   words,  force each pattern to be studied. If -s+ is used, the                   words,  force each pattern to be studied. If -s+ is used, all
115                   PCRE_STUDY_JIT_COMPILE flag is  passed  to  pcre[16]_study(),                   the JIT compile options are passed to pcre[16]_study(), caus-
116                   causing  just-in-time  optimization  to  be  set  up if it is                   ing  just-in-time  optimization  to be set up if it is avail-
117                   available. If the /I or /D option is  present  on  a  pattern                   able, for both full and partial matching. Specific  JIT  com-
118                   (requesting  output  about the compiled pattern), information                   pile options can be selected by following -s+ with a digit in
119                   about the result of studying is not included when studying is                   the range 1 to 7, which selects the JIT compile modes as fol-
120                   caused  only  by  -s  and neither -i nor -d is present on the                   lows:
121                   command line. This behaviour means that the output from tests  
122                   that  are run with and without -s should be identical, except                     1  normal match only
123                   when options that output information about the actual running                     2  soft partial match only
124                   of a match are set.                     3  normal match and soft partial match
125                       4  hard partial match only
126                   The  -M,  -t,  and  -tm options, which give information about                     6  soft and hard partial match
127                   resources used, are likely to produce different  output  with                     7  all three modes (default)
128                   and  without  -s.  Output may also differ if the /C option is  
129                   present on an individual pattern. This uses callouts to trace                   If  -s++  is used instead of -s+ (with or without a following
130                   the  the  matching process, and this may be different between                   digit), the text "(JIT)" is added to the  first  output  line
131                   studied and non-studied patterns.  If  the  pattern  contains                   after a match or no match when JIT-compiled code was actually
132                   (*MARK)  items  there  may  also be differences, for the same                   used.
133                   reason. The -s command line option can be overridden for spe-  
134                   cific  patterns that should never be studied (see the /S pat-         If the /I or /D option is present on a pattern (requesting output about
135                   tern modifier below).         the  compiled pattern), information about the result of studying is not
136           included when studying is caused only by -s and neither -i  nor  -d  is
137         -t        Run each compile, study, and match many times with  a  timer,         present  on the command line. This behaviour means that the output from
138                   and  output resulting time per compile or match (in millisec-         tests that are run with and without -s should be identical, except when
139                   onds). Do not set -m with -t, because you will then  get  the         options that output information about the actual running of a match are
140                   size  output  a  zillion  times,  and the timing will be dis-         set.
141                   torted. You can control the number  of  iterations  that  are  
142                   used  for timing by following -t with a number (as a separate         The -M, -t, and -tm options, which  give  information  about  resources
143           used,  are likely to produce different output with and without -s. Out-
144           put may also differ if the /C option is present on an  individual  pat-
145           tern.  This  uses  callouts to trace the matching process, and this
146           may be different between studied and non-studied patterns. If the  pat-
147           tern contains (*MARK) items there may also be differences, for the same
148           reason. The -s command line option can be overridden for specific  pat-
149           terns that should never be studied (see the /S pattern modifier below).
150    
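The digits that may follow -s+ (and /S+) read naturally as a three-bit mask, which the following Python sketch decodes. The bit values 1, 2 and 4 and all names here are inferred from the table in the revised text, not taken from the PCRE sources. The table has no entry for 5; under this reading it would mean normal plus hard partial matching, but that is only an inference.

```python
# Illustrative only: bit values are inferred from the documented table.
NORMAL, SOFT_PARTIAL, HARD_PARTIAL = 1, 2, 4

MODE_NAMES = {
    NORMAL: "normal match",
    SOFT_PARTIAL: "soft partial match",
    HARD_PARTIAL: "hard partial match",
}


def jit_modes(digit=7):
    """Return the JIT compile modes selected by a digit after -s+ or /S+.

    With no digit, all three modes are selected (the documented default, 7).
    """
    if not 1 <= digit <= 7:
        raise ValueError("digit must be in the range 1 to 7")
    # Keep every mode whose bit is set in the digit.
    return [name for bit, name in MODE_NAMES.items() if digit & bit]
```

For example, `jit_modes(3)` gives the normal and soft partial modes, matching the table entry for 3.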
151           -t        Run  each  compile, study, and match many times with a timer,
152                     and output resulting time per compile or match (in  millisec-
153                     onds).  Do  not set -m with -t, because you will then get the
154                     size output a zillion times, and  the  timing  will  be  dis-
155                     torted.  You  can  control  the number of iterations that are
156                     used for timing by following -t with a number (as a  separate
157                   item on the command line). For example, "-t 1000" would iter-                   item on the command line). For example, "-t 1000" would iter-
158                   ate 1000 times. The default is to iterate 500000 times.                   ate 1000 times. The default is to iterate 500000 times.
159    
# Line 149  COMMAND LINE OPTIONS Line 163  COMMAND LINE OPTIONS
163    
164  DESCRIPTION  DESCRIPTION
165    
166         If pcretest is given two filename arguments, it reads  from  the  first         If  pcretest  is  given two filename arguments, it reads from the first
167         and writes to the second. If it is given only one filename argument, it         and writes to the second. If it is given only one filename argument, it
168         reads from that file and writes to stdout.  Otherwise,  it  reads  from         reads  from  that  file  and writes to stdout. Otherwise, it reads from
169         stdin  and  writes to stdout, and prompts for each line of input, using         stdin and writes to stdout, and prompts for each line of  input,  using
170         "re>" to prompt for regular expressions, and "data>" to prompt for data         "re>" to prompt for regular expressions, and "data>" to prompt for data
171         lines.         lines.
172    
173         When  pcretest  is  built,  a  configuration option can specify that it         When pcretest is built, a configuration  option  can  specify  that  it
174         should be linked with the libreadline library. When this  is  done,  if         should  be  linked  with the libreadline library. When this is done, if
175         the input is from a terminal, it is read using the readline() function.         the input is from a terminal, it is read using the readline() function.
176         This provides line-editing and history facilities. The output from  the         This  provides line-editing and history facilities. The output from the
177         -help option states whether or not readline() will be used.         -help option states whether or not readline() will be used.
178    
179         The program handles any number of sets of input on a single input file.         The program handles any number of sets of input on a single input file.
180         Each set starts with a regular expression, and continues with any  num-         Each  set starts with a regular expression, and continues with any num-
181         ber of data lines to be matched against the pattern.         ber of data lines to be matched against the pattern.
182    
183         Each  data line is matched separately and independently. If you want to         Each data line is matched separately and independently. If you want  to
184         do multi-line matches, you have to use the \n escape sequence (or \r or         do multi-line matches, you have to use the \n escape sequence (or \r or
185         \r\n, etc., depending on the newline setting) in a single line of input         \r\n, etc., depending on the newline setting) in a single line of input
186         to encode the newline sequences. There is no limit  on  the  length  of         to  encode  the  newline  sequences. There is no limit on the length of
187         data  lines;  the  input  buffer is automatically extended if it is too         data lines; the input buffer is automatically extended  if  it  is  too
188         small.         small.
189    
190         An empty line signals the end of the data lines, at which point  a  new         An  empty  line signals the end of the data lines, at which point a new
191         regular  expression is read. The regular expressions are given enclosed         regular expression is read. The regular expressions are given  enclosed
192         in any non-alphanumeric delimiters other than backslash, for example:         in any non-alphanumeric delimiters other than backslash, for example:
193    
194           /(a|bc)x+yz/           /(a|bc)x+yz/
195    
196         White space before the initial delimiter is ignored. A regular  expres-         White  space before the initial delimiter is ignored. A regular expres-
197         sion  may be continued over several input lines, in which case the new-         sion may be continued over several input lines, in which case the  new-
198         line characters are included within it. It is possible to  include  the         line  characters  are included within it. It is possible to include the
199         delimiter within the pattern by escaping it, for example         delimiter within the pattern by escaping it, for example
200    
201           /abc\/def/           /abc\/def/
202    
203         If  you  do  so, the escape and the delimiter form part of the pattern,         If you do so, the escape and the delimiter form part  of  the  pattern,
204         but since delimiters are always non-alphanumeric, this does not  affect         but  since delimiters are always non-alphanumeric, this does not affect
205         its  interpretation.   If the terminating delimiter is immediately fol-         its interpretation.  If the terminating delimiter is  immediately  fol-
206         lowed by a backslash, for example,         lowed by a backslash, for example,
207    
208           /abc/\           /abc/\
209    
210         then a backslash is added to the end of the pattern. This  is  done  to         then  a  backslash  is added to the end of the pattern. This is done to
211         provide  a  way of testing the error condition that arises if a pattern         provide a way of testing the error condition that arises if  a  pattern
212         finishes with a backslash, because         finishes with a backslash, because
213    
214           /abc\/           /abc\/
215    
216         is interpreted as the first line of a pattern that starts with  "abc/",         is  interpreted as the first line of a pattern that starts with "abc/",
217         causing pcretest to read the next line as a continuation of the regular         causing pcretest to read the next line as a continuation of the regular
218         expression.         expression.
219    
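The delimiter rules above can be sketched in Python as follows. This is a simplified model for a single input line, not pcretest's actual parser; the function name and its return convention are hypothetical.

```python
def split_pattern_line(line):
    """Split one input line into (pattern, modifiers).

    Returns None when no terminating delimiter is found, which is the case
    where the pattern would continue on the next input line.
    """
    text = line.lstrip()  # white space before the initial delimiter is ignored
    if not text:
        raise ValueError("empty line")
    delim = text[0]
    if delim.isalnum() or delim == "\\":
        raise ValueError("delimiter must be non-alphanumeric and not a backslash")
    i = 1
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 2  # the escape and the escaped character both stay in the pattern
            continue
        if text[i] == delim:
            pattern, rest = text[1:i], text[i + 1:]
            if rest.startswith("\\"):
                # Delimiter immediately followed by a backslash:
                # a backslash is added to the end of the pattern.
                pattern += "\\"
                rest = rest[1:]
            return pattern, rest
        i += 1
    return None  # unterminated: the next line would be a continuation
```

With this model, `/abc/\` yields a pattern ending in a backslash, whereas `/abc\/` is unterminated because the final slash is escaped.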
220    
221  PATTERN MODIFIERS  PATTERN MODIFIERS
222    
223         A pattern may be followed by any number of modifiers, which are  mostly         A  pattern may be followed by any number of modifiers, which are mostly
224         single  characters.  Following  Perl usage, these are referred to below         single characters. Following Perl usage, these are  referred  to  below
225         as, for example, "the /i modifier", even though the  delimiter  of  the         as,  for  example,  "the /i modifier", even though the delimiter of the
226         pattern  need  not always be a slash, and no slash is used when writing         pattern need not always be a slash, and no slash is used  when  writing
227         modifiers. White space may appear between the final  pattern  delimiter         modifiers.  White  space may appear between the final pattern delimiter
228         and the first modifier, and between the modifiers themselves.         and the first modifier, and between the modifiers themselves.
229    
230         The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE,         The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE,
231         PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre[16]_com-         PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre[16]_com-
232         pile()  is  called. These four modifier letters have the same effect as         pile() is called. These four modifier letters have the same  effect  as
233         they do in Perl. For example:         they do in Perl. For example:
234    
235           /caseless/i           /caseless/i
236    
237         The following table shows additional modifiers for  setting  PCRE  com-         The  following  table  shows additional modifiers for setting PCRE com-
238         pile-time options that do not correspond to anything in Perl:         pile-time options that do not correspond to anything in Perl:
239    
240           /8              PCRE_UTF8           ) when using the 8-bit           /8              PCRE_UTF8           ) when using the 8-bit
# Line 248  PATTERN MODIFIERS Line 262  PATTERN MODIFIERS
262           /<bsr_anycrlf>  PCRE_BSR_ANYCRLF           /<bsr_anycrlf>  PCRE_BSR_ANYCRLF
263           /<bsr_unicode>  PCRE_BSR_UNICODE           /<bsr_unicode>  PCRE_BSR_UNICODE
264    
265         The  modifiers  that are enclosed in angle brackets are literal strings         The modifiers that are enclosed in angle brackets are  literal  strings
266         as shown, including the angle brackets, but the letters within  can  be         as  shown,  including the angle brackets, but the letters within can be
267         in  either case.  This example sets multiline matching with CRLF as the         in either case.  This example sets multiline matching with CRLF as  the
268         line ending sequence:         line ending sequence:
269    
270           /^abc/m<CRLF>           /^abc/m<CRLF>
271    
272         As well as turning on the PCRE_UTF8/16 option, the /8  modifier  causes         As  well  as turning on the PCRE_UTF8/16 option, the /8 modifier causes
273         all  non-printing  characters in output strings to be printed using the         all non-printing characters in output strings to be printed  using  the
274         \x{hh...} notation. Otherwise, those less than 0x100 are output in  hex         \x{hh...}  notation. Otherwise, those less than 0x100 are output in hex
275         without the curly brackets.         without the curly brackets.
276    
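A rough Python sketch of the output rule just described. It assumes that "printing" means ASCII 0x20 to 0x7e and that hex digits are lower case with at least two digits; neither detail is stated in this text, and the function name is hypothetical.

```python
def show_chars(code_points, utf_mode=False):
    """Render character values the way the text describes output strings.

    utf_mode=True models a pattern that has the /8 modifier.
    """
    out = []
    for c in code_points:
        if 0x20 <= c < 0x7F:  # assumption: printable means ASCII 32..126
            out.append(chr(c))
        elif utf_mode or c >= 0x100:
            out.append("\\x{%02x}" % c)  # \x{hh...} notation
        else:
            out.append("\\x%02x" % c)  # below 0x100: hex without curly brackets
    return "".join(out)
```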
277         Full  details  of  the PCRE options are given in the pcreapi documenta-         Full details of the PCRE options are given in  the  pcreapi  documenta-
278         tion.         tion.
279    
280     Finding all matches in a string     Finding all matches in a string
281    
282         Searching for all possible matches within each subject  string  can  be         Searching  for  all  possible matches within each subject string can be
283         requested  by  the  /g  or  /G modifier. After finding a match, PCRE is         requested by the /g or /G modifier. After  finding  a  match,  PCRE  is
284         called again to search the remainder of the subject string. The differ-         called again to search the remainder of the subject string. The differ-
285         ence between /g and /G is that the former uses the startoffset argument         ence between /g and /G is that the former uses the startoffset argument
286         to pcre[16]_exec() to start searching at a new point within the  entire         to  pcre[16]_exec() to start searching at a new point within the entire
287         string  (which  is in effect what Perl does), whereas the latter passes         string (which is in effect what Perl does), whereas the  latter  passes
288         over a shortened substring. This makes a  difference  to  the  matching         over  a  shortened  substring.  This makes a difference to the matching
289         process if the pattern begins with a lookbehind assertion (including \b         process if the pattern begins with a lookbehind assertion (including \b
290         or \B).         or \B).
291    
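The difference between the two approaches can be seen by analogy with Python's re module, where passing a start position to search() keeps the earlier text visible to \b and lookbehinds, while slicing the string does not. This is only an analogy, not PCRE or pcretest code; the function names are hypothetical and empty matches are handled crudely by stepping one character.

```python
import re


def find_all_offset(pattern, subject):
    """/g style: keep the whole subject and move the start offset."""
    rx, pos, found = re.compile(pattern), 0, []
    while pos <= len(subject):
        m = rx.search(subject, pos)
        if m is None:
            break
        found.append(m.group())
        pos = max(m.end(), m.start() + 1)  # always make progress
    return found


def find_all_sliced(pattern, subject):
    """/G style: pass only the remaining substring on each call."""
    rx, rest, found = re.compile(pattern), subject, []
    while True:
        m = rx.search(rest)
        if m is None:
            break
        found.append(m.group())
        step = max(m.end(), m.start() + 1)  # always make progress
        if step > len(rest):
            break
        rest = rest[step:]
    return found
```

With the pattern `\bfoo` and the subject `foofoo`, the offset version finds one match, because the second `foo` is preceded by a word character. The sliced version finds two, because each shortened substring appears to start at a word boundary.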
292         If any call to pcre[16]_exec() in a /g or /G sequence matches an  empty         If  any call to pcre[16]_exec() in a /g or /G sequence matches an empty
293         string,  the  next  call  is  done  with  the PCRE_NOTEMPTY_ATSTART and         string, the next  call  is  done  with  the  PCRE_NOTEMPTY_ATSTART  and
294         PCRE_ANCHORED flags set in order  to  search  for  another,  non-empty,         PCRE_ANCHORED  flags  set  in  order  to search for another, non-empty,
295         match  at  the same point. If this second match fails, the start offset         match at the same point. If this second match fails, the  start  offset
296         is advanced, and the normal match is retried.  This  imitates  the  way         is  advanced,  and  the  normal match is retried. This imitates the way
297         Perl handles such cases when using the /g modifier or the split() func-         Perl handles such cases when using the /g modifier or the split() func-
298         tion. Normally, the start offset is advanced by one character,  but  if         tion.  Normally,  the start offset is advanced by one character, but if
299         the  newline  convention  recognizes CRLF as a newline, and the current         the newline convention recognizes CRLF as a newline,  and  the  current
300         character is CR followed by LF, an advance of two is used.         character is CR followed by LF, an advance of two is used.
301    
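The advance rule in the last two sentences can be sketched in Python as below. The function name and the boolean parameter are hypothetical, and offsets are counted in Python characters.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Return the start offset to use after the retry at `offset` has failed."""
    # Step over a CR LF pair as a unit when the newline convention
    # recognizes CRLF as a newline.
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2
    return offset + 1  # normal case: advance by one character
```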
302     Other modifiers     Other modifiers
303    
304         There are yet more modifiers for controlling the way pcretest operates.         There are yet more modifiers for controlling the way pcretest operates.
305    
306         The /+ modifier requests that as well as outputting the substring  that         The  /+ modifier requests that as well as outputting the substring that
307         matched  the  entire  pattern,  pcretest  should in addition output the         matched the entire pattern, pcretest  should  in  addition  output  the
308         remainder of the subject string. This is useful  for  tests  where  the         remainder  of  the  subject  string. This is useful for tests where the
309         subject  contains multiple copies of the same substring. If the + modi-         subject contains multiple copies of the same substring. If the +  modi-
310         fier appears twice, the same action is taken for  captured  substrings.         fier  appears  twice, the same action is taken for captured substrings.
311         In  each case the remainder is output on the following line with a plus         In each case the remainder is output on the following line with a  plus
312         character following the capture number. Note that  this  modifier  must         character  following  the  capture number. Note that this modifier must
313         not immediately follow the /S modifier because /S+ has another meaning.         not immediately follow the /S modifier because /S+ and /S++ have  other
314           meanings.
315    
316         The  /=  modifier  requests  that  the values of all potential captured         The  /=  modifier  requests  that  the values of all potential captured
317         parentheses be output after a match. By default, only those up  to  the         parentheses be output after a match. By default, only those up  to  the
# Line 368  PATTERN MODIFIERS Line 383  PATTERN MODIFIERS
383         different when the pattern is studied.         different when the pattern is studied.
384    
385         If the /S modifier is immediately followed by a + character,  the  call         If the /S modifier is immediately followed by a + character,  the  call
386         to  pcre[16]_study()  is  made  with the PCRE_STUDY_JIT_COMPILE option,         to  pcre[16]_study() is made with all the JIT study options, requesting
387         requesting just-in-time optimization support if it is  available.  Note         just-in-time optimization support if it is available, for  both  normal
388         that  there  is  also  a  /+ modifier; it must not be given immediately         and  partial matching. If you want to restrict the JIT compiling modes,
389         after /S because this will be misinterpreted. If JIT studying  is  suc-         you can follow /S+ with a digit in the range 1 to 7:
390         cessful,  it  will  automatically  be used when pcre[16]_exec() is run,  
391         except when incompatible run-time options are specified. These  include           1  normal match only
392         the  partial  matching options; a complete list is given in the pcrejit           2  soft partial match only
393         documentation. See also the \J escape sequence below for a way of  set-           3  normal match and soft partial match
394         ting the size of the JIT stack.           4  hard partial match only
395             6  soft and hard partial match
396             7  all three modes (default)
397    
398           If /S++ is used instead of /S+ (with or without a following digit), the
399           text  "(JIT)"  is  added  to  the first output line after a match or no
400           match when JIT-compiled code was actually used.
401    
402           Note that there is also an independent /+  modifier;  it  must  not  be
403           given immediately after /S or /S+ because this will be misinterpreted.
404    
405           If JIT studying is successful, the compiled JIT code will automatically
406           be used when pcre[16]_exec() is run, except when incompatible  run-time
407           options are specified. For more details, see the pcrejit documentation.
408           See also the \J escape sequence below for a way of setting the size  of
409           the JIT stack.
410    
411         The  /T  modifier  must be followed by a single digit. It causes a spe-         The  /T  modifier  must be followed by a single digit. It causes a spe-
412         cific set of built-in character tables to be  passed  to  pcre[16]_com-         cific set of built-in character tables to be  passed  to  pcre[16]_com-
# Line 869  AUTHOR Line 899  AUTHOR
899    
900  REVISION  REVISION
901    
902         Last updated: 14 January 2012         Last updated: 21 February 2012
903         Copyright (c) 1997-2012 University of Cambridge.         Copyright (c) 1997-2012 University of Cambridge.
