Diff of /code/trunk/doc/pcretest.txt

revision 654 by ph10, Tue Aug 2 11:00:40 2011 UTC revision 691 by ph10, Sun Sep 11 14:31:21 2011 UTC
# Line 71  COMMAND LINE OPTIONS Line 71  COMMAND LINE OPTIONS
71         -S size   On  Unix-like  systems, set the size of the run-time stack to         -S size   On  Unix-like  systems, set the size of the run-time stack to
72                   size megabytes.                   size megabytes.
73    
74         -s        Behave as if each pattern  has  the  /S  modifier;  in  other         -s or -s+ Behave as if each pattern  has  the  /S  modifier;  in  other
75                   words,  force  each  pattern  to  be studied. If the /I or /D                   words,  force each pattern to be studied. If -s+ is used, the
76                   option is present on a pattern (requesting output  about  the                   PCRE_STUDY_JIT_COMPILE flag is passed to pcre_study(),  caus-
77                   compiled  pattern),  information about the result of studying                   ing  just-in-time  optimization  to be set up if it is avail-
78                   is not included when studying is caused only by -s  and  nei-                   able. If the  /I  or  /D  option  is  present  on  a  pattern
79                   ther -i nor -d is present on the command line. This behaviour                   (requesting  output  about the compiled pattern), information
80                   means that the output from tests that are run with and  with-                   about the result of studying is not included when studying is
81                   out  -s  should be identical, except when options that output                   caused  only  by  -s  and neither -i nor -d is present on the
82                   information about the actual running of a match are set.  The                   command line. This behaviour means that the output from tests
83                   -M,  -t,  and  -tm  options,  which  give  information  about                   that  are run with and without -s should be identical, except
84                   resources used, are likely to produce different  output  with                   when options that output information about the actual running
85                   and  without  -s.  Output may also differ if the /C option is                   of  a  match are set. The -M, -t, and -tm options, which give
86                   present on an individual pattern. This uses callouts to trace                   information about resources used, are likely to produce  dif-
87                   the  matching process, and this may be different between                   ferent  output with and without -s. Output may also differ if
88                   studied and non-studied patterns.  If  the  pattern  contains                   the /C option is present on an individual pattern. This  uses
89                   (*MARK)  items  there  may  also be differences, for the same                   callouts  to  trace the matching process, and this may be
90                   reason. The -s command line option can be overridden for spe-                   different between studied and non-studied  patterns.  If  the
91                   cific  patterns  that  should  never  be  studied (see the /S                   pattern contains (*MARK) items there may also be differences,
92                   option below).                   for the same reason. The -s command line option can be  over-
93                     ridden  for  specific  patterns  that should never be studied
94                     (see the /S pattern modifier below).
95    
96         -t        Run each compile, study, and match many times with  a  timer,         -t        Run each compile, study, and match many times with  a  timer,
97                   and  output resulting time per compile or match (in millisec-                   and  output resulting time per compile or match (in millisec-
# Line 245  PATTERN MODIFIERS Line 247  PATTERN MODIFIERS
247         subject contains multiple copies of the same substring. If the +  modi-         subject contains multiple copies of the same substring. If the +  modi-
248         fier  appears  twice, the same action is taken for captured substrings.         fier  appears  twice, the same action is taken for captured substrings.
249         In each case the remainder is output on the following line with a  plus         In each case the remainder is output on the following line with a  plus
250         character following the capture number.         character  following  the  capture number. Note that this modifier must
251           not immediately follow the /S modifier because /S+ has another meaning.
252    
253         The  /=  modifier  requests  that  the values of all potential captured         The /= modifier requests that the  values  of  all  potential  captured
254         parentheses be output after a match by pcre_exec().  By  default,  only         parentheses  be  output  after a match by pcre_exec(). By default, only
255         those up to the highest one actually used in the match are output (cor-         those up to the highest one actually used in the match are output (cor-
256         responding to the return code from pcre_exec()). Values in the  offsets         responding  to the return code from pcre_exec()). Values in the offsets
257         vector  corresponding  to higher numbers should be set to -1, and these         vector corresponding to higher numbers should be set to -1,  and  these
258         are output as "<unset>". This modifier gives a  way  of  checking  that         are  output  as  "<unset>".  This modifier gives a way of checking that
259         this is happening.         this is happening.
260    
261         The  /B modifier is a debugging feature. It requests that pcretest out-         The /B modifier is a debugging feature. It requests that pcretest  out-
262         put a representation of the compiled byte code after compilation.  Nor-         put  a representation of the compiled byte code after compilation. Nor-
263         mally  this  information contains length and offset values; however, if         mally this information contains length and offset values;  however,  if
264         /Z is also present, this data is replaced by spaces. This is a  special         /Z  is also present, this data is replaced by spaces. This is a special
265         feature for use in the automatic test scripts; it ensures that the same         feature for use in the automatic test scripts; it ensures that the same
266         output is generated for different internal link sizes.         output is generated for different internal link sizes.
267    
268         The /D modifier is a PCRE debugging feature, and is equivalent to  /BI,         The  /D modifier is a PCRE debugging feature, and is equivalent to /BI,
269         that is, both the /B and the /I modifiers.         that is, both the /B and the /I modifiers.
270    
271         The /F modifier causes pcretest to flip the byte order of the fields in         The /F modifier causes pcretest to flip the byte order of the fields in
272         the compiled pattern that  contain  2-byte  and  4-byte  numbers.  This         the  compiled  pattern  that  contain  2-byte  and 4-byte numbers. This
273         facility  is  for testing the feature in PCRE that allows it to execute         facility is for testing the feature in PCRE that allows it  to  execute
274         patterns that were compiled on a host with a different endianness. This         patterns that were compiled on a host with a different endianness. This
275         feature  is  not  available  when  the POSIX interface to PCRE is being         feature is not available when the POSIX  interface  to  PCRE  is  being
276         used, that is, when the /P pattern modifier is specified. See also  the         used,  that is, when the /P pattern modifier is specified. See also the
277         section about saving and reloading compiled patterns below.         section about saving and reloading compiled patterns below.
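
The byte-order flip that /F tests can be illustrated outside PCRE. The sketch below is not pcretest code; it only shows what reversing the bytes of a 2-byte and a 4-byte number means. The helper names flip16 and flip32 are hypothetical.

```python
import struct

def flip16(value: int) -> int:
    """Reverse the byte order of an unsigned 2-byte number."""
    # Pack as big-endian, then reinterpret the same bytes as little-endian.
    return struct.unpack("<H", struct.pack(">H", value))[0]

def flip32(value: int) -> int:
    """Reverse the byte order of an unsigned 4-byte number."""
    return struct.unpack("<I", struct.pack(">I", value))[0]

print(hex(flip16(0x1234)), hex(flip32(0x12345678)))
# prints: 0x3412 0x78563412
```

Flipping twice gives back the original value, which is why a pattern compiled on one host can be corrected for another.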
278    
279         The  /I  modifier  requests  that pcretest output information about the         The /I modifier requests that pcretest  output  information  about  the
280         compiled pattern (whether it is anchored, has a fixed first  character,         compiled  pattern (whether it is anchored, has a fixed first character,
281         and  so  on). It does this by calling pcre_fullinfo() after compiling a         and so on). It does this by calling pcre_fullinfo() after  compiling  a
282         pattern. If the pattern is studied, the results of that are  also  out-         pattern.  If  the pattern is studied, the results of that are also out-
283         put.         put.
284    
285         The  /K modifier requests pcretest to show names from backtracking con-         The /K modifier requests pcretest to show names from backtracking  con-
286         trol verbs that are returned  from  calls  to  pcre_exec().  It  causes         trol  verbs  that  are  returned  from  calls to pcre_exec(). It causes
287         pcretest  to create a pcre_extra block if one has not already been cre-         pcretest to create a pcre_extra block if one has not already been  cre-
288         ated by a call to pcre_study(), and to set the PCRE_EXTRA_MARK flag and         ated by a call to pcre_study(), and to set the PCRE_EXTRA_MARK flag and
289         the mark field within it, every time that pcre_exec() is called. If the         the mark field within it, every time that pcre_exec() is called. If the
290         variable that the mark field points to is non-NULL for  a  match,  non-         variable  that  the  mark field points to is non-NULL for a match, non-
291         match, or partial match, pcretest prints the string to which it points.         match, or partial match, pcretest prints the string to which it points.
292         For a match, this is shown on a line by itself, tagged with "MK:".  For         For a match, this is shown on a line by itself, tagged with "MK:".  For
293         a non-match it is added to the message.         a non-match it is added to the message.
294    
295         The  /L modifier must be followed directly by the name of a locale, for         The /L modifier must be followed directly by the name of a locale,  for
296         example,         example,
297    
298           /pattern/Lfr_FR           /pattern/Lfr_FR
299    
300         For this reason, it must be the last modifier. The given locale is set,         For this reason, it must be the last modifier. The given locale is set,
301         pcre_maketables()  is called to build a set of character tables for the         pcre_maketables() is called to build a set of character tables for  the
302         locale, and this is then passed to pcre_compile()  when  compiling  the         locale,  and  this  is then passed to pcre_compile() when compiling the
303         regular  expression.  Without an /L (or /T) modifier, NULL is passed as         regular expression. Without an /L (or /T) modifier, NULL is  passed  as
304         the tables pointer; that is, /L applies only to the expression on which         the tables pointer; that is, /L applies only to the expression on which
305         it appears.         it appears.
306    
307         The  /M  modifier causes the size of memory block used to hold the com-         The /M modifier causes the size of memory block used to hold  the  com-
308         piled pattern to be output.         piled pattern to be output.
309    
310         If the /S modifier appears once, it causes pcre_study()  to  be  called         If  the  /S  modifier appears once, it causes pcre_study() to be called
311         after  the  expression has been compiled, and the results used when the         after the expression has been compiled, and the results used  when  the
312         expression is matched. If /S appears  twice,  it  suppresses  studying,         expression  is  matched.  If  /S appears twice, it suppresses studying,
313         even if it was requested externally by the -s command line option. This         even if it was requested externally by the -s command line option. This
314         makes it possible to specify that certain patterns are always  studied,         makes  it possible to specify that certain patterns are always studied,
315         and others are never studied, independently of -s. This feature is used         and others are never studied, independently of -s. This feature is used
316         in the test files in a few cases where the output is different when the         in the test files in a few cases where the output is different when the
317         pattern is studied.         pattern is studied.
318    
319           If the /S modifier is immediately followed by a + character,  the  call
320           to   pcre_study()  is  made  with  the  PCRE_STUDY_JIT_COMPILE  option,
321           requesting just-in-time optimization support if it is  available.  Note
322           that  there  is  also  a  /+ modifier; it must not be given immediately
323           after /S because this will be misinterpreted. If JIT studying  is  suc-
324           cessful,  it will automatically be used when pcre_exec() is run, except
325           when incompatible run-time options are  specified.  These  include  the
326           partial matching options; a complete list is given in the pcrejit docu-
327           mentation. See also the \J escape sequence below for a way  of  setting
328           the size of the JIT stack.
329    
330         The  /T  modifier  must be followed by a single digit. It causes a spe-         The  /T  modifier  must be followed by a single digit. It causes a spe-
331         cific set of built-in character tables to be passed to  pcre_compile().         cific set of built-in character tables to be passed to  pcre_compile().
332         It is used in the standard PCRE tests to check behaviour with different         It is used in the standard PCRE tests to check behaviour with different
# Line 392  DATA LINES Line 406  DATA LINES
406           \Gname     call pcre_get_named_substring() for substring           \Gname     call pcre_get_named_substring() for substring
407                        "name" after a successful match (name termin-                        "name" after a successful match (name termin-
408                        ated by next non-alphanumeric character)                        ated by next non-alphanumeric character)
409             \Jdd       set up a JIT stack of dd kilobytes maximum (any
410                          number of digits)
411           \L         call pcre_get_substringlist() after a           \L         call pcre_get_substringlist() after a
412                        successful match                        successful match
413           \M         discover the minimum MATCH_LIMIT and           \M         discover the minimum MATCH_LIMIT and
# Line 444  DATA LINES Line 460  DATA LINES
460         way of passing an empty line as data, since a real  empty  line  termi-         way of passing an empty line as data, since a real  empty  line  termi-
461         nates the data input.         nates the data input.
462    
463         If  \M  is present, pcretest calls pcre_exec() several times, with dif-         The  \J escape provides a way of setting the maximum stack size that is
464         ferent values in the match_limit and  match_limit_recursion  fields  of         used by the just-in-time optimization code. It is ignored if JIT  opti-
465         the  pcre_extra  data structure, until it finds the minimum numbers for         mization  is  not being used. Providing a stack that is larger than the
466         each parameter that allow pcre_exec() to complete. The match_limit num-         default 32K is necessary only for very complicated patterns.
467         ber  is  a  measure of the amount of backtracking that takes place, and  
468         checking it out can be instructive. For most simple matches, the number         If \M is present, pcretest calls pcre_exec() several times,  with  dif-
469         is  quite  small,  but for patterns with very large numbers of matching         ferent  values  in  the match_limit and match_limit_recursion fields of
470         possibilities, it can become large very quickly with increasing  length         the pcre_extra data structure, until it finds the minimum  numbers  for
471         of subject string. The match_limit_recursion number is a measure of how         each  parameter  that  allow  pcre_exec()  to  complete  without error.
472         much stack (or, if PCRE is compiled with  NO_RECURSE,  how  much  heap)         Because this is testing a specific feature of the  normal  interpretive
473         memory is needed to complete the match attempt.         pcre_exec()  execution, the use of any JIT optimization that might have
474           been set up by the /S+ qualifier or -s+ option is disabled.
475    
476           The match_limit number is a measure of the amount of backtracking  that
477           takes  place,  and  checking it out can be instructive. For most simple
478           matches, the number is quite small, but for patterns  with  very  large
479           numbers  of  matching  possibilities,  it can become large very quickly
480           with increasing length of  subject  string.  The  match_limit_recursion
481           number  is  a  measure  of how much stack (or, if PCRE is compiled with
482           NO_RECURSE, how much heap) memory  is  needed  to  complete  the  match
483           attempt.
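
The text above says only that pcre_exec() is called several times until the minimum limits are found; it does not say how the values are chosen. One plausible strategy, shown here purely as an assumption, is to double the limit until the match completes and then bisect. The callable succeeds is a hypothetical stand-in for "run pcre_exec() with this limit and report whether it finished without a limit error"; start and ceiling are arbitrary illustrative values.

```python
def minimum_limit(succeeds, start=64, ceiling=10_000_000):
    """Return the smallest positive limit for which succeeds(limit) is true.

    Assumes that if one limit is enough, every larger limit is too.
    """
    low, high = 0, start              # low is treated as a limit that fails
    while not succeeds(high):         # grow until the upper bound works
        if high >= ceiling:
            raise ValueError("no limit up to the ceiling is sufficient")
        low, high = high, min(high * 2, ceiling)
    while high - low > 1:             # bisect between failing and working
        mid = (low + high) // 2
        if succeeds(mid):
            high = mid
        else:
            low = mid
    return high

# Toy stand-in: pretend the match needs a limit of at least 100.
print(minimum_limit(lambda limit: limit >= 100))
# prints: 100
```

With this approach the number of trial runs grows with the logarithm of the answer, so even a large minimum is found in a few dozen calls.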
484    
485         When  \O  is  used, the value specified may be higher or lower than the         When  \O  is  used, the value specified may be higher or lower than the
486         size set by the -O command line option (or defaulted to 45); \O applies         size set by the -O command line option (or defaulted to 45); \O applies
# Line 720  SAVING AND RELOADING COMPILED PATTERNS Line 746  SAVING AND RELOADING COMPILED PATTERNS
746           /pattern/im >/some/file           /pattern/im >/some/file
747    
748         See the pcreprecompile documentation for a discussion about saving  and         See the pcreprecompile documentation for a discussion about saving  and
749         re-using compiled patterns.         re-using  compiled patterns.  Note that if the pattern was successfully
750           studied with JIT optimization, the JIT data cannot be saved.
751    
752         The  data  that  is  written  is  binary. The first eight bytes are the         The data that is written is binary.  The  first  eight  bytes  are  the
753         length of the compiled pattern data  followed  by  the  length  of  the         length  of  the  compiled  pattern  data  followed by the length of the
754         optional  study  data,  each  written as four bytes in big-endian order         optional study data, each written as four  bytes  in  big-endian  order
755         (most significant byte first). If there is no study  data  (either  the         (most  significant  byte  first). If there is no study data (either the
756         pattern was not studied, or studying did not return any data), the sec-         pattern was not studied, or studying did not return any data), the sec-
757         ond length is zero. The lengths are followed by an exact  copy  of  the         ond  length  is  zero. The lengths are followed by an exact copy of the
758         compiled pattern. If there is additional study data, this follows imme-         compiled pattern. If there is additional study  data,  this  (excluding
759         diately after the compiled pattern. After writing  the  file,  pcretest         any  JIT  data)  follows  immediately after the compiled pattern. After
760         expects to read a new pattern.         writing the file, pcretest expects to read a new pattern.
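
The file layout described above (two 4-byte big-endian lengths, then the compiled pattern, then any study data) can be read back with a few lines of code. This is a minimal sketch, not part of pcretest: the function name split_saved_pattern is hypothetical, and treating the lengths as unsigned is an assumption.

```python
import struct

def split_saved_pattern(blob: bytes):
    """Split the contents of a saved-pattern file into (pattern, study_data)."""
    if len(blob) < 8:
        raise ValueError("file too short for the 8-byte length header")
    # Two 4-byte lengths, most significant byte first (assumed unsigned).
    pattern_len, study_len = struct.unpack(">II", blob[:8])
    body = blob[8:]
    if len(body) < pattern_len + study_len:
        raise ValueError("file is shorter than the lengths in its header")
    pattern = body[:pattern_len]
    study = body[pattern_len:pattern_len + study_len]   # empty if length is 0
    return pattern, study

# Synthetic data: a 5-byte "pattern" and no study data.
blob = struct.pack(">II", 5, 0) + b"\x01\x02\x03\x04\x05"
pattern, study = split_saved_pattern(blob)
print(len(pattern), len(study))
# prints: 5 0
```

The bytes returned are opaque: the internal layout of a compiled pattern is not described here, so the sketch only separates the two parts.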
761    
762         A  saved  pattern  can  be reloaded into pcretest by specifying < and a         A saved pattern can be reloaded into pcretest by  specifying  <  and  a
763         file name instead of a pattern. The name of the file must not contain a         file name instead of a pattern. The name of the file must not contain a
764         < character, as otherwise pcretest will interpret the line as a pattern         < character, as otherwise pcretest will interpret the line as a pattern
765         delimited by < characters.  For example:         delimited by < characters.  For example:
# Line 741  SAVING AND RELOADING COMPILED PATTERNS Line 768  SAVING AND RELOADING COMPILED PATTERNS
768           Compiled pattern loaded from /some/file           Compiled pattern loaded from /some/file
769           No study data           No study data
770    
771         When the pattern has been loaded, pcretest proceeds to read data  lines         If  the  pattern  was previously studied with the JIT optimization, the
772         in the usual way.         JIT information cannot be saved and restored, and so is lost. When  the
773           pattern  has  been  loaded, pcretest proceeds to read data lines in the
774         You  can copy a file written by pcretest to a different host and reload         usual way.
775         it there, even if the new host has opposite endianness to  the  one  on  
776         which  the pattern was compiled. For example, you can compile on an i86         You can copy a file written by pcretest to a different host and  reload
777           it  there,  even  if the new host has opposite endianness to the one on
778           which the pattern was compiled. For example, you can compile on an  i86
779         machine and run on a SPARC machine.         machine and run on a SPARC machine.
780    
781         File names for saving and reloading can be absolute  or  relative,  but         File  names  for  saving and reloading can be absolute or relative, but
782         note  that the shell facility of expanding a file name that starts with         note that the shell facility of expanding a file name that starts  with
783         a tilde (~) is not available.         a tilde (~) is not available.
784    
785         The ability to save and reload files in pcretest is intended for  test-         The  ability to save and reload files in pcretest is intended for test-
786         ing  and experimentation. It is not intended for production use because         ing and experimentation. It is not intended for production use  because
787         only a single pattern can be written to a file. Furthermore,  there  is         only  a  single pattern can be written to a file. Furthermore, there is
788         no  facility  for  supplying  custom  character  tables  for use with a         no facility for supplying  custom  character  tables  for  use  with  a
789         reloaded pattern. If the original  pattern  was  compiled  with  custom         reloaded  pattern.  If  the  original  pattern was compiled with custom
790         tables,  an  attempt to match a subject string using a reloaded pattern         tables, an attempt to match a subject string using a  reloaded  pattern
791         is likely to cause pcretest to crash.  Finally, if you attempt to  load         is  likely to cause pcretest to crash.  Finally, if you attempt to load
792         a file that is not in the correct format, the result is undefined.         a file that is not in the correct format, the result is undefined.
793    
794    
795  SEE ALSO  SEE ALSO
796    
797         pcre(3),  pcreapi(3),  pcrecallout(3), pcrematching(3), pcrepartial(3),         pcre(3), pcreapi(3), pcrecallout(3), pcrejit(3), pcrematching(3), pcrepar-
798         pcrepattern(3), pcreprecompile(3).         tial(3), pcrepattern(3), pcreprecompile(3).
799    
800    
801  AUTHOR  AUTHOR
# Line 778  AUTHOR Line 807  AUTHOR
807    
808  REVISION  REVISION
809    
810         Last updated: 01 August 2011         Last updated: 26 August 2011
811         Copyright (c) 1997-2011 University of Cambridge.         Copyright (c) 1997-2011 University of Cambridge.
