Diff of /code/trunk/doc/pcretest.txt

revision 1458 by ph10, Tue Nov 19 15:36:57 2013 UTC revision 1459 by ph10, Tue Mar 4 10:45:15 2014 UTC
# Line 99  COMMAND LINE OPTIONS Line 99  COMMAND LINE OPTIONS
99                     newline    the default newline setting:                     newline    the default newline setting:
100                                  CR, LF, CRLF, ANYCRLF, or ANY                                  CR, LF, CRLF, ANYCRLF, or ANY
101                                  exit code is always 0                                  exit code is always 0
102                       bsr        the default setting for what \R matches:
103                                    ANYCRLF or ANY
104                                    exit code is always 0
105    
106                   The  following  options output 1 for true or 0 for false, and                   The  following  options output 1 for true or 0 for false, and
107                   set the exit code to the same value:                   set the exit code to the same value:
# Line 316  PATTERN MODIFIERS Line 319  PATTERN MODIFIERS
319           /N              set PCRE_NO_AUTO_CAPTURE           /N              set PCRE_NO_AUTO_CAPTURE
320           /O              set PCRE_NO_AUTO_POSSESS           /O              set PCRE_NO_AUTO_POSSESS
321           /P              use the POSIX wrapper           /P              use the POSIX wrapper
322             /Q              test external stack check function
323           /S              study the pattern after compilation           /S              study the pattern after compilation
324           /s              set PCRE_DOTALL           /s              set PCRE_DOTALL
325           /T              select character tables           /T              select character tables
# Line 462  PATTERN MODIFIERS Line 466  PATTERN MODIFIERS
466         compiled pattern (whether it is anchored, has a fixed first  character,         compiled pattern (whether it is anchored, has a fixed first  character,
467         and  so  on). It does this by calling pcre[16|32]_fullinfo() after com-         and  so  on). It does this by calling pcre[16|32]_fullinfo() after com-
468         piling a pattern. If the pattern is studied, the results  of  that  are         piling a pattern. If the pattern is studied, the results  of  that  are
469         also output.         also output. In this output, the word "char" means a non-UTF character,
470           that is, the value of a single data item  (8-bit,  16-bit,  or  32-bit,
471           depending on the library that is being tested).
472    
473         The  /K modifier requests pcretest to show names from backtracking con-         The  /K modifier requests pcretest to show names from backtracking con-
474         trol verbs that are  returned  from  calls  to  pcre[16|32]_exec().  It         trol verbs that are  returned  from  calls  to  pcre[16|32]_exec().  It
# Line 493  PATTERN MODIFIERS Line 499  PATTERN MODIFIERS
499         pattern is successfully studied with the PCRE_STUDY_JIT_COMPILE option,         pattern is successfully studied with the PCRE_STUDY_JIT_COMPILE option,
500         the size of the JIT compiled code is also output.         the size of the JIT compiled code is also output.
501    
502         The /S modifier causes  pcre[16|32]_study()  to  be  called  after  the         The /Q modifier is used to test the use of pcre_stack_guard. It must be
503         expression  has been compiled, and the results used when the expression         followed  by '0' or '1', specifying the return code to be given from an
504           external function that is passed to PCRE and used  for  stack  checking
505           during compilation (see the pcreapi documentation for details).
506    
507           The  /S  modifier  causes  pcre[16|32]_study()  to  be called after the
508           expression has been compiled, and the results used when the  expression
509         is matched. There are a number of qualifying characters that may follow         is matched. There are a number of qualifying characters that may follow
510         /S.  They may appear in any order.         /S.  They may appear in any order.
511    
512         If /S is followed by an exclamation mark, pcre[16|32]_study() is called         If /S is followed by an exclamation mark, pcre[16|32]_study() is called
513         with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return  a         with  the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a
514         pcre_extra block, even when studying discovers no useful information.         pcre_extra block, even when studying discovers no useful information.
515    
516         If /S is followed by a second S character, it suppresses studying, even         If /S is followed by a second S character, it suppresses studying, even
517         if it was requested externally by the  -s  command  line  option.  This         if  it  was  requested  externally  by the -s command line option. This
518         makes  it possible to specify that certain patterns are always studied,         makes it possible to specify that certain patterns are always  studied,
519         and others are never studied, independently of -s. This feature is used         and others are never studied, independently of -s. This feature is used
520         in the test files in a few cases where the output is different when the         in the test files in a few cases where the output is different when the
521         pattern is studied.         pattern is studied.
522    
523         If the  /S  modifier  is  followed  by  a  +  character,  the  call  to         If  the  /S  modifier  is  followed  by  a  +  character,  the  call to
524         pcre[16|32]_study()  is made with all the JIT study options, requesting         pcre[16|32]_study() is made with all the JIT study options,  requesting
525         just-in-time optimization support if it is available, for  both  normal         just-in-time  optimization  support if it is available, for both normal
526         and  partial matching. If you want to restrict the JIT compiling modes,         and partial matching. If you want to restrict the JIT compiling  modes,
527         you can follow /S+ with a digit in the range 1 to 7:         you can follow /S+ with a digit in the range 1 to 7:
528    
529           1  normal match only           1  normal match only
# Line 523  PATTERN MODIFIERS Line 534  PATTERN MODIFIERS
534           7  all three modes (default)           7  all three modes (default)
535    
536         If /S++ is used instead of /S+ (with or without a following digit), the         If /S++ is used instead of /S+ (with or without a following digit), the
537         text  "(JIT)"  is  added  to  the first output line after a match or no         text "(JIT)" is added to the first output line  after  a  match  or  no
538         match when JIT-compiled code was actually used.         match when JIT-compiled code was actually used.
539    
540         Note that there is also an independent /+  modifier;  it  must  not  be         Note  that  there  is  also  an independent /+ modifier; it must not be
541         given immediately after /S or /S+ because this will be misinterpreted.         given immediately after /S or /S+ because this will be misinterpreted.
542    
543         If JIT studying is successful, the compiled JIT code will automatically         If JIT studying is successful, the compiled JIT code will automatically
544         be used when pcre[16|32]_exec() is run, except when  incompatible  run-         be  used  when pcre[16|32]_exec() is run, except when incompatible run-
545         time  options are specified. For more details, see the pcrejit documen-         time options are specified. For more details, see the pcrejit  documen-
546         tation. See also the \J escape sequence below for a way of setting  the         tation.  See also the \J escape sequence below for a way of setting the
547         size of the JIT stack.         size of the JIT stack.
548    
549         Finally,  if  /S  is  followed by a minus character, JIT compilation is         Finally, if /S is followed by a minus  character,  JIT  compilation  is
550         suppressed, even if it was requested externally by the -s command  line         suppressed,  even if it was requested externally by the -s command line
551         option.  This makes it possible to specify that JIT is never to be used         option. This makes it possible to specify that JIT is never to be  used
552         for certain patterns.         for certain patterns.
553    
554         The /T modifier must be followed by a single digit. It  causes  a  spe-         The  /T  modifier  must be followed by a single digit. It causes a spe-
555         cific set of built-in character tables to be passed to pcre[16|32]_com-         cific set of built-in character tables to be passed to pcre[16|32]_com-
556         pile(). It is used in the standard PCRE tests to check  behaviour  with         pile().  It  is used in the standard PCRE tests to check behaviour with
557         different character tables. The digit specifies the tables as follows:         different character tables. The digit specifies the tables as follows:
558    
559           0   the default ASCII tables, as distributed in           0   the default ASCII tables, as distributed in
560                 pcre_chartables.c.dist                 pcre_chartables.c.dist
561           1   a set of tables defining ISO 8859 characters           1   a set of tables defining ISO 8859 characters
562    
563         In  table 1, some characters whose codes are greater than 128 are iden-         In table 1, some characters whose codes are greater than 128 are  iden-
564         tified as letters, digits, spaces, etc.         tified as letters, digits, spaces, etc.
565    
566     Using the POSIX wrapper API     Using the POSIX wrapper API
567    
568         The /P modifier causes pcretest to call PCRE via the POSIX wrapper  API         The  /P modifier causes pcretest to call PCRE via the POSIX wrapper API
569         rather  than its native API. This supports only the 8-bit library. When         rather than its native API. This supports only the 8-bit library.  When
570         /P is set, the following modifiers set options for the regcomp()  func-         /P  is set, the following modifiers set options for the regcomp() func-
571         tion:         tion:
572    
573           /i    REG_ICASE           /i    REG_ICASE
# Line 567  PATTERN MODIFIERS Line 578  PATTERN MODIFIERS
578           /W    REG_UCP        )   the POSIX standard           /W    REG_UCP        )   the POSIX standard
579           /8    REG_UTF8       )           /8    REG_UTF8       )
580    
581         The  /+  modifier  works  as  described  above. All other modifiers are         The /+ modifier works as  described  above.  All  other  modifiers  are
582         ignored.         ignored.
583    
584     Locking out certain modifiers     Locking out certain modifiers
585    
586         PCRE can be compiled with or without support for certain features  such         PCRE  can be compiled with or without support for certain features such
587         as  UTF-8/16/32  or Unicode properties. Accordingly, the standard tests         as UTF-8/16/32 or Unicode properties. Accordingly, the  standard  tests
588         are split up into a number of different files  that  are  selected  for         are  split  up  into  a number of different files that are selected for
589         running  depending  on  which features are available. When updating the         running depending on which features are available.  When  updating  the
590         tests, it is all too easy to put a new test into the wrong file by mis-         tests, it is all too easy to put a new test into the wrong file by mis-
591         take;  for example, to put a test that requires UTF support into a file         take; for example, to put a test that requires UTF support into a  file
592         that is used when it is not available. To help detect such mistakes  as         that  is used when it is not available. To help detect such mistakes as
593         early  as  possible, there is a facility for locking out specific modi-         early as possible, there is a facility for locking out  specific  modi-
594         fiers. If an input line for pcretest starts with the string "< forbid "         fiers. If an input line for pcretest starts with the string "< forbid "
595         the  following  sequence  of characters is taken as a list of forbidden         the following sequence of characters is taken as a  list  of  forbidden
596         modifiers. For example, in the test files that must not use UTF or Uni-         modifiers. For example, in the test files that must not use UTF or Uni-
597         code property support, this line appears:         code property support, this line appears:
598    
599           < forbid 8W           < forbid 8W
600    
601         This  locks out the /8 and /W modifiers. An immediate error is given if         This locks out the /8 and /W modifiers. An immediate error is given  if
602         they are subsequently encountered. If the character string  contains  <         they  are  subsequently encountered. If the character string contains <
603         but  not  >,  all  the  multi-character modifiers that begin with < are         but not >, all the multi-character modifiers  that  begin  with  <  are
604         locked out. Otherwise, such modifiers must be  explicitly  listed,  for         locked  out.  Otherwise,  such modifiers must be explicitly listed, for
605         example:         example:
606    
607           < forbid <JS><cr>           < forbid <JS><cr>
608    
609         There must be a single space between < and "forbid" for this feature to         There must be a single space between < and "forbid" for this feature to
610         be recognised. If there is not, the line is  interpreted  either  as  a         be  recognised.  If  there  is not, the line is interpreted either as a
611         request  to  re-load  a pre-compiled pattern (see "SAVING AND RELOADING         request to re-load a pre-compiled pattern (see  "SAVING  AND  RELOADING
612         COMPILED PATTERNS" below) or, if there is another < character,  as  a         COMPILED  PATTERNS"  below) or, if there is another < character, as a
613         pattern that uses < as its delimiter.         pattern that uses < as its delimiter.
614    
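The "< forbid " rule described above can be sketched in C. This is only an illustration of the documented behaviour, not pcretest's actual source; the function names forbid_list and forbids_all_angle_modifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if the input line starts with "< forbid "
   (exactly one space between '<' and "forbid"), return a pointer to
   the list of forbidden modifiers that follows; otherwise NULL. */
const char *forbid_list(const char *line)
{
    static const char prefix[] = "< forbid ";
    size_t n = sizeof prefix - 1;

    assert(line != NULL);
    return strncmp(line, prefix, n) == 0 ? line + n : NULL;
}

/* Hypothetical helper: a list that contains '<' but no '>' locks out
   every multi-character modifier that begins with '<'. */
bool forbids_all_angle_modifiers(const char *list)
{
    assert(list != NULL);
    return strchr(list, '<') != NULL && strchr(list, '>') == NULL;
}
```

With these helpers, "< forbid 8W" yields the list "8W", while "<forbid 8W" (no space) is not treated as a forbid line at all.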
615    
616  DATA LINES  DATA LINES
617    
618         Before  each  data  line  is  passed to pcre[16|32]_exec(), leading and         Before each data line is  passed  to  pcre[16|32]_exec(),  leading  and
619         trailing white space is removed, and it is then scanned for \  escapes.         trailing  white space is removed, and it is then scanned for \ escapes.
620         Some  of  these are pretty esoteric features, intended for checking out         Some of these are pretty esoteric features, intended for  checking  out
621         some of the more complicated features of PCRE. If you are just  testing         some  of the more complicated features of PCRE. If you are just testing
622         "ordinary"  regular  expressions, you probably don't need any of these.         "ordinary" regular expressions, you probably don't need any  of  these.
623         The following escapes are recognized:         The following escapes are recognized:
624    
625           \a         alarm (BEL, \x07)           \a         alarm (BEL, \x07)
# Line 669  DATA LINES Line 680  DATA LINES
680                        (any number of digits)                        (any number of digits)
681           \R         pass the PCRE_DFA_RESTART option to pcre[16|32]_dfa_exec()           \R         pass the PCRE_DFA_RESTART option to pcre[16|32]_dfa_exec()
682           \S         output details of memory get/free calls during matching           \S         output details of memory get/free calls during matching
683           \Y            pass    the    PCRE_NO_START_OPTIMIZE     option     to           \Y             pass     the    PCRE_NO_START_OPTIMIZE    option    to
684         pcre[16|32]_exec()         pcre[16|32]_exec()
685                        or pcre[16|32]_dfa_exec()                        or pcre[16|32]_dfa_exec()
686           \Z         pass the PCRE_NOTEOL option to pcre[16|32]_exec()           \Z         pass the PCRE_NOTEOL option to pcre[16|32]_exec()
# Line 678  DATA LINES Line 689  DATA LINES
689                        pcre[16|32]_exec() or pcre[16|32]_dfa_exec()                        pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
690           \>dd       start the match at offset dd (optional "-"; then           \>dd       start the match at offset dd (optional "-"; then
691                        any number of digits); this sets the startoffset                        any number of digits); this sets the startoffset
692                        argument         for        pcre[16|32]_exec()        or                        argument        for        pcre[16|32]_exec()         or
693         pcre[16|32]_dfa_exec()         pcre[16|32]_dfa_exec()
694           \<cr>      pass the PCRE_NEWLINE_CR option to pcre[16|32]_exec()           \<cr>      pass the PCRE_NEWLINE_CR option to pcre[16|32]_exec()
695                        or pcre[16|32]_dfa_exec()                        or pcre[16|32]_dfa_exec()
# Line 691  DATA LINES Line 702  DATA LINES
702           \<any>     pass the PCRE_NEWLINE_ANY option to pcre[16|32]_exec()           \<any>     pass the PCRE_NEWLINE_ANY option to pcre[16|32]_exec()
703                        or pcre[16|32]_dfa_exec()                        or pcre[16|32]_dfa_exec()
704    
705         The use of \x{hh...} is not dependent on the use of the /8 modifier  on         The  use of \x{hh...} is not dependent on the use of the /8 modifier on
706         the  pattern. It is recognized always. There may be any number of hexa-         the pattern. It is recognized always. There may be any number of  hexa-
707         decimal digits inside the braces; invalid  values  provoke  error  mes-         decimal  digits  inside  the  braces; invalid values provoke error mes-
708         sages.         sages.
709    
710         Note  that  \xhh  specifies one byte rather than one character in UTF-8         Note that \xhh specifies one byte rather than one  character  in  UTF-8
711         mode; this makes it possible to construct invalid UTF-8  sequences  for         mode;  this  makes it possible to construct invalid UTF-8 sequences for
712         testing  purposes.  On the other hand, \x{hh} is interpreted as a UTF-8         testing purposes. On the other hand, \x{hh} is interpreted as  a  UTF-8
713         character in UTF-8 mode, generating more than one byte if the value  is         character  in UTF-8 mode, generating more than one byte if the value is
714         greater  than  127.   When testing the 8-bit library not in UTF-8 mode,         greater than 127.  When testing the 8-bit library not  in  UTF-8  mode,
715         \x{hh} generates one byte for values less than 256, and causes an error         \x{hh} generates one byte for values less than 256, and causes an error
716         for greater values.         for greater values.
717    
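To make the difference between \xhh (one byte) and \x{hh} (one UTF-8 character) concrete, here is a minimal C sketch of standard UTF-8 encoding. It is not taken from pcretest; the name encode_utf8 is hypothetical, the sketch covers only code points up to 0x10FFFF, and it does not reject surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if the value is
   outside the standard Unicode range. */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* one byte: values up to 127 */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                   /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, \x{e9} in UTF-8 mode corresponds to the two bytes 0xC3 0xA9, whereas \xe9 is the single byte 0xE9, which on its own is not valid UTF-8.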
718         In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it         In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
719         possible to construct invalid UTF-16 sequences for testing purposes.         possible to construct invalid UTF-16 sequences for testing purposes.
720    
721         In UTF-32 mode, all 4- to 8-digit \x{...}  values  are  accepted.  This         In  UTF-32  mode,  all  4- to 8-digit \x{...} values are accepted. This
722         makes  it  possible  to  construct invalid UTF-32 sequences for testing         makes it possible to construct invalid  UTF-32  sequences  for  testing
723         purposes.         purposes.
724    
725         The escapes that specify line ending  sequences  are  literal  strings,         The  escapes  that  specify  line ending sequences are literal strings,
726         exactly as shown. No more than one newline setting should be present in         exactly as shown. No more than one newline setting should be present in
727         any data line.         any data line.
728    
729         A backslash followed by anything else just escapes the  anything  else.         A  backslash  followed by anything else just escapes the anything else.
730         If  the very last character is a backslash, it is ignored. This gives a         If the very last character is a backslash, it is ignored. This gives  a
731         way of passing an empty line as data, since a real  empty  line  termi-         way  of  passing  an empty line as data, since a real empty line termi-
732         nates the data input.         nates the data input.
733    
734         The  \J escape provides a way of setting the maximum stack size that is         The \J escape provides a way of setting the maximum stack size that  is
735         used by the just-in-time optimization code. It is ignored if JIT  opti-         used  by the just-in-time optimization code. It is ignored if JIT opti-
736         mization  is  not being used. Providing a stack that is larger than the         mization is not being used. Providing a stack that is larger  than  the
737         default 32K is necessary only for very complicated patterns.         default 32K is necessary only for very complicated patterns.
738    
739         If \M is present, pcretest calls pcre[16|32]_exec() several times, with         If \M is present, pcretest calls pcre[16|32]_exec() several times, with
740         different values in the match_limit and match_limit_recursion fields of         different values in the match_limit and match_limit_recursion fields of
741         the pcre[16|32]_extra data structure, until it finds the  minimum  num-         the  pcre[16|32]_extra  data structure, until it finds the minimum num-
742         bers for each parameter that allow pcre[16|32]_exec() to complete with-         bers for each parameter that allow pcre[16|32]_exec() to complete with-
743         out error. Because this is testing a specific  feature  of  the  normal         out  error.  Because  this  is testing a specific feature of the normal
744         interpretive pcre[16|32]_exec() execution, the use of any JIT optimiza-         interpretive pcre[16|32]_exec() execution, the use of any JIT optimiza-
745         tion that might have been set up by the /S+ qualifier or -s+ option  is         tion  that might have been set up by the /S+ qualifier or -s+ option is
746         disabled.         disabled.
747    
748         The  match_limit number is a measure of the amount of backtracking that         The match_limit number is a measure of the amount of backtracking  that
749         takes place, and checking it out can be instructive.  For  most  simple         takes  place,  and  checking it out can be instructive. For most simple
750         matches,  the  number  is quite small, but for patterns with very large         matches, the number is quite small, but for patterns  with  very  large
751         numbers of matching possibilities, it can  become  large  very  quickly         numbers  of  matching  possibilities,  it can become large very quickly
752         with  increasing  length  of  subject string. The match_limit_recursion         with increasing length of  subject  string.  The  match_limit_recursion
753         number is a measure of how much stack (or, if  PCRE  is  compiled  with         number  is  a  measure  of how much stack (or, if PCRE is compiled with
754         NO_RECURSE,  how  much  heap)  memory  is  needed to complete the match         NO_RECURSE, how much heap) memory  is  needed  to  complete  the  match
755         attempt.         attempt.
756    
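The search that \M performs can be pictured as a doubling phase followed by a binary search. The C sketch below illustrates that idea only; it is not pcretest's implementation. It assumes that success is monotonic in the limit, and the names attempt_fn, find_min_limit, and needs_at_least are hypothetical (needs_at_least stands in for a real matching call made with a given limit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true when the match attempt completes
   without hitting the given limit. */
typedef bool (*attempt_fn)(unsigned long limit, void *ctx);

/* Find the smallest limit in [1, cap] for which attempt() succeeds,
   assuming that any larger limit also succeeds. Returns 0 if even cap
   is not enough. */
unsigned long find_min_limit(attempt_fn attempt, void *ctx, unsigned long cap)
{
    unsigned long lo = 0;    /* largest limit known (or assumed) to fail */
    unsigned long hi = 64;   /* first candidate; an arbitrary starting point */

    assert(attempt != NULL);
    if (hi > cap) hi = cap;

    /* Doubling phase: grow hi until the attempt succeeds. */
    while (!attempt(hi, ctx)) {
        if (hi >= cap) return 0;
        lo = hi;
        hi = (hi > cap / 2) ? cap : hi * 2;
    }

    /* Binary search: attempt fails at lo and succeeds at hi. */
    while (hi - lo > 1) {
        unsigned long mid = lo + (hi - lo) / 2;
        if (attempt(mid, ctx)) hi = mid; else lo = mid;
    }
    return hi;
}

/* Stand-in for a real match: succeeds once the limit reaches *ctx. */
bool needs_at_least(unsigned long limit, void *ctx)
{
    return limit >= *(unsigned long *)ctx;
}
```

Running the same search once for match_limit and once for match_limit_recursion would give the pair of minimum numbers described above.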
757         When \O is used, the value specified may be higher or  lower  than  the         When  \O  is  used, the value specified may be higher or lower than the
758         size set by the -O command line option (or defaulted to 45); \O applies         size set by the -O command line option (or defaulted to 45); \O applies
759         only to the call  of  pcre[16|32]_exec()  for  the  line  in  which  it         only  to  the  call  of  pcre[16|32]_exec()  for  the  line in which it
760         appears.         appears.
761    
762         If  the /P modifier was present on the pattern, causing the POSIX wrap-         If the /P modifier was present on the pattern, causing the POSIX  wrap-
763         per API to be used, the only option-setting  sequences  that  have  any         per  API  to  be  used, the only option-setting sequences that have any
764         effect  are  \B,  \N,  and  \Z,  causing  REG_NOTBOL, REG_NOTEMPTY, and         effect are \B,  \N,  and  \Z,  causing  REG_NOTBOL,  REG_NOTEMPTY,  and
765         REG_NOTEOL, respectively, to be passed to regexec().         REG_NOTEOL, respectively, to be passed to regexec().
766    
767    
768  THE ALTERNATIVE MATCHING FUNCTION  THE ALTERNATIVE MATCHING FUNCTION
769    
770         By  default,  pcretest  uses  the  standard  PCRE  matching   function,         By   default,  pcretest  uses  the  standard  PCRE  matching  function,
771         pcre[16|32]_exec()  to  match  each  data  line.  PCRE also supports an         pcre[16|32]_exec() to match each  data  line.  PCRE  also  supports  an
772         alternative matching function, pcre[16|32]_dfa_exec(),  which  operates         alternative  matching  function, pcre[16|32]_dfa_exec(), which operates
773         in  a different way, and has some restrictions. The differences between         in a different way, and has some restrictions. The differences  between
774         the two functions are described in the pcrematching documentation.         the two functions are described in the pcrematching documentation.
775    
776         If a data line contains the \D escape sequence, or if the command  line         If  a data line contains the \D escape sequence, or if the command line
777         contains  the  -dfa  option, the alternative matching function is used.         contains the -dfa option, the alternative matching  function  is  used.
778         This function finds all possible matches at a given point. If, however,         This function finds all possible matches at a given point. If, however,
779         the  \F escape sequence is present in the data line, it stops after the         the \F escape sequence is present in the data line, it stops after  the
780         first match is found. This is always the shortest possible match.         first match is found. This is always the shortest possible match.
781    
782    
783  DEFAULT OUTPUT FROM PCRETEST  DEFAULT OUTPUT FROM PCRETEST
784    
785         This section describes the output when the  normal  matching  function,         This  section  describes  the output when the normal matching function,
786         pcre[16|32]_exec(), is being used.         pcre[16|32]_exec(), is being used.
787    
788         When a match succeeds, pcretest outputs the list of captured substrings         When a match succeeds, pcretest outputs the list of captured substrings
789         that pcre[16|32]_exec() returns, starting with number 0 for the  string         that  pcre[16|32]_exec() returns, starting with number 0 for the string
790         that  matched  the whole pattern. Otherwise, it outputs "No match" when         that matched the whole pattern. Otherwise, it outputs "No  match"  when
791         the return is PCRE_ERROR_NOMATCH, and "Partial match:" followed by  the         the  return is PCRE_ERROR_NOMATCH, and "Partial match:" followed by the
792         partially    matching   substring   when   pcre[16|32]_exec()   returns         partially   matching   substring   when   pcre[16|32]_exec()    returns
793         PCRE_ERROR_PARTIAL. (Note that this is the entire  substring  that  was         PCRE_ERROR_PARTIAL.  (Note  that  this is the entire substring that was
794         inspected  during  the  partial match; it may include characters before         inspected during the partial match; it may  include  characters  before
795         the actual match start if a lookbehind assertion, \K,  \b,  or  \B  was         the  actual  match  start  if a lookbehind assertion, \K, \b, or \B was
796         involved.)  For  any  other  return, pcretest outputs the PCRE negative         involved.) For any other return, pcretest  outputs  the  PCRE  negative
797         error number and a short descriptive phrase. If the error is  a  failed         error  number  and a short descriptive phrase. If the error is a failed
798         UTF  string check, the offset of the start of the failing character and         UTF string check, the offset of the start of the failing character  and
799         the reason code are also output, provided that the size of  the  output         the  reason  code are also output, provided that the size of the output
800         vector  is  at least two. Here is an example of an interactive pcretest         vector is at least two. Here is an example of an  interactive  pcretest
801         run.         run.
802    
803           $ pcretest           $ pcretest
# Line 800  DEFAULT OUTPUT FROM PCRETEST Line 811  DEFAULT OUTPUT FROM PCRETEST
811           No match           No match
812    
813         Unset capturing substrings that are not followed by one that is set are         Unset capturing substrings that are not followed by one that is set are
814         not  returned  by pcre[16|32]_exec(), and are not shown by pcretest. In         not returned by pcre[16|32]_exec(), and are not shown by  pcretest.  In
815         the following example, there are two capturing substrings, but when the         the following example, there are two capturing substrings, but when the
816         first  data  line is matched, the second, unset substring is not shown.         first data line is matched, the second, unset substring is  not  shown.
817         An "internal" unset substring is shown as "<unset>", as for the  second         An  "internal" unset substring is shown as "<unset>", as for the second
818         data line.         data line.
819    
820             re> /(a)|(b)/             re> /(a)|(b)/
# Line 815  DEFAULT OUTPUT FROM PCRETEST Line 826  DEFAULT OUTPUT FROM PCRETEST
826            1: <unset>            1: <unset>
827            2: b            2: b
828    
829         If  the strings contain any non-printing characters, they are output as         If the strings contain any non-printing characters, they are output  as
830         \xhh escapes if the value is less than 256 and UTF  mode  is  not  set.         \xhh  escapes  if  the  value is less than 256 and UTF mode is not set.
831         Otherwise they are output as \x{hh...} escapes. See below for the defi-         Otherwise they are output as \x{hh...} escapes. See below for the defi-
832         nition of non-printing characters. If the pattern has the /+  modifier,         nition  of non-printing characters. If the pattern has the /+ modifier,
833         the  output  for substring 0 is followed by the rest of the subject         the output for substring 0 is followed by the rest of  the  subject
834         string, identified by "0+" like this:         string, identified by "0+" like this:
835    
836             re> /cat/+             re> /cat/+
# Line 827  DEFAULT OUTPUT FROM PCRETEST Line 838  DEFAULT OUTPUT FROM PCRETEST
838            0: cat            0: cat
839            0+ aract            0+ aract
840    
841         If the pattern has the /g or /G modifier,  the  results  of  successive         If  the  pattern  has  the /g or /G modifier, the results of successive
842         matching attempts are output in sequence, like this:         matching attempts are output in sequence, like this:
843    
844             re> /\Bi(\w\w)/g             re> /\Bi(\w\w)/g
# Line 839  DEFAULT OUTPUT FROM PCRETEST Line 850  DEFAULT OUTPUT FROM PCRETEST
850            0: ipp            0: ipp
851            1: pp            1: pp
852    
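Successive matching can be approximated outside pcretest. The sketch below uses Python's re module rather than PCRE, so it only illustrates the idea of repeated matching attempts; the subject string "Mississippi" and the helper name successive_matches are assumptions chosen for the illustration.

```python
import re

def successive_matches(pattern, subject):
    """Collect (substring 0, substring 1) for each successive match."""
    return [(m.group(0), m.group(1)) for m in re.finditer(pattern, subject)]

# Each tuple corresponds to one "0:" / "1:" pair in pcretest's output.
results = successive_matches(r"\Bi(\w\w)", "Mississippi")
```

Each attempt starts where the previous match ended, which is why the final "i" of the subject, having no two word characters after it, produces no further match.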
853         "No  match" is output only if the first match attempt fails. Here is an         "No match" is output only if the first match attempt fails. Here is  an
854         example of a failure message (the offset 4 that is specified by \>4  is         example  of a failure message (the offset 4 that is specified by \>4 is
855         past the end of the subject string):         past the end of the subject string):
856    
857             re> /xyz/             re> /xyz/
858           data> xyz\>4           data> xyz\>4
859           Error -24 (bad offset value)           Error -24 (bad offset value)
860    
861         If  any  of the sequences \C, \G, or \L are present in a data line that         If any of the sequences \C, \G, or \L are present in a data  line  that
862         is successfully matched, the substrings extracted  by  the  convenience         is  successfully  matched,  the substrings extracted by the convenience
863         functions are output with C, G, or L after the string number instead of         functions are output with C, G, or L after the string number instead of
864         a colon. This is in addition to the normal full list. The string length         a colon. This is in addition to the normal full list. The string length
865         (that  is,  the return from the extraction function) is given in paren-         (that is, the return from the extraction function) is given  in  paren-
866         theses after each string for \C and \G.         theses after each string for \C and \G.
867    
868         Note that whereas patterns can be continued over several lines (a plain         Note that whereas patterns can be continued over several lines (a plain
869         ">" prompt is used for continuations), data lines may not. However new-         ">" prompt is used for continuations), data lines may not. However new-
870         lines can be included in data by means of the \n escape (or  \r,  \r\n,         lines  can  be included in data by means of the \n escape (or \r, \r\n,
871         etc., depending on the newline sequence setting).         etc., depending on the newline sequence setting).
872    
873    
874  OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION  OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
875    
876         When the alternative matching function, pcre[16|32]_dfa_exec(), is used         When the alternative matching function, pcre[16|32]_dfa_exec(), is used
877         (by means of the \D escape sequence or the -dfa command  line  option),         (by  means  of the \D escape sequence or the -dfa command line option),
878         the  output  consists  of  a  list of all the matches that start at the         the output consists of a list of all the  matches  that  start  at  the
879         first point in the subject where there is at least one match. For exam-         first point in the subject where there is at least one match. For exam-
880         ple:         ple:
881    
# Line 874  OUTPUT FROM THE ALTERNATIVE MATCHING FUN Line 885  OUTPUT FROM THE ALTERNATIVE MATCHING FUN
885            1: tang            1: tang
886            2: tan            2: tan
887    
888         (Using  the  normal  matching function on this data finds only "tang".)         (Using the normal matching function on this data  finds  only  "tang".)
889         The longest matching string is always given first (and numbered  zero).         The  longest matching string is always given first (and numbered zero).
890         After a PCRE_ERROR_PARTIAL return, the output is "Partial match:", fol-         After a PCRE_ERROR_PARTIAL return, the output is "Partial match:", fol-
891         lowed by the partially matching  substring.  (Note  that  this  is  the         lowed  by  the  partially  matching  substring.  (Note that this is the
892         entire  substring  that  was inspected during the partial match; it may         entire substring that was inspected during the partial  match;  it  may
893         include characters before the actual match start if a lookbehind asser-         include characters before the actual match start if a lookbehind asser-
894         tion, \K, \b, or \B was involved.)         tion, \K, \b, or \B was involved.)
895    
# Line 894  OUTPUT FROM THE ALTERNATIVE MATCHING FUN Line 905  OUTPUT FROM THE ALTERNATIVE MATCHING FUN
905            1: tan            1: tan
906            0: tan            0: tan
907    
908         Since the matching function does not  support  substring  capture,  the         Since  the  matching  function  does not support substring capture, the
909         escape  sequences  that  are concerned with captured substrings are not         escape sequences that are concerned with captured  substrings  are  not
910         relevant.         relevant.
911    
912    
913  RESTARTING AFTER A PARTIAL MATCH  RESTARTING AFTER A PARTIAL MATCH
914    
915         When the alternative matching function has given the PCRE_ERROR_PARTIAL         When the alternative matching function has given the PCRE_ERROR_PARTIAL
916         return,  indicating that the subject partially matched the pattern, you         return, indicating that the subject partially matched the pattern,  you
917         can restart the match with additional subject data by means of  the  \R         can  restart  the match with additional subject data by means of the \R
918         escape sequence. For example:         escape sequence. For example:
919    
920             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
# Line 912  RESTARTING AFTER A PARTIAL MATCH Line 923  RESTARTING AFTER A PARTIAL MATCH
923           data> n05\R\D           data> n05\R\D
924            0: n05            0: n05
925    
926         For  further  information  about  partial matching, see the pcrepartial         For further information about partial  matching,  see  the  pcrepartial
927         documentation.         documentation.
928    
929    
930  CALLOUTS  CALLOUTS
931    
932         If the pattern contains any callout requests, pcretest's callout  func-         If  the pattern contains any callout requests, pcretest's callout func-
933         tion  is  called  during  matching. This works with both matching func-         tion is called during matching. This works  with  both  matching  func-
934         tions. By default, the called function displays the callout number, the         tions. By default, the called function displays the callout number, the
935         start  and  current  positions in the text at the callout time, and the         start and current positions in the text at the callout  time,  and  the
936         next pattern item to be tested. For example:         next pattern item to be tested. For example:
937    
938           --->pqrabcdef           --->pqrabcdef
939             0    ^  ^     \d             0    ^  ^     \d
940    
941         This output indicates that  callout  number  0  occurred  for  a  match         This  output  indicates  that  callout  number  0  occurred for a match
942         attempt  starting  at  the fourth character of the subject string, when         attempt starting at the fourth character of the  subject  string,  when
943         the pointer was at the seventh character of the data, and when the next         the pointer was at the seventh character of the data, and when the next
944         pattern  item  was  \d.  Just one circumflex is output if the start and         pattern item was \d. Just one circumflex is output  if  the  start  and
945         current positions are the same.         current positions are the same.
946    
947         Callouts numbered 255 are assumed to be automatic callouts, inserted as         Callouts numbered 255 are assumed to be automatic callouts, inserted as
948         a  result  of the /C pattern modifier. In this case, instead of showing         a result of the /C pattern modifier. In this case, instead  of  showing
949         the callout number, the offset in the pattern, preceded by a  plus,  is         the  callout  number, the offset in the pattern, preceded by a plus, is
950         output. For example:         output. For example:
951    
952             re> /\d?[A-E]\*/C             re> /\d?[A-E]\*/C
# Line 948  CALLOUTS Line 959  CALLOUTS
959            0: E*            0: E*
960    
961         If a pattern contains (*MARK) items, an additional line is output when-         If a pattern contains (*MARK) items, an additional line is output when-
962         ever a change of latest mark is passed to  the  callout  function.  For         ever  a  change  of  latest mark is passed to the callout function. For
963         example:         example:
964    
965             re> /a(*MARK:X)bc/C             re> /a(*MARK:X)bc/C
# Line 962  CALLOUTS Line 973  CALLOUTS
973           +12 ^  ^           +12 ^  ^
974            0: abc            0: abc
975    
976         The  mark  changes between matching "a" and "b", but stays the same for         The mark changes between matching "a" and "b", but stays the  same  for
977         the rest of the match, so nothing more is output. If, as  a  result  of         the  rest  of  the match, so nothing more is output. If, as a result of
978         backtracking,  the  mark  reverts to being unset, the text "<unset>" is         backtracking, the mark reverts to being unset, the  text  "<unset>"  is
979         output.         output.
980    
981         The callout function in pcretest returns zero (carry  on  matching)  by         The  callout  function  in pcretest returns zero (carry on matching) by
982         default,  but you can use a \C item in a data line (as described above)         default, but you can use a \C item in a data line (as described  above)
983         to change this and other parameters of the callout.         to change this and other parameters of the callout.
984    
985         Inserting callouts can be helpful when using pcretest to check  compli-         Inserting  callouts can be helpful when using pcretest to check compli-
986         cated  regular expressions. For further information about callouts, see         cated regular expressions. For further information about callouts,  see
987         the pcrecallout documentation.         the pcrecallout documentation.
988    
989    
990  NON-PRINTING CHARACTERS  NON-PRINTING CHARACTERS
991    
992         When pcretest is outputting text in the compiled version of a  pattern,         When  pcretest is outputting text in the compiled version of a pattern,
993         bytes  other  than 32-126 are always treated as non-printing characters         bytes other than 32-126 are always treated as  non-printing  characters
994         are are therefore shown as hex escapes.         are are therefore shown as hex escapes.
995    
996         When pcretest is outputting text that is a matched part  of  a  subject         When  pcretest  is  outputting text that is a matched part of a subject
997         string,  it behaves in the same way, unless a different locale has been         string, it behaves in the same way, unless a different locale has  been
998         set for the  pattern  (using  the  /L  modifier).  In  this  case,  the         set  for  the  pattern  (using  the  /L  modifier).  In  this case, the
999         isprint() function is used to distinguish printing and non-printing characters.         isprint() function is used to distinguish printing and non-printing characters.
1000    
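The default rule (values 32-126 are printing), combined with the \xhh and \x{hh...} forms described under DEFAULT OUTPUT FROM PCRETEST, can be sketched as follows. The function name escape_char is hypothetical, the lower-case hex digits and two-digit minimum width are assumptions, and the locale case (/L with isprint()) is deliberately left out.

```python
def escape_char(code_point, utf_mode=False):
    """Return the display form of one character value under the default rule."""
    if 32 <= code_point <= 126:
        return chr(code_point)            # printing character, shown as-is
    if code_point < 256 and not utf_mode:
        return f"\\x{code_point:02x}"     # plain \xhh form
    return f"\\x{{{code_point:02x}}}"     # braced \x{hh...} form
```

For instance, a newline (value 10) would appear as \x0a without UTF mode and in the braced form with it, while any value of 256 or more always takes the braced form.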
1001    
1002  SAVING AND RELOADING COMPILED PATTERNS  SAVING AND RELOADING COMPILED PATTERNS
1003    
1004         The  facilities  described  in  this section are not available when the         The facilities described in this section are  not  available  when  the
1005         POSIX interface to PCRE is being used, that is,  when  the  /P  pattern         POSIX  interface  to  PCRE  is being used, that is, when the /P pattern
1006         modifier is specified.         modifier is specified.
1007    
1008         When the POSIX interface is not in use, you can cause pcretest to write         When the POSIX interface is not in use, you can cause pcretest to write
1009         a compiled pattern to a file, by following the modifiers with >  and  a         a  compiled  pattern to a file, by following the modifiers with > and a
1010         file name.  For example:         file name.  For example:
1011    
1012           /pattern/im >/some/file           /pattern/im >/some/file
1013    
1014         See  the pcreprecompile documentation for a discussion about saving and         See the pcreprecompile documentation for a discussion about saving  and
1015         re-using compiled patterns.  Note that if the pattern was  successfully         re-using  compiled patterns.  Note that if the pattern was successfully
1016         studied with JIT optimization, the JIT data cannot be saved.         studied with JIT optimization, the JIT data cannot be saved.
1017    
1018         The  data  that  is  written  is  binary. The first eight bytes are the         The data that is written is binary.  The  first  eight  bytes  are  the
1019         length of the compiled pattern data  followed  by  the  length  of  the         length  of  the  compiled  pattern  data  followed by the length of the
1020         optional  study  data,  each  written as four bytes in big-endian order         optional study data, each written as four  bytes  in  big-endian  order
1021         (most significant byte first). If there is no study  data  (either  the         (most  significant  byte  first). If there is no study data (either the
1022         pattern was not studied, or studying did not return any data), the sec-         pattern was not studied, or studying did not return any data), the sec-
1023         ond length is zero. The lengths are followed by an exact  copy  of  the         ond  length  is  zero. The lengths are followed by an exact copy of the
1024         compiled  pattern.  If  there is additional study data, this (excluding         compiled pattern. If there is additional study  data,  this  (excluding
1025         any JIT data) follows immediately after  the  compiled  pattern.  After         any  JIT  data)  follows  immediately after the compiled pattern. After
1026         writing the file, pcretest expects to read a new pattern.         writing the file, pcretest expects to read a new pattern.
1027    
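The layout just described (two four-byte big-endian lengths, then the compiled pattern, then any study data) can be read back with a few lines of Python. This is a sketch based only on that description; the function name split_saved_pattern is hypothetical, and the compiled pattern itself is treated as opaque bytes.

```python
import struct

def split_saved_pattern(blob):
    """Split the bytes of a saved-pattern file into (pattern, study) parts."""
    if len(blob) < 8:
        raise ValueError("file is too short to hold the two length fields")
    # Two 4-byte big-endian lengths: compiled pattern first, study data second.
    pattern_len, study_len = struct.unpack(">II", blob[:8])
    pattern = blob[8 : 8 + pattern_len]
    study = blob[8 + pattern_len : 8 + pattern_len + study_len]
    if len(pattern) != pattern_len or len(study) != study_len:
        raise ValueError("file is shorter than its header claims")
    return pattern, study
```

A study length of zero yields an empty second part, matching the "No study data" case.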
1028         A  saved  pattern  can  be reloaded into pcretest by specifying < and a         A saved pattern can be reloaded into pcretest by  specifying  <  and  a
1029         file name instead of a pattern. There must be no space  between  <  and         file  name  instead  of a pattern. There must be no space between < and
1030         the  file  name,  which  must  not  contain a < character, as otherwise         the file name, which must not  contain  a  <  character,  as  otherwise
1031         pcretest will interpret the line as a pattern delimited  by  <  charac-         pcretest  will  interpret  the line as a pattern delimited by < charac-
1032         ters. For example:         ters. For example:
1033    
1034            re> </some/file            re> </some/file
1035           Compiled pattern loaded from /some/file           Compiled pattern loaded from /some/file
1036           No study data           No study data
1037    
1038         If  the  pattern  was previously studied with the JIT optimization, the         If the pattern was previously studied with the  JIT  optimization,  the
1039         JIT information cannot be saved and restored, and so is lost. When  the         JIT  information cannot be saved and restored, and so is lost. When the
1040         pattern  has  been  loaded, pcretest proceeds to read data lines in the         pattern has been loaded, pcretest proceeds to read data  lines  in  the
1041         usual way.         usual way.
1042    
1043         You can copy a file written by pcretest to a different host and  reload         You  can copy a file written by pcretest to a different host and reload
1044         it  there,  even  if the new host has opposite endianness to the one on         it there, even if the new host has opposite endianness to  the  one  on
1045         which the pattern was compiled. For example, you can compile on an  i86         which  the pattern was compiled. For example, you can compile on an i86
1046         machine  and  run  on  a SPARC machine. When a pattern is reloaded on a         machine and run on a SPARC machine. When a pattern  is  reloaded  on  a
1047         host with different endianness, the confirmation message is changed to:         host with different endianness, the confirmation message is changed to:
1048    
1049           Compiled pattern (byte-inverted) loaded from /some/file           Compiled pattern (byte-inverted) loaded from /some/file
1050    
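This document does not say how the differing endianness is detected. One plausible approach, sketched here, is to compare a marker word at the start of the compiled pattern with its byte-swapped form. The marker value 0x50435245 (ASCII "PCRE") and its position in the first four bytes are assumptions about PCRE's internal layout, and the function name byte_order_of is hypothetical.

```python
import struct

MAGIC = 0x50435245  # assumed marker value, ASCII "PCRE"

def byte_order_of(pattern, host_order="<"):
    """Classify compiled-pattern bytes as 'native', 'byte-inverted' or 'unknown'.

    host_order is a struct prefix: '<' for a little-endian host,
    '>' for a big-endian host.
    """
    if len(pattern) < 4:
        return "unknown"
    (word,) = struct.unpack(host_order + "I", pattern[:4])
    if word == MAGIC:
        return "native"
    # Reverse the four bytes and compare again.
    swapped = int.from_bytes(word.to_bytes(4, "little"), "big")
    if swapped == MAGIC:
        return "byte-inverted"
    return "unknown"
```

Under these assumptions, a pattern saved on a big-endian host and read on a little-endian one would be classified as byte-inverted, which corresponds to the changed confirmation message.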
1051         The test suite contains some saved pre-compiled patterns with different         The test suite contains some saved pre-compiled patterns with different
1052         endianness.  These  are  reloaded  using "<!" instead of just "<". This         endianness. These are reloaded using "<!" instead  of  just  "<".  This
1053         suppresses the "(byte-inverted)" text so that the output is the same on         suppresses the "(byte-inverted)" text so that the output is the same on
1054         all  hosts.  It  also forces debugging output once the pattern has been         all hosts. It also forces debugging output once the  pattern  has  been
1055         reloaded.         reloaded.
1056    
1057         File names for saving and reloading can be absolute  or  relative,  but         File  names  for  saving and reloading can be absolute or relative, but
1058         note  that the shell facility of expanding a file name that starts with         note that the shell facility of expanding a file name that starts  with
1059         a tilde (~) is not available.         a tilde (~) is not available.
1060    
1061         The ability to save and reload files in pcretest is intended for  test-         The  ability to save and reload files in pcretest is intended for test-
1062         ing  and experimentation. It is not intended for production use because         ing and experimentation. It is not intended for production use  because
1063         only a single pattern can be written to a file. Furthermore,  there  is         only  a  single pattern can be written to a file. Furthermore, there is
1064         no  facility  for  supplying  custom  character  tables  for use with a         no facility for supplying  custom  character  tables  for  use  with  a
1065         reloaded pattern. If the original  pattern  was  compiled  with  custom         reloaded  pattern.  If  the  original  pattern was compiled with custom
1066         tables,  an  attempt to match a subject string using a reloaded pattern         tables, an attempt to match a subject string using a  reloaded  pattern
1067         is likely to cause pcretest to crash.  Finally, if you attempt to  load         is  likely to cause pcretest to crash.  Finally, if you attempt to load
1068         a file that is not in the correct format, the result is undefined.         a file that is not in the correct format, the result is undefined.
1069    
1070    
1071  SEE ALSO  SEE ALSO
1072    
1073         pcre(3),  pcre16(3),  pcre32(3),  pcreapi(3),  pcrecallout(3), pcrejit(3),         pcre(3), pcre16(3),  pcre32(3),  pcreapi(3),  pcrecallout(3),  pcrejit(3),
1074         pcrematching(3), pcrepartial(3), pcrepattern(3), pcreprecompile(3).         pcrematching(3), pcrepartial(3), pcrepattern(3), pcreprecompile(3).
1075    
1076    
# Line 1072  AUTHOR Line 1083  AUTHOR
1083    
1084  REVISION  REVISION
1085    
1086         Last updated: 12 November 2013         Last updated: 09 February 2014
1087         Copyright (c) 1997-2013 University of Cambridge.         Copyright (c) 1997-2014 University of Cambridge.
