Diff of /code/trunk/doc/pcretest.txt

revision 96 by nigel, Fri Mar 2 13:10:43 2007 UTC | revision 345 by ph10, Mon Apr 28 15:10:02 2008 UTC
# Line 85  DESCRIPTION Line 85  DESCRIPTION
85         "re>" to prompt for regular expressions, and "data>" to prompt for data         "re>" to prompt for regular expressions, and "data>" to prompt for data
86         lines.         lines.
87    
88           When  pcretest  is  built,  a  configuration option can specify that it
89           should be linked with the libreadline library. When this  is  done,  if
90           the input is from a terminal, it is read using the readline() function.
91           This provides line-editing and history facilities. The output from  the
92           -help option states whether or not readline() will be used.
93    
94         The program handles any number of sets of input on a single input file.         The program handles any number of sets of input on a single input file.
95         Each set starts with a regular expression, and continues with any  num-         Each set starts with a regular expression, and continues with any  num-
96         ber of data lines to be matched against the pattern.         ber of data lines to be matched against the pattern.
# Line 146  PATTERN MODIFIERS Line 152  PATTERN MODIFIERS
152         The following table shows additional modifiers for setting PCRE options         The following table shows additional modifiers for setting PCRE options
153         that do not correspond to anything in Perl:         that do not correspond to anything in Perl:
154    
155           /A       PCRE_ANCHORED           /A              PCRE_ANCHORED
156           /C       PCRE_AUTO_CALLOUT           /C              PCRE_AUTO_CALLOUT
157           /E       PCRE_DOLLAR_ENDONLY           /E              PCRE_DOLLAR_ENDONLY
158           /f       PCRE_FIRSTLINE           /f              PCRE_FIRSTLINE
159           /J       PCRE_DUPNAMES           /J              PCRE_DUPNAMES
160           /N       PCRE_NO_AUTO_CAPTURE           /N              PCRE_NO_AUTO_CAPTURE
161           /U       PCRE_UNGREEDY           /U              PCRE_UNGREEDY
162           /X       PCRE_EXTRA           /X              PCRE_EXTRA
163           /<cr>    PCRE_NEWLINE_CR           /<JS>           PCRE_JAVASCRIPT_COMPAT
164           /<lf>    PCRE_NEWLINE_LF           /<cr>           PCRE_NEWLINE_CR
165           /<crlf>  PCRE_NEWLINE_CRLF           /<lf>           PCRE_NEWLINE_LF
166           /<any>   PCRE_NEWLINE_ANY           /<crlf>         PCRE_NEWLINE_CRLF
167             /<anycrlf>      PCRE_NEWLINE_ANYCRLF
168         Those  specifying  line ending sequencess are literal strings as shown.           /<any>          PCRE_NEWLINE_ANY
169         This example sets multiline matching  with  CRLF  as  the  line  ending           /<bsr_anycrlf>  PCRE_BSR_ANYCRLF
170         sequence:           /<bsr_unicode>  PCRE_BSR_UNICODE
171    
172           Those  specifying  line  ending sequences are literal strings as shown,
173           but the letters can be in either  case.  This  example  sets  multiline
174           matching with CRLF as the line ending sequence:
175    
176           /^abc/m<crlf>           /^abc/m<crlf>
177    
# Line 197  PATTERN MODIFIERS Line 207  PATTERN MODIFIERS
207         subject contains multiple copies of the same substring.         subject contains multiple copies of the same substring.
208    
209         The  /B modifier is a debugging feature. It requests that pcretest out-         The  /B modifier is a debugging feature. It requests that pcretest out-
210         put a representation of the compiled byte code after compilation.         put a representation of the compiled byte code after compilation.  Nor-
211           mally  this  information contains length and offset values; however, if
212           /Z is also present, this data is replaced by spaces. This is a  special
213           feature for use in the automatic test scripts; it ensures that the same
214           output is generated for different internal link sizes.
215    
216         The /L modifier must be followed directly by the name of a locale,  for         The /L modifier must be followed directly by the name of a locale,  for
217         example,         example,
# Line 326  DATA LINES Line 340  DATA LINES
340                        or pcre_dfa_exec()                        or pcre_dfa_exec()
341           \<crlf>    pass the PCRE_NEWLINE_CRLF option to pcre_exec()           \<crlf>    pass the PCRE_NEWLINE_CRLF option to pcre_exec()
342                        or pcre_dfa_exec()                        or pcre_dfa_exec()
343             \<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to pcre_exec()
344                          or pcre_dfa_exec()
345           \<any>     pass the PCRE_NEWLINE_ANY option to pcre_exec()           \<any>     pass the PCRE_NEWLINE_ANY option to pcre_exec()
346                        or pcre_dfa_exec()                        or pcre_dfa_exec()
347    
# Line 362  DATA LINES Line 378  DATA LINES
378         The use of \x{hh...} to represent UTF-8 characters is not dependent  on         The use of \x{hh...} to represent UTF-8 characters is not dependent  on
379         the  use  of  the  /8 modifier on the pattern. It is recognized always.         the  use  of  the  /8 modifier on the pattern. It is recognized always.
380         There may be any number of hexadecimal digits inside  the  braces.  The         There may be any number of hexadecimal digits inside  the  braces.  The
381         result  is from one to six bytes, encoded according to the UTF-8 rules.         result  is  from  one  to  six bytes, encoded according to the original
382           UTF-8 rules of RFC 2279. This allows for  values  in  the  range  0  to
383           0x7FFFFFFF.  Note  that not all of those are valid Unicode code points,
384           or indeed valid UTF-8 characters according to the later  rules  in  RFC
385           3629.
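The RFC 2279 encoding described above can be illustrated with a short Python sketch. This is not pcretest's own code; the function name encode_rfc2279 is hypothetical, and the block only shows how a value in the range 0 to 0x7FFFFFFF maps onto one to six bytes.

```python
def encode_rfc2279(value: int) -> bytes:
    """Encode an integer as 1 to 6 bytes using the original UTF-8 scheme of RFC 2279.

    Hypothetical helper for illustration only; it is not part of PCRE or pcretest.
    """
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value must be in the range 0..0x7FFFFFFF")
    if value < 0x80:
        return bytes([value])  # single-byte (ASCII) case

    # (exclusive upper limit, total byte count, lead-byte marker bits)
    ranges = (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    )
    for limit, length, marker in ranges:
        if value < limit:
            break

    out = bytearray(length)
    # Fill continuation bytes from the end, six payload bits at a time.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (value & 0x3F)
        value >>= 6
    out[0] = marker | value  # remaining high bits go into the lead byte
    return bytes(out)
```

For values that are also valid under RFC 3629, the result agrees with Python's built-in UTF-8 encoder. Values above 0x10FFFF produce the five- and six-byte forms that the later RFC no longer allows.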
386    
387    
388  THE ALTERNATIVE MATCHING FUNCTION  THE ALTERNATIVE MATCHING FUNCTION
389    
390         By  default,  pcretest  uses  the  standard  PCRE  matching   function,         By   default,  pcretest  uses  the  standard  PCRE  matching  function,
391         pcre_exec() to match each data line. From release 6.0, PCRE supports an         pcre_exec() to match each data line. From release 6.0, PCRE supports an
392         alternative matching function, pcre_dfa_exec(),  which  operates  in  a         alternative  matching  function,  pcre_dfa_exec(),  which operates in a
393         different  way,  and has some restrictions. The differences between the         different way, and has some restrictions. The differences  between  the
394         two functions are described in the pcrematching documentation.         two functions are described in the pcrematching documentation.
395    
396         If a data line contains the \D escape sequence, or if the command  line         If  a data line contains the \D escape sequence, or if the command line
397         contains  the -dfa option, the alternative matching function is called.         contains the -dfa option, the alternative matching function is  called.
398         This function finds all possible matches at a given point. If, however,         This function finds all possible matches at a given point. If, however,
399         the  \F escape sequence is present in the data line, it stops after the         the \F escape sequence is present in the data line, it stops after  the
400         first match is found. This is always the shortest possible match.         first match is found. This is always the shortest possible match.
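The difference between finding all possible matches at a given point and stopping at the first (shortest) one can be sketched in Python. This is only an emulation using the standard re module, not the DFA algorithm itself. The helper name matches_at and its first_only parameter are hypothetical. The longest-first ordering is an assumption based on how pcre_dfa_exec() reports its results. Patterns with lookahead or end-of-subject assertions may behave differently here, because each candidate end point is treated as the end of the subject.

```python
import re


def matches_at(pattern, subject, start=0, first_only=False):
    """Return every substring beginning at `start` that the whole pattern matches.

    Hypothetical illustration: tries each possible end point in turn, so the
    shortest match is found first. With first_only=True it stops there,
    which mirrors the effect described for the \\F escape.
    """
    compiled = re.compile(pattern)
    found = []
    for end in range(start, len(subject) + 1):
        # fullmatch(string, pos, endpos) requires the pattern to span start..end
        if compiled.fullmatch(subject, start, end):
            found.append(subject[start:end])
            if first_only:
                return found
    # Report the longest match first (assumed ordering, see lead-in).
    return found[::-1]
```

For example, matches_at(r"cat(er(pillar)?)?", "caterpillar") yields three matches, while passing first_only=True yields only "cat".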
401    
402    
403  DEFAULT OUTPUT FROM PCRETEST  DEFAULT OUTPUT FROM PCRETEST
404    
405         This section describes the output when the  normal  matching  function,         This  section  describes  the output when the normal matching function,
406         pcre_exec(), is being used.         pcre_exec(), is being used.
407    
408         When a match succeeds, pcretest outputs the list of captured substrings         When a match succeeds, pcretest outputs the list of captured substrings
409         that pcre_exec() returns, starting with number 0 for  the  string  that         that  pcre_exec()  returns,  starting with number 0 for the string that
410         matched the whole pattern. Otherwise, it outputs "No match" or "Partial         matched the whole pattern. Otherwise, it outputs "No match" or "Partial
411         match" when pcre_exec() returns PCRE_ERROR_NOMATCH  or  PCRE_ERROR_PAR-         match"  when  pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PAR-
412         TIAL,  respectively, and otherwise the PCRE negative error number. Here         TIAL, respectively, and otherwise the PCRE negative error number.  Here
413         is an example of an interactive pcretest run.         is an example of an interactive pcretest run.
414    
415           $ pcretest           $ pcretest
# Line 402  DEFAULT OUTPUT FROM PCRETEST Line 422  DEFAULT OUTPUT FROM PCRETEST
422           data> xyz           data> xyz
423           No match           No match
424    
425           Note  that unset capturing substrings that are not followed by one that
426           is set are not returned by pcre_exec(), and are not shown by  pcretest.
427           In  the following example, there are two capturing substrings, but when
428           the first data line is matched, the  second,  unset  substring  is  not
429           shown.  An "internal" unset substring is shown as "<unset>", as for the
430           second data line.
431    
432               re> /(a)|(b)/
433             data> a
434              0: a
435              1: a
436             data> b
437              0: b
438              1: <unset>
439              2: b
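The trimming of trailing unset substrings can be reproduced with a small Python sketch. It uses Python's re module rather than PCRE, so it is an approximation under that assumption, and the function name format_captures is hypothetical.

```python
import re


def format_captures(pattern, subject):
    """Format captured substrings in the style shown above.

    Hypothetical helper: trailing unset groups are dropped, while an unset
    group that precedes a set one is printed as "<unset>".
    """
    match = re.search(pattern, subject)
    if match is None:
        return ["No match"]
    groups = [match.group(0)] + list(match.groups())
    # group 0 is always set on a successful match, so this loop terminates
    while groups[-1] is None:
        groups.pop()
    return [
        "%2d: %s" % (number, "<unset>" if text is None else text)
        for number, text in enumerate(groups)
    ]
```

Calling format_captures(r"(a)|(b)", "a") gives two lines, and calling it with the subject "b" gives three, with group 1 shown as "<unset>".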
440    
441         If the strings contain any non-printing characters, they are output  as         If the strings contain any non-printing characters, they are output  as
442         \0x  escapes,  or  as \x{...} escapes if the /8 modifier was present on         \0x  escapes,  or  as \x{...} escapes if the /8 modifier was present on
443         the pattern. See below for the definition of  non-printing  characters.         the pattern. See below for the definition of  non-printing  characters.
# Line 481  RESTARTING AFTER A PARTIAL MATCH Line 517  RESTARTING AFTER A PARTIAL MATCH
517         can restart the match with additional subject data by means of  the  \R         can restart the match with additional subject data by means of  the  \R
518         escape sequence. For example:         escape sequence. For example:
519    
520             re> /^?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$/             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
521           data> 23ja\P\D           data> 23ja\P\D
522           Partial match: 23ja           Partial match: 23ja
523           data> n05\R\D           data> n05\R\D
# Line 608  SEE ALSO Line 644  SEE ALSO
644  AUTHOR  AUTHOR
645    
646         Philip Hazel         Philip Hazel
647         University Computing Service,         University Computing Service
648         Cambridge CB2 3QH, England.         Cambridge CB2 3QH, England.
649    
650  Last updated: 30 November 2006  
651  Copyright (c) 1997-2006 University of Cambridge.  REVISION
652    
653           Last updated: 12 April 2008
654           Copyright (c) 1997-2008 University of Cambridge.
