Diff of /code/tags/pcre-4.3/doc/pcretest.txt

revision 41 by nigel, Sat Feb 24 21:39:17 2007 UTC revision 49 by nigel, Sat Feb 24 21:39:33 2007 UTC
# Line 7  experimenting with regular expressions. Line 7  experimenting with regular expressions.
7  If it is given two filename arguments, it reads from the first and writes to  If it is given two filename arguments, it reads from the first and writes to
8  the second. If it is given only one filename argument, it reads from that file  the second. If it is given only one filename argument, it reads from that file
9  and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and  and writes to stdout. Otherwise, it reads from stdin and writes to stdout, and
10  prompts for each line of input.  prompts for each line of input, using "re>" to prompt for regular expressions,
11    and "data>" to prompt for data lines.
12    
13  The program handles any number of sets of input on a single input file. Each  The program handles any number of sets of input on a single input file. Each
14  set starts with a regular expression, and continues with any number of data  set starts with a regular expression, and continues with any number of data
15  lines to be matched against the pattern. An empty line signals the end of the  lines to be matched against the pattern. An empty line signals the end of the
16  set. The regular expressions are given enclosed in any non-alphameric  data lines, at which point a new regular expression is read. The regular
17  delimiters other than backslash, for example  expressions are given enclosed in any non-alphameric delimiters other than
18    backslash, for example
19    
20    /(a|bc)x+yz/    /(a|bc)x+yz/
21    
22  White space before the initial delimiter is ignored. A regular expression may  White space before the initial delimiter is ignored. A regular expression may
23  be continued over several input lines, in which case the newline characters are  be continued over several input lines, in which case the newline characters are
24  included within it. See the testinput files for many examples. It is possible  included within it. See the test input files in the testdata directory for many
25  to include the delimiter within the pattern by escaping it, for example  examples. It is possible to include the delimiter within the pattern by
26    escaping it, for example
27    
28    /abc\/def/    /abc\/def/
29    
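The delimiter rules in this hunk can be sketched in Python. This is an illustration only, not pcretest's C code: the function name split_pattern_line is hypothetical, and it assumes the backslash before an escaped delimiter stays in the pattern text. It covers only the rules quoted above (leading white space ignored, any non-alphameric delimiter except backslash, escaped delimiters, continuation when no closing delimiter is found).

```python
def split_pattern_line(text):
    """Split '/pattern/modifiers' into (pattern, modifiers).

    Returns None when no closing delimiter is found, which is the case
    where the pattern would continue on the next input line.
    Hypothetical helper, not part of pcretest.
    """
    text = text.lstrip()              # white space before the delimiter is ignored
    if not text:
        raise ValueError("empty pattern line")
    delim = text[0]
    if delim.isalnum() or delim == "\\":
        raise ValueError("delimiter must be non-alphameric and not a backslash")
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2                    # skip the escaped character, e.g. \/
            continue
        if ch == delim:
            # Assumption: the pattern text is kept as typed, escapes included.
            return text[1:i], text[i + 1:].strip()
        i += 1
    return None                       # unterminated: continuation line needed


print(split_pattern_line("  /(a|bc)x+yz/"))
print(split_pattern_line(r"/abc\/def/i"))
```

A None result corresponds to the case where pcretest goes on reading the next line as part of the same regular expression.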
# Line 40  backslash, because Line 43  backslash, because
43  is interpreted as the first line of a pattern that starts with "abc/", causing  is interpreted as the first line of a pattern that starts with "abc/", causing
44  pcretest to read the next line as a continuation of the regular expression.  pcretest to read the next line as a continuation of the regular expression.
45    
46    
47    PATTERN MODIFIERS
48    -----------------
49    
50  The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,  The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
51  PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For  PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. For
52  example:  example:
# Line 60  to the matching process if the pattern b Line 67  to the matching process if the pattern b
67  (including \b or \B).  (including \b or \B).
68    
69  If any call to pcre_exec() in a /g or /G sequence matches an empty string, the  If any call to pcre_exec() in a /g or /G sequence matches an empty string, the
70  next call is done with the PCRE_NOTEMPTY flag set so that it cannot match an  next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set in order
71  empty string again at the same point. If however, this second match fails, the  to search for another, non-empty, match at the same point. If this second match
72  start offset is advanced by one, and the match is retried. This imitates the  fails, the start offset is advanced by one, and the normal match is retried.
73  way Perl handles such cases when using the /g modifier or the split() function.  This imitates the way Perl handles such cases when using the /g modifier or the
74    split() function.
75    
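The retry rule in the revised paragraph can be sketched in Python. This is an illustration, not pcretest's implementation: exec_at and global_matches are hypothetical names, and Python's re module stands in for pcre_exec(). Because re has no equivalent of the PCRE_NOTEMPTY or PCRE_ANCHORED flags, the anchored non-empty retry is only approximated, by trying each possible end offset with fullmatch().

```python
import re


def exec_at(pattern, subject, start, notempty_anchored=False):
    """Stand-in for pcre_exec(): return (start, end) of a match, or None."""
    if not notempty_anchored:
        m = pattern.search(subject, start)        # normal, unanchored match
        return m.span() if m else None
    # Approximation of PCRE_NOTEMPTY | PCRE_ANCHORED: look for a match that
    # begins exactly at `start` and ends somewhere after it. Note that
    # fullmatch() treats the end offset as the end of the string, so
    # patterns using $ or lookahead may behave differently from PCRE.
    for end in range(len(subject), start, -1):
        m = pattern.fullmatch(subject, start, end)
        if m:
            return m.span()
    return None


def global_matches(pattern, subject):
    """Collect match spans the way the /g loop is described above."""
    spans = []
    start = 0
    retry_nonempty = False
    while start <= len(subject):
        span = exec_at(pattern, subject, start, retry_nonempty)
        if span is None:
            if not retry_nonempty:
                break                 # a normal match failed: finished
            start += 1                # retry failed: advance by one ...
            retry_nonempty = False    # ... and go back to a normal match
            continue
        spans.append(span)
        start = span[1]
        # After an empty match, the next call must be anchored and non-empty.
        retry_nonempty = span[0] == span[1]
    return spans


print(global_matches(re.compile(r"a*"), "baac"))
# -> [(0, 0), (1, 3), (3, 3), (4, 4)]
```

For the pattern a* and the subject "baac" the spans correspond to the strings "", "aa", "", and "", which is the sequence Perl's /g produces for the same input.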
76  There are a number of other modifiers for controlling the way pcretest  There are a number of other modifiers for controlling the way pcretest
77  operates.  operates.
# Line 85  is, /L applies only to the expression on Line 93  is, /L applies only to the expression on
93    
94  The /I modifier requests that pcretest output information about the compiled  The /I modifier requests that pcretest output information about the compiled
95  expression (whether it is anchored, has a fixed first character, and so on). It  expression (whether it is anchored, has a fixed first character, and so on). It
96  does this by calling pcre_info() after compiling an expression, and outputting  does this by calling pcre_fullinfo() after compiling an expression, and
97  the information it gets back. If the pattern is studied, the results of that  outputting the information it gets back. If the pattern is studied, the results
98  are also output.  of that are also output.
99    
100  The /D modifier is a PCRE debugging feature, which also assumes /I. It causes  The /D modifier is a PCRE debugging feature, which also assumes /I. It causes
101  the internal form of compiled regular expressions to be output after  the internal form of compiled regular expressions to be output after
# Line 99  compiled, and the results used when the Line 107  compiled, and the results used when the
107  The /M modifier causes the size of memory block used to hold the compiled  The /M modifier causes the size of memory block used to hold the compiled
108  pattern to be output.  pattern to be output.
109    
110  Finally, the /P modifier causes pcretest to call PCRE via the POSIX wrapper API  The /P modifier causes pcretest to call PCRE via the POSIX wrapper API rather
111  rather than its native API. When this is done, all other modifiers except /i,  than its native API. When this is done, all other modifiers except /i, /m, and
112  /m, and /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is  /+ are ignored. REG_ICASE is set if /i is present, and REG_NEWLINE is set if /m
113  set if /m is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always,  is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and
114  and PCRE_DOTALL unless REG_NEWLINE is set.  PCRE_DOTALL unless REG_NEWLINE is set.
115    
116    The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option set.
117    This turns on the (currently incomplete) support for UTF-8 character handling
118    in PCRE, provided that it was compiled with this support enabled. This modifier
119    also causes any non-printing characters in output strings to be printed using
120    the \x{hh...} notation if they are valid UTF-8 sequences.
121    
122    
123    DATA LINES
124    ----------
125    
126  Before each data line is passed to pcre_exec(), leading and trailing whitespace  Before each data line is passed to pcre_exec(), leading and trailing whitespace
127  is removed, and it is then scanned for \ escapes. The following are recognized:  is removed, and it is then scanned for \ escapes. The following are recognized:
128    
129    \a     alarm (= BEL)    \a         alarm (= BEL)
130    \b     backspace    \b         backspace
131    \e     escape    \e         escape
132    \f     formfeed    \f         formfeed
133    \n     newline    \n         newline
134    \r     carriage return    \r         carriage return
135    \t     tab    \t         tab
136    \v     vertical tab    \v         vertical tab
137    \nnn   octal character (up to 3 octal digits)    \nnn       octal character (up to 3 octal digits)
138    \xhh   hexadecimal character (up to 2 hex digits)    \xhh       hexadecimal character (up to 2 hex digits)
139      \x{hh...}  hexadecimal UTF-8 character
140    \A     pass the PCRE_ANCHORED option to pcre_exec()  
141    \B     pass the PCRE_NOTBOL option to pcre_exec()    \A         pass the PCRE_ANCHORED option to pcre_exec()
142    \Cdd   call pcre_copy_substring() for substring dd after a successful match    \B         pass the PCRE_NOTBOL option to pcre_exec()
143             (any decimal number less than 32)    \Cdd       call pcre_copy_substring() for substring dd after a successful
144    \Gdd   call pcre_get_substring() for substring dd after a successful match                 match (any decimal number less than 32)
145             (any decimal number less than 32)    \Gdd       call pcre_get_substring() for substring dd after a successful
146    \L     call pcre_get_substringlist() after a successful match                 match (any decimal number less than 32)
147    \N     pass the PCRE_NOTEMPTY option to pcre_exec()    \L         call pcre_get_substringlist() after a successful match
148    \Odd   set the size of the output vector passed to pcre_exec() to dd    \N         pass the PCRE_NOTEMPTY option to pcre_exec()
149             (any number of decimal digits)    \Odd       set the size of the output vector passed to pcre_exec() to dd
150    \Z     pass the PCRE_NOTEOL option to pcre_exec()                 (any number of decimal digits)
151      \Z         pass the PCRE_NOTEOL option to pcre_exec()
152    
153  A backslash followed by anything else just escapes the anything else. If the  A backslash followed by anything else just escapes the anything else. If the
154  very last character is a backslash, it is ignored. This gives a way of passing  very last character is a backslash, it is ignored. This gives a way of passing
# Line 139  If /P was present on the regex, causing Line 158  If /P was present on the regex, causing
158  \B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to  \B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
159  regexec() respectively.  regexec() respectively.
160    
161    The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
162    of the /8 modifier on the pattern. It is recognized always. There may be any
163    number of hexadecimal digits inside the braces. The result is from one to six
164    bytes, encoded according to the UTF-8 rules.
165    
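The "one to six bytes" remark above refers to the original UTF-8 scheme, which covers values up to 0x7FFFFFFF. A minimal Python sketch of that encoding follows. The function name utf8_encode_extended is hypothetical, and the code illustrates the encoding rules only; it is not taken from pcretest.c.

```python
def utf8_encode_extended(cp):
    """Encode an integer as 1 to 6 bytes using the original UTF-8 rules."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value out of range for 6-byte UTF-8")
    if cp < 0x80:
        return bytes([cp])            # single byte, same as ASCII
    # (largest value, lead-byte marker) for sequences of 2 to 6 bytes
    limits = [
        (0x7FF, 0xC0),
        (0xFFFF, 0xE0),
        (0x1FFFFF, 0xF0),
        (0x3FFFFFF, 0xF8),
        (0x7FFFFFFF, 0xFC),
    ]
    for extra, (limit, lead) in enumerate(limits, start=1):
        if cp <= limit:
            tail = []
            for _ in range(extra):    # each trailing byte carries 6 bits
                tail.append(0x80 | (cp & 0x3F))
                cp >>= 6
            return bytes([lead | cp] + tail[::-1])


# \x{20ac} in a data line would correspond to these three bytes:
print(utf8_encode_extended(0x20AC).hex())   # -> e282ac
```

Modern UTF-8 stops at four bytes (U+10FFFF), which is why Python's own str.encode() agrees with this sketch only for values in that range.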
166    
167    OUTPUT FROM PCRETEST
168    --------------------
169    
170  When a match succeeds, pcretest outputs the list of captured substrings that  When a match succeeds, pcretest outputs the list of captured substrings that
171  pcre_exec() returns, starting with number 0 for the string that matched the  pcre_exec() returns, starting with number 0 for the string that matched the
172  whole pattern. Here is an example of an interactive pcretest run.  whole pattern. Here is an example of an interactive pcretest run.
# Line 154  whole pattern. Here is an example of an Line 182  whole pattern. Here is an example of an
182    No match    No match
183    
184  If the strings contain any non-printing characters, they are output as \0x  If the strings contain any non-printing characters, they are output as \0x
185  escapes. If the pattern has the /+ modifier, then the output for substring 0 is  escapes, or as \x{...} escapes if the /8 modifier was present on the pattern.
186  followed by the rest of the subject string, identified by "0+" like this:  If the pattern has the /+ modifier, then the output for substring 0 is followed
187    by the rest of the subject string, identified by "0+" like this:
188    
189      re> /cat/+      re> /cat/+
190    data> cataract    data> cataract
# Line 186  Note that while patterns can be continue Line 215  Note that while patterns can be continue
215  prompt is used for continuations), data lines may not. However newlines can be  prompt is used for continuations), data lines may not. However newlines can be
216  included in data by means of the \n escape.  included in data by means of the \n escape.
217    
218    
219    COMMAND LINE OPTIONS
220    --------------------
221    
222  If the -p option is given to pcretest, it is equivalent to adding /P to each  If the -p option is given to pcretest, it is equivalent to adding /P to each
223  regular expression: the POSIX wrapper API is used to call PCRE. None of the  regular expression: the POSIX wrapper API is used to call PCRE. None of the
224  following flags has any effect in this case.  following flags has any effect in this case.
# Line 204  a synonym for -m. Line 237  a synonym for -m.
237    
238  If the -t option is given, each compile, study, and match is run 20000 times  If the -t option is given, each compile, study, and match is run 20000 times
239  while being timed, and the resulting time per compile or match is output in  while being timed, and the resulting time per compile or match is output in
240  milliseconds. Do not set -t with -s, because you will then get the size output  milliseconds. Do not set -t with -m, because you will then get the size output
241  20000 times and the timing will be distorted. If you want to change the number  20000 times and the timing will be distorted. If you want to change the number
242  of repetitions used for timing, edit the definition of LOOPREPEAT at the top of  of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
243  pcretest.c  pcretest.c
244    
245  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel <ph10@cam.ac.uk>
246  January 2000  August 2000