Diff of /code/trunk/doc/pcretest.txt

revision 1403 by ph10, Wed May 1 16:39:35 2013 UTC revision 1404 by ph10, Tue Nov 19 15:36:57 2013 UTC
# Line 138  COMMAND LINE OPTIONS Line 138  COMMAND LINE OPTIONS
138                   compiled.  This  is  equivalent  to adding /M to each regular                   compiled.  This  is  equivalent  to adding /M to each regular
139                   expression. The size is given in bytes for both libraries.                   expression. The size is given in bytes for both libraries.
140    
141         -o osize  Set the number of elements in the output vector that is  used         -O        Behave as if each pattern has the /O modifier, that  is  dis-
142                   when  calling pcre[16|32]_exec() or pcre[16|32]_dfa_exec() to                   able auto-possessification for all patterns.
143                   be osize. The default value is 45, which  is  enough  for  14  
144           -o osize  Set  the number of elements in the output vector that is used
145                     when calling pcre[16|32]_exec() or pcre[16|32]_dfa_exec()  to
146                     be  osize.  The  default  value is 45, which is enough for 14
147                   capturing subexpressions for pcre[16|32]_exec() or 22 differ-                   capturing subexpressions for pcre[16|32]_exec() or 22 differ-
148                   ent matches for pcre[16|32]_dfa_exec().  The vector size  can                   ent  matches for pcre[16|32]_dfa_exec().  The vector size can
149                   be  changed  for individual matching calls by including \O in                   be changed for individual matching calls by including  \O  in
150                   the data line (see below).                   the data line (see below).
151    
152         -p        Behave as if each pattern has  the  /P  modifier;  the  POSIX         -p        Behave  as  if  each  pattern  has the /P modifier; the POSIX
153                   wrapper  API  is used to call PCRE. None of the other options                   wrapper API is used to call PCRE. None of the  other  options
154                   has any effect when -p is set. This option can be  used  only                   has  any  effect when -p is set. This option can be used only
155                   with the 8-bit library.                   with the 8-bit library.
156    
157         -q        Do  not output the version number of pcretest at the start of         -q        Do not output the version number of pcretest at the start  of
158                   execution.                   execution.
159    
160         -S size   On Unix-like systems, set the size of the run-time  stack  to         -S size   On  Unix-like  systems, set the size of the run-time stack to
161                   size megabytes.                   size megabytes.
162    
163         -s or -s+ Behave  as  if  each  pattern  has  the /S modifier; in other         -s or -s+ Behave as if each pattern  has  the  /S  modifier;  in  other
164                   words, force each pattern to be studied. If -s+ is used,  all                   words,  force each pattern to be studied. If -s+ is used, all
165                   the  JIT  compile  options are passed to pcre[16|32]_study(),                   the JIT compile options are  passed  to  pcre[16|32]_study(),
166                   causing just-in-time optimization to  be  set  up  if  it  is                   causing  just-in-time  optimization  to  be  set  up if it is
167                   available,  for  both full and partial matching. Specific JIT                   available, for both full and partial matching.  Specific  JIT
168                   compile options can be selected by following -s+ with a digit                   compile options can be selected by following -s+ with a digit
169                   in  the  range 1 to 7, which selects the JIT compile modes as                   in the range 1 to 7, which selects the JIT compile  modes  as
170                   follows:                   follows:
171    
172                     1  normal match only                     1  normal match only
# Line 173  COMMAND LINE OPTIONS Line 176  COMMAND LINE OPTIONS
176                     6  soft and hard partial match                     6  soft and hard partial match
177                     7  all three modes (default)                     7  all three modes (default)
178    
179                   If -s++ is used instead of -s+ (with or without  a  following                   If  -s++  is used instead of -s+ (with or without a following
180                   digit),  the  text  "(JIT)" is added to the first output line                   digit), the text "(JIT)" is added to the  first  output  line
181                   after a match or no match when JIT-compiled code was actually                   after a match or no match when JIT-compiled code was actually
182                   used.                   used.
183    
184                   Note  that  there  are  pattern options that can override -s,                   Note that there are pattern options  that  can  override  -s,
185                   either specifying no studying at all, or suppressing JIT com-                   either specifying no studying at all, or suppressing JIT com-
186                   pilation.                   pilation.
187    
188                   If  the  /I  or /D option is present on a pattern (requesting                   If the /I or /D option is present on  a  pattern  (requesting
189                   output about the compiled  pattern),  information  about  the                   output  about  the  compiled  pattern), information about the
190                   result  of  studying  is not included when studying is caused                   result of studying is not included when  studying  is  caused
191                   only by -s and neither -i nor -d is present  on  the  command                   only  by  -s  and neither -i nor -d is present on the command
192                   line.  This  behaviour  means that the output from tests that                   line. This behaviour means that the output  from  tests  that
193                   are run with and without -s should be identical, except  when                   are  run with and without -s should be identical, except when
194                   options that output information about the actual running of a                   options that output information about the actual running of a
195                   match are set.                   match are set.
196    
197                   The -M, -t, and -tm options,  which  give  information  about                   The  -M,  -t,  and  -tm options, which give information about
198                   resources  used,  are likely to produce different output with                   resources used, are likely to produce different  output  with
199                   and without -s. Output may also differ if the  /C  option  is                   and  without  -s.  Output may also differ if the /C option is
200                   present on an individual pattern. This uses callouts to trace                   present on an individual pattern. This uses callouts to trace
201                   the matching process, and  this  may  be  different  between                   the  matching  process,  and this may be different between
202                   studied  and  non-studied  patterns.  If the pattern contains                   studied and non-studied patterns.  If  the  pattern  contains
203                   (*MARK) items there may also be  differences,  for  the  same                   (*MARK)  items  there  may  also be differences, for the same
204                   reason. The -s command line option can be overridden for spe-                   reason. The -s command line option can be overridden for spe-
205                   cific patterns that should never be studied (see the /S  pat-                   cific  patterns that should never be studied (see the /S pat-
206                   tern modifier below).                   tern modifier below).
207    
208         -t        Run  each  compile, study, and match many times with a timer,         -t        Run each compile, study, and match many times with  a  timer,
209                   and output resulting time per compile or match (in  millisec-                   and  output  the resulting times per compile, study, or match
210                   onds).  Do  not set -m with -t, because you will then get the                   (in milliseconds). Do not set -m with -t,  because  you  will
211                   size output a zillion times, and  the  timing  will  be  dis-                   then get the size output a zillion times, and the timing will
212                   torted.  You  can  control  the number of iterations that are                   be distorted. You can control the number of  iterations  that
213                   used for timing by following -t with a number (as a  separate                   are used for timing by following -t with a number (as a sepa-
214                   item on the command line). For example, "-t 1000" would iter-                   rate item on the command line). For example, "-t 1000"  iter-
215                   ate 1000 times. The default is to iterate 500000 times.                   ates 1000 times.  The default is to iterate 500000 times.
216    
217         -tm       This is like -t except that it times only the matching phase,         -tm       This is like -t except that it times only the matching phase,
218                   not the compile or study phases.                   not the compile or study phases.
219    
220           -T -TM    These behave like -t and -tm, but in addition, at the end  of
221                     a run, the total times for all compiles, studies, and matches
222                     are output.
223    
224    
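The figures quoted for -o above (45 elements, 14 capturing subexpressions, 22 DFA matches) can be checked with a little arithmetic. The sketch below is illustrative only and is not part of pcretest. It assumes the output vector layout described in the pcreapi documentation: pcre[16|32]_exec() uses the first two-thirds of the vector for offset pairs, with pair 0 reserved for the whole match, while pcre[16|32]_dfa_exec() uses one pair per match. The function names are hypothetical.

```python
def max_captures_exec(osize):
    """Capturing subexpressions that fit in an output vector of osize elements.

    Assumption: one third of the vector is workspace, the rest holds
    (start, end) pairs, and pair 0 describes the whole match.
    """
    pairs = osize // 3
    return max(pairs - 1, 0)


def max_matches_dfa(osize):
    """Different matches that fit when each match takes one (start, end) pair."""
    return osize // 2


if __name__ == "__main__":
    print(max_captures_exec(45), max_matches_dfa(45))
```

With the default osize of 45 this gives 14 and 22, which agrees with the values stated in the option description.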
225  DESCRIPTION  DESCRIPTION
226    
227         If  pcretest  is  given two filename arguments, it reads from the first         If pcretest is given two filename arguments, it reads  from  the  first
228         and writes to the second. If it is given only one filename argument, it         and writes to the second. If it is given only one filename argument, it
229         reads  from  that  file  and writes to stdout. Otherwise, it reads from         reads from that file and writes to stdout.  Otherwise,  it  reads  from
230         stdin and writes to stdout, and prompts for each line of  input,  using         stdin  and  writes to stdout, and prompts for each line of input, using
231         "re>" to prompt for regular expressions, and "data>" to prompt for data         "re>" to prompt for regular expressions, and "data>" to prompt for data
232         lines.         lines.
233    
234         When pcretest is built, a configuration  option  can  specify  that  it         When  pcretest  is  built,  a  configuration option can specify that it
235         should  be  linked  with the libreadline library. When this is done, if         should be linked with the libreadline library. When this  is  done,  if
236         the input is from a terminal, it is read using the readline() function.         the input is from a terminal, it is read using the readline() function.
237         This  provides line-editing and history facilities. The output from the         This provides line-editing and history facilities. The output from  the
238         -help option states whether or not readline() will be used.         -help option states whether or not readline() will be used.
239    
240         The program handles any number of sets of input on a single input file.         The program handles any number of sets of input on a single input file.
241         Each  set starts with a regular expression, and continues with any num-         Each set starts with a regular expression, and continues with any  num-
242         ber of data lines to be matched against the pattern.         ber of data lines to be matched against that pattern.
243    
244         Each data line is matched separately and independently. If you want  to         Each  data line is matched separately and independently. If you want to
245         do multi-line matches, you have to use the \n escape sequence (or \r or         do multi-line matches, you have to use the \n escape sequence (or \r or
246         \r\n, etc., depending on the newline setting) in a single line of input         \r\n, etc., depending on the newline setting) in a single line of input
247         to  encode  the  newline  sequences. There is no limit on the length of         to encode the newline sequences. There is no limit  on  the  length  of
248         data lines; the input buffer is automatically extended  if  it  is  too         data  lines;  the  input  buffer is automatically extended if it is too
249         small.         small.
250    
251         An  empty  line signals the end of the data lines, at which point a new         An empty line signals the end of the data lines, at which point  a  new
252         regular expression is read. The regular expressions are given  enclosed         regular  expression is read. The regular expressions are given enclosed
253         in any non-alphanumeric delimiters other than backslash, for example:         in any non-alphanumeric delimiters other than backslash, for example:
254    
255           /(a|bc)x+yz/           /(a|bc)x+yz/
256    
257         White  space before the initial delimiter is ignored. A regular expres-         White space before the initial delimiter is ignored. A regular  expres-
258         sion may be continued over several input lines, in which case the  new-         sion  may be continued over several input lines, in which case the new-
259         line  characters  are included within it. It is possible to include the         line characters are included within it. It is possible to  include  the
260         delimiter within the pattern by escaping it, for example         delimiter within the pattern by escaping it, for example
261    
262           /abc\/def/           /abc\/def/
263    
264         If you do so, the escape and the delimiter form part  of  the  pattern,         If  you  do  so, the escape and the delimiter form part of the pattern,
265         but  since delimiters are always non-alphanumeric, this does not affect         but since delimiters are always non-alphanumeric, this does not  affect
266         its interpretation.  If the terminating delimiter is  immediately  fol-         its  interpretation.   If the terminating delimiter is immediately fol-
267         lowed by a backslash, for example,         lowed by a backslash, for example,
268    
269           /abc/\           /abc/\
270    
271         then  a  backslash  is added to the end of the pattern. This is done to         then a backslash is added to the end of the pattern. This  is  done  to
272         provide a way of testing the error condition that arises if  a  pattern         provide  a  way of testing the error condition that arises if a pattern
273         finishes with a backslash, because         finishes with a backslash, because
274    
275           /abc\/           /abc\/
276    
277         is  interpreted as the first line of a pattern that starts with "abc/",         is interpreted as the first line of a pattern that starts with  "abc/",
278         causing pcretest to read the next line as a continuation of the regular         causing pcretest to read the next line as a continuation of the regular
279         expression.         expression.
280    
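The delimiter rules above can be summarized in a short sketch. This is not the pcretest source; split_pattern is a hypothetical helper that handles a single input line and returns None when the pattern would continue on the next line. It assumes that a backslash always protects the character that follows it, which is what makes /abc\/ an unterminated pattern.

```python
def split_pattern(line):
    """Split one input line into (pattern, modifiers).

    Returns None if no terminating delimiter is found, meaning the
    regular expression continues on the next input line.
    """
    text = line.lstrip()          # white space before the delimiter is ignored
    if not text:
        raise ValueError("empty line")
    delim = text[0]
    if delim.isalnum() or delim == "\\":
        raise ValueError("delimiter must be non-alphanumeric and not a backslash")
    i = 1
    while i < len(text):
        if text[i] == "\\":
            i += 2                # the escape and the next character stay in the pattern
            continue
        if text[i] == delim:
            pattern = text[1:i]
            rest = text[i + 1:]
            if rest.startswith("\\"):
                pattern += "\\"   # /abc/\ adds a backslash to the end of the pattern
                rest = rest[1:]
            return pattern, rest.strip()
        i += 1
    return None
```

For example, split_pattern(r"/abc\/def/") keeps the escaped delimiter inside the pattern, and split_pattern("/abc\\/") returns None because the final slash is escaped.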
281    
282  PATTERN MODIFIERS  PATTERN MODIFIERS
283    
284         A  pattern may be followed by any number of modifiers, which are mostly         A pattern may be followed by any number of modifiers, which are  mostly
285         single characters, though some of these can  be  qualified  by  further         single  characters,  though  some  of these can be qualified by further
286         characters.   Following Perl usage, these are referred to below as, for         characters.  Following Perl usage, these are referred to below as,  for
287         example, "the /i modifier", even though the delimiter  of  the  pattern         example,  "the  /i  modifier", even though the delimiter of the pattern
288         need  not  always  be  a slash, and no slash is used when writing modi-         need not always be a slash, and no slash is  used  when  writing  modi-
289         fiers. White space may appear between the final pattern  delimiter  and         fiers.  White  space may appear between the final pattern delimiter and
290         the  first  modifier,  and between the modifiers themselves. For refer-         the first modifier, and between the modifiers  themselves.  For  refer-
291         ence, here is a complete list of  modifiers.  They  fall  into  several         ence,  here  is  a  complete  list of modifiers. They fall into several
292         groups that are described in detail in the following sections.         groups that are described in detail in the following sections.
293    
294           /8              set UTF mode           /8              set UTF mode
# Line 307  PATTERN MODIFIERS Line 314  PATTERN MODIFIERS
314           /M              show compiled memory size           /M              show compiled memory size
315           /m              set PCRE_MULTILINE           /m              set PCRE_MULTILINE
316           /N              set PCRE_NO_AUTO_CAPTURE           /N              set PCRE_NO_AUTO_CAPTURE
317             /O              set PCRE_NO_AUTO_POSSESS
318           /P              use the POSIX wrapper           /P              use the POSIX wrapper
319           /S              study the pattern after compilation           /S              study the pattern after compilation
320           /s              set PCRE_DOTALL           /s              set PCRE_DOTALL
# Line 331  PATTERN MODIFIERS Line 339  PATTERN MODIFIERS
339     Perl-compatible modifiers     Perl-compatible modifiers
340    
341         The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE,         The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE,
342         PCRE_DOTALL,   or    PCRE_EXTENDED    options,    respectively,    when         PCRE_DOTALL,    or    PCRE_EXTENDED    options,    respectively,   when
343         pcre[16|32]_compile()  is  called. These four modifier letters have the         pcre[16|32]_compile() is called. These four modifier letters  have  the
344         same effect as they do in Perl. For example:         same effect as they do in Perl. For example:
345    
346           /caseless/i           /caseless/i
# Line 340  PATTERN MODIFIERS Line 348  PATTERN MODIFIERS
348    
349     Modifiers for other PCRE options     Modifiers for other PCRE options
350    
351         The following table shows additional modifiers for  setting  PCRE  com-         The  following  table  shows additional modifiers for setting PCRE com-
352         pile-time options that do not correspond to anything in Perl:         pile-time options that do not correspond to anything in Perl:
353    
354           /8              PCRE_UTF8           ) when using the 8-bit           /8              PCRE_UTF8           ) when using the 8-bit
# Line 359  PATTERN MODIFIERS Line 367  PATTERN MODIFIERS
367           /f              PCRE_FIRSTLINE           /f              PCRE_FIRSTLINE
368           /J              PCRE_DUPNAMES           /J              PCRE_DUPNAMES
369           /N              PCRE_NO_AUTO_CAPTURE           /N              PCRE_NO_AUTO_CAPTURE
370             /O              PCRE_NO_AUTO_POSSESS
371           /U              PCRE_UNGREEDY           /U              PCRE_UNGREEDY
372           /W              PCRE_UCP           /W              PCRE_UCP
373           /X              PCRE_EXTRA           /X              PCRE_EXTRA
# Line 372  PATTERN MODIFIERS Line 381  PATTERN MODIFIERS
381           /<bsr_unicode>  PCRE_BSR_UNICODE           /<bsr_unicode>  PCRE_BSR_UNICODE
382           /<JS>           PCRE_JAVASCRIPT_COMPAT           /<JS>           PCRE_JAVASCRIPT_COMPAT
383    
384         The  modifiers  that are enclosed in angle brackets are literal strings         The modifiers that are enclosed in angle brackets are  literal  strings
385         as shown, including the angle brackets, but the letters within  can  be         as  shown,  including the angle brackets, but the letters within can be
386         in  either case.  This example sets multiline matching with CRLF as the         in either case.  This example sets multiline matching with CRLF as  the
387         line ending sequence:         line ending sequence:
388    
389           /^abc/m<CRLF>           /^abc/m<CRLF>
390    
391         As well as turning on  the  PCRE_UTF8/16/32  option,  the  /8  modifier         As  well  as  turning  on  the  PCRE_UTF8/16/32 option, the /8 modifier
392         causes  all  non-printing  characters  in  output strings to be printed         causes all non-printing characters in  output  strings  to  be  printed
393         using the \x{hh...} notation. Otherwise, those less than 0x100 are out-         using the \x{hh...} notation. Otherwise, those less than 0x100 are out-
394         put in hex without the curly brackets.         put in hex without the curly brackets.
395    
396         Full  details  of  the PCRE options are given in the pcreapi documenta-         Full details of the PCRE options are given in  the  pcreapi  documenta-
397         tion.         tion.
398    
399     Finding all matches in a string     Finding all matches in a string
400    
401         Searching for all possible matches within each subject  string  can  be         Searching  for  all  possible matches within each subject string can be
402         requested  by  the  /g  or  /G modifier. After finding a match, PCRE is         requested by the /g or /G modifier. After  finding  a  match,  PCRE  is
403         called again to search the remainder of the subject string. The differ-         called again to search the remainder of the subject string. The differ-
404         ence between /g and /G is that the former uses the startoffset argument         ence between /g and /G is that the former uses the startoffset argument
405         to pcre[16|32]_exec() to start searching at  a  new  point  within  the         to  pcre[16|32]_exec()  to  start  searching  at a new point within the
406         entire  string  (which is in effect what Perl does), whereas the latter         entire string (which is in effect what Perl does), whereas  the  latter
407         passes over a shortened substring.  This  makes  a  difference  to  the         passes  over  a  shortened  substring.  This  makes a difference to the
408         matching  process  if  the  pattern  begins with a lookbehind assertion         matching process if the pattern  begins  with  a  lookbehind  assertion
409         (including \b or \B).         (including \b or \B).
410    
411         If any call to pcre[16|32]_exec() in a /g or  /G  sequence  matches  an         If  any  call  to  pcre[16|32]_exec() in a /g or /G sequence matches an
412         empty  string, the next call is done with the PCRE_NOTEMPTY_ATSTART and         empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART  and
413         PCRE_ANCHORED flags set in order  to  search  for  another,  non-empty,         PCRE_ANCHORED  flags  set  in  order  to search for another, non-empty,
414         match  at  the same point. If this second match fails, the start offset         match at the same point. If this second match fails, the  start  offset
415         is advanced, and the normal match is retried.  This  imitates  the  way         is  advanced,  and  the  normal match is retried. This imitates the way
416         Perl handles such cases when using the /g modifier or the split() func-         Perl handles such cases when using the /g modifier or the split() func-
417         tion. Normally, the start offset is advanced by one character,  but  if         tion.  Normally,  the start offset is advanced by one character, but if
418         the  newline  convention  recognizes CRLF as a newline, and the current         the newline convention recognizes CRLF as a newline,  and  the  current
419         character is CR followed by LF, an advance of two is used.         character is CR followed by LF, an advance of two is used.
420    
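The rule for advancing the start offset after a failed retry at an empty match can be sketched as follows. This is an illustration under simplifying assumptions, not the pcretest implementation: the subject is a Python string, so one index step is one character, and crlf_is_newline is a hypothetical flag standing for "the newline convention recognizes CRLF".

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Return the offset at which the normal match is retried."""
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2         # step over CR LF as a single newline
    return offset + 1             # otherwise advance by one character
```

In a UTF mode the real advance is by one character rather than one code unit, which this sketch does not model.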
421     Other modifiers     Other modifiers
422    
423         There are yet more modifiers for controlling the way pcretest operates.         There are yet more modifiers for controlling the way pcretest operates.
424    
425         The /+ modifier requests that as well as outputting the substring  that         The  /+ modifier requests that as well as outputting the substring that
426         matched  the  entire  pattern,  pcretest  should in addition output the         matched the entire pattern, pcretest  should  in  addition  output  the
427         remainder of the subject string. This is useful  for  tests  where  the         remainder  of  the  subject  string. This is useful for tests where the
428         subject  contains multiple copies of the same substring. If the + modi-         subject contains multiple copies of the same substring. If the +  modi-
429         fier appears twice, the same action is taken for  captured  substrings.         fier  appears  twice, the same action is taken for captured substrings.
430         In  each case the remainder is output on the following line with a plus         In each case the remainder is output on the following line with a  plus
431         character following the capture number. Note that  this  modifier  must         character  following  the  capture number. Note that this modifier must
432         not  immediately follow the /S modifier because /S+ and /S++ have other         not immediately follow the /S modifier because /S+ and /S++ have  other
433         meanings.         meanings.
434    
435         The /= modifier requests that the  values  of  all  potential  captured         The  /=  modifier  requests  that  the values of all potential captured
436         parentheses  be  output after a match. By default, only those up to the         parentheses be output after a match. By default, only those up  to  the
437         highest one actually used in the match are output (corresponding to the         highest one actually used in the match are output (corresponding to the
438         return code from pcre[16|32]_exec()). Values in the offsets vector cor-         return code from pcre[16|32]_exec()). Values in the offsets vector cor-
439         responding to higher numbers should be set to -1, and these are  output         responding  to higher numbers should be set to -1, and these are output
440         as  "<unset>".  This modifier gives a way of checking that this is hap-         as "<unset>". This modifier gives a way of checking that this  is  hap-
441         pening.         pening.
442    
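The difference that /= makes can be sketched like this. The helper name, its parameters, and the exact column layout of the output are assumptions made for illustration; only the "<unset>" marker and the use of -1 for unset offsets come from the text above.

```python
def format_captures(subject, ovector, rc, group_count, show_all=False):
    """List captured substrings as numbered lines.

    ovector     flat list of offsets, two per group; -1 marks an unset group
    rc          return code from the match: highest numbered pair set, plus one
    group_count number of capturing parentheses in the pattern
    show_all    True models /=, which shows every potential capture
    """
    pair_count = group_count + 1 if show_all else rc
    lines = []
    for n in range(pair_count):
        start, end = ovector[2 * n], ovector[2 * n + 1]
        text = "<unset>" if start < 0 else subject[start:end]
        lines.append(f"{n:2d}: {text}")
    return lines
```

For a pattern such as /(a)|(b)/ matched against "a", the return code is 2, so by default only groups 0 and 1 are listed; with show_all set, group 2 is listed as well and appears as "<unset>".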
443         The /B modifier is a debugging feature. It requests that pcretest  out-         The  /B modifier is a debugging feature. It requests that pcretest out-
444         put  a  representation of the compiled code after compilation. Normally         put a representation of the compiled code after  compilation.  Normally
445         this information contains length and offset values; however, if  /Z  is         this  information  contains length and offset values; however, if /Z is
446         also  present,  this data is replaced by spaces. This is a special fea-         also present, this data is replaced by spaces. This is a  special  fea-
447         ture for use in the automatic test scripts; it ensures  that  the  same         ture  for  use  in the automatic test scripts; it ensures that the same
448         output is generated for different internal link sizes.         output is generated for different internal link sizes.
449    
450         The  /D modifier is a PCRE debugging feature, and is equivalent to /BI,         The /D modifier is a PCRE debugging feature, and is equivalent to  /BI,
451         that is, both the /B and the /I modifiers.         that is, both the /B and the /I modifiers.
452    
453         The /F modifier causes pcretest to flip the byte order  of  the  2-byte         The  /F  modifier  causes pcretest to flip the byte order of the 2-byte
454         and 4-byte fields in the compiled pattern. This facility is for testing         and 4-byte fields in the compiled pattern. This facility is for testing
455         the feature in PCRE that allows it to execute patterns that  were  com-         the  feature  in PCRE that allows it to execute patterns that were com-
456         piled on a host with a different endianness. This feature is not avail-         piled on a host with a different endianness. This feature is not avail-
457         able when the POSIX interface to PCRE is being used, that is, when  the         able  when the POSIX interface to PCRE is being used, that is, when the
458         /P pattern modifier is specified. See also the section about saving and         /P pattern modifier is specified. See also the section about saving and
459         reloading compiled patterns below.         reloading compiled patterns below.
460    
461         The /I modifier requests that pcretest  output  information  about  the         The  /I  modifier  requests  that pcretest output information about the
462         compiled  pattern (whether it is anchored, has a fixed first character,         compiled pattern (whether it is anchored, has a fixed first  character,
463         and so on). It does this by calling pcre[16|32]_fullinfo()  after  com-         and  so  on). It does this by calling pcre[16|32]_fullinfo() after com-
464         piling  a  pattern.  If the pattern is studied, the results of that are         piling a pattern. If the pattern is studied, the results  of  that  are
465         also output.         also output.
466    
467         The /K modifier requests pcretest to show names from backtracking  con-         The  /K modifier requests pcretest to show names from backtracking con-
468         trol  verbs  that  are  returned  from  calls to pcre[16|32]_exec(). It         trol verbs that are  returned  from  calls  to  pcre[16|32]_exec().  It
469         causes pcretest to create a pcre[16|32]_extra  block  if  one  has  not         causes  pcretest  to  create  a  pcre[16|32]_extra block if one has not
470         already  been  created by a call to pcre[16|32]_study(), and to set the         already been created by a call to pcre[16|32]_study(), and to  set  the
471         PCRE_EXTRA_MARK flag and the mark field  within  it,  every  time  that         PCRE_EXTRA_MARK  flag  and  the  mark  field within it, every time that
472         pcre[16|32]_exec()  is  called.  If  the  variable  that the mark field         pcre[16|32]_exec() is called. If  the  variable  that  the  mark  field
473         points to is  non-NULL  for  a  match,  non-match,  or  partial  match,         points  to  is  non-NULL  for  a  match,  non-match,  or partial match,
474         pcretest  prints  the  string  to which it points. For a match, this is         pcretest prints the string to which it points. For  a  match,  this  is
475         shown on a line by itself, tagged with "MK:". For  a  non-match  it  is         shown  on  a  line  by itself, tagged with "MK:". For a non-match it is
476         added to the message.         added to the message.
477    
478         The  /L modifier must be followed directly by the name of a locale, for         The /L modifier must be followed directly by the name of a locale,  for
479         example,         example,
480    
481           /pattern/Lfr_FR           /pattern/Lfr_FR
482    
483         For this reason, it must be the last modifier. The given locale is set,         For this reason, it must be the last modifier. The given locale is set,
484         pcre[16|32]_maketables()  is  called to build a set of character tables         pcre[16|32]_maketables() is called to build a set of  character  tables
485         for the locale, and this is then passed to  pcre[16|32]_compile()  when         for  the  locale, and this is then passed to pcre[16|32]_compile() when
486         compiling  the regular expression. Without an /L (or /T) modifier, NULL         compiling the regular expression. Without an /L (or /T) modifier,  NULL
487         is passed as the tables pointer;  that  is,  /L  applies  only  to  the         is  passed  as  the  tables  pointer;  that  is, /L applies only to the
488         expression on which it appears.         expression on which it appears.
489    
490         The  /M  modifier  causes the size in bytes of the memory block used to         The /M modifier causes the size in bytes of the memory  block  used  to
491         hold the compiled pattern to be output. This does not include the  size         hold  the compiled pattern to be output. This does not include the size
492         of  the  pcre[16|32] block; it is just the actual compiled data. If the         of the pcre[16|32] block; it is just the actual compiled data.  If  the
493         pattern is successfully studied with the PCRE_STUDY_JIT_COMPILE option,         pattern is successfully studied with the PCRE_STUDY_JIT_COMPILE option,
494         the size of the JIT compiled code is also output.         the size of the JIT compiled code is also output.
495    
496         The  /S  modifier  causes  pcre[16|32]_study()  to  be called after the         The /S modifier causes  pcre[16|32]_study()  to  be  called  after  the
497         expression has been compiled, and the results used when the  expression         expression  has been compiled, and the results used when the expression
498         is matched. There are a number of qualifying characters that may follow         is matched. There are a number of qualifying characters that may follow
499         /S.  They may appear in any order.         /S.  They may appear in any order.
500    
501         If S is followed by an exclamation mark, pcre[16|32]_study() is  called         If /S is followed by an exclamation mark, pcre[16|32]_study() is called
502         with  the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a         with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return  a
503         pcre_extra block, even when studying discovers no useful information.         pcre_extra block, even when studying discovers no useful information.
504    
505         If /S is followed by a second S character, it suppresses studying, even         If /S is followed by a second S character, it suppresses studying, even
506         if  it  was  requested  externally  by the -s command line option. This         if it was requested externally by the  -s  command  line  option.  This
507         makes it possible to specify that certain patterns are always  studied,         makes  it possible to specify that certain patterns are always studied,
508         and others are never studied, independently of -s. This feature is used         and others are never studied, independently of -s. This feature is used
509         in the test files in a few cases where the output is different when the         in the test files in a few cases where the output is different when the
510         pattern is studied.         pattern is studied.
511    
512         If  the  /S  modifier  is  followed  by  a  +  character,  the  call to         If the  /S  modifier  is  followed  by  a  +  character,  the  call  to
513         pcre[16|32]_study() is made with all the JIT study options,  requesting         pcre[16|32]_study()  is made with all the JIT study options, requesting
514         just-in-time  optimization  support if it is available, for both normal         just-in-time optimization support if it is available, for  both  normal
515         and partial matching. If you want to restrict the JIT compiling  modes,         and  partial matching. If you want to restrict the JIT compiling modes,
516         you can follow /S+ with a digit in the range 1 to 7:         you can follow /S+ with a digit in the range 1 to 7:
517    
518           1  normal match only           1  normal match only
# Line 514  PATTERN MODIFIERS Line 523  PATTERN MODIFIERS
523           7  all three modes (default)           7  all three modes (default)
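
The digit after /S+ reads naturally as a bit mask. The sketch below is an illustration only, not pcretest's code. It assumes that 1 selects normal matching, 2 soft partial matching, and 4 hard partial matching. Only the rows for 1 and 7 are visible in this excerpt, so the meanings of 2 and 4 are an assumption, and all names are hypothetical.

```python
# Assumed bit meanings. Only "1 normal match only" and "7 all three modes"
# appear in the excerpt; the values for soft and hard partial are assumptions.
JIT_NORMAL = 1
JIT_SOFT_PARTIAL = 2
JIT_HARD_PARTIAL = 4


def jit_modes(digit):
    """Return the JIT compile modes selected by the digit that follows /S+."""
    if not 1 <= digit <= 7:
        raise ValueError("the digit after /S+ must be in the range 1 to 7")
    modes = []
    if digit & JIT_NORMAL:
        modes.append("normal")
    if digit & JIT_SOFT_PARTIAL:
        modes.append("soft partial")
    if digit & JIT_HARD_PARTIAL:
        modes.append("hard partial")
    return modes


if __name__ == "__main__":
    for d in (1, 7):
        print(d, jit_modes(d))
```

Under these assumptions, /S+7 and plain /S+ request the same set of modes.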
524    
525         If /S++ is used instead of /S+ (with or without a following digit), the         If /S++ is used instead of /S+ (with or without a following digit), the
526         text "(JIT)" is added to the first output line  after  a  match  or  no         text  "(JIT)"  is  added  to  the first output line after a match or no
527         match when JIT-compiled code was actually used.         match when JIT-compiled code was actually used.
528    
529         Note  that  there  is  also  an independent /+ modifier; it must not be         Note that there is also an independent /+  modifier;  it  must  not  be
530         given immediately after /S or /S+ because this will be misinterpreted.         given immediately after /S or /S+ because this will be misinterpreted.
531    
532         If JIT studying is successful, the compiled JIT code will automatically         If JIT studying is successful, the compiled JIT code will automatically
533         be  used  when pcre[16|32]_exec() is run, except when incompatible run-         be used when pcre[16|32]_exec() is run, except when  incompatible  run-
534         time options are specified. For more details, see the pcrejit  documen-         time  options are specified. For more details, see the pcrejit documen-
535         tation.  See also the \J escape sequence below for a way of setting the         tation. See also the \J escape sequence below for a way of setting  the
536         size of the JIT stack.         size of the JIT stack.
537    
538         Finally, if /S is followed by a minus  character,  JIT  compilation  is         Finally,  if  /S  is  followed by a minus character, JIT compilation is
539         suppressed,  even if it was requested externally by the -s command line         suppressed, even if it was requested externally by the -s command  line
540         option. This makes it possible to specify that JIT is never to be  used         option.  This makes it possible to specify that JIT is never to be used
541         for certain patterns.         for certain patterns.
542    
543         The  /T  modifier  must be followed by a single digit. It causes a spe-         The /T modifier must be followed by a single digit. It  causes  a  spe-
544         cific set of built-in character tables to be passed to pcre[16|32]_com-         cific set of built-in character tables to be passed to pcre[16|32]_com-
545         pile().  It  is used in the standard PCRE tests to check behaviour with         pile(). It is used in the standard PCRE tests to check  behaviour  with
546         different character tables. The digit specifies the tables as follows:         different character tables. The digit specifies the tables as follows:
547    
548           0   the default ASCII tables, as distributed in           0   the default ASCII tables, as distributed in
549                 pcre_chartables.c.dist                 pcre_chartables.c.dist
550           1   a set of tables defining ISO 8859 characters           1   a set of tables defining ISO 8859 characters
551    
552         In table 1, some characters whose codes are greater than 128 are  iden-         In  table 1, some characters whose codes are greater than 128 are iden-
553         tified as letters, digits, spaces, etc.         tified as letters, digits, spaces, etc.
554    
555     Using the POSIX wrapper API     Using the POSIX wrapper API
556    
557         The  /P modifier causes pcretest to call PCRE via the POSIX wrapper API         The /P modifier causes pcretest to call PCRE via the POSIX wrapper  API
558         rather than its native API. This supports only the 8-bit library.  When         rather  than its native API. This supports only the 8-bit library. When
559         /P  is set, the following modifiers set options for the regcomp() func-         /P is set, the following modifiers set options for the regcomp()  func-
560         tion:         tion:
561    
562           /i    REG_ICASE           /i    REG_ICASE
# Line 558  PATTERN MODIFIERS Line 567  PATTERN MODIFIERS
567           /W    REG_UCP        )   the POSIX standard           /W    REG_UCP        )   the POSIX standard
568           /8    REG_UTF8       )           /8    REG_UTF8       )
569    
570         The /+ modifier works as  described  above.  All  other  modifiers  are         The  /+  modifier  works  as  described  above. All other modifiers are
571         ignored.         ignored.
572    
573       Locking out certain modifiers
574    
575           PCRE can be compiled with or without support for certain features  such
576           as  UTF-8/16/32  or Unicode properties. Accordingly, the standard tests
577           are split up into a number of different files  that  are  selected  for
578           running  depending  on  which features are available. When updating the
579           tests, it is all too easy to put a new test into the wrong file by mis-
580           take;  for example, to put a test that requires UTF support into a file
581           that is used when it is not available. To help detect such mistakes  as
582           early  as  possible, there is a facility for locking out specific modi-
583           fiers. If an input line for pcretest starts with the string "< forbid "
584           the  following  sequence  of characters is taken as a list of forbidden
585           modifiers. For example, in the test files that must not use UTF or Uni-
586           code property support, this line appears:
587    
588             < forbid 8W
589    
590           This  locks out the /8 and /W modifiers. An immediate error is given if
591           they are subsequently encountered. If the character string  contains  <
592           but  not  >,  all  the  multi-character modifiers that begin with < are
593           locked out. Otherwise, such modifiers must be  explicitly  listed,  for
594           example:
595    
596             < forbid <JS><cr>
597    
598           There must be a single space between < and "forbid" for this feature to
599           be recognised. If there is not, the line is  interpreted  either  as  a
600           request  to  re-load  a pre-compiled pattern (see "SAVING AND RELOADING
601           COMPILED PATTERNS" below) or, if there is another  <  character,  as  a
602           pattern that uses < as its delimiter.
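
The rules above for lines that begin with < can be sketched as a small classifier. This is a minimal illustration of the documented behaviour, not pcretest's implementation. The function names and the returned labels are hypothetical, and ignoring white space inside the forbid list is an assumption.

```python
FORBID_PREFIX = "< forbid "  # exactly one space between < and "forbid"


def classify_line(line):
    """Decide how an input line that may start with < is interpreted."""
    if not line.startswith("<"):
        return "other"
    if line.startswith(FORBID_PREFIX):
        return "forbid"
    # A second < means the line is a pattern that uses < as its delimiter.
    if "<" in line[1:]:
        return "pattern"
    # Otherwise it is a request to reload a pre-compiled pattern.
    return "reload"


def parse_forbid(line):
    """Return (single_char_modifiers, listed_multi_char_modifiers, all_multi)."""
    spec = line[len(FORBID_PREFIX):].rstrip("\n")
    singles, multis = set(), set()
    if "<" in spec and ">" not in spec:
        # A bare < locks out every multi-character modifier.
        singles = {c for c in spec if c != "<" and not c.isspace()}
        return singles, multis, True
    i = 0
    while i < len(spec):
        if spec[i] == "<":
            end = spec.find(">", i)
            if end < 0:
                raise ValueError("unterminated multi-character modifier")
            multis.add(spec[i:end + 1])
            i = end + 1
        else:
            if not spec[i].isspace():
                singles.add(spec[i])
            i += 1
    return singles, multis, False


if __name__ == "__main__":
    for text in ("< forbid 8W", "<forbid 8W", "</some/file", "<abc<i"):
        print(repr(text), "->", classify_line(text))
```

Note that "<forbid 8W", with no space after <, falls through to the reload case, which is why the single space is required.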
603    
604    
605  DATA LINES  DATA LINES
606    
# Line 583  DATA LINES Line 623  DATA LINES
623           \v         vertical tab (\x0b)           \v         vertical tab (\x0b)
624           \nnn       octal character (up to 3 octal digits); always           \nnn       octal character (up to 3 octal digits); always
625                        a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode                        a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
626             \o{dd...}  octal character (any number of octal digits)
627           \xhh       hexadecimal byte (up to 2 hex digits)           \xhh       hexadecimal byte (up to 2 hex digits)
628           \x{hh...}  hexadecimal character (any number of hex digits)           \x{hh...}  hexadecimal character (any number of hex digits)
629           \A         pass the PCRE_ANCHORED option to pcre[16|32]_exec()           \A         pass the PCRE_ANCHORED option to pcre[16|32]_exec()
# Line 974  SAVING AND RELOADING COMPILED PATTERNS Line 1015  SAVING AND RELOADING COMPILED PATTERNS
1015         writing the file, pcretest expects to read a new pattern.         writing the file, pcretest expects to read a new pattern.
1016    
1017         A  saved  pattern  can  be reloaded into pcretest by specifying < and a         A  saved  pattern  can  be reloaded into pcretest by specifying < and a
1018         file name instead of a pattern. The name of the file must not contain a         file name instead of a pattern. There must be no space  between  <  and
1019         < character, as otherwise pcretest will interpret the line as a pattern         the  file  name,  which  must  not  contain a < character, as otherwise
1020         delimited by < characters.  For example:         pcretest will interpret the line as a pattern delimited  by  <  charac-
1021           ters. For example:
1022    
1023            re> </some/file            re> </some/file
1024           Compiled pattern loaded from /some/file           Compiled pattern loaded from /some/file
1025           No study data           No study data
1026    
1027         If the pattern was previously studied with the  JIT  optimization,  the         If  the  pattern  was previously studied with the JIT optimization, the
1028         JIT  information cannot be saved and restored, and so is lost. When the         JIT information cannot be saved and restored, and so is lost. When  the
1029         pattern has been loaded, pcretest proceeds to read data  lines  in  the         pattern  has  been  loaded, pcretest proceeds to read data lines in the
1030         usual way.         usual way.
1031    
1032         You  can copy a file written by pcretest to a different host and reload         You can copy a file written by pcretest to a different host and  reload
1033         it there, even if the new host has opposite endianness to  the  one  on         it  there,  even  if the new host has opposite endianness to the one on
1034         which  the pattern was compiled. For example, you can compile on an i86         which the pattern was compiled. For example, you can compile on an  i86
1035         machine and run on a SPARC machine. When a pattern  is  reloaded  on  a         machine  and  run  on  a SPARC machine. When a pattern is reloaded on a
1036         host with different endianness, the confirmation message is changed to:         host with different endianness, the confirmation message is changed to:
1037    
1038           Compiled pattern (byte-inverted) loaded from /some/file           Compiled pattern (byte-inverted) loaded from /some/file
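
One common way to detect opposite endianness is to compare a known 32-bit marker with its byte-swapped form. The sketch below illustrates that general technique only. The marker value (the ASCII bytes "PCRE") and all names are assumptions made for the example, and nothing here describes the actual layout of a file written by pcretest.

```python
import struct

# Assumed marker for illustration: the ASCII bytes "PCRE" as a 32-bit value.
EXPECTED_MAGIC = 0x50435245


def byte_swap32(value):
    """Reverse the byte order of an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def check_magic(value):
    """Classify a 32-bit marker that was read in the host's byte order."""
    if value == EXPECTED_MAGIC:
        return "same endianness"
    if byte_swap32(value) == EXPECTED_MAGIC:
        return "byte-inverted"
    return "not recognised"


if __name__ == "__main__":
    print(check_magic(EXPECTED_MAGIC))
    print(check_magic(byte_swap32(EXPECTED_MAGIC)))
```

A loader built this way would reverse each multi-byte field after detecting the inverted marker, then report the byte-inverted case in its confirmation message.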
1039    
1040         The test suite contains some saved pre-compiled patterns with different         The test suite contains some saved pre-compiled patterns with different
1041         endianness. These are reloaded using "<!" instead  of  just  "<".  This         endianness.  These  are  reloaded  using "<!" instead of just "<". This
1042         suppresses the "(byte-inverted)" text so that the output is the same on         suppresses the "(byte-inverted)" text so that the output is the same on
1043         all hosts. It also forces debugging output once the  pattern  has  been         all  hosts.  It  also forces debugging output once the pattern has been
1044         reloaded.         reloaded.
1045    
1046         File  names  for  saving and reloading can be absolute or relative, but         File names for saving and reloading can be absolute  or  relative,  but
1047         note that the shell facility of expanding a file name that starts  with         note  that the shell facility of expanding a file name that starts with
1048         a tilde (~) is not available.         a tilde (~) is not available.
1049    
1050         The  ability to save and reload files in pcretest is intended for test-         The ability to save and reload files in pcretest is intended for  test-
1051         ing and experimentation. It is not intended for production use  because         ing  and experimentation. It is not intended for production use because
1052         only  a  single pattern can be written to a file. Furthermore, there is         only a single pattern can be written to a file. Furthermore,  there  is
1053         no facility for supplying  custom  character  tables  for  use  with  a         no  facility  for  supplying  custom  character  tables  for use with a
1054         reloaded  pattern.  If  the  original  pattern was compiled with custom         reloaded pattern. If the original  pattern  was  compiled  with  custom
1055         tables, an attempt to match a subject string using a  reloaded  pattern         tables,  an  attempt to match a subject string using a reloaded pattern
1056         is  likely to cause pcretest to crash.  Finally, if you attempt to load         is likely to cause pcretest to crash.  Finally, if you attempt to  load
1057         a file that is not in the correct format, the result is undefined.         a file that is not in the correct format, the result is undefined.
1058    
1059    
1060  SEE ALSO  SEE ALSO
1061    
1062         pcre(3), pcre16(3),  pcre32(3),  pcreapi(3),  pcrecallout(3),  pcrejit,         pcre(3),  pcre16(3),  pcre32(3),  pcreapi(3),  pcrecallout(3), pcrejit,
1063         pcrematching(3), pcrepartial(3), pcrepattern(3), pcreprecompile(3).         pcrematching(3), pcrepartial(3), pcrepattern(3), pcreprecompile(3).
1064    
1065    
# Line 1030  AUTHOR Line 1072  AUTHOR
1072    
1073  REVISION  REVISION
1074    
1075         Last updated: 26 April 2013         Last updated: 12 November 2013
1076         Copyright (c) 1997-2013 University of Cambridge.         Copyright (c) 1997-2013 University of Cambridge.
