Diff of /code/trunk/doc/pcretest.txt

revision 835 by ph10, Wed Dec 28 16:10:09 2011 UTC revision 836 by ph10, Wed Dec 28 17:16:11 2011 UTC
# Line 305  PATTERN MODIFIERS Line 305  PATTERN MODIFIERS
305         it appears.         it appears.
306    
307         The /M modifier causes the size of memory block used to hold  the  com-         The /M modifier causes the size of memory block used to hold  the  com-
308         piled pattern to be output.         piled  pattern to be output. This does not include the size of the pcre
309           block; it is just the actual compiled data. If the pattern is  success-
310         If  the  /S  modifier appears once, it causes pcre_study() to be called         fully  studied  with the PCRE_STUDY_JIT_COMPILE option, the size of the
311         after the expression has been compiled, and the results used  when  the         JIT compiled code is also output.
312         expression  is  matched.  If  /S appears twice, it suppresses studying,  
313           If the /S modifier appears once, it causes pcre_study()  to  be  called
314           after  the  expression has been compiled, and the results used when the
315           expression is matched. If /S appears  twice,  it  suppresses  studying,
316         even if it was requested externally by the -s command line option. This         even if it was requested externally by the -s command line option. This
317         makes  it possible to specify that certain patterns are always studied,         makes it possible to specify that certain patterns are always  studied,
318         and others are never studied, independently of -s. This feature is used         and others are never studied, independently of -s. This feature is used
319         in the test files in a few cases where the output is different when the         in the test files in a few cases where the output is different when the
320         pattern is studied.         pattern is studied.
321    
322         If the /S modifier is immediately followed by a + character,  the  call         If  the  /S modifier is immediately followed by a + character, the call
323         to   pcre_study()  is  made  with  the  PCRE_STUDY_JIT_COMPILE  option,         to  pcre_study()  is  made  with  the  PCRE_STUDY_JIT_COMPILE   option,
324         requesting just-in-time optimization support if it is  available.  Note         requesting  just-in-time  optimization support if it is available. Note
325         that  there  is  also  a  /+ modifier; it must not be given immediately         that there is also a /+ modifier; it  must  not  be  given  immediately
326         after /S because this will be misinterpreted. If JIT studying  is  suc-         after  /S  because this will be misinterpreted. If JIT studying is suc-
327         cessful,  it will automatically be used when pcre_exec() is run, except         cessful, it will automatically be used when pcre_exec() is run,  except
328         when incompatible run-time options are  specified.  These  include  the         when  incompatible  run-time  options  are specified. These include the
329         partial matching options; a complete list is given in the pcrejit docu-         partial matching options; a complete list is given in the pcrejit docu-
330         mentation. See also the \J escape sequence below for a way  of  setting         mentation.  See  also the \J escape sequence below for a way of setting
331         the size of the JIT stack.         the size of the JIT stack.
332    
333         The  /T  modifier  must be followed by a single digit. It causes a spe-         The /T modifier must be followed by a single digit. It  causes  a  spe-
334         cific set of built-in character tables to be passed to  pcre_compile().         cific  set of built-in character tables to be passed to pcre_compile().
335         It is used in the standard PCRE tests to check behaviour with different         It is used in the standard PCRE tests to check behaviour with different
336         character tables. The digit specifies the tables as follows:         character tables. The digit specifies the tables as follows:
337    
# Line 336  PATTERN MODIFIERS Line 339  PATTERN MODIFIERS
339                 pcre_chartables.c.dist                 pcre_chartables.c.dist
340           1   a set of tables defining ISO 8859 characters           1   a set of tables defining ISO 8859 characters
341    
342         In table 1, some characters whose codes are greater than 128 are  iden-         In  table 1, some characters whose codes are greater than 128 are iden-
343         tified as letters, digits, spaces, etc.         tified as letters, digits, spaces, etc.
344    
345     Using the POSIX wrapper API     Using the POSIX wrapper API
346    
347         The  /P modifier causes pcretest to call PCRE via the POSIX wrapper API         The /P modifier causes pcretest to call PCRE via the POSIX wrapper  API
348         rather than its native API. When /P is set, the following modifiers set         rather than its native API. When /P is set, the following modifiers set
349         options for the regcomp() function:         options for the regcomp() function:
350    
# Line 353  PATTERN MODIFIERS Line 356  PATTERN MODIFIERS
356           /W    REG_UCP        )   the POSIX standard           /W    REG_UCP        )   the POSIX standard
357           /8    REG_UTF8       )           /8    REG_UTF8       )
358    
359         The  /+  modifier  works  as  described  above. All other modifiers are         The /+ modifier works as  described  above.  All  other  modifiers  are
360         ignored.         ignored.
361    
362    
363  DATA LINES  DATA LINES
364    
365         Before each data line is passed to pcre_exec(),  leading  and  trailing         Before  each  data  line is passed to pcre_exec(), leading and trailing
366         white  space  is removed, and it is then scanned for \ escapes. Some of         white space is removed, and it is then scanned for \ escapes.  Some  of
367         these are pretty esoteric features, intended for checking out  some  of         these  are  pretty esoteric features, intended for checking out some of
368         the  more  complicated features of PCRE. If you are just testing "ordi-         the more complicated features of PCRE. If you are just  testing  "ordi-
369         nary" regular expressions, you probably don't need any  of  these.  The         nary"  regular  expressions,  you probably don't need any of these. The
370         following escapes are recognized:         following escapes are recognized:
371    
372           \a         alarm (BEL, \x07)           \a         alarm (BEL, \x07)
# Line 444  DATA LINES Line 447  DATA LINES
447           \<any>     pass the PCRE_NEWLINE_ANY option to pcre_exec()           \<any>     pass the PCRE_NEWLINE_ANY option to pcre_exec()
448                        or pcre_dfa_exec()                        or pcre_dfa_exec()
449    
450         Note  that  \xhh  always  specifies  one byte, even in UTF-8 mode; this         Note that \xhh always specifies one byte,  even  in  UTF-8  mode;  this
451         makes it possible to construct invalid UTF-8 sequences for testing pur-         makes it possible to construct invalid UTF-8 sequences for testing pur-
452         poses. On the other hand, \x{hh} is interpreted as a UTF-8 character in         poses. On the other hand, \x{hh} is interpreted as a UTF-8 character in
453         UTF-8 mode, generating more than one byte if the value is greater  than         UTF-8  mode, generating more than one byte if the value is greater than
454         127. When not in UTF-8 mode, it generates one byte for values less than         127. When not in UTF-8 mode, it generates one byte for values less than
455         256, and causes an error for greater values.         256, and causes an error for greater values.
456    
457         The escapes that specify line ending  sequences  are  literal  strings,         The  escapes  that  specify  line ending sequences are literal strings,
458         exactly as shown. No more than one newline setting should be present in         exactly as shown. No more than one newline setting should be present in
459         any data line.         any data line.
460    
461         A backslash followed by anything else just escapes the  anything  else.         A  backslash  followed by anything else just escapes the anything else.
462         If  the very last character is a backslash, it is ignored. This gives a         If the very last character is a backslash, it is ignored. This gives  a
463         way of passing an empty line as data, since a real  empty  line  termi-         way  of  passing  an empty line as data, since a real empty line termi-
464         nates the data input.         nates the data input.
465    
466         The  \J escape provides a way of setting the maximum stack size that is         The \J escape provides a way of setting the maximum stack size that  is
467         used by the just-in-time optimization code. It is ignored if JIT  opti-         used  by the just-in-time optimization code. It is ignored if JIT opti-
468         mization  is  not being used. Providing a stack that is larger than the         mization is not being used. Providing a stack that is larger  than  the
469         default 32K is necessary only for very complicated patterns.         default 32K is necessary only for very complicated patterns.
470    
471         If \M is present, pcretest calls pcre_exec() several times,  with  dif-         If  \M  is present, pcretest calls pcre_exec() several times, with dif-
472         ferent  values  in  the match_limit and match_limit_recursion fields of         ferent values in the match_limit and  match_limit_recursion  fields  of
473         the pcre_extra data structure, until it finds the minimum  numbers  for         the  pcre_extra  data structure, until it finds the minimum numbers for
474         each  parameter  that  allow  pcre_exec()  to  complete  without error.         each parameter  that  allow  pcre_exec()  to  complete  without  error.
475         Because this is testing a specific feature of the  normal  interpretive         Because  this  is testing a specific feature of the normal interpretive
476         pcre_exec()  execution, the use of any JIT optimization that might have         pcre_exec() execution, the use of any JIT optimization that might  have
477         been set up by the /S+ qualifier or the -s+ option is disabled.         been set up by the /S+ qualifier or the -s+ option is disabled.
478    
479         The match_limit number is a measure of the amount of backtracking  that         The  match_limit number is a measure of the amount of backtracking that
480         takes  place,  and  checking it out can be instructive. For most simple         takes place, and checking it out can be instructive.  For  most  simple
481         matches, the number is quite small, but for patterns  with  very  large         matches,  the  number  is quite small, but for patterns with very large
482         numbers  of  matching  possibilities,  it can become large very quickly         numbers of matching possibilities, it can  become  large  very  quickly
483         with increasing length of  subject  string.  The  match_limit_recursion         with  increasing  length  of  subject string. The match_limit_recursion
484         number  is  a  measure  of how much stack (or, if PCRE is compiled with         number is a measure of how much stack (or, if  PCRE  is  compiled  with
485         NO_RECURSE, how much heap) memory  is  needed  to  complete  the  match         NO_RECURSE,  how  much  heap)  memory  is  needed to complete the match
486         attempt.         attempt.
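
The \M search described above can be pictured as a search for the smallest limit at which a match attempt still completes. The following is a minimal sketch under stated assumptions, not pcretest's actual code: completes is a hypothetical callable standing in for "run pcre_exec() with this match_limit and report whether it finished without a limit error", the default upper bound is arbitrary, and a plain bisection is used, which may differ from the strategy pcretest itself follows.

```python
from typing import Callable


def find_min_limit(completes: Callable[[int], bool], upper: int = 10_000_000) -> int:
    """Return the smallest limit in 1..upper for which completes(limit) is True.

    Assumes monotonicity: if a match completes under some limit, it also
    completes under every larger limit.
    """
    if not completes(upper):
        raise ValueError("match does not complete even at the upper bound")
    low, high = 1, upper
    while low < high:
        mid = (low + high) // 2
        if completes(mid):
            high = mid       # mid is enough; the minimum is mid or smaller
        else:
            low = mid + 1    # mid is too small
    return low
```

The same helper could be applied once per parameter (match_limit, then match_limit_recursion), each time with a callable that varies only that one field.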
487    
488         When  \O  is  used, the value specified may be higher or lower than the         When \O is used, the value specified may be higher or  lower  than  the
489         size set by the -O command line option (or defaulted to 45); \O applies         size set by the -O command line option (or defaulted to 45); \O applies
490         only to the call of pcre_exec() for the line in which it appears.         only to the call of pcre_exec() for the line in which it appears.
491    
492         If  the /P modifier was present on the pattern, causing the POSIX wrap-         If the /P modifier was present on the pattern, causing the POSIX  wrap-
493         per API to be used, the only option-setting  sequences  that  have  any         per  API  to  be  used, the only option-setting sequences that have any
494         effect  are  \B,  \N,  and  \Z,  causing  REG_NOTBOL, REG_NOTEMPTY, and         effect are \B,  \N,  and  \Z,  causing  REG_NOTBOL,  REG_NOTEMPTY,  and
495         REG_NOTEOL, respectively, to be passed to regexec().         REG_NOTEOL, respectively, to be passed to regexec().
496    
497         The use of \x{hh...} to represent UTF-8 characters is not dependent  on         The  use of \x{hh...} to represent UTF-8 characters is not dependent on
498         the  use  of  the  /8 modifier on the pattern. It is recognized always.         the use of the /8 modifier on the pattern.  It  is  recognized  always.
499         There may be any number of hexadecimal digits inside  the  braces.  The         There  may  be  any number of hexadecimal digits inside the braces. The
500         result  is  from  one  to  six bytes, encoded according to the original         result is from one to six bytes,  encoded  according  to  the  original
501         UTF-8 rules of RFC 2279. This allows for  values  in  the  range  0  to         UTF-8  rules  of  RFC  2279.  This  allows for values in the range 0 to
502         0x7FFFFFFF.  Note  that not all of those are valid Unicode code points,         0x7FFFFFFF. Note that not all of those are valid Unicode  code  points,
503         or indeed valid UTF-8 characters according to the later  rules  in  RFC         or  indeed  valid  UTF-8 characters according to the later rules in RFC
504         3629.         3629.
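
To make the one-to-six byte encoding concrete, here is a small sketch of the original RFC 2279 scheme for values in the range 0 to 0x7FFFFFFF. The function name encode_utf8_rfc2279 is made up for illustration; this is not pcretest's own code, only an independent rendering of the encoding rules.

```python
def encode_utf8_rfc2279(value: int) -> bytes:
    """Encode a value in 0..0x7FFFFFFF using the original 1-6 byte UTF-8 scheme."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value out of range for RFC 2279 UTF-8")
    if value < 0x80:
        return bytes([value])          # plain ASCII: a single byte
    # Each entry: (exclusive upper bound, total length, lead-byte marker bits)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if value < limit:
            break
    tail = []
    for _ in range(length - 1):
        # Continuation bytes carry six bits each, lowest bits first here.
        tail.append(0x80 | (value & 0x3F))
        value >>= 6
    # The remaining high bits go into the lead byte.
    return bytes([lead | value] + tail[::-1])
```

For values that are valid Unicode code points the result agrees with a modern UTF-8 encoder; values above 0x10FFFF produce the five- and six-byte forms that RFC 3629 later disallowed.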
505    
506    
507  THE ALTERNATIVE MATCHING FUNCTION  THE ALTERNATIVE MATCHING FUNCTION
508    
509         By   default,  pcretest  uses  the  standard  PCRE  matching  function,         By  default,  pcretest  uses  the  standard  PCRE  matching   function,
510         pcre_exec() to match each data line. From release 6.0, PCRE supports an         pcre_exec() to match each data line. From release 6.0, PCRE supports an
511         alternative  matching  function,  pcre_dfa_exec(),  which operates in a         alternative matching function, pcre_dfa_exec(),  which  operates  in  a
512         different way, and has some restrictions. The differences  between  the         different  way,  and has some restrictions. The differences between the
513         two functions are described in the pcrematching documentation.         two functions are described in the pcrematching documentation.
514    
515         If  a data line contains the \D escape sequence, or if the command line         If a data line contains the \D escape sequence, or if the command  line
516         contains the -dfa option, the alternative matching function is  called.         contains  the -dfa option, the alternative matching function is called.
517         This function finds all possible matches at a given point. If, however,         This function finds all possible matches at a given point. If, however,
518         the \F escape sequence is present in the data line, it stops after  the         the  \F escape sequence is present in the data line, it stops after the
519         first match is found. This is always the shortest possible match.         first match is found. This is always the shortest possible match.
520    
521    
522  DEFAULT OUTPUT FROM PCRETEST  DEFAULT OUTPUT FROM PCRETEST
523    
524         This  section  describes  the output when the normal matching function,         This section describes the output when the  normal  matching  function,
525         pcre_exec(), is being used.         pcre_exec(), is being used.
526    
527         When a match succeeds, pcretest outputs the list of captured substrings         When a match succeeds, pcretest outputs the list of captured substrings
528         that  pcre_exec()  returns,  starting with number 0 for the string that         that pcre_exec() returns, starting with number 0 for  the  string  that
529         matched the whole pattern. Otherwise, it outputs "No  match"  when  the         matched  the  whole  pattern. Otherwise, it outputs "No match" when the
530         return is PCRE_ERROR_NOMATCH, and "Partial match:" followed by the par-         return is PCRE_ERROR_NOMATCH, and "Partial match:" followed by the par-
531         tially matching substring when pcre_exec() returns  PCRE_ERROR_PARTIAL.         tially  matching substring when pcre_exec() returns PCRE_ERROR_PARTIAL.
532         (Note  that  this is the entire substring that was inspected during the         (Note that this is the entire substring that was inspected  during  the
533         partial match; it may include characters before the actual match  start         partial  match; it may include characters before the actual match start
534         if  a  lookbehind assertion, \K, \b, or \B was involved.) For any other         if a lookbehind assertion, \K, \b, or \B was involved.) For  any  other
535         return, pcretest outputs the PCRE negative error  number  and  a  short         return,  pcretest  outputs  the  PCRE negative error number and a short
536         descriptive  phrase.  If  the error is a failed UTF-8 string check, the         descriptive phrase. If the error is a failed UTF-8  string  check,  the
537         byte offset of the start of the failing character and the  reason  code         byte  offset  of the start of the failing character and the reason code
538         are  also  output,  provided  that  the size of the output vector is at         are also output, provided that the size of  the  output  vector  is  at
539         least two. Here is an example of an interactive pcretest run.         least two. Here is an example of an interactive pcretest run.
540    
541           $ pcretest           $ pcretest
# Line 547  DEFAULT OUTPUT FROM PCRETEST Line 550  DEFAULT OUTPUT FROM PCRETEST
550    
551         Unset capturing substrings that are not followed by one that is set are         Unset capturing substrings that are not followed by one that is set are
552         not returned by pcre_exec(), and are not shown by pcretest. In the fol-         not returned by pcre_exec(), and are not shown by pcretest. In the fol-
553         lowing example, there are two capturing substrings, but when the  first         lowing  example, there are two capturing substrings, but when the first
554         data  line  is  matched,  the  second, unset substring is not shown. An         data line is matched, the second, unset  substring  is  not  shown.  An
555         "internal" unset substring is shown as "<unset>",  as  for  the  second         "internal"  unset  substring  is  shown as "<unset>", as for the second
556         data line.         data line.
557    
558             re> /(a)|(b)/             re> /(a)|(b)/
# Line 561  DEFAULT OUTPUT FROM PCRETEST Line 564  DEFAULT OUTPUT FROM PCRETEST
564            1: <unset>            1: <unset>
565            2: b            2: b
566    
567         If  the strings contain any non-printing characters, they are output as         If the strings contain any non-printing characters, they are output  as
568         \0x escapes, or as \x{...} escapes if the /8 modifier  was  present  on         \0x  escapes,  or  as \x{...} escapes if the /8 modifier was present on
569         the  pattern.  See below for the definition of non-printing characters.         the pattern. See below for the definition of  non-printing  characters.
570         If the pattern has the /+ modifier, the output for substring 0 is  fol-         If  the pattern has the /+ modifier, the output for substring 0 is fol-
571         lowed  by  the  rest of the subject string, identified by "0+" like         lowed by the rest of the subject string, identified  by  "0+"  like
572         this:         this:
573    
574             re> /cat/+             re> /cat/+
# Line 573  DEFAULT OUTPUT FROM PCRETEST Line 576  DEFAULT OUTPUT FROM PCRETEST
576            0: cat            0: cat
577            0+ aract            0+ aract
578    
579         If the pattern has the /g or /G modifier,  the  results  of  successive         If  the  pattern  has  the /g or /G modifier, the results of successive
580         matching attempts are output in sequence, like this:         matching attempts are output in sequence, like this:
581    
582             re> /\Bi(\w\w)/g             re> /\Bi(\w\w)/g
# Line 585  DEFAULT OUTPUT FROM PCRETEST Line 588  DEFAULT OUTPUT FROM PCRETEST
588            0: ipp            0: ipp
589            1: pp            1: pp
590    
591         "No  match" is output only if the first match attempt fails. Here is an         "No match" is output only if the first match attempt fails. Here is  an
592         example of a failure message (the offset 4 that is specified by \>4  is         example  of a failure message (the offset 4 that is specified by \>4 is
593         past the end of the subject string):         past the end of the subject string):
594    
595             re> /xyz/             re> /xyz/
596           data> xyz\>4           data> xyz\>4
597           Error -24 (bad offset value)           Error -24 (bad offset value)
598    
599         If  any  of the sequences \C, \G, or \L are present in a data line that         If any of the sequences \C, \G, or \L are present in a data  line  that
600         is successfully matched, the substrings extracted  by  the  convenience         is  successfully  matched,  the substrings extracted by the convenience
601         functions are output with C, G, or L after the string number instead of         functions are output with C, G, or L after the string number instead of
602         a colon. This is in addition to the normal full list. The string length         a colon. This is in addition to the normal full list. The string length
603         (that  is,  the return from the extraction function) is given in paren-         (that is, the return from the extraction function) is given  in  paren-
604         theses after each string for \C and \G.         theses after each string for \C and \G.
605    
606         Note that whereas patterns can be continued over several lines (a plain         Note that whereas patterns can be continued over several lines (a plain
607         ">" prompt is used for continuations), data lines may not. However new-         ">" prompt is used for continuations), data lines may not. However new-
608         lines can be included in data by means of the \n escape (or  \r,  \r\n,         lines  can  be included in data by means of the \n escape (or \r, \r\n,
609         etc., depending on the newline sequence setting).         etc., depending on the newline sequence setting).
610    
611    
612  OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION  OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
613    
614         When  the  alternative  matching function, pcre_dfa_exec(), is used (by         When the alternative matching function, pcre_dfa_exec(),  is  used  (by
615         means of the \D escape sequence or the -dfa command line  option),  the         means  of  the \D escape sequence or the -dfa command line option), the
616         output  consists  of  a list of all the matches that start at the first         output consists of a list of all the matches that start  at  the  first
617         point in the subject where there is at least one match. For example:         point in the subject where there is at least one match. For example:
618    
619             re> /(tang|tangerine|tan)/             re> /(tang|tangerine|tan)/
# Line 619  OUTPUT FROM THE ALTERNATIVE MATCHING FUN Line 622  OUTPUT FROM THE ALTERNATIVE MATCHING FUN
622            1: tang            1: tang
623            2: tan            2: tan
624    
625         (Using the normal matching function on this data  finds  only  "tang".)         (Using  the  normal  matching function on this data finds only "tang".)
626         The  longest matching string is always given first (and numbered zero).         The longest matching string is always given first (and numbered  zero).
627         After a PCRE_ERROR_PARTIAL return, the output is "Partial match:", fol-         After a PCRE_ERROR_PARTIAL return, the output is "Partial match:", fol-
628         lowed  by  the  partially  matching  substring.  (Note that this is the         lowed by the partially matching  substring.  (Note  that  this  is  the
629         entire substring that was inspected during the partial  match;  it  may         entire  substring  that  was inspected during the partial match; it may
630         include characters before the actual match start if a lookbehind asser-         include characters before the actual match start if a lookbehind asser-
631         tion, \K, \b, or \B was involved.)         tion, \K, \b, or \B was involved.)
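
The "all matches at the first matching point, longest first" behaviour can be imitated with an ordinary backtracking engine. The sketch below uses Python's re module rather than PCRE, and the helper name all_matches_at_first_point is hypothetical. It only mimics the shape of the output described above by brute force; it says nothing about how pcre_dfa_exec() works internally.

```python
import re


def all_matches_at_first_point(pattern: str, subject: str) -> list:
    """List every match starting at the first position that has any match, longest first."""
    compiled = re.compile(pattern)
    for start in range(len(subject) + 1):
        found = []
        # Try every possible end point, longest candidate first.
        for end in range(len(subject), start - 1, -1):
            if compiled.fullmatch(subject, start, end):
                found.append(subject[start:end])
        if found:
            return found
    return []
```

With the pattern (tang|tangerine|tan) and an illustrative subject such as "yellow tangerine", the helper yields the three alternatives in decreasing length, whereas a single re.search() call with the same pattern stops at "tang", the first alternative that matches.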
632    
# Line 639  OUTPUT FROM THE ALTERNATIVE MATCHING FUN Line 642  OUTPUT FROM THE ALTERNATIVE MATCHING FUN
642            1: tan            1: tan
643            0: tan            0: tan
644    
645         Since  the  matching  function  does not support substring capture, the         Since the matching function does not  support  substring  capture,  the
646         escape sequences that are concerned with captured  substrings  are  not         escape  sequences  that  are concerned with captured substrings are not
647         relevant.         relevant.
648    
649    
650  RESTARTING AFTER A PARTIAL MATCH  RESTARTING AFTER A PARTIAL MATCH
651    
652         When the alternative matching function has given the PCRE_ERROR_PARTIAL         When the alternative matching function has given the PCRE_ERROR_PARTIAL
653         return, indicating that the subject partially matched the pattern,  you         return,  indicating that the subject partially matched the pattern, you
654         can  restart  the match with additional subject data by means of the \R         can restart the match with additional subject data by means of  the  \R
655         escape sequence. For example:         escape sequence. For example:
656    
657             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/             re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
# Line 657  RESTARTING AFTER A PARTIAL MATCH Line 660  RESTARTING AFTER A PARTIAL MATCH
660           data> n05\R\D           data> n05\R\D
661            0: n05            0: n05
662    
663         For further information about partial  matching,  see  the  pcrepartial         For  further  information  about  partial matching, see the pcrepartial
664         documentation.         documentation.
665    
666    
667  CALLOUTS  CALLOUTS
668    
669         If  the pattern contains any callout requests, pcretest's callout func-         If the pattern contains any callout requests, pcretest's callout  func-
670         tion is called during matching. This works  with  both  matching  func-         tion  is  called  during  matching. This works with both matching func-
671         tions. By default, the called function displays the callout number, the         tions. By default, the called function displays the callout number, the
672         start and current positions in the text at the callout  time,  and  the         start  and  current  positions in the text at the callout time, and the
673         next pattern item to be tested. For example, the output         next pattern item to be tested. For example, the output
674    
675           --->pqrabcdef           --->pqrabcdef
676             0    ^  ^     \d             0    ^  ^     \d
677    
678         indicates  that  callout number 0 occurred for a match attempt starting         indicates that callout number 0 occurred for a match  attempt  starting
679         at the fourth character of the subject string, when the pointer was  at         at  the fourth character of the subject string, when the pointer was at
680         the  seventh  character of the data, and when the next pattern item was         the seventh character of the data, and when the next pattern  item  was
681         \d. Just one circumflex is output if the start  and  current  positions         \d.  Just  one  circumflex is output if the start and current positions
682         are the same.         are the same.
683    
684         Callouts numbered 255 are assumed to be automatic callouts, inserted as         Callouts numbered 255 are assumed to be automatic callouts, inserted as
685         a result of the /C pattern modifier. In this case, instead  of  showing         a  result  of the /C pattern modifier. In this case, instead of showing
686         the  callout  number, the offset in the pattern, preceded by a plus, is         the callout number, the offset in the pattern, preceded by a  plus,  is
687         output. For example:         output. For example:
688    
689             re> /\d?[A-E]\*/C             re> /\d?[A-E]\*/C
# Line 693  CALLOUTS Line 696  CALLOUTS
696            0: E*            0: E*
697    
698         If a pattern contains (*MARK) items, an additional line is output when-         If a pattern contains (*MARK) items, an additional line is output when-
699         ever  a  change  of  latest mark is passed to the callout function. For         ever a change of latest mark is passed to  the  callout  function.  For
700         example:         example:
701    
702             re> /a(*MARK:X)bc/C             re> /a(*MARK:X)bc/C
# Line 707  CALLOUTS Line 710  CALLOUTS
710           +12 ^  ^           +12 ^  ^
711            0: abc            0: abc
712    
713         The mark changes between matching "a" and "b", but stays the  same  for         The  mark  changes between matching "a" and "b", but stays the same for
714         the  rest  of  the match, so nothing more is output. If, as a result of         the rest of the match, so nothing more is output. If, as  a  result  of
715         backtracking, the mark reverts to being unset, the  text  "<unset>"  is         backtracking,  the  mark  reverts to being unset, the text "<unset>" is
716         output.         output.
717    
718         The  callout  function  in pcretest returns zero (carry on matching) by         The callout function in pcretest returns zero (carry  on  matching)  by
719         default, but you can use a \C item in a data line (as described  above)         default,  but you can use a \C item in a data line (as described above)
720         to change this and other parameters of the callout.         to change this and other parameters of the callout.
721    
722         Inserting  callouts can be helpful when using pcretest to check compli-         Inserting callouts can be helpful when using pcretest to check  compli-
723         cated regular expressions. For further information about callouts,  see         cated  regular expressions. For further information about callouts, see
724         the pcrecallout documentation.         the pcrecallout documentation.
725    
726    
727  NON-PRINTING CHARACTERS  NON-PRINTING CHARACTERS
728    
729         When  pcretest is outputting text in the compiled version of a pattern,         When pcretest is outputting text in the compiled version of a  pattern,
730         bytes other than 32-126 are always treated as  non-printing  characters         bytes  other  than 32-126 are always treated as non-printing characters
731         and are therefore shown as hex escapes.         and are therefore shown as hex escapes.
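
The rule above can be illustrated with a minimal Python sketch. The function name show_bytes is hypothetical, and the exact escape spelling (a backslash, "x", and two lowercase hex digits) is an assumption for illustration; the text only states that bytes outside 32-126 are shown as hex escapes.

```python
def show_bytes(data: bytes) -> str:
    """Render bytes for display, escaping anything outside 32-126.

    Hypothetical helper; the "\\xNN" spelling is an assumed format.
    """
    out = []
    for b in data:
        if 32 <= b <= 126:
            # Printable ASCII is shown as-is.
            out.append(chr(b))
        else:
            # Everything else becomes a two-digit hex escape.
            out.append("\\x%02x" % b)
    return "".join(out)


print(show_bytes(b"abc\n\xff"))  # prints abc\x0a\xff
```

Note that byte 127 (DEL) falls outside the 32-126 range and is therefore escaped as well.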
732    
733         When  pcretest  is  outputting text that is a matched part of a subject         When pcretest is outputting text that is a matched part  of  a  subject
734         string, it behaves in the same way, unless a different locale has  been         string,  it behaves in the same way, unless a different locale has been
735         set  for  the  pattern  (using  the  /L  modifier).  In  this case, the         set for the  pattern  (using  the  /L  modifier).  In  this  case,  the
736         isprint() function is used to distinguish printing and non-printing characters.         isprint() function is used to distinguish printing and non-printing characters.
737    
738    
739  SAVING AND RELOADING COMPILED PATTERNS  SAVING AND RELOADING COMPILED PATTERNS
740    
741         The facilities described in this section are  not  available  when  the         The  facilities  described  in  this section are not available when the
742         POSIX  interface  to  PCRE  is being used, that is, when the /P pattern         POSIX interface to PCRE is being used, that is,  when  the  /P  pattern
743         modifier is specified.         modifier is specified.
744    
745         When the POSIX interface is not in use, you can cause pcretest to write         When the POSIX interface is not in use, you can cause pcretest to write
746         a  compiled  pattern to a file, by following the modifiers with > and a         a compiled pattern to a file, by following the modifiers with >  and  a
747         file name.  For example:         file name.  For example:
748    
749           /pattern/im >/some/file           /pattern/im >/some/file
750    
751         See the pcreprecompile documentation for a discussion about saving  and         See  the pcreprecompile documentation for a discussion about saving and
752         re-using  compiled patterns.  Note that if the pattern was successfully         re-using compiled patterns.  Note that if the pattern was  successfully
753         studied with JIT optimization, the JIT data cannot be saved.         studied with JIT optimization, the JIT data cannot be saved.
754    
755         The data that is written is binary.  The  first  eight  bytes  are  the         The  data  that  is  written  is  binary. The first eight bytes are the
756         length  of  the  compiled  pattern  data  followed by the length of the         length of the compiled pattern data  followed  by  the  length  of  the
757         optional study data, each written as four  bytes  in  big-endian  order         optional  study  data,  each  written as four bytes in big-endian order
758         (most  significant  byte  first). If there is no study data (either the         (most significant byte first). If there is no study  data  (either  the
759         pattern was not studied, or studying did not return any data), the sec-         pattern was not studied, or studying did not return any data), the sec-
760         ond  length  is  zero. The lengths are followed by an exact copy of the         ond length is zero. The lengths are followed by an exact  copy  of  the
761         compiled pattern. If there is additional study  data,  this  (excluding         compiled  pattern.  If  there is additional study data, this (excluding
762         any  JIT  data)  follows  immediately after the compiled pattern. After         any JIT data) follows immediately after  the  compiled  pattern.  After
763         writing the file, pcretest expects to read a new pattern.         writing the file, pcretest expects to read a new pattern.
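
As a rough sketch of the layout just described, the following Python splits a saved file into its two parts. The function name split_saved_pattern is hypothetical, the compiled pattern and study data are treated as opaque bytes, and the demonstration input is synthetic rather than real pcretest output.

```python
import struct


def split_saved_pattern(blob: bytes):
    """Split a saved-pattern file image into (pattern, study) byte strings.

    Assumes the layout described in the text: two 4-byte big-endian
    lengths, then the compiled pattern, then any study data.
    """
    if len(blob) < 8:
        raise ValueError("file too short for the 8-byte length header")
    # ">II" reads two unsigned 32-bit integers, most significant byte first.
    pattern_len, study_len = struct.unpack(">II", blob[:8])
    body = blob[8:]
    if len(body) < pattern_len + study_len:
        raise ValueError("file is truncated")
    pattern = body[:pattern_len]
    # A study length of zero simply yields an empty byte string here.
    study = body[pattern_len:pattern_len + study_len]
    return pattern, study


# Synthetic example: 3 bytes of "pattern" and 2 bytes of "study data".
demo = struct.pack(">II", 3, 2) + b"abcde"
print(split_saved_pattern(demo))  # prints (b'abc', b'de')
```

Because the lengths are always stored most significant byte first, reading the header this way does not depend on the byte order of the host that wrote the file.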
764    
765         A saved pattern can be reloaded into pcretest by  specifying  <  and  a         A  saved  pattern  can  be reloaded into pcretest by specifying < and a
766         file name instead of a pattern. The name of the file must not contain a         file name instead of a pattern. The name of the file must not contain a
767         < character, as otherwise pcretest will interpret the line as a pattern         < character, as otherwise pcretest will interpret the line as a pattern
768         delimited by < characters.  For example:         delimited by < characters.  For example:
# Line 768  SAVING AND RELOADING COMPILED PATTERNS Line 771  SAVING AND RELOADING COMPILED PATTERNS
771           Compiled pattern loaded from /some/file           Compiled pattern loaded from /some/file
772           No study data           No study data
773    
774         If  the  pattern  was previously studied with the JIT optimization, the         If the pattern was previously studied with the  JIT  optimization,  the
775         JIT information cannot be saved and restored, and so is lost. When  the         JIT  information cannot be saved and restored, and so is lost. When the
776         pattern  has  been  loaded, pcretest proceeds to read data lines in the         pattern has been loaded, pcretest proceeds to read data  lines  in  the
777         usual way.         usual way.
778    
779         You can copy a file written by pcretest to a different host and  reload         You  can copy a file written by pcretest to a different host and reload
780         it  there,  even  if the new host has opposite endianness to the one on         it there, even if the new host has opposite endianness to  the  one  on
781         which the pattern was compiled. For example, you can compile on an  i86         which  the pattern was compiled. For example, you can compile on an i86
782         machine and run on a SPARC machine.         machine and run on a SPARC machine.
783    
784         File  names  for  saving and reloading can be absolute or relative, but         File names for saving and reloading can be absolute  or  relative,  but
785         note that the shell facility of expanding a file name that starts  with         note  that the shell facility of expanding a file name that starts with
786         a tilde (~) is not available.         a tilde (~) is not available.
787    
788         The  ability to save and reload files in pcretest is intended for test-         The ability to save and reload files in pcretest is intended for  test-
789         ing and experimentation. It is not intended for production use  because         ing  and experimentation. It is not intended for production use because
790         only  a  single pattern can be written to a file. Furthermore, there is         only a single pattern can be written to a file. Furthermore,  there  is
791         no facility for supplying  custom  character  tables  for  use  with  a         no  facility  for  supplying  custom  character  tables  for use with a
792         reloaded  pattern.  If  the  original  pattern was compiled with custom         reloaded pattern. If the original  pattern  was  compiled  with  custom
793         tables, an attempt to match a subject string using a  reloaded  pattern         tables,  an  attempt to match a subject string using a reloaded pattern
794         is  likely to cause pcretest to crash.  Finally, if you attempt to load         is likely to cause pcretest to crash.  Finally, if you attempt to  load
795         a file that is not in the correct format, the result is undefined.         a file that is not in the correct format, the result is undefined.
796    
797    
# Line 807  AUTHOR Line 810  AUTHOR
810    
811  REVISION  REVISION
812    
813         Last updated: 26 August 2011         Last updated: 02 December 2011
814         Copyright (c) 1997-2011 University of Cambridge.         Copyright (c) 1997-2011 University of Cambridge.
