
Diff of /code/trunk/doc/html/pcretest.html


revision 1319 by ph10, Fri Mar 22 16:13:13 2013 UTC revision 1320 by ph10, Wed May 1 16:39:35 2013 UTC
# Line 14  man page, in case the conversion went wr Line 14  man page, in case the conversion went wr
14  <br>  <br>
15  <ul>  <ul>
16  <li><a name="TOC1" href="#SEC1">SYNOPSIS</a>  <li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
17  <li><a name="TOC2" href="#SEC2">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>  <li><a name="TOC2" href="#SEC2">INPUT DATA FORMAT</a>
18  <li><a name="TOC3" href="#SEC3">COMMAND LINE OPTIONS</a>  <li><a name="TOC3" href="#SEC3">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
19  <li><a name="TOC4" href="#SEC4">DESCRIPTION</a>  <li><a name="TOC4" href="#SEC4">COMMAND LINE OPTIONS</a>
20  <li><a name="TOC5" href="#SEC5">PATTERN MODIFIERS</a>  <li><a name="TOC5" href="#SEC5">DESCRIPTION</a>
21  <li><a name="TOC6" href="#SEC6">DATA LINES</a>  <li><a name="TOC6" href="#SEC6">PATTERN MODIFIERS</a>
22  <li><a name="TOC7" href="#SEC7">THE ALTERNATIVE MATCHING FUNCTION</a>  <li><a name="TOC7" href="#SEC7">DATA LINES</a>
23  <li><a name="TOC8" href="#SEC8">DEFAULT OUTPUT FROM PCRETEST</a>  <li><a name="TOC8" href="#SEC8">THE ALTERNATIVE MATCHING FUNCTION</a>
24  <li><a name="TOC9" href="#SEC9">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>  <li><a name="TOC9" href="#SEC9">DEFAULT OUTPUT FROM PCRETEST</a>
25  <li><a name="TOC10" href="#SEC10">RESTARTING AFTER A PARTIAL MATCH</a>  <li><a name="TOC10" href="#SEC10">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
26  <li><a name="TOC11" href="#SEC11">CALLOUTS</a>  <li><a name="TOC11" href="#SEC11">RESTARTING AFTER A PARTIAL MATCH</a>
27  <li><a name="TOC12" href="#SEC12">NON-PRINTING CHARACTERS</a>  <li><a name="TOC12" href="#SEC12">CALLOUTS</a>
28  <li><a name="TOC13" href="#SEC13">SAVING AND RELOADING COMPILED PATTERNS</a>  <li><a name="TOC13" href="#SEC13">NON-PRINTING CHARACTERS</a>
29  <li><a name="TOC14" href="#SEC14">SEE ALSO</a>  <li><a name="TOC14" href="#SEC14">SAVING AND RELOADING COMPILED PATTERNS</a>
30  <li><a name="TOC15" href="#SEC15">AUTHOR</a>  <li><a name="TOC15" href="#SEC15">SEE ALSO</a>
31  <li><a name="TOC16" href="#SEC16">REVISION</a>  <li><a name="TOC16" href="#SEC16">AUTHOR</a>
32    <li><a name="TOC17" href="#SEC17">REVISION</a>
33  </ul>  </ul>
34  <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>  <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
35  <P>  <P>
# Line 63  conjunction with the test script and dat Line 64  conjunction with the test script and dat
64  PCRE, and are unlikely to be of use otherwise. They are all documented here,  PCRE, and are unlikely to be of use otherwise. They are all documented here,
65  but without much justification.  but without much justification.
66  </P>  </P>
67  <br><a name="SEC2" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>  <br><a name="SEC2" href="#TOC1">INPUT DATA FORMAT</a><br>
68    <P>
69    Input to <b>pcretest</b> is processed line by line, either by calling the C
70    library's <b>fgets()</b> function, or via the <b>libreadline</b> library (see
71    below). In Unix-like environments, <b>fgets()</b> treats any bytes other than
72    newline as data characters. However, in some Windows environments character 26
73    (hex 1A) causes an immediate end of file, and no further data is read. For
74    maximum portability, therefore, it is safest to use only ASCII characters in
75    <b>pcretest</b> input files.
76    </P>
77    <br><a name="SEC3" href="#TOC1">PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a><br>
78  <P>  <P>
79  From release 8.30, two separate PCRE libraries can be built. The original one  From release 8.30, two separate PCRE libraries can be built. The original one
80  supports 8-bit character strings, whereas the newer 16-bit library supports  supports 8-bit character strings, whereas the newer 16-bit library supports
81  character strings encoded in 16-bit units. From release 8.32, a third  character strings encoded in 16-bit units. From release 8.32, a third library
82  library can be built, supporting character strings encoded in 32-bit units.  can be built, supporting character strings encoded in 32-bit units. The
83  The <b>pcretest</b> program can be  <b>pcretest</b> program can be used to test all three libraries. However, it is
84  used to test all three libraries. However, it is itself still an 8-bit program,  itself still an 8-bit program, reading 8-bit input and writing 8-bit output.
85  reading 8-bit input and writing 8-bit output. When testing the 16-bit or 32-bit  When testing the 16-bit or 32-bit library, the patterns and data strings are
86  library, the patterns and data strings are converted to 16- or 32-bit format  converted to 16- or 32-bit format before being passed to the PCRE library
87  before being passed to the PCRE library functions. Results are converted to  functions. Results are converted to 8-bit for output.
 8-bit for output.  
88  </P>  </P>
89  <P>  <P>
90  References to functions and structures of the form <b>pcre[16|32]_xx</b> below  References to functions and structures of the form <b>pcre[16|32]_xx</b> below
91  mean "<b>pcre_xx</b> when using the 8-bit library or <b>pcre16_xx</b> when using  mean "<b>pcre_xx</b> when using the 8-bit library, <b>pcre16_xx</b> when using
92  the 16-bit library".  the 16-bit library, or <b>pcre32_xx</b> when using the 32-bit library".
93  </P>  </P>
94  <br><a name="SEC3" href="#TOC1">COMMAND LINE OPTIONS</a><br>  <br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
95  <P>  <P>
96  <b>-8</b>  <b>-8</b>
97  If both the 8-bit library has been built, this option causes the 8-bit library  If both the 8-bit library has been built, this option causes the 8-bit library
# Line 259  to iterate 500000 times. Line 269  to iterate 500000 times.
269  This is like <b>-t</b> except that it times only the matching phase, not the  This is like <b>-t</b> except that it times only the matching phase, not the
270  compile or study phases.  compile or study phases.
271  </P>  </P>
272  <br><a name="SEC4" href="#TOC1">DESCRIPTION</a><br>  <br><a name="SEC5" href="#TOC1">DESCRIPTION</a><br>
273  <P>  <P>
274  If <b>pcretest</b> is given two filename arguments, it reads from the first and  If <b>pcretest</b> is given two filename arguments, it reads from the first and
275  writes to the second. If it is given only one filename argument, it reads from  writes to the second. If it is given only one filename argument, it reads from
# Line 316  backslash, because Line 326  backslash, because
326  is interpreted as the first line of a pattern that starts with "abc/", causing  is interpreted as the first line of a pattern that starts with "abc/", causing
327  pcretest to read the next line as a continuation of the regular expression.  pcretest to read the next line as a continuation of the regular expression.
328  </P>  </P>
329  <br><a name="SEC5" href="#TOC1">PATTERN MODIFIERS</a><br>  <br><a name="SEC6" href="#TOC1">PATTERN MODIFIERS</a><br>
330  <P>  <P>
331  A pattern may be followed by any number of modifiers, which are mostly single  A pattern may be followed by any number of modifiers, which are mostly single
332  characters, though some of these can be qualified by further characters.  characters, though some of these can be qualified by further characters.
# Line 329  fall into several groups that are descri Line 339  fall into several groups that are descri
339  sections.  sections.
340  <pre>  <pre>
341    <b>/8</b>              set UTF mode    <b>/8</b>              set UTF mode
342      <b>/9</b>              set PCRE_NEVER_UTF (locks out UTF mode)
343    <b>/?</b>              disable UTF validity check    <b>/?</b>              disable UTF validity check
344    <b>/+</b>              show remainder of subject after match    <b>/+</b>              show remainder of subject after match
345    <b>/=</b>              show all captures (not just those that are set)    <b>/=</b>              show all captures (not just those that are set)
# Line 401  options that do not correspond to anythi Line 412  options that do not correspond to anythi
412    <b>/8</b>              PCRE_UTF32          ) when using the 32-bit    <b>/8</b>              PCRE_UTF32          ) when using the 32-bit
413    <b>/?</b>              PCRE_NO_UTF32_CHECK )   library    <b>/?</b>              PCRE_NO_UTF32_CHECK )   library
414    
415      <b>/9</b>              PCRE_NEVER_UTF
416    <b>/A</b>              PCRE_ANCHORED    <b>/A</b>              PCRE_ANCHORED
417    <b>/C</b>              PCRE_AUTO_CALLOUT    <b>/C</b>              PCRE_AUTO_CALLOUT
418    <b>/E</b>              PCRE_DOLLAR_ENDONLY    <b>/E</b>              PCRE_DOLLAR_ENDONLY
# Line 630  function: Line 642  function:
642  The <b>/+</b> modifier works as described above. All other modifiers are  The <b>/+</b> modifier works as described above. All other modifiers are
643  ignored.  ignored.
644  </P>  </P>
645  <br><a name="SEC6" href="#TOC1">DATA LINES</a><br>  <br><a name="SEC7" href="#TOC1">DATA LINES</a><br>
646  <P>  <P>
647  Before each data line is passed to <b>pcre[16|32]_exec()</b>, leading and trailing  Before each data line is passed to <b>pcre[16|32]_exec()</b>, leading and trailing
648  white space is removed, and it is then scanned for \ escapes. Some of these  white space is removed, and it is then scanned for \ escapes. Some of these
# Line 754  API to be used, the only option-setting Line 766  API to be used, the only option-setting
766  \N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively,  \N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively,
767  to be passed to <b>regexec()</b>.  to be passed to <b>regexec()</b>.
768  </P>  </P>
769  <br><a name="SEC7" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>  <br><a name="SEC8" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
770  <P>  <P>
771  By default, <b>pcretest</b> uses the standard PCRE matching function,  By default, <b>pcretest</b> uses the standard PCRE matching function,
772  <b>pcre[16|32]_exec()</b> to match each data line. PCRE also supports an  <b>pcre[16|32]_exec()</b> to match each data line. PCRE also supports an
# Line 771  This function finds all possible matches Line 783  This function finds all possible matches
783  escape sequence is present in the data line, it stops after the first match is  escape sequence is present in the data line, it stops after the first match is
784  found. This is always the shortest possible match.  found. This is always the shortest possible match.
785  </P>  </P>
786  <br><a name="SEC8" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>  <br><a name="SEC9" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
787  <P>  <P>
788  This section describes the output when the normal matching function,  This section describes the output when the normal matching function,
789  <b>pcre[16|32]_exec()</b>, is being used.  <b>pcre[16|32]_exec()</b>, is being used.
# Line 862  prompt is used for continuations), data Line 874  prompt is used for continuations), data
874  included in data by means of the \n escape (or \r, \r\n, etc., depending on  included in data by means of the \n escape (or \r, \r\n, etc., depending on
875  the newline sequence setting).  the newline sequence setting).
876  </P>  </P>
877  <br><a name="SEC9" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>  <br><a name="SEC10" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
878  <P>  <P>
879  When the alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, is used (by  When the alternative matching function, <b>pcre[16|32]_dfa_exec()</b>, is used (by
880  means of the \D escape sequence or the <b>-dfa</b> command line option), the  means of the \D escape sequence or the <b>-dfa</b> command line option), the
# Line 898  at the end of the longest match. For exa Line 910  at the end of the longest match. For exa
910  Since the matching function does not support substring capture, the escape  Since the matching function does not support substring capture, the escape
911  sequences that are concerned with captured substrings are not relevant.  sequences that are concerned with captured substrings are not relevant.
912  </P>  </P>
913  <br><a name="SEC10" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>  <br><a name="SEC11" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
914  <P>  <P>
915  When the alternative matching function has given the PCRE_ERROR_PARTIAL return,  When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
916  indicating that the subject partially matched the pattern, you can restart the  indicating that the subject partially matched the pattern, you can restart the
# Line 915  For further information about partial ma Line 927  For further information about partial ma
927  <a href="pcrepartial.html"><b>pcrepartial</b></a>  <a href="pcrepartial.html"><b>pcrepartial</b></a>
928  documentation.  documentation.
929  </P>  </P>
930  <br><a name="SEC11" href="#TOC1">CALLOUTS</a><br>  <br><a name="SEC12" href="#TOC1">CALLOUTS</a><br>
931  <P>  <P>
932  If the pattern contains any callout requests, <b>pcretest</b>'s callout function  If the pattern contains any callout requests, <b>pcretest</b>'s callout function
933  is called during matching. This works with both matching functions. By default,  is called during matching. This works with both matching functions. By default,
# Line 976  the Line 988  the
988  <a href="pcrecallout.html"><b>pcrecallout</b></a>  <a href="pcrecallout.html"><b>pcrecallout</b></a>
989  documentation.  documentation.
990  </P>  </P>
991  <br><a name="SEC12" href="#TOC1">NON-PRINTING CHARACTERS</a><br>  <br><a name="SEC13" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
992  <P>  <P>
993  When <b>pcretest</b> is outputting text in the compiled version of a pattern,  When <b>pcretest</b> is outputting text in the compiled version of a pattern,
994  bytes other than 32-126 are always treated as non-printing characters are are  bytes other than 32-126 are always treated as non-printing characters are are
# Line 988  string, it behaves in the same way, unle Line 1000  string, it behaves in the same way, unle
1000  the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b>  the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b>
1001  function to distinguish printing and non-printing characters.  function to distinguish printing and non-printing characters.
1002  </P>  </P>
1003  <br><a name="SEC13" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>  <br><a name="SEC14" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
1004  <P>  <P>
1005  The facilities described in this section are not available when the POSIX  The facilities described in this section are not available when the POSIX
1006  interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is  interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
# Line 1061  string using a reloaded pattern is likel Line 1073  string using a reloaded pattern is likel
1073  Finally, if you attempt to load a file that is not in the correct format, the  Finally, if you attempt to load a file that is not in the correct format, the
1074  result is undefined.  result is undefined.
1075  </P>  </P>
1076  <br><a name="SEC14" href="#TOC1">SEE ALSO</a><br>  <br><a name="SEC15" href="#TOC1">SEE ALSO</a><br>
1077  <P>  <P>
1078  <b>pcre</b>(3), <b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcreapi</b>(3),  <b>pcre</b>(3), <b>pcre16</b>(3), <b>pcre32</b>(3), <b>pcreapi</b>(3),
1079  <b>pcrecallout</b>(3),  <b>pcrecallout</b>(3),
1080  <b>pcrejit</b>, <b>pcrematching</b>(3), <b>pcrepartial</b>(d),  <b>pcrejit</b>, <b>pcrematching</b>(3), <b>pcrepartial</b>(d),
1081  <b>pcrepattern</b>(3), <b>pcreprecompile</b>(3).  <b>pcrepattern</b>(3), <b>pcreprecompile</b>(3).
1082  </P>  </P>
1083  <br><a name="SEC15" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC16" href="#TOC1">AUTHOR</a><br>
1084  <P>  <P>
1085  Philip Hazel  Philip Hazel
1086  <br>  <br>
# Line 1077  University Computing Service Line 1089  University Computing Service
1089  Cambridge CB2 3QH, England.  Cambridge CB2 3QH, England.
1090  <br>  <br>
1091  </P>  </P>
1092  <br><a name="SEC16" href="#TOC1">REVISION</a><br>  <br><a name="SEC17" href="#TOC1">REVISION</a><br>
1093  <P>  <P>
1094  Last updated: 22 February 2013  Last updated: 26 April 2013
1095  <br>  <br>
1096  Copyright &copy; 1997-2013 University of Cambridge.  Copyright &copy; 1997-2013 University of Cambridge.
1097  <br>  <br>
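
The INPUT DATA FORMAT paragraph added in r1320 warns that character 26 (hex 1A) ends input early in some Windows environments and recommends ASCII-only input files. A minimal sketch of a pre-flight check for test input follows. It is not part of pcretest or the PCRE distribution; the function name `find_risky_bytes` is hypothetical, and treating every byte above 0x7F as "non-ASCII" is an assumption about what the paragraph intends.

```python
import sys

# Character 26 (hex 1A), which the new paragraph says causes an immediate
# end of file in some Windows environments.
CTRL_Z = 0x1A


def find_risky_bytes(data):
    """Return (offset, byte value) pairs for each Ctrl-Z or non-ASCII byte.

    Hypothetical helper: "non-ASCII" is taken here to mean any byte
    above 0x7F, which is an assumption, not something pcretest defines.
    """
    return [(i, b) for i, b in enumerate(data) if b == CTRL_Z or b > 0x7F]


if __name__ == "__main__":
    # Usage: python check_input.py testinput1 testinput2 ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # binary mode, so 0x1A is read as data
            for offset, value in find_risky_bytes(f.read()):
                print(f"{path}: offset {offset}: byte 0x{value:02X}")
```

Opening the file in binary mode matters here: it keeps the check itself from being cut short by the same end-of-file behaviour it is looking for.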

