Diff of /code/trunk/doc/pcretest.1

revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC revision 87 by nigel, Sat Feb 24 21:41:21 2007 UTC
# Line 4  pcretest - a program for testing Perl-co Line 4  pcretest - a program for testing Perl-co
4  .SH SYNOPSIS  .SH SYNOPSIS
5  .rs  .rs
6  .sp  .sp
7  .B pcretest "[-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source]"  .B pcretest "[-C] [-d] [-dfa] [-i] [-m] [-o osize] [-p] [-t] [source]"
8  .ti +5n  .ti +5n
9  .B "[destination]"  .B "[destination]"
10  .P  .P
# Line 31  Output the version number of the PCRE li Line 31  Output the version number of the PCRE li
31  about the optional features that are included, and then exit.  about the optional features that are included, and then exit.
32  .TP 10  .TP 10
33  \fB-d\fP  \fB-d\fP
34  Behave as if each regex had the \fB/D\fP (debug) modifier; the internal  Behave as if each regex has the \fB/D\fP (debug) modifier; the internal
35  form is output after compilation.  form is output after compilation.
36  .TP 10  .TP 10
37    \fB-dfa\fP
38    Behave as if each data line contains the \eD escape sequence; this causes the
39    alternative matching function, \fBpcre_dfa_exec()\fP, to be used instead of the
40    standard \fBpcre_exec()\fP function (more detail is given below).
41    .TP 10
42  \fB-i\fP  \fB-i\fP
43  Behave as if each regex had the \fB/I\fP modifier; information about the  Behave as if each regex has the \fB/I\fP modifier; information about the
44  compiled pattern is given after compilation.  compiled pattern is given after compilation.
45  .TP 10  .TP 10
46  \fB-m\fP  \fB-m\fP
# Line 50  for 14 capturing subexpressions. The vec Line 55  for 14 capturing subexpressions. The vec
55  matching calls by including \eO in the data line (see below).  matching calls by including \eO in the data line (see below).
56  .TP 10  .TP 10
57  \fB-p\fP  \fB-p\fP
58  Behave as if each regex has \fB/P\fP modifier; the POSIX wrapper API is used  Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is
59  to call PCRE. None of the other options has any effect when \fB-p\fP is set.  used to call PCRE. None of the other options has any effect when \fB-p\fP is
60    set.
61    .TP 10
62    \fB-q\fP
63    Do not output the version number of \fBpcretest\fP at the start of execution.
64  .TP 10  .TP 10
65  \fB-t\fP  \fB-t\fP
66  Run each compile, study, and match many times with a timer, and output  Run each compile, study, and match many times with a timer, and output
# Line 131  not correspond to anything in Perl: Line 140  not correspond to anything in Perl:
140    \fB/A\fP    PCRE_ANCHORED    \fB/A\fP    PCRE_ANCHORED
141    \fB/C\fP    PCRE_AUTO_CALLOUT    \fB/C\fP    PCRE_AUTO_CALLOUT
142    \fB/E\fP    PCRE_DOLLAR_ENDONLY    \fB/E\fP    PCRE_DOLLAR_ENDONLY
143      \fB/f\fP    PCRE_FIRSTLINE
144    \fB/N\fP    PCRE_NO_AUTO_CAPTURE    \fB/N\fP    PCRE_NO_AUTO_CAPTURE
145    \fB/U\fP    PCRE_UNGREEDY    \fB/U\fP    PCRE_UNGREEDY
146    \fB/X\fP    PCRE_EXTRA    \fB/X\fP    PCRE_EXTRA
# Line 257  recognized: Line 267  recognized:
267  .\" JOIN  .\" JOIN
268    \eC*n       pass the number n (may be negative) as callout    \eC*n       pass the number n (may be negative) as callout
269                 data; this is used as the callout return value                 data; this is used as the callout return value
270      \eD         use the \fBpcre_dfa_exec()\fP match function
271      \eF         only shortest match for \fBpcre_dfa_exec()\fP
272  .\" JOIN  .\" JOIN
273    \eGdd       call pcre_get_substring() for substring dd    \eGdd       call pcre_get_substring() for substring dd
274                 after a successful match (number less than 32)                 after a successful match (number less than 32)
# Line 267  recognized: Line 279  recognized:
279  .\" JOIN  .\" JOIN
280    \eL         call pcre_get_substringlist() after a    \eL         call pcre_get_substringlist() after a
281                 successful match                 successful match
282    \eM         discover the minimum MATCH_LIMIT setting    \eM         discover the minimum MATCH_LIMIT and
283                   MATCH_LIMIT_RECURSION settings
284    \eN         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP    \eN         pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP
285  .\" JOIN  .\" JOIN
286    \eOdd       set the size of the output vector passed to    \eOdd       set the size of the output vector passed to
287                 \fBpcre_exec()\fP to dd (any number of digits)                 \fBpcre_exec()\fP to dd (any number of digits)
288    .\" JOIN
289    \eP         pass the PCRE_PARTIAL option to \fBpcre_exec()\fP    \eP         pass the PCRE_PARTIAL option to \fBpcre_exec()\fP
290                   or \fBpcre_dfa_exec()\fP
291      \eR         pass the PCRE_DFA_RESTART option to \fBpcre_dfa_exec()\fP
292    \eS         output details of memory get/free calls during matching    \eS         output details of memory get/free calls during matching
293    \eZ         pass the PCRE_NOTEOL option to \fBpcre_exec()\fP    \eZ         pass the PCRE_NOTEOL option to \fBpcre_exec()\fP
294  .\" JOIN  .\" JOIN
# Line 286  very last character is a backslash, it i Line 302  very last character is a backslash, it i
302  an empty line as data, since a real empty line terminates the data input.  an empty line as data, since a real empty line terminates the data input.
303  .P  .P
304  If \eM is present, \fBpcretest\fP calls \fBpcre_exec()\fP several times, with  If \eM is present, \fBpcretest\fP calls \fBpcre_exec()\fP several times, with
305  different values in the \fImatch_limit\fP field of the \fBpcre_extra\fP data  different values in the \fImatch_limit\fP and \fImatch_limit_recursion\fP
306  structure, until it finds the minimum number that is needed for  fields of the \fBpcre_extra\fP data structure, until it finds the minimum
307  \fBpcre_exec()\fP to complete. This number is a measure of the amount of  numbers for each parameter that allow \fBpcre_exec()\fP to complete. The
308  recursion and backtracking that takes place, and checking it out can be  \fImatch_limit\fP number is a measure of the amount of backtracking that takes
309  instructive. For most simple matches, the number is quite small, but for  place, and checking it out can be instructive. For most simple matches, the
310  patterns with very large numbers of matching possibilities, it can become large  number is quite small, but for patterns with very large numbers of matching
311  very quickly with increasing length of subject string.  possibilities, it can become large very quickly with increasing length of
312    subject string. The \fImatch_limit_recursion\fP number is a measure of how much
313    stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
314    to complete the match attempt.
315  .P  .P
316  When \eO is used, the value specified may be higher or lower than the size set  When \eO is used, the value specified may be higher or lower than the size set
317  by the \fB-o\fP command line option (or defaulted to 45); \eO applies only to  by the \fB-o\fP command line option (or defaulted to 45); \eO applies only to
318  the call of \fBpcre_exec()\fP for the line in which it appears.  the call of \fBpcre_exec()\fP for the line in which it appears.
319  .P  .P
320  If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper  If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper
321  API to be used, only \eB and \eZ have any effect, causing REG_NOTBOL and  API to be used, the only option-setting sequences that have any effect are \eB
322  REG_NOTEOL to be passed to \fBregexec()\fP respectively.  and \eZ, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
323    \fBregexec()\fP.
324  .P  .P
325  The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use  The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
326  of the \fB/8\fP modifier on the pattern. It is recognized always. There may be  of the \fB/8\fP modifier on the pattern. It is recognized always. There may be
# Line 308  any number of hexadecimal digits inside Line 328  any number of hexadecimal digits inside
328  six bytes, encoded according to the UTF-8 rules.  six bytes, encoded according to the UTF-8 rules.
329  .  .
330  .  .
331  .SH "OUTPUT FROM PCRETEST"  .SH "THE ALTERNATIVE MATCHING FUNCTION"
332    .rs
333    .sp
334    By default, \fBpcretest\fP uses the standard PCRE matching function,
335    \fBpcre_exec()\fP, to match each data line. From release 6.0, PCRE supports an
336    alternative matching function, \fBpcre_dfa_exec()\fP, which operates in a
337    different way, and has some restrictions. The differences between the two
338    functions are described in the
339    .\" HREF
340    \fBpcrematching\fP
341    .\"
342    documentation.
343    .P
344    If a data line contains the \eD escape sequence, or if the command line
345    contains the \fB-dfa\fP option, the alternative matching function is called.
346    This function finds all possible matches at a given point. If, however, the \eF
347    escape sequence is present in the data line, it stops after the first match is
348    found. This is always the shortest possible match.
349    .
350    .
351    .SH "DEFAULT OUTPUT FROM PCRETEST"
352  .rs  .rs
353  .sp  .sp
354    This section describes the output when the normal matching function,
355    \fBpcre_exec()\fP, is being used.
356    .P
357  When a match succeeds, pcretest outputs the list of captured substrings that  When a match succeeds, pcretest outputs the list of captured substrings that
358  \fBpcre_exec()\fP returns, starting with number 0 for the string that matched  \fBpcre_exec()\fP returns, starting with number 0 for the string that matched
359  the whole pattern. Otherwise, it outputs "No match" or "Partial match"  the whole pattern. Otherwise, it outputs "No match" or "Partial match"
360  when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,  when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
361  respectively, and otherwise the PCRE negative error number. Here is an example  respectively, and otherwise the PCRE negative error number. Here is an example
362  of an interactive pcretest run.  of an interactive \fBpcretest\fP run.
363  .sp  .sp
364    $ pcretest    $ pcretest
365    PCRE version 5.00 07-Sep-2004    PCRE version 5.00 07-Sep-2004
# Line 365  prompt is used for continuations), data Line 408  prompt is used for continuations), data
408  included in data by means of the \en escape.  included in data by means of the \en escape.
409  .  .
410  .  .
411    .SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION"
412    .rs
413    .sp
414    When the alternative matching function, \fBpcre_dfa_exec()\fP, is used (by
415    means of the \eD escape sequence or the \fB-dfa\fP command line option), the
416    output consists of a list of all the matches that start at the first point in
417    the subject where there is at least one match. For example:
418    .sp
419        re> /(tang|tangerine|tan)/
420      data> yellow tangerine\eD
421       0: tangerine
422       1: tang
423       2: tan
424    .sp
425    (Using the normal matching function on this data finds only "tang".) The
426    longest matching string is always given first (and numbered zero).
427    .P
428    If \fB/g\fP is present on the pattern, the search for further matches resumes
429    at the end of the longest match. For example:
430    .sp
431        re> /(tang|tangerine|tan)/g
432      data> yellow tangerine and tangy sultana\eD
433       0: tangerine
434       1: tang
435       2: tan
436       0: tang
437       1: tan
438       0: tan
439    .sp
440    Since the matching function does not support substring capture, the escape
441    sequences that are concerned with captured substrings are not relevant.
442    .
443    .
444    .SH "RESTARTING AFTER A PARTIAL MATCH"
445    .rs
446    .sp
447    When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
448    indicating that the subject partially matched the pattern, you can restart the
449    match with additional subject data by means of the \eR escape sequence. For
450    example:
451    .sp
452        re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
453      data> 23ja\eP\eD
454      Partial match: 23ja
455      data> n05\eR\eD
456       0: n05
457    .sp
458    For further information about partial matching, see the
459    .\" HREF
460    \fBpcrepartial\fP
461    .\"
462    documentation.
463    .
464    .
465  .SH CALLOUTS  .SH CALLOUTS
466  .rs  .rs
467  .sp  .sp
468  If the pattern contains any callout requests, \fBpcretest\fP's callout function  If the pattern contains any callout requests, \fBpcretest\fP's callout function
469  is called during matching. By default, it displays the callout number, the  is called during matching. This works with both matching functions. By default,
470  start and current positions in the text at the callout time, and the next  the called function displays the callout number, the start and current
471  pattern item to be tested. For example, the output  positions in the text at the callout time, and the next pattern item to be
472    tested. For example, the output
473  .sp  .sp
474    --->pqrabcdef    --->pqrabcdef
475      0    ^  ^     \ed      0    ^  ^     \ed
# Line 396  example: Line 494  example:
494     0: E*     0: E*
495  .sp  .sp
496  The callout function in \fBpcretest\fP returns zero (carry on matching) by  The callout function in \fBpcretest\fP returns zero (carry on matching) by
497  default, but you can use an \eC item in a data line (as described above) to  default, but you can use a \eC item in a data line (as described above) to
498  change this.  change this.
499  .P  .P
500  Inserting callouts can be helpful when using \fBpcretest\fP to check  Inserting callouts can be helpful when using \fBpcretest\fP to check
# Line 471  result is undefined. Line 569  result is undefined.
569  .SH AUTHOR  .SH AUTHOR
570  .rs  .rs
571  .sp  .sp
572  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel
573  .br  .br
574  University Computing Service,  University Computing Service,
575  .br  .br
576  Cambridge CB2 3QG, England.  Cambridge CB2 3QG, England.
577  .P  .P
578  .in 0  .in 0
579  Last updated: 10 September 2004  Last updated: 18 January 2006
580  .br  .br
581  Copyright (c) 1997-2004 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
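
The output ordering that the new "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION" section describes can be illustrated with a small emulation. The sketch below is not PCRE and does not call \fBpcre_dfa_exec()\fP; it assumes the pattern is a plain alternation of literal strings, and the function names `first_match_group` and `all_match_groups` are hypothetical. It only reproduces the documented behaviour: report every match that starts at the first matching point, longest first, and with /g resume at the end of the longest match.

```python
def first_match_group(alternatives, subject, pos=0):
    """Emulate the documented DFA-style output for literal alternatives.

    Returns (start, matches) for the first position at or after `pos`
    where at least one alternative matches, with `matches` ordered
    longest first. Returns None if nothing matches.
    This is an illustration only, not how PCRE is implemented.
    """
    for start in range(pos, len(subject)):
        # Empty alternatives are skipped so that every match has length > 0.
        found = {alt for alt in alternatives
                 if alt and subject.startswith(alt, start)}
        if found:
            return start, sorted(found, key=len, reverse=True)
    return None


def all_match_groups(alternatives, subject):
    """Emulate /g: resume the search at the end of the longest match."""
    groups = []
    pos = 0
    while True:
        result = first_match_group(alternatives, subject, pos)
        if result is None:
            return groups
        start, matches = result
        groups.append(matches)
        pos = start + len(matches[0])  # end of the longest match


if __name__ == "__main__":
    alts = ["tang", "tangerine", "tan"]
    for group in all_match_groups(alts, "yellow tangerine and tangy sultana"):
        for number, text in enumerate(group):
            print(f"{number:2d}: {text}")
```

Run on the two subjects used in the diff, the emulation lists the same strings in the same order as the pcretest examples. Real patterns are not limited to literal alternatives, so this is only a way to check one's reading of the ordering rule.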
