--- code/trunk/doc/html/pcretest.html 2007/02/24 21:40:37 75
+++ code/trunk/doc/html/pcretest.html 2007/02/24 21:41:21 87
@@ -18,14 +18,17 @@
  • DESCRIPTION
  • PATTERN MODIFIERS
  • DATA LINES -
  • OUTPUT FROM PCRETEST -
  • CALLOUTS -
  • SAVING AND RELOADING COMPILED PATTERNS -
  • AUTHOR +
  • THE ALTERNATIVE MATCHING FUNCTION +
  • DEFAULT OUTPUT FROM PCRETEST +
  • OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION +
  • RESTARTING AFTER A PARTIAL MATCH +
  • CALLOUTS +
  • SAVING AND RELOADING COMPILED PATTERNS +
  • AUTHOR
    SYNOPSIS

    -pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source] +pcretest [-C] [-d] [-dfa] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]

    @@ -47,12 +50,18 @@

    -d -Behave as if each regex had the /D (debug) modifier; the internal +Behave as if each regex has the /D (debug) modifier; the internal form is output after compilation.

    +-dfa +Behave as if each data line contains the \D escape sequence; this causes the +alternative matching function, pcre_dfa_exec(), to be used instead of the +standard pcre_exec() function (more detail is given below). +

    +

    -i -Behave as if each regex had the /I modifier; information about the +Behave as if each regex has the /I modifier; information about the compiled pattern is given after compilation.

    @@ -70,8 +79,13 @@

    -p -Behave as if each regex has /P modifier; the POSIX wrapper API is used -to call PCRE. None of the other options has any effect when -p is set. +Behave as if each regex has the /P modifier; the POSIX wrapper API is +used to call PCRE. None of the other options has any effect when -p is +set. +

    +

    +-q +Do not output the version number of pcretest at the start of execution.

    -t
    @@ -152,6 +166,7 @@
       /A    PCRE_ANCHORED
       /C    PCRE_AUTO_CALLOUT
       /E    PCRE_DOLLAR_ENDONLY
    +  /f    PCRE_FIRSTLINE
       /N    PCRE_NO_AUTO_CAPTURE
       /U    PCRE_UNGREEDY
       /X    PCRE_EXTRA
    @@ -274,14 +289,18 @@
       \C!n     return 1 instead of 0 when callout number n is reached
       \C!n!m   return 1 instead of 0 when callout number n is reached for the mth time
       \C*n     pass the number n (may be negative) as callout data; this is used as the callout return value
    +  \D       use the pcre_dfa_exec() match function
    +  \F       only shortest match for pcre_dfa_exec()
       \Gdd     call pcre_get_substring() for substring dd after a successful match (number less than 32)
       \Gname   call pcre_get_named_substring() for substring "name" after a successful match (name terminated by next non-alphanumeric character)
       \L       call pcre_get_substringlist() after a successful match
    -  \M       discover the minimum MATCH_LIMIT setting
    +  \M       discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
       \N       pass the PCRE_NOTEMPTY option to pcre_exec()
       \Odd     set the size of the output vector passed to pcre_exec() to dd (any number of digits)
    -  \P       pass the PCRE_PARTIAL option to pcre_exec()
    +  \P       pass the PCRE_PARTIAL option to pcre_exec() or pcre_dfa_exec()
    +  \R       pass the PCRE_DFA_RESTART option to pcre_dfa_exec()
       \S       output details of memory get/free calls during matching
       \Z       pass the PCRE_NOTEOL option to pcre_exec()
       \?       pass the PCRE_NO_UTF8_CHECK option to pcre_exec()
    @@ -294,13 +313,16 @@

    If \M is present, pcretest calls pcre_exec() several times, with -different values in the match_limit field of the pcre_extra data -structure, until it finds the minimum number that is needed for -pcre_exec() to complete. This number is a measure of the amount of -recursion and backtracking that takes place, and checking it out can be -instructive. For most simple matches, the number is quite small, but for -patterns with very large numbers of matching possibilities, it can become large -very quickly with increasing length of subject string. +different values in the match_limit and match_limit_recursion +fields of the pcre_extra data structure, until it finds the minimum +numbers for each parameter that allow pcre_exec() to complete. The +match_limit number is a measure of the amount of backtracking that takes +place, and checking it out can be instructive. For most simple matches, the +number is quite small, but for patterns with very large numbers of matching +possibilities, it can become large very quickly with increasing length of +subject string. The match_limit_recursion number is a measure of how much +stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed +to complete the match attempt.
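The search that \M performs can be pictured as a doubling phase followed by a binary search. The sketch below is only an illustration of that idea, not pcretest's actual code: `succeeds_with` is a hypothetical callback standing in for one pcre_exec() call with a given limit, and the starting guess of 64 is an assumption. It also assumes that a match which completes at some limit completes at every larger limit.

```python
def find_minimum_limit(succeeds_with, start=64):
    """Return the smallest positive limit for which succeeds_with(limit) is true.

    succeeds_with is a hypothetical stand-in for "run the match with this
    limit and report whether it completed"; it must be monotonic.
    """
    low = 0        # treated as a limit that is too small
    high = start   # candidate upper bound (starting value is an assumption)

    # Doubling phase: grow the bound until the match attempt completes.
    while not succeeds_with(high):
        low = high
        high *= 2

    # Binary search: low is too small, high is sufficient.
    while high - low > 1:
        mid = (low + high) // 2
        if succeeds_with(mid):
            high = mid
        else:
            low = mid
    return high


if __name__ == "__main__":
    # Fake matcher that needs a limit of at least 1000 to complete.
    print(find_minimum_limit(lambda limit: limit >= 1000))
```

The same routine could be run twice, once per field, to obtain separate minimum values for match_limit and match_limit_recursion.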

    When \O is used, the value specified may be higher or lower than the size set @@ -309,8 +331,9 @@

    If the /P modifier was present on the pattern, causing the POSIX wrapper -API to be used, only \B and \Z have any effect, causing REG_NOTBOL and -REG_NOTEOL to be passed to regexec() respectively. +API to be used, the only option-setting sequences that have any effect are \B +and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to +regexec().

    The use of \x{hh...} to represent UTF-8 characters is not dependent on the use @@ -318,14 +341,35 @@ any number of hexadecimal digits inside the braces. The result is from one to six bytes, encoded according to the UTF-8 rules.
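As an illustration of how a value given as \x{hh...} can occupy from one to six bytes, here is a minimal sketch of the original (31-bit) UTF-8 encoding scheme. It assumes that this is the scheme meant by "the UTF-8 rules" above; the function name `encode_utf8_extended` is made up for the example and is not part of PCRE or pcretest.

```python
def encode_utf8_extended(code_point):
    """Encode a value in 0..0x7FFFFFFF as 1 to 6 bytes (original UTF-8 scheme)."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("value out of range for 6-byte UTF-8")
    if code_point < 0x80:
        return bytes([code_point])  # single byte, same as ASCII

    # (exclusive upper limit, total length in bytes, lead-byte marker bits)
    layouts = [
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ]
    for limit, length, lead in layouts:
        if code_point < limit:
            break

    # Peel off six bits at a time for the continuation bytes (10xxxxxx).
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6

    # Whatever bits remain go into the lead byte.
    return bytes([lead | code_point] + tail[::-1])


if __name__ == "__main__":
    for value in (0x41, 0xE9, 0x20AC, 0x1F600, 0x7FFFFFFF):
        print(hex(value), encode_utf8_extended(value).hex())
```

For values up to 0x10FFFF (excluding surrogates) the result agrees with modern UTF-8; the five- and six-byte forms exist only in the older definition.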

    -
    OUTPUT FROM PCRETEST
    +
    THE ALTERNATIVE MATCHING FUNCTION
    +

    +By default, pcretest uses the standard PCRE matching function, +pcre_exec(), to match each data line. From release 6.0, PCRE supports an +alternative matching function, pcre_dfa_exec(), which operates in a +different way, and has some restrictions. The differences between the two +functions are described in the +pcrematching +documentation. +

    +

    +If a data line contains the \D escape sequence, or if the command line +contains the -dfa option, the alternative matching function is called. +This function finds all possible matches at a given point. If, however, the \F +escape sequence is present in the data line, it stops after the first match is +found. This is always the shortest possible match. +

    +
    DEFAULT OUTPUT FROM PCRETEST
    +

    +This section describes the output when the normal matching function, +pcre_exec(), is being used. +

    When a match succeeds, pcretest outputs the list of captured substrings that pcre_exec() returns, starting with number 0 for the string that matched the whole pattern. Otherwise, it outputs "No match" or "Partial match" when pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL, respectively, and otherwise the PCRE negative error number. Here is an example -of an interactive pcretest run. +of an interactive pcretest run.

       $ pcretest
       PCRE version 5.00 07-Sep-2004
    @@ -375,12 +419,62 @@
     prompt is used for continuations), data lines may not. However newlines can be
     included in data by means of the \n escape.
     

    -
    CALLOUTS
    +
    OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
    +

    +When the alternative matching function, pcre_dfa_exec(), is used (by +means of the \D escape sequence or the -dfa command line option), the +output consists of a list of all the matches that start at the first point in +the subject where there is at least one match. For example: +

    +    re> /(tang|tangerine|tan)/
    +  data> yellow tangerine\D
    +   0: tangerine
    +   1: tang
    +   2: tan
    +
    +(Using the normal matching function on this data finds only "tang".) The +longest matching string is always given first (and numbered zero). +

    +

    +If /g is present on the pattern, the search for further matches resumes +at the end of the longest match. For example: +

    +    re> /(tang|tangerine|tan)/g
    +  data> yellow tangerine and tangy sultana\D
    +   0: tangerine
    +   1: tang
    +   2: tan
    +   0: tang
    +   1: tan
    +   0: tan
    +
    +Since the matching function does not support substring capture, the escape +sequences that are concerned with captured substrings are not relevant. +
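The behaviour shown in the two examples above can be imitated with a brute-force sketch. This is not how pcre_dfa_exec() works internally, and it uses Python's `re` module rather than PCRE; the function names and the `shortest_only` parameter (a rough analogue of \F) are invented for illustration. It simply tries every end position at the first start position that yields a match, and lists the results longest first.

```python
import re


def dfa_style_matches(pattern, subject, start=0, shortest_only=False):
    """Return (position, matches) for the first position >= start that has a match.

    matches holds every string matching at that position, longest first.
    With shortest_only=True only the shortest one is kept.
    """
    compiled = re.compile(pattern)
    for begin in range(start, len(subject) + 1):
        # Try the longest candidate first so the list is ordered by length.
        found = [
            subject[begin:end]
            for end in range(len(subject), begin - 1, -1)
            if compiled.fullmatch(subject, begin, end)
        ]
        if found:
            return begin, (found[-1:] if shortest_only else found)
    return None, []


def dfa_style_global(pattern, subject):
    """Repeat the search, resuming at the end of the longest match each time."""
    results = []
    pos = 0
    while pos <= len(subject):
        begin, found = dfa_style_matches(pattern, subject, pos)
        if begin is None:
            break
        results.append(found)
        # Step at least one character so an empty match cannot loop forever.
        pos = begin + max(len(found[0]), 1)
    return results


if __name__ == "__main__":
    pattern = r"(tang|tangerine|tan)"
    print(dfa_style_matches(pattern, "yellow tangerine"))
    print(dfa_style_global(pattern, "yellow tangerine and tangy sultana"))
```

Because the candidates are generated by length rather than by alternative order, the list comes out as "tangerine", "tang", "tan", mirroring the numbering in the pcretest output.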

    +
    RESTARTING AFTER A PARTIAL MATCH
    +

    +When the alternative matching function has given the PCRE_ERROR_PARTIAL return, +indicating that the subject partially matched the pattern, you can restart the +match with additional subject data by means of the \R escape sequence. For +example: +

    +    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
    +  data> 23ja\P\D
    +  Partial match: 23ja
    +  data> n05\R\D
    +   0: n05
    +
    +For further information about partial matching, see the +pcrepartial +documentation. +

    +
    CALLOUTS

    If the pattern contains any callout requests, pcretest's callout function -is called during matching. By default, it displays the callout number, the -start and current positions in the text at the callout time, and the next -pattern item to be tested. For example, the output +is called during matching. This works with both matching functions. By default, +the called function displays the callout number, the start and current +positions in the text at the callout time, and the next pattern item to be +tested. For example, the output

       --->pqrabcdef
         0    ^  ^     \d
    @@ -406,7 +500,7 @@
        0: E*
     
    The callout function in pcretest returns zero (carry on matching) by -default, but you can use an \C item in a data line (as described above) to +default, but you can use a \C item in a data line (as described above) to change this.

    @@ -416,7 +510,7 @@ pcrecallout documentation.

    -
    SAVING AND RELOADING COMPILED PATTERNS
    +
    SAVING AND RELOADING COMPILED PATTERNS

    The facilities described in this section are not available when the POSIX interface to PCRE is being used, that is, when the /P pattern modifier is @@ -478,18 +572,18 @@ Finally, if you attempt to load a file that is not in the correct format, the result is undefined.

    -
    AUTHOR
    +
    AUTHOR

    -Philip Hazel <ph10@cam.ac.uk> +Philip Hazel
    University Computing Service,
    Cambridge CB2 3QG, England.

    -Last updated: 10 September 2004 +Last updated: 18 January 2006
    -Copyright © 1997-2004 University of Cambridge. +Copyright © 1997-2006 University of Cambridge.

    Return to the PCRE index page.