--- code/trunk/doc/html/pcretest.html 2007/02/24 21:40:37 75 +++ code/trunk/doc/html/pcretest.html 2007/02/24 21:41:21 87 @@ -18,14 +18,17 @@
-pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source] +pcretest [-C] [-d] [-dfa] [-i] [-m] [-o osize] [-p] [-t] [source] [destination]
@@ -47,12 +50,18 @@
-d -Behave as if each regex had the /D (debug) modifier; the internal +Behave as if each regex has the /D (debug) modifier; the internal form is output after compilation.
+-dfa +Behave as if each data line contains the \D escape sequence; this causes the +alternative matching function, pcre_dfa_exec(), to be used instead of the +standard pcre_exec() function (more detail is given below). +
+-i -Behave as if each regex had the /I modifier; information about the +Behave as if each regex has the /I modifier; information about the compiled pattern is given after compilation.
@@ -70,8 +79,13 @@
-p -Behave as if each regex has /P modifier; the POSIX wrapper API is used -to call PCRE. None of the other options has any effect when -p is set. +Behave as if each regex has the /P modifier; the POSIX wrapper API is +used to call PCRE. None of the other options has any effect when -p is +set. +
++-q +Do not output the version number of pcretest at the start of execution.
-t @@ -152,6 +166,7 @@ /A PCRE_ANCHORED /C PCRE_AUTO_CALLOUT /E PCRE_DOLLAR_ENDONLY + /f PCRE_FIRSTLINE /N PCRE_NO_AUTO_CAPTURE /U PCRE_UNGREEDY /X PCRE_EXTRA @@ -274,14 +289,18 @@ \C!n return 1 instead of 0 when callout number n is reached \C!n!m return 1 instead of 0 when callout number n is reached for the mth time \C*n pass the number n (may be negative) as callout data; this is used as the callout return value + \D use the pcre_dfa_exec() match function + \F only shortest match for pcre_dfa_exec() \Gdd call pcre_get_substring() for substring dd after a successful match (number less than 32) \Gname call pcre_get_named_substring() for substring "name" after a successful match (name terminated by next non-alphanumeric character) \L call pcre_get_substringlist() after a successful match - \M discover the minimum MATCH_LIMIT setting + \M discover the minimum MATCH_LIMIT and + MATCH_LIMIT_RECURSION settings \N pass the PCRE_NOTEMPTY option to pcre_exec() \Odd set the size of the output vector passed to pcre_exec() to dd (any number of digits) - \P pass the PCRE_PARTIAL option to pcre_exec() + \P pass the PCRE_PARTIAL option to pcre_exec() or pcre_dfa_exec() + \R pass the PCRE_DFA_RESTART option to pcre_dfa_exec() \S output details of memory get/free calls during matching \Z pass the PCRE_NOTEOL option to pcre_exec() \? pass the PCRE_NO_UTF8_CHECK option to pcre_exec() @@ -294,13 +313,16 @@
If \M is present, pcretest calls pcre_exec() several times, with -different values in the match_limit field of the pcre_extra data -structure, until it finds the minimum number that is needed for -pcre_exec() to complete. This number is a measure of the amount of -recursion and backtracking that takes place, and checking it out can be -instructive. For most simple matches, the number is quite small, but for -patterns with very large numbers of matching possibilities, it can become large -very quickly with increasing length of subject string. +different values in the match_limit and match_limit_recursion +fields of the pcre_extra data structure, until it finds the minimum +numbers for each parameter that allow pcre_exec() to complete. The +match_limit number is a measure of the amount of backtracking that takes +place, and checking it out can be instructive. For most simple matches, the +number is quite small, but for patterns with very large numbers of matching +possibilities, it can become large very quickly with increasing length of +subject string. The match_limit_recursion number is a measure of how much +stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed +to complete the match attempt.
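The search for a minimum limit described above can be pictured with a small sketch. This is not pcretest's own code: it assumes a hypothetical callback, match_completes(limit), that stands in for one call to pcre_exec() with the given match_limit and reports whether the call finished without hitting the limit. It also assumes that the required limit is at least 1 and that any limit above the minimum also succeeds.

```python
def find_minimum_limit(match_completes, start=64):
    """Return the smallest limit for which match_completes(limit) is true.

    match_completes is a hypothetical stand-in for running the matcher
    with a given limit; it is not part of the PCRE API.
    """
    low = 0          # assumed to be too small to complete any match
    high = start
    # Phase 1: double the limit until the match completes.
    while not match_completes(high):
        low = high
        high *= 2
    # Phase 2: bisect between the largest failing and smallest passing value.
    while high - low > 1:
        mid = (low + high) // 2
        if match_completes(mid):
            high = mid
        else:
            low = mid
    return high


if __name__ == "__main__":
    # Simulated matcher that needs a limit of at least 1000 to complete.
    print(find_minimum_limit(lambda limit: limit >= 1000))
```

The same routine could be run a second time with a callback that varies match_limit_recursion instead, giving one minimum per parameter.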
When \O is used, the value specified may be higher or lower than the size set @@ -309,8 +331,9 @@
If the /P modifier was present on the pattern, causing the POSIX wrapper -API to be used, only \B and \Z have any effect, causing REG_NOTBOL and -REG_NOTEOL to be passed to regexec() respectively. +API to be used, the only option-setting sequences that have any effect are \B +and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to +regexec().
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use @@ -318,14 +341,35 @@ any number of hexadecimal digits inside the braces. The result is from one to six bytes, encoded according to the UTF-8 rules.
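As an illustration of how a value can occupy from one to six bytes, here is a minimal sketch of the original (pre-RFC 3629) UTF-8 layout, which extends to 31-bit values. The function name encode_utf8_extended is made up for this example, and the sketch is not taken from the pcretest source.

```python
def encode_utf8_extended(code_point):
    """Encode a value in 0..0x7FFFFFFF using the original UTF-8 layout.

    Illustrative only; the name and structure are assumptions made
    for this example.
    """
    if code_point < 0 or code_point > 0x7FFFFFFF:
        raise ValueError("value out of range for six-byte UTF-8")
    if code_point < 0x80:
        return bytes([code_point])
    # (largest value, total bytes, lead-byte marker) for each length.
    layouts = (
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    )
    for limit, length, lead in layouts:
        if code_point <= limit:
            break
    encoded = []
    # Emit continuation bytes (10xxxxxx) from the low-order end.
    for _ in range(length - 1):
        encoded.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # The remaining high-order bits go into the lead byte.
    encoded.append(lead | code_point)
    return bytes(reversed(encoded))


if __name__ == "__main__":
    for value in (0x41, 0xE9, 0x20AC, 0x7FFFFFFF):
        print(hex(value), encode_utf8_extended(value).hex())
```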
-+By default, pcretest uses the standard PCRE matching function, +pcre_exec(), to match each data line. From release 6.0, PCRE supports an +alternative matching function, pcre_dfa_exec(), which operates in a +different way, and has some restrictions. The differences between the two +functions are described in the +pcrematching +documentation. +
++If a data line contains the \D escape sequence, or if the command line +contains the -dfa option, the alternative matching function is called. +This function finds all possible matches at a given point. If, however, the \F +escape sequence is present in the data line, it stops after the first match is +found. This is always the shortest possible match. +
++This section describes the output when the normal matching function, +pcre_exec(), is being used. +
When a match succeeds, pcretest outputs the list of captured substrings that pcre_exec() returns, starting with number 0 for the string that matched the whole pattern. Otherwise, it outputs "No match" or "Partial match" when pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL, respectively, and otherwise the PCRE negative error number. Here is an example -of an interactive pcretest run. +of an interactive pcretest run.
$ pcretest PCRE version 5.00 07-Sep-2004 @@ -375,12 +419,62 @@ prompt is used for continuations), data lines may not. However newlines can be included in data by means of the \n escape. -
CALLOUTS
+
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
++When the alternative matching function, pcre_dfa_exec(), is used (by +means of the \D escape sequence or the -dfa command line option), the +output consists of a list of all the matches that start at the first point in +the subject where there is at least one match. For example: +
+ re> /(tang|tangerine|tan)/ + data> yellow tangerine\D + 0: tangerine + 1: tang + 2: tan ++(Using the normal matching function on this data finds only "tang".) The +longest matching string is always given first (and numbered zero). + ++If /g is present on the pattern, the search for further matches resumes +at the end of the longest match. For example: +
+ re> /(tang|tangerine|tan)/g + data> yellow tangerine and tangy sultana\D + 0: tangerine + 1: tang + 2: tan + 0: tang + 1: tan + 0: tan ++Since the matching function does not support substring capture, the escape +sequences that are concerned with captured substrings are not relevant. + +
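The shape of this output can be reproduced with a toy model. The sketch below does not call PCRE and is not a DFA matcher: it handles only a list of literal alternatives, and the function names are invented for the example. It finds the first position where any alternative matches, reports every match at that position with the longest first, and, in the /g case, resumes at the end of the longest match.

```python
def all_matches_at(alternatives, subject, start):
    """Return every alternative that matches at start, longest first."""
    found = {alt for alt in alternatives if subject.startswith(alt, start)}
    return sorted(found, key=len, reverse=True)


def dfa_style_search(alternatives, subject, global_search=False):
    """Toy model of the listing printed for \\D, with optional /g behaviour.

    Literal alternatives only; this is an illustration, not PCRE.
    """
    results = []
    pos = 0
    while pos <= len(subject):
        matches = all_matches_at(alternatives, subject, pos)
        if not matches:
            pos += 1
            continue
        results.append(matches)
        if not global_search:
            break
        # Resume at the end of the longest match (always advance).
        pos += max(len(matches[0]), 1)
    return results


if __name__ == "__main__":
    words = ["tang", "tangerine", "tan"]
    for group in dfa_style_search(
        words, "yellow tangerine and tangy sultana", global_search=True
    ):
        for number, text in enumerate(group):
            print(f" {number}: {text}")
```

In this model the last entry of each group is the shortest match, which corresponds to what \F would report on its own.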
RESTARTING AFTER A PARTIAL MATCH
++When the alternative matching function has given the PCRE_ERROR_PARTIAL return, +indicating that the subject partially matched the pattern, you can restart the +match with additional subject data by means of the \R escape sequence. For +example: +
+ re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ + data> 23ja\P\D + Partial match: 23ja + data> n05\R\D + 0: n05 ++For further information about partial matching, see the +pcrepartial +documentation. + +
CALLOUTS
If the pattern contains any callout requests, pcretest's callout function -is called during matching. By default, it displays the callout number, the -start and current positions in the text at the callout time, and the next -pattern item to be tested. For example, the output +is called during matching. This works with both matching functions. By default, +the called function displays the callout number, the start and current +positions in the text at the callout time, and the next pattern item to be +tested. For example, the output
--->pqrabcdef 0 ^ ^ \d @@ -406,7 +500,7 @@ 0: E*The callout function in pcretest returns zero (carry on matching) by -default, but you can use an \C item in a data line (as described above) to +default, but you can use a \C item in a data line (as described above) to change this.@@ -416,7 +510,7 @@ pcrecallout documentation.
-
SAVING AND RELOADING COMPILED PATTERNS
+
SAVING AND RELOADING COMPILED PATTERNS
The facilities described in this section are not available when the POSIX interface to PCRE is being used, that is, when the /P pattern modifier is @@ -478,18 +572,18 @@ Finally, if you attempt to load a file that is not in the correct format, the result is undefined.
-
AUTHOR
+
AUTHOR
-Philip Hazel <ph10@cam.ac.uk> +Philip Hazel
University Computing Service,
Cambridge CB2 3QG, England.-Last updated: 10 September 2004 +Last updated: 18 January 2006
-Copyright © 1997-2004 University of Cambridge. +Copyright © 1997-2006 University of Cambridge.Return to the PCRE index page.