Diff of /code/trunk/doc/pcre.txt

revision 123 by ph10, Mon Mar 12 15:19:06 2007 UTC revision 148 by ph10, Mon Apr 16 13:25:10 2007 UTC
# Line 244  PCRE BUILD-TIME OPTIONS Line 244  PCRE BUILD-TIME OPTIONS
244    
245           ./configure --help           ./configure --help
246    
247         The following sections describe certain options whose names begin  with         The following sections include  descriptions  of  options  whose  names
248         --enable  or  --disable. These settings specify changes to the defaults         begin with --enable or --disable. These settings specify changes to the
249         for the configure command. Because of the  way  that  configure  works,         defaults for the configure command. Because of the way  that  configure
250         --enable  and  --disable  always  come  in  pairs, so the complementary         works,  --enable  and --disable always come in pairs, so the complemen-
251         option always exists as well, but as it specifies the  default,  it  is         tary option always exists as well, but as it specifies the default,  it
252         not described.         is not described.
253    
254    
255  C++ SUPPORT  C++ SUPPORT
# Line 288  UNICODE CHARACTER PROPERTY SUPPORT Line 288  UNICODE CHARACTER PROPERTY SUPPORT
288         to the configure command. This implies UTF-8 support, even if you  have         to the configure command. This implies UTF-8 support, even if you  have
289         not explicitly requested it.         not explicitly requested it.
290    
291         Including  Unicode  property  support  adds around 90K of tables to the         Including  Unicode  property  support  adds around 30K of tables to the
292         PCRE library, approximately doubling its size. Only the  general  cate-         PCRE library. Only the general category properties such as  Lu  and  Nd
293         gory  properties  such as Lu and Nd are supported. Details are given in         are supported. Details are given in the pcrepattern documentation.
        the pcrepattern documentation.  
294    
295    
296  CODE VALUE OF NEWLINE  CODE VALUE OF NEWLINE
297    
298         By default, PCRE interprets character 10 (linefeed, LF)  as  indicating         By  default,  PCRE interprets character 10 (linefeed, LF) as indicating
299         the  end  of  a line. This is the normal newline character on Unix-like         the end of a line. This is the normal newline  character  on  Unix-like
300         systems. You can compile PCRE to use character 13 (carriage return, CR)         systems. You can compile PCRE to use character 13 (carriage return, CR)
301         instead, by adding         instead, by adding
302    
303           --enable-newline-is-cr           --enable-newline-is-cr
304    
305         to  the  configure  command.  There  is  also  a --enable-newline-is-lf         to the  configure  command.  There  is  also  a  --enable-newline-is-lf
306         option, which explicitly specifies linefeed as the newline character.         option, which explicitly specifies linefeed as the newline character.
307    
308         Alternatively, you can specify that line endings are to be indicated by         Alternatively, you can specify that line endings are to be indicated by
# Line 317  CODE VALUE OF NEWLINE Line 316  CODE VALUE OF NEWLINE
316    
317         which causes PCRE to recognize any Unicode newline sequence.         which causes PCRE to recognize any Unicode newline sequence.
318    
319         Whatever  line  ending convention is selected when PCRE is built can be         Whatever line ending convention is selected when PCRE is built  can  be
320         overridden when the library functions are called. At build time  it  is         overridden  when  the library functions are called. At build time it is
321         conventional to use the standard for your operating system.         conventional to use the standard for your operating system.
322    
323    
324  BUILDING SHARED AND STATIC LIBRARIES  BUILDING SHARED AND STATIC LIBRARIES
325    
326         The  PCRE building process uses libtool to build both shared and static         The PCRE building process uses libtool to build both shared and  static
327         Unix libraries by default. You can suppress one of these by adding  one         Unix  libraries by default. You can suppress one of these by adding one
328         of         of
329    
330           --disable-shared           --disable-shared
# Line 337  BUILDING SHARED AND STATIC LIBRARIES Line 336  BUILDING SHARED AND STATIC LIBRARIES
336  POSIX MALLOC USAGE  POSIX MALLOC USAGE
337    
338         When PCRE is called through the POSIX interface (see the pcreposix doc-         When PCRE is called through the POSIX interface (see the pcreposix doc-
339         umentation), additional working storage is  required  for  holding  the         umentation),  additional  working  storage  is required for holding the
340         pointers  to capturing substrings, because PCRE requires three integers         pointers to capturing substrings, because PCRE requires three  integers
341         per substring, whereas the POSIX interface provides only  two.  If  the         per  substring,  whereas  the POSIX interface provides only two. If the
342         number of expected substrings is small, the wrapper function uses space         number of expected substrings is small, the wrapper function uses space
343         on the stack, because this is faster than using malloc() for each call.         on the stack, because this is faster than using malloc() for each call.
344         The default threshold above which the stack is no longer used is 10; it         The default threshold above which the stack is no longer used is 10; it
# Line 352  POSIX MALLOC USAGE Line 351  POSIX MALLOC USAGE
351    
352  HANDLING VERY LARGE PATTERNS  HANDLING VERY LARGE PATTERNS
353    
354         Within a compiled pattern, offset values are used  to  point  from  one         Within  a  compiled  pattern,  offset values are used to point from one
355         part  to another (for example, from an opening parenthesis to an alter-         part to another (for example, from an opening parenthesis to an  alter-
356         nation metacharacter). By default, two-byte values are used  for  these         nation  metacharacter).  By default, two-byte values are used for these
357         offsets,  leading  to  a  maximum size for a compiled pattern of around         offsets, leading to a maximum size for a  compiled  pattern  of  around
358         64K. This is sufficient to handle all but the most  gigantic  patterns.         64K.  This  is sufficient to handle all but the most gigantic patterns.
359         Nevertheless,  some  people do want to process enormous patterns, so it         Nevertheless, some people do want to process enormous patterns,  so  it
360         is possible to compile PCRE to use three-byte or four-byte  offsets  by         is  possible  to compile PCRE to use three-byte or four-byte offsets by
361         adding a setting such as         adding a setting such as
362    
363           --with-link-size=3           --with-link-size=3
364    
365         to  the  configure  command.  The value given must be 2, 3, or 4. Using         to the configure command. The value given must be 2,  3,  or  4.  Using
366         longer offsets slows down the operation of PCRE because it has to  load         longer  offsets slows down the operation of PCRE because it has to load
367         additional bytes when handling them.         additional bytes when handling them.
368    
        If  you  build  PCRE with an increased link size, test 2 (and test 5 if  
        you are using UTF-8) will fail. Part of the output of these tests is  a  
        representation  of the compiled pattern, and this changes with the link  
        size.  
   
369    
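The relationship between link size and the "around 64K" figure above can be checked with a little arithmetic. The following is only a sketch: max_link_offset() is a hypothetical helper, not part of PCRE, and it computes the largest unsigned value that fits in the given number of bytes. The limit that PCRE actually enforces for the larger link sizes may be lower than this.

```c
#include <assert.h>

/* Hypothetical helper (not a PCRE function): the largest unsigned value
   that can be stored in link_size bytes. The configure option described
   above accepts only 2, 3, or 4. */
static unsigned long long max_link_offset(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return (1ULL << (8 * link_size)) - 1ULL;
}
```

With two-byte offsets the result is 65535, which matches the default limit of around 64K for a compiled pattern.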
370  AVOIDING EXCESSIVE STACK USAGE  AVOIDING EXCESSIVE STACK USAGE
371    
# Line 429  LIMITING PCRE RESOURCE USAGE Line 423  LIMITING PCRE RESOURCE USAGE
423         time.         time.
424    
425    
426    CREATING CHARACTER TABLES AT BUILD TIME
427    
428           PCRE uses fixed tables for processing characters whose code values  are
429           less  than 256. By default, PCRE is built with a set of tables that are
430           distributed in the file pcre_chartables.c.dist. These  tables  are  for
431           ASCII codes only. If you add
432    
433             --enable-rebuild-chartables
434    
435           to  the  configure  command, the distributed tables are no longer used.
436           Instead, a program called dftables is compiled and  run.  This  outputs
437           the source for a new set of tables, created in the default locale of your
438           C runtime system. (This method of replacing the tables does not work if
439           you  are cross compiling, because dftables is run on the local host. If
440           you need to create alternative tables when cross  compiling,  you  will
441           have to do so "by hand".)
442    
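To give a feel for what locale-dependent tables for code values below 256 might contain, here is a toy sketch. It is not the dftables program and does not reproduce the layout of pcre_chartables.c; the function toy_build_tables() and the TOY_ flags are invented for illustration. It simply asks the C runtime, in whatever locale is current, how each of the 256 code values is classified.

```c
#include <ctype.h>

/* Hypothetical flag bits, chosen for this illustration only. */
enum { TOY_LETTER = 0x01, TOY_DIGIT = 0x02, TOY_SPACE = 0x04 };

/* Fill a lower-casing table and a type-flag table for code values 0..255,
   using the classification functions of the current C locale. */
static void toy_build_tables(unsigned char lower[256], unsigned char flags[256])
{
    for (int c = 0; c < 256; c++) {
        lower[c] = (unsigned char)tolower(c);
        flags[c] = 0;
        if (isalpha(c)) flags[c] |= TOY_LETTER;
        if (isdigit(c)) flags[c] |= TOY_DIGIT;
        if (isspace(c)) flags[c] |= TOY_SPACE;
    }
}
```

Because the results depend on the locale in force when the function runs, tables generated on one host need not suit another, which is consistent with the cross-compiling caveat above.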
443    
444  USING EBCDIC CODE  USING EBCDIC CODE
445    
446         PCRE assumes by default that it will run in an  environment  where  the         PCRE  assumes  by  default that it will run in an environment where the
447         character  code  is  ASCII  (or Unicode, which is a superset of ASCII).         character code is ASCII (or Unicode, which is  a  superset  of  ASCII).
448         PCRE can, however, be compiled to  run  in  an  EBCDIC  environment  by         PCRE  can,  however,  be  compiled  to  run in an EBCDIC environment by
449         adding         adding
450    
451           --enable-ebcdic           --enable-ebcdic
452    
453         to the configure command.         to the configure command. This setting implies --enable-rebuild-charta-
454           bles.
455    
456    
457  SEE ALSO  SEE ALSO
# Line 455  AUTHOR Line 468  AUTHOR
468    
469  REVISION  REVISION
470    
471         Last updated: 06 March 2007         Last updated: 20 March 2007
472         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
473  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
474    
# Line 508  REGULAR EXPRESSIONS AS TREES Line 521  REGULAR EXPRESSIONS AS TREES
521    
522  THE STANDARD MATCHING ALGORITHM  THE STANDARD MATCHING ALGORITHM
523    
524         In the terminology of Jeffrey Friedl's book Mastering  Regular  Expres-         In the terminology of Jeffrey Friedl's book "Mastering Regular  Expres-
525         sions,  the  standard  algorithm  is  an "NFA algorithm". It conducts a         sions",  the  standard  algorithm  is an "NFA algorithm". It conducts a
526         depth-first search of the pattern tree. That is, it  proceeds  along  a         depth-first search of the pattern tree. That is, it  proceeds  along  a
527         single path through the tree, checking that the subject matches what is         single path through the tree, checking that the subject matches what is
528         required. When there is a mismatch, the algorithm  tries  any  alterna-         required. When there is a mismatch, the algorithm  tries  any  alterna-
# Line 1310  STUDYING A PATTERN Line 1323  STUDYING A PATTERN
1323  LOCALE SUPPORT  LOCALE SUPPORT
1324    
1325         PCRE handles caseless matching, and determines whether  characters  are         PCRE handles caseless matching, and determines whether  characters  are
1326         letters  digits,  or whatever, by reference to a set of tables, indexed         letters,  digits, or whatever, by reference to a set of tables, indexed
1327         by character value. When running in UTF-8 mode, this  applies  only  to         by character value. When running in UTF-8 mode, this  applies  only  to
1328         characters  with  codes  less than 128. Higher-valued codes never match         characters  with  codes  less than 128. Higher-valued codes never match
1329         escapes such as \w or \d, but can be tested with \p if  PCRE  is  built         escapes such as \w or \d, but can be tested with \p if  PCRE  is  built
1330         with  Unicode  character property support. The use of locales with Uni-         with  Unicode  character property support. The use of locales with Uni-
1331         code is discouraged.         code is discouraged. If you are handling characters with codes  greater
1332           than  128, you should either use UTF-8 and Unicode, or use locales, but
1333         An internal set of tables is created in the default C locale when  PCRE         not try to mix the two.
1334         is  built.  This  is  used when the final argument of pcre_compile() is  
1335         NULL, and is sufficient for many applications. An  alternative  set  of         PCRE contains an internal set of tables that are used  when  the  final
1336         tables  can,  however, be supplied. These may be created in a different         argument  of  pcre_compile()  is  NULL.  These  are sufficient for many
1337         locale from the default. As more and more applications change to  using         applications.  Normally, the internal tables recognize only ASCII char-
1338         Unicode, the need for this locale support is expected to die away.         acters. However, when PCRE is built, it is possible to cause the inter-
1339           nal tables to be rebuilt in the default "C" locale of the local system,
1340         External  tables  are  built by calling the pcre_maketables() function,         which may cause them to be different.
1341         which has no arguments, in the relevant locale. The result can then  be  
1342         passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For         The  internal tables can always be overridden by tables supplied by the
1343         example, to build and use tables that are appropriate  for  the  French         application that calls PCRE. These may be created in a different locale
1344         locale  (where  accented  characters  with  values greater than 128 are         from  the  default.  As more and more applications change to using Uni-
1345           code, the need for this locale support is expected to die away.
1346    
1347           External tables are built by calling  the  pcre_maketables()  function,
1348           which  has no arguments, in the relevant locale. The result can then be
1349           passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
1350           example,  to  build  and use tables that are appropriate for the French
1351           locale (where accented characters with  values  greater  than  128  are
1352         treated as letters), the following code could be used:         treated as letters), the following code could be used:
1353    
1354           setlocale(LC_CTYPE, "fr_FR");           setlocale(LC_CTYPE, "fr_FR");
1355           tables = pcre_maketables();           tables = pcre_maketables();
1356           re = pcre_compile(..., tables);           re = pcre_compile(..., tables);
1357    
1358           The  locale  name "fr_FR" is used on Linux and other Unix-like systems;
1359           if you are using Windows, the name for the French locale is "french".
1360    
1361         When pcre_maketables() runs, the tables are built  in  memory  that  is         When pcre_maketables() runs, the tables are built  in  memory  that  is
1362         obtained  via  pcre_malloc. It is the caller's responsibility to ensure         obtained  via  pcre_malloc. It is the caller's responsibility to ensure
1363         that the memory containing the tables remains available for as long  as         that the memory containing the tables remains available for as long  as
# Line 2132  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 2155  EXTRACTING CAPTURED SUBSTRINGS BY NAME
2155    
2156         These  functions call pcre_get_stringnumber(), and if it succeeds, they         These  functions call pcre_get_stringnumber(), and if it succeeds, they
2157         then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-         then call pcre_copy_substring() or pcre_get_substring(),  as  appropri-
2158         ate.         ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the
2159           behaviour may not be what you want (see the next section).
2160    
2161    
2162  DUPLICATE SUBPATTERN NAMES  DUPLICATE SUBPATTERN NAMES
# Line 2140  DUPLICATE SUBPATTERN NAMES Line 2164  DUPLICATE SUBPATTERN NAMES
2164         int pcre_get_stringtable_entries(const pcre *code,         int pcre_get_stringtable_entries(const pcre *code,
2165              const char *name, char **first, char **last);              const char *name, char **first, char **last);
2166    
2167         When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for         When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for
2168         subpatterns are not required to  be  unique.  Normally,  patterns  with         subpatterns  are  not  required  to  be unique. Normally, patterns with
2169         duplicate  names  are such that in any one match, only one of the named         duplicate names are such that in any one match, only one of  the  named
2170         subpatterns participates. An example is shown in the pcrepattern  docu-         subpatterns  participates. An example is shown in the pcrepattern docu-
2171         mentation. When duplicates are present, pcre_copy_named_substring() and         mentation. When duplicates are present, pcre_copy_named_substring() and
2172         pcre_get_named_substring() return the first substring corresponding  to         pcre_get_named_substring()  return the first substring corresponding to
2173         the  given  name  that  is  set.  If  none  are set, an empty string is         the given name that is set.  If  none  are  set,  an  empty  string  is
2174         returned.  The pcre_get_stringnumber() function returns one of the num-         returned.  The pcre_get_stringnumber() function returns one of the num-
2175         bers  that are associated with the name, but it is not defined which it         bers that are associated with the name, but it is not defined which  it
2176         is.         is.
2177    
2178         If you want to get full details of all captured substrings for a  given         If  you want to get full details of all captured substrings for a given
2179         name,  you  must  use  the pcre_get_stringtable_entries() function. The         name, you must use  the  pcre_get_stringtable_entries()  function.  The
2180         first argument is the compiled pattern, and the second is the name. The         first argument is the compiled pattern, and the second is the name. The
2181         third  and  fourth  are  pointers to variables which are updated by the         third and fourth are pointers to variables which  are  updated  by  the
2182         function. After it has run, they point to the first and last entries in         function. After it has run, they point to the first and last entries in
2183         the  name-to-number  table  for  the  given  name.  The function itself         the name-to-number table  for  the  given  name.  The  function  itself
2184         returns the length of each entry,  or  PCRE_ERROR_NOSUBSTRING  (-7)  if         returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
2185         there  are none. The format of the table is described above in the sec-         there are none. The format of the table is described above in the  sec-
2186         tion entitled Information about a  pattern.   Given  all  the  relevant         tion  entitled  Information  about  a  pattern.  Given all the relevant
2187         entries  for the name, you can extract each of their numbers, and hence         entries for the name, you can extract each of their numbers, and  hence
2188         the captured data, if any.         the captured data, if any.
2189    
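The last step above, extracting the numbers from the entries, can be sketched as follows. This assumes the table layout given in the section on information about a pattern as we understand it: each entry starts with the two-byte group number, most significant byte first, followed by the zero-terminated name. The helper collect_group_numbers() is hypothetical and does not call PCRE; it expects the first and last pointers and the entry length that pcre_get_stringtable_entries() yields.

```c
/* Hypothetical helper (not a PCRE function). Walks the name-table entries
   from first to last inclusive, entry_size bytes apart, and stores each
   entry's group number in out[]. Returns how many numbers were stored,
   which is at most out_max. */
static int collect_group_numbers(const char *first, const char *last,
                                 int entry_size, int *out, int out_max)
{
    int count = 0;
    for (const char *entry = first; entry <= last; entry += entry_size) {
        const unsigned char *p = (const unsigned char *)entry;
        if (count >= out_max)
            break;
        /* Two-byte number, most significant byte first (assumed layout). */
        out[count++] = (p[0] << 8) | p[1];
    }
    return count;
}
```

Each number collected this way can then be used with the ovector from the match to see whether that group was set and, if so, what it captured.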
2190    
2191  FINDING ALL POSSIBLE MATCHES  FINDING ALL POSSIBLE MATCHES
2192    
2193         The traditional matching function uses a  similar  algorithm  to  Perl,         The  traditional  matching  function  uses a similar algorithm to Perl,
2194         which stops when it finds the first match, starting at a given point in         which stops when it finds the first match, starting at a given point in
2195         the subject. If you want to find all possible matches, or  the  longest         the  subject.  If you want to find all possible matches, or the longest
2196         possible  match,  consider using the alternative matching function (see         possible match, consider using the alternative matching  function  (see
2197         below) instead. If you cannot use the alternative function,  but  still         below)  instead.  If you cannot use the alternative function, but still
2198         need  to  find all possible matches, you can kludge it up by making use         need to find all possible matches, you can kludge it up by  making  use
2199         of the callout facility, which is described in the pcrecallout documen-         of the callout facility, which is described in the pcrecallout documen-
2200         tation.         tation.
2201    
2202         What you have to do is to insert a callout right at the end of the pat-         What you have to do is to insert a callout right at the end of the pat-
2203         tern.  When your callout function is called, extract and save the  cur-         tern.   When your callout function is called, extract and save the cur-
2204         rent  matched  substring.  Then  return  1, which forces pcre_exec() to         rent matched substring. Then return  1,  which  forces  pcre_exec()  to
2205         backtrack and try other alternatives. Ultimately, when it runs  out  of         backtrack  and  try other alternatives. Ultimately, when it runs out of
2206         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.         matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
2207    
2208    
# Line 2189  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2213  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2213              int options, int *ovector, int ovecsize,              int options, int *ovector, int ovecsize,
2214              int *workspace, int wscount);              int *workspace, int wscount);
2215    
2216         The  function  pcre_dfa_exec()  is  called  to  match  a subject string         The function pcre_dfa_exec()  is  called  to  match  a  subject  string
2217         against a compiled pattern, using a matching algorithm that  scans  the         against  a  compiled pattern, using a matching algorithm that scans the
2218         subject  string  just  once, and does not backtrack. This has different         subject string just once, and does not backtrack.  This  has  different
2219         characteristics to the normal algorithm, and  is  not  compatible  with         characteristics  to  the  normal  algorithm, and is not compatible with
2220         Perl.  Some  of the features of PCRE patterns are not supported. Never-         Perl. Some of the features of PCRE patterns are not  supported.  Never-
2221         theless, there are times when this kind of matching can be useful.  For         theless,  there are times when this kind of matching can be useful. For
2222         a discussion of the two matching algorithms, see the pcrematching docu-         a discussion of the two matching algorithms, see the pcrematching docu-
2223         mentation.         mentation.
2224    
2225         The arguments for the pcre_dfa_exec() function  are  the  same  as  for         The  arguments  for  the  pcre_dfa_exec()  function are the same as for
2226         pcre_exec(), plus two extras. The ovector argument is used in a differ-         pcre_exec(), plus two extras. The ovector argument is used in a differ-
2227         ent way, and this is described below. The other  common  arguments  are         ent  way,  and  this is described below. The other common arguments are
2228         used  in  the  same way as for pcre_exec(), so their description is not         used in the same way as for pcre_exec(), so their  description  is  not
2229         repeated here.         repeated here.
2230    
2231         The two additional arguments provide workspace for  the  function.  The         The  two  additional  arguments provide workspace for the function. The
2232         workspace  vector  should  contain at least 20 elements. It is used for         workspace vector should contain at least 20 elements. It  is  used  for
2233         keeping  track  of  multiple  paths  through  the  pattern  tree.  More         keeping  track  of  multiple  paths  through  the  pattern  tree.  More
2234         workspace  will  be  needed for patterns and subjects where there are a         workspace will be needed for patterns and subjects where  there  are  a
2235         lot of potential matches.         lot of potential matches.
2236    
2237         Here is an example of a simple call to pcre_dfa_exec():         Here is an example of a simple call to pcre_dfa_exec():
# Line 2229  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2253  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2253    
2254     Option bits for pcre_dfa_exec()     Option bits for pcre_dfa_exec()
2255    
2256         The unused bits of the options argument  for  pcre_dfa_exec()  must  be         The  unused  bits  of  the options argument for pcre_dfa_exec() must be
2257         zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-         zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW-
2258         LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,  PCRE_NO_UTF8_CHECK,         LINE_xxx,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK,
2259         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last         PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
2260         three of these are the same as for pcre_exec(), so their description is         three of these are the same as for pcre_exec(), so their description is
2261         not repeated here.         not repeated here.
2262    
2263           PCRE_PARTIAL           PCRE_PARTIAL
2264    
2265         This  has  the  same general effect as it does for pcre_exec(), but the         This has the same general effect as it does for  pcre_exec(),  but  the
2266         details  are  slightly  different.  When  PCRE_PARTIAL   is   set   for         details   are   slightly   different.  When  PCRE_PARTIAL  is  set  for
2267         pcre_dfa_exec(),  the  return code PCRE_ERROR_NOMATCH is converted into         pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is  converted  into
2268         PCRE_ERROR_PARTIAL if the end of the subject  is  reached,  there  have         PCRE_ERROR_PARTIAL  if  the  end  of the subject is reached, there have
2269         been no complete matches, but there is still at least one matching pos-         been no complete matches, but there is still at least one matching pos-
2270         sibility. The portion of the string that provided the partial match  is         sibility.  The portion of the string that provided the partial match is
2271         set as the first matching string.         set as the first matching string.
2272    
2273           PCRE_DFA_SHORTEST           PCRE_DFA_SHORTEST
2274    
2275         Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to         Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to
2276         stop as soon as it has found one match. Because of the way the alterna-         stop as soon as it has found one match. Because of the way the alterna-
2277         tive  algorithm  works, this is necessarily the shortest possible match         tive algorithm works, this is necessarily the shortest  possible  match
2278         at the first possible matching point in the subject string.         at the first possible matching point in the subject string.
2279    
2280           PCRE_DFA_RESTART           PCRE_DFA_RESTART
2281    
2282         When pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option,  and         When  pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option, and
2283         returns  a  partial  match, it is possible to call it again, with addi-         returns a partial match, it is possible to call it  again,  with  addi-
2284         tional subject characters, and have it continue with  the  same  match.         tional  subject  characters,  and have it continue with the same match.
2285         The  PCRE_DFA_RESTART  option requests this action; when it is set, the         The PCRE_DFA_RESTART option requests this action; when it is  set,  the
2286         workspace and wscount options must reference the same vector as  before         workspace  and wscount options must reference the same vector as before
2287         because  data  about  the  match so far is left in them after a partial         because data about the match so far is left in  them  after  a  partial
2288         match. There is more discussion of this  facility  in  the  pcrepartial         match.  There  is  more  discussion of this facility in the pcrepartial
2289         documentation.         documentation.
2290    
2291     Successful returns from pcre_dfa_exec()     Successful returns from pcre_dfa_exec()
2292    
2293         When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-         When pcre_dfa_exec() succeeds, it may have matched more than  one  sub-
2294         string in the subject. Note, however, that all the matches from one run         string in the subject. Note, however, that all the matches from one run
2295         of  the  function  start  at the same point in the subject. The shorter         of the function start at the same point in  the  subject.  The  shorter
2296         matches are all initial substrings of the longer matches. For  example,         matches  are all initial substrings of the longer matches. For example,
2297         if the pattern         if the pattern
2298    
2299           <.*>           <.*>
# Line 2284  MATCHING A PATTERN: THE ALTERNATIVE FUNC Line 2308  MATCHING A PATTERN: THE ALTERNATIVE FUNC
2308           <something> <something else>           <something> <something else>
2309           <something> <something else> <something further>           <something> <something else> <something further>
2310    
2311         On  success,  the  yield of the function is a number greater than zero,         On success, the yield of the function is a number  greater  than  zero,
2312         which is the number of matched substrings.  The  substrings  themselves         which  is  the  number of matched substrings. The substrings themselves
2313         are  returned  in  ovector. Each string uses two elements; the first is         are returned in ovector. Each string uses two elements;  the  first  is
2314         the offset to the start, and the second is the offset to  the  end.  In         the  offset  to  the start, and the second is the offset to the end. In
2315         fact,  all  the  strings  have the same start offset. (Space could have         fact, all the strings have the same start  offset.  (Space  could  have
2316         been saved by giving this only once, but it was decided to retain  some         been  saved by giving this only once, but it was decided to retain some
2317         compatibility  with  the  way pcre_exec() returns data, even though the         compatibility with the way pcre_exec() returns data,  even  though  the
2318         meaning of the strings is different.)         meaning of the strings is different.)
2319    
2320         The strings are returned in reverse order of length; that is, the long-         The strings are returned in reverse order of length; that is, the long-
2321         est  matching  string is given first. If there were too many matches to         est matching string is given first. If there were too many  matches  to
2322         fit into ovector, the yield of the function is zero, and the vector  is         fit  into ovector, the yield of the function is zero, and the vector is
2323         filled with the longest matches.         filled with the longest matches.
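
The ovector layout described above (one start/end offset pair per match, longest match first) can be read back with a small helper. This is a minimal sketch, not part of PCRE: the name copy_dfa_match and its parameters are hypothetical, and it assumes count is the positive value yielded by pcre_dfa_exec().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a PCRE function.
   Copies match number idx (0 = longest) from subject into buf as a
   NUL-terminated string. ovector holds (start, end) offset pairs;
   count is the number of pairs that were filled in.
   Returns the match length, or -1 if idx is out of range or buf is
   too small to hold the match plus the terminator. */
int copy_dfa_match(const char *subject, const int *ovector, int count,
                   int idx, char *buf, size_t bufsize)
{
    int start, end, len;

    if (idx < 0 || idx >= count)
        return -1;

    start = ovector[2 * idx];      /* same value for every match */
    end   = ovector[2 * idx + 1];  /* decreases as idx increases */
    len   = end - start;

    if (len < 0 || (size_t)len + 1 > bufsize)
        return -1;

    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

Because the strings are returned in reverse order of length, calling the helper with idx 0 gives the longest match and idx count-1 gives the shortest. A yield of zero (too many matches for ovector) needs separate handling by the caller, since the helper is not told how many pairs fit in the vector.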
2324    
2325     Error returns from pcre_dfa_exec()     Error returns from pcre_dfa_exec()
2326    
2327         The  pcre_dfa_exec()  function returns a negative number when it fails.         The pcre_dfa_exec() function returns a negative number when  it  fails.
2328         Many of the errors are the same  as  for  pcre_exec(),  and  these  are         Many  of  the  errors  are  the  same as for pcre_exec(), and these are
2329         described  above.   There are in addition the following errors that are         described above.  There are in addition the following errors  that  are
2330         specific to pcre_dfa_exec():         specific to pcre_dfa_exec():
2331    
2332           PCRE_ERROR_DFA_UITEM      (-16)           PCRE_ERROR_DFA_UITEM      (-16)
2333    
2334         This return is given if pcre_dfa_exec() encounters an item in the  pat-         This  return is given if pcre_dfa_exec() encounters an item in the pat-
2335         tern  that  it  does not support, for instance, the use of \C or a back         tern that it does not support, for instance, the use of \C  or  a  back
2336         reference.         reference.
2337    
2338           PCRE_ERROR_DFA_UCOND      (-17)           PCRE_ERROR_DFA_UCOND      (-17)
2339    
2340         This return is given if pcre_dfa_exec()  encounters  a  condition  item         This  return  is  given  if pcre_dfa_exec() encounters a condition item
2341         that  uses  a back reference for the condition, or a test for recursion         that uses a back reference for the condition, or a test  for  recursion
2342         in a specific group. These are not supported.         in a specific group. These are not supported.
2343    
2344           PCRE_ERROR_DFA_UMLIMIT    (-18)           PCRE_ERROR_DFA_UMLIMIT    (-18)
2345    
2346         This return is given if pcre_dfa_exec() is called with an  extra  block         This  return  is given if pcre_dfa_exec() is called with an extra block
2347         that contains a setting of the match_limit field. This is not supported         that contains a setting of the match_limit field. This is not supported
2348         (it is meaningless).         (it is meaningless).
2349    
2350           PCRE_ERROR_DFA_WSSIZE     (-19)           PCRE_ERROR_DFA_WSSIZE     (-19)
2351    
2352         This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the         This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
2353         workspace vector.         workspace vector.
2354    
2355           PCRE_ERROR_DFA_RECURSE    (-20)           PCRE_ERROR_DFA_RECURSE    (-20)
2356    
2357         When  a  recursive subpattern is processed, the matching function calls         When a recursive subpattern is processed, the matching  function  calls
2358         itself recursively, using private vectors for  ovector  and  workspace.         itself  recursively,  using  private vectors for ovector and workspace.
2359         This  error  is  given  if  the output vector is not large enough. This         This error is given if the output vector  is  not  large  enough.  This
2360         should be extremely rare, as a vector of size 1000 is used.         should be extremely rare, as a vector of size 1000 is used.
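
The error codes listed above can be turned into short diagnostic messages. The sketch below is an illustration only: the enum names and the function dfa_error_text are hypothetical, and the numeric values are copied from the list above rather than taken from pcre.h, so real code would include pcre.h and use the PCRE_ERROR_DFA_* names instead.

```c
/* Local copies of the values listed in the text above; real code would
   use the PCRE_ERROR_DFA_* macros from pcre.h instead. */
enum {
    ERR_DFA_UITEM   = -16,
    ERR_DFA_UCOND   = -17,
    ERR_DFA_UMLIMIT = -18,
    ERR_DFA_WSSIZE  = -19,
    ERR_DFA_RECURSE = -20
};

/* Hypothetical helper: maps a pcre_dfa_exec() return value to a short
   description. Non-negative values are not errors. */
const char *dfa_error_text(int rc)
{
    switch (rc) {
    case ERR_DFA_UITEM:
        return "unsupported pattern item (for example \\C or a back reference)";
    case ERR_DFA_UCOND:
        return "unsupported condition (back reference or recursion test)";
    case ERR_DFA_UMLIMIT:
        return "match_limit set in extra block (not supported)";
    case ERR_DFA_WSSIZE:
        return "workspace vector too small";
    case ERR_DFA_RECURSE:
        return "output vector too small during recursion";
    default:
        return rc < 0 ? "error shared with pcre_exec()" : "not an error";
    }
}
```

A caller would typically test for a negative return first and only then look up the message. Of these codes, ERR_DFA_WSSIZE is the one a caller can most directly address, by retrying with a larger workspace vector.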
2361    
2362    
2363  SEE ALSO  SEE ALSO
2364    
2365         pcrebuild(3), pcrecallout(3), pcrecpp(3)(3), pcrematching(3),  pcrepar-         pcrebuild(3),  pcrecallout(3), pcrecpp(3)(3), pcrematching(3), pcrepar-
2366         tial(3),  pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).         tial(3), pcreposix(3), pcreprecompile(3), pcresample(3),  pcrestack(3).
2367    
2368    
2369  AUTHOR  AUTHOR
# Line 2904  BACKSLASH Line 2928  BACKSLASH
2928         is a letter or digit. The definition of  letters  and  digits  is  con-         is a letter or digit. The definition of  letters  and  digits  is  con-
2929         trolled  by PCRE's low-valued character tables, and may vary if locale-         trolled  by PCRE's low-valued character tables, and may vary if locale-
2930         specific matching is taking place (see "Locale support" in the  pcreapi         specific matching is taking place (see "Locale support" in the  pcreapi
2931         page).  For  example,  in  the  "fr_FR" (French) locale, some character         page).  For  example,  in  a French locale such as "fr_FR" in Unix-like
2932         codes greater than 128 are used for accented  letters,  and  these  are         systems, or "french" in Windows, some character codes greater than  128
2933         matched by \w.         are used for accented letters, and these are matched by \w.
2934    
2935         In  UTF-8 mode, characters with values greater than 128 never match \d,         In  UTF-8 mode, characters with values greater than 128 never match \d,
2936         \s, or \w, and always match \D, \S, and \W. This is true even when Uni-         \s, or \w, and always match \D, \S, and \W. This is true even when Uni-
# Line 3275  SQUARE BRACKETS AND CHARACTER CLASSES Line 3299  SQUARE BRACKETS AND CHARACTER CLASSES
3299         If a range that includes letters is used when caseless matching is set,         If a range that includes letters is used when caseless matching is set,
3300         it matches the letters in either case. For example, [W-c] is equivalent         it matches the letters in either case. For example, [W-c] is equivalent
3301         to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if         to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if
3302         character tables for the "fr_FR" locale are in use, [\xc8-\xcb] matches         character tables for a French locale are in  use,  [\xc8-\xcb]  matches
3303         accented  E  characters in both cases. In UTF-8 mode, PCRE supports the         accented  E  characters in both cases. In UTF-8 mode, PCRE supports the
3304         concept of case for characters with values greater than 128  only  when         concept of case for characters with values greater than 128  only  when
3305         it is compiled with Unicode property support.         it is compiled with Unicode property support.
# Line 4503  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe Line 4527  MULTI-SEGMENT MATCHING WITH pcre_dfa_exe
4527         not always produce exactly the same result as matching over one  single         not always produce exactly the same result as matching over one  single
4528         long  string.   The  difference arises when there are multiple matching         long  string.   The  difference arises when there are multiple matching
4529         possibilities, because a partial match result is given only when  there         possibilities, because a partial match result is given only when  there
4530         are  no  completed  matches  in a call to fBpcre_dfa_exec(). This means         are  no completed matches in a call to pcre_dfa_exec(). This means that
4531         that as soon as the shortest match has been found,  continuation  to  a         as soon as the shortest match has been found,  continuation  to  a  new
4532         new  subject  segment  is  no  longer possible.  Consider this pcretest         subject segment is no longer possible.  Consider this pcretest example:
        example:  
4533    
4534             re> /dog(sbody)?/             re> /dog(sbody)?/
4535           data> do\P\D           data> do\P\D
