Diff of /code/trunk/doc/pcreapi.3

revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC revision 87 by nigel, Sat Feb 24 21:41:21 2007 UTC
# Line 1  Line 1 
1  .TH PCRE 3  .TH PCREAPI 3
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .SH "PCRE NATIVE API"  .SH "PCRE NATIVE API"
# Line 15  PCRE - Perl-compatible regular expressio Line 15  PCRE - Perl-compatible regular expressio
15  .B const unsigned char *\fItableptr\fP);  .B const unsigned char *\fItableptr\fP);
16  .PP  .PP
17  .br  .br
18    .B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP,
19    .ti +5n
20    .B int *\fIerrorcodeptr\fP,
21    .ti +5n
22    .B const char **\fIerrptr\fP, int *\fIerroffset\fP,
23    .ti +5n
24    .B const unsigned char *\fItableptr\fP);
25    .PP
26    .br
27  .B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,  .B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,
28  .ti +5n  .ti +5n
29  .B const char **\fIerrptr\fP);  .B const char **\fIerrptr\fP);
# Line 27  PCRE - Perl-compatible regular expressio Line 36  PCRE - Perl-compatible regular expressio
36  .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);  .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
37  .PP  .PP
38  .br  .br
39    .B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
40    .ti +5n
41    .B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
42    .ti +5n
43    .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
44    .ti +5n
45    .B int *\fIworkspace\fP, int \fIwscount\fP);
46    .PP
47    .br
48  .B int pcre_copy_named_substring(const pcre *\fIcode\fP,  .B int pcre_copy_named_substring(const pcre *\fIcode\fP,
49  .ti +5n  .ti +5n
50  .B const char *\fIsubject\fP, int *\fIovector\fP,  .B const char *\fIsubject\fP, int *\fIovector\fP,
# Line 87  PCRE - Perl-compatible regular expressio Line 105  PCRE - Perl-compatible regular expressio
105  .B *\fIfirstcharptr\fP);  .B *\fIfirstcharptr\fP);
106  .PP  .PP
107  .br  .br
108    .B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP);
109    .PP
110    .br
111  .B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP);  .B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP);
112  .PP  .PP
113  .br  .br
# Line 111  PCRE - Perl-compatible regular expressio Line 132  PCRE - Perl-compatible regular expressio
132  .SH "PCRE API OVERVIEW"  .SH "PCRE API OVERVIEW"
133  .rs  .rs
134  .sp  .sp
135  PCRE has its own native API, which is described in this document. There is also  PCRE has its own native API, which is described in this document. There is
136  a set of wrapper functions that correspond to the POSIX regular expression API.  also a set of wrapper functions that correspond to the POSIX regular expression
137  These are described in the  API. These are described in the
138  .\" HREF  .\" HREF
139  \fBpcreposix\fP  \fBpcreposix\fP
140  .\"  .\"
141  documentation.  documentation. Both of these APIs define a set of C function calls. A C++
142    wrapper is distributed with PCRE. It is documented in the
143    .\" HREF
144    \fBpcrecpp\fP
145    .\"
146    page.
147  .P  .P
148  The native API function prototypes are defined in the header file \fBpcre.h\fP,  The native API C function prototypes are defined in the header file
149  and on Unix systems the library itself is called \fBlibpcre\fP. It can  \fBpcre.h\fP, and on Unix systems the library itself is called \fBlibpcre\fP.
150  normally be accessed by adding \fB-lpcre\fP to the command for linking an  It can normally be accessed by adding \fB-lpcre\fP to the command for linking
151  application that uses PCRE. The header file defines the macros PCRE_MAJOR and  an application that uses PCRE. The header file defines the macros PCRE_MAJOR
152  PCRE_MINOR to contain the major and minor release numbers for the library.  and PCRE_MINOR to contain the major and minor release numbers for the library.
153  Applications can use these to include support for different releases of PCRE.  Applications can use these to include support for different releases of PCRE.
154  .P  .P
155  The functions \fBpcre_compile()\fP, \fBpcre_study()\fP, and \fBpcre_exec()\fP  The functions \fBpcre_compile()\fP, \fBpcre_compile2()\fP, \fBpcre_study()\fP,
156  are used for compiling and matching regular expressions. A sample program that  and \fBpcre_exec()\fP are used for compiling and matching regular expressions
157  demonstrates the simplest way of using them is provided in the file called  in a Perl-compatible manner. A sample program that demonstrates the simplest
158  \fIpcredemo.c\fP in the source distribution. The  way of using them is provided in the file called \fIpcredemo.c\fP in the source
159    distribution. The
160  .\" HREF  .\" HREF
161  \fBpcresample\fP  \fBpcresample\fP
162  .\"  .\"
163  documentation describes how to run it.  documentation describes how to run it.
164  .P  .P
165    A second matching function, \fBpcre_dfa_exec()\fP, which is not
166    Perl-compatible, is also provided. This uses a different algorithm for the
167    matching. This allows it to find all possible matches (at a given point in the
168    subject), not just one. However, this algorithm does not return captured
169    substrings. A description of the two matching algorithms and their advantages
170    and disadvantages is given in the
171    .\" HREF
172    \fBpcrematching\fP
173    .\"
174    documentation.
175    .P
176  In addition to the main compiling and matching functions, there are convenience  In addition to the main compiling and matching functions, there are convenience
177  functions for extracting captured substrings from a matched subject string.  functions for extracting captured substrings from a subject string that is
178  They are:  matched by \fBpcre_exec()\fP. They are:
179  .sp  .sp
180    \fBpcre_copy_substring()\fP    \fBpcre_copy_substring()\fP
181    \fBpcre_copy_named_substring()\fP    \fBpcre_copy_named_substring()\fP
# Line 150  They are: Line 188  They are:
188  provided, to free the memory used for extracted strings.  provided, to free the memory used for extracted strings.
189  .P  .P
190  The function \fBpcre_maketables()\fP is used to build a set of character tables  The function \fBpcre_maketables()\fP is used to build a set of character tables
191  in the current locale for passing to \fBpcre_compile()\fP or \fBpcre_exec()\fP.  in the current locale for passing to \fBpcre_compile()\fP, \fBpcre_exec()\fP,
192  This is an optional facility that is provided for specialist use. Most  or \fBpcre_dfa_exec()\fP. This is an optional facility that is provided for
193  commonly, no special tables are passed, in which case internal tables that are  specialist use. Most commonly, no special tables are passed, in which case
194  generated when PCRE is built are used.  internal tables that are generated when PCRE is built are used.
195  .P  .P
196  The function \fBpcre_fullinfo()\fP is used to find out information about a  The function \fBpcre_fullinfo()\fP is used to find out information about a
197  compiled pattern; \fBpcre_info()\fP is an obsolete version that returns only  compiled pattern; \fBpcre_info()\fP is an obsolete version that returns only
# Line 161  some of the available information, but i Line 199  some of the available information, but i
199  The function \fBpcre_version()\fP returns a pointer to a string containing the  The function \fBpcre_version()\fP returns a pointer to a string containing the
200  version of PCRE and its date of release.  version of PCRE and its date of release.
201  .P  .P
202    The function \fBpcre_refcount()\fP maintains a reference count in a data block
203    containing a compiled pattern. This is provided for the benefit of
204    object-oriented applications.
205    .P
206  The global variables \fBpcre_malloc\fP and \fBpcre_free\fP initially contain  The global variables \fBpcre_malloc\fP and \fBpcre_free\fP initially contain
207  the entry points of the standard \fBmalloc()\fP and \fBfree()\fP functions,  the entry points of the standard \fBmalloc()\fP and \fBfree()\fP functions,
208  respectively. PCRE calls the memory management functions via these variables,  respectively. PCRE calls the memory management functions via these variables,
# Line 170  should be done before calling any PCRE f Line 212  should be done before calling any PCRE f
212  The global variables \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP are also  The global variables \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP are also
213  indirections to memory management functions. These special functions are used  indirections to memory management functions. These special functions are used
214  only when PCRE is compiled to use the heap for remembering data, instead of  only when PCRE is compiled to use the heap for remembering data, instead of
215  recursive function calls. This is a non-standard way of building PCRE, for use  recursive function calls, when running the \fBpcre_exec()\fP function. This is
216  in environments that have limited stacks. Because of the greater use of memory  a non-standard way of building PCRE, for use in environments that have limited
217  management, it runs more slowly. Separate functions are provided so that  stacks. Because of the greater use of memory management, it runs more slowly.
218  special-purpose external code can be used for this case. When used, these  Separate functions are provided so that special-purpose external code can be
219  functions are always called in a stack-like manner (last obtained, first  used for this case. When used, these functions are always called in a
220  freed), and always for memory blocks of the same size.  stack-like manner (last obtained, first freed), and always for memory blocks of
221    the same size.
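
The usage pattern described above (last obtained, first freed, with all blocks the same size) suits a simple free-list cache. The following is a minimal sketch under that assumption, not code taken from PCRE. The names pool_node, pool_malloc and pool_free are hypothetical; only their signatures are chosen to be compatible with the pcre_stack_malloc and pcre_stack_free variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free-list node; a freed block is reused to hold the link. */
typedef struct pool_node {
    struct pool_node *next;
} pool_node;

static pool_node *pool_free_list = NULL; /* most recently freed block first */
static size_t pool_block_size = 0;       /* fixed by the first request */

/* Return a cached block when one is available, otherwise ask malloc(). */
static void *pool_malloc(size_t size)
{
    if (size < sizeof(pool_node))
        size = sizeof(pool_node);        /* leave room for the free-list link */
    if (pool_block_size == 0)
        pool_block_size = size;
    if (size != pool_block_size)
        return NULL;                     /* this sketch serves one size only */
    if (pool_free_list != NULL) {
        pool_node *node = pool_free_list;
        pool_free_list = node->next;
        return node;
    }
    return malloc(size);
}

/* Keep the block for reuse instead of returning it to the system. */
static void pool_free(void *block)
{
    pool_node *node = block;
    assert(block != NULL);
    node->next = pool_free_list;
    pool_free_list = node;
}
```

An application that links against a PCRE built to use the heap could, under these assumptions, assign pool_malloc to pcre_stack_malloc and pool_free to pcre_stack_free before calling any PCRE function. The sketch never returns cached blocks to the system and is not thread-safe.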
222  .P  .P
223  The global variable \fBpcre_callout\fP initially contains NULL. It can be set  The global variable \fBpcre_callout\fP initially contains NULL. It can be set
224  by the caller to a "callout" function, which PCRE will then call at specified  by the caller to a "callout" function, which PCRE will then call at specified
# Line 266  The output is an integer that gives the Line 309  The output is an integer that gives the
309  internal matching function calls in a \fBpcre_exec()\fP execution. Further  internal matching function calls in a \fBpcre_exec()\fP execution. Further
310  details are given with \fBpcre_exec()\fP below.  details are given with \fBpcre_exec()\fP below.
311  .sp  .sp
312      PCRE_CONFIG_MATCH_LIMIT_RECURSION
313    .sp
314    The output is an integer that gives the default limit for the depth of
315    recursion when calling the internal matching function in a \fBpcre_exec()\fP
316    execution. Further details are given with \fBpcre_exec()\fP below.
317    .sp
318    PCRE_CONFIG_STACKRECURSE    PCRE_CONFIG_STACKRECURSE
319  .sp  .sp
320  The output is an integer that is set to one if internal recursion is  The output is an integer that is set to one if internal recursion when running
321  implemented by recursive function calls that use the stack to remember their  \fBpcre_exec()\fP is implemented by recursive function calls that use the stack
322  state. This is the usual way that PCRE is compiled. The output is zero if PCRE  to remember their state. This is the usual way that PCRE is compiled. The
323  was compiled to use blocks of data on the heap instead of recursive function  output is zero if PCRE was compiled to use blocks of data on the heap instead
324  calls. In this case, \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP are  of recursive function calls. In this case, \fBpcre_stack_malloc\fP and
325  called to manage memory blocks on the heap, thus avoiding the use of the stack.  \fBpcre_stack_free\fP are called to manage memory blocks on the heap, thus
326    avoiding the use of the stack.
327  .  .
328  .  .
329  .SH "COMPILING A PATTERN"  .SH "COMPILING A PATTERN"
# Line 284  called to manage memory blocks on the he Line 334  called to manage memory blocks on the he
334  .B const char **\fIerrptr\fP, int *\fIerroffset\fP,  .B const char **\fIerrptr\fP, int *\fIerroffset\fP,
335  .ti +5n  .ti +5n
336  .B const unsigned char *\fItableptr\fP);  .B const unsigned char *\fItableptr\fP);
337    .sp
338    .B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP,
339    .ti +5n
340    .B int *\fIerrorcodeptr\fP,
341    .ti +5n
342    .B const char **\fIerrptr\fP, int *\fIerroffset\fP,
343    .ti +5n
344    .B const unsigned char *\fItableptr\fP);
345  .P  .P
346  The function \fBpcre_compile()\fP is called to compile a pattern into an  Either of the functions \fBpcre_compile()\fP or \fBpcre_compile2()\fP can be
347  internal form. The pattern is a C string terminated by a binary zero, and  called to compile a pattern into an internal form. The only difference between
348  is passed in the \fIpattern\fP argument. A pointer to a single block of memory  the two interfaces is that \fBpcre_compile2()\fP has an additional argument,
349  that is obtained via \fBpcre_malloc\fP is returned. This contains the compiled  \fIerrorcodeptr\fP, via which a numerical error code can be returned.
350  code and related data. The \fBpcre\fP type is defined for the returned block;  .P
351  this is a typedef for a structure whose contents are not externally defined. It  The pattern is a C string terminated by a binary zero, and is passed in the
352  is up to the caller to free the memory when it is no longer required.  \fIpattern\fP argument. A pointer to a single block of memory that is obtained
353    via \fBpcre_malloc\fP is returned. This contains the compiled code and related
354    data. The \fBpcre\fP type is defined for the returned block; this is a typedef
355    for a structure whose contents are not externally defined. It is up to the
356    caller to free the memory when it is no longer required.
357  .P  .P
358  Although the compiled code of a PCRE regex is relocatable, that is, it does not  Although the compiled code of a PCRE regex is relocatable, that is, it does not
359  depend on memory location, the complete \fBpcre\fP data block is not  depend on memory location, the complete \fBpcre\fP data block is not
# Line 314  time. Line 376  time.
376  If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.  If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately.
377  Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns  Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns
378  NULL, and sets the variable pointed to by \fIerrptr\fP to point to a textual  NULL, and sets the variable pointed to by \fIerrptr\fP to point to a textual
379  error message. The offset from the start of the pattern to the character where  error message. This is a static string that is part of the library. You must
380  the error was discovered is placed in the variable pointed to by  not try to free it. The offset from the start of the pattern to the character
381    where the error was discovered is placed in the variable pointed to by
382  \fIerroffset\fP, which must not be NULL. If it is, an immediate error is given.  \fIerroffset\fP, which must not be NULL. If it is, an immediate error is given.
383  .P  .P
384    If \fBpcre_compile2()\fP is used instead of \fBpcre_compile()\fP, and the
385    \fIerrorcodeptr\fP argument is not NULL, a non-zero error code number is
386    returned via this argument in the event of an error. This is in addition to the
387    textual error message. Error codes and messages are listed below.
388    .P
389  If the final argument, \fItableptr\fP, is NULL, PCRE uses a default set of  If the final argument, \fItableptr\fP, is NULL, PCRE uses a default set of
390  character tables that are built when PCRE is compiled, using the default C  character tables that are built when PCRE is compiled, using the default C
391  locale. Otherwise, \fItableptr\fP must be an address that is the result of a  locale. Otherwise, \fItableptr\fP must be an address that is the result of a
# Line 362  documentation. Line 430  documentation.
430  .sp  .sp
431  If this bit is set, letters in the pattern match both upper and lower case  If this bit is set, letters in the pattern match both upper and lower case
432  letters. It is equivalent to Perl's /i option, and it can be changed within a  letters. It is equivalent to Perl's /i option, and it can be changed within a
433  pattern by a (?i) option setting. When running in UTF-8 mode, case support for  pattern by a (?i) option setting. In UTF-8 mode, PCRE always understands the
434  high-valued characters is available only when PCRE is built with Unicode  concept of case for characters whose values are less than 128, so caseless
435  character property support.  matching is always possible. For characters with higher values, the concept of
436    case is supported if PCRE is compiled with Unicode property support, but not
437    otherwise. If you want to use caseless matching for characters 128 and above,
438    you must ensure that PCRE is compiled with Unicode property support as well as
439    with UTF-8 support.
440  .sp  .sp
441    PCRE_DOLLAR_ENDONLY    PCRE_DOLLAR_ENDONLY
442  .sp  .sp
# Line 408  special meaning is treated as a literal. Line 480  special meaning is treated as a literal.
480  controlled by this option. It can also be set by a (?X) option setting within a  controlled by this option. It can also be set by a (?X) option setting within a
481  pattern.  pattern.
482  .sp  .sp
483      PCRE_FIRSTLINE
484    .sp
485    If this option is set, an unanchored pattern is required to match before or at
486    the first newline character in the subject string, though the matched text may
487    continue over the newline.
488    .sp
489    PCRE_MULTILINE    PCRE_MULTILINE
490  .sp  .sp
491  By default, PCRE treats the subject string as consisting of a single line of  By default, PCRE treats the subject string as consisting of a single line of
# Line 463  automatically checked. If an invalid UTF Line 541  automatically checked. If an invalid UTF
541  valid, and you want to skip this check for performance reasons, you can set the  valid, and you want to skip this check for performance reasons, you can set the
542  PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid  PCRE_NO_UTF8_CHECK option. When it is set, the effect of passing an invalid
543  UTF-8 string as a pattern is undefined. It may cause your program to crash.  UTF-8 string as a pattern is undefined. It may cause your program to crash.
544  Note that this option can also be passed to \fBpcre_exec()\fP, to suppress the  Note that this option can also be passed to \fBpcre_exec()\fP and
545  UTF-8 validity checking of subject strings.  \fBpcre_dfa_exec()\fP, to suppress the UTF-8 validity checking of subject
546    strings.
547    .
548    .
549    .SH "COMPILATION ERROR CODES"
550    .rs
551    .sp
552    The following table lists the error codes that may be returned by
553    \fBpcre_compile2()\fP, along with the error messages that may be returned by
554    both compiling functions.
555    .sp
556       0  no error
557       1  \e at end of pattern
558       2  \ec at end of pattern
559       3  unrecognized character follows \e
560       4  numbers out of order in {} quantifier
561       5  number too big in {} quantifier
562       6  missing terminating ] for character class
563       7  invalid escape sequence in character class
564       8  range out of order in character class
565       9  nothing to repeat
566      10  operand of unlimited repeat could match the empty string
567      11  internal error: unexpected repeat
568      12  unrecognized character after (?
569      13  POSIX named classes are supported only within a class
570      14  missing )
571      15  reference to non-existent subpattern
572      16  erroffset passed as NULL
573      17  unknown option bit(s) set
574      18  missing ) after comment
575      19  parentheses nested too deeply
576      20  regular expression too large
577      21  failed to get memory
578      22  unmatched parentheses
579      23  internal error: code overflow
580      24  unrecognized character after (?<
581      25  lookbehind assertion is not fixed length
582      26  malformed number after (?(
583      27  conditional group contains more than two branches
584      28  assertion expected after (?(
585      29  (?R or (?digits must be followed by )
586      30  unknown POSIX class name
587      31  POSIX collating elements are not supported
588      32  this version of PCRE is not compiled with PCRE_UTF8 support
589      33  spare error
590      34  character value in \ex{...} sequence is too large
591      35  invalid condition (?(0)
592      36  \eC not allowed in lookbehind assertion
593      37  PCRE does not support \eL, \el, \eN, \eU, or \eu
594      38  number after (?C is > 255
595      39  closing ) for (?C expected
596      40  recursive call could loop indefinitely
597      41  unrecognized character after (?P
598      42  syntax error after (?P
599      43  two named groups have the same name
600      44  invalid UTF-8 string
601      45  support for \eP, \ep, and \eX has not been compiled
602      46  malformed \eP or \ep sequence
603      47  unknown property name after \eP or \ep
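
A numerical code lets a caller branch on the kind of failure without comparing message text. As an illustration only, the sketch below keeps a private copy of the first ten entries of the table above and looks them up by code. The array compile_error_text and the helper describe_compile_error are hypothetical names, not part of the PCRE API; in real use the message arrives through the errptr argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local copy of codes 0 to 9 from the table above. */
static const char *const compile_error_text[] = {
    "no error",
    "\\ at end of pattern",
    "\\c at end of pattern",
    "unrecognized character follows \\",
    "numbers out of order in {} quantifier",
    "number too big in {} quantifier",
    "missing terminating ] for character class",
    "invalid escape sequence in character class",
    "range out of order in character class",
    "nothing to repeat"
};

/* Return the message for a code, or NULL if this sketch does not cover it. */
static const char *describe_compile_error(int code)
{
    size_t count = sizeof compile_error_text / sizeof compile_error_text[0];
    if (code < 0 || (size_t)code >= count)
        return NULL;
    return compile_error_text[code];
}
```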
604  .  .
605  .  .
606  .SH "STUDYING A PATTERN"  .SH "STUDYING A PATTERN"
607  .rs  .rs
608  .sp  .sp
609  .B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,  .B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP
610  .ti +5n  .ti +5n
611  .B const char **\fIerrptr\fP);  .B const char **\fIerrptr\fP);
612  .PP  .PP
# Line 492  below Line 628  below
628  .\"  .\"
629  in the section on matching a pattern.  in the section on matching a pattern.
630  .P  .P
631  If studying the pattern does not produce any additional information,  If studying the pattern does not produce any additional information
632  \fBpcre_study()\fP returns NULL. In that circumstance, if the calling program  \fBpcre_study()\fP returns NULL. In that circumstance, if the calling program
633  wants to pass any of the other fields to \fBpcre_exec()\fP, it must set up its  wants to pass any of the other fields to \fBpcre_exec()\fP, it must set up its
634  own \fBpcre_extra\fP block.  own \fBpcre_extra\fP block.
# Line 502  options are defined, and this argument s Line 638  options are defined, and this argument s
638  .P  .P
639  The third argument for \fBpcre_study()\fP is a pointer for an error message. If  The third argument for \fBpcre_study()\fP is a pointer for an error message. If
640  studying succeeds (even if no data is returned), the variable it points to is  studying succeeds (even if no data is returned), the variable it points to is
641  set to NULL. Otherwise it points to a textual error message. You should  set to NULL. Otherwise it is set to point to a textual error message. This is a
642  therefore test the error pointer for NULL after calling \fBpcre_study()\fP, to  static string that is part of the library. You must not try to free it. You
643  be sure that it has run successfully.  should test the error pointer for NULL after calling \fBpcre_study()\fP, to be
644    sure that it has run successfully.
645  .P  .P
646  This is a typical call to \fBpcre_study\fP():  This is a typical call to \fBpcre_study\fP():
647  .sp  .sp
# Line 523  bytes is created. Line 660  bytes is created.
660  .SH "LOCALE SUPPORT"  .SH "LOCALE SUPPORT"
661  .rs  .rs
662  .sp  .sp
663  PCRE handles caseless matching, and determines whether characters are letters,  PCRE handles caseless matching, and determines whether characters are letters
664  digits, or whatever, by reference to a set of tables, indexed by character  digits, or whatever, by reference to a set of tables, indexed by character
665  value. (When running in UTF-8 mode, this applies only to characters with codes  value. When running in UTF-8 mode, this applies only to characters with codes
666  less than 128. Higher-valued codes never match escapes such as \ew or \ed, but  less than 128. Higher-valued codes never match escapes such as \ew or \ed, but
667  can be tested with \ep if PCRE is built with Unicode character property  can be tested with \ep if PCRE is built with Unicode character property
668  support.)  support. The use of locales with Unicode is discouraged.
669  .P  .P
670  An internal set of tables is created in the default C locale when PCRE is  An internal set of tables is created in the default C locale when PCRE is
671  built. This is used when the final argument of \fBpcre_compile()\fP is NULL,  built. This is used when the final argument of \fBpcre_compile()\fP is NULL,
# Line 615  no back references. Line 752  no back references.
752  Return the number of capturing subpatterns in the pattern. The fourth argument  Return the number of capturing subpatterns in the pattern. The fourth argument
753  should point to an \fBint\fP variable.  should point to an \fBint\fP variable.
754  .sp  .sp
755    PCRE_INFO_DEFAULTTABLES    PCRE_INFO_DEFAULT_TABLES
756  .sp  .sp
757  Return a pointer to the internal default character tables within PCRE. The  Return a pointer to the internal default character tables within PCRE. The
758  fourth argument should point to an \fBunsigned char *\fP variable. This  fourth argument should point to an \fBunsigned char *\fP variable. This
# Line 760  it is used to pass back information abou Line 897  it is used to pass back information abou
897  string (see PCRE_INFO_FIRSTBYTE above).  string (see PCRE_INFO_FIRSTBYTE above).
898  .  .
899  .  .
900  .SH "MATCHING A PATTERN"  .SH "REFERENCE COUNTS"
901    .rs
902    .sp
903    .B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP);
904    .PP
905    The \fBpcre_refcount()\fP function is used to maintain a reference count in the
906    data block that contains a compiled pattern. It is provided for the benefit of
907    applications that operate in an object-oriented manner, where different parts
908    of the application may be using the same compiled pattern, but you want to free
909    the block when they are all done.
910    .P
911    When a pattern is compiled, the reference count field is initialized to zero.
912    It is changed only by calling this function, whose action is to add the
913    \fIadjust\fP value (which may be positive or negative) to it. The yield of the
914    function is the new value. However, the value of the count is constrained to
915    lie between 0 and 65535, inclusive. If the new value is outside these limits,
916    it is forced to the appropriate limit value.
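
The arithmetic described above can be modelled in a few lines. This is a sketch of the documented behaviour only, not PCRE's implementation; model_refcount_adjust is a hypothetical helper that takes the current count as an argument instead of reading it from a compiled pattern.

```c
#include <assert.h>

/* Hypothetical model: add adjust to the count, then force the result
   into the documented range of 0 to 65535 inclusive. */
static int model_refcount_adjust(int current, int adjust)
{
    long long updated = (long long)current + adjust;
    if (updated < 0)
        updated = 0;
    if (updated > 65535)
        updated = 65535;
    return (int)updated;
}
```

Under this model, a count of 1 adjusted by -5 yields 0, and a count of 65530 adjusted by 100 yields 65535.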
917    .P
918    Except when it is zero, the reference count is not correctly preserved if a
919    pattern is compiled on one host and then transferred to a host whose byte-order
920    is different. (This seems a highly unlikely scenario.)
921    .
922    .
923    .SH "MATCHING A PATTERN: THE TRADITIONAL FUNCTION"
924  .rs  .rs
925  .sp  .sp
926  .B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"  .B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
# Line 772  string (see PCRE_INFO_FIRSTBYTE above). Line 932  string (see PCRE_INFO_FIRSTBYTE above).
932  The function \fBpcre_exec()\fP is called to match a subject string against a  The function \fBpcre_exec()\fP is called to match a subject string against a
933  compiled pattern, which is passed in the \fIcode\fP argument. If the  compiled pattern, which is passed in the \fIcode\fP argument. If the
934  pattern has been studied, the result of the study should be passed in the  pattern has been studied, the result of the study should be passed in the
935  \fIextra\fP argument.  \fIextra\fP argument. This function is the main matching facility of the
936    library, and it operates in a Perl-like manner. For specialist use there is
937    also an alternative matching function, which is described
938    .\" HTML <a href="#dfamatch">
939    .\" </a>
940    below
941    .\"
942    in the section about the \fBpcre_dfa_exec()\fP function.
943  .P  .P
944  In most applications, the pattern will have been compiled (and optionally  In most applications, the pattern will have been compiled (and optionally
945  studied) in the same process that calls \fBpcre_exec()\fP. However, it is  studied) in the same process that calls \fBpcre_exec()\fP. However, it is
# Line 796  Here is an example of a simple call to \ Line 963  Here is an example of a simple call to \
963      0,              /* start at offset 0 in the subject */      0,              /* start at offset 0 in the subject */
964      0,              /* default options */      0,              /* default options */
965      ovector,        /* vector of integers for substring information */      ovector,        /* vector of integers for substring information */
966      30);            /* number of elements in the vector (NOT size in bytes) */      30);            /* number of elements (NOT size in bytes) */
967  .  .
968  .\" HTML <a name="extradata"></a>  .\" HTML <a name="extradata"></a>
969  .SS "Extra data for \fBpcre_exec()\fR"  .SS "Extra data for \fBpcre_exec()\fR"
# Line 805  Here is an example of a simple call to \ Line 972  Here is an example of a simple call to \
972  If the \fIextra\fP argument is not NULL, it must point to a \fBpcre_extra\fP  If the \fIextra\fP argument is not NULL, it must point to a \fBpcre_extra\fP
973  data block. The \fBpcre_study()\fP function returns such a block (when it  data block. The \fBpcre_study()\fP function returns such a block (when it
974  doesn't return NULL), but you can also create one for yourself, and pass  doesn't return NULL), but you can also create one for yourself, and pass
975  additional information in it. The fields in a \fBpcre_extra\fP block are as  additional information in it. The \fBpcre_extra\fP block contains the following
976  follows:  fields (not necessarily in this order):
977  .sp  .sp
978    unsigned long int \fIflags\fP;    unsigned long int \fIflags\fP;
979    void *\fIstudy_data\fP;    void *\fIstudy_data\fP;
980    unsigned long int \fImatch_limit\fP;    unsigned long int \fImatch_limit\fP;
981      unsigned long int \fImatch_limit_recursion\fP;
982    void *\fIcallout_data\fP;    void *\fIcallout_data\fP;
983    const unsigned char *\fItables\fP;    const unsigned char *\fItables\fP;
984  .sp  .sp
# Line 819  are set. The flag bits are: Line 987  are set. The flag bits are:
987  .sp  .sp
988    PCRE_EXTRA_STUDY_DATA    PCRE_EXTRA_STUDY_DATA
989    PCRE_EXTRA_MATCH_LIMIT    PCRE_EXTRA_MATCH_LIMIT
990      PCRE_EXTRA_MATCH_LIMIT_RECURSION
991    PCRE_EXTRA_CALLOUT_DATA    PCRE_EXTRA_CALLOUT_DATA
992    PCRE_EXTRA_TABLES    PCRE_EXTRA_TABLES
993  .sp  .sp
# Line 833  but which have a very large number of po Line 1002  but which have a very large number of po
1002  classic example is the use of nested unlimited repeats.  classic example is the use of nested unlimited repeats.
1003  .P  .P
1004  Internally, PCRE uses a function called \fBmatch()\fP which it calls repeatedly  Internally, PCRE uses a function called \fBmatch()\fP which it calls repeatedly
1005  (sometimes recursively). The limit is imposed on the number of times this  (sometimes recursively). The limit set by \fImatch_limit\fP is imposed on the
1006  function is called during a match, which has the effect of limiting the amount  number of times this function is called during a match, which has the effect of
1007  of recursion and backtracking that can take place. For patterns that are not  limiting the amount of backtracking that can take place. For patterns that are
1008  anchored, the count starts from zero for each position in the subject string.  not anchored, the count restarts from zero for each position in the subject
1009    string.
1010  .P  .P
1011  The default limit for the library can be set when PCRE is built; the default  The default value for the limit can be set when PCRE is built; the default
1012  default is 10 million, which handles all but the most extreme cases. You can  default is 10 million, which handles all but the most extreme cases. You can
1013  reduce the default by supplying \fBpcre_exec()\fP with a \fBpcre_extra\fP block  override the default by supplying \fBpcre_exec()\fP with a \fBpcre_extra\fP
1014  in which \fImatch_limit\fP is set to a smaller value, and  block in which \fImatch_limit\fP is set, and PCRE_EXTRA_MATCH_LIMIT is set in
1015  PCRE_EXTRA_MATCH_LIMIT is set in the \fIflags\fP field. If the limit is  the \fIflags\fP field. If the limit is exceeded, \fBpcre_exec()\fP returns
1016  exceeded, \fBpcre_exec()\fP returns PCRE_ERROR_MATCHLIMIT.  PCRE_ERROR_MATCHLIMIT.
1017    .P
1018    The \fImatch_limit_recursion\fP field is similar to \fImatch_limit\fP, but
1019    instead of limiting the total number of times that \fBmatch()\fP is called, it
1020    limits the depth of recursion. The recursion depth is a smaller number than the
1021    total number of calls, because not all calls to \fBmatch()\fP are recursive.
1022    This limit is of use only if it is set smaller than \fImatch_limit\fP.
1023    .P
1024    Limiting the recursion depth limits the amount of stack that can be used, or,
1025    when PCRE has been compiled to use memory on the heap instead of the stack, the
1026    amount of heap memory that can be used.
1027    .P
1028    The default value for \fImatch_limit_recursion\fP can be set when PCRE is
1029    built; the default default is the same value as the default for
1030    \fImatch_limit\fP. You can override the default by supplying \fBpcre_exec()\fP
1031    with a \fBpcre_extra\fP block in which \fImatch_limit_recursion\fP is set, and
1032    PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the \fIflags\fP field. If the limit
1033    is exceeded, \fBpcre_exec()\fP returns PCRE_ERROR_RECURSIONLIMIT.
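
The difference between the two limits can be illustrated with a toy matcher. The following is a minimal sketch, not PCRE's own code: toy_match(), toy_run(), toy_limits and the TOY_ names are hypothetical and exist only for this illustration, and the pattern language is a simple glob in which '*' matches any run of characters. The two error values merely mirror the numeric values of PCRE_ERROR_MATCHLIMIT and PCRE_ERROR_RECURSIONLIMIT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; the two error values mirror
   PCRE_ERROR_MATCHLIMIT (-8) and PCRE_ERROR_RECURSIONLIMIT (-21). */
enum {
  TOY_NOMATCH = 0,
  TOY_MATCH = 1,
  TOY_ERROR_MATCHLIMIT = -8,
  TOY_ERROR_RECURSIONLIMIT = -21
};

typedef struct {
  unsigned long match_limit;            /* cap on the total number of calls */
  unsigned long match_limit_recursion;  /* cap on the nesting depth */
  unsigned long calls;                  /* calls made so far */
} toy_limits;

/* Toy backtracking matcher: literal characters plus '*' (any run of
   characters). The whole subject must be matched. */
static int toy_match(const char *pat, const char *subj,
                     toy_limits *lim, unsigned long depth)
{
  assert(pat != NULL && subj != NULL && lim != NULL);

  if (++lim->calls > lim->match_limit) return TOY_ERROR_MATCHLIMIT;
  if (depth > lim->match_limit_recursion) return TOY_ERROR_RECURSIONLIMIT;

  if (*pat == '\0') return (*subj == '\0') ? TOY_MATCH : TOY_NOMATCH;

  if (*pat == '*') {
    /* Try each possible length for the star. Every attempt is a new call,
       but all of them are made from this same depth. */
    for (;;) {
      int rc = toy_match(pat + 1, subj, lim, depth + 1);
      if (rc != TOY_NOMATCH) return rc;   /* a match, or a limit was hit */
      if (*subj == '\0') return TOY_NOMATCH;
      subj++;
    }
  }

  if (*subj != '\0' && *pat == *subj)
    return toy_match(pat + 1, subj + 1, lim, depth + 1);
  return TOY_NOMATCH;
}

/* Convenience wrapper: run one match with the given limits. */
static int toy_run(const char *pat, const char *subj,
                   unsigned long match_limit, unsigned long recursion_limit)
{
  toy_limits lim;
  lim.match_limit = match_limit;
  lim.match_limit_recursion = recursion_limit;
  lim.calls = 0;
  return toy_match(pat, subj, &lim, 0);
}
```

With generous limits, toy_run("*a*a*b", "aaaaaaaaaa", 1000000, 100) reports no match only after many calls, although the depth never exceeds the pattern length. Lowering the first limit to 10 gives the match-limit error instead, and lowering the second to 2 gives the recursion-limit error.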
1034  .P  .P
1035  The \fIpcre_callout\fP field is used in conjunction with the "callout" feature,  The \fIpcre_callout\fP field is used in conjunction with the "callout" feature,
1036  which is described in the  which is described in the
# Line 1041  subpatterns there are in a compiled patt Line 1228  subpatterns there are in a compiled patt
1228  \fIovector\fP that will allow for \fIn\fP captured substrings, in addition to  \fIovector\fP that will allow for \fIn\fP captured substrings, in addition to
1229  the offsets of the substring matched by the whole pattern, is (\fIn\fP+1)*3.  the offsets of the substring matched by the whole pattern, is (\fIn\fP+1)*3.
1230  .  .
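
As a worked instance of this formula, a pattern with two capturing subpatterns needs (2+1)*3 = 9 elements. The helper below is a hypothetical convenience for illustration, not part of the PCRE API.

```c
#include <assert.h>

/* Hypothetical helper (not a PCRE function): smallest ovector element count
   that allows for n captured substrings plus the whole-pattern match. */
static int ovector_min_elements(int n)
{
  assert(n >= 0);
  return (n + 1) * 3;
}
```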
1231    .\" HTML <a name="errorlist"></a>
1232  .SS "Return values from \fBpcre_exec()\fP"  .SS "Return values from \fBpcre_exec()\fP"
1233  .rs  .rs
1234  .sp  .sp
# Line 1090  below). It is never returned by \fBpcre_ Line 1278  below). It is never returned by \fBpcre_
1278  .sp  .sp
1279    PCRE_ERROR_MATCHLIMIT     (-8)    PCRE_ERROR_MATCHLIMIT     (-8)
1280  .sp  .sp
1281  The recursion and backtracking limit, as specified by the \fImatch_limit\fP  The backtracking limit, as specified by the \fImatch_limit\fP field in a
1282    \fBpcre_extra\fP structure (or defaulted) was reached. See the description
1283    above.
1284    .sp
1285      PCRE_ERROR_RECURSIONLIMIT (-21)
1286    .sp
1287    The internal recursion limit, as specified by the \fImatch_limit_recursion\fP
1288  field in a \fBpcre_extra\fP structure (or defaulted) was reached. See the  field in a \fBpcre_extra\fP structure (or defaulted) was reached. See the
1289  description above.  description above.
1290  .sp  .sp
# Line 1112  A string that contains an invalid UTF-8 Line 1306  A string that contains an invalid UTF-8
1306  The UTF-8 byte sequence that was passed as a subject was valid, but the value  The UTF-8 byte sequence that was passed as a subject was valid, but the value
1307  of \fIstartoffset\fP did not point to the beginning of a UTF-8 character.  of \fIstartoffset\fP did not point to the beginning of a UTF-8 character.
1308  .sp  .sp
1309    PCRE_ERROR_PARTIAL (-12)    PCRE_ERROR_PARTIAL        (-12)
1310  .sp  .sp
1311  The subject string did not match, but it did match partially. See the  The subject string did not match, but it did match partially. See the
1312  .\" HREF  .\" HREF
# Line 1120  The subject string did not match, but it Line 1314  The subject string did not match, but it
1314  .\"  .\"
1315  documentation for details of partial matching.  documentation for details of partial matching.
1316  .sp  .sp
1317    PCRE_ERROR_BAD_PARTIAL (-13)    PCRE_ERROR_BADPARTIAL     (-13)
1318  .sp  .sp
1319  The PCRE_PARTIAL option was used with a compiled pattern containing items that  The PCRE_PARTIAL option was used with a compiled pattern containing items that
1320  are not supported for partial matching. See the  are not supported for partial matching. See the
# Line 1129  are not supported for partial matching. Line 1323  are not supported for partial matching.
1323  .\"  .\"
1324  documentation for details of partial matching.  documentation for details of partial matching.
1325  .sp  .sp
1326    PCRE_ERROR_INTERNAL (-14)    PCRE_ERROR_INTERNAL       (-14)
1327  .sp  .sp
1328  An unexpected internal error has occurred. This error could be caused by a bug  An unexpected internal error has occurred. This error could be caused by a bug
1329  in PCRE or by overwriting of the compiled pattern.  in PCRE or by overwriting of the compiled pattern.
1330  .sp  .sp
1331    PCRE_ERROR_BADCOUNT (-15)    PCRE_ERROR_BADCOUNT       (-15)
1332  .sp  .sp
1333  This error is given if the value of the \fIovecsize\fP argument is negative.  This error is given if the value of the \fIovecsize\fP argument is negative.
1334  .  .
# Line 1256  provided. Line 1450  provided.
1450  To extract a substring by name, you first have to find the associated number.  To extract a substring by name, you first have to find the associated number.
1451  For example, for this pattern  For example, for this pattern
1452  .sp  .sp
1453    (a+)b(?<xxx>\ed+)...    (a+)b(?P<xxx>\ed+)...
1454  .sp  .sp
1455  the number of the subpattern called "xxx" is 2. You can find the number from  the number of the subpattern called "xxx" is 2. You can find the number from
1456  the name by calling \fBpcre_get_stringnumber()\fP. The first argument is the  the name by calling \fBpcre_get_stringnumber()\fP. The first argument is the
# Line 1281  translation table. Line 1475  translation table.
1475  These functions call \fBpcre_get_stringnumber()\fP, and if it succeeds, they  These functions call \fBpcre_get_stringnumber()\fP, and if it succeeds, they
1476  then call \fIpcre_copy_substring()\fP or \fIpcre_get_substring()\fP, as  then call \fIpcre_copy_substring()\fP or \fIpcre_get_substring()\fP, as
1477  appropriate.  appropriate.
1478    .
1479    .
1480    .SH "FINDING ALL POSSIBLE MATCHES"
1481    .rs
1482    .sp
1483    The traditional matching function uses a similar algorithm to Perl, which stops
1484    when it finds the first match, starting at a given point in the subject. If you
1485    want to find all possible matches, or the longest possible match, consider
1486    using the alternative matching function (see below) instead. If you cannot use
1487    the alternative function, but still need to find all possible matches, you
1488    can kludge it up by making use of the callout facility, which is described in
1489    the
1490    .\" HREF
1491    \fBpcrecallout\fP
1492    .\"
1493    documentation.
1494    .P
1495    What you have to do is to insert a callout right at the end of the pattern.
1496    When your callout function is called, extract and save the current matched
1497    substring. Then return 1, which forces \fBpcre_exec()\fP to backtrack and try
1498    other alternatives. Ultimately, when it runs out of matches, \fBpcre_exec()\fP
1499    will yield PCRE_ERROR_NOMATCH.
1500    .
1501    .
1502    .\" HTML <a name="dfamatch"></a>
1503    .SH "MATCHING A PATTERN: THE ALTERNATIVE FUNCTION"
1504    .rs
1505    .sp
1506    .B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
1507    .ti +5n
1508    .B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
1509    .ti +5n
1510    .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
1511    .ti +5n
1512    .B int *\fIworkspace\fP, int \fIwscount\fP);
1513    .P
1514    The function \fBpcre_dfa_exec()\fP is called to match a subject string against
1515    a compiled pattern, using a "DFA" matching algorithm. This has different
1516    characteristics to the normal algorithm, and is not compatible with Perl. Some
1517    of the features of PCRE patterns are not supported. Nevertheless, there are
1518    times when this kind of matching can be useful. For a discussion of the two
1519    matching algorithms, see the
1520    .\" HREF
1521    \fBpcrematching\fP
1522    .\"
1523    documentation.
1524    .P
1525    The arguments for the \fBpcre_dfa_exec()\fP function are the same as for
1526    \fBpcre_exec()\fP, plus two extras. The \fIovector\fP argument is used in a
1527    different way, and this is described below. The other common arguments are used
1528    in the same way as for \fBpcre_exec()\fP, so their description is not repeated
1529    here.
1530    .P
1531    The two additional arguments provide workspace for the function. The workspace
1532    vector should contain at least 20 elements. It is used for keeping track of
1533    multiple paths through the pattern tree. More workspace will be needed for
1534    patterns and subjects where there are a lot of possible matches.
1535    .P
1536    Here is an example of a simple call to \fBpcre_dfa_exec()\fP:
1537    .sp
1538      int rc;
1539      int ovector[10];
1540      int wspace[20];
1541      rc = pcre_dfa_exec(
1542        re,             /* result of pcre_compile() */
1543        NULL,           /* we didn't study the pattern */
1544        "some string",  /* the subject string */
1545        11,             /* the length of the subject string */
1546        0,              /* start at offset 0 in the subject */
1547        0,              /* default options */
1548        ovector,        /* vector of integers for substring information */
1549        10,             /* number of elements (NOT size in bytes) */
1550        wspace,         /* working space vector */
1551        20);            /* number of elements (NOT size in bytes) */
1552    .
1553    .SS "Option bits for \fBpcre_dfa_exec()\fP"
1554    .rs
1555    .sp
1556    The unused bits of the \fIoptions\fP argument for \fBpcre_dfa_exec()\fP must be
1557    zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NOTBOL,
1558    PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL,
1559    PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last three of these are
1560    the same as for \fBpcre_exec()\fP, so their description is not repeated here.
1561    .sp
1562      PCRE_PARTIAL
1563    .sp
1564    This has the same general effect as it does for \fBpcre_exec()\fP, but the
1565    details are slightly different. When PCRE_PARTIAL is set for
1566    \fBpcre_dfa_exec()\fP, the return code PCRE_ERROR_NOMATCH is converted into
1567    PCRE_ERROR_PARTIAL if the end of the subject is reached, there have been no
1568    complete matches, but there is still at least one matching possibility. The
1569    portion of the string that provided the partial match is set as the first
1570    matching string.
1571    .sp
1572      PCRE_DFA_SHORTEST
1573    .sp
1574    Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as
1575    soon as it has found one match. Because of the way the DFA algorithm works,
1576    this is necessarily the shortest possible match at the first possible matching
1577    point in the subject string.
1578    .sp
1579      PCRE_DFA_RESTART
1580    .sp
1581    When \fBpcre_dfa_exec()\fP is called with the PCRE_PARTIAL option, and returns
1582    a partial match, it is possible to call it again, with additional subject
1583    characters, and have it continue with the same match. The PCRE_DFA_RESTART
1584    option requests this action; when it is set, the \fIworkspace\fP and
1585    \fIwscount\fP arguments must reference the same vector as before because data
1586    about the match so far is left in them after a partial match. There is more
1587    discussion of this facility in the
1588    .\" HREF
1589    \fBpcrepartial\fP
1590    .\"
1591    documentation.
1592    .
1593    .SS "Successful returns from \fBpcre_dfa_exec()\fP"
1594    .rs
1595    .sp
1596    When \fBpcre_dfa_exec()\fP succeeds, it may have matched more than one
1597    substring in the subject. Note, however, that all the matches from one run of
1598    the function start at the same point in the subject. The shorter matches are
1599    all initial substrings of the longer matches. For example, if the pattern
1600    .sp
1601      <.*>
1602    .sp
1603    is matched against the string
1604    .sp
1605      This is <something> <something else> <something further> no more
1606    .sp
1607    the three matched strings are
1608    .sp
1609      <something>
1610      <something> <something else>
1611      <something> <something else> <something further>
1612    .sp
1613    On success, the yield of the function is a number greater than zero, which is
1614    the number of matched substrings. The substrings themselves are returned in
1615    \fIovector\fP. Each string uses two elements; the first is the offset to the
1616    start, and the second is the offset to the end. All the strings have the same
1617    start offset. (Space could have been saved by giving this only once, but it was
1618    decided to retain some compatibility with the way \fBpcre_exec()\fP returns
1619    data, even though the meaning of the strings is different.)
1620    .P
1621    The strings are returned in reverse order of length; that is, the longest
1622    matching string is given first. If there were too many matches to fit into
1623    \fIovector\fP, the yield of the function is zero, and the vector is filled with
1624    the longest matches.
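
The layout described above can be sketched as follows. This is an illustrative helper only: dfa_match_length() and the example_ names are hypothetical, and the offsets were worked out by hand for the <.*> example rather than produced by a real call to \fBpcre_dfa_exec()\fP.

```c
#include <assert.h>
#include <stddef.h>

/* Subject and hand-computed result for the <.*> example: three matches,
   all starting at offset 8, longest first. */
static const char example_subject[] =
  "This is <something> <something else> <something further> no more";
static const int example_ovector[] = { 8, 56,  8, 36,  8, 19 };
static const int example_rc = 3;

/* Length of the i-th matched string (index 0 is the longest), or -1 if i
   is out of range for a yield of rc. */
static int dfa_match_length(const int *ovector, int rc, int i)
{
  assert(ovector != NULL);
  if (i < 0 || i >= rc) return -1;
  return ovector[2*i + 1] - ovector[2*i];
}
```

Here dfa_match_length(example_ovector, example_rc, 0) is 48, the length of the longest string, and index 2 gives 11 for <something>. A yield of zero needs separate handling by the caller, because the vector is then completely filled.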
1625    .
1626    .SS "Error returns from \fBpcre_dfa_exec()\fP"
1627    .rs
1628    .sp
1629    The \fBpcre_dfa_exec()\fP function returns a negative number when it fails.
1630    Many of the errors are the same as for \fBpcre_exec()\fP, and these are
1631    described
1632    .\" HTML <a href="#errorlist">
1633    .\" </a>
1634    above.
1635    .\"
1636    There are in addition the following errors that are specific to
1637    \fBpcre_dfa_exec()\fP:
1638    .sp
1639      PCRE_ERROR_DFA_UITEM      (-16)
1640    .sp
1641    This return is given if \fBpcre_dfa_exec()\fP encounters an item in the pattern
1642    that it does not support, for instance, the use of \eC or a back reference.
1643    .sp
1644      PCRE_ERROR_DFA_UCOND      (-17)
1645    .sp
1646    This return is given if \fBpcre_dfa_exec()\fP encounters a condition item in a
1647    pattern that uses a back reference for the condition. This is not supported.
1648    .sp
1649      PCRE_ERROR_DFA_UMLIMIT    (-18)
1650    .sp
1651    This return is given if \fBpcre_dfa_exec()\fP is called with an \fIextra\fP
1652    block that contains a setting of the \fImatch_limit\fP field. This is not
1653    supported (it is meaningless).
1654    .sp
1655      PCRE_ERROR_DFA_WSSIZE     (-19)
1656    .sp
1657    This return is given if \fBpcre_dfa_exec()\fP runs out of space in the
1658    \fIworkspace\fP vector.
1659    .sp
1660      PCRE_ERROR_DFA_RECURSE    (-20)
1661    .sp
1662    When a recursive subpattern is processed, the matching function calls itself
1663    recursively, using private vectors for \fIovector\fP and \fIworkspace\fP. This
1664    error is given if the output vector is not large enough. This should be
1665    extremely rare, as a vector of size 1000 is used.
1666  .P  .P
1667  .in 0  .in 0
1668  Last updated: 09 September 2004  Last updated: 18 January 2006
1669  .br  .br
1670  Copyright (c) 1997-2004 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
