Diff of /code/tags/pcre-2.08/pcre.3

revision 23 by nigel, Sat Feb 24 21:38:41 2007 UTC revision 25 by nigel, Sat Feb 24 21:38:45 2007 UTC
# Line 8  pcre - Perl-compatible regular expressio Line 8  pcre - Perl-compatible regular expressio
8  .br  .br
9  .B pcre *pcre_compile(const char *\fIpattern\fR, int \fIoptions\fR,  .B pcre *pcre_compile(const char *\fIpattern\fR, int \fIoptions\fR,
10  .ti +5n  .ti +5n
11  .B const char **\fIerrptr\fR, int *\fIerroffset\fR);  .B const char **\fIerrptr\fR, int *\fIerroffset\fR,
12    .ti +5n
13    .B const unsigned char *\fItableptr\fR);
14    .PP
15    .br
16    .B const unsigned char *pcre_maketables(void);
17  .PP  .PP
18  .br  .br
19  .B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR,  .B pcre_extra *pcre_study(const pcre *\fIcode\fR, int \fIoptions\fR,
# Line 34  pcre - Perl-compatible regular expressio Line 39  pcre - Perl-compatible regular expressio
39  .PP  .PP
40  .br  .br
41  .B void (*pcre_free)(void *);  .B void (*pcre_free)(void *);
 .PP  
 .br  
 .B unsigned char *pcre_cbits[128];  
 .PP  
 .br  
 .B unsigned char *pcre_ctypes[256];  
 .PP  
 .br  
 .B unsigned char *pcre_fcc[256];  
 .PP  
 .br  
 .B unsigned char *pcre_lcc[256];  
42    
43    
44    
# Line 60  a set of wrapper functions that correspo Line 53  a set of wrapper functions that correspo
53    
54  The three functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and  The three functions \fBpcre_compile()\fR, \fBpcre_study()\fR, and
55  \fBpcre_exec()\fR are used for compiling and matching regular expressions. The  \fBpcre_exec()\fR are used for compiling and matching regular expressions. The
56  function \fBpcre_info()\fR is used to find out information about a compiled  function \fBpcre_maketables()\fR is used (optionally) to build a set of
57    character tables in the current locale for passing to \fBpcre_compile()\fR.
58    
59    The function \fBpcre_info()\fR is used to find out information about a compiled
60  pattern, while the function \fBpcre_version()\fR returns a pointer to a string  pattern, while the function \fBpcre_version()\fR returns a pointer to a string
61  containing the version of PCRE and its date of release.  containing the version of PCRE and its date of release.
62    
# Line 70  respectively. PCRE calls the memory mana Line 66  respectively. PCRE calls the memory mana
66  so a calling program can replace them if it wishes to intercept the calls. This  so a calling program can replace them if it wishes to intercept the calls. This
67  should be done before calling any PCRE functions.  should be done before calling any PCRE functions.
68    
 The other global variables are character tables. They are initialized when PCRE  
 is compiled, from source that is generated by reference to the C character type  
 functions, but which a user of PCRE is free to modify. In principle the tables  
 could also be modified at run time. See PCRE's README file for more details.  
   
69    
70  .SH MULTI-THREADING  .SH MULTI-THREADING
71  The PCRE functions can be used in multi-threading applications, with the  The PCRE functions can be used in multi-threading applications, with the
72  proviso that the character tables and the memory management functions pointed  proviso that the memory management functions pointed to by \fBpcre_malloc\fR
73  to by \fBpcre_malloc\fR and \fBpcre_free\fR are shared by all threads.  and \fBpcre_free\fR are shared by all threads.
74    
75  The compiled form of a regular expression is not altered during matching, so  The compiled form of a regular expression is not altered during matching, so
76  the same compiled pattern can safely be used by several threads at once.  the same compiled pattern can safely be used by several threads at once.
# Line 88  the same compiled pattern can safely be Line 79  the same compiled pattern can safely be
79  .SH COMPILING A PATTERN  .SH COMPILING A PATTERN
80  The function \fBpcre_compile()\fR is called to compile a pattern into an  The function \fBpcre_compile()\fR is called to compile a pattern into an
81  internal form. The pattern is a C string terminated by a binary zero, and  internal form. The pattern is a C string terminated by a binary zero, and
82  is passed in the argument \fIpattern\fR. A pointer to the compiled code block  is passed in the argument \fIpattern\fR. A pointer to a single block of memory
83  is returned. The \fBpcre\fR type is defined for this for convenience, but in  that is obtained via \fBpcre_malloc\fR is returned. This contains the
84  fact \fBpcre\fR is just a typedef for \fBvoid\fR, since the contents of the  compiled code and related data. The \fBpcre\fR type is defined for this for
85  block are not defined.  convenience, but in fact \fBpcre\fR is just a typedef for \fBvoid\fR, since the
86    contents of the block are not externally defined. It is up to the caller to
87    free the memory when it is no longer required.
88  .PP  .PP
89  The size of a compiled pattern is roughly proportional to the length of the  The size of a compiled pattern is roughly proportional to the length of the
90  pattern string, except that each character class (other than those containing  pattern string, except that each character class (other than those containing
# Line 111  time. Line 104  time.
104  If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately.  If \fIerrptr\fR is NULL, \fBpcre_compile()\fR returns NULL immediately.
105  Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns  Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fR returns
106  NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual  NULL, and sets the variable pointed to by \fIerrptr\fR to point to a textual
107  error message.  error message. The offset from the start of the pattern to the character where
108    the error was discovered is placed in the variable pointed to by
109  The offset from the start of the pattern to the character where the error was  \fIerroffset\fR, which must not be NULL. If it is, an immediate error is given.
110  discovered is placed in the variable pointed to by \fIerroffset\fR, which must  .PP
111  not be NULL. If it is, an immediate error is given.  If the final argument, \fItableptr\fR, is NULL, PCRE uses a default set of
112    character tables which are built when it is compiled, using the default C
113    locale. Otherwise, \fItableptr\fR must be the result of a call to
114    \fBpcre_maketables()\fR. See the section on locale support below.
115  .PP  .PP
116  The following option bits are defined in the header file:  The following option bits are defined in the header file:
117    
# Line 210  not have a single fixed starting charact Line 206  not have a single fixed starting charact
206  characters is created.  characters is created.
207    
208    
209    .SH LOCALE SUPPORT
210    PCRE handles caseless matching, and determines whether characters are letters,
211    digits, or whatever, by reference to a set of tables. The library contains a
212    default set of tables which is created in the default C locale when PCRE is
213    compiled. This is used when the final argument of \fBpcre_compile()\fR is NULL,
214    and is sufficient for many applications.
215    
216    An alternative set of tables can, however, be supplied. Such tables are built
217    by calling the \fBpcre_maketables()\fR function, which has no arguments, in the
218    relevant locale. The result can then be passed to \fBpcre_compile()\fR as often
219    as necessary. For example, to build and use tables that are appropriate for the
220    French locale (where accented characters with codes greater than 128 are
221    treated as letters), the following code could be used:
222    
223      setlocale(LC_CTYPE, "fr");
224      tables = pcre_maketables();
225      re = pcre_compile(..., tables);
226    
227    The tables are built in memory that is obtained via \fBpcre_malloc\fR. The
228    pointer that is passed to \fBpcre_compile\fR is saved with the compiled
229    pattern, and the same tables are used via this pointer by \fBpcre_study()\fR
230    and \fBpcre_exec()\fR. Thus for any single pattern, compilation, studying and
231    matching all happen in the same locale, but different patterns can be compiled
232    in different locales. It is the caller's responsibility to ensure that the
233    memory containing the tables remains available for as long as it is needed.
234    
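The idea of locale-dependent character tables described above can be sketched in standard C. This is only an illustration of the principle, not PCRE's real table layout or the implementation of \fBpcre_maketables()\fR: the type sketch_tables and the functions sketch_maketables() and sketch_caseless_equal() are hypothetical names invented for this example. The tables capture whatever the <ctype.h> functions report in the locale that is current when they are built, so a caller would invoke setlocale() first if a locale other than the default C one is wanted.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical layout, NOT PCRE's real table format: two 256-entry maps. */
typedef struct {
    unsigned char lower[256];   /* lower-case equivalent of each code     */
    unsigned char is_word[256]; /* 1 if the code is a letter, digit or _  */
} sketch_tables;

/* Build tables from the <ctype.h> functions as they behave in the locale
   that is current at the time of the call. Returns NULL if allocation
   fails; the caller owns the memory and must free() it. */
sketch_tables *sketch_maketables(void)
{
    sketch_tables *t = malloc(sizeof *t);
    int c;

    if (t == NULL)
        return NULL;
    for (c = 0; c < 256; c++) {
        t->lower[c] = (unsigned char)tolower(c);
        t->is_word[c] = (unsigned char)(isalnum(c) || c == '_');
    }
    return t;
}

/* Caseless comparison of two character codes by table lookup only. */
int sketch_caseless_equal(const sketch_tables *t,
                          unsigned char a, unsigned char b)
{
    assert(t != NULL);
    return t->lower[a] == t->lower[b];
}
```

Because every later lookup goes through the saved tables rather than through <ctype.h>, changing the locale after the tables are built does not affect them. That is the property the text relies on when it says different patterns can be compiled in different locales.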
235    
236  .SH MATCHING A PATTERN  .SH MATCHING A PATTERN
237  The function \fBpcre_exec()\fR is called to match a subject string against a  The function \fBpcre_exec()\fR is called to match a subject string against a
238  pre-compiled pattern, which is passed in the \fIcode\fR argument. If the  pre-compiled pattern, which is passed in the \fIcode\fR argument. If the
# Line 579  Each pair of escape sequences partitions Line 602  Each pair of escape sequences partitions
602  two disjoint sets. Any given character matches one, and only one, of each pair.  two disjoint sets. Any given character matches one, and only one, of each pair.
603    
604  A "word" character is any letter or digit or the underscore character, that is,  A "word" character is any letter or digit or the underscore character, that is,
605  any character which can be part of a Perl "word". These character type  any character which can be part of a Perl "word". The definition of letters and
606  sequences can appear both inside and outside character classes. They each match  digits is controlled by PCRE's character tables, and may vary if locale-
607  one character of the appropriate type. If the current matching point is at the  specific matching is taking place (see "Locale support" above). For example, in
608  end of the subject string, all of them fail, since there is no character to  the "fr" (French) locale, some character codes greater than 128 are used for
609  match.  accented letters, and these are matched by \\w.
610    
611    These character type sequences can appear both inside and outside character
612    classes. They each match one character of the appropriate type. If the current
613    matching point is at the end of the subject string, all of them fail, since
614    there is no character to match.
615    
616  The fourth use of backslash is for certain simple assertions. An assertion  The fourth use of backslash is for certain simple assertions. An assertion
617  specifies a condition that has to be met at a particular point in a match,  specifies a condition that has to be met at a particular point in a match,
# Line 682  are in the class by enumerating those th Line 710  are in the class by enumerating those th
710  still consumes a character from the subject string, and fails if the current  still consumes a character from the subject string, and fails if the current
711  pointer is at the end of the string.  pointer is at the end of the string.
712    
713  When PCRE_CASELESS is set, any letters in a class represent both their upper  When caseless matching is set, any letters in a class represent both their
714  case and lower case versions, so for example, a caseless [aeiou] matches "A" as  upper case and lower case versions, so for example, a caseless [aeiou] matches
715  well as "a", and a caseless [^aeiou] does not match "A", whereas a caseful  "A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
716  version would.  caseful version would.
717    
718  The newline character is never treated in any special way in character classes,  The newline character is never treated in any special way in character classes,
719  whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class  whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE options is. A class
# Line 702  octal or hexadecimal representation of " Line 730  octal or hexadecimal representation of "
730  range.  range.
731    
732  Ranges operate in ASCII collating sequence. They can also be used for  Ranges operate in ASCII collating sequence. They can also be used for
733  characters specified numerically, for example [\\000-\\037]. If a range such as  characters specified numerically, for example [\\000-\\037]. If a range that
734  [W-c] is used when PCRE_CASELESS is set, it matches the letters involved in  includes letters is used when caseless matching is set, it matches the letters
735  either case, so is equivalent to [][\\^_`wxyzabc], matched caselessly.  in either case. For example, [W-c] is equivalent to [][\\^_`wxyzabc], matched
736    caselessly, and if character tables for the "fr" locale are in use,
737    [\\xc8-\\xcb] matches accented E characters in both cases.
738    
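The caseless range rule can be checked with a small standard C sketch. It is an assumption-laden illustration, not PCRE's code: sketch_expand_range() is a hypothetical helper that marks every code in a range and, when caseless matching is requested, also marks the other-case form of each code as reported by <ctype.h> in the current locale.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Fill set[256] with 1 for every code matched by the range [lo-hi].
   When caseless is nonzero, each code in the range also contributes its
   lower-case and upper-case forms; for non-letters these are the code
   itself, so nothing extra is added. */
void sketch_expand_range(unsigned char lo, unsigned char hi, int caseless,
                         unsigned char set[256])
{
    int c;

    assert(lo <= hi);
    memset(set, 0, 256);
    for (c = lo; c <= hi; c++) {
        set[c] = 1;
        if (caseless) {
            set[(unsigned char)tolower(c)] = 1;
            set[(unsigned char)toupper(c)] = 1;
        }
    }
}
```

In the default C locale, expanding the range W to c caselessly marks W to Z, the six punctuation characters between Z and a, and a to c, and additionally w to z and A to C. That is the same set that [][\\^_`wxyzabc] matches caselessly.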
739  The character types \\d, \\D, \\s, \\S, \\w, and \\W may also appear in a  The character types \\d, \\D, \\s, \\S, \\w, and \\W may also appear in a
740  character class, and add the characters that they match to the class. For  character class, and add the characters that they match to the class. For

