Diff of /code/trunk/doc/html/pcre.html

revision 71 by nigel, Sat Feb 24 21:40:24 2007 UTC versus revision 83 by nigel, Sat Feb 24 21:41:06 2007 UTC
# Line 3  Line 3 
3  <title>pcre specification</title>  <title>pcre specification</title>
4  </head>  </head>
5  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6  This HTML document has been generated automatically from the original man page.  <h1>pcre man page</h1>
7  If there is any nonsense in it, please consult the man page, in case the  <p>
8  conversion went wrong.<br>  Return to the <a href="index.html">PCRE index page</a>.
9    </p>
10    <p>
11    This page is part of the PCRE HTML documentation. It was generated automatically
12    from the original man page. If there is any nonsense in it, please consult the
13    man page, in case the conversion went wrong.
14    <br>
15  <ul>  <ul>
16  <li><a name="TOC1" href="#SEC1">DESCRIPTION</a>  <li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
17  <li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>  <li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>
18  <li><a name="TOC3" href="#SEC3">LIMITATIONS</a>  <li><a name="TOC3" href="#SEC3">LIMITATIONS</a>
19  <li><a name="TOC4" href="#SEC4">UTF-8 SUPPORT</a>  <li><a name="TOC4" href="#SEC4">UTF-8 AND UNICODE PROPERTY SUPPORT</a>
20  <li><a name="TOC5" href="#SEC5">AUTHOR</a>  <li><a name="TOC5" href="#SEC5">AUTHOR</a>
21  </ul>  </ul>
22  <br><a name="SEC1" href="#TOC1">DESCRIPTION</a><br>  <br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
23  <P>  <P>
24  The PCRE library is a set of functions that implement regular expression  The PCRE library is a set of functions that implement regular expression
25  pattern matching using the same syntax and semantics as Perl, with just a few  pattern matching using the same syntax and semantics as Perl, with just a few
26  differences. The current implementation of PCRE (release 4.x) corresponds  differences. The current implementation of PCRE (release 6.x) corresponds
27  approximately with Perl 5.8, including support for UTF-8 encoded strings.  approximately with Perl 5.8, including support for UTF-8 encoded strings and
28  However, this support has to be explicitly enabled; it is not the default.  Unicode general category properties. However, this support has to be explicitly
29  </P>  enabled; it is not the default.
30  <P>  </P>
31  PCRE is written in C and released as a C library. However, a number of people  <P>
32  have written wrappers and interfaces of various kinds. A C++ class is included  In addition to the Perl-compatible matching function, PCRE also contains an
33  in these contributions, which can be found in the <i>Contrib</i> directory at  alternative matching function that matches the same compiled patterns in a
34  the primary FTP site, which is:  different way. In certain circumstances, the alternative function has some
35  </P>  advantages. For a discussion of the two matching algorithms, see the
36    <a href="pcrematching.html"><b>pcrematching</b></a>
37    page.
38    </P>
39    <P>
40    PCRE is written in C and released as a C library. A number of people have
41    written wrappers and interfaces of various kinds. In particular, Google Inc.
42    have provided a comprehensive C++ wrapper. This is now included as part of the
43    PCRE distribution. The
44    <a href="pcrecpp.html"><b>pcrecpp</b></a>
45    page has details of this interface. Other people's contributions can be found
46    in the <i>Contrib</i> directory at the primary FTP site, which is:
47  <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>  <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>
48    </P>
49  <P>  <P>
50  Details of exactly which Perl regular expression features are and are not  Details of exactly which Perl regular expression features are and are not
51  supported by PCRE are given in separate documents. See the  supported by PCRE are given in separate documents. See the
# Line 41  Some features of PCRE can be included, e Line 59  Some features of PCRE can be included, e
59  built. The  built. The
60  <a href="pcre_config.html"><b>pcre_config()</b></a>  <a href="pcre_config.html"><b>pcre_config()</b></a>
61  function makes it possible for a client to discover which features are  function makes it possible for a client to discover which features are
62  available. Documentation about building PCRE for various operating systems can  available. The features themselves are described in the
63  be found in the <b>README</b> file in the source distribution.  <a href="pcrebuild.html"><b>pcrebuild</b></a>
64    page. Documentation about building PCRE for various operating systems can be
65    found in the <b>README</b> file in the source distribution.
66  </P>  </P>
 <br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>  
67  <P>  <P>
68  The user documentation for PCRE has been split up into a number of different  The library contains a number of undocumented internal functions and data
69  sections. In the "man" format, each of these is a separate "man page". In the  tables that are used by more than one of the exported external functions, but
70  HTML format, each is a separate page, linked from the index page. In the plain  which are not intended for use by external callers. Their names all begin with
71  text format, all the sections are concatenated, for ease of searching. The  "_pcre_", which hopefully will not provoke any name clashes. In some
72  sections are as follows:  environments, it is possible to control which external symbols are exported
73    when a shared library is built, and in these cases the undocumented symbols are
74    not exported.
75  </P>  </P>
76    <br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>
77  <P>  <P>
78    The user documentation for PCRE comprises a number of different sections. In
79    the "man" format, each of these is a separate "man page". In the HTML format,
80    each is a separate page, linked from the index page. In the plain text format,
81    all the sections are concatenated, for ease of searching. The sections are as
82    follows:
83  <pre>  <pre>
84    pcre              this document    pcre              this document
85    pcreapi           details of PCRE's native API    pcreapi           details of PCRE's native C API
86    pcrebuild         options for building PCRE    pcrebuild         options for building PCRE
87    pcrecallout       details of the callout feature    pcrecallout       details of the callout feature
88    pcrecompat        discussion of Perl compatibility    pcrecompat        discussion of Perl compatibility
89      pcrecpp           details of the C++ wrapper
90    pcregrep          description of the <b>pcregrep</b> command    pcregrep          description of the <b>pcregrep</b> command
91    pcrepattern       syntax and semantics of supported    pcrematching      discussion of the two matching algorithms
92                        regular expressions    pcrepartial       details of the partial matching facility
93      pcrepattern       syntax and semantics of supported regular expressions
94    pcreperform       discussion of performance issues    pcreperform       discussion of performance issues
95    pcreposix         the POSIX-compatible API    pcreposix         the POSIX-compatible C API
96      pcreprecompile    details of saving and re-using precompiled patterns
97    pcresample        discussion of the sample program    pcresample        discussion of the sample program
98    pcretest          the <b>pcretest</b> testing command    pcretest          description of the <b>pcretest</b> testing command
99  </PRE>  </pre>
 </P>  
 <P>  
100  In addition, in the "man" and HTML formats, there is a short page for each  In addition, in the "man" and HTML formats, there is a short page for each
101  library function, listing its arguments and results.  C library function, listing its arguments and results.
102  </P>  </P>
103  <br><a name="SEC3" href="#TOC1">LIMITATIONS</a><br>  <br><a name="SEC3" href="#TOC1">LIMITATIONS</a><br>
104  <P>  <P>
# Line 84  regular expressions that are truly enorm Line 112  regular expressions that are truly enorm
112  internal linkage size of 3 or 4 (see the <b>README</b> file in the source  internal linkage size of 3 or 4 (see the <b>README</b> file in the source
113  distribution and the  distribution and the
114  <a href="pcrebuild.html"><b>pcrebuild</b></a>  <a href="pcrebuild.html"><b>pcrebuild</b></a>
115  documentation for details). If these cases the limit is substantially larger.  documentation for details). In these cases the limit is substantially larger.
116  However, the speed of execution will be slower.  However, the speed of execution will be slower.
117  </P>  </P>
118  <P>  <P>
# Line 98  subpatterns, assertions, and other types Line 126  subpatterns, assertions, and other types
126  </P>  </P>
127  <P>  <P>
128  The maximum length of a subject string is the largest positive number that an  The maximum length of a subject string is the largest positive number that an
129  integer variable can hold. However, PCRE uses recursion to handle subpatterns  integer variable can hold. However, when using the traditional matching
130  and indefinite repetition. This means that the available stack space may limit  function, PCRE uses recursion to handle subpatterns and indefinite repetition.
131  the size of a subject string that can be processed by certain patterns.  This means that the available stack space may limit the size of a subject
132  </P>  string that can be processed by certain patterns.
133  <a name="utf8support"></a><br><a name="SEC4" href="#TOC1">UTF-8 SUPPORT</a><br>  <a name="utf8support"></a></P>
134  <P>  <br><a name="SEC4" href="#TOC1">UTF-8 AND UNICODE PROPERTY SUPPORT</a><br>
135  Starting at release 3.3, PCRE has had some support for character strings  <P>
136  encoded in the UTF-8 format. For release 4.0 this has been greatly extended to  From release 3.3, PCRE has had some support for character strings encoded in
137  cover most common requirements.  the UTF-8 format. For release 4.0 this was greatly extended to cover most
138    common requirements, and in release 5.0 additional support for Unicode general
139    category properties was added.
140  </P>  </P>
141  <P>  <P>
142  In order process UTF-8 strings, you must build PCRE to include UTF-8 support in  In order process UTF-8 strings, you must build PCRE to include UTF-8 support in
# Line 122  library will be a bit bigger, but the ad Line 152  library will be a bit bigger, but the ad
152  to testing the PCRE_UTF8 flag in several places, so should not be very large.  to testing the PCRE_UTF8 flag in several places, so should not be very large.
153  </P>  </P>
154  <P>  <P>
155    If PCRE is built with Unicode character property support (which implies UTF-8
156    support), the escape sequences \p{..}, \P{..}, and \X are supported.
157    The available properties that can be tested are limited to the general
158    category properties such as Lu for an upper case letter or Nd for a decimal
159    number. A full list is given in the
160    <a href="pcrepattern.html"><b>pcrepattern</b></a>
161    documentation. The PCRE library is increased in size by about 90K when Unicode
162    property support is included.
163    </P>
164    <P>
165  The following comments apply when PCRE is running in UTF-8 mode:  The following comments apply when PCRE is running in UTF-8 mode:
166  </P>  </P>
167  <P>  <P>
# Line 157  bytes, for example: \x{100}{3}. Line 197  bytes, for example: \x{100}{3}.
197  </P>  </P>
198  <P>  <P>
199  6. The escape sequence \C can be used to match a single byte in UTF-8 mode,  6. The escape sequence \C can be used to match a single byte in UTF-8 mode,
200  but its use can lead to some strange effects.  but its use can lead to some strange effects. This facility is not available in
201    the alternative matching function, <b>pcre_dfa_exec()</b>.
202  </P>  </P>
203  <P>  <P>
204  7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly  7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
205  test characters of any code value, but the characters that PCRE recognizes as  test characters of any code value, but the characters that PCRE recognizes as
206  digits, spaces, or word characters remain the same set as before, all with  digits, spaces, or word characters remain the same set as before, all with
207  values less than 256.  values less than 256. This remains true even when PCRE includes Unicode
208  </P>  property support, because to do otherwise would slow down PCRE in many common
209  <P>  cases. If you really want to test for a wider sense of, say, "digit", you
210  8. Case-insensitive matching applies only to characters whose values are less  must use Unicode property tests such as \p{Nd}.
211  than 256. PCRE does not support the notion of "case" for higher-valued  </P>
212  characters.  <P>
213  </P>  8. Similarly, characters that match the POSIX named character classes are all
214  <P>  low-valued characters.
215  9. PCRE does not support the use of Unicode tables and properties or the Perl  </P>
216  escapes \p, \P, and \X.  <P>
217    9. Case-insensitive matching applies only to characters whose values are less
218    than 128, unless PCRE is built with Unicode property support. Even when Unicode
219    property support is available, PCRE still uses its own character tables when
220    checking the case of low-valued characters, so as not to degrade performance.
221    The Unicode property information is used only for characters with higher
222    values.
223  </P>  </P>
224  <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
225  <P>  <P>
226  Philip Hazel &#60;ph10@cam.ac.uk&#62;  Philip Hazel
227  <br>  <br>
228  University Computing Service,  University Computing Service,
229  <br>  <br>
230  Cambridge CB2 3QG, England.  Cambridge CB2 3QG, England.
 <br>  
 Phone: +44 1223 334714  
231  </P>  </P>
232  <P>  <P>
233  Last updated: 20 August 2003  Putting an actual email address here seems to have been a spam magnet, so I've
234    taken it away. If you want to email me, use my initial and surname, separated
235    by a dot, at the domain ucs.cam.ac.uk.
236    Last updated: 07 March 2005
237  <br>  <br>
238  Copyright &copy; 1997-2003 University of Cambridge.  Copyright &copy; 1997-2005 University of Cambridge.
239    <p>
240    Return to the <a href="index.html">PCRE index page</a>.
241    </p>
