Diff of /code/trunk/doc/html/pcrebuild.html

revision 74 by nigel, Sat Feb 24 21:40:30 2007 UTC revision 75 by nigel, Sat Feb 24 21:40:37 2007 UTC
# Line 3  Line 3 
3  <title>pcrebuild specification</title>  <title>pcrebuild specification</title>
4  </head>  </head>
5  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">  <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6  This HTML document has been generated automatically from the original man page.  <h1>pcrebuild man page</h1>
7  If there is any nonsense in it, please consult the man page, in case the  <p>
8  conversion went wrong.<br>  Return to the <a href="index.html">PCRE index page</a>.
9    </p>
10    <p>
11    This page is part of the PCRE HTML documentation. It was generated automatically
12    from the original man page. If there is any nonsense in it, please consult the
13    man page, in case the conversion went wrong.
14    <br>
15  <ul>  <ul>
16  <li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>  <li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
17  <li><a name="TOC2" href="#SEC2">UTF-8 SUPPORT</a>  <li><a name="TOC2" href="#SEC2">UTF-8 SUPPORT</a>
18  <li><a name="TOC3" href="#SEC3">CODE VALUE OF NEWLINE</a>  <li><a name="TOC3" href="#SEC3">UNICODE CHARACTER PROPERTY SUPPORT</a>
19  <li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>  <li><a name="TOC4" href="#SEC4">CODE VALUE OF NEWLINE</a>
20  <li><a name="TOC5" href="#SEC5">POSIX MALLOC USAGE</a>  <li><a name="TOC5" href="#SEC5">BUILDING SHARED AND STATIC LIBRARIES</a>
21  <li><a name="TOC6" href="#SEC6">LIMITING PCRE RESOURCE USAGE</a>  <li><a name="TOC6" href="#SEC6">POSIX MALLOC USAGE</a>
22  <li><a name="TOC7" href="#SEC7">HANDLING VERY LARGE PATTERNS</a>  <li><a name="TOC7" href="#SEC7">LIMITING PCRE RESOURCE USAGE</a>
23  <li><a name="TOC8" href="#SEC8">AVOIDING EXCESSIVE STACK USAGE</a>  <li><a name="TOC8" href="#SEC8">HANDLING VERY LARGE PATTERNS</a>
24  <li><a name="TOC9" href="#SEC9">USING EBCDIC CODE</a>  <li><a name="TOC9" href="#SEC9">AVOIDING EXCESSIVE STACK USAGE</a>
25    <li><a name="TOC10" href="#SEC10">USING EBCDIC CODE</a>
26  </ul>  </ul>
27  <br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>  <br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
28  <P>  <P>
29  This document describes the optional features of PCRE that can be selected when  This document describes the optional features of PCRE that can be selected when
30  the library is compiled. They are all selected, or deselected, by providing  the library is compiled. They are all selected, or deselected, by providing
31  options to the <b>configure</b> script which is run before the <b>make</b>  options to the <b>configure</b> script that is run before the <b>make</b>
32  command. The complete list of options for <b>configure</b> (which includes the  command. The complete list of options for <b>configure</b> (which includes the
33  standard ones such as the selection of the installation directory) can be  standard ones such as the selection of the installation directory) can be
34  obtained by running  obtained by running
 </P>  
 <P>  
35  <pre>  <pre>
36    ./configure --help    ./configure --help
37  </PRE>  </pre>
 </P>  
 <P>  
38  The following sections describe certain options whose names begin with --enable  The following sections describe certain options whose names begin with --enable
39  or --disable. These settings specify changes to the defaults for the  or --disable. These settings specify changes to the defaults for the
40  <b>configure</b> command. Because of the way that <b>configure</b> works,  <b>configure</b> command. Because of the way that <b>configure</b> works,
# Line 41  exists as well, but as it specifies the Line 44  exists as well, but as it specifies the
44  <br><a name="SEC2" href="#TOC1">UTF-8 SUPPORT</a><br>  <br><a name="SEC2" href="#TOC1">UTF-8 SUPPORT</a><br>
45  <P>  <P>
46  To build PCRE with support for UTF-8 character strings, add  To build PCRE with support for UTF-8 character strings, add
 </P>  
 <P>  
47  <pre>  <pre>
48    --enable-utf8    --enable-utf8
49  </PRE>  </pre>
 </P>  
 <P>  
50  to the <b>configure</b> command. Of itself, this does not make PCRE treat  to the <b>configure</b> command. Of itself, this does not make PCRE treat
51  strings as UTF-8. As well as compiling PCRE with this option, you also  strings as UTF-8. As well as compiling PCRE with this option, you also
52  have to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>  have to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
53  function.  function.
54  </P>  </P>
55  <br><a name="SEC3" href="#TOC1">CODE VALUE OF NEWLINE</a><br>  <br><a name="SEC3" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
56    <P>
57    UTF-8 support allows PCRE to process character values greater than 255 in the
58    strings that it handles. On its own, however, it does not provide any
59    facilities for accessing the properties of such characters. If you want to be
60    able to use the pattern escapes \P, \p, and \X, which refer to Unicode
61    character properties, you must add
62    <pre>
63      --enable-unicode-properties
64    </pre>
65    to the <b>configure</b> command. This implies UTF-8 support, even if you have
66    not explicitly requested it.
67    </P>
68    <P>
69    Including Unicode property support adds around 90K of tables to the PCRE
70    library, approximately doubling its size. Only the general category properties
71    such as <i>Lu</i> and <i>Nd</i> are supported. Details are given in the
72    <a href="pcrepattern.html"><b>pcrepattern</b></a>
73    documentation.
74    </P>
75    <br><a name="SEC4" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
76  <P>  <P>
77  By default, PCRE treats character 10 (linefeed) as the newline character. This  By default, PCRE treats character 10 (linefeed) as the newline character. This
78  is the normal newline character on Unix-like systems. You can compile PCRE to  is the normal newline character on Unix-like systems. You can compile PCRE to
79  use character 13 (carriage return) instead by adding  use character 13 (carriage return) instead by adding
 </P>  
 <P>  
80  <pre>  <pre>
81    --enable-newline-is-cr    --enable-newline-is-cr
82  </PRE>  </pre>
 </P>  
 <P>  
83  to the <b>configure</b> command. For completeness there is also a  to the <b>configure</b> command. For completeness there is also a
84  --enable-newline-is-lf option, which explicitly specifies linefeed as the  --enable-newline-is-lf option, which explicitly specifies linefeed as the
85  newline character.  newline character.
86  </P>  </P>
87  <br><a name="SEC4" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>  <br><a name="SEC5" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
88  <P>  <P>
89  The PCRE building process uses <b>libtool</b> to build both shared and static  The PCRE building process uses <b>libtool</b> to build both shared and static
90  Unix libraries by default. You can suppress one of these by adding one of  Unix libraries by default. You can suppress one of these by adding one of
 </P>  
 <P>  
91  <pre>  <pre>
92    --disable-shared    --disable-shared
93    --disable-static    --disable-static
94  </PRE>  </pre>
 </P>  
 <P>  
95  to the <b>configure</b> command, as required.  to the <b>configure</b> command, as required.
96  </P>  </P>
97  <br><a name="SEC5" href="#TOC1">POSIX MALLOC USAGE</a><br>  <br><a name="SEC6" href="#TOC1">POSIX MALLOC USAGE</a><br>
98  <P>  <P>
99  When PCRE is called through the POSIX interface (see the <b>pcreposix</b>  When PCRE is called through the POSIX interface (see the
100    <a href="pcreposix.html"><b>pcreposix</b></a>
101  documentation), additional working storage is required for holding the pointers  documentation), additional working storage is required for holding the pointers
102  to capturing substrings because PCRE requires three integers per substring,  to capturing substrings, because PCRE requires three integers per substring,
103  whereas the POSIX interface provides only two. If the number of expected  whereas the POSIX interface provides only two. If the number of expected
104  substrings is small, the wrapper function uses space on the stack, because this  substrings is small, the wrapper function uses space on the stack, because this
105  is faster than using <b>malloc()</b> for each call. The default threshold above  is faster than using <b>malloc()</b> for each call. The default threshold above
106  which the stack is no longer used is 10; it can be changed by adding a setting  which the stack is no longer used is 10; it can be changed by adding a setting
107  such as  such as
 </P>  
 <P>  
108  <pre>  <pre>
109    --with-posix-malloc-threshold=20    --with-posix-malloc-threshold=20
110  </PRE>  </pre>
 </P>  
 <P>  
111  to the <b>configure</b> command.  to the <b>configure</b> command.
112  </P>  </P>
113  <br><a name="SEC6" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>  <br><a name="SEC7" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
 <P>  
 Internally, PCRE has a function called <b>match()</b> which it calls repeatedly  
 (possibly recursively) when performing a matching operation. By limiting the  
 number of times this function may be called, a limit can be placed on the  
 resources used by a single call to <b>pcre_exec()</b>. The limit can be changed  
 at run time, as described in the <b>pcreapi</b> documentation. The default is 10  
 million, but this can be changed by adding a setting such as  
 </P>  
114  <P>  <P>
115    Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
116    (possibly recursively) when matching a pattern. By controlling the maximum
117    number of times this function may be called during a single matching operation,
118    a limit can be placed on the resources used by a single call to
119    <b>pcre_exec()</b>. The limit can be changed at run time, as described in the
120    <a href="pcreapi.html"><b>pcreapi</b></a>
121    documentation. The default is 10 million, but this can be changed by adding a
122    setting such as
123  <pre>  <pre>
124    --with-match-limit=500000    --with-match-limit=500000
125  </PRE>  </pre>
 </P>  
 <P>  
126  to the <b>configure</b> command.  to the <b>configure</b> command.
127  </P>  </P>
128  <br><a name="SEC7" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>  <br><a name="SEC8" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
129  <P>  <P>
130  Within a compiled pattern, offset values are used to point from one part to  Within a compiled pattern, offset values are used to point from one part to
131  another (for example, from an opening parenthesis to an alternation  another (for example, from an opening parenthesis to an alternation
132  metacharacter). By default two-byte values are used for these offsets, leading  metacharacter). By default, two-byte values are used for these offsets, leading
133  to a maximum size for a compiled pattern of around 64K. This is sufficient to  to a maximum size for a compiled pattern of around 64K. This is sufficient to
134  handle all but the most gigantic patterns. Nevertheless, some people do want to  handle all but the most gigantic patterns. Nevertheless, some people do want to
135  process enormous patterns, so it is possible to compile PCRE to use three-byte  process enormous patterns, so it is possible to compile PCRE to use three-byte
136  or four-byte offsets by adding a setting such as  or four-byte offsets by adding a setting such as
 </P>  
 <P>  
137  <pre>  <pre>
138    --with-link-size=3    --with-link-size=3
139  </PRE>  </pre>
 </P>  
 <P>  
140  to the <b>configure</b> command. The value given must be 2, 3, or 4. Using  to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
141  longer offsets slows down the operation of PCRE because it has to load  longer offsets slows down the operation of PCRE because it has to load
142  additional bytes when handling them.  additional bytes when handling them.
# Line 144  If you build PCRE with an increased link Line 146  If you build PCRE with an increased link
146  using UTF-8) will fail. Part of the output of these tests is a representation  using UTF-8) will fail. Part of the output of these tests is a representation
147  of the compiled pattern, and this changes with the link size.  of the compiled pattern, and this changes with the link size.
148  </P>  </P>
149  <br><a name="SEC8" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>  <br><a name="SEC9" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
150  <P>  <P>
151  PCRE implements backtracking while matching by making recursive calls to an  PCRE implements backtracking while matching by making recursive calls to an
152  internal function called <b>match()</b>. In environments where the size of the  internal function called <b>match()</b>. In environments where the size of the
# Line 153  environment does not usually suffer from Line 155  environment does not usually suffer from
155  that uses memory from the heap to remember data, instead of using recursive  that uses memory from the heap to remember data, instead of using recursive
156  function calls, has been implemented to work round this problem. If you want to  function calls, has been implemented to work round this problem. If you want to
157  build a version of PCRE that works this way, add  build a version of PCRE that works this way, add
 </P>  
 <P>  
158  <pre>  <pre>
159    --disable-stack-for-recursion    --disable-stack-for-recursion
160  </PRE>  </pre>
 </P>  
 <P>  
161  to the <b>configure</b> command. With this configuration, PCRE will use the  to the <b>configure</b> command. With this configuration, PCRE will use the
162  <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory  <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
163  management functions. Separate functions are provided because the usage is very  management functions. Separate functions are provided because the usage is very
# Line 169  optimized functions that perform better Line 167  optimized functions that perform better
167  <b>free()</b> functions. PCRE runs noticeably more slowly when built in this  <b>free()</b> functions. PCRE runs noticeably more slowly when built in this
168  way.  way.
169  </P>  </P>
170  <br><a name="SEC9" href="#TOC1">USING EBCDIC CODE</a><br>  <br><a name="SEC10" href="#TOC1">USING EBCDIC CODE</a><br>
171  <P>  <P>
172  PCRE assumes by default that it will run in an environment where the character  PCRE assumes by default that it will run in an environment where the character
173  code is ASCII (or UTF-8, which is a superset of ASCII). PCRE can, however, be  code is ASCII (or Unicode, which is a superset of ASCII). PCRE can, however, be
174  compiled to run in an EBCDIC environment by adding  compiled to run in an EBCDIC environment by adding
 </P>  
 <P>  
175  <pre>  <pre>
176    --enable-ebcdic    --enable-ebcdic
177  </PRE>  </pre>
 </P>  
 <P>  
178  to the <b>configure</b> command.  to the <b>configure</b> command.
179  </P>  </P>
180  <P>  <P>
181  Last updated: 09 December 2003  Last updated: 09 September 2004
182  <br>  <br>
183  Copyright &copy; 1997-2003 University of Cambridge.  Copyright &copy; 1997-2004 University of Cambridge.
184    <p>
185    Return to the <a href="index.html">PCRE index page</a>.
186    </p>


  ViewVC Help
Powered by ViewVC 1.1.5