Diff of /code/trunk/doc/pcrejit.3

revision 717 by ph10, Wed Oct 5 15:58:51 2011 UTC revision 858 by ph10, Sun Jan 8 17:55:38 2012 UTC
# Line 7  PCRE - Perl-compatible regular expressio Line 7  PCRE - Perl-compatible regular expressio
7  Just-in-time compiling is a heavyweight optimization that can greatly speed up  Just-in-time compiling is a heavyweight optimization that can greatly speed up
8  pattern matching. However, it comes at the cost of extra processing before the  pattern matching. However, it comes at the cost of extra processing before the
9  match is performed. Therefore, it is of most benefit when the same pattern is  match is performed. Therefore, it is of most benefit when the same pattern is
10  going to be matched many times. This does not necessarily mean many calls of  going to be matched many times. This does not necessarily mean many calls of a
11  \fPpcre_exec()\fP; if the pattern is not anchored, matching attempts may take  matching function; if the pattern is not anchored, matching attempts may take
12  place many times at various positions in the subject, even for a single call to  place many times at various positions in the subject, even for a single call.
13  \fBpcre_exec()\fP. If the subject string is very long, it may still pay to use  Therefore, if the subject string is very long, it may still pay to use JIT for
14  JIT for one-off matches.  one-off matches.
15  .P  .P
16  JIT support applies only to the traditional matching function,  JIT support applies only to the traditional Perl-compatible matching function.
17  \fBpcre_exec()\fP. It does not apply when \fBpcre_dfa_exec()\fP is being used.  It does not apply when the DFA matching function is being used. The code for
18  The code for this support was written by Zoltan Herczeg.  this support was written by Zoltan Herczeg.
19    .
20    .
21    .SH "8-BIT and 16-BIT SUPPORT"
22    .rs
23    .sp
24    JIT support is available for both the 8-bit and 16-bit PCRE libraries. To keep
25    this documentation simple, only the 8-bit interface is described in what
26    follows. If you are using the 16-bit library, substitute the 16-bit functions
27    and 16-bit structures (for example, \fIpcre16_jit_stack\fP instead of
28    \fIpcre_jit_stack\fP).
29  .  .
30  .  .
31  .SH "AVAILABILITY OF JIT SUPPORT"  .SH "AVAILABILITY OF JIT SUPPORT"
# Line 30  JIT. The support is limited to the follo Line 40  JIT. The support is limited to the follo
40    MIPS 32-bit    MIPS 32-bit
41    Power PC 32-bit and 64-bit (experimental)    Power PC 32-bit and 64-bit (experimental)
42  .sp  .sp
43  The Power PC support is designated as experimental because it has not been  The Power PC support is designated as experimental because it has not been
44  fully tested. If --enable-jit is set on an unsupported platform, compilation  fully tested. If --enable-jit is set on an unsupported platform, compilation
45  fails.  fails.
46  .P  .P
47  A program can tell if JIT support is available by calling \fBpcre_config()\fP  A program that is linked with PCRE 8.20 or later can tell if JIT support is
48  with the PCRE_CONFIG_JIT option. The result is 1 when JIT is available, and 0  available by calling \fBpcre_config()\fP with the PCRE_CONFIG_JIT option. The
49  otherwise. However, a simple program does not need to check this in order to  result is 1 when JIT is available, and 0 otherwise. However, a simple program
50  use JIT. The API is implemented in a way that falls back to the ordinary PCRE  does not need to check this in order to use JIT. The API is implemented in a
51  code if JIT is not available.  way that falls back to the ordinary PCRE code if JIT is not available.
52    .P
53    If your program may sometimes be linked with versions of PCRE that are older
54    than 8.20, but you want to use JIT when it is available, you can test
55    the values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such
56    as PCRE_CONFIG_JIT, for compile-time control of your code.
57  .  .
58  .  .
59  .SH "SIMPLE USE OF JIT"  .SH "SIMPLE USE OF JIT"
# Line 54  You have to do two things to make use of Line 69  You have to do two things to make use of
69        no longer needed instead of just freeing it yourself. This        no longer needed instead of just freeing it yourself. This
70        ensures that any JIT data is also freed.        ensures that any JIT data is also freed.
71  .sp  .sp
72    For a program that may be linked with pre-8.20 versions of PCRE, you can insert
73    .sp
74      #ifndef PCRE_STUDY_JIT_COMPILE
75      #define PCRE_STUDY_JIT_COMPILE 0
76      #endif
77    .sp
78    so that no option is passed to \fBpcre_study()\fP, and then use something like
79    this to free the study data:
80    .sp
81      #ifdef PCRE_CONFIG_JIT
82          pcre_free_study(study_ptr);
83      #else
84          pcre_free(study_ptr);
85      #endif
86    .sp
87  In some circumstances you may need to call additional functions. These are  In some circumstances you may need to call additional functions. These are
88  described in the section entitled  described in the section entitled
89  .\" HTML <a href="#stackcontrol">  .\" HTML <a href="#stackcontrol">
# Line 95  supported. Line 125  supported.
125  .P  .P
126  The unsupported pattern items are:  The unsupported pattern items are:
127  .sp  .sp
128    \eC            match a single byte, even in UTF-8 mode    \eC             match a single byte; not supported in UTF-8 mode
129    (?Cn)          callouts    (?Cn)          callouts
   (?(<name>)...  conditional test on setting of a named subpattern  
   (?(R)...       conditional test on whole pattern recursion  
   (?(Rn)...      conditional test on recursion, by number  
   (?(R&name)...  conditional test on recursion, by name  
130    (*COMMIT)      )    (*COMMIT)      )
131    (*MARK)        )    (*MARK)        )
132    (*PRUNE)       ) the backtracking control verbs    (*PRUNE)       ) the backtracking control verbs
# Line 157  When the compiled JIT code runs, it need Line 183  When the compiled JIT code runs, it need
183  By default, it uses 32K on the machine stack. However, some large or  By default, it uses 32K on the machine stack. However, some large or
184  complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT  complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT
185  is given when there is not enough stack. Three functions are provided for  is given when there is not enough stack. Three functions are provided for
186  managing blocks of memory for use as JIT stacks.  managing blocks of memory for use as JIT stacks. There is further discussion
187    about the use of JIT stacks in the section entitled
188    .\" HTML <a href="#stackfaq">
189    .\" </a>
190    "JIT stack FAQ"
191    .\"
192    below.
193  .P  .P
194  The \fBpcre_jit_stack_alloc()\fP function creates a JIT stack. Its arguments  The \fBpcre_jit_stack_alloc()\fP function creates a JIT stack. Its arguments
195  are a starting size and a maximum size, and it returns a pointer to an opaque  are a starting size and a maximum size, and it returns a pointer to an opaque
# Line 221  is non-NULL and points to a \fBpcre_extr Line 253  is non-NULL and points to a \fBpcre_extr
253  successful study with PCRE_STUDY_JIT_COMPILE.  successful study with PCRE_STUDY_JIT_COMPILE.
254  .  .
255  .  .
256    .\" HTML <a name="stackfaq"></a>
257    .SH "JIT STACK FAQ"
258    .rs
259    .sp
260    (1) Why do we need JIT stacks?
261    .sp
262    PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where
263    the local data of the current node is pushed before checking its child nodes.
264    Allocating real machine stack on some platforms is difficult. For example, on
265    PowerPC the stack chain needs to be updated every time the stack is extended.
266    Although this is possible, the overhead of updating it decreases performance.
267    So we do the recursion in memory.
268    .P
269    (2) Why don't we simply allocate blocks of memory with \fBmalloc()\fP?
270    .sp
271    Modern operating systems have a nice feature: they can reserve an address space
272    instead of allocating memory. We can safely allocate memory pages inside this
273    address space, so the stack could grow without moving memory data (this is
274    important because of pointers). Thus we can allocate 1M address space, and use
275    only a single memory page (usually 4K) if that is enough. However, we can still
276    grow up to 1M anytime if needed.
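The reserve-then-commit idea in this answer can be illustrated on POSIX systems with mmap() and mprotect(). This is only a sketch of the general technique, not PCRE's actual implementation; the growable_stack type and its functions are hypothetical names invented for the illustration.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical type for illustration; not part of the PCRE API. */
typedef struct {
    char  *base;       /* start of the reserved range; never moves */
    size_t committed;  /* bytes currently readable and writable */
    size_t reserved;   /* bytes of address space reserved */
} growable_stack;

static size_t round_up_to_page(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

/* Reserve max_size bytes of address space and make only the first
   start_size bytes usable. Returns 0 on success, -1 on failure. */
static int growable_stack_init(growable_stack *s, size_t start_size,
                               size_t max_size)
{
    void *p;

    start_size = round_up_to_page(start_size);
    max_size = round_up_to_page(max_size);
    if (max_size == 0 || start_size > max_size)
        return -1;

    /* PROT_NONE reserves addresses without making the pages usable. */
    p = mmap(NULL, max_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (start_size > 0 &&
        mprotect(p, start_size, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, max_size);
        return -1;
    }
    s->base = p;
    s->committed = start_size;
    s->reserved = max_size;
    return 0;
}

/* Make more of the reserved range usable. The base address does not
   change, so pointers into the stack stay valid. */
static int growable_stack_grow(growable_stack *s, size_t new_size)
{
    new_size = round_up_to_page(new_size);
    if (new_size > s->reserved)
        return -1;
    if (new_size <= s->committed)
        return 0;
    if (mprotect(s->base + s->committed, new_size - s->committed,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    s->committed = new_size;
    return 0;
}

static void growable_stack_free(growable_stack *s)
{
    munmap(s->base, s->reserved);
    s->base = NULL;
    s->committed = s->reserved = 0;
}
```

A block obtained from malloc() could only be enlarged with realloc(), which may move the data and so invalidate pointers into it; here the base address is fixed for the lifetime of the stack.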
277    .P
278    (3) Who "owns" a JIT stack?
279    .sp
280    The owner of the stack is the user program, not the JIT studied pattern or
281    anything else. The user program must ensure that while a stack is in use by
282    \fBpcre_exec()\fP (that is, while it is assigned to the pattern currently
283    running), it is not used by any other thread (to avoid overwriting the same
284    memory area). The best practice for multithreaded programs is to allocate a
285    stack for each thread, and return this stack through the JIT callback function.
286    .P
287    (4) When should a JIT stack be freed?
288    .sp
289    You can free a JIT stack at any time, as long as it will not be used by
290    \fBpcre_exec()\fP again. When you assign the stack to a pattern, only a pointer
291    is set. There is no reference counting or any other magic. You can free the
292    patterns and stacks in any order, anytime. Just \fIdo not\fP call
293    \fBpcre_exec()\fP with a pattern pointing to an already freed stack, as that
294    will cause a segmentation fault. (Also, do not free a stack currently used by
295    \fBpcre_exec()\fP in another thread.) You can also replace the stack for a
296    pattern at any time. You can even free the previous stack before assigning a
297    replacement.
298    .P
299    (5) Should I allocate/free a stack every time before/after calling
300    \fBpcre_exec()\fP?
301    .sp
302    No, because this is too costly in terms of resources. However, you could
303    implement some clever idea which releases the stack if it has not been used
304    for, say, two minutes. The JIT callback can help to achieve this without
305    keeping a list of the currently JIT studied patterns.
306    .P
307    (6) OK, the stack is for long term memory allocation. But what happens if a
308    pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
309    stack is freed?
310    .sp
311    Especially on embedded systems, it might be a good idea to release
312    memory sometimes without freeing the stack. There is no API for this at the
313    moment. Probably a function call which returns with the currently allocated
314    memory for any stack and another which allows releasing memory (shrinking the
315    stack) would be a good idea if someone needs this.
316    .P
317    (7) This is too much of a headache. Isn't there any better solution for JIT
318    stack handling?
319    .sp
320    No, thanks to Windows. If POSIX threads were used everywhere, we could throw
321    out this complicated API.
322    .
323    .
324  .SH "EXAMPLE CODE"  .SH "EXAMPLE CODE"
325  .rs  .rs
326  .sp  .sp
# Line 257  callback. Line 357  callback.
357  .rs  .rs
358  .sp  .sp
359  .nf  .nf
360  Philip Hazel  Philip Hazel (FAQ by Zoltan Herczeg)
361  University Computing Service  University Computing Service
362  Cambridge CB2 3QH, England.  Cambridge CB2 3QH, England.
363  .fi  .fi
# Line 267  Cambridge CB2 3QH, England. Line 367  Cambridge CB2 3QH, England.
367  .rs  .rs
368  .sp  .sp
369  .nf  .nf
370  Last updated: 05 October 2011  Last updated: 08 January 2012
371  Copyright (c) 1997-2011 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
372  .fi  .fi
