--- code/branches/pcre16/doc/html/pcrejit.html 2011/12/12 12:15:17 800 +++ code/branches/pcre16/doc/html/pcrejit.html 2011/12/12 16:23:37 801 @@ -20,10 +20,11 @@
  • RETURN VALUES FROM JIT EXECUTION
  • SAVING AND RESTORING COMPILED PATTERNS
  • CONTROLLING THE JIT STACK -
  • EXAMPLE CODE -
  • SEE ALSO -
  • AUTHOR -
  • REVISION +
  • JIT STACK FAQ +
  • EXAMPLE CODE +
  • SEE ALSO +
  • AUTHOR +
  • REVISION
    PCRE JUST-IN-TIME COMPILER SUPPORT

    @@ -57,11 +58,17 @@ fails.

    -A program can tell if JIT support is available by calling pcre_config() -with the PCRE_CONFIG_JIT option. The result is 1 when JIT is available, and 0 -otherwise. However, a simple program does not need to check this in order to -use JIT. The API is implemented in a way that falls back to the ordinary PCRE -code if JIT is not available. +A program that is linked with PCRE 8.20 or later can tell if JIT support is +available by calling pcre_config() with the PCRE_CONFIG_JIT option. The +result is 1 when JIT is available, and 0 otherwise. However, a simple program +does not need to check this in order to use JIT. The API is implemented in a +way that falls back to the ordinary PCRE code if JIT is not available. +

    +

    +If your program may sometimes be linked with versions of PCRE that are older +than 8.20, but you want to use JIT when it is available, you can test +the values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such +as PCRE_CONFIG_JIT, for compile-time control of your code.


    SIMPLE USE OF JIT

    @@ -75,6 +82,21 @@ no longer needed instead of just freeing it yourself. This ensures that any JIT data is also freed. +For a program that may be linked with pre-8.20 versions of PCRE, you can insert +

    +  #ifndef PCRE_STUDY_JIT_COMPILE
    +  #define PCRE_STUDY_JIT_COMPILE 0
    +  #endif
    +
    +so that no option is passed to pcre_study(), and then use something like +this to free the study data: +
    +  #ifdef PCRE_CONFIG_JIT
    +      pcre_free_study(study_ptr);
    +  #else
    +      pcre_free(study_ptr);
    +  #endif
    +
    In some circumstances you may need to call additional functions. These are described in the section entitled "Controlling the JIT stack" @@ -116,12 +138,8 @@

    The unsupported pattern items are:

    -  \C            match a single byte; not supported in UTF-8 mode
    +  \C             match a single byte; not supported in UTF-8 mode
       (?Cn)          callouts
    -  (?(<name>)...  conditional test on setting of a named subpattern
    -  (?(R)...       conditional test on whole pattern recursion
    -  (?(Rn)...      conditional test on recursion, by number
    -  (?(R&name)...  conditional test on recursion, by name
       (*COMMIT)      )
       (*MARK)        )
       (*PRUNE)       ) the backtracking control verbs
    @@ -167,7 +185,10 @@
     By default, it uses 32K on the machine stack. However, some large or
     complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT
     is given when there is not enough stack. Three functions are provided for
    -managing blocks of memory for use as JIT stacks.
    +managing blocks of memory for use as JIT stacks. There is further discussion
    +about the use of JIT stacks in the section entitled
    +"JIT stack FAQ"
    +below.
     

    The pcre_jit_stack_alloc() function creates a JIT stack. Its arguments @@ -234,8 +255,86 @@ and pcre_assign_jit_stack() does nothing unless the extra argument is non-NULL and points to a pcre_extra block that is the result of a successful study with PCRE_STUDY_JIT_COMPILE. +

    +
    JIT STACK FAQ
    +

    +(1) Why do we need JIT stacks? +
    +
    +PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where +the local data of the current node is pushed before checking its child nodes. +Allocating real machine stack on some platforms is difficult. For example, on +PowerPC the stack chain needs to be updated every time the stack is extended. +Although this is possible, the overhead of updating it decreases performance. +So we do the recursion in memory. +

    +

    +(2) Why don't we simply allocate blocks of memory with malloc()? +
    +
    +Modern operating systems have a nice feature: they can reserve an address space +instead of allocating memory. We can safely allocate memory pages inside this +address space, so the stack could grow without moving memory data (this is +important because of pointers). Thus we can allocate 1M address space, and use +only a single memory page (usually 4K) if that is enough. However, we can still +grow up to 1M anytime if needed. +
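The reserve-then-commit idea described above can be sketched with POSIX `mmap()` and `mprotect()`. This is only an illustration of the general technique, not the code PCRE's JIT actually uses; the `demo_region` type and the `demo_region_*` functions are hypothetical names invented for this sketch, and it assumes a POSIX system with anonymous mappings.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON     /* older BSD-style spelling */
#endif

/* Hypothetical illustration type; not part of the PCRE API. */
typedef struct {
    unsigned char *base;   /* start of the reserved address range */
    size_t reserved;       /* bytes of address space reserved (page multiple) */
    size_t committed;      /* bytes currently readable and writable */
} demo_region;

/* Make at least `wanted` bytes usable. The base address never changes,
   so pointers into the region stay valid after growing. */
static int demo_region_grow(demo_region *r, size_t wanted)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = (wanted + page - 1) / page * page;

    if (rounded > r->reserved) return -1;     /* beyond the reservation */
    if (rounded <= r->committed) return 0;    /* already large enough */
    if (mprotect(r->base, rounded, PROT_READ | PROT_WRITE) != 0) return -1;
    r->committed = rounded;
    return 0;
}

/* Reserve max_size bytes of address space (inaccessible at first),
   then make the first start_size bytes usable. */
static int demo_region_init(demo_region *r, size_t start_size, size_t max_size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t reserved = (max_size + page - 1) / page * page;
    void *p = mmap(NULL, reserved, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) return -1;
    r->base = p;
    r->reserved = reserved;
    r->committed = 0;
    if (demo_region_grow(r, start_size) != 0) {
        munmap(p, reserved);
        return -1;
    }
    return 0;
}

/* Give the whole address range back to the operating system. */
static void demo_region_free(demo_region *r)
{
    munmap(r->base, r->reserved);
    r->base = NULL;
    r->reserved = r->committed = 0;
}
```

A caller might reserve 1M with `demo_region_init(&r, 4096, 1024 * 1024)` and later call `demo_region_grow(&r, 64 * 1024)`. Data already written stays at the same address, which is the property a plain `malloc()`/`realloc()` scheme cannot guarantee.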

    +

    +(3) Who "owns" a JIT stack? +
    +
    +The owner of the stack is the user program, not the JIT studied pattern or +anything else. The user program must ensure that while a stack is used by +pcre_exec() (that is, while it is assigned to the pattern currently running), +that stack is not used by any other thread (to avoid overwriting the same +memory area). The best practice for multithreaded programs is to allocate a +stack for each thread, and return this stack through the JIT callback function. +

    +

    +(4) When should a JIT stack be freed? +
    +
    +You can free a JIT stack at any time, as long as it will not be used by +pcre_exec() again. When you assign the stack to a pattern, only a pointer +is set. There is no reference counting or any other magic. You can free the +patterns and stacks in any order, anytime. Just do not call +pcre_exec() with a pattern pointing to an already freed stack, as that +will cause a segmentation fault. (Also, do not free a stack currently used by +pcre_exec() in another thread.) You can also replace the stack for a +pattern at any time. You can even free the previous stack before assigning a +replacement. +

    +

    +(5) Should I allocate/free a stack every time before/after calling +pcre_exec()? +
    +
    +No, because this is too costly in terms of resources. However, you could +implement some clever scheme which releases the stack if it has not been used +for, say, two minutes. The JIT callback can help to achieve this without keeping a +list of the currently JIT studied patterns. +
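One way the idle-release idea might look is sketched below. To keep the sketch compilable without pcre.h, the stack is an opaque `void *` and the create/destroy steps are stand-ins built on `malloc()` and `free()`. In real code they would presumably be `pcre_jit_stack_alloc()` and `pcre_jit_stack_free()`, and `demo_get_stack()` would be adapted to the JIT callback type and registered with `pcre_assign_jit_stack()`. All `demo_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-thread holder; not part of the PCRE API. */
typedef struct {
    void  *stack;       /* NULL when no stack is currently allocated */
    time_t last_used;   /* time of the most recent callback */
} demo_slot;

/* Stand-ins for pcre_jit_stack_alloc() / pcre_jit_stack_free(). */
static void *demo_create(void)      { return malloc(1); }
static void  demo_destroy(void *s)  { free(s); }

/* Shaped like a JIT stack callback: receives the user data pointer and
   returns the stack to use, allocating it lazily on first use. */
static void *demo_get_stack(void *data)
{
    demo_slot *slot = data;

    if (slot->stack == NULL)
        slot->stack = demo_create();
    slot->last_used = time(NULL);
    return slot->stack;
}

/* Call now and then from the owning thread, never while a match using
   this stack is running. Returns 1 if the stack was released. */
static int demo_release_if_idle(demo_slot *slot, time_t now, double max_idle)
{
    if (slot->stack != NULL && difftime(now, slot->last_used) >= max_idle) {
        demo_destroy(slot->stack);
        slot->stack = NULL;
        return 1;
    }
    return 0;
}
```

Because the callback re-creates the stack on demand, the program only has to track one slot per thread rather than every studied pattern. A periodic `demo_release_if_idle(&slot, time(NULL), 120.0)` would implement the two-minute rule mentioned above.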

    +

    +(6) OK, the stack is for long term memory allocation. But what happens if a +pattern causes stack overflow with a stack of 1M? Is that 1M kept until the +stack is freed? +
    +
    +Especially on embedded systems, it might be a good idea to release +memory sometimes without freeing the stack. There is no API for this at the +moment. A function that returns the amount of memory currently allocated +for a stack, and another that releases memory (shrinking the +stack), would probably be a good idea if someone needs this. +

    +

    +(7) This is too much of a headache. Isn't there any better solution for JIT +stack handling? +
    +
    +No, thanks to Windows. If POSIX threads were used everywhere, we could throw +out this complicated API.

    -
    EXAMPLE CODE
    +
    EXAMPLE CODE

    This is a single-threaded example that specifies a JIT stack without using a callback. @@ -260,22 +359,22 @@

    -
    SEE ALSO
    +
    SEE ALSO

    pcreapi(3)

    -
    AUTHOR
    +
    AUTHOR

    -Philip Hazel +Philip Hazel (FAQ by Zoltan Herczeg)
    University Computing Service
    Cambridge CB2 3QH, England.

    -
    REVISION
    +
    REVISION

    -Last updated: 19 October 2011 +Last updated: 26 November 2011
    Copyright © 1997-2011 University of Cambridge.