--- code/trunk/doc/html/pcreapi.html 2007/02/24 21:40:24 71
+++ code/trunk/doc/html/pcreapi.html 2007/02/24 21:40:30 73
@@ -98,6 +98,12 @@
void (*pcre_free)(void *);

+void *(*pcre_stack_malloc)(size_t);
+

+

+void (*pcre_stack_free)(void *);
+

+

int (*pcre_callout)(pcre_callout_block *);


PCRE API
@@ -156,6 +162,17 @@
should be done before calling any PCRE functions.

+The global variables pcre_stack_malloc and pcre_stack_free are also
+indirections to memory management functions. These special functions are used
+only when PCRE is compiled to use the heap for remembering data, instead of
+recursive function calls. This is a non-standard way of building PCRE, for use
+in environments that have limited stacks. Because of the greater use of memory
+management, it runs more slowly. Separate functions are provided so that
+special-purpose external code can be used for this case. When used, these
+functions are always called in a stack-like manner (last obtained, first
+freed), and always for memory blocks of the same size.
+

+
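The added paragraph says the stack functions are called last-obtained, first-freed, and always for blocks of one size. A minimal sketch of "special-purpose external code" that exploits this is a cache that keeps released blocks on a list and hands the most recently freed one back first. The names `frame_header`, `frame_alloc`, `frame_free` and `frame_release_all` are hypothetical and are not part of PCRE; the sketch assumes a C11 compiler (for `max_align_t`) and a single thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every block; not a PCRE type.
   The union with max_align_t (C11) keeps the user area suitably aligned. */
typedef union frame_header {
    struct {
        size_t size;                 /* usable size requested for this block */
        union frame_header *next;    /* link used only while the block is cached */
    } info;
    max_align_t align;
} frame_header;

static frame_header *free_list = NULL;   /* most recently freed block first */

/* Candidate for pcre_stack_malloc: reuse the last freed block if it fits. */
void *frame_alloc(size_t size)
{
    frame_header *h = free_list;
    if (h != NULL && h->info.size >= size) {
        free_list = h->info.next;        /* pop the cached block */
    } else {
        h = malloc(sizeof(frame_header) + size);
        if (h == NULL) return NULL;
        h->info.size = size;
    }
    return h + 1;                        /* user area starts after the header */
}

/* Candidate for pcre_stack_free: cache the block instead of releasing it. */
void frame_free(void *ptr)
{
    frame_header *h;
    if (ptr == NULL) return;
    h = (frame_header *)ptr - 1;
    h->info.next = free_list;
    free_list = h;
}

/* Return every cached block to the system, e.g. once matching is finished. */
void frame_release_all(void)
{
    while (free_list != NULL) {
        frame_header *h = free_list;
        free_list = h->info.next;
        free(h);
    }
}
```

In a heap-recursion build these would be installed with `pcre_stack_malloc = frame_alloc;` and `pcre_stack_free = frame_free;` before any PCRE function is called. Because the cache is unsynchronised static state, a multithreaded program would need locking or a per-thread list, since the function pointers are shared by all threads.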

The global variable pcre_callout initially contains NULL. It can be set by the caller to a "callout" function, which PCRE will then call at specified points during a matching operation. Details are given in the pcrecallout
@@ -164,9 +181,9 @@
MULTITHREADING

The PCRE functions can be used in multi-threading applications, with the
-proviso that the memory management functions pointed to by pcre_malloc
-and pcre_free, and the callout function pointed to by pcre_callout,
-are shared by all threads.
+proviso that the memory management functions pointed to by pcre_malloc,
+pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
+callout function pointed to by pcre_callout, are shared by all threads.

The compiled form of a regular expression is not altered during matching, so
@@ -238,6 +255,19 @@
internal matching function calls in a pcre_exec() execution. Further details are given with pcre_exec() below.

+

+

+  PCRE_CONFIG_STACKRECURSE
+
+

+

+The output is an integer that is set to one if internal recursion is
+implemented by recursive function calls that use the stack to remember their
+state. This is the usual way that PCRE is compiled. The output is zero if PCRE
+was compiled to use blocks of data on the heap instead of recursive function
+calls. In this case, pcre_stack_malloc and pcre_stack_free are
+called to manage memory blocks on the heap, thus avoiding the use of the stack.
+


COMPILING A PATTERN

pcre *pcre_compile(const char *pattern, int options,
@@ -878,12 +908,22 @@

When PCRE_UTF8 was set at compile time, the validity of the subject as a UTF-8
-string is automatically checked. If an invalid UTF-8 sequence of bytes is
-found, pcre_exec() returns the error PCRE_ERROR_BADUTF8. If you already
-know that your subject is valid, and you want to skip this check for
-performance reasons, you can set the PCRE_NO_UTF8_CHECK option when calling
-pcre_exec(). When this option is set, the effect of passing an invalid
-UTF-8 string as a subject is undefined. It may cause your program to crash.
+string is automatically checked, and the value of startoffset is also
+checked to ensure that it points to the start of a UTF-8 character. If an
+invalid UTF-8 sequence of bytes is found, pcre_exec() returns the error
+PCRE_ERROR_BADUTF8. If startoffset contains an invalid value,
+PCRE_ERROR_BADUTF8_OFFSET is returned.
+

+

+If you already know that your subject is valid, and you want to skip these
+checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when
+calling pcre_exec(). You might want to do this for the second and
+subsequent calls to pcre_exec() if you are making repeated calls to find
+all the matches in a single subject string. However, you should be sure that
+the value of startoffset points to the start of a UTF-8 character. When
+PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid UTF-8 string as a
+subject, or a value of startoffset that does not point to the start of a
+UTF-8 character, is undefined. Your program may crash.
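When PCRE_NO_UTF8_CHECK is set on later calls, keeping startoffset on a character boundary becomes the caller's job. If the caller ever advances the offset by hand (for instance after a zero-length match) rather than reusing the end of the previous match, a helper along the following lines may be useful. This is only a sketch: `next_utf8_char_start` is a hypothetical name, not a PCRE function, and it assumes the subject has already been validated as UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE.
   Returns the smallest offset greater than `offset` that does not land on a
   UTF-8 continuation byte (bit pattern 10xxxxxx), or `length` if the end of
   the subject is reached first. Assumes `subject` is valid UTF-8. */
int next_utf8_char_start(const char *subject, int length, int offset)
{
    if (offset >= length) return length;
    offset++;                                    /* always make progress */
    while (offset < length &&
           ((unsigned char)subject[offset] & 0xC0) == 0x80)
        offset++;                                /* skip continuation bytes */
    return offset;
}
```

The returned value can then be passed as startoffset on the next call; stepping by a single byte instead could land inside a multi-byte character, which is exactly the case the paragraph above describes as undefined.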

There are also three further options that can be set only at matching time:
@@ -939,15 +979,18 @@

The subject string is passed to pcre_exec() as a pointer in
-subject, a length in length, and a starting offset in
+subject, a length in length, and a starting byte offset in
startoffset. Unlike the pattern string, the subject may contain binary zero bytes. When the starting offset is zero, the search for a match starts at the beginning of the subject, and this is by far the most common case.

If the pattern was compiled with the PCRE_UTF8 option, the subject must be a
-sequence of bytes that is a valid UTF-8 string. If an invalid UTF-8 string is
-passed, PCRE's behaviour is not defined.
+sequence of bytes that is a valid UTF-8 string, and the starting offset must
+point to the beginning of a UTF-8 character. If an invalid UTF-8 string or
+offset is passed, an error (either PCRE_ERROR_BADUTF8 or
+PCRE_ERROR_BADUTF8_OFFSET) is returned, unless the option PCRE_NO_UTF8_CHECK is
+set, in which case PCRE's behaviour is not defined.
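To make the "beginning of a UTF-8 character" requirement concrete: in valid UTF-8, every byte that is not the first byte of a character has the bit pattern 10xxxxxx. A caller-side pre-check can therefore be sketched as below. The function name `is_utf8_char_start` is hypothetical, the code is not taken from PCRE, and it assumes the subject itself is already known to be valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical pre-check, not PCRE's own code.
   Returns 1 if `startoffset` is a byte offset at which a UTF-8 character
   begins (or is exactly the end of the subject), 0 otherwise.
   Assumes `subject` holds `length` bytes of valid UTF-8. */
int is_utf8_char_start(const char *subject, int length, int startoffset)
{
    if (startoffset < 0 || startoffset > length) return 0;
    if (startoffset == length) return 1;     /* end of subject: no character is split */
    /* Continuation bytes look like 10xxxxxx; anything else starts a character. */
    return ((unsigned char)subject[startoffset] & 0xC0) != 0x80;
}
```

A check of this kind matters mainly when PCRE_NO_UTF8_CHECK is in use, because without that option pcre_exec() performs the offset check itself and reports PCRE_ERROR_BADUTF8_OFFSET.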

A non-zero starting offset is useful when searching for another match in the
@@ -1132,12 +1175,21 @@

-  PCRE_ERROR_BADUTF8       (-10)
+  PCRE_ERROR_BADUTF8        (-10)
 

A string that contains an invalid UTF-8 byte sequence was passed as a subject.

+

+

+  PCRE_ERROR_BADUTF8_OFFSET (-11)
+
+

+

+The UTF-8 byte sequence that was passed as a subject was valid, but the value
+of startoffset did not point to the beginning of a UTF-8 character.
+


EXTRACTING CAPTURED SUBSTRINGS BY NUMBER

int pcre_copy_substring(const char *subject, int *ovector,
@@ -1289,6 +1341,6 @@
appropriate.

-Last updated: 20 August 2003
+Last updated: 09 December 2003
Copyright © 1997-2003 University of Cambridge.