Diff of /code/trunk/doc/pcreapi.3

revision 1080 by chpe, Tue Oct 16 15:55:07 2012 UTC revision 1194 by ph10, Wed Oct 31 17:42:29 2012 UTC
# Line 1  Line 1 
1  .TH PCREAPI 3 "07 September 2012" "PCRE 8.32"  .TH PCREAPI 3 "31 October 2012" "PCRE 8.32"
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .sp  .sp
# Line 95  PCRE - Perl-compatible regular expressio Line 95  PCRE - Perl-compatible regular expressio
95  .SH "PCRE NATIVE API AUXILIARY FUNCTIONS"  .SH "PCRE NATIVE API AUXILIARY FUNCTIONS"
96  .rs  .rs
97  .sp  .sp
98    .B int pcre_jit_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
99    .ti +5n
100    .B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
101    .ti +5n
102    .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
103    .ti +5n
104    .B pcre_jit_stack *\fIjstack\fP);
105    .PP
106  .B pcre_jit_stack *pcre_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP);  .B pcre_jit_stack *pcre_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP);
107  .PP  .PP
108  .B void pcre_jit_stack_free(pcre_jit_stack *\fIstack\fP);  .B void pcre_jit_stack_free(pcre_jit_stack *\fIstack\fP);
# Line 139  PCRE - Perl-compatible regular expressio Line 147  PCRE - Perl-compatible regular expressio
147  .sp  .sp
148  From release 8.30, PCRE can be compiled as a library for handling 16-bit  From release 8.30, PCRE can be compiled as a library for handling 16-bit
149  character strings as well as, or instead of, the original library that handles  character strings as well as, or instead of, the original library that handles
150  8-bit character strings. From release 8.FIXME, PCRE can also be compiled as a  8-bit character strings. From release 8.32, PCRE can also be compiled as a
151  library for handling 32-bit character strings. To avoid too much complication,  library for handling 32-bit character strings. To avoid too much complication,
152  this document describes the 8-bit versions of the functions, with only  this document describes the 8-bit versions of the functions, with only
153  occasional references to the 16-bit and 32-bit libraries.  occasional references to the 16-bit and 32-bit libraries.
154  .P  .P
155  The 16-bit functions operate in the same way as their 8-bit counterparts; they  The 16-bit and 32-bit functions operate in the same way as their 8-bit
156  just use different data types for their arguments and results, and their names  counterparts; they just use different data types for their arguments and
157  start with \fBpcre16_\fP instead of \fBpcre_\fP. For every option that has UTF8  results, and their names start with \fBpcre16_\fP or \fBpcre32_\fP instead of
158  in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with  \fBpcre_\fP. For every option that has UTF8 in its name (for example,
159  UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit  PCRE_UTF8), there are corresponding 16-bit and 32-bit names with UTF8 replaced
160  option names define the same bit values.  by UTF16 or UTF32, respectively. This facility is in fact just cosmetic; the
161  .P  16-bit and 32-bit option names define the same bit values.
 The 32-bit functions operate in the same way as their 8-bit counterparts; they  
 just use different data types for their arguments and results, and their names  
 start with \fBpcre32_\fP instead of \fBpcre_\fP. For every option that has UTF8  
 in its name (for example, PCRE_UTF8), there is a corresponding 32-bit name with  
 UTF8 replaced by UTF32. This facility is in fact just cosmetic; the 32-bit  
 option names define the same bit values.  
162  .P  .P
163  References to bytes and UTF-8 in this document should be read as references to  References to bytes and UTF-8 in this document should be read as references to
164  16-bit data quantities and UTF-16 when using the 16-bit library, unless  16-bit data quantities and UTF-16 when using the 16-bit library, or 32-bit data
165  specified otherwise. More details of the specific differences for the 16-bit  quantities and UTF-32 when using the 32-bit library, unless specified
166  library are given in the  otherwise. More details of the specific differences for the 16-bit and 32-bit
167    libraries are given in the
168  .\" HREF  .\" HREF
169  \fBpcre16\fP  \fBpcre16\fP
170  .\"  .\"
171  page.  and
 .  
 .P  
 References to bytes and UTF-8 in this document should be read as references to  
 32-bit data quantities and UTF-32 when using the 32-bit library, unless  
 specified otherwise. More details of the specific differences for the 32-bit  
 library are given in the  
172  .\" HREF  .\" HREF
173  \fBpcre32\fP  \fBpcre32\fP
174  .\"  .\"
175  page.  pages.
176  .  .
177  .  .
178  .SH "PCRE API OVERVIEW"  .SH "PCRE API OVERVIEW"
# Line 231  used if available, by setting an option Line 228  used if available, by setting an option
228  relevant. More complicated programs might need to make use of the functions  relevant. More complicated programs might need to make use of the functions
229  \fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and  \fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and
230  \fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage.  \fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage.
231  These functions are discussed in the  .P
232    From release 8.32 there is also a direct interface for JIT execution, which
233    gives improved performance. The JIT-specific functions are discussed in the
234  .\" HREF  .\" HREF
235  \fBpcrejit\fP  \fBpcrejit\fP
236  .\"  .\"
# Line 860  page. Line 859  page.
859  .sp  .sp
860    PCRE_NO_UTF8_CHECK    PCRE_NO_UTF8_CHECK
861  .sp  .sp
862  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
863  string is automatically checked. There is a discussion about the  automatically checked. There is a discussion about the
864  .\" HTML <a href="pcreunicode.html#utf8strings">  .\" HTML <a href="pcreunicode.html#utf8strings">
865  .\" </a>  .\" </a>
866  validity of UTF-8 strings  validity of UTF-8 strings
# Line 876  this check for performance reasons, you Line 875  this check for performance reasons, you
875  When it is set, the effect of passing an invalid UTF-8 string as a pattern is  When it is set, the effect of passing an invalid UTF-8 string as a pattern is
876  undefined. It may cause your program to crash. Note that this option can also  undefined. It may cause your program to crash. Note that this option can also
877  be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the  be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the
878  validity checking of subject strings.  validity checking of subject strings only. If the same string is being matched
879    many times, the option can be safely set for the second and subsequent
880    matchings to improve performance.
881  .  .
882  .  .
883  .SH "COMPILATION ERROR CODES"  .SH "COMPILATION ERROR CODES"
# Line 1238  returned. For anchored patterns, -2 is r Line 1239  returned. For anchored patterns, -2 is r
1239  .P  .P
1240  Since for the 32-bit library using the non-UTF-32 mode, this function is unable  Since for the 32-bit library using the non-UTF-32 mode, this function is unable
1241  to return the full 32-bit range of the character, this value is deprecated;  to return the full 32-bit range of the character, this value is deprecated;
1242  instead the PCRE_INFO_FIRSTLITERALSET and PCRE_INFO_FIRSTLITERAL values should  instead the PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER values
1243  be used.  should be used.
1244  .sp  .sp
1245    PCRE_INFO_FIRSTTABLE    PCRE_INFO_FIRSTTABLE
1246  .sp  .sp
# Line 1290  is -1. Line 1291  is -1.
1291  .P  .P
1292  Since for the 32-bit library using the non-UTF-32 mode, this function is unable  Since for the 32-bit library using the non-UTF-32 mode, this function is unable
1293  to return the full 32-bit range of the character, this value is deprecated;  to return the full 32-bit range of the character, this value is deprecated;
1294  instead the PCRE_INFO_LASTLITERAL2SET and PCRE_INFO_LASTLITERAL2 values should  instead the PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values should
1295  be used.  be used.
1296  .sp  .sp
1297    PCRE_INFO_MAXLOOKBEHIND    PCRE_INFO_MAXLOOKBEHIND
# Line 1436  is made available via this option so tha Line 1437  is made available via this option so tha
1437  .\"  .\"
1438  documentation for details).  documentation for details).
1439  .sp  .sp
1440    PCRE_INFO_FIRSTLITERALSET    PCRE_INFO_FIRSTCHARACTERFLAGS
1441  .sp  .sp
1442  Return information about the first data unit of any matched string, for a  Return information about the first data unit of any matched string, for a
1443  non-anchored pattern. The fourth argument should point to an \fBint\fP  non-anchored pattern. The fourth argument should point to an \fBint\fP
# Line 1444  variable. Line 1445  variable.
1445  .P  .P
1446  If there is a fixed first value, for example, the letter "c" from a pattern  If there is a fixed first value, for example, the letter "c" from a pattern
1447  such as (cat|cow|coyote), 1 is returned, and the character value can be  such as (cat|cow|coyote), 1 is returned, and the character value can be
1448  retrieved using PCRE_INFO_FIRSTLITERAL.  retrieved using PCRE_INFO_FIRSTCHARACTER.
1449  .P  .P
1450  If there is no fixed first value, and if either  If there is no fixed first value, and if either
1451  .sp  .sp
# Line 1458  starts with "^", or Line 1459  starts with "^", or
1459  subject string or after any newline within the string. Otherwise 0 is  subject string or after any newline within the string. Otherwise 0 is
1460  returned. For anchored patterns, 0 is returned.  returned. For anchored patterns, 0 is returned.
1461  .sp  .sp
1462    PCRE_INFO_FIRSTLITERAL    PCRE_INFO_FIRSTCHARACTER
1463  .sp  .sp
1464  Return the fixed first character value, if PCRE_INFO_FIRSTLITERALSET returned 1;  Return the fixed first character value, if PCRE_INFO_FIRSTCHARACTERFLAGS
1465  otherwise returns 0. The fourth argument should point to an \fBuint_t\fP  returned 1; otherwise returns 0. The fourth argument should point to an
1466  variable.  \fBuint_t\fP variable.
1467  .P  .P
1468  In the 8-bit library, the value is always less than 256. In the 16-bit library  In the 8-bit library, the value is always less than 256. In the 16-bit library
1469  the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value  the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
# Line 1480  starts with "^", or Line 1481  starts with "^", or
1481  subject string or after any newline within the string. Otherwise -2 is  subject string or after any newline within the string. Otherwise -2 is
1482  returned. For anchored patterns, -2 is returned.  returned. For anchored patterns, -2 is returned.
1483  .sp  .sp
1484    PCRE_INFO_LASTLITERAL2SET    PCRE_INFO_REQUIREDCHARFLAGS
1485  .sp  .sp
1486  Returns 1 if there is a rightmost literal data unit that must exist in any matched  Returns 1 if there is a rightmost literal data unit that must exist in any
1487  string, other than at its start. The fourth argument should  point to an \fBint\fP  matched string, other than at its start. The fourth argument should  point to
1488  variable. If there is no such value, 0 is returned. If returning 1, the character  an \fBint\fP variable. If there is no such value, 0 is returned. If returning
1489  value itself can be retrieved using PCRE_INFO_LASTLITERAL2.  1, the character value itself can be retrieved using PCRE_INFO_REQUIREDCHAR.
1490  .P  .P
1491  For anchored patterns, a last literal value is recorded only if it follows something  For anchored patterns, a last literal value is recorded only if it follows
1492  of variable length. For example, for the pattern /^a\ed+z\ed+/ the returned value  something of variable length. For example, for the pattern /^a\ed+z\ed+/ the
1493  1 (with "z" returned from PCRE_INFO_LASTLITERAL2), but for /^a\edz\ed/ the returned  returned value 1 (with "z" returned from PCRE_INFO_REQUIREDCHAR), but for
1494  value is 0.  /^a\edz\ed/ the returned value is 0.
1495  .sp  .sp
1496    PCRE_INFO_LASTLITERAL2    PCRE_INFO_REQUIREDCHAR
1497  .sp  .sp
1498  Return the value of the rightmost literal data unit that must exist in any  Return the value of the rightmost literal data unit that must exist in any
1499  matched string, other than at its start, if such a value has been recorded. The  matched string, other than at its start, if such a value has been recorded. The
# Line 2241  This error is given if a pattern that wa Line 2242  This error is given if a pattern that wa
2242  host with different endianness. The utility function  host with different endianness. The utility function
2243  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern
2244  so that it runs on the new host.  so that it runs on the new host.
2245    .sp
2246      PCRE_ERROR_JIT_BADOPTION
2247    .sp
2248    This error is returned when a pattern that was successfully studied using a JIT
2249    compile option is being matched, but the matching mode (partial or complete
2250    match) does not correspond to any JIT compilation mode. When the JIT fast path
2251    function is used, this error may be also given for invalid options. See the
2252    .\" HREF
2253    \fBpcrejit\fP
2254    .\"
2255    documentation for more details.
2256    .sp
2257      PCRE_ERROR_BADLENGTH      (-32)
2258    .sp
2259    This error is given if \fBpcre_exec()\fP is called with a negative value for
2260    the \fIlength\fP argument.
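
A minimal sketch of guarding against this error on the caller's side. It assumes that one way a negative \fIlength\fP arises is narrowing a \fBsize_t\fP (for example, from \fBstrlen()\fP) to \fBint\fP; that cause is an assumption, not something the text above states. The helper name and the SKETCH_ERROR_BADLENGTH macro are hypothetical and not part of the PCRE API; only the value -32 is taken from the text.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Value copied from the text above; the macro name is hypothetical. */
#define SKETCH_ERROR_BADLENGTH (-32)

/* Hypothetical helper: converts a size_t subject length to the int that
   pcre_exec() expects. Returns 0 and stores the length in *out when it
   fits; returns SKETCH_ERROR_BADLENGTH and leaves *out untouched when the
   value would not fit in an int. */
int checked_subject_length(size_t n, int *out)
{
    assert(out != NULL);
    if (n > (size_t)INT_MAX)
        return SKETCH_ERROR_BADLENGTH;
    *out = (int)n;
    return 0;
}
```

A caller would run the check before calling \fBpcre_exec()\fP and pass the stored value as \fIlength\fP only when the helper returns 0.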
2261  .P  .P
2262  Error numbers -16 to -20, -22, and -30 are not used by \fBpcre_exec()\fP.  Error numbers -16 to -20, -22, and 30 are not used by \fBpcre_exec()\fP.
2263  .  .
2264  .  .
2265  .\" HTML <a name="badutf8reasons"></a>  .\" HTML <a name="badutf8reasons"></a>
# Line 2328  character. Line 2345  character.
2345  .sp  .sp
2346  The first byte of a character has the value 0xfe or 0xff. These values can  The first byte of a character has the value 0xfe or 0xff. These values can
2347  never occur in a valid UTF-8 string.  never occur in a valid UTF-8 string.
2348    .sp
2349      PCRE_UTF8_ERR2
2350    .sp
2351    Non-character. These are the last two characters in each plane (0xfffe, 0xffff,
2352    0x1fffe, 0x1ffff .. 0x10fffe, 0x10ffff), and the characters 0xfdd0..0xfdef.
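
The non-character rule above can be sketched as a small predicate. This is an illustration only, not PCRE's own validity check; the function name is hypothetical, and treating values above 0x10ffff as "not a non-character" is an assumption made here to keep the example to the ranges listed in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE API: reports whether a code
   point is one of the non-characters listed above. */
bool is_noncharacter(uint32_t cp)
{
    if (cp > 0x10ffff)
        return false;                /* beyond the last plane (assumption) */
    if (cp >= 0xfdd0 && cp <= 0xfdef)
        return true;                 /* the block 0xfdd0..0xfdef */
    /* Last two code points of each plane: low 16 bits are 0xfffe or 0xffff. */
    return (cp & 0xfffe) == 0xfffe;
}
```

The mask test works because the last two code points of a plane differ only in the lowest bit, so clearing that bit leaves 0xfffe in the low 16 bits for both.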
2353  .  .
2354  .  .
2355  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"
# Line 2796  Cambridge CB2 3QH, England. Line 2818  Cambridge CB2 3QH, England.
2818  .rs  .rs
2819  .sp  .sp
2820  .nf  .nf
2821  Last updated: 07 September 2012  Last updated: 31 October 2012
2822  Copyright (c) 1997-2012 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
2823  .fi  .fi
