Diff of /code/trunk/doc/pcreapi.3

revision 1055 by chpe, Tue Oct 16 15:53:30 2012 UTC revision 1194 by ph10, Wed Oct 31 17:42:29 2012 UTC
# Line 1  Line 1 
1  .TH PCREAPI 3 "07 September 2012" "PCRE 8.32"  .TH PCREAPI 3 "31 October 2012" "PCRE 8.32"
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .sp  .sp
# Line 95  PCRE - Perl-compatible regular expressio Line 95  PCRE - Perl-compatible regular expressio
95  .SH "PCRE NATIVE API AUXILIARY FUNCTIONS"  .SH "PCRE NATIVE API AUXILIARY FUNCTIONS"
96  .rs  .rs
97  .sp  .sp
98    .B int pcre_jit_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
99    .ti +5n
100    .B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
101    .ti +5n
102    .B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
103    .ti +5n
104    .B pcre_jit_stack *\fIjstack\fP);
105    .PP
106  .B pcre_jit_stack *pcre_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP);  .B pcre_jit_stack *pcre_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP);
107  .PP  .PP
108  .B void pcre_jit_stack_free(pcre_jit_stack *\fIstack\fP);  .B void pcre_jit_stack_free(pcre_jit_stack *\fIstack\fP);
# Line 139  PCRE - Perl-compatible regular expressio Line 147  PCRE - Perl-compatible regular expressio
147  .sp  .sp
148  From release 8.30, PCRE can be compiled as a library for handling 16-bit  From release 8.30, PCRE can be compiled as a library for handling 16-bit
149  character strings as well as, or instead of, the original library that handles  character strings as well as, or instead of, the original library that handles
150  8-bit character strings. From release 8.FIXME, PCRE can also be compiled as a  8-bit character strings. From release 8.32, PCRE can also be compiled as a
151  library for handling 32-bit character strings. To avoid too much complication,  library for handling 32-bit character strings. To avoid too much complication,
152  this document describes the 8-bit versions of the functions, with only  this document describes the 8-bit versions of the functions, with only
153  occasional references to the 16-bit and 32-bit libraries.  occasional references to the 16-bit and 32-bit libraries.
154  .P  .P
155  The 16-bit functions operate in the same way as their 8-bit counterparts; they  The 16-bit and 32-bit functions operate in the same way as their 8-bit
156  just use different data types for their arguments and results, and their names  counterparts; they just use different data types for their arguments and
157  start with \fBpcre16_\fP instead of \fBpcre_\fP. For every option that has UTF8  results, and their names start with \fBpcre16_\fP or \fBpcre32_\fP instead of
158  in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with  \fBpcre_\fP. For every option that has UTF8 in its name (for example,
159  UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit  PCRE_UTF8), there are corresponding 16-bit and 32-bit names with UTF8 replaced
160  option names define the same bit values.  by UTF16 or UTF32, respectively. This facility is in fact just cosmetic; the
161  .P  16-bit and 32-bit option names define the same bit values.
 The 32-bit functions operate in the same way as their 8-bit counterparts; they  
 just use different data types for their arguments and results, and their names  
 start with \fBpcre32_\fP instead of \fBpcre_\fP. For every option that has UTF8  
 in its name (for example, PCRE_UTF8), there is a corresponding 32-bit name with  
 UTF8 replaced by UTF32. This facility is in fact just cosmetic; the 32-bit  
 option names define the same bit values.  
162  .P  .P
163  References to bytes and UTF-8 in this document should be read as references to  References to bytes and UTF-8 in this document should be read as references to
164  16-bit data quantities and UTF-16 when using the 16-bit library, unless  16-bit data quantities and UTF-16 when using the 16-bit library, or 32-bit data
165  specified otherwise. More details of the specific differences for the 16-bit  quantities and UTF-32 when using the 32-bit library, unless specified
166  library are given in the  otherwise. More details of the specific differences for the 16-bit and 32-bit
167    libraries are given in the
168  .\" HREF  .\" HREF
169  \fBpcre16\fP  \fBpcre16\fP
170  .\"  .\"
171  page.  and
 .  
 .P  
 References to bytes and UTF-8 in this document should be read as references to  
 32-bit data quantities and UTF-32 when using the 32-bit library, unless  
 specified otherwise. More details of the specific differences for the 32-bit  
 library are given in the  
172  .\" HREF  .\" HREF
173  \fBpcre32\fP  \fBpcre32\fP
174  .\"  .\"
175  page.  pages.
176  .  .
177  .  .
178  .SH "PCRE API OVERVIEW"  .SH "PCRE API OVERVIEW"
# Line 231  used if available, by setting an option Line 228  used if available, by setting an option
228  relevant. More complicated programs might need to make use of the functions  relevant. More complicated programs might need to make use of the functions
229  \fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and  \fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and
230  \fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage.  \fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage.
231  These functions are discussed in the  .P
232    From release 8.32 there is also a direct interface for JIT execution, which
233    gives improved performance. The JIT-specific functions are discussed in the
234  .\" HREF  .\" HREF
235  \fBpcrejit\fP  \fBpcrejit\fP
236  .\"  .\"
# Line 860  page. Line 859  page.
859  .sp  .sp
860    PCRE_NO_UTF8_CHECK    PCRE_NO_UTF8_CHECK
861  .sp  .sp
862  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
863  string is automatically checked. There is a discussion about the  automatically checked. There is a discussion about the
864  .\" HTML <a href="pcreunicode.html#utf8strings">  .\" HTML <a href="pcreunicode.html#utf8strings">
865  .\" </a>  .\" </a>
866  validity of UTF-8 strings  validity of UTF-8 strings
# Line 876  this check for performance reasons, you Line 875  this check for performance reasons, you
875  When it is set, the effect of passing an invalid UTF-8 string as a pattern is  When it is set, the effect of passing an invalid UTF-8 string as a pattern is
876  undefined. It may cause your program to crash. Note that this option can also  undefined. It may cause your program to crash. Note that this option can also
877  be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the  be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the
878  validity checking of subject strings.  validity checking of subject strings only. If the same string is being matched
879    many times, the option can be safely set for the second and subsequent
880    matchings to improve performance.
881  .  .
882  .  .
883  .SH "COMPILATION ERROR CODES"  .SH "COMPILATION ERROR CODES"
# Line 1235  starts with "^", or Line 1236  starts with "^", or
1236  -1 is returned, indicating that the pattern matches only at the start of a  -1 is returned, indicating that the pattern matches only at the start of a
1237  subject string or after any newline within the string. Otherwise -2 is  subject string or after any newline within the string. Otherwise -2 is
1238  returned. For anchored patterns, -2 is returned.  returned. For anchored patterns, -2 is returned.
1239    .P
1240    In the 32-bit library in non-UTF-32 mode, this function cannot return the
1241    full 32-bit range of character values, so this value is deprecated; the
1242    PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER values should be
1243    used instead.
1244  .sp  .sp
1245    PCRE_INFO_FIRSTTABLE    PCRE_INFO_FIRSTTABLE
1246  .sp  .sp
# Line 1282  value, -1 is returned. For anchored patt Line 1288  value, -1 is returned. For anchored patt
1288  only if it follows something of variable length. For example, for the pattern  only if it follows something of variable length. For example, for the pattern
1289  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value
1290  is -1.  is -1.
1291    .P
1292    In the 32-bit library in non-UTF-32 mode, this function cannot return the
1293    full 32-bit range of character values, so this value is deprecated; the
1294    PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values should be
1295    used instead.
1296  .sp  .sp
1297    PCRE_INFO_MAXLOOKBEHIND    PCRE_INFO_MAXLOOKBEHIND
1298  .sp  .sp
# Line 1425  is made available via this option so tha Line 1436  is made available via this option so tha
1436  \fBpcreprecompile\fP  \fBpcreprecompile\fP
1437  .\"  .\"
1438  documentation for details).  documentation for details).
1439    .sp
1440      PCRE_INFO_FIRSTCHARACTERFLAGS
1441    .sp
1442    Return information about the first data unit of any matched string, for a
1443    non-anchored pattern. The fourth argument should point to an \fBint\fP
1444    variable.
1445    .P
1446    If there is a fixed first value, for example, the letter "c" from a pattern
1447    such as (cat|cow|coyote), 1 is returned, and the character value can be
1448    retrieved using PCRE_INFO_FIRSTCHARACTER.
1449    .P
1450    If there is no fixed first value, and if either
1451    .sp
1452    (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
1453    starts with "^", or
1454    .sp
1455    (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
1456    (if it were set, the pattern would be anchored),
1457    .sp
1458    2 is returned, indicating that the pattern matches only at the start of a
1459    subject string or after any newline within the string. Otherwise 0 is
1460    returned. For anchored patterns, 0 is returned.
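
The three return values described for PCRE_INFO_FIRSTCHARACTERFLAGS can be handled with a small dispatch. The following is a minimal sketch, not part of PCRE: the enum and the function name are hypothetical, and only the integer values 0, 1, and 2 come from the text above.

```c
/* Hypothetical names for the values that PCRE_INFO_FIRSTCHARACTERFLAGS
   is documented to return; PCRE itself only returns the integers. */
enum first_char_kind {
    FIRST_NONE = 0,       /* no fixed first value, or pattern is anchored */
    FIRST_FIXED = 1,      /* fixed value; read PCRE_INFO_FIRSTCHARACTER */
    FIRST_LINE_START = 2  /* matches only at subject start or after a newline */
};

/* Map a flags value to a short description (illustrative helper). */
const char *first_char_kind_name(int flags)
{
    switch (flags) {
    case FIRST_NONE:
        return "no fixed first character";
    case FIRST_FIXED:
        return "fixed first character";
    case FIRST_LINE_START:
        return "start of subject or after newline";
    default:
        return "unexpected flags value";
    }
}
```

A caller would query the flags first and ask for PCRE_INFO_FIRSTCHARACTER only when the result is 1.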
1461    .sp
1462      PCRE_INFO_FIRSTCHARACTER
1463    .sp
1464    Return the fixed first character value, if PCRE_INFO_FIRSTCHARACTERFLAGS
1465    returned 1; otherwise returns 0. The fourth argument should point to an
1466    \fBuint32_t\fP variable.
1467    .P
1468    In the 8-bit library, the value is always less than 256. In the 16-bit library
1469    the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
1470    can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
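
The value ranges stated above can be summarised as a lookup. This is an illustrative sketch only: the function `first_char_max` and its parameters are assumptions made for the example and do not exist in PCRE.

```c
#include <stdint.h>

/* Largest value PCRE_INFO_FIRSTCHARACTER can yield, per the ranges in the
   text. code_unit_bits is 8, 16, or 32; utf_mode is nonzero when the
   pattern uses UTF mode. Returns 0 for an unsupported width. */
uint32_t first_char_max(int code_unit_bits, int utf_mode)
{
    switch (code_unit_bits) {
    case 8:
        return 0xffu;        /* "always less than 256" */
    case 16:
        return 0xffffu;
    case 32:
        /* UTF-32 is limited to the Unicode range; non-UTF mode is not. */
        return utf_mode ? 0x10ffffu : 0xffffffffu;
    default:
        return 0;
    }
}
```

The 32-bit non-UTF case is the reason the result needs an unsigned 32-bit variable: a signed `int` result has no spare values left for sentinels such as -1 and -2.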
1471    .P
1472    If there is no fixed first value, and if either
1473    .sp
1474    (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
1475    starts with "^", or
1476    .sp
1477    (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
1478    (if it were set, the pattern would be anchored),
1479    .sp
1480    -1 is returned, indicating that the pattern matches only at the start of a
1481    subject string or after any newline within the string. Otherwise -2 is
1482    returned. For anchored patterns, -2 is returned.
1483    .sp
1484      PCRE_INFO_REQUIREDCHARFLAGS
1485    .sp
1486    Returns 1 if there is a rightmost literal data unit that must exist in any
1487    matched string, other than at its start. The fourth argument should point to
1488    an \fBint\fP variable. If there is no such value, 0 is returned. If returning
1489    1, the character value itself can be retrieved using PCRE_INFO_REQUIREDCHAR.
1490    .P
1491    For anchored patterns, a last literal value is recorded only if it follows
1492    something of variable length. For example, for the pattern /^a\ed+z\ed+/ the
1493    returned value is 1 (with "z" returned from PCRE_INFO_REQUIREDCHAR), but for
1494    /^a\edz\ed/ the returned value is 0.
1495    .sp
1496      PCRE_INFO_REQUIREDCHAR
1497    .sp
1498    Return the value of the rightmost literal data unit that must exist in any
1499    matched string, other than at its start, if such a value has been recorded. The
1500    fourth argument should point to a \fBuint32_t\fP variable. If there is no such
1501    value, 0 is returned.
1502  .  .
1503  .  .
1504  .SH "REFERENCE COUNTS"  .SH "REFERENCE COUNTS"
# Line 2168  This error is given if a pattern that wa Line 2242  This error is given if a pattern that wa
2242  host with different endianness. The utility function  host with different endianness. The utility function
2243  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern
2244  so that it runs on the new host.  so that it runs on the new host.
2245    .sp
2246      PCRE_ERROR_JIT_BADOPTION  (-31)
2247    .sp
2248    This error is returned when a pattern that was successfully studied using a JIT
2249    compile option is being matched, but the matching mode (partial or complete
2250    match) does not correspond to any JIT compilation mode. When the JIT fast path
2251    function is used, this error may also be given for invalid options. See the
2252    .\" HREF
2253    \fBpcrejit\fP
2254    .\"
2255    documentation for more details.
2256    .sp
2257      PCRE_ERROR_BADLENGTH      (-32)
2258    .sp
2259    This error is given if \fBpcre_exec()\fP is called with a negative value for
2260    the \fIlength\fP argument.
2261  .P  .P
2262  Error numbers -16 to -20, -22, and -30 are not used by \fBpcre_exec()\fP.  Error numbers -16 to -20, -22, and -30 are not used by \fBpcre_exec()\fP.
2263  .  .
2264  .  .
2265  .\" HTML <a name="badutf8reasons"></a>  .\" HTML <a name="badutf8reasons"></a>
# Line 2255  character. Line 2345  character.
2345  .sp  .sp
2346  The first byte of a character has the value 0xfe or 0xff. These values can  The first byte of a character has the value 0xfe or 0xff. These values can
2347  never occur in a valid UTF-8 string.  never occur in a valid UTF-8 string.
2348    .sp
2349      PCRE_UTF8_ERR22
2350    .sp
2351    Non-character. These are the last two characters in each plane (0xfffe, 0xffff,
2352    0x1fffe, 0x1ffff .. 0x10fffe, 0x10ffff), and the characters 0xfdd0..0xfdef.
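
The non-character rule above reduces to two range tests on the code point. The sketch below is a standalone illustration, not PCRE's internal check; the function name `is_noncharacter` is hypothetical.

```c
#include <stdint.h>

/* Return 1 if cp is a Unicode non-character as described above:
   the last two code points of each plane (0xfffe, 0xffff, 0x1fffe, ...,
   0x10ffff) or the block 0xfdd0..0xfdef. Return 0 otherwise, including
   for values beyond 0x10ffff, which are outside the Unicode range. */
int is_noncharacter(uint32_t cp)
{
    if (cp > 0x10ffffu)
        return 0;
    if ((cp & 0xfffeu) == 0xfffeu)   /* low 16 bits are 0xfffe or 0xffff */
        return 1;
    return cp >= 0xfdd0u && cp <= 0xfdefu;
}
```

Masking with 0xfffe covers all 17 planes in one test, because the plane number lives in the bits above the low 16.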
2353  .  .
2354  .  .
2355  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"
# Line 2723  Cambridge CB2 3QH, England. Line 2818  Cambridge CB2 3QH, England.
2818  .rs  .rs
2819  .sp  .sp
2820  .nf  .nf
2821  Last updated: 07 September 2012  Last updated: 31 October 2012
2822  Copyright (c) 1997-2012 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
2823  .fi  .fi
