
Diff of /code/trunk/doc/pcreapi.3


revision 1103 by chpe, Tue Oct 16 15:56:38 2012 UTC revision 1191 by ph10, Tue Oct 30 16:50:57 2012 UTC
# Line 1  Line 1 
1  .TH PCREAPI 3 "07 September 2012" "PCRE 8.32"  .TH PCREAPI 3 "29 October 2012" "PCRE 8.32"
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .sp  .sp
# Line 144  library for handling 32-bit character st Line 144  library for handling 32-bit character st
144  this document describes the 8-bit versions of the functions, with only  this document describes the 8-bit versions of the functions, with only
145  occasional references to the 16-bit and 32-bit libraries.  occasional references to the 16-bit and 32-bit libraries.
146  .P  .P
147  The 16-bit functions operate in the same way as their 8-bit counterparts; they  The 16-bit and 32-bit functions operate in the same way as their 8-bit
148  just use different data types for their arguments and results, and their names  counterparts; they just use different data types for their arguments and
149  start with \fBpcre16_\fP instead of \fBpcre_\fP. For every option that has UTF8  results, and their names start with \fBpcre16_\fP or \fBpcre32_\fP instead of
150  in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with  \fBpcre_\fP. For every option that has UTF8 in its name (for example,
151  UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit  PCRE_UTF8), there are corresponding 16-bit and 32-bit names with UTF8 replaced
152  option names define the same bit values.  by UTF16 or UTF32, respectively. This facility is in fact just cosmetic; the
153  .P  16-bit and 32-bit option names define the same bit values.
 The 32-bit functions operate in the same way as their 8-bit counterparts; they  
 just use different data types for their arguments and results, and their names  
 start with \fBpcre32_\fP instead of \fBpcre_\fP. For every option that has UTF8  
 in its name (for example, PCRE_UTF8), there is a corresponding 32-bit name with  
 UTF8 replaced by UTF32. This facility is in fact just cosmetic; the 32-bit  
 option names define the same bit values.  
154  .P  .P
155  References to bytes and UTF-8 in this document should be read as references to  References to bytes and UTF-8 in this document should be read as references to
156  16-bit data quantities and UTF-16 when using the 16-bit library, unless  16-bit data quantities and UTF-16 when using the 16-bit library, or 32-bit data
157  specified otherwise. More details of the specific differences for the 16-bit  quantities and UTF-32 when using the 32-bit library, unless specified
158  library are given in the  otherwise. More details of the specific differences for the 16-bit and 32-bit
159    libraries are given in the
160  .\" HREF  .\" HREF
161  \fBpcre16\fP  \fBpcre16\fP
162  .\"  .\"
163  page.  and
 .  
 .P  
 References to bytes and UTF-8 in this document should be read as references to  
 32-bit data quantities and UTF-32 when using the 32-bit library, unless  
 specified otherwise. More details of the specific differences for the 32-bit  
 library are given in the  
164  .\" HREF  .\" HREF
165  \fBpcre32\fP  \fBpcre32\fP
166  .\"  .\"
167  page.  pages.
168  .  .
169  .  .
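Because "bytes" means 16-bit or 32-bit data units in the other libraries, the same Unicode subject has a different length argument in each library. The sketch below counts code units under that rule. ex_code_units() and ex_subject_length() are hypothetical helpers, not part of PCRE, and only the UTF modes are covered (in a non-UTF mode every value is one data unit by definition).

```c
#include <assert.h>
#include <stdint.h>

/* Code units ("data units") needed for one Unicode scalar value in UTF-8
   (width 8), UTF-16 (width 16) or UTF-32 (width 32). Returns 0 for
   surrogates and values above U+10FFFF, which no UTF can encode. */
int ex_code_units(uint32_t cp, int width)
{
    assert(width == 8 || width == 16 || width == 32);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    switch (width) {
    case 8:
        if (cp < 0x80)
            return 1;
        if (cp < 0x800)
            return 2;
        return cp < 0x10000 ? 3 : 4;
    case 16:
        return cp < 0x10000 ? 1 : 2;  /* surrogate pair above the BMP */
    default:
        return 1;                     /* UTF-32: one unit per character */
    }
}

/* Length argument, in code units of the chosen library, for a subject made
   of n code points; -1 if any of them cannot be encoded. */
int ex_subject_length(const uint32_t *cps, int n, int width)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        int units = ex_code_units(cps[i], width);
        if (units == 0)
            return -1;
        total += units;
    }
    return total;
}
```

For example, "ca" followed by U+1F600 is 6 units for the 8-bit library, 4 for the 16-bit library, and 3 for the 32-bit library.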
170  .SH "PCRE API OVERVIEW"  .SH "PCRE API OVERVIEW"
# Line 231  used if available, by setting an option Line 220  used if available, by setting an option
220  relevant. More complicated programs might need to make use of the functions  relevant. More complicated programs might need to make use of the functions
221  \fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and  \fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and
222  \fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage.  \fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage.
223  These functions are discussed in the  .P
224    From release 8.32 there is also a direct interface for JIT execution, which
225    gives improved performance. The JIT-specific functions are discussed in the
226  .\" HREF  .\" HREF
227  \fBpcrejit\fP  \fBpcrejit\fP
228  .\"  .\"
# Line 860  page. Line 851  page.
851  .sp  .sp
852    PCRE_NO_UTF8_CHECK    PCRE_NO_UTF8_CHECK
853  .sp  .sp
854  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
855  string is automatically checked. There is a discussion about the  automatically checked. There is a discussion about the
856  .\" HTML <a href="pcreunicode.html#utf8strings">  .\" HTML <a href="pcreunicode.html#utf8strings">
857  .\" </a>  .\" </a>
858  validity of UTF-8 strings  validity of UTF-8 strings
# Line 876  this check for performance reasons, you Line 867  this check for performance reasons, you
867  When it is set, the effect of passing an invalid UTF-8 string as a pattern is  When it is set, the effect of passing an invalid UTF-8 string as a pattern is
868  undefined. It may cause your program to crash. Note that this option can also  undefined. It may cause your program to crash. Note that this option can also
869  be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the  be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the
870  validity checking of subject strings.  validity checking of subject strings only. If the same string is being matched
871    many times, the option can be safely set for the second and subsequent
872    matchings to improve performance.
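The check that PCRE_NO_UTF8_CHECK suppresses walks the whole string. As an illustration, here is a minimal UTF-8 validator that follows the RFC 3629 rules: no overlong forms, no surrogates, and nothing above U+10FFFF. It is a sketch, not PCRE's own checker, which also reports a specific reason code for each failure. The name ex_utf8_invalid_offset() is hypothetical. Its cost is linear in the string length, which is why repeating it on every call to the same subject can be worth avoiding.

```c
#include <assert.h>
#include <stddef.h>

/* Returns -1 if str[0..len) is valid UTF-8, otherwise the byte offset of
   the start of the first invalid sequence. */
long ex_utf8_invalid_offset(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0;
    assert(str != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t n;              /* number of continuation bytes expected */
        unsigned long cp;      /* decoded code point */
        if (c < 0x80) {
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return (long)i;    /* stray continuation byte, C0/C1, or F5..FF */
        }
        if (len - i <= n)
            return (long)i;    /* sequence truncated by end of string */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return (long)i;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (long)i;    /* overlong, out of range, or surrogate */
        i += n + 1;
    }
    return -1;
}
```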
873  .  .
874  .  .
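A sketch of the "check once, then skip" approach for matching one subject many times follows. The matcher is passed in as a function pointer so that the loop can be tried without PCRE. In real code it would wrap pcre_exec(). The type, function, and struct names are hypothetical, and the two constants are assumed to equal the PCRE_NO_UTF8_CHECK and PCRE_ERROR_NOMATCH values in the PCRE 8.x pcre.h. Real code should include pcre.h instead. The start offsets are assumed to fall on character boundaries.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PCRE_NO_UTF8_CHECK  0x00002000  /* assumed pcre.h value */
#define EX_PCRE_ERROR_NOMATCH  (-1)        /* assumed pcre.h value */

/* Hypothetical stand-in for a pcre_exec() wrapper. */
typedef int (*ex_match_fn)(void *ctx, const char *subject, int length,
                           int start_offset, int options);

/* Match one subject at several offsets. The first call lets the library
   validate the UTF-8; once it has returned a match or "no match", the
   subject is known to be valid and later calls skip the check. Returns
   the number of matches, or the first other error code. */
int ex_match_offsets(ex_match_fn match, void *ctx, const char *subject,
                     int length, const int *offsets, size_t count,
                     int options)
{
    int matches = 0;
    assert(match != NULL);
    for (size_t i = 0; i < count; i++) {
        int rc = match(ctx, subject, length, offsets[i], options);
        if (rc >= 0)
            matches++;                    /* 0 also means a match */
        else if (rc != EX_PCRE_ERROR_NOMATCH)
            return rc;                    /* e.g. invalid UTF-8: stop */
        options |= EX_PCRE_NO_UTF8_CHECK;
    }
    return matches;
}

/* Stand-in matcher for trying the loop: records the options it receives
   and reports a match at even offsets only. */
struct ex_recorder { int seen[8]; int calls; };

int ex_recording_match(void *ctx, const char *subject, int length,
                       int start_offset, int options)
{
    struct ex_recorder *r = ctx;
    (void)subject;
    (void)length;
    if (r->calls < 8)
        r->seen[r->calls] = options;
    r->calls++;
    return start_offset % 2 == 0 ? 1 : EX_PCRE_ERROR_NOMATCH;
}
```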
875  .SH "COMPILATION ERROR CODES"  .SH "COMPILATION ERROR CODES"
# Line 1238  returned. For anchored patterns, -2 is r Line 1231  returned. For anchored patterns, -2 is r
1231  .P  .P
1232  Since for the 32-bit library using the non-UTF-32 mode, this function is unable  In the 32-bit library in non-UTF-32 mode, this value cannot represent the full
1233  to return the full 32-bit range of the character, this value is deprecated;  32-bit range of character values, so it is deprecated; instead, the
1234  instead the PCRE_INFO_FIRSTLITERALSET and PCRE_INFO_FIRSTLITERAL values should  PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER values should be
1235  be used.  used.
1236  .sp  .sp
1237    PCRE_INFO_FIRSTTABLE    PCRE_INFO_FIRSTTABLE
1238  .sp  .sp
# Line 1290  is -1. Line 1283  is -1.
1283  .P  .P
1284  Since for the 32-bit library using the non-UTF-32 mode, this function is unable  In the 32-bit library in non-UTF-32 mode, this value cannot represent the full
1285  to return the full 32-bit range of the character, this value is deprecated;  32-bit range of character values, so it is deprecated; instead, the
1286  instead the PCRE_INFO_LASTLITERAL2SET and PCRE_INFO_LASTLITERAL2 values should  PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values should be
1287  be used.  used.
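One way to see the range problem: the older int-valued results use -1 and -2 to mean "no fixed character", so a 32-bit character value above INT_MAX has no non-negative int representation that cannot be confused with those values. The replacement options split the answer into a flags int and a separate unsigned value. This is a sketch under the assumption of a 32-bit int. The names are hypothetical, and the struct only shows the shape of the two results, not a PCRE type.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Can character value c be reported through a plain int result whose
   negative values are reserved as "no fixed character" indicators? */
int ex_fits_legacy_int(uint32_t c)
{
    return c <= (uint32_t)INT_MAX;
}

/* Shape of the replacement: the flags say whether a value exists, and the
   value travels separately, so no character doubles as a sentinel. */
struct ex_char_info {
    int flags;       /* e.g. the PCRE_INFO_FIRSTCHARACTERFLAGS result */
    uint32_t value;  /* e.g. the PCRE_INFO_FIRSTCHARACTER result */
};
```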
1288  .sp  .sp
1289    PCRE_INFO_MAXLOOKBEHIND    PCRE_INFO_MAXLOOKBEHIND
# Line 1436  is made available via this option so tha Line 1429  is made available via this option so tha
1429  .\"  .\"
1430  documentation for details).  documentation for details).
1431  .sp  .sp
1432    PCRE_INFO_FIRSTLITERALSET    PCRE_INFO_FIRSTCHARACTERFLAGS
1433  .sp  .sp
1434  Return information about the first data unit of any matched string, for a  Return information about the first data unit of any matched string, for a
1435  non-anchored pattern. The fourth argument should point to an \fBint\fP  non-anchored pattern. The fourth argument should point to an \fBint\fP
# Line 1444  variable. Line 1437  variable.
1437  .P  .P
1438  If there is a fixed first value, for example, the letter "c" from a pattern  If there is a fixed first value, for example, the letter "c" from a pattern
1439  such as (cat|cow|coyote), 1 is returned, and the character value can be  such as (cat|cow|coyote), 1 is returned, and the character value can be
1440  retrieved using PCRE_INFO_FIRSTLITERAL.  retrieved using PCRE_INFO_FIRSTCHARACTER.
1441  .P  .P
1442  If there is no fixed first value, and if either  If there is no fixed first value, and if either
1443  .sp  .sp
# Line 1458  starts with "^", or Line 1451  starts with "^", or
1451  subject string or after any newline within the string. Otherwise 0 is  subject string or after any newline within the string. Otherwise 0 is
1452  returned. For anchored patterns, 0 is returned.  returned. For anchored patterns, 0 is returned.
1453  .sp  .sp
1454    PCRE_INFO_FIRSTLITERAL    PCRE_INFO_FIRSTCHARACTER
1455  .sp  .sp
1456  Return the fixed first character value, if PCRE_INFO_FIRSTLITERALSET returned 1;  Return the fixed first character value if PCRE_INFO_FIRSTCHARACTERFLAGS
1457  otherwise returns 0. The fourth argument should point to an \fBuint_t\fP  returned 1; otherwise return 0. The fourth argument should point to a
1458  variable.  \fBuint32_t\fP variable.
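The sketch below shows how a caller might combine the PCRE_INFO_FIRSTCHARACTERFLAGS result (0, 1, or 2, as described above) with the PCRE_INFO_FIRSTCHARACTER value to rule out start positions before running a match. Both values are assumed to have been obtained already from pcre_fullinfo(). It also assumes the 8-bit library, a case-sensitive pattern, and a linefeed newline convention, and it covers only positions inside the subject. The enum and function names are hypothetical. A "yes" answer still needs the real matcher.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    EX_START_UNKNOWN,     /* flags 0: nothing known (anchored patterns too) */
    EX_START_FIXED_CHAR,  /* flags 1: every match begins with first_char */
    EX_START_LINE_START   /* flags 2: only at subject start or after newline */
} ex_start_kind;

ex_start_kind ex_classify_start(int flags)
{
    switch (flags) {
    case 1:  return EX_START_FIXED_CHAR;
    case 2:  return EX_START_LINE_START;
    default: return EX_START_UNKNOWN;
    }
}

/* Could a match begin at subject[offset]? Requires 0 <= offset < length. */
int ex_can_start_at(const unsigned char *subject, int length, int offset,
                    int flags, uint32_t first_char)
{
    assert(offset >= 0 && offset < length);
    switch (ex_classify_start(flags)) {
    case EX_START_FIXED_CHAR:
        return subject[offset] == first_char;
    case EX_START_LINE_START:
        return offset == 0 || subject[offset - 1] == '\n';
    default:
        return 1;  /* no information: let the matcher decide */
    }
}
```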
1459  .P  .P
1460  In the 8-bit library, the value is always less than 256. In the 16-bit library  In the 8-bit library, the value is always less than 256. In the 16-bit library
1461  the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value  the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
# Line 1480  starts with "^", or Line 1473  starts with "^", or
1473  subject string or after any newline within the string. Otherwise -2 is  subject string or after any newline within the string. Otherwise -2 is
1474  returned. For anchored patterns, -2 is returned.  returned. For anchored patterns, -2 is returned.
1475  .sp  .sp
1476    PCRE_INFO_LASTLITERAL2SET    PCRE_INFO_REQUIREDCHARFLAGS
1477  .sp  .sp
1478  Returns 1 if there is a rightmost literal data unit that must exist in any matched  Returns 1 if there is a rightmost literal data unit that must exist in any
1479  string, other than at its start. The fourth argument should  point to an \fBint\fP  matched string, other than at its start. The fourth argument should point to
1480  variable. If there is no such value, 0 is returned. If returning 1, the character  an \fBint\fP variable. If there is no such value, 0 is returned. If returning
1481  value itself can be retrieved using PCRE_INFO_LASTLITERAL2.  1, the character value itself can be retrieved using PCRE_INFO_REQUIREDCHAR.
1482  .P  .P
1483  For anchored patterns, a last literal value is recorded only if it follows something  For anchored patterns, a last literal value is recorded only if it follows
1484  of variable length. For example, for the pattern /^a\ed+z\ed+/ the returned value  something of variable length. For example, for the pattern /^a\ed+z\ed+/ the
1485  1 (with "z" returned from PCRE_INFO_LASTLITERAL2), but for /^a\edz\ed/ the returned  returned value is 1 (with "z" returned from PCRE_INFO_REQUIREDCHAR), but for
1486  value is 0.  /^a\edz\ed/ the returned value is 0.
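The PCRE_INFO_REQUIREDCHARFLAGS result also allows a quick rejection. If it returns 1 and the subject does not contain the required data unit anywhere, no match is possible. For /^a\d+z\d+/ the required unit is "z", so a subject such as "a123" can be rejected without calling the matcher. This is a sketch assuming the 8-bit library and a case-sensitive pattern. ex_might_match() is a hypothetical name, and req_char is assumed to hold the PCRE_INFO_REQUIREDCHAR value.

```c
#include <assert.h>
#include <string.h>

/* Returns 0 only when a match is impossible because the required data
   unit is absent; 1 means "run the matcher". */
int ex_might_match(const char *subject, size_t length, int req_flags,
                   unsigned int req_char)
{
    assert(subject != NULL || length == 0);
    if (req_flags != 1 || req_char > 0xFF)
        return 1;          /* nothing usable known for 8-bit data */
    if (length == 0)
        return 0;          /* the required unit cannot be present */
    return memchr(subject, (int)req_char, length) != NULL;
}
```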
1487  .sp  .sp
1488    PCRE_INFO_LASTLITERAL2    PCRE_INFO_REQUIREDCHAR
1489  .sp  .sp
1490  Return the value of the rightmost literal data unit that must exist in any  Return the value of the rightmost literal data unit that must exist in any
1491  matched string, other than at its start, if such a value has been recorded. The  matched string, other than at its start, if such a value has been recorded. The
# Line 2241  This error is given if a pattern that wa Line 2234  This error is given if a pattern that wa
2234  host with different endianness. The utility function  host with different endianness. The utility function
2235  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern
2236  so that it runs on the new host.  so that it runs on the new host.
2237    .sp
2238      PCRE_ERROR_BADLENGTH      (-32)
2239    .sp
2240    This error is given if \fBpcre_exec()\fP is called with a negative value for
2241    the \fIlength\fP argument.
2242  .P  .P
2243  Error numbers -16 to -20, -22, and -30 are not used by \fBpcre_exec()\fP.  Error numbers -16 to -20, -22, -30, and -31 are not used by \fBpcre_exec()\fP.
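As a sketch, a caller that maps return codes could treat the numbers listed above (-16 to -20, -22, -30, and -31) as impossible from pcre_exec(). It could also catch a negative length itself before making the call. The -32 value is copied from the PCRE_ERROR_BADLENGTH entry above. Real code should use the constants from pcre.h. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_PCRE_ERROR_BADLENGTH (-32)  /* value from the list above */

/* True for error numbers that pcre_exec() does not return. */
bool ex_unused_by_pcre_exec(int rc)
{
    return (rc <= -16 && rc >= -20) || rc == -22 || rc == -30 || rc == -31;
}

/* Pre-call guard: report the same code pcre_exec() gives for a negative
   length, e.g. when the length was computed by a subtraction. */
int ex_check_length(int length)
{
    return length < 0 ? EX_PCRE_ERROR_BADLENGTH : 0;
}
```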
2244  .  .
2245  .  .
2246  .\" HTML <a name="badutf8reasons"></a>  .\" HTML <a name="badutf8reasons"></a>
# Line 2801  Cambridge CB2 3QH, England. Line 2799  Cambridge CB2 3QH, England.
2799  .rs  .rs
2800  .sp  .sp
2801  .nf  .nf
2802  Last updated: 07 September 2012  Last updated: 29 October 2012
2803  Copyright (c) 1997-2012 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
2804  .fi  .fi

