Diff of /code/trunk/doc/pcreapi.3

revision 1055 by chpe, Tue Oct 16 15:53:30 2012 UTC revision 1191 by ph10, Tue Oct 30 16:50:57 2012 UTC
# Line 1  Line 1 
1  .TH PCREAPI 3 "07 September 2012" "PCRE 8.32"  .TH PCREAPI 3 "29 October 2012" "PCRE 8.32"
2  .SH NAME  .SH NAME
3  PCRE - Perl-compatible regular expressions  PCRE - Perl-compatible regular expressions
4  .sp  .sp
# Line 139  PCRE - Perl-compatible regular expressio Line 139  PCRE - Perl-compatible regular expressio
139  .sp  .sp
140  From release 8.30, PCRE can be compiled as a library for handling 16-bit  From release 8.30, PCRE can be compiled as a library for handling 16-bit
141  character strings as well as, or instead of, the original library that handles  character strings as well as, or instead of, the original library that handles
142  8-bit character strings. From release 8.FIXME, PCRE can also be compiled as a  8-bit character strings. From release 8.32, PCRE can also be compiled as a
143  library for handling 32-bit character strings. To avoid too much complication,  library for handling 32-bit character strings. To avoid too much complication,
144  this document describes the 8-bit versions of the functions, with only  this document describes the 8-bit versions of the functions, with only
145  occasional references to the 16-bit and 32-bit libraries.  occasional references to the 16-bit and 32-bit libraries.
146  .P  .P
147  The 16-bit functions operate in the same way as their 8-bit counterparts; they  The 16-bit and 32-bit functions operate in the same way as their 8-bit
148  just use different data types for their arguments and results, and their names  counterparts; they just use different data types for their arguments and
149  start with \fBpcre16_\fP instead of \fBpcre_\fP. For every option that has UTF8  results, and their names start with \fBpcre16_\fP or \fBpcre32_\fP instead of
150  in its name (for example, PCRE_UTF8), there is a corresponding 16-bit name with  \fBpcre_\fP. For every option that has UTF8 in its name (for example,
151  UTF8 replaced by UTF16. This facility is in fact just cosmetic; the 16-bit  PCRE_UTF8), there are corresponding 16-bit and 32-bit names with UTF8 replaced
152  option names define the same bit values.  by UTF16 or UTF32, respectively. This facility is in fact just cosmetic; the
153  .P  16-bit and 32-bit option names define the same bit values.
 The 32-bit functions operate in the same way as their 8-bit counterparts; they  
 just use different data types for their arguments and results, and their names  
 start with \fBpcre32_\fP instead of \fBpcre_\fP. For every option that has UTF8  
 in its name (for example, PCRE_UTF8), there is a corresponding 32-bit name with  
 UTF8 replaced by UTF32. This facility is in fact just cosmetic; the 32-bit  
 option names define the same bit values.  
154  .P  .P
155  References to bytes and UTF-8 in this document should be read as references to  References to bytes and UTF-8 in this document should be read as references to
156  16-bit data quantities and UTF-16 when using the 16-bit library, unless  16-bit data quantities and UTF-16 when using the 16-bit library, or 32-bit data
157  specified otherwise. More details of the specific differences for the 16-bit  quantities and UTF-32 when using the 32-bit library, unless specified
158  library are given in the  otherwise. More details of the specific differences for the 16-bit and 32-bit
159    libraries are given in the
160  .\" HREF  .\" HREF
161  \fBpcre16\fP  \fBpcre16\fP
162  .\"  .\"
163  page.  and
 .  
 .P  
 References to bytes and UTF-8 in this document should be read as references to  
 32-bit data quantities and UTF-32 when using the 32-bit library, unless  
 specified otherwise. More details of the specific differences for the 32-bit  
 library are given in the  
164  .\" HREF  .\" HREF
165  \fBpcre32\fP  \fBpcre32\fP
166  .\"  .\"
167  page.  pages.
168  .  .
169  .  .
170  .SH "PCRE API OVERVIEW"  .SH "PCRE API OVERVIEW"
# Line 231  used if available, by setting an option Line 220  used if available, by setting an option
220  relevant. More complicated programs might need to make use of the functions  relevant. More complicated programs might need to make use of the functions
221  \fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and  \fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and
222  \fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage.  \fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage.
223  These functions are discussed in the  .P
224    From release 8.32 there is also a direct interface for JIT execution, which
225    gives improved performance. The JIT-specific functions are discussed in the
226  .\" HREF  .\" HREF
227  \fBpcrejit\fP  \fBpcrejit\fP
228  .\"  .\"
# Line 860  page. Line 851  page.
851  .sp  .sp
852    PCRE_NO_UTF8_CHECK    PCRE_NO_UTF8_CHECK
853  .sp  .sp
854  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8  When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
855  string is automatically checked. There is a discussion about the  automatically checked. There is a discussion about the
856  .\" HTML <a href="pcreunicode.html#utf8strings">  .\" HTML <a href="pcreunicode.html#utf8strings">
857  .\" </a>  .\" </a>
858  validity of UTF-8 strings  validity of UTF-8 strings
# Line 876  this check for performance reasons, you Line 867  this check for performance reasons, you
867  When it is set, the effect of passing an invalid UTF-8 string as a pattern is  When it is set, the effect of passing an invalid UTF-8 string as a pattern is
868  undefined. It may cause your program to crash. Note that this option can also  undefined. It may cause your program to crash. Note that this option can also
869  be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the  be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress the
870  validity checking of subject strings.  validity checking of subject strings only. If the same string is being matched
871    many times, the option can be safely set for the second and subsequent
872    matchings to improve performance.
873  .  .
874  .  .
875  .SH "COMPILATION ERROR CODES"  .SH "COMPILATION ERROR CODES"
# Line 1235  starts with "^", or Line 1228  starts with "^", or
1228  -1 is returned, indicating that the pattern matches only at the start of a  -1 is returned, indicating that the pattern matches only at the start of a
1229  subject string or after any newline within the string. Otherwise -2 is  subject string or after any newline within the string. Otherwise -2 is
1230  returned. For anchored patterns, -2 is returned.  returned. For anchored patterns, -2 is returned.
1231    .P
1232    In the 32-bit library in non-UTF-32 mode, this function cannot return the
1233    full 32-bit range of a character value, so this value is deprecated; use
1234    the PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER values
1235    instead.
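
One way to read the deprecation note above: the older value is reported through a signed int, so a data unit above INT_MAX (for example 0xffffffff in the 32-bit library without UTF-32) has no non-negative representation. The following is a minimal sketch in plain C under that reading. Both helpers are hypothetical illustrations, not PCRE functions, and the choice of -2 as the fallback is only borrowed from the "otherwise" status described above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not part of PCRE): can this data unit be reported
   as a non-negative value through a signed int? */
static int fits_in_legacy_int(uint32_t data_unit)
{
    return data_unit <= (uint32_t)INT_MAX;
}

/* Hypothetical legacy-style report: the value itself when it fits,
   otherwise -2 as a "no usable value" status. */
static int legacy_report(uint32_t data_unit)
{
    if (!fits_in_legacy_int(data_unit))
        return -2;
    int v = (int)data_unit;   /* safe: checked to be <= INT_MAX above */
    assert(v >= 0);
    return v;
}
```

Splitting the result into a flags value and a separate unsigned character value, as the newer options do, avoids mixing status codes and character values in one signed return.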
1236  .sp  .sp
1237    PCRE_INFO_FIRSTTABLE    PCRE_INFO_FIRSTTABLE
1238  .sp  .sp
# Line 1282  value, -1 is returned. For anchored patt Line 1280  value, -1 is returned. For anchored patt
1280  only if it follows something of variable length. For example, for the pattern  only if it follows something of variable length. For example, for the pattern
1281  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value
1282  is -1.  is -1.
1283    .P
1284    In the 32-bit library in non-UTF-32 mode, this function cannot return the
1285    full 32-bit range of a character value, so this value is deprecated; use
1286    the PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values
1287    instead.
1288  .sp  .sp
1289    PCRE_INFO_MAXLOOKBEHIND    PCRE_INFO_MAXLOOKBEHIND
1290  .sp  .sp
# Line 1425  is made available via this option so tha Line 1428  is made available via this option so tha
1428  \fBpcreprecompile\fP  \fBpcreprecompile\fP
1429  .\"  .\"
1430  documentation for details).  documentation for details).
1431    .sp
1432      PCRE_INFO_FIRSTCHARACTERFLAGS
1433    .sp
1434    Return information about the first data unit of any matched string, for a
1435    non-anchored pattern. The fourth argument should point to an \fBint\fP
1436    variable.
1437    .P
1438    If there is a fixed first value, for example, the letter "c" from a pattern
1439    such as (cat|cow|coyote), 1 is returned, and the character value can be
1440    retrieved using PCRE_INFO_FIRSTCHARACTER.
1441    .P
1442    If there is no fixed first value, and if either
1443    .sp
1444    (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
1445    starts with "^", or
1446    .sp
1447    (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
1448    (if it were set, the pattern would be anchored),
1449    .sp
1450    2 is returned, indicating that the pattern matches only at the start of a
1451    subject string or after any newline within the string. Otherwise 0 is
1452    returned. For anchored patterns, 0 is returned.
1453    .sp
1454      PCRE_INFO_FIRSTCHARACTER
1455    .sp
1456    Return the fixed first character value, if PCRE_INFO_FIRSTCHARACTERFLAGS
1457    returned 1; otherwise return 0. The fourth argument should point to a
1458    \fBuint32_t\fP variable.
1459    .P
1460    In the 8-bit library, the value is always less than 256. In the 16-bit library
1461    the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
1462    can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
1463    .P
1464    If there is no fixed first value, and if either
1465    .sp
1466    (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
1467    starts with "^", or
1468    .sp
1469    (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
1470    (if it were set, the pattern would be anchored),
1471    .sp
1472    -1 is returned, indicating that the pattern matches only at the start of a
1473    subject string or after any newline within the string. Otherwise -2 is
1474    returned. For anchored patterns, -2 is returned.
1475    .sp
1476      PCRE_INFO_REQUIREDCHARFLAGS
1477    .sp
1478    Returns 1 if there is a rightmost literal data unit that must exist in any
1479    matched string, other than at its start. The fourth argument should point to
1480    an \fBint\fP variable. If there is no such value, 0 is returned. If returning
1481    1, the character value itself can be retrieved using PCRE_INFO_REQUIREDCHAR.
1482    .P
1483    For anchored patterns, a last literal value is recorded only if it follows
1484    something of variable length. For example, for the pattern /^a\ed+z\ed+/ the
1485    returned value is 1 (with "z" returned from PCRE_INFO_REQUIREDCHAR), but for
1486    /^a\edz\ed/ the returned value is 0.
1487    .sp
1488      PCRE_INFO_REQUIREDCHAR
1489    .sp
1490    Return the value of the rightmost literal data unit that must exist in any
1491    matched string, other than at its start, if such a value has been recorded. The
1492    fourth argument should point to a \fBuint32_t\fP variable. If there is no such
1493    value, 0 is returned.
1494  .  .
1495  .  .
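
The return values described for PCRE_INFO_FIRSTCHARACTERFLAGS and the value ranges given for PCRE_INFO_FIRSTCHARACTER can be summarised in a small sketch. It is plain C with no PCRE calls. The function names and the description strings are assumptions made for illustration; only the numeric values come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: human-readable meaning of a
   PCRE_INFO_FIRSTCHARACTERFLAGS result, following the text above. */
static const char *describe_first_character_flags(int flags)
{
    switch (flags) {
    case 1:  return "fixed first value; fetch it with PCRE_INFO_FIRSTCHARACTER";
    case 2:  return "matches only at subject start or after a newline";
    case 0:  return "no first-character information (includes anchored patterns)";
    default: return "unexpected value";
    }
}

/* Hypothetical helper: largest first-character value for a library width
   (8, 16 or 32 bits) and UTF mode, as stated in the text above. */
static uint32_t max_first_character(int width_bits, int utf_mode)
{
    assert(width_bits == 8 || width_bits == 16 || width_bits == 32);
    if (width_bits == 8)
        return 0xffu;                            /* always less than 256 */
    if (width_bits == 16)
        return 0xffffu;
    return utf_mode ? 0x10ffffu : 0xffffffffu;   /* 32-bit library */
}
```

A caller would typically query the flags first and request the character value only when the flags result is 1.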
1496  .SH "REFERENCE COUNTS"  .SH "REFERENCE COUNTS"
# Line 2168  This error is given if a pattern that wa Line 2234  This error is given if a pattern that wa
2234  host with different endianness. The utility function  host with different endianness. The utility function
2235  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern  \fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern
2236  so that it runs on the new host.  so that it runs on the new host.
2237    .sp
2238      PCRE_ERROR_BADLENGTH      (-32)
2239    .sp
2240    This error is given if \fBpcre_exec()\fP is called with a negative value for
2241    the \fIlength\fP argument.
2242  .P  .P
2243  Error numbers -16 to -20, -22, and -30 are not used by \fBpcre_exec()\fP.  Error numbers -16 to -20, -22, -30, and -31 are not used by \fBpcre_exec()\fP.
2244  .  .
2245  .  .
2246  .\" HTML <a name="badutf8reasons"></a>  .\" HTML <a name="badutf8reasons"></a>
# Line 2255  character. Line 2326  character.
2326  .sp  .sp
2327  The first byte of a character has the value 0xfe or 0xff. These values can  The first byte of a character has the value 0xfe or 0xff. These values can
2328  never occur in a valid UTF-8 string.  never occur in a valid UTF-8 string.
2329    .sp
2330      PCRE_UTF8_ERR2
2331    .sp
2332    Non-character. These are the last two characters in each plane (0xfffe, 0xffff,
2333    0x1fffe, 0x1ffff .. 0x10fffe, 0x10ffff), and the characters 0xfdd0..0xfdef.
2334  .  .
2335  .  .
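
The non-character definition above (the last two code points of each plane, plus 0xfdd0..0xfdef) can be expressed as a short predicate. This is a sketch in plain C; `is_unicode_noncharacter()` is a hypothetical name and is not how PCRE itself is known to implement the check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if c is a Unicode non-character as
   described above, 0 otherwise. Only defined for code points up to 0x10ffff. */
static int is_unicode_noncharacter(uint32_t c)
{
    assert(c <= 0x10ffffu);

    /* The contiguous block 0xfdd0..0xfdef. */
    if (c >= 0xfdd0u && c <= 0xfdefu)
        return 1;

    /* Last two code points of every plane: low 16 bits are 0xfffe or 0xffff. */
    return (c & 0xfffeu) == 0xfffeu;
}
```

The mask test works because 0xfffe and 0xffff differ only in the lowest bit, so clearing that bit leaves 0xfffe for exactly those two values in each plane.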
2336  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"
# Line 2723  Cambridge CB2 3QH, England. Line 2799  Cambridge CB2 3QH, England.
2799  .rs  .rs
2800  .sp  .sp
2801  .nf  .nf
2802  Last updated: 07 September 2012  Last updated: 29 October 2012
2803  Copyright (c) 1997-2012 University of Cambridge.  Copyright (c) 1997-2012 University of Cambridge.
2804  .fi  .fi
