Diff of /code/trunk/doc/pcreapi.3

revision 1055 by chpe, Tue Oct 16 15:53:30 2012 UTC revision 1113 by chpe, Tue Oct 16 15:57:12 2012 UTC
# Line 139  PCRE - Perl-compatible regular expressio Line 139  PCRE - Perl-compatible regular expressio
139  .sp  .sp
140  From release 8.30, PCRE can be compiled as a library for handling 16-bit  From release 8.30, PCRE can be compiled as a library for handling 16-bit
141  character strings as well as, or instead of, the original library that handles  character strings as well as, or instead of, the original library that handles
142  8-bit character strings. From release 8.FIXME, PCRE can also be compiled as a  8-bit character strings. From release 8.32, PCRE can also be compiled as a
143  library for handling 32-bit character strings. To avoid too much complication,  library for handling 32-bit character strings. To avoid too much complication,
144  this document describes the 8-bit versions of the functions, with only  this document describes the 8-bit versions of the functions, with only
145  occasional references to the 16-bit and 32-bit libraries.  occasional references to the 16-bit and 32-bit libraries.
# Line 1235  starts with "^", or Line 1235  starts with "^", or
1235  -1 is returned, indicating that the pattern matches only at the start of a  -1 is returned, indicating that the pattern matches only at the start of a
1236  subject string or after any newline within the string. Otherwise -2 is  subject string or after any newline within the string. Otherwise -2 is
1237  returned. For anchored patterns, -2 is returned.  returned. For anchored patterns, -2 is returned.
1238    .P
1239    In the 32-bit library in non-UTF-32 mode, this value cannot represent the
1240    full 32-bit range of a character, so it is deprecated; use the
1241    PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER values
1242    instead.
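
The reason for the deprecation can be illustrated with a minimal sketch. It assumes a platform where \fBint\fP is 32 bits wide; the function name is hypothetical and is not part of PCRE. A result that reserves negative values such as -1 and -2 as status codes has no room left for data units above INT_MAX:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical check, not a PCRE function: returns 1 if a data unit can be
 * reported through an int result that reserves negative values (-1, -2) as
 * status codes, and 0 if it cannot. */
int fits_in_int_result(uint32_t unit)
{
    /* Compare in a wide unsigned type so the test is independent of how
     * an out-of-range value would be converted to int. */
    return (unsigned long long)unit <= (unsigned long long)INT_MAX;
}
```

With a 32-bit \fBint\fP, a UTF-32 code point such as 0x10ffff fits, but a non-UTF-32 data unit such as 0xffffffff does not, which is why the flag and the value are reported separately.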
1243  .sp  .sp
1244    PCRE_INFO_FIRSTTABLE    PCRE_INFO_FIRSTTABLE
1245  .sp  .sp
# Line 1282  value, -1 is returned. For anchored patt Line 1287  value, -1 is returned. For anchored patt
1287  only if it follows something of variable length. For example, for the pattern  only if it follows something of variable length. For example, for the pattern
1288  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value  /^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value
1289  is -1.  is -1.
1290    .P
1291    In the 32-bit library in non-UTF-32 mode, this value cannot represent the
1292    full 32-bit range of a character, so it is deprecated; use the
1293    PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values
1294    instead.
1295  .sp  .sp
1296    PCRE_INFO_MAXLOOKBEHIND    PCRE_INFO_MAXLOOKBEHIND
1297  .sp  .sp
# Line 1425  is made available via this option so tha Line 1435  is made available via this option so tha
1435  \fBpcreprecompile\fP  \fBpcreprecompile\fP
1436  .\"  .\"
1437  documentation for details).  documentation for details).
1438    .sp
1439      PCRE_INFO_FIRSTCHARACTERFLAGS
1440    .sp
1441    Return information about the first data unit of any matched string, for a
1442    non-anchored pattern. The fourth argument should point to an \fBint\fP
1443    variable.
1444    .P
1445    If there is a fixed first value, for example, the letter "c" from a pattern
1446    such as (cat|cow|coyote), 1 is returned, and the character value can be
1447    retrieved using PCRE_INFO_FIRSTCHARACTER.
1448    .P
1449    If there is no fixed first value, and if either
1450    .sp
1451    (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
1452    starts with "^", or
1453    .sp
1454    (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
1455    (if it were set, the pattern would be anchored),
1456    .sp
1457    2 is returned, indicating that the pattern matches only at the start of a
1458    subject string or after any newline within the string. Otherwise 0 is
1459    returned. For anchored patterns, 0 is returned.
1460    .sp
1461      PCRE_INFO_FIRSTCHARACTER
1462    .sp
1463    Return the fixed first character value if PCRE_INFO_FIRSTCHARACTERFLAGS
1464    returned 1; otherwise return 0. The fourth argument should point to an
1465    \fBuint32_t\fP variable.
1466    .P
1467    In the 8-bit library, the value is always less than 256. In the 16-bit library
1468    the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value
1469    can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode.
1470    .P
1471    If there is no fixed first value, and if either
1472    .sp
1473    (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch
1474    starts with "^", or
1475    .sp
1476    (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set
1477    (if it were set, the pattern would be anchored),
1478    .sp
1479    0 is returned; in this case PCRE_INFO_FIRSTCHARACTERFLAGS returns 2, indicating
1480    that the pattern matches only at the start of a subject string or after any
1481    newline within the string. In all other cases, including anchored patterns, 0 is also returned.
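
As a rough illustration of how the flag and the value fit together, the following sketch formats the pair of results. The helper name and its output strings are hypothetical and are not part of PCRE; only the flag values 0, 1 and 2 come from the PCRE_INFO_FIRSTCHARACTERFLAGS description.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of PCRE. In a real program the two inputs
 * would come from calls such as
 *   pcre_fullinfo(re, NULL, PCRE_INFO_FIRSTCHARACTERFLAGS, &flags);
 *   pcre_fullinfo(re, NULL, PCRE_INFO_FIRSTCHARACTER, &first);
 * Writes a short description into buf and returns buf. */
const char *describe_first(int flags, uint32_t first, char *buf, size_t len)
{
    switch (flags) {
    case 1:   /* fixed first data unit; the value is meaningful */
        snprintf(buf, len, "first unit 0x%lx", (unsigned long)first);
        break;
    case 2:   /* matches only at subject start or after a newline */
        snprintf(buf, len, "start of line");
        break;
    default:  /* 0: no information, or the pattern is anchored */
        snprintf(buf, len, "none");
        break;
    }
    return buf;
}
```

The value is consulted only when the flag is 1, so a data unit whose value is 0 cannot be confused with "no fixed first value".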
1482    .sp
1483      PCRE_INFO_REQUIREDCHARFLAGS
1484    .sp
1485    Returns 1 if there is a rightmost literal data unit that must exist in any matched
1486    string, other than at its start. The fourth argument should point to an \fBint\fP
1487    variable. If there is no such value, 0 is returned. If returning 1, the character
1488    value itself can be retrieved using PCRE_INFO_REQUIREDCHAR.
1489    .P
1490    For anchored patterns, a last literal value is recorded only if it follows something
1491    of variable length. For example, for the pattern /^a\ed+z\ed+/ the returned value
1492    is 1 (with "z" returned from PCRE_INFO_REQUIREDCHAR), but for /^a\edz\ed/ the returned
1493    value is 0.
1494    .sp
1495      PCRE_INFO_REQUIREDCHAR
1496    .sp
1497    Return the value of the rightmost literal data unit that must exist in any
1498    matched string, other than at its start, if such a value has been recorded. The
1499    fourth argument should point to an \fBuint32_t\fP variable. If there is no such
1500    value, 0 is returned.
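
One way an application might use the required data unit is as a cheap pre-check before attempting a match. This is a sketch under stated assumptions: the function is hypothetical, the subject is held as an array of 32-bit data units, and the pattern is assumed to be compiled without PCRE_CASELESS, since the returned value alone does not say whether the unit is to be compared caselessly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefilter, not part of PCRE. req_flags and req_unit stand
 * for the results of PCRE_INFO_REQUIREDCHARFLAGS and
 * PCRE_INFO_REQUIREDCHAR. Returns 0 only when a match is impossible
 * because the required unit is absent from the subject; returns 1 when a
 * match attempt is still worthwhile. */
int may_match(const uint32_t *subject, size_t len,
              int req_flags, uint32_t req_unit)
{
    size_t i;

    if (req_flags != 1)
        return 1;               /* nothing recorded: cannot rule anything out */

    for (i = 0; i < len; i++) {
        if (subject[i] == req_unit)
            return 1;           /* required unit present somewhere */
    }
    return 0;                   /* required unit missing: no match possible */
}
```

For the pattern /^a\ed+z\ed+/ the required unit is "z", so a subject without any "z" could be rejected without running the matcher.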
1501  .  .
1502  .  .
1503  .SH "REFERENCE COUNTS"  .SH "REFERENCE COUNTS"
# Line 2255  character. Line 2328  character.
2328  .sp  .sp
2329  The first byte of a character has the value 0xfe or 0xff. These values can  The first byte of a character has the value 0xfe or 0xff. These values can
2330  never occur in a valid UTF-8 string.  never occur in a valid UTF-8 string.
2331    .sp
2332      PCRE_UTF8_ERR22
2333    .sp
2334    Non-character. These are the last two characters in each plane (0xfffe, 0xffff,
2335    0x1fffe, 0x1ffff .. 0x10fffe, 0x10ffff), and the characters 0xfdd0..0xfdef.
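
The non-character ranges listed above can be tested with a few comparisons. The following is a minimal sketch; the function name is hypothetical and this is not the code PCRE itself uses.

```c
#include <stdint.h>

/* Hypothetical helper, not part of PCRE: returns 1 if cp is a Unicode
 * non-character as described above, otherwise 0. */
int is_noncharacter(uint32_t cp)
{
    if (cp > 0x10ffff)
        return 0;                       /* beyond Unicode; a different error */
    if (cp >= 0xfdd0 && cp <= 0xfdef)
        return 1;                       /* the block 0xfdd0..0xfdef */
    return (cp & 0xfffe) == 0xfffe;     /* last two code points of each plane */
}
```

The mask works because the last two code points of every plane end in 0xfffe or 0xffff, so clearing the lowest bit of the low 16 bits leaves 0xfffe in both cases.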
2336  .  .
2337  .  .
2338  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"  .SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER"
