  57  \g is not followed by a braced, angle-bracketed, or quoted
        name/number or by a plain number
  58  a numbered reference must not be zero
  59  an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)
  60  (*VERB) not recognized
  61  number is too big
  62  subpattern name expected
  63  digit expected after (?+
  64  ] is an invalid data character in JavaScript compatibility mode
  65  different names for subpatterns of the same number are not
        allowed
  66  (*MARK) must have an argument

The numbers 32 and 10000 in errors 48 and 49 are defaults; different
values may be used if the limits were changed when PCRE was built.

pcre_extra *pcre_study(const pcre *code, int options,
     const char **errptr);

If a compiled pattern is going to be used several times, it is worth
spending more time analyzing it in order to reduce the time taken for
matching. The function pcre_study() takes a pointer to a compiled
pattern as its first argument. If studying the pattern produces
additional information that will help speed up matching, pcre_study()
returns a pointer to a pcre_extra block, in which the study_data field
points to the results of the study.

The returned value from pcre_study() can be passed directly to
pcre_exec() or pcre_dfa_exec(). However, a pcre_extra block also
contains other fields that can be set by the caller before the block is
passed; these are described below in the section on matching a pattern.

If studying the pattern does not produce any useful information,
pcre_study() returns NULL. In that circumstance, if the calling program
wants to pass any of the other fields to pcre_exec() or
pcre_dfa_exec(), it must set up its own pcre_extra block.

The second argument of pcre_study() contains option bits. At present,
no options are defined, and this argument should always be zero.

The third argument for pcre_study() is a pointer for an error message.
If studying succeeds (even if no data is returned), the variable it
points to is set to NULL. Otherwise it is set to point to a textual
error message. This is a static string that is part of the library. You
must not try to free it. You should test the error pointer for NULL
after calling pcre_study(), to be sure that it has run successfully.

This is a typical call to pcre_study():

Studying a pattern does two things. First, a lower bound for the length
of subject string that is needed to match the pattern is computed. This
does not mean that there are any strings of that length that match, but
it does guarantee that no shorter strings match. The value is used by
pcre_exec() and pcre_dfa_exec() to avoid wasting time by trying to
match strings that are shorter than the lower bound. You can find out
the value in a calling program via the pcre_fullinfo() function.

Second, for non-anchored patterns that do not have a single fixed
starting character, a bitmap of possible starting bytes is created.
This speeds up finding a position in the subject at which to start
matching.

LOCALE SUPPORT

PCRE handles caseless matching, and determines whether characters are
letters, digits, and so on, by reference to a set of tables, indexed by
character value. When running in UTF-8 mode, this applies only to
characters with codes less than 128. Higher-valued codes never match
escapes such as \w or \d, but can be tested with \p if PCRE is built
with Unicode character property support. The use of locales with
Unicode is discouraged. If you are handling characters with codes
greater than 127, you should either use UTF-8 and Unicode, or use
locales, but not try to mix the two.

PCRE contains an internal set of tables that are used when the final
argument of pcre_compile() is NULL. These are sufficient for many
applications. Normally, the internal tables recognize only ASCII
characters. However, when PCRE is built, it is possible to cause the
internal tables to be rebuilt in the default "C" locale of the local
system, which may cause them to be different.

The internal tables can always be overridden by tables supplied by the
application that calls PCRE. These may be created in a different locale
from the default. As more and more applications change to using
Unicode, the need for this locale support is expected to die away.

External tables are built by calling the pcre_maketables() function,
which has no arguments, in the relevant locale. The result can then be
passed to pcre_compile() or pcre_exec() as often as necessary. For
example, to build and use tables that are appropriate for the French
locale (where accented characters with values greater than 127 are
treated as letters), the following code could be used:

  setlocale(LC_CTYPE, "fr_FR");
  tables = pcre_maketables();
  re = pcre_compile(..., tables);

The locale name "fr_FR" is used on Linux and other Unix-like systems;
if you are using Windows, the name for the French locale is "french".

When pcre_maketables() runs, the tables are built in memory that is
obtained via pcre_malloc. It is the caller's responsibility to ensure
that the memory containing the tables remains available for as long as
it is needed.

The pointer that is passed to pcre_compile() is saved with the compiled
pattern, and the same tables are used via this pointer by pcre_study()
and normally also by pcre_exec(). Thus, by default, for any single
pattern, compilation, studying and matching all happen in the same
locale, but different patterns can be compiled in different locales.

It is possible to pass a table pointer or NULL (indicating the use of
the internal tables) to pcre_exec(). Although not intended for this
purpose, this facility could be used to match a pattern in a different
locale from the one in which it was compiled. Passing table pointers at
run time is discussed below in the section on matching a pattern.

int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
     int what, void *where);

The pcre_fullinfo() function returns information about a compiled
pattern. It replaces the obsolete pcre_info() function, which is
nevertheless retained for backwards compatibility (and is documented
below).

The first argument for pcre_fullinfo() is a pointer to the compiled
pattern. The second argument is the result of pcre_study(), or NULL if
the pattern was not studied. The third argument specifies which piece
of information is required, and the fourth argument is a pointer to a
variable to receive the data. The yield of the function is zero for
success, or one of the following negative numbers:

  PCRE_ERROR_NULL       the argument code was NULL
  PCRE_ERROR_BADMAGIC   the "magic number" was not found
  PCRE_ERROR_BADOPTION  the value of what was invalid

The "magic number" is placed at the start of each compiled pattern as
a simple check against passing an arbitrary memory pointer. Here is a
typical call of pcre_fullinfo(), to obtain the length of the compiled
pattern:

  int rc;
  size_t length;
  rc = pcre_fullinfo(re,  /* compiled pattern, from pcre_compile() */
    pe,                   /* result of pcre_study(), or NULL */
    PCRE_INFO_SIZE,       /* what is required */
    &length);             /* where to put the data */

The possible values for the third argument are defined in pcre.h, and
are as follows:

  PCRE_INFO_BACKREFMAX

Return the number of the highest back reference in the pattern. The
fourth argument should point to an int variable. Zero is returned if
there are no back references.

  PCRE_INFO_CAPTURECOUNT

Return the number of capturing subpatterns in the pattern. The fourth
argument should point to an int variable.

  PCRE_INFO_DEFAULT_TABLES

Return a pointer to the internal default character tables within PCRE.
The fourth argument should point to an unsigned char * variable. This
information call is provided for internal use by the pcre_study()
function. External callers can cause PCRE to use its internal tables by
passing a NULL table pointer.

  PCRE_INFO_FIRSTBYTE

Return information about the first byte of any matched string, for a
non-anchored pattern. The fourth argument should point to an int
variable. (This option used to be called PCRE_INFO_FIRSTCHAR; the old
name is still recognized for backwards compatibility.)

If there is a fixed first byte, for example, from a pattern such as
(cat|cow|coyote), its value is returned. Otherwise, if either

(a) the pattern was compiled with the PCRE_MULTILINE option, and every
branch starts with "^", or

(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
set (if it were set, the pattern would be anchored),

-1 is returned, indicating that the pattern matches only at the start
of a subject string or after any newline within the string. Otherwise
-2 is returned. For anchored patterns, -2 is returned.

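The three kinds of result described for PCRE_INFO_FIRSTBYTE can be told
apart with a small helper. The following is a minimal sketch and not
part of PCRE: the enum and the function name classify_first_byte() are
hypothetical, and the argument is assumed to be the int value that
pcre_fullinfo() stored for PCRE_INFO_FIRSTBYTE.

```c
/* Hypothetical labels for the three kinds of PCRE_INFO_FIRSTBYTE result. */
enum first_byte_kind {
    FIRST_BYTE_FIXED,       /* value >= 0: every match starts with this byte */
    FIRST_BYTE_LINE_START,  /* value == -1: matches only at start of subject
                               or after a newline */
    FIRST_BYTE_NONE         /* value == -2: nothing recorded, or anchored */
};

/* Classify the int value written by pcre_fullinfo() for
   PCRE_INFO_FIRSTBYTE. This helper is illustrative only. */
enum first_byte_kind classify_first_byte(int value)
{
    if (value >= 0)
        return FIRST_BYTE_FIXED;
    if (value == -1)
        return FIRST_BYTE_LINE_START;
    return FIRST_BYTE_NONE;
}
```

For a pattern such as (cat|cow|coyote), the value passed in would be
the code of "c", so the helper reports a fixed first byte.
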
  PCRE_INFO_FIRSTTABLE

If the pattern was studied, and this resulted in the construction of a
256-bit table indicating a fixed set of bytes for the first byte in any
matching string, a pointer to the table is returned. Otherwise NULL is
returned. The fourth argument should point to an unsigned char *
variable.

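A 256-bit table occupies 32 bytes, one bit per possible byte value. As
a sketch of how a caller might consult such a table, the code below
assumes that the bit for byte value c is bit (c & 7) of table byte
(c / 8). That layout is an assumption made for this example and should
be checked against the PCRE release in use. The function names are
hypothetical.

```c
/* Return nonzero if byte c is marked as a possible first byte.
   Assumed layout: 32-byte table, bit (c & 7) of byte (c / 8). */
int may_start_with(const unsigned char *first_table, unsigned char c)
{
    return (first_table[c / 8] & (1u << (c & 7))) != 0;
}

/* Return the offset of the first byte in subject at which a match could
   start according to the table, or -1 if there is no such byte. */
int first_candidate(const unsigned char *first_table,
                    const char *subject, int length)
{
    int i;
    for (i = 0; i < length; i++) {
        if (may_start_with(first_table, (unsigned char)subject[i]))
            return i;
    }
    return -1;
}
```

Skipping ahead to the first candidate byte is the kind of saving that
the starting-byte bitmap makes possible; PCRE applies it internally, so
application code rarely needs to do this itself.
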
  PCRE_INFO_HASCRORLF

Return 1 if the pattern contains any explicit matches for CR or LF
characters, otherwise 0. The fourth argument should point to an int
variable. An explicit match is either a literal CR or LF character, or
\r or \n.

  PCRE_INFO_JCHANGED

Return 1 if the (?J) or (?-J) option setting is used in the pattern,
otherwise 0. The fourth argument should point to an int variable. (?J)
and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.

  PCRE_INFO_LASTLITERAL

Return the value of the rightmost literal byte that must exist in any
matched string, other than at its start, if such a byte has been
recorded. The fourth argument should point to an int variable. If there
is no such byte, -1 is returned. For anchored patterns, a last literal
byte is recorded only if it follows something of variable length. For
example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
/^a\dz\d/ the returned value is -1.

  PCRE_INFO_MINLENGTH

If the pattern was studied and a minimum length for matching subject
strings was computed, its value is returned. Otherwise the returned
value is -1. The value is a number of characters, not bytes (this may
be relevant in UTF-8 mode). The fourth argument should point to an int
variable. A non-negative value is a lower bound on the length of any
matching string. There may not be any strings of that length that do
actually match, but every string that does match is at least that long.

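Because the bound counts characters, and a character occupies at least
one byte, a subject whose length in bytes is below the bound cannot
match in either byte mode or UTF-8 mode. The sketch below shows that
check; the function name too_short_to_match() is hypothetical, and its
first argument is assumed to be the value obtained with
PCRE_INFO_MINLENGTH. pcre_exec() and pcre_dfa_exec() already apply the
bound themselves, so this only illustrates what the value means.

```c
/* Return 1 if a subject of subject_bytes bytes is certainly too short to
   match, given min_length as reported for PCRE_INFO_MINLENGTH.
   A min_length of -1 means no bound is known, so nothing is ruled out. */
int too_short_to_match(int min_length, int subject_bytes)
{
    if (min_length < 0)
        return 0;
    /* bytes >= characters, so fewer bytes than the bound means fewer
       characters than the bound as well */
    return subject_bytes < min_length;
}
```

A return value of 0 does not mean that the subject matches, only that
the bound alone cannot rule it out.
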
  PCRE_INFO_NAMECOUNT
  PCRE_INFO_NAMEENTRYSIZE
  PCRE_INFO_NAMETABLE

PCRE supports the use of named as well as numbered capturing
parentheses. The names are just an additional way of identifying the
parentheses, which still acquire numbers. Several convenience functions
such as pcre_get_named_substring() are provided for extracting captured
substrings by name. It is also possible to extract the data directly,
by first converting the name to a number in order to access the correct
pointers in the output vector (described with pcre_exec() below). To do
the conversion, you need to use the name-to-number map, which is
described by these three values.

The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
of each entry; both of these return an int value. The entry size
depends on the length of the longest name. PCRE_INFO_NAMETABLE returns
a pointer to the first entry of the table (a pointer to char). The
first two bytes of each entry are the number of the capturing
parenthesis, most significant byte first. The rest of the entry is the
corresponding name, zero terminated.

The names are in alphabetical order. Duplicate names may appear if (?|
is used to create multiple groups with the same number, as described in
the section on duplicate subpattern numbers in the pcrepattern page.
Duplicate names for subpatterns with different numbers are permitted
only if PCRE_DUPNAMES is set. In all cases of duplicate names, they
appear in the table in the order in which they were found in the
pattern. In the absence of (?| this is the order of increasing number;
when (?| is used this is not necessarily the case because later
subpatterns may have lower numbers.

As a simple example of the name/number table, consider the following
pattern (assume PCRE_EXTENDED is set, so white space - including
newlines - is ignored):

  (?<date> (?<year>(\d\d)?\d\d) -
  (?<month>\d\d) - (?<day>\d\d) )

There are four named subpatterns, so the table has four entries, and
each entry in the table is eight bytes long. The table is as follows,
with non-printing bytes shown in hexadecimal, and undefined bytes shown
as ??:

  00 04 m  o  n  t  h  00
  00 02 y  e  a  r  00 ??

When writing code to extract data from named subpatterns using the
name-to-number map, remember that the length of the entries is likely
to be different for each compiled pattern.

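The entry layout described above (a two-byte number, most significant
byte first, followed by a zero-terminated name) can be walked with a
few lines of C. This is a minimal sketch rather than PCRE's own code:
the function name name_to_number() is hypothetical, and its arguments
are assumed to be the values obtained with PCRE_INFO_NAMETABLE,
PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE. The table is typed as
unsigned char here so that the two number bytes combine without sign
problems.

```c
#include <string.h>

/* Look up a subpattern name in a name-to-number table.
   table      - pointer to the first entry of the table
   count      - number of entries
   entry_size - size of each entry in bytes
   Returns the number of the first entry with that name, or -1. */
int name_to_number(const unsigned char *table, int count,
                   int entry_size, const char *name)
{
    int i;
    for (i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * entry_size;
        /* Bytes 0 and 1: group number, most significant byte first. */
        int number = (entry[0] << 8) | entry[1];
        /* Bytes 2 onward: the zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return number;
    }
    return -1;
}
```

With the date pattern above, a lookup of "month" using a count of 4 and
an entry size of 8 yields 4. The entry size is passed in rather than
hard-coded because it differs from one compiled pattern to another.
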
  PCRE_INFO_OKPARTIAL

Return 1 if the pattern can be used for partial matching with
pcre_exec(), otherwise 0. The fourth argument should point to an int
variable. From release 8.00, this always returns 1, because the
restrictions that previously applied to partial matching have been
lifted. The pcrepartial documentation gives details of partial
matching.

  PCRE_INFO_OPTIONS

Return a copy of the options with which the pattern was compiled. The
fourth argument should point to an unsigned long int variable. These
option bits are those specified in the call to pcre_compile(), modified
by any top-level option settings at the start of the pattern itself. In
other words, they are the options that will be in force when matching
starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with
the PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
and PCRE_EXTENDED.

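The value obtained with PCRE_INFO_OPTIONS is a bit mask, so individual
options are tested with a bitwise AND. The sketch below is illustrative
only: option_is_set() is a hypothetical helper, and the four option
values are repeated here, as they appear in pcre.h for PCRE 8.x, only
so that the example compiles on its own. Real code should include
pcre.h instead.

```c
/* Values as in pcre.h for PCRE 8.x; repeated here only to keep the
   example self-contained. Include pcre.h in real code. */
#ifndef PCRE_CASELESS
#define PCRE_CASELESS   0x00000001
#define PCRE_MULTILINE  0x00000002
#define PCRE_DOTALL     0x00000004
#define PCRE_EXTENDED   0x00000008
#endif

/* Return 1 if the given option bit is set in the options word that
   pcre_fullinfo() stored for PCRE_INFO_OPTIONS, otherwise 0. */
int option_is_set(unsigned long int options, unsigned long int bit)
{
    return (options & bit) != 0;
}
```

For the /(?im)abc(?-i)d/ example compiled with PCRE_EXTENDED, the test
succeeds for PCRE_CASELESS, PCRE_MULTILINE and PCRE_EXTENDED, and fails
for PCRE_DOTALL.
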
A pattern is automatically anchored by PCRE if all of its top-level
alternatives begin with one of the following:

  ^     unless PCRE_MULTILINE is set

  PCRE_INFO_SIZE

Return the size of the compiled pattern, that is, the value that was
passed as the argument to pcre_malloc() when PCRE was getting memory in
which to place the compiled data. The fourth argument should point to a
size_t variable.

  PCRE_INFO_STUDYSIZE

Return the size of the data block pointed to by the study_data field in
a pcre_extra block. That is, it is the value that was passed to
pcre_malloc() when PCRE was getting memory into which to place the data
created by pcre_study(). If the pcre_extra argument is NULL, or there
is no study data, zero is returned. The fourth argument should point to
a size_t variable.

1819 |
int pcre_info(const pcre *code, int *optptr, int *firstcharptr); |
int pcre_info(const pcre *code, int *optptr, int *firstcharptr); |
1820 |
|
|
1821 |
The pcre_info() function is now obsolete because its interface is too |
The pcre_info() function is now obsolete because its interface is too |
1822 |
restrictive to return all the available data about a compiled pattern. |
restrictive to return all the available data about a compiled pattern. |
1823 |
New programs should use pcre_fullinfo() instead. The yield of |
New programs should use pcre_fullinfo() instead. The yield of |
1824 |
pcre_info() is the number of capturing subpatterns, or one of the fol- |
pcre_info() is the number of capturing subpatterns, or one of the fol- |
1825 |
lowing negative numbers: |
lowing negative numbers: |
1826 |
|
|
1827 |
PCRE_ERROR_NULL the argument code was NULL |
PCRE_ERROR_NULL the argument code was NULL |
1828 |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
PCRE_ERROR_BADMAGIC the "magic number" was not found |
1829 |
|
|
1830 |
If the optptr argument is not NULL, a copy of the options with which |
If the optptr argument is not NULL, a copy of the options with which |
1831 |
the pattern was compiled is placed in the integer it points to (see |
the pattern was compiled is placed in the integer it points to (see |
1832 |
PCRE_INFO_OPTIONS above). |
PCRE_INFO_OPTIONS above). |
1833 |
|
|
1834 |
If the pattern is not anchored and the firstcharptr argument is not |
If the pattern is not anchored and the firstcharptr argument is not |
1835 |
NULL, it is used to pass back information about the first character of |
NULL, it is used to pass back information about the first character of |
1836 |
any matched string (see PCRE_INFO_FIRSTBYTE above). |
any matched string (see PCRE_INFO_FIRSTBYTE above). |
1837 |
|
|
1838 |
|
|
1840 |
|
|
1841 |
int pcre_refcount(pcre *code, int adjust); |
int pcre_refcount(pcre *code, int adjust); |
1842 |
|
|
1843 |
The pcre_refcount() function is used to maintain a reference count in |
The pcre_refcount() function is used to maintain a reference count in |
1844 |
the data block that contains a compiled pattern. It is provided for the |
the data block that contains a compiled pattern. It is provided for the |
1845 |
benefit of applications that operate in an object-oriented manner, |
benefit of applications that operate in an object-oriented manner, |
1846 |
where different parts of the application may be using the same compiled |
where different parts of the application may be using the same compiled |
1847 |
pattern, but you want to free the block when they are all done. |
pattern, but you want to free the block when they are all done. |
1848 |
|
|
1849 |
When a pattern is compiled, the reference count field is initialized to |
When a pattern is compiled, the reference count field is initialized to |
1850 |
zero. It is changed only by calling this function, whose action is to |
zero. It is changed only by calling this function, whose action is to |
1851 |
add the adjust value (which may be positive or negative) to it. The |
add the adjust value (which may be positive or negative) to it. The |
1852 |
yield of the function is the new value. However, the value of the count |
yield of the function is the new value. However, the value of the count |
1853 |
is constrained to lie between 0 and 65535, inclusive. If the new value |
is constrained to lie between 0 and 65535, inclusive. If the new value |
1854 |
is outside these limits, it is forced to the appropriate limit value. |
is outside these limits, it is forced to the appropriate limit value. |
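
The limiting rule can be modelled in a few lines. The sketch below is an illustration of the documented behaviour only, not PCRE's own code, and model_refcount() is a hypothetical name.

```c
/* Model of the documented rule: add adjust to the current count and
   force the result into the range 0..65535. Not PCRE source code. */
int model_refcount(int current, int adjust)
{
    long long next = (long long)current + adjust;  /* avoid int overflow */
    if (next < 0)
        next = 0;
    if (next > 65535)
        next = 65535;
    return (int)next;
}
```

With the real function, a part of the application that starts using a shared pattern would call pcre_refcount(code, 1), and one that has finished would call pcre_refcount(code, -1) and free the block when the yield is zero.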
1855 |
|
|
1856 |
Except when it is zero, the reference count is not correctly preserved |
Except when it is zero, the reference count is not correctly preserved |
1857 |
if a pattern is compiled on one host and then transferred to a host |
if a pattern is compiled on one host and then transferred to a host |
1858 |
whose byte-order is different. (This seems a highly unlikely scenario.) |
whose byte-order is different. (This seems a highly unlikely scenario.) |
1859 |
|
|
1860 |
|
|
1864 |
const char *subject, int length, int startoffset, |
const char *subject, int length, int startoffset, |
1865 |
int options, int *ovector, int ovecsize); |
int options, int *ovector, int ovecsize); |
1866 |
|
|
1867 |
The function pcre_exec() is called to match a subject string against a |
The function pcre_exec() is called to match a subject string against a |
1868 |
compiled pattern, which is passed in the code argument. If the pattern |
compiled pattern, which is passed in the code argument. If the pattern |
1869 |
was studied, the result of the study should be passed in the extra |
was studied, the result of the study should be passed in the extra |
1870 |
argument. This function is the main matching facility of the library, |
argument. This function is the main matching facility of the library, |
1871 |
and it operates in a Perl-like manner. For specialist use there is also |
and it operates in a Perl-like manner. For specialist use there is also |
1872 |
an alternative matching function, which is described below in the sec- |
an alternative matching function, which is described below in the sec- |
1873 |
tion about the pcre_dfa_exec() function. |
tion about the pcre_dfa_exec() function. |
1874 |
|
|
1875 |
In most applications, the pattern will have been compiled (and option- |
In most applications, the pattern will have been compiled (and option- |
1876 |
ally studied) in the same process that calls pcre_exec(). However, it |
ally studied) in the same process that calls pcre_exec(). However, it |
1877 |
is possible to save compiled patterns and study data, and then use them |
is possible to save compiled patterns and study data, and then use them |
1878 |
later in different processes, possibly even on different hosts. For a |
later in different processes, possibly even on different hosts. For a |
1879 |
discussion about this, see the pcreprecompile documentation. |
discussion about this, see the pcreprecompile documentation. |
1880 |
|
|
1881 |
Here is an example of a simple call to pcre_exec(): |
Here is an example of a simple call to pcre_exec(): |
1894 |
|
|
1895 |
Extra data for pcre_exec() |
Extra data for pcre_exec() |
1896 |
|
|
1897 |
If the extra argument is not NULL, it must point to a pcre_extra data |
If the extra argument is not NULL, it must point to a pcre_extra data |
1898 |
block. The pcre_study() function returns such a block (when it doesn't |
block. The pcre_study() function returns such a block (when it doesn't |
1899 |
return NULL), but you can also create one for yourself, and pass addi- |
return NULL), but you can also create one for yourself, and pass addi- |
1900 |
tional information in it. The pcre_extra block contains the following |
tional information in it. The pcre_extra block contains the following |
1901 |
fields (not necessarily in this order): |
fields (not necessarily in this order): |
1902 |
|
|
1903 |
unsigned long int flags; |
unsigned long int flags; |
1906 |
unsigned long int match_limit_recursion; |
unsigned long int match_limit_recursion; |
1907 |
void *callout_data; |
void *callout_data; |
1908 |
const unsigned char *tables; |
const unsigned char *tables; |
1909 |
|
unsigned char **mark; |
1910 |
|
|
1911 |
The flags field is a bitmap that specifies which of the other fields |
The flags field is a bitmap that specifies which of the other fields |
1912 |
are set. The flag bits are: |
are set. The flag bits are: |
1913 |
|
|
1914 |
PCRE_EXTRA_STUDY_DATA |
PCRE_EXTRA_STUDY_DATA |
1916 |
PCRE_EXTRA_MATCH_LIMIT_RECURSION |
PCRE_EXTRA_MATCH_LIMIT_RECURSION |
1917 |
PCRE_EXTRA_CALLOUT_DATA |
PCRE_EXTRA_CALLOUT_DATA |
1918 |
PCRE_EXTRA_TABLES |
PCRE_EXTRA_TABLES |
1919 |
|
PCRE_EXTRA_MARK |
1920 |
|
|
1921 |
Other flag bits should be set to zero. The study_data field is set in |
Other flag bits should be set to zero. The study_data field is set in |
1922 |
the pcre_extra block that is returned by pcre_study(), together with |
the pcre_extra block that is returned by pcre_study(), together with |
1923 |
the appropriate flag bit. You should not set this yourself, but you may |
the appropriate flag bit. You should not set this yourself, but you may |
1924 |
add to the block by setting the other fields and their corresponding |
add to the block by setting the other fields and their corresponding |
1925 |
flag bits. |
flag bits. |
1926 |
|
|
1927 |
The match_limit field provides a means of preventing PCRE from using up |
The match_limit field provides a means of preventing PCRE from using up |
1928 |
a vast amount of resources when running patterns that are not going to |
a vast amount of resources when running patterns that are not going to |
1929 |
match, but which have a very large number of possibilities in their |
match, but which have a very large number of possibilities in their |
1930 |
search trees. The classic example is a pattern that uses nested unlim- |
search trees. The classic example is a pattern that uses nested unlim- |
1931 |
ited repeats. |
ited repeats. |
1932 |
|
|
1933 |
Internally, PCRE uses a function called match() which it calls repeat- |
Internally, PCRE uses a function called match() which it calls repeat- |
1934 |
edly (sometimes recursively). The limit set by match_limit is imposed |
edly (sometimes recursively). The limit set by match_limit is imposed |
1935 |
on the number of times this function is called during a match, which |
on the number of times this function is called during a match, which |
1936 |
has the effect of limiting the amount of backtracking that can take |
has the effect of limiting the amount of backtracking that can take |
1937 |
place. For patterns that are not anchored, the count restarts from zero |
place. For patterns that are not anchored, the count restarts from zero |
1938 |
for each position in the subject string. |
for each position in the subject string. |
1939 |
|
|
1940 |
The default value for the limit can be set when PCRE is built; the |
The default value for the limit can be set when PCRE is built; the |
1941 |
built-in default is 10 million, which handles all but the most extreme |
built-in default is 10 million, which handles all but the most extreme |
1942 |
cases. You can override the default by supplying pcre_exec() with a |
cases. You can override the default by supplying pcre_exec() with a |
1943 |
pcre_extra block in which match_limit is set, and |
pcre_extra block in which match_limit is set, and |
1944 |
PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is |
PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is |
1945 |
exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT. |
exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT. |
1946 |
|
|
1947 |
The match_limit_recursion field is similar to match_limit, but instead |
The match_limit_recursion field is similar to match_limit, but instead |
1948 |
of limiting the total number of times that match() is called, it limits |
of limiting the total number of times that match() is called, it limits |
1949 |
the depth of recursion. The recursion depth is a smaller number than |
the depth of recursion. The recursion depth is a smaller number than |
1950 |
the total number of calls, because not all calls to match() are recur- |
the total number of calls, because not all calls to match() are recur- |
1951 |
sive. This limit is of use only if it is set smaller than match_limit. |
sive. This limit is of use only if it is set smaller than match_limit. |
1952 |
|
|
1953 |
Limiting the recursion depth limits the amount of stack that can be |
Limiting the recursion depth limits the amount of stack that can be |
1954 |
used, or, when PCRE has been compiled to use memory on the heap instead |
used, or, when PCRE has been compiled to use memory on the heap instead |
1955 |
of the stack, the amount of heap memory that can be used. |
of the stack, the amount of heap memory that can be used. |
1956 |
|
|
1957 |
The default value for match_limit_recursion can be set when PCRE is |
The default value for match_limit_recursion can be set when PCRE is |
1958 |
built; the built-in default is the same value as the default for |
built; the built-in default is the same value as the default for |
1959 |
match_limit. You can override the default by supplying pcre_exec() with |
match_limit. You can override the default by supplying pcre_exec() with |
1960 |
a pcre_extra block in which match_limit_recursion is set, and |
a pcre_extra block in which match_limit_recursion is set, and |
1961 |
PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the |
PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the |
1962 |
limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT. |
limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT. |
1963 |
|
|
1964 |
The callout_data field is used in conjunction with the "callout" fea- |
The callout_data field is used in conjunction with the "callout" fea- |
1965 |
ture, and is described in the pcrecallout documentation. |
ture, and is described in the pcrecallout documentation. |
1966 |
|
|
1967 |
The tables field is used to pass a character tables pointer to |
The tables field is used to pass a character tables pointer to |
1968 |
pcre_exec(); this overrides the value that is stored with the compiled |
pcre_exec(); this overrides the value that is stored with the compiled |
1969 |
pattern. A non-NULL value is stored with the compiled pattern only if |
pattern. A non-NULL value is stored with the compiled pattern only if |
1970 |
custom tables were supplied to pcre_compile() via its tableptr argu- |
custom tables were supplied to pcre_compile() via its tableptr argu- |
1971 |
ment. If NULL is passed to pcre_exec() using this mechanism, it forces |
ment. If NULL is passed to pcre_exec() using this mechanism, it forces |
1972 |
PCRE's internal tables to be used. This facility is helpful when re- |
PCRE's internal tables to be used. This facility is helpful when re- |
1973 |
using patterns that have been saved after compiling with an external |
using patterns that have been saved after compiling with an external |
1974 |
set of tables, because the external tables might be at a different |
set of tables, because the external tables might be at a different |
1975 |
address when pcre_exec() is called. See the pcreprecompile documenta- |
address when pcre_exec() is called. See the pcreprecompile documenta- |
1976 |
tion for a discussion of saving compiled patterns for later use. |
tion for a discussion of saving compiled patterns for later use. |
1977 |
|
|
1978 |
|
If PCRE_EXTRA_MARK is set in the flags field, the mark field must be |
1979 |
|
set to point to an unsigned char * variable. If the pattern contains any |
1980 |

backtracking control verbs such as (*MARK:NAME), and the execution ends up |
1981 |
|
with a name to pass back, a pointer to the name string (zero termi- |
1982 |
|
nated) is placed in the variable pointed to by the mark field. The |
1983 |
|
names are within the compiled pattern; if you wish to retain such a |
1984 |
|
name you must copy it before freeing the memory of a compiled pattern. |
1985 |
|
If there is no name to pass back, the variable pointed to by the mark |
1986 |
|
field is set to NULL. For details of the backtracking control verbs, see |
1987 |
|
the section entitled "Backtracking control" in the pcrepattern documen- |
1988 |
|
tation. |
1989 |
|
|
1990 |
Option bits for pcre_exec() |
Option bits for pcre_exec() |
1991 |
|
|
1992 |
The unused bits of the options argument for pcre_exec() must be zero. |
The unused bits of the options argument for pcre_exec() must be zero. |
1993 |
The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, |
The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, |
1994 |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, |
PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, |
1995 |
PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and |
PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_SOFT, and |
1996 |
PCRE_PARTIAL_HARD. |
PCRE_PARTIAL_HARD. |
1997 |
|
|
1998 |
PCRE_ANCHORED |
PCRE_ANCHORED |
1999 |
|
|
2000 |
The PCRE_ANCHORED option limits pcre_exec() to matching at the first |
The PCRE_ANCHORED option limits pcre_exec() to matching at the first |
2001 |
matching position. If a pattern was compiled with PCRE_ANCHORED, or |
matching position. If a pattern was compiled with PCRE_ANCHORED, or |
2002 |
turned out to be anchored by virtue of its contents, it cannot be made |
turned out to be anchored by virtue of its contents, it cannot be made |
2003 |
unanchored at matching time. |
unanchored at matching time. |
2004 |
|
|
2005 |
PCRE_BSR_ANYCRLF |
PCRE_BSR_ANYCRLF |
2006 |
PCRE_BSR_UNICODE |
PCRE_BSR_UNICODE |
2007 |
|
|
2008 |
These options (which are mutually exclusive) control what the \R escape |
These options (which are mutually exclusive) control what the \R escape |
2009 |
sequence matches. The choice is either to match only CR, LF, or CRLF, |
sequence matches. The choice is either to match only CR, LF, or CRLF, |
2010 |
or to match any Unicode newline sequence. These options override the |
or to match any Unicode newline sequence. These options override the |
2011 |
choice that was made or defaulted when the pattern was compiled. |
choice that was made or defaulted when the pattern was compiled. |
2012 |
|
|
2013 |
PCRE_NEWLINE_CR |
PCRE_NEWLINE_CR |
2016 |
PCRE_NEWLINE_ANYCRLF |
PCRE_NEWLINE_ANYCRLF |
2017 |
PCRE_NEWLINE_ANY |
PCRE_NEWLINE_ANY |
2018 |
|
|
2019 |
These options override the newline definition that was chosen or |
These options override the newline definition that was chosen or |
2020 |
defaulted when the pattern was compiled. For details, see the descrip- |
defaulted when the pattern was compiled. For details, see the descrip- |
2021 |
tion of pcre_compile() above. During matching, the newline choice |
tion of pcre_compile() above. During matching, the newline choice |
2022 |
affects the behaviour of the dot, circumflex, and dollar metacharac- |
affects the behaviour of the dot, circumflex, and dollar metacharac- |
2023 |
ters. It may also alter the way the match position is advanced after a |
ters. It may also alter the way the match position is advanced after a |
2024 |
match failure for an unanchored pattern. |
match failure for an unanchored pattern. |
2025 |
|
|
2026 |
When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is |
When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is |
2027 |
set, and a match attempt for an unanchored pattern fails when the cur- |
set, and a match attempt for an unanchored pattern fails when the cur- |
2028 |
rent position is at a CRLF sequence, and the pattern contains no |
rent position is at a CRLF sequence, and the pattern contains no |
2029 |
explicit matches for CR or LF characters, the match position is |
explicit matches for CR or LF characters, the match position is |
2030 |
advanced by two characters instead of one, in other words, to after the |
advanced by two characters instead of one, in other words, to after the |
2031 |
CRLF. |
CRLF. |
2032 |
|
|
2033 |
The above rule is a compromise that makes the most common cases work as |
The above rule is a compromise that makes the most common cases work as |
2034 |
expected. For example, if the pattern is .+A (and the PCRE_DOTALL |
expected. For example, if the pattern is .+A (and the PCRE_DOTALL |
2035 |
option is not set), it does not match the string "\r\nA" because, after |
option is not set), it does not match the string "\r\nA" because, after |
2036 |
failing at the start, it skips both the CR and the LF before retrying. |
failing at the start, it skips both the CR and the LF before retrying. |
2037 |
However, the pattern [\r\n]A does match that string, because it con- |
However, the pattern [\r\n]A does match that string, because it con- |
2038 |
tains an explicit CR or LF reference, and so advances only by one char- |
tains an explicit CR or LF reference, and so advances only by one char- |
2039 |
acter after the first failure. |
acter after the first failure. |
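
The advance rule can be modelled as a small function. This sketch illustrates the documented behaviour only; it is not PCRE's own code, all names in it are hypothetical, and it assumes a non-UTF-8 subject so that one character is one byte.

```c
/* Model of the documented rule: returns the next starting offset after a
   match attempt at position pos has failed for an unanchored pattern.
   crlf_is_newline is non-zero when PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,
   or PCRE_NEWLINE_ANY is in force; has_explicit_cr_or_lf is non-zero when
   the pattern contains an explicit match for CR or LF. */
int next_start_after_failure(const char *subject, int length, int pos,
                             int crlf_is_newline, int has_explicit_cr_or_lf)
{
    int at_crlf = pos + 1 < length &&
                  subject[pos] == '\r' && subject[pos + 1] == '\n';

    if (crlf_is_newline && !has_explicit_cr_or_lf && at_crlf)
        return pos + 2;   /* skip the CRLF as a unit */
    return pos + 1;       /* ordinary one-character advance */
}
```

For the subject "\r\nA", the model yields offset 2 for a pattern such as .+A and offset 1 for [\r\n]A, which is why only the second pattern gets a chance to match starting at the LF.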
2040 |
|
|
2041 |
An explicit match for CR or LF is either a literal appearance of one of |
An explicit match for CR or LF is either a literal appearance of one of |
2042 |
those characters, or one of the \r or \n escape sequences. Implicit |
those characters, or one of the \r or \n escape sequences. Implicit |
2043 |
matches such as [^X] do not count, nor does \s (which includes CR and |
matches such as [^X] do not count, nor does \s (which includes CR and |
2044 |
LF in the characters that it matches). |
LF in the characters that it matches). |
2045 |
|
|
2046 |
Notwithstanding the above, anomalous effects may still occur when CRLF |
Notwithstanding the above, anomalous effects may still occur when CRLF |
2047 |
is a valid newline sequence and explicit \r or \n escapes appear in the |
is a valid newline sequence and explicit \r or \n escapes appear in the |
2048 |
pattern. |
pattern. |
2049 |
|
|
2050 |
PCRE_NOTBOL |
PCRE_NOTBOL |
2051 |
|
|
2052 |
This option specifies that the first character of the subject string is not |
This option specifies that the first character of the subject string is not |
2053 |
the beginning of a line, so the circumflex metacharacter should not |
the beginning of a line, so the circumflex metacharacter should not |
2054 |
match before it. Setting this without PCRE_MULTILINE (at compile time) |
match before it. Setting this without PCRE_MULTILINE (at compile time) |
2055 |
causes circumflex never to match. This option affects only the behav- |
causes circumflex never to match. This option affects only the behav- |
2056 |
iour of the circumflex metacharacter. It does not affect \A. |
iour of the circumflex metacharacter. It does not affect \A. |
2057 |
|
|
2058 |
PCRE_NOTEOL |
PCRE_NOTEOL |
2059 |
|
|
2060 |
This option specifies that the end of the subject string is not the end |
This option specifies that the end of the subject string is not the end |
2061 |
of a line, so the dollar metacharacter should not match it nor (except |
of a line, so the dollar metacharacter should not match it nor (except |
2062 |
in multiline mode) a newline immediately before it. Setting this with- |
in multiline mode) a newline immediately before it. Setting this with- |
2063 |
out PCRE_MULTILINE (at compile time) causes dollar never to match. This |
out PCRE_MULTILINE (at compile time) causes dollar never to match. This |
2064 |
option affects only the behaviour of the dollar metacharacter. It does |
option affects only the behaviour of the dollar metacharacter. It does |
2065 |
not affect \Z or \z. |
not affect \Z or \z. |
2066 |
|
|
2067 |
PCRE_NOTEMPTY |
PCRE_NOTEMPTY |
2068 |
|
|
2069 |
An empty string is not considered to be a valid match if this option is |
An empty string is not considered to be a valid match if this option is |
2070 |
set. If there are alternatives in the pattern, they are tried. If all |
set. If there are alternatives in the pattern, they are tried. If all |
2071 |
the alternatives match the empty string, the entire match fails. For |
the alternatives match the empty string, the entire match fails. For |
2072 |
example, if the pattern |
example, if the pattern |
2073 |
|
|
2074 |
a?b? |
a?b? |
2075 |
|
|
2076 |
is applied to a string not beginning with "a" or "b", it matches an |
is applied to a string not beginning with "a" or "b", it matches an |
2077 |
empty string at the start of the subject. With PCRE_NOTEMPTY set, this |
empty string at the start of the subject. With PCRE_NOTEMPTY set, this |
2078 |
match is not valid, so PCRE searches further into the string for occur- |
match is not valid, so PCRE searches further into the string for occur- |
2079 |
rences of "a" or "b". |
rences of "a" or "b". |
2080 |
|
|
2081 |
PCRE_NOTEMPTY_ATSTART |
PCRE_NOTEMPTY_ATSTART |
2082 |
|
|
2083 |
This is like PCRE_NOTEMPTY, except that an empty string match that is |
This is like PCRE_NOTEMPTY, except that an empty string match that is |
2084 |
not at the start of the subject is permitted. If the pattern is |
not at the start of the subject is permitted. If the pattern is |
2085 |
anchored, such a match can occur only if the pattern contains \K. |
anchored, such a match can occur only if the pattern contains \K. |
2086 |
|
|
2087 |
Perl has no direct equivalent of PCRE_NOTEMPTY or |
Perl has no direct equivalent of PCRE_NOTEMPTY or |
2088 |
PCRE_NOTEMPTY_ATSTART, but it does make a special case of a pattern |
PCRE_NOTEMPTY_ATSTART, but it does make a special case of a pattern |
2089 |
match of the empty string within its split() function, and when using |
match of the empty string within its split() function, and when using |
2090 |
the /g modifier. It is possible to emulate Perl's behaviour after |
the /g modifier. It is possible to emulate Perl's behaviour after |
2091 |
matching a null string by first trying the match again at the same off- |
matching a null string by first trying the match again at the same off- |
2092 |
set with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then if that |
set with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then if that |
2093 |
fails, by advancing the starting offset (see below) and trying an ordi- |
fails, by advancing the starting offset (see below) and trying an ordi- |
2094 |
nary match again. There is some code that demonstrates how to do this |
nary match again. There is some code that demonstrates how to do this |
2095 |
in the pcredemo sample program. |
in the pcredemo sample program. |
2096 |
|
|
2097 |
PCRE_NO_START_OPTIMIZE |
PCRE_NO_START_OPTIMIZE |
2098 |
|
|
2099 |
There are a number of optimizations that pcre_exec() uses at the start |
There are a number of optimizations that pcre_exec() uses at the start |
2100 |
of a match, in order to speed up the process. For example, if it is |
of a match, in order to speed up the process. For example, if it is |
2101 |
known that a match must start with a specific character, it searches |
known that a match must start with a specific character, it searches |
2102 |
the subject for that character, and fails immediately if it cannot find |
the subject for that character, and fails immediately if it cannot find |
2103 |
it, without actually running the main matching function. When callouts |
it, without actually running the main matching function. When callouts |
2104 |
are in use, these optimizations can cause them to be skipped. This |
are in use, these optimizations can cause them to be skipped. This |
2105 |
option disables the "start-up" optimizations, causing performance to |
option disables the "start-up" optimizations, causing performance to |
2106 |
suffer, but ensuring that the callouts do occur. |
suffer, but ensuring that the callouts do occur. |
2107 |
|
|
2108 |
PCRE_NO_UTF8_CHECK |
PCRE_NO_UTF8_CHECK |
2109 |
|
|
2110 |
When PCRE_UTF8 is set at compile time, the validity of the subject as a |
When PCRE_UTF8 is set at compile time, the validity of the subject as a |
2111 |
UTF-8 string is automatically checked when pcre_exec() is subsequently |
UTF-8 string is automatically checked when pcre_exec() is subsequently |
2112 |
called. The value of startoffset is also checked to ensure that it |
called. The value of startoffset is also checked to ensure that it |
2113 |
points to the start of a UTF-8 character. There is a discussion about |
points to the start of a UTF-8 character. There is a discussion about |
2114 |
the validity of UTF-8 strings in the section on UTF-8 support in the |
the validity of UTF-8 strings in the section on UTF-8 support in the |
2115 |
main pcre page. If an invalid UTF-8 sequence of bytes is found, |
main pcre page. If an invalid UTF-8 sequence of bytes is found, |
2116 |
pcre_exec() returns the error PCRE_ERROR_BADUTF8. If startoffset con- |
pcre_exec() returns the error PCRE_ERROR_BADUTF8. If startoffset con- |
2117 |
tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned. |
tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned. |
2118 |
|
|
2119 |
If you already know that your subject is valid, and you want to skip |
If you already know that your subject is valid, and you want to skip |
2120 |
these checks for performance reasons, you can set the |
these checks for performance reasons, you can set the |
2121 |
PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might want to |
PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might want to |
2122 |
do this for the second and subsequent calls to pcre_exec() if you are |
do this for the second and subsequent calls to pcre_exec() if you are |
2123 |
making repeated calls to find all the matches in a single subject |
making repeated calls to find all the matches in a single subject |
2124 |
string. However, you should be sure that the value of startoffset |
string. However, you should be sure that the value of startoffset |
2125 |
points to the start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is |
points to the start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is |
2126 |
set, the effect of passing an invalid UTF-8 string as a subject, or a |
set, the effect of passing an invalid UTF-8 string as a subject, or a |
2127 |
value of startoffset that does not point to the start of a UTF-8 char- |
value of startoffset that does not point to the start of a UTF-8 char- |
2128 |
acter, is undefined. Your program may crash. |
acter, is undefined. Your program may crash. |
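
When PCRE_NO_UTF8_CHECK is in use, an application can still check a starting offset cheaply for itself. The helper below is a hypothetical sketch, not part of PCRE; it relies on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, and it assumes the subject as a whole is already known to be valid UTF-8.

```c
/* is_utf8_char_start() is a hypothetical helper, not a PCRE function.
   Returns 1 if offset is within the subject and does not point into the
   middle of a UTF-8 character, otherwise 0. */
int is_utf8_char_start(const char *subject, int length, int offset)
{
    if (offset < 0 || offset > length)
        return 0;
    if (offset == length)
        return 1;   /* the end of the subject is a valid place to start */

    /* A continuation byte (10xxxxxx) is never the start of a character. */
    return ((unsigned char)subject[offset] & 0xC0) != 0x80;
}
```

This checks only the offset; it does not validate the subject string itself.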
2129 |
|
|
2130 |
PCRE_PARTIAL_HARD |
PCRE_PARTIAL_HARD |
2131 |
PCRE_PARTIAL_SOFT |
PCRE_PARTIAL_SOFT |
2132 |
|
|
2133 |
These options turn on the partial matching feature. For backwards com- |
These options turn on the partial matching feature. For backwards com- |
2134 |
patibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial |
patibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial |
2135 |
match occurs if the end of the subject string is reached successfully, |
match occurs if the end of the subject string is reached successfully, |
2136 |
but there are not enough subject characters to complete the match. If |
but there are not enough subject characters to complete the match. If |
2137 |
this happens when PCRE_PARTIAL_HARD is set, pcre_exec() immediately |
this happens when PCRE_PARTIAL_HARD is set, pcre_exec() immediately |
2138 |
returns PCRE_ERROR_PARTIAL. Otherwise, if PCRE_PARTIAL_SOFT is set, |
returns PCRE_ERROR_PARTIAL. Otherwise, if PCRE_PARTIAL_SOFT is set, |
2139 |
matching continues by testing any other alternatives. Only if they all |
matching continues by testing any other alternatives. Only if they all |
2140 |
fail is PCRE_ERROR_PARTIAL returned (instead of PCRE_ERROR_NOMATCH). |
fail is PCRE_ERROR_PARTIAL returned (instead of PCRE_ERROR_NOMATCH). |
2141 |
The portion of the string that was inspected when the partial match was |
The portion of the string that was inspected when the partial match was |
2142 |
found is set as the first matching string. There is a more detailed |
found is set as the first matching string. There is a more detailed |
2143 |
discussion in the pcrepartial documentation. |
discussion in the pcrepartial documentation. |
2144 |
|
|
2145 |
The string to be matched by pcre_exec() |
The string to be matched by pcre_exec() |
2146 |
|
|
2147 |
The subject string is passed to pcre_exec() as a pointer in subject, a |
The subject string is passed to pcre_exec() as a pointer in subject, a |
2148 |
length (in bytes) in length, and a starting byte offset in startoffset. |
length (in bytes) in length, and a starting byte offset in startoffset. |
2149 |
In UTF-8 mode, the byte offset must point to the start of a UTF-8 char- |
In UTF-8 mode, the byte offset must point to the start of a UTF-8 char- |
2150 |
acter. Unlike the pattern string, the subject may contain binary zero |
acter. Unlike the pattern string, the subject may contain binary zero |
2151 |
bytes. When the starting offset is zero, the search for a match starts |
bytes. When the starting offset is zero, the search for a match starts |
2152 |
at the beginning of the subject, and this is by far the most common |
at the beginning of the subject, and this is by far the most common |
2153 |
case. |
case. |
2154 |
|
|
2155 |
A non-zero starting offset is useful when searching for another match |
A non-zero starting offset is useful when searching for another match |
2156 |
in the same subject by calling pcre_exec() again after a previous suc- |
in the same subject by calling pcre_exec() again after a previous suc- |
2157 |
cess. Setting startoffset differs from just passing over a shortened |
cess. Setting startoffset differs from just passing over a shortened |
2158 |
string and setting PCRE_NOTBOL in the case of a pattern that begins |
string and setting PCRE_NOTBOL in the case of a pattern that begins |
2159 |
with any kind of lookbehind. For example, consider the pattern |
with any kind of lookbehind. For example, consider the pattern |
2160 |
|
|
2161 |
\Biss\B |
\Biss\B |
2162 |
|
|
2163 |
which finds occurrences of "iss" in the middle of words. (\B matches |
which finds occurrences of "iss" in the middle of words. (\B matches |
2164 |
only if the current position in the subject is not a word boundary.) |
only if the current position in the subject is not a word boundary.) |
2165 |
When applied to the string "Mississipi" the first call to pcre_exec() |
When applied to the string "Mississipi" the first call to pcre_exec() |
2166 |
finds the first occurrence. If pcre_exec() is called again with just |
finds the first occurrence. If pcre_exec() is called again with just |
2167 |
the remainder of the subject, namely "issipi", it does not match, |
the remainder of the subject, namely "issipi", it does not match, |
2168 |
because \B is always false at the start of the subject, which is deemed |
because \B is always false at the start of the subject, which is deemed |
2169 |
to be a word boundary. However, if pcre_exec() is passed the entire |
to be a word boundary. However, if pcre_exec() is passed the entire |
2170 |
string again, but with startoffset set to 4, it finds the second occur- |
string again, but with startoffset set to 4, it finds the second occur- |
2171 |
rence of "iss" because it is able to look behind the starting point to |
rence of "iss" because it is able to look behind the starting point to |
2172 |
discover that it is preceded by a letter. |
discover that it is preceded by a letter. |
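The difference between a shortened subject and a non-zero startoffset can be illustrated without calling PCRE at all. The following is only a sketch: is_word_byte() and not_word_boundary() are hypothetical helpers, not PCRE functions, and they approximate \B for ASCII text in the default C locale rather than reproducing PCRE's own character tables.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: 1 if c is a "word" byte (letter, digit, underscore). */
static int is_word_byte(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/*
 * Hypothetical helper: 1 if \B would hold at byte position pos.
 * Positions outside the subject are treated as non-word, so a subject
 * that starts with a word byte has a word boundary at position 0.
 */
static int not_word_boundary(const char *subject, size_t length, size_t pos)
{
    assert(subject != NULL && pos <= length);
    int before = pos > 0 && is_word_byte((unsigned char)subject[pos - 1]);
    int after = pos < length && is_word_byte((unsigned char)subject[pos]);
    return before == after;   /* no boundary when both sides agree */
}
```

With the whole subject "Mississipi" and position 4, the helper can see the preceding "s" and reports that \B holds. With the shortened subject "issipi" and position 0, nothing precedes the "i", so the position counts as a word boundary and \B fails.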
2173 |
|
|
2174 |
If a non-zero starting offset is passed when the pattern is anchored, |
If a non-zero starting offset is passed when the pattern is anchored, |
2175 |
one attempt to match at the given offset is made. This can only succeed |
one attempt to match at the given offset is made. This can only succeed |
2176 |
if the pattern does not require the match to be at the start of the |
if the pattern does not require the match to be at the start of the |
2177 |
subject. |
subject. |
2178 |
|
|
2179 |
How pcre_exec() returns captured substrings |
How pcre_exec() returns captured substrings |
2180 |
|
|
2181 |
In general, a pattern matches a certain portion of the subject, and in |
In general, a pattern matches a certain portion of the subject, and in |
2182 |
addition, further substrings from the subject may be picked out by |
addition, further substrings from the subject may be picked out by |
2183 |
parts of the pattern. Following the usage in Jeffrey Friedl's book, |
parts of the pattern. Following the usage in Jeffrey Friedl's book, |
2184 |
this is called "capturing" in what follows, and the phrase "capturing |
this is called "capturing" in what follows, and the phrase "capturing |
2185 |
subpattern" is used for a fragment of a pattern that picks out a sub- |
subpattern" is used for a fragment of a pattern that picks out a sub- |
2186 |
string. PCRE supports several other kinds of parenthesized subpattern |
string. PCRE supports several other kinds of parenthesized subpattern |
2187 |
that do not cause substrings to be captured. |
that do not cause substrings to be captured. |
2188 |
|
|
2189 |
Captured substrings are returned to the caller via a vector of integers |
Captured substrings are returned to the caller via a vector of integers |
2190 |
whose address is passed in ovector. The number of elements in the vec- |
whose address is passed in ovector. The number of elements in the vec- |
2191 |
tor is passed in ovecsize, which must be a non-negative number. Note: |
tor is passed in ovecsize, which must be a non-negative number. Note: |
2192 |
this argument is NOT the size of ovector in bytes. |
this argument is NOT the size of ovector in bytes. |
2193 |
|
|
2194 |
The first two-thirds of the vector is used to pass back captured sub- |
The first two-thirds of the vector is used to pass back captured sub- |
2195 |
strings, each substring using a pair of integers. The remaining third |
strings, each substring using a pair of integers. The remaining third |
2196 |
of the vector is used as workspace by pcre_exec() while matching cap- |
of the vector is used as workspace by pcre_exec() while matching cap- |
2197 |
turing subpatterns, and is not available for passing back information. |
turing subpatterns, and is not available for passing back information. |
2198 |
The number passed in ovecsize should always be a multiple of three. If |
The number passed in ovecsize should always be a multiple of three. If |
2199 |
it is not, it is rounded down. |
it is not, it is rounded down. |
2200 |
|
|
2201 |
When a match is successful, information about captured substrings is |
When a match is successful, information about captured substrings is |
2202 |
returned in pairs of integers, starting at the beginning of ovector, |
returned in pairs of integers, starting at the beginning of ovector, |
2203 |
and continuing up to two-thirds of its length at the most. The first |
and continuing up to two-thirds of its length at the most. The first |
2204 |
element of each pair is set to the byte offset of the first character |
element of each pair is set to the byte offset of the first character |
2205 |
in a substring, and the second is set to the byte offset of the first |
in a substring, and the second is set to the byte offset of the first |
2206 |
character after the end of a substring. Note: these values are always |
character after the end of a substring. Note: these values are always |
2207 |
byte offsets, even in UTF-8 mode. They are not character counts. |
byte offsets, even in UTF-8 mode. They are not character counts. |
2208 |
|
|
2209 |
The first pair of integers, ovector[0] and ovector[1], identify the |
The first pair of integers, ovector[0] and ovector[1], identify the |
2210 |
portion of the subject string matched by the entire pattern. The next |
portion of the subject string matched by the entire pattern. The next |
2211 |
pair is used for the first capturing subpattern, and so on. The value |
pair is used for the first capturing subpattern, and so on. The value |
2212 |
returned by pcre_exec() is one more than the highest numbered pair that |
returned by pcre_exec() is one more than the highest numbered pair that |
2213 |
has been set. For example, if two substrings have been captured, the |
has been set. For example, if two substrings have been captured, the |
2214 |
returned value is 3. If there are no capturing subpatterns, the return |
returned value is 3. If there are no capturing subpatterns, the return |
2215 |
value from a successful match is 1, indicating that just the first pair |
value from a successful match is 1, indicating that just the first pair |
2216 |
of offsets has been set. |
of offsets has been set. |
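As a rough sketch of how the return value and the offset pairs fit together, the hypothetical helper below (substring_span() is not part of PCRE) turns pair n into a start offset and a byte length, provided that n is below the value returned by pcre_exec().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fetch the byte offset and byte length of pair n.
 * rc is assumed to be the positive value returned by pcre_exec().
 * Returns 1 on success, 0 if n is outside the range 0..rc-1.
 * An unset pair (-1, -1) yields start -1 and length 0.
 */
static int substring_span(const int *ovector, int rc, int n,
                          int *start, int *length)
{
    assert(ovector != NULL && start != NULL && length != NULL);
    if (rc <= 0 || n < 0 || n >= rc)
        return 0;
    *start = ovector[2 * n];
    *length = ovector[2 * n + 1] - ovector[2 * n];
    return 1;
}
```

For instance, if the pattern (\d+)-(\d+) matched the subject "ab 12-345", the vector would begin 3, 9, 3, 5, 6, 9 and the return value would be 3. Pair 2 then gives start 6 and length 3, which could be printed with printf("%.*s", length, subject + start).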
2217 |
|
|
2218 |
If a capturing subpattern is matched repeatedly, it is the last portion |
If a capturing subpattern is matched repeatedly, it is the last portion |
2219 |
of the string that it matched that is returned. |
of the string that it matched that is returned. |
2220 |
|
|
2221 |
If the vector is too small to hold all the captured substring offsets, |
If the vector is too small to hold all the captured substring offsets, |
2222 |
it is used as far as possible (up to two-thirds of its length), and the |
it is used as far as possible (up to two-thirds of its length), and the |
2223 |
function returns a value of zero. If the substring offsets are not of |
function returns a value of zero. If the substring offsets are not of |
2224 |
interest, pcre_exec() may be called with ovector passed as NULL and |
interest, pcre_exec() may be called with ovector passed as NULL and |
2225 |
ovecsize as zero. However, if the pattern contains back references and |
ovecsize as zero. However, if the pattern contains back references and |
2226 |
the ovector is not big enough to remember the related substrings, PCRE |
the ovector is not big enough to remember the related substrings, PCRE |
2227 |
has to get additional memory for use during matching. Thus it is usu- |
has to get additional memory for use during matching. Thus it is usu- |
2228 |
ally advisable to supply an ovector. |
ally advisable to supply an ovector. |
2229 |
|
|
2230 |
The pcre_fullinfo() function can be used to find out how many capturing |
The pcre_fullinfo() function can be used to find out how many capturing |
2231 |
subpatterns there are in a compiled pattern. The smallest size for |
subpatterns there are in a compiled pattern. The smallest size for |
2232 |
ovector that will allow for n captured substrings, in addition to the |
ovector that will allow for n captured substrings, in addition to the |
2233 |
offsets of the substring matched by the whole pattern, is (n+1)*3. |
offsets of the substring matched by the whole pattern, is (n+1)*3. |
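The sizing rules can be written down as two one-line functions. This is a minimal sketch; ovector_min_size() and ovector_usable_pairs() are hypothetical names, and n is assumed to be the capture count obtained from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT.

```c
#include <assert.h>

/* Smallest ovecsize that can hold the whole match plus n captures. */
static int ovector_min_size(int n)
{
    assert(n >= 0);
    return (n + 1) * 3;
}

/*
 * Number of offset pairs that can be passed back for a given ovecsize.
 * Integer division mirrors the documented rounding down to a multiple
 * of three; only the first two-thirds of the vector carries results.
 */
static int ovector_usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

A pattern with two capturing subpatterns therefore needs at least 9 elements, and a vector of 10 elements is no more useful than one of 9.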
2234 |
|
|
2235 |
It is possible for capturing subpattern number n+1 to match some part |
It is possible for capturing subpattern number n+1 to match some part |
2236 |
of the subject when subpattern n has not been used at all. For example, |
of the subject when subpattern n has not been used at all. For example, |
2237 |
if the string "abc" is matched against the pattern (a|(z))(bc) the |
if the string "abc" is matched against the pattern (a|(z))(bc) the |
2238 |
return from the function is 4, and subpatterns 1 and 3 are matched, but |
return from the function is 4, and subpatterns 1 and 3 are matched, but |
2239 |
2 is not. When this happens, both values in the offset pairs corre- |
2 is not. When this happens, both values in the offset pairs corre- |
2240 |
sponding to unused subpatterns are set to -1. |
sponding to unused subpatterns are set to -1. |
2241 |
|
|
2242 |
Offset values that correspond to unused subpatterns at the end of the |
Offset values that correspond to unused subpatterns at the end of the |
2243 |
expression are also set to -1. For example, if the string "abc" is |
expression are also set to -1. For example, if the string "abc" is |
2244 |
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not |
matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not |
2245 |
matched. The return from the function is 2, because the highest used |
matched. The return from the function is 2, because the highest used |
2246 |
capturing subpattern number is 1. However, you can refer to the offsets |
capturing subpattern number is 1. However, you can refer to the offsets |
2247 |
for the second and third capturing subpatterns if you wish (assuming |
for the second and third capturing subpatterns if you wish (assuming |
2248 |
the vector is large enough, of course). |
the vector is large enough, of course). |
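The two examples above can be checked against hand-written vectors. The sketch below does not call PCRE: the arrays are filled in by hand with the offsets the text describes, and capture_is_unset() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* "abc" against (a|(z))(bc): return value 4, subpattern 2 unset. */
static const int first_example[12] = {0, 3, 0, 1, -1, -1, 1, 3, 0, 0, 0, 0};

/* "abc" against (abc)(x(yz)?)?: return value 2, subpatterns 2 and 3 unset. */
static const int second_example[12] = {0, 3, 0, 3, -1, -1, -1, -1, 0, 0, 0, 0};

/*
 * Hypothetical helper: 1 if subpattern n did not take part in the match.
 * n must lie within the usable two-thirds of the vector.
 */
static int capture_is_unset(const int *ovector, int ovecsize, int n)
{
    assert(ovector != NULL && n >= 0 && n < ovecsize / 3);
    return ovector[2 * n] < 0;
}
```

In the second example the helper may be asked about subpatterns 2 and 3 even though the return value is only 2, because the vector has room for four pairs.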
2249 |
|
|
2250 |
Some convenience functions are provided for extracting the captured |
Some convenience functions are provided for extracting the captured |
2251 |
substrings as separate strings. These are described below. |
substrings as separate strings. These are described below. |
2252 |
|
|
2253 |
Error return values from pcre_exec() |
Error return values from pcre_exec() |
2254 |
|
|
2255 |
If pcre_exec() fails, it returns a negative number. The following are |
If pcre_exec() fails, it returns a negative number. The following are |
2256 |
defined in the header file: |
defined in the header file: |
2257 |
|
|
2258 |
PCRE_ERROR_NOMATCH (-1) |
PCRE_ERROR_NOMATCH (-1) |
2261 |
|
|
2262 |
PCRE_ERROR_NULL (-2) |
PCRE_ERROR_NULL (-2) |
2263 |
|
|
2264 |
Either code or subject was passed as NULL, or ovector was NULL and |
Either code or subject was passed as NULL, or ovector was NULL and |
2265 |
ovecsize was not zero. |
ovecsize was not zero. |
2266 |
|
|
2267 |
PCRE_ERROR_BADOPTION (-3) |
PCRE_ERROR_BADOPTION (-3) |
2270 |
|
|
2271 |
PCRE_ERROR_BADMAGIC (-4) |
PCRE_ERROR_BADMAGIC (-4) |
2272 |
|
|
2273 |
PCRE stores a 4-byte "magic number" at the start of the compiled code, |
PCRE stores a 4-byte "magic number" at the start of the compiled code, |
2274 |
to catch the case when it is passed a junk pointer and to detect when a |
to catch the case when it is passed a junk pointer and to detect when a |
2275 |
pattern that was compiled in an environment of one endianness is run in |
pattern that was compiled in an environment of one endianness is run in |
2276 |
an environment with the other endianness. This is the error that PCRE |
an environment with the other endianness. This is the error that PCRE |
2277 |
gives when the magic number is not present. |
gives when the magic number is not present. |
2278 |
|
|
2279 |
PCRE_ERROR_UNKNOWN_OPCODE (-5) |
PCRE_ERROR_UNKNOWN_OPCODE (-5) |
2280 |
|
|
2281 |
While running the pattern match, an unknown item was encountered in the |
While running the pattern match, an unknown item was encountered in the |
2282 |
compiled pattern. This error could be caused by a bug in PCRE or by |
compiled pattern. This error could be caused by a bug in PCRE or by |
2283 |
overwriting of the compiled pattern. |
overwriting of the compiled pattern. |
2284 |
|
|
2285 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
2286 |
|
|
2287 |
If a pattern contains back references, but the ovector that is passed |
If a pattern contains back references, but the ovector that is passed |
2288 |
to pcre_exec() is not big enough to remember the referenced substrings, |
to pcre_exec() is not big enough to remember the referenced substrings, |
2289 |
PCRE gets a block of memory at the start of matching to use for this |
PCRE gets a block of memory at the start of matching to use for this |
2290 |
purpose. If the call via pcre_malloc() fails, this error is given. The |
purpose. If the call via pcre_malloc() fails, this error is given. The |
2291 |
memory is automatically freed at the end of matching. |
memory is automatically freed at the end of matching. |
2292 |
|
|
2293 |
PCRE_ERROR_NOSUBSTRING (-7) |
PCRE_ERROR_NOSUBSTRING (-7) |
2294 |
|
|
2295 |
This error is used by the pcre_copy_substring(), pcre_get_substring(), |
This error is used by the pcre_copy_substring(), pcre_get_substring(), |
2296 |
and pcre_get_substring_list() functions (see below). It is never |
and pcre_get_substring_list() functions (see below). It is never |
2297 |
returned by pcre_exec(). |
returned by pcre_exec(). |
2298 |
|
|
2299 |
PCRE_ERROR_MATCHLIMIT (-8) |
PCRE_ERROR_MATCHLIMIT (-8) |
2300 |
|
|
2301 |
The backtracking limit, as specified by the match_limit field in a |
The backtracking limit, as specified by the match_limit field in a |
2302 |
pcre_extra structure (or defaulted), was reached. See the description |
pcre_extra structure (or defaulted), was reached. See the description |
2303 |
above. |
above. |
2304 |
|
|
2305 |
PCRE_ERROR_CALLOUT (-9) |
PCRE_ERROR_CALLOUT (-9) |
2306 |
|
|
2307 |
This error is never generated by pcre_exec() itself. It is provided for |
This error is never generated by pcre_exec() itself. It is provided for |
2308 |
use by callout functions that want to yield a distinctive error code. |
use by callout functions that want to yield a distinctive error code. |
2309 |
See the pcrecallout documentation for details. |
See the pcrecallout documentation for details. |
2310 |
|
|
2311 |
PCRE_ERROR_BADUTF8 (-10) |
PCRE_ERROR_BADUTF8 (-10) |
2312 |
|
|
2313 |
A string that contains an invalid UTF-8 byte sequence was passed as a |
A string that contains an invalid UTF-8 byte sequence was passed as a |
2314 |
subject. |
subject. |
2315 |
|
|
2316 |
PCRE_ERROR_BADUTF8_OFFSET (-11) |
PCRE_ERROR_BADUTF8_OFFSET (-11) |
2317 |
|
|
2318 |
The UTF-8 byte sequence that was passed as a subject was valid, but the |
The UTF-8 byte sequence that was passed as a subject was valid, but the |
2319 |
value of startoffset did not point to the beginning of a UTF-8 charac- |
value of startoffset did not point to the beginning of a UTF-8 charac- |
2320 |
ter. |
ter. |
2321 |
|
|
2322 |
PCRE_ERROR_PARTIAL (-12) |
PCRE_ERROR_PARTIAL (-12) |
2323 |
|
|
2324 |
The subject string did not match, but it did match partially. See the |
The subject string did not match, but it did match partially. See the |
2325 |
pcrepartial documentation for details of partial matching. |
pcrepartial documentation for details of partial matching. |
2326 |
|
|
2327 |
PCRE_ERROR_BADPARTIAL (-13) |
PCRE_ERROR_BADPARTIAL (-13) |
2328 |
|
|
2329 |
This code is no longer in use. It was formerly returned when the |
This code is no longer in use. It was formerly returned when the |
2330 |
PCRE_PARTIAL option was used with a compiled pattern containing items |
PCRE_PARTIAL option was used with a compiled pattern containing items |
2331 |
that were not supported for partial matching. From release 8.00 |
that were not supported for partial matching. From release 8.00 |
2332 |
onwards, there are no restrictions on partial matching. |
onwards, there are no restrictions on partial matching. |
2333 |
|
|
2334 |
PCRE_ERROR_INTERNAL (-14) |
PCRE_ERROR_INTERNAL (-14) |
2335 |
|
|
2336 |
An unexpected internal error has occurred. This error could be caused |
An unexpected internal error has occurred. This error could be caused |
2337 |
by a bug in PCRE or by overwriting of the compiled pattern. |
by a bug in PCRE or by overwriting of the compiled pattern. |
2338 |
|
|
2339 |
PCRE_ERROR_BADCOUNT (-15) |
PCRE_ERROR_BADCOUNT (-15) |
2343 |
PCRE_ERROR_RECURSIONLIMIT (-21) |
PCRE_ERROR_RECURSIONLIMIT (-21) |
2344 |
|
|
2345 |
The internal recursion limit, as specified by the match_limit_recursion |
The internal recursion limit, as specified by the match_limit_recursion |
2346 |
field in a pcre_extra structure (or defaulted), was reached. See the |
field in a pcre_extra structure (or defaulted), was reached. See the |
2347 |
description above. |
description above. |
2348 |
|
|
2349 |
PCRE_ERROR_BADNEWLINE (-23) |
PCRE_ERROR_BADNEWLINE (-23) |
2366 |
int pcre_get_substring_list(const char *subject, |
int pcre_get_substring_list(const char *subject, |
2367 |
int *ovector, int stringcount, const char ***listptr); |
int *ovector, int stringcount, const char ***listptr); |
2368 |
|
|
2369 |
Captured substrings can be accessed directly by using the offsets |
Captured substrings can be accessed directly by using the offsets |
2370 |
returned by pcre_exec() in ovector. For convenience, the functions |
returned by pcre_exec() in ovector. For convenience, the functions |
2371 |
pcre_copy_substring(), pcre_get_substring(), and pcre_get_sub- |
pcre_copy_substring(), pcre_get_substring(), and pcre_get_sub- |
2372 |
string_list() are provided for extracting captured substrings as new, |
string_list() are provided for extracting captured substrings as new, |
2373 |
separate, zero-terminated strings. These functions identify substrings |
separate, zero-terminated strings. These functions identify substrings |
2374 |
by number. The next section describes functions for extracting named |
by number. The next section describes functions for extracting named |
2375 |
substrings. |
substrings. |
2376 |
|
|
2377 |
A substring that contains a binary zero is correctly extracted and has |
A substring that contains a binary zero is correctly extracted and has |
2378 |
a further zero added on the end, but the result is not, of course, a C |
a further zero added on the end, but the result is not, of course, a C |
2379 |
string. However, you can process such a string by referring to the |
string. However, you can process such a string by referring to the |
2380 |
length that is returned by pcre_copy_substring() and pcre_get_sub- |
length that is returned by pcre_copy_substring() and pcre_get_sub- |
2381 |
string(). Unfortunately, the interface to pcre_get_substring_list() is |
string(). Unfortunately, the interface to pcre_get_substring_list() is |
2382 |
not adequate for handling strings containing binary zeros, because the |
not adequate for handling strings containing binary zeros, because the |
2383 |
end of the final string is not independently indicated. |
end of the final string is not independently indicated. |
2384 |
|
|
2385 |
The first three arguments are the same for all three of these func- |
The first three arguments are the same for all three of these func- |
2386 |
tions: subject is the subject string that has just been successfully |
tions: subject is the subject string that has just been successfully |
2387 |
matched, ovector is a pointer to the vector of integer offsets that was |
matched, ovector is a pointer to the vector of integer offsets that was |
2388 |
passed to pcre_exec(), and stringcount is the number of substrings that |
passed to pcre_exec(), and stringcount is the number of substrings that |
2389 |
were captured by the match, including the substring that matched the |
were captured by the match, including the substring that matched the |
2390 |
entire regular expression. This is the value returned by pcre_exec() if |
entire regular expression. This is the value returned by pcre_exec() if |
2391 |
it is greater than zero. If pcre_exec() returned zero, indicating that |
it is greater than zero. If pcre_exec() returned zero, indicating that |
2392 |
it ran out of space in ovector, the value passed as stringcount should |
it ran out of space in ovector, the value passed as stringcount should |
2393 |
be the number of elements in the vector divided by three. |
be the number of elements in the vector divided by three. |
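The rule for choosing stringcount can be sketched as a small function. effective_stringcount() is a hypothetical name, and rc is assumed to be the value that pcre_exec() returned.

```c
#include <assert.h>

/*
 * Hypothetical helper: derive the stringcount argument from the
 * pcre_exec() return value rc and the ovecsize that was passed in.
 * A negative rc is an error code and is handed back unchanged; it must
 * not be passed on to the extraction functions.
 */
static int effective_stringcount(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;              /* normal case: use the return value */
    if (rc == 0)
        return ovecsize / 3;    /* vector was too small: every pair is set */
    return rc;                  /* error: no substrings are available */
}
```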
2394 |
|
|
2395 |
The functions pcre_copy_substring() and pcre_get_substring() extract a |
The functions pcre_copy_substring() and pcre_get_substring() extract a |
2396 |
single substring, whose number is given as stringnumber. A value of |
single substring, whose number is given as stringnumber. A value of |
2397 |
zero extracts the substring that matched the entire pattern, whereas |
zero extracts the substring that matched the entire pattern, whereas |
2398 |
higher values extract the captured substrings. For pcre_copy_sub- |
higher values extract the captured substrings. For pcre_copy_sub- |
2399 |
string(), the string is placed in buffer, whose length is given by |
string(), the string is placed in buffer, whose length is given by |
2400 |
buffersize, while for pcre_get_substring() a new block of memory is |
buffersize, while for pcre_get_substring() a new block of memory is |
2401 |
obtained via pcre_malloc, and its address is returned via stringptr. |
obtained via pcre_malloc, and its address is returned via stringptr. |
2402 |
The yield of the function is the length of the string, not including |
The yield of the function is the length of the string, not including |
2403 |
the terminating zero, or one of these error codes: |
the terminating zero, or one of these error codes: |
2404 |
|
|
2405 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
2406 |
|
|
2407 |
The buffer was too small for pcre_copy_substring(), or the attempt to |
The buffer was too small for pcre_copy_substring(), or the attempt to |
2408 |
get memory failed for pcre_get_substring(). |
get memory failed for pcre_get_substring(). |
2409 |
|
|
2410 |
PCRE_ERROR_NOSUBSTRING (-7) |
PCRE_ERROR_NOSUBSTRING (-7) |
2411 |
|
|
2412 |
There is no substring whose number is stringnumber. |
There is no substring whose number is stringnumber. |
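The documented behaviour of pcre_copy_substring() can be imitated in a few lines. This is a sketch based only on the description above, not PCRE's source code; copy_substring_sketch() and the SKETCH_ constants are hypothetical names that reuse the documented values -6 and -7.

```c
#include <assert.h>
#include <string.h>

enum { SKETCH_NOMEMORY = -6, SKETCH_NOSUBSTRING = -7 };

/*
 * Hypothetical stand-in for pcre_copy_substring(): copy substring
 * number stringnumber into buffer and add a terminating zero.
 * Returns the length without the terminator, or a negative error code.
 */
static int copy_substring_sketch(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_NOSUBSTRING;
    int start = ovector[2 * stringnumber];
    int length = ovector[2 * stringnumber + 1] - start;
    if (buffersize < length + 1)
        return SKETCH_NOMEMORY;         /* no room for text plus zero */
    if (length > 0)                     /* unset pairs have length 0 */
        memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

Because the length is returned separately, a caller can still process a substring that contains binary zeros by using that length instead of relying on the terminator.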
2413 |
|
|
2414 |
The pcre_get_substring_list() function extracts all available sub- |
The pcre_get_substring_list() function extracts all available sub- |
2415 |
strings and builds a list of pointers to them. All this is done in a |
strings and builds a list of pointers to them. All this is done in a |
2416 |
single block of memory that is obtained via pcre_malloc. The address of |
single block of memory that is obtained via pcre_malloc. The address of |
2417 |
the memory block is returned via listptr, which is also the start of |
the memory block is returned via listptr, which is also the start of |
2418 |
the list of string pointers. The end of the list is marked by a NULL |
the list of string pointers. The end of the list is marked by a NULL |
2419 |
pointer. The yield of the function is zero if all went well, or the |
pointer. The yield of the function is zero if all went well, or the |
2420 |
error code |
error code |
2421 |
|
|
2422 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
2423 |
|
|
2424 |
if the attempt to get the memory block failed. |
if the attempt to get the memory block failed. |
2425 |
|
|
2426 |
When any of these functions encounters a substring that is unset, which |
When any of these functions encounters a substring that is unset, which |
2427 |
can happen when capturing subpattern number n+1 matches some part of |
can happen when capturing subpattern number n+1 matches some part of |
2428 |
the subject, but subpattern n has not been used at all, they return an |
the subject, but subpattern n has not been used at all, they return an |
2429 |
empty string. This can be distinguished from a genuine zero-length sub- |
empty string. This can be distinguished from a genuine zero-length sub- |
2430 |
string by inspecting the appropriate offset in ovector, which is nega- |
string by inspecting the appropriate offset in ovector, which is nega- |
2431 |
tive for unset substrings. |
tive for unset substrings. |
2432 |
|
|
2433 |
The two convenience functions pcre_free_substring() and pcre_free_sub- |
The two convenience functions pcre_free_substring() and pcre_free_sub- |
2434 |
string_list() can be used to free the memory returned by a previous |
string_list() can be used to free the memory returned by a previous |
2435 |
call of pcre_get_substring() or pcre_get_substring_list(), respec- |
call of pcre_get_substring() or pcre_get_substring_list(), respec- |
2436 |
tively. They do nothing more than call the function pointed to by |
tively. They do nothing more than call the function pointed to by |
2437 |
pcre_free, which of course could be called directly from a C program. |
pcre_free, which of course could be called directly from a C program. |
2438 |
However, PCRE is used in some situations where it is linked via a spe- |
However, PCRE is used in some situations where it is linked via a spe- |
2439 |
cial interface to another programming language that cannot use |
cial interface to another programming language that cannot use |
2440 |
pcre_free directly; it is for these cases that the functions are pro- |
pcre_free directly; it is for these cases that the functions are pro- |
2441 |
vided. |
vided. |
2442 |
|
|
2443 |
|
|
2456 |
int stringcount, const char *stringname, |
int stringcount, const char *stringname, |
2457 |
const char **stringptr); |
const char **stringptr); |
2458 |
|
|
2459 |
To extract a substring by name, you first have to find the associated |
To extract a substring by name, you first have to find the associated |
2460 |
number. For example, for this pattern |
number. For example, for this pattern |
2461 |
|
|
2462 |
(a+)b(?<xxx>\d+)... |
(a+)b(?<xxx>\d+)... |
2465 |
be unique (PCRE_DUPNAMES was not set), you can find the number from the |
be unique (PCRE_DUPNAMES was not set), you can find the number from the |
2466 |
name by calling pcre_get_stringnumber(). The first argument is the com- |
name by calling pcre_get_stringnumber(). The first argument is the com- |
2467 |
piled pattern, and the second is the name. The yield of the function is |
piled pattern, and the second is the name. The yield of the function is |
2468 |
the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no |
the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no |
2469 |
subpattern of that name. |
subpattern of that name. |
2470 |
|
|
2471 |
Given the number, you can extract the substring directly, or use one of |
Given the number, you can extract the substring directly, or use one of |
2472 |
the functions described in the previous section. For convenience, there |
the functions described in the previous section. For convenience, there |
2473 |
are also two functions that do the whole job. |
are also two functions that do the whole job. |
2474 |
|
|
2475 |
Most of the arguments of pcre_copy_named_substring() and |
Most of the arguments of pcre_copy_named_substring() and |
2476 |
pcre_get_named_substring() are the same as those for the similarly |
pcre_get_named_substring() are the same as those for the similarly |
2477 |
named functions that extract by number. As these are described in the |
named functions that extract by number. As these are described in the |
2478 |
previous section, they are not re-described here. There are just two |
previous section, they are not re-described here. There are just two |
2479 |
differences: |
differences: |
2480 |
|
|
2481 |
First, instead of a substring number, a substring name is given. Sec- |
First, instead of a substring number, a substring name is given. Sec- |
2482 |
ond, there is an extra argument, given at the start, which is a pointer |
ond, there is an extra argument, given at the start, which is a pointer |
2483 |
to the compiled pattern. This is needed in order to gain access to the |
to the compiled pattern. This is needed in order to gain access to the |
2484 |
name-to-number translation table. |
name-to-number translation table. |
2485 |
|
|
2486 |
These functions call pcre_get_stringnumber(), and if it succeeds, they |
These functions call pcre_get_stringnumber(), and if it succeeds, they |
2487 |
then call pcre_copy_substring() or pcre_get_substring(), as appropri- |
then call pcre_copy_substring() or pcre_get_substring(), as appropri- |
2488 |
ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate names, the |
ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate names, the |
2489 |
behaviour may not be what you want (see the next section). |
behaviour may not be what you want (see the next section). |
2490 |
|
|
2491 |
Warning: If the pattern uses the (?| feature to set up multiple subpat- |
Warning: If the pattern uses the (?| feature to set up multiple subpat- |
2492 |
terns with the same number, as described in the section on duplicate |
terns with the same number, as described in the section on duplicate |
2493 |
subpattern numbers in the pcrepattern page, you cannot use names to |
subpattern numbers in the pcrepattern page, you cannot use names to |
2494 |
distinguish the different subpatterns, because names are not included |
distinguish the different subpatterns, because names are not included |
2495 |
in the compiled code. The matching process uses only numbers. For this |
in the compiled code. The matching process uses only numbers. For this |
2496 |
reason, the use of different names for subpatterns of the same number |
reason, the use of different names for subpatterns of the same number |
2497 |
causes an error at compile time. |
causes an error at compile time. |
2498 |
|
|
2499 |
|
|
2502 |
int pcre_get_stringtable_entries(const pcre *code, |
int pcre_get_stringtable_entries(const pcre *code, |
2503 |
const char *name, char **first, char **last); |
const char *name, char **first, char **last); |
2504 |
|
|
2505 |
When a pattern is compiled with the PCRE_DUPNAMES option, names for |
When a pattern is compiled with the PCRE_DUPNAMES option, names for |
2506 |
subpatterns are not required to be unique. (Duplicate names are always |
subpatterns are not required to be unique. (Duplicate names are always |
2507 |
allowed for subpatterns with the same number, created by using the (?| |
allowed for subpatterns with the same number, created by using the (?| |
2508 |
feature. Indeed, if such subpatterns are named, they are required to |
feature. Indeed, if such subpatterns are named, they are required to |
2509 |
use the same names.) |
use the same names.) |
2510 |
|
|
2511 |
Normally, patterns with duplicate names are such that in any one match, |
Normally, patterns with duplicate names are such that in any one match, |
2512 |
only one of the named subpatterns participates. An example is shown in |
only one of the named subpatterns participates. An example is shown in |
2513 |
the pcrepattern documentation. |
the pcrepattern documentation. |
2514 |
|
|
2515 |
When duplicates are present, pcre_copy_named_substring() and |
When duplicates are present, pcre_copy_named_substring() and |
2516 |
pcre_get_named_substring() return the first substring corresponding to |
pcre_get_named_substring() return the first substring corresponding to |
2517 |
the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING |
the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING |
2518 |
(-7) is returned; no data is returned. The pcre_get_stringnumber() |
(-7) is returned; no data is returned. The pcre_get_stringnumber() |
2519 |
function returns one of the numbers that are associated with the name, |
function returns one of the numbers that are associated with the name, |
2520 |
but it is not defined which it is. |
but it is not defined which it is. |
2521 |
|
|
2522 |
If you want to get full details of all captured substrings for a given |
If you want to get full details of all captured substrings for a given |
2523 |
name, you must use the pcre_get_stringtable_entries() function. The |
name, you must use the pcre_get_stringtable_entries() function. The |
2524 |
first argument is the compiled pattern, and the second is the name. The |
first argument is the compiled pattern, and the second is the name. The |
2525 |
third and fourth are pointers to variables which are updated by the |
third and fourth are pointers to variables which are updated by the |
2526 |
function. After it has run, they point to the first and last entries in |
function. After it has run, they point to the first and last entries in |
2527 |
the name-to-number table for the given name. The function itself |
the name-to-number table for the given name. The function itself |
2528 |
returns the length of each entry, or PCRE_ERROR_NOSUBSTRING (-7) if |
returns the length of each entry, or PCRE_ERROR_NOSUBSTRING (-7) if |
2529 |
there are none. The format of the table is described above in the sec- |
there are none. The format of the table is described above in the sec- |
2530 |
tion entitled Information about a pattern. Given all the relevant |
tion entitled Information about a pattern. Given all the relevant |
2531 |
entries for the name, you can extract each of their numbers, and hence |
entries for the name, you can extract each of their numbers, and hence |
2532 |
the captured data, if any. |
the captured data, if any. |
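
The walk over the relevant table entries can be sketched as follows. This is a minimal illustration, not part of PCRE: the helper name collect_group_numbers() is hypothetical, and it assumes the 8-bit table layout in which each entry starts with a two-byte group number, most significant byte first, followed by the zero-terminated name. The first, last, and entry_size parameters stand in for the two pointers that pcre_get_stringtable_entries() updates and for its return value.

```c
#include <assert.h>

/*
 * Hypothetical helper: collect the group numbers stored in a run of
 * name-table entries. Assumed layout (8-bit library): two bytes holding
 * the group number, most significant byte first, then the name,
 * zero-terminated. Returns the number of entries seen; at most `max`
 * numbers are stored in `numbers`.
 */
static int collect_group_numbers(const char *first, const char *last,
                                 int entry_size, int *numbers, int max)
{
    int count = 0;
    const char *entry;

    assert(entry_size > 2 && first <= last);
    for (entry = first; entry <= last; entry += entry_size) {
        const unsigned char *p = (const unsigned char *)entry;
        if (count < max)
            numbers[count] = (p[0] << 8) | p[1];   /* big-endian number */
        count++;
    }
    return count;
}
```

Given a group number n obtained this way, the captured data after a pcre_exec() call would be looked up at ovector[2*n] and ovector[2*n+1]; a group that did not participate in the match has both offsets set to -1.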
2533 |
|
|
2534 |
|
|
2535 |
FINDING ALL POSSIBLE MATCHES |
FINDING ALL POSSIBLE MATCHES |
2536 |
|
|
2537 |
The traditional matching function uses a similar algorithm to Perl, |
The traditional matching function uses a similar algorithm to Perl, |
2538 |
which stops when it finds the first match, starting at a given point in |
which stops when it finds the first match, starting at a given point in |
2539 |
the subject. If you want to find all possible matches, or the longest |
the subject. If you want to find all possible matches, or the longest |
2540 |
possible match, consider using the alternative matching function (see |
possible match, consider using the alternative matching function (see |
2541 |
below) instead. If you cannot use the alternative function, but still |
below) instead. If you cannot use the alternative function, but still |
2542 |
need to find all possible matches, you can kludge it up by making use |
need to find all possible matches, you can kludge it up by making use |
2543 |
of the callout facility, which is described in the pcrecallout documen- |
of the callout facility, which is described in the pcrecallout documen- |
2544 |
tation. |
tation. |
2545 |
|
|
2546 |
What you have to do is to insert a callout right at the end of the pat- |
What you have to do is to insert a callout right at the end of the pat- |
2547 |
tern. When your callout function is called, extract and save the cur- |
tern. When your callout function is called, extract and save the cur- |
2548 |
rent matched substring. Then return 1, which forces pcre_exec() to |
rent matched substring. Then return 1, which forces pcre_exec() to |
2549 |
backtrack and try other alternatives. Ultimately, when it runs out of |
backtrack and try other alternatives. Ultimately, when it runs out of |
2550 |
matches, pcre_exec() will yield PCRE_ERROR_NOMATCH. |
matches, pcre_exec() will yield PCRE_ERROR_NOMATCH. |
2551 |
|
|
2552 |
|
|
2557 |
int options, int *ovector, int ovecsize, |
int options, int *ovector, int ovecsize, |
2558 |
int *workspace, int wscount); |
int *workspace, int wscount); |
2559 |
|
|
2560 |
The function pcre_dfa_exec() is called to match a subject string |
The function pcre_dfa_exec() is called to match a subject string |
2561 |
against a compiled pattern, using a matching algorithm that scans the |
against a compiled pattern, using a matching algorithm that scans the |
2562 |
subject string just once, and does not backtrack. This has different |
subject string just once, and does not backtrack. This has different |
2563 |
characteristics to the normal algorithm, and is not compatible with |
characteristics to the normal algorithm, and is not compatible with |
2564 |
Perl. Some of the features of PCRE patterns are not supported. Never- |
Perl. Some of the features of PCRE patterns are not supported. Never- |
2565 |
theless, there are times when this kind of matching can be useful. For |
theless, there are times when this kind of matching can be useful. For |
2566 |
a discussion of the two matching algorithms, and a list of features |
a discussion of the two matching algorithms, and a list of features |
2567 |
that pcre_dfa_exec() does not support, see the pcrematching documenta- |
that pcre_dfa_exec() does not support, see the pcrematching documenta- |
2568 |
tion. |
tion. |
2569 |
|
|
2570 |
The arguments for the pcre_dfa_exec() function are the same as for |
The arguments for the pcre_dfa_exec() function are the same as for |
2571 |
pcre_exec(), plus two extras. The ovector argument is used in a differ- |
pcre_exec(), plus two extras. The ovector argument is used in a differ- |
2572 |
ent way, and this is described below. The other common arguments are |
ent way, and this is described below. The other common arguments are |
2573 |
used in the same way as for pcre_exec(), so their description is not |
used in the same way as for pcre_exec(), so their description is not |
2574 |
repeated here. |
repeated here. |
2575 |
|
|
2576 |
The two additional arguments provide workspace for the function. The |
The two additional arguments provide workspace for the function. The |
2577 |
workspace vector should contain at least 20 elements. It is used for |
workspace vector should contain at least 20 elements. It is used for |
2578 |
keeping track of multiple paths through the pattern tree. More |
keeping track of multiple paths through the pattern tree. More |
2579 |
workspace will be needed for patterns and subjects where there are a |
workspace will be needed for patterns and subjects where there are a |
2580 |
lot of potential matches. |
lot of potential matches. |
2581 |
|
|
2582 |
Here is an example of a simple call to pcre_dfa_exec(): |
Here is an example of a simple call to pcre_dfa_exec(): |
2598 |
|
|
2599 |
Option bits for pcre_dfa_exec() |
Option bits for pcre_dfa_exec() |
2600 |
|
|
2601 |
The unused bits of the options argument for pcre_dfa_exec() must be |
The unused bits of the options argument for pcre_dfa_exec() must be |
2602 |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEW- |
zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEW- |
2603 |
LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, |
LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, |
2604 |
PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, PCRE_PAR- |
PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, PCRE_PAR- |
2605 |
TIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last |
TIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last |
2606 |
four of these are exactly the same as for pcre_exec(), so their |
four of these are exactly the same as for pcre_exec(), so their |
2607 |
description is not repeated here. |
description is not repeated here. |
2608 |
|
|
2609 |
PCRE_PARTIAL_HARD |
PCRE_PARTIAL_HARD |
2610 |
PCRE_PARTIAL_SOFT |
PCRE_PARTIAL_SOFT |
2611 |
|
|
2612 |
These have the same general effect as they do for pcre_exec(), but the |
These have the same general effect as they do for pcre_exec(), but the |
2613 |
details are slightly different. When PCRE_PARTIAL_HARD is set for |
details are slightly different. When PCRE_PARTIAL_HARD is set for |
2614 |
pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of the sub- |
pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of the sub- |
2615 |
ject is reached and there is still at least one matching possibility |
ject is reached and there is still at least one matching possibility |
2616 |
that requires additional characters. This happens even if some complete |
that requires additional characters. This happens even if some complete |
2617 |
matches have also been found. When PCRE_PARTIAL_SOFT is set, the return |
matches have also been found. When PCRE_PARTIAL_SOFT is set, the return |
2618 |
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end |
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end |
2619 |
of the subject is reached, there have been no complete matches, but |
of the subject is reached, there have been no complete matches, but |
2620 |
there is still at least one matching possibility. The portion of the |
there is still at least one matching possibility. The portion of the |
2621 |
string that was inspected when the longest partial match was found is |
string that was inspected when the longest partial match was found is |
2622 |
set as the first matching string in both cases. |
set as the first matching string in both cases. |
2623 |
|
|
2624 |
PCRE_DFA_SHORTEST |
PCRE_DFA_SHORTEST |
2625 |
|
|
2626 |
Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to |
Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to |
2627 |
stop as soon as it has found one match. Because of the way the alterna- |
stop as soon as it has found one match. Because of the way the alterna- |
2628 |
tive algorithm works, this is necessarily the shortest possible match |
tive algorithm works, this is necessarily the shortest possible match |
2629 |
at the first possible matching point in the subject string. |
at the first possible matching point in the subject string. |
2630 |
|
|
2631 |
PCRE_DFA_RESTART |
PCRE_DFA_RESTART |
2632 |
|
|
2633 |
When pcre_dfa_exec() returns a partial match, it is possible to call it |
When pcre_dfa_exec() returns a partial match, it is possible to call it |
2634 |
again, with additional subject characters, and have it continue with |
again, with additional subject characters, and have it continue with |
2635 |
the same match. The PCRE_DFA_RESTART option requests this action; when |
the same match. The PCRE_DFA_RESTART option requests this action; when |
2636 |
it is set, the workspace and wscount arguments must reference the same |
it is set, the workspace and wscount arguments must reference the same |
2637 |
vector as before because data about the match so far is left in them |
vector as before because data about the match so far is left in them |
2638 |
after a partial match. There is more discussion of this facility in the |
after a partial match. There is more discussion of this facility in the |
2639 |
pcrepartial documentation. |
pcrepartial documentation. |
2640 |
|
|
2641 |
Successful returns from pcre_dfa_exec() |
Successful returns from pcre_dfa_exec() |
2642 |
|
|
2643 |
When pcre_dfa_exec() succeeds, it may have matched more than one sub- |
When pcre_dfa_exec() succeeds, it may have matched more than one sub- |
2644 |
string in the subject. Note, however, that all the matches from one run |
string in the subject. Note, however, that all the matches from one run |
2645 |
of the function start at the same point in the subject. The shorter |
of the function start at the same point in the subject. The shorter |
2646 |
matches are all initial substrings of the longer matches. For example, |
matches are all initial substrings of the longer matches. For example, |
2647 |
if the pattern |
if the pattern |
2648 |
|
|
2649 |
<.*> |
<.*> |
2658 |
<something> <something else> |
<something> <something else> |
2659 |
<something> <something else> <something further> |
<something> <something else> <something further> |
2660 |
|
|
2661 |
On success, the yield of the function is a number greater than zero, |
On success, the yield of the function is a number greater than zero, |
2662 |
which is the number of matched substrings. The substrings themselves |
which is the number of matched substrings. The substrings themselves |
2663 |
are returned in ovector. Each string uses two elements; the first is |
are returned in ovector. Each string uses two elements; the first is |
2664 |
the offset to the start, and the second is the offset to the end. In |
the offset to the start, and the second is the offset to the end. In |
2665 |
fact, all the strings have the same start offset. (Space could have |
fact, all the strings have the same start offset. (Space could have |
2666 |
been saved by giving this only once, but it was decided to retain some |
been saved by giving this only once, but it was decided to retain some |
2667 |
compatibility with the way pcre_exec() returns data, even though the |
compatibility with the way pcre_exec() returns data, even though the |
2668 |
meaning of the strings is different.) |
meaning of the strings is different.) |
2669 |
|
|
2670 |
The strings are returned in reverse order of length; that is, the long- |
The strings are returned in reverse order of length; that is, the long- |
2671 |
est matching string is given first. If there were too many matches to |
est matching string is given first. If there were too many matches to |
2672 |
fit into ovector, the yield of the function is zero, and the vector is |
fit into ovector, the yield of the function is zero, and the vector is |
2673 |
filled with the longest matches. |
filled with the longest matches. |
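
Reading the results back out of ovector might look like the sketch below. It is an illustration under stated assumptions rather than code from PCRE: the helper names are hypothetical, rc stands in for the yield of pcre_dfa_exec(), and a zero yield is taken to mean that ovecsize / 2 pairs (the longest matches) were stored.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: length of match number `i` (0 is the longest),
 * with ovector holding pairs of start and end offsets.
 */
static int dfa_match_length(const int *ovector, int i)
{
    return ovector[2 * i + 1] - ovector[2 * i];
}

/*
 * Hypothetical helper: print every match described by the vector,
 * longest first. When `rc` is zero the vector overflowed, so it is
 * assumed to hold ovecsize / 2 pairs. Returns the number printed.
 */
static int print_dfa_matches(const char *subject, const int *ovector,
                             int ovecsize, int rc)
{
    int count = (rc == 0) ? ovecsize / 2 : rc;
    int i;

    assert(rc >= 0);
    for (i = 0; i < count; i++)
        printf("%d: %.*s\n", i, dfa_match_length(ovector, i),
               subject + ovector[2 * i]);
    return count;
}
```

All pairs share the same start offset, so only the end offsets differ from one match to the next.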
2674 |
|
|
2675 |
Error returns from pcre_dfa_exec() |
Error returns from pcre_dfa_exec() |
2676 |
|
|
2677 |
The pcre_dfa_exec() function returns a negative number when it fails. |
The pcre_dfa_exec() function returns a negative number when it fails. |
2678 |
Many of the errors are the same as for pcre_exec(), and these are |
Many of the errors are the same as for pcre_exec(), and these are |
2679 |
described above. There are in addition the following errors that are |
described above. There are in addition the following errors that are |
2680 |
specific to pcre_dfa_exec(): |
specific to pcre_dfa_exec(): |
2681 |
|
|
2682 |
PCRE_ERROR_DFA_UITEM (-16) |
PCRE_ERROR_DFA_UITEM (-16) |
2683 |
|
|
2684 |
This return is given if pcre_dfa_exec() encounters an item in the pat- |
This return is given if pcre_dfa_exec() encounters an item in the pat- |
2685 |
tern that it does not support, for instance, the use of \C or a back |
tern that it does not support, for instance, the use of \C or a back |
2686 |
reference. |
reference. |
2687 |
|
|
2688 |
PCRE_ERROR_DFA_UCOND (-17) |
PCRE_ERROR_DFA_UCOND (-17) |
2689 |
|
|
2690 |
This return is given if pcre_dfa_exec() encounters a condition item |
This return is given if pcre_dfa_exec() encounters a condition item |
2691 |
that uses a back reference for the condition, or a test for recursion |
that uses a back reference for the condition, or a test for recursion |
2692 |
in a specific group. These are not supported. |
in a specific group. These are not supported. |
2693 |
|
|
2694 |
PCRE_ERROR_DFA_UMLIMIT (-18) |
PCRE_ERROR_DFA_UMLIMIT (-18) |
2695 |
|
|
2696 |
This return is given if pcre_dfa_exec() is called with an extra block |
This return is given if pcre_dfa_exec() is called with an extra block |
2697 |
that contains a setting of the match_limit field. This is not supported |
that contains a setting of the match_limit field. This is not supported |
2698 |
(it is meaningless). |
(it is meaningless). |
2699 |
|
|
2700 |
PCRE_ERROR_DFA_WSSIZE (-19) |
PCRE_ERROR_DFA_WSSIZE (-19) |
2701 |
|
|
2702 |
This return is given if pcre_dfa_exec() runs out of space in the |
This return is given if pcre_dfa_exec() runs out of space in the |
2703 |
workspace vector. |
workspace vector. |
2704 |
|
|
2705 |
PCRE_ERROR_DFA_RECURSE (-20) |
PCRE_ERROR_DFA_RECURSE (-20) |
2706 |
|
|
2707 |
When a recursive subpattern is processed, the matching function calls |
When a recursive subpattern is processed, the matching function calls |
2708 |
itself recursively, using private vectors for ovector and workspace. |
itself recursively, using private vectors for ovector and workspace. |
2709 |
This error is given if the output vector is not large enough. This |
This error is given if the output vector is not large enough. This |
2710 |
should be extremely rare, as a vector of size 1000 is used. |
should be extremely rare, as a vector of size 1000 is used. |
2711 |
|
|
2712 |
|
|
2713 |
SEE ALSO |
SEE ALSO |
2714 |
|
|
2715 |
pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepar- |
pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepar- |
2716 |
tial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3). |
tial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3). |
2717 |
|
|
2718 |
|
|
2725 |
|
|
2726 |
REVISION |
REVISION |
2727 |
|
|
2728 |
Last updated: 03 October 2009 |
Last updated: 26 March 2010 |
2729 |
Copyright (c) 1997-2009 University of Cambridge. |
Copyright (c) 1997-2010 University of Cambridge. |
2730 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
2731 |
|
|
2732 |
|
|
5161 |
tested. |
tested. |
5162 |
|
|
5163 |
The new verbs make use of what was previously invalid syntax: an open- |
The new verbs make use of what was previously invalid syntax: an open- |
5164 |
ing parenthesis followed by an asterisk. In Perl, they are generally of |
ing parenthesis followed by an asterisk. They are generally of the form |
5165 |
the form (*VERB:ARG) but PCRE does not support the use of arguments, so |
(*VERB) or (*VERB:NAME). Some may take either form, with differing be- |
5166 |
its general form is just (*VERB). Any number of these verbs may occur |
haviour, depending on whether or not an argument is present. A name is |
5167 |
in a pattern. There are two kinds: |
a sequence of letters, digits, and underscores. If the name is empty, |
5168 |
|
that is, if the closing parenthesis immediately follows the colon, the |
5169 |
|
effect is as if the colon were not there. Any number of these verbs may |
5170 |
|
occur in a pattern. |
5171 |
|
|
5172 |
|
PCRE contains some optimizations that are used to speed up matching by |
5173 |
|
running some checks at the start of each match attempt. For example, it |
5174 |
|
may know the minimum length of matching subject, or that a particular |
5175 |
|
character must be present. When one of these optimizations suppresses |
5176 |
|
the running of a match, any included backtracking verbs will not, of |
5177 |
|
course, be processed. You can suppress the start-of-match optimizations |
5178 |
|
by setting the PCRE_NO_START_OPTIMIZE option when calling pcre_exec(). |
5179 |
|
|
5180 |
Verbs that act immediately |
Verbs that act immediately |
5181 |
|
|
5182 |
The following verbs act as soon as they are encountered: |
The following verbs act as soon as they are encountered. They may not |
5183 |
|
be followed by a name. |
5184 |
|
|
5185 |
(*ACCEPT) |
(*ACCEPT) |
5186 |
|
|
5209 |
A match with the string "aaaa" always fails, but the callout is taken |
A match with the string "aaaa" always fails, but the callout is taken |
5210 |
before each backtrack happens (in this example, 10 times). |
before each backtrack happens (in this example, 10 times). |
5211 |
|
|
5212 |
|
Recording which path was taken |
5213 |
|
|
5214 |
|
There is one verb whose main purpose is to track how a match was |
5215 |
|
arrived at, though it also has a secondary use in conjunction with |
5216 |
|
advancing the match starting point (see (*SKIP) below). |
5217 |
|
|
5218 |
|
(*MARK:NAME) or (*:NAME) |
5219 |
|
|
5220 |
|
A name is always required with this verb. There may be as many |
5221 |
|
instances of (*MARK) as you like in a pattern, and their names do not |
5222 |
|
have to be unique. |
5223 |
|
|
5224 |
|
When a match succeeds, the name of the last-encountered (*MARK) is |
5225 |
|
passed back to the caller via the pcre_extra data structure, as |
5226 |
|
described in the section on pcre_extra in the pcreapi documentation. No |
5227 |
|
data is returned for a partial match. Here is an example of pcretest |
5228 |
|
output, where the /K modifier requests the retrieval and outputting of |
5229 |
|
(*MARK) data: |
5230 |
|
|
5231 |
|
/X(*MARK:A)Y|X(*MARK:B)Z/K |
5232 |
|
XY |
5233 |
|
0: XY |
5234 |
|
MK: A |
5235 |
|
XZ |
5236 |
|
0: XZ |
5237 |
|
MK: B |
5238 |
|
|
5239 |
|
The (*MARK) name is tagged with "MK:" in this output, and in this exam- |
5240 |
|
ple it indicates which of the two alternatives matched. This is a more |
5241 |
|
efficient way of obtaining this information than putting each alterna- |
5242 |
|
tive in its own capturing parentheses. |
5243 |
|
|
5244 |
|
A name may also be returned after a failed match if the final path |
5245 |
|
through the pattern involves (*MARK). However, unless (*MARK) is used in |
5246 |
|
conjunction with (*COMMIT), this is unlikely to happen for an unan- |
5247 |
|
chored pattern because, as the starting point for matching is advanced, |
5248 |
|
the final check is often with an empty string, causing a failure before |
5249 |
|
(*MARK) is reached. For example: |
5250 |
|
|
5251 |
|
/X(*MARK:A)Y|X(*MARK:B)Z/K |
5252 |
|
XP |
5253 |
|
No match |
5254 |
|
|
5255 |
|
There are three potential starting points for this match (starting with |
5256 |
|
X, starting with P, and with an empty string). If the pattern is |
5257 |
|
anchored, the result is different: |
5258 |
|
|
5259 |
|
/^X(*MARK:A)Y|^X(*MARK:B)Z/K |
5260 |
|
XP |
5261 |
|
No match, mark = B |
5262 |
|
|
5263 |
|
PCRE's start-of-match optimizations can also interfere with this. For |
5264 |
|
example, if, as a result of a call to pcre_study(), it knows the mini- |
5265 |
|
mum subject length for a match, a shorter subject will not be scanned |
5266 |
|
at all. |
5267 |
|
|
5268 |
|
Note that similar anomalies (though different in detail) exist in Perl, |
5269 |
|
no doubt for the same reasons. The use of (*MARK) data after a failed |
5270 |
|
match of an unanchored pattern is not recommended, unless (*COMMIT) is |
5271 |
|
involved. |
5272 |
|
|
5273 |
Verbs that act after backtracking |
Verbs that act after backtracking |
5274 |
|
|
5275 |
The following verbs do nothing when they are encountered. Matching con- |
The following verbs do nothing when they are encountered. Matching con- |
5276 |
tinues with what follows, but if there is no subsequent match, a fail- |
tinues with what follows, but if there is no subsequent match, causing |
5277 |
ure is forced. The verbs differ in exactly what kind of failure |
a backtrack to the verb, a failure is forced. That is, backtracking |
5278 |
occurs. |
cannot pass to the left of the verb. However, when one of these verbs |
5279 |
|
appears inside an atomic group, its effect is confined to that group, |
5280 |
|
because once the group has been matched, there is never any backtrack- |
5281 |
|
ing into it. In this situation, backtracking can "jump back" to the |
5282 |
|
left of the entire atomic group. (Remember also, as stated above, that |
5283 |
|
this localization also applies in subroutine calls and assertions.) |
5284 |
|
|
5285 |
|
These verbs differ in exactly what kind of failure occurs when back- |
5286 |
|
tracking reaches them. |
5287 |
|
|
5288 |
(*COMMIT) |
(*COMMIT) |
5289 |
|
|
5290 |
This verb causes the whole match to fail outright if the rest of the |
This verb, which may not be followed by a name, causes the whole match |
5291 |
pattern does not match. Even if the pattern is unanchored, no further |
to fail outright if the rest of the pattern does not match. Even if the |
5292 |
attempts to find a match by advancing the starting point take place. |
pattern is unanchored, no further attempts to find a match by advancing |
5293 |
Once (*COMMIT) has been passed, pcre_exec() is committed to finding a |
the starting point take place. Once (*COMMIT) has been passed, |
5294 |
match at the current starting point, or not at all. For example: |
pcre_exec() is committed to finding a match at the current starting |
5295 |
|
point, or not at all. For example: |
5296 |
|
|
5297 |
a+(*COMMIT)b |
a+(*COMMIT)b |
5298 |
|
|
5299 |
This matches "xxaab" but not "aacaab". It can be thought of as a kind |
This matches "xxaab" but not "aacaab". It can be thought of as a kind |
5300 |
of dynamic anchor, or "I've started, so I must finish." |
of dynamic anchor, or "I've started, so I must finish." The name of the |
5301 |
|
most recently passed (*MARK) in the path is passed back when (*COMMIT) |
5302 |
(*PRUNE) |
forces a match failure. |
5303 |
|
|
5304 |
This verb causes the match to fail at the current position if the rest |
Note that (*COMMIT) at the start of a pattern is not the same as an |
5305 |
of the pattern does not match. If the pattern is unanchored, the normal |
anchor, unless PCRE's start-of-match optimizations are turned off, as |
5306 |
"bumpalong" advance to the next starting character then happens. Back- |
shown in this pcretest example: |
5307 |
tracking can occur as usual to the left of (*PRUNE), or when matching |
|
5308 |
to the right of (*PRUNE), but if there is no match to the right, back- |
/(*COMMIT)abc/ |
5309 |
tracking cannot cross (*PRUNE). In simple cases, the use of (*PRUNE) |
xyzabc |
5310 |
is just an alternative to an atomic group or possessive quantifier, but |
0: abc |
5311 |
there are some uses of (*PRUNE) that cannot be expressed in any other |
xyzabc\Y |
5312 |
way. |
No match |
5313 |
|
|
5314 |
|
PCRE knows that any match must start with "a", so the optimization |
5315 |
|
skips along the subject to "a" before running the first match attempt, |
5316 |
|
which succeeds. When the optimization is disabled by the \Y escape in |
5317 |
|
the second subject, the match starts at "x" and so the (*COMMIT) causes |
5318 |
|
it to fail without trying any other starting points. |
5319 |
|
|
5320 |
|
(*PRUNE) or (*PRUNE:NAME) |
5321 |
|
|
5322 |
|
This verb causes the match to fail at the current starting position in |
5323 |
|
the subject if the rest of the pattern does not match. If the pattern |
5324 |
|
is unanchored, the normal "bumpalong" advance to the next starting |
5325 |
|
character then happens. Backtracking can occur as usual to the left of |
5326 |
|
(*PRUNE), before it is reached, or when matching to the right of |
5327 |
|
(*PRUNE), but if there is no match to the right, backtracking cannot |
5328 |
|
cross (*PRUNE). In simple cases, the use of (*PRUNE) is just an alter- |
5329 |
|
native to an atomic group or possessive quantifier, but there are some |
5330 |
|
uses of (*PRUNE) that cannot be expressed in any other way. The behav- |
5331 |
|
iour of (*PRUNE:NAME) is the same as (*MARK:NAME)(*PRUNE) when the |
5332 |
|
match fails completely; the name is passed back if this is the final |
5333 |
|
attempt. (*PRUNE:NAME) does not pass back a name if the match suc- |
5334 |
|
ceeds. In an anchored pattern (*PRUNE) has the same effect as (*COM- |
5335 |
|
MIT). |
5336 |
|
|
5337 |
(*SKIP) |
(*SKIP) |
5338 |
|
|
5339 |
This verb is like (*PRUNE), except that if the pattern is unanchored, |
This verb, when given without a name, is like (*PRUNE), except that if |
5340 |
the "bumpalong" advance is not to the next character, but to the posi- |
the pattern is unanchored, the "bumpalong" advance is not to the next |
5341 |
tion in the subject where (*SKIP) was encountered. (*SKIP) signifies |
character, but to the position in the subject where (*SKIP) was encoun- |
5342 |
that whatever text was matched leading up to it cannot be part of a |
tered. (*SKIP) signifies that whatever text was matched leading up to |
5343 |
successful match. Consider: |
it cannot be part of a successful match. Consider: |
5344 |
|
|
5345 |
a+(*SKIP)b |
a+(*SKIP)b |
5346 |
|
|
5347 |
If the subject is "aaaac...", after the first match attempt fails |
If the subject is "aaaac...", after the first match attempt fails |
5348 |
(starting at the first character in the string), the starting point |
(starting at the first character in the string), the starting point |
5349 |
skips on to start the next attempt at "c". Note that a possessive quan- |
skips on to start the next attempt at "c". Note that a possessive quan- |
5350 |
tifier does not have the same effect as this example; although it would |
tifier does not have the same effect as this example; although it would |
5351 |
suppress backtracking during the first match attempt, the second |
suppress backtracking during the first match attempt, the second |
5352 |
attempt would start at the second character instead of skipping on to |
attempt would start at the second character instead of skipping on to |
5353 |
"c". |
"c". |
5354 |
|
|
5355 |
(*THEN) |
(*SKIP:NAME) |
5356 |
|
|
5357 |
|
When (*SKIP) has an associated name, its behaviour is modified. If the |
5358 |
|
following pattern fails to match, the previous path through the pattern |
5359 |
|
is searched for the most recent (*MARK) that has the same name. If one |
5360 |
|
is found, the "bumpalong" advance is to the subject position that cor- |
5361 |
|
responds to that (*MARK) instead of to where (*SKIP) was encountered. |
5362 |
|
If no (*MARK) with a matching name is found, normal "bumpalong" of one |
5363 |
|
character happens (the (*SKIP) is ignored). |
5364 |
|
|
5365 |
|
(*THEN) or (*THEN:NAME) |
5366 |
|
|
5367 |
This verb causes a skip to the next alternation if the rest of the pat- |
This verb causes a skip to the next alternation if the rest of the pat- |
5368 |
tern does not match. That is, it cancels pending backtracking, but only |
tern does not match. That is, it cancels pending backtracking, but only |
5369 |
within the current alternation. Its name comes from the observation |
within the current alternation. Its name comes from the observation |
5370 |
that it can be used for a pattern-based if-then-else block: |
that it can be used for a pattern-based if-then-else block: |
5371 |
|
|
5372 |
( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ... |
( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ... |
5373 |
|
|
5374 |
If the COND1 pattern matches, FOO is tried (and possibly further items |
If the COND1 pattern matches, FOO is tried (and possibly further items |
5375 |
after the end of the group if FOO succeeds); on failure the matcher |
after the end of the group if FOO succeeds); on failure the matcher |
5376 |
skips to the second alternative and tries COND2, without backtracking |
skips to the second alternative and tries COND2, without backtracking |
5377 |
into COND1. If (*THEN) is used outside of any alternation, it acts |
into COND1. The behaviour of (*THEN:NAME) is exactly the same as |
5378 |
exactly like (*PRUNE). |
(*MARK:NAME)(*THEN) if the overall match fails. If (*THEN) is not |
5379 |
|
directly inside an alternation, it acts like (*PRUNE). |
5380 |
|
|
5381 |
|
|
5382 |
SEE ALSO |
SEE ALSO |
5393 |
|
|
5394 |
REVISION |
REVISION |
5395 |
|
|
5396 |
Last updated: 06 March 2010 |
Last updated: 27 March 2010 |
5397 |
Copyright (c) 1997-2010 University of Cambridge. |
Copyright (c) 1997-2010 University of Cambridge. |
5398 |
------------------------------------------------------------------------------ |
------------------------------------------------------------------------------ |
5399 |
|
|