32 |
<li><a name="TOC17" href="#SEC17">DUPLICATE SUBPATTERN NAMES</a> |
<li><a name="TOC17" href="#SEC17">DUPLICATE SUBPATTERN NAMES</a> |
33 |
<li><a name="TOC18" href="#SEC18">FINDING ALL POSSIBLE MATCHES</a> |
<li><a name="TOC18" href="#SEC18">FINDING ALL POSSIBLE MATCHES</a> |
34 |
<li><a name="TOC19" href="#SEC19">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a> |
<li><a name="TOC19" href="#SEC19">MATCHING A PATTERN: THE ALTERNATIVE FUNCTION</a> |
35 |
|
<li><a name="TOC20" href="#SEC20">SEE ALSO</a> |
36 |
</ul> |
</ul> |
37 |
<br><a name="SEC1" href="#TOC1">PCRE NATIVE API</a><br> |
<br><a name="SEC1" href="#TOC1">PCRE NATIVE API</a><br> |
38 |
<P> |
<P> |
141 |
</P> |
</P> |
142 |
<br><a name="SEC2" href="#TOC1">PCRE API OVERVIEW</a><br> |
<br><a name="SEC2" href="#TOC1">PCRE API OVERVIEW</a><br> |
143 |
<P> |
<P> |
144 |
PCRE has its own native API, which is described in this document. There is |
PCRE has its own native API, which is described in this document. There are |
145 |
also a set of wrapper functions that correspond to the POSIX regular expression |
also some wrapper functions that correspond to the POSIX regular expression |
146 |
API. These are described in the |
API. These are described in the |
147 |
<a href="pcreposix.html"><b>pcreposix</b></a> |
<a href="pcreposix.html"><b>pcreposix</b></a> |
148 |
documentation. Both of these APIs define a set of C function calls. A C++ |
documentation. Both of these APIs define a set of C function calls. A C++ |
171 |
A second matching function, <b>pcre_dfa_exec()</b>, which is not |
A second matching function, <b>pcre_dfa_exec()</b>, which is not |
172 |
Perl-compatible, is also provided. This uses a different algorithm for the |
Perl-compatible, is also provided. This uses a different algorithm for the |
173 |
matching. The alternative algorithm finds all possible matches (at a given |
matching. The alternative algorithm finds all possible matches (at a given |
174 |
point in the subject). However, this algorithm does not return captured |
point in the subject), and scans the subject just once. However, this algorithm |
175 |
substrings. A description of the two matching algorithms and their advantages |
does not return captured substrings. A description of the two matching |
176 |
and disadvantages is given in the |
algorithms and their advantages and disadvantages is given in the |
177 |
<a href="pcrematching.html"><b>pcrematching</b></a> |
<a href="pcrematching.html"><b>pcrematching</b></a> |
178 |
documentation. |
documentation. |
179 |
</P> |
</P> |
244 |
</P> |
</P> |
245 |
<br><a name="SEC3" href="#TOC1">NEWLINES</a><br> |
<br><a name="SEC3" href="#TOC1">NEWLINES</a><br> |
246 |
<P> |
<P> |
247 |
PCRE supports three different conventions for indicating line breaks in |
PCRE supports four different conventions for indicating line breaks in |
248 |
strings: a single CR character, a single LF character, or the two-character |
strings: a single CR (carriage return) character, a single LF (linefeed) |
249 |
sequence CRLF. All three are used as "standard" by different operating systems. |
character, the two-character sequence CRLF, or any Unicode newline sequence. |
250 |
When PCRE is built, a default can be specified. The default default is LF, |
The Unicode newline sequences are the three just mentioned, plus the single |
251 |
which is the Unix standard. When PCRE is run, the default can be overridden, |
characters VT (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next line, |
252 |
either when a pattern is compiled, or when it is matched. |
U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). |
253 |
<br> |
</P> |
254 |
<br> |
<P> |
255 |
|
Each of the first three conventions is used by at least one operating system as |
256 |
|
its standard newline sequence. When PCRE is built, a default can be specified. |
257 |
|
The default default is LF, which is the Unix standard. When PCRE is run, the |
258 |
|
default can be overridden, either when a pattern is compiled, or when it is |
259 |
|
matched. |
260 |
|
</P> |
261 |
|
<P> |
262 |
In the PCRE documentation the word "newline" is used to mean "the character or |
In the PCRE documentation the word "newline" is used to mean "the character or |
263 |
pair of characters that indicate a line break". |
pair of characters that indicate a line break". The choice of newline |
264 |
|
convention affects the handling of the dot, circumflex, and dollar |
265 |
|
metacharacters, the handling of #-comments in /x mode, and, when CRLF is a |
266 |
|
recognized line ending sequence, the match position advancement for a |
267 |
|
non-anchored pattern. The choice of newline convention does not affect the |
268 |
|
interpretation of the \n or \r escape sequences. |
269 |
</P> |
</P> |
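The "any Unicode newline" convention described above can be sketched as a small classifier. This is an illustration only, not PCRE's internal code: the function name `any_newline_length` is hypothetical, and the input is assumed to be already decoded into an array of code points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the PCRE API): returns the length, in
   code points, of a newline sequence that starts at cp[0] under the "any
   Unicode newline" convention, or 0 if no newline starts there. */
size_t any_newline_length(const unsigned int *cp, size_t remaining)
{
  assert(cp != NULL);
  if (remaining == 0) return 0;
  switch (cp[0])
    {
    case 0x000D:   /* CR, which may be the first half of CRLF */
      return (remaining > 1 && cp[1] == 0x000A) ? 2 : 1;
    case 0x000A:   /* LF  (linefeed) */
    case 0x000B:   /* VT  (vertical tab) */
    case 0x000C:   /* FF  (formfeed) */
    case 0x0085:   /* NEL (next line) */
    case 0x2028:   /* LS  (line separator) */
    case 0x2029:   /* PS  (paragraph separator) */
      return 1;
    default:
      return 0;
    }
}
```

Treating CRLF as a single two-character sequence, rather than as two separate newlines, is what makes the match position advancement mentioned above matter.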
270 |
<br><a name="SEC4" href="#TOC1">MULTITHREADING</a><br> |
<br><a name="SEC4" href="#TOC1">MULTITHREADING</a><br> |
271 |
<P> |
<P> |
314 |
PCRE_CONFIG_NEWLINE |
PCRE_CONFIG_NEWLINE |
315 |
</pre> |
</pre> |
316 |
The output is an integer whose value specifies the default character sequence |
The output is an integer whose value specifies the default character sequence |
317 |
that is recognized as meaning "newline". The three values that are supported |
that is recognized as meaning "newline". The four values that are supported |
318 |
are: 10 for LF, 13 for CR, and 3338 for CRLF. The default should normally be |
are: 10 for LF, 13 for CR, 3338 for CRLF, and -1 for ANY. The default should |
319 |
the standard sequence for your operating system. |
normally be the standard sequence for your operating system. |
320 |
<pre> |
<pre> |
321 |
PCRE_CONFIG_LINK_SIZE |
PCRE_CONFIG_LINK_SIZE |
322 |
</pre> |
</pre> |
387 |
argument, which is an address (see below). |
argument, which is an address (see below). |
388 |
</P> |
</P> |
389 |
<P> |
<P> |
390 |
The <i>options</i> argument contains independent bits that affect the |
The <i>options</i> argument contains various bit settings that affect the |
391 |
compilation. It should be zero if no options are required. The available |
compilation. It should be zero if no options are required. The available |
392 |
options are described below. Some of them, in particular, those that are |
options are described below. Some of them, in particular, those that are |
393 |
compatible with Perl, can also be set and unset from within the pattern (see |
compatible with Perl, can also be set and unset from within the pattern (see |
480 |
including those that indicate newline. Without it, a dot does not match when |
including those that indicate newline. Without it, a dot does not match when |
481 |
the current position is at a newline. This option is equivalent to Perl's /s |
the current position is at a newline. This option is equivalent to Perl's /s |
482 |
option, and it can be changed within a pattern by a (?s) option setting. A |
option, and it can be changed within a pattern by a (?s) option setting. A |
483 |
negative class such as [^a] always matches newlines, independent of the setting |
negative class such as [^a] always matches newline characters, independent of |
484 |
of this option. |
the setting of this option. |
485 |
<pre> |
<pre> |
486 |
PCRE_DUPNAMES |
PCRE_DUPNAMES |
487 |
</pre> |
</pre> |
544 |
PCRE_NEWLINE_CR |
PCRE_NEWLINE_CR |
545 |
PCRE_NEWLINE_LF |
PCRE_NEWLINE_LF |
546 |
PCRE_NEWLINE_CRLF |
PCRE_NEWLINE_CRLF |
547 |
|
PCRE_NEWLINE_ANY |
548 |
</pre> |
</pre> |
549 |
These options override the default newline definition that was chosen when PCRE |
These options override the default newline definition that was chosen when PCRE |
550 |
was built. Setting the first or the second specifies that a newline is |
was built. Setting the first or the second specifies that a newline is |
551 |
indicated by a single character (CR or LF, respectively). Setting both of them |
indicated by a single character (CR or LF, respectively). Setting |
552 |
specifies that a newline is indicated by the two-character CRLF sequence. For |
PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character |
553 |
convenience, PCRE_NEWLINE_CRLF is defined to contain both bits. The only time |
CRLF sequence. Setting PCRE_NEWLINE_ANY specifies that any Unicode newline |
554 |
that a line break is relevant when compiling a pattern is if PCRE_EXTENDED is |
sequence should be recognized. The Unicode newline sequences are the three just |
555 |
set, and an unescaped # outside a character class is encountered. This |
mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed, |
556 |
indicates a comment that lasts until after the next newline. |
U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS |
557 |
|
(paragraph separator, U+2029). The last two are recognized only in UTF-8 mode. |
558 |
|
</P> |
559 |
|
<P> |
560 |
|
The newline setting in the options word uses three bits that are treated |
561 |
|
as a number, giving eight possibilities. Currently only five are used (default |
562 |
|
plus the four values above). This means that if you set more than one newline |
563 |
|
option, the combination may or may not be sensible. For example, |
564 |
|
PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but |
565 |
|
other combinations yield unused numbers and cause an error. |
566 |
|
</P> |
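The three-bit field described above can be illustrated with a minimal sketch. The `EX_` constants are stand-ins invented for this example and their numeric values are assumptions; real code should use the PCRE_NEWLINE_<i>xxx</i> definitions from pcre.h. The function `newline_bits_valid` is likewise hypothetical.

```c
#include <assert.h>

/* Illustrative stand-ins for the PCRE_NEWLINE_xxx options. The numeric
   values are assumptions made for this sketch, not taken from the text. */
#define EX_NEWLINE_CR    0x00100000UL
#define EX_NEWLINE_LF    0x00200000UL
#define EX_NEWLINE_CRLF  0x00300000UL
#define EX_NEWLINE_ANY   0x00400000UL
#define EX_NEWLINE_MASK  0x00700000UL   /* the three-bit newline field */

/* Returns 1 if the newline field of options holds one of the five values
   in use (default plus the four above), 0 if it is an unused number. */
int newline_bits_valid(unsigned long options)
{
  /* Sanity check on the stand-in values: CR with LF gives CRLF. */
  assert((EX_NEWLINE_CR | EX_NEWLINE_LF) == EX_NEWLINE_CRLF);
  switch (options & EX_NEWLINE_MASK)
    {
    case 0:                 /* no newline option: use the default */
    case EX_NEWLINE_CR:
    case EX_NEWLINE_LF:
    case EX_NEWLINE_CRLF:
    case EX_NEWLINE_ANY:
      return 1;
    default:                /* an unused number in the field */
      return 0;
    }
}
```

Because the field is read as a number rather than as independent flags, ORing two options together selects whatever number results, which is why only the CR with LF combination is meaningful.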
567 |
|
<P> |
568 |
|
The only time that a line break is specially recognized when compiling a |
569 |
|
pattern is if PCRE_EXTENDED is set, and an unescaped # outside a character |
570 |
|
class is encountered. This indicates a comment that lasts until after the next |
571 |
|
line break sequence. In other circumstances, line break sequences are treated |
572 |
|
as literal data, except that in PCRE_EXTENDED mode, both CR and LF are treated |
573 |
|
as whitespace characters and are therefore ignored. |
574 |
</P> |
</P> |
575 |
<P> |
<P> |
576 |
The newline option set at compile time becomes the default that is used for |
The newline option that is set at compile time becomes the default that is used |
577 |
<b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>, but it can be overridden. |
for <b>pcre_exec()</b> and <b>pcre_dfa_exec()</b>, but it can be overridden. |
578 |
<pre> |
<pre> |
579 |
PCRE_NO_AUTO_CAPTURE |
PCRE_NO_AUTO_CAPTURE |
580 |
</pre> |
</pre> |
618 |
<P> |
<P> |
619 |
The following table lists the error codes that may be returned by |
The following table lists the error codes that may be returned by |
620 |
<b>pcre_compile2()</b>, along with the error messages that may be returned by |
<b>pcre_compile2()</b>, along with the error messages that may be returned by |
621 |
both compiling functions. |
both compiling functions. As PCRE has developed, some error codes have fallen |
622 |
|
out of use. To avoid confusion, they have not been re-used. |
623 |
<pre> |
<pre> |
624 |
0 no error |
0 no error |
625 |
1 \ at end of pattern |
1 \ at end of pattern |
631 |
7 invalid escape sequence in character class |
7 invalid escape sequence in character class |
632 |
8 range out of order in character class |
8 range out of order in character class |
633 |
9 nothing to repeat |
9 nothing to repeat |
634 |
10 operand of unlimited repeat could match the empty string |
10 [this code is not in use] |
635 |
11 internal error: unexpected repeat |
11 internal error: unexpected repeat |
636 |
12 unrecognized character after (? |
12 unrecognized character after (? |
637 |
13 POSIX named classes are supported only within a class |
13 POSIX named classes are supported only within a class |
640 |
16 erroffset passed as NULL |
16 erroffset passed as NULL |
641 |
17 unknown option bit(s) set |
17 unknown option bit(s) set |
642 |
18 missing ) after comment |
18 missing ) after comment |
643 |
19 parentheses nested too deeply |
19 [this code is not in use] |
644 |
20 regular expression too large |
20 regular expression too large |
645 |
21 failed to get memory |
21 failed to get memory |
646 |
22 unmatched parentheses |
22 unmatched parentheses |
654 |
30 unknown POSIX class name |
30 unknown POSIX class name |
655 |
31 POSIX collating elements are not supported |
31 POSIX collating elements are not supported |
656 |
32 this version of PCRE is not compiled with PCRE_UTF8 support |
32 this version of PCRE is not compiled with PCRE_UTF8 support |
657 |
33 spare error |
33 [this code is not in use] |
658 |
34 character value in \x{...} sequence is too large |
34 character value in \x{...} sequence is too large |
659 |
35 invalid condition (?(0) |
35 invalid condition (?(0) |
660 |
36 \C not allowed in lookbehind assertion |
36 \C not allowed in lookbehind assertion |
663 |
39 closing ) for (?C expected |
39 closing ) for (?C expected |
664 |
40 recursive call could loop indefinitely |
40 recursive call could loop indefinitely |
665 |
41 unrecognized character after (?P |
41 unrecognized character after (?P |
666 |
42 syntax error after (?P |
42 syntax error in subpattern name (missing terminator) |
667 |
43 two named subpatterns have the same name |
43 two named subpatterns have the same name |
668 |
44 invalid UTF-8 string |
44 invalid UTF-8 string |
669 |
45 support for \P, \p, and \X has not been compiled |
45 support for \P, \p, and \X has not been compiled |
673 |
49 too many named subpatterns (maximum 10,000) |
49 too many named subpatterns (maximum 10,000) |
674 |
50 repeated subpattern is too long |
50 repeated subpattern is too long |
675 |
51 octal value is greater than \377 (not in UTF-8 mode) |
51 octal value is greater than \377 (not in UTF-8 mode) |
676 |
|
52 internal error: overran compiling workspace |
677 |
|
53 internal error: previously-checked referenced subpattern not found |
678 |
|
54 DEFINE group contains more than one branch |
679 |
|
55 repeating a DEFINE group is not allowed |
680 |
|
56 inconsistent NEWLINE options |
681 |
</PRE> |
</PRE> |
682 |
</P> |
</P> |
683 |
<br><a name="SEC9" href="#TOC1">STUDYING A PATTERN</a><br> |
<br><a name="SEC9" href="#TOC1">STUDYING A PATTERN</a><br> |
847 |
</P> |
</P> |
848 |
<P> |
<P> |
849 |
If there is a fixed first byte, for example, from a pattern such as |
If there is a fixed first byte, for example, from a pattern such as |
850 |
(cat|cow|coyote). Otherwise, if either |
(cat|cow|coyote), its value is returned. Otherwise, if either |
851 |
<br> |
<br> |
852 |
<br> |
<br> |
853 |
(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch |
(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch |
905 |
their parentheses numbers. For example, consider the following pattern (assume |
their parentheses numbers. For example, consider the following pattern (assume |
906 |
PCRE_EXTENDED is set, so white space - including newlines - is ignored): |
PCRE_EXTENDED is set, so white space - including newlines - is ignored): |
907 |
<pre> |
<pre> |
908 |
(?P<date> (?P<year>(\d\d)?\d\d) - (?P<month>\d\d) - (?P<day>\d\d) ) |
(?<date> (?<year>(\d\d)?\d\d) - (?<month>\d\d) - (?<day>\d\d) ) |
909 |
</pre> |
</pre> |
910 |
There are four named subpatterns, so the table has four entries, and each entry |
There are four named subpatterns, so the table has four entries, and each entry |
911 |
in the table is eight bytes long. The table is as follows, with non-printing |
in the table is eight bytes long. The table is as follows, with non-printing |
1153 |
PCRE_NEWLINE_CR |
PCRE_NEWLINE_CR |
1154 |
PCRE_NEWLINE_LF |
PCRE_NEWLINE_LF |
1155 |
PCRE_NEWLINE_CRLF |
PCRE_NEWLINE_CRLF |
1156 |
|
PCRE_NEWLINE_ANY |
1157 |
</pre> |
</pre> |
1158 |
These options override the newline definition that was chosen or defaulted when |
These options override the newline definition that was chosen or defaulted when |
1159 |
the pattern was compiled. For details, see the description <b>pcre_compile()</b> |
the pattern was compiled. For details, see the description of |
1160 |
above. During matching, the newline choice affects the behaviour of the dot, |
<b>pcre_compile()</b> above. During matching, the newline choice affects the |
1161 |
circumflex, and dollar metacharacters. |
behaviour of the dot, circumflex, and dollar metacharacters. It may also alter |
1162 |
|
the way the match position is advanced after a match failure for an unanchored |
1163 |
|
pattern. When PCRE_NEWLINE_CRLF or PCRE_NEWLINE_ANY is set, and a match attempt |
1164 |
|
fails when the current position is at a CRLF sequence, the match position is |
1165 |
|
advanced by two characters instead of one, in other words, to after the CRLF. |
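The advancement rule just described can be sketched as follows. This is not PCRE's implementation: `next_start_offset` is a hypothetical helper, it works on single-byte characters only, and the `crlf_is_newline` flag stands for "PCRE_NEWLINE_CRLF or PCRE_NEWLINE_ANY is in effect".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given that a match attempt at offset pos has
   failed, returns the offset at which the next attempt should start.
   When CRLF is a recognized line ending and pos is at a CRLF sequence,
   the position moves past both characters instead of just one. */
size_t next_start_offset(const char *subject, size_t length, size_t pos,
                         int crlf_is_newline)
{
  assert(subject != NULL && pos < length);
  if (crlf_is_newline && pos + 1 < length &&
      subject[pos] == '\r' && subject[pos + 1] == '\n')
    return pos + 2;   /* skip the whole CRLF */
  return pos + 1;     /* ordinary one-character advance */
}
```

Without this adjustment, an unanchored pattern could be retried starting at the LF in the middle of a CRLF pair.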
1166 |
<pre> |
<pre> |
1167 |
PCRE_NOTBOL |
PCRE_NOTBOL |
1168 |
</pre> |
</pre> |
1376 |
other endianness. This is the error that PCRE gives when the magic number is |
other endianness. This is the error that PCRE gives when the magic number is |
1377 |
not present. |
not present. |
1378 |
<pre> |
<pre> |
1379 |
PCRE_ERROR_UNKNOWN_NODE (-5) |
PCRE_ERROR_UNKNOWN_OPCODE (-5) |
1380 |
</pre> |
</pre> |
1381 |
While running the pattern match, an unknown item was encountered in the |
While running the pattern match, an unknown item was encountered in the |
1382 |
compiled pattern. This error could be caused by a bug in PCRE or by overwriting |
compiled pattern. This error could be caused by a bug in PCRE or by overwriting |
1402 |
<b>pcre_extra</b> structure (or defaulted) was reached. See the description |
<b>pcre_extra</b> structure (or defaulted) was reached. See the description |
1403 |
above. |
above. |
1404 |
<pre> |
<pre> |
|
PCRE_ERROR_RECURSIONLIMIT (-21) |
|
|
</pre> |
|
|
The internal recursion limit, as specified by the <i>match_limit_recursion</i> |
|
|
field in a <b>pcre_extra</b> structure (or defaulted) was reached. See the |
|
|
description above. |
|
|
<pre> |
|
1405 |
PCRE_ERROR_CALLOUT (-9) |
PCRE_ERROR_CALLOUT (-9) |
1406 |
</pre> |
</pre> |
1407 |
This error is never generated by <b>pcre_exec()</b> itself. It is provided for |
This error is never generated by <b>pcre_exec()</b> itself. It is provided for |
1439 |
PCRE_ERROR_BADCOUNT (-15) |
PCRE_ERROR_BADCOUNT (-15) |
1440 |
</pre> |
</pre> |
1441 |
This error is given if the value of the <i>ovecsize</i> argument is negative. |
This error is given if the value of the <i>ovecsize</i> argument is negative. |
1442 |
|
<pre> |
1443 |
|
PCRE_ERROR_RECURSIONLIMIT (-21) |
1444 |
|
</pre> |
1445 |
|
The internal recursion limit, as specified by the <i>match_limit_recursion</i> |
1446 |
|
field in a <b>pcre_extra</b> structure (or defaulted) was reached. See the |
1447 |
|
description above. |
1448 |
|
<pre> |
1449 |
|
PCRE_ERROR_NULLWSLIMIT (-22) |
1450 |
|
</pre> |
1451 |
|
When a group that can match an empty substring is repeated with an unbounded |
1452 |
|
upper limit, the subject position at the start of the group must be remembered, |
1453 |
|
so that a test for an empty string can be made when the end of the group is |
1454 |
|
reached. Some workspace is required for this; if it runs out, this error is |
1455 |
|
given. |
1456 |
|
<pre> |
1457 |
|
PCRE_ERROR_BADNEWLINE (-23) |
1458 |
|
</pre> |
1459 |
|
An invalid combination of PCRE_NEWLINE_<i>xxx</i> options was given. |
1460 |
|
</P> |
1461 |
|
<P> |
1462 |
|
Error numbers -16 to -20 are not used by <b>pcre_exec()</b>. |
1463 |
</P> |
</P> |
1464 |
<br><a name="SEC15" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br> |
<br><a name="SEC15" href="#TOC1">EXTRACTING CAPTURED SUBSTRINGS BY NUMBER</a><br> |
1465 |
<P> |
<P> |
1514 |
<i>buffersize</i>, while for <b>pcre_get_substring()</b> a new block of memory is |
<i>buffersize</i>, while for <b>pcre_get_substring()</b> a new block of memory is |
1515 |
obtained via <b>pcre_malloc</b>, and its address is returned via |
obtained via <b>pcre_malloc</b>, and its address is returned via |
1516 |
<i>stringptr</i>. The yield of the function is the length of the string, not |
<i>stringptr</i>. The yield of the function is the length of the string, not |
1517 |
including the terminating zero, or one of |
including the terminating zero, or one of these error codes: |
1518 |
<pre> |
<pre> |
1519 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
1520 |
</pre> |
</pre> |
1531 |
memory that is obtained via <b>pcre_malloc</b>. The address of the memory block |
memory that is obtained via <b>pcre_malloc</b>. The address of the memory block |
1532 |
is returned via <i>listptr</i>, which is also the start of the list of string |
is returned via <i>listptr</i>, which is also the start of the list of string |
1533 |
pointers. The end of the list is marked by a NULL pointer. The yield of the |
pointers. The end of the list is marked by a NULL pointer. The yield of the |
1534 |
function is zero if all went well, or |
function is zero if all went well, or the error code |
1535 |
<pre> |
<pre> |
1536 |
PCRE_ERROR_NOMEMORY (-6) |
PCRE_ERROR_NOMEMORY (-6) |
1537 |
</pre> |
</pre> |
1577 |
To extract a substring by name, you first have to find the associated number. |
To extract a substring by name, you first have to find the associated number. |
1578 |
For example, for this pattern |
For example, for this pattern |
1579 |
<pre> |
<pre> |
1580 |
(a+)b(?P<xxx>\d+)... |
(a+)b(?<xxx>\d+)... |
1581 |
</pre> |
</pre> |
1582 |
the number of the subpattern called "xxx" is 2. If the name is known to be |
the number of the subpattern called "xxx" is 2. If the name is known to be |
1583 |
unique (PCRE_DUPNAMES was not set), you can find the number from the name by |
unique (PCRE_DUPNAMES was not set), you can find the number from the name by |
1632 |
fourth are pointers to variables which are updated by the function. After it |
fourth are pointers to variables which are updated by the function. After it |
1633 |
has run, they point to the first and last entries in the name-to-number table |
has run, they point to the first and last entries in the name-to-number table |
1634 |
for the given name. The function itself returns the length of each entry, or |
for the given name. The function itself returns the length of each entry, or |
1635 |
PCRE_ERROR_NOSUBSTRING if there are none. The format of the table is described |
PCRE_ERROR_NOSUBSTRING (-7) if there are none. The format of the table is |
1636 |
above in the section entitled <i>Information about a pattern</i>. Given all the |
described above in the section entitled <i>Information about a pattern</i>. |
1637 |
relevant entries for the name, you can extract each of their numbers, and hence |
Given all the relevant entries for the name, you can extract each of their |
1638 |
the captured data, if any. |
numbers, and hence the captured data, if any. |
1639 |
</P> |
</P> |
1640 |
<br><a name="SEC18" href="#TOC1">FINDING ALL POSSIBLE MATCHES</a><br> |
<br><a name="SEC18" href="#TOC1">FINDING ALL POSSIBLE MATCHES</a><br> |
1641 |
<P> |
<P> |
1665 |
</P> |
</P> |
1666 |
<P> |
<P> |
1667 |
The function <b>pcre_dfa_exec()</b> is called to match a subject string against |
The function <b>pcre_dfa_exec()</b> is called to match a subject string against |
1668 |
a compiled pattern, using a "DFA" matching algorithm. This has different |
a compiled pattern, using a matching algorithm that scans the subject string |
1669 |
characteristics to the normal algorithm, and is not compatible with Perl. Some |
just once, and does not backtrack. This has different characteristics to the |
1670 |
of the features of PCRE patterns are not supported. Nevertheless, there are |
normal algorithm, and is not compatible with Perl. Some of the features of PCRE |
1671 |
times when this kind of matching can be useful. For a discussion of the two |
patterns are not supported. Nevertheless, there are times when this kind of |
1672 |
matching algorithms, see the |
matching can be useful. For a discussion of the two matching algorithms, see |
1673 |
|
the |
1674 |
<a href="pcrematching.html"><b>pcrematching</b></a> |
<a href="pcrematching.html"><b>pcrematching</b></a> |
1675 |
documentation. |
documentation. |
1676 |
</P> |
</P> |
1729 |
PCRE_DFA_SHORTEST |
PCRE_DFA_SHORTEST |
1730 |
</pre> |
</pre> |
1731 |
Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as |
Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as |
1732 |
soon as it has found one match. Because of the way the DFA algorithm works, |
soon as it has found one match. Because of the way the alternative algorithm |
1733 |
this is necessarily the shortest possible match at the first possible matching |
works, this is necessarily the shortest possible match at the first possible |
1734 |
point in the subject string. |
matching point in the subject string. |
1735 |
<pre> |
<pre> |
1736 |
PCRE_DFA_RESTART |
PCRE_DFA_RESTART |
1737 |
</pre> |
</pre> |
1769 |
On success, the yield of the function is a number greater than zero, which is |
On success, the yield of the function is a number greater than zero, which is |
1770 |
the number of matched substrings. The substrings themselves are returned in |
the number of matched substrings. The substrings themselves are returned in |
1771 |
<i>ovector</i>. Each string uses two elements; the first is the offset to the |
<i>ovector</i>. Each string uses two elements; the first is the offset to the |
1772 |
start, and the second is the offset to the end. All the strings have the same |
start, and the second is the offset to the end. In fact, all the strings have |
1773 |
start offset. (Space could have been saved by giving this only once, but it was |
the same start offset. (Space could have been saved by giving this only once, |
1774 |
decided to retain some compatibility with the way <b>pcre_exec()</b> returns |
but it was decided to retain some compatibility with the way <b>pcre_exec()</b> |
1775 |
data, even though the meaning of the strings is different.) |
returns data, even though the meaning of the strings is different.) |
1776 |
</P> |
</P> |
1777 |
<P> |
<P> |
1778 |
The strings are returned in reverse order of length; that is, the longest |
The strings are returned in reverse order of length; that is, the longest |
1798 |
<pre> |
<pre> |
1799 |
PCRE_ERROR_DFA_UCOND (-17) |
PCRE_ERROR_DFA_UCOND (-17) |
1800 |
</pre> |
</pre> |
1801 |
This return is given if <b>pcre_dfa_exec()</b> encounters a condition item in a |
This return is given if <b>pcre_dfa_exec()</b> encounters a condition item that |
1802 |
pattern that uses a back reference for the condition. This is not supported. |
uses a back reference for the condition, or a test for recursion in a specific |
1803 |
|
group. These are not supported. |
1804 |
<pre> |
<pre> |
1805 |
PCRE_ERROR_DFA_UMLIMIT (-18) |
PCRE_ERROR_DFA_UMLIMIT (-18) |
1806 |
</pre> |
</pre> |
1820 |
error is given if the output vector is not large enough. This should be |
error is given if the output vector is not large enough. This should be |
1821 |
extremely rare, as a vector of size 1000 is used. |
extremely rare, as a vector of size 1000 is used. |
1822 |
</P> |
</P> |
1823 |
|
<br><a name="SEC20" href="#TOC1">SEE ALSO</a><br> |
1824 |
|
<P> |
1825 |
|
<b>pcrebuild</b>(3), <b>pcrecallout</b>(3), <b>pcrecpp</b>(3), |
1826 |
|
<b>pcrematching</b>(3), <b>pcrepartial</b>(3), <b>pcreposix</b>(3), |
1827 |
|
<b>pcreprecompile</b>(3), <b>pcresample</b>(3), <b>pcrestack</b>(3). |
1828 |
|
</P> |
1829 |
<P> |
<P> |
1830 |
Last updated: 08 June 2006 |
Last updated: 30 November 2006 |
1831 |
<br> |
<br> |
1832 |
Copyright © 1997-2006 University of Cambridge. |
Copyright © 1997-2006 University of Cambridge. |
1833 |
<p> |
<p> |