
Diff of /code/trunk/doc/pcrepattern.3


revision 859 by ph10, Mon Jan 9 17:43:54 2012 UTC vs. revision 903 by ph10, Sat Jan 21 16:37:17 2012 UTC
# Line 25  there is now also support for UTF-8 stri
25  second library that supports 16-bit and UTF-16 character strings. To use these
26  features, PCRE must be built to include appropriate support. When using UTF
27  strings you must either call the compiling function with the PCRE_UTF8 or
28  PCRE_UTF16 option, or the pattern must start with one of these special
29  sequences:
30  .sp
31    (*UTF8)
32    (*UTF16)
33  .sp
34  Starting a pattern with such a sequence is equivalent to setting the relevant
35  option. This feature is not Perl-compatible. How setting a UTF mode affects
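The equivalence between a leading (*UTF8) or (*UTF16) item and the matching compile option can be sketched in C as below. This is not PCRE's own code: the function leading_utf_option and the constants SKETCH_UTF8 and SKETCH_UTF16 are hypothetical, and the sketch recognizes only a single leading item.

```c
#include <string.h>

/* Hypothetical option bits for this sketch only; the real PCRE_UTF8 and
   PCRE_UTF16 values are defined in pcre.h and are not reproduced here. */
enum { SKETCH_UTF8 = 1, SKETCH_UTF16 = 2 };

/* Returns the option implied by a leading (*UTF8) or (*UTF16) item and
   stores in *skip the number of pattern bytes that item occupies.
   Returns 0 (with *skip set to 0) when the pattern has no such prefix. */
int leading_utf_option(const char *pattern, size_t *skip)
{
    static const struct { const char *text; int option; } items[] = {
        { "(*UTF8)",  SKETCH_UTF8  },
        { "(*UTF16)", SKETCH_UTF16 },
    };
    for (size_t i = 0; i < sizeof items / sizeof items[0]; i++) {
        size_t len = strlen(items[i].text);
        if (strncmp(pattern, items[i].text, len) == 0) {
            *skip = len;          /* the rest of the pattern starts here */
            return items[i].option;
        }
    }
    *skip = 0;
    return 0;                     /* no special leading sequence */
}
```

For example, leading_utf_option("(*UTF8)abc", &n) returns SKETCH_UTF8 with n set to 7, leaving "abc" as the pattern proper. A sequence that appears later in the pattern is not treated as an option.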
# Line 263  between \ex{ and }, but the character co
263    8-bit UTF-8 mode      less than 0x10ffff and a valid codepoint
264    16-bit non-UTF mode   less than 0x10000
265    16-bit UTF-16 mode    less than 0x10ffff and a valid codepoint
266  .sp
267  Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called
268  "surrogate" codepoints).
269  .P
270  If characters other than hexadecimal digits appear between \ex{ and }, or if
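A minimal C sketch of the \x{...} value limits listed above. The mode names and hex_value_allowed are hypothetical. The 8-bit non-UTF limit of 0x100 is an assumption taken from the table row that precedes this hunk, and the UTF checks accept values up to and including 0x10ffff, the largest Unicode code point, while rejecting the surrogate range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the four library/mode combinations. */
enum char_mode { MODE8_NONUTF, MODE8_UTF8, MODE16_NONUTF, MODE16_UTF16 };

/* True for the surrogate code points 0xd800..0xdfff. */
static bool is_surrogate(uint32_t c)
{
    return c >= 0xd800 && c <= 0xdfff;
}

/* Returns true if c may be written as \x{...} in the given mode. */
bool hex_value_allowed(enum char_mode mode, uint32_t c)
{
    switch (mode) {
    case MODE8_NONUTF:  return c < 0x100;    /* assumed: one byte */
    case MODE16_NONUTF: return c < 0x10000;  /* one 16-bit unit */
    case MODE8_UTF8:
    case MODE16_UTF16:  return c <= 0x10ffff && !is_surrogate(c);
    }
    return false;
}
```

Note that 16-bit non-UTF mode accepts the surrogate values, because no UTF validity rule applies there.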
# Line 307  parenthesized subpatterns.
307  Inside a character class, or if the decimal number is greater than 9 and there
308  have not been that many capturing subpatterns, PCRE re-reads up to three octal
309  digits following the backslash, and uses them to generate a data character. Any
310  subsequent digits stand for themselves. The value of the character is
311  constrained in the same way as characters specified in hexadecimal.
312  For example:
313  .sp
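The octal re-reading rule can be sketched as follows, assuming a hypothetical helper read_octal_escape that is handed the text just after the backslash. It consumes at most three octal digits, so any later digits are left to be treated as literal characters; checking the resulting value against the mode's limit, as for hexadecimal, is left to the caller.

```c
#include <stddef.h>

/* Reads up to three octal digits starting at p (the character after the
   backslash), stores their value in *value and returns how many digits
   were consumed. Digits beyond the third are not read, so they "stand for
   themselves". Returns 0 if p does not start with an octal digit. */
size_t read_octal_escape(const char *p, unsigned *value)
{
    size_t n = 0;
    unsigned v = 0;
    while (n < 3 && p[n] >= '0' && p[n] <= '7') {
        v = v * 8 + (unsigned)(p[n] - '0');
        n++;
    }
    *value = v;
    return n;
}
```

Given "\0113", the helper reads "011" (value 9, a tab) and leaves "3" as a literal character.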
# Line 499  The vertical space characters are:
499    U+2028     Line separator
500    U+2029     Paragraph separator
501  .sp
502  In 8-bit, non-UTF-8 mode, only the characters with codepoints less than 256 are
503  relevant.
504  .
505  .
506  .\" HTML <a name="newlineseq"></a>
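A sketch of a vertical space membership test in C. The first five entries (LF, VT, FF, CR and NEL, that is U+000A to U+000D and U+0085) come from the part of the list that precedes this hunk in the full page; is_vertical_space is a hypothetical name. Because the two entries above 0xff can never occur in 8-bit non-UTF-8 mode, only the first five are relevant there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if c is one of the vertical space characters. */
bool is_vertical_space(uint32_t c)
{
    switch (c) {
    case 0x0a:    /* Linefeed (LF) */
    case 0x0b:    /* Vertical tab (VT) */
    case 0x0c:    /* Form feed (FF) */
    case 0x0d:    /* Carriage return (CR) */
    case 0x85:    /* Next line (NEL) */
    case 0x2028:  /* Line separator: unreachable in 8-bit non-UTF-8 mode */
    case 0x2029:  /* Paragraph separator: likewise */
        return true;
    default:
        return false;
    }
}
```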
# Line 974  end of the subject in both modes, and if
974  .sp
975  Outside a character class, a dot in the pattern matches any one character in
976  the subject string except (by default) a character that signifies the end of a
977  line.
978  .P
979  When a line ending is defined as a single character, dot never matches that
980  character; when the two-character sequence CRLF is used, dot does not match CR
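The default rule for dot can be sketched in C as follows. The continuation of the CRLF sentence is assumed from the full PCRE documentation: a CR is excluded only when it is immediately followed by LF, while isolated CRs and LFs are matched. The enum and function names are hypothetical, and the sketch covers only three newline conventions with PCRE_DOTALL unset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical newline conventions for this sketch. */
enum newline_kind { NL_CR, NL_LF, NL_CRLF };

/* Returns true if dot (without DOTALL) may match subject[i],
   where i < len and len is the subject length. */
bool dot_matches_at(const char *subject, size_t len, size_t i,
                    enum newline_kind nl)
{
    char c = subject[i];
    switch (nl) {
    case NL_CR:   return c != '\r';   /* single-character line ending */
    case NL_LF:   return c != '\n';
    case NL_CRLF: /* CR is excluded only when it starts a CRLF pair */
        return !(c == '\r' && i + 1 < len && subject[i + 1] == '\n');
    }
    return false;
}
```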
# Line 1104  followed by two other characters. The oc
1104  "]" can also be used to end a range.
1105  .P
1106  Ranges operate in the collating sequence of character values. They can also be
1107  used for characters specified numerically, for example [\e000-\e037]. Ranges
1108  can include any characters that are valid for the current mode.
1109  .P
1110  If a range that includes letters is used when caseless matching is set, it
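Because ranges compare raw character values, a class such as [\000-\037] reduces to numeric bound checks. A minimal sketch, with a hypothetical struct char_range and class_contains; it ignores caseless matching, whose rules are only partly visible in this hunk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One range inside a character class; both bounds are inclusive. */
struct char_range { uint32_t lo, hi; };

/* Returns true if c falls within any of the n ranges. Only numeric
   character values are compared; no locale collation is involved. */
bool class_contains(const struct char_range *ranges, size_t n, uint32_t c)
{
    for (size_t i = 0; i < n; i++)
        if (c >= ranges[i].lo && c <= ranges[i].hi)
            return true;
    return false;
}
```

Here { { 000, 037 } } models [\000-\037], which contains 0x1f but not the space character (0x20).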
# Line 1305  match "cataract", "erpillar" or an empty
1305  .sp
1306  2. It sets up the subpattern as a capturing subpattern. This means that, when
1307  the whole pattern matches, that portion of the subject string that matched the
1308  subpattern is passed back to the caller via the \fIovector\fP argument of the
1309  matching function. (This applies only to the traditional matching functions;
1310  the DFA matching functions do not support capturing.)
1311  .P
1312  Opening parentheses are counted from left to right (starting from 1) to obtain
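The ovector holds pairs of offsets: pair 0 covers the whole match and pair n covers capturing subpattern n, with -1 marking a subpattern that did not take part. The helper copy_capture below is a hypothetical sketch of pulling one capture out of such an array. In real code the library's own pcre_copy_substring() does this job, and only pairs below the count returned by pcre_exec() should be read.

```c
#include <stddef.h>
#include <string.h>

/* Copies capture n of subject into buf using an ovector of start/end
   offset pairs (pair 0 = whole match, pair n = subpattern n).
   Returns the number of bytes copied, or -1 if the pair is unset
   (negative offsets) or buf cannot hold the text plus a terminator. */
int copy_capture(const char *subject, const int *ovector, int n,
                 char *buf, size_t bufsize)
{
    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    if (start < 0 || end < start)
        return -1;                    /* subpattern did not participate */
    size_t len = (size_t)(end - start);
    if (len + 1 > bufsize)
        return -1;                    /* no room for text and '\0' */
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

With the pattern cat(aract|erpillar|) matching "cataract", pair 1 is (3, 8) and copy_capture yields "aract".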
# Line 2538  same pair of parentheses when there is a
2538  .P
2539  PCRE provides a similar feature, but of course it cannot obey arbitrary Perl
2540  code. The feature is called "callout". The caller of PCRE provides an external
2541  function by putting its entry point in the global variable \fIpcre_callout\fP
2542  (8-bit library) or \fIpcre16_callout\fP (16-bit library). By default, this
2543  variable contains NULL, which disables all calling out.
2544  .P
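The callout mechanism rests on a global function pointer that is NULL by default. The sketch below models that pattern with hypothetical names (sketch_callout, reach_callout, count_callouts and struct sketch_callout_block) rather than PCRE's real pcre_callout variable and pcre_callout_block type. It shows only how a NULL hook disables calling out and how installing a function enables it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the block passed to a callout function;
   only the callout number is modelled. */
struct sketch_callout_block { int callout_number; };

/* Global hook, NULL by default: while it is NULL, calling out is off. */
int (*sketch_callout)(struct sketch_callout_block *) = NULL;

/* What a matcher might do on reaching a callout point. Returns 0 when
   no hook is installed, otherwise whatever the user function returns. */
int reach_callout(int number)
{
    if (sketch_callout == NULL)
        return 0;                       /* calling out disabled */
    struct sketch_callout_block block = { number };
    return sketch_callout(&block);
}

/* Example user function: counts callouts and lets matching continue. */
static int calls_seen = 0;
int count_callouts(struct sketch_callout_block *block)
{
    (void)block;
    calls_seen++;
    return 0;
}
```

Installing the hook is a plain assignment, sketch_callout = count_callouts; after which every callout point reached invokes the function.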
