Diff of /code/trunk/doc/html/pcrepattern.html

revision 902 by ph10, Sat Jan 14 11:16:23 2012 UTC
revision 903 by ph10, Sat Jan 21 16:37:17 2012 UTC
# Line 65  there is now also support for UTF-8 stri
65  second library that supports 16-bit and UTF-16 character strings. To use these
66  features, PCRE must be built to include appropriate support. When using UTF
67  strings you must either call the compiling function with the PCRE_UTF8 or
68  PCRE_UTF16 option, or the pattern must start with one of these special
69  sequences:
70  <pre>
71    (*UTF8)
72    (*UTF16)
73  </pre>
74  Starting a pattern with such a sequence is equivalent to setting the relevant
75  option. This feature is not Perl-compatible. How setting a UTF mode affects
# Line 292  between \x{ and }, but the character cod
292    16-bit non-UTF mode   less than 0x10000
293    16-bit UTF-16 mode    less than 0x10ffff and a valid codepoint
294  </pre>
295  Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called
296  "surrogate" codepoints).
297  </P>
298  <P>
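
The validity rule quoted in this hunk can be sketched as a small C predicate. This is only an illustration, not PCRE's own code: the function name `is_valid_utf_codepoint` is hypothetical, and treating 0x10ffff itself as valid is an assumption based on U+10FFFF being the highest Unicode code point (the documentation text says "less than 0x10ffff").

```c
#include <stdint.h>

/* Hypothetical helper, not part of the PCRE API.
   Returns 1 if c is acceptable as a character code in a UTF mode,
   0 otherwise. */
int is_valid_utf_codepoint(uint32_t c)
{
    /* Assumption: the upper bound is inclusive (U+10FFFF). */
    if (c > 0x10ffffu)
        return 0;

    /* Surrogate codepoints 0xd800..0xdfff are invalid, per the text. */
    if (c >= 0xd800u && c <= 0xdfffu)
        return 0;

    return 1;
}
```

Under these assumptions the boundary values 0xd7ff and 0xe000 are accepted, while everything from 0xd800 through 0xdfff is rejected.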
# Line 335  following the discussion of
335  Inside a character class, or if the decimal number is greater than 9 and there
336  have not been that many capturing subpatterns, PCRE re-reads up to three octal
337  digits following the backslash, and uses them to generate a data character. Any
338  subsequent digits stand for themselves. The value of the character is
339  constrained in the same way as characters specified in hexadecimal.
340  For example:
341  <pre>
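
The "up to three octal digits" rule in this hunk might be sketched in C as follows. The helper `read_octal_escape` is hypothetical and does not reproduce PCRE's actual parser. It only shows the digit-reading step, and it does not apply the mode-dependent limit on the resulting value that the text mentions.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the PCRE API.
   s points at the first character after the backslash. Reads at most
   three octal digits and returns their value; *consumed receives the
   number of digits read (0 to 3). Any further digits are left for the
   caller, where they stand for themselves. */
unsigned int read_octal_escape(const char *s, size_t *consumed)
{
    unsigned int value = 0;
    size_t n = 0;

    while (n < 3 && s[n] >= '0' && s[n] <= '7') {
        value = value * 8 + (unsigned int)(s[n] - '0');
        n++;
    }

    *consumed = n;
    return value;
}
```

With this sketch, the input `0113` yields the value 9 (a tab) from the first three digits and leaves the trailing `3` unread, which matches the statement that subsequent digits stand for themselves.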
# Line 503  The vertical space characters are:
503    U+2028     Line separator
504    U+2029     Paragraph separator
505  </pre>
506  In 8-bit, non-UTF-8 mode, only the characters with codepoints less than 256 are
507  relevant.
508  <a name="newlineseq"></a></P>
509  <br><b>
510  Newline sequences
# Line 970  end of the subject in both modes, and if
970  <P>
971  Outside a character class, a dot in the pattern matches any one character in
972  the subject string except (by default) a character that signifies the end of a
973  line.
974  </P>
975  <P>
976  When a line ending is defined as a single character, dot never matches that
# Line 1103  followed by two other characters. The oc
1103  </P>
1104  <P>
1105  Ranges operate in the collating sequence of character values. They can also be
1106  used for characters specified numerically, for example [\000-\037]. Ranges
1107  can include any characters that are valid for the current mode.
1108  </P>
1109  <P>
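
Because ranges follow the numeric order of character values, membership in a range such as [\000-\037] reduces to a pair of comparisons. The C sketch below illustrates that idea only. The name `in_class_range` is hypothetical, and PCRE's internal handling of character classes is not shown here.

```c
/* Hypothetical helper, not part of the PCRE API.
   Returns 1 if character value c lies in the inclusive range lo..hi,
   compared purely by numeric character value. */
int in_class_range(unsigned int c, unsigned int lo, unsigned int hi)
{
    return lo <= c && c <= hi;
}
```

For the range [\000-\037], a tab (octal 011) is inside the range, while a space (octal 040) falls just outside it.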
# Line 1298  match "cataract", "erpillar" or an empty
1298  <br>
1299  2. It sets up the subpattern as a capturing subpattern. This means that, when
1300  the whole pattern matches, that portion of the subject string that matched the
1301  subpattern is passed back to the caller via the <i>ovector</i> argument of the
1302  matching function. (This applies only to the traditional matching functions;
1303  the DFA matching functions do not support capturing.)
1304  </P>
1305  <P>
# Line 2505  same pair of parentheses when there is a
2505  <P>
2506  PCRE provides a similar feature, but of course it cannot obey arbitrary Perl
2507  code. The feature is called "callout". The caller of PCRE provides an external
2508  function by putting its entry point in the global variable <i>pcre_callout</i>
2509  (8-bit library) or <i>pcre16_callout</i> (16-bit library). By default, this
2510  variable contains NULL, which disables all calling out.
2511  </P>
