
Diff of /code/trunk/doc/html/pcreapi.html


--- revision 974 by ph10, Sat Apr 14 16:16:58 2012 UTC
+++ revision 975 by ph10, Sat Jun 2 11:03:06 2012 UTC
@@ -317,7 +317,7 @@ PCRE supports five different conventions
 strings: a single CR (carriage return) character, a single LF (linefeed)
 character, the two-character sequence CRLF, any of the three preceding, or any
 Unicode newline sequence. The Unicode newline sequences are the three just
-mentioned, plus the single characters VT (vertical tab, U+000B), FF (formfeed,
+mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed,
 U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
 (paragraph separator, U+2029).
 </P>
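The five conventions listed in this hunk can be illustrated with a small classifier. This is a sketch, not PCRE's implementation: the enum `nl_convention` and the function `newline_length()` are hypothetical names, and the subject is assumed to be an array of already-decoded code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; PCRE itself selects the
   convention with options such as PCRE_NEWLINE_ANY. */
enum nl_convention { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Return how many code points form a newline starting at s[i],
   or 0 if s[i] does not begin one under the given convention. */
size_t newline_length(const uint32_t *s, size_t len, size_t i,
                      enum nl_convention conv)
{
    if (i >= len) return 0;
    uint32_t c = s[i];
    int crlf = (c == 0x0D && i + 1 < len && s[i + 1] == 0x0A);

    switch (conv) {
    case NL_CR:   return c == 0x0D;
    case NL_LF:   return c == 0x0A;
    case NL_CRLF: return crlf ? 2 : 0;
    case NL_ANYCRLF:
        if (crlf) return 2;               /* CRLF wins over a lone CR */
        return c == 0x0D || c == 0x0A;
    case NL_ANY:
        if (crlf) return 2;
        switch (c) {
        case 0x0A: case 0x0B: case 0x0C: case 0x0D:  /* LF VT FF CR */
        case 0x85: case 0x2028: case 0x2029:         /* NEL LS PS */
            return 1;
        default:
            return 0;
        }
    }
    return 0;
}
```

Under NL_ANYCRLF and NL_ANY the two-character CRLF is tried first, so a CR followed by LF counts as one newline rather than two.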
@@ -641,8 +641,8 @@ documentation.
 <pre>
   PCRE_EXTENDED
 </pre>
-If this bit is set, whitespace data characters in the pattern are totally
-ignored except when escaped or inside a character class. Whitespace does not
+If this bit is set, white space data characters in the pattern are totally
+ignored except when escaped or inside a character class. White space does not
 include the VT character (code 11). In addition, characters between an
 unescaped # outside a character class and the next newline, inclusive, are also
 ignored. This is equivalent to Perl's /x option, and it can be changed within a
@@ -659,7 +659,7 @@ happen to represent a newline do not cou
 </P>
 <P>
 This option makes it possible to include comments inside complicated patterns.
-Note, however, that this applies only to data characters. Whitespace characters
+Note, however, that this applies only to data characters. White space characters
 may never appear within special character sequences in a pattern, for example
 within the sequence (?( that introduces a conditional subpattern.
 <pre>
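To make the PCRE_EXTENDED rules in these two hunks concrete, here is a simplified pre-processor that applies them to a pattern string. It is a hypothetical helper, not how PCRE compiles patterns. It assumes LF is the newline, treats space, tab, LF, FF, and CR as white space (VT is deliberately excluded, as the text says), and ignores subtleties such as a `]` that appears first in a class or option changes inside the pattern.

```c
#include <stddef.h>

/* Hypothetical helper: copy an extended-mode pattern into out, dropping
   unescaped white space and #-comments that occur outside character
   classes. Returns the length of the result. */
size_t strip_extended(const char *pat, char *out)
{
    size_t n = 0;
    int in_class = 0;
    for (const char *p = pat; *p != '\0'; p++) {
        char c = *p;
        if (c == '\\' && p[1] != '\0') {   /* escaped char: keep both */
            out[n++] = c;
            out[n++] = *++p;
            continue;
        }
        if (in_class) {                    /* class contents are kept */
            if (c == ']') in_class = 0;
            out[n++] = c;
            continue;
        }
        if (c == '[') { in_class = 1; out[n++] = c; continue; }
        if (c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r')
            continue;                      /* VT (code 11) is NOT skipped */
        if (c == '#') {                    /* comment up to and incl. LF */
            while (p[1] != '\0' && p[1] != '\n') p++;
            if (p[1] == '\n') p++;
            continue;
        }
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

The output buffer must hold at least `strlen(pat) + 1` bytes, since the result is never longer than the input.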
@@ -745,7 +745,7 @@ CRLF sequence. Setting PCRE_NEWLINE_ANYC
 preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies
 that any Unicode newline sequence should be recognized. The Unicode newline
 sequences are the three just mentioned, plus the single characters VT (vertical
-tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
+tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
 separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
 library, the last two are recognized only in UTF-8 mode.
 </P>
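For the 8-bit library the same check has to work on bytes, and whether LS and PS can be recognized depends on UTF-8 mode. The sketch below assumes that, outside UTF-8 mode, NEL is the single byte 0x85, and that in UTF-8 mode the subject is valid UTF-8; `any_newline_bytes()` is a hypothetical helper, not a PCRE function.

```c
#include <stddef.h>

/* Hypothetical helper: length in bytes of an "any" newline at b[i] in an
   8-bit subject, or 0 if none. LS and PS (U+2028, U+2029) are matched only
   when utf8 is nonzero; NEL is assumed to be byte 0x85 in non-UTF mode
   and the pair C2 85 in UTF-8 mode. */
size_t any_newline_bytes(const unsigned char *b, size_t len, size_t i,
                         int utf8)
{
    if (i >= len) return 0;
    unsigned char c = b[i];
    if (c == 0x0D) return (i + 1 < len && b[i + 1] == 0x0A) ? 2 : 1;
    if (c >= 0x0A && c <= 0x0C) return 1;            /* LF, VT, FF */
    if (!utf8) return c == 0x85;                     /* NEL as one byte */
    if (c == 0xC2 && i + 1 < len && b[i + 1] == 0x85)
        return 2;                                    /* NEL in UTF-8 */
    if (c == 0xE2 && i + 2 < len && b[i + 1] == 0x80 &&
        (b[i + 2] == 0xA8 || b[i + 2] == 0xA9))
        return 3;                                    /* LS, PS in UTF-8 */
    return 0;
}
```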
@@ -759,7 +759,7 @@ other combinations may yield unused numb
 </P>
 <P>
 The only time that a line break in a pattern is specially recognized when
-compiling is when PCRE_EXTENDED is set. CR and LF are whitespace characters,
+compiling is when PCRE_EXTENDED is set. CR and LF are white space characters,
 and so are ignored in this mode. Also, an unescaped # outside a character class
 indicates a comment that lasts until after the next line break sequence. In
 other circumstances, line break sequences in patterns are treated as literal
@@ -916,6 +916,7 @@ fallen out of use. To avoid confusion, t
   72  too many forward references
   73  disallowed Unicode code point (&#62;= 0xd800 && &#60;= 0xdfff)
   74  invalid UTF-16 string (specifically UTF-16)
+  75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
 </pre>
 The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may
 be used if the limits were changed when PCRE was built.
@@ -950,7 +951,7 @@ wants to pass any of the other fields to
 </P>
 <P>
 The second argument of <b>pcre_study()</b> contains option bits. There are three
 options:
 <pre>
   PCRE_STUDY_JIT_COMPILE
   PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
@@ -1231,7 +1232,7 @@ is -1.
 </pre>
 Return the number of characters (NB not bytes) in the longest lookbehind
 assertion in the pattern. Note that the simple assertions \b and \B require a
 one-character lookbehind. This information is useful when doing multi-segment
 matching using the partial matching facilities.
 <pre>
   PCRE_INFO_MINLENGTH
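Because the longest lookbehind is reported in characters, an application working on UTF-8 data that keeps some earlier text when moving to the next segment has to convert that count into a byte offset. A minimal sketch, assuming a valid UTF-8 buffer; `utf8_back_chars()` is a hypothetical helper, not part of the PCRE API.

```c
#include <stddef.h>

/* Hypothetical helper: step back n characters (not bytes) from byte
   offset off in a valid UTF-8 buffer, stopping at the start of the
   buffer. UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
size_t utf8_back_chars(const unsigned char *buf, size_t off, size_t n)
{
    while (n > 0 && off > 0) {
        off--;                                      /* last byte of char */
        while (off > 0 && (buf[off] & 0xC0) == 0x80)
            off--;                                  /* skip continuations */
        n--;
    }
    return off;
}
```

For example, with a longest lookbehind of 1 (the \b case mentioned above), `utf8_back_chars(buf, start, 1)` gives the byte offset of the single character that would need to be retained before `start`.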
@@ -1506,7 +1507,7 @@ This limit is of use only if it is set s
 Limiting the recursion depth limits the amount of machine stack that can be
 used, or, when PCRE has been compiled to use memory on the heap instead of the
 stack, the amount of heap memory that can be used. This limit is not relevant,
 and is ignored, when matching is done using JIT compiled code.
 </P>
 <P>
 The default value for <i>match_limit_recursion</i> can be set when PCRE is
@@ -1689,7 +1690,7 @@ causing performance to suffer, but ensur
 "no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK)
 are considered at every possible starting position in the subject string. If
 PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching
 time. The use of PCRE_NO_START_OPTIMIZE disables JIT execution; when it is set,
 matching is always done using interpretively.
 </P>
 <P>
@@ -2084,12 +2085,12 @@ just-in-time processing stack is not lar
 <a href="pcrejit.html"><b>pcrejit</b></a>
 documentation for more details.
 <pre>
-  PCRE_ERROR_BADMODE (-28)
+  PCRE_ERROR_BADMODE        (-28)
 </pre>
 This error is given if a pattern that was compiled by the 8-bit library is
 passed to a 16-bit library function, or vice versa.
 <pre>
-  PCRE_ERROR_BADENDIANNESS (-29)
+  PCRE_ERROR_BADENDIANNESS  (-29)
 </pre>
 This error is given if a pattern that was compiled and saved is reloaded on a
 host with different endianness. The utility function
@@ -2097,7 +2098,7 @@ host with different endianness. The util
 so that it runs on the new host.
 </P>
 <P>
-Error numbers -16 to -20 and -22 are not used by <b>pcre_exec()</b>.
+Error numbers -16 to -20, -22, and -30 are not used by <b>pcre_exec()</b>.
 <a name="badutf8reasons"></a></P>
 <br><b>
 Reason codes for invalid UTF-8 strings
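The PCRE_ERROR_BADENDIANNESS case above arises when saved data is read on a host with the opposite byte order. One common way to detect that situation is to compare a stored 32-bit marker with its byte-swapped form. The sketch assumes the saved data begins with such a marker; the value 0x50435245 ("PCRE" in ASCII) and the helper names are illustrative assumptions, not a description of PCRE's saved-pattern layout.

```c
#include <stdint.h>

/* Illustrative marker value (an assumption for this sketch). */
#define SAVED_MAGIC 0x50435245u

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Classify a marker read from saved data in host byte order:
    0 = written on a host with the same endianness,
    1 = written with the opposite endianness (needs conversion),
   -1 = not recognized as saved data at all. */
int saved_byte_order(uint32_t stored)
{
    if (stored == SAVED_MAGIC) return 0;
    if (stored == swap32(SAVED_MAGIC)) return 1;
    return -1;
}
```

A loader that gets 1 would need to convert the saved pattern (for PCRE, with the utility function the text refers to) before using it.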
@@ -2592,6 +2593,13 @@ When a recursive subpattern is processed
 recursively, using private vectors for <i>ovector</i> and <i>workspace</i>. This
 error is given if the output vector is not large enough. This should be
 extremely rare, as a vector of size 1000 is used.
+<pre>
+  PCRE_ERROR_DFA_BADRESTART (-30)
+</pre>
+When <b>pcre_dfa_exec()</b> is called with the <b>PCRE_DFA_RESTART</b> option,
+some plausibility checks are made on the contents of the workspace, which
+should contain data about the previous partial match. If any of these checks
+fail, this error is given.
 </P>
 <br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
 <P>
@@ -2610,7 +2618,7 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC26" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 14 April 2012
+Last updated: 04 May 2012
 <br>
 Copyright &copy; 1997-2012 University of Cambridge.
 <br>
