Diff of /code/trunk/doc/pcre.txt

revision 69 by nigel, Sat Feb 24 21:40:18 2007 UTC | revision 71 by nigel, Sat Feb 24 21:40:24 2007 UTC
# Line 118  UTF-8 SUPPORT Line 118  UTF-8 SUPPORT
118       The following comments apply when PCRE is running  in  UTF-8       The following comments apply when PCRE is running  in  UTF-8
119       mode:       mode:
120    
121       1. PCRE assumes that the strings it is given  contain  valid       1. When you set the PCRE_UTF8 flag, the  strings  passed  as
122       UTF-8  codes. It does not diagnose invalid UTF-8 strings. If       patterns  and  subjects are checked for validity on entry to
123       you pass invalid UTF-8 strings  to  PCRE,  the  results  are       the relevant  functions.  If  an  invalid  UTF-8  string  is
124       undefined.       passed,  an  error  return is given. In some situations, you
125         may already know that your strings are valid, and  therefore
126         want  to  skip these checks in order to improve performance.
127         If you set the PCRE_NO_UTF8_CHECK flag at compile time or at
128         run  time,  PCRE  assumes  that the pattern or subject it is
129         given (respectively) contains only  valid  UTF-8  codes.  In
130         this  case, it does not diagnose an invalid UTF-8 string. If
131         you  pass   an   invalid   UTF-8   string   to   PCRE   when
132         PCRE_NO_UTF8_CHECK  is  set, the results are undefined. Your
133         program may crash.
134    
135       2. In a pattern, the escape sequence \x{...}, where the con-       2. In a pattern, the escape sequence \x{...}, where the con-
136       tents  of  the  braces is a string of hexadecimal digits, is       tents  of  the  braces is a string of hexadecimal digits, is
# Line 164  AUTHOR Line 173  AUTHOR
173       Cambridge CB2 3QG, England.       Cambridge CB2 3QG, England.
174       Phone: +44 1223 334714       Phone: +44 1223 334714
175    
176  Last updated: 04 February 2003  Last updated: 20 August 2003
177  Copyright (c) 1997-2003 University of Cambridge.  Copyright (c) 1997-2003 University of Cambridge.
178  -----------------------------------------------------------------------------  -----------------------------------------------------------------------------
179    
# Line 654  COMPILING A PATTERN Line 663  COMPILING A PATTERN
663       option  changes  the behaviour of PCRE are given in the sec-       option  changes  the behaviour of PCRE are given in the sec-
664       tion on UTF-8 support in the main pcre page.       tion on UTF-8 support in the main pcre page.
665    
666           PCRE_NO_UTF8_CHECK
667    
668         When PCRE_UTF8 is set, the validity  of  the  pattern  as  a
669         UTF-8  string  is automatically checked. If an invalid UTF-8
670         sequence of bytes is found, pcre_compile() returns an error.
671         If you already know that your pattern is valid, and you want
672         to skip this check for performance reasons, you can set  the
673         PCRE_NO_UTF8_CHECK  option.  When  it  is set, the effect of
674         passing an invalid UTF-8 string as a pattern  is  undefined.
675         It  may  cause  your program to crash.  Note that there is a
676         similar option  for  suppressing  the  checking  of  subject
677         strings passed to pcre_exec().
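
The option described above could be selected along the following lines. This is a minimal sketch, not code from the PCRE distribution: utf8_compile_options() is a hypothetical helper, and the numeric fallback values are placeholders that apply only when pcre.h has not been included. In real code, take both constants from pcre.h.

```c
/* Fallback definitions so the sketch stands alone; these numeric values
   are assumptions and should come from pcre.h in real code. */
#ifndef PCRE_UTF8
#define PCRE_UTF8          0x0800
#endif
#ifndef PCRE_NO_UTF8_CHECK
#define PCRE_NO_UTF8_CHECK 0x2000
#endif

/* Hypothetical helper: build the options word for pcre_compile().
   Validity checking is skipped only when the caller vouches for the
   pattern, because an invalid pattern then has undefined effects. */
int utf8_compile_options(int pattern_is_trusted)
{
    int options = PCRE_UTF8;
    if (pattern_is_trusted)
        options |= PCRE_NO_UTF8_CHECK;
    return options;
}
```

The returned value would be passed as the options argument of pcre_compile(). Leaving the check enabled is the safe default for patterns that come from outside the program.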
678    
679    
680    
681  STUDYING A PATTERN  STUDYING A PATTERN
682    
# Line 747  INFORMATION ABOUT A PATTERN Line 770  INFORMATION ABOUT A PATTERN
770       compiled pattern. It replaces the obsolete pcre_info() func-       compiled pattern. It replaces the obsolete pcre_info() func-
771       tion, which is nevertheless retained for backwards compatibil-       tion, which is nevertheless retained for backwards compatibil-
772       ity (and is documented below).       ity (and is documented below).
   
773       The first argument for pcre_fullinfo() is a pointer  to  the       The first argument for pcre_fullinfo() is a pointer  to  the
774       compiled  pattern.  The  second  argument  is  the result of       compiled  pattern.  The  second  argument  is  the result of
775       pcre_study(), or NULL if the pattern was  not  studied.  The       pcre_study(), or NULL if the pattern was  not  studied.  The
# Line 1014  MATCHING A PATTERN Line 1036  MATCHING A PATTERN
1036       turned out to be anchored by virtue of its contents, it can-       turned out to be anchored by virtue of its contents, it can-
1037       not be made unanchored at matching time.       not be made unanchored at matching time.
1038    
1039         When PCRE_UTF8 was set at compile time, the validity of  the
1040         subject  as  a  UTF-8 string is automatically checked. If an
1041         invalid  UTF-8  sequence  of  bytes  is  found,  pcre_exec()
1042         returns  the  error  PCRE_ERROR_BADUTF8. If you already know
1043         that your subject is valid, and you want to skip this  check
1044         for  performance reasons, you can set the PCRE_NO_UTF8_CHECK
1045         option when calling pcre_exec(). When this  option  is  set,
1046         the  effect  of passing an invalid UTF-8 string as a subject
1047         is undefined. It may cause your program to crash.
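
A caller that validates a subject once and then matches it many times might use a check such as the one below before setting PCRE_NO_UTF8_CHECK. This is an illustrative sketch only: is_valid_utf8() is a hypothetical function, not part of PCRE, and it applies strict rules (no overlong forms, no surrogates, nothing above U+10FFFF) that may differ from what PCRE's own check accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
   Returns 1 if s[0..len) is well-formed UTF-8 under strict rules, else 0. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        size_t extra, k;     /* continuation bytes expected; loop index */
        unsigned long cp;    /* code point being decoded */
        unsigned long min;   /* smallest code point allowed for this length */

        if (c < 0x80) { i++; continue; }                     /* ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;       /* stray continuation byte or bad lead byte */

        if (extra > len - i - 1) return 0;                   /* truncated */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return 0;               /* not 10xxxxxx */
            cp = (cp << 6) | (unsigned long)(cc & 0x3F);
        }

        if (cp < min) return 0;                              /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;          /* surrogate */
        if (cp > 0x10FFFF) return 0;                         /* out of range */

        i += extra + 1;
    }
    return 1;
}
```

Only if such a check succeeds would it be reasonable to pass PCRE_NO_UTF8_CHECK to later pcre_exec() calls on the same, unmodified subject.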
1048    
1049       There are also three further options that can be set only at       There are also three further options that can be set only at
1050       matching time:       matching time:
1051    
# Line 1103  MATCHING A PATTERN Line 1135  MATCHING A PATTERN
1135       used for a fragment of a pattern that picks out a substring.       used for a fragment of a pattern that picks out a substring.
1136       PCRE supports several other kinds of  parenthesized  subpat-       PCRE supports several other kinds of  parenthesized  subpat-
1137       tern that do not cause substrings to be captured.       tern that do not cause substrings to be captured.
   
1138       Captured substrings are returned to the caller via a  vector       Captured substrings are returned to the caller via a  vector
1139       of  integer  offsets whose address is passed in ovector. The       of  integer  offsets whose address is passed in ovector. The
1140       number of elements in the vector is passed in ovecsize.  The       number of elements in the vector is passed in ovecsize.  The
# Line 1219  MATCHING A PATTERN Line 1250  MATCHING A PATTERN
1250       distinctive error code. See  the  pcrecallout  documentation       distinctive error code. See  the  pcrecallout  documentation
1251       for details.       for details.
1252    
1253           PCRE_ERROR_BADUTF8       (-10)
1254    
1255         A string that contains an invalid UTF-8  byte  sequence  was
1256         passed as a subject.
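
A caller might separate this error from other outcomes of pcre_exec() roughly as follows. The sketch is an assumption-laden illustration: exec_outcome and classify_exec_result() are hypothetical names, and the fallback definition of PCRE_ERROR_BADUTF8 uses the value -10 given above only when pcre.h has not been included.

```c
/* Fallback so the sketch stands alone; the value is the one listed above. */
#ifndef PCRE_ERROR_BADUTF8
#define PCRE_ERROR_BADUTF8 (-10)
#endif

/* Hypothetical classification of a pcre_exec() return value. */
typedef enum {
    EXEC_MATCH,          /* positive return: match found */
    EXEC_OVECTOR_FULL,   /* zero: match found, but ovector ran out of space */
    EXEC_BAD_UTF8,       /* subject contained an invalid UTF-8 sequence */
    EXEC_OTHER_FAILURE   /* any other negative code, including no match */
} exec_outcome;

exec_outcome classify_exec_result(int rc)
{
    if (rc == PCRE_ERROR_BADUTF8) return EXEC_BAD_UTF8;
    if (rc < 0)                   return EXEC_OTHER_FAILURE;
    if (rc == 0)                  return EXEC_OVECTOR_FULL;
    return EXEC_MATCH;
}
```

Treating EXEC_BAD_UTF8 separately lets a program report bad input to the user instead of handling it as an ordinary failed match.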
1257    
1258    
1259  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
1260    
# Line 1255  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER Line 1291  EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
1291       returned zero, indicating that it ran out of space in  ovec-       returned zero, indicating that it ran out of space in  ovec-
1292       tor,  the  value passed as stringcount should be the size of       tor,  the  value passed as stringcount should be the size of
1293       the vector divided by three.       the vector divided by three.
   
1294       The functions pcre_copy_substring() and pcre_get_substring()       The functions pcre_copy_substring() and pcre_get_substring()
1295       extract a single substring, whose number is given as string-       extract a single substring, whose number is given as string-
1296       number. A value of zero extracts the substring that  matched       number. A value of zero extracts the substring that  matched
# Line 1352  EXTRACTING CAPTURED SUBSTRINGS BY NAME Line 1387  EXTRACTING CAPTURED SUBSTRINGS BY NAME
1387       succeeds,    they   then   call   pcre_copy_substring()   or       succeeds,    they   then   call   pcre_copy_substring()   or
1388       pcre_get_substring(), as appropriate.       pcre_get_substring(), as appropriate.
1389    
1390  Last updated: 03 February 2003  Last updated: 20 August 2003
1391  Copyright (c) 1997-2003 University of Cambridge.  Copyright (c) 1997-2003 University of Cambridge.
1392  -----------------------------------------------------------------------------  -----------------------------------------------------------------------------
1393    
# Line 1420  PCRE CALLOUTS Line 1455  PCRE CALLOUTS
1455       The current_position field contains the  offset  within  the       The current_position field contains the  offset  within  the
1456       subject of the current match pointer.       subject of the current match pointer.
1457    
1458       The capture_top field contains the  number  of  the  highest       The capture_top field contains one more than the  number  of
1459       captured substring so far.       the  highest  numbered captured substring so far. If no sub-
1460         strings have been captured, the value of capture_top is one.
1461    
1462       The capture_last field  contains  the  number  of  the  most       The capture_last field  contains  the  number  of  the  most
1463       recently captured substring.       recently captured substring.
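
Because capture_top is one more than the highest numbered captured substring, it can serve directly as the exclusive upper bound of a loop over the groups captured so far. The sketch below uses a hypothetical stand-in structure, callout_view, holding only the fields it needs, rather than the real callout block from pcre.h. It also assumes that groups which have not captured anything are marked with negative offsets, as in the vector returned by pcre_exec().

```c
/* Hypothetical stand-in for the callout block; only the fields used here. */
typedef struct {
    const int *offset_vector;  /* start/end offset pairs, as in ovector */
    int capture_top;           /* one more than highest captured number */
    int capture_last;          /* number of the most recent capture */
} callout_view;

/* Count the groups 1 .. capture_top-1 that currently hold a capture.
   When nothing has been captured, capture_top is 1 and the loop body
   never runs. */
int count_set_captures(const callout_view *cb)
{
    int i, n = 0;
    for (i = 1; i < cb->capture_top; i++) {
        if (cb->offset_vector[2 * i] >= 0)   /* assumed: unset means < 0 */
            n++;
    }
    return n;
}
```

Under the earlier wording, which described capture_top as the highest captured number itself, the same loop would have needed an inclusive bound, so code written against that description should be rechecked.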
