Diff of /code/trunk/doc/pcre.txt

revision 141 by ph10, Tue Mar 20 11:46:50 2007 UTC revision 142 by ph10, Fri Mar 30 15:55:18 2007 UTC
# Line 1323  STUDYING A PATTERN Line 1323  STUDYING A PATTERN
1323  LOCALE SUPPORT  LOCALE SUPPORT
1324    
1325         PCRE handles caseless matching, and determines whether  characters  are         PCRE handles caseless matching, and determines whether  characters  are
1326         letters  digits,  or whatever, by reference to a set of tables, indexed         letters,  digits, or whatever, by reference to a set of tables, indexed
1327         by character value. When running in UTF-8 mode, this  applies  only  to         by character value. When running in UTF-8 mode, this  applies  only  to
1328         characters  with  codes  less than 128. Higher-valued codes never match         characters  with  codes  less than 128. Higher-valued codes never match
1329         escapes such as \w or \d, but can be tested with \p if  PCRE  is  built         escapes such as \w or \d, but can be tested with \p if  PCRE  is  built
1330         with  Unicode  character property support. The use of locales with Uni-         with  Unicode  character property support. The use of locales with Uni-
1331         code is discouraged.         code is discouraged. If you are handling characters with codes  greater
1332           than  128, you should either use UTF-8 and Unicode, or use locales, but
1333         An internal set of tables is created in the default C locale when  PCRE         not try to mix the two.
1334         is  built.  This  is  used when the final argument of pcre_compile() is  
1335         NULL, and is sufficient for many applications. An  alternative  set  of         PCRE contains an internal set of tables that are used  when  the  final
1336         tables  can,  however, be supplied. These may be created in a different         argument  of  pcre_compile()  is  NULL.  These  are sufficient for many
1337         locale from the default. As more and more applications change to  using         applications.  Normally, the internal tables recognize only ASCII char-
1338         Unicode, the need for this locale support is expected to die away.         acters. However, when PCRE is built, it is possible to cause the inter-
1339           nal tables to be rebuilt in the default "C" locale of the local system,
1340         External  tables  are  built by calling the pcre_maketables() function,         which may cause them to be different.
1341         which has no arguments, in the relevant locale. The result can then  be  
1342         passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For         The  internal tables can always be overridden by tables supplied by the
1343         example, to build and use tables that are appropriate  for  the  French         application that calls PCRE. These may be created in a different locale
1344         locale  (where  accented  characters  with  values greater than 128 are         from  the  default.  As more and more applications change to using Uni-
1345           code, the need for this locale support is expected to die away.
1346    
1347           External tables are built by calling  the  pcre_maketables()  function,
1348           which  has no arguments, in the relevant locale. The result can then be
1349           passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
1350           example,  to  build  and use tables that are appropriate for the French
1351           locale (where accented characters with  values  greater  than  128  are
1352         treated as letters), the following code could be used:         treated as letters), the following code could be used:
1353    
1354           setlocale(LC_CTYPE, "fr_FR");           setlocale(LC_CTYPE, "fr_FR");
1355           tables = pcre_maketables();           tables = pcre_maketables();
1356           re = pcre_compile(..., tables);           re = pcre_compile(..., tables);
1357    
1358           The  locale  name "fr_FR" is used on Linux and other Unix-like systems;
1359           if you are using Windows, the name for the French locale is "french".
1360    
1361         When pcre_maketables() runs, the tables are built  in  memory  that  is         When pcre_maketables() runs, the tables are built  in  memory  that  is
1362         obtained  via  pcre_malloc. It is the caller's responsibility to ensure         obtained  via  pcre_malloc. It is the caller's responsibility to ensure
1363         that the memory containing the tables remains available for as long  as         that the memory containing the tables remains available for as long  as
# Line 2918  BACKSLASH Line 2928  BACKSLASH
2928         is a letter or digit. The definition of  letters  and  digits  is  con-         is a letter or digit. The definition of  letters  and  digits  is  con-
2929         trolled  by PCRE's low-valued character tables, and may vary if locale-         trolled  by PCRE's low-valued character tables, and may vary if locale-
2930         specific matching is taking place (see "Locale support" in the  pcreapi         specific matching is taking place (see "Locale support" in the  pcreapi
2931         page).  For  example,  in  the  "fr_FR" (French) locale, some character         page).  For  example,  in  a French locale such as "fr_FR" in Unix-like
2932         codes greater than 128 are used for accented  letters,  and  these  are         systems, or "french" in Windows, some character codes greater than  128
2933         matched by \w.         are used for accented letters, and these are matched by \w.
2934    
2935         In  UTF-8 mode, characters with values greater than 128 never match \d,         In  UTF-8 mode, characters with values greater than 128 never match \d,
2936         \s, or \w, and always match \D, \S, and \W. This is true even when Uni-         \s, or \w, and always match \D, \S, and \W. This is true even when Uni-
# Line 3289  SQUARE BRACKETS AND CHARACTER CLASSES Line 3299  SQUARE BRACKETS AND CHARACTER CLASSES
3299         If a range that includes letters is used when caseless matching is set,         If a range that includes letters is used when caseless matching is set,
3300         it matches the letters in either case. For example, [W-c] is equivalent         it matches the letters in either case. For example, [W-c] is equivalent
3301         to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if         to  [][\\^_`wxyzabc],  matched  caselessly,  and  in non-UTF-8 mode, if
3302         character tables for the "fr_FR" locale are in use, [\xc8-\xcb] matches         character tables for a French locale are in  use,  [\xc8-\xcb]  matches
3303         accented  E  characters in both cases. In UTF-8 mode, PCRE supports the         accented  E  characters in both cases. In UTF-8 mode, PCRE supports the
3304         concept of case for characters with values greater than 128  only  when         concept of case for characters with values greater than 128  only  when
3305         it is compiled with Unicode property support.         it is compiled with Unicode property support.
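
Note on the LOCALE SUPPORT hunk: the revised text says that character tables reflect whichever locale was active when they were built. The sketch below illustrates that idea using only the standard C library. It is not PCRE's code and does not call pcre_maketables(); the helper names build_word_table() and build_word_table_in_locale() are hypothetical, and the table holds only a single "word character" flag per byte value, which is far simpler than the tables PCRE really uses.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE: fill a 256-entry table in which
   entry c is 1 if byte value c is a "word" character (letter, digit, or
   underscore) according to the LC_CTYPE locale that is current right now. */
static void build_word_table(unsigned char table[256])
{
    int c;
    assert(table != NULL);
    for (c = 0; c < 256; c++)
        table[c] = (unsigned char)(isalnum(c) != 0 || c == '_');
}

/* Hypothetical helper: select the named LC_CTYPE locale, then build the
   table. Returns 1 on success, or 0 if the locale is not available on
   this system (in which case the table is left untouched). */
static int build_word_table_in_locale(const char *name,
                                      unsigned char table[256])
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return 0;
    build_word_table(table);
    return 1;
}
```

Calling build_word_table_in_locale("C", table) marks only ASCII letters, digits, and underscore. Calling it with a single-byte French locale name, if one is installed, may also mark byte values above 127 such as accented letters. The table is a snapshot: changing the locale afterwards does not alter it, which is the same reason tables built for PCRE in one locale keep describing that locale.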
