Diff of /code/trunk/README

revision 23 by nigel, Sat Feb 24 21:38:41 2007 UTC revision 31 by nigel, Sat Feb 24 21:38:57 2007 UTC
# Line 8  README file for PCRE (Perl-compatible re Line 8  README file for PCRE (Perl-compatible re
8  * ovector is required at matching time, to provide some additional workspace. *  * ovector is required at matching time, to provide some additional workspace. *
9  * The new man page has details. This change was necessary in order to support *  * The new man page has details. This change was necessary in order to support *
10  * some of the new functionality in Perl 5.005.                                *  * some of the new functionality in Perl 5.005.                                *
11    *                                                                             *
12    *           IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00                   *
13    *                                                                             *
14    * Another (I hope this is the last!) change has been made to the API for the  *
15    * pcre_compile() function. An additional argument has been added to make it   *
16    * possible to pass over a pointer to character tables built in the current    *
17    * locale by pcre_maketables(). To use the default tables, this new argument   *
18    * should be passed as NULL.                                                   *
19  *******************************************************************************  *******************************************************************************
20    
21  The distribution should contain the following files:  The distribution should contain the following files:
22    
23    ChangeLog         log of changes to the code    ChangeLog         log of changes to the code
24      LICENCE           conditions for the use of PCRE
25    Makefile          for building PCRE    Makefile          for building PCRE
26    README            this file    README            this file
27    RunTest           a shell script for running tests    RunTest           a shell script for running tests
28    Tech.Notes        notes on the encoding    Tech.Notes        notes on the encoding
29    pcre.3            man page for the functions    pcre.3            man page for the functions
30    pcreposix.3       man page for the POSIX wrapper API    pcreposix.3       man page for the POSIX wrapper API
31    maketables.c      auxiliary program for building chartables.c    dftables.c        auxiliary program for building chartables.c
32      get.c             )
33      maketables.c      )
34    study.c           ) source of    study.c           ) source of
35    pcre.c            )   the functions    pcre.c            )   the functions
36    pcreposix.c       )    pcreposix.c       )
# Line 33  The distribution should contain the foll Line 44  The distribution should contain the foll
44    testinput         test data, compatible with Perl 5.004 and 5.005    testinput         test data, compatible with Perl 5.004 and 5.005
45    testinput2        test data for error messages and non-Perl things    testinput2        test data for error messages and non-Perl things
46    testinput3        test data, compatible with Perl 5.005    testinput3        test data, compatible with Perl 5.005
47      testinput4        test data for locale-specific tests
48    testoutput        test results corresponding to testinput    testoutput        test results corresponding to testinput
49    testoutput2       test results corresponding to testinput2    testoutput2       test results corresponding to testinput2
50    testoutput3       test results corresponding to testinpug3    testoutput3       test results corresponding to testinput3
51      testoutput4       test results corresponding to testinput4
52    
53  To build PCRE, edit Makefile for your system (it is a fairly simple make file,  To build PCRE, edit Makefile for your system (it is a fairly simple make file,
54  and there are some comments at the top) and then run it. It builds two  and there are some comments at the top) and then run it. It builds two
# Line 58  additional features of release 5.005, wh Line 71  additional features of release 5.005, wh
71  main test input, which needs only Perl 5.004. In the long run, when 5.005 is  main test input, which needs only Perl 5.004. In the long run, when 5.005 is
72  widespread, these two test files may get amalgamated.  widespread, these two test files may get amalgamated.
73    
74  The second set of tests check pcre_info(), pcre_study(), error detection and  The second set of tests check pcre_info(), pcre_study(), pcre_copy_substring(),
75  run-time flags that are specific to PCRE, as well as the POSIX wrapper API.  pcre_get_substring(), pcre_get_substring_list(), error detection and run-time
76    flags that are specific to PCRE, as well as the POSIX wrapper API.
77    
78    The fourth set of tests checks pcre_maketables(), the facility for building a
79    set of character tables for a specific locale and using them instead of the
80    default tables. The tests make use of the "fr" (French) locale. Before running
81    the test, the script checks for the presence of this locale by running the
82    "locale" command. If that command fails, or if it doesn't include "fr" in the
83    list of available locales, the fourth test cannot be run, and a comment is
84    output to say why. If running this test produces instances of the error
85    
86      ** Failed to set locale "fr"
87    
88    in the comparison output, it means that locale is not available on your system,
89    despite being listed by "locale". This does not mean that PCRE is broken.
90    
91  To install PCRE, copy libpcre.a to any suitable library directory (e.g.  To install PCRE, copy libpcre.a to any suitable library directory (e.g.
92  /usr/local/lib), pcre.h to any suitable include directory (e.g.  /usr/local/lib), pcre.h to any suitable include directory (e.g.
# Line 83  uses the POSIX API, it will have to be r Line 110  uses the POSIX API, it will have to be r
110  Character tables  Character tables
111  ----------------  ----------------
112    
113  PCRE uses four tables for manipulating and identifying characters. These are  PCRE uses four tables for manipulating and identifying characters. The final
114  compiled from a source file called chartables.c. This is not supplied in  argument of the pcre_compile() function is a pointer to a block of memory
115  the distribution, but is built by the program maketables (compiled from  containing the concatenated tables. A call to pcre_maketables() is used to
116  maketables.c), which uses the ANSI C character handling functions such as  generate a set of tables in the current locale. However, if the final argument
117  isalnum(), isalpha(), isupper(), islower(), etc. to build the table sources.  is passed as NULL, a set of default tables that is built into the binary is
118  This means that the default C locale set in your system may affect the contents  used.
119  of the tables. You can change the tables by editing chartables.c and then  
120  re-building PCRE. If you do this, you should probably also edit Makefile to  The source file called chartables.c contains the default set of tables. This is
121  ensure that the file doesn't ever get re-generated.  not supplied in the distribution, but is built by the program dftables
122    (compiled from dftables.c), which uses the ANSI C character handling functions
123  The first two tables pcre_lcc[] and pcre_fcc[] provide lower casing and a  such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
124  case flipping functions, respectively. The pcre_cbits[] table consists of four  sources. This means that the default C locale set in your system will control the
125  32-byte bit maps which identify digits, letters, "word" characters, and white  contents of the tables. You can change the default tables by editing
126  space, respectively. These are used when building 32-byte bit maps that  chartables.c and then re-building PCRE. If you do this, you should probably
127  represent character classes.  also edit Makefile to ensure that the file doesn't ever get re-generated.
128    
129    The first two 256-byte tables provide lower casing and case flipping functions,
130    respectively. The next table consists of three 32-byte bit maps which identify
131    digits, "word" characters, and white space, respectively. These are used when
132    building 32-byte bit maps that represent character classes.
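
The table layout described in the revision 31 text can be pictured with a
small sketch. This is not the dftables source: the struct, the function
names, and the bit order within each map are assumptions made for
illustration. Only the use of the ANSI C character handling functions, the
two 256-byte tables, and the three 32-byte bit maps come from the text above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: two 256-byte tables and three 32-byte bit maps. */
typedef struct {
    unsigned char lcc[256];    /* lower casing */
    unsigned char fcc[256];    /* case flipping */
    unsigned char digit[32];   /* bit map: digits */
    unsigned char word[32];    /* bit map: "word" characters */
    unsigned char space[32];   /* bit map: white space */
} sketch_tables;

/* Assumed bit order: character c lives in byte c/8, bit c%8. */
void sketch_set_bit(unsigned char *map, int c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

int sketch_test_bit(const unsigned char *map, int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Fill the tables from the ctype functions, so the result depends on
   whatever locale is current when this is called. */
void sketch_build_tables(sketch_tables *t)
{
    int c;

    assert(t != NULL);
    memset(t, 0, sizeof *t);
    for (c = 0; c < 256; c++) {
        t->lcc[c] = (unsigned char)tolower(c);
        t->fcc[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
        if (isdigit(c)) sketch_set_bit(t->digit, c);
        if (isalnum(c) || c == '_') sketch_set_bit(t->word, c);
        if (isspace(c)) sketch_set_bit(t->space, c);
    }
}
```

Because the tables are filled from the ctype functions, calling setlocale()
before sketch_build_tables() changes the contents, which is the behaviour
the text attributes to both dftables and pcre_maketables().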
133    
134  The pcre_ctypes[] table has bits indicating various character types, as  The final 256-byte table has bits indicating various character types, as
135  follows:  follows:
136    
137      1   white space character      1   white space character
# Line 128  The program handles any number of sets o Line 160  The program handles any number of sets o
160  set starts with a regular expression, and continues with any number of data  set starts with a regular expression, and continues with any number of data
161  lines to be matched against the pattern. An empty line signals the end of the  lines to be matched against the pattern. An empty line signals the end of the
162  set. The regular expressions are given enclosed in any non-alphameric  set. The regular expressions are given enclosed in any non-alphameric
163  delimiters, for example  delimiters other than backslash, for example
164    
165    /(a|bc)x+yz/    /(a|bc)x+yz/
166    
167  and may be followed by i, m, s, or x to set the PCRE_CASELESS, PCRE_MULTILINE,  White space before the initial delimiter is ignored. A regular expression may
168  PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These options have the  be continued over several input lines, in which case the newline characters are
169  same effect as they do in Perl.  included within it. See the testinput files for many examples. It is possible
170    to include the delimiter within the pattern by escaping it, for example
171    
172      /abc\/def/
173    
174    If you do so, the escape and the delimiter form part of the pattern, but since
175    delimiters are always non-alphameric, this does not affect its interpretation.
176    If the terminating delimiter is immediately followed by a backslash, for
177    example,
178    
179      /abc/\
180    
181    then a backslash is added to the end of the pattern. This provides a way of
182    testing the error condition that arises if a pattern finishes with a backslash,
183    because
184    
185      /abc\/
186    
187    is interpreted as the first line of a pattern that starts with "abc/", causing
188    pcretest to read the next line as a continuation of the regular expression.
189    
190    The pattern may be followed by i, m, s, or x to set the PCRE_CASELESS,
191    PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively. These
192    options have the same effect as they do in Perl.
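
The delimiter rules above can be sketched as a scan for the terminating
delimiter. This is not the pcretest source: the function name and the return
codes are hypothetical, and the sketch only locates the end of the pattern on
one input line, leaving the option letters and any continuation lines to the
caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical return codes for the sketch. */
enum { SKETCH_CONTINUES = -1, SKETCH_BAD_DELIMITER = -2 };

/* Return the offset in line of the terminating delimiter.
   SKETCH_CONTINUES means the line ended first, so the pattern carries on
   to the next input line. SKETCH_BAD_DELIMITER means the first
   non-space character cannot be used as a delimiter. */
long sketch_find_pattern_end(const char *line)
{
    const char *p = line;
    char delim;

    assert(line != NULL);

    /* White space before the initial delimiter is ignored. */
    while (isspace((unsigned char)*p)) p++;

    /* Delimiters are non-alphameric and may not be a backslash. */
    delim = *p;
    if (delim == '\0' || delim == '\\' || isalnum((unsigned char)delim))
        return SKETCH_BAD_DELIMITER;

    for (p++; *p != '\0'; p++) {
        /* A backslash escapes the next character, including the
           delimiter; both stay part of the pattern. */
        if (*p == '\\' && p[1] != '\0') { p++; continue; }
        if (*p == delim) return (long)(p - line);
    }
    return SKETCH_CONTINUES;
}
```

With this scan, /abc\/def/ ends at its final slash, /abc/\ ends at the second
slash with the backslash left over for the caller to append, and /abc\/ is
reported as continuing onto the next line.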
193    
194  There are also some upper case options that do not match Perl options: /A, /E,  There are also some upper case options that do not match Perl options: /A, /E,
195  and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.  and /X set PCRE_ANCHORED, PCRE_DOLLAR_ENDONLY, and PCRE_EXTRA respectively.
196  The /D option is a PCRE debugging feature. It causes the internal form of  
197  compiled regular expressions to be output after compilation. The /S option  The /L option must be followed directly by the name of a locale, for example,
198  causes pcre_study() to be called after the expression has been compiled, and  
199  the results used when the expression is matched.    /pattern/Lfr
200    
201    For this reason, it must be the last option letter. The given locale is set,
202    pcre_maketables() is called to build a set of character tables for the locale,
203    and this is then passed to pcre_compile() when compiling the regular
204    expression. Without an /L option, NULL is passed as the tables pointer; that
205    is, /L applies only to the expression on which it appears.
206    
207    The /I option requests that pcretest output information about the compiled
208    expression (whether it is anchored, has a fixed first character, and so on). It
209    does this by calling pcre_info() after compiling an expression, and outputting
210    the information it gets back. If the pattern is studied, the results of that
211    are also output.
212    
213    The /D option is a PCRE debugging feature, which also assumes /I. It causes the
214    internal form of compiled regular expressions to be output after compilation.
215    
216    The /S option causes pcre_study() to be called after the expression has been
217    compiled, and the results used when the expression is matched.
218    
219    The /M option causes information about the size of the memory block used to
220    hold the compiled pattern to be output.
221    
222  Finally, the /P option causes pcretest to call PCRE via the POSIX wrapper API  Finally, the /P option causes pcretest to call PCRE via the POSIX wrapper API
223  rather than its native API. When this is done, all other options except /i and  rather than its native API. When this is done, all other options except /i and
# Line 149  rather than its native API. When this is Line 225  rather than its native API. When this is
225  is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and  is present. The wrapper functions force PCRE_DOLLAR_ENDONLY always, and
226  PCRE_DOTALL unless REG_NEWLINE is set.  PCRE_DOTALL unless REG_NEWLINE is set.
227    
 A regular expression can extend over several lines of input; the newlines are  
 included in it. See the testinput files for many examples.  
   
228  Before each data line is passed to pcre_exec(), leading and trailing whitespace  Before each data line is passed to pcre_exec(), leading and trailing whitespace
229  is removed, and it is then scanned for \ escapes. The following are recognized:  is removed, and it is then scanned for \ escapes. The following are recognized:
230    
# Line 168  is removed, and it is then scanned for \ Line 241  is removed, and it is then scanned for \
241    
242    \A     pass the PCRE_ANCHORED option to pcre_exec()    \A     pass the PCRE_ANCHORED option to pcre_exec()
243    \B     pass the PCRE_NOTBOL option to pcre_exec()    \B     pass the PCRE_NOTBOL option to pcre_exec()
244      \Cdd   call pcre_copy_substring() for substring dd after a successful match
245               (any decimal number less than 32)
246      \Gdd   call pcre_get_substring() for substring dd after a successful match
247               (any decimal number less than 32)
248      \L     call pcre_get_substring_list() after a successful match
249    \Odd   set the size of the output vector passed to pcre_exec() to dd    \Odd   set the size of the output vector passed to pcre_exec() to dd
250             (any number of decimal digits)             (any number of decimal digits)
251    \Z     pass the PCRE_NOTEOL option to pcre_exec()    \Z     pass the PCRE_NOTEOL option to pcre_exec()
# Line 180  If /P was present on the regex, causing Line 258  If /P was present on the regex, causing
258  \B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to  \B, and \Z have any effect, causing REG_NOTBOL and REG_NOTEOL to be passed to
259  regexec() respectively.  regexec() respectively.
260    
261  When a match succeeds, pcretest outputs the list of identified substrings that  When a match succeeds, pcretest outputs the list of captured substrings that
262  pcre_exec() returns, starting with number 0 for the string that matched the  pcre_exec() returns, starting with number 0 for the string that matched the
263  whole pattern. Here is an example of an interactive pcretest run.  whole pattern. Here is an example of an interactive pcretest run.
264    
# Line 195  whole pattern. Here is an example of an Line 273  whole pattern. Here is an example of an
273    data> xyz    data> xyz
274    No match    No match
275    
276    If any of \C, \G, or \L are present in a data line that is successfully
277    matched, the substrings extracted by the convenience functions are output with
278    C, G, or L after the string number instead of a colon. This is in addition to
279    the normal full list. The string length (that is, the return from the
280    extraction function) is given in parentheses after each string for \C and \G.
281    
282  Note that while patterns can be continued over several lines (a plain ">"  Note that while patterns can be continued over several lines (a plain ">"
283  prompt is used for continuations), data lines may not. However newlines can be  prompt is used for continuations), data lines may not. However newlines can be
284  included in data by means of the \n escape.  included in data by means of the \n escape.
# Line 206  following flags has any effect in this c Line 290  following flags has any effect in this c
290  If the option -d is given to pcretest, it is equivalent to adding /D to each  If the option -d is given to pcretest, it is equivalent to adding /D to each
291  regular expression: the internal form is output after compilation.  regular expression: the internal form is output after compilation.
292    
293  If the option -i (for "information") is given to pcretest, it calls pcre_info()  If the option -i is given to pcretest, it is equivalent to adding /I to each
294  after compiling an expression, and outputs the information it gets back. If the  regular expression: information about the compiled pattern is given after
295  pattern is studied, the results of that are also output.  compilation.
296    
297  If the option -s is given to pcretest, it outputs the size of each compiled  If the option -m is given to pcretest, it outputs the size of each compiled
298  pattern after it has been compiled.  pattern after it has been compiled. It is equivalent to adding /M to each
299    regular expression. For compatibility with earlier versions of pcretest, -s is
300    a synonym for -m.
301    
302  If the -t option is given, each compile, study, and match is run 10000 times  If the -t option is given, each compile, study, and match is run 20000 times
303  while being timed, and the resulting time per compile or match is output in  while being timed, and the resulting time per compile or match is output in
304  milliseconds. Do not set -t with -s, because you will then get the size output  milliseconds. Do not set -t with -s, because you will then get the size output
305  10000 times and the timing will be distorted. If you want to change the number  20000 times and the timing will be distorted. If you want to change the number
306  of repetitions used for timing, edit the definition of LOOPREPEAT at the top of  of repetitions used for timing, edit the definition of LOOPREPEAT at the top of
307  pcretest.c  pcretest.c
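
A repeat-and-average timing loop of the kind described for -t might look like
the sketch below. It is not taken from pcretest.c: the names are hypothetical,
the use of clock() is an assumption, and SKETCH_LOOPREPEAT merely stands in
for the LOOPREPEAT definition mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for LOOPREPEAT; 20000 is the figure quoted for revision 31. */
#define SKETCH_LOOPREPEAT 20000

/* Run fn(arg) SKETCH_LOOPREPEAT times and return the average CPU time
   per call in milliseconds. */
double sketch_time_per_call_ms(void (*fn)(void *), void *arg)
{
    clock_t start, stop;
    long i;

    assert(fn != NULL);
    start = clock();
    for (i = 0; i < SKETCH_LOOPREPEAT; i++) fn(arg);
    stop = clock();

    return ((double)(stop - start) * 1000.0 / CLOCKS_PER_SEC)
           / SKETCH_LOOPREPEAT;
}

/* Trivial workload for trying the timer: counts how often it is called. */
void sketch_count_call(void *arg)
{
    ++*(long *)arg;
}
```

Any output produced inside the timed function is repeated on every
iteration, which is why the text warns against combining -t with the size
output.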
308    
# Line 237  for pcretest, and the special upper case Line 323  for pcretest, and the special upper case
323  recognizes are not used in this file. The output should be identical, apart  recognizes are not used in this file. The output should be identical, apart
324  from the initial identifying banner.  from the initial identifying banner.
325    
326  The testinput2 file is not suitable for feeding to Perltest, since it does  The testinput2 and testinput4 files are not suitable for feeding to Perltest,
327  make use of the special upper case options and escapes that pcretest uses to  since they do make use of the special upper case options and escapes that
328  test some features of PCRE. It also contains malformed regular expressions, in  pcretest uses to test some features of PCRE. The first of these files also
329  order to check that PCRE diagnoses them correctly.  contains malformed regular expressions, in order to check that PCRE diagnoses
330    them correctly.
331    
332  Philip Hazel <ph10@cam.ac.uk>  Philip Hazel <ph10@cam.ac.uk>
333  September 1998  February 1999
