
Diff of /code/trunk/doc/pcre.txt

revision 259 by ph10, Wed Sep 19 09:52:44 2007 UTC revision 289 by ph10, Sun Dec 23 12:17:20 2007 UTC
# Line 271  NAME Line 271  NAME
271  PCRE BUILD-TIME OPTIONS  PCRE BUILD-TIME OPTIONS
272    
273         This  document  describes  the  optional  features  of PCRE that can be         This  document  describes  the  optional  features  of PCRE that can be
274         selected when the library is compiled. They are all selected, or  dese-         selected when the library is compiled. It assumes use of the  configure
275         lected, by providing options to the configure script that is run before         script,  where the optional features are selected or deselected by pro-
276         the make command. The complete list of  options  for  configure  (which         viding options to configure before running the make  command.  However,
277         includes  the  standard  ones such as the selection of the installation         the  same  options  can be selected in both Unix-like and non-Unix-like
278         directory) can be obtained by running         environments using the GUI facility of  CMakeSetup  if  you  are  using
279           CMake instead of configure to build PCRE.
280    
281           The complete list of options for configure (which includes the standard
282           ones such as the  selection  of  the  installation  directory)  can  be
283           obtained by running
284    
285           ./configure --help           ./configure --help
286    
287         The following sections include  descriptions  of  options  whose  names         The  following  sections  include  descriptions  of options whose names
288         begin with --enable or --disable. These settings specify changes to the         begin with --enable or --disable. These settings specify changes to the
289         defaults for the configure command. Because of the way  that  configure         defaults  for  the configure command. Because of the way that configure
290         works,  --enable  and --disable always come in pairs, so the complemen-         works, --enable and --disable always come in pairs, so  the  complemen-
291         tary option always exists as well, but as it specifies the default,  it         tary  option always exists as well, but as it specifies the default, it
292         is not described.         is not described.
293    
294    
# Line 304  UTF-8 SUPPORT Line 309  UTF-8 SUPPORT
309    
310           --enable-utf8           --enable-utf8
311    
312         to  the  configure  command.  Of  itself, this does not make PCRE treat         to the configure command. Of itself, this  does  not  make  PCRE  treat
313         strings as UTF-8. As well as compiling PCRE with this option, you  also         strings  as UTF-8. As well as compiling PCRE with this option, you also
314         have to set the PCRE_UTF8 option when you call the pcre_compile()         have to set the PCRE_UTF8 option when you call the  pcre_compile()
315         function.         function.
316    
317    
318  UNICODE CHARACTER PROPERTY SUPPORT  UNICODE CHARACTER PROPERTY SUPPORT
319    
320         UTF-8 support allows PCRE to process character values greater than  255         UTF-8  support allows PCRE to process character values greater than 255
321         in  the  strings that it handles. On its own, however, it does not pro-         in the strings that it handles. On its own, however, it does  not  pro-
322         vide any facilities for accessing the properties of such characters. If         vide any facilities for accessing the properties of such characters. If
323         you  want  to  be able to use the pattern escapes \P, \p, and \X, which         you want to be able to use the pattern escapes \P, \p,  and  \X,  which
324         refer to Unicode character properties, you must add         refer to Unicode character properties, you must add
325    
326           --enable-unicode-properties           --enable-unicode-properties
327    
328         to the configure command. This implies UTF-8 support, even if you  have         to  the configure command. This implies UTF-8 support, even if you have
329         not explicitly requested it.         not explicitly requested it.
330    
331         Including  Unicode  property  support  adds around 30K of tables to the         Including Unicode property support adds around 30K  of  tables  to  the
332         PCRE library. Only the general category properties such as  Lu  and  Nd         PCRE  library.  Only  the general category properties such as Lu and Nd
333         are supported. Details are given in the pcrepattern documentation.         are supported. Details are given in the pcrepattern documentation.
334    
335    
336  CODE VALUE OF NEWLINE  CODE VALUE OF NEWLINE
337    
338         By  default,  PCRE interprets character 10 (linefeed, LF) as indicating         By default, PCRE interprets character 10 (linefeed, LF)  as  indicating
339         the end of a line. This is the normal newline  character  on  Unix-like         the  end  of  a line. This is the normal newline character on Unix-like
340         systems. You can compile PCRE to use character 13 (carriage return, CR)         systems. You can compile PCRE to use character 13 (carriage return, CR)
341         instead, by adding         instead, by adding
342    
343           --enable-newline-is-cr           --enable-newline-is-cr
344    
345         to the  configure  command.  There  is  also  a  --enable-newline-is-lf         to  the  configure  command.  There  is  also  a --enable-newline-is-lf
346         option, which explicitly specifies linefeed as the newline character.         option, which explicitly specifies linefeed as the newline character.
347    
348         Alternatively, you can specify that line endings are to be indicated by         Alternatively, you can specify that line endings are to be indicated by
# Line 349  CODE VALUE OF NEWLINE Line 354  CODE VALUE OF NEWLINE
354    
355           --enable-newline-is-anycrlf           --enable-newline-is-anycrlf
356    
357         which causes PCRE to recognize any of the three sequences  CR,  LF,  or         which  causes  PCRE  to recognize any of the three sequences CR, LF, or
358         CRLF as indicating a line ending. Finally, a fifth option, specified by         CRLF as indicating a line ending. Finally, a fifth option, specified by
359    
360           --enable-newline-is-any           --enable-newline-is-any
361    
362         causes PCRE to recognize any Unicode newline sequence.         causes PCRE to recognize any Unicode newline sequence.
363    
364         Whatever line ending convention is selected when PCRE is built  can  be         Whatever  line  ending convention is selected when PCRE is built can be
365         overridden  when  the library functions are called. At build time it is         overridden when the library functions are called. At build time  it  is
366         conventional to use the standard for your operating system.         conventional to use the standard for your operating system.
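A minimal C sketch of what each build-time newline convention recognizes at a given position in a subject string. It is not PCRE's code. The enum and function names are hypothetical, and the subject is assumed to be UTF-8 for the "any" case. The Unicode set used there (CR, LF, CRLF, VT, FF, NEL, LS, PS) comes from the wider PCRE documentation rather than from this section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the five build-time newline conventions. */
enum newline_convention { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Returns the length in bytes of the line ending that starts at p,
   or 0 if p does not start a line ending under convention nl.
   For NL_ANY the subject is assumed to be UTF-8. */
static size_t newline_length(const unsigned char *p, const unsigned char *end,
                             enum newline_convention nl)
{
    assert(p <= end);
    if (p == end)
        return 0;

    size_t left = (size_t)(end - p);
    int is_crlf = left >= 2 && p[0] == '\r' && p[1] == '\n';

    switch (nl) {
    case NL_CR:
        return p[0] == '\r';
    case NL_LF:
        return p[0] == '\n';
    case NL_CRLF:
        return is_crlf ? 2 : 0;
    case NL_ANYCRLF:
        if (is_crlf)
            return 2;
        return p[0] == '\r' || p[0] == '\n';
    case NL_ANY:
        if (is_crlf)
            return 2;
        if (p[0] >= 0x0a && p[0] <= 0x0d)            /* LF, VT, FF, CR */
            return 1;
        if (left >= 2 && p[0] == 0xc2 && p[1] == 0x85)
            return 2;                                 /* NEL, U+0085 */
        if (left >= 3 && p[0] == 0xe2 && p[1] == 0x80 &&
            (p[2] == 0xa8 || p[2] == 0xa9))
            return 3;                                 /* LS U+2028, PS U+2029 */
        return 0;
    }
    return 0;
}
```

Under NL_CR, for example, the LF in a CRLF pair is just an ordinary character, so newline_length() reports 1 rather than 2.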
367    
368    
369  WHAT \R MATCHES  WHAT \R MATCHES
370    
371         By default, the sequence \R in a pattern matches  any  Unicode  newline         By  default,  the  sequence \R in a pattern matches any Unicode newline
372         sequence,  whatever  has  been selected as the line ending sequence. If         sequence, whatever has been selected as the line  ending  sequence.  If
373         you specify         you specify
374    
375           --enable-bsr-anycrlf           --enable-bsr-anycrlf
376    
377         the default is changed so that \R matches only CR, LF, or  CRLF.  What-         the  default  is changed so that \R matches only CR, LF, or CRLF. What-
378         ever  is selected when PCRE is built can be overridden when the library         ever is selected when PCRE is built can be overridden when the  library
379         functions are called.         functions are called.
380    
381    
382  BUILDING SHARED AND STATIC LIBRARIES  BUILDING SHARED AND STATIC LIBRARIES
383    
384         The PCRE building process uses libtool to build both shared and  static         The  PCRE building process uses libtool to build both shared and static
385         Unix  libraries by default. You can suppress one of these by adding one         Unix libraries by default. You can suppress one of these by adding  one
386         of         of
387    
388           --disable-shared           --disable-shared
# Line 389  BUILDING SHARED AND STATIC LIBRARIES Line 394  BUILDING SHARED AND STATIC LIBRARIES
394  POSIX MALLOC USAGE  POSIX MALLOC USAGE
395    
396         When PCRE is called through the POSIX interface (see the pcreposix doc-         When PCRE is called through the POSIX interface (see the pcreposix doc-
397         umentation),  additional  working  storage  is required for holding the         umentation), additional working storage is  required  for  holding  the
398         pointers to capturing substrings, because PCRE requires three  integers         pointers  to capturing substrings, because PCRE requires three integers
399         per  substring,  whereas  the POSIX interface provides only two. If the         per substring, whereas the POSIX interface provides only  two.  If  the
400         number of expected substrings is small, the wrapper function uses space         number of expected substrings is small, the wrapper function uses space
401         on the stack, because this is faster than using malloc() for each call.         on the stack, because this is faster than using malloc() for each call.
402         The default threshold above which the stack is no longer used is 10; it         The default threshold above which the stack is no longer used is 10; it
# Line 404  POSIX MALLOC USAGE Line 409  POSIX MALLOC USAGE
409    
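The storage decision described in the POSIX malloc section can be sketched as follows. The matcher needs three ints per substring, but POSIX hands back only two values per substring in regmatch_t. The wrapper therefore uses a fixed stack buffer when the count is at or below the threshold and calls malloc() above it. fake_match() is a hypothetical stand-in for pcre_exec(), and SKETCH_MALLOC_THRESHOLD simply mirrors the documented default of 10. Neither is part of PCRE.

```c
#include <assert.h>
#include <regex.h>    /* regmatch_t and REG_ESPACE (POSIX) */
#include <stddef.h>
#include <stdlib.h>

/* Mirrors the documented default threshold of 10 substrings. */
#define SKETCH_MALLOC_THRESHOLD 10

/* Hypothetical stand-in for pcre_exec(): fills the first 2*nsub ints of
   ovec with start/end offsets. PCRE needs 3*nsub ints in total because it
   uses the final third of the vector as workspace. */
static void fake_match(int *ovec, size_t nsub)
{
    for (size_t i = 0; i < nsub; i++) {
        ovec[2 * i] = (int)i;          /* start offset */
        ovec[2 * i + 1] = (int)i + 1;  /* end offset */
    }
}

/* Returns 0 on success or REG_ESPACE if malloc() fails. *used_heap is set
   to 1 when the substring count exceeded the stack threshold. */
static int sketch_regexec(size_t nmatch, regmatch_t pmatch[], int *used_heap)
{
    int stack_vec[3 * SKETCH_MALLOC_THRESHOLD];   /* three ints per substring */
    int *ovec = stack_vec;

    assert(pmatch != NULL && used_heap != NULL);
    *used_heap = nmatch > SKETCH_MALLOC_THRESHOLD;
    if (*used_heap) {
        ovec = malloc(3 * nmatch * sizeof *ovec);
        if (ovec == NULL)
            return REG_ESPACE;
    }

    fake_match(ovec, nmatch);
    for (size_t i = 0; i < nmatch; i++) {         /* POSIX keeps only two */
        pmatch[i].rm_so = ovec[2 * i];
        pmatch[i].rm_eo = ovec[2 * i + 1];
    }

    if (ovec != stack_vec)
        free(ovec);
    return 0;
}
```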
410  HANDLING VERY LARGE PATTERNS  HANDLING VERY LARGE PATTERNS
411    
412         Within  a  compiled  pattern,  offset values are used to point from one         Within a compiled pattern, offset values are used  to  point  from  one
413         part to another (for example, from an opening parenthesis to an  alter-         part  to another (for example, from an opening parenthesis to an alter-
414         nation  metacharacter).  By default, two-byte values are used for these         nation metacharacter). By default, two-byte values are used  for  these
415         offsets, leading to a maximum size for a  compiled  pattern  of  around         offsets,  leading  to  a  maximum size for a compiled pattern of around
416         64K.  This  is sufficient to handle all but the most gigantic patterns.         64K. This is sufficient to handle all but the most  gigantic  patterns.
417         Nevertheless, some people do want to process enormous patterns,  so  it         Nevertheless,  some  people do want to process enormous patterns, so it
418         is  possible  to compile PCRE to use three-byte or four-byte offsets by         is possible to compile PCRE to use three-byte or four-byte  offsets  by
419         adding a setting such as         adding a setting such as
420    
421           --with-link-size=3           --with-link-size=3
422    
423         to the configure command. The value given must be 2,  3,  or  4.  Using         to  the  configure  command.  The value given must be 2, 3, or 4. Using
424         longer  offsets slows down the operation of PCRE because it has to load         longer offsets slows down the operation of PCRE because it has to  load
425         additional bytes when handling them.         additional bytes when handling them.
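The sketch below shows why the link size bounds the size of a compiled pattern and why longer offsets cost extra loads. It stores and reloads an offset in 2, 3, or 4 bytes. The most-significant-byte-first layout and the function names are assumptions for illustration, not taken from PCRE's source.

```c
#include <assert.h>
#include <stdint.h>

/* Largest offset that link_size bytes can hold. */
static uint64_t max_link(int link_size)
{
    return (UINT64_C(1) << (8 * link_size)) - 1;
}

/* Store offset in link_size bytes, most significant byte first. */
static void put_link(unsigned char *dst, int link_size, uint32_t offset)
{
    assert(link_size >= 2 && link_size <= 4);
    assert(offset <= max_link(link_size));   /* would not fit otherwise */
    for (int i = link_size - 1; i >= 0; i--) {
        dst[i] = (unsigned char)(offset & 0xff);
        offset >>= 8;
    }
}

/* Reload the offset: each extra byte of link size is one more load. */
static uint32_t get_link(const unsigned char *src, int link_size)
{
    uint32_t offset = 0;
    for (int i = 0; i < link_size; i++)
        offset = (offset << 8) | src[i];
    return offset;
}
```

With a link size of 2, max_link() gives 65535, which corresponds to the "around 64K" limit mentioned above. Three and four bytes raise the largest encodable offset to 16777215 and 4294967295.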
426    
427    
428  AVOIDING EXCESSIVE STACK USAGE  AVOIDING EXCESSIVE STACK USAGE
429    
430         When matching with the pcre_exec() function, PCRE implements backtrack-         When matching with the pcre_exec() function, PCRE implements backtrack-
431         ing  by  making recursive calls to an internal function called match().         ing by making recursive calls to an internal function  called  match().
432         In environments where the size of the stack is limited,  this  can  se-         In  environments  where  the size of the stack is limited, this can se-
433         verely  limit  PCRE's operation. (The Unix environment does not usually         verely limit PCRE's operation. (The Unix environment does  not  usually
434         suffer from this problem, but it may sometimes be necessary to increase         suffer from this problem, but it may sometimes be necessary to increase
435         the  maximum  stack size.  There is a discussion in the pcrestack docu-         the maximum stack size.  There is a discussion in the  pcrestack  docu-
436         mentation.) An alternative approach to recursion that uses memory  from         mentation.)  An alternative approach to recursion that uses memory from
437         the  heap  to remember data, instead of using recursive function calls,         the heap to remember data, instead of using recursive  function  calls,
438         has been implemented to work round the problem of limited  stack  size.         has  been  implemented to work round the problem of limited stack size.
439         If you want to build a version of PCRE that works this way, add         If you want to build a version of PCRE that works this way, add
440    
441           --disable-stack-for-recursion           --disable-stack-for-recursion
442    
443         to  the  configure  command. With this configuration, PCRE will use the         to the configure command. With this configuration, PCRE  will  use  the
444         pcre_stack_malloc and pcre_stack_free variables to call memory  manage-         pcre_stack_malloc  and pcre_stack_free variables to call memory manage-
445         ment  functions. By default these point to malloc() and free(), but you         ment functions. By default these point to malloc() and free(), but  you
446         can replace the pointers so that your own functions are used.         can replace the pointers so that your own functions are used.
447    
448         Separate functions are  provided  rather  than  using  pcre_malloc  and         Separate  functions  are  provided  rather  than  using pcre_malloc and
449         pcre_free  because  the  usage  is  very  predictable:  the block sizes         pcre_free because the  usage  is  very  predictable:  the  block  sizes
450         requested are always the same, and  the  blocks  are  always  freed  in         requested  are  always  the  same,  and  the blocks are always freed in
451         reverse  order.  A calling program might be able to implement optimized         reverse order. A calling program might be able to  implement  optimized
452         functions that perform better  than  malloc()  and  free().  PCRE  runs         functions  that  perform  better  than  malloc()  and free(). PCRE runs
453         noticeably more slowly when built in this way. This option affects only         noticeably more slowly when built in this way. This option affects only
454         the  pcre_exec()  function;  it   is   not   relevant   for   the         the   pcre_exec()   function;   it   is   not   relevant  for  the
455         pcre_dfa_exec() function.         pcre_dfa_exec() function.
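Because the blocks are freed in reverse order of allocation, a replacement can be as simple as a bump allocator over a static arena, where freeing a block just resets the top offset. The sketch assumes single-threaded use and a hypothetical 256K arena, and it falls back to malloc() when the arena is full. lifo_malloc() and lifo_free() are illustrative names whose signatures match the pcre_stack_malloc and pcre_stack_free pointers.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_BYTES (256 * 1024)   /* hypothetical size; tune to your patterns */

static alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static size_t arena_top;           /* offset of the first unused byte */

/* Same shape as pcre_stack_malloc: void *(*)(size_t). */
static void *lifo_malloc(size_t size)
{
    const size_t align = alignof(max_align_t);
    if (size == 0)
        size = 1;                  /* keep every block inside the arena */
    if (size <= ARENA_BYTES) {
        size_t rounded = (size + align - 1) / align * align;
        if (rounded <= ARENA_BYTES - arena_top) {
            void *block = arena + arena_top;
            arena_top += rounded;
            return block;
        }
    }
    return malloc(size);           /* arena full: fall back to the heap */
}

/* Same shape as pcre_stack_free: void (*)(void *). Correct only because
   blocks are released in reverse order of allocation. */
static void lifo_free(void *block)
{
    uintptr_t p = (uintptr_t)block, base = (uintptr_t)arena;
    if (block != NULL && p >= base && p < base + ARENA_BYTES) {
        assert(p - base <= arena_top);      /* must not lie above the top */
        arena_top = (size_t)(p - base);     /* pop this block */
    } else {
        free(block);
    }
}

/* With a PCRE built using --disable-stack-for-recursion, the sketch would be
   installed before matching (needs pcre.h and the PCRE library):
       pcre_stack_malloc = lifo_malloc;
       pcre_stack_free = lifo_free;                                        */
```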
456    
457    
458  LIMITING PCRE RESOURCE USAGE  LIMITING PCRE RESOURCE USAGE
459    
460         Internally,  PCRE has a function called match(), which it calls repeat-         Internally, PCRE has a function called match(), which it calls  repeat-
461         edly  (sometimes  recursively)  when  matching  a  pattern   with   the         edly   (sometimes   recursively)  when  matching  a  pattern  with  the
462         pcre_exec()  function.  By controlling the maximum number of times this         pcre_exec() function. By controlling the maximum number of  times  this
463         function may be called during a single matching operation, a limit  can         function  may be called during a single matching operation, a limit can
464         be  placed  on  the resources used by a single call to pcre_exec(). The         be placed on the resources used by a single call  to  pcre_exec().  The
465         limit can be changed at run time, as described in the pcreapi  documen-         limit  can be changed at run time, as described in the pcreapi documen-
466         tation.  The default is 10 million, but this can be changed by adding a         tation. The default is 10 million, but this can be changed by adding  a
467         setting such as         setting such as
468    
469           --with-match-limit=500000           --with-match-limit=500000
470    
471         to  the  configure  command.  This  setting  has  no  effect   on   the         to   the   configure  command.  This  setting  has  no  effect  on  the
472         pcre_dfa_exec() matching function.         pcre_dfa_exec() matching function.
473    
474         In  some  environments  it is desirable to limit the depth of recursive         In some environments it is desirable to limit the  depth  of  recursive
475         calls of match() more strictly than the total number of calls, in order         calls of match() more strictly than the total number of calls, in order
476         to  restrict  the maximum amount of stack (or heap, if --disable-stack-         to restrict the maximum amount of stack (or heap,  if  --disable-stack-
477         for-recursion is specified) that is used. A second limit controls this;         for-recursion is specified) that is used. A second limit controls this;
478         it  defaults  to  the  value  that is set for --with-match-limit, which         it defaults to the value that  is  set  for  --with-match-limit,  which
479         imposes no additional constraints. However, you can set a  lower  limit         imposes  no  additional constraints. However, you can set a lower limit
480         by adding, for example,         by adding, for example,
481    
482           --with-match-limit-recursion=10000           --with-match-limit-recursion=10000
483    
484         to  the  configure  command.  This  value can also be overridden at run         to the configure command. This value can  also  be  overridden  at  run
485         time.         time.
486    
487    
488  CREATING CHARACTER TABLES AT BUILD TIME  CREATING CHARACTER TABLES AT BUILD TIME
489    
490         PCRE uses fixed tables for processing characters whose code values  are         PCRE  uses fixed tables for processing characters whose code values are
491         less  than 256. By default, PCRE is built with a set of tables that are         less than 256. By default, PCRE is built with a set of tables that  are
492         distributed in the file pcre_chartables.c.dist. These  tables  are  for         distributed  in  the  file pcre_chartables.c.dist. These tables are for
493         ASCII codes only. If you add         ASCII codes only. If you add
494    
495           --enable-rebuild-chartables           --enable-rebuild-chartables
496    
497         to  the  configure  command, the distributed tables are no longer used.         to the configure command, the distributed tables are  no  longer  used.
498         Instead, a program called dftables is compiled and  run.  This  outputs         Instead,  a  program  called dftables is compiled and run. This outputs
499         the source for a new set of tables, created in the default locale of your         the source for a new set of tables, created in the default locale of your
500         C runtime system. (This method of replacing the tables does not work if         C runtime system. (This method of replacing the tables does not work if
501         you  are cross compiling, because dftables is run on the local host. If         you are cross compiling, because dftables is run on the local host.  If
502         you need to create alternative tables when cross  compiling,  you  will         you  need  to  create alternative tables when cross compiling, you will
503         have to do so "by hand".)         have to do so "by hand".)
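The idea of building tables at build time can be sketched in a few lines: derive a table from the C library's locale-dependent character functions and write it out as C source. PCRE's real tables hold more than this (for example, case-flipping and character-type information). The function and table names here are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Fill a 256-entry lower-casing table using tolower() in the current locale. */
static void build_lower_table(unsigned char table[256])
{
    assert(table != NULL);
    for (int c = 0; c < 256; c++)
        table[c] = (unsigned char)tolower(c);
}

/* Write the table out as C source, eight entries per line. */
static void print_table(FILE *out, const unsigned char table[256])
{
    fputs("static const unsigned char lower_table[256] = {\n", out);
    for (int c = 0; c < 256; c++)
        fprintf(out, "%3u,%s", (unsigned)table[c], c % 8 == 7 ? "\n" : " ");
    fputs("};\n", out);
}
```

The generated table reflects whichever locale is current when build_lower_table() runs. A generator could call setlocale(LC_ALL, "") from <locale.h> first to adopt the environment's locale. In the default "C" locale, only A to Z are mapped.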
504    
505    
506  USING EBCDIC CODE  USING EBCDIC CODE
507    
508         PCRE  assumes  by  default that it will run in an environment where the         PCRE assumes by default that it will run in an  environment  where  the
509         character code is ASCII (or Unicode, which is  a  superset  of  ASCII).         character  code  is  ASCII  (or Unicode, which is a superset of ASCII).
510         This  is  the  case for most computer operating systems. PCRE can, how-         This is the case for most computer operating systems.  PCRE  can,  how-
511         ever, be compiled to run in an EBCDIC environment by adding         ever, be compiled to run in an EBCDIC environment by adding
512    
513           --enable-ebcdic           --enable-ebcdic
514    
515         to the configure command. This setting implies --enable-rebuild-charta-         to the configure command. This setting implies --enable-rebuild-charta-
516         bles.  You  should  only  use  it if you know that you are in an EBCDIC         bles. You should only use it if you know that  you  are  in  an  EBCDIC
517         environment (for example, an IBM mainframe operating system).         environment (for example, an IBM mainframe operating system).
518    
519    
520    PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT
521    
522           By default, pcregrep reads all files as plain text. You can build it so
523           that it recognizes files whose names end in .gz or .bz2, and reads them
524           with libz or libbz2, respectively, by adding one or both of
525    
526             --enable-pcregrep-libz
527             --enable-pcregrep-libbz2
528    
529           to the configure command. These options naturally require that the rel-
530           evant libraries are installed on your system. Configuration  will  fail
531           if they are not.
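A sketch of the file-name test described above, choosing a reader from the suffix. The enum and function names are hypothetical. The sketch assumes that both options were enabled and that suffixes are compared case-sensitively. The libz and libbz2 calls are left out so that the block needs no extra libraries.

```c
#include <assert.h>
#include <string.h>

enum input_kind { INPUT_PLAIN, INPUT_GZIP, INPUT_BZIP2 };

/* Nonzero if name ends with suffix. */
static int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* .gz would be read with libz, .bz2 with libbz2, anything else as plain text. */
static enum input_kind classify_input(const char *name)
{
    assert(name != NULL);
    if (ends_with(name, ".gz"))
        return INPUT_GZIP;
    if (ends_with(name, ".bz2"))
        return INPUT_BZIP2;
    return INPUT_PLAIN;
}
```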
532    
533    
534    PCRETEST OPTION FOR LIBREADLINE SUPPORT
535    
536           If you add
537    
538             --enable-pcretest-libreadline
539    
540           to  the  configure  command,  pcretest  is  linked with the libreadline
541           library, and when its input is from a terminal, it reads it  using  the
542           readline() function. This provides line-editing and history facilities.
543           Note that libreadline is GPL-licenced, so if you distribute a binary of
544           pcretest linked in this way, there may be licensing issues.
545    
546    
547  SEE ALSO  SEE ALSO
548    
549         pcreapi(3), pcre_config(3).         pcreapi(3), pcre_config(3).
# Line 526  AUTHOR Line 558  AUTHOR
558    
559  REVISION  REVISION
560    
561         Last updated: 11 September 2007         Last updated: 18 December 2007
562         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
563  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
564    
# Line 1570  INFORMATION ABOUT A PATTERN Line 1602  INFORMATION ABOUT A PATTERN
1602    
1603           PCRE_INFO_JCHANGED           PCRE_INFO_JCHANGED
1604    
1605         Return 1 if the (?J) option setting is used in the  pattern,  otherwise         Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,
1606         0. The fourth argument should point to an int variable. The (?J) inter-         otherwise  0. The fourth argument should point to an int variable. (?J)
1607         nal option setting changes the local PCRE_DUPNAMES option.         and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.
1608    
1609           PCRE_INFO_LASTLITERAL           PCRE_INFO_LASTLITERAL
1610    
# Line 2525  AUTHOR Line 2557  AUTHOR
2557    
2558  REVISION  REVISION
2559    
2560         Last updated: 11 September 2007         Last updated: 27 November 2007
2561         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
2562  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
2563    
# Line 4991  CHARACTER CLASSES Line 5023  CHARACTER CLASSES
5023           [^...]      negative character class           [^...]      negative character class
5024           [x-y]       range (can be used for hex characters)           [x-y]       range (can be used for hex characters)
5025           [[:xxx:]]   positive POSIX named set           [[:xxx:]]   positive POSIX named set
5026           [[^:xxx:]]  negative POSIX named set           [[:^xxx:]]  negative POSIX named set
5027    
5028           alnum       alphanumeric           alnum       alphanumeric
5029           alpha       alphabetic           alpha       alphabetic
# Line 5161  BACKTRACKING CONTROL Line 5193  BACKTRACKING CONTROL
5193    
5194  NEWLINE CONVENTIONS  NEWLINE CONVENTIONS
5195    
5196         These are recognized only at the very start of a pattern.         These  are  recognized only at the very start of the pattern or after a
5197           (*BSR_...) option.
5198    
5199           (*CR)           (*CR)
5200           (*LF)           (*LF)
# Line 5172  NEWLINE CONVENTIONS Line 5205  NEWLINE CONVENTIONS
5205    
5206  WHAT \R MATCHES  WHAT \R MATCHES
5207    
5208         These are recognized only at the very start of a pattern.         These are recognized only at the very start of the pattern or  after  a
5209           (*...) option that sets the newline convention.
5210    
5211           (*BSR_ANYCRLF)           (*BSR_ANYCRLF)
5212           (*BSR_UNICODE)           (*BSR_UNICODE)
# Line 5198  AUTHOR Line 5232  AUTHOR
5232    
5233  REVISION  REVISION
5234    
5235         Last updated: 11 September 2007         Last updated: 14 November 2007
5236         Copyright (c) 1997-2007 University of Cambridge.         Copyright (c) 1997-2007 University of Cambridge.
5237  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
5238    
# Line 6002  MATCHING INTERFACE Line 6036  MATCHING INTERFACE
6036    
6037           c. The "i"th argument has a suitable type for holding the           c. The "i"th argument has a suitable type for holding the
6038              string captured as the "i"th sub-pattern. If you pass in              string captured as the "i"th sub-pattern. If you pass in
6039              NULL for the "i"th argument, or pass fewer arguments than              void * NULL for the "i"th argument, or a non-void * NULL
6040                of the correct type, or pass fewer arguments than the
6041              number of sub-patterns, "i"th captured sub-pattern is              number of sub-patterns, "i"th captured sub-pattern is
6042              ignored.              ignored.
6043    
# Line 6250  AUTHOR Line 6285  AUTHOR
6285    
6286  REVISION  REVISION
6287    
6288         Last updated: 06 March 2007         Last updated: 12 November 2007
6289  ------------------------------------------------------------------------------  ------------------------------------------------------------------------------
6290    
6291    
