Diff of /code/trunk/doc/pcre.txt

revision 869 by ph10, Sat Jan 14 11:16:23 2012 UTC revision 878 by ph10, Sun Jan 15 15:44:47 2012 UTC
# Line 554  UTF-8 and UTF-16 SUPPORT Line 554  UTF-8 and UTF-16 SUPPORT
554    
555         to the configure command.  This  setting  applies  to  both  libraries,         to the configure command.  This  setting  applies  to  both  libraries,
556         adding support for UTF-8 to the 8-bit library and support for UTF-16 to         adding support for UTF-8 to the 8-bit library and support for UTF-16 to
557         the 16-bit library. It is not possible to build one  library  with  UTF         the 16-bit library. There are no separate options  for  enabling  UTF-8
558         support and the other without in the same configuration. (For backwards         and  UTF-16  independently because that would allow ridiculous settings
559         compatibility, --enable-utf8 is a synonym of --enable-utf.)         such as  requesting  UTF-16  support  while  building  only  the  8-bit
560           library.  It  is not possible to build one library with UTF support and
561           the other without in the same configuration. (For backwards compatibil-
562           ity, --enable-utf8 is a synonym of --enable-utf.)
563    
564         Of itself, this setting does not make PCRE treat strings  as  UTF-8  or         Of  itself,  this  setting does not make PCRE treat strings as UTF-8 or
565         UTF-16.  As well as compiling PCRE with this option, you  also  have         UTF-16. As well as compiling PCRE with this option,  you  also  have
566         to set the PCRE_UTF8 or PCRE_UTF16 option when you call one of the pat-         to set the PCRE_UTF8 or PCRE_UTF16 option when you call one of the pat-
567         tern compiling functions.         tern compiling functions.
568    
569         If  you  set --enable-utf when compiling in an EBCDIC environment, PCRE         If you set --enable-utf when compiling in an EBCDIC  environment,  PCRE
570         expects its input to be either ASCII or UTF-8 (depending on the runtime         expects its input to be either ASCII or UTF-8 (depending on the runtime
571         option).  It  is not possible to support both EBCDIC and UTF-8 codes in         option). It is not possible to support both EBCDIC and UTF-8  codes  in
572         the  same  version  of  the  library.  Consequently,  --enable-utf  and         the  same  version  of  the  library.  Consequently,  --enable-utf  and
573         --enable-ebcdic are mutually exclusive.         --enable-ebcdic are mutually exclusive.
574    
575    
576  UNICODE CHARACTER PROPERTY SUPPORT  UNICODE CHARACTER PROPERTY SUPPORT
577    
578         UTF  support allows the libraries to process character codepoints up to         UTF support allows the libraries to process character codepoints up  to
579         0x10ffff in the strings that they handle. On its own, however, it  does         0x10ffff  in the strings that they handle. On its own, however, it does
580         not provide any facilities for accessing the properties of such charac-         not provide any facilities for accessing the properties of such charac-
581         ters. If you want to be able to use the pattern escapes \P, \p, and \X,         ters. If you want to be able to use the pattern escapes \P, \p, and \X,
582         which refer to Unicode character properties, you must add         which refer to Unicode character properties, you must add
583    
584           --enable-unicode-properties           --enable-unicode-properties
585    
586         to  the  configure  command. This implies UTF support, even if you have         to the configure command. This implies UTF support, even  if  you  have
587         not explicitly requested it.         not explicitly requested it.
588    
589         Including Unicode property support adds around 30K  of  tables  to  the         Including  Unicode  property  support  adds around 30K of tables to the
590         PCRE  library.  Only  the general category properties such as Lu and Nd         PCRE library. Only the general category properties such as  Lu  and  Nd
591         are supported. Details are given in the pcrepattern documentation.         are supported. Details are given in the pcrepattern documentation.
592    
593    
# Line 594  JUST-IN-TIME COMPILER SUPPORT Line 597  JUST-IN-TIME COMPILER SUPPORT
597    
598           --enable-jit           --enable-jit
599    
600         This support is available only for certain hardware  architectures.  If         This  support  is available only for certain hardware architectures. If
601         this  option  is  set  for  an unsupported architecture, a compile time         this option is set for an  unsupported  architecture,  a  compile  time
602         error occurs.  See the pcrejit documentation for a  discussion  of  JIT         error  occurs.   See  the pcrejit documentation for a discussion of JIT
603         usage. When JIT support is enabled, pcregrep automatically makes use of         usage. When JIT support is enabled, pcregrep automatically makes use of
604         it, unless you add         it, unless you add
605    
# Line 607  JUST-IN-TIME COMPILER SUPPORT Line 610  JUST-IN-TIME COMPILER SUPPORT
610    
611  CODE VALUE OF NEWLINE  CODE VALUE OF NEWLINE
612    
613         By default, PCRE interprets the linefeed (LF) character  as  indicating         By  default,  PCRE interprets the linefeed (LF) character as indicating
614         the  end  of  a line. This is the normal newline character on Unix-like         the end of a line. This is the normal newline  character  on  Unix-like
615         systems. You can compile PCRE to use carriage return (CR)  instead,  by         systems.  You  can compile PCRE to use carriage return (CR) instead, by
616         adding         adding
617    
618           --enable-newline-is-cr           --enable-newline-is-cr
619    
620         to  the  configure  command.  There  is  also  a --enable-newline-is-lf         to the  configure  command.  There  is  also  a  --enable-newline-is-lf
621         option, which explicitly specifies linefeed as the newline character.         option, which explicitly specifies linefeed as the newline character.
622    
623         Alternatively, you can specify that line endings are to be indicated by         Alternatively, you can specify that line endings are to be indicated by
# Line 626  CODE VALUE OF NEWLINE Line 629  CODE VALUE OF NEWLINE
629    
630           --enable-newline-is-anycrlf           --enable-newline-is-anycrlf
631    
632         which  causes  PCRE  to recognize any of the three sequences CR, LF, or         which causes PCRE to recognize any of the three sequences  CR,  LF,  or
633         CRLF as indicating a line ending. Finally, a fifth option, specified by         CRLF as indicating a line ending. Finally, a fifth option, specified by
634    
635           --enable-newline-is-any           --enable-newline-is-any
636    
637         causes PCRE to recognize any Unicode newline sequence.         causes PCRE to recognize any Unicode newline sequence.
638    
639         Whatever line ending convention is selected when PCRE is built  can  be         Whatever  line  ending convention is selected when PCRE is built can be
640         overridden  when  the library functions are called. At build time it is         overridden when the library functions are called. At build time  it  is
641         conventional to use the standard for your operating system.         conventional to use the standard for your operating system.
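
The "anycrlf" convention described above can be illustrated with a small C sketch. This is not PCRE code: the helper name anycrlf_newline_length is hypothetical, and it only shows the rule that CR, LF, and CRLF each count as a single line ending, with CRLF treated as one two-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a PCRE function. Returns the length in bytes of
   the line ending that starts at p under the "anycrlf" convention (CR, LF,
   or CRLF), or 0 if no line ending starts there. end points one past the
   last byte of the subject. */
static size_t anycrlf_newline_length(const char *p, const char *end)
{
    assert(p != NULL && end != NULL && p <= end);
    if (p == end)
        return 0;
    if (*p == '\r')                      /* CR, possibly followed by LF */
        return (end - p >= 2 && p[1] == '\n') ? 2 : 1;
    if (*p == '\n')                      /* bare LF */
        return 1;
    return 0;                            /* not a line ending */
}
```

A scanner built on such a helper would advance by the returned length whenever it is non-zero, so that CRLF is never counted as two separate line endings.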
642    
643    
644  WHAT \R MATCHES  WHAT \R MATCHES
645    
646         By default, the sequence \R in a pattern matches  any  Unicode  newline         By  default,  the  sequence \R in a pattern matches any Unicode newline
647         sequence,  whatever  has  been selected as the line ending sequence. If         sequence, whatever has been selected as the line  ending  sequence.  If
648         you specify         you specify
649    
650           --enable-bsr-anycrlf           --enable-bsr-anycrlf
651    
652         the default is changed so that \R matches only CR, LF, or  CRLF.  What-         the  default  is changed so that \R matches only CR, LF, or CRLF. What-
653         ever  is selected when PCRE is built can be overridden when the library         ever is selected when PCRE is built can be overridden when the  library
654         functions are called.         functions are called.
655    
656    
657  POSIX MALLOC USAGE  POSIX MALLOC USAGE
658    
659         When the 8-bit library is called through the POSIX interface  (see  the         When  the  8-bit library is called through the POSIX interface (see the
660         pcreposix  documentation),  additional  working storage is required for         pcreposix documentation), additional working storage  is  required  for
661         holding the pointers to capturing  substrings,  because  PCRE  requires         holding  the  pointers  to  capturing substrings, because PCRE requires
662         three integers per substring, whereas the POSIX interface provides only         three integers per substring, whereas the POSIX interface provides only
663         two. If the number of expected substrings is small, the  wrapper  func-         two.  If  the number of expected substrings is small, the wrapper func-
664         tion  uses  space  on the stack, because this is faster than using mal-         tion uses space on the stack, because this is faster  than  using  mal-
665         loc() for each call. The default threshold above which the stack is  no         loc()  for each call. The default threshold above which the stack is no
666         longer used is 10; it can be changed by adding a setting such as         longer used is 10; it can be changed by adding a setting such as
667    
668           --with-posix-malloc-threshold=20           --with-posix-malloc-threshold=20
# Line 669  POSIX MALLOC USAGE Line 672  POSIX MALLOC USAGE
672    
673  HANDLING VERY LARGE PATTERNS  HANDLING VERY LARGE PATTERNS
674    
675         Within  a  compiled  pattern,  offset values are used to point from one         Within a compiled pattern, offset values are used  to  point  from  one
676         part to another (for example, from an opening parenthesis to an  alter-         part  to another (for example, from an opening parenthesis to an alter-
677         nation  metacharacter).  By default, two-byte values are used for these         nation metacharacter). By default, two-byte values are used  for  these
678         offsets, leading to a maximum size for a  compiled  pattern  of  around         offsets,  leading  to  a  maximum size for a compiled pattern of around
679         64K.  This  is sufficient to handle all but the most gigantic patterns.         64K. This is sufficient to handle all but the most  gigantic  patterns.
680         Nevertheless, some people do want to process truly  enormous  patterns,         Nevertheless,  some  people do want to process truly enormous patterns,
681         so  it  is possible to compile PCRE to use three-byte or four-byte off-         so it is possible to compile PCRE to use three-byte or  four-byte  off-
682         sets by adding a setting such as         sets by adding a setting such as
683    
684           --with-link-size=3           --with-link-size=3
685    
686         to the configure command. The value given must be 2, 3, or 4.  For  the         to  the  configure command. The value given must be 2, 3, or 4. For the
687         16-bit  library,  a value of 3 is rounded up to 4. Using longer offsets         16-bit library, a value of 3 is rounded up to 4. Using  longer  offsets
688         slows down the operation of PCRE because it has to load additional data         slows down the operation of PCRE because it has to load additional data
689         when handling them.         when handling them.
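
The "around 64K" figure follows from the width of the offset field. The C sketch below is only arithmetic under the assumption that an offset of n bytes can hold values up to 2^(8n) - 1; the function names are hypothetical and are not part of PCRE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest value that fits in an offset field of
   link_size bytes (2, 3, or 4, as accepted by --with-link-size). */
static uint64_t max_link_offset(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return (UINT64_C(1) << (8 * link_size)) - 1;
}

/* Hypothetical helper: the link size actually used by the 16-bit library,
   where a requested value of 3 is rounded up to 4. */
static int effective_link_size_16bit(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return link_size == 3 ? 4 : link_size;
}
```

With these assumptions, a link size of 2 gives a largest offset of 65535, which matches the "around 64K" limit, while 3 and 4 give 16777215 and 4294967295.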
690    
# Line 689  HANDLING VERY LARGE PATTERNS Line 692  HANDLING VERY LARGE PATTERNS
692  AVOIDING EXCESSIVE STACK USAGE  AVOIDING EXCESSIVE STACK USAGE
693    
694         When matching with the pcre_exec() function, PCRE implements backtrack-         When matching with the pcre_exec() function, PCRE implements backtrack-
695         ing by making recursive calls to an internal function  called  match().         ing  by  making recursive calls to an internal function called match().
696         In  environments  where  the size of the stack is limited, this can se-         In environments where the size of the stack is limited,  this  can  se-
697         verely limit PCRE's operation. (The Unix environment does  not  usually         verely  limit  PCRE's operation. (The Unix environment does not usually
698         suffer from this problem, but it may sometimes be necessary to increase         suffer from this problem, but it may sometimes be necessary to increase
699         the maximum stack size.  There is a discussion in the  pcrestack  docu-         the  maximum  stack size.  There is a discussion in the pcrestack docu-
700         mentation.)  An alternative approach to recursion that uses memory from         mentation.) An alternative approach to recursion that uses memory  from
701         the heap to remember data, instead of using recursive  function  calls,         the  heap  to remember data, instead of using recursive function calls,
702         has  been  implemented to work round the problem of limited stack size.         has been implemented to work round the problem of limited  stack  size.
703         If you want to build a version of PCRE that works this way, add         If you want to build a version of PCRE that works this way, add
704    
705           --disable-stack-for-recursion           --disable-stack-for-recursion
706    
707         to the configure command. With this configuration, PCRE  will  use  the         to  the  configure  command. With this configuration, PCRE will use the
708         pcre_stack_malloc  and pcre_stack_free variables to call memory manage-         pcre_stack_malloc and pcre_stack_free variables to call memory  manage-
709         ment functions. By default these point to malloc() and free(), but  you         ment  functions. By default these point to malloc() and free(), but you
710         can replace the pointers so that your own functions are used instead.         can replace the pointers so that your own functions are used instead.
711    
712         Separate  functions  are  provided  rather  than  using pcre_malloc and         Separate functions are  provided  rather  than  using  pcre_malloc  and
713         pcre_free because the  usage  is  very  predictable:  the  block  sizes         pcre_free  because  the  usage  is  very  predictable:  the block sizes
714         requested  are  always  the  same,  and  the blocks are always freed in         requested are always the same, and  the  blocks  are  always  freed  in
715         reverse order. A calling program might be able to  implement  optimized         reverse  order.  A calling program might be able to implement optimized
716         functions  that  perform  better  than  malloc()  and free(). PCRE runs         functions that perform better  than  malloc()  and  free().  PCRE  runs
717         noticeably more slowly when built in this way. This option affects only         noticeably more slowly when built in this way. This option affects only
718         the pcre_exec() function; it is not relevant for pcre_dfa_exec().         the pcre_exec() function; it is not relevant for pcre_dfa_exec().
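
Because the blocks are freed in reverse order, a calling program could serve them from a simple last-in, first-out pool. The C sketch below shows one possible shape for such a pair of functions. The names lifo_malloc and lifo_free, the pool dimensions, and the fallback to malloc() are all assumptions made for illustration; the functions have the void *(size_t) and void (void *) signatures that pcre_stack_malloc and pcre_stack_free point to, but nothing here is taken from PCRE itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_BLOCKS     64      /* assumed pool depth */
#define POOL_BLOCK_SIZE 1024    /* assumed upper bound on a request */

/* Fixed pool of equally sized, suitably aligned blocks. */
static union {
    max_align_t   align;
    unsigned char bytes[POOL_BLOCK_SIZE];
} pool[POOL_BLOCKS];
static size_t pool_top = 0;     /* number of pool blocks in use */

/* True if p points into the static pool. */
static int in_pool(const void *p)
{
    uintptr_t a  = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)&pool[0];
    uintptr_t hi = (uintptr_t)&pool[POOL_BLOCKS];
    return a >= lo && a < hi;
}

/* Hand out the next pool block, or fall back to malloc() when the request
   is too large or the pool is exhausted. */
static void *lifo_malloc(size_t size)
{
    if (size > POOL_BLOCK_SIZE || pool_top == POOL_BLOCKS)
        return malloc(size);
    return pool[pool_top++].bytes;
}

/* Release a block. Pool blocks must come back in reverse order of
   allocation; heap blocks are passed to free(). */
static void lifo_free(void *block)
{
    if (block == NULL)
        return;
    if (in_pool(block)) {
        assert(pool_top > 0 && block == (void *)pool[pool_top - 1].bytes);
        pool_top--;
    } else {
        free(block);
    }
}
```

In a program linked against a PCRE built with --disable-stack-for-recursion, functions of this kind would be installed by assigning them to pcre_stack_malloc and pcre_stack_free before matching. The sketch is not thread-safe.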
719    
720    
721  LIMITING PCRE RESOURCE USAGE  LIMITING PCRE RESOURCE USAGE
722    
723         Internally,  PCRE has a function called match(), which it calls repeat-         Internally, PCRE has a function called match(), which it calls  repeat-
724         edly  (sometimes  recursively)  when  matching  a  pattern   with   the         edly   (sometimes   recursively)  when  matching  a  pattern  with  the
725         pcre_exec()  function.  By controlling the maximum number of times this         pcre_exec() function. By controlling the maximum number of  times  this
726         function may be called during a single matching operation, a limit  can         function  may be called during a single matching operation, a limit can
727         be  placed  on  the resources used by a single call to pcre_exec(). The         be placed on the resources used by a single call  to  pcre_exec().  The
728         limit can be changed at run time, as described in the pcreapi  documen-         limit  can be changed at run time, as described in the pcreapi documen-
729         tation.  The default is 10 million, but this can be changed by adding a         tation. The default is 10 million, but this can be changed by adding  a
730         setting such as         setting such as
731    
732           --with-match-limit=500000           --with-match-limit=500000
733    
734         to  the  configure  command.  This  setting  has  no  effect   on   the         to   the   configure  command.  This  setting  has  no  effect  on  the
735         pcre_dfa_exec() matching function.         pcre_dfa_exec() matching function.
736    
737         In  some  environments  it is desirable to limit the depth of recursive         In some environments it is desirable to limit the  depth  of  recursive
738         calls of match() more strictly than the total number of calls, in order         calls of match() more strictly than the total number of calls, in order
739         to  restrict  the maximum amount of stack (or heap, if --disable-stack-         to restrict the maximum amount of stack (or heap,  if  --disable-stack-
740         for-recursion is specified) that is used. A second limit controls this;         for-recursion is specified) that is used. A second limit controls this;
741         it  defaults  to  the  value  that is set for --with-match-limit, which         it defaults to the value that  is  set  for  --with-match-limit,  which
742         imposes no additional constraints. However, you can set a  lower  limit         imposes  no  additional constraints. However, you can set a lower limit
743         by adding, for example,         by adding, for example,
744    
745           --with-match-limit-recursion=10000           --with-match-limit-recursion=10000
746    
747         to  the  configure  command.  This  value can also be overridden at run         to the configure command. This value can  also  be  overridden  at  run
748         time.         time.
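
The difference between the two limits can be seen in a toy backtracking matcher. The C sketch below is not PCRE's match() function: the pattern syntax (literals, '.', and a postfix '*'), the toy_ names, and the return codes are all hypothetical. It only shows one limit counting every call and a second, independent limit bounding the nesting depth.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_MATCH = 1, TOY_NOMATCH = 0, TOY_ERR_LIMIT = -1, TOY_ERR_DEPTH = -2 };

struct toy_limits {
    unsigned long match_limit;      /* maximum total calls of toy_match() */
    unsigned long recursion_limit;  /* maximum nesting depth */
    unsigned long calls;            /* calls made so far */
};

/* Anchored match of the whole of s against pat. */
static int toy_match(const char *pat, const char *s, unsigned long depth,
                     struct toy_limits *lim)
{
    if (++lim->calls > lim->match_limit)
        return TOY_ERR_LIMIT;
    if (depth > lim->recursion_limit)
        return TOY_ERR_DEPTH;
    if (*pat == '\0')
        return *s == '\0' ? TOY_MATCH : TOY_NOMATCH;
    if (pat[1] == '*') {
        /* Greedy repeat: consume as much as possible, then backtrack. */
        const char *t = s;
        while (*t != '\0' && (*pat == '.' || *pat == *t))
            t++;
        for (;;) {
            int rc = toy_match(pat + 2, t, depth + 1, lim);
            if (rc != TOY_NOMATCH)
                return rc;          /* a match, or a limit error */
            if (t == s)
                return TOY_NOMATCH;
            t--;
        }
    }
    if (*s != '\0' && (*pat == '.' || *pat == *s))
        return toy_match(pat + 1, s + 1, depth + 1, lim);
    return TOY_NOMATCH;
}
```

A pattern such as a*a*a*b applied to a run of the letter a with no b makes many calls at a shallow depth, so it trips the call limit first. A long literal pattern makes few calls but nests one level per character, so it trips the depth limit first.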
749    
750    
751  CREATING CHARACTER TABLES AT BUILD TIME  CREATING CHARACTER TABLES AT BUILD TIME
752    
753         PCRE uses fixed tables for processing characters whose code values  are         PCRE  uses fixed tables for processing characters whose code values are
754         less  than 256. By default, PCRE is built with a set of tables that are         less than 256. By default, PCRE is built with a set of tables that  are
755         distributed in the file pcre_chartables.c.dist. These  tables  are  for         distributed  in  the  file pcre_chartables.c.dist. These tables are for
756         ASCII codes only. If you add         ASCII codes only. If you add
757    
758           --enable-rebuild-chartables           --enable-rebuild-chartables
759    
760         to  the  configure  command, the distributed tables are no longer used.         to the configure command, the distributed tables are  no  longer  used.
761         Instead, a program called dftables is compiled and  run.  This  outputs         Instead,  a  program  called dftables is compiled and run. This outputs
762         the source for a new set of tables, created in the default locale of your         the source for a new set of tables, created in the default locale of your
763         C runtime system. (This method of replacing the tables does not work if         C runtime system. (This method of replacing the tables does not work if
764         you  are cross compiling, because dftables is run on the local host. If         you are cross compiling, because dftables is run on the local host.  If
765         you need to create alternative tables when cross  compiling,  you  will         you  need  to  create alternative tables when cross compiling, you will
766         have to do so "by hand".)         have to do so "by hand".)
767    
768    
769  USING EBCDIC CODE  USING EBCDIC CODE
770    
771         PCRE  assumes  by  default that it will run in an environment where the         PCRE assumes by default that it will run in an  environment  where  the
772         character code is ASCII (or Unicode, which is  a  superset  of  ASCII).         character  code  is  ASCII  (or Unicode, which is a superset of ASCII).
773         This  is  the  case for most computer operating systems. PCRE can, how-         This is the case for most computer operating systems.  PCRE  can,  how-
774         ever, be compiled to run in an EBCDIC environment by adding         ever, be compiled to run in an EBCDIC environment by adding
775    
776           --enable-ebcdic           --enable-ebcdic
777    
778         to the configure command. This setting implies --enable-rebuild-charta-         to the configure command. This setting implies --enable-rebuild-charta-
779         bles.  You  should  only  use  it if you know that you are in an EBCDIC         bles. You should only use it if you know that  you  are  in  an  EBCDIC
780         environment (for example,  an  IBM  mainframe  operating  system).  The         environment  (for  example,  an  IBM  mainframe  operating system). The
781         --enable-ebcdic option is incompatible with --enable-utf.         --enable-ebcdic option is incompatible with --enable-utf.
782    
783    
# Line 788  PCREGREP OPTIONS FOR COMPRESSED FILE SUP Line 791  PCREGREP OPTIONS FOR COMPRESSED FILE SUP
791           --enable-pcregrep-libbz2           --enable-pcregrep-libbz2
792    
793         to the configure command. These options naturally require that the rel-         to the configure command. These options naturally require that the rel-
794         evant libraries are installed on your system. Configuration  will  fail         evant  libraries  are installed on your system. Configuration will fail
795         if they are not.         if they are not.
796    
797    
798  PCREGREP BUFFER SIZE  PCREGREP BUFFER SIZE
799    
800         pcregrep  uses  an internal buffer to hold a "window" on the file it is         pcregrep uses an internal buffer to hold a "window" on the file  it  is
801         scanning, in order to be able to output "before" and "after" lines when         scanning, in order to be able to output "before" and "after" lines when
802         it  finds  a match. The size of the buffer is controlled by a parameter         it finds a match. The size of the buffer is controlled by  a  parameter
803         whose default value is 20K. The buffer itself is three times this size,         whose default value is 20K. The buffer itself is three times this size,
804         but because of the way it is used for holding "before" lines, the long-         but because of the way it is used for holding "before" lines, the long-
805         est line that is guaranteed to be processable is  the  parameter  size.         est  line  that  is guaranteed to be processable is the parameter size.
806         You can change the default parameter value by adding, for example,         You can change the default parameter value by adding, for example,
807    
808           --with-pcregrep-bufsize=50K           --with-pcregrep-bufsize=50K
# Line 814  PCRETEST OPTION FOR LIBREADLINE SUPPORT Line 817  PCRETEST OPTION FOR LIBREADLINE SUPPORT
817    
818           --enable-pcretest-libreadline           --enable-pcretest-libreadline
819    
820         to the configure command,  pcretest  is  linked  with  the  libreadline         to  the  configure  command,  pcretest  is  linked with the libreadline
821         library,  and  when its input is from a terminal, it reads it using the         library, and when its input is from a terminal, it reads it  using  the
822         readline() function. This provides line-editing and history facilities.         readline() function. This provides line-editing and history facilities.
823         Note that libreadline is GPL-licensed, so if you distribute a binary of         Note that libreadline is GPL-licensed, so if you distribute a binary of
824         pcretest linked in this way, there may be licensing issues.         pcretest linked in this way, there may be licensing issues.
825    
826         Setting this option causes the -lreadline option to  be  added  to  the         Setting  this  option  causes  the -lreadline option to be added to the
827         pcretest  build.  In many operating environments with a system-installed         pcretest build. In many operating environments with  a  system-installed
828         libreadline this is sufficient. However, in some environments (e.g.  if         libreadline this is sufficient. However, in some environments (e.g.  if
829         an  unmodified  distribution version of readline is in use), some extra         an unmodified distribution version of readline is in use),  some  extra
830         configuration may be necessary. The INSTALL file for  libreadline  says         configuration  may  be necessary. The INSTALL file for libreadline says
831         this:         this:
832    
833           "Readline uses the termcap functions, but does not link with the           "Readline uses the termcap functions, but does not link with the
834           termcap or curses library itself, allowing applications which link           termcap or curses library itself, allowing applications which link
835           with readline the to choose an appropriate library."           with readline the to choose an appropriate library."
836    
837         If  your environment has not been set up so that an appropriate library         If your environment has not been set up so that an appropriate  library
838         is automatically included, you may need to add something like         is automatically included, you may need to add something like
839    
840           LIBS="-lncurses"           LIBS="-lncurses"

