
Diff of /code/trunk/maint/README


revision 181 by ph10, Wed Jun 13 14:55:18 2007 UTC revision 351 by ph10, Fri Jul 4 18:27:16 2008 UTC
# Line 16  also contains some notes for maintainers Line 16  also contains some notes for maintainers
16  Files in the maint directory  Files in the maint directory
17  ----------------------------  ----------------------------
18    
19    ----------------- This file is now OBSOLETE and no longer used ----------------
20  Builducptable    A Perl script that creates the contents of the ucptable.h file  Builducptable    A Perl script that creates the contents of the ucptable.h file
21                   from two Unicode data files, which themselves are downloaded                   from two Unicode data files, which themselves are downloaded
22                   from the Unicode web site. Run this script in the "maint"                   from the Unicode web site. Run this script in the "maint"
23                   directory.                   directory.
24    ----------------- This file is now OBSOLETE and no longer used ----------------
25    
26    GenerateUtt.py   A Python script to generate part of the pcre_tables.c file
27                     that contains Unicode script names in a long string with
28                     offsets, which is tedious to maintain by hand.
29    
30  ManyConfigTests  A shell script that runs "configure, make, test" a number of  ManyConfigTests  A shell script that runs "configure, make, test" a number of
31                   times with different configuration settings.                   times with different configuration settings.
32    
33  Unicode.tables   The files in this directory, Scripts.txt and UnicodeData.txt,  MultiStage2.py   A Python script that generates the file pcre_ucd.c from three
34                   were downloaded from the Unicode web site. They contain                   Unicode data tables, which are themselves downloaded from the
35                   information about Unicode characters and scripts.                   Unicode web site. Run this script in the "maint" directory.
36                     The generated file contains the tables for a 2-stage lookup
37  ucptest.c        A short C program for testing the Unicode property functions                   of Unicode properties.
38                   in pcre_ucp_searchfuncs.c, mainly useful after rebuilding the  
39                   Unicode property table. Compile and run this in the "maint"  Unicode.tables   The files in this directory, DerivedGeneralCategory.txt,
40                   directory.                   Scripts.txt and UnicodeData.txt, were downloaded from the
41                     Unicode web site. They contain information about Unicode
42                     characters and scripts.
43    
44    ucptest.c        A short C program for testing the Unicode property macros
45                     that do lookups in the pcre_ucd.c data, mainly useful after
46                     rebuilding the Unicode property table. Compile and run this in
47                     the "maint" directory (see comments at its head).
48    
49  ucptestdata      A directory containing two files, testinput1 and testoutput1,  ucptestdata      A directory containing two files, testinput1 and testoutput1,
50                   to use in conjunction with the ucptest program.                   to use in conjunction with the ucptest program.
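
Illustration of the "2-stage lookup" mentioned for MultiStage2.py above. This
is only a minimal sketch of the general technique, not the contents of that
script: the function names, the block size of 128 and the toy data are all
assumptions made for the example. The idea is that a flat table with one entry
per code point is cut into fixed-size blocks, identical blocks are stored once,
and a small first-stage table records which stored block each range uses.

```python
# Hypothetical sketch: names and block size are illustrative assumptions,
# not taken from MultiStage2.py or pcre_ucd.c.
BLOCK_SIZE = 128  # assumed block size; must divide the table length


def compress(table, block_size=BLOCK_SIZE):
    """Split a flat per-code-point table into blocks and share identical ones.

    Returns (stage1, stage2): stage1 maps a block number to the index of a
    unique block, and stage2 is the concatenation of the unique blocks.
    """
    stage1, stage2, seen = [], [], {}
    for start in range(0, len(table), block_size):
        block = tuple(table[start:start + block_size])
        if block not in seen:
            seen[block] = len(stage2) // block_size
            stage2.extend(block)
        stage1.append(seen[block])
    return stage1, stage2


def lookup(stage1, stage2, code_point, block_size=BLOCK_SIZE):
    """Find the property value for code_point with two array accesses."""
    block = stage1[code_point // block_size]
    return stage2[block * block_size + code_point % block_size]


# Toy data: 1024 code points, all property 0 except a short run of property 7.
flat = [0] * 1024
for cp in range(300, 310):
    flat[cp] = 7

stage1, stage2 = compress(flat)
```

With this toy data only two distinct blocks need to be stored, so the second
stage holds 256 entries instead of 1024, and every lookup still costs just two
index operations.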
# Line 49  Updating to a new Unicode release Line 62  Updating to a new Unicode release
62  ---------------------------------  ---------------------------------
63    
64  When there is a new release of Unicode, the files in Unicode.tables must be  When there is a new release of Unicode, the files in Unicode.tables must be
65  refreshed from the web site, and the Buildupctable script can then be run to  refreshed from the web site. If the new version of Unicode adds new character
66  generate a new version of ucptable.h. The ucptest program can be used to check  scripts, both the MultiStage2.py and the GenerateUtt.py scripts must be edited
67  that the resulting table works properly, using the data files in ucptestdata to  to add the new names. The MultiStage2.py script can then be run to
68  check a number of test characters.  generate a new version of pcre_ucd.c and the GenerateUtt.py script can be run to
69    generate the tricky tables in pcre_tables.c.
70    
71    The ucptest program can then be compiled and used to check that the new tables
72    in pcre_ucd.c work properly, using the data files in ucptestdata to check a
73    number of test characters.
74    
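
Illustration of the "long string with offsets" layout that the README says is
tedious to maintain by hand. This is a sketch under stated assumptions, not the
output format of GenerateUtt.py: the helper names, the sorting, and the use of
NUL terminators are hypothetical choices made for the example. It shows why a
script helps, because inserting one new name shifts the offset of every name
that follows it.

```python
# Hypothetical sketch; the names and layout are illustrative assumptions,
# not copied from GenerateUtt.py or pcre_tables.c.
def build_name_table(names):
    """Concatenate sorted names, NUL-terminated, and record each start offset."""
    parts, offsets, position = [], {}, 0
    for name in sorted(names):
        offsets[name] = position
        parts.append(name + "\0")
        position += len(name) + 1  # +1 for the terminator
    return "".join(parts), offsets


def name_at(blob, offset):
    """Read the NUL-terminated name that starts at offset."""
    return blob[offset:blob.index("\0", offset)]


blob, offsets = build_name_table(["Greek", "Arabic", "Latin"])
```

Here "Arabic" starts at offset 0, "Greek" at 7 and "Latin" at 13. Adding a name
that sorts before "Greek" would change the last two offsets, which is the kind
of bookkeeping a generator script avoids.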
75    
76  Preparing for a PCRE release  Preparing for a PCRE release
# Line 63  distribution for a new release. Line 81  distribution for a new release.
81    
82  . Ensure that the version number and version date are correct in configure.ac,  . Ensure that the version number and version date are correct in configure.ac,
83    ChangeLog, and NEWS.    ChangeLog, and NEWS.
84    
85    . If new build options have been added, ensure that they are added to the CMake
86      files as well as to the autoconf files.
87    
88  . Run ./autogen.sh to ensure everything is up-to-date.  . Run ./autogen.sh to ensure everything is up-to-date.
89    
# Line 113  Making a PCRE release Line 134  Making a PCRE release
134    
135  Run PrepareRelease and commit the files that it changes (by removing trailing  Run PrepareRelease and commit the files that it changes (by removing trailing
136  spaces). Then run "make distcheck" to create the tarballs and the zipball.  spaces). Then run "make distcheck" to create the tarballs and the zipball.
137    Double-check with "svn status", then create an SVN tagged copy:
138    
139      svn copy svn://vcs.exim.org/pcre/code/trunk \
140               svn://vcs.exim.org/pcre/code/tags/pcre-7.x
141    
142  Don't forget to update Freshmeat when the new release is out, and to tell  Don't forget to update Freshmeat when the new release is out, and to tell
143  webmaster@pcre.org and the mailing list.  webmaster@pcre.org and the mailing list.
# Line 217  others are relatively new. Line 242  others are relatively new.
242    to switch this dynamically. It would have to be specified when PCRE was    to switch this dynamically. It would have to be specified when PCRE was
243    compiled. PCRE would then call a function every time it wanted a character.    compiled. PCRE would then call a function every time it wanted a character.
244    
 . There are new (*PRUNE) facilities in Perl 5.10, some of which it might be  
   relatively easy to implement.  
   
245  . Wild thought: the ability to compile from PCRE's internal byte code to a real  . Wild thought: the ability to compile from PCRE's internal byte code to a real
246    FSM and a very fast (third) matcher to process the result. There would be    FSM and a very fast (third) matcher to process the result. There would be
247    even more restrictions than for pcre_dfa_exec(), however. This is not easy.    even more restrictions than for pcre_dfa_exec(), however. This is not easy.
# Line 233  others are relatively new. Line 255  others are relatively new.
255    
256  . Someone suggested --disable-callout to save code space when callouts are  . Someone suggested --disable-callout to save code space when callouts are
257    never wanted. This seems rather marginal.    never wanted. This seems rather marginal.
258    
259  . "Cut" as described in Jeffrey Friedl's book, p364: \v and \V. The definitions  . Check names that consist entirely of digits: PCRE allows, but do Perl and
260    aren't yet clear enough for me. \v flushes saved states so that no    Python, etc?
   backtracking to anything earlier can happen; \V says "no more bumpalong", but  
   does it fail the current match? As described in the book, these aren't really  
   "cut" as in Prolog, are they? NOTE: (a) PCRE once had "cut", but it was  
   removed when atomic groups were introduced. (b) Perl 5.10 has some (*PRUNE)  
   features --  
   
 . These are the Perl 5.10 backtracking control features (all of which are  
   described as "experimental" -- some of them "very experimental") that it  
   might be easy to add to PCRE. They all succeed when encountered, but act as  
   follows when backtracking:  
   
   (*PRUNE)  fail this match attempt, but still bumpalong  
   (*SKIP)   fail this match attempt, bumpalong to current match point  
   (*THEN)   fail this branch, try next branch at same level or fail if none  
   (*COMMIT) fail this match attempt, suppress bumpalong  
   (*FAIL)   fail and backtrack (same as (?!) and that can be optimized)  
   (*F)      synonym for (*FAIL)  
   (*ACCEPT) behave as if end of pattern reached ("very experimental")  
   
   Some of these can have arguments (*PRUNE:NAME) but I'm not sure whether they  
   make sense in the PCRE context.  
261    
262  Philip Hazel  Philip Hazel
263  Email local part: ph10  Email local part: ph10
264  Email domain: cam.ac.uk  Email domain: cam.ac.uk
265  Last updated: 13 June 2007  Last updated: 04 July 2008
