36 |
The generated file contains the tables for a 2-stage lookup |
The generated file contains the tables for a 2-stage lookup |
37 |
of Unicode properties. |
of Unicode properties. |
38 |
|
|
39 |
|
README This file. |
40 |
|
|
41 |
Unicode.tables The files in this directory, DerivedGeneralCategory.txt, |
Unicode.tables The files in this directory, DerivedGeneralCategory.txt, |
42 |
Scripts.txt and UnicodeData.txt, were downloaded from the |
Scripts.txt and UnicodeData.txt, were downloaded from the |
43 |
Unicode web site. They contain information about Unicode |
Unicode web site. They contain information about Unicode |
64 |
--------------------------------- |
--------------------------------- |
65 |
|
|
66 |
When there is a new release of Unicode, the files in Unicode.tables must be |
When there is a new release of Unicode, the files in Unicode.tables must be |
67 |
refreshed from the web site. If the new version of Unicode adds new character |
refreshed from the web site. If the new version of Unicode adds new character |
68 |
scripts, the source file ucp.h and both the MultiStage2.py and the |
scripts, the source file ucp.h and both the MultiStage2.py and the |
69 |
GenerateUtt.py scripts must be edited to add the new names. Then the |
GenerateUtt.py scripts must be edited to add the new names. Then MultiStage2.py |
70 |
MultiStage2.py script can then be run to generate a new version of pcre_ucd.c |
can be run to generate a new version of pcre_ucd.c, and GenerateUtt.py can be |
71 |
and the GenerateUtt.py can be run to generate the tricky tables for inclusion |
run to generate the tricky tables for inclusion in pcre_tables.c. |
72 |
in pcre_tables.c. |
|
73 |
|
The ucptest program can be compiled and used to check that the new tables in |
74 |
The ucptest program can then be compiled and used to check that the new tables |
pcre_ucd.c work properly, using the data files in ucptestdata to check a number |
75 |
in pcre_ucd.c work properly, using the data files in ucptestdata to check a |
of test characters. |
|
number of test characters. |
|
76 |
|
|
77 |
|
|
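As a rough illustration of what a 2-stage lookup involves, the sketch below
builds and queries one in Python. It is not taken from MultiStage2.py: the
block size of 128, the function names, and the toy property data are all
assumptions made for this example.

```python
BLOCK_SIZE = 128  # assumed block size, chosen for illustration only


def build_two_stage(values, block_size=BLOCK_SIZE):
    """Split a flat per-code-point list into de-duplicated blocks.

    Returns (stage1, stage2): stage1 maps a block number to the index of a
    unique block, and stage2 holds the unique blocks laid end to end.
    """
    if len(values) % block_size != 0:
        raise ValueError("table length must be a multiple of the block size")
    stage1, stage2, seen = [], [], {}
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        if block not in seen:
            # First time this block content appears: store it once.
            seen[block] = len(stage2) // block_size
            stage2.extend(block)
        stage1.append(seen[block])
    return stage1, stage2


def lookup(stage1, stage2, code_point, block_size=BLOCK_SIZE):
    """Return the property value recorded for code_point."""
    block = stage1[code_point // block_size]
    return stage2[block * block_size + code_point % block_size]


# Toy data (hypothetical): 1024 code points, with only 'A'..'Z' marked "Lu".
flat = ["Cn"] * 1024
for cp in range(ord("A"), ord("Z") + 1):
    flat[cp] = "Lu"

stage1, stage2 = build_two_stage(flat)
```

Identical blocks are stored only once, which is where the space saving comes
from: in the toy data, eight blocks collapse to two.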
78 |
Preparing for a PCRE release |
Preparing for a PCRE release |
81 |
This section contains a checklist of things that I consult before building a |
This section contains a checklist of things that I consult before building a |
82 |
distribution for a new release. |
distribution for a new release. |
83 |
|
|
84 |
. Ensure that the version number and version date are correct in configure.ac, |
. Ensure that the version number and version date are correct in configure.ac. |
|
ChangeLog, and NEWS. |
|
85 |
|
|
86 |
. If new build options have been added, ensure that they are added to the CMake |
. If new build options have been added, ensure that they are added to the CMake |
87 |
files as well as to the autoconf files. |
files as well as to the autoconf files. |
91 |
. Compile and test with many different config options, and combinations of |
. Compile and test with many different config options, and combinations of |
92 |
options. The maint/ManyConfigTests script now encapsulates this testing. |
options. The maint/ManyConfigTests script now encapsulates this testing. |
93 |
|
|
94 |
. Run perltest.pl on the test data for tests 1 and 4. The output should match |
. Run perltest.pl on the test data for tests 1, 4, 6, and 11. The first two can |
95 |
the PCRE test output, apart from the version identification at the top. The |
be run with Perl 5.8 or 5.10; the last two require Perl 5.10. The output |
96 |
other tests are not Perl-compatible (they use various special PCRE options). |
should match the PCRE test output, apart from the version identification at |
97 |
|
the start of each test. The other tests are not Perl-compatible (they use |
98 |
|
various PCRE-specific features or options). |
99 |
|
|
100 |
. Test with valgrind by running "RunTest valgrind". There is also "RunGrepTest |
. Test with valgrind by running "RunTest valgrind". There is also "RunGrepTest |
101 |
valgrind", though that takes quite a long time. |
valgrind", though that takes quite a long time. |
118 |
used" warnings for the modules in which there is no call to memmove(). These |
used" warnings for the modules in which there is no call to memmove(). These |
119 |
can be ignored. |
can be ignored. |
120 |
|
|
121 |
. Documentation: check AUTHORS, COPYING, ChangeLog (check date), INSTALL, |
. Documentation: check AUTHORS, COPYING, ChangeLog (check version and date), |
122 |
LICENCE, NEWS (check date), NON-UNIX-USE, and README. Many of these won't |
INSTALL, LICENCE, NEWS (check version and date), NON-UNIX-USE, and README. |
123 |
need changing, but over the long term things do change. |
Many of these won't need changing, but over the long term things do change. |
124 |
|
|
125 |
. Man pages: Check all man pages for \ not followed by e or f or " because |
. Man pages: Check all man pages for \ not followed by e or f or " because |
126 |
that indicates a markup error. |
that indicates a markup error. |
140 |
Double-check with "svn status", then create an SVN tagged copy: |
Double-check with "svn status", then create an SVN tagged copy: |
141 |
|
|
142 |
svn copy svn://vcs.exim.org/pcre/code/trunk \ |
svn copy svn://vcs.exim.org/pcre/code/trunk \ |
143 |
svn://vcs.exim.org/pcre/code/tags/pcre-7.x |
svn://vcs.exim.org/pcre/code/tags/pcre-8.xx |
144 |
|
|
145 |
Don't forget to update Freshmeat when the new release is out, and to tell |
Don't forget to update Freshmeat when the new release is out, and to tell |
146 |
webmaster@pcre.org and the mailing list. |
webmaster@pcre.org and the mailing list. |
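
The man page check in the list above (a \ not followed by e or f or ") could
be automated along the following lines. This is a minimal sketch, not an
existing maintenance script: the names SUSPECT and suspect_lines are
hypothetical, and the sample text is made up.

```python
import re

# A backslash followed by anything other than e, f, or a double quote
# (this also catches a backslash at the very end of a line).
SUSPECT = re.compile(r'\\(?![ef"])')


def suspect_lines(text):
    """Return (line_number, line) pairs that contain a suspect backslash."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]


# Made-up sample: only the third line breaks the rule.
sample = "\n".join([
    r'.\" A comment line',
    r'\fBpcre_exec()\fP matches a\eb',
    r'This line has a stray \n escape',
])
```

To scan real files, read each man page and pass its contents to
suspect_lines(); any line it reports needs checking by eye.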
168 |
to have little effect, and maybe makes things worse. |
to have little effect, and maybe makes things worse. |
169 |
|
|
170 |
* "Ends with literal string" - note that a single character doesn't gain much |
* "Ends with literal string" - note that a single character doesn't gain much |
171 |
over the existing "required byte" (reqbyte) feature that just saves one |
over the existing "required byte" (reqbyte) feature that just remembers one |
172 |
byte. |
byte. |
173 |
|
|
174 |
* These probably need to go in study(): |
* These probably need to go in study(): |
178 |
o A required byte from alternatives - not just the last char, but an |
o A required byte from alternatives - not just the last char, but an |
179 |
earlier one if common to all alternatives. |
earlier one if common to all alternatives. |
180 |
|
|
181 |
o Minimum length of subject needed. |
o Minimum length of subject needed (see also next . bullet). |
182 |
|
|
183 |
o Friedl contains other ideas. |
o Friedl contains other ideas. |
184 |
|
|
185 |
|
. There was a request for a way of finding the minimum subject length that can |
186 |
|
match a given pattern. (If this were available, it could be usefully added |
187 |
|
to study() - see above.) This is easy for simple cases, but I haven't figured |
188 |
|
out how to handle recursion. |
189 |
|
|
190 |
. If Perl gets to a consistent state over the settings of capturing sub- |
. If Perl gets to a consistent state over the settings of capturing sub- |
191 |
patterns inside repeats, see if we can match it. One example of the |
patterns inside repeats, see if we can match it. One example of the |
220 |
|
|
221 |
* Option to use NUL as a line terminator in subject strings. This could now |
* Option to use NUL as a line terminator in subject strings. This could now |
222 |
be done relatively easily since the extension to support LF, CR, and CRLF. |
be done relatively easily since the extension to support LF, CR, and CRLF. |
223 |
If this is done, a suitable option for pcregrep is also required. |
If it is done, a suitable option for pcregrep is also required. |
224 |
|
|
225 |
. Option to provide the pattern with a length instead of with a NUL terminator. |
. Option to provide the pattern with a length instead of with a NUL terminator. |
226 |
This probably affects quite a few places in the code. |
This affects quite a few places in the code and is not trivial. |
227 |
|
|
228 |
. Catch SIGSEGV for stack overflows? |
. Catch SIGSEGV for stack overflows? |
229 |
|
|
238 |
preceded by a blank line, instead of adding it to every matched line, and (b) |
preceded by a blank line, instead of adding it to every matched line, and (b) |
239 |
support --outputfile=name. |
support --outputfile=name. |
240 |
|
|
241 |
. Consider making UTF-8 and UCP the default for PCRE n.0 for some n > 7. |
. Consider making UTF-8 and UCP the default for PCRE n.0 for some n > 8. |
242 |
|
|
243 |
. Add a user pointer to pcre_malloc/free functions -- some option would be |
. Add a user pointer to pcre_malloc/free functions -- some option would be |
244 |
needed to retain backward compatibility. |
needed to retain backward compatibility. |
275 |
. Callouts with arguments: (?Cn:ARG) for instance. |
. Callouts with arguments: (?Cn:ARG) for instance. |
276 |
|
|
277 |
. A user is going to supply a patch to generalize the API for user-specific |
. A user is going to supply a patch to generalize the API for user-specific |
278 |
memory allocation so that it is more flexible in threaded environments. |
memory allocation so that it is more flexible in threaded environments. This |
279 |
|
was promised a long time ago, and never appeared... |
280 |
|
|
281 |
|
. Write a function that generates random matching strings for a compiled regex. |
282 |
|
|
283 |
|
. Write a wrapper to maintain a structure with specified runtime parameters, |
284 |
|
such as recurse limit, and pass these to PCRE each time it is called. Also |
285 |
|
maybe malloc and free. A user sent a prototype. |
286 |
|
|
287 |
|
. Pcregrep: an option to specify the output line separator, either as a string |
288 |
|
or select from a fixed list. This is not dead easy, because at the moment it |
289 |
|
outputs whatever is in the input file. |
290 |
|
|
291 |
|
. Improve the code for duplicate checking in pcre_dfa_exec(). An incomplete, |
292 |
|
non-thread-safe patch showed that this can help performance for patterns |
293 |
|
where there are many alternatives. However, a simple thread-safe |
294 |
|
implementation that I tried made things worse in many simple cases, so this |
295 |
|
is not an obviously good thing. |
296 |
|
|
297 |
|
. Make the longest lookbehind available via pcre_fullinfo(). This is not |
298 |
|
straightforward because lookbehinds can be nested inside lookbehinds. This |
299 |
|
case will have to be identified, and the amounts added. This should then give |
300 |
|
the maximum possible lookbehind length. The reason for wanting this is to |
301 |
|
help when implementing multi-segment matching using pcre_exec() with partial |
302 |
|
matching and overlapping segments. |
303 |
|
|
304 |
|
. PCRE cannot at present distinguish between subpatterns with different names, |
305 |
|
but the same number (created by the use of ?|). In order to do so, a way of |
306 |
|
remembering *which* subpattern numbered n matched is needed. Bugzilla #760. |
307 |
|
|
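For the minimum subject length item above, the simple cases can be sketched as
a recursive walk over a pattern tree. The tuple-based tree used here is a toy
representation invented for this example, not PCRE's compiled form, and
recursion and back references are deliberately not handled.

```python
def min_length(node):
    """Return the minimum number of characters the node can match."""
    kind = node[0]
    if kind == "lit":                 # ("lit", "abc")
        return len(node[1])
    if kind == "any":                 # ("any",) matches one character
        return 1
    if kind == "seq":                 # ("seq", child, child, ...)
        return sum(min_length(child) for child in node[1:])
    if kind == "alt":                 # ("alt", child, child, ...)
        return min(min_length(child) for child in node[1:])
    if kind == "rep":                 # ("rep", minimum_count, child)
        return node[1] * min_length(node[2])
    raise ValueError("unsupported node: %r" % (kind,))


# Roughly corresponds to the pattern ab(c|de)+x?
tree = ("seq",
        ("lit", "ab"),
        ("rep", 1, ("alt", ("lit", "c"), ("lit", "de"))),
        ("rep", 0, ("lit", "x")))
```

Sequences add, alternatives take the smallest branch, and a repeat multiplies
by its minimum count. A recursive pattern has no such finite tree, which is
the hard case the item mentions.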
308 |
Philip Hazel |
Philip Hazel |
309 |
Email local part: ph10 |
Email local part: ph10 |
310 |
Email domain: cam.ac.uk |
Email domain: cam.ac.uk |
311 |
Last updated: 26 August 2008 |
Last updated: 20 September 2009 |