ChangeLog for PCRE
------------------

Version 8.34 19-November-2013
-----------------------------

1.  Add pcre[16|32]_jit_free_unused_memory to forcibly free unused JIT
    executable memory. Patch inspired by Carsten Klein.

2.  ./configure --enable-coverage defined SUPPORT_GCOV in config.h, although
    this macro is never tested and has no effect, because the work to support
    coverage involves only compiling and linking options and special targets in
    the Makefile. The comment in config.h implied that defining the macro would
    enable coverage support, which is totally false. There was also support for
    setting this macro in the CMake files (my fault, I just copied it from
    configure). SUPPORT_GCOV has now been removed.

3.  Make a small performance improvement in strlen16() and strlen32() in
    pcretest.

4.  Change 36 for 8.33 left some unreachable statements in pcre_exec.c,
    detected by the Solaris compiler (gcc doesn't seem to be able to diagnose
    these cases). There was also one in pcretest.c.

5.  Cleaned up a "may be uninitialized" compiler warning in pcre_exec.c.

6.  In UTF mode, the code for checking whether a group could match an empty
    string (which is used for indefinitely repeated groups to allow for
    breaking an infinite loop) was broken when the group contained a repeated
    negated single-character class with a character that occupied more than one
    data item and had a minimum repetition of zero (for example, [^\x{100}]* in
    UTF-8 mode). The effect was undefined: the group might or might not be
    deemed as matching an empty string, or the program might have crashed.

7.  The code for checking whether a group could match an empty string was not
    recognizing that \h, \H, \v, \V, and \R must match a character.

8.  Implemented PCRE_INFO_MATCH_EMPTY, which yields 1 if the pattern can match
    an empty string. If it can, pcretest shows this in its information output.

9.  Fixed two related bugs that applied to Unicode extended grapheme clusters
    that were repeated with a maximizing qualifier (e.g. \X* or \X{2,5}) when
    matched by pcre_exec() without using JIT:

    (a) If the rest of the pattern did not match after a maximal run of
        grapheme clusters, the code for backing up to try with fewer of them
        did not always back up over a full grapheme when characters that do not
        have the modifier quality were involved, e.g. Hangul syllables.

    (b) If the match point in a subject started with a modifier character, and
        there was no match, the code could incorrectly back up beyond the match
        point, and potentially beyond the first character in the subject,
        leading to a segfault or an incorrect match result.

10. A conditional group with an assertion condition could lead to PCRE
    recording an incorrect first data item for a match if no other first data
    item was recorded. For example, the pattern (?(?=ab)ab) recorded "a" as a
    first data item, and therefore matched "ca" after "c" instead of at the
    start.

11. Change 40 for 8.33 (allowing pcregrep to find empty strings) showed up a
    bug that caused the command "echo a | ./pcregrep -M '|a'" to loop.

12. The source of pcregrep now includes z/OS-specific code so that it can be
    compiled for z/OS as part of the special z/OS distribution.

13. Added the -T and -TM options to pcretest.

14. The code in pcre_compile.c for creating the table of named capturing groups
    has been refactored. Instead of creating the table dynamically during the
    actual compiling pass, the information is remembered during the pre-compile
    pass (on the stack unless there are more than 20 named groups, in which
    case malloc() is used) and the whole table is created before the actual
    compile happens. This has simplified the code (it is now nearly 150 lines
    shorter) and prepared the way for better handling of references to groups
    with duplicate names.

15. A back reference to a named subpattern when there is more than one of the
    same name now checks them in the order in which they appear in the pattern.
    The first one that is set is used for the reference. Previously only the
    first one was inspected. This change makes PCRE more compatible with Perl.

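The lookup rule in change 15 can be sketched roughly as follows. This is an
illustrative Python model, not PCRE's C code: the function names and the
dictionary of captures are hypothetical stand-ins for the internal name table
and offset vector.

```python
def resolve_named_backref(group_numbers, captures):
    """Model of the new rule: use the first group of that name which is set.

    group_numbers: numbers of the groups sharing one name, in pattern order.
    captures: maps group number -> captured text, or None when unset.
    Both arguments are hypothetical stand-ins for PCRE's internal tables.
    """
    for number in group_numbers:
        value = captures.get(number)
        if value is not None:  # an empty capture still counts as set
            return value
    return None


def resolve_first_only(group_numbers, captures):
    """Model of the old rule: only the first group of that name is inspected."""
    return captures.get(group_numbers[0])


# Groups 1 and 2 share a name; only group 2 took part in the match.
captures = {1: None, 2: "b"}
print(resolve_named_backref([1, 2], captures))
print(resolve_first_only([1, 2], captures))
```

With these inputs the new rule finds the text captured by group 2, whereas the
old rule stops at the unset group 1.
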
16. Unicode character properties were updated from Unicode 6.3.0.

17. The compile-time code for auto-possessification has been refactored, based
    on a patch by Zoltan Herczeg. It now happens after instead of during
    compilation. The code is cleaner, and more cases are handled. The option
    PCRE_NO_AUTO_POSSESS is added for testing purposes, and the -O and /O
    options in pcretest are provided to set it. It can also be set by
    (*NO_AUTO_POSSESS) at the start of a pattern.

18. The character VT has been added to the set of characters that match \s and
    are generally treated as white space, following this same change in Perl
    5.18. There is now no difference between "Perl space" and "POSIX space".

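The effect of change 18 on the two sets can be written out as a small Python
sketch. It covers only the ASCII characters (no UCP mode), and the constant
names are made up for illustration.

```python
# POSIX space in the default C locale: HT, LF, VT, FF, CR and space.
POSIX_SPACE = frozenset("\t\n\x0b\x0c\r ")

# Before the change, "Perl space" (\s) was the same set without VT (0x0B).
OLD_PERL_SPACE = POSIX_SPACE - {"\x0b"}

# After the change, VT is included, so the two sets coincide.
NEW_PERL_SPACE = OLD_PERL_SPACE | {"\x0b"}

print(sorted(hex(ord(c)) for c in POSIX_SPACE - OLD_PERL_SPACE))
print(NEW_PERL_SPACE == POSIX_SPACE)
```
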
19. The code for checking named groups as conditions, either for being set or
    for being recursed, has been refactored (this is related to 14 and 15
    above). Processing unduplicated named groups should now be as fast as
    numerical groups, and processing duplicated groups should be faster than
    before.

20. Two patches to the CMake build system, by Alexander Barkov:

    (1) Replace the "source" command by "." in CMakeLists.txt because
        "source" is a bash-ism.

    (2) Add missing HAVE_STDINT_H and HAVE_INTTYPES_H to config-cmake.h.in;
        without these the CMake build does not work on Solaris.

21. Perl has changed its handling of \8 and \9. If there is no previously
    encountered capturing group of those numbers, they are treated as the
    literal characters 8 and 9 instead of a binary zero followed by the
    literals. PCRE now does the same.

22. Following Perl, added \o{} to specify codepoints in octal, making it
    possible to specify values greater than 0777 and also making them
    unambiguous.

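As a rough illustration of the \o{} syntax in change 22, the following Python
sketch parses one such escape. It is not PCRE's parser: the function name is
hypothetical, and the error handling (rejecting an empty or non-octal body and
a missing closing brace) is an assumption made for the example.

```python
def parse_octal_escape(pattern, pos):
    # Parse a \o{ddd...} escape that starts at pattern[pos].
    # Returns (codepoint, index of the character after the closing brace).
    # Hypothetical helper; the error cases below are assumptions.
    if not pattern.startswith("\\o{", pos):
        raise ValueError("expected \\o{ at position %d" % pos)
    close = pattern.find("}", pos + 3)
    if close == -1:
        raise ValueError("missing closing brace after \\o{")
    digits = pattern[pos + 3:close]
    if not digits or any(ch not in "01234567" for ch in digits):
        raise ValueError("\\o{} must contain one or more octal digits")
    return int(digits, 8), close + 1


# Octal 1000 is one more than 0777; the braces mark exactly where the
# number ends, so a following digit cannot be mistaken for part of it.
print(parse_octal_escape(r"\o{1000}7", 0))
```
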
23. Perl now gives an error for missing closing braces after \x{... instead of
    treating the string as literal. PCRE now does the same.

24. RunTest used to grumble if an inappropriate test was selected explicitly,
    but just skip it when running all tests. This made it awkward to run ranges
    of tests when one of them was inappropriate. Now it just skips any
    inappropriate tests, as it always did when running all tests.

25. If PCRE_AUTO_CALLOUT and PCRE_UCP were set for a pattern that contained
    character types such as \d or \w, too many callouts were inserted, and the
    data that they returned was rubbish.

26. In UCP mode, \s was not matching two of the characters that Perl matches,
    namely NEL (U+0085) and MONGOLIAN VOWEL SEPARATOR (U+180E), though they
    were matched by \h. The code has now been refactored so that the lists of
    the horizontal and vertical whitespace characters used for \h and \v (which
    are defined only in one place) are now also used for \s.

27. Add JIT support for the 64 bit TileGX architecture.
    Patch by Jiong Wang (Tilera Corporation).

28. Possessive quantifiers for classes (both explicit and automatically
    generated) now use special opcodes instead of wrapping in ONCE brackets.

29. Whereas an item such as A{4}+ ignored the possessiveness of the quantifier
    (because it's meaningless), this was not happening when PCRE_CASELESS was
    set. Not wrong, but inefficient.

30. Updated perltest.pl to add /u (force Unicode mode) when /W (use Unicode
    properties for \w, \d, etc) is present in a test regex. Otherwise if the
    test contains no characters greater than 255, Perl doesn't realise it
    should be using Unicode semantics.

31. Upgraded the handling of the POSIX classes [:graph:], [:print:], and
    [:punct:] when PCRE_UCP is set so as to include the same characters as Perl
    does in Unicode mode.

32. Added the "forbid" facility to pcretest so that putting tests into the
    wrong test files can sometimes be quickly detected.

33. There is now a limit (default 250) on the depth of nesting of parentheses.
    This limit is imposed to control the amount of system stack used at compile
    time. It can be changed at build time by --with-parens-nest-limit=xxx or
    the equivalent in CMake.

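The kind of check described in change 33 might look like the Python sketch
below. It is a simplified model with hypothetical names: it skips
backslash-escaped characters but, unlike a real pattern compiler, does not
account for character classes, \Q...\E sequences, or comments.

```python
PARENS_NEST_LIMIT = 250  # the default stated in the entry above


def check_nesting(pattern, limit=PARENS_NEST_LIMIT):
    # Return the deepest parenthesis nesting found, or raise ValueError as
    # soon as the depth exceeds the limit. Simplified, hypothetical scanner.
    depth = 0
    max_depth = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "(":
            depth += 1
            if depth > limit:
                raise ValueError("parentheses are too deeply nested")
            max_depth = max(max_depth, depth)
        elif ch == ")" and depth > 0:
            depth -= 1
        i += 1
    return max_depth


print(check_nesting("((a)(b))"))
```

Rejecting the pattern as soon as the limit is passed is what bounds the stack
use: the scanner never descends further than the configured depth.
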
34. Character classes such as [A-\d] or [a-[:digit:]] now cause compile-time
    errors. Perl warns for these when in warning mode, but PCRE has no facility
    for giving warnings.

35. Change 34 for 8.13 allowed quantifiers on assertions, because Perl does.
    However, this was not working for (?!) because it is optimized to (*FAIL),
    for which PCRE does not allow quantifiers. The optimization is now disabled
    when a quantifier follows (?!). I can't see any use for this, but it makes
    things uniform.

36. Perl no longer allows group names to start with digits, so I have made this
    change also in PCRE. It simplifies the code a bit.

37. In extended mode, Perl ignores spaces before a + that indicates a
    possessive quantifier. PCRE allowed a space before the quantifier, but not
    before the possessive +. It now does.

38. The use of \K (reset reported match start) within a repeated possessive
    group such as (a\Kb)*+ was not working.

40. Document that the same character tables must be used at compile time and
    run time, and that the facility to pass tables to pcre_exec() and
    pcre_dfa_exec() is for use only with saved/restored patterns.


Version 8.33 28-May-2013