ChangeLog for PCRE
------------------

Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
development is happening in the PCRE2 10.xx series.

Version 8.39 xx-xxxxxx-201x
---------------------------

1.  If PCRE_AUTO_CALLOUT was set on a pattern that had a (?# comment between
    an item and its qualifier (for example, A(?#comment)?B) pcre_compile()
    misbehaved. This bug was found by the LLVM fuzzer.

2.  Similar to the above, if an isolated \E was present between an item and its
    qualifier when PCRE_AUTO_CALLOUT was set, pcre_compile() misbehaved. This
    bug was found by the LLVM fuzzer.

3.  Further to 8.38/46, negated classes such as [^[:^ascii:]\d] were also not
    working correctly in UCP mode.

4.  The POSIX wrapper function regexec() crashed if the option REG_STARTEND
    was set when the pmatch argument was NULL. It now returns REG_INVARG.


Version 8.38 23-November-2015
-----------------------------

1.  If a group that contained a recursive back reference also contained a
    forward reference subroutine call followed by a non-forward-reference
    subroutine call, for example /.((?2)(?R)\1)()/, pcre_compile() failed to
    compile correct code, leading to undefined behaviour or an internally
    detected error. This bug was discovered by the LLVM fuzzer.

2.  Quantification of certain items (e.g. atomic back references) could cause
    incorrect code to be compiled when recursive forward references were
    involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/.
    This bug was discovered by the LLVM fuzzer.

3.  A repeated conditional group whose condition was a reference by name caused
    a buffer overflow if there was more than one group with the given name.
    This bug was discovered by the LLVM fuzzer.

4.  A recursive back reference by name within a group that had the same name as
    another group caused a buffer overflow. For example:
    /(?J)(?'d'(?'d'\g{d}))/. This bug was discovered by the LLVM fuzzer.

5.  A forward reference by name to a group whose number is the same as the
    current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused
    a buffer overflow at compile time. This bug was discovered by the LLVM
    fuzzer.

6.  A lookbehind assertion within a set of mutually recursive subpatterns could
    provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.

7.  Another buffer overflow bug involved duplicate named groups with a
    reference between their definitions, with a group that reset capture
    numbers, for example: /(?J:(?|(?'R')(\k'R')|((?'R'))))/. This has been
    fixed by always allowing for more memory, even if not needed. (A proper fix
    is implemented in PCRE2, but it involves more refactoring.)

8.  There was no check for integer overflow in subroutine calls such as (?123).

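The kind of check that entry 8 describes can be sketched as follows. This is
an illustration only, not the code in pcre_compile(): the helper name
parse_group_number() and its return convention are assumptions made for the
example. It tests for overflow before each multiplication, so the accumulated
value never exceeds INT_MAX.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not PCRE's code: parse the decimal digits at *pp as a
   group number. Returns 0 on success and stores the value in *out; returns -1
   if there are no digits or the value would exceed INT_MAX. On success *pp is
   advanced past the digits. */
static int parse_group_number(const char **pp, int *out)
{
    const char *p = *pp;
    int n = 0;

    if (*p < '0' || *p > '9') return -1;      /* no digits at all */
    while (*p >= '0' && *p <= '9') {
        int digit = *p - '0';
        /* Check before multiplying so that signed overflow never happens. */
        if (n > (INT_MAX - digit) / 10) return -1;
        n = n * 10 + digit;
        p++;
    }
    *pp = p;
    *out = n;
    return 0;
}
```

Given the text that follows "(?" in (?123), the helper stores 123 and leaves
the pointer at the closing parenthesis. A digit string too long for an int is
rejected instead of wrapping around.
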
9.  The table entry for \l in EBCDIC environments was incorrect, leading to its
    being treated as a literal 'l' instead of causing an error.

10. There was a buffer overflow if pcre_exec() was called with an ovector of
    size 1. This bug was found by american fuzzy lop.

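Entry 10 concerns an output vector too small to hold even one offset pair. A
minimal sketch of the guard involved, assuming a hypothetical helper
store_match_offsets() that is not part of the PCRE API and does not reproduce
the actual fix:

```c
#include <stddef.h>

/* Hypothetical helper, not PCRE's code: record the overall match offsets in a
   caller-supplied vector of ovecsize ints. One offset pair needs two slots,
   so nothing is written unless both fit. Returns the number of pairs stored. */
static int store_match_offsets(int *ovector, int ovecsize, int start, int end)
{
    if (ovector == NULL || ovecsize < 2) return 0;   /* no room for a pair */
    ovector[0] = start;
    ovector[1] = end;
    return 1;
}
```

With a one-element vector the helper writes nothing and returns 0, so the
caller can tell that no offsets were stored.
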
11. If a non-capturing group containing a conditional group that could match
    an empty string was repeated, it was not identified as matching an empty
    string itself. For example: /^(?:(?(1)x|)+)+$()/.

12. In an EBCDIC environment, pcretest was mishandling the escape sequences
    \a and \e in test subject lines.

13. In an EBCDIC environment, \a in a pattern was converted to the ASCII
    instead of the EBCDIC value.

14. The handling of \c in an EBCDIC environment has been revised so that it is
    now compatible with the specification in Perl's perlebcdic page.

15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
    ASCII/Unicode. This has now been added to the list of characters that are
    recognized as white space in EBCDIC.

16. When PCRE was compiled without UCP support, the use of \p and \P gave an
    error (correctly) when used outside a class, but did not give an error
    within a class.

17. \h within a class was incorrectly compiled in EBCDIC environments.

18. A pattern with an unmatched closing parenthesis that contained a backward
    assertion which itself contained a forward reference caused buffer
    overflow. An example pattern is: /(?=di(?<=(?1))|(?=(.))))/.

19. JIT should return with error when the compiled pattern requires more stack
    space than the maximum.

20. A possessively repeated conditional group that could match an empty string,
    for example, /(?(R))*+/, was incorrectly compiled.

21. Fix infinite recursion in the JIT compiler when certain patterns such as
    /(?:|a|){100}x/ are analysed.

22. Some patterns with character classes involving [: and \\ were incorrectly
    compiled and could cause reading from uninitialized memory or an incorrect
    error diagnosis.

23. Pathological patterns containing many nested occurrences of [: caused
    pcre_compile() to run for a very long time.

24. A conditional group with only one branch has an implicit empty alternative
    branch and must therefore be treated as potentially matching an empty
    string.

25. If (?R was followed by - or + incorrect behaviour happened instead of a
    diagnostic.

26. Arrange to give up on finding the minimum matching length for overly
    complex patterns.

27. Similar to (4) above: in a pattern with duplicated named groups and an
    occurrence of (?| it is possible for an apparently non-recursive back
    reference to become recursive if a later named group with the relevant
    number is encountered. This could lead to a buffer overflow. Wen Guanxing
    from Venustech ADLAB discovered this bug.

28. If pcregrep was given the -q option with -c or -l, or when handling a
    binary file, it incorrectly wrote output to stdout.

29. The JIT compiler did not restore the control verb head in case of *THEN
    control verbs. This issue was found by Karl Skomski with a custom LLVM
    fuzzer.

30. Error messages for syntax errors following \g and \k were giving inaccurate
    offsets in the pattern.

31. Added a check for integer overflow in conditions (?(<digits>) and
    (?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
    fuzzer.

32. Handling recursive references such as (?2) when the reference is to a group
    later in the pattern uses code that is very hacked about and error-prone.
    It has been re-written for PCRE2. Here in PCRE1, a check has been added to
    give an internal error if it is obvious that compiling has gone wrong.

33. The JIT compiler should not check repeats after a {0,1} repeat byte code.
    This issue was found by Karl Skomski with a custom LLVM fuzzer.

34. The JIT compiler should restore the control chain for empty possessive
    repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.

35. Match limit check added to JIT recursion. This issue was found by Karl
    Skomski with a custom LLVM fuzzer.

36. Yet another case similar to 27 above has been circumvented by an
    unconditional allocation of extra memory. This issue is fixed "properly" in
    PCRE2 by refactoring the way references are handled. Wen Guanxing
    from Venustech ADLAB discovered this bug.

37. Fix two assertion failures in JIT. These issues were found by Karl Skomski
    with a custom LLVM fuzzer.

38. Fixed a corner case of range optimization in JIT.

39. An incorrect error "overran compiling workspace" was given if there were
    exactly enough group forward references such that the last one extended
    into the workspace safety margin. The next one would have expanded the
    workspace. The test for overflow was not including the safety margin.

40. A match limit issue is fixed in JIT which was found by Karl Skomski
    with a custom LLVM fuzzer.

41. Remove the use of /dev/null in testdata/testinput2, because it doesn't
    work under Windows. (Why has it taken so long for anyone to notice?)

42. In a character class such as [\W\p{Any}] where both a negative-type escape
    ("not a word character") and a property escape were present, the property
    escape was being ignored.

43. Fix crash caused by very long (*MARK) or (*THEN) names.

44. A sequence such as [[:punct:]b], that is, a POSIX character class followed
    by a single ASCII character in a class item, was incorrectly compiled in
    UCP mode. The POSIX class got lost, but only if the single character
    followed it.

45. [:punct:] in UCP mode was matching some characters in the range 128-255
    that should not have been matched.

46. If [:^ascii:] or [:^xdigit:] or [:^cntrl:] is present in a non-negated
    class, all characters with code points greater than 255 are in the class.
    When a Unicode property was also in the class (if PCRE_UCP is set, escapes
    such as \w are turned into Unicode properties), wide characters were not
    correctly handled, and could fail to match.


Version 8.37 28-April-2015
--------------------------

1.  When an (*ACCEPT) is triggered inside capturing parentheses, it arranges
    for those parentheses to be closed with whatever has been captured so far.
    [...]
    was no other kind of back reference (a situation which is probably quite
    rare). The effect of the bug was that the condition was always treated as
    FALSE when the capture could not be consulted, leading to incorrect
    behaviour by pcre_exec(). This bug has been fixed.

9.  A reference to a duplicated named group (either a back reference or a test
    for being set in a conditional) that occurred in a part of the pattern where
    [...]
    failed to allow the zero-repeat case if pcre_exec() was called with an
    ovector too small to capture the group.

13. Fixed two bugs in pcretest that were discovered by fuzzing and reported by
    Red Hat Product Security:

    (a) A crash if /K and /F were both set with the option to save the compiled
    [...]

    (b) Another crash if the option to print captured substrings in a callout
    was combined with setting a null ovector, for example \O\C+ as a subject
    string.

14. A pattern such as "((?2){0,1999}())?", which has a group containing a
    forward reference repeated a large (but limited) number of times within a
    repeated outer group that has a zero minimum quantifier, caused incorrect
    code to be compiled, leading to the error "internal error:
    previously-checked referenced subpattern not found" when an incorrect
    memory address was read. This bug was reported as "heap overflow",
    discovered by Kai Lu of Fortinet's FortiGuard Labs and given the CVE number
    CVE-2015-2325.

23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
    call within a group that also contained a recursive back reference caused
    incorrect code to be compiled. This bug was reported as "heap overflow",
    discovered by Kai Lu of Fortinet's FortiGuard Labs, and given the CVE
    number CVE-2015-2326.

24. Computing the size of the JIT read-only data in advance has been a source
    [...]

26. Fix JIT compilation of conditional blocks whose assertion is converted to
    (*FAIL). For example: /(?(?!))/.

27. The pattern /(?(?!)^)/ caused references to random memory. This bug was
    discovered by the LLVM fuzzer.

    [...]
    pcre_exec() it worked by luck; in pcre_dfa_exec() it gave an incorrect
    error about an unsupported item.

29. For some types of pattern, for example /Z*(|d*){216}/, the auto-
    possessification code could take exponential time to complete. A recursion
    depth limit of 1000 has been imposed to limit the resources used by this
    optimization.

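One common way to bound work of this kind is to thread a shared budget through
the recursion and give up when it runs out. The sketch below is an
illustration under stated assumptions, not the PCRE source: struct node,
analyse() and the budget-by-pointer convention are hypothetical, and the entry
above does not say whether the limit counts nesting depth or total calls.

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
    struct node *left;
    struct node *right;
};

/* Walk the tree, spending one unit of *budget per node visited. Returns 1 if
   the whole tree was analysed, or 0 if the budget ran out first, in which
   case the caller should simply skip the optimization. */
static int analyse(const struct node *n, int *budget)
{
    if (n == NULL) return 1;
    if ((*budget)-- <= 0) return 0;           /* limit reached: give up */
    return analyse(n->left, budget) && analyse(n->right, budget);
}
```

A caller would start with a budget such as 1000 and treat a return value of 0
as "too complex to optimize" rather than as an error.
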
30. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special
    class such as \S in non-UCP mode, explicit wide characters (> 255) can be
    ignored because \S ensures they are all in the class. The code for doing
    this was interacting badly with the code for computing the amount of space
    needed to compile the pattern, leading to a buffer overflow. This bug was
    discovered by the LLVM fuzzer.

31. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
    other kinds of group caused stack overflow at compile time. This bug was
    discovered by the LLVM fuzzer.

32. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
    between a subroutine call and its quantifier was incorrectly compiled,
    leading to buffer overflow or other errors. This bug was discovered by the
    LLVM fuzzer.

33. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
    assertion after (?(. The code was failing to check the character after
    (?(?< for the ! or = that would indicate a lookbehind assertion. This bug
    was discovered by the LLVM fuzzer.

34. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
    a fixed maximum following a group that contains a subroutine reference was
    incorrectly compiled and could trigger buffer overflow. This bug was
    discovered by the LLVM fuzzer.

35. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
    caused a stack overflow instead of the diagnosis of a non-fixed length
    lookbehind assertion. This bug was discovered by the LLVM fuzzer.

36. The use of \K in a positive lookbehind assertion in a non-anchored pattern
    (e.g. /(?<=\Ka)/) could make pcregrep loop.

37. There was a similar problem to 36 in pcretest for global matches.

38. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
    and a subsequent item in the pattern caused a non-match, backtracking over
    the repeated \X did not stop, but carried on past the start of the subject,
    causing reference to random memory and/or a segfault. There were also some
    other cases where backtracking after \C could crash. This set of bugs was
    discovered by the LLVM fuzzer.

39. The function for finding the minimum length of a matching string could take
    a very long time if mutual recursion was present many times in a pattern,
    for example, /((?2){73}(?2))((?1))/. A better mutual recursion detection
    method has been implemented. This infelicity was discovered by the LLVM
    fuzzer.

40. Static linking against the PCRE library using the pkg-config module was
    failing on missing pthread symbols.


Version 8.36 26-September-2014
------------------------------