Diff of /code/trunk/ChangeLog

revision 1522 by ph10, Sun Feb 8 17:02:05 2015 UTC revision 1604 by ph10, Tue Nov 17 17:27:17 2015 UTC
# Line 1  Line 1 
1  ChangeLog for PCRE  ChangeLog for PCRE
2  ------------------  ------------------
3    
4  Version 8.37 xx-xxx-2015  Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
5  ------------------------  development is happening in the PCRE2 10.xx series.
6    
7  1.  When an (*ACCEPT) is triggered inside capturing parentheses, it arranges  Version 8.38 27-October-2015
8      for those parentheses to be closed with whatever has been captured so far.  ----------------------------
9      However, it was failing to mark any other groups between the highest  
10      capture so far and the current group as "unset". Thus, the ovector for  1.  If a group that contained a recursive back reference also contained a
11      those groups contained whatever was previously there. An example is the      forward reference subroutine call followed by a non-forward-reference
12      pattern /(x)|((*ACCEPT))/ when matched against "abcd".      subroutine call, for example /.((?2)(?R)\1)()/, pcre_compile() failed to
13        compile correct code, leading to undefined behaviour or an internally
14  2.  If an assertion condition was quantified with a minimum of zero (an odd      detected error. This bug was discovered by the LLVM fuzzer.
15      thing to do, but it happened), SIGSEGV or other misbehaviour could occur.  
16    2.  Quantification of certain items (e.g. atomic back references) could cause
17        incorrect code to be compiled when recursive forward references were
18        involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/.
19        This bug was discovered by the LLVM fuzzer.
20    
21    3.  A repeated conditional group whose condition was a reference by name caused
22        a buffer overflow if there was more than one group with the given name.
23        This bug was discovered by the LLVM fuzzer.
24    
25    4.  A recursive back reference by name within a group that had the same name as
26        another group caused a buffer overflow. For example:
27        /(?J)(?'d'(?'d'\g{d}))/. This bug was discovered by the LLVM fuzzer.
28    
29    5.  A forward reference by name to a group whose number is the same as the
30        current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused
31        a buffer overflow at compile time. This bug was discovered by the LLVM
32        fuzzer.
33    
34    6.  A lookbehind assertion within a set of mutually recursive subpatterns could
35        provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
36    
37    7.  Another buffer overflow bug involved duplicate named groups with a
38        reference between their definition, with a group that reset capture
39        numbers, for example: /(?J:(?|(?'R')(\k'R')|((?'R'))))/. This has been
40        fixed by always allowing for more memory, even if not needed. (A proper fix
41        is implemented in PCRE2, but it involves more refactoring.)
42    
43    8.  There was no check for integer overflow in subroutine calls such as (?123).
44    
45    9.  The table entry for \l in EBCDIC environments was incorrect, leading to its
46        being treated as a literal 'l' instead of causing an error.
47    
48    10. There was a buffer overflow if pcre_exec() was called with an ovector of
49        size 1. This bug was found by american fuzzy lop.
50    
51    11. If a non-capturing group containing a conditional group that could match
52        an empty string was repeated, it was not identified as matching an empty
53        string itself. For example: /^(?:(?(1)x|)+)+$()/.
54    
55    12. In an EBCDIC environment, pcretest was mishandling the escape sequences
56        \a and \e in test subject lines.
57    
58    13. In an EBCDIC environment, \a in a pattern was converted to the ASCII
59        instead of the EBCDIC value.
60    
61    14. The handling of \c in an EBCDIC environment has been revised so that it is
62        now compatible with the specification in Perl's perlebcdic page.
63    
64    15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
65        ASCII/Unicode. This has now been added to the list of characters that are
66        recognized as white space in EBCDIC.
67    
68    16. When PCRE was compiled without UCP support, the use of \p and \P gave an
69        error (correctly) when used outside a class, but did not give an error
70        within a class.
71    
72    17. \h within a class was incorrectly compiled in EBCDIC environments.
73    
74    18. A pattern with an unmatched closing parenthesis that contained a backward
75        assertion which itself contained a forward reference caused buffer
76        overflow. An example pattern is: /(?=di(?<=(?1))|(?=(.))))/.
77    
78    19. JIT should return with error when the compiled pattern requires more stack
79        space than the maximum.
80    
81    20. A possessively repeated conditional group that could match an empty string,
82        for example, /(?(R))*+/, was incorrectly compiled.
83    
84    21. Fix infinite recursion in the JIT compiler when certain patterns such as
85        /(?:|a|){100}x/ are analysed.
86    
87    22. Some patterns with character classes involving [: and \\ were incorrectly
88        compiled and could cause reading from uninitialized memory or an incorrect
89        error diagnosis.
90    
91    23. Pathological patterns containing many nested occurrences of [: caused
92        pcre_compile() to run for a very long time.
93    
94    24. A conditional group with only one branch has an implicit empty alternative
95        branch and must therefore be treated as potentially matching an empty
96        string.
97    
98    25. If (?R was followed by - or + incorrect behaviour happened instead of a
99        diagnostic.
100    
101    26. Arrange to give up on finding the minimum matching length for overly
102        complex patterns.
103    
104    27. Similar to (4) above: in a pattern with duplicated named groups and an
105        occurrence of (?| it is possible for an apparently non-recursive back
106        reference to become recursive if a later named group with the relevant
107        number is encountered. This could lead to a buffer overflow. Wen Guanxing
108        from Venustech ADLAB discovered this bug.
109    
110    28. If pcregrep was given the -q option with -c or -l, or when handling a
111        binary file, it incorrectly wrote output to stdout.
112    
113    29. The JIT compiler did not restore the control verb head in case of *THEN
114        control verbs. This issue was found by Karl Skomski with a custom LLVM
115        fuzzer.
116    
117    30. Error messages for syntax errors following \g and \k were giving inaccurate
118        offsets in the pattern.
119    
120    31. Added a check for integer overflow in conditions (?(<digits>) and
121        (?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
122        fuzzer.
123    
124    32. Handling recursive references such as (?2) when the reference is to a group
125        later in the pattern uses code that is very hacked about and error-prone.
126        It has been re-written for PCRE2. Here in PCRE1, a check has been added to
127        give an internal error if it is obvious that compiling has gone wrong.
128    
129    33. The JIT compiler should not check repeats after a {0,1} repeat byte code.
130        This issue was found by Karl Skomski with a custom LLVM fuzzer.
131    
132    34. The JIT compiler should restore the control chain for empty possessive
133        repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.
134    
135    35. Match limit check added to JIT recursion. This issue was found by Karl
136        Skomski with a custom LLVM fuzzer.
137    
138    36. Yet another case similar to 27 above has been circumvented by an
139        unconditional allocation of extra memory. This issue is fixed "properly" in
140        PCRE2 by refactoring the way references are handled. Wen Guanxing
141        from Venustech ADLAB discovered this bug.
142    
143    37. Fix two assertion fails in JIT. These issues were found by Karl Skomski
144        with a custom LLVM fuzzer.
145    
146    38. Fixed a corner case of range optimization in JIT.
147    
148    39. An incorrect error "overran compiling workspace" was given if there were
149        exactly enough group forward references such that the last one extended
150        into the workspace safety margin. The next one would have expanded the
151        workspace. The test for overflow was not including the safety margin.
152    
153    40. A match limit issue is fixed in JIT which was found by Karl Skomski
154        with a custom LLVM fuzzer.
155    
156    41. Remove the use of /dev/null in testdata/testinput2, because it doesn't
157        work under Windows. (Why has it taken so long for anyone to notice?)
158    
159    42. In a character class such as [\W\p{Any}] where both a negative-type escape
160        ("not a word character") and a property escape were present, the property
161        escape was being ignored.
162    
163    43. Fix crash caused by very long (*MARK) or (*THEN) names.
164    
165    44. A sequence such as [[:punct:]b], that is, a POSIX character class followed
166        by a single ASCII character in a class item, was incorrectly compiled in
167        UCP mode. The POSIX class got lost, but only if the single character
168        followed it.
169    
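Items 8 and 31 above both concern decimal numbers in the pattern that were
read without an overflow check. The following is a minimal sketch of such a
check, written in Python rather than PCRE's C. The name read_group_number, the
INT_MAX bound and the exception types are assumptions made for illustration;
they are not PCRE's code, limits or error messages.

```python
# Hypothetical sketch: Python integers never overflow, so a 32-bit signed
# bound is modelled explicitly (an assumption, not PCRE's actual limit).
INT_MAX = 2**31 - 1


def read_group_number(text, pos=0, limit=INT_MAX):
    """Read ASCII decimal digits from text[pos:]; return (value, next_pos).

    The bound is tested before each multiply-and-add, which is where a C
    implementation would have to test it to avoid overflowing an int.
    """
    value = 0
    start = pos
    while pos < len(text) and text[pos] in "0123456789":
        digit = ord(text[pos]) - ord("0")
        # value * 10 + digit <= limit  <=>  value <= (limit - digit) // 10
        if value > (limit - digit) // 10:
            raise OverflowError("number is too big")
        value = value * 10 + digit
        pos += 1
    if pos == start:
        raise ValueError("digits expected")
    return value, pos
```

Called on the digits of (?123), that is read_group_number("123)"), the sketch
returns the value 123 and the position of the closing parenthesis; a digit
string one larger than the bound raises OverflowError instead of wrapping.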
170    
171    Version 8.37 28-April-2015
172    --------------------------
173    
174    1.  When an (*ACCEPT) is triggered inside capturing parentheses, it arranges
175        for those parentheses to be closed with whatever has been captured so far.
176        However, it was failing to mark any other groups between the highest
177        capture so far and the current group as "unset". Thus, the ovector for
178        those groups contained whatever was previously there. An example is the
179        pattern /(x)|((*ACCEPT))/ when matched against "abcd".
180    
181    2.  If an assertion condition was quantified with a minimum of zero (an odd
182        thing to do, but it happened), SIGSEGV or other misbehaviour could occur.
183    
184  3.  If a pattern in pcretest input had the P (POSIX) modifier followed by an  3.  If a pattern in pcretest input had the P (POSIX) modifier followed by an
185      unrecognized modifier, a crash could occur.      unrecognized modifier, a crash could occur.
186    
187  4.  An attempt to do global matching in pcretest with a zero-length ovector  4.  An attempt to do global matching in pcretest with a zero-length ovector
188      caused a crash.      caused a crash.
189    
190  5.  Fixed a memory leak during matching that could occur for a subpattern  5.  Fixed a memory leak during matching that could occur for a subpattern
191      subroutine call (recursive or otherwise) if the number of captured groups      subroutine call (recursive or otherwise) if the number of captured groups
192      that had to be saved was greater than ten.      that had to be saved was greater than ten.
193    
194  6.  Catch a bad opcode during auto-possessification after compiling a bad UTF  6.  Catch a bad opcode during auto-possessification after compiling a bad UTF
195      string with NO_UTF_CHECK. This is a tidyup, not a bug fix, as passing bad      string with NO_UTF_CHECK. This is a tidyup, not a bug fix, as passing bad
196      UTF with NO_UTF_CHECK is documented as having an undefined outcome.      UTF with NO_UTF_CHECK is documented as having an undefined outcome.
197    
198  7.  A UTF pattern containing a "not" match of a non-ASCII character and a  7.  A UTF pattern containing a "not" match of a non-ASCII character and a
199      subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.      subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.
200    
# Line 41  Version 8.37 xx-xxx-2015 Line 208  Version 8.37 xx-xxx-2015
208     was no other kind of back reference (a situation which is probably quite     was no other kind of back reference (a situation which is probably quite
209     rare). The effect of the bug was that the condition was always treated as     rare). The effect of the bug was that the condition was always treated as
210     FALSE when the capture could not be consulted, leading to incorrect     FALSE when the capture could not be consulted, leading to incorrect
211     behaviour by pcre2_match(). This bug has been fixed.     behaviour by pcre_exec(). This bug has been fixed.
212    
213  9. A reference to a duplicated named group (either a back reference or a test  9. A reference to a duplicated named group (either a back reference or a test
214     for being set in a conditional) that occurred in a part of the pattern where     for being set in a conditional) that occurred in a part of the pattern where
# Line 53  Version 8.37 xx-xxx-2015 Line 220  Version 8.37 xx-xxx-2015
220      The infinite loop is now broken (with the minimum length unset, that is,      The infinite loop is now broken (with the minimum length unset, that is,
221      zero).      zero).
222    
223    11. If an assertion that was used as a condition was quantified with a minimum
224        of zero, matching went wrong. In particular, if the whole group had
225        unlimited repetition and could match an empty string, a segfault was
226        likely. The pattern (?(?=0)?)+ is an example that caused this. Perl allows
227        assertions to be quantified, but not if they are being used as conditions,
228        so the above pattern is faulted by Perl. PCRE has now been changed so that
229        it also rejects such patterns.
230    
231    12. A possessive capturing group such as (a)*+ with a minimum repeat of zero
232        failed to allow the zero-repeat case if pcre2_exec() was called with an
233        ovector too small to capture the group.
234    
235    13. Fixed two bugs in pcretest that were discovered by fuzzing and reported by
236        Red Hat Product Security:
237    
238        (a) A crash if /K and /F were both set with the option to save the compiled
239        pattern.
240    
241        (b) Another crash if the option to print captured substrings in a callout
242        was combined with setting a null ovector, for example \O\C+ as a subject
243        string.
244    
245    14. A pattern such as "((?2){0,1999}())?", which has a group containing a
246        forward reference repeated a large (but limited) number of times within a
247        repeated outer group that has a zero minimum quantifier, caused incorrect
248        code to be compiled, leading to the error "internal error:
249        previously-checked referenced subpattern not found" when an incorrect
250        memory address was read. This bug was reported as "heap overflow",
251        discovered by Kai Lu of Fortinet's FortiGuard Labs and given the CVE number
252        CVE-2015-2325.
253    
254    23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
255        call within a group that also contained a recursive back reference caused
256        incorrect code to be compiled. This bug was reported as "heap overflow",
257        discovered by Kai Lu of Fortinet's FortiGuard Labs, and given the CVE
258        number CVE-2015-2326.
259    
260    24. Computing the size of the JIT read-only data in advance has been a source
261        of various issues, and new ones are still appear unfortunately. To fix
262        existing and future issues, size computation is eliminated from the code,
263        and replaced by on-demand memory allocation.
264    
265    25. A pattern such as /(?i)[A-`]/, where characters in the other case are
266        adjacent to the end of the range, and the range contained characters with
267        more than one other case, caused incorrect behaviour when compiled in UTF
268        mode. In that example, the range a-j was left out of the class.
269    
270    26. Fix JIT compilation of conditional blocks whose assertion
271        is converted to (*FAIL), e.g. /(?(?!))/.
272    
273    27. The pattern /(?(?!)^)/ caused references to random memory. This bug was
274        discovered by the LLVM fuzzer.
275    
276    28. The assertion (?!) is optimized to (*FAIL). This was not handled correctly
277        when this assertion was used as a condition, for example (?(?!)a|b). In
278        pcre_exec() it worked by luck; in pcre_dfa_exec() it gave an incorrect
279        error about an unsupported item.
280    
281    29. For some types of pattern, for example /Z*(|d*){216}/, the auto-
282        possessification code could take exponential time to complete. A recursion
283        depth limit of 1000 has been imposed to limit the resources used by this
284        optimization.
285    
286    30. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special class
287        such as \S in non-UCP mode, explicit wide characters (> 255) can be ignored
288        because \S ensures they are all in the class. The code for doing this was
289        interacting badly with the code for computing the amount of space needed to
290        compile the pattern, leading to a buffer overflow. This bug was discovered
291        by the LLVM fuzzer.
292    
293    31. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
294        other kinds of group caused stack overflow at compile time. This bug was
295        discovered by the LLVM fuzzer.
296    
297    32. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
298        between a subroutine call and its quantifier was incorrectly compiled,
299        leading to buffer overflow or other errors. This bug was discovered by the
300        LLVM fuzzer.
301    
302    33. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
303        assertion after (?(. The code was failing to check the character after
304        (?(?< for the ! or = that would indicate a lookbehind assertion. This bug
305        was discovered by the LLVM fuzzer.
306    
307    34. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
308        a fixed maximum following a group that contains a subroutine reference was
309        incorrectly compiled and could trigger buffer overflow. This bug was
310        discovered by the LLVM fuzzer.
311    
312    35. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
313        caused a stack overflow instead of the diagnosis of a non-fixed length
314        lookbehind assertion. This bug was discovered by the LLVM fuzzer.
315    
316    36. The use of \K in a positive lookbehind assertion in a non-anchored pattern
317        (e.g. /(?<=\Ka)/) could make pcregrep loop.
318    
319    37. There was a similar problem to 36 in pcretest for global matches.
320    
321    38. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
322        and a subsequent item in the pattern caused a non-match, backtracking over
323        the repeated \X did not stop, but carried on past the start of the subject,
324        causing reference to random memory and/or a segfault. There were also some
325        other cases where backtracking after \C could crash. This set of bugs was
326        discovered by the LLVM fuzzer.
327    
328    39. The function for finding the minimum length of a matching string could take
329        a very long time if mutual recursion was present many times in a pattern,
330        for example, /((?2){73}(?2))((?1))/. A better mutual recursion detection
331        method has been implemented. This infelicity was discovered by the LLVM
332        fuzzer.
333    
334    40. Static linking against the PCRE library using the pkg-config module was
335        failing on missing pthread symbols.
336    
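Item 29 above bounds the auto-possessification optimization with a recursion
depth limit. The sketch below illustrates only the general idea of abandoning a
recursive analysis once it nests too deeply. The nested-list representation and
the name within_depth are assumptions for illustration; this is not PCRE's
auto-possessification code.

```python
def within_depth(node, limit, depth=0):
    """Walk a nested list; return False as soon as nesting exceeds limit.

    A False result stands for "give up on the optimization", which is always
    safe because the unoptimized pattern still matches correctly.
    """
    if depth > limit:
        return False  # too deep: stop recursing and skip the optimization
    if isinstance(node, list):
        return all(within_depth(child, limit, depth + 1) for child in node)
    return True  # a leaf item within the allowed depth
```

For example, within_depth(["a", ["b", ["c"]]], limit=3) completes the walk,
while the same structure with limit=2 is abandoned at the innermost item.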
337    
338  Version 8.36 26-September-2014  Version 8.36 26-September-2014
339  ------------------------------  ------------------------------
