
Diff of /code/trunk/ChangeLog


revision 661 by ph10, Sun Aug 21 09:00:54 2011 UTC
revision 746 by ph10, Tue Nov 15 15:07:02 2011 UTC
# Line 1  Line 1 
1  ChangeLog for PCRE  ChangeLog for PCRE
2  ------------------  ------------------
3    
4  Version 8.20  Version 8.21
5  ------------  ------------
6    
Removed from v.661:
7  1. Change 37 of 8.13 broke patterns like [:a]...[b:] because it thought it had
8  a POSIX class. After further experiments with Perl, which convinced me that
9  Perl has bugs and confusions, a closing square bracket is no longer allowed in
10  a POSIX name.

Added in v.746:
7  1.  Updating the JIT compiler.
8  
9  2.  JIT compiler now supports OP_NCREF, OP_RREF and OP_NRREF. New test cases
10      are added as well.
11    
12    3.  Fix cache-flush issue on PowerPC (it is still an experimental JIT port).
13        PCRE_EXTRA_TABLES is not supported by JIT, and should be checked before
14        calling _pcre_jit_exec. Some extra comments are added.
15    
16    4.  Mark settings inside atomic groups that do not contain any capturing
17        parentheses, for example, (?>a(*:m)), were not being passed out. This bug
18        was introduced by change 18 for 8.20.
19    
20    5.  Support for \x, \U and \u in JavaScript compatibility mode, based on the
21        ECMA-262 standard.
22    
23    6.  Lookbehinds such as (?<=a{2}b) that contained a fixed repetition were
24        erroneously being rejected as "not fixed length" if PCRE_CASELESS was set.
25        This bug was probably introduced by change 9 of 8.13.
26    
27    
28    Version 8.20 21-Oct-2011
29    ------------------------
30    
31    1.  Change 37 of 8.13 broke patterns like [:a]...[b:] because it thought it had
32        a POSIX class. After further experiments with Perl, which convinced me that
33        Perl has bugs and confusions, a closing square bracket is no longer allowed
34        in a POSIX name. This bug also affected patterns with classes that started
35        with full stops.
36    
37    2.  If a pattern such as /(a)b|ac/ is matched against "ac", there is no
38        captured substring, but while checking the failing first alternative,
39        substring 1 is temporarily captured. If the output vector supplied to
40        pcre_exec() was not big enough for this capture, the yield of the function
41        was still zero ("insufficient space for captured substrings"). This cannot
42        be totally fixed without adding another stack variable, which seems a lot
43        of expense for an edge case. However, I have improved the situation in cases
44        such as /(a)(b)x|abc/ matched against "abc", where the return code
45        indicates that fewer than the maximum number of slots in the ovector have
46        been set.
47    
48    3.  Related to (2) above: when there are more back references in a pattern than
49        slots in the output vector, pcre_exec() uses temporary memory during
50        matching, and copies in the captures as far as possible afterwards. It was
51        using the entire output vector, but this conflicts with the specification
52        that only 2/3 is used for passing back captured substrings. Now it uses
53        only the first 2/3, for compatibility. This is, of course, another edge
54        case.
55    
56    4.  Zoltan Herczeg's just-in-time compiler support has been integrated into the
57        main code base, and can be used by building with --enable-jit. When this is
58        done, pcregrep automatically uses it unless --disable-pcregrep-jit or the
59        runtime --no-jit option is given.
60    
61    5.  When the number of matches in a pcre_dfa_exec() run exactly filled the
62        ovector, the return from the function was zero, implying that there were
63        other matches that did not fit. The correct "exactly full" value is now
64        returned.
65    
66    6.  If a subpattern that was called recursively or as a subroutine contained
67        (*PRUNE) or any other control that caused it to give a non-standard return,
68        invalid errors such as "Error -26 (nested recursion at the same subject
69        position)" or even infinite loops could occur.
70    
71    7.  If a pattern such as /a(*SKIP)c|b(*ACCEPT)|/ was studied, it stopped
72        computing the minimum length on reaching *ACCEPT, and so ended up with the
73        wrong value of 1 rather than 0. Further investigation indicates that
74        computing a minimum subject length in the presence of *ACCEPT is difficult
75        (think back references, subroutine calls), and so I have changed the code
76        so that no minimum is registered for a pattern that contains *ACCEPT.
77    
78    8.  If (*THEN) was present in the first (true) branch of a conditional group,
79        it was not handled as intended. [But see 16 below.]
80    
81    9.  Replaced RunTest.bat and CMakeLists.txt with improved versions provided by
82        Sheri Pierce.
83    
84    10. A pathological pattern such as /(*ACCEPT)a/ was miscompiled, thinking that
85        the first byte in a match must be "a".
86    
87    11. Change 17 for 8.13 increased the recursion depth for patterns like
88        /a(?:.)*?a/ drastically. I've improved things by remembering whether a
89        pattern contains any instances of (*THEN). If it does not, the old
90        optimizations are restored. It would be nice to do this on a per-group
91        basis, but at the moment that is not feasible.
92    
93    12. In some environments, the output of pcretest -C is CRLF terminated. This
94        broke RunTest's code that checks for the link size. A single white space
95        character after the value is now allowed for.
96    
97    13. RunTest now checks for the "fr" locale as well as for "fr_FR" and "french".
98        For "fr", it uses the Windows-specific input and output files.
99    
100    14. If (*THEN) appeared in a group that was called recursively or as a
101        subroutine, it did not work as intended. [But see next item.]
102    
103    15. Consider the pattern /A (B(*THEN)C) | D/ where A, B, C, and D are complex
104        pattern fragments (but not containing any | characters). If A and B are
105        matched, but there is a failure in C so that it backtracks to (*THEN), PCRE
106        was behaving differently to Perl. PCRE backtracked into A, but Perl goes to
107        D. In other words, Perl considers parentheses that do not contain any |
108        characters to be part of a surrounding alternative, whereas PCRE was
109        treating (B(*THEN)C) the same as (B(*THEN)C|(*FAIL)) -- which Perl handles
110        differently. PCRE now behaves in the same way as Perl, except in the case
111        of subroutine/recursion calls such as (?1) which have in any case always
112        been different (but PCRE had them first :-).
113    
114    16. Related to 15 above: Perl does not treat the | in a conditional group as
115        creating alternatives. Such a group is treated in the same way as an
116        ordinary group without any | characters when processing (*THEN). PCRE has
117        been changed to match Perl's behaviour.
118    
119    17. If a user had set PCREGREP_COLO(U)R to something other than 1:31, the
120        RunGrepTest script failed.
121    
122    18. Change 22 for version 8.13 caused atomic groups to use more stack. This is
123        inevitable for groups that contain captures, but it can lead to a lot of
124        stack use in large patterns. The old behaviour has been restored for atomic
125        groups that do not contain any capturing parentheses.
126    
127    19. If the PCRE_NO_START_OPTIMIZE option was set for pcre_compile(), it did not
128        suppress the check for a minimum subject length at run time. (If it was
129        given to pcre_exec() or pcre_dfa_exec() it did work.)
130    
131    20. Fixed an ASCII-dependent infelicity in pcretest that would have made it
132        fail to work when decoding hex characters in data strings in EBCDIC
133        environments.
134    
135    21. It appears that in at least one Mac OS environment, the isxdigit() function
136        is implemented as a macro that evaluates its argument more than once,
137        contravening the C 90 Standard (I haven't checked a later standard). There
138        was an instance in pcretest which caused it to go wrong when processing
139        \x{...} escapes in subject strings. The code has been rewritten to avoid using
140        things like p++ in the argument of isxdigit().
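
The precaution described in item 21 of 8.20 above (and the character-set concern in item 20) can be sketched roughly as follows. This is not the code in pcretest; parse_hex_escape() and hex_digit_value() are hypothetical names invented for illustration. The sketch assumes the caller has already consumed "\x{" and shows only the two ideas: the character is copied into a local before isxdigit() sees it, and the digit value comes from a table lookup rather than from arithmetic on letter codes.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, found by table lookup so that
   nothing is assumed about how letters are laid out in the character set. */
static int hex_digit_value(int c)
{
  static const char digits[] = "0123456789abcdef";
  const char *q = strchr(digits, tolower(c));
  assert(q != NULL && *q != '\0');   /* caller must pass a hex digit */
  return (int)(q - digits);
}

/* Hypothetical helper: parse the digits of a \x{...} escape. On entry *pp
   points just past the '{'; on exit it points past the '}' if one is found. */
static unsigned long parse_hex_escape(const unsigned char **pp)
{
  const unsigned char *p;
  unsigned long value = 0;

  assert(pp != NULL && *pp != NULL);
  p = *pp;
  for (;;)
    {
    int c = *p;                      /* read once into a local */
    if (!isxdigit(c)) break;         /* argument has no side effects */
    value = value * 16 + (unsigned long)hex_digit_value(c);
    p++;                             /* advance in its own statement */
    }
  if (*p == '}') p++;
  *pp = p;
  return value;
}
```

Because the pointer is advanced in a separate statement, the result is the same whether isxdigit() is a true function or a macro that evaluates its argument several times.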
141    
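
Items 2 and 3 of 8.20 above turn on how much of the output vector is available for captured substrings. A minimal sketch of the arithmetic, assuming the documented pcre_exec() convention that only the first two-thirds of the vector returns offsets and that a size which is not a multiple of three is rounded down. Both function names are hypothetical and are not part of the PCRE API.

```c
#include <assert.h>

/* Hypothetical helper: number of offset pairs (pair 0 is the whole match)
   that an output vector of ovecsize ints can pass back. The final third of
   the vector is workspace, so each pair costs three ints in total. */
static int ovector_pairs(int ovecsize)
{
  assert(ovecsize >= 0);
  return ovecsize / 3;
}

/* Hypothetical helper: smallest vector size that can return the whole match
   plus capture_count capturing groups. */
static int ovecsize_for(int capture_count)
{
  assert(capture_count >= 0);
  return (capture_count + 1) * 3;
}
```

For /(a)(b)x|abc/, which has two capturing groups, ovecsize_for(2) gives 9. A vector of 4 ints holds only the pair for the whole match.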
142    
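
The tolerance described in item 12 of 8.20 above, accepting one white space character after the link size value, can be illustrated in C. RunTest itself is a shell script, so this is only an illustration of the idea and not the actual fix; parse_link_size() is a hypothetical name, and the input is assumed to be just the value portion of the line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: read a decimal link size from text, allowing at most
   one white space character (such as the '\r' of a CRLF line ending) between
   the number and the end of the line. Returns -1 if the text does not fit. */
static int parse_link_size(const char *text)
{
  char *end;
  long value;

  assert(text != NULL);
  value = strtol(text, &end, 10);
  if (end == text) return -1;                 /* no digits found */

  /* Skip a single white space character, but never the newline itself. */
  if (*end != '\0' && *end != '\n' && isspace((unsigned char)*end)) end++;

  if (*end != '\0' && *end != '\n') return -1; /* unexpected trailing text */
  return (int)value;
}
```

With this check, "2\n" and "2\r\n" both yield 2, while a line with other text after the value is rejected.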
143  Version 8.13 16-Aug-2011  Version 8.13 16-Aug-2011
# Line 107  Version 8.13 16-Aug-2011 Line 237  Version 8.13 16-Aug-2011
237      tail recursion to cut down on stack usage. Unfortunately, now that there is      tail recursion to cut down on stack usage. Unfortunately, now that there is
238      the possibility of (*THEN) occurring in these branches, tail recursion is      the possibility of (*THEN) occurring in these branches, tail recursion is
239      no longer possible because the return has to be checked for (*THEN). These      no longer possible because the return has to be checked for (*THEN). These
240      two optimizations have therefore been removed.      two optimizations have therefore been removed. [But see 8.20/11 above.]
241    
242  18. If a pattern containing \R was studied, it was assumed that \R always  18. If a pattern containing \R was studied, it was assumed that \R always
243      matched two bytes, thus causing the minimum subject length to be      matched two bytes, thus causing the minimum subject length to be

