Diff of /code/trunk/ChangeLog

Revision 67 by nigel, Sat Feb 24 21:40:13 2007 UTC, compared with revision 73 by nigel, Sat Feb 24 21:40:30 2007 UTC.

ChangeLog for PCRE
------------------

Version 4.5 01-Dec-03
---------------------

 1. There has been some re-arrangement of the code for the match() function so
    that it can be compiled in a version that does not call itself recursively.
    Instead, it keeps those local variables that need separate instances for
    each "recursion" in a frame on the heap, and gets/frees frames whenever it
    needs to "recurse". Keeping track of where control must go is done by means
    of setjmp/longjmp. The whole thing is implemented by a set of macros that
    hide most of the details from the main code, and operates only if
    NO_RECURSE is defined while compiling pcre.c. If PCRE is built using the
    "configure" mechanism, "--disable-stack-for-recursion" turns on this way of
    operating.

    To make it easier for callers to provide specially tailored get/free
    functions for this usage, two new functions, pcre_stack_malloc and
    pcre_stack_free, are used. They are always called in strict stacking order,
    and the size of the block requested is always the same.

    The PCRE_CONFIG_STACKRECURSE info parameter can be used to find out whether
    PCRE has been compiled to use the stack or the heap for recursion. The
    -C option of pcretest uses this to show which version is compiled.

    A new data escape, \S, is added to pcretest; it causes the amounts of store
    obtained and freed by both kinds of malloc/free at match time to be added
    to the output.
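
A minimal sketch of the heap-frame idea, assuming a toy recursive function (Fibonacci) in place of match(). Each level's locals live in a heap frame, and a saved "resume" field records where to continue when a "recursive call" returns. That field and the plain loop are a swapped-in technique standing in for the setjmp/longjmp macros described above, not PCRE's code. stack_get and stack_free are hypothetical stand-ins for pcre_stack_malloc and pcre_stack_free; like those, they are called in strict stacking order and always for blocks of the same size.

```c
#include <assert.h>
#include <stdlib.h>

/* One heap frame holds the locals that each "recursion" needs its own copy of. */
typedef struct frame {
    struct frame *prev;  /* frame of the "caller" */
    int n;               /* argument of this level */
    int partial;         /* saved result of the first sub-call */
    int resume;          /* 0 = just entered, 1 = after first sub-call, 2 = after second */
} frame;

static size_t live_frames;  /* frames currently allocated */

/* Hypothetical stand-ins for caller-supplied get/free functions: always used
   in strict LIFO order, and always for blocks of sizeof(frame) bytes. */
static void *stack_get(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) live_frames++;
    return p;
}

static void stack_free(void *p)
{
    assert(live_frames > 0);
    live_frames--;
    free(p);
}

static frame *push(frame *prev, int n)
{
    frame *f = stack_get(sizeof *f);
    if (f != NULL) { f->prev = prev; f->n = n; f->partial = 0; f->resume = 0; }
    return f;
}

static frame *pop(frame *f)
{
    frame *prev = f->prev;
    stack_free(f);  /* always the most recently pushed frame */
    return prev;
}

/* Reference version that uses the C stack. */
static int fib_recursive(int n)
{
    return n < 2 ? n : fib_recursive(n - 1) + fib_recursive(n - 2);
}

/* Same computation with each "recursion" moved into a heap frame.
   Returns -1 if a frame cannot be allocated. */
static int fib_heap(int n)
{
    frame *top = push(NULL, n);
    int ret = 0;  /* value "returned" by the most recently finished level */

    if (top == NULL) return -1;
    while (top != NULL) {
        frame *child = NULL;
        if (top->resume == 0 && top->n < 2) {      /* base case: "return" n */
            ret = top->n;
            top = pop(top);
            continue;
        }
        if (top->resume == 0) {                    /* first "recursive call" */
            top->resume = 1;
            child = push(top, top->n - 1);
        } else if (top->resume == 1) {             /* second "recursive call" */
            top->partial = ret;
            top->resume = 2;
            child = push(top, top->n - 2);
        } else {                                   /* both calls done: "return" */
            ret = top->partial + ret;
            top = pop(top);
            continue;
        }
        if (child == NULL) {                       /* out of memory: unwind */
            while (top != NULL) top = pop(top);
            return -1;
        }
        top = child;
    }
    return ret;
}
```

With this sketch, fib_heap(20) and fib_recursive(20) both give 6765, and live_frames is back to zero afterwards because every push is matched by a pop.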

 2. Changed the locale test to use "fr_FR" instead of "fr" because that's
    what's available on my current Linux desktop machine.

 3. When matching a UTF-8 string, the test for a valid string at the start has
    been extended. If start_offset is not zero, PCRE now checks that it points
    to a byte that is the start of a UTF-8 character. If not, it returns
    PCRE_ERROR_BADUTF8_OFFSET (-11). Note: the whole string is still checked;
    this is necessary because there may be backward assertions in the pattern.
    When matching the same subject several times, it may save resources to use
    PCRE_NO_UTF8_CHECK on all but the first call if the string is long.
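
A minimal sketch of the start-offset test, assuming the subject is a byte array with an explicit length. In UTF-8 every byte of the form 10xxxxxx is a continuation byte, so an offset is acceptable if it is at the end of the subject or points at any other kind of byte. check_start_offset and BADUTF8_OFFSET are hypothetical names; only the value -11 comes from the entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant; the value is the one the entry gives for
   PCRE_ERROR_BADUTF8_OFFSET. */
#define BADUTF8_OFFSET (-11)

/* Returns 0 if start_offset is the start of a UTF-8 character (or the end of
   the subject), BADUTF8_OFFSET if it points at a continuation byte. */
static int check_start_offset(const unsigned char *subject, size_t length,
                              size_t start_offset)
{
    assert(start_offset <= length);  /* assumption: the caller has checked this */
    if (start_offset < length && (subject[start_offset] & 0xc0) == 0x80)
        return BADUTF8_OFFSET;       /* 10xxxxxx: a continuation byte */
    return 0;
}
```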

 4. The code for checking the validity of UTF-8 strings has been tightened so
    that it rejects (a) strings containing 0xfe or 0xff bytes and (b) strings
    containing "overlong sequences".
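
A sketch of a validity check with both of these rules, not PCRE's code. It assumes the older definition of UTF-8 that allows sequences of up to six bytes (lead bytes 0xfc and 0xfd), which is what singling out 0xfe and 0xff suggests; unlike current UTF-8 it does not reject surrogates or values above 0x10ffff. A sequence is overlong when its decoded value would fit in fewer bytes, for example 0xc0 0xaf for "/". utf8_check is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns -1 if s[0..len) is valid under the rules assumed above, otherwise
   the offset of the first byte of the offending sequence. */
static long utf8_check(const unsigned char *s, size_t len)
{
    /* Smallest value that needs n bytes, indexed by n; anything below is overlong. */
    static const unsigned long min_value[7] =
        { 0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000 };
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;              /* number of continuation bytes expected */
        unsigned long value;

        if (c < 0x80) { i++; continue; }          /* ASCII */
        if (c < 0xc0) return (long)i;             /* stray continuation byte */
        if (c >= 0xfe) return (long)i;            /* 0xfe and 0xff are never valid */
        if (c < 0xe0)      { extra = 1; value = c & 0x1f; }
        else if (c < 0xf0) { extra = 2; value = c & 0x0f; }
        else if (c < 0xf8) { extra = 3; value = c & 0x07; }
        else if (c < 0xfc) { extra = 4; value = c & 0x03; }
        else               { extra = 5; value = c & 0x01; }

        if (len - i - 1 < extra) return (long)i;  /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xc0) != 0x80) return (long)i;  /* not a continuation byte */
            value = (value << 6) | (cc & 0x3f);
        }
        if (value < min_value[extra + 1]) return (long)i;  /* overlong form */
        i += extra + 1;
    }
    return -1;
}
```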

 5. Fixed a bug (appearing twice) that I could not find any way of exploiting!
    I had written "if ((digitab[*p++] && chtab_digit) == 0)" where the "&&"
    should have been "&", but it just so happened that all the cases this let
    through by mistake were picked up later in the function.
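
The difference between the two operators, sketched with a hypothetical flag layout (TYPE_SPACE and TYPE_DIGIT are not PCRE's names or values). && only asks whether both operands are nonzero, so (flags && TYPE_DIGIT) is 1 for every character that has any flag set, whereas & tests the digit bit itself.

```c
#include <assert.h>

/* Hypothetical bit flags in a character-type table entry (not PCRE's values). */
enum { TYPE_SPACE = 0x01, TYPE_DIGIT = 0x08 };

/* Buggy form: (flags && TYPE_DIGIT) is 1 for any nonzero flags, so every
   character with some flag set is treated as a digit. */
static int is_not_digit_buggy(unsigned char flags)
{
    return (flags && TYPE_DIGIT) == 0;
}

/* Fixed form: & isolates the digit bit. */
static int is_not_digit_fixed(unsigned char flags)
{
    return (flags & TYPE_DIGIT) == 0;
}

/* The two forms disagree for a space, and agree for a digit or no flags. */
static void compare_forms(void)
{
    assert(is_not_digit_buggy(TYPE_SPACE) == 0);  /* wrongly says "digit" */
    assert(is_not_digit_fixed(TYPE_SPACE) == 1);
    assert(is_not_digit_buggy(TYPE_DIGIT) == is_not_digit_fixed(TYPE_DIGIT));
    assert(is_not_digit_buggy(0) == is_not_digit_fixed(0));
}
```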

 6. I had used a variable called "isblank" - this is a C99 function, causing
    some compilers to warn. To avoid this, I renamed it (as "blankclass").

 7. Cosmetic: (a) only output another newline at the end of pcretest if it is
    prompting; (b) run "./pcretest /dev/null" at the start of the test script
    so the version is shown; (c) stop "make test" echoing "./RunTest".

 8. Added patches from David Burgess to enable PCRE to run on EBCDIC systems.

 9. The prototype for memmove() for systems that don't have it was using
    size_t, but the inclusion of the header that defines size_t was later. I've
    moved the #includes for the C headers earlier to avoid this.
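
A sketch of the ordering issue, using a hypothetical fallback named fallback_memmove rather than PCRE's own replacement. <stddef.h> is included first, so size_t is already declared when the prototype uses it. The copy direction is chosen so that overlapping regions are handled as memmove() requires.

```c
#include <assert.h>
#include <stddef.h>   /* declares size_t; must come before the prototype below */

/* Hypothetical replacement for systems without memmove(). */
static void *fallback_memmove(void *dest, const void *src, size_t n);

static void *fallback_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert((dest != NULL && src != NULL) || n == 0);
    if (d < s) {
        for (size_t i = 0; i < n; i++)    /* copy forwards */
            d[i] = s[i];
    } else if (d > s) {
        for (size_t i = n; i > 0; i--)    /* copy backwards so overlap is safe */
            d[i - 1] = s[i - 1];
    }
    return dest;
}
```

Comparing d and s with < is the usual way such fallbacks pick a direction, although ISO C only defines that comparison for pointers into the same object.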

10. Added some adjustments to the code to make it easier to compile on certain
    special systems:

      (a) Some "const" qualifiers were missing.
      (b) Added the macro EXPORT before all exported functions; by default this
          is defined to be empty.
      (c) Changed the dftables auxiliary program (that builds chartables.c) so
          that it reads its output file name as an argument instead of writing
          to the standard output and assuming this can be redirected.
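
A sketch of the EXPORT convention from (b), together with the kind of "const" qualifier mentioned in (a). The function example_count is hypothetical. The __declspec(dllexport) value in the comment is one common choice for Windows DLL builds, not something this entry specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Empty unless the build defines it. A Windows DLL build might, for example,
   compile with -DEXPORT=__declspec(dllexport). */
#ifndef EXPORT
#define EXPORT
#endif

/* Hypothetical exported function: "const" documents that the string is not
   modified, and EXPORT marks the function for export. */
EXPORT size_t example_count(const char *s, char c);

EXPORT size_t example_count(const char *s, char c)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == c) n++;
    return n;
}
```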

11. In UTF-8 mode, if a recursive reference (e.g. (?1)) followed a character
    class containing characters with values greater than 255, PCRE compilation
    went into a loop.

12. A recursive reference to a subpattern that was within another subpattern
    that had a minimum quantifier of zero caused PCRE to crash. For example,
    (x(y(?2))z)? provoked this bug with a subject that got as far as the
    recursion. If the recursively-called subpattern itself had a zero repeat,
    that was OK.

13. In pcretest, the buffer for reading a data line was set at 30K, but the
    buffer into which it was copied (for escape processing) was still set at
    1024, so long lines caused crashes.

14. A pattern such as /[ab]{1,3}+/ failed to compile, giving the error
    "internal error: code overflow...". This applied to any character class
    that was followed by a possessive quantifier.

15. Modified the Makefile to add libpcre.la as a prerequisite for
    libpcreposix.la because I was told this is needed for a parallel build to
    work.

16. If a pattern that contained .* following optional items at the start was
    studied, the wrong optimizing data was generated, leading to matching
    errors. For example, studying /[ab]*.*c/ concluded, erroneously, that any
    matching string must start with a or b or c. The correct conclusion for
    this pattern is that a match can start with any character.
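
A toy model of the study step, not PCRE's code: a pattern is a list of items, each a set of bytes with a minimum repeat count. An optional item adds its bytes and the scan continues, because a match may skip it; the first mandatory item adds its bytes and ends the scan. The dot is modelled as matching every byte, as with PCRE_DOTALL (without that option PCRE's dot does not match newline). For /[ab]*.*c/ the .* item contributes every byte, so the result is "no restriction"; the faulty result in the entry corresponds to leaving out that item's contribution. The item type and study_start_set are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy pattern item: a set of bytes, repeated at least min times. */
typedef struct {
    unsigned char can_match[256];  /* nonzero if the item can match that byte */
    int min;                       /* minimum repeat; 0 means the item is optional */
} item;

/* Fills start[] with the bytes that could begin a match. Returns 1 if that
   set is a useful restriction, 0 if a match could start with any byte or
   the pattern can match an empty string. */
static int study_start_set(const item *items, size_t n, unsigned char start[256])
{
    assert(items != NULL || n == 0);
    memset(start, 0, 256);
    for (size_t i = 0; i < n; i++) {
        for (int c = 0; c < 256; c++)
            if (items[i].can_match[c]) start[c] = 1;
        if (items[i].min > 0) {
            /* First mandatory item: later items cannot supply the first byte. */
            for (int c = 0; c < 256; c++)
                if (!start[c]) return 1;
            return 0;              /* every byte is possible: nothing gained */
        }
        /* Optional item: a match may skip it, so keep scanning. */
    }
    return 0;                      /* every item is optional */
}
```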


Version 4.4 13-Aug-03
---------------------

 1. In UTF-8 mode, a character class containing characters with values between
    127 and 255 was not handled correctly if the compiled pattern was studied.
    In fixing this, I have also improved the studying algorithm for such
    classes (slightly).
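
The entry does not say exactly what went wrong, so this only illustrates why characters from 128 to 255 need separate treatment when studying in UTF-8 mode: they are encoded in two bytes, and the first byte is 0xc2 or 0xc3 rather than the character's own value. A table of possible first bytes therefore has to record the lead byte. utf8_first_byte is a hypothetical helper covering only code points below 0x800.

```c
#include <assert.h>

/* Returns the first byte of the UTF-8 encoding of a code point below 0x800,
   which covers the 128-255 range discussed in the entry. */
static unsigned int utf8_first_byte(unsigned long cp)
{
    assert(cp < 0x800);                        /* longer encodings are not handled here */
    if (cp < 0x80) return (unsigned int)cp;    /* ASCII: one byte, itself */
    return 0xc0u | (unsigned int)(cp >> 6);    /* two bytes: 110xxxxx 10xxxxxx */
}
```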

 2. Three internal functions had redundant arguments passed to them. Removal
    might give a very teeny performance improvement.

 3. Documentation bug: the value of the capture_top field in a callout is *one
    more than* the number of the highest numbered captured substring.
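
A sketch of what "one more than" means in practice, assuming PCRE's usual offset-vector layout: a pair of offsets per substring, pair 0 for the whole match, and -1 for an unset substring. count_set_captures is hypothetical and takes plain arguments rather than a callout block; the loop runs up to, but not including, capture_top.

```c
#include <assert.h>

/* Counts captured substrings 1..capture_top-1 whose start offset is set. */
static int count_set_captures(const int *offset_vector, int capture_top)
{
    int count = 0;
    assert(capture_top >= 1);               /* 1 means nothing captured yet */
    for (int i = 1; i < capture_top; i++)   /* highest number is capture_top - 1 */
        if (offset_vector[2 * i] >= 0)
            count++;
    return count;
}
```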

 4. The Makefile linked pcretest and pcregrep with -lpcre, which could result
    in incorrectly linking with a previously installed version. They now link
    explicitly with libpcre.la.

 5. configure.in no longer needs to recognize Cygwin specially.

 6. A problem in pcre.in for Windows platforms is fixed.

 7. If a pattern was successfully studied, and the -d (or /D) flag was given to
    pcretest, it used to include the size of the study block as part of its
    output. Unfortunately, the structure contains a field that has a different
    size on different hardware architectures. This meant that the tests that
    showed this size failed. As the block is currently always of a fixed size,
    this information isn't actually particularly useful in pcretest output, so
    I have just removed it.

 8. Three pre-processor statements accidentally did not start in column 1.
    Sadly, there are *still* compilers around that complain, even though
    standard C has not required this for well over a decade. Sigh.

 9. In pcretest, the code for checking callouts passed small integers in the
    callout_data field, which is a void * field. However, some picky compilers
    complained about the casts involved for this on 64-bit systems. Now
    pcretest passes the address of the small integer instead, which should get
    rid of the warnings.
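
A sketch of the address-passing approach with a hypothetical handler. Casting a small integer to void * and back draws warnings on 64-bit systems, where pointers are wider than int; passing &n avoids the conversion entirely. The integer must stay alive for as long as the callee might read it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callout-style handler: the void * argument is treated as the
   address of an int, so no integer/pointer conversion is needed. */
static int read_callout_number(void *callout_data)
{
    assert(callout_data != NULL);
    return *(const int *)callout_data;
}
```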

10. By default, when in UTF-8 mode, PCRE now checks for valid UTF-8 strings at
    both compile and run time, and gives an error if an invalid UTF-8 sequence
    is found. There is an option for disabling this check in cases where the
    string is known to be correct and/or the maximum performance is wanted.

11. In response to a bug report, I changed one line in Makefile.in from

        -Wl,--out-implib,.libs/lib@WIN_PREFIX@pcreposix.dll.a \
    to
        -Wl,--out-implib,.libs/@WIN_PREFIX@libpcreposix.dll.a \

    to look similar to other lines, but I have no way of telling whether this
    is the right thing to do, as I do not use Windows. No doubt I'll get told
    if it's wrong...


Version 4.3 21-May-03
---------------------

1. Two instances of @WIN_PREFIX@ omitted from the Windows targets in the
   Makefile.

2. Some refactoring to improve the quality of the code:

   (i)   The utf8_table... variables are now declared "const".

   (ii)  The code for \cx, which used the "case flipping" table to upper case
         lower case letters, now just subtracts 32. This is ASCII-specific,
         but the whole concept of \cx is ASCII-specific, so it seems
         reasonable.

   (iii) PCRE was using its character types table to recognize decimal and
         hexadecimal digits in the pattern. This is silly, because it handles
         only 0-9, a-f, and A-F, but the character types table is locale-
         specific, which means strange things might happen. A private
         table is now used for this - though it costs 256 bytes, a table is
         much faster than multiple explicit tests. Of course, the standard
         character types table is still used for matching digits in subject
         strings against \d.

   (iv)  Strictly, the identifier ESC_t is reserved by POSIX (all identifiers
         ending in _t are). So I've renamed it as ESC_tee.
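
A sketch of the ideas in (ii) and (iii), assuming ASCII. control_char upper-cases a letter by subtracting 32 and then flips bit 0x40, which is Perl's definition of \cx (so \ca and \cA both give 1). digit_table is a hypothetical locale-independent 256-entry table that marks only 0-9, a-f and A-F. For brevity the table is filled at run time; a library would more likely use a static const initializer.

```c
#include <assert.h>

/* Hypothetical private digit table, independent of the current locale. */
enum { DIG_DECIMAL = 0x01, DIG_HEX = 0x02 };
static unsigned char digit_table[256];

static void build_digit_table(void)
{
    for (int c = '0'; c <= '9'; c++) digit_table[c] = DIG_DECIMAL | DIG_HEX;
    for (int c = 'a'; c <= 'f'; c++) digit_table[c] = DIG_HEX;
    for (int c = 'A'; c <= 'F'; c++) digit_table[c] = DIG_HEX;
}

/* One table lookup instead of several explicit range tests. */
static int is_hex_digit(unsigned char c)
{
    return (digit_table[c] & DIG_HEX) != 0;
}

/* \cx: ASCII-only upper-casing by subtracting 32, then flipping bit 0x40. */
static int control_char(int x)
{
    assert(x >= 0 && x < 128);
    if (x >= 'a' && x <= 'z') x -= 32;
    return x ^ 0x40;
}
```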

3. The first argument for regexec() in the POSIX wrapper should have been
   defined as "const".

4. Changed pcretest to use malloc() for its buffers so that they can be
   Electric Fenced for debugging.

5. There were several places in the code where, in UTF-8 mode, PCRE would try
   to read one or more bytes before the start of the subject string. Often this
   had no effect on PCRE's behaviour, but in some circumstances it could
   provoke a segmentation fault.

6. A lookbehind at the start of a pattern in UTF-8 mode could also cause PCRE
   to try to read one or more bytes before the start of the subject string.

7. A lookbehind in a pattern matched in non-UTF-8 mode on a PCRE compiled with
   UTF-8 support could misbehave in various ways if the subject string
   contained bytes with the 0x80 bit set and the 0x40 bit unset in a lookbehind
   area. (PCRE was not checking for the UTF-8 mode flag, and trying to move
   back over UTF-8 characters.)
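
A sketch of moving back one character safely, covering items 5 to 7. The loop over continuation bytes stops at the start of the subject, and it runs only when UTF-8 mode is on, so in non-UTF-8 mode a byte such as 0x82 is treated as a character in its own right. back_one_char is a hypothetical helper, not PCRE's code.

```c
#include <assert.h>

/* Moves p back by one character without reading before subject. In UTF-8
   mode it also skips continuation bytes (10xxxxxx); otherwise every byte is
   a character, whatever its high bits. */
static const unsigned char *back_one_char(const unsigned char *subject,
                                          const unsigned char *p, int utf8_mode)
{
    assert(p > subject);     /* caller must not ask to move before the start */
    p--;
    if (utf8_mode)
        while (p > subject && (*p & 0xc0) == 0x80)  /* bound is checked first */
            p--;
    return p;
}
```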


Version 4.2 14-Apr-03
---------------------
