
Diff of /code/trunk/ChangeLog


revision 67 by nigel, Sat Feb 24 21:40:13 2007 UTC revision 73 by nigel, Sat Feb 24 21:40:30 2007 UTC
# Line 1  Line 1 
1  ChangeLog for PCRE
2  ------------------
4    Version 4.5 01-Dec-03
5    ---------------------
7     1. There has been some re-arrangement of the code for the match() function so
8        that it can be compiled in a version that does not call itself recursively.
9        Instead, it keeps those local variables that need separate instances for
10        each "recursion" in a frame on the heap, and gets/frees frames whenever it
11        needs to "recurse". Keeping track of where control must go is done by means
12        of setjmp/longjmp. The whole thing is implemented by a set of macros that
13        hide most of the details from the main code, and operates only if
14        NO_RECURSE is defined while compiling pcre.c. If PCRE is built using the
15        "configure" mechanism, "--disable-stack-for-recursion" turns on this way of
16        operating.
18        To make it easier for callers to provide specially tailored get/free
19        functions for this usage, two new functions, pcre_stack_malloc and
20        pcre_stack_free, are used. They are always called in strict stacking order,
21        and the size of block requested is always the same.
23        The PCRE_CONFIG_STACKRECURSE info parameter can be used to find out whether
24        PCRE has been compiled to use the stack or the heap for recursion. The
25        -C option of pcretest uses this to show which version is compiled.
27        A new data escape, \S, is added to pcretest; it causes the amounts of store
28        obtained and freed by both kinds of malloc/free at match time to be added
29        to the output.
31     2. Changed the locale test to use "fr_FR" instead of "fr" because that's
32        what's available on my current Linux desktop machine.
34     3. When matching a UTF-8 string, the test for a valid string at the start has
35        been extended. If start_offset is not zero, PCRE now checks that it points
36        to a byte that is the start of a UTF-8 character. If not, it returns
37        PCRE_ERROR_BADUTF8_OFFSET (-11). Note: the whole string is still checked;
38        this is necessary because there may be backward assertions in the pattern.
39        When matching the same subject several times, it may save resources to use
40        PCRE_NO_UTF8_CHECK on all but the first call if the string is long.
42     4. The code for checking the validity of UTF-8 strings has been tightened so
43        that it rejects (a) strings containing 0xfe or 0xff bytes and (b) strings
44        containing "overlong sequences".
46     5. Fixed a bug (appearing twice) that I could not find any way of exploiting!
47        I had written "if ((digitab[*p++] && chtab_digit) == 0)" where the "&&"
48        should have been "&", but it just so happened that all the cases this let
49        through by mistake were picked up later in the function.
51     6. I had used a variable called "isblank" - this is a C99 function, causing
52        some compilers to warn. To avoid this, I renamed it (as "blankclass").
54     7. Cosmetic: (a) only output another newline at the end of pcretest if it is
55        prompting; (b) run "./pcretest /dev/null" at the start of the test script
56        so the version is shown; (c) stop "make test" echoing "./RunTest".
58     8. Added patches from David Burgess to enable PCRE to run on EBCDIC systems.
60     9. The prototype for memmove() for systems that don't have it was using
61        size_t, but the inclusion of the header that defines size_t was later. I've
62        moved the #includes for the C headers earlier to avoid this.
64    10. Added some adjustments to the code to make it easier to compile on certain
65        special systems:
67          (a) Some "const" qualifiers were missing.
68          (b) Added the macro EXPORT before all exported functions; by default this
69              is defined to be empty.
70          (c) Changed the dftables auxiliary program (that builds chartables.c) so
71              that it reads its output file name as an argument instead of writing
72              to the standard output and assuming this can be redirected.
74    11. In UTF-8 mode, if a recursive reference (e.g. (?1)) followed a character
75        class containing characters with values greater than 255, PCRE compilation
76        went into a loop.
78    12. A recursive reference to a subpattern that was within another subpattern
79        that had a minimum quantifier of zero caused PCRE to crash. For example,
80        (x(y(?2))z)? provoked this bug with a subject that got as far as the
81        recursion. If the recursively-called subpattern itself had a zero repeat,
82        that was OK.
84    13. In pcretest, the buffer for reading a data line was set at 30K, but the
85        buffer into which it was copied (for escape processing) was still set at
86        1024, so long lines caused crashes.
88    14. A pattern such as /[ab]{1,3}+/ failed to compile, giving the error
89        "internal error: code overflow...". This applied to any character class
90        that was followed by a possessive quantifier.
92    15. Modified the Makefile to add libpcre.la as a prerequisite for
93        libpcreposix.la because I was told this is needed for a parallel build to
94        work.
96    16. If a pattern that contained .* following optional items at the start was
97        studied, the wrong optimizing data was generated, leading to matching
98        errors. For example, studying /[ab]*.*c/ concluded, erroneously, that any
99        matching string must start with a or b or c. The correct conclusion for
100        this pattern is that a match can start with any character.
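
The UTF-8 checks described in items 3 and 4 of Version 4.5 can be sketched roughly as below. This is a minimal illustration, not PCRE's code: the function names and return codes are hypothetical, and the sketch assumes sequences of at most four bytes, so longer lead bytes are rejected along with 0xfe and 0xff. It does not check for surrogates or an upper code point limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; these are not PCRE's values. */
enum { SUBJECT_OK = 0, SUBJECT_BAD_UTF8 = 1, SUBJECT_BAD_OFFSET = 2 };

/* True if the byte can begin a UTF-8 character, that is, it is not a
   continuation byte of the form 10xxxxxx. */
int is_char_start(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Returns 1 if s[0..len) is well formed under the assumptions above, else 0. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t extra;        /* number of continuation bytes expected */
        unsigned long cp;    /* code point decoded so far */
        unsigned long min;   /* smallest code point that needs this length */
        size_t k;

        if (b < 0x80)                { extra = 0; cp = b;        min = 0; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;  /* stray continuation byte, or a lead byte of 0xf8
                           and above, which includes 0xfe and 0xff */

        if (extra > len - i - 1) return 0;          /* truncated sequence */
        for (k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) return 0;       /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(c & 0x3F);
        }
        if (cp < min) return 0;                     /* overlong sequence */
        i += extra + 1;
    }
    return 1;
}

/* Checks the whole subject first, then checks that start_offset lands on the
   first byte of a character (or at the very end of the subject). */
int check_subject(const unsigned char *s, size_t len, size_t start_offset)
{
    assert(start_offset <= len);
    if (!utf8_is_valid(s, len)) return SUBJECT_BAD_UTF8;
    if (start_offset < len && !is_char_start(s[start_offset]))
        return SUBJECT_BAD_OFFSET;
    return SUBJECT_OK;
}
```

The whole string is validated even when start_offset is not zero, which mirrors the note in item 3 about backward assertions needing the bytes before the offset.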
103    Version 4.4 13-Aug-03
104    ---------------------
106     1. In UTF-8 mode, a character class containing characters with values between
107        127 and 255 was not handled correctly if the compiled pattern was studied.
108        In fixing this, I have also improved the studying algorithm for such
109        classes (slightly).
111     2. Three internal functions had redundant arguments passed to them. Removal
112        might give a very teeny performance improvement.
114     3. Documentation bug: the value of the capture_top field in a callout is *one
115        more than* the number of the highest numbered captured substring.
117     4. The Makefile linked pcretest and pcregrep with -lpcre, which could result
118        in incorrectly linking with a previously installed version. They now link
119        explicitly with libpcre.la.
121     5. configure.in no longer needs to recognize Cygwin specially.
123     6. A problem in pcre.in for Windows platforms is fixed.
125     7. If a pattern was successfully studied, and the -d (or /D) flag was given to
126        pcretest, it used to include the size of the study block as part of its
127        output. Unfortunately, the structure contains a field that has a different
128        size on different hardware architectures. This meant that the tests that
129        showed this size failed. As the block is currently always of a fixed size,
130        this information isn't actually particularly useful in pcretest output, so
131        I have just removed it.
133     8. Three pre-processor statements accidentally did not start in column 1.
134        Sadly, there are *still* compilers around that complain, even though
135        standard C has not required this for well over a decade. Sigh.
137     9. In pcretest, the code for checking callouts passed small integers in the
138        callout_data field, which is a void * field. However, some picky compilers
139        complained about the casts involved for this on 64-bit systems. Now
140        pcretest passes the address of the small integer instead, which should get
141        rid of the warnings.
143    10. By default, when in UTF-8 mode, PCRE now checks for valid UTF-8 strings at
144        both compile and run time, and gives an error if an invalid UTF-8 sequence
145        is found. There is an option for disabling this check in cases where the
146        string is known to be correct and/or the maximum performance is wanted.
148    11. In response to a bug report, I changed one line in Makefile.in from
150            -Wl,--out-implib,.libs/lib@WIN_PREFIX@pcreposix.dll.a \
151        to
152            -Wl,--out-implib,.libs/@WIN_PREFIX@libpcreposix.dll.a \
154        to look similar to other lines, but I have no way of telling whether this
155        is the right thing to do, as I do not use Windows. No doubt I'll get told
156        if it's wrong...
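
The change in item 9 of Version 4.4 (passing the address of a small integer through a void * field instead of casting the integer itself) might look something like the sketch below. The struct and function names are hypothetical stand-ins, not the real PCRE callout types; only the field name callout_data comes from the text above.

```c
#include <assert.h>

/* Hypothetical stand-in for a callout block; only the user-data field is
   modelled here. */
typedef struct {
    void *callout_data;
} demo_callout_block;

/* Hypothetical callout function: reads the integer that callout_data
   points at, rather than treating the pointer value as an integer. */
int demo_callout(demo_callout_block *cb)
{
    assert(cb != 0 && cb->callout_data != 0);
    return *(int *)cb->callout_data;
}

/* Passes the address of a small integer. An int pointer converts to void *
   implicitly, so no integer-to-pointer cast is involved. */
int run_with_value(int value)
{
    demo_callout_block cb;
    cb.callout_data = &value;
    return demo_callout(&cb);
}
```

The integer must stay alive for as long as the callout can run; in this sketch it is a parameter of the calling function, so it outlives the call to demo_callout.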
159    Version 4.3 21-May-03
160    ---------------------
162    1. Two instances of @WIN_PREFIX@ were omitted from the Windows targets in the
163       Makefile.
165    2. Some refactoring to improve the quality of the code:
167       (i)   The utf8_table... variables are now declared "const".
169       (ii)  The code for \cx, which used the "case flipping" table to upper case
170             lower case letters, now just subtracts 32. This is ASCII-specific,
171             but the whole concept of \cx is ASCII-specific, so it seems
172             reasonable.
174       (iii) PCRE was using its character types table to recognize decimal and
175             hexadecimal digits in the pattern. This is silly, because it handles
176             only 0-9, a-f, and A-F, but the character types table is locale-
177             specific, which means strange things might happen. A private
178             table is now used for this - though it costs 256 bytes, a table is
179             much faster than multiple explicit tests. Of course, the standard
180             character types table is still used for matching digits in subject
181             strings against \d.
183       (iv)  Strictly, the identifier ESC_t is reserved by POSIX (all identifiers
184             ending in _t are). So I've renamed it as ESC_tee.
186    3. The first argument for regexec() in the POSIX wrapper should have been
187       defined as "const".
189    4. Changed pcretest to use malloc() for its buffers so that they can be
190       Electric Fenced for debugging.
192    5. There were several places in the code where, in UTF-8 mode, PCRE would try
193       to read one or more bytes before the start of the subject string. Often this
194       had no effect on PCRE's behaviour, but in some circumstances it could
195       provoke a segmentation fault.
197    6. A lookbehind at the start of a pattern in UTF-8 mode could also cause PCRE
198       to try to read one or more bytes before the start of the subject string.
200    7. A lookbehind in a pattern matched in non-UTF-8 mode on a PCRE compiled with
201       UTF-8 support could misbehave in various ways if the subject string
202       contained bytes with the 0x80 bit set and the 0x40 bit unset in a lookbehind
203       area. (PCRE was not checking for the UTF-8 mode flag, and trying to move
204       back over UTF-8 characters.)
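
A private, locale-independent digit table of the kind described in item 2(iii) of Version 4.3 could be built along the following lines. This is only a sketch: the table name, flag names, and helper functions are hypothetical and are not taken from the PCRE source.

```c
#include <assert.h>

#define FLAG_DIGIT  0x01   /* decimal digit 0-9 */
#define FLAG_XDIGIT 0x02   /* hexadecimal digit 0-9, a-f, A-F */

/* One byte per possible character value: 256 bytes in total. */
static unsigned char digit_table[256];

/* Fills the table without consulting the locale. */
void init_digit_table(void)
{
    int c;
    for (c = 0; c < 256; c++) digit_table[c] = 0;
    for (c = '0'; c <= '9'; c++) digit_table[c] = FLAG_DIGIT | FLAG_XDIGIT;
    for (c = 'a'; c <= 'f'; c++) digit_table[c] = FLAG_XDIGIT;
    for (c = 'A'; c <= 'F'; c++) digit_table[c] = FLAG_XDIGIT;
}

/* Each test is a single table lookup and a bitwise AND. */
int is_pattern_digit(unsigned char c)
{
    return (digit_table[c] & FLAG_DIGIT) != 0;
}

int is_pattern_xdigit(unsigned char c)
{
    return (digit_table[c] & FLAG_XDIGIT) != 0;
}
```

The flag tests use the bitwise "&". Writing "&&" instead would only ask whether both operands are non-zero, so is_pattern_digit('a') would wrongly succeed; that is the kind of slip described in item 5 of Version 4.5.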
207  Version 4.2 14-Apr-03
208  ---------------------
