Diff of /code/trunk/ChangeLog

revision 275 by ph10, Wed Nov 21 15:35:09 2007 UTC revision 1186 by ph10, Sun Oct 28 17:57:32 2012 UTC
1    ChangeLog for PCRE
2    ------------------
3    
4    Version 8.32
5    ------------
6    
7    1.  Improved JIT compiler optimizations for first character search and single
8        character iterators.
9    
10    2.  Supporting IBM XL C compilers for PPC architectures in the JIT compiler.
11        Patch by Daniel Richard G.
12    
13    3.  Single character iterator optimizations in the JIT compiler.
14    
15    4.  Improved JIT compiler optimizations for character ranges.
16    
17    5.  Rename the "leave" variable names to "quit" to improve WinCE compatibility.
18        Reported by Giuseppe D'Angelo.
19    
20    6.  The PCRE_STARTLINE bit, indicating that a match can occur only at the start
21        of a line, was being set incorrectly in cases where .* appeared inside
22        atomic brackets at the start of a pattern, or where there was a subsequent
23        *PRUNE or *SKIP.
24    
25    7.  Improved instruction cache flush for POWER/PowerPC.
26        Patch by Daniel Richard G.
27    
28    8.  Fixed a number of issues in pcregrep, making it more compatible with GNU
29        grep:
30    
31        (a) There is now no limit to the number of patterns to be matched.
32    
33        (b) An error is given if a pattern is too long.
34    
35        (c) Multiple uses of --exclude, --exclude-dir, --include, and --include-dir
36            are now supported.
37    
38        (d) --exclude-from and --include-from (multiple use) have been added.
39    
40        (e) Exclusions and inclusions now apply to all files and directories, not
41            just to those obtained from scanning a directory recursively.
42    
43        (f) Multiple uses of -f and --file-list are now supported.
44    
45        (g) In a Windows environment, the default for -d has been changed from
46            "read" (the GNU grep default) to "skip", because otherwise the presence
47            of a directory in the file list provokes an error.
48    
49        (h) The documentation has been revised and clarified in places.
50    
51    9.  Improve the matching speed of capturing brackets.
52    
53    10. Changed the meaning of \X so that it now matches a Unicode extended
54        grapheme cluster.
55    
56    11. Patch by Daniel Richard G to the autoconf files to add a macro for sorting
57        out POSIX threads when JIT support is configured.
58    
59    12. Added support for PCRE_STUDY_EXTRA_NEEDED.
60    
61    13. In the POSIX wrapper regcomp() function, setting re_nsub field in the preg
62        structure could go wrong in environments where size_t is not the same size
63        as int.
64    
65    14. Applied user-supplied patch to pcrecpp.cc to allow PCRE_NO_UTF8_CHECK to be
66        set.
67    
68    15. The EBCDIC support had decayed; later updates to the code had included
69        explicit references to (e.g.) \x0a instead of CHAR_LF. There has been a
70        general tidy up of EBCDIC-related issues, and the documentation was also
71        not quite right. There is now a test that can be run on ASCII systems to
72        check some of the EBCDIC-related things (but it is not a full test).
73    
74    16. The new PCRE_STUDY_EXTRA_NEEDED option is now used by pcregrep, resulting
75        in a small tidy to the code.
76    
77    17. Fix JIT tests when UTF is disabled and both 8 and 16 bit mode are enabled.
78    
79    18. If the --only-matching (-o) option in pcregrep is specified multiple
80        times, each one causes appropriate output. For example, -o1 -o2 outputs the
81        substrings matched by the 1st and 2nd capturing parentheses. A separating
82        string can be specified by --om-separator (default empty).
83    
84    19. Improving the first n character searches.
85    
86    20. Turn case lists for horizontal and vertical white space into macros so that
87        they are defined only once.
88    
89    21. This set of changes together gives more compatible Unicode case-folding
90        behaviour for characters that have more than one other case when UCP
91        support is available.
92    
93        (a) The Unicode property table now has offsets into a new table of sets of
94            three or more characters that are case-equivalent. The MultiStage2.py
95            script that generates these tables (the pcre_ucd.c file) now scans
96            CaseFolding.txt instead of UnicodeData.txt for character case
97            information.
98    
99        (b) The code for adding characters or ranges of characters to a character
100            class has been abstracted into a generalized function that also handles
101            case-independence. In UTF-mode with UCP support, this uses the new data
102            to handle characters with more than one other case.
103    
104        (c) A bug that is fixed as a result of (b) is that codepoints less than 256
105            whose other case is greater than 256 are now correctly matched
106            caselessly. Previously, the high codepoint matched the low one, but not
107            vice versa.
108    
109        (d) The processing of \h, \H, \v, and \V in character classes now makes use
110            of the new class addition function, using character lists defined as
111            macros alongside the case definitions of 20 above.
112    
113        (e) Caseless back references now work with characters that have more than
114            one other case.
115    
116        (f) General caseless matching of characters with more than one other case
117            is supported.
118    
119    22. Unicode character properties were updated from Unicode 6.2.0.
120    
121    23. Improved CMake support under Windows. Patch by Daniel Richard G.
122    
123    24. Add support for 32-bit character strings, and UTF-32.
124    
125    25. Major JIT compiler update (code refactoring and bugfixing).
126        Experimental Sparc 32 support is added.
127    
128    26. Applied a modified version of Daniel Richard G's patch to create
129        pcre.h.generic and config.h.generic by "make" instead of in the
130        PrepareRelease script.
131    
132    27. Added a definition for CHAR_NULL (helpful for the z/OS port), and use it in
133        pcre_compile.c when checking for a zero character.
134    
135    
136    Version 8.31 06-July-2012
137    -------------------------
138    
139    1.  Fixing a wrong JIT test case and some compiler warnings.
140    
141    2.  Removed a bashism from the RunTest script.
142    
143    3.  Add a cast to pcre_exec.c to fix the warning "unary minus operator applied
144        to unsigned type, result still unsigned" that was given by an MS compiler
145        on encountering the code "-sizeof(xxx)".
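
The warning in item 3 of 8.31 can be reproduced in isolation. The sketch below is not the code from pcre_exec.c, and both function names are hypothetical. It only shows why negating the result of sizeof directly yields a large unsigned value, and how casting to a signed type first gives the intended negative number.

```c
#include <assert.h>
#include <stddef.h>

/* Negating a size_t stays unsigned, so the result wraps to a huge value. */
static size_t negated_unsigned(void)
{
  return -sizeof(int);
}

/* Casting to a signed type before negating gives the intended value. */
static int negated_signed(void)
{
  return -(int)sizeof(int);
}

/* Self-check that states the difference between the two forms. */
static void check_negation(void)
{
  assert(negated_signed() < 0);
  assert(negated_unsigned() > 0);
}
```

Calling check_negation() from a test program passes on any hosted C implementation where size_t is at least as wide as unsigned int.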
146    
147    4.  Partial matching support is added to the JIT compiler.
148    
149    5.  Fixed several bugs concerned with partial matching of items that consist
150        of more than one character:
151    
152        (a) /^(..)\1/ did not partially match "aba" because checking references was
153            done on an "all or nothing" basis. This also applied to repeated
154            references.
155    
156        (b) \R did not give a hard partial match if \r was found at the end of the
157            subject.
158    
159        (c) \X did not give a hard partial match after matching one or more
160            characters at the end of the subject.
161    
162        (d) When newline was set to CRLF, a pattern such as /a$/ did not recognize
163            a partial match for the string "\r".
164    
165        (e) When newline was set to CRLF, the metacharacter "." did not recognize
166            a partial match for a CR character at the end of the subject string.
167    
168    6.  If JIT is requested using /S++ or -s++ (instead of just /S+ or -s+) when
169        running pcretest, the text "(JIT)" is added to the output whenever JIT is
170        actually used to run the match.
171    
172    7.  Individual JIT compile options can be set in pcretest by following -s+[+]
173        or /S+[+] with a digit between 1 and 7.
174    
175    8.  OP_NOT now supports any UTF character, not just single-byte ones.
176    
177    9.  (*MARK) control verb is now supported by the JIT compiler.
178    
179    10. The command "./RunTest list" lists the available tests without actually
180        running any of them. (Because I keep forgetting what they all are.)
181    
182    11. Add PCRE_INFO_MAXLOOKBEHIND.
183    
184    12. Applied a (slightly modified) user-supplied patch that improves performance
185        when the heap is used for recursion (compiled with --disable-stack-for-
186        recursion). Instead of malloc and free for each heap frame each time a
187        logical recursion happens, frames are retained on a chain and re-used where
188        possible. This sometimes gives as much as 30% improvement.
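
The frame re-use described in item 12 of 8.31 can be sketched as follows. This is an illustration under assumptions, not the patch itself: the frame struct, its fields, and the three function names are hypothetical. The idea shown is that a frame allocated for one logical recursion stays linked to its parent, so the next recursion from the same parent finds it without calling malloc again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical heap frame; a real frame would also hold the saved variables. */
typedef struct frame {
  struct frame *prev;   /* frame of the caller */
  struct frame *next;   /* deeper frame kept on the chain for re-use */
  int depth;
} frame;

/* Enter one level of logical recursion, re-using a retained frame if present. */
static frame *frame_push(frame *cur)
{
  frame *f;
  assert(cur != NULL);
  f = cur->next;
  if (f == NULL)
    {
    f = malloc(sizeof *f);
    if (f == NULL) return NULL;
    f->prev = cur;
    f->next = NULL;
    cur->next = f;
    }
  f->depth = cur->depth + 1;
  return f;
}

/* Leave a level: step back only; the frame stays on the chain. */
static frame *frame_pop(frame *cur)
{
  assert(cur != NULL);
  return cur->prev;
}

/* At the end of matching, free every retained frame below the top one. */
static void frame_release_all(frame *top)
{
  frame *f;
  assert(top != NULL);
  f = top->next;
  while (f != NULL)
    {
    frame *n = f->next;
    free(f);
    f = n;
    }
  top->next = NULL;
}
```

In this sketch the top-level frame is owned by the caller (for example, a local variable), and frame_release_all() is called once when matching finishes.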
189    
190    13. As documented, (*COMMIT) is now confined to within a recursive subpattern
191        call.
192    
193    14. As documented, (*COMMIT) is now confined to within a positive assertion.
194    
195    15. It is now possible to link pcretest with libedit as an alternative to
196        libreadline.
197    
198    16. (*COMMIT) control verb is now supported by the JIT compiler.
199    
200    17. The Unicode data tables have been updated to Unicode 6.1.0.
201    
202    18. Added --file-list option to pcregrep.
203    
204    19. Added binary file support to pcregrep, including the -a, --binary-files,
205        -I, and --text options.
206    
207    20. The madvise function is renamed to posix_madvise for QNX compatibility
208        reasons. Fixed by Giuseppe D'Angelo.
209    
210    21. Fixed a bug for backward assertions with REVERSE 0 in the JIT compiler.
211    
212    22. Changed the option for creating symbolic links for 16-bit man pages from
213        -s to -sf so that re-installing does not cause issues.
214    
215    23. Support PCRE_NO_START_OPTIMIZE in JIT as (*MARK) support requires it.
216    
217    24. Fixed a very old bug in pcretest that caused errors with restarted DFA
218        matches in certain environments (the workspace was not being correctly
219        retained). Also added to pcre_dfa_exec() a simple plausibility check on
220        some of the workspace data at the beginning of a restart.
221    
222    25. \s*\R was auto-possessifying the \s* when it should not, whereas \S*\R
223        was not doing so when it should - probably a typo introduced by SVN 528
224        (change 8.10/14).
225    
226    26. When PCRE_UCP was not set, \w+\x{c4} was incorrectly auto-possessifying the
227        \w+ when the character tables indicated that \x{c4} was a word character.
228        There were several related cases, all because the tests for doing a table
229        lookup were testing for characters less than 127 instead of 255.
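
The off-by-range test in item 26 of 8.31 can be shown with a small table lookup. The names below (CTYPE_WORD, is_word_buggy, is_word_fixed) and the exact form of the comparison are assumptions for illustration; only the 127 versus 255 limit comes from the entry above.

```c
#include <assert.h>
#include <stddef.h>

#define CTYPE_WORD 0x10   /* hypothetical "word character" bit */

/* Buggy guard: characters 127..255 never reach the 256-entry table. */
static int is_word_buggy(const unsigned char *ctypes, unsigned int c)
{
  assert(ctypes != NULL);
  return c < 127 && (ctypes[c] & CTYPE_WORD) != 0;
}

/* Fixed guard: every value that indexes the table is looked up. */
static int is_word_fixed(const unsigned char *ctypes, unsigned int c)
{
  assert(ctypes != NULL);
  return c <= 255 && (ctypes[c] & CTYPE_WORD) != 0;
}
```

With a table in which entry 0xc4 has the word bit set, is_word_buggy() reports "not a word character" for 0xc4 while is_word_fixed() reports the table's answer, which is the kind of disagreement that leads to a wrong auto-possessification decision.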
230    
231    27. If a pattern contains capturing parentheses that are not used in a match,
232        their slots in the ovector are set to -1. For those that are higher than
233        any matched groups, this happens at the end of processing. In the case when
234        there were back references that the ovector was too small to contain
235        (causing temporary malloc'd memory to be used during matching), and the
236        highest capturing number was not used, memory off the end of the ovector
237        was incorrectly being set to -1. (It was using the size of the temporary
238        memory instead of the true size.)
239    
240    28. To catch bugs like 27 using valgrind, when pcretest is asked to specify an
241        ovector size, it uses memory at the end of the block that it has got.
242    
243    29. Check for an overlong MARK name and give an error at compile time. The
244        limit is 255 for the 8-bit library and 65535 for the 16-bit library.
245    
246    30. JIT compiler update.
247    
248    31. JIT is now supported on jailbroken iOS devices. Thanks to Ruiger
249        Rill for the patch.
250    
251    32. Put spaces around SLJIT_PRINT_D in the JIT compiler. Required by CXX11.
252    
253    33. Variable renamings in the PCRE-JIT compiler. No functionality change.
254    
255    34. Fixed typos in pcregrep: in two places there was SUPPORT_LIBZ2 instead of
256        SUPPORT_LIBBZ2. This caused a build problem when bzip2 but not gzip (zlib)
257        was enabled.
258    
259    35. Improve JIT code generation for greedy plus quantifier.
260    
261    36. When /((?:a?)*)*c/ or /((?>a?)*)*c/ was matched against "aac", it set group
262        1 to "aa" instead of to an empty string. The bug affected repeated groups
263        that could potentially match an empty string.
264    
265    37. Optimizing single character iterators in JIT.
266    
267    38. Wide characters specified with \uxxxx in JavaScript mode are now subject to
268        the same checks as \x{...} characters in non-JavaScript mode. Specifically,
269        codepoints that are too big for the mode are faulted, and in a UTF mode,
270        disallowed codepoints are also faulted.
271    
272    39. If PCRE was compiled with UTF support, in three places in the DFA
273        matcher there was code that should only have been obeyed in UTF mode, but
274        was being obeyed unconditionally. In 8-bit mode this could cause incorrect
275        processing when bytes with values greater than 127 were present. In 16-bit
276        mode the bug would be provoked by values in the range 0xdc00 to 0xdfff. In
277        both cases the values are those that cannot be the first data item in a UTF
278        character. The three items that might have provoked this were recursions,
279        possessively repeated groups, and atomic groups.
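
The values that "cannot be the first data item in a UTF character" in item 39 of 8.31 are UTF-8 continuation bytes and, on the usual reading of the 16-bit case, UTF-16 low surrogates (0xdc00 to 0xdfff). The sketch below shows both tests and a guarded back-up loop. The entry does not say what the affected code did, so the back-up loop and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-8: bytes 0x80..0xbf are continuation bytes, never the first byte. */
static int utf8_is_continuation(uint8_t b)
{
  return (b & 0xc0) == 0x80;
}

/* UTF-16: 0xdc00..0xdfff are low surrogates, never the first code unit. */
static int utf16_is_low_surrogate(uint16_t u)
{
  return (u & 0xfc00) == 0xdc00;
}

/* Move back to the start of a character, but only when in UTF mode.
   Without the "utf" test, ordinary 8-bit data above 127 is mishandled. */
static size_t back_to_char_start8(const uint8_t *s, size_t pos, int utf)
{
  assert(s != NULL);
  if (utf)
    while (pos > 0 && utf8_is_continuation(s[pos])) pos--;
  return pos;
}
```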
280    
281    40. Ensure that libpcre is explicitly listed in the link commands for pcretest
282        and pcregrep, because some OS require shared objects to be explicitly
283        passed to ld, causing the link step to fail if they are not.
284    
285    41. There were two incorrect #ifdefs in pcre_study.c, meaning that, in 16-bit
286        mode, patterns that started with \h* or \R* might be incorrectly matched.
287    
288    
289    Version 8.30 04-February-2012
290    -----------------------------
291    
292    1.  Renamed "isnumber" as "is_a_number" because in some Mac environments this
293        name is defined in ctype.h.
294    
295    2.  Fixed a bug in fixed-length calculation for lookbehinds that would show up
296        only in quite long subpatterns.
297    
298    3.  Removed the function pcre_info(), which has been obsolete and deprecated
299        since it was replaced by pcre_fullinfo() in February 2000.
300    
301    4.  For a non-anchored pattern, if (*SKIP) was given with a name that did not
302        match a (*MARK), and the match failed at the start of the subject, a
303        reference to memory before the start of the subject could occur. This bug
304        was introduced by fix 17 of release 8.21.
305    
306    5.  A reference to an unset group with zero minimum repetition was giving
307        totally wrong answers (in non-JavaScript-compatibility mode). For example,
308        /(another)?(\1?)test/ matched against "hello world test". This bug was
309        introduced in release 8.13.
310    
311    6.  Add support for 16-bit character strings (a large amount of work involving
312        many changes and refactorings).
313    
314    7.  RunGrepTest failed on msys because \r\n was replaced by whitespace when the
315        command "pattern=`printf 'xxx\r\njkl'`" was run. The pattern is now taken
316        from a file.
317    
318    8.  Ovector size of 2 is also supported by JIT based pcre_exec (the ovector size
319        rounding is not applied in this particular case).
320    
321    9.  The invalid Unicode surrogate codepoints U+D800 to U+DFFF are now rejected
322        if they appear, or are escaped, in patterns.
323    
324    10. Get rid of a number of -Wunused-but-set-variable warnings.
325    
326    11. The pattern /(?=(*:x))(q|)/ matches an empty string, and returns the mark
327        "x". The similar pattern /(?=(*:x))((*:y)q|)/ did not return a mark at all.
328        Oddly, Perl behaves the same way. PCRE has been fixed so that this pattern
329        also returns the mark "x". This bug applied to capturing parentheses,
330        non-capturing parentheses, and atomic parentheses. It also applied to some
331        assertions.
332    
333    12. Stephen Kelly's patch to CMakeLists.txt allows it to parse the version
334        information out of configure.ac instead of relying on pcre.h.generic, which
335        is not stored in the repository.
336    
337    13. Applied Dmitry V. Levin's patch for a more portable method for linking with
338        -lreadline.
339    
340    14. ZH added PCRE_CONFIG_JITTARGET; added its output to pcretest -C.
341    
342    15. Applied Graycode's patch to put the top-level frame on the stack rather
343        than the heap when not using the stack for recursion. This gives a
344        performance improvement in many cases when recursion is not deep.
345    
346    16. Experimental code added to "pcretest -C" to output the stack frame size.
347    
348    
349    Version 8.21 12-Dec-2011
350    ------------------------
351    
352    1.  Updating the JIT compiler.
353    
354    2.  JIT compiler now supports OP_NCREF, OP_RREF and OP_NRREF. New test cases
355        are added as well.
356    
357    3.  Fix cache-flush issue on PowerPC (It is still an experimental JIT port).
358        PCRE_EXTRA_TABLES is not supported by JIT, and should be checked before
359        calling _pcre_jit_exec. Some extra comments are added.
360    
361    4.  (*MARK) settings inside atomic groups that do not contain any capturing
362        parentheses, for example, (?>a(*:m)), were not being passed out. This bug
363        was introduced by change 18 for 8.20.
364    
365    5.  Support for \x, \U and \u in JavaScript compatibility mode based on the
366        ECMA-262 standard.
367    
368    6.  Lookbehinds such as (?<=a{2}b) that contained a fixed repetition were
369        erroneously being rejected as "not fixed length" if PCRE_CASELESS was set.
370        This bug was probably introduced by change 9 of 8.13.
371    
372    7.  While fixing 6 above, I noticed that a number of other items were being
373        incorrectly rejected as "not fixed length". This arose partly because newer
374        opcodes had not been added to the fixed-length checking code. I have (a)
375        corrected the bug and added tests for these items, and (b) arranged for an
376        error to occur if an unknown opcode is encountered while checking for fixed
377        length instead of just assuming "not fixed length". The items that were
378        rejected were: (*ACCEPT), (*COMMIT), (*FAIL), (*MARK), (*PRUNE), (*SKIP),
379        (*THEN), \h, \H, \v, \V, and single character negative classes with fixed
380        repetitions, e.g. [^a]{3}, with and without PCRE_CASELESS.
381    
382    8.  A possessively repeated conditional subpattern such as (?(?=c)c|d)++ was
383        being incorrectly compiled and would have given unpredictable results.
384    
385    9.  A possessively repeated subpattern with minimum repeat count greater than
386        one behaved incorrectly. For example, (A){2,}+ behaved as if it was
387        (A)(A)++ which meant that, after a subsequent mismatch, backtracking into
388        the first (A) could occur when it should not.
389    
390    10. Add a cast and remove a redundant test from the code.
391    
392    11. JIT should use pcre_malloc/pcre_free for allocation.
393    
394    12. Updated pcre-config so that it no longer shows -L/usr/lib, which seems
395        best practice nowadays, and helps with cross-compiling. (If the exec_prefix
396        is anything other than /usr, -L is still shown).
397    
398    13. In non-UTF-8 mode, \C is now supported in lookbehinds and DFA matching.
399    
400    14. Perl does not support \N without a following name in a [] class; PCRE now
401        also gives an error.
402    
403    15. If a forward reference was repeated with an upper limit of around 2000,
404        it caused the error "internal error: overran compiling workspace". The
405        maximum number of forward references (including repeats) was limited by the
406        internal workspace, and dependent on the LINK_SIZE. The code has been
407        rewritten so that the workspace expands (via pcre_malloc) if necessary, and
408        the default depends on LINK_SIZE. There is a new upper limit (for safety)
409        of around 200,000 forward references. While doing this, I also speeded up
410        the filling in of repeated forward references.
411    
412    16. A repeated forward reference in a pattern such as (a)(?2){2}(.) was
413        incorrectly expecting the subject to contain another "a" after the start.
414    
415    17. When (*SKIP:name) is activated without a corresponding (*MARK:name) earlier
416        in the match, the SKIP should be ignored. This was not happening; instead
417        the SKIP was being treated as NOMATCH. For patterns such as
418        /A(*MARK:A)A+(*SKIP:B)Z|AAC/ this meant that the AAC branch was never
419        tested.
420    
421    18. The behaviour of (*MARK), (*PRUNE), and (*THEN) has been reworked and is
422        now much more compatible with Perl, in particular in cases where the result
423        is a non-match for a non-anchored pattern. For example, if
424        /b(*:m)f|a(*:n)w/ is matched against "abc", the non-match returns the name
425        "m", where previously it did not return a name. A side effect of this
426        change is that for partial matches, the last encountered mark name is
427        returned, as for non matches. A number of tests that were previously not
428        Perl-compatible have been moved into the Perl-compatible test files. The
429        refactoring has had the pleasing side effect of removing one argument from
430        the match() function, thus reducing its stack requirements.
431    
432    19. If the /S+ option was used in pcretest to study a pattern using JIT,
433        subsequent uses of /S (without +) incorrectly behaved like /S+.
434    
435    21. Retrieve executable code size support for the JIT compiler and fixing
436        some warnings.
437    
438    22. A caseless match of a UTF-8 character whose other case uses fewer bytes did
439        not work when the shorter character appeared right at the end of the
440        subject string.
441    
442    23. Added some (int) casts to non-JIT modules to reduce warnings on 64-bit
443        systems.
444    
445    24. Added PCRE_INFO_JITSIZE to pass on the value from (21) above, and also
446        output it when the /M option is used in pcretest.
447    
448    25. The CheckMan script was not being included in the distribution. Also, added
449        an explicit "perl" to run Perl scripts from the PrepareRelease script
450        because this is reportedly needed in Windows.
451    
452    26. If study data was being saved in a file and studying had not found a set of
453        "starts with" bytes for the pattern, the data written to the file (though
454        never used) was taken from uninitialized memory and so caused valgrind to
455        complain.
456    
457    27. Updated RunTest.bat as provided by Sheri Pierce.
458    
459    28. Fixed a possible uninitialized memory bug in pcre_jit_compile.c.
460    
461    29. Computation of memory usage for the table of capturing group names was
462        giving an unnecessarily large value.
463    
464    
465    Version 8.20 21-Oct-2011
466    ------------------------
467    
468    1.  Change 37 of 8.13 broke patterns like [:a]...[b:] because it thought it had
469        a POSIX class. After further experiments with Perl, which convinced me that
470        Perl has bugs and confusions, a closing square bracket is no longer allowed
471        in a POSIX name. This bug also affected patterns with classes that started
472        with full stops.
473    
474    2.  If a pattern such as /(a)b|ac/ is matched against "ac", there is no
475        captured substring, but while checking the failing first alternative,
476        substring 1 is temporarily captured. If the output vector supplied to
477        pcre_exec() was not big enough for this capture, the yield of the function
478        was still zero ("insufficient space for captured substrings"). This cannot
479        be totally fixed without adding another stack variable, which seems a lot
480        of expense for an edge case. However, I have improved the situation in cases
481        such as /(a)(b)x|abc/ matched against "abc", where the return code
482        indicates that fewer than the maximum number of slots in the ovector have
483        been set.
484    
485    3.  Related to (2) above: when there are more back references in a pattern than
486        slots in the output vector, pcre_exec() uses temporary memory during
487        matching, and copies in the captures as far as possible afterwards. It was
488        using the entire output vector, but this conflicts with the specification
489        that only 2/3 is used for passing back captured substrings. Now it uses
490        only the first 2/3, for compatibility. This is, of course, another edge
491        case.
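
The "2/3" rule in item 3 of 8.20 is simple arithmetic on the vector size. The helper names below are hypothetical, and the sketch assumes the vector size is a multiple of three, as the API documentation recommends.

```c
#include <assert.h>

/* Number of (start, end) pairs that can be passed back, including pair 0
   for the whole match. */
static int ovector_pairs(int ovecsize)
{
  assert(ovecsize >= 0 && ovecsize % 3 == 0);
  return ovecsize / 3;
}

/* Number of ints at the front of the vector that carry results; the
   remaining third is workspace for the matcher. */
static int ovector_result_ints(int ovecsize)
{
  return 2 * ovector_pairs(ovecsize);
}
```

For a vector of 30 ints this gives 10 pairs in the first 20 ints, so a pattern with more than 9 capturing groups cannot have all of them returned.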
492    
493    4.  Zoltan Herczeg's just-in-time compiler support has been integrated into the
494        main code base, and can be used by building with --enable-jit. When this is
495        done, pcregrep automatically uses it unless --disable-pcregrep-jit or the
496        runtime --no-jit option is given.
497    
498    5.  When the number of matches in a pcre_dfa_exec() run exactly filled the
499        ovector, the return from the function was zero, implying that there were
500        other matches that did not fit. The correct "exactly full" value is now
501        returned.
502    
503    6.  If a subpattern that was called recursively or as a subroutine contained
504        (*PRUNE) or any other control that caused it to give a non-standard return,
505        invalid errors such as "Error -26 (nested recursion at the same subject
506        position)" or even infinite loops could occur.
507    
508    7.  If a pattern such as /a(*SKIP)c|b(*ACCEPT)|/ was studied, it stopped
509        computing the minimum length on reaching *ACCEPT, and so ended up with the
510        wrong value of 1 rather than 0. Further investigation indicates that
511        computing a minimum subject length in the presence of *ACCEPT is difficult
512        (think back references, subroutine calls), and so I have changed the code
513        so that no minimum is registered for a pattern that contains *ACCEPT.
514    
515    8.  If (*THEN) was present in the first (true) branch of a conditional group,
516        it was not handled as intended. [But see 16 below.]
517    
518    9.  Replaced RunTest.bat and CMakeLists.txt with improved versions provided by
519        Sheri Pierce.
520    
521    10. A pathological pattern such as /(*ACCEPT)a/ was miscompiled, thinking that
522        the first byte in a match must be "a".
523    
524    11. Change 17 for 8.13 increased the recursion depth for patterns like
525        /a(?:.)*?a/ drastically. I've improved things by remembering whether a
526        pattern contains any instances of (*THEN). If it does not, the old
527        optimizations are restored. It would be nice to do this on a per-group
528        basis, but at the moment that is not feasible.
529    
530    12. In some environments, the output of pcretest -C is CRLF terminated. This
531        broke RunTest's code that checks for the link size. A single white space
532        character after the value is now allowed for.
533    
534    13. RunTest now checks for the "fr" locale as well as for "fr_FR" and "french".
535        For "fr", it uses the Windows-specific input and output files.
536    
537    14. If (*THEN) appeared in a group that was called recursively or as a
538        subroutine, it did not work as intended. [But see next item.]
539    
540    15. Consider the pattern /A (B(*THEN)C) | D/ where A, B, C, and D are complex
541        pattern fragments (but not containing any | characters). If A and B are
542        matched, but there is a failure in C so that it backtracks to (*THEN), PCRE
543        was behaving differently to Perl. PCRE backtracked into A, but Perl goes to
544        D. In other words, Perl considers parentheses that do not contain any |
545        characters to be part of a surrounding alternative, whereas PCRE was
546        treating (B(*THEN)C) the same as (B(*THEN)C|(*FAIL)) -- which Perl handles
547        differently. PCRE now behaves in the same way as Perl, except in the case
548        of subroutine/recursion calls such as (?1) which have in any case always
549        been different (but PCRE had them first :-).
550    
551    16. Related to 15 above: Perl does not treat the | in a conditional group as
552        creating alternatives. Such a group is treated in the same way as an
553        ordinary group without any | characters when processing (*THEN). PCRE has
554        been changed to match Perl's behaviour.
555    
556    17. If a user had set PCREGREP_COLO(U)R to something other than 1:31, the
557        RunGrepTest script failed.
558    
559    18. Change 22 for version 8.13 caused atomic groups to use more stack. This is
560        inevitable for groups that contain captures, but it can lead to a lot of
561        stack use in large patterns. The old behaviour has been restored for atomic
562        groups that do not contain any capturing parentheses.
563    
564    19. If the PCRE_NO_START_OPTIMIZE option was set for pcre_compile(), it did not
565        suppress the check for a minimum subject length at run time. (If it was
566        given to pcre_exec() or pcre_dfa_exec() it did work.)
567    
568    20. Fixed an ASCII-dependent infelicity in pcretest that would have made it
569        fail to work when decoding hex characters in data strings in EBCDIC
570        environments.
571    
572    21. It appears that in at least one Mac OS environment, the isxdigit() function
573        is implemented as a macro that evaluates to its argument more than once,
574        contravening the C 90 Standard (I haven't checked a later standard). There
575        was an instance in pcretest which caused it to go wrong when processing
576        \x{...} escapes in subject strings. The code has been rewritten to avoid using
577        things like p++ in the argument of isxdigit().
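
The defensive pattern from item 21 of 8.20 is to read the character into a local variable before handing it to isxdigit(), so that a macro implementation cannot repeat a side effect. The sketch below is not the pcretest code; parse_hex() is a hypothetical helper, and it does not guard against overflow of very long digit runs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parse a run of hex digits; *end is set to the first unparsed character. */
static unsigned long parse_hex(const char *p, const char **end)
{
  unsigned long value = 0;
  assert(p != NULL && end != NULL);
  for (;;)
    {
    /* Read once; the argument of isxdigit() has no side effects. */
    unsigned char c = (unsigned char)*p;
    if (!isxdigit(c)) break;
    value = value * 16 +
      (unsigned long)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    p++;
    }
  *end = p;
  return value;
}
```

The unsafe form would be something like isxdigit(*p++), which advances p more than once per digit if the macro evaluates its argument more than once.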
578    
579    
580    Version 8.13 16-Aug-2011
581    ------------------------
582    
583    1.  The Unicode data tables have been updated to Unicode 6.0.0.
584    
585    2.  Two minor typos in pcre_internal.h have been fixed.
586    
587    3.  Added #include <string.h> to pcre_scanner_unittest.cc, pcrecpp.cc, and
588        pcrecpp_unittest.cc. They are needed for strcmp(), memset(), and strchr()
589        in some environments (e.g. Solaris 10/SPARC using Sun Studio 12U2).
590    
591    4.  There were a number of related bugs in the code for matching backreferences
592        caselessly in UTF-8 mode when codes for the characters concerned were
593        different numbers of bytes. For example, U+023A and U+2C65 are an upper
594        and lower case pair, using 2 and 3 bytes, respectively. The main bugs were:
595        (a) A reference to 3 copies of a 2-byte code matched only 2 of a 3-byte
596        code. (b) A reference to 2 copies of a 3-byte code would not match 2 of a
597        2-byte code at the end of the subject (it thought there wasn't enough data
598        left).
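
The byte counts quoted in 4 can be checked with a few lines of Python. This is only an illustration of the arithmetic, not PCRE code; the variable names are invented for the example.

```python
# Illustration only: the case pair cited above has different UTF-8 lengths,
# so a caseless back reference cannot be checked by comparing byte counts.
upper = "\u023A"  # LATIN CAPITAL LETTER A WITH STROKE
lower = "\u2C65"  # LATIN SMALL LETTER A WITH STROKE

upper_len = len(upper.encode("utf-8"))
lower_len = len(lower.encode("utf-8"))

# Three copies of the 2-byte form and two copies of the 3-byte form occupy
# the same number of bytes, although the character counts differ.
three_upper = (upper * 3).encode("utf-8")
two_lower = (lower * 2).encode("utf-8")

print(upper_len, lower_len, len(three_upper), len(two_lower))
```

The equal byte totals are consistent with bug (a): a comparison driven by byte length sees 3 copies of the 2-byte code as the same size as 2 copies of the 3-byte code.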
599    
600    5.  Comprehensive information about what went wrong is now returned by
601        pcre_exec() and pcre_dfa_exec() when the UTF-8 string check fails, as long
602        as the output vector has at least 2 elements. The offset of the start of
603        the failing character and a reason code are placed in the vector.
604    
605    6.  When the UTF-8 string check fails for pcre_compile(), the offset that is
606        now returned is for the first byte of the failing character, instead of the
607        last byte inspected. This is an incompatible change, but I hope it is small
608        enough not to be a problem. It makes the returned offset consistent with
609        pcre_exec() and pcre_dfa_exec().
610    
611    7.  pcretest now gives a text phrase as well as the error number when
612        pcre_exec() or pcre_dfa_exec() fails; if the error is a UTF-8 check
613        failure, the offset and reason code are output.
614    
615    8.  When \R was used with a maximizing quantifier it failed to skip backwards
616        over a \r\n pair if the subsequent match failed. Instead, it just skipped
617        back over a single character (\n). This seems wrong (because it treated the
618        two characters as a single entity when going forwards), conflicts with the
619        documentation that \R is equivalent to (?>\r\n|\n|...etc), and makes the
620        behaviour of \R* different to (\R)*, which also seems wrong. The behaviour
621        has been changed.
622    
623    9.  Some internal refactoring has changed the processing so that the handling
624        of the PCRE_CASELESS and PCRE_MULTILINE options is done entirely at compile
625        time (the PCRE_DOTALL option was changed this way some time ago: version
626        7.7 change 16). This has made it possible to abolish the OP_OPT op code,
627        which was always a bit of a fudge. It also means that there is one less
628        argument for the match() function, which reduces its stack requirements
629        slightly. This change also fixes an incompatibility with Perl: the pattern
630        (?i:([^b]))(?1) should not match "ab", but previously PCRE gave a match.
631    
632    10. More internal refactoring has drastically reduced the number of recursive
633        calls to match() for possessively repeated groups such as (abc)++ when
634        using pcre_exec().
635    
636    11. While implementing 10, a number of bugs in the handling of groups were
637        discovered and fixed:
638    
639        (?<=(a)+) was not diagnosed as invalid (non-fixed-length lookbehind).
640        (a|)*(?1) gave a compile-time internal error.
641        ((a|)+)+  did not notice that the outer group could match an empty string.
642        (^a|^)+   was not marked as anchored.
643        (.*a|.*)+ was not marked as matching at start or after a newline.
644    
645    12. Yet more internal refactoring has removed another argument from the match()
646        function. Special calls to this function are now indicated by setting a
647        value in a variable in the "match data" data block.
648    
649    13. Be more explicit in pcre_study() instead of relying on "default" for
650        opcodes that mean there is no starting character; this means that when new
651        ones are added and accidentally left out of pcre_study(), testing should
652        pick them up.
653    
654    14. The -s option of pcretest has been documented for ages as being an old
655        synonym of -m (show memory usage). I have changed it to mean "force study
656        for every regex", that is, assume /S for every regex. This is similar to -i
657        and -d etc. It's slightly incompatible, but I'm hoping nobody is still
658        using it. It makes it easier to run collections of tests with and without
659        study enabled, and thereby test pcre_study() more easily. All the standard
660        tests are now run with and without -s (but some patterns can be marked as
661        "never study" - see 20 below).
662    
663    15. When (*ACCEPT) was used in a subpattern that was called recursively, the
664        restoration of the capturing data to the outer values was not happening
665        correctly.
666    
667    16. If a recursively called subpattern ended with (*ACCEPT) and matched an
668        empty string, and PCRE_NOTEMPTY was set, pcre_exec() thought the whole
669        pattern had matched an empty string, and so incorrectly returned a no
670        match.
671    
672    17. There was optimizing code for the last branch of non-capturing parentheses,
673        and also for the obeyed branch of a conditional subexpression, which used
674        tail recursion to cut down on stack usage. Unfortunately, now that there is
675        the possibility of (*THEN) occurring in these branches, tail recursion is
676        no longer possible because the return has to be checked for (*THEN). These
677        two optimizations have therefore been removed. [But see 8.20/11 above.]
678    
679    18. If a pattern containing \R was studied, it was assumed that \R always
680        matched two bytes, thus causing the minimum subject length to be
681        incorrectly computed because \R can also match just one byte.
682    
683    19. If a pattern containing (*ACCEPT) was studied, the minimum subject length
684        was incorrectly computed.
685    
686    20. If /S is present twice on a test pattern in pcretest input, it now
687        *disables* studying, thereby overriding the use of -s on the command line
688        (see 14 above). This is necessary for one or two tests to keep the output
689        identical in both cases.
690    
691    21. When (*ACCEPT) was used in an assertion that matched an empty string and
692        PCRE_NOTEMPTY was set, PCRE applied the non-empty test to the assertion.
693    
694    22. When an atomic group that contained a capturing parenthesis was
695        successfully matched, but the branch in which it appeared failed, the
696        capturing was not being forgotten if a higher numbered group was later
697        captured. For example, /(?>(a))b|(a)c/ when matching "ac" set capturing
698        group 1 to "a", when in fact it should be unset. This applied to multi-
699        branched capturing and non-capturing groups, repeated or not, and also to
700        positive assertions (capturing in negative assertions does not happen
701        in PCRE) and also to nested atomic groups.
702    
703    23. Add the ++ qualifier feature to pcretest, to show the remainder of the
704        subject after a captured substring, to make it easier to tell which of a
705        number of identical substrings has been captured.
706    
707    24. The way atomic groups are processed by pcre_exec() has been changed so that
708        if they are repeated, backtracking one repetition now resets captured
709        values correctly. For example, if ((?>(a+)b)+aabab) is matched against
710        "aaaabaaabaabab" the value of captured group 2 is now correctly recorded as
711        "aaa". Previously, it would have been "a". As part of this code
712        refactoring, the way recursive calls are handled has also been changed.
713    
714    25. If an assertion condition captured any substrings, they were not passed
715        back unless some other capturing happened later. For example, if
716        (?(?=(a))a) was matched against "a", no capturing was returned.
717    
718    26. When studying a pattern that contained subroutine calls or assertions,
719        the code for finding the minimum length of a possible match was handling
720        direct recursions such as (xxx(?1)|yyy) but not mutual recursions (where
721        group 1 called group 2 while simultaneously a separate group 2 called group
722        1). A stack overflow occurred in this case. I have fixed this by limiting
723        the recursion depth to 10.
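
A rough sketch of the depth limit described in 26, using a toy pattern model in Python rather than PCRE's compiled code. The data layout (a dict mapping a group number to a list of branches, with literals as strings and subroutine calls as integers) and the choice to return zero when the limit is hit are assumptions made for the example.

```python
MAX_DEPTH = 10  # the limit quoted in the entry above

def min_length(groups, number, depth=0):
    """Lower bound on the match length of a group in a toy pattern model."""
    if depth > MAX_DEPTH:
        return 0  # assumption: give up with a safe lower bound of zero
    best = None
    for branch in groups[number]:
        total = 0
        for item in branch:
            if isinstance(item, int):      # call to another group
                total += min_length(groups, item, depth + 1)
            else:                          # literal text
                total += len(item)
        best = total if best is None else min(best, total)
    return best or 0

# Direct recursion, like (xxx(?1)|yyy): the second branch bounds the result.
direct = {1: [["xxx", 1], ["yyy"]]}
# Mutual recursion: group 1 calls group 2 and group 2 calls group 1.
mutual = {1: [["a", 2]], 2: [["b", 1]]}

print(min_length(direct, 1), min_length(mutual, 1))
```

Without the depth check the mutual case would recurse until the stack ran out; with it, the function always terminates and returns some lower bound.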
724    
725    27. Updated RunTest.bat in the distribution to the version supplied by Tom
726        Fortmann. This supports explicit test numbers on the command line, and has
727        argument validation and error reporting.
728    
729    28. An instance of \X with an unlimited repeat could fail if at any point the
730        first character it looked at was a mark character.
731    
732    29. Some minor code refactoring concerning Unicode properties and scripts
733        should reduce the stack requirement of match() slightly.
734    
735    30. Added the '=' option to pcretest to check the setting of unused capturing
736        slots at the end of the pattern, which are documented as being -1, but are
737        not included in the return count.
738    
739    31. If \k was not followed by a braced, angle-bracketed, or quoted name, PCRE
740        compiled something random. Now it gives a compile-time error (as does
741        Perl).
742    
743    32. A *MARK encountered during the processing of a positive assertion is now
744        recorded and passed back (compatible with Perl).
745    
746    33. If --only-matching or --colour was set on a pcregrep call whose pattern
747        had alternative anchored branches, the search for a second match in a line
748        was done as if at the line start. Thus, for example, /^01|^02/ incorrectly
749        matched the line "0102" twice. The same bug affected patterns that started
750        with a backwards assertion. For example /\b01|\b02/ also matched "0102"
751        twice.
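
The class of bug in 33 can be reproduced with Python's re module, which is used here only as a stand-in for the matching engine; the two helper functions are hypothetical and are not pcregrep code. Searching a slice of the line makes every slice look like a line start, whereas searching the whole line from an offset keeps the real context.

```python
import re

def matches_naive(pattern, line):
    """Buggy approach: re-search a slice, so each slice looks like a line start."""
    found, rest = [], line
    while True:
        m = re.search(pattern, rest)
        if m is None or m.end() == 0:
            return found
        found.append(m.group())
        rest = rest[m.end():]

def matches_with_offset(pattern, line):
    """Search the whole line from an offset, keeping the true line start."""
    compiled, found, pos = re.compile(pattern), [], 0
    while True:
        m = compiled.search(line, pos)
        if m is None or m.end() == m.start():
            return found
        found.append(m.group())
        pos = m.end()

print(matches_naive(r"^01|^02", "0102"))        # two matches (wrong)
print(matches_with_offset(r"^01|^02", "0102"))  # one match
```

The same difference shows up for /\b01|\b02/, because the word boundary test needs to see the character before the starting offset.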
752    
753    34. Previously, PCRE did not allow quantification of assertions. However, Perl
754        does, and because of capturing effects, quantifying parenthesized
755        assertions may at times be useful. Quantifiers are now allowed for
756        parenthesized assertions.
757    
758    35. A minor code tidy in pcre_compile() when checking options for \R usage.
759    
760    36. \g was being checked for fancy things in a character class, when it should
761        just be a literal "g".
762    
763    37. PCRE was rejecting [:a[:digit:]] whereas Perl was not. It seems that the
764        appearance of a nested POSIX class supersedes an apparent external class.
765        For example, [:a[:digit:]b:] matches "a", "b", ":", or a digit. Also,
766        unescaped square brackets may also appear as part of class names. For
767        example, [:a[:abc]b:] gives unknown class "[:abc]b:]". PCRE now behaves
768        more like Perl. (But see 8.20/1 above.)
769    
770    38. PCRE was giving an error for \N with a braced quantifier such as {1,} (this
771        was because it thought it was \N{name}, which is not supported).
772    
773    39. Add minix to OS list not supporting the -S option in pcretest.
774    
775    40. PCRE tries to detect cases of infinite recursion at compile time, but it
776        cannot analyze patterns in sufficient detail to catch mutual recursions
777        such as ((?1))((?2)). There is now a runtime test that gives an error if a
778        subgroup is called recursively as a subpattern for a second time at the
779        same position in the subject string. In previous releases this might have
780        been caught by the recursion limit, or it might have run out of stack.
781    
782    41. A pattern such as /(?(R)a+|(?R)b)/ is quite safe, as the recursion can
783        happen only once. PCRE was, however, incorrectly giving a compile time error
784        "recursive call could loop indefinitely" because it cannot analyze the
785        pattern in sufficient detail. The compile time test no longer happens when
786        PCRE is compiling a conditional subpattern, but actual runaway loops are
787        now caught at runtime (see 40 above).
788    
789    42. It seems that Perl allows any characters other than a closing parenthesis
790        to be part of the NAME in (*MARK:NAME) and other backtracking verbs. PCRE
791        has been changed to be the same.
792    
793    43. Updated configure.ac to put in more quoting round AC_LANG_PROGRAM etc. so
794        as not to get warnings when autogen.sh is called. Also changed
795        AC_PROG_LIBTOOL (deprecated) to LT_INIT (the current macro).
796    
797    44. To help people who use pcregrep to scan files containing exceedingly long
798        lines, the following changes have been made:
799    
800        (a) The default value of the buffer size parameter has been increased from
801            8K to 20K. (The actual buffer used is three times this size.)
802    
803        (b) The default can be changed by ./configure --with-pcregrep-bufsize when
804            PCRE is built.
805    
806        (c) A --buffer-size=n option has been added to pcregrep, to allow the size
807            to be set at run time.
808    
809        (d) Numerical values in pcregrep options can be followed by K or M, for
810            example --buffer-size=50K.
811    
812        (e) If a line being scanned overflows pcregrep's buffer, an error is now
813            given and the return code is set to 2.
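
A minimal Python sketch of the K and M suffixes from 44(d) and the buffer sizing from 44(a). The function names are hypothetical, and the multipliers (1024 and 1024*1024) are an assumption, since the entry above does not state them.

```python
def parse_size(text):
    """Parse a number with an optional K or M suffix (assumed binary multipliers)."""
    multipliers = {"K": 1024, "M": 1024 * 1024}  # assumption, not stated above
    suffix = text[-1:].upper()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)

def working_buffer_bytes(buffer_size):
    # The entry above says the actual buffer is three times the parameter.
    return 3 * buffer_size

print(parse_size("50K"), working_buffer_bytes(parse_size("20K")))
```

Under these assumptions the new default of 20K corresponds to a working buffer of 61440 bytes.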
814    
815    45. Add a pointer to the latest mark to the callout data block.
816    
817    46. The pattern /.(*F)/, when applied to "abc" with PCRE_PARTIAL_HARD, gave a
818        partial match of an empty string instead of no match. This was specific to
819        the use of ".".
820    
821    47. The pattern /f.*/8s, when applied to "for" with PCRE_PARTIAL_HARD, gave a
822        complete match instead of a partial match. This bug was dependent on both
823        the PCRE_UTF8 and PCRE_DOTALL options being set.
824    
825    48. For a pattern such as /\babc|\bdef/ pcre_study() was failing to set up the
826        starting byte set, because \b was not being ignored.
827    
828    
829    Version 8.12 15-Jan-2011
830    ------------------------
831    
832    1.  Fixed some typos in the markup of the man pages, and wrote a script that
833        checks for such things as part of the documentation building process.
834    
835    2.  On a big-endian 64-bit system, pcregrep did not correctly process the
836        --match-limit and --recursion-limit options (added for 8.11). In
837        particular, this made one of the standard tests fail. (The integer value
838        went into the wrong half of a long int.)
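
The effect described in 2 can be imitated with Python's struct module. The sketch assumes the mechanism was a 4-byte int stored at the lowest address of a zeroed 8-byte long; the entry above only says that the value went into the wrong half.

```python
import struct

value = 1000

# An 8-byte slot, zeroed, with a 4-byte int stored at its lowest address.
big_slot = struct.pack(">i", value) + b"\x00" * 4
little_slot = struct.pack("<i", value) + b"\x00" * 4

# Reading the slot back as an 8-byte integer in each byte order.
as_long_big = struct.unpack(">q", big_slot)[0]
as_long_little = struct.unpack("<q", little_slot)[0]

print(as_long_big, as_long_little)
```

On the little-endian layout the value survives, which would explain why the problem appeared only on big-endian 64-bit systems.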
839    
840    3.  If the --colour option was given to pcregrep with -v (invert match), it
841        did strange things, either producing crazy output, or crashing. It should,
842        of course, ignore a request for colour when reporting lines that do not
843        match.
844    
845    4.  Another pcregrep bug caused similar problems if --colour was specified with
846        -M (multiline) and the pattern match finished with a line ending.
847    
848    5.  In pcregrep, when a pattern that ended with a literal newline sequence was
849        matched in multiline mode, the following line was shown as part of the
850        match. This seems wrong, so I have changed it.
851    
852    6.  Another pcregrep bug in multiline mode, when --colour was specified, caused
853        the check for further matches in the same line (so they could be coloured)
854        to overrun the end of the current line. If another match was found, it was
855        incorrectly shown (and then shown again when found in the next line).
856    
857    7.  If pcregrep was compiled under Windows, there was a reference to the
858        function pcregrep_exit() before it was defined. I am assuming this was
859        the cause of the "error C2371: 'pcregrep_exit' : redefinition;" that was
860        reported by a user. I've moved the definition above the reference.
861    
862    
863    Version 8.11 10-Dec-2010
864    ------------------------
865    
866    1.  (*THEN) was not working properly if there were untried alternatives prior
867        to it in the current branch. For example, in ((a|b)(*THEN)(*F)|c..) it
868        backtracked to try for "b" instead of moving to the next alternative branch
869        at the same level (in this case, to look for "c"). The Perl documentation
870        is clear that when (*THEN) is backtracked onto, it goes to the "next
871        alternative in the innermost enclosing group".
872    
873    2.  (*COMMIT) was not overriding (*THEN), as it does in Perl. In a pattern
874        such as   (A(*COMMIT)B(*THEN)C|D)  any failure after matching A should
875        result in overall failure. Similarly, (*COMMIT) now overrides (*PRUNE) and
876        (*SKIP), (*SKIP) overrides (*PRUNE) and (*THEN), and (*PRUNE) overrides
877        (*THEN).
878    
879    3.  If \s appeared in a character class, it removed the VT character from
880        the class, even if it had been included by some previous item, for example
881        in [\x00-\xff\s]. (This was a bug related to the fact that VT is not part
882        of \s, but is part of the POSIX "space" class.)
883    
884    4.  A partial match never returns an empty string (because you can always
885        match an empty string at the end of the subject); however the checking for
886        an empty string was starting at the "start of match" point. This has been
887        changed to the "earliest inspected character" point, because the returned
888        data for a partial match starts at this character. This means that, for
889        example, /(?<=abc)def/ gives a partial match for the subject "abc"
890        (previously it gave "no match").
891    
892    5.  Changes have been made to the way PCRE_PARTIAL_HARD affects the matching
893        of $, \z, \Z, \b, and \B. If the match point is at the end of the string,
894        previously a full match would be given. However, setting PCRE_PARTIAL_HARD
895        has an implication that the given string is incomplete (because a partial
896        match is preferred over a full match). For this reason, these items now
897        give a partial match in this situation. [Aside: previously, the one case
898        /t\b/ matched against "cat" with PCRE_PARTIAL_HARD set did return a partial
899        match rather than a full match, which was wrong by the old rules, but is
900        now correct.]
901    
902    6.  There was a bug in the handling of #-introduced comments, recognized when
903        PCRE_EXTENDED is set, when PCRE_NEWLINE_ANY and PCRE_UTF8 were also set.
904        If a UTF-8 multi-byte character included the byte 0x85 (e.g. U+0445, whose
905        UTF-8 encoding is 0xd1,0x85), this was misinterpreted as a newline when
906        scanning for the end of the comment. (*Character* 0x85 is an "any" newline,
907        but *byte* 0x85 is not, in UTF-8 mode). This bug was present in several
908        places in pcre_compile().
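
The distinction in 6 between the character 0x85 and the byte 0x85 can be shown in Python. Both scanning functions are hypothetical simplifications: they recognize only LF and NEL, not the full "any" newline set, and they are not taken from pcre_compile().

```python
def comment_end_bytewise(data):
    """Buggy scan: treats any 0x0a or 0x85 byte as a newline."""
    for i, b in enumerate(data):
        if b in (0x0A, 0x85):
            return i
    return len(data)

def comment_end_charwise(text):
    """Scan decoded characters: U+000A and U+0085 count as newlines."""
    for i, ch in enumerate(text):
        if ch in ("\n", "\x85"):
            return i
    return len(text)

comment = "# x\u0445y\nabc"        # a comment containing U+0445
encoded = comment.encode("utf-8")  # U+0445 becomes the bytes 0xd1 0x85

print(encoded[:comment_end_bytewise(encoded)])   # stops inside U+0445
print(comment[:comment_end_charwise(comment)])   # stops at the real newline
```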
909    
910    7.  Related to (6) above, when pcre_compile() was skipping #-introduced
911        comments when looking ahead for named forward references to subpatterns,
912        the only newline sequence it recognized was NL. It now handles newlines
913        according to the set newline convention.
914    
915    8.  SunOS4 doesn't have strerror() or strtoul(); pcregrep dealt with the
916        former, but used strtoul(), whereas pcretest avoided strtoul() but did not
917        cater for a lack of strerror(). These oversights have been fixed.
918    
919    9.  Added --match-limit and --recursion-limit to pcregrep.
920    
921    10. Added two casts needed to build with Visual Studio when NO_RECURSE is set.
922    
923    11. When the -o option was used, pcregrep was setting a return code of 1, even
924        when matches were found, and --line-buffered was not being honoured.
925    
926    12. Added an optional parentheses number to the -o and --only-matching options
927        of pcregrep.
928    
929    13. Imitating Perl's /g action for multiple matches is tricky when the pattern
930        can match an empty string. The code to do it in pcretest and pcredemo
931        needed fixing:
932    
933        (a) When the newline convention was "crlf", pcretest got it wrong, skipping
934            only one byte after an empty string match just before CRLF (this case
935            just got forgotten; "any" and "anycrlf" were OK).
936    
937        (b) The pcretest code also had a bug, causing it to loop forever in UTF-8
938            mode when an empty string match preceded an ASCII character followed by
939            a non-ASCII character. (The code for advancing by one character rather
940            than one byte was nonsense.)
941    
942        (c) The pcredemo.c sample program did not have any code at all to handle
943            the cases when CRLF is a valid newline sequence.
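
The advancing step that 13(a) and 13(b) are concerned with might look roughly like the following Python sketch. It covers only the move to the next starting offset after an empty match; the function name and parameters are invented for the example and this is not the pcretest or pcredemo code.

```python
def advance_after_empty_match(subject, offset, crlf_is_newline, utf8):
    """Return the next start offset (in bytes) after an empty match at offset."""
    if offset >= len(subject):
        return offset + 1  # past the end: the caller should stop
    # Step over a whole CRLF pair when CRLF is a valid newline sequence.
    if crlf_is_newline and subject[offset:offset + 2] == b"\r\n":
        return offset + 2
    offset += 1
    if utf8:
        # Skip UTF-8 continuation bytes (10xxxxxx) to land on a character start.
        while offset < len(subject) and (subject[offset] & 0xC0) == 0x80:
            offset += 1
    return offset

subject = "a\u00e9b".encode("utf-8")  # ASCII, then a 2-byte character, then ASCII
print(advance_after_empty_match(subject, 0, False, True))
print(advance_after_empty_match(subject, 1, False, True))
print(advance_after_empty_match(b"a\r\nb", 1, True, False))
```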
944    
945    14. Neither pcre_exec() nor pcre_dfa_exec() was checking that the value given
946        as a starting offset was within the subject string. There is now a new
947        error, PCRE_ERROR_BADOFFSET, which is returned if the starting offset is
948        negative or greater than the length of the string. In order to test this,
949        pcretest is extended to allow the setting of negative starting offsets.
950    
951    15. In both pcre_exec() and pcre_dfa_exec() the code for checking that the
952        starting offset points to the beginning of a UTF-8 character was
953        unnecessarily clumsy. I tidied it up.
954    
955    16. Added PCRE_ERROR_SHORTUTF8 to make it possible to distinguish between a
956        bad UTF-8 sequence and one that is incomplete when using PCRE_PARTIAL_HARD.
957    
958    17. Nobody had reported that the --include_dir option, which was added in
959        release 7.7 should have been called --include-dir (hyphen, not underscore)
960        for compatibility with GNU grep. I have changed it to --include-dir, but
961        left --include_dir as an undocumented synonym, and the same for
962        --exclude-dir, though that is not available in GNU grep, at least as of
963        release 2.5.4.
964    
965    18. At a user's suggestion, the macros GETCHAR and friends (which pick up UTF-8
966        characters from a string of bytes) have been redefined so as not to use
967        loops, in order to improve performance in some environments. At the same
968        time, I abstracted some of the common code into auxiliary macros to save
969        repetition (this should not affect the compiled code).
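
To illustrate the idea in 18 of picking up a UTF-8 character without a loop, here is a Python sketch that branches once on the lead byte. It is not the GETCHAR macro itself: it assumes valid input of at most 4 bytes per character and does no checking.

```python
def get_char(data, i):
    """Decode one UTF-8 character at data[i] with straight-line branches.

    Returns (code_point, length). Input is assumed to be valid UTF-8 of at
    most 4 bytes per character; no validation is done.
    """
    c = data[i]
    if c < 0x80:
        return c, 1
    if c < 0xE0:
        return ((c & 0x1F) << 6) | (data[i + 1] & 0x3F), 2
    if c < 0xF0:
        return (((c & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6)
                | (data[i + 2] & 0x3F)), 3
    return (((c & 0x07) << 18) | ((data[i + 1] & 0x3F) << 12)
            | ((data[i + 2] & 0x3F) << 6) | (data[i + 3] & 0x3F)), 4

code_point, length = get_char("\u0445".encode("utf-8"), 0)
print(hex(code_point), length)
```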
970    
971    19. If \c was followed by a multibyte UTF-8 character, bad things happened. A
972        compile-time error is now given if \c is not followed by an ASCII
973        character, that is, a byte less than 128. (In EBCDIC mode, the code is
974        different, and any byte value is allowed.)
975    
976    20. Recognize (*NO_START_OPT) at the start of a pattern to set the PCRE_NO_
977        START_OPTIMIZE option, which is now allowed at compile time - but just
978        passed through to pcre_exec() or pcre_dfa_exec(). This makes it available
979        to pcregrep and other applications that have no direct access to PCRE
980        options. The new /Y option in pcretest sets this option when calling
981        pcre_compile().
982    
983    21. Change 18 of release 8.01 broke the use of named subpatterns for recursive
984        back references. Groups containing recursive back references were forced to
985        be atomic by that change, but in the case of named groups, the amount of
986        memory required was incorrectly computed, leading to "Failed: internal
987        error: code overflow". This has been fixed.
988    
989    22. Some patches to pcre_stringpiece.h, pcre_stringpiece_unittest.cc, and
990        pcretest.c, to avoid build problems in some Borland environments.
991    
992    
993    Version 8.10 25-Jun-2010
994    ------------------------
995    
996    1.  Added support for (*MARK:ARG) and for ARG additions to PRUNE, SKIP, and
997        THEN.
998    
999    2.  (*ACCEPT) was not working when inside an atomic group.
1000    
1001    3.  Inside a character class, \B is treated as a literal by default, but
1002        faulted if PCRE_EXTRA is set. This mimics Perl's behaviour (the -w option
1003        causes the error). The code is unchanged, but I tidied the documentation.
1004    
1005    4.  Inside a character class, PCRE always treated \R and \X as literals,
1006        whereas Perl faults them if its -w option is set. I have changed PCRE so
1007        that it faults them when PCRE_EXTRA is set.
1008    
1009    5.  Added support for \N, which always matches any character other than
1010        newline. (It is the same as "." when PCRE_DOTALL is not set.)
1011    
1012    6.  When compiling pcregrep with newer versions of gcc which may have
1013        FORTIFY_SOURCE set, several warnings "ignoring return value of 'fwrite',
1014        declared with attribute warn_unused_result" were given. Just casting the
1015        result to (void) does not stop the warnings; a more elaborate fudge is
1016        needed. I've used a macro to implement this.
1017    
1018    7.  Minor change to pcretest.c to avoid a compiler warning.
1019    
1020    8.  Added four artificial Unicode properties to help with an option to make
1021        \s etc use properties (see next item). The new properties are: Xan
1022        (alphanumeric), Xsp (Perl space), Xps (POSIX space), and Xwd (word).
1023    
1024    9.  Added PCRE_UCP to make \b, \d, \s, \w, and certain POSIX character classes
1025        use Unicode properties. (*UCP) at the start of a pattern can be used to set
1026        this option. Modified pcretest to add /W to test this facility. Added
1027        REG_UCP to make it available via the POSIX interface.
1028    
1029    10. Added --line-buffered to pcregrep.
1030    
1031    11. In UTF-8 mode, if a pattern that was compiled with PCRE_CASELESS was
1032        studied, and the match started with a letter with a code point greater than
1033        127 whose first byte was different to the first byte of the other case of
1034        the letter, the other case of this starting letter was not recognized
1035        (#976).
1036    
1037    12. If a pattern that was studied started with a repeated Unicode property
1038        test, for example, \p{Nd}+, there was the theoretical possibility of
1039        setting up an incorrect bitmap of starting bytes, but fortunately it could
1040        not have actually happened in practice until change 8 above was made (it
1041        added property types that matched character-matching opcodes).
1042    
1043    13. pcre_study() now recognizes \h, \v, and \R when constructing a bit map of
1044        possible starting bytes for non-anchored patterns.
1045    
1046    14. Extended the "auto-possessify" feature of pcre_compile(). It now recognizes
1047        \R, and also a number of cases that involve Unicode properties, both
1048        explicit and implicit when PCRE_UCP is set.
1049    
1050    15. If a repeated Unicode property match (e.g. \p{Lu}*) was used with non-UTF-8
1051        input, it could crash or give wrong results if characters with values
1052        greater than 0xc0 were present in the subject string. (Detail: it assumed
1053        UTF-8 input when processing these items.)
1054    
1055    16. Added a lot of (int) casts to avoid compiler warnings in systems where
1056        size_t is 64-bit (#991).
1057    
1058    17. Added a check for running out of memory when PCRE is compiled with
1059        --disable-stack-for-recursion (#990).
1060    
1061    18. If the last data line in a file for pcretest does not have a newline on
1062        the end, a newline was missing in the output.
1063    
1064    19. The default pcre_chartables.c file recognizes only ASCII characters (values
1065        less than 128) in its various bitmaps. However, there is a facility for
1066        generating tables according to the current locale when PCRE is compiled. It
1067        turns out that in some environments, 0x85 and 0xa0, which are Unicode space
1068        characters, are recognized by isspace() and therefore were getting set in
1069        these tables, and indeed these tables seem to approximate to ISO 8859. This
1070        caused a problem in UTF-8 mode when pcre_study() was used to create a list
1071        of bytes that can start a match. For \s, it was including 0x85 and 0xa0,
1072        which of course cannot start UTF-8 characters. I have changed the code so
1073        that only real ASCII characters (less than 128) and the correct starting
1074        bytes for UTF-8 encodings are set for characters greater than 127 when in
1075        UTF-8 mode. (When PCRE_UCP is set - see 9 above - the code is different
1076        altogether.)
1077    
1078    20. Added the /T option to pcretest so as to be able to run tests with non-
1079        standard character tables, thus making it possible to include the tests
1080        used for 19 above in the standard set of tests.
1081    
1082    21. A pattern such as (?&t)(?#()(?(DEFINE)(?<t>a)) which has a forward
1083        reference to a subpattern the other side of a comment that contains an
1084        opening parenthesis caused either an internal compiling error, or a
1085        reference to the wrong subpattern.
1086    
1087    
1088    Version 8.02 19-Mar-2010
1089    ------------------------
1090    
1091    1.  The Unicode data tables have been updated to Unicode 5.2.0.
1092    
1093    2.  Added the option --libs-cpp to pcre-config, but only when C++ support is
1094        configured.
1095    
1096    3.  Updated the licensing terms in the pcregexp.pas file, as agreed with the
1097        original author of that file, following a query about its status.
1098    
1099    4.  On systems that do not have stdint.h (e.g. Solaris), check for and include
1100        inttypes.h instead. This fixes a bug that was introduced by change 8.01/8.
1101    
1102    5.  A pattern such as (?&t)*+(?(DEFINE)(?<t>.)) which has a possessive
1103        quantifier applied to a forward-referencing subroutine call, could compile
1104        incorrect code or give the error "internal error: previously-checked
1105        referenced subpattern not found".
1106    
1107    6.  Both MS Visual Studio and Symbian OS have problems with initializing
1108        variables to point to external functions. For these systems, therefore,
1109        pcre_malloc etc. are now initialized to local functions that call the
1110        relevant global functions.
1111    
1112    7.  There were two entries missing in the vectors called coptable and poptable
1113        in pcre_dfa_exec.c. This could lead to memory accesses outside the vectors.
1114        I've fixed the data, and added a kludgy way of testing at compile time that
1115        the lengths are correct (equal to the number of opcodes).
1116    
1117    8.  Following on from 7, I added a similar kludge to check the length of the
1118        eint vector in pcreposix.c.
1119    
1120    9.  Error texts for pcre_compile() are held as one long string to avoid too
1121        much relocation at load time. To find a text, the string is searched,
1122        counting zeros. There was no check for running off the end of the string,
1123        which could happen if a new error number was added without updating the
1124        string.
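
The lookup described in 9 can be sketched in Python as follows. The error texts and the function name are hypothetical; only the technique (one long string, zero-separated, searched by counting zeros, with a check for running off the end) comes from the entry above.

```python
ERROR_TEXTS = "no error\0first problem\0second problem\0"  # hypothetical texts

def find_error_text(n, texts=ERROR_TEXTS):
    """Return the n-th zero-terminated text, or None if n is out of range."""
    start = 0
    for _ in range(n):
        end = texts.find("\0", start)
        if end == -1:
            return None  # ran off the end while counting zeros
        start = end + 1
    end = texts.find("\0", start)
    if end == -1:
        return None  # no text at this position
    return texts[start:end]

print(find_error_text(2), find_error_text(3))
```

Returning None for an unknown number stands in for whatever the real code does when the check fails; that detail is not given above.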
1125    
1126    10. \K gave a compile-time error if it appeared in a lookbehind assertion.
1127    
1128    11. \K was not working if it appeared in an atomic group or in a group that
1129        was called as a "subroutine", or in an assertion. Perl 5.11 documents that
1130        \K is "not well defined" if used in an assertion. PCRE now accepts it if
1131        the assertion is positive, but not if it is negative.
1132    
1133    12. Change 11 fortuitously reduced the size of the stack frame used in the
1134        "match()" function of pcre_exec.c by one pointer. Forthcoming
1135        implementation of support for (*MARK) will need an extra pointer on the
1136        stack; I have reserved it now, so that the stack frame size does not
1137        decrease.
1138    
1139    13. A pattern such as (?P<L1>(?P<L2>0)|(?P>L2)(?P>L1)) in which the only other
1140        item in a branch that calls a recursion is a subroutine call - as in the
1141        second branch in the above example - was incorrectly given the compile-
1142        time error "recursive call could loop indefinitely" because pcre_compile()
1143        was not correctly checking the subroutine for matching a non-empty string.
1144    
1145    14. The checks for overrunning compiling workspace could trigger after an
1146        overrun had occurred. This is a "should never occur" error, but it can be
1147        triggered by pathological patterns such as hundreds of nested parentheses.
1148        The checks now trigger 100 bytes before the end of the workspace.
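
    The idea of testing against a limit that lies short of the true end
    can be sketched as follows. The workspace size and the names are
    assumptions made for the example, not values taken from
    pcre_compile.c.

```c
#include <stddef.h>

/* Hypothetical sizes: a workspace and a safety margin before its end. */
#define DEMO_WORKSPACE_SIZE   4096u
#define DEMO_WORKSPACE_MARGIN 100u

/* Returns 1 when the bytes already used have passed the early limit,
   so the error is raised while there is still room in the workspace. */
int demo_workspace_nearly_full(size_t bytes_used)
{
  return bytes_used > DEMO_WORKSPACE_SIZE - DEMO_WORKSPACE_MARGIN;
}
```

    A single item written between two checks can then overshoot the
    limit by up to the margin without leaving the workspace.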
1149    
1150    15. Fix typo in configure.ac: "srtoq" should be "strtoq".
1151    
1152    
1153    Version 8.01 19-Jan-2010
1154    ------------------------
1155    
1156    1.  If a pattern contained a conditional subpattern with only one branch (in
1157        particular, this includes all (?(DEFINE)...) patterns), a call to pcre_study()
1158        computed the wrong minimum data length (which is of course zero for such
1159        subpatterns). This could cause incorrect "no match" results.
1160    
1161    2.  For patterns such as (?i)a(?-i)b|c where an option setting at the start of
1162        the pattern is reset in the first branch, pcre_compile() failed with
1163        "internal error: code overflow at offset...". This happened only when
1164        the reset was to the original external option setting. (An optimization
1165        abstracts leading options settings into an external setting, which was the
1166        cause of this.)
1167    
1168    3.  A pattern such as ^(?!a(*SKIP)b) where a negative assertion contained one
1169        of the verbs SKIP, PRUNE, or COMMIT, did not work correctly. When the
1170        assertion pattern did not match (meaning that the assertion was true), it
1171        was incorrectly treated as false if the SKIP had been reached during the
1172        matching. This also applied to assertions used as conditions.
1173    
1174    4.  If an item that is not supported by pcre_dfa_exec() was encountered in an
1175        assertion subpattern, including such a pattern used as a condition,
1176        unpredictable results occurred, instead of the error return
1177        PCRE_ERROR_DFA_UITEM.
1178    
1179    5.  The C++ GlobalReplace function was not working like Perl for the special
1180        situation when an empty string is matched. It now does the fancy magic
1181        stuff that is necessary.
1182    
1183    6.  In pcre_internal.h, obsolete includes to setjmp.h and stdarg.h have been
1184        removed. (These were left over from very, very early versions of PCRE.)
1185    
1186    7.  Some cosmetic changes to the code to make life easier when compiling it
1187        as part of something else:
1188    
1189        (a) Change DEBUG to PCRE_DEBUG.
1190    
1191        (b) In pcre_compile(), rename the member of the "branch_chain" structure
1192            called "current" as "current_branch", to prevent a collision with the
1193            Linux macro when compiled as a kernel module.
1194    
1195        (c) In pcre_study(), rename the function set_bit() as set_table_bit(), to
1196            prevent a collision with the Linux macro when compiled as a kernel
1197            module.
1198    
1199    8.  In pcre_compile() there are some checks for integer overflows that used to
1200        cast potentially large values to (double). This has been changed so that
1201        when building, a check for int64_t is made, and if it is found, it is used
1202        instead, thus avoiding the use of floating point arithmetic. (There is no
1203        other use of FP in PCRE.) If int64_t is not found, the fallback is to
1204        double.
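
    A sketch of an overflow check done in 64-bit integer arithmetic
    rather than floating point is shown below. The function name and
    the quantity being checked are hypothetical, and the sketch assumes
    that int is narrower than 64 bits.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if item_length * repeat_count does not fit in an int.
   The product is formed in int64_t, so it cannot itself overflow when
   int is narrower than 64 bits. */
int demo_repeat_overflows(int item_length, int repeat_count)
{
  int64_t total = (int64_t)item_length * (int64_t)repeat_count;
  return total > INT_MAX;
}
```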
1205    
1206    9.  Added two casts to avoid signed/unsigned warnings from VS Studio Express
1207        2005 (difference between two addresses compared to an unsigned value).
1208    
1209    10. Change the standard AC_CHECK_LIB test for libbz2 in configure.ac to a
1210        custom one, because of the following reported problem in Windows:
1211    
1212          - libbz2 uses the Pascal calling convention (WINAPI) for the functions
1213              under Win32.
1214          - The standard autoconf AC_CHECK_LIB fails to include "bzlib.h",
1215              therefore missing the function definition.
1216          - The compiler thus generates a "C" signature for the test function.
1217          - The linker fails to find the "C" function.
1218          - PCRE fails to configure if asked to do so against libbz2.
1219    
1220    11. When running libtoolize from libtool-2.2.6b as part of autogen.sh, these
1221        messages were output:
1222    
1223          Consider adding `AC_CONFIG_MACRO_DIR([m4])' to configure.ac and
1224          rerunning libtoolize, to keep the correct libtool macros in-tree.
1225          Consider adding `-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
1226    
1227        I have done both of these things.
1228    
1229    12. Although pcre_dfa_exec() does not use nearly as much stack as pcre_exec()
1230        most of the time, it *can* run out if it is given a pattern that contains a
1231        runaway infinite recursion. I updated the discussion in the pcrestack man
1232        page.
1233    
1234    13. Now that we have gone to the x.xx style of version numbers, the minor
1235        version may start with zero. Using 08 or 09 is a bad idea because users
1236        might check the value of PCRE_MINOR in their code, and 08 or 09 may be
1237        interpreted as invalid octal numbers. I've updated the previous comment in
1238        configure.ac, and also added a check that gives an error if 08 or 09 are
1239        used.
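
    The octal problem can be seen in a few lines of C. The macro names
    are made up for the example and are not the real PCRE_MINOR.

```c
/* A leading zero makes a C integer constant octal, so 010 has the
   value 8. Writing 08 or 09 here would not compile, because 8 and 9
   are not octal digits. */
#define DEMO_MINOR_OCTAL 010
#define DEMO_MINOR_PLAIN 10

int demo_minor_octal(void) { return DEMO_MINOR_OCTAL; }
int demo_minor_plain(void) { return DEMO_MINOR_PLAIN; }
```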
1240    
1241    14. Change 8.00/11 was not quite complete: code had been accidentally omitted,
1242        causing partial matching to fail when the end of the subject matched \W
1243        in a UTF-8 pattern where \W was quantified with a minimum of 3.
1244    
1245    15. There were some discrepancies between the declarations in pcre_internal.h
1246        of _pcre_is_newline(), _pcre_was_newline(), and _pcre_valid_utf8() and
1247        their definitions. The declarations used "const uschar *" and the
1248        definitions used USPTR. Even though USPTR is normally defined as "const
1249        unsigned char *" (and uschar is typedeffed as "unsigned char"), it was
1250        reported that: "This difference in casting confuses some C++ compilers, for
1251        example, SunCC recognizes above declarations as different functions and
1252        generates broken code for hbpcre." I have changed the declarations to use
1253        USPTR.
1254    
1255    16. GNU libtool is named differently on some systems. The autogen.sh script now
1256        tries several variants such as glibtoolize (MacOSX) and libtoolize1x
1257        (FreeBSD).
1258    
1259    17. Applied Craig's patch that fixes an HP aCC compile error in pcre 8.00
1260        (strtoXX undefined when compiling pcrecpp.cc). The patch contains this
1261        comment: "Figure out how to create a longlong from a string: strtoll and
1262        equivalent. It's not enough to call AC_CHECK_FUNCS: hpux has a strtoll, for
1263        instance, but it only takes 2 args instead of 3!"
1264    
1265    18. A subtle bug concerned with back references has been fixed by a change of
1266        specification, with a corresponding code fix. A pattern such as
1267        ^(xa|=?\1a)+$ which contains a back reference inside the group to which it
1268        refers, was giving matches when it shouldn't. For example, xa=xaaa would
1269        match that pattern. Interestingly, Perl (at least up to 5.11.3) has the
1270        same bug. Such groups have to be quantified to be useful, or contained
1271        inside another quantified group. (If there's no repetition, the reference
1272        can never match.) The problem arises because, having left the group and
1273        moved on to the rest of the pattern, a later failure that backtracks into
1274        the group uses the captured value from the final iteration of the group
1275        rather than the correct earlier one. I have fixed this in PCRE by forcing
1276        any group that contains a reference to itself to be an atomic group; that
1277        is, there cannot be any backtracking into it once it has completed. This is
1278        similar to recursive and subroutine calls.
1279    
1280    
1281    Version 8.00 19-Oct-09
1282    ----------------------
1283    
1284    1.  The table for translating pcre_compile() error codes into POSIX error codes
1285        was out-of-date, and there was no check on the pcre_compile() error code
1286        being within the table. This could lead to an OK return being given in
1287        error.
1288    
1289    2.  Changed the call to open a subject file in pcregrep from fopen(pathname,
1290        "r") to fopen(pathname, "rb"), which fixed a problem with some of the tests
1291        in a Windows environment.
1292    
1293    3.  The pcregrep --count option prints the count for each file even when it is
1294        zero, as does GNU grep. However, pcregrep was also printing all files when
1295        --files-with-matches was added. Now, when both options are given, it prints
1296        counts only for those files that have at least one match. (GNU grep just
1297        prints the file name in this circumstance, but including the count seems
1298        more useful - otherwise, why use --count?) Also ensured that the
1299        combination -clh just lists non-zero counts, with no names.
1300    
1301    4.  The long form of the pcregrep -F option was incorrectly implemented as
1302        --fixed_strings instead of --fixed-strings. This is an incompatible change,
1303        but it seems right to fix it, and I didn't think it was worth preserving
1304        the old behaviour.
1305    
1306    5.  The command line items --regex=pattern and --regexp=pattern were not
1307        recognized by pcregrep, which required --regex pattern or --regexp pattern
1308        (with a space rather than an '='). The man page documented the '=' forms,
1309        which are compatible with GNU grep; these now work.
1310    
1311    6.  No libpcreposix.pc file was created for pkg-config; there was just
1312        libpcre.pc and libpcrecpp.pc. The omission has been rectified.
1313    
1314    7.  Added #ifndef SUPPORT_UCP into the pcre_ucd.c module, to reduce its size
1315        when UCP support is not needed, by modifying the Python script that
1316        generates it from Unicode data files. This should not matter if the module
1317        is correctly used as a library, but I received one complaint about 50K of
1318        unwanted data. My guess is that the person linked everything into his
1319        program rather than using a library. Anyway, it does no harm.
1320    
1321    8.  A pattern such as /\x{123}{2,2}+/8 was incorrectly compiled; the trigger
1322        was a minimum greater than 1 for a wide character in a possessive
1323        repetition. The same bug could also affect patterns like /(\x{ff}{0,2})*/8
1324        which had an unlimited repeat of a nested, fixed maximum repeat of a wide
1325        character. Chaos in the form of incorrect output or a compiling loop could
1326        result.
1327    
1328    9.  The restrictions on what a pattern can contain when partial matching is
1329        requested for pcre_exec() have been removed. All patterns can now be
1330        partially matched by this function. In addition, if there are at least two
1331        slots in the offset vector, the offset of the earliest inspected character
1332        for the match and the offset of the end of the subject are set in them when
1333        PCRE_ERROR_PARTIAL is returned.
1334    
1335    10. Partial matching has been split into two forms: PCRE_PARTIAL_SOFT, which is
1336        synonymous with PCRE_PARTIAL, for backwards compatibility, and
1337        PCRE_PARTIAL_HARD, which causes a partial match to supersede a full match,
1338        and may be more useful for multi-segment matching.
1339    
1340    11. Partial matching with pcre_exec() is now more intuitive. A partial match
1341        used to be given if ever the end of the subject was reached; now it is
1342        given only if matching could not proceed because another character was
1343        needed. This makes a difference in some odd cases such as Z(*FAIL) with the
1344        string "Z", which now yields "no match" instead of "partial match". In the
1345        case of pcre_dfa_exec(), "no match" is given if every matching path for the
1346        final character ended with (*FAIL).
1347    
1348    12. Restarting a match using pcre_dfa_exec() after a partial match did not work
1349        if the pattern had a "must contain" character that was already found in the
1350        earlier partial match, unless partial matching was again requested. For
1351        example, with the pattern /dog.(body)?/, the "must contain" character is
1352        "g". If the first part-match was for the string "dog", restarting with
1353        "sbody" failed. This bug has been fixed.
1354    
1355    13. The string returned by pcre_dfa_exec() after a partial match has been
1356        changed so that it starts at the first inspected character rather than the
1357        first character of the match. This makes a difference only if the pattern
1358        starts with a lookbehind assertion or \b or \B (\K is not supported by
1359        pcre_dfa_exec()). It's an incompatible change, but it makes the two
1360        matching functions compatible, and I think it's the right thing to do.
1361    
1362    14. Added a pcredemo man page, created automatically from the pcredemo.c file,
1363        so that the demonstration program is easily available in environments where
1364        PCRE has not been installed from source.
1365    
1366    15. Arranged to add -DPCRE_STATIC to cflags in libpcre.pc, libpcreposix.pc,
1367        libpcrecpp.pc and pcre-config when PCRE is not compiled as a shared
1368        library.
1369    
1370    16. Added REG_UNGREEDY to the pcreposix interface, at the request of a user.
1371        It maps to PCRE_UNGREEDY. It is not, of course, POSIX-compatible, but it
1372        is not the first non-POSIX option to be added. Clearly some people find
1373        these options useful.
1374    
1375    17. If a caller to the POSIX matching function regexec() passes a non-zero
1376        value for nmatch with a NULL value for pmatch, the value of
1377        nmatch is forced to zero.
1378    
1379    18. RunGrepTest did not have a test for the availability of the -u option of
1380        the diff command, as RunTest does. It now checks in the same way as
1381        RunTest, and also checks for the -b option.
1382    
1383    19. If an odd number of negated classes containing just a single character
1384        interposed, within parentheses, between a forward reference to a named
1385        subpattern and the definition of the subpattern, compilation crashed with
1386        an internal error, complaining that it could not find the referenced
1387        subpattern. An example of a crashing pattern is /(?&A)(([^m])(?<A>))/.
1388        [The bug was that it was starting one character too far in when skipping
1389        over the character class, thus treating the ] as data rather than
1390        terminating the class. This meant it could skip too much.]
1391    
1392    20. Added PCRE_NOTEMPTY_ATSTART in order to be able to correctly implement the
1393        /g option in pcretest when the pattern contains \K, which makes it possible
1394        to have an empty string match not at the start, even when the pattern is
1395        anchored. Updated pcretest and pcredemo to use this option.
1396    
1397    21. If the maximum number of capturing subpatterns in a recursion was greater
1398        than the maximum at the outer level, the higher number was returned, but
1399        with unset values at the outer level. The correct (outer level) value is
1400        now given.
1401    
1402    22. If (*ACCEPT) appeared inside capturing parentheses, previous releases of
1403        PCRE did not set those parentheses (unlike Perl). I have now found a way to
1404        make it do so. The string so far is captured, making this feature
1405        compatible with Perl.
1406    
1407    23. The tests have been re-organized, adding tests 11 and 12, to make it
1408        possible to check the Perl 5.10 features against Perl 5.10.
1409    
1410    24. Perl 5.10 allows subroutine calls in lookbehinds, as long as the subroutine
1411        pattern matches a fixed length string. PCRE did not allow this; now it
1412        does. Neither allows recursion.
1413    
1414    25. I finally figured out how to implement a request to provide the minimum
1415        length of subject string that was needed in order to match a given pattern.
1416        (It was back references and recursion that I had previously got hung up
1417        on.) This code has now been added to pcre_study(); it finds a lower bound
1418        to the length of subject needed. It is not necessarily the greatest lower
1419        bound, but using it to avoid searching strings that are too short does give
1420        some useful speed-ups. The value is available to calling programs via
1421        pcre_fullinfo().
1422    
1423    26. While implementing 25, I discovered to my embarrassment that pcretest had
1424        not been passing the result of pcre_study() to pcre_dfa_exec(), so the
1425        study optimizations had never been tested with that matching function.
1426        Oops. What is worse, even when it was passed study data, there was a bug in
1427        pcre_dfa_exec() that meant it never actually used it. Double oops. There
1428        were also very few tests of studied patterns with pcre_dfa_exec().
1429    
1430    27. If (?| is used to create subpatterns with duplicate numbers, they are now
1431        allowed to have the same name, even if PCRE_DUPNAMES is not set. However,
1432        on the other side of the coin, they are no longer allowed to have different
1433        names, because these cannot be distinguished in PCRE, and this has caused
1434        confusion. (This is a difference from Perl.)
1435    
1436    28. When duplicate subpattern names are present (necessarily with different
1437        numbers, as required by 27 above), and a test is made by name in a
1438        conditional pattern, either for a subpattern having been matched, or for
1439        recursion in such a pattern, all the associated numbered subpatterns are
1440        tested, and the overall condition is true if the condition is true for any
1441        one of them. This is the way Perl works, and is also more like the way
1442        testing by number works.
1443    
1444    
1445    Version 7.9 11-Apr-09
1446    ---------------------
1447    
1448    1.  When building with support for bzlib/zlib (pcregrep) and/or readline
1449        (pcretest), all targets were linked against these libraries. This included
1450        libpcre, libpcreposix, and libpcrecpp, even though they do not use these
1451        libraries. This caused unwanted dependencies to be created. This problem
1452        has been fixed, and now only pcregrep is linked with bzlib/zlib and only
1453        pcretest is linked with readline.
1454    
1455    2.  The "typedef int BOOL" in pcre_internal.h that was included inside the
1456        "#ifndef FALSE" condition by an earlier change (probably 7.8/18) has been
1457        moved outside it again, because FALSE and TRUE are already defined in AIX,
1458        but BOOL is not.
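
    The arrangement described can be sketched like this; it is a
    simplified stand-in for the lines in pcre_internal.h, and
    demo_is_true is a hypothetical helper.

```c
/* BOOL is defined unconditionally, because a system header may have
   defined FALSE and TRUE without defining BOOL. */
typedef int BOOL;

#ifndef FALSE
#define FALSE 0
#define TRUE  1
#endif

/* Uses both the type and the constants, whoever defined them. */
BOOL demo_is_true(void)
{
  return TRUE;
}
```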
1459    
1460    3.  The pcre_config() function was treating the PCRE_MATCH_LIMIT and
1461        PCRE_MATCH_LIMIT_RECURSION values as ints, when they should be long ints.
1462    
1463    4.  The pcregrep documentation said spaces were inserted as well as colons (or
1464        hyphens) following file names and line numbers when outputting matching
1465        lines. This is not true; no spaces are inserted. I have also clarified the
1466        wording for the --colour (or --color) option.
1467    
1468    5.  In pcregrep, when --colour was used with -o, the list of matching strings
1469        was not coloured; this is different to GNU grep, so I have changed it to be
1470        the same.
1471    
1472    6.  When --colo(u)r was used in pcregrep, only the first matching substring in
1473        each matching line was coloured. Now it goes on to look for further matches
1474        of any of the test patterns, which is the same behaviour as GNU grep.
1475    
1476    7.  A pattern that could match an empty string could cause pcregrep to loop; it
1477        doesn't make sense to accept an empty string match in pcregrep, so I have
1478        locked it out (using PCRE's PCRE_NOTEMPTY option). By experiment, this
1479        seems to be how GNU grep behaves.
1480    
1481    8.  The pattern (?(?=.*b)b|^) was incorrectly compiled as "match must be at
1482        start or after a newline", because the conditional assertion was not being
1483        correctly handled. The rule now is that both the assertion and what follows
1484        in the first alternative must satisfy the test.
1485    
1486    9.  If auto-callout was enabled in a pattern with a conditional group whose
1487        condition was an assertion, PCRE could crash during matching, both with
1488        pcre_exec() and pcre_dfa_exec().
1489    
1490    10. The PCRE_DOLLAR_ENDONLY option was not working when pcre_dfa_exec() was
1491        used for matching.
1492    
1493    11. Unicode property support in character classes was not working for
1494        characters (bytes) greater than 127 when not in UTF-8 mode.
1495    
1496    12. Added the -M command line option to pcretest.
1497    
1498    14. Added the non-standard REG_NOTEMPTY option to the POSIX interface.
1499    
1500    15. Added the PCRE_NO_START_OPTIMIZE match-time option.
1501    
1502    16. Added comments and documentation about mis-use of no_arg in the C++
1503        wrapper.
1504    
1505    17. Implemented support for UTF-8 encoding in EBCDIC environments, a patch
1506        from Martin Jerabek that uses macro names for all relevant character and
1507        string constants.
1508    
1509    18. Added to pcre_internal.h two configuration checks: (a) If both EBCDIC and
1510        SUPPORT_UTF8 are set, give an error; (b) If SUPPORT_UCP is set without
1511        SUPPORT_UTF8, define SUPPORT_UTF8. The "configure" script handles both of
1512        these, but not everybody uses configure.
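
    A sketch of the two checks, using DEMO_ prefixed stand-ins for the
    real macro names so that it can be compiled on its own:

```c
/* Pretend the builder asked for Unicode properties but not UTF-8. */
#define DEMO_SUPPORT_UCP 1

/* (a) The two settings are mutually exclusive: stop the build. */
#if defined DEMO_EBCDIC && defined DEMO_SUPPORT_UTF8
#error "DEMO_EBCDIC and DEMO_SUPPORT_UTF8 cannot both be set"
#endif

/* (b) Property support needs UTF-8 support: switch it on. */
#if defined DEMO_SUPPORT_UCP && !defined DEMO_SUPPORT_UTF8
#define DEMO_SUPPORT_UTF8 1
#endif

/* Reports whether UTF-8 support ended up enabled. */
int demo_utf8_enabled(void)
{
#ifdef DEMO_SUPPORT_UTF8
  return 1;
#else
  return 0;
#endif
}
```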
1513    
1514    19. A conditional group that had only one branch was not being correctly
1515        recognized as an item that could match an empty string. This meant that an
1516        enclosing group might also not be so recognized, causing infinite looping
1517        (and probably a segfault) for patterns such as ^"((?(?=[a])[^"])|b)*"$
1518        with the subject "ab", where knowledge that the repeated group can match
1519        nothing is needed in order to break the loop.
1520    
1521    20. If a pattern that was compiled with callouts was matched using
1522        pcre_dfa_exec(), but without supplying a callout function, matching went wrong.
1523    
1524    21. If PCRE_ERROR_MATCHLIMIT occurred during a recursion, there was a memory
1525        leak if the size of the offset vector was greater than 30. When the vector
1526        is smaller, the saved offsets during recursion go onto a local stack
1527        vector, but for larger vectors malloc() is used. It was failing to free
1528        when the recursion yielded PCRE_ERROR_MATCHLIMIT (or any other "abnormal"
1529        error, in fact).
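
    The pattern behind the fix, freeing the saved offsets on every
    return path, can be sketched as below. This is not the code in
    pcre_exec.c; every name is hypothetical and the error values are
    stand-ins.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_STACK_SLOTS      30u
#define DEMO_ERROR_NOMEMORY   (-6)
#define DEMO_ERROR_MATCHLIMIT (-8)

/* Stand-in for a recursive match that hits the match limit. */
int demo_inner_hits_limit(void) { return DEMO_ERROR_MATCHLIMIT; }

/* A vector larger than the local stack save area. */
int demo_offsets[40];

/* Saves the offsets, runs the inner match, restores the offsets, and
   releases any heap block whatever the inner match returned. */
int demo_recurse(int *offsets, size_t count, int (*inner)(void))
{
  int stacksave[DEMO_STACK_SLOTS];
  int *save = stacksave;
  int rc;

  if (count > DEMO_STACK_SLOTS)
    {
    save = (int *)malloc(count * sizeof(int));
    if (save == NULL) return DEMO_ERROR_NOMEMORY;
    }
  memcpy(save, offsets, count * sizeof(int));

  rc = inner();

  memcpy(offsets, save, count * sizeof(int));
  if (save != stacksave) free(save);   /* also reached on error returns */
  return rc;
}
```

    Keeping a single exit after the call to inner() is one way to make
    sure the free is not skipped for "abnormal" return values.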
1530    
1531    22. There was a missing #ifdef SUPPORT_UTF8 round one of the variables in the
1532        heapframe that is used only when UTF-8 support is enabled. This caused no
1533        problem, but was untidy.
1534    
1535    23. Steven Van Ingelgem's patch to CMakeLists.txt to change the name
1536        CMAKE_BINARY_DIR to PROJECT_BINARY_DIR so that it works when PCRE is
1537        included within another project.
1538    
1539    24. Steven Van Ingelgem's patches to add more options to the CMake support,
1540        slightly modified by me:
1541    
1542          (a) PCRE_BUILD_TESTS can be set OFF not to build the tests, including
1543              not building pcregrep.
1544    
1545          (b) PCRE_BUILD_PCREGREP can be set OFF not to build pcregrep, but only
1546              if PCRE_BUILD_TESTS is also set OFF, because the tests use pcregrep.
1547    
1548    25. Forward references, both numeric and by name, in patterns that made use of
1549        duplicate group numbers, could behave incorrectly or give incorrect errors,
1550        because when scanning forward to find the reference group, PCRE was not
1551        taking into account the duplicate group numbers. A pattern such as
1552        ^X(?3)(a)(?|(b)|(q))(Y) is an example.
1553    
1554    26. Changed a few more instances of "const unsigned char *" to USPTR, making
1555        the feature of a custom pointer more pervasive (as requested by a user).
1556    
1557    27. Wrapped the definitions of fileno and isatty for Windows, which appear in
1558        pcretest.c, inside #ifndefs, because it seems they are sometimes already
1559        pre-defined.
1560    
1561    28. Added support for (*UTF8) at the start of a pattern.
1562    
1563    29. Arrange for flags added by the "release type" setting in CMake to be shown
1564        in the configuration summary.
1565    
1566    
1567    Version 7.8 05-Sep-08
1568    ---------------------
1569    
1570    1.  Replaced UCP searching code with optimized version as implemented for Ad
1571        Muncher (http://www.admuncher.com/) by Peter Kankowski. This uses a two-
1572        stage table and inline lookup instead of a function, giving speed ups of 2
1573        to 5 times on some simple patterns that I tested. Permission was given to
1574        distribute the MultiStage2.py script that generates the tables (it's not in
1575        the tarball, but is in the Subversion repository).
1576    
1577    2.  Updated the Unicode datatables to Unicode 5.1.0. This adds yet more
1578        scripts.
1579    
1580    3.  Change 12 for 7.7 introduced a bug in pcre_study() when a pattern contained
1581        a group with a zero quantifier. The result of the study could be incorrect,
1582        or the function might crash, depending on the pattern.
1583    
1584    4.  Caseless matching was not working for non-ASCII characters in back
1585        references. For example, /(\x{de})\1/8i was not matching \x{de}\x{fe}.
1586        It now works when Unicode Property Support is available.
1587    
1588    5.  In pcretest, an escape such as \x{de} in the data was always generating
1589        a UTF-8 string, even in non-UTF-8 mode. Now it generates a single byte in
1590        non-UTF-8 mode. If the value is greater than 255, it gives a warning about
1591        truncation.
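
    The difference between the two modes can be sketched as follows.
    This is not pcretest's code: the names are hypothetical, the
    truncation warning is left out, and values are assumed not to
    exceed 0x10ffff.

```c
/* Scratch buffer, large enough for one encoded value. */
unsigned char demo_buffer[4];

/* Stores one escape value in out and returns the number of bytes:
   a single (possibly truncated) byte in non-UTF-8 mode, otherwise the
   UTF-8 encoding of the value. */
int demo_encode_escape(unsigned int value, int utf8, unsigned char *out)
{
  if (!utf8)
    {
    out[0] = (unsigned char)(value & 0xff);  /* values above 255 truncate */
    return 1;
    }
  if (value < 0x80)
    {
    out[0] = (unsigned char)value;
    return 1;
    }
  if (value < 0x800)
    {
    out[0] = (unsigned char)(0xc0 | (value >> 6));
    out[1] = (unsigned char)(0x80 | (value & 0x3f));
    return 2;
    }
  if (value < 0x10000)
    {
    out[0] = (unsigned char)(0xe0 | (value >> 12));
    out[1] = (unsigned char)(0x80 | ((value >> 6) & 0x3f));
    out[2] = (unsigned char)(0x80 | (value & 0x3f));
    return 3;
    }
  out[0] = (unsigned char)(0xf0 | ((value >> 18) & 0x07));
  out[1] = (unsigned char)(0x80 | ((value >> 12) & 0x3f));
  out[2] = (unsigned char)(0x80 | ((value >> 6) & 0x3f));
  out[3] = (unsigned char)(0x80 | (value & 0x3f));
  return 4;
}
```

    For \x{de} this gives the two bytes 0xc3 0x9e in UTF-8 mode and the
    single byte 0xde otherwise.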
1592    
1593    6.  Minor bugfix in pcrecpp.cc (change "" == ... to NULL == ...).
1594    
1595    7.  Added two (int) casts to pcregrep when printing the difference of two
1596        pointers, in case they are 64-bit values.
1597    
1598    8.  Added comments about Mac OS X stack usage to the pcrestack man page and to
1599        test 2 if it fails.
1600    
1601    9.  Added PCRE_CALL_CONVENTION just before the names of all exported functions,
1602        and a #define of that name to empty if it is not externally set. This is to
1603        allow users of MSVC to set it if necessary.
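
    A sketch of the technique, with hypothetical DEMO_ and demo_ names
    in place of the real macro and functions:

```c
/* Empty unless the builder sets it, for example to a calling
   convention keyword on the compiler command line. */
#ifndef DEMO_CALL_CONVENTION
#define DEMO_CALL_CONVENTION
#endif

/* The macro sits just before the function name; when it is empty this
   is an ordinary definition. */
int DEMO_CALL_CONVENTION demo_exported_function(int x)
{
  return x + 1;
}
```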
1604    
1605    10. The PCRE_EXP_DEFN macro which precedes exported functions was missing from
1606        the convenience functions in the pcre_get.c source file.
1607    
1608    11. An option change at the start of a pattern that had top-level alternatives
1609        could cause overwriting and/or a crash. This command provoked a crash in
1610        some environments:
1611    
1612          printf "/(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/8\n" | pcretest
1613    
1614        This potential security problem was recorded as CVE-2008-2371.
1615    
1616    12. For a pattern where the match had to start at the beginning or immediately
1617        after a newline (e.g. /.*anything/ without the DOTALL flag), pcre_exec() and
1618        pcre_dfa_exec() could read past the end of the passed subject if there was
1619        no match. To help with detecting such bugs (e.g. with valgrind), I modified
1620        pcretest so that it places the subject at the end of its malloc-ed buffer.
1621    
1622    13. The change to pcretest in 12 above threw up a couple more cases when
1623        pcre_exec() might read past the end of the data buffer in UTF-8 mode.
1624    
1625    14. A similar bug to 7.3/2 existed when the PCRE_FIRSTLINE option was set and
1626        the data contained the byte 0x85 as part of a UTF-8 character within its
1627        first line. This applied both to normal and DFA matching.
1628    
1629    15. Lazy quantifiers were not working in some cases in UTF-8 mode. For example,
1630        /^[^d]*?$/8 failed to match "abc".
1631    
1632    16. Added a missing copyright notice to pcrecpp_internal.h.
1633    
1634    17. Make it more clear in the documentation that values returned from
1635        pcre_exec() in ovector are byte offsets, not character counts.
1636    
1637    18. Tidied a few places to stop certain compilers from issuing warnings.
1638    
1639    19. Updated the Virtual Pascal + BCC files to compile the latest v7.7, as
1640        supplied by Stefan Weber. I made a further small update for 7.8 because
1641        there is a change of source arrangements: the pcre_searchfuncs.c module is
1642        replaced by pcre_ucd.c.
1643    
1644    
1645    Version 7.7 07-May-08
1646    ---------------------
1647    
1648    1.  Applied Craig's patch to sort out a long long problem: "If we can't convert
1649        a string to a long long, pretend we don't even have a long long." This is
1650        done by checking for the strtoq, strtoll, and _strtoi64 functions.
1651    
1652    2.  Applied Craig's patch to pcrecpp.cc to restore ABI compatibility with
1653        pre-7.6 versions, which defined a global no_arg variable instead of putting
1654        it in the RE class. (See also #8 below.)
1655    
1656    3.  Remove a line of dead code, identified by coverity and reported by Nuno
1657        Lopes.
1658    
1659    4.  Fixed two related pcregrep bugs involving -r with --include or --exclude:
1660    
1661        (1) The include/exclude patterns were being applied to the whole pathnames
1662            of files, instead of just to the final components.
1663    
1664        (2) If there was more than one level of directory, the subdirectories were
1665            skipped unless they satisfied the include/exclude conditions. This is
1666            inconsistent with GNU grep (and could even be seen as contrary to the
1667            pcregrep specification - which I improved to make it absolutely clear).
1668            The action now is always to scan all levels of directory, and just
1669            apply the include/exclude patterns to regular files.
1670    
1671    5.  Added the --include_dir and --exclude_dir patterns to pcregrep, and used
1672        --exclude_dir in the tests to avoid scanning .svn directories.
1673    
1674    6.  Applied Craig's patch to the QuoteMeta function so that it escapes the
1675        NUL character as backslash + 0 rather than backslash + NUL, because PCRE
1676        doesn't support NULs in patterns.
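
A minimal sketch of the NUL handling in 6 above, written in Python rather than the C++ of pcrecpp. Only the NUL rule comes from this entry; the name quote_meta and the rule used for all other bytes (escape everything except ASCII letters, digits, underscore and bytes with the top bit set) are assumptions made for illustration.

```python
import string

# Bytes that the sketch leaves unescaped (an assumption, see above).
_SAFE = set((string.ascii_letters + string.digits + "_").encode("ascii"))


def quote_meta(unquoted: bytes) -> bytes:
    """Escape a byte string so that it can be used as a literal pattern."""
    out = bytearray()
    for byte in unquoted:
        if byte == 0:
            # Backslash followed by the digit zero, not by a NUL byte.
            out += b"\\0"
        elif byte in _SAFE or byte >= 0x80:
            out.append(byte)
        else:
            out += b"\\" + bytes([byte])
    return bytes(out)


print(quote_meta(b"a\x00b.c"))
```

The result contains no NUL byte, so it can be handed to a compile function that treats the pattern as a zero-terminated string.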
1677    
1678    7.  Added some missing "const"s to declarations of static tables in
1679        pcre_compile.c and pcre_dfa_exec.c.
1680    
1681    8.  Applied Craig's patch to pcrecpp.cc to fix a problem in OS X that was
1682        caused by fix #2 above. (Subsequently also a second patch to fix the
1683        first patch. And a third patch - this was a messy problem.)
1684    
1685    9.  Applied Craig's patch to remove the use of push_back().
1686    
1687    10. Applied Alan Lehotsky's patch to add REG_STARTEND support to the POSIX
1688        matching function regexec().
1689    
1690    11. Added support for the Oniguruma syntax \g<name>, \g<n>, \g'name', \g'n',
1691        which, however, unlike Perl's \g{...}, are subroutine calls, not back
1692        references. PCRE supports relative numbers with this syntax (I don't think
1693        Oniguruma does).
1694    
1695    12. Previously, a group with a zero repeat such as (...){0} was completely
1696        omitted from the compiled regex. However, this means that if the group
1697        was called as a subroutine from elsewhere in the pattern, things went wrong
1698        (an internal error was given). Such groups are now left in the compiled
1699        pattern, with a new opcode that causes them to be skipped at execution
1700        time.
1701    
1702    13. Added the PCRE_JAVASCRIPT_COMPAT option. This makes the following changes
1703        to the way PCRE behaves:
1704    
1705        (a) A lone ] character is disallowed (Perl treats it as data).
1706    
1707        (b) A back reference to an unmatched subpattern matches an empty string
1708            (Perl fails the current match path).
1709    
1710        (c) A data ] in a character class must be notated as \] because if the
1711            first data character in a class is ], it defines an empty class. (In
1712            Perl it is not possible to have an empty class.) The empty class []
1713            never matches; it forces failure and is equivalent to (*FAIL) or (?!).
1714            The negative empty class [^] matches any one character, independently
1715            of the DOTALL setting.
1716    
1717    14. A pattern such as /(?2)[]a()b](abc)/ which had a forward reference to a
1718        non-existent subpattern following a character class starting with ']' and
1719        containing () gave an internal compiling error instead of "reference to
1720        non-existent subpattern". Fortunately, when the pattern did exist, the
1721        compiled code was correct. (When scanning forwards to check for the
1722        existence of the subpattern, it was treating the data ']' as terminating
1723        the class, so got the count wrong. When actually compiling, the reference
1724        was subsequently set up correctly.)
1725    
1726    15. The "always fail" assertion (?!) is optimized to (*FAIL) by pcre_compile;
1727        it was being rejected as not supported by pcre_dfa_exec(), even though
1728        other assertions are supported. I have made pcre_dfa_exec() support
1729        (*FAIL).
1730    
1731    16. The implementation of 13c above involved the invention of a new opcode,
1732        OP_ALLANY, which is like OP_ANY but doesn't check the /s flag. Since /s
1733        cannot be changed at match time, I realized I could make a small
1734        improvement to matching performance by compiling OP_ALLANY instead of
1735        OP_ANY for "." when DOTALL was set, and then removing the runtime tests
1736        on the OP_ANY path.
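
The idea in 16 above, moving the DOTALL decision from match time to compile time, can be sketched like this. It is a toy model in Python, not PCRE's code: the functions compile_dot and match_one are hypothetical, and a single "\n" stands in for whatever newline convention is in force.

```python
OP_ANY = "OP_ANY"        # matches any character except newline
OP_ALLANY = "OP_ALLANY"  # matches any character at all


def compile_dot(dotall: bool) -> str:
    """Choose the opcode for "." once, when the pattern is compiled."""
    return OP_ALLANY if dotall else OP_ANY


def match_one(opcode: str, ch: str) -> bool:
    """Match one character; neither path needs to consult DOTALL."""
    if opcode == OP_ALLANY:
        return True
    if opcode == OP_ANY:
        return ch != "\n"
    raise ValueError("unexpected opcode: %r" % opcode)


print(match_one(compile_dot(dotall=True), "\n"))
print(match_one(compile_dot(dotall=False), "\n"))
```

Because the flag cannot change after compilation, the matcher only ever sees the opcode, which is where the small performance gain comes from.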
1737    
1738    17. Compiling pcretest on Windows with readline support failed without the
1739        following two fixes: (1) Make the unistd.h include conditional on
1740        HAVE_UNISTD_H; (2) #define isatty and fileno as _isatty and _fileno.
1741    
1742    18. Changed CMakeLists.txt and cmake/FindReadline.cmake to arrange for the
1743        ncurses library to be included for pcretest when ReadLine support is
1744        requested, but also to allow for it to be overridden. This patch came from
1745        Daniel Bergström.
1746    
1747    19. There was a typo in the file ucpinternal.h where f0_rangeflag was defined
1748        as 0x00f00000 instead of 0x00800000. Luckily, this would not have caused
1749        any errors with the current Unicode tables. Thanks to Peter Kankowski for
1750        spotting this.
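
To show why the typo in 19 above could matter in principle, here is a small Python sketch. The two constants are the values quoted in the entry; the helper has_flag and the assumption that the flag is tested with a bitwise AND are illustrative only, since the way ucpinternal.h uses the value is not shown here.

```python
WRONG_RANGEFLAG = 0x00F00000  # value from the typo: four bits set
RIGHT_RANGEFLAG = 0x00800000  # intended value: one bit set


def has_flag(entry, flag):
    """Test a table entry against a flag mask with a bitwise AND."""
    return (entry & flag) != 0


# Bits that only the mistyped mask would react to.
extra_bits = WRONG_RANGEFLAG & ~RIGHT_RANGEFLAG
print(hex(extra_bits))

# A made-up entry with one of those extra bits set is misclassified.
print(has_flag(0x00100000, WRONG_RANGEFLAG), has_flag(0x00100000, RIGHT_RANGEFLAG))
```

Under that assumption, the two masks agree for every entry in which none of the three extra bits is set, which would be consistent with the typo being harmless for the tables of the time.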
1751    
1752    
1753    Version 7.6 28-Jan-08
1754    ---------------------
1755    
1756    1.  A character class containing a very large number of characters with
1757        codepoints greater than 255 (in UTF-8 mode, of course) caused a buffer
1758        overflow.
1759    
1760    2.  Patch to cut out the "long long" test in pcrecpp_unittest when
1761        HAVE_LONG_LONG is not defined.
1762    
1763    3.  Applied Christian Ehrlicher's patch to update the CMake build files to
1764        bring them up to date and include new features. This patch includes:
1765    
1766        - Fixed PH's badly added libz and libbz2 support.
1767        - Fixed a problem with static linking.
1768        - Added pcredemo. [But later removed - see 7 below.]
1769        - Fixed dftables problem and added an option.
1770        - Added a number of HAVE_XXX tests, including HAVE_WINDOWS_H and
1771            HAVE_LONG_LONG.
1772        - Added readline support for pcretest.
1773        - Added a listing of the option settings after cmake has run.
1774    
1775    4.  A user submitted a patch to Makefile that makes it easy to create
1776        "pcre.dll" under mingw when using Configure/Make. I added stuff to
1777        Makefile.am that causes it to include this special target, without
1778        affecting anything else. Note that the same mingw target plus all
1779        the other distribution libraries and programs are now supported
1780        when configuring with CMake (see 6 below) instead of with
1781        Configure/Make.
1782    
1783    5.  Applied Craig's patch that moves no_arg into the RE class in the C++ code.
1784        This is an attempt to solve the reported problem "pcrecpp::no_arg is not
1785        exported in the Windows port". It has not yet been confirmed that the patch
1786        solves the problem, but it does no harm.
1787    
1788    6.  Applied Sheri's patch to CMakeLists.txt to add NON_STANDARD_LIB_PREFIX and
1789        NON_STANDARD_LIB_SUFFIX for dll names built with mingw when configured
1790        with CMake, and also correct the comment about stack recursion.
1791    
1792    7.  Remove the automatic building of pcredemo from the ./configure system and
1793        from CMakeLists.txt. The whole idea of pcredemo.c is that it is an example
1794        of a program that users should build themselves after PCRE is installed, so
1795        building it automatically is not really right. What is more, it gave
1796        trouble in some build environments.
1797    
1798    8.  Further tidies to CMakeLists.txt from Sheri and Christian.
1799    
1800    
1801    Version 7.5 10-Jan-08
1802    ---------------------
1803    
1804    1.  Applied a patch from Craig: "This patch makes it possible to 'ignore'
1805        values in parens when parsing an RE using the C++ wrapper."
1806    
1807    2.  Negative specials like \S did not work in character classes in UTF-8 mode.
1808        Characters greater than 255 were excluded from the class instead of being
1809        included.
1810    
1811    3.  The same bug as (2) above applied to negated POSIX classes such as
1812        [:^space:].
1813    
1814    4.  PCRECPP_STATIC was referenced in pcrecpp_internal.h, but nowhere was it
1815        defined or documented. It seems to have been a typo for PCRE_STATIC, so
1816        I have changed it.
1817    
1818    5.  The construct (?&) was not diagnosed as a syntax error (it referenced the
1819        first named subpattern) and a construct such as (?&a) would reference the
1820        first named subpattern whose name started with "a" (in other words, the
1821        length check was missing). Both these problems are fixed. "Subpattern name
1822        expected" is now given for (?&) (a zero-length name), and this patch also
1823        makes it give the same error for \k'' (previously it complained that that
1824        was a reference to a non-existent subpattern).
1825    
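The missing length check described in 5 above can be illustrated with a short Python sketch. The two lookup functions and the list-of-names representation are hypothetical; PCRE's real name table is not modelled here.

```python
def find_group_buggy(names, ref):
    """Old behaviour: compare only the first len(ref) characters."""
    for number, name in enumerate(names, start=1):
        if name[:len(ref)] == ref:
            return number
    return None


def find_group_fixed(names, ref):
    """New behaviour: reject an empty name and compare lengths as well."""
    if not ref:
        raise ValueError("subpattern name expected")
    for number, name in enumerate(names, start=1):
        if len(name) == len(ref) and name[:len(ref)] == ref:
            return number
    return None


names = ["abc", "a"]
print(find_group_buggy(names, "a"))  # picks "abc", the first name starting with "a"
print(find_group_fixed(names, "a"))  # picks the group really called "a"
```

With an empty reference the prefix comparison is trivially true for every name, which is why (?&) used to resolve to the first named subpattern.
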
1826    6.  The erroneous patterns (?+-a) and (?-+a) give different error messages;
1827        this is right because (?- can be followed by option settings as well as by
1828        digits. I have, however, made the messages clearer.
1829    
1830    7.  Patterns such as (?(1)a|b) (a pattern that contains fewer subpatterns
1831        than the number used in the conditional) now cause a compile-time error.
1832        This is actually not compatible with Perl, which accepts such patterns, but
1833        treats the conditional as always being FALSE (as PCRE used to), but it
1834        seems to me that giving a diagnostic is better.
1835    
1836    8.  Change "alphameric" to the more common word "alphanumeric" in comments
1837        and messages.
1838    
1839    9.  Fix two occurrences of "backslash" in comments that should have been
1840        "backspace".
1841    
1842    10. Remove two redundant lines of code that can never be obeyed (their function
1843        was moved elsewhere).
1844    
1845    11. The program that makes PCRE's Unicode character property table had a bug
1846        which caused it to generate incorrect table entries for sequences of
1847        characters that have the same character type, but are in different scripts.
1848        It amalgamated them into a single range, with the script of the first of
1849        them. In other words, some characters were in the wrong script. There were
1850        thirteen such cases, affecting characters in the following ranges:
1851    
1852          U+002b0 - U+002c1
1853          U+0060c - U+0060d
1854          U+0061e - U+00612
1855          U+0064b - U+0065e
1856          U+0074d - U+0076d
1857          U+01800 - U+01805
1858          U+01d00 - U+01d77
1859          U+01d9b - U+01dbf
1860          U+0200b - U+0200f
1861          U+030fc - U+030fe
1862          U+03260 - U+0327f
1863          U+0fb46 - U+0fbb1
1864          U+10450 - U+1049d
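
The amalgamation bug in 11 above comes down to which fields are compared before two adjacent characters are merged into one range. A minimal Python sketch follows; the function build_ranges and the tuple layout are hypothetical, and the sample data in the usage lines is made up rather than taken from the Unicode tables.

```python
def build_ranges(chars):
    """Merge consecutive code points into (start, end, type, script) ranges.

    chars is an iterable of (codepoint, char_type, script) tuples sorted
    by code point. A character joins the previous range only if both its
    type and its script match; the bug was to compare the type alone.
    """
    ranges = []
    for codepoint, char_type, script in chars:
        if ranges:
            start, end, rtype, rscript = ranges[-1]
            if codepoint == end + 1 and char_type == rtype and script == rscript:
                ranges[-1] = (start, codepoint, rtype, rscript)
                continue
        ranges.append((codepoint, codepoint, char_type, script))
    return ranges


# Made-up sample: same type throughout, but the script changes at 0x12.
sample = [(0x10, "Lm", "ScriptA"), (0x11, "Lm", "ScriptA"), (0x12, "Lm", "ScriptB")]
print(build_ranges(sample))
```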
1865    
1866    12. The -o option (show only the matching part of a line) for pcregrep was not
1867        compatible with GNU grep in that, if there was more than one match in a
1868        line, it showed only the first of them. It now behaves in the same way as
1869        GNU grep.
1870    
1871    13. If the -o and -v options were combined for pcregrep, it printed a blank
1872        line for every non-matching line. GNU grep prints nothing, and pcregrep now
1873        does the same. The return code can be used to tell if there were any
1874        non-matching lines.
1875    
1876    14. Added --file-offsets and --line-offsets to pcregrep.
1877    
1878    15. The pattern (?=something)(?R) was not being diagnosed as a potentially
1879        infinitely looping recursion. The bug was that positive lookaheads were not
1880        being skipped when checking for a possible empty match (negative lookaheads
1881        and both kinds of lookbehind were skipped).
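
A rough Python model of the check in 15 above. The representation of a pattern as a list of (kind, minimum length) pairs and the name could_be_empty are hypothetical; the only point taken from the entry is that all four kinds of assertion must be skipped, and that positive lookahead had been left out.

```python
# All assertion kinds match without consuming characters. The bug was
# equivalent to leaving "lookahead" out of this set.
ASSERTIONS = {"lookahead", "neg_lookahead", "lookbehind", "neg_lookbehind"}


def could_be_empty(items):
    """Return True if the items before a recursion may consume nothing."""
    for kind, min_length in items:
        if kind in ASSERTIONS:
            continue  # skipped: an assertion never advances the match
        if min_length > 0:
            return False
    return True


# Models (?=something)(?R): nothing is consumed before the recursion.
print(could_be_empty([("lookahead", 9)]))
# Models a(?R): one character is consumed first, so no infinite loop.
print(could_be_empty([("literal", 1)]))
```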
1882    
1883    16. Fixed two typos in the Windows-only code in pcregrep.c, and moved the
1884        inclusion of <windows.h> to before rather than after the definition of
1885        INVALID_FILE_ATTRIBUTES (patch from David Byron).
1886    
1887    17. Specifying a possessive quantifier with a specific limit for a Unicode
1888        character property caused pcre_compile() to compile bad code, which led at
1889        runtime to PCRE_ERROR_INTERNAL (-14). Examples of patterns that caused this
1890        are: /\p{Zl}{2,3}+/8 and /\p{Cc}{2}+/8. It was the possessive "+" that
1891        caused the error; without that there was no problem.
1892    
1893    18. Added --enable-pcregrep-libz and --enable-pcregrep-libbz2.
1894    
1895    19. Added --enable-pcretest-libreadline.
1896    
1897    20. In pcrecpp.cc, the variable 'count' was incremented twice in
1898        RE::GlobalReplace(). As a result, the number of replacements returned was
1899        double what it should be. I removed one of the increments, but Craig sent a
1900        later patch that removed the other one (the right fix) and added unit tests
1901        that check the return values (which was not done before).
1902    
1903    21. Several CMake things:
1904    
1905        (1) Arranged that, when cmake is used on Unix, the libraries end up with
1906            the names libpcre and libpcreposix, not just pcre and pcreposix.
1907    
1908        (2) The above change means that pcretest and pcregrep are now correctly
1909            linked with the newly-built libraries, not previously installed ones.
1910    
1911        (3) Added PCRE_SUPPORT_LIBREADLINE, PCRE_SUPPORT_LIBZ, PCRE_SUPPORT_LIBBZ2.
1912    
1913    22. In UTF-8 mode, with newline set to "any", a pattern such as .*a.*=.b.*
1914        crashed when matching a string such as a\x{2029}b (note that \x{2029} is a
1915        UTF-8 newline character). The key issue is that the pattern starts .*;
1916        this means that the match must be either at the beginning, or after a
1917        newline. The bug was in the code for advancing after a failed match and
1918        checking that the new position followed a newline. It was not taking
1919        account of UTF-8 characters correctly.
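
The check described in 22 above, "does the new position follow a newline?", has to look at the whole preceding character in UTF-8 mode, not just the preceding byte. A minimal Python sketch, assuming the position is on a character boundary in valid UTF-8; the helper names are hypothetical and the newline set is the one used for the "any" setting.

```python
# Code points treated as newlines when newline is set to "any".
NEWLINES = {0x0A, 0x0B, 0x0C, 0x0D, 0x85, 0x2028, 0x2029}


def previous_char(subject: bytes, pos: int) -> int:
    """Return the code point of the character that ends just before pos."""
    start = pos - 1
    # Step back over UTF-8 continuation bytes (binary 10xxxxxx).
    while start > 0 and (subject[start] & 0xC0) == 0x80:
        start -= 1
    return ord(subject[start:pos].decode("utf-8"))


def follows_newline(subject: bytes, pos: int) -> bool:
    """True if pos is preceded by a newline character."""
    return pos > 0 and previous_char(subject, pos) in NEWLINES


subject = "a\u2029b".encode("utf-8")  # U+2029 occupies three bytes
print(follows_newline(subject, 4))    # position of "b"
print(subject[3] in NEWLINES)         # what a byte-only check would see
```

The last line shows the kind of mistake a byte-oriented check makes: the final byte of U+2029 is not itself a newline code.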
1920    
1921    23. PCRE was behaving differently from Perl in the way it recognized POSIX
1922        character classes. PCRE was not treating the sequence [:...:] as a
1923        character class unless the ... were all letters. Perl, however, seems to
1924        allow any characters between [: and :], though of course it rejects as
1925        unknown any "names" that contain non-letters, because all the known class
1926        names consist only of letters. Thus, Perl gives an error for [[:1234:]],
1927        for example, whereas PCRE did not - it did not recognize a POSIX character
1928        class. This seemed a bit dangerous, so the code has been changed to be
1929        closer to Perl. The behaviour is not identical to Perl, because PCRE will
1930        diagnose an unknown class for, for example, [[:l\ower:]] where Perl will
1931        treat it as [[:lower:]]. However, PCRE does now give "unknown" errors where
1932        Perl does, and where it didn't before.
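
The changed recognition rule in 23 above can be sketched in Python. This is a simplification with a hypothetical function name: it accepts any characters between [: and :] as a candidate name and only then checks the name against the known classes, whereas the old behaviour amounted to requiring letters before treating the item as a POSIX class at all.

```python
import re

KNOWN_CLASSES = {
    "alpha", "lower", "upper", "alnum", "ascii", "blank", "cntrl",
    "digit", "graph", "print", "punct", "space", "word", "xdigit",
}


def parse_posix_class(text):
    """Parse a '[:name:]' item found inside a character class.

    Returns (name, negated), or None when text does not have the
    '[:' ... ':]' shape. Raises ValueError for an unknown name.
    """
    match = re.match(r"\[:(\^?)(.*?):\]", text, re.DOTALL)
    if match is None:
        return None
    negated = match.group(1) == "^"
    name = match.group(2)
    if name not in KNOWN_CLASSES:
        raise ValueError("unknown POSIX class name: %r" % name)
    return name, negated


print(parse_posix_class("[:^space:]"))
```

In this model both [:1234:] and [:l\ower:] are reported as unknown, matching the behaviour described for PCRE after the change.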
1933    
1934    24. Rewrite so as to remove the single use of %n from pcregrep because in some
1935        Windows environments %n is disabled by default.
1936    
1937    
1938    Version 7.4 21-Sep-07
