News about PCRE releases
------------------------

Release 8.33 28-April-2013
--------------------------

A number of bugs are fixed, and some performance improvements have been made.
There are also some new features, of which these are the most important:

. The behaviour of the backtracking verbs has been rationalized and
  documented in more detail.

. JIT now supports callouts and all of the backtracking verbs.

. Unicode validation has been updated in the light of Unicode Corrigendum #9,
  which points out that "non characters" are not "characters that may not
  appear in Unicode strings" but rather "characters that are reserved for
  internal use and have only local meaning".

. (*LIMIT_MATCH=d) and (*LIMIT_RECURSION=d) have been added so that the
  creator of a pattern can specify lower (but not higher) limits for the
  matching process.

. The PCRE_NEVER_UTF option is available to prevent pattern-writers from using
  the (*UTF) feature, as this could be a security issue.


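As a rough illustration of the Corrigendum #9 item in the 8.33 notes, the
following C sketch classifies code points. It is not PCRE's own code: the
names is_noncharacter() and is_valid_code_point() are made up for this note,
and the second function only assumes that validation now accepts
non-characters while still rejecting surrogates and values above 0x10ffff.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the 66 Unicode non-characters: U+FDD0..U+FDEF, plus the last two
   code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, ...). */
bool is_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF) return false;              /* not a code point at all */
    if (cp >= 0xFDD0 && cp <= 0xFDEF) return true;
    return (cp & 0xFFFE) == 0xFFFE;
}

/* Assumed post-Corrigendum behaviour (an assumption of this sketch):
   non-characters pass; surrogates and out-of-range values do not. */
bool is_valid_code_point(uint32_t cp)
{
    if (cp > 0x10FFFF) return false;
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    return true;
}
```

Under this sketch, is_valid_code_point(0xfffe) is true even though
is_noncharacter(0xfffe) is also true.

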
Release 8.32 30-November-2012
-----------------------------

This release fixes a number of bugs, but also has some new features. These are
the highlights:

. There is now support for 32-bit character strings and UTF-32. Like the
  16-bit support, this is done by compiling a separate 32-bit library.

. \X now matches a Unicode extended grapheme cluster.

. Case-independent matching of Unicode characters that have more than one
  "other case" now makes all three (or more) characters equivalent. This
  applies, for example, to Greek Sigma, which has two lowercase versions.

. Unicode character properties are updated to Unicode 6.2.0.

. The EBCDIC support, which had decayed, has had a spring clean.

. A number of JIT optimizations have been added, which give faster JIT
  execution speed. In addition, a new direct interface to JIT execution is
  available. This bypasses some of the sanity checks of pcre_exec() to give a
  noticeable speed-up.

. A number of issues in pcregrep have been fixed, making it more compatible
  with GNU grep. In particular, --exclude and --include (and variants) apply
  to all files now, not just those obtained from scanning a directory
  recursively. In Windows environments, the default action for directories is
  now "skip" instead of "read" (which provokes an error).

. If the --only-matching (-o) option in pcregrep is specified multiple
  times, each one causes appropriate output. For example, -o1 -o2 outputs the
  substrings matched by the 1st and 2nd capturing parentheses. A separating
  string can be specified by --om-separator (default empty).

. When PCRE is built via Autotools using a version of gcc that has the
  "visibility" feature, it is used to hide internal library functions that are
  not part of the public API.


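The Greek Sigma case mentioned in the 8.32 notes can be pictured as a set of
three code points that all compare equal when case is ignored. The C sketch
below shows only that idea. The names sigma_set, in_sigma_set() and
sigma_caseless_equal() are hypothetical, and PCRE's real tables are organised
differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical caseless set: capital sigma, small sigma, final small sigma. */
static const uint32_t sigma_set[] = { 0x03A3, 0x03C3, 0x03C2 };

/* Returns true if cp is one of the three sigma code points. */
bool in_sigma_set(uint32_t cp)
{
    for (size_t i = 0; i < sizeof sigma_set / sizeof sigma_set[0]; i++)
        if (sigma_set[i] == cp) return true;
    return false;
}

/* Caseless comparison restricted to this one set: any member of the set
   is treated as equal to any other member. */
bool sigma_caseless_equal(uint32_t a, uint32_t b)
{
    return a == b || (in_sigma_set(a) && in_sigma_set(b));
}
```

With a simple one-to-one "other case" mapping, capital sigma could pair with
only one of the two lowercase forms; treating the three as a set makes every
pairing match.

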
Release 8.31 06-July-2012
-------------------------

This is mainly a bug-fixing release, with a small number of developments:

. The JIT compiler now supports partial matching and the (*MARK) and
  (*COMMIT) verbs.

. PCRE_INFO_MAXLOOKBEHIND can be used to find the longest lookbehind in a
  pattern.

. There should be a performance improvement when using the heap instead of the
  stack for recursion.

. pcregrep can now be linked with libedit as an alternative to libreadline.

. pcregrep now has a --file-list option where the list of files to scan is
  given as a file.

. pcregrep now recognizes binary files and there are related options.

. The Unicode tables have been updated to 6.1.0.

As always, the full list of changes is in the ChangeLog file.


Release 8.30 04-February-2012
-----------------------------

Release 8.30 introduces a major new feature: support for 16-bit character
strings, compiled as a separate library. There are a few changes to the
8-bit library, in addition to some bug fixes.

. The pcre_info() function, which has been obsolete for over 10 years, has
  been removed.

. When a compiled pattern was saved to a file and later reloaded on a host
  with different endianness, PCRE used to swap the bytes in some of the data
  fields automatically. With the advent of the 16-bit library, where more of
  this swapping is needed, it is no longer done automatically. Instead, the
  mismatched endianness is detected and a specific error is given. The user
  can then call a new function called pcre_pattern_to_host_byte_order() (or
  an equivalent 16-bit function) to do the swap.

. In UTF-8 mode, the values 0xd800 to 0xdfff are not legal Unicode
  code points and are now faulted. (They are the so-called "surrogates"
  that are reserved for coding high values in UTF-16.)


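The detect-then-swap sequence described in the 8.30 endianness item can be
sketched in C as below. This is an illustration only: the struct saved_header
layout, the MAGIC tag and the function names are assumptions made for this
note, not PCRE's internal format or API. In PCRE itself the swap is done by
pcre_pattern_to_host_byte_order().

```c
#include <stdint.h>

#define MAGIC          0x50435245u   /* hypothetical tag, ASCII "PCRE" */
#define MAGIC_REVERSED 0x45524350u   /* the same tag with its bytes reversed */

enum check_result { HEADER_OK, HEADER_BAD_ENDIANNESS, HEADER_BAD_MAGIC };

struct saved_header {                /* hypothetical layout, not PCRE's */
    uint32_t magic;
    uint32_t size;
    uint32_t options;
};

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Detection step: report the mismatch instead of silently fixing it. */
enum check_result check_header(const struct saved_header *h)
{
    if (h->magic == MAGIC) return HEADER_OK;
    if (h->magic == MAGIC_REVERSED) return HEADER_BAD_ENDIANNESS;
    return HEADER_BAD_MAGIC;
}

/* Explicit swap step, called by the user after a mismatch is reported. */
void header_to_host_byte_order(struct saved_header *h)
{
    h->magic = swap32(h->magic);
    h->size = swap32(h->size);
    h->options = swap32(h->options);
}
```

A caller would run check_header() on a reloaded pattern and, only on
HEADER_BAD_ENDIANNESS, call header_to_host_byte_order() before using it.

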
Release 8.21 12-Dec-2011
------------------------

This is almost entirely a bug-fix release. The only new feature is the ability
to obtain the size of the memory used by the JIT compiler.


Release 8.20 21-Oct-2011
------------------------

The main change in this release is the inclusion of Zoltan Herczeg's
just-in-time compiler support, which can be accessed by building PCRE with
--enable-jit. Large performance benefits can be had in many situations. 8.20
also fixes an unfortunate bug that was introduced in 8.13 as well as tidying up
a number of infelicities and differences from Perl.


Release 8.13 16-Aug-2011
------------------------

This is mainly a bug-fix release. There has been a lot of internal refactoring.
The Unicode tables have been updated. The only new feature in the library is
the passing of *MARK information to callouts. Some additions have been made to
pcretest to make testing easier and more comprehensive. There is a new option
for pcregrep to adjust its internal buffer size.


Release 8.12 15-Jan-2011
------------------------

This release fixes some bugs in pcregrep, one of which caused the tests to fail
on 64-bit big-endian systems. There are no changes to the code of the library.


Release 8.11 10-Dec-2010
------------------------

A number of bugs in the library and in pcregrep have been fixed. As always, see
ChangeLog for details. The following are the non-bug-fix changes:

. Added --match-limit and --recursion-limit to pcregrep.

. Added an optional parentheses number to the -o and --only-matching options
  of pcregrep.

. Changed the way PCRE_PARTIAL_HARD affects the matching of $, \z, \Z, \b, and
  \B.

. Added PCRE_ERROR_SHORTUTF8 to make it possible to distinguish between a
  bad UTF-8 sequence and one that is incomplete when using PCRE_PARTIAL_HARD.

. Recognize (*NO_START_OPT) at the start of a pattern to set the
  PCRE_NO_START_OPTIMIZE option, which is now allowed at compile time.


Release 8.10 25-Jun-2010
------------------------

There are two major additions: support for (*MARK) and friends, and the option
PCRE_UCP, which changes the behaviour of \b, \d, \s, and \w (and their
opposites) so that they make use of Unicode properties. There are also a number
of lesser new features, and several bugs have been fixed. A new option,
--line-buffered, has been added to pcregrep, for use when it is connected to
pipes.


Release 8.02 19-Mar-2010
------------------------

Another bug-fix release.


Release 8.01 19-Jan-2010
------------------------

This is a bug-fix release. Several bugs in the code itself and some bugs and
infelicities in the build system have been fixed.


Release 8.00 19-Oct-09
----------------------

Bugs have been fixed in the library and in pcregrep. There are also some
enhancements. Restrictions on patterns used for partial matching have been
removed, extra information is given for partial matches, the partial matching
process has been improved, and an option to make a partial match override a
full match is available. The "study" process has been enhanced by finding a
lower bound matching length. Groups with duplicate numbers may now have
duplicated names without the use of PCRE_DUPNAMES. However, they may not have
different names. The documentation has been revised to reflect these changes.
The version number has been expanded to 3 digits as it is clear that the rate
of change is not slowing down.


Release 7.9 11-Apr-09
---------------------

Mostly bugfixes and tidies with just a couple of minor functional additions.


Release 7.8 05-Sep-08
---------------------

More bug fixes, plus a performance improvement in Unicode character property
lookup.


Release 7.7 07-May-08
---------------------

This is once again mainly a bug-fix release, but there are a couple of new
features.


Release 7.6 28-Jan-08
---------------------

The main reason for having this release so soon after 7.5 is that it fixes a
potential buffer overflow problem in pcre_compile() when run in UTF-8 mode. In
addition, the CMake configuration files have been brought up to date.


Release 7.5 10-Jan-08
---------------------

This is mainly a bug-fix release. However, the ability to link pcregrep with
libz or libbz2 and the ability to link pcretest with libreadline have been
added. Also the --line-offsets and --file-offsets options were added to
pcregrep.


Release 7.4 21-Sep-07
---------------------

The only change of specification is the addition of options to control whether
\R matches any Unicode line ending (the default) or just CR, LF, and CRLF.
Otherwise, the changes are bug fixes and a refactoring to reduce the number of
relocations needed in a shared library. There have also been some documentation
updates, in particular, some more information about using CMake to build PCRE
has been added to the NON-UNIX-USE file.


Release 7.3 28-Aug-07
---------------------

Most changes are bug fixes. Some that are not:

1. There is some support for Perl 5.10's experimental "backtracking control
   verbs" such as (*PRUNE).

2. UTF-8 checking is now as per RFC 3629 instead of RFC 2279; this is more
   restrictive in the strings it accepts.

3. Checking for potential integer overflow has been made more dynamic, and as a
   consequence there is no longer a hard limit on the size of a subpattern that
   has a limited repeat count.

4. When CRLF is a valid line-ending sequence, pcre_exec() and pcre_dfa_exec()
   no longer advance by two characters instead of one when an unanchored match
   fails at CRLF if there are explicit CR or LF matches within the pattern.
   This gets rid of some anomalous effects that previously occurred.

5. Some PCRE-specific settings for varying the newline options at the start of
   a pattern have been added.


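To make item 2 of the 7.3 list more concrete, here is a minimal C sketch of a
UTF-8 check that follows the RFC 3629 rules. It is written for this note and
is not taken from PCRE; the name utf8_valid_rfc3629() is hypothetical. The
sequences it rejects but RFC 2279 allowed are the 5-byte and 6-byte forms and
anything above 0x10ffff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if s[0..len-1] is well-formed UTF-8 under RFC 3629. */
bool utf8_valid_rfc3629(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;        /* continuation bytes expected after the lead */
        uint32_t cp, min;    /* decoded value, smallest value for this length */

        if (c < 0x80) { i++; continue; }                 /* ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return false;   /* stray continuation, or a 5/6-byte lead byte */

        if (len - i <= extra) return false;              /* truncated */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return false;       /* not 10xxxxxx */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) return false;                      /* overlong form */
        if (cp > 0x10FFFF) return false;                 /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* surrogate */
        i += extra + 1;
    }
    return true;
}
```

For instance, the 5-byte sequence f8 88 80 80 80 was a legal encoding under
RFC 2279 but is rejected here at the lead byte.

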
Release 7.2 19-Jun-07
---------------------

WARNING: saved patterns that were compiled by earlier versions of PCRE must be
recompiled for use with 7.2 (necessitated by the addition of \K, \h, \H, \v,
and \V).

Correction to the notes for 7.1: the note about shared libraries for Windows is
wrong. Previously, three libraries were built, but each could function
independently. For example, the pcreposix library also included all the
functions from the basic pcre library. The change is that the three libraries
are no longer independent. They are like the Unix libraries. To use the
pcreposix functions, for example, you need to link with both the pcreposix and
the basic pcre library.

Some more features from Perl 5.10 have been added:

  (?-n) and (?+n) relative references for recursion and subroutines.

  (?(-n) and (?(+n) relative references as conditions.

  \k{name} and \g{name} are synonyms for \k<name>.

  \K to reset the start of the matched string; for example, (foo)\Kbar
  matches bar preceded by foo, but only sets bar as the matched string.

  (?| introduces a group where the capturing parentheses in each alternative
  start from the same number; for example, (?|(abc)|(xyz)) sets capturing
  parentheses number 1 in both cases.

  \h and \v match horizontal and vertical whitespace, respectively; \H and \V
  match any character that is not horizontal or vertical whitespace.


Release 7.1 24-Apr-07
---------------------

There is only one new feature in this release: a linebreak setting of
PCRE_NEWLINE_ANYCRLF. It is a cut-down version of PCRE_NEWLINE_ANY, which
recognizes only CRLF, CR, and LF as linebreaks.

A few bugs are fixed (see ChangeLog for details), but the major change is a
complete re-implementation of the build system. This now has full Autotools
support and so is now "standard" in some sense. It should help with compiling
PCRE in a wide variety of environments.

NOTE: when building shared libraries for Windows, three dlls are now built,
called libpcre, libpcreposix, and libpcrecpp. Previously, everything was
included in a single dll.

Another important change is that the dftables auxiliary program is no longer
compiled and run at "make" time by default. Instead, a default set of character
tables (assuming ASCII coding) is used. If you want to use dftables to generate
the character tables as previously, add --enable-rebuild-chartables to the
"configure" command. You must do this if you are compiling PCRE to run on a
system that uses EBCDIC code.

There is a discussion about character tables in the README file. The default is
not to use dftables so that there is no problem when cross-compiling.


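The PCRE_NEWLINE_ANYCRLF rule from the 7.1 notes (CRLF, CR, or LF, and nothing
else) can be sketched in C as follows. This is not PCRE's code; the function
names anycrlf_newline_len() and count_line_breaks() are made up for this note.

```c
#include <stddef.h>

/* Length of the line break that starts at s[pos]: 2 for CRLF, 1 for a lone
   CR or LF, and 0 if no line break starts there. */
size_t anycrlf_newline_len(const char *s, size_t len, size_t pos)
{
    if (pos >= len) return 0;
    if (s[pos] == '\r')
        return (pos + 1 < len && s[pos + 1] == '\n') ? 2 : 1;
    if (s[pos] == '\n') return 1;
    return 0;   /* VT, FF and the Unicode line separators do not count here */
}

/* Counts line breaks in a buffer, treating CRLF as a single break. */
size_t count_line_breaks(const char *s, size_t len)
{
    size_t breaks = 0, pos = 0;
    while (pos < len) {
        size_t n = anycrlf_newline_len(s, len, pos);
        if (n > 0) { breaks++; pos += n; }
        else pos++;
    }
    return breaks;
}
```

Checking for CRLF before a lone CR is what keeps the two-byte sequence from
being counted as two separate line breaks.

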
Release 7.0 19-Dec-06
---------------------

This release has a new major number because there have been some internal
upheavals to facilitate the addition of new optimizations and other facilities,
and to make subsequent maintenance and extension easier. Compilation is likely
to be a bit slower, but there should be no major effect on runtime performance.
Previously compiled patterns are NOT upwards compatible with this release. If
you have saved compiled patterns from a previous release, you will have to
re-compile them. Important changes that are visible to users are:

1. The Unicode property tables have been updated to Unicode 5.0.0, which adds
   some more scripts.

2. The option PCRE_NEWLINE_ANY causes PCRE to recognize any Unicode newline
   sequence as a newline.

3. The \R escape matches a single Unicode newline sequence as a single unit.

4. New features that will appear in Perl 5.10 are now in PCRE. These include
   alternative Perl syntax for named parentheses, and Perl syntax for
   recursion.

5. The C++ wrapper interface has been extended by the addition of a
   QuoteMeta function and the ability to allow copy construction and
   assignment.

For a complete list of changes, see the ChangeLog file.


Release 6.7 04-Jul-06
---------------------

The main additions to this release are the ability to use the same name for
multiple sets of parentheses, and support for CRLF line endings in both the
library and pcregrep (and in pcretest for testing).

Thanks to Ian Taylor, the stack usage for many kinds of pattern has been
significantly reduced for certain subject strings.


Release 6.5 01-Feb-06
---------------------

Important changes in this release:

1. A number of new features have been added to pcregrep.

2. The Unicode property tables have been updated to Unicode 4.1.0, and the
   supported properties have been extended with script names such as "Arabic",
   and the derived properties "Any" and "L&". This has necessitated a change to
   the internal format of compiled patterns. Any saved compiled patterns that
   use \p or \P must be recompiled.

3. The specification of recursion in patterns has been changed so that all
   recursive subpatterns are automatically treated as atomic groups. Thus, for
   example, (?R) is treated as if it were (?>(?R)). This is necessary because
   otherwise there are situations where recursion does not work.

See the ChangeLog for a complete list of changes, which include a number of bug
fixes and tidies.


Release 6.0 07-Jun-05
---------------------

The release number has been increased to 6.0 because of the addition of several
major new pieces of functionality.

A new function, pcre_dfa_exec(), which implements pattern matching using a DFA
algorithm, has been added. This has a number of advantages for certain cases,
though it does run more slowly, and lacks the ability to capture substrings. On
the other hand, it does find all matches, not just the first, and it works
better for partial matching. The pcrematching man page discusses the
differences.

The pcretest program has been enhanced so that it can make use of the new
pcre_dfa_exec() matching function and the extra features it provides.

The distribution now includes a C++ wrapper library. This is built
automatically if a C++ compiler is found. The pcrecpp man page discusses this
interface.

The code itself has been re-organized into many more files, one for each
function, so it no longer requires everything to be linked in when static
linkage is used. As a consequence, some internal functions have had to have
their names exposed. These functions all have names starting with _pcre_. They
are undocumented, and are not intended for use by outside callers.

The pcregrep program has been enhanced with new functionality such as
multiline matching and options for outputting more matching context. See the
ChangeLog for a complete list of changes to the library and the utility
programs.


Release 5.0 13-Sep-04
---------------------

The licence under which PCRE is released has been changed to the more
conventional "BSD" licence.

In the code, some bugs have been fixed, and there are also some major changes
in this release (which is why I've increased the number to 5.0). Some changes
are internal rearrangements, and some provide a number of new facilities. The
new features are:

1. There's an "automatic callout" feature that inserts callouts before every
   item in the regex, and there's a new callout field that gives the position
   in the pattern - useful for debugging and tracing.

2. The extra_data structure can now be used to pass in a set of character
   tables at exec time. This is useful if compiled regexes are saved and
   re-used at a later time when the tables may not be at the same address. If
   the default internal tables are used, the pointer saved with the compiled
   pattern is now set to NULL, which means that you don't need to do anything
   special unless you are using custom tables.

3. It is possible, with some restrictions on the content of the regex, to
   request "partial" matching. A special return code is given if all of the
   subject string matched part of the regex. This could be useful for testing
   an input field as it is being typed.

4. There is now some optional support for Unicode character properties, which
   means that pattern items such as \p{Lu} and \X can now be used. Only
   the general category properties are supported. If PCRE is compiled with this
   support, an additional 90K data structure is included, which increases the
   size of the library dramatically.

5. There is support for saving compiled patterns and re-using them later.

6. There is support for running regular expressions that were compiled on a
   different host with the opposite endianness.

7. The pcretest program has been extended to accommodate the new features.

The main internal rearrangement is that sequences of literal characters are no
longer handled as strings. Instead, each character is handled on its own. This
makes some UTF-8 handling easier, and makes the support of partial matching
possible. Compiled patterns containing long literal strings will be larger as a
result of this change; I hope that performance will not be much affected.


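The three-way outcome behind item 3 of the 5.0 list (full match, partial
match, no match) can be shown with a deliberately tiny C sketch that handles
only an anchored literal "pattern". It is an illustration of the idea for an
input field, not PCRE's algorithm; match_literal() and the enum names are
hypothetical. In PCRE itself the special return code is PCRE_ERROR_PARTIAL.

```c
#include <stddef.h>
#include <string.h>

enum match_result { NO_MATCH, PARTIAL_MATCH, FULL_MATCH };

/* Anchored match of a literal pattern against what has been typed so far. */
enum match_result match_literal(const char *pattern, const char *subject)
{
    size_t plen = strlen(pattern), slen = strlen(subject);

    /* Enough input to decide: the subject must start with the pattern. */
    if (slen >= plen)
        return strncmp(pattern, subject, plen) == 0 ? FULL_MATCH : NO_MATCH;

    /* The subject ran out first: partial if everything typed so far fits.
       Treating an empty subject as NO_MATCH is a choice of this sketch. */
    if (slen > 0 && strncmp(pattern, subject, slen) == 0)
        return PARTIAL_MATCH;
    return NO_MATCH;
}
```

An input form could accept further keystrokes while the result is
PARTIAL_MATCH and flag the field as soon as it becomes NO_MATCH.

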
Release 4.5 01-Dec-03
---------------------
