Diff of /code/trunk/HACKING

revision 927 by ph10, Wed Feb 22 15:15:08 2012 UTC
revision 1055 by chpe, Tue Oct 16 15:53:30 2012 UTC
# Line 49  complexity in Perl regular expressions, Line 49  complexity in Perl regular expressions,
49  first pass through the pattern is helpful for other reasons.  first pass through the pattern is helpful for other reasons.
50    
51    
52  Support for 16-bit data strings  Support for 16-bit and 32-bit data strings
53  -------------------------------  -------------------------------------------
54    
55  From release 8.30, PCRE supports 16-bit as well as 8-bit data strings, by being  From release 8.30, PCRE supports 16-bit as well as 8-bit data strings; and from
56  compilable in either 8-bit or 16-bit modes, or both. Thus, two different  release 8.FIXME, PCRE supports 32-bit data strings. The library can be compiled
57  libraries can be created. In the description that follows, the word "short" is  in any combination of 8-bit, 16-bit or 32-bit modes, creating different
58    libraries. In the description that follows, the word "short" is
59  used for a 16-bit data quantity, and the word "unit" is used for a quantity  used for a 16-bit data quantity, and the word "unit" is used for a quantity
60  that is a byte in 8-bit mode and a short in 16-bit mode. However, so as not to  that is a byte in 8-bit mode, a short in 16-bit mode and a 32-bit unsigned
61  over-complicate the text, the names of PCRE functions are given in 8-bit form  integer in 32-bit mode. However, so as not to over-complicate the text, the
62  only.  names of PCRE functions are given in 8-bit form only.
63    
64    
65  Computing the memory requirement: how it was  Computing the memory requirement: how it was
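
Reviewer's sketch for the hunk above: the "unit" terminology can be pictured as a single typedef whose width follows the library mode. The names SKETCH_UNIT_WIDTH, sketch_unit and sketch_units_to_bytes are hypothetical and chosen for illustration only; they are not taken from the PCRE sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time switch: 8, 16 or 32 (assumption, not PCRE's macro). */
#ifndef SKETCH_UNIT_WIDTH
#define SKETCH_UNIT_WIDTH 8
#endif

#if SKETCH_UNIT_WIDTH == 8
typedef uint8_t sketch_unit;    /* a byte in 8-bit mode */
#elif SKETCH_UNIT_WIDTH == 16
typedef uint16_t sketch_unit;   /* a "short" in 16-bit mode */
#elif SKETCH_UNIT_WIDTH == 32
typedef uint32_t sketch_unit;   /* a 32-bit unsigned integer in 32-bit mode */
#else
#error "SKETCH_UNIT_WIDTH must be 8, 16 or 32"
#endif

/* Convert a length counted in units to a length counted in bytes. */
static size_t sketch_units_to_bytes(size_t units)
{
    return units * sizeof(sketch_unit);
}
```

Compiling the same source once per width would give one library per mode, which is the arrangement the revised paragraph describes.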
# Line 138  Format of compiled patterns Line 139  Format of compiled patterns
139  ---------------------------  ---------------------------
140    
141  The compiled form of a pattern is a vector of units (bytes in 8-bit mode, or  The compiled form of a pattern is a vector of units (bytes in 8-bit mode, or
142  shorts in 16-bit mode), containing items of variable length. The first unit in  shorts in 16-bit mode, 32-bit unsigned integers in 32-bit mode), containing
143  an item contains an opcode, and the length of the item is either implicit in  items of variable length. The first unit in an item contains an opcode, and
144  the opcode or contained in the data that follows it.  the length of the item is either implicit in the opcode or contained in the
145    data that follows it.
146    
147  In many cases listed below, LINK_SIZE data values are specified for offsets  In many cases listed below, LINK_SIZE data values are specified for offsets
148  within the compiled pattern. LINK_SIZE always specifies a number of bytes. The  within the compiled pattern. LINK_SIZE always specifies a number of bytes. The
# Line 207  Matching literal characters Line 209  Matching literal characters
209    
210  The OP_CHAR opcode is followed by a single character that is to be matched  The OP_CHAR opcode is followed by a single character that is to be matched
211  casefully. For caseless matching, OP_CHARI is used. In UTF-8 or UTF-16 modes,  casefully. For caseless matching, OP_CHARI is used. In UTF-8 or UTF-16 modes,
212  the character may be more than one unit long.  the character may be more than one unit long. In UTF-32 mode, characters
213    are always exactly one unit long.
214    
215    
216  Repeating single characters  Repeating single characters
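
Reviewer's sketch for the literal-character hunk above: the number of units one character occupies depends on the mode. The function name sketch_units_for_code_point is hypothetical, and the sketch assumes standard Unicode encoding limits (code points up to 0x10FFFF); it does not reproduce PCRE's own macros.

```c
#include <stdint.h>

/* Units needed to store code point cp when the unit width is 8, 16 or 32 bits.
   Returns 0 for an unsupported width. Assumes cp <= 0x10FFFF. */
static int sketch_units_for_code_point(uint32_t cp, int unit_width)
{
    switch (unit_width) {
    case 8:                      /* UTF-8: one to four bytes */
        if (cp < 0x80)    return 1;
        if (cp < 0x800)   return 2;
        if (cp < 0x10000) return 3;
        return 4;
    case 16:                     /* UTF-16: one short, or a surrogate pair */
        return (cp < 0x10000) ? 1 : 2;
    case 32:                     /* UTF-32: always exactly one unit */
        return 1;
    default:
        return 0;
    }
}
```

Only the 32-bit case is constant, which is why the added text can state that UTF-32 characters are always exactly one unit long.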
# Line 228  following opcodes, which come in caseful Line 231  following opcodes, which come in caseful
231    OP_POSQUERY     OP_POSQUERYI    OP_POSQUERY     OP_POSQUERYI
232    
233  Each opcode is followed by the character that is to be repeated. In ASCII mode,  Each opcode is followed by the character that is to be repeated. In ASCII mode,
234  these are two-unit items; in UTF-8 or UTF-16 modes, the length is variable.  these are two-unit items; in UTF-8 or UTF-16 modes, the length is variable; in
235    UTF-32 mode these are one-unit items.
236  Those with "MIN" in their names are the minimizing versions. Those with "POS"  Those with "MIN" in their names are the minimizing versions. Those with "POS"
237  in their names are possessive versions. Other repeats make use of these  in their names are possessive versions. Other repeats make use of these
238  opcodes:  opcodes:
# Line 299  bit map containing a 1 bit for every cha Line 303  bit map containing a 1 bit for every cha
303  counted from the least significant end of each unit. In caseless mode, bits for  counted from the least significant end of each unit. In caseless mode, bits for
304  both cases are set.  both cases are set.
305    
306  The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8/16 mode,  The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8/16/32 mode,
307  subject characters with values greater than 255 can be handled correctly. For  subject characters with values greater than 255 can be handled correctly. For
308  OP_CLASS they do not match, whereas for OP_NCLASS they do.  OP_CLASS they do not match, whereas for OP_NCLASS they do.
309    
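
Reviewer's sketch for the hunk above: the difference between OP_CLASS and OP_NCLASS only shows up for characters above 255, which the bit map cannot represent. The sketch assumes a 256-bit map stored as 32 bytes, with bits counted from the least significant end of each byte, and assumes the map already holds the final (possibly inverted) set of matching characters. All sketch_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SKETCH_CLASS, SKETCH_NCLASS } sketch_class_op;

/* Mark character c (0..255) as acceptable in the bit map. */
static void sketch_class_set(uint8_t bitmap[32], unsigned c)
{
    bitmap[c / 8] |= (uint8_t)(1u << (c % 8));
}

/* Test character c against the class. */
static bool sketch_class_match(sketch_class_op op, const uint8_t bitmap[32],
                               uint32_t c)
{
    if (c > 255)
        return op == SKETCH_NCLASS;   /* outside the map: only NCLASS matches */
    return (bitmap[c / 8] & (1u << (c % 8))) != 0;
}
```

With the same bit map, a character such as U+0100 is rejected under SKETCH_CLASS and accepted under SKETCH_NCLASS, mirroring the sentence "For OP_CLASS they do not match, whereas for OP_NCLASS they do."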
# Line 412  OP_ASSERTBACK and OP_ASSERTBACK_NOT, and Line 416  OP_ASSERTBACK and OP_ASSERTBACK_NOT, and
416  is OP_REVERSE, followed by a two byte (one short) count of the number of  is OP_REVERSE, followed by a two byte (one short) count of the number of
417  characters to move back the pointer in the subject string. In ASCII mode, the  characters to move back the pointer in the subject string. In ASCII mode, the
418  count is a number of units, but in UTF-8/16 mode each character may occupy more  count is a number of units, but in UTF-8/16 mode each character may occupy more
419  than one unit. A separate count is present in each alternative of a lookbehind  than one unit; in UTF-32 mode each character occupies exactly one unit.
420    A separate count is present in each alternative of a lookbehind
421  assertion, allowing them to have different fixed lengths.  assertion, allowing them to have different fixed lengths.
422    
423    
