Diff of /code/trunk/HACKING

revision 181 by ph10, Wed Jun 13 14:55:18 2007 UTC
revision 342 by ph10, Sun Apr 20 17:10:13 2008 UTC
@@ Line 109 (-r181) / Line 109 (+r342) @@
 item is either implicit in the opcode or contained in the data bytes that
 follow it.
 
-In many cases below "two-byte" data values are specified. This is in fact just
-a default when the number is an offset within the compiled pattern. PCRE can be
+In many cases below LINK_SIZE data values are specified for offsets within the
+compiled pattern. The default value for LINK_SIZE is 2, but PCRE can be
 compiled to use 3-byte or 4-byte values for these offsets (impairing the
 performance). This is necessary only when patterns whose compiled length is
 greater than 64K are going to be processed. In this description, we assume the
-"normal" compilation options. "Two-byte" data values that are counts (e.g. for
-quantifiers) are always just two bytes.
+"normal" compilation options. Data values that are counts (e.g. for
+quantifiers) are always just two bytes long.
 
-A list of all the opcodes follows:
+A list of the opcodes follows:
 
 Opcodes with no following data
 ------------------------------
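The revised paragraph says that offsets inside the compiled pattern occupy LINK_SIZE bytes (2 by default, optionally 3 or 4), while counts such as quantifier limits always take two bytes. The sketch below shows one way to write and read such fixed-width values in a byte buffer, and why a 2-byte offset caps patterns at 64K. It assumes most-significant-byte-first storage; `put_link`, `get_link`, `put_count`, `get_count` and `max_link` are hypothetical helpers for illustration, not PCRE's API (PCRE uses its own internal macros for this).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest offset that fits in link_size bytes. */
static unsigned long max_link(size_t link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return link_size == 4 ? 0xFFFFFFFFUL : (1UL << (8 * link_size)) - 1;
}

/* Hypothetical helper: store an offset in link_size bytes,
 * most significant byte first (an assumption of this sketch). */
static void put_link(unsigned char *code, size_t link_size, unsigned long value)
{
    assert(value <= max_link(link_size));   /* must fit the chosen width */
    for (size_t i = 0; i < link_size; i++)
        code[i] = (unsigned char)(value >> (8 * (link_size - 1 - i)));
}

/* Hypothetical helper: read back an offset written by put_link. */
static unsigned long get_link(const unsigned char *code, size_t link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    unsigned long value = 0;
    for (size_t i = 0; i < link_size; i++)
        value = (value << 8) | code[i];
    return value;
}

/* Counts (e.g. quantifier limits) are always two bytes,
 * whatever LINK_SIZE is. */
static void put_count(unsigned char *code, unsigned int count)
{
    put_link(code, 2, count);
}

static unsigned int get_count(const unsigned char *code)
{
    return (unsigned int)get_link(code, 2);
}
```

With these helpers, writing 0x1234 at link size 2 stores the bytes 0x12 then 0x34, and `max_link(2)` is 65535, which is why larger compiled patterns need a 3-byte or 4-byte link size.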
@@ Line 125 (-r181) / Line 125 (+r342) @@
 These items are all just one byte long
 
   OP_END                 end of pattern
-  OP_ANY                 match any character
+  OP_ANY                 match any one character other than newline
+  OP_ALLANY              match any one character, including newline
   OP_ANYBYTE             match any single byte, even in UTF-8 mode
   OP_SOD                 match start of data: \A
   OP_SOM,                start of match (subject + offset): \G
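Revision 342 splits the old OP_ANY into two opcodes that differ only in how they treat a newline. A minimal sketch of that distinction follows. It assumes a single-character '\n' newline; PCRE's newline convention is actually configurable, and multi-character newlines such as CRLF are not modelled. The `toy_opcode` enum and `toy_match_any` function are hypothetical, and the enum values are not PCRE's real opcode numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for OP_ANY and OP_ALLANY. */
enum toy_opcode { TOY_OP_ANY, TOY_OP_ALLANY };

/* Does the one-character opcode `op` match character `c`?
 * Assumes the newline is the single character '\n'. */
static bool toy_match_any(enum toy_opcode op, int c)
{
    assert(op == TOY_OP_ANY || op == TOY_OP_ALLANY);
    if (op == TOY_OP_ALLANY)
        return true;        /* any one character, newline included */
    return c != '\n';       /* any one character other than newline */
}
```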
@@ Line 149 (-r181) / Line 150 (+r342) @@
   OP_EXTUNI              match an extended Unicode character
   OP_ANYNL               match any Unicode newline sequence
 
+  OP_ACCEPT              )
+  OP_COMMIT              )
+  OP_FAIL                ) These are Perl 5.10's "backtracking
+  OP_PRUNE               ) control verbs".
+  OP_SKIP                )
+  OP_THEN                )
+
 
 Repeating single characters
 ---------------------------
@@ Line 311 (-r181) / Line 319 (+r342) @@
 positive number) the offset back to the matching bracket opcode.
 
 If a subpattern is quantified such that it is permitted to match zero times, it
-is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
-opcodes which tell the matcher that skipping this subpattern entirely is a
-valid branch.
+is preceded by one of OP_BRAZERO, OP_BRAMINZERO, or OP_SKIPZERO. These are
+single-byte opcodes that tell the matcher that skipping the following
+subpattern entirely is a valid branch. In the case of the first two, not
+skipping the pattern is also valid (greedy and non-greedy). The third is used
+when a pattern has the quantifier {0,0}. It cannot be entirely discarded,
+because it may be called as a subroutine from elsewhere in the regex.
 
 A subpattern with an indefinite maximum repetition is replicated in the
 compiled data its minimum number of times (or once with OP_BRAZERO if the
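The paragraph on OP_BRAZERO, OP_BRAMINZERO and OP_SKIPZERO describes three single-byte prefixes that differ in which branches they allow and, for the first two, in which branch is tried first. The toy matcher below illustrates that ordering for a pattern of the form (group)?, (group)?? or (group){0,0} followed by a literal tail, where both parts are plain strings and matching is anchored at the start of the subject. It is a hypothetical sketch, not PCRE's matcher; the names `toy_zero`, `lit_at` and `toy_match` are invented here, and it does not model why OP_SKIPZERO's group must stay in the compiled code (so that it can still be called as a subroutine).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the three prefix opcodes. */
enum toy_zero { TOY_BRAZERO, TOY_BRAMINZERO, TOY_SKIPZERO };

/* Does the literal `lit` occur in `subject` at offset `pos`? */
static int lit_at(const char *subject, size_t pos, const char *lit)
{
    return strncmp(subject + pos, lit, strlen(lit)) == 0;
}

/* Anchored toy match of "optional group, then tail".
 * Returns the length of the first match found, or -1 if none. */
static long toy_match(enum toy_zero op, const char *group,
                      const char *tail, const char *subject)
{
    /* Branch order per opcode: 1 = take the group, 0 = skip it,
     * -1 = no more branches. */
    static const int orders[3][3] = {
        [TOY_BRAZERO]    = { 1, 0, -1 },   /* greedy: group first */
        [TOY_BRAMINZERO] = { 0, 1, -1 },   /* non-greedy: skip first */
        [TOY_SKIPZERO]   = { 0, -1, -1 },  /* {0,0}: skip is the only branch */
    };
    assert((unsigned)op <= (unsigned)TOY_SKIPZERO);

    for (int i = 0; orders[op][i] != -1; i++) {
        size_t pos = 0;
        if (orders[op][i] == 1) {
            if (!lit_at(subject, pos, group))
                continue;               /* group fails: try next branch */
            pos += strlen(group);
        }
        if (lit_at(subject, pos, tail))
            return (long)(pos + strlen(tail));
    }
    return -1;                          /* every allowed branch failed */
}
```

For the subject "abc" with group "ab" and tail "c", the greedy and non-greedy forms both return 3 (the non-greedy one only after its skip branch fails), while the {0,0} form returns -1 because skipping is its only branch. With an empty tail, the greedy form returns 2 and the other two return 0.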
@@ Line 404 (-r181) / Line 415 (+r342) @@
 data.
 
 Philip Hazel
-June 2007
+April 2008
