Diff of /code/trunk/HACKING


revision 460 by ph10, Sun Oct 4 09:27:20 2009 UTC revision 550 by ph10, Sun Oct 10 16:24:11 2010 UTC
# Line 4  Technical Notes about PCRE Line 4  Technical Notes about PCRE
4  These are very rough technical notes that record potentially useful information  These are very rough technical notes that record potentially useful information
5  about PCRE internals.  about PCRE internals.
6    
7    
8  Historical note 1  Historical note 1
9  -----------------  -----------------
10    
# Line 22  the one matching the longest subset of t Line 23  the one matching the longest subset of t
23  not necessarily maximize the individual wild portions of the pattern, as is  not necessarily maximize the individual wild portions of the pattern, as is
24  expected in Unix and Perl-style regular expressions.  expected in Unix and Perl-style regular expressions.
25    
26    
27  Historical note 2  Historical note 2
28  -----------------  -----------------
29    
# Line 34  maximizing (or, optionally, minimizing i Line 36  maximizing (or, optionally, minimizing i
36  matches individual wild portions of the pattern. This is an "NFA algorithm" in  matches individual wild portions of the pattern. This is an "NFA algorithm" in
37  Friedl's terminology.  Friedl's terminology.
38    
39    
40  OK, here's the real stuff  OK, here's the real stuff
41  -------------------------  -------------------------
42    
# Line 44  in the pattern, to save on compiling tim Line 47  in the pattern, to save on compiling tim
47  complexity in Perl regular expressions, I couldn't do this. In any case, a  complexity in Perl regular expressions, I couldn't do this. In any case, a
48  first pass through the pattern is helpful for other reasons.  first pass through the pattern is helpful for other reasons.
49    
50    
51  Computing the memory requirement: how it was  Computing the memory requirement: how it was
52  --------------------------------------------  --------------------------------------------
53    
# Line 54  idea was that this would turn out faster Line 58  idea was that this would turn out faster
58  the first pass is degenerate and the second pass can just store stuff straight  the first pass is degenerate and the second pass can just store stuff straight
59  into the vector, which it knows is big enough.  into the vector, which it knows is big enough.
60    
61    
62  Computing the memory requirement: how it is  Computing the memory requirement: how it is
63  -------------------------------------------  -------------------------------------------
64    
# Line 75  runs more slowly than before (30% or mor Line 80  runs more slowly than before (30% or mor
80  is doing a full analysis of the pattern. My hope was that this would not be a  is doing a full analysis of the pattern. My hope was that this would not be a
81  big issue, and in the event, nobody has commented on it.  big issue, and in the event, nobody has commented on it.
82    
83    
84  Traditional matching function  Traditional matching function
85  -----------------------------  -----------------------------
86    
# Line 84  and the way that Perl works. This is not Line 90  and the way that Perl works. This is not
90  as compatible with Perl as possible. This is the function most users of PCRE  as compatible with Perl as possible. This is the function most users of PCRE
91  will use most of the time.  will use most of the time.
92    
93    
94  Supplementary matching function  Supplementary matching function
95  -------------------------------  -------------------------------
96    
# Line 119  quantifiers) are always just two bytes l Line 126  quantifiers) are always just two bytes l
126    
127  A list of the opcodes follows:  A list of the opcodes follows:
128    
   
129  Opcodes with no following data  Opcodes with no following data
130  ------------------------------  ------------------------------
131    
# Line 151  These items are all just one byte long Line 157  These items are all just one byte long
157    OP_EXTUNI              match an extended Unicode character    OP_EXTUNI              match an extended Unicode character
158    OP_ANYNL               match any Unicode newline sequence    OP_ANYNL               match any Unicode newline sequence
159    
160    OP_ACCEPT              ) These are Perl 5.10's "backtracking    OP_ACCEPT              ) These are Perl 5.10's "backtracking control
161    OP_COMMIT              ) control verbs". If OP_ACCEPT is inside    OP_COMMIT              ) verbs". If OP_ACCEPT is inside capturing
162    OP_FAIL                ) capturing parentheses, it may be preceded    OP_FAIL                ) parentheses, it may be preceded by one or more
163    OP_PRUNE               ) by one or more OP_CLOSE, followed by a 2-byte    OP_PRUNE               ) OP_CLOSE, followed by a 2-byte number,
164    OP_SKIP                ) number, indicating which parentheses must be    OP_SKIP                ) indicating which parentheses must be closed.
165    OP_THEN                ) closed.  
166    
167    Backtracking control verbs with data
168    ------------------------------------
169    
170    OP_THEN is followed by a LINK_SIZE offset, which is the distance back to the
171    start of the current branch.
172    
173    OP_MARK is followed by the mark name, preceded by a one-byte length, and
174    followed by a binary zero. For (*PRUNE), (*SKIP), and (*THEN) with arguments,
175    the opcodes OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used. For the first
176    two, the name follows immediately; for OP_THEN_ARG, it follows the LINK_SIZE
177    offset value.
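
As a rough illustration of the layouts described above, the following sketch writes an OP_MARK item and an OP_THEN_ARG item into a byte buffer. It is not taken from the PCRE source: the opcode values, the SKETCH_LINK_SIZE of 2, the function names, and the choice to store the offset most significant byte first are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical values for illustration only; the real opcode numbers and
   LINK_SIZE come from PCRE's own headers and build configuration. */
enum { SKETCH_OP_MARK = 200, SKETCH_OP_THEN_ARG = 201 };
#define SKETCH_LINK_SIZE 2

/* Write <one-byte length><name><binary zero>; return the bytes written. */
static size_t sketch_put_name(unsigned char *code, const char *name)
{
    size_t len = strlen(name);
    assert(len <= 255);                 /* the length must fit in one byte */
    code[0] = (unsigned char)len;
    memcpy(code + 1, name, len);
    code[1 + len] = 0;                  /* trailing binary zero */
    return len + 2;
}

/* OP_MARK: the opcode, followed immediately by the name item. */
static size_t sketch_put_mark(unsigned char *code, const char *name)
{
    code[0] = SKETCH_OP_MARK;
    return 1 + sketch_put_name(code + 1, name);
}

/* OP_THEN_ARG: the opcode, then the LINK_SIZE offset back to the start of
   the current branch, then the name item. The byte order of the offset
   (most significant byte first) is an assumption of this sketch. */
static size_t sketch_put_then_arg(unsigned char *code, unsigned int back,
                                  const char *name)
{
    assert(back <= 0xffffu);            /* must fit in SKETCH_LINK_SIZE bytes */
    code[0] = SKETCH_OP_THEN_ARG;
    code[1] = (unsigned char)(back >> 8);
    code[2] = (unsigned char)(back & 255);
    return 1 + SKETCH_LINK_SIZE
             + sketch_put_name(code + 1 + SKETCH_LINK_SIZE, name);
}

/* Given a pointer to an OP_MARK item, return its name and set *len. */
static const unsigned char *sketch_mark_name(const unsigned char *code,
                                             size_t *len)
{
    *len = code[1];
    return code + 2;
}
```

Storing both a length byte and a terminating zero means a reader can either skip the item in one step using the length or treat the name as an ordinary C string.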
178    
179    
180  Repeating single characters  Repeating single characters
# Line 419  at compile time, and so does not cause a Line 437  at compile time, and so does not cause a
437  data.  data.
438    
439  Philip Hazel  Philip Hazel
440  October 2009  October 2010
