Diff of /code/trunk/doc/html/pcrepattern.html

revision 1193 by ph10, Wed Jun 20 15:15:27 2012 UTC revision 1194 by ph10, Wed Oct 31 17:42:29 2012 UTC
# Line 14  man page, in case the conversion went wr Line 14  man page, in case the conversion went wr
14  <br>  <br>
15  <ul>  <ul>
16  <li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION DETAILS</a>  <li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION DETAILS</a>
17  <li><a name="TOC2" href="#SEC2">NEWLINE CONVENTIONS</a>  <li><a name="TOC2" href="#SEC2">EBCDIC CHARACTER CODES</a>
18  <li><a name="TOC3" href="#SEC3">CHARACTERS AND METACHARACTERS</a>  <li><a name="TOC3" href="#SEC3">NEWLINE CONVENTIONS</a>
19  <li><a name="TOC4" href="#SEC4">BACKSLASH</a>  <li><a name="TOC4" href="#SEC4">CHARACTERS AND METACHARACTERS</a>
20  <li><a name="TOC5" href="#SEC5">CIRCUMFLEX AND DOLLAR</a>  <li><a name="TOC5" href="#SEC5">BACKSLASH</a>
21  <li><a name="TOC6" href="#SEC6">FULL STOP (PERIOD, DOT) AND \N</a>  <li><a name="TOC6" href="#SEC6">CIRCUMFLEX AND DOLLAR</a>
22  <li><a name="TOC7" href="#SEC7">MATCHING A SINGLE DATA UNIT</a>  <li><a name="TOC7" href="#SEC7">FULL STOP (PERIOD, DOT) AND \N</a>
23  <li><a name="TOC8" href="#SEC8">SQUARE BRACKETS AND CHARACTER CLASSES</a>  <li><a name="TOC8" href="#SEC8">MATCHING A SINGLE DATA UNIT</a>
24  <li><a name="TOC9" href="#SEC9">POSIX CHARACTER CLASSES</a>  <li><a name="TOC9" href="#SEC9">SQUARE BRACKETS AND CHARACTER CLASSES</a>
25  <li><a name="TOC10" href="#SEC10">VERTICAL BAR</a>  <li><a name="TOC10" href="#SEC10">POSIX CHARACTER CLASSES</a>
26  <li><a name="TOC11" href="#SEC11">INTERNAL OPTION SETTING</a>  <li><a name="TOC11" href="#SEC11">VERTICAL BAR</a>
27  <li><a name="TOC12" href="#SEC12">SUBPATTERNS</a>  <li><a name="TOC12" href="#SEC12">INTERNAL OPTION SETTING</a>
28  <li><a name="TOC13" href="#SEC13">DUPLICATE SUBPATTERN NUMBERS</a>  <li><a name="TOC13" href="#SEC13">SUBPATTERNS</a>
29  <li><a name="TOC14" href="#SEC14">NAMED SUBPATTERNS</a>  <li><a name="TOC14" href="#SEC14">DUPLICATE SUBPATTERN NUMBERS</a>
30  <li><a name="TOC15" href="#SEC15">REPETITION</a>  <li><a name="TOC15" href="#SEC15">NAMED SUBPATTERNS</a>
31  <li><a name="TOC16" href="#SEC16">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>  <li><a name="TOC16" href="#SEC16">REPETITION</a>
32  <li><a name="TOC17" href="#SEC17">BACK REFERENCES</a>  <li><a name="TOC17" href="#SEC17">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
33  <li><a name="TOC18" href="#SEC18">ASSERTIONS</a>  <li><a name="TOC18" href="#SEC18">BACK REFERENCES</a>
34  <li><a name="TOC19" href="#SEC19">CONDITIONAL SUBPATTERNS</a>  <li><a name="TOC19" href="#SEC19">ASSERTIONS</a>
35  <li><a name="TOC20" href="#SEC20">COMMENTS</a>  <li><a name="TOC20" href="#SEC20">CONDITIONAL SUBPATTERNS</a>
36  <li><a name="TOC21" href="#SEC21">RECURSIVE PATTERNS</a>  <li><a name="TOC21" href="#SEC21">COMMENTS</a>
37  <li><a name="TOC22" href="#SEC22">SUBPATTERNS AS SUBROUTINES</a>  <li><a name="TOC22" href="#SEC22">RECURSIVE PATTERNS</a>
38  <li><a name="TOC23" href="#SEC23">ONIGURUMA SUBROUTINE SYNTAX</a>  <li><a name="TOC23" href="#SEC23">SUBPATTERNS AS SUBROUTINES</a>
39  <li><a name="TOC24" href="#SEC24">CALLOUTS</a>  <li><a name="TOC24" href="#SEC24">ONIGURUMA SUBROUTINE SYNTAX</a>
40  <li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a>  <li><a name="TOC25" href="#SEC25">CALLOUTS</a>
41  <li><a name="TOC26" href="#SEC26">SEE ALSO</a>  <li><a name="TOC26" href="#SEC26">BACKTRACKING CONTROL</a>
42  <li><a name="TOC27" href="#SEC27">AUTHOR</a>  <li><a name="TOC27" href="#SEC27">SEE ALSO</a>
43  <li><a name="TOC28" href="#SEC28">REVISION</a>  <li><a name="TOC28" href="#SEC28">AUTHOR</a>
44    <li><a name="TOC29" href="#SEC29">REVISION</a>
45  </ul>  </ul>
46  <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>  <br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>
47  <P>  <P>
# Line 61  description of PCRE's regular expression Line 62  description of PCRE's regular expression
62  </P>  </P>
63  <P>  <P>
64  The original operation of PCRE was on strings of one-byte characters. However,  The original operation of PCRE was on strings of one-byte characters. However,
65  there is now also support for UTF-8 strings in the original library, and a  there is now also support for UTF-8 strings in the original library, an
66  second library that supports 16-bit and UTF-16 character strings. To use these  extra library that supports 16-bit and UTF-16 character strings, and an
67    extra library that supports 32-bit and UTF-32 character strings. To use these
68  features, PCRE must be built to include appropriate support. When using UTF  features, PCRE must be built to include appropriate support. When using UTF
69  strings you must either call the compiling function with the PCRE_UTF8 or  strings you must either call the compiling function with the PCRE_UTF8,
70  PCRE_UTF16 option, or the pattern must start with one of these special  PCRE_UTF16 or PCRE_UTF32 option, or the pattern must start with one of
71  sequences:  these special sequences:
72  <pre>  <pre>
73    (*UTF8)    (*UTF8)
74    (*UTF16)    (*UTF16)
75      (*UTF32)
76  </pre>  </pre>
77  Starting a pattern with such a sequence is equivalent to setting the relevant  Starting a pattern with such a sequence is equivalent to setting the relevant
78  option. This feature is not Perl-compatible. How setting a UTF mode affects  option. This feature is not Perl-compatible. How setting a UTF mode affects
# Line 80  page. Line 83  page.
83  </P>  </P>
84  <P>  <P>
85  Another special sequence that may appear at the start of a pattern or in  Another special sequence that may appear at the start of a pattern or in
86  combination with (*UTF8) or (*UTF16) is:  combination with (*UTF8) or (*UTF16) or (*UTF32) is:
87  <pre>  <pre>
88    (*UCP)    (*UCP)
89  </pre>  </pre>
# Line 98  of newlines; they are described below. Line 101  of newlines; they are described below.
101  <P>  <P>
102  The remainder of this document discusses the patterns that are supported by  The remainder of this document discusses the patterns that are supported by
103  PCRE when one of its main matching functions, <b>pcre_exec()</b> (8-bit) or  PCRE when one of its main matching functions, <b>pcre_exec()</b> (8-bit) or
104  <b>pcre16_exec()</b> (16-bit), is used. PCRE also has alternative matching  <b>pcre[16|32]_exec()</b> (16- or 32-bit), is used. PCRE also has alternative
105  functions, <b>pcre_dfa_exec()</b> and <b>pcre16_dfa_exec()</b>, which match using  matching functions, <b>pcre_dfa_exec()</b> and <b>pcre[16|32]_dfa_exec()</b>,
106  a different algorithm that is not Perl-compatible. Some of the features  which match using a different algorithm that is not Perl-compatible. Some of
107  discussed below are not available when DFA matching is used. The advantages and  the features discussed below are not available when DFA matching is used. The
108  disadvantages of the alternative functions, and how they differ from the normal  advantages and disadvantages of the alternative functions, and how they differ
109  functions, are discussed in the  from the normal functions, are discussed in the
110  <a href="pcrematching.html"><b>pcrematching</b></a>  <a href="pcrematching.html"><b>pcrematching</b></a>
111  page.  page.
112    </P>
113    <br><a name="SEC2" href="#TOC1">EBCDIC CHARACTER CODES</a><br>
114    <P>
115    PCRE can be compiled to run in an environment that uses EBCDIC as its character
116    code rather than ASCII or Unicode (typically a mainframe system). In the
117    sections below, character code values are ASCII or Unicode; in an EBCDIC
118    environment these characters may have different code values, and there are no
119    code points greater than 255.
120  <a name="newlines"></a></P>  <a name="newlines"></a></P>
121  <br><a name="SEC2" href="#TOC1">NEWLINE CONVENTIONS</a><br>  <br><a name="SEC3" href="#TOC1">NEWLINE CONVENTIONS</a><br>
122  <P>  <P>
123  PCRE supports five different conventions for indicating line breaks in  PCRE supports five different conventions for indicating line breaks in
124  strings: a single CR (carriage return) character, a single LF (linefeed)  strings: a single CR (carriage return) character, a single LF (linefeed)
# Line 150  description of \R in the section entitle Line 161  description of \R in the section entitle
161  below. A change of \R setting can be combined with a change of newline  below. A change of \R setting can be combined with a change of newline
162  convention.  convention.
163  </P>  </P>
164  <br><a name="SEC3" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br>  <br><a name="SEC4" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br>
165  <P>  <P>
166  A regular expression is a pattern that is matched against a subject string from  A regular expression is a pattern that is matched against a subject string from
167  left to right. Most characters stand for themselves in a pattern, and match the  left to right. Most characters stand for themselves in a pattern, and match the
# Line 207  a character class the only metacharacter Line 218  a character class the only metacharacter
218  </pre>  </pre>
219  The following sections describe the use of each of the metacharacters.  The following sections describe the use of each of the metacharacters.
220  </P>  </P>
221  <br><a name="SEC4" href="#TOC1">BACKSLASH</a><br>  <br><a name="SEC5" href="#TOC1">BACKSLASH</a><br>
222  <P>  <P>
223  The backslash character has several uses. Firstly, if it is followed by a  The backslash character has several uses. Firstly, if it is followed by a
224  character that is not a number or a letter, it takes away any special meaning  character that is not a number or a letter, it takes away any special meaning
# Line 273  one of the following escape sequences th Line 284  one of the following escape sequences th
284    \x{hhh..} character with hex code hhh.. (non-JavaScript mode)    \x{hhh..} character with hex code hhh.. (non-JavaScript mode)
285    \uhhhh    character with hex code hhhh (JavaScript mode only)    \uhhhh    character with hex code hhhh (JavaScript mode only)
286  </pre>  </pre>
287  The precise effect of \cx is as follows: if x is a lower case letter, it  The precise effect of \cx on ASCII characters is as follows: if x is a lower
288  is converted to upper case. Then bit 6 of the character (hex 40) is inverted.  case letter, it is converted to upper case. Then bit 6 of the character (hex
289  Thus \cz becomes hex 1A (z is 7A), but \c{ becomes hex 3B ({ is 7B), while  40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
290  \c; becomes hex 7B (; is 3B). If the byte following \c has a value greater  but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the
291  than 127, a compile-time error occurs. This locks out non-ASCII characters in  data item (byte or 16-bit value) following \c has a value greater than 127, a
292  all modes. (When PCRE is compiled in EBCDIC mode, all byte values are valid. A  compile-time error occurs. This locks out non-ASCII characters in all modes.
293  lower case letter is converted to upper case, and then the 0xc0 bits are  </P>
294  flipped.)  <P>
295    The \c facility was designed for use with ASCII characters, but with the
296    extension to Unicode it is even less useful than it once was. It is, however,
297    recognized when PCRE is compiled in EBCDIC mode, where data items are always
298    bytes. In this mode, all values are valid after \c. If the next character is a
299    lower case letter, it is converted to upper case. Then the 0xc0 bits of the
300    byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
301    the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
302    characters also generate different values.
303  </P>  </P>
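The \c arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not PCRE's own code; the function names `ctrl_ascii` and `ctrl_ebcdic` are hypothetical, and the EBCDIC helper assumes the caller passes the byte value of an already upper-cased character.

```python
def ctrl_ascii(ch: str) -> int:
    """Value of \\cx for an ASCII character x, following the rule in the text."""
    code = ord(ch)
    if code > 127:
        # The text says this case is a compile-time error in PCRE.
        raise ValueError("\\c must be followed by an ASCII character")
    if "a" <= ch <= "z":
        code = ord(ch.upper())  # lower case is converted to upper case first
    return code ^ 0x40          # then bit 6 (hex 40) is inverted


def ctrl_ebcdic(byte: int) -> int:
    """Value of \\cx in EBCDIC mode, given the EBCDIC byte of an upper case x.

    Simplification (assumption): lower-to-upper conversion is left to the caller.
    """
    return byte ^ 0xC0          # the 0xc0 bits of the byte are inverted
```

With the values quoted in the text, `ctrl_ascii("z")` gives hex 1A and `ctrl_ascii("{")` gives hex 3B, while `ctrl_ebcdic(0xE9)` (EBCDIC Z) gives hex 29.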
304  <P>  <P>
305  By default, after \x, from zero to two hexadecimal digits are read (letters  By default, after \x, from zero to two hexadecimal digits are read (letters
# Line 291  between \x{ and }, but the character cod Line 310  between \x{ and }, but the character cod
310    8-bit UTF-8 mode      less than 0x10ffff and a valid codepoint    8-bit UTF-8 mode      less than 0x10ffff and a valid codepoint
311    16-bit non-UTF mode   less than 0x10000    16-bit non-UTF mode   less than 0x10000
312    16-bit UTF-16 mode    less than 0x10ffff and a valid codepoint    16-bit UTF-16 mode    less than 0x10ffff and a valid codepoint
313      32-bit non-UTF mode   less than 0x80000000
314      32-bit UTF-32 mode    less than 0x10ffff and a valid codepoint
315  </pre>  </pre>
316  Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called  Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called
317  "surrogate" codepoints).  "surrogate" codepoints), and 0xffef.
318  </P>  </P>
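As a rough illustration of the limits listed above, the following Python sketch checks whether a value would be acceptable between \x{ and } for the rows shown in this hunk. The function name `x_escape_ok` and its parameters are hypothetical. Two assumptions are made: U+10FFFF, the highest Unicode code point, is treated as the upper bound in UTF modes, and only the surrogate range is excluded.

```python
SURROGATES = range(0xD800, 0xE000)   # 0xd800 to 0xdfff inclusive
MAX_UNICODE = 0x10FFFF               # highest Unicode code point (assumed bound)


def x_escape_ok(value: int, width: int, utf: bool) -> bool:
    """Return True if value fits the mode described by width (8, 16, 32) and utf."""
    if value < 0:
        return False
    if utf:
        # All UTF modes share the same Unicode constraints.
        return value <= MAX_UNICODE and value not in SURROGATES
    if width == 16:
        return value < 0x10000
    if width == 32:
        return value < 0x80000000
    # The 8-bit non-UTF row is not shown in this part of the diff.
    raise ValueError("mode not covered by this sketch")
```

For example, `x_escape_ok(0xD800, 16, True)` is rejected because it is a surrogate, while the same value is accepted in 16-bit non-UTF mode.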
319  <P>  <P>
320  If characters other than hexadecimal digits appear between \x{ and }, or if  If characters other than hexadecimal digits appear between \x{ and }, or if
# Line 341  subsequent digits stand for themselves. Line 362  subsequent digits stand for themselves.
362  constrained in the same way as characters specified in hexadecimal.  constrained in the same way as characters specified in hexadecimal.
363  For example:  For example:
364  <pre>  <pre>
365    \040   is another way of writing a space    \040   is another way of writing an ASCII space
366    \40    is the same, provided there are fewer than 40 previous capturing subpatterns    \40    is the same, provided there are fewer than 40 previous capturing subpatterns
367    \7     is always a back reference    \7     is always a back reference
368    \11    might be a back reference, or another way of writing a tab    \11    might be a back reference, or another way of writing a tab
# Line 475  release 5.10. In contrast to the other s Line 496  release 5.10. In contrast to the other s
496  characters by default, these always match certain high-valued codepoints,  characters by default, these always match certain high-valued codepoints,
497  whether or not PCRE_UCP is set. The horizontal space characters are:  whether or not PCRE_UCP is set. The horizontal space characters are:
498  <pre>  <pre>
499    U+0009     Horizontal tab    U+0009     Horizontal tab (HT)
500    U+0020     Space    U+0020     Space
501    U+00A0     Non-break space    U+00A0     Non-break space
502    U+1680     Ogham space mark    U+1680     Ogham space mark
# Line 497  whether or not PCRE_UCP is set. The hori Line 518  whether or not PCRE_UCP is set. The hori
518  </pre>  </pre>
519  The vertical space characters are:  The vertical space characters are:
520  <pre>  <pre>
521    U+000A     Linefeed    U+000A     Linefeed (LF)
522    U+000B     Vertical tab    U+000B     Vertical tab (VT)
523    U+000C     Form feed    U+000C     Form feed (FF)
524    U+000D     Carriage return    U+000D     Carriage return (CR)
525    U+0085     Next line    U+0085     Next line (NEL)
526    U+2028     Line separator    U+2028     Line separator
527    U+2029     Paragraph separator    U+2029     Paragraph separator
528  </pre>  </pre>
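The vertical space list above is short enough to express as a lookup set. The Python sketch below is only an illustration of that list; the names `VERTICAL_SPACE` and `is_vertical_space` are hypothetical and are not part of PCRE.

```python
# Code points listed above as vertical space characters.
VERTICAL_SPACE = frozenset({
    0x000A,  # Linefeed (LF)
    0x000B,  # Vertical tab (VT)
    0x000C,  # Form feed (FF)
    0x000D,  # Carriage return (CR)
    0x0085,  # Next line (NEL)
    0x2028,  # Line separator
    0x2029,  # Paragraph separator
})


def is_vertical_space(ch: str) -> bool:
    """Return True if the single character ch is in the vertical space list."""
    return ord(ch) in VERTICAL_SPACE
```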
# Line 553  change of newline convention; for exampl Line 574  change of newline convention; for exampl
574  <pre>  <pre>
575    (*ANY)(*BSR_ANYCRLF)    (*ANY)(*BSR_ANYCRLF)
576  </pre>  </pre>
577  They can also be combined with the (*UTF8), (*UTF16), or (*UCP) special  They can also be combined with the (*UTF8), (*UTF16), (*UTF32) or (*UCP) special
578  sequences. Inside a character class, \R is treated as an unrecognized escape  sequences. Inside a character class, \R is treated as an unrecognized escape
579  sequence, and so matches the letter "R" by default, but causes an error if  sequence, and so matches the letter "R" by default, but causes an error if
580  PCRE_EXTRA is set.  PCRE_EXTRA is set.
# Line 570  The extra escape sequences are: Line 591  The extra escape sequences are:
591  <pre>  <pre>
592    \p{<i>xx</i>}   a character with the <i>xx</i> property    \p{<i>xx</i>}   a character with the <i>xx</i> property
593    \P{<i>xx</i>}   a character without the <i>xx</i> property    \P{<i>xx</i>}   a character without the <i>xx</i> property
594    \X       an extended Unicode sequence    \X       a Unicode extended grapheme cluster
595  </pre>  </pre>
596  The property names represented by <i>xx</i> above are limited to the Unicode  The property names represented by <i>xx</i> above are limited to the Unicode
597  script names, the general category properties, "Any", which matches any  script names, the general category properties, "Any", which matches any
# Line 765  a modifier or "other". Line 786  a modifier or "other".
786  The Cs (Surrogate) property applies only to characters in the range U+D800 to  The Cs (Surrogate) property applies only to characters in the range U+D800 to
787  U+DFFF. Such characters are not valid in Unicode strings and so  U+DFFF. Such characters are not valid in Unicode strings and so
788  cannot be tested by PCRE, unless UTF validity checking has been turned off  cannot be tested by PCRE, unless UTF validity checking has been turned off
789  (see the discussion of PCRE_NO_UTF8_CHECK and PCRE_NO_UTF16_CHECK in the  (see the discussion of PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK and
790    PCRE_NO_UTF32_CHECK in the
791  <a href="pcreapi.html"><b>pcreapi</b></a>  <a href="pcreapi.html"><b>pcreapi</b></a>
792  page). Perl does not support the Cs property.  page). Perl does not support the Cs property.
793  </P>  </P>
# Line 784  Specifying caseless matching does not af Line 806  Specifying caseless matching does not af
806  example, \p{Lu} always matches only upper case letters.  example, \p{Lu} always matches only upper case letters.
807  </P>  </P>
808  <P>  <P>
809  The \X escape matches any number of Unicode characters that form an extended  Matching characters by Unicode property is not fast, because PCRE has to do a
810  Unicode sequence. \X is equivalent to  multistage table lookup in order to find a character's property. That is why
811    the traditional escape sequences such as \d and \w do not use Unicode
812    properties in PCRE by default, though you can make them do so by setting the
813    PCRE_UCP option or by starting the pattern with (*UCP).
814    </P>
815    <br><b>
816    Extended grapheme clusters
817    </b><br>
818    <P>
819    The \X escape matches any number of Unicode characters that form an "extended
820    grapheme cluster", and treats the sequence as an atomic group
821    <a href="#atomicgroup">(see below).</a>
822    Up to and including release 8.31, PCRE matched an earlier, simpler definition
823    that was equivalent to
824  <pre>  <pre>
825    (?&#62;\PM\pM*)    (?&#62;\PM\pM*)
826  </pre>  </pre>
827  That is, it matches a character without the "mark" property, followed by zero  That is, it matched a character without the "mark" property, followed by zero
828  or more characters with the "mark" property, and treats the sequence as an  or more characters with the "mark" property. Characters with the "mark"
829  atomic group  property are typically non-spacing accents that affect the preceding character.
 <a href="#atomicgroup">(see below).</a>  
 Characters with the "mark" property are typically accents that affect the  
 preceding character. None of them have codepoints less than 256, so in  
 8-bit non-UTF-8 mode \X matches any one character.  
830  </P>  </P>
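The earlier, simpler definition (a character without the "mark" property followed by zero or more characters with it) can be sketched in Python using the standard `unicodedata` module. This is an illustration only, assuming that the "mark" property corresponds to the general categories beginning with M; the function names are hypothetical.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True if ch has a general category starting with M (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def match_legacy_x(subject: str, pos: int = 0):
    """Mimic \\PM\\pM* at pos: return the end index of the match, or None."""
    if pos >= len(subject) or is_mark(subject[pos]):
        return None                      # \PM needs one non-mark character
    end = pos + 1
    while end < len(subject) and is_mark(subject[end]):
        end += 1                         # \pM* takes every following mark
    return end
```

For the subject "e" followed by U+0301 (combining acute accent) and "x", the match starting at position 0 covers the first two characters, so the accent stays attached to its base letter.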
831  <P>  <P>
832  Note that recent versions of Perl have changed \X to match what Unicode calls  This simple definition was extended in Unicode to include more complicated
833  an "extended grapheme cluster", which has a more complicated definition.  kinds of composite character by giving each character a grapheme breaking
834    property, and creating rules that use these properties to define the boundaries
835    of extended grapheme clusters. In releases of PCRE later than 8.31, \X matches
836    one of these clusters.
837  </P>  </P>
838  <P>  <P>
839  Matching characters by Unicode property is not fast, because PCRE has to search  \X always matches at least one character. Then it decides whether to add
840  a structure that contains data for over fifteen thousand characters. That is  additional characters according to the following rules for ending a cluster:
841  why the traditional escape sequences such as \d and \w do not use Unicode  </P>
842  properties in PCRE by default, though you can make them do so by setting the  <P>
843  PCRE_UCP option or by starting the pattern with (*UCP).  1. End at the end of the subject string.
844    </P>
845    <P>
846    2. Do not end between CR and LF; otherwise end after any control character.
847    </P>
848    <P>
849    3. Do not break Hangul (a Korean script) syllable sequences. Hangul characters
850    are of five types: L, V, T, LV, and LVT. An L character may be followed by an
851    L, V, LV, or LVT character; an LV or V character may be followed by a V or T
852    character; an LVT or T character may be followed only by a T character.
853    </P>
854    <P>
855    4. Do not end before extending characters or spacing marks. Characters with
856    the "mark" property always have the "extend" grapheme breaking property.
857    </P>
858    <P>
859    5. Do not end after prepend characters.
860    </P>
861    <P>
862    6. Otherwise, end the cluster.
863  <a name="extraprops"></a></P>  <a name="extraprops"></a></P>
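Rule 3 above amounts to a small table of permitted successors. The Python sketch below encodes only that Hangul rule as stated in the text; it is not a full grapheme cluster implementation, and the names `HANGUL_FOLLOWERS` and `hangul_continues` are hypothetical.

```python
# For each Hangul type, the types that may follow it without ending the cluster.
HANGUL_FOLLOWERS = {
    "L": {"L", "V", "LV", "LVT"},
    "LV": {"V", "T"},
    "V": {"V", "T"},
    "LVT": {"T"},
    "T": {"T"},
}


def hangul_continues(prev_type: str, next_type: str) -> bool:
    """True if a character of next_type may follow prev_type within a cluster."""
    return next_type in HANGUL_FOLLOWERS.get(prev_type, set())
```

Under this table an L character followed by a V character stays in the same cluster, whereas a T character followed by an L character does not.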
864  <br><b>  <br><b>
865  PCRE's additional properties  PCRE's additional properties
866  </b><br>  </b><br>
867  <P>  <P>
868  As well as the standard Unicode properties described in the previous  As well as the standard Unicode properties described above, PCRE supports four
869  section, PCRE supports four more that make it possible to convert traditional  more that make it possible to convert traditional escape sequences such as \w
870  escape sequences such as \w and \s and POSIX character classes to use Unicode  and \s and POSIX character classes to use Unicode properties. PCRE uses these
871  properties. PCRE uses these non-standard, non-Perl properties internally when  non-standard, non-Perl properties internally when PCRE_UCP is set. They are:
 PCRE_UCP is set. They are:  
872  <pre>  <pre>
873    Xan   Any alphanumeric character    Xan   Any alphanumeric character
874    Xps   Any POSIX space character    Xps   Any POSIX space character
# Line 924  If all the alternatives of a pattern beg Line 976  If all the alternatives of a pattern beg
976  to the starting match position, and the "anchored" flag is set in the compiled  to the starting match position, and the "anchored" flag is set in the compiled
977  regular expression.  regular expression.
978  </P>  </P>
979  <br><a name="SEC5" href="#TOC1">CIRCUMFLEX AND DOLLAR</a><br>  <br><a name="SEC6" href="#TOC1">CIRCUMFLEX AND DOLLAR</a><br>
980  <P>  <P>
981  Outside a character class, in the default matching mode, the circumflex  Outside a character class, in the default matching mode, the circumflex
982  character is an assertion that is true only if the current matching point is  character is an assertion that is true only if the current matching point is
# Line 978  Note that the sequences \A, \Z, and \z c Line 1030  Note that the sequences \A, \Z, and \z c
1030  end of the subject in both modes, and if all branches of a pattern start with  end of the subject in both modes, and if all branches of a pattern start with
1031  \A it is always anchored, whether or not PCRE_MULTILINE is set.  \A it is always anchored, whether or not PCRE_MULTILINE is set.
1032  <a name="fullstopdot"></a></P>  <a name="fullstopdot"></a></P>
1033  <br><a name="SEC6" href="#TOC1">FULL STOP (PERIOD, DOT) AND \N</a><br>  <br><a name="SEC7" href="#TOC1">FULL STOP (PERIOD, DOT) AND \N</a><br>
1034  <P>  <P>
1035  Outside a character class, a dot in the pattern matches any one character in  Outside a character class, a dot in the pattern matches any one character in
1036  the subject string except (by default) a character that signifies the end of a  the subject string except (by default) a character that signifies the end of a
# Line 1009  the PCRE_DOTALL option. In other words, Line 1061  the PCRE_DOTALL option. In other words,
1061  that signifies the end of a line. Perl also uses \N to match characters by  that signifies the end of a line. Perl also uses \N to match characters by
1062  name; PCRE does not support this.  name; PCRE does not support this.
1063  </P>  </P>
1064  <br><a name="SEC7" href="#TOC1">MATCHING A SINGLE DATA UNIT</a><br>  <br><a name="SEC8" href="#TOC1">MATCHING A SINGLE DATA UNIT</a><br>
1065  <P>  <P>
1066  Outside a character class, the escape sequence \C matches any one data unit,  Outside a character class, the escape sequence \C matches any one data unit,
1067  whether or not a UTF mode is set. In the 8-bit library, one data unit is one  whether or not a UTF mode is set. In the 8-bit library, one data unit is one
1068  byte; in the 16-bit library it is a 16-bit unit. Unlike a dot, \C always  byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is
1069    a 32-bit unit. Unlike a dot, \C always
1070  matches line-ending characters. The feature is provided in Perl in order to  matches line-ending characters. The feature is provided in Perl in order to
1071  match individual bytes in UTF-8 mode, but it is unclear how it can usefully be  match individual bytes in UTF-8 mode, but it is unclear how it can usefully be
1072  used. Because \C breaks up characters into individual data units, matching one  used. Because \C breaks up characters into individual data units, matching one
1073  unit with \C in a UTF mode means that the rest of the string may start with a  unit with \C in a UTF mode means that the rest of the string may start with a
1074  malformed UTF character. This has undefined results, because PCRE assumes that  malformed UTF character. This has undefined results, because PCRE assumes that
1075  it is dealing with valid UTF strings (and by default it checks this at the  it is dealing with valid UTF strings (and by default it checks this at the
1076  start of processing unless the PCRE_NO_UTF8_CHECK or PCRE_NO_UTF16_CHECK option  start of processing unless the PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK or
1077  is used).  PCRE_NO_UTF32_CHECK option is used).
1078  </P>  </P>
1079  <P>  <P>
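To see why \C can split a character in a UTF mode, it helps to count how many data units one character occupies in each encoding. The Python sketch below uses only the standard codecs; the function name `data_units` is hypothetical, and the dictionary keys stand for the 8-bit, 16-bit and 32-bit libraries.

```python
def data_units(ch: str) -> dict:
    """Number of data units a single character needs in each UTF encoding."""
    return {
        8: len(ch.encode("utf-8")),            # one unit is one byte
        16: len(ch.encode("utf-16-le")) // 2,  # one unit is two bytes
        32: len(ch.encode("utf-32-le")) // 4,  # one unit is four bytes
    }
```

A character such as U+1F600 needs four units in UTF-8 and two in UTF-16, so a single \C consumes only part of it; in UTF-32 every character is exactly one unit.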
1080  PCRE does not allow \C to appear in lookbehind assertions  PCRE does not allow \C to appear in lookbehind assertions
# Line 1048  character for values whose encoding uses Line 1101  character for values whose encoding uses
1101  character's individual bytes are then captured by the appropriate number of  character's individual bytes are then captured by the appropriate number of
1102  groups.  groups.
1103  <a name="characterclass"></a></P>  <a name="characterclass"></a></P>
1104  <br><a name="SEC8" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>  <br><a name="SEC9" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>
1105  <P>  <P>
1106  An opening square bracket introduces a character class, terminated by a closing  An opening square bracket introduces a character class, terminated by a closing
1107  square bracket. A closing square bracket on its own is not special by default.  square bracket. A closing square bracket on its own is not special by default.
# Line 1076  string, and therefore it fails if the cu Line 1129  string, and therefore it fails if the cu
1129  string.  string.
1130  </P>  </P>
1131  <P>  <P>
1132  In UTF-8 (UTF-16) mode, characters with values greater than 255 (0xffff) can be  In UTF-8 (UTF-16, UTF-32) mode, characters with values greater than 255 (0xffff)
1133  included in a class as a literal string of data units, or by using the \x{  can be included in a class as a literal string of data units, or by using the
1134  escaping mechanism.  \x{ escaping mechanism.
1135  </P>  </P>
1136  <P>  <P>
1137  When caseless matching is set, any letters in a class represent both their  When caseless matching is set, any letters in a class represent both their
# Line 1158  introducing a POSIX class name - see the Line 1211  introducing a POSIX class name - see the
1211  closing square bracket. However, escaping other non-alphanumeric characters  closing square bracket. However, escaping other non-alphanumeric characters
1212  does no harm.  does no harm.
1213  </P>  </P>
1214  <br><a name="SEC9" href="#TOC1">POSIX CHARACTER CLASSES</a><br>  <br><a name="SEC10" href="#TOC1">POSIX CHARACTER CLASSES</a><br>
1215  <P>  <P>
1216  Perl supports the POSIX notation for character classes. This uses names  Perl supports the POSIX notation for character classes. This uses names
1217  enclosed by [: and :] within the enclosing square brackets. PCRE also supports  enclosed by [: and :] within the enclosing square brackets. PCRE also supports
# Line 1220  Negated versions, such as [:^alpha:] use Line 1273  Negated versions, such as [:^alpha:] use
1273  classes are unchanged, and match only characters with code points less than  classes are unchanged, and match only characters with code points less than
1274  128.  128.
1275  </P>  </P>
1276  <br><a name="SEC10" href="#TOC1">VERTICAL BAR</a><br>  <br><a name="SEC11" href="#TOC1">VERTICAL BAR</a><br>
1277  <P>  <P>
1278  Vertical bar characters are used to separate alternative patterns. For example,  Vertical bar characters are used to separate alternative patterns. For example,
1279  the pattern  the pattern
# Line 1235  that succeeds is used. If the alternativ Line 1288  that succeeds is used. If the alternativ
1288  "succeeds" means matching the rest of the main pattern as well as the  "succeeds" means matching the rest of the main pattern as well as the
1289  alternative in the subpattern.  alternative in the subpattern.
1290  </P>  </P>
1291  <br><a name="SEC11" href="#TOC1">INTERNAL OPTION SETTING</a><br>  <br><a name="SEC12" href="#TOC1">INTERNAL OPTION SETTING</a><br>
1292  <P>  <P>
1293  The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and  The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
1294  PCRE_EXTENDED options (which are Perl-compatible) can be changed from within  PCRE_EXTENDED options (which are Perl-compatible) can be changed from within
# Line 1291  the pattern can contain special leading Line 1344  the pattern can contain special leading
1344  what the application has set or what has been defaulted. Details are given in  what the application has set or what has been defaulted. Details are given in
1345  the section entitled  the section entitled
1346  <a href="#newlineseq">"Newline sequences"</a>  <a href="#newlineseq">"Newline sequences"</a>
1347  above. There are also the (*UTF8), (*UTF16), and (*UCP) leading sequences that  above. There are also the (*UTF8), (*UTF16), (*UTF32) and (*UCP) leading
1348  can be used to set UTF and Unicode property modes; they are equivalent to  sequences that can be used to set UTF and Unicode property modes; they are
1349  setting the PCRE_UTF8, PCRE_UTF16, and the PCRE_UCP options, respectively.  equivalent to setting the PCRE_UTF8, PCRE_UTF16, PCRE_UTF32 and the PCRE_UCP
1350    options, respectively.
1351  <a name="subpattern"></a></P>  <a name="subpattern"></a></P>
1352  <br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br>  <br><a name="SEC13" href="#TOC1">SUBPATTERNS</a><br>
1353  <P>  <P>
1354  Subpatterns are delimited by parentheses (round brackets), which can be nested.  Subpatterns are delimited by parentheses (round brackets), which can be nested.
1355  Turning part of a pattern into a subpattern does two things:  Turning part of a pattern into a subpattern does two things:
# Line 1351  from left to right, and options are not Line 1405  from left to right, and options are not
1405  is reached, an option setting in one branch does affect subsequent branches, so  is reached, an option setting in one branch does affect subsequent branches, so
1406  the above patterns match "SUNDAY" as well as "Saturday".  the above patterns match "SUNDAY" as well as "Saturday".
1407  <a name="dupsubpatternnumber"></a></P>  <a name="dupsubpatternnumber"></a></P>
1408  <br><a name="SEC13" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>  <br><a name="SEC14" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>
1409  <P>  <P>
1410  Perl 5.10 introduced a feature whereby each alternative in a subpattern uses  Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
1411  the same numbers for its capturing parentheses. Such a subpattern starts with  the same numbers for its capturing parentheses. Such a subpattern starts with
# Line 1395  true if any of the subpatterns of that n Line 1449  true if any of the subpatterns of that n
1449  An alternative approach to using this "branch reset" feature is to use  An alternative approach to using this "branch reset" feature is to use
1450  duplicate named subpatterns, as described in the next section.  duplicate named subpatterns, as described in the next section.
1451  </P>  </P>
1452  <br><a name="SEC14" href="#TOC1">NAMED SUBPATTERNS</a><br>  <br><a name="SEC15" href="#TOC1">NAMED SUBPATTERNS</a><br>
1453  <P>  <P>
1454  Identifying capturing parentheses by number is simple, but it can be very hard  Identifying capturing parentheses by number is simple, but it can be very hard
1455  to keep track of the numbers in complicated regular expressions. Furthermore,  to keep track of the numbers in complicated regular expressions. Furthermore,
# Line 1470  matching. For this reason, an error is g Line 1524  matching. For this reason, an error is g
1524  are given to subpatterns with the same number. However, you can give the same  are given to subpatterns with the same number. However, you can give the same
1525  name to subpatterns with the same number, even when PCRE_DUPNAMES is not set.  name to subpatterns with the same number, even when PCRE_DUPNAMES is not set.
1526  </P>  </P>
1527  <br><a name="SEC15" href="#TOC1">REPETITION</a><br>  <br><a name="SEC16" href="#TOC1">REPETITION</a><br>
1528  <P>  <P>
1529  Repetition is specified by quantifiers, which can follow any of the following  Repetition is specified by quantifiers, which can follow any of the following
1530  items:  items:
# Line 1513  quantifier, but a literal string of four Line 1567  quantifier, but a literal string of four
1567  In UTF modes, quantifiers apply to characters rather than to individual data  In UTF modes, quantifiers apply to characters rather than to individual data
1568  units. Thus, for example, \x{100}{2} matches two characters, each of  units. Thus, for example, \x{100}{2} matches two characters, each of
1569  which is represented by a two-byte sequence in a UTF-8 string. Similarly,  which is represented by a two-byte sequence in a UTF-8 string. Similarly,
1570  \X{3} matches three Unicode extended sequences, each of which may be several  \X{3} matches three Unicode extended grapheme clusters, each of which may be
1571  data units long (and they may be of different lengths).  several data units long (and they may be of different lengths).
1572  </P>  </P>
1573  <P>  <P>
1574  The quantifier {0} is permitted, causing the expression to behave as if the  The quantifier {0} is permitted, causing the expression to behave as if the
# Line 1603  worth setting PCRE_DOTALL in order to ob Line 1657  worth setting PCRE_DOTALL in order to ob
1657  alternatively using ^ to indicate anchoring explicitly.  alternatively using ^ to indicate anchoring explicitly.
1658  </P>  </P>
1659  <P>  <P>
1660  However, there is one situation where the optimization cannot be used. When .*  However, there are some cases where the optimization cannot be used. When .*
1661  is inside capturing parentheses that are the subject of a back reference  is inside capturing parentheses that are the subject of a back reference
1662  elsewhere in the pattern, a match at the start may fail where a later one  elsewhere in the pattern, a match at the start may fail where a later one
1663  succeeds. Consider, for example:  succeeds. Consider, for example:
# Line 1614  If the subject is "xyz123abc123" the mat Line 1668  If the subject is "xyz123abc123" the mat
1668  this reason, such a pattern is not implicitly anchored.  this reason, such a pattern is not implicitly anchored.
1669  </P>  </P>
1670  <P>  <P>
1671    Another case where implicit anchoring is not applied is when the leading .* is
1672    inside an atomic group. Once again, a match at the start may fail where a later
1673    one succeeds. Consider this pattern:
1674    <pre>
1675      (?&#62;.*?a)b
1676    </pre>
1677    It matches "ab" in the subject "aab". The use of the backtracking control verbs
1678    (*PRUNE) and (*SKIP) also disables this optimization.
1679    </P>
1680    <P>
1681  When a capturing subpattern is repeated, the value captured is the substring  When a capturing subpattern is repeated, the value captured is the substring
1682  that matched the final iteration. For example, after  that matched the final iteration. For example, after
1683  <pre>  <pre>
# Line 1628  example, after Line 1692  example, after
1692  </pre>  </pre>
1693  matches "aba" the value of the second captured substring is "b".  matches "aba" the value of the second captured substring is "b".
1694  <a name="atomicgroup"></a></P>  <a name="atomicgroup"></a></P>
1695  <br><a name="SEC16" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>  <br><a name="SEC17" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>
1696  <P>  <P>
1697  With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")  With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
1698  repetition, failure of what follows normally causes the repeated item to be  repetition, failure of what follows normally causes the repeated item to be
# Line 1732  an atomic group, like this: Line 1796  an atomic group, like this:
1796  </pre>  </pre>
1797  sequences of non-digits cannot be broken, and failure happens quickly.  sequences of non-digits cannot be broken, and failure happens quickly.
1798  <a name="backreferences"></a></P>  <a name="backreferences"></a></P>
1799  <br><a name="SEC17" href="#TOC1">BACK REFERENCES</a><br>  <br><a name="SEC18" href="#TOC1">BACK REFERENCES</a><br>
1800  <P>  <P>
1801  Outside a character class, a backslash followed by a digit greater than 0 (and  Outside a character class, a backslash followed by a digit greater than 0 (and
1802  possibly further digits) is a back reference to a capturing subpattern earlier  possibly further digits) is a back reference to a capturing subpattern earlier
# Line 1860  as an Line 1924  as an
1924  Once the whole group has been matched, a subsequent matching failure cannot  Once the whole group has been matched, a subsequent matching failure cannot
1925  cause backtracking into the middle of the group.  cause backtracking into the middle of the group.
1926  <a name="bigassertions"></a></P>  <a name="bigassertions"></a></P>
1927  <br><a name="SEC18" href="#TOC1">ASSERTIONS</a><br>  <br><a name="SEC19" href="#TOC1">ASSERTIONS</a><br>
1928  <P>  <P>
1929  An assertion is a test on the characters following or preceding the current  An assertion is a test on the characters following or preceding the current
1930  matching point that does not actually consume any characters. The simple  matching point that does not actually consume any characters. The simple
# Line 2050  preceded by "foo", while Line 2114  preceded by "foo", while
2114  is another pattern that matches "foo" preceded by three digits and any three  is another pattern that matches "foo" preceded by three digits and any three
2115  characters that are not "999".  characters that are not "999".
2116  <a name="conditions"></a></P>  <a name="conditions"></a></P>
2117  <br><a name="SEC19" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>  <br><a name="SEC20" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>
2118  <P>  <P>
2119  It is possible to cause the matching process to obey a subpattern  It is possible to cause the matching process to obey a subpattern
2120  conditionally or to choose between two alternative subpatterns, depending on  conditionally or to choose between two alternative subpatterns, depending on
# Line 2205  subject is matched against the first alt Line 2269  subject is matched against the first alt
2269  against the second. This pattern matches strings in one of the two forms  against the second. This pattern matches strings in one of the two forms
2270  dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.  dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
2271  <a name="comments"></a></P>  <a name="comments"></a></P>
2272  <br><a name="SEC20" href="#TOC1">COMMENTS</a><br>  <br><a name="SEC21" href="#TOC1">COMMENTS</a><br>
2273  <P>  <P>
2274  There are two ways of including comments in patterns that are processed by  There are two ways of including comments in patterns that are processed by
2275  PCRE. In both cases, the start of the comment must not be in a character class,  PCRE. In both cases, the start of the comment must not be in a character class,
# Line 2234  a newline in the pattern. The sequence \ Line 2298  a newline in the pattern. The sequence \
2298  it does not terminate the comment. Only an actual character with the code value  it does not terminate the comment. Only an actual character with the code value
2299  0x0a (the default newline) does so.  0x0a (the default newline) does so.
2300  <a name="recursion"></a></P>  <a name="recursion"></a></P>
2301  <br><a name="SEC21" href="#TOC1">RECURSIVE PATTERNS</a><br>  <br><a name="SEC22" href="#TOC1">RECURSIVE PATTERNS</a><br>
2302  <P>  <P>
2303  Consider the problem of matching a string in parentheses, allowing for  Consider the problem of matching a string in parentheses, allowing for
2304  unlimited nested parentheses. Without the use of recursion, the best that can  unlimited nested parentheses. Without the use of recursion, the best that can
# Line 2449  now match "b" and so the whole match suc Line 2513  now match "b" and so the whole match suc
2513  match because inside the recursive call \1 cannot access the externally set  match because inside the recursive call \1 cannot access the externally set
2514  value.  value.
2515  <a name="subpatternsassubroutines"></a></P>  <a name="subpatternsassubroutines"></a></P>
2516  <br><a name="SEC22" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>  <br><a name="SEC23" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
2517  <P>  <P>
2518  If the syntax for a recursive subpattern call (either by number or by  If the syntax for a recursive subpattern call (either by number or by
2519  name) is used outside the parentheses to which it refers, it operates like a  name) is used outside the parentheses to which it refers, it operates like a
# Line 2490  different calls. For example, consider t Line 2554  different calls. For example, consider t
2554  It matches "abcabc". It does not match "abcABC" because the change of  It matches "abcabc". It does not match "abcABC" because the change of
2555  processing option does not affect the called subpattern.  processing option does not affect the called subpattern.
2556  <a name="onigurumasubroutines"></a></P>  <a name="onigurumasubroutines"></a></P>
2557  <br><a name="SEC23" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>  <br><a name="SEC24" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
2558  <P>  <P>
2559  For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or  For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
2560  a number enclosed either in angle brackets or single quotes, is an alternative  a number enclosed either in angle brackets or single quotes, is an alternative
# Line 2508  plus or a minus sign it is taken as a re Line 2572  plus or a minus sign it is taken as a re
2572  Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>  Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
2573  synonymous. The former is a back reference; the latter is a subroutine call.  synonymous. The former is a back reference; the latter is a subroutine call.
2574  </P>  </P>
2575  <br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>  <br><a name="SEC25" href="#TOC1">CALLOUTS</a><br>
2576  <P>  <P>
2577  Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl  Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
2578  code to be obeyed in the middle of matching a regular expression. This makes it  code to be obeyed in the middle of matching a regular expression. This makes it
# Line 2519  same pair of parentheses when there is a Line 2583  same pair of parentheses when there is a
2583  PCRE provides a similar feature, but of course it cannot obey arbitrary Perl  PCRE provides a similar feature, but of course it cannot obey arbitrary Perl
2584  code. The feature is called "callout". The caller of PCRE provides an external  code. The feature is called "callout". The caller of PCRE provides an external
2585  function by putting its entry point in the global variable <i>pcre_callout</i>  function by putting its entry point in the global variable <i>pcre_callout</i>
2586  (8-bit library) or <i>pcre16_callout</i> (16-bit library). By default, this  (8-bit library) or <i>pcre[16|32]_callout</i> (16-bit or 32-bit library).
2587  variable contains NULL, which disables all calling out.  By default, this variable contains NULL, which disables all calling out.
2588  </P>  </P>
2589  <P>  <P>
2590  Within a regular expression, (?C) indicates the points at which the external  Within a regular expression, (?C) indicates the points at which the external
# Line 2544  the callout function is given in the Line 2608  the callout function is given in the
2608  <a href="pcrecallout.html"><b>pcrecallout</b></a>  <a href="pcrecallout.html"><b>pcrecallout</b></a>
2609  documentation.  documentation.
2610  <a name="backtrackcontrol"></a></P>  <a name="backtrackcontrol"></a></P>
2611  <br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>  <br><a name="SEC26" href="#TOC1">BACKTRACKING CONTROL</a><br>
2612  <P>  <P>
2613  Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which  Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which
2614  are described in the Perl documentation as "experimental and subject to change  are described in the Perl documentation as "experimental and subject to change
# Line 2575  parenthesis followed by an asterisk. The Line 2639  parenthesis followed by an asterisk. The
2639  (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,  (*VERB) or (*VERB:NAME). Some may take either form, with differing behaviour,
2640  depending on whether or not an argument is present. A name is any sequence of  depending on whether or not an argument is present. A name is any sequence of
2641  characters that does not include a closing parenthesis. The maximum length of  characters that does not include a closing parenthesis. The maximum length of
2642  name is 255 in the 8-bit library and 65535 in the 16-bit library. If the name  name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit libraries.
2643  is empty, that is, if the closing parenthesis immediately follows the colon,  If the name is empty, that is, if the closing parenthesis immediately follows
2644  the effect is as if the colon were not there. Any number of these verbs may  the colon, the effect is as if the colon were not there. Any number of these
2645  occur in a pattern.  verbs may occur in a pattern.
2646  <a name="nooptimize"></a></P>  <a name="nooptimize"></a></P>
2647  <br><b>  <br><b>
2648  Optimizations that affect backtracking verbs  Optimizations that affect backtracking verbs
# Line 2855  position. If subsequently B matches, but Line 2919  position. If subsequently B matches, but
2919  of trying the next alternative (that is, D) does not happen because (*COMMIT)  of trying the next alternative (that is, D) does not happen because (*COMMIT)
2920  overrides.  overrides.
2921  </P>  </P>
2922  <br><a name="SEC26" href="#TOC1">SEE ALSO</a><br>  <br><a name="SEC27" href="#TOC1">SEE ALSO</a><br>
2923  <P>  <P>
2924  <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3),  <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3),
2925  <b>pcresyntax</b>(3), <b>pcre</b>(3), <b>pcre16(3)</b>.  <b>pcresyntax</b>(3), <b>pcre</b>(3), <b>pcre16(3)</b>, <b>pcre32(3)</b>.
2926  </P>  </P>
2927  <br><a name="SEC27" href="#TOC1">AUTHOR</a><br>  <br><a name="SEC28" href="#TOC1">AUTHOR</a><br>
2928  <P>  <P>
2929  Philip Hazel  Philip Hazel
2930  <br>  <br>
# Line 2869  University Computing Service Line 2933  University Computing Service
2933  Cambridge CB2 3QH, England.  Cambridge CB2 3QH, England.
2934  <br>  <br>
2935  </P>  </P>
2936  <br><a name="SEC28" href="#TOC1">REVISION</a><br>  <br><a name="SEC29" href="#TOC1">REVISION</a><br>
2937  <P>  <P>
2938  Last updated: 17 June 2012  Last updated: 10 September 2012
2939  <br>  <br>
2940  Copyright &copy; 1997-2012 University of Cambridge.  Copyright &copy; 1997-2012 University of Cambridge.
2941  <br>  <br>
