Diff of /code/trunk/doc/pcre.txt

revision 79 by nigel, Sat Feb 24 21:40:52 2007 UTC revision 81 by nigel, Sat Feb 24 21:40:59 2007 UTC
# Line 4352  SYNOPSIS OF C++ WRAPPER Line 4352  SYNOPSIS OF C++ WRAPPER
4352    
4353  DESCRIPTION  DESCRIPTION
4354    
4355         The  C++  wrapper  for  PCRE was provided by Google Inc. This brief man         The  C++  wrapper  for PCRE was provided by Google Inc. Some additional
4356         page was constructed from the notes in the pcrecpp.h file, which should         functionality was added by Giuseppe Maxia. This brief man page was con-
4357         be consulted for further details.         structed  from  the  notes  in the pcrecpp.h file, which should be con-
4358           sulted for further details.
4359    
4360    
4361  MATCHING INTERFACE  MATCHING INTERFACE
4362    
4363         The  "FullMatch" operation checks that supplied text matches a supplied         The "FullMatch" operation checks that supplied text matches a  supplied
4364         pattern exactly. If pointer arguments are supplied, it  copies  matched         pattern  exactly.  If pointer arguments are supplied, it copies matched
4365         sub-strings that match sub-patterns into them.         sub-strings that match sub-patterns into them.
4366    
4367           Example: successful match           Example: successful match
# Line 4374  MATCHING INTERFACE Line 4375  MATCHING INTERFACE
4375           Example: creating a temporary RE object:           Example: creating a temporary RE object:
4376              pcrecpp::RE("h.*o").FullMatch("hello");              pcrecpp::RE("h.*o").FullMatch("hello");
4377    
4378         You  can pass in a "const char*" or a "string" for "text". The examples         You can pass in a "const char*" or a "string" for "text". The  examples
4379         below tend to use a const char*. You can, as in the different  examples         below  tend to use a const char*. You can, as in the different examples
4380         above,  store the RE object explicitly in a variable or use a temporary         above, store the RE object explicitly in a variable or use a  temporary
4381         RE object. The examples below use one mode or  the  other  arbitrarily.         RE  object.  The  examples below use one mode or the other arbitrarily.
4382         Either could correctly be used for any of these examples.         Either could correctly be used for any of these examples.
4383    
4384         You must supply extra pointer arguments to extract matched subpieces.         You must supply extra pointer arguments to extract matched subpieces.
# Line 4403  MATCHING INTERFACE Line 4404  MATCHING INTERFACE
4404           Example: fails because string cannot be stored in integer           Example: fails because string cannot be stored in integer
4405              !pcrecpp::RE("(.*)").FullMatch("ruby", &i);              !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
4406    
4407         The  provided  pointer  arguments can be pointers to any scalar numeric         The provided pointer arguments can be pointers to  any  scalar  numeric
4408         type, or one of:         type, or one of:
4409    
4410            string        (matched piece is copied to string)            string        (matched piece is copied to string)
# Line 4411  MATCHING INTERFACE Line 4412  MATCHING INTERFACE
4412            T             (where "bool T::ParseFrom(const char*, int)" exists)            T             (where "bool T::ParseFrom(const char*, int)" exists)
4413            NULL          (the corresponding matched sub-pattern is not copied)            NULL          (the corresponding matched sub-pattern is not copied)
4414    
4415         The function returns true iff all of the following conditions are  sat-         The  function returns true iff all of the following conditions are sat-
4416         isfied:         isfied:
4417    
4418           a. "text" matches "pattern" exactly;           a. "text" matches "pattern" exactly;
# Line 4425  MATCHING INTERFACE Line 4426  MATCHING INTERFACE
4426              number of sub-patterns, "i"th captured sub-pattern is              number of sub-patterns, "i"th captured sub-pattern is
4427              ignored.              ignored.
4428    
4429         The  matching interface supports at most 16 arguments per call.  If you         The matching interface supports at most 16 arguments per call.  If  you
4430         need   more,   consider    using    the    more    general    interface         need    more,    consider    using    the    more   general   interface
4431         pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch.         pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch.
4432    
4433    
4434  PARTIAL MATCHES  PARTIAL MATCHES
4435    
4436         You  can  use the "PartialMatch" operation when you want the pattern to         You can use the "PartialMatch" operation when you want the  pattern  to
4437         match any substring of the text.         match any substring of the text.
4438    
4439           Example: simple search for a string:           Example: simple search for a string:
# Line 4447  PARTIAL MATCHES Line 4448  PARTIAL MATCHES
4448    
4449  UTF-8 AND THE MATCHING INTERFACE  UTF-8 AND THE MATCHING INTERFACE
4450    
4451         By default, pattern and text are plain text, one  byte  per  character.         By  default,  pattern  and text are plain text, one byte per character.
4452         The  UTF8  flag,  passed  to  the  constructor, causes both pattern and         The UTF8 flag, passed to  the  constructor,  causes  both  pattern  and
4453         string to be treated as UTF-8 text, still a byte stream but potentially         string to be treated as UTF-8 text, still a byte stream but potentially
4454         multiple  bytes  per character. In practice, the text is likelier to be         multiple bytes per character. In practice, the text is likelier  to  be
4455         UTF-8 than the pattern, but the match returned may depend on  the  UTF8         UTF-8  than  the pattern, but the match returned may depend on the UTF8
4456         flag,  so  always use it when matching UTF8 text. For example, "." will         flag, so always use it when matching UTF8 text. For example,  "."  will
4457         match one byte normally but with UTF8 set may match up to  three  bytes         match  one  byte normally but with UTF8 set may match up to three bytes
4458         of a multi-byte character.         of a multi-byte character.
4459    
4460           Example:           Example:
# Line 4470  UTF-8 AND THE MATCHING INTERFACE Line 4471  UTF-8 AND THE MATCHING INTERFACE
4471               --enable-utf8 flag.               --enable-utf8 flag.
4472    
4473    
4474    PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE
4475    
4476           PCRE defines some modifiers to  change  the  behavior  of  the  regular
4477           expression   engine.  The  C++  wrapper  defines  an  auxiliary  class,
4478           RE_Options, as a vehicle to pass such modifiers to  a  RE  class.  Cur-
4479           rently, the following modifiers are supported:
4480    
4481              modifier              description               Perl equivalent
4482    
4483              PCRE_CASELESS         case insensitive match      /i
4484              PCRE_MULTILINE        multiple lines match        /m
4485              PCRE_DOTALL           dot matches newlines        /s
4486              PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
4487              PCRE_EXTRA            strict escape parsing       N/A
4488              PCRE_EXTENDED         ignore whitespace           /x
4489              PCRE_UTF8             handles UTF8 chars          built-in
4490              PCRE_UNGREEDY         reverses * and *?           N/A
4491              PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
4492    
4493           (*)  Both Perl and PCRE allow non-capturing parentheses by means of the
4494           "?:" modifier within the pattern itself. For example, (?:ab|cd) does not
4495           capture, while (ab|cd) does.
4496    
4497           For a full account of how each modifier works, please check  the  PCRE
4498           API reference page.
4499    
4500           For each modifier, there are two member functions whose names are  the
4501           modifier  name  in  lowercase,  without  the  "PCRE_"  prefix.  For
4502           instance, PCRE_CASELESS is handled by
4503    
4504             bool caseless()
4505    
4506           which returns true if the modifier is set, and
4507    
4508             RE_Options & set_caseless(bool)
4509    
4510           which sets or unsets the  modifier.  Moreover,  PCRE_CONFIG_MATCH_LIMIT
4511           can  be accessed through the set_match_limit() and match_limit() member
4512           functions. Setting match_limit to a non-zero value will limit the  exe-
4513           cution  of pcre to keep it from doing bad things like blowing the stack
4514           or taking an eternity to return a result.  A  value  of  5000  is  good
4515           enough  to stop stack blowup in a 2MB thread stack. Setting match_limit
4516           to zero disables match limiting.
4517    
4518           Normally, to pass one or more modifiers to a RE class,  you  declare  a
4519           RE_Options object, set the appropriate options, and pass this object to
4520           a RE constructor. Example:
4521    
4522              RE_Options opt;
4523              opt.set_caseless(true);
4524              if (RE("HELLO", opt).PartialMatch("hello world")) ...
4525    
4526           RE_Options has two constructors. The default constructor takes no
4527           arguments and creates a set of flags that are all off. The second
4528           takes an option_flags parameter, which is there to ease the transfer of
4529           legacy code from C programs. This lets you do
4530    
4531              RE(pattern,
4532                RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
4533    
4534           However, new code is better off doing
4535    
4536              RE(pattern,
4537                RE_Options().set_caseless(true).set_multiline(true))
4538                  .PartialMatch(str);
4539    
4540           If you are going to pass one of the most used modifiers, there are some
4541           convenience functions that return a RE_Options object with the
4542           appropriate modifier already set: CASELESS(), UTF8(), MULTILINE(),
4543           DOTALL(), and EXTENDED().
4544    
4545           If you need to set several options at once, and you don't  want  to  go
4546           through  the  pains  of declaring a RE_Options object and setting each
4547           option separately, you can do it on the fly  instead.  You  can  chain
4548           several  set_xxxxx()  member  functions,  since each of them returns a
4549           reference to its class object. For example, to pass PCRE_CASELESS,
4550           PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may
4551           write:
4552    
4553              RE(" ^ xyz \\s+ .* blah$",
4554                RE_Options()
4555                  .set_caseless(true)
4556                  .set_extended(true)
4557                  .set_multiline(true)).PartialMatch(sometext);
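
The chaining shown above works because each setter returns a reference to the object it was called on. The following is a minimal sketch of that pattern only, not the wrapper's actual source: the namespace sketch, the class Options, the accessor flags(), and the bit values are all hypothetical stand-ins for RE_Options and the PCRE_ constants.

```cpp
#include <cassert>

namespace sketch {

// Illustrative bit values only; the real constants come from pcre.h.
enum : int {
  kCaseless  = 1 << 0,
  kMultiline = 1 << 1,
  kExtended  = 1 << 2,
};

// Hypothetical stand-in for RE_Options.
class Options {
 public:
  Options() : flags_(0) {}                                    // all flags off
  explicit Options(int option_flags) : flags_(option_flags) {}  // C-style bitmask

  // One getter and one chainable setter per modifier.
  bool caseless() const  { return (flags_ & kCaseless) != 0; }
  bool multiline() const { return (flags_ & kMultiline) != 0; }
  bool extended() const  { return (flags_ & kExtended) != 0; }
  Options& set_caseless(bool on)  { return set(kCaseless, on); }
  Options& set_multiline(bool on) { return set(kMultiline, on); }
  Options& set_extended(bool on)  { return set(kExtended, on); }

  int flags() const { return flags_; }

 private:
  // Set or clear one bit, then return *this so calls can be chained.
  Options& set(int bit, bool on) {
    if (on) flags_ |= bit; else flags_ &= ~bit;
    return *this;
  }
  int flags_;
};

// Usage: the chained form and the bitmask form end up with the same flags.
inline void demo_options() {
  Options chained = Options().set_caseless(true).set_multiline(true);
  Options masked(kCaseless | kMultiline);
  assert(chained.flags() == masked.flags());
}

}  // namespace sketch
```

In this sketch the bitmask constructor and the chained setters are interchangeable; the chained form simply names each option at the call site.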
4558    
4559    
4560  SCANNING TEXT INCREMENTALLY  SCANNING TEXT INCREMENTALLY
4561    
4562         The  "Consume"  operation may be useful if you want to repeatedly match         The "Consume" operation may be useful if you want to  repeatedly  match
4563         regular expressions at the front of a string and skip over them as they         regular expressions at the front of a string and skip over them as they
4564         match.  This requires use of the "StringPiece" type, which represents a         match. This requires use of the "StringPiece" type, which represents  a
4565         sub-range of a real string. Like RE,  StringPiece  is  defined  in  the         sub-range  of  a  real  string.  Like RE, StringPiece is defined in the
4566         pcrecpp namespace.         pcrecpp namespace.
4567    
4568           Example: read lines of the form "var = value" from a string.           Example: read lines of the form "var = value" from a string.
# Line 4489  SCANNING TEXT INCREMENTALLY Line 4576  SCANNING TEXT INCREMENTALLY
4576                ...;                ...;
4577              }              }
4578    
4579         Each  successful  call  to  "Consume"  will  set  "var/value", and also         Each successful call  to  "Consume"  will  set  "var/value",  and  also
4580         advance "input" so it points past the matched text.         advance "input" so it points past the matched text.
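
The idea of matching at the front of the input and then stepping past the match can be sketched without the wrapper. The example below swaps in std::regex from the C++ standard library in place of PCRE and an iterator in place of StringPiece; consume_front is a hypothetical helper, not part of pcrecpp.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (std::regex stand-in, not pcrecpp API).
// Tries to match `re` exactly at *pos; on success, advances *pos past the match.
inline bool consume_front(const std::regex& re,
                          std::string::const_iterator* pos,
                          std::string::const_iterator end,
                          std::smatch* m) {
  // match_continuous anchors the match at the current position.
  if (!std::regex_search(*pos, end, *m, re,
                         std::regex_constants::match_continuous)) {
    return false;
  }
  *pos = (*m)[0].second;  // point past the matched text
  return true;
}

// Usage: read lines of the form "var = value" from a string.
inline void demo_consume() {
  const std::string contents = "a = 1\nb = 2\n";
  const std::regex line("(\\w+) = (\\d+)\n");
  std::string::const_iterator pos = contents.begin();
  std::smatch m;
  int lines = 0;
  while (consume_front(line, &pos, contents.end(), &m)) {
    ++lines;  // m[1] holds the variable name, m[2] the value
  }
  assert(lines == 2);
  assert(pos == contents.end());
}
```

Dropping the match_continuous flag gives the unanchored behaviour, where the search may skip over leading text before the match.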
4581    
4582         The "FindAndConsume" operation is similar to  "Consume"  but  does  not         The  "FindAndConsume"  operation  is  similar to "Consume" but does not
4583         anchor  your  match  at  the  beginning of the string. For example, you         anchor your match at the beginning of  the  string.  For  example,  you
4584         could extract all words from a string by repeatedly calling         could extract all words from a string by repeatedly calling
4585    
4586           pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)           pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
# Line 4502  SCANNING TEXT INCREMENTALLY Line 4589  SCANNING TEXT INCREMENTALLY
4589  PARSING HEX/OCTAL/C-RADIX NUMBERS  PARSING HEX/OCTAL/C-RADIX NUMBERS
4590    
4591         By default, if you pass a pointer to a numeric value, the corresponding         By default, if you pass a pointer to a numeric value, the corresponding
4592         text  is  interpreted  as  a  base-10  number. You can instead wrap the         text is interpreted as a base-10  number.  You  can  instead  wrap  the
4593         pointer with a call to one of the operators Hex(), Octal(), or CRadix()         pointer with a call to one of the operators Hex(), Octal(), or CRadix()
4594         to  interpret  the text in another base. The CRadix operator interprets         to interpret the text in another base. The CRadix  operator  interprets
4595         C-style "0" (base-8) and  "0x"  (base-16)  prefixes,  but  defaults  to         C-style  "0"  (base-8)  and  "0x"  (base-16)  prefixes, but defaults to
4596         base-10.         base-10.
4597    
4598           Example:           Example:
# Line 4520  PARSING HEX/OCTAL/C-RADIX NUMBERS Line 4607  PARSING HEX/OCTAL/C-RADIX NUMBERS
4607    
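
The prefix rule described for CRadix is the same one the C library function strtol applies when called with base 0. The sketch below uses strtol as a stand-in to show the rule; parse_c_radix is a hypothetical helper, not part of pcrecpp, and nothing here is a claim about how the wrapper is implemented.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not pcrecpp API): parse `text` as a C-style integer.
// A "0x" prefix selects base 16, a leading "0" selects base 8, and anything
// else is read as base 10. Returns false if the whole text is not a number.
inline bool parse_c_radix(const char* text, long* out) {
  char* end = nullptr;
  errno = 0;
  const long value = std::strtol(text, &end, 0);  // base 0 = detect from prefix
  if (end == text || *end != '\0' || errno == ERANGE) return false;
  *out = value;
  return true;
}

// Usage: the same digits give different values depending on the prefix.
inline void demo_c_radix() {
  long v = 0;
  assert(parse_c_radix("100", &v) && v == 100);   // base 10
  assert(parse_c_radix("0100", &v) && v == 64);   // base 8
  assert(parse_c_radix("0x100", &v) && v == 256); // base 16
}
```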
4608  REPLACING PARTS OF STRINGS  REPLACING PARTS OF STRINGS
4609    
4610         You  can  replace the first match of "pattern" in "str" with "rewrite".         You can replace the first match of "pattern" in "str"  with  "rewrite".
4611         Within "rewrite", backslash-escaped digits (\1 to \9) can  be  used  to         Within  "rewrite",  backslash-escaped  digits (\1 to \9) can be used to
4612         insert  text  matching  corresponding parenthesized group from the pat-         insert text matching corresponding parenthesized group  from  the  pat-
4613         tern. \0 in "rewrite" refers to the entire matching text. For example:         tern. \0 in "rewrite" refers to the entire matching text. For example:
4614    
4615           string s = "yabba dabba doo";           string s = "yabba dabba doo";
4616           pcrecpp::RE("b+").Replace("d", &s);           pcrecpp::RE("b+").Replace("d", &s);
4617    
4618         will leave "s" containing "yada dabba doo". The result is true  if  the         will  leave  "s" containing "yada dabba doo". The result is true if the
4619         pattern matches and a replacement occurs, false otherwise.         pattern matches and a replacement occurs, false otherwise.
4620    
4621         GlobalReplace  is  like Replace except that it replaces all occurrences         GlobalReplace is like Replace except that it replaces  all  occurrences
4622         of the pattern in the string with the  rewrite.  Replacements  are  not         of  the  pattern  in  the string with the rewrite. Replacements are not
4623         subject to re-matching. For example:         subject to re-matching. For example:
4624    
4625           string s = "yabba dabba doo";           string s = "yabba dabba doo";
4626           pcrecpp::RE("b+").GlobalReplace("d", &s);           pcrecpp::RE("b+").GlobalReplace("d", &s);
4627    
4628         will  leave  "s"  containing  "yada dada doo". It returns the number of         will leave "s" containing "yada dada doo". It  returns  the  number  of
4629         replacements made.         replacements made.
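
The difference between replacing the first match and replacing every match can be checked with the C++ standard library alone. The sketch below swaps in std::regex for PCRE; replace_first and replace_all are hypothetical helpers, not part of pcrecpp. Note that std::regex writes group references in the rewrite string as $1 rather than \1.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helpers (std::regex stand-in, not pcrecpp API).

// Replaces only the first match; returns true if a replacement occurred.
inline bool replace_first(const std::regex& re, const std::string& rewrite,
                          std::string* s) {
  if (!std::regex_search(*s, re)) return false;
  *s = std::regex_replace(*s, re, rewrite,
                          std::regex_constants::format_first_only);
  return true;
}

// Replaces every match; returns the number of replacements made.
inline int replace_all(const std::regex& re, const std::string& rewrite,
                       std::string* s) {
  const int count = static_cast<int>(std::distance(
      std::sregex_iterator(s->begin(), s->end(), re), std::sregex_iterator()));
  *s = std::regex_replace(*s, re, rewrite);
  return count;
}

// Usage: the two examples from the text.
inline void demo_replace() {
  std::string first = "yabba dabba doo";
  assert(replace_first(std::regex("b+"), "d", &first));
  assert(first == "yada dabba doo");

  std::string all = "yabba dabba doo";
  assert(replace_all(std::regex("b+"), "d", &all) == 2);
  assert(all == "yada dada doo");
}
```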
4630    
4631         Extract is like Replace, except that if the pattern matches,  "rewrite"         Extract  is like Replace, except that if the pattern matches, "rewrite"
4632         is  copied into "out" (an additional argument) with substitutions.  The         is copied into "out" (an additional argument) with substitutions.   The
4633         non-matching portions of "text" are ignored. Returns true iff  a  match         non-matching  portions  of "text" are ignored. Returns true iff a match
4634         occurred and the extraction happened successfully;  if no match occurs,         occurred and the extraction happened successfully;  if no match occurs,
4635         the string is left unaffected.         the string is left unaffected.
4636    
