Diff of /code/trunk/doc/pcregrep.txt

revision 77 by nigel, Sat Feb 24 21:40:45 2007 UTC revision 96 by nigel, Fri Mar 2 13:10:43 2007 UTC
# Line 1  Line 1 
1  PCREGREP(1)                                                        PCREGREP(1)  PCREGREP(1)                                                        PCREGREP(1)
2    
3    
   
4  NAME  NAME
5         pcregrep - a grep with Perl-compatible regular expressions.         pcregrep - a grep with Perl-compatible regular expressions.
6    
7    
8  SYNOPSIS  SYNOPSIS
9         pcregrep [options] [long options] [pattern] [file1 file2 ...]         pcregrep [options] [long options] [pattern] [path1 path2 ...]
10    
11    
12  DESCRIPTION  DESCRIPTION
# Line 14  DESCRIPTION Line 14  DESCRIPTION
14         pcregrep  searches  files  for  character  patterns, in the same way as         pcregrep  searches  files  for  character  patterns, in the same way as
15         other grep commands do, but it uses the PCRE regular expression library         other grep commands do, but it uses the PCRE regular expression library
16         to support patterns that are compatible with the regular expressions of         to support patterns that are compatible with the regular expressions of
17         Perl 5. See pcrepattern for a full description of syntax and  semantics         Perl 5. See pcrepattern(3) for a full description of syntax and  seman-
18         of the regular expressions that PCRE supports.         tics of the regular expressions that PCRE supports.
19    
20           Patterns,  whether  supplied on the command line or in a separate file,
21           are given without delimiters. For example:
22    
23         A pattern must be specified on the command line unless the -f option is           pcregrep Thursday /etc/motd
24         used (see below).  
25           If you attempt to use delimiters (for example, by surrounding a pattern
26           with  slashes,  as  is common in Perl scripts), they are interpreted as
27           part of the pattern. Quotes can of course be used on the  command  line
28           because they are interpreted by the shell, and indeed they are required
29           if a pattern contains white space or shell metacharacters.
30    
31           The first argument that follows any option settings is treated  as  the
32           single  pattern  to be matched when neither -e nor -f is present.  Con-
33           versely, when one or both of these options are  used  to  specify  pat-
34           terns, all arguments are treated as path names. At least one of -e, -f,
35           or an argument pattern must be provided.
36    
37         If no files are specified, pcregrep reads the standard input. The stan-         If no files are specified, pcregrep reads the standard input. The stan-
38         dard  input  can  also  be  referenced by a name consisting of a single         dard  input  can  also  be  referenced by a name consisting of a single
# Line 27  DESCRIPTION Line 41  DESCRIPTION
41           pcregrep some-pattern /file1 - /file3           pcregrep some-pattern /file1 - /file3
42    
43         By default, each line that matches the pattern is copied to  the  stan-         By default, each line that matches the pattern is copied to  the  stan-
44         dard  output,  and  if  there  is  more than one file, the file name is         dard  output, and if there is more than one file, the file name is out-
45         printed before each line of output. However, there are options that can         put at the start of each line. However,  there  are  options  that  can
46         change how pcregrep behaves. In particular, the -M option makes it pos-         change how pcregrep behaves. In particular, the -M option makes it pos-
47         sible to search for patterns that span line boundaries.         sible to search for patterns that span line boundaries. What defines  a
48           line boundary is controlled by the -N (--newline) option.
49    
50         Patterns are limited to 8K  or  BUFSIZ  characters,  whichever  is  the         Patterns  are  limited  to  8K  or  BUFSIZ characters, whichever is the
51         greater.  BUFSIZ is defined in <stdio.h>.         greater.  BUFSIZ is defined in <stdio.h>.
52    
53           If the LC_ALL or LC_CTYPE environment variable is  set,  pcregrep  uses
54           the  value to set a locale when calling the PCRE library.  The --locale
55           option can be used to override this.
56    
 OPTIONS  
57    
58         --        This terminates the list of options. It is useful if the next  OPTIONS
                  item on the command line starts with a hyphen, but is not  an  
                  option.  
59    
60         -A number Print  number  lines  of context after each matching line. If         --        This terminates the list of options. It is useful if the next
61                   file names and/or line numbers are being  printed,  a  hyphen                   item  on  the command line starts with a hyphen but is not an
62                   separator is used instead of a colon for the context lines. A                   option. This allows for the processing of patterns and  file-
63                   line containing "--" is printed between each group of  lines,                   names that start with hyphens.
64    
65           -A number, --after-context=number
66                     Output  number  lines of context after each matching line. If
67                     filenames and/or line numbers are being output, a hyphen sep-
68                     arator  is  used  instead of a colon for the context lines. A
69                     line containing "--" is output between each group  of  lines,
70                   unless  they  are  in  fact contiguous in the input file. The                   unless  they  are  in  fact contiguous in the input file. The
71                   value of number is expected to be relatively small.  However,                   value of number is expected to be relatively small.  However,
72                   pcregrep guarantees to have up to 8K of following text avail-                   pcregrep guarantees to have up to 8K of following text avail-
73                   able for context printing.                   able for context output.
74    
75         -B number Print number lines of context before each matching  line.  If         -B number, --before-context=number
76                   file  names  and/or  line numbers are being printed, a hyphen                   Output number lines of context before each matching line.  If
77                   separator is used instead of a colon for the context lines. A                   filenames and/or line numbers are being output, a hyphen sep-
78                   line  containing "--" is printed between each group of lines,                   arator is used instead of a colon for the  context  lines.  A
79                     line  containing  "--" is output between each group of lines,
80                   unless they are in fact contiguous in  the  input  file.  The                   unless they are in fact contiguous in  the  input  file.  The
81                   value  of number is expected to be relatively small. However,                   value  of number is expected to be relatively small. However,
82                   pcregrep guarantees to have up to 8K of preceding text avail-                   pcregrep guarantees to have up to 8K of preceding text avail-
83                   able for context printing.                   able for context output.
84    
85         -C number Print  number  lines  of  context  both before and after each         -C number, --context=number
86                     Output  number  lines  of  context both before and after each
87                   matching line.  This is equivalent to setting both -A and  -B                   matching line.  This is equivalent to setting both -A and  -B
88                   to the same value.                   to the same value.
89    
90         -c        Do  not print individual lines; instead just print a count of         -c, --count
91                   the number of lines that would otherwise have  been  printed.                   Do  not  output individual lines; instead just output a count
92                   If  several  files  are given, a count is printed for each of                   of the number of lines that would otherwise have been output.
93                   them.                   If  several  files  are  given, a count is output for each of
94                     them. In this mode, the -A, -B, and -C options are ignored.
95    
96           --colour, --color
97                     If this option is given without any data, it is equivalent to
98                     "--colour=auto".   If  data  is required, it must be given in
99                     the same shell item, separated by an equals sign.
100    
101           --colour=value, --color=value
102                     This option specifies under what circumstances the part of  a
103                     line that matched a pattern should be coloured in the output.
104                     The value may be "never" (the default), "always", or  "auto".
105                     In  the  last  case,  colouring happens only if the standard
106                     output is connected to a terminal. The colour can  be  speci-
107                     fied  by  setting the environment variable PCREGREP_COLOUR or
108                     PCREGREP_COLOR. The value of this variable should be a string
109                     of  two  numbers,  separated by a semicolon.  They are copied
110                     directly into the control string for setting colour on a ter-
111                     minal,  so it is your responsibility to ensure that they make
112                     sense. If neither of the environment variables  is  set,  the
113                     default is "1;31", which gives red.
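
The colour handling described above can be sketched in Python. This is only an illustration, not pcregrep's own code: the function names are hypothetical, checking PCREGREP_COLOUR before PCREGREP_COLOR is an assumption (the text does not give a precedence), and the exact escape sequences are assumed to be the usual ANSI "set graphics rendition" ones.

```python
DEFAULT_COLOUR = "1;31"  # default named in the text: gives red


def colour_setting(env):
    """Pick the colour string from an environment mapping.

    Assumption: PCREGREP_COLOUR is consulted first, then PCREGREP_COLOR.
    """
    return env.get("PCREGREP_COLOUR") or env.get("PCREGREP_COLOR") or DEFAULT_COLOUR


def highlight(line, start, end, colour):
    """Wrap line[start:end] in terminal colour control strings.

    Assumption: ANSI escapes ESC[<colour>m ... ESC[0m. The colour string is
    copied in unchecked, as the text says, so a bad value yields bad output.
    """
    return line[:start] + "\x1b[" + colour + "m" + line[start:end] + "\x1b[0m" + line[end:]
```

For example, `highlight("see Thursday", 4, 12, colour_setting({}))` wraps "Thursday" in the default "1;31" sequence.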
114    
115           -D action, --devices=action
116                     If  an  input  path  is  not  a  regular file or a directory,
117                     "action" specifies how it is to be  processed.  Valid  values
118                     are  "read" (the default) or "skip" (silently skip the path).
119    
120           -d action, --directories=action
121                     If an input path is a directory, "action" specifies how it is
122                     to  be  processed.   Valid  values  are "read" (the default),
123                     "recurse" (equivalent to the -r option), or "skip"  (silently
124                     skip  the path). In the default case, directories are read as
125                     if they were ordinary files. In some  operating  systems  the
126                     effect  of reading a directory like this is an immediate end-
127                     of-file.
128    
129           -e pattern, --regex=pattern, --regexp=pattern
130                     Specify a pattern to be matched. This option
131                     can  be  used multiple times in order to specify several pat-
132                     terns. It can also be used as a way of  specifying  a  single
133                     pattern  that starts with a hyphen. When -e is used, no argu-
134                     ment pattern is taken from the command  line;  all  arguments
135                     are treated as file names. There is an overall maximum of 100
136                     patterns. They are applied to each line in the order in which
137                     they  are  defined until one matches (or fails to match if -v
138                     is used). If -f is used with -e, the  command  line  patterns
139                     are  matched  first,  followed by the patterns from the file,
140                     independent of the order in which these  options  are  speci-
141                     fied.  Note that multiple use of -e is not the same as a sin-
142                     gle pattern with alternatives. For  example,  X|Y  finds  the
143                     first  character in a line that is X or Y, whereas if the two
144                     patterns are given separately, pcregrep  finds  X  if  it  is
145                     present, even if it follows Y in the line. It finds Y only if
146                     there is no X in the line. This really matters  only  if  you
147                     are using -o to show the portion of the line that matched.
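
The difference between several -e patterns and one pattern with alternatives can be illustrated with a short Python sketch. Python's `re` module stands in for PCRE here, and both function names are hypothetical; the sketch only mirrors the ordering rule stated above.

```python
import re


def match_single_pattern(line, alternatives):
    """One pattern X|Y: the leftmost position where any alternative matches."""
    m = re.search("|".join(alternatives), line)
    return m.group(0) if m else None


def match_separate_patterns(line, patterns):
    """Several -e patterns: each is tried over the whole line, in order,
    and the first pattern that matches anywhere wins."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None
```

On the line "aYbX", the single pattern X|Y reports "Y" (it occurs first in the line), whereas the separate patterns X and Y report "X" (the first pattern matches somewhere). As the text notes, this is visible mainly with -o.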
148    
149         --exclude=pattern         --exclude=pattern
150                   When pcregrep is searching the files in a directory as a con-                   When pcregrep is searching the files in a directory as a con-
151                   sequence of the -r (recursive search) option, any files whose                   sequence of the -r (recursive search) option, any files whose
152                   names match the pattern are excluded. The pattern is  a  PCRE                   names  match  the pattern are excluded. The pattern is a PCRE
153                   regular expression. If a file name matches both --include and                   regular expression. If a file name matches both --include and
154                   --exclude, it is excluded. There is no short  form  for  this                   --exclude,  it  is  excluded. There is no short form for this
155                   option.                   option.
156    
157         -ffilename         -F, --fixed-strings
158                     Interpret each pattern as a list of fixed strings,  separated
159                     by  newlines,  instead  of  as  a  regular expression. The -w
160                     (match as a word) and -x (match whole line)  options  can  be
161                     used with -F. They apply to each of the fixed strings. A line
162                     is selected if any of the fixed strings are found in it (sub-
163                     ject to -w or -x, if present).
164    
165           -f filename, --file=filename
166                   Read  a  number  of patterns from the file, one per line, and                   Read  a  number  of patterns from the file, one per line, and
167                   match all of them against each line of input. A line is  out-                   match them against each line of input. A data line is  output
168                   put  if  any  of  the patterns match it.  When -f is used, no                   if any of the patterns match it. The filename can be given as
169                   pattern is taken from the command  line;  all  arguments  are                   "-" to refer to the standard input. When -f is used, patterns
170                   treated  as  file  names. There is a maximum of 100 patterns.                   specified  on  the command line using -e may also be present;
171                   Trailing white space is removed, and blank lines are ignored.                   they are tested before the file's patterns. However, no other
172                   An  empty  file  contains  no  patterns and therefore matches                   pattern  is  taken  from  the command line; all arguments are
173                   nothing.                   treated as file names. There is an  overall  maximum  of  100
174                     patterns. Trailing white space is removed from each line, and
175                     blank lines are ignored. An empty file contains  no  patterns
176                     and therefore matches nothing.
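
A minimal Python sketch of the pattern-file rules just described (one pattern per line, trailing white space removed, blank lines ignored, an overall limit of 100 patterns). The function name and the `already_defined` parameter are hypothetical, and treating a whitespace-only line as blank is an assumption.

```python
MAX_PATTERNS = 100  # overall maximum stated in the text


def load_patterns(lines, already_defined=0):
    """Return the patterns found in an iterable of lines from a -f file.

    already_defined counts patterns given earlier with -e, since the
    limit of 100 applies to all patterns together.
    """
    patterns = []
    for raw in lines:
        text = raw.rstrip()  # drops the newline and any trailing white space
        if not text:
            continue  # blank line: ignored
        if already_defined + len(patterns) >= MAX_PATTERNS:
            raise ValueError("too many patterns (maximum is 100)")
        patterns.append(text)
    return patterns
```

An empty file yields an empty list, which corresponds to "contains no patterns and therefore matches nothing".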
177    
178           -H, --with-filename
179                     Force  the  inclusion  of the filename at the start of output
180                     lines when searching a single file. By default, the  filename
181                     is  not  shown in this case. For matching lines, the filename
182                     is followed by a colon and a  space;  for  context  lines,  a
183                     hyphen separator is used. If a line number is also being out-
184                     put, it follows the file name without a space.
185    
186           -h, --no-filename
187                     Suppress the output of filenames when searching multiple files.
188                     By  default,  filenames  are  shown  when  multiple files are
189                     searched. For matching lines, the filename is followed  by  a
190                     colon  and  a space; for context lines, a hyphen separator is
191                     used. If a line number is also being output, it  follows  the
192                     file name without a space.
193    
194         -h        Suppress printing of filenames when searching multiple files.         --help    Output a brief help message and exit.
195    
196         -i        Ignore upper/lower case distinctions during comparisons.         -i, --ignore-case
197                     Ignore upper/lower case distinctions during comparisons.
198    
199         --include=pattern         --include=pattern
200                   When pcregrep is searching the files in a directory as a con-                   When pcregrep is searching the files in a directory as a con-
201                   sequence of the -r  (recursive  search)  option,  only  files                   sequence of the -r  (recursive  search)  option,  only  those
202                   whose  names match the pattern are included. The pattern is a                   files whose names match the pattern are included. The pattern
203                   PCRE  regular  expression.  If  a  file  name  matches   both                   is a PCRE regular expression. If a  file  name  matches  both
204                   --include  and  --exclude,  it is excluded. There is no short                   --include  and  --exclude,  it is excluded. There is no short
205                   form for this option.                   form for this option.
206    
207         -L        Instead of printing lines from  the  files,  just  print  the         -L, --files-without-match
208                     Instead of outputting lines from the files, just  output  the
209                   names  of  the files that do not contain any lines that would                   names  of  the files that do not contain any lines that would
210                   have been printed. Each file name is printed once, on a sepa-                   have been output. Each file name is output once, on  a  sepa-
211                   rate line.                   rate line.
212    
213         -l        Instead  of  printing  lines  from  the files, just print the         -l, --files-with-matches
214                   names of the files containing  lines  that  would  have  been                   Instead  of  outputting lines from the files, just output the
215                   printed.  Each file name is printed once, on a separate line.                   names of the files containing lines that would have been out-
216                     put.  Each  file  name  is  output  once, on a separate line.
217                     Searching stops as soon as a matching  line  is  found  in  a
218                     file.
219    
220         --label=name         --label=name
221                   This option supplies a name to be used for the standard input                   This option supplies a name to be used for the standard input
222                   when  file  names are being printed. If not supplied, "(stan-                   when file names are being output. If not supplied, "(standard
223                   dard input)" is used. There is no short form for this option.                   input)" is used. There is no short form for this option.
224    
225         -M        Allow  patterns to match more than one line. When this option         --locale=locale-name
226                     This  option specifies a locale to be used for pattern match-
227                     ing. It overrides the value in the LC_ALL or  LC_CTYPE  envi-
228                     ronment  variables.  If  no  locale  is  specified,  the PCRE
229                     library's default (usually the "C" locale) is used. There  is
230                     no short form for this option.
231    
232           -M, --multiline
233                     Allow  patterns to match more than one line. When this option
234                   is given, patterns may usefully contain literal newline char-                   is given, patterns may usefully contain literal newline char-
235                   acters  and  internal  occurrences of ^ and $ characters. The                   acters  and  internal  occurrences of ^ and $ characters. The
236                   output for any one match may consist of more than  one  line.                   output for any one match may consist of more than  one  line.
# Line 127  OPTIONS Line 244  OPTIONS
244                   ters, if fewer than 8K) are guaranteed to  be  available  for                   ters, if fewer than 8K) are guaranteed to  be  available  for
245                   lookbehind assertions.                   lookbehind assertions.
246    
247         -n        Precede each line by its line number in the file.         -N newline-type, --newline=newline-type
248                     The  PCRE  library  supports  four  different conventions for
249                     indicating the ends of lines. They are  the  single-character
250                     sequences  CR  (carriage  return) and LF (linefeed), the two-
251                     character sequence CRLF, and an "any"  convention,  in  which
252                     any  Unicode  line  ending sequence is assumed to end a line.
253                     The Unicode sequences are the three just mentioned,  plus  VT
254                     (vertical  tab,  U+000B),  FF  (formfeed,  U+000C), NEL (next
255                     line, U+0085), LS (line separator, U+2028), and PS (paragraph
256                     separator, U+2029).
257    
258                     When  the  PCRE  library  is  built,  a  default  line-ending
259                     sequence  is  specified.   This  is  normally  the   standard
260                     sequence for the operating system. Unless otherwise specified
261                     by this option, pcregrep uses  the  library's  default.   The
262                     possible  values  for  this  option are CR, LF, CRLF, or ANY.
263                     This makes it possible to use pcregrep  on  files  that  have
264                     come  from  other environments without having to modify their
265                     line endings. If the data that  is  being  scanned  does  not
266                     agree  with  the  convention set by this option, pcregrep may
267                     behave in strange ways.
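
The effect of choosing a newline convention can be sketched in Python as follows. This is an illustration of the four conventions only, not of how pcregrep scans its buffer; the names are hypothetical, and the code points used for the "ANY" case are the Unicode ones (PS is U+2029).

```python
import re

# One splitter per value accepted by -N. In the ANY case CRLF is listed
# first so that it counts as a single line ending, not two.
NEWLINE_SPLITTERS = {
    "CR": re.compile("\r"),
    "LF": re.compile("\n"),
    "CRLF": re.compile("\r\n"),
    "ANY": re.compile("\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]"),
}


def split_lines(text, newline_type):
    """Split text into lines under the given newline convention."""
    parts = NEWLINE_SPLITTERS[newline_type].split(text)
    if parts and parts[-1] == "":
        parts.pop()  # text ended with a newline: no extra empty line
    return parts
```

Splitting CRLF data with the LF convention leaves a stray carriage return at the end of every line, which is one way the "strange" behaviour mentioned above can arise (for example, a pattern ending in $ no longer matches where expected).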
268    
269           -n, --line-number
270                     Precede each output line by its line number in the file, fol-
271                     lowed  by  a colon and a space for matching lines or a hyphen
272                     and a space for context lines. If the filename is also  being
273                     output, it precedes the line number.
274    
275           -o, --only-matching
276                     Show  only  the  part  of the line that matched a pattern. In
277                     this mode, no context is shown. That is, the -A, -B,  and  -C
278                     options are ignored.
279    
280         -q        Work quietly, that is, display nothing except error messages.         -q, --quiet
281                     Work quietly, that is, display nothing except error messages.
282                   The exit status indicates whether or  not  any  matches  were                   The exit status indicates whether or  not  any  matches  were
283                   found.                   found.
284    
285         -r        If  any given path is a directory, recursively scan the files         -r, --recursive
286                     If  any given path is a directory, recursively scan the files
287                   it contains, taking note of any --include and --exclude  set-                   it contains, taking note of any --include and --exclude  set-
288                   tings. Without -r a directory is scanned as a normal file.                   tings.  By  default, a directory is read as a normal file; in
289                     some operating systems this gives an  immediate  end-of-file.
290         -s        Suppress  error  messages  about  non-existent  or unreadable                   This  option  is  a  shorthand  for  setting the -d option to
291                   files. Such files are quietly skipped.  However,  the  return                   "recurse".
292    
293           -s, --no-messages
294                     Suppress error  messages  about  non-existent  or  unreadable
295                     files.  Such  files  are quietly skipped. However, the return
296                   code is still 2, even if matches were found in other files.                   code is still 2, even if matches were found in other files.
297    
298         -u        Operate  in UTF-8 mode. This option is available only if PCRE         -u, --utf-8
299                   has been compiled with UTF-8 support. Both  the  pattern  and                   Operate in UTF-8 mode. This option is available only if  PCRE
300                   each  subject line must be valid strings of UTF-8 characters.                   has  been compiled with UTF-8 support. Both patterns and sub-
301                     ject lines must be valid strings of UTF-8 characters.
302    
303         -V        Write the version numbers of pcregrep and  the  PCRE  library         -V, --version
304                     Write the version numbers of pcregrep and  the  PCRE  library
305                   that is being used to the standard error stream.                   that is being used to the standard error stream.
306    
307         -v        Invert  the  sense  of  the match, so that lines which do not         -v, --invert-match
308                   match the pattern are the ones that are found.                   Invert  the  sense  of  the match, so that lines which do not
309                     match any of the patterns are the ones that are found.
310    
311         -w        Force the pattern to match only whole words. This is  equiva-         -w, --word-regex, --word-regexp
312                     Force the patterns to match only whole words. This is equiva-
313                   lent to having \b at the start and end of the pattern.                   lent to having \b at the start and end of the pattern.
314    
315         -x        Force  the  pattern to be anchored (it must start matching at         -x, --line-regex, --line-regexp
316                   the beginning of the line) and in  addition,  require  it  to                   Force  the  patterns to be anchored (each must start matching
317                   match  the  entire line. This is equivalent to having ^ and $                   at the beginning of a line) and in addition, require them  to
318                     match  entire  lines.  This  is  equivalent to having ^ and $
319                   characters at the start and end of each alternative branch in                   characters at the start and end of each alternative branch in
320                   the regular expression.                   every pattern.
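
The equivalences stated for -w and -x can be sketched in Python, again using the `re` module as a stand-in for PCRE. The helper names are hypothetical. Wrapping the pattern in a non-capturing group is one way to make the anchors apply to every alternative branch rather than only to the first and last.

```python
import re


def as_word_pattern(pattern):
    """-w: \\b at the start and end, applied to the pattern as a whole."""
    return r"\b(?:" + pattern + r")\b"


def as_line_pattern(pattern):
    """-x: ^ and $ around every alternative branch of the pattern."""
    return "^(?:" + pattern + ")$"
```

For example, `as_line_pattern("cat|dog")` matches the line "dog" but not "hotdog", whereas the naive `^cat|dog$` does match "hotdog", because there the $ binds only to the second branch and the ^ only to the first.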
321    
322    
323  LONG OPTIONS  ENVIRONMENT VARIABLES
324    
325         Long  forms  of all the options are available, as in GNU grep. They are         The  environment  variables  LC_ALL  and LC_CTYPE are examined, in that
326         shown in the following table:         order, for a locale. The first one that is set is  used.  This  can  be
327           overridden  by  the  --locale  option.  If  no  locale is set, the PCRE
328           library's default (usually the "C" locale) is used.
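
The lookup order described above can be written out as a small Python sketch. The function name is hypothetical, and returning `None` to mean "use the PCRE library's default" is a convention chosen for this example.

```python
def choose_locale(env, locale_option=None):
    """Return the locale name to use, or None for the library default.

    env is a mapping of environment variables; locale_option is the
    value given with --locale, if any.
    """
    if locale_option is not None:
        return locale_option  # --locale overrides the environment
    for name in ("LC_ALL", "LC_CTYPE"):  # examined in this order
        value = env.get(name)
        if value is not None:
            return value  # the first variable that is set is used
    return None
```

With `{"LC_ALL": "fr_FR", "LC_CTYPE": "de_DE"}` the result is "fr_FR"; adding `locale_option="en_GB"` gives "en_GB".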
329    
330           -A   --after-context  
331           -B   --before-context  NEWLINES
332           -C   --context  
333           -c   --count         The -N (--newline) option allows pcregrep to scan files with  different
334                --exclude (no short form)         newline  conventions  from  the  default.  However, the setting of this
335           -f   --file         option does not affect the way in which pcregrep writes information  to
336           -h   --no-filename         the  standard  error  and  output streams. It uses the string "\n" in C
337                --help (no short form)         printf() calls to indicate newlines, relying on the C  I/O  library  to
338           -i   --ignore-case         convert  this  to  an  appropriate  sequence if the output is sent to a
339                --include (no short form)         file.
340           -L   --files-without-match  
341           -l   --files-with-matches  
342                --label (no short form)  OPTIONS COMPATIBILITY
343           -n   --line-number  
344           -r   --recursive         The majority of short and long forms of pcregrep's options are the same
345           -q   --quiet         as  in  the  GNU grep program. Any long option of the form --xxx-regexp
346           -s   --no-messages         (GNU terminology) is also available as --xxx-regex (PCRE  terminology).
347           -u   --utf-8         However,  the  --locale,  -M,  --multiline, -u, and --utf-8 options are
348           -V   --version         specific to pcregrep.
          -v   --invert-match  
          -x   --line-regex  
          -x   --line-regexp  
349    
350    
351  OPTIONS WITH DATA  OPTIONS WITH DATA
# Line 200  OPTIONS WITH DATA Line 358  OPTIONS WITH DATA
358           -f /some/file           -f /some/file
359    
360         If a long form option is used, the data may appear in the same  command         If a long form option is used, the data may appear in the same  command
361         line  item,  separated  by an = character, or it may appear in the next         line item, separated by an equals character, or (with one exception) it
362         command line item. For example:         may appear in the next command line item. For example:
363    
364           --file=/some/file           --file=/some/file
365           --file /some/file           --file /some/file
366    
367           Note, however, that if you want to supply a file name beginning with  ~
368           as  data  in  a  shell  command,  and have the shell expand ~ to a home
369           directory, you must separate the file name from the option, because the
370           shell  does not treat ~ specially unless it is at the start of an item.
371    
372           The exception to the above is the --colour  (or  --color)  option,  for
373           which  the  data is optional. If this option does have data, it must be
374           given in the first form, using an equals character. Otherwise  it  will
375           be assumed that it has no data.
376    
377    
378    MATCHING ERRORS
379    
380           It  is  possible  to supply a regular expression that takes a very long
381           time to fail to match certain lines.  Such  patterns  normally  involve
382           nested  indefinite repeats, for example: (a+)*\d when matched against a
383           line of a's with no final digit.  The  PCRE  matching  function  has  a
384           resource  limit that causes it to abort in these circumstances. If this
385           happens, pcregrep outputs an error message and the line that caused the
386           problem  to  the  standard error stream. If there are more than 20 such
387           errors, pcregrep gives up.
388    
389    
390  DIAGNOSTICS  DIAGNOSTICS
391    
392         Exit status is 0 if any matches were found, 1 if no matches were found,         Exit status is 0 if any matches were found, 1 if no matches were found,
393         and  2 for syntax errors and non-existent or inaccessible files (even if         and  2 for syntax errors and non-existent or inaccessible files (even if
394         matches were found in other files). Using the  -s  option  to  suppress         matches were found in other files) or too many matching  errors.  Using
395         error messages about inaccessible files does not affect the return code.         the  -s  option to suppress error messages about inaccessible files does
396           not affect the return code.
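
A script that calls pcregrep might interpret the exit status along these lines. This is a sketch only; the function name and the returned labels are hypothetical, while the three status values are those listed above.

```python
def describe_exit_status(code):
    """Map a pcregrep exit status to a short label."""
    if code == 0:
        return "match"      # at least one match was found
    if code == 1:
        return "no-match"   # no matches were found
    if code == 2:
        # Syntax error, missing or inaccessible file, or too many
        # matching errors. Matches may still have been found in other
        # files, and -s does not change this status.
        return "error"
    return "unknown"        # not a status documented in the text
```

A caller would typically pass in the return code obtained from running the command, for example the `returncode` attribute of the object returned by Python's `subprocess.run`.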
397    
398    
399    SEE ALSO
400    
401           pcrepattern(3), pcretest(1).
402    
403    
404  AUTHOR  AUTHOR
405    
406         Philip Hazel         Philip Hazel
407         University Computing Service         University Computing Service
408         Cambridge CB2 3QG, England.         Cambridge CB2 3QH, England.
409    
410  Last updated: 16 May 2005  Last updated: 29 November 2006
411  Copyright (c) 1997-2005 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
