Diff of /code/trunk/doc/pcregrep.txt

revision 63 by nigel, Sat Feb 24 21:40:03 2007 UTC revision 91 by nigel, Sat Feb 24 21:41:34 2007 UTC
# Line 1  Line 1 
1    PCREGREP(1)                                                        PCREGREP(1)
2    
3    
4  NAME  NAME
5       pcregrep - a grep with Perl-compatible regular expressions.         pcregrep - a grep with Perl-compatible regular expressions.
6    
7    
8  SYNOPSIS  SYNOPSIS
9       pcregrep [-Vcfhilnrsuvx] [long options] [pattern] [file1         pcregrep [options] [long options] [pattern] [path1 path2 ...]
10    
11    
12  DESCRIPTION  DESCRIPTION
13    
14       pcregrep searches files for character patterns, in the  same         pcregrep  searches  files  for  character  patterns, in the same way as
15       way  as other grep commands do, but it uses the PCRE regular         other grep commands do, but it uses the PCRE regular expression library
16       expression library to support patterns that  are  compatible         to support patterns that are compatible with the regular expressions of
17       with  the regular expressions of Perl 5. See pcrepattern for         Perl 5. See pcrepattern for a full description of syntax and  semantics
18       a full description of syntax and semantics  of  the  regular         of the regular expressions that PCRE supports.
19       expressions that PCRE supports.  
20           Patterns,  whether  supplied on the command line or in a separate file,
21       A pattern must be specified on the command line  unless  the         are given without delimiters. For example:
22       -f option is used (see below).  
23             pcregrep Thursday /etc/motd
24       If no files  are  specified,  pcregrep  reads  the  standard  
25       input.  By  default,  each  line that matches the pattern is         If you attempt to use delimiters (for example, by surrounding a pattern
26       copied to the standard output, and if there is more than one         with  slashes,  as  is common in Perl scripts), they are interpreted as
27       file,  the  file name is printed before each line of output.         part of the pattern. Quotes can of course be used on the  command  line
28       However, there are options  that  can  change  how  pcregrep         because they are interpreted by the shell, and indeed they are required
29       behaves.         if a pattern contains white space or shell metacharacters.
30    
31       Lines are limited to BUFSIZ characters. BUFSIZ is defined in         The first argument that follows any option settings is treated  as  the
32       <stdio.h>.  The newline character is removed from the end of         single  pattern  to be matched when neither -e nor -f is present.  Con-
33       each line before it is matched against the pattern.         versely, when one or both of these options are  used  to  specify  pat-
34           terns, all arguments are treated as path names. At least one of -e, -f,
35           or an argument pattern must be provided.
36    
37           If no files are specified, pcregrep reads the standard input. The stan-
38           dard  input  can  also  be  referenced by a name consisting of a single
39           hyphen.  For example:
40    
41             pcregrep some-pattern /file1 - /file3
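
The argument rules above can be sketched in a few lines of Python. This is an illustration only, not pcregrep's actual code: the function name split_pattern_and_paths and its parameters are hypothetical, and option parsing is assumed to have happened already.

```python
def split_pattern_and_paths(args, patterns_from_options):
    """Split the non-option arguments into (patterns, paths).

    args: command-line items left over after option processing.
    patterns_from_options: patterns already collected from -e or -f.
    """
    if patterns_from_options:
        # With -e or -f present, every remaining argument is a path name.
        patterns, paths = list(patterns_from_options), list(args)
    elif args:
        # Otherwise the first argument is the single pattern.
        patterns, paths = [args[0]], list(args[1:])
    else:
        raise ValueError("one of -e, -f, or an argument pattern is required")
    # No paths means standard input, which can also be named "-".
    return patterns, paths or ["-"]


print(split_pattern_and_paths(["some-pattern", "/file1", "-", "/file3"], []))
print(split_pattern_and_paths(["/file1"], ["Thursday"]))
```

Returning "-" when no paths are given keeps the caller's loop uniform: it only has to know that "-" means the standard input.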
42    
43           By default, each line that matches the pattern is copied to  the  stan-
44           dard  output, and if there is more than one file, the file name is out-
45           put at the start of each line. However,  there  are  options  that  can
46           change how pcregrep behaves. In particular, the -M option makes it pos-
47           sible to search for patterns that span line boundaries. What defines  a
48           line boundary is controlled by the -N (--newline) option.
49    
50           Patterns  are  limited  to  8K  or  BUFSIZ characters, whichever is the
51           greater.  BUFSIZ is defined in <stdio.h>.
52    
53           If the LC_ALL or LC_CTYPE environment variable is  set,  pcregrep  uses
54           the  value to set a locale when calling the PCRE library.  The --locale
55           option can be used to override this.
56    
57    
58  OPTIONS  OPTIONS
59    
60           --        This terminates the list of options. It is useful if the  next
61       -V        Write the version number of the PCRE library being                   item  on  the command line starts with a hyphen but is not an
62                 used to the standard error stream.                   option. This allows for the processing of patterns and  file-
63                     names that start with hyphens.
64       -c        Do not print individual lines; instead just  print  
65                 a  count  of the number of lines that would other-         -A number, --after-context=number
66                 wise have  been  printed.  If  several  files  are                   Output  number  lines of context after each matching line. If
67                 given, a count is printed for each of them.                   filenames and/or line numbers are being output, a hyphen sep-
68                     arator  is  used  instead of a colon for the context lines. A
69       -ffilename                   line containing "--" is output between each group  of  lines,
70                 Read a number of patterns from the file,  one  per                   unless  they  are  in  fact contiguous in the input file. The
71                 line,  and  match all of them against each line of                   value of number is expected to be relatively small.  However,
72                 input. A line is output if  any  of  the  patterns                   pcregrep guarantees to have up to 8K of following text avail-
73                 match  it.   When  -f is used, no pattern is taken                   able for context output.
74                 from the command line; all arguments  are  treated  
75                 as file names. There is a maximum of 100 patterns.         -B number, --before-context=number
76                 Trailing white space is removed, and  blank  lines                   Output number lines of context before each matching line.  If
77                 are  ignored.  An  empty file contains no patterns                   filenames and/or line numbers are being output, a hyphen sep-
78                 and therefore matches nothing.                   arator is used instead of a colon for the  context  lines.  A
79                     line  containing  "--" is output between each group of lines,
80       -h        Suppress printing of filenames when searching mul-                   unless they are in fact contiguous in  the  input  file.  The
81                 tiple files.                   value  of number is expected to be relatively small. However,
82                     pcregrep guarantees to have up to 8K of preceding text avail-
83       -i        Ignore upper/lower case distinctions  during  com-                   able for context output.
84                 parisons.  
85           -C number, --context=number
86       -l        Instead of printing lines  from  the  files,  just                   Output  number  lines  of  context both before and after each
87                 print the names of the files containing lines that                   matching line.  This is equivalent to setting both -A and  -B
88                 would have been printed. Each file name is printed                   to the same value.
89                 once, on a separate line.  
90           -c, --count
91       -n        Precede each line by its line number in the file.                   Do  not  output individual lines; instead just output a count
92                     of the number of lines that would otherwise have been output.
93       -r        If any file is a directory, recursively  scan  the                   If  several  files  are  given, a count is output for each of
94                 files  it  contains.  Without  -r  a  directory is                   them. In this mode, the -A, -B, and -C options are ignored.
95                 scanned as a normal file.  
96           --colour, --color
97       -s        Work silently, that  is,  display  nothing  except                   If this option is given without any data, it is equivalent to
98                 error messages.  The exit status indicates whether                   "--colour=auto".   If  data  is required, it must be given in
99                 any matches were found.                   the same shell item, separated by an equals sign.
100    
101       -u        Operate in UTF-8 mode. This  option  is  available         --colour=value, --color=value
102                 only if PCRE has been compiled with UTF-8 support.                   This option specifies under what circumstances the part of  a
103                 Both the pattern and each subject line are assumed                   line that matched a pattern should be coloured in the output.
104                 to be valid strings of UTF-8 characters.                   The value may be "never" (the default), "always", or  "auto".
105                     In  the  latter  case, colouring happens only if the standard
106       -v        Invert the sense of the match, so that lines which                   output is connected to a terminal. The colour can  be  speci-
107                 do not match the pattern are now the ones that are                   fied  by  setting the environment variable PCREGREP_COLOUR or
108                 found.                   PCREGREP_COLOR. The value of this variable should be a string
109                     of  two  numbers,  separated by a semicolon.  They are copied
110       -x        Force the pattern to be anchored  (it  must  start                   directly into the control string for setting colour on a ter-
111                 matching  at  the  beginning  of  the line) and in                   minal,  so it is your responsibility to ensure that they make
112                 addition, require it to  match  the  entire  line.                   sense. If neither of the environment variables  is  set,  the
113                 This is equivalent to having ^ and $ characters at                   default is "1;31", which gives red.
114                 the start and end of each  alternative  branch  in  
115                 the regular expression.         -D action, --devices=action
116                     If  an  input  path  is  not  a  regular file or a directory,
117                     "action" specifies how it is to be  processed.  Valid  values
118  LONG OPTIONS                   are  "read" (the default) or "skip" (silently skip the path).
119    
120       Long forms of all the options are available, as in GNU grep.         -d action, --directories=action
121       They are shown in the following table:                   If an input path is a directory, "action" specifies how it is
122                     to  be  processed.   Valid  values  are "read" (the default),
123         -c   --count                   "recurse" (equivalent to the -r option), or "skip"  (silently
124         -h   --no-filename                   skip  the path). In the default case, directories are read as
125         -i   --ignore-case                   if they were ordinary files. In some  operating  systems  the
126         -l   --files-with-matches                   effect  of reading a directory like this is an immediate end-
127         -n   --line-number                   of-file.
128         -r   --recursive  
129         -s   --no-messages         -e pattern, --regex=pattern,
130         -u   --utf-8                   --regexp=pattern Specify a pattern to be matched. This option
131         -V   --version                   can  be  used multiple times in order to specify several pat-
132         -v   --invert-match                   terns. It can also be used as a way of  specifying  a  single
133         -x   --line-regex                   pattern  that starts with a hyphen. When -e is used, no argu-
134         -x   --line-regexp                   ment pattern is taken from the command  line;  all  arguments
135                     are treated as file names. There is an overall maximum of 100
136       In addition, --file=filename is  equivalent  to  -ffilename,                   patterns. They are applied to each line in the order in which
137       and --help shows the list of options and then exits.                   they  are  defined until one matches (or fails to match if -v
138                     is used). If -f is used with -e, the  command  line  patterns
139                     are  matched  first,  followed by the patterns from the file,
140                     independent of the order in which these  options  are  speci-
141                     fied.  Note that multiple use of -e is not the same as a sin-
142                     gle pattern with alternatives. For  example,  X|Y  finds  the
143                     first  character in a line that is X or Y, whereas if the two
144                     patterns are given separately, pcregrep  finds  X  if  it  is
145                     present, even if it follows Y in the line. It finds Y only if
146                     there is no X in the line. This really matters  only  if  you
147                     are using -o to show the portion of the line that matched.
148    
149           --exclude=pattern
150                     When pcregrep is searching the files in a directory as a con-
151                     sequence of the -r (recursive search) option, any files whose
152                     names  match  the pattern are excluded. The pattern is a PCRE
153                     regular expression. If a file name matches both --include and
154                     --exclude,  it  is  excluded. There is no short form for this
155                     option.
156    
157           -F, --fixed-strings
158                     Interpret each pattern as a list of fixed strings,  separated
159                     by  newlines,  instead  of  as  a  regular expression. The -w
160                     (match as a word) and -x (match whole line)  options  can  be
161                     used with -F. They apply to each of the fixed strings. A line
162                     is selected if any of the fixed strings are found in it (sub-
163                     ject to -w or -x, if present).
164    
165           -f filename, --file=filename
166                     Read  a  number  of patterns from the file, one per line, and
167                     match them against each line of input. A data line is  output
168                     if any of the patterns match it. The filename can be given as
169                     "-" to refer to the standard input. When -f is used, patterns
170                     specified  on  the command line using -e may also be present;
171                     they are tested before the file's patterns. However, no other
172                     pattern  is  taken  from  the command line; all arguments are
173                     treated as file names. There is an  overall  maximum  of  100
174                     patterns. Trailing white space is removed from each line, and
175                     blank lines are ignored. An empty file contains  no  patterns
176                     and therefore matches nothing.
177    
178           -H, --with-filename
179                     Force  the  inclusion  of the filename at the start of output
180                     lines when searching a single file. By default, the  filename
181                     is  not  shown in this case. For matching lines, the filename
182                     is followed by a colon and a  space;  for  context  lines,  a
183                     hyphen separator is used. If a line number is also being out-
184                     put, it follows the file name without a space.
185    
186           -h, --no-filename
187                     Suppress the output filenames when searching multiple  files.
188                     By  default,  filenames  are  shown  when  multiple files are
189                     searched. For matching lines, the filename is followed  by  a
190                     colon  and  a space; for context lines, a hyphen separator is
191                     used. If a line number is also being output, it  follows  the
192                     file name without a space.
193    
194           --help    Output a brief help message and exit.
195    
196           -i, --ignore-case
197                     Ignore upper/lower case distinctions during comparisons.
198    
199           --include=pattern
200                     When pcregrep is searching the files in a directory as a con-
201                     sequence of the -r  (recursive  search)  option,  only  those
202                     files whose names match the pattern are included. The pattern
203                     is a PCRE regular expression. If a  file  name  matches  both
204                     --include  and  --exclude,  it is excluded. There is no short
205                     form for this option.
206    
207           -L, --files-without-match
208                     Instead of outputting lines from the files, just  output  the
209                     names  of  the files that do not contain any lines that would
210                     have been output. Each file name is output once, on  a  sepa-
211                     rate line.
212    
213           -l, --files-with-matches
214                     Instead  of  outputting lines from the files, just output the
215                     names of the files containing lines that would have been out-
216                     put.  Each  file  name  is  output  once, on a separate line.
217                     Searching stops as soon as a matching  line  is  found  in  a
218                     file.
219    
220           --label=name
221                     This option supplies a name to be used for the standard input
222                     when file names are being output. If not supplied, "(standard
223                     input)" is used. There is no short form for this option.
224    
225           --locale=locale-name
226                     This  option specifies a locale to be used for pattern match-
227                     ing. It overrides the value in the LC_ALL or  LC_CTYPE  envi-
228                     ronment  variables.  If  no  locale  is  specified,  the PCRE
229                     library's default (usually the "C" locale) is used. There  is
230                     no short form for this option.
231    
232           -M, --multiline
233                     Allow  patterns to match more than one line. When this option
234                     is given, patterns may usefully contain literal newline char-
235                     acters  and  internal  occurrences of ^ and $ characters. The
236                     output for any one match may consist of more than  one  line.
237                     When  this option is set, the PCRE library is called in "mul-
238                     tiline" mode.  There is a limit to the number of  lines  that
239                     can  be matched, imposed by the way that pcregrep buffers the
240                     input file as it scans it. However, pcregrep ensures that  at
241                     least 8K characters or the rest of the document (whichever is
242                     the shorter) are available for forward  matching,  and  simi-
243                     larly the previous 8K characters (or all the previous charac-
244                     ters, if fewer than 8K) are guaranteed to  be  available  for
245                     lookbehind assertions.
246    
247           -N newline-type, --newline=newline-type
248                     The PCRE library supports three different character sequences
249                     for indicating the ends of lines. They are the single-charac-
250                     ter sequences CR (carriage return) and LF (linefeed), and the
251                     two-character sequence CR, LF. When the library is  built,  a
252                     default  line-ending  sequence is specified. This is normally
253                     the standard sequence for the operating system. Unless other-
254                     wise specified by this option, pcregrep uses the default. The
255                     possible values for this option are CR,  LF,  or  CRLF.  This
256                     makes  it  possible  to  use pcregrep on files that have come
257                     from other environments without having to modify  their  line
258                     endings.   If  the  data that is being scanned does not agree
259                     with the convention set by this option, pcregrep  may  behave
260                     in strange ways.
261    
262           -n, --line-number
263                     Precede each output line by its line number in the file, fol-
264                     lowed by a colon and a space for matching lines or  a  hyphen
265                     and  a space for context lines. If the filename is also being
266                     output, it precedes the line number.
267    
268           -o, --only-matching
269                     Show only the part of the line that  matched  a  pattern.  In
270                     this  mode,  no context is shown. That is, the -A, -B, and -C
271                     options are ignored.
272    
273           -q, --quiet
274                     Work quietly, that is, display nothing except error messages.
275                     The  exit  status  indicates  whether or not any matches were
276                     found.
277    
278           -r, --recursive
279                     If any given path is a directory, recursively scan the  files
280                     it  contains, taking note of any --include and --exclude set-
281                     tings. By default, a directory is read as a normal  file;  in
282                     some  operating  systems this gives an immediate end-of-file.
283                     This option is a shorthand  for  setting  the  -d  option  to
284                     "recurse".
285    
286           -s, --no-messages
287                     Suppress  error  messages  about  non-existent  or unreadable
288                     files. Such files are quietly skipped.  However,  the  return
289                     code is still 2, even if matches were found in other files.
290    
291           -u, --utf-8
292                     Operate  in UTF-8 mode. This option is available only if PCRE
293                     has been compiled with UTF-8 support. Both patterns and  sub-
294                     ject lines must be valid strings of UTF-8 characters.
295    
296           -V, --version
297                     Write  the  version  numbers of pcregrep and the PCRE library
298                     that is being used to the standard error stream.
299    
300           -v, --invert-match
301                     Invert the sense of the match, so that  lines  which  do  not
302                     match any of the patterns are the ones that are found.
303    
304           -w, --word-regex, --word-regexp
305                     Force the patterns to match only whole words. This is equiva-
306                     lent to having \b at the start and end of the pattern.
307    
308           -x, --line-regex, --line-regexp
309                     Force the patterns to be anchored (each must  start  matching
310                     at  the beginning of a line) and in addition, require them to
311                     match entire lines. This is equivalent  to  having  ^  and  $
312                     characters at the start and end of each alternative branch in
313                     every pattern.
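
The equivalences stated for -w and -x can be tried out with Python's re module as a stand-in for PCRE. This is an analogy, not pcregrep itself: the helper names are hypothetical, and wrapping the pattern in a non-capturing group is an assumption made here so that the anchors apply to every alternative branch.

```python
import re


def as_word_pattern(pattern):
    # -w analogue: \b at the start and end of the pattern.
    return r"\b(?:" + pattern + r")\b"


def as_line_pattern(pattern):
    # -x analogue: the group makes ^ and $ apply to each alternative branch.
    return r"^(?:" + pattern + r")$"


# "cat" inside a longer word is not a whole word.
print(bool(re.search(as_word_pattern("cat"), "concatenate")))
print(bool(re.search(as_word_pattern("cat"), "the cat sat")))

# Anchoring only the outer ends of "cat|dog" still matches "hotdog",
# because the second branch is just "dog$".
print(bool(re.search(r"^cat|dog$", "hotdog")))
print(bool(re.search(as_line_pattern("cat|dog"), "hotdog")))
print(bool(re.search(as_line_pattern("cat|dog"), "dog")))
```

The third call shows why the description speaks of each alternative branch: anchors placed only at the two ends of the pattern text bind to the first and last branches separately.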
314    
315    
316    ENVIRONMENT VARIABLES
317    
318           The environment variables LC_ALL and LC_CTYPE  are  examined,  in  that
319           order,  for  a  locale.  The first one that is set is used. This can be
320           overridden by the --locale option.  If  no  locale  is  set,  the  PCRE
321           library's default (usually the "C" locale) is used.
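
The lookup order described above can be sketched as follows. The function choose_locale is hypothetical; it takes the environment as a plain mapping so that it can be exercised without changing the real process environment, and it returns None to stand for "use the PCRE library's default".

```python
def choose_locale(environ, locale_option=None):
    """Return the locale name to use, or None for the library default."""
    if locale_option is not None:
        # --locale overrides anything found in the environment.
        return locale_option
    for name in ("LC_ALL", "LC_CTYPE"):
        # The first variable that is set wins; LC_ALL is examined first.
        if name in environ:
            return environ[name]
    return None


print(choose_locale({"LC_CTYPE": "fr_FR", "LC_ALL": "en_GB"}))
print(choose_locale({"LC_CTYPE": "fr_FR"}))
print(choose_locale({"LC_ALL": "en_GB"}, locale_option="de_DE"))
print(choose_locale({}))
```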
322    
323    
324    NEWLINES
325    
326           The  -N (--newline) option allows pcregrep to scan files with different
327           newline conventions from the default.  However,  the  setting  of  this
328           option  does not affect the way in which pcregrep writes information to
329           the standard error and output streams. It uses the  string  "\n"  in  C
330           printf()  calls  to  indicate newlines, relying on the C I/O library to
331           convert this to an appropriate sequence if the  output  is  sent  to  a
332           file.
333    
334    
335    OPTIONS COMPATIBILITY
336    
337           The majority of short and long forms of pcregrep's options are the same
338           as in the GNU grep program. Any long option of  the  form  --xxx-regexp
339           (GNU  terminology) is also available as --xxx-regex (PCRE terminology).
340           However, the --locale, -M, --multiline, -u,  and  --utf-8  options  are
341           specific to pcregrep.
342    
343    
344    OPTIONS WITH DATA
345    
346           There are four different ways in which an option with data can be spec-
347           ified.  If a short form option is used, the  data  may  follow  immedi-
348           ately, or in the next command line item. For example:
349    
350             -f/some/file
351             -f /some/file
352    
353           If  a long form option is used, the data may appear in the same command
354           line item, separated by an equals character, or (with one exception) it
355           may appear in the next command line item. For example:
356    
357             --file=/some/file
358             --file /some/file
359    
360           Note,  however, that if you want to supply a file name beginning with ~
361           as data in a shell command, and have the  shell  expand  ~  to  a  home
362           directory, you must separate the file name from the option, because the
363           shell does not treat ~ specially unless it is at the start of an  item.
364    
365           The  exception  to  the  above is the --colour (or --color) option, for
366           which the data is optional. If this option does have data, it  must  be
367           given  in  the first form, using an equals character. Otherwise it will
368           be assumed that it has no data.
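
As a rough sketch of the four forms and the --colour exception, the following Python function splits command-line items into option and data pairs. It is not pcregrep's parser: the names parse_items, SHORT_TO_LONG, and OPTIONAL_DATA are hypothetical, only two short options are listed, and every long option other than --colour and --color is assumed to take data.

```python
SHORT_TO_LONG = {"-f": "--file", "-e": "--regex"}  # illustrative subset only
OPTIONAL_DATA = {"--colour", "--color"}


def parse_items(items):
    """Return a list of (long_option, data) pairs for the given items."""
    parsed = []
    i = 0
    while i < len(items):
        item = items[i]
        if item.startswith("--"):
            name, equals, data = item.partition("=")
            if name in OPTIONAL_DATA:
                # Optional data is only ever taken from the same item.
                data = data if equals else None
            elif not equals:
                # Long form with the data in the next command-line item.
                i += 1
                if i >= len(items):
                    raise ValueError("missing data for " + name)
                data = items[i]
        else:
            name, data = SHORT_TO_LONG[item[:2]], item[2:]
            if not data:
                # Short form with the data in the next command-line item.
                i += 1
                if i >= len(items):
                    raise ValueError("missing data for " + name)
                data = items[i]
        parsed.append((name, data))
        i += 1
    return parsed


print(parse_items(["-f/some/file", "-f", "/some/file"]))
print(parse_items(["--file=/some/file", "--file", "/some/file"]))
print(parse_items(["--colour", "--file", "/some/file"]))
```

In the last call, --colour is followed by another item, but that item is never consumed as its data, which mirrors the rule that data for --colour must use the equals form.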
369    
370    
371    MATCHING ERRORS
372    
373           It is possible to supply a regular expression that takes  a  very  long
374           time  to  fail  to  match certain lines. Such patterns normally involve
375           nested indefinite repeats, for example: (a+)*\d when matched against  a
376           line  of  a's  with  no  final  digit. The PCRE matching function has a
377           resource limit that causes it to abort in these circumstances. If  this
378           happens, pcregrep outputs an error message and the line that caused the
379           problem to the standard error stream. If there are more  than  20  such
380           errors, pcregrep gives up.
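
The effect of nested indefinite repeats can be observed with any backtracking matcher. The sketch below uses Python's re module rather than PCRE, so the timings only illustrate the general behaviour; Python's re has no resource limit of the kind described above, which is why the line lengths are kept small. The function name time_failed_match is hypothetical.

```python
import re
import time

PATTERN = re.compile(r"(a+)*\d")


def time_failed_match(n):
    """Return the seconds spent failing to match a line of n a's."""
    line = "a" * n
    start = time.perf_counter()
    result = PATTERN.search(line)
    elapsed = time.perf_counter() - start
    assert result is None  # there is no digit, so the match must fail
    return elapsed


# On a backtracking engine the time typically grows rapidly with n.
for n in (8, 12, 16):
    print(n, f"{time_failed_match(n):.4f}s")
```

Adding a final digit to the line lets the same pattern succeed quickly, which is why such patterns often go unnoticed until they meet a line that almost matches.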
381    
382    
383  DIAGNOSTICS  DIAGNOSTICS
384    
385       Exit status is 0 if any matches were found, 1 if no  matches         Exit status is 0 if any matches were found, 1 if no matches were found,
386       were  found,  and  2  for syntax errors or inaccessible files         and 2 for syntax errors and non-existent or inaccessible files (even  if
387       (even if matches were found).         matches  were  found in other files) or too many matching errors. Using
388           the -s option to suppress error messages about inaccessible  files  does
389           not affect the return code.
390    
391    
392  AUTHOR  AUTHOR
393    
394       Philip Hazel <ph10@cam.ac.uk>         Philip Hazel
395       University Computing Service         University Computing Service
396       Cambridge CB2 3QG, England.         Cambridge CB2 3QG, England.
397    
398  Last updated: 03 February 2003  Last updated: 06 June 2006
399  Copyright (c) 1997-2003 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.
