
Diff of /code/trunk/doc/pcregrep.txt


revision 63 by nigel, Sat Feb 24 21:40:03 2007 UTC revision 87 by nigel, Sat Feb 24 21:41:21 2007 UTC
# Line 1  Line 1 
1    PCREGREP(1)                                                        PCREGREP(1)
2    
3    
4  NAME  NAME
5       pcregrep - a grep with Perl-compatible regular expressions.         pcregrep - a grep with Perl-compatible regular expressions.
6    
7    
8  SYNOPSIS  SYNOPSIS
9       pcregrep [-Vcfhilnrsuvx] [long options] [pattern] [file1         pcregrep [options] [long options] [pattern] [path1 path2 ...]
10    
11    
12  DESCRIPTION  DESCRIPTION
13    
14       pcregrep searches files for character patterns, in the  same         pcregrep  searches  files  for  character  patterns, in the same way as
15       way  as other grep commands do, but it uses the PCRE regular         other grep commands do, but it uses the PCRE regular expression library
16       expression library to support patterns that  are  compatible         to support patterns that are compatible with the regular expressions of
17       with  the regular expressions of Perl 5. See pcrepattern for         Perl 5. See pcrepattern for a full description of syntax and  semantics
18       a full description of syntax and semantics  of  the  regular         of the regular expressions that PCRE supports.
19       expressions that PCRE supports.  
20           Patterns,  whether  supplied on the command line or in a separate file,
21       A pattern must be specified on the command line  unless  the         are given without delimiters. For example:
22       -f option is used (see below).  
23             pcregrep Thursday /etc/motd
24       If no files  are  specified,  pcregrep  reads  the  standard  
25       input.  By  default,  each  line that matches the pattern is         If you attempt to use delimiters (for example, by surrounding a pattern
26       copied to the standard output, and if there is more than one         with  slashes,  as  is common in Perl scripts), they are interpreted as
27       file,  the  file name is printed before each line of output.         part of the pattern. Quotes can of course be used on the  command  line
28       However, there are options  that  can  change  how  pcregrep         because they are interpreted by the shell, and indeed they are required
29       behaves.         if a pattern contains white space or shell metacharacters.
30    
31       Lines are limited to BUFSIZ characters. BUFSIZ is defined in         The first argument that follows any option settings is treated  as  the
32       <stdio.h>.  The newline character is removed from the end of         single  pattern  to be matched when neither -e nor -f is present.  Con-
33       each line before it is matched against the pattern.         versely, when one or both of these options are  used  to  specify  pat-
34           terns, all arguments are treated as path names. At least one of -e, -f,
35           or an argument pattern must be provided.
36    
37           If no files are specified, pcregrep reads the standard input. The stan-
38           dard  input  can  also  be  referenced by a name consisting of a single
39           hyphen.  For example:
40    
41             pcregrep some-pattern /file1 - /file3
42    
43           By default, each line that matches the pattern is copied to  the  stan-
44           dard  output, and if there is more than one file, the file name is out-
45           put at the start of each line. However,  there  are  options  that  can
46           change how pcregrep behaves. In particular, the -M option makes it pos-
47           sible to search for patterns that span line boundaries.
48    
49           Patterns are limited to 8K  or  BUFSIZ  characters,  whichever  is  the
50           greater.  BUFSIZ is defined in <stdio.h>.
51    
52           If  the  LC_ALL  or LC_CTYPE environment variable is set, pcregrep uses
53           the value to set a locale when calling the PCRE library.  The  --locale
54           option can be used to override this.
55    
56    
57  OPTIONS  OPTIONS
58    
59           --        This terminates the list of options. It is useful if the next
60       -V        Write the version number of the PCRE library being                   item on the command line starts with a hyphen but is  not  an
61                 used to the standard error stream.                   option.  This allows for the processing of patterns and file-
62                     names that start with hyphens.
63       -c        Do not print individual lines; instead just  print  
64                 a  count  of the number of lines that would other-         -A number, --after-context=number
65                 wise have  been  printed.  If  several  files  are                   Output number lines of context after each matching  line.  If
66                 given, a count is printed for each of them.                   filenames and/or line numbers are being output, a hyphen sep-
67                     arator is used instead of a colon for the  context  lines.  A
68       -ffilename                   line  containing  "--" is output between each group of lines,
69                 Read a number of patterns from the file,  one  per                   unless they are in fact contiguous in  the  input  file.  The
70                 line,  and  match all of them against each line of                   value  of number is expected to be relatively small. However,
71                 input. A line is output if  any  of  the  patterns                   pcregrep guarantees to have up to 8K of following text avail-
72                 match  it.   When  -f is used, no pattern is taken                   able for context output.
73                 from the command line; all arguments  are  treated  
74                 as file names. There is a maximum of 100 patterns.         -B number, --before-context=number
75                 Trailing white space is removed, and  blank  lines                   Output  number lines of context before each matching line. If
76                 are  ignored.  An  empty file contains no patterns                   filenames and/or line numbers are being output, a hyphen sep-
77                 and therefore matches nothing.                   arator  is  used  instead of a colon for the context lines. A
78                     line containing "--" is output between each group  of  lines,
79       -h        Suppress printing of filenames when searching mul-                   unless  they  are  in  fact contiguous in the input file. The
80                 tiple files.                   value of number is expected to be relatively small.  However,
81                     pcregrep guarantees to have up to 8K of preceding text avail-
82       -i        Ignore upper/lower case distinctions  during  com-                   able for context output.
83                 parisons.  
84           -C number, --context=number
85       -l        Instead of printing lines  from  the  files,  just                   Output number lines of context both  before  and  after  each
86                 print the names of the files containing lines that                   matching  line.  This is equivalent to setting both -A and -B
87                 would have been printed. Each file name is printed                   to the same value.
88                 once, on a separate line.  
89           -c, --count
90       -n        Precede each line by its line number in the file.                   Do not output individual lines; instead just output  a  count
91                     of the number of lines that would otherwise have been output.
92       -r        If any file is a directory, recursively  scan  the                   If several files are given, a count is  output  for  each  of
93                 files  it  contains.  Without  -r  a  directory is                   them. In this mode, the -A, -B, and -C options are ignored.
94                 scanned as a normal file.  
95           --colour, --color
96       -s        Work silently, that  is,  display  nothing  except                   If this option is given without any data, it is equivalent to
97                 error messages.  The exit status indicates whether                   "--colour=auto".  If data is required, it must  be  given  in
98                 any matches were found.                   the same shell item, separated by an equals sign.
99    
100       -u        Operate in UTF-8 mode. This  option  is  available         --colour=value, --color=value
101                 only if PCRE has been compiled with UTF-8 support.                   This  option specifies under what circumstances the part of a
102                 Both the pattern and each subject line are assumed                   line that matched a pattern should be coloured in the output.
103                 to be valid strings of UTF-8 characters.                   The  value may be "never" (the default), "always", or "auto".
104                     In the latter case, colouring happens only  if  the  standard
105       -v        Invert the sense of the match, so that lines which                   output  is  connected to a terminal. The colour can be speci-
106                 do not match the pattern are now the ones that are                   fied by setting the environment variable  PCREGREP_COLOUR  or
107                 found.                   PCREGREP_COLOR. The value of this variable should be a string
108                     of two numbers, separated by a semicolon.   They  are  copied
109       -x        Force the pattern to be anchored  (it  must  start                   directly into the control string for setting colour on a ter-
110                 matching  at  the  beginning  of  the line) and in                   minal, so it is your responsibility to ensure that they  make
111                 addition, require it to  match  the  entire  line.                   sense.  If  neither  of the environment variables is set, the
112                 This is equivalent to having ^ and $ characters at                   default is "1;31", which gives red.
113                 the start and end of each  alternative  branch  in  
114                 the regular expression.         -D action, --devices=action
115                     If an input path is  not  a  regular  file  or  a  directory,
116                     "action"  specifies  how  it is to be processed. Valid values
117  LONG OPTIONS                   are "read" (the default) or "skip" (silently skip the  path).
118    
119       Long forms of all the options are available, as in GNU grep.         -d action, --directories=action
120       They are shown in the following table:                   If an input path is a directory, "action" specifies how it is
121                     to be processed.  Valid  values  are  "read"  (the  default),
122         -c   --count                   "recurse"  (equivalent to the -r option), or "skip" (silently
123         -h   --no-filename                   skip the path). In the default case, directories are read  as
124         -i   --ignore-case                   if  they  were  ordinary files. In some operating systems the
125         -l   --files-with-matches                   effect of reading a directory like this is an immediate  end-
126         -n   --line-number                   of-file.
127         -r   --recursive  
128         -s   --no-messages         -e pattern, --regex=pattern,
129         -u   --utf-8                   --regexp=pattern Specify a pattern to be matched. This option
130         -V   --version                   can be used multiple times in order to specify  several  pat-
131         -v   --invert-match                   terns.  It  can  also be used as a way of specifying a single
132         -x   --line-regex                   pattern that starts with a hyphen. When -e is used, no  argu-
133         -x   --line-regexp                   ment  pattern  is  taken from the command line; all arguments
134                     are treated as file names. There is an overall maximum of 100
135       In addition, --file=filename is  equivalent  to  -ffilename,                   patterns. They are applied to each line in the order in which
136       and --help shows the list of options and then exits.                   they are defined until one matches (or fails to match  if  -v
137                     is  used).  If  -f is used with -e, the command line patterns
138                     are matched first, followed by the patterns  from  the  file,
139                     independent  of  the  order in which these options are speci-
140                     fied. Note that multiple use of -e is not the same as a  sin-
141                     gle  pattern  with  alternatives.  For example, X|Y finds the
142                     first character in a line that is X or Y, whereas if the  two
143                     patterns  are  given  separately,  pcregrep  finds X if it is
144                     present, even if it follows Y in the line. It finds Y only if
145                     there  is  no  X in the line. This really matters only if you
146                     are using -o to show the portion of the line that matched.
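
The difference between one pattern with alternatives and several separate patterns can be modelled in a few lines. This is only a sketch: it uses Python's standard re module as a stand-in for PCRE, and the helper name first_match is hypothetical, not part of pcregrep.

```python
import re

def first_match(patterns, line):
    """Try each pattern in order and return the text matched by the
    first pattern that matches anywhere in the line, or None."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None

line = "aYbX"
# One pattern with alternatives: the leftmost X or Y in the line is found.
print(first_match(["X|Y"], line))
# Two separate patterns: X is tried first, so it is reported even
# though Y occurs earlier in the line.
print(first_match(["X", "Y"], line))
```

As the text says, the distinction is visible only when the matched portion itself is shown, as with -o; the set of selected lines is the same either way.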
147    
148           --exclude=pattern
149                     When pcregrep is searching the files in a directory as a con-
150                     sequence of the -r (recursive search) option, any files whose
151                     names match the pattern are excluded. The pattern is  a  PCRE
152                     regular expression. If a file name matches both --include and
153                     --exclude, it is excluded. There is no short  form  for  this
154                     option.
155    
156           -F, --fixed-strings
157                     Interpret  each pattern as a list of fixed strings, separated
158                     by newlines, instead of  as  a  regular  expression.  The  -w
159                     (match  as  a  word) and -x (match whole line) options can be
160                     used with -F. They apply to each of the fixed strings. A line
161                     is selected if any of the fixed strings are found in it (sub-
162                     ject to -w or -x, if present).
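
A rough model of fixed-string selection, assuming plain substring comparison, might look like this. The function name and the whole_line parameter are hypothetical; the -w case and the handling of empty strings are deliberately not modelled.

```python
def fixed_string_match(pattern, line, whole_line=False):
    """Treat pattern as newline-separated fixed strings and report
    whether any of them selects the line.

    whole_line=True models -x: a string must equal the entire line.
    """
    strings = pattern.split("\n")
    if whole_line:
        return any(line == s for s in strings)
    return any(s in line for s in strings)

# "a.c" is compared literally, so the dot does not act as a wildcard.
print(fixed_string_match("a.c", "abc"))
print(fixed_string_match("foo\nbar", "a bar b"))
print(fixed_string_match("foo\nbar", "a bar b", whole_line=True))
```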
163    
164           -f filename, --file=filename
165                     Read a number of patterns from the file, one  per  line,  and
166                     match  them against each line of input. A data line is output
167                     if any of the patterns match it. The filename can be given as
168                     "-" to refer to the standard input. When -f is used, patterns
169                     specified on the command line using -e may also  be  present;
170                     they are tested before the file's patterns. However, no other
171                     pattern is taken from the command  line;  all  arguments  are
172                     treated  as  file  names.  There is an overall maximum of 100
173                     patterns. Trailing white space is removed from each line, and
174                     blank  lines  are ignored. An empty file contains no patterns
175                     and therefore matches nothing.
176    
177           -H, --with-filename
178                     Force the inclusion of the filename at the  start  of  output
179                     lines  when searching a single file. By default, the filename
180                     is not shown in this case. For matching lines,  the  filename
181                     is  followed  by  a  colon  and a space; for context lines, a
182                     hyphen separator is used. If a line number is also being out-
183                     put, it follows the file name without a space.
184    
185           -h, --no-filename
186                     Suppress  the output filenames when searching multiple files.
187                     By default, filenames  are  shown  when  multiple  files  are
188                     searched.  For  matching lines, the filename is followed by a
189                     colon and a space; for context lines, a hyphen  separator  is
190                     used.  If  a line number is also being output, it follows the
191                     file name without a space.
192    
193           --help    Output a brief help message and exit.
194    
195           -i, --ignore-case
196                     Ignore upper/lower case distinctions during comparisons.
197    
198           --include=pattern
199                     When pcregrep is searching the files in a directory as a con-
200                     sequence  of  the  -r  (recursive  search) option, only those
201                     files whose names match the pattern are included. The pattern
202                     is  a  PCRE  regular  expression. If a file name matches both
203                     --include and --exclude, it is excluded. There  is  no  short
204                     form for this option.
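
The interaction of --include and --exclude can be sketched as a filter in which exclusion always wins. The function below is hypothetical and uses Python's re module in place of PCRE; whether pcregrep tests the base name or the full path, and whether the match is anchored, are assumptions here (an unanchored search on whatever name is passed in).

```python
import re

def is_selected(name, include=None, exclude=None):
    """Decide whether a file found during a recursive search is scanned."""
    # A name that matches --exclude is dropped even if --include matches too.
    if exclude is not None and re.search(exclude, name):
        return False
    # When --include is given, only matching names are kept.
    if include is not None and not re.search(include, name):
        return False
    return True

for name in ("main.c", "main.h", "test_main.c"):
    print(name, is_selected(name, include=r"\.c$", exclude=r"^test_"))
```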
205    
206           -L, --files-without-match
207                     Instead  of  outputting lines from the files, just output the
208                     names of the files that do not contain any lines  that  would
209                     have  been  output. Each file name is output once, on a sepa-
210                     rate line.
211    
212           -l, --files-with-matches
213                     Instead of outputting lines from the files, just  output  the
214                     names of the files containing lines that would have been out-
215                     put. Each file name is  output  once,  on  a  separate  line.
216                     Searching  stops  as  soon  as  a matching line is found in a
217                     file.
218    
219           --label=name
220                     This option supplies a name to be used for the standard input
221                     when file names are being output. If not supplied, "(standard
222                     input)" is used. There is no short form for this option.
223    
224           --locale=locale-name
225                     This option specifies a locale to be used for pattern  match-
226                     ing.  It  overrides the value in the LC_ALL or LC_CTYPE envi-
227                     ronment variables.  If  no  locale  is  specified,  the  PCRE
228                     library's  default (usually the "C" locale) is used. There is
229                     no short form for this option.
230    
231           -M, --multiline
232                     Allow patterns to match more than one line. When this  option
233                     is given, patterns may usefully contain literal newline char-
234                     acters and internal occurrences of ^ and  $  characters.  The
235                     output  for  any one match may consist of more than one line.
236                     When this option is set, the PCRE library is called in  "mul-
237                     tiline"  mode.   There is a limit to the number of lines that
238                     can be matched, imposed by the way that pcregrep buffers  the
239                     input  file as it scans it. However, pcregrep ensures that at
240                     least 8K characters or the rest of the document (whichever is
241                     the  shorter)  are  available for forward matching, and simi-
242                     larly the previous 8K characters (or all the previous charac-
243                     ters,  if  fewer  than 8K) are guaranteed to be available for
244                     lookbehind assertions.
245    
246           -n, --line-number
247                     Precede each output line by its line number in the file, fol-
248                     lowed  by  a colon and a space for matching lines or a hyphen
249                     and a space for context lines. If the filename is also  being
250                     output, it precedes the line number.
251    
252           -o, --only-matching
253                     Show  only  the  part  of the line that matched a pattern. In
254                     this mode, no context is shown. That is, the -A, -B,  and  -C
255                     options are ignored.
256    
257           -q, --quiet
258                     Work quietly, that is, display nothing except error messages.
259                     The exit status indicates whether or  not  any  matches  were
260                     found.
261    
262           -r, --recursive
263                     If  any given path is a directory, recursively scan the files
264                     it contains, taking note of any --include and --exclude  set-
265                     tings.  By  default, a directory is read as a normal file; in
266                     some operating systems this gives an  immediate  end-of-file.
267                     This  option  is  a  shorthand  for  setting the -d option to
268                     "recurse".
269    
270           -s, --no-messages
271                     Suppress error  messages  about  non-existent  or  unreadable
272                     files.  Such  files  are quietly skipped. However, the return
273                     code is still 2, even if matches were found in other files.
274    
275           -u, --utf-8
276                     Operate in UTF-8 mode. This option is available only if  PCRE
277                     has  been compiled with UTF-8 support. Both patterns and sub-
278                     ject lines must be valid strings of UTF-8 characters.
279    
280           -V, --version
281                     Write the version numbers of pcregrep and  the  PCRE  library
282                     that is being used to the standard error stream.
283    
284           -v, --invert-match
285                     Invert  the  sense  of  the match, so that lines which do not
286                     match any of the patterns are the ones that are found.
287    
288           -w, --word-regex, --word-regexp
289                     Force the patterns to match only whole words. This is equiva-
290                     lent to having \b at the start and end of the pattern.
291    
292           -x, --line-regex, --line-regexp
293                     Force  the  patterns to be anchored (each must start matching
294                     at the beginning of a line) and in addition, require them  to
295                     match  entire  lines.  This  is  equivalent to having ^ and $
296                     characters at the start and end of each alternative branch in
297                     every pattern.
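
The equivalences described for -w and -x can be illustrated by wrapping a pattern before compiling it. This is a sketch using Python's re module rather than PCRE, with hypothetical helper names; a non-capturing group is used so that the added assertions apply to every alternative branch.

```python
import re

def as_word_pattern(pattern):
    """Model -w: require a word boundary on both sides of the match."""
    return r"\b(?:" + pattern + r")\b"

def as_line_pattern(pattern):
    """Model -x: require the match to cover the entire line."""
    return r"^(?:" + pattern + r")$"

print(bool(re.search(as_word_pattern("cat"), "concatenate")))
print(bool(re.search(as_word_pattern("cat|dog"), "hot dog")))
print(bool(re.search(as_line_pattern("cat|dog"), "hot dog")))
print(bool(re.search(as_line_pattern("cat|dog"), "dog")))
```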
298    
299    
300    ENVIRONMENT VARIABLES
301    
302           The  environment  variables  LC_ALL  and LC_CTYPE are examined, in that
303           order, for a locale. The first one that is set is  used.  This  can  be
304           overridden  by  the  --locale  option.  If  no  locale is set, the PCRE
305           library's default (usually the "C" locale) is used.
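
The precedence described above can be written as a small lookup. The function name is hypothetical, and returning None to stand for "use the PCRE library's default" is a convention of this sketch only.

```python
def choose_locale(environ, locale_option=None):
    """Return the locale name to use, or None for the library default.

    environ is a mapping of environment variables; locale_option is
    the value given to --locale, if any.
    """
    # --locale overrides anything found in the environment.
    if locale_option is not None:
        return locale_option
    # LC_ALL is examined before LC_CTYPE; the first one set is used.
    for name in ("LC_ALL", "LC_CTYPE"):
        if name in environ:
            return environ[name]
    return None

print(choose_locale({"LC_ALL": "fr_FR", "LC_CTYPE": "en_GB"}))
print(choose_locale({"LC_CTYPE": "en_GB"}))
print(choose_locale({"LC_ALL": "fr_FR"}, locale_option="de_DE"))
```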
306    
307    
308    OPTIONS COMPATIBILITY
309    
310           The majority of short and long forms of pcregrep's options are the same
311           as  in  the  GNU grep program. Any long option of the form --xxx-regexp
312           (GNU terminology) is also available as --xxx-regex (PCRE  terminology).
313           However,  the  --locale,  -M,  --multiline, -u, and --utf-8 options are
314           specific to pcregrep.
315    
316    
317    OPTIONS WITH DATA
318    
319           There are four different ways in which an option with data can be spec-
320           ified.   If  a  short  form option is used, the data may follow immedi-
321           ately, or in the next command line item. For example:
322    
323             -f/some/file
324             -f /some/file
325    
326           If a long form option is used, the data may appear in the same  command
327           line item, separated by an equals character, or (with one exception) it
328           may appear in the next command line item. For example:
329    
330             --file=/some/file
331             --file /some/file
332    
333           Note, however, that if you want to supply a file name beginning with  ~
334           as  data  in  a  shell  command,  and have the shell expand ~ to a home
335           directory, you must separate the file name from the option, because the
336           shell  does not treat ~ specially unless it is at the start of an item.
337    
338           The exception to the above is the --colour  (or  --color)  option,  for
339           which  the  data is optional. If this option does have data, it must be
340           given in the first form, using an equals character. Otherwise  it  will
341           be assumed that it has no data.
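
The four forms for an option with data are a common command line convention, and they can be tried out with Python's standard argparse module. This is only an illustration of the convention: argparse is not pcregrep's parser, the program name "sketch" is made up, and the --colour exception is not modelled.

```python
import argparse

parser = argparse.ArgumentParser(prog="sketch")
parser.add_argument("-f", "--file")

# The four ways of supplying the same data to one option.
FORMS = [
    ["-f/some/file"],
    ["-f", "/some/file"],
    ["--file=/some/file"],
    ["--file", "/some/file"],
]

for argv in FORMS:
    print(argv, "->", parser.parse_args(argv).file)
```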
342    
343    
344    MATCHING ERRORS
345    
346           It  is  possible  to supply a regular expression that takes a very long
347           time to fail to match certain lines.  Such  patterns  normally  involve
348           nested  indefinite repeats, for example: (a+)*\d when matched against a
349           line of a's with no final digit.  The  PCRE  matching  function  has  a
350           resource  limit that causes it to abort in these circumstances. If this
351           happens, pcregrep outputs an error message and the line that caused the
352           problem  to  the  standard error stream. If there are more than 20 such
353           errors, pcregrep gives up.
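
The behaviour of a nested indefinite repeat can be observed on short inputs. The sketch below uses Python's re module, which is a backtracking matcher but is not PCRE and has no comparable resource limit, so the subject lengths are kept small on purpose; the helper name is hypothetical and no particular timings are claimed.

```python
import re
import time

PATTERN = re.compile(r"(a+)*\d")

def time_failed_match(n):
    """Match the pattern against n a's with no final digit and return
    the result together with the elapsed time in seconds."""
    subject = "a" * n
    start = time.perf_counter()
    result = PATTERN.search(subject)
    return result, time.perf_counter() - start

# The match fails every time, but the time taken to discover this
# tends to grow quickly as the line gets longer.
for n in (8, 12, 16):
    result, elapsed = time_failed_match(n)
    print(n, result, f"{elapsed:.4f}s")
```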
354    
355    
356  DIAGNOSTICS  DIAGNOSTICS
357    
358       Exit status is 0 if any matches were found, 1 if no  matches         Exit status is 0 if any matches were found, 1 if no matches were found,
359       were  found,  and  2  for syntax errors or inaccessible files         and  2 for syntax errors and non-existent or inaccessible files (even if
360       (even if matches were found).         matches were found in other files) or too many matching  errors.  Using
361           the  -s  option to suppress error messages about inaccessible files does
362           not affect the return code.
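
The exit status rules reduce to a short decision, sketched here with a hypothetical function. The any_error flag stands for any of the error conditions listed above (syntax errors, non-existent or inaccessible files, too many matching errors).

```python
def exit_status(any_match, any_error):
    """Return the exit status described in DIAGNOSTICS."""
    # An error yields 2 even if matches were found in other files.
    if any_error:
        return 2
    return 0 if any_match else 1

print(exit_status(any_match=True, any_error=False))
print(exit_status(any_match=False, any_error=False))
print(exit_status(any_match=True, any_error=True))
```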
363    
364    
365  AUTHOR  AUTHOR
366    
367       Philip Hazel <ph10@cam.ac.uk>         Philip Hazel
368       University Computing Service         University Computing Service
369       Cambridge CB2 3QG, England.         Cambridge CB2 3QG, England.
370    
371  Last updated: 03 February 2003  Last updated: 23 January 2006
372  Copyright (c) 1997-2003 University of Cambridge.  Copyright (c) 1997-2006 University of Cambridge.

