Revision 87, Sat Feb 24 21:41:21 2007 UTC, by nigel
Load pcre-6.5 into code/trunk.

PCREGREP(1)                                                        PCREGREP(1)


NAME
       pcregrep - a grep with Perl-compatible regular expressions.


SYNOPSIS
       pcregrep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcregrep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE regular expression library
       to support patterns that are compatible with the regular expressions of
       Perl 5. See pcrepattern for a full description of syntax and semantics
       of the regular expressions that PCRE supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcregrep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used on the command line
       because they are interpreted by the shell, and indeed they are required
       if a pattern contains white space or shell metacharacters.
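
The effect of delimiters can be illustrated with a minimal sketch. It uses Python's re module as a stand-in for the PCRE library (an assumption made purely for illustration; pcregrep itself calls PCRE), and the helper name line_matches is hypothetical.

```python
import re

def line_matches(pattern, line):
    """Return True if the pattern is found anywhere in the line."""
    return re.search(pattern, line) is not None

motd = "Backups run every Thursday night."

# Without delimiters, the pattern matches the word itself.
plain = line_matches("Thursday", motd)

# With Perl-style slashes, the slashes are part of the pattern, so they
# would have to appear literally in the line for it to match.
delimited = line_matches("/Thursday/", motd)
```

Here the delimited form finds nothing, because the sample line contains no slash characters.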

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present. Con-
       versely, when one or both of these options are used to specify pat-
       terns, all arguments are treated as path names. At least one of -e, -f,
       or an argument pattern must be provided.

       If no files are specified, pcregrep reads the standard input. The stan-
       dard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcregrep some-pattern /file1 - /file3

       By default, each line that matches the pattern is copied to the stan-
       dard output, and if there is more than one file, the file name is out-
       put at the start of each line. However, there are options that can
       change how pcregrep behaves. In particular, the -M option makes it pos-
       sible to search for patterns that span line boundaries.

       Patterns are limited to 8K or BUFSIZ characters, whichever is the
       greater. BUFSIZ is defined in <stdio.h>.

       If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
       the value to set a locale when calling the PCRE library. The --locale
       option can be used to override this.


OPTIONS

       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and file-
                 names that start with hyphens.

       -A number, --after-context=number
                 Output number lines of context after each matching line. If
                 filenames and/or line numbers are being output, a hyphen sep-
                 arator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcregrep guarantees to have up to 8K of following text avail-
                 able for context output.

       -B number, --before-context=number
                 Output number lines of context before each matching line. If
                 filenames and/or line numbers are being output, a hyphen sep-
                 arator is used instead of a colon for the context lines. A
                 line containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. However,
                 pcregrep guarantees to have up to 8K of preceding text avail-
                 able for context output.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.
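
The context rules described for -A, -B, and -C can be modelled with a short sketch. It is not pcregrep's implementation: the function name grep_with_context is hypothetical, Python's re module stands in for PCRE, the 8K buffering guarantee is not modelled, and the "colon and a space" / "hyphen and a space" output format is taken from the description of -n.

```python
import re

def grep_with_context(lines, pattern, before=0, after=0):
    """Return numbered output lines with context (a model of -n with -A/-B)."""
    regex = re.compile(pattern)
    hits = {i for i, text in enumerate(lines) if regex.search(text)}

    # Collect every line index that falls inside some context window.
    wanted = set()
    for i in hits:
        wanted.update(range(max(0, i - before), min(len(lines), i + after + 1)))

    output = []
    previous = None
    for i in sorted(wanted):
        # A "--" line separates groups that are not contiguous in the input.
        if previous is not None and i != previous + 1:
            output.append("--")
        # Matching lines get a colon; context lines get a hyphen.
        separator = ":" if i in hits else "-"
        output.append(f"{i + 1}{separator} {lines[i]}")
        previous = i
    return output

sample = ["alpha", "beta", "MATCH one", "gamma", "delta", "epsilon", "MATCH two"]

# Equivalent in spirit to -C 1: one line of context on each side.
result = grep_with_context(sample, "MATCH", before=1, after=1)
```

In this sample the two groups are separated by the line "delta", which is in neither context window, so a "--" line appears between them.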

       -c, --count
                 Do not output individual lines; instead just output a count
                 of the number of lines that would otherwise have been output.
                 If several files are given, a count is output for each of
                 them. In this mode, the -A, -B, and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto". If data is required, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the part of a
                 line that matched a pattern should be coloured in the output.
                 The value may be "never" (the default), "always", or "auto".
                 In the latter case, colouring happens only if the standard
                 output is connected to a terminal. The colour can be speci-
                 fied by setting the environment variable PCREGREP_COLOUR or
                 PCREGREP_COLOR. The value of this variable should be a string
                 of two numbers, separated by a semicolon. They are copied
                 directly into the control string for setting colour on a ter-
                 minal, so it is your responsibility to ensure that they make
                 sense. If neither of the environment variables is set, the
                 default is "1;31", which gives red.
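
How the two numbers end up in a terminal control string can be sketched as follows. This is an illustration only: the function name colourise is hypothetical, the order in which the two environment variables are consulted is an assumption, and the reset sequence ESC [ 0 m is an assumption about how colouring is switched off again.

```python
import os
import re

def colourise(line, pattern, environ=None):
    """Wrap the first match in the line in terminal colour control strings."""
    environ = os.environ if environ is None else environ
    # Assumption: PCREGREP_COLOUR is consulted before PCREGREP_COLOR.
    colour = environ.get("PCREGREP_COLOUR") or environ.get("PCREGREP_COLOR") or "1;31"
    match = re.search(pattern, line)
    if match is None:
        return line
    start, end = match.span()
    # ESC [ <colour> m switches the colour on; the value is copied verbatim.
    # ESC [ 0 m (an assumption) switches it off again.
    return f"{line[:start]}\x1b[{colour}m{line[start:end]}\x1b[0m{line[end:]}"

default_red = colourise("see Thursday", "Thursday", environ={})
green = colourise("see Thursday", "Thursday", environ={"PCREGREP_COLOR": "1;32"})
```

Because the value is copied without checking, a malformed setting would produce a malformed control string, which is why the text above places the responsibility on the user.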

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path). In the default case, directories are read as
                 if they were ordinary files. In some operating systems the
                 effect of reading a directory like this is an immediate end-
                 of-file.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option
                 can be used multiple times in order to specify several pat-
                 terns. It can also be used as a way of specifying a single
                 pattern that starts with a hyphen. When -e is used, no argu-
                 ment pattern is taken from the command line; all arguments
                 are treated as file names. There is an overall maximum of 100
                 patterns. They are applied to each line in the order in which
                 they are defined until one matches (or fails to match if -v
                 is used). If -f is used with -e, the command line patterns
                 are matched first, followed by the patterns from the file,
                 independent of the order in which these options are speci-
                 fied. Note that multiple use of -e is not the same as a sin-
                 gle pattern with alternatives. For example, X|Y finds the
                 first character in a line that is X or Y, whereas if the two
                 patterns are given separately, pcregrep finds X if it is
                 present, even if it follows Y in the line. It finds Y only if
                 there is no X in the line. This really matters only if you
                 are using -o to show the portion of the line that matched.
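
The difference between X|Y and two separate -e patterns can be made concrete with a small model. Python's re module stands in for PCRE here, and both function names are hypothetical; the sketch shows only which text -o would report, not how pcregrep is implemented.

```python
import re

def first_match_alternation(line):
    """Model of -o with the single pattern X|Y."""
    match = re.search("X|Y", line)
    return match.group(0) if match else None

def first_match_separate(line, patterns=("X", "Y")):
    """Model of -o with -e X -e Y: try each pattern in order until one matches."""
    for pattern in patterns:
        match = re.search(pattern, line)
        if match:
            return match.group(0)
    return None

line = "first Y, then X"

# The alternation reports whichever of X or Y comes first in the line.
alternation_result = first_match_alternation(line)

# Separate patterns report X whenever X is present, even though Y comes first.
separate_result = first_match_separate(line)
```

Either way the line is selected; only the reported portion differs.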

       --exclude=pattern
                 When pcregrep is searching the files in a directory as a con-
                 sequence of the -r (recursive search) option, any files whose
                 names match the pattern are excluded. The pattern is a PCRE
                 regular expression. If a file name matches both --include and
                 --exclude, it is excluded. There is no short form for this
                 option.

       -F, --fixed-strings
                 Interpret each pattern as a list of fixed strings, separated
                 by newlines, instead of as a regular expression. The -w
                 (match as a word) and -x (match whole line) options can be
                 used with -F. They apply to each of the fixed strings. A line
                 is selected if any of the fixed strings are found in it (sub-
                 ject to -w or -x, if present).
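
One way to picture -F is as a translation of each fixed string into a regular expression in which no character is special. The sketch below is a model under that assumption, not a description of pcregrep's internals; the function name fixed_string_regex is hypothetical, and Python's re module stands in for PCRE.

```python
import re

def fixed_string_regex(pattern, word=False, whole_line=False):
    """Build one regex from newline-separated fixed strings (a model of -F)."""
    branches = []
    for fixed in pattern.split("\n"):
        branch = re.escape(fixed)              # no character is special
        if word:
            branch = r"\b" + branch + r"\b"    # model of -w, per fixed string
        if whole_line:
            branch = "^" + branch + "$"        # model of -x, per fixed string
        branches.append(branch)
    # A line is selected if any one of the fixed strings is found in it.
    return re.compile("|".join(branches))

literal = fixed_string_regex("a.b\nc*d")
as_word = fixed_string_regex("cat", word=True)
as_line = fixed_string_regex("cat", whole_line=True)
```

With this model, "a.b" matches only the three literal characters, so a line containing "axb" is not selected.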

       -f filename, --file=filename
                 Read a number of patterns from the file, one per line, and
                 match them against each line of input. A data line is output
                 if any of the patterns match it. The filename can be given as
                 "-" to refer to the standard input. When -f is used, patterns
                 specified on the command line using -e may also be present;
                 they are tested before the file's patterns. However, no other
                 pattern is taken from the command line; all arguments are
                 treated as file names. There is an overall maximum of 100
                 patterns. Trailing white space is removed from each line, and
                 blank lines are ignored. An empty file contains no patterns
                 and therefore matches nothing.

       -H, --with-filename
                 Force the inclusion of the filename at the start of output
                 lines when searching a single file. By default, the filename
                 is not shown in this case. For matching lines, the filename
                 is followed by a colon and a space; for context lines, a
                 hyphen separator is used. If a line number is also being out-
                 put, it follows the file name without a space.

       -h, --no-filename
                 Suppress the output of filenames when searching multiple
                 files. By default, filenames are shown when multiple files are
                 searched. For matching lines, the filename is followed by a
                 colon and a space; for context lines, a hyphen separator is
                 used. If a line number is also being output, it follows the
                 file name without a space.

       --help    Output a brief help message and exit.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 When pcregrep is searching the files in a directory as a con-
                 sequence of the -r (recursive search) option, only those
                 files whose names match the pattern are included. The pattern
                 is a PCRE regular expression. If a file name matches both
                 --include and --exclude, it is excluded. There is no short
                 form for this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a sepa-
                 rate line.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been out-
                 put. Each file name is output once, on a separate line.
                 Searching stops as soon as a matching line is found in a
                 file.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --locale=locale-name
                 This option specifies a locale to be used for pattern match-
                 ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
                 ronment variables. If no locale is specified, the PCRE
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is given, patterns may usefully contain literal newline char-
                 acters and internal occurrences of ^ and $ characters. The
                 output for any one match may consist of more than one line.
                 When this option is set, the PCRE library is called in "mul-
                 tiline" mode. There is a limit to the number of lines that
                 can be matched, imposed by the way that pcregrep buffers the
                 input file as it scans it. However, pcregrep ensures that at
                 least 8K characters or the rest of the document (whichever is
                 the shorter) are available for forward matching, and simi-
                 larly the previous 8K characters (or all the previous charac-
                 ters, if fewer than 8K) are guaranteed to be available for
                 lookbehind assertions.
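
What "multiline" mode means for ^, $, and literal newlines can be seen in a short sketch. Python's re.MULTILINE flag is used as an analogue of PCRE's multiline mode (an assumption for illustration); pcregrep's buffering limits are not modelled.

```python
import re

text = "start of record\nvalue: 42\nend of record\n"

# A literal newline in the pattern lets one match span two lines, and in
# multiline mode ^ and $ also match at internal line boundaries.
pattern = re.compile(r"^start.*\nvalue: (\d+)$", re.MULTILINE)

match = pattern.search(text)
matched_lines = match.group(0).split("\n")
value = match.group(1)

# Without multiline mode, ^ matches only at the very start of the text.
internal_without_flag = re.search(r"^value", text)
internal_with_flag = re.search(r"^value", text, re.MULTILINE)
```

The single match here covers two input lines, which corresponds to the statement above that the output for one match may consist of more than one line.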

       -n, --line-number
                 Precede each output line by its line number in the file, fol-
                 lowed by a colon and a space for matching lines or a hyphen
                 and a space for context lines. If the filename is also being
                 output, it precedes the line number.

       -o, --only-matching
                 Show only the part of the line that matched a pattern. In
                 this mode, no context is shown. That is, the -A, -B, and -C
                 options are ignored.

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or not any matches were
                 found.

       -r, --recursive
                 If any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude set-
                 tings. By default, a directory is read as a normal file; in
                 some operating systems this gives an immediate end-of-file.
                 This option is a shorthand for setting the -d option to
                 "recurse".

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE
                 has been compiled with UTF-8 support. Both patterns and sub-
                 ject lines must be valid strings of UTF-8 characters.

       -V, --version
                 Write the version numbers of pcregrep and the PCRE library
                 that is being used to the standard error stream.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only whole words. This is equiva-
                 lent to having \b at the start and end of the pattern.

       -x, --line-regex, --line-regexp
                 Force the patterns to be anchored (each must start matching
                 at the beginning of a line) and in addition, require them to
                 match entire lines. This is equivalent to having ^ and $
                 characters at the start and end of each alternative branch in
                 every pattern.
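
The equivalences stated for -w and -x can be written out as pattern transformations. This is a sketch of the stated equivalences only, with hypothetical function names and Python's re module standing in for PCRE. For -x, wrapping the pattern in a non-capturing group is one way to get the effect of anchoring every alternative branch.

```python
import re

def as_word_pattern(pattern):
    # Model of -w: \b at the start and end of the pattern.
    return r"\b" + pattern + r"\b"

def as_line_pattern(pattern):
    # Model of -x: the group makes ^ and $ apply to every alternative branch.
    return "^(?:" + pattern + ")$"

word = as_word_pattern("cat")
whole = as_line_pattern("cat|dog")

# Writing ^ and $ around an ungrouped alternation anchors only the outer
# ends: "^cat" or "dog$". That is not the same as anchoring each branch.
naive = "^cat|dog$"
```

The last pattern shows why the wording "each alternative branch" matters: the ungrouped form accepts the line "hotdog", while the grouped form does not.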


ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE
       library's default (usually the "C" locale) is used.
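
The order of precedence described above can be summarised in a few lines. The function name choose_locale is hypothetical, and treating an empty value as "not set" is an assumption of this sketch.

```python
def choose_locale(option_locale=None, environ=None):
    """Return the locale name to use, or None for the PCRE library's default."""
    environ = {} if environ is None else environ
    if option_locale:                      # --locale overrides the environment
        return option_locale
    for name in ("LC_ALL", "LC_CTYPE"):    # examined in that order
        value = environ.get(name)
        if value:                          # assumption: empty means not set
            return value
    return None                            # leave the library default in place

env = {"LC_ALL": "fr_FR", "LC_CTYPE": "de_DE"}
from_environment = choose_locale(environ=env)
from_option = choose_locale("en_GB", env)
```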


OPTIONS COMPATIBILITY

       The majority of short and long forms of pcregrep's options are the same
       as in the GNU grep program. Any long option of the form --xxx-regexp
       (GNU terminology) is also available as --xxx-regex (PCRE terminology).
       However, the --locale, -M, --multiline, -u, and --utf-8 options are
       specific to pcregrep.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be spec-
       ified. If a short form option is used, the data may follow immedi-
       ately, or in the next command line item. For example:

         -f/some/file
         -f /some/file

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with one exception) it
       may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because the
       shell does not treat ~ specially unless it is at the start of an item.

       The exception to the above is the --colour (or --color) option, for
       which the data is optional. If this option does have data, it must be
       given in the first form, using an equals character. Otherwise it will
       be assumed that it has no data.
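
The four forms, and the --colour exception, can be expressed as a small parsing sketch. It is a model of the rules in this section rather than pcregrep's own option handling; the function name read_option_data and its takes_optional_data parameter are hypothetical, and the caller is assumed to know already that the option takes data.

```python
def read_option_data(argv, index, takes_optional_data=False):
    """Return (name, data, next_index) for the option at argv[index]."""
    item = argv[index]
    if item.startswith("--"):
        name, equals, data = item.partition("=")
        if equals:                                   # --file=/some/file
            return name, data, index + 1
        if takes_optional_data:                      # --colour with no data
            return name, None, index + 1
        return name, argv[index + 1], index + 2      # --file /some/file
    name, data = item[:2], item[2:]
    if data:                                         # -f/some/file
        return name, data, index + 1
    return name, argv[index + 1], index + 2          # -f /some/file

short_joined = read_option_data(["-f/some/file"], 0)
short_split = read_option_data(["-f", "/some/file"], 0)
long_joined = read_option_data(["--file=/some/file"], 0)
long_split = read_option_data(["--file", "/some/file"], 0)

# For --colour the next item is never consumed as data.
colour_bare = read_option_data(["--colour", "always"], 0, takes_optional_data=True)
```

In the last call, "always" is left on the command line, which matches the rule that --colour data must be attached with an equals character.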


MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcregrep outputs an error message and the line that caused the
       problem to the standard error stream. If there are more than 20 such
       errors, pcregrep gives up.
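
Why (a+)*\d is expensive, and what a resource limit does about it, can be shown with a toy backtracking matcher written for this one pattern. Everything here is an assumption made for illustration: the names are hypothetical, the match is anchored at the start of the subject, and the way steps are counted is not PCRE's actual accounting.

```python
class MatchLimitExceeded(Exception):
    """Raised when the toy matcher uses more steps than the limit allows."""

def match_nested_repeat(subject, limit):
    r"""Anchored backtracking match of (a+)*\d; returns (matched, steps)."""
    steps = 0

    def groups(pos):
        nonlocal steps
        steps += 1
        if steps > limit:
            raise MatchLimitExceeded(subject)
        # Find how far a run of a's extends from this position.
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        # Try one more (a+) group, longest first, then backtrack to shorter.
        for stop in range(end, pos, -1):
            if groups(stop):
                return True
        # No further group: the next character must be a digit.
        return pos < len(subject) and subject[pos].isdigit()

    return groups(0), steps

quick = match_nested_repeat("aaa1", limit=1000)
slow = match_nested_repeat("a" * 5, limit=1000)

try:
    match_nested_repeat("a" * 20, limit=1000)
    aborted = False
except MatchLimitExceeded:
    aborted = True
```

In this model a line of n a's with no digit costs 2 to the power n steps, so each extra character doubles the work. The limit turns what would be a very long failure into a prompt error.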


DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors and non-existent or inaccessible files (even if
       matches were found in other files) or too many matching errors. Using
       the -s option to suppress error messages about inaccessible files does
       not affect the return code.
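
The exit status rules reduce to a simple priority order, sketched below. The function name and its parameters are hypothetical; error_occurred stands for any of the conditions listed above for status 2.

```python
def exit_status(matches_found, error_occurred):
    """Return the exit status implied by the rules in this section."""
    if error_occurred:
        # Errors take priority, even if matches were found in other files.
        # Suppressing the messages with -s does not change this.
        return 2
    return 0 if matches_found else 1

status_match = exit_status(matches_found=True, error_occurred=False)
status_none = exit_status(matches_found=False, error_occurred=False)
status_error = exit_status(matches_found=True, error_occurred=True)
```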


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QG, England.

Last updated: 23 January 2006
Copyright (c) 1997-2006 University of Cambridge.