Contents of /code/trunk/doc/html/pcregrep.html

Revision 975
Sat Jun 2 11:03:06 2012 UTC by ph10
Document update for 8.31-RC1 test release.
1 <html>
2 <head>
3 <title>pcregrep specification</title>
4 </head>
5 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
6 <h1>pcregrep man page</h1>
7 <p>
8 Return to the <a href="index.html">PCRE index page</a>.
9 </p>
10 <p>
11 This page is part of the PCRE HTML documentation. It was generated automatically
12 from the original man page. If there is any nonsense in it, please consult the
13 man page, in case the conversion went wrong.
14 <br>
15 <ul>
16 <li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
17 <li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
18 <li><a name="TOC3" href="#SEC3">SUPPORT FOR COMPRESSED FILES</a>
19 <li><a name="TOC4" href="#SEC4">BINARY FILES</a>
20 <li><a name="TOC5" href="#SEC5">OPTIONS</a>
21 <li><a name="TOC6" href="#SEC6">ENVIRONMENT VARIABLES</a>
22 <li><a name="TOC7" href="#SEC7">NEWLINES</a>
23 <li><a name="TOC8" href="#SEC8">OPTIONS COMPATIBILITY</a>
24 <li><a name="TOC9" href="#SEC9">OPTIONS WITH DATA</a>
25 <li><a name="TOC10" href="#SEC10">MATCHING ERRORS</a>
26 <li><a name="TOC11" href="#SEC11">DIAGNOSTICS</a>
27 <li><a name="TOC12" href="#SEC12">SEE ALSO</a>
28 <li><a name="TOC13" href="#SEC13">AUTHOR</a>
29 <li><a name="TOC14" href="#SEC14">REVISION</a>
30 </ul>
31 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
32 <P>
33 <b>pcregrep [options] [long options] [pattern] [path1 path2 ...]</b>
34 </P>
35 <br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
36 <P>
37 <b>pcregrep</b> searches files for character patterns, in the same way as other
38 grep commands do, but it uses the PCRE regular expression library to support
39 patterns that are compatible with the regular expressions of Perl 5. See
40 <a href="pcrepattern.html"><b>pcrepattern</b>(3)</a>
41 for a full description of syntax and semantics of the regular expressions
42 that PCRE supports.
43 </P>
44 <P>
45 Patterns, whether supplied on the command line or in a separate file, are given
46 without delimiters. For example:
47 <pre>
48 pcregrep Thursday /etc/motd
49 </pre>
50 If you attempt to use delimiters (for example, by surrounding a pattern with
51 slashes, as is common in Perl scripts), they are interpreted as part of the
52 pattern. Quotes can of course be used to delimit patterns on the command line
53 because they are interpreted by the shell, and indeed they are required if a
54 pattern contains white space or shell metacharacters.
55 </P>
56 <P>
57 The first argument that follows any option settings is treated as the single
58 pattern to be matched when neither <b>-e</b> nor <b>-f</b> is present.
59 Conversely, when one or both of these options are used to specify patterns, all
60 arguments are treated as path names. At least one of <b>-e</b>, <b>-f</b>, or an
61 argument pattern must be provided.
62 </P>
63 <P>
64 If no files are specified, <b>pcregrep</b> reads the standard input. The
65 standard input can also be referenced by a name consisting of a single hyphen.
66 For example:
67 <pre>
68 pcregrep some-pattern /file1 - /file3
69 </pre>
70 By default, each line that matches a pattern is copied to the standard
71 output, and if there is more than one file, the file name is output at the
72 start of each line, followed by a colon. However, there are options that can
73 change how <b>pcregrep</b> behaves. In particular, the <b>-M</b> option makes it
74 possible to search for patterns that span line boundaries. What defines a line
75 boundary is controlled by the <b>-N</b> (<b>--newline</b>) option.
76 </P>
77 <P>
78 The amount of memory used for buffering files that are being scanned is
79 controlled by a parameter that can be set by the <b>--buffer-size</b> option.
80 The default value for this parameter is specified when <b>pcregrep</b> is built,
81 with 20K as the built-in default. A block of memory three times this size is
82 used (to allow for buffering "before" and "after" lines). An error occurs if a
83 line overflows the buffer.
84 </P>
85 <P>
86 Patterns are limited to 8K or BUFSIZ bytes, whichever is the greater. BUFSIZ is
87 defined in <b>&#60;stdio.h&#62;</b>. When there is more than one pattern (specified by
88 the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to each line in
89 the order in which they are defined, except that all the <b>-e</b> patterns are
90 tried before the <b>-f</b> patterns.
91 </P>
92 <P>
93 By default, as soon as one pattern matches (or fails to match when <b>-v</b> is
94 used), no further patterns are considered. However, if <b>--colour</b> (or
95 <b>--color</b>) is used to colour the matching substrings, or if
96 <b>--only-matching</b>, <b>--file-offsets</b>, or <b>--line-offsets</b> is used to
97 output only the part of the line that matched (either shown literally, or as an
98 offset), scanning resumes immediately following the match, so that further
99 matches on the same line can be found. If there are multiple patterns, they are
100 all tried on the remainder of the line, but patterns that follow the one that
101 matched are not tried on the earlier part of the line.
102 </P>
103 <P>
104 This is the same behaviour as GNU grep, but it does mean that the order in
105 which multiple patterns are specified can affect the output when one of the
106 above options is used.
107 </P>
108 <P>
109 Patterns that can match an empty string are accepted, but empty string
110 matches are never recognized. An example is the pattern "(super)?(man)?", in
111 which all components are optional. This pattern finds all occurrences of both
112 "super" and "man"; the output differs from matching with "super|man" when only
113 the matching substrings are being shown.
114 </P>
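<P>
The difference between the two patterns can be modelled with a short Python
sketch. It uses Python's <b>re</b> module as a stand-in for PCRE (an assumption
that holds for these simple patterns), and the helper name
<b>nonempty_matches</b> is hypothetical. It discards empty matches, as described
above, and lists what would be shown when only matching substrings are output.
</P>

```python
import re

def nonempty_matches(pattern, line):
    """Return the non-empty substrings matched by pattern, left to right."""
    # Empty matches are skipped, mirroring the rule that they are never recognized.
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]

line = "superman and man"
print(nonempty_matches(r"(super)?(man)?", line))  # "superman" is one match
print(nonempty_matches(r"super|man", line))       # "super" and "man" are separate
```

<P>
With the optional-group pattern, "superman" is reported as a single substring,
whereas the alternation reports "super" and "man" separately.
</P>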
115 <P>
116 If the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variable is set,
117 <b>pcregrep</b> uses the value to set a locale when calling the PCRE library.
118 The <b>--locale</b> option can be used to override this.
119 </P>
120 <br><a name="SEC3" href="#TOC1">SUPPORT FOR COMPRESSED FILES</a><br>
121 <P>
122 It is possible to compile <b>pcregrep</b> so that it uses <b>libz</b> or
123 <b>libbz2</b> to read files whose names end in <b>.gz</b> or <b>.bz2</b>,
124 respectively. You can find out whether your binary has support for one or both
125 of these file types by running it with the <b>--help</b> option. If the
126 appropriate support is not present, files are treated as plain text. The
127 standard input is always so treated.
128 </P>
129 <br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
130 <P>
131 By default, a file that contains a binary zero byte within the first 1024 bytes
132 is identified as a binary file, and is processed specially. (GNU grep also
133 identifies binary files in this manner.) See the <b>--binary-files</b> option
134 for a means of changing the way binary files are handled.
135 </P>
136 <br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
137 <P>
138 The order in which some of the options appear can affect the output. For
139 example, both the <b>-h</b> and <b>-l</b> options affect the printing of file
140 names. Whichever comes later in the command line will be the one that takes
141 effect. Numerical values for options may be followed by K or M, to signify
142 multiplication by 1024 or 1024*1024 respectively.
143 </P>
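<P>
As an illustration of the K and M suffixes, the following Python sketch converts
an option value to a plain number. The function name <b>parse_size</b> is
hypothetical, and the sketch accepts only the upper case suffixes named above;
it is not taken from the <b>pcregrep</b> source.
</P>

```python
def parse_size(text):
    """Convert an option value such as '20K' or '2M' to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = text[-1:]
    if suffix in multipliers:
        # Multiply the numeric part by 1024 (K) or 1024*1024 (M).
        return int(text[:-1]) * multipliers[suffix]
    return int(text)

print(parse_size("20K"))
print(parse_size("1M"))
```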
144 <P>
145 <b>--</b>
146 This terminates the list of options. It is useful if the next item on the
147 command line starts with a hyphen but is not an option. This allows for the
148 processing of patterns and filenames that start with hyphens.
149 </P>
150 <P>
151 <b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
152 Output <i>number</i> lines of context after each matching line. If filenames
153 and/or line numbers are being output, a hyphen separator is used instead of a
154 colon for the context lines. A line containing "--" is output between each
155 group of lines, unless they are in fact contiguous in the input file. The value
156 of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
157 guarantees to have up to 8K of following text available for context output.
158 </P>
159 <P>
160 <b>-a</b>, <b>--text</b>
161 Treat binary files as text. This is equivalent to
162 <b>--binary-files</b>=<i>text</i>.
163 </P>
164 <P>
165 <b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
166 Output <i>number</i> lines of context before each matching line. If filenames
167 and/or line numbers are being output, a hyphen separator is used instead of a
168 colon for the context lines. A line containing "--" is output between each
169 group of lines, unless they are in fact contiguous in the input file. The value
170 of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
171 guarantees to have up to 8K of preceding text available for context output.
172 </P>
173 <P>
174 <b>--binary-files=</b><i>word</i>
175 Specify how binary files are to be processed. If the word is "binary" (the
176 default), pattern matching is performed on binary files, but the only output is
177 "Binary file &#60;name&#62; matches" when a match succeeds. If the word is "text",
178 which is equivalent to the <b>-a</b> or <b>--text</b> option, binary files are
179 processed in the same way as any other file. In this case, when a match
180 succeeds, the output may be binary garbage, which can have nasty effects if
181 sent to a terminal. If the word is "without-match", which is equivalent to the
182 <b>-I</b> option, binary files are not processed at all; they are assumed not to
183 be of interest.
184 </P>
185 <P>
186 <b>--buffer-size=</b><i>number</i>
187 Set the parameter that controls how much memory is used for buffering files
188 that are being scanned.
189 </P>
190 <P>
191 <b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
192 Output <i>number</i> lines of context both before and after each matching line.
193 This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
194 </P>
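<P>
The context rules for <b>-A</b>, <b>-B</b>, and <b>-C</b> can be sketched in
Python as follows. This is a simplified model, not the <b>pcregrep</b>
implementation: it assumes line numbers are being output (as with <b>-n</b>),
works on a list of lines already in memory, and uses a hypothetical function
name, <b>grep_with_context</b>.
</P>

```python
import re

def grep_with_context(lines, pattern, before=0, after=0):
    """Model numbered output with 'before' and 'after' lines of context."""
    regex = re.compile(pattern)
    matched = {i for i, line in enumerate(lines) if regex.search(line)}
    # Every line index that must be shown: matches plus their context.
    shown = sorted({j for i in matched
                    for j in range(max(0, i - before),
                                   min(len(lines), i + after + 1))})
    use_separator = before > 0 or after > 0
    output = []
    previous = None
    for j in shown:
        # "--" separates groups that are not contiguous in the input.
        if use_separator and previous is not None and j != previous + 1:
            output.append("--")
        # Matching lines use a colon, context lines use a hyphen.
        separator = ":" if j in matched else "-"
        output.append(f"{j + 1}{separator}{lines[j]}")
        previous = j
    return output

sample = ["a", "b", "hit one", "c", "d", "e", "f", "hit two", "g"]
for row in grep_with_context(sample, "hit", before=1, after=1):
    print(row)
```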
195 <P>
196 <b>-c</b>, <b>--count</b>
197 Do not output individual lines from the files that are being scanned; instead
198 output the number of lines that would otherwise have been shown. If no lines
199 are selected, the number zero is output. If several files are being
200 scanned, a count is output for each of them. However, if the
201 <b>--files-with-matches</b> option is also used, only those files whose counts
202 are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
203 <b>-B</b>, and <b>-C</b> options are ignored.
204 </P>
205 <P>
206 <b>--colour</b>, <b>--color</b>
207 If this option is given without any data, it is equivalent to "--colour=auto".
208 If data is required, it must be given in the same shell item, separated by an
209 equals sign.
210 </P>
211 <P>
212 <b>--colour=</b><i>value</i>, <b>--color=</b><i>value</i>
213 This option specifies under what circumstances the parts of a line that matched
214 a pattern should be coloured in the output. By default, the output is not
215 coloured. The value (which is optional, see above) may be "never", "always", or
216 "auto". In the last case, colouring happens only if the standard output is
217 connected to a terminal. More resources are used when colouring is enabled,
218 because <b>pcregrep</b> has to search for all possible matches in a line, not
219 just one, in order to colour them all.
220 <br>
221 <br>
222 The colour that is used can be specified by setting the environment variable
223 PCREGREP_COLOUR or PCREGREP_COLOR. The value of this variable should be a
224 string of two numbers, separated by a semicolon. They are copied directly into
225 the control string for setting colour on a terminal, so it is your
226 responsibility to ensure that they make sense. If neither of the environment
227 variables is set, the default is "1;31", which gives red.
228 </P>
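<P>
To show how the two numbers end up in a terminal control string, here is a
Python sketch. It assumes the conventional ANSI "select graphic rendition"
escape sequence, checks PCREGREP_COLOUR before PCREGREP_COLOR, and resets with
"0"; all three points are assumptions made for illustration, as is the function
name <b>colourize</b>.
</P>

```python
import os

def colourize(text, environ=os.environ):
    """Wrap text in an ANSI colour sequence built from the colour setting."""
    # Fall back to "1;31" (red) when neither variable is set.
    colour = (environ.get("PCREGREP_COLOUR")
              or environ.get("PCREGREP_COLOR")
              or "1;31")
    # The value is copied into the sequence unchanged; no validation is done.
    return f"\x1b[{colour}m{text}\x1b[0m"

print(repr(colourize("match", {})))
```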
229 <P>
230 <b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
231 If an input path is not a regular file or a directory, "action" specifies how
232 it is to be processed. Valid values are "read" (the default) or "skip"
233 (silently skip the path).
234 </P>
235 <P>
236 <b>-d</b> <i>action</i>, <b>--directories=</b><i>action</i>
237 If an input path is a directory, "action" specifies how it is to be processed.
238 Valid values are "read" (the default), "recurse" (equivalent to the <b>-r</b>
239 option), or "skip" (silently skip the path). In the default case, directories
240 are read as if they were ordinary files. In some operating systems the effect
241 of reading a directory like this is an immediate end-of-file.
242 </P>
243 <P>
244 <b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>, <b>--regexp=</b><i>pattern</i>
245 Specify a pattern to be matched. This option can be used multiple times in
246 order to specify several patterns. It can also be used as a way of specifying a
247 single pattern that starts with a hyphen. When <b>-e</b> is used, no argument
248 pattern is taken from the command line; all arguments are treated as file
249 names. There is an overall maximum of 100 patterns. They are applied to each
250 line in the order in which they are defined until one matches (or fails to
251 match if <b>-v</b> is used). If <b>-f</b> is used with <b>-e</b>, the command line
252 patterns are matched first, followed by the patterns from the file, independent
253 of the order in which these options are specified. Note that multiple use of
254 <b>-e</b> is not the same as a single pattern with alternatives. For example,
255 X|Y finds the first character in a line that is X or Y, whereas if the two
256 patterns are given separately, <b>pcregrep</b> finds X if it is present, even if
257 it follows Y in the line. It finds Y only if there is no X in the line. This
258 really matters only if you are using <b>-o</b> to show the part(s) of the line
259 that matched.
260 </P>
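<P>
The X|Y example above can be reproduced with a small Python model. It uses
Python's <b>re</b> module in place of PCRE and a hypothetical helper,
<b>first_shown_match</b>, that tries the patterns in order and reports only the
first substring that would be shown; the search for further matches on the rest
of the line is left out.
</P>

```python
import re

def first_shown_match(patterns, line):
    """Return the text matched by the first pattern, in order, that matches."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            # Later patterns are not considered once one has matched.
            return m.group(0)
    return None

line = "aYbX"
print(first_shown_match(["X|Y"], line))     # single pattern with alternatives
print(first_shown_match(["X", "Y"], line))  # two separate patterns
```

<P>
The single pattern reports Y because it comes first in the line, while the two
separate patterns report X because the X pattern is tried first.
</P>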
261 <P>
262 <b>--exclude</b>=<i>pattern</i>
263 When <b>pcregrep</b> is searching the files in a directory as a consequence of
264 the <b>-r</b> (recursive search) option, any regular files whose names match the
265 pattern are excluded. Subdirectories are not excluded by this option; they are
266 searched recursively, subject to the <b>--exclude-dir</b> and
267 <b>--include-dir</b> options. The pattern is a PCRE regular expression, and is
268 matched against the final component of the file name (not the entire path). If
269 a file name matches both <b>--include</b> and <b>--exclude</b>, it is excluded.
270 There is no short form for this option.
271 </P>
272 <P>
273 <b>--exclude-dir</b>=<i>pattern</i>
274 When <b>pcregrep</b> is searching the contents of a directory as a consequence
275 of the <b>-r</b> (recursive search) option, any subdirectories whose names match
276 the pattern are excluded. (Note that the <b>--exclude</b> option does not affect
277 subdirectories.) The pattern is a PCRE regular expression, and is matched
278 against the final component of the name (not the entire path). If a
279 subdirectory name matches both <b>--include-dir</b> and <b>--exclude-dir</b>, it
280 is excluded. There is no short form for this option.
281 </P>
282 <P>
283 <b>-F</b>, <b>--fixed-strings</b>
284 Interpret each pattern as a list of fixed strings, separated by newlines,
285 instead of as a regular expression. The <b>-w</b> (match as a word) and <b>-x</b>
286 (match whole line) options can be used with <b>-F</b>. They apply to each of the
287 fixed strings. A line is selected if any of the fixed strings are found in it
288 (subject to <b>-w</b> or <b>-x</b>, if present).
289 </P>
290 <P>
291 <b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
292 Read a number of patterns from the file, one per line, and match them against
293 each line of input. A data line is output if any of the patterns match it. The
294 filename can be given as "-" to refer to the standard input. When <b>-f</b> is
295 used, patterns specified on the command line using <b>-e</b> may also be
296 present; they are tested before the file's patterns. However, no other pattern
297 is taken from the command line; all arguments are treated as the names of paths
298 to be searched. There is an overall maximum of 100 patterns. Trailing white
299 space is removed from each line, and blank lines are ignored. An empty file
300 contains no patterns and therefore matches nothing. See also the comments about
301 multiple patterns versus a single pattern with alternatives in the description
302 of <b>-e</b> above.
303 </P>
304 <P>
305 <b>--file-list</b>=<i>filename</i>
306 Read a list of files to be searched from the given file, one per line. Trailing
307 white space is removed from each line, and blank lines are ignored. These files
308 are searched before any others that may be listed on the command line. The
309 filename can be given as "-" to refer to the standard input. If <b>--file</b>
310 and <b>--file-list</b> are both specified as "-", patterns are read first. This
311 is useful only when the standard input is a terminal, from which further lines
312 (the list of files) can be read after an end-of-file indication.
313 </P>
314 <P>
315 <b>--file-offsets</b>
316 Instead of showing lines or parts of lines that match, show each match as an
317 offset from the start of the file and a length, separated by a comma. In this
318 mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b>
319 options are ignored. If there is more than one match in a line, each of them is
320 shown separately. This option is mutually exclusive with <b>--line-offsets</b>
321 and <b>--only-matching</b>.
322 </P>
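<P>
The output format of <b>--file-offsets</b> can be sketched in Python. The
sketch assumes that offsets count from zero, treats the file as a single
in-memory string rather than scanning it line by line, and uses a hypothetical
function name, <b>file_offsets</b>.
</P>

```python
import re

def file_offsets(data, pattern):
    """Return 'offset,length' strings for each non-empty match in data."""
    return [f"{m.start()},{m.end() - m.start()}"
            for m in re.finditer(pattern, data)
            if m.end() > m.start()]

data = "one two\nthree two\n"
for item in file_offsets(data, "two"):
    print(item)
```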
323 <P>
324 <b>-H</b>, <b>--with-filename</b>
325 Force the inclusion of the filename at the start of output lines when searching
326 a single file. By default, the filename is not shown in this case. For matching
327 lines, the filename is followed by a colon; for context lines, a hyphen
328 separator is used. If a line number is also being output, it follows the file
329 name.
330 </P>
331 <P>
332 <b>-h</b>, <b>--no-filename</b>
333 Suppress the output of filenames when searching multiple files. By default,
334 filenames are shown when multiple files are searched. For matching lines, the
335 filename is followed by a colon; for context lines, a hyphen separator is used.
336 If a line number is also being output, it follows the file name.
337 </P>
338 <P>
339 <b>--help</b>
340 Output a help message, giving brief details of the command options and file
341 type support, and then exit.
342 </P>
343 <P>
344 <b>-I</b>
345 Treat binary files as never matching. This is equivalent to
346 <b>--binary-files</b>=<i>without-match</i>.
347 </P>
348 <P>
349 <b>-i</b>, <b>--ignore-case</b>
350 Ignore upper/lower case distinctions during comparisons.
351 </P>
352 <P>
353 <b>--include</b>=<i>pattern</i>
354 When <b>pcregrep</b> is searching the files in a directory as a consequence of
355 the <b>-r</b> (recursive search) option, only those regular files whose names
356 match the pattern are included. Subdirectories are always included and searched
357 recursively, subject to the <b>--include-dir</b> and <b>--exclude-dir</b>
358 options. The pattern is a PCRE regular expression, and is matched against the
359 final component of the file name (not the entire path). If a file name matches
360 both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no short
361 form for this option.
362 </P>
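<P>
The interaction of <b>--include</b> and <b>--exclude</b> can be summarized by
the following Python sketch. The function name <b>file_is_searched</b> is
hypothetical, Python's <b>re</b> module stands in for PCRE, and path names are
assumed to use "/" as the separator.
</P>

```python
import posixpath
import re

def file_is_searched(path, include=None, exclude=None):
    """Decide whether a regular file found during a recursive search is scanned."""
    # Only the final component of the name is tested, not the entire path.
    name = posixpath.basename(path)
    if exclude is not None and re.search(exclude, name):
        return False  # exclusion wins when both patterns match
    if include is not None and not re.search(include, name):
        return False
    return True

print(file_is_searched("src/main.c", include=r"\.c$"))
print(file_is_searched("src/test_main.c", include=r"\.c$", exclude=r"^test_"))
```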
363 <P>
364 <b>--include-dir</b>=<i>pattern</i>
365 When <b>pcregrep</b> is searching the contents of a directory as a consequence
366 of the <b>-r</b> (recursive search) option, only those subdirectories whose
367 names match the pattern are included. (Note that the <b>--include</b> option
368 does not affect subdirectories.) The pattern is a PCRE regular expression, and
369 is matched against the final component of the name (not the entire path). If a
370 subdirectory name matches both <b>--include-dir</b> and <b>--exclude-dir</b>, it
371 is excluded. There is no short form for this option.
372 </P>
373 <P>
374 <b>-L</b>, <b>--files-without-match</b>
375 Instead of outputting lines from the files, just output the names of the files
376 that do not contain any lines that would have been output. Each file name is
377 output once, on a separate line.
378 </P>
379 <P>
380 <b>-l</b>, <b>--files-with-matches</b>
381 Instead of outputting lines from the files, just output the names of the files
382 containing lines that would have been output. Each file name is output
383 once, on a separate line. Searching normally stops as soon as a matching line
384 is found in a file. However, if the <b>-c</b> (count) option is also used,
385 matching continues in order to obtain the correct count, and those files that
386 have at least one match are listed along with their counts. Using this option
387 with <b>-c</b> is a way of suppressing the listing of files with no matches.
388 </P>
389 <P>
390 <b>--label</b>=<i>name</i>
391 This option supplies a name to be used for the standard input when file names
392 are being output. If not supplied, "(standard input)" is used. There is no
393 short form for this option.
394 </P>
395 <P>
396 <b>--line-buffered</b>
397 When this option is given, input is read and processed line by line, and the
398 output is flushed after each write. By default, input is read in large chunks,
399 unless <b>pcregrep</b> can determine that it is reading from a terminal (which
400 is currently possible only in Unix environments). Output to a terminal is
401 normally automatically flushed by the operating system. This option can be
402 useful when the input or output is attached to a pipe and you do not want
403 <b>pcregrep</b> to buffer up large amounts of data. However, its use will affect
404 performance, and the <b>-M</b> (multiline) option ceases to work.
405 </P>
406 <P>
407 <b>--line-offsets</b>
408 Instead of showing lines or parts of lines that match, show each match as a
409 line number, the offset from the start of the line, and a length. The line
410 number is terminated by a colon (as usual; see the <b>-n</b> option), and the
411 offset and length are separated by a comma. In this mode, no context is shown.
412 That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b> options are ignored. If there is
413 more than one match in a line, each of them is shown separately. This option is
414 mutually exclusive with <b>--file-offsets</b> and <b>--only-matching</b>.
415 </P>
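<P>
A Python sketch of the <b>--line-offsets</b> output format follows. It assumes
that line numbers start at one and that offsets within a line start at zero;
the function name <b>line_offsets</b> is hypothetical.
</P>

```python
import re

def line_offsets(lines, pattern):
    """Return 'line:offset,length' strings for each non-empty match."""
    output = []
    for number, line in enumerate(lines, start=1):
        # Each match in a line is reported separately.
        for m in re.finditer(pattern, line):
            if m.end() > m.start():
                output.append(f"{number}:{m.start()},{m.end() - m.start()}")
    return output

for item in line_offsets(["one two", "two and two"], "two"):
    print(item)
```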
416 <P>
417 <b>--locale</b>=<i>locale-name</i>
418 This option specifies a locale to be used for pattern matching. It overrides
419 the value in the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variables. If no
420 locale is specified, the PCRE library's default (usually the "C" locale) is
421 used. There is no short form for this option.
422 </P>
423 <P>
424 <b>--match-limit</b>=<i>number</i>
425 Processing some regular expression patterns can require a very large amount of
426 memory, leading in some cases to a program crash if not enough is available.
427 Other patterns may take a very long time to search for all possible matching
428 strings. The <b>pcre_exec()</b> function that is called by <b>pcregrep</b> to do
429 the matching has two parameters that can limit the resources that it uses.
430 <br>
431 <br>
432 The <b>--match-limit</b> option provides a means of limiting resource usage
433 when processing patterns that are not going to match, but which have a very
434 large number of possibilities in their search trees. The classic example is a
435 pattern that uses nested unlimited repeats. Internally, PCRE uses a function
436 called <b>match()</b> which it calls repeatedly (sometimes recursively). The
437 limit set by <b>--match-limit</b> is imposed on the number of times this
438 function is called during a match, which has the effect of limiting the amount
439 of backtracking that can take place.
440 <br>
441 <br>
442 The <b>--recursion-limit</b> option is similar to <b>--match-limit</b>, but
443 instead of limiting the total number of times that <b>match()</b> is called, it
444 limits the depth of recursive calls, which in turn limits the amount of memory
445 that can be used. The recursion depth is a smaller number than the total number
446 of calls, because not all calls to <b>match()</b> are recursive. This limit is
447 of use only if it is set smaller than <b>--match-limit</b>.
448 <br>
449 <br>
450 There are no short forms for these options. The default settings are specified
451 when the PCRE library is compiled, with 10 million as the built-in default.
452 </P>
453 <P>
454 <b>-M</b>, <b>--multiline</b>
455 Allow patterns to match more than one line. When this option is given, patterns
456 may usefully contain literal newline characters and internal occurrences of ^
457 and $ characters. The output for a successful match may consist of more than
458 one line, the last of which is the one in which the match ended. If the matched
459 string ends with a newline sequence the output ends at the end of that line.
460 <br>
461 <br>
462 When this option is set, the PCRE library is called in "multiline" mode.
463 There is a limit to the number of lines that can be matched, imposed by the way
464 that <b>pcregrep</b> buffers the input file as it scans it. However,
465 <b>pcregrep</b> ensures that at least 8K characters or the rest of the document
466 (whichever is the shorter) are available for forward matching, and similarly
467 the previous 8K characters (or all the previous characters, if fewer than 8K)
468 are guaranteed to be available for lookbehind assertions. This option does not
469 work when input is read line by line (see <b>--line-buffered</b>).
470 </P>
471 <P>
472 <b>-N</b> <i>newline-type</i>, <b>--newline</b>=<i>newline-type</i>
473 The PCRE library supports five different conventions for indicating
474 the ends of lines. They are the single-character sequences CR (carriage return)
475 and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
476 which recognizes any of the preceding three types, and an "any" convention, in
477 which any Unicode line ending sequence is assumed to end a line. The Unicode
478 sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
479 (form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
480 PS (paragraph separator, U+2029).
481 <br>
482 <br>
483 When the PCRE library is built, a default line-ending sequence is specified.
484 This is normally the standard sequence for the operating system. Unless
485 otherwise specified by this option, <b>pcregrep</b> uses the library's default.
486 The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
487 makes it possible to use <b>pcregrep</b> on files that have come from other
488 environments without having to modify their line endings. If the data that is
489 being scanned does not agree with the convention set by this option,
490 <b>pcregrep</b> may behave in strange ways.
491 </P>
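<P>
The five conventions can be expressed as regular expressions, as in this Python
sketch. It only models how a block of text would be divided into lines under
each setting; the names <b>NEWLINE_PATTERNS</b> and <b>split_lines</b> are
hypothetical and the code is not taken from <b>pcregrep</b>.
</P>

```python
import re

# One line-ending pattern per value accepted by -N / --newline.
NEWLINE_PATTERNS = {
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "ANYCRLF": "\r\n|\r|\n",
    # CRLF, CR, LF plus VT, FF, NEL, LS and PS.
    "ANY": "\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]",
}

def split_lines(text, newline_type):
    """Split text into lines using the chosen newline convention."""
    return re.split(NEWLINE_PATTERNS[newline_type], text)

print(split_lines("a\r\nb\nc", "ANYCRLF"))
print(split_lines("a\r\nb\nc", "LF"))
```

<P>
The second call shows the kind of mismatch mentioned above: with LF selected,
the carriage return from a CRLF file stays attached to the end of the line.
</P>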
492 <P>
493 <b>-n</b>, <b>--line-number</b>
494 Precede each output line by its line number in the file, followed by a colon
495 for matching lines or a hyphen for context lines. If the filename is also being
496 output, it precedes the line number. This option is forced if
497 <b>--line-offsets</b> is used.
498 </P>
499 <P>
500 <b>--no-jit</b>
501 If the PCRE library is built with support for just-in-time compiling (which
502 speeds up matching), <b>pcregrep</b> automatically makes use of this, unless it
503 was explicitly disabled at build time. This option can be used to disable the
504 use of JIT at run time. It is provided for testing and working round problems.
505 It should never be needed in normal use.
506 </P>
507 <P>
508 <b>-o</b>, <b>--only-matching</b>
509 Show only the part of the line that matched a pattern instead of the whole
510 line. In this mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and
511 <b>-C</b> options are ignored. If there is more than one match in a line, each
512 of them is shown separately. If <b>-o</b> is combined with <b>-v</b> (invert the
513 sense of the match to find non-matching lines), no output is generated, but the
514 return code is set appropriately. If the matched portion of the line is empty,
515 nothing is output unless the file name or line number are being printed, in
516 which case they are shown on an otherwise empty line. This option is mutually
517 exclusive with <b>--file-offsets</b> and <b>--line-offsets</b>.
518 </P>
519 <P>
520 <b>-o</b><i>number</i>, <b>--only-matching</b>=<i>number</i>
521 Show only the part of the line that matched the capturing parentheses of the
522 given number. Up to 32 capturing parentheses are supported. Because these
523 options can be given without an argument (see above), if an argument is
524 present, it must be given in the same shell item, for example, -o3 or
525 --only-matching=2. The comments given for the non-argument case above also
526 apply to this case. If the specified capturing parentheses do not exist in the
527 pattern, or were not set in the match, nothing is output unless the file name
528 or line number are being printed.
529 </P>
530 <P>
531 <b>-q</b>, <b>--quiet</b>
532 Work quietly, that is, display nothing except error messages. The exit
533 status indicates whether or not any matches were found.
534 </P>
535 <P>
536 <b>-r</b>, <b>--recursive</b>
537 If any given path is a directory, recursively scan the files it contains,
538 taking note of any <b>--include</b> and <b>--exclude</b> settings. By default, a
539 directory is read as a normal file; in some operating systems this gives an
540 immediate end-of-file. This option is a shorthand for setting the <b>-d</b>
541 option to "recurse".
542 </P>
543 <P>
544 <b>--recursion-limit</b>=<i>number</i>
545 See <b>--match-limit</b> above.
546 </P>
547 <P>
548 <b>-s</b>, <b>--no-messages</b>
549 Suppress error messages about non-existent or unreadable files. Such files are
550 quietly skipped. However, the return code is still 2, even if matches were
551 found in other files.
552 </P>
553 <P>
554 <b>-u</b>, <b>--utf-8</b>
555 Operate in UTF-8 mode. This option is available only if PCRE has been compiled
556 with UTF-8 support. Both patterns and subject lines must be valid strings of
557 UTF-8 characters.
558 </P>
559 <P>
560 <b>-V</b>, <b>--version</b>
561 Write the version numbers of <b>pcregrep</b> and the PCRE library that is being
562 used to the standard error stream.
563 </P>
564 <P>
565 <b>-v</b>, <b>--invert-match</b>
566 Invert the sense of the match, so that lines which do <i>not</i> match any of
567 the patterns are the ones that are found.
568 </P>
569 <P>
570 <b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
571 Force the patterns to match only whole words. This is equivalent to having \b
572 at the start and end of the pattern.
573 </P>
574 <P>
575 <b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
576 Force the patterns to be anchored (each must start matching at the beginning of
577 a line) and in addition, require them to match entire lines. This is
578 equivalent to having ^ and $ characters at the start and end of each
579 alternative branch in every pattern.
580 </P>
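<P>
The effect of <b>-w</b> and <b>-x</b> can be approximated by wrapping a pattern,
as in this Python sketch. Wrapping the whole pattern in a non-capturing group
is one way to apply the anchors to every alternative branch; it is an
assumption made for illustration, and the helper names <b>word_pattern</b> and
<b>line_pattern</b> are hypothetical.
</P>

```python
import re

def word_pattern(pattern):
    """Model -w: require a word boundary at both ends of the pattern."""
    return rf"\b(?:{pattern})\b"

def line_pattern(pattern):
    """Model -x: anchor every alternative branch to the whole line."""
    return rf"^(?:{pattern})$"

print(bool(re.search(word_pattern("cat"), "a cat sat")))    # whole word
print(bool(re.search(word_pattern("cat"), "concatenate")))  # inside a word
print(bool(re.search(line_pattern("a|b"), "ab")))           # not an entire line
```

<P>
Without the group, a pattern such as "^a|b$" would anchor only the first branch
at the start and only the last branch at the end, so it would match the line
"ab".
</P>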
581 <br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
582 <P>
583 The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
584 order, for a locale. The first one that is set is used. This can be overridden
585 by the <b>--locale</b> option. If no locale is set, the PCRE library's default
586 (usually the "C" locale) is used.
587 </P>
588 <br><a name="SEC7" href="#TOC1">NEWLINES</a><br>
589 <P>
590 The <b>-N</b> (<b>--newline</b>) option allows <b>pcregrep</b> to scan files with
591 different newline conventions from the default. However, the setting of this
592 option does not affect the way in which <b>pcregrep</b> writes information to
593 the standard error and output streams. It uses the string "\n" in C
594 <b>printf()</b> calls to indicate newlines, relying on the C I/O library to
595 convert this to an appropriate sequence if the output is sent to a file.
596 </P>
597 <br><a name="SEC8" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
598 <P>
599 Many of the short and long forms of <b>pcregrep</b>'s options are the same
600 as in the GNU <b>grep</b> program. Any long option of the form
601 <b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
602 (PCRE terminology). However, the <b>--file-list</b>, <b>--file-offsets</b>,
603 <b>--include-dir</b>, <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>,
604 <b>-M</b>, <b>--multiline</b>, <b>-N</b>, <b>--newline</b>,
605 <b>--recursion-limit</b>, <b>-u</b>, and <b>--utf-8</b> options are specific to
606 <b>pcregrep</b>, as is the use of the <b>--only-matching</b> option with a
607 capturing parentheses number.
608 </P>
609 <P>
610 Although most of the common options work the same way, a few are different in
611 <b>pcregrep</b>. For example, the <b>--include</b> option's argument is a glob
612 for GNU <b>grep</b>, but a regular expression for <b>pcregrep</b>. If both the
613 <b>-c</b> and <b>-l</b> options are given, GNU <b>grep</b> lists only file names,
614 without counts, but <b>pcregrep</b> gives the counts.
615 </P>
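<P>
The glob versus regular expression difference for <b>--include</b> matters when
porting a GNU <b>grep</b> command line. The sketch below uses Python's
<b>fnmatch</b> and <b>re</b> modules as stand-ins for the two matching styles.
The file names are made up, and the regular expression is assumed to be applied
as an unanchored search.
</P>

```python
import fnmatch
import re

names = ["main.c", "main.cpp", "notes.txt", "c"]

# GNU grep style: the argument is a glob.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.c")]

# pcregrep style: the argument is a regular expression, so the same
# selection has to be written with an escaped dot and an end anchor.
regex_hits = [n for n in names if re.search(r"\.c$", n)]

# Passing the glob unchanged as a regular expression does not work:
# a leading "*" has nothing to repeat.
try:
    re.compile("*.c")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False
```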
616 <br><a name="SEC9" href="#TOC1">OPTIONS WITH DATA</a><br>
617 <P>
618 There are four different ways in which an option with data can be specified.
619 If a short form option is used, the data may follow immediately, or (with one
620 exception) in the next command line item. For example:
621 <pre>
622 -f/some/file
623 -f /some/file
624 </pre>
625 The exception is the <b>-o</b> option, which may appear with or without data.
626 Because of this, if data is present, it must follow immediately in the same
627 item, for example -o3.
628 </P>
629 <P>
630 If a long form option is used, the data may appear in the same command line
631 item, separated by an equals character, or (with two exceptions) it may appear
632 in the next command line item. For example:
633 <pre>
634 --file=/some/file
635 --file /some/file
636 </pre>
637 Note, however, that if you want to supply a file name beginning with ~ as data
638 in a shell command, and have the shell expand ~ to a home directory, you must
639 separate the file name from the option, because the shell does not treat ~
640 specially unless it is at the start of an item.
641 </P>
642 <P>
643 The exceptions to the above are the <b>--colour</b> (or <b>--color</b>) and
644 <b>--only-matching</b> options, for which the data is optional. If one of these
645 options does have data, it must be given in the first form, using an equals
646 character. Otherwise <b>pcregrep</b> will assume that it has no data.
647 </P>
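<P>
The rules in this section can be summarised in a small sketch. The parser below
is a hypothetical illustration, not <b>pcregrep</b>'s actual option parser: it
knows only <b>-f</b>/<b>--file</b> as options with required data, treats every
other option as having optional or no data, and does not handle bundled short
options or missing data.
</P>

```python
from typing import List, Optional, Tuple

# Illustrative subset: options whose data is required.
REQUIRED_SHORT = {"f"}
REQUIRED_LONG = {"file"}

def parse(args: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (option, data) pairs; positional items use "" as the option."""
    result: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(args):
        item = args[i]
        if item.startswith("--"):
            name, eq, data = item[2:].partition("=")
            if eq:                          # --file=/some/file
                result.append((name, data))
            elif name in REQUIRED_LONG:     # --file /some/file
                i += 1
                result.append((name, args[i]))
            else:                           # --colour, --only-matching: no data
                result.append((name, None))
        elif item.startswith("-") and len(item) > 1:
            name, rest = item[1], item[2:]
            if rest:                        # -f/some/file or -o3
                result.append((name, rest))
            elif name in REQUIRED_SHORT:    # -f /some/file
                i += 1
                result.append((name, args[i]))
            else:                           # -o alone: no data
                result.append((name, None))
        else:
            result.append(("", item))       # pattern or path
        i += 1
    return result
```

With this sketch, <i>parse(["-o", "3"])</i> yields the option <b>-o</b> without
data followed by a positional item "3", which mirrors why data for <b>-o</b>
must be attached, as in -o3.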
648 <br><a name="SEC10" href="#TOC1">MATCHING ERRORS</a><br>
649 <P>
650 It is possible to supply a regular expression that takes a very long time to
651 fail to match certain lines. Such patterns normally involve nested indefinite
652 repeats, for example: (a+)*\d when matched against a line of a's with no final
653 digit. The PCRE matching function has a resource limit that causes it to abort
654 in these circumstances. If this happens, <b>pcregrep</b> outputs an error
655 message and the line that caused the problem to the standard error stream. If
656 there are more than 20 such errors, <b>pcregrep</b> gives up.
657 </P>
658 <P>
659 The <b>--match-limit</b> option of <b>pcregrep</b> can be used to set the overall
660 resource limit; there is a second option called <b>--recursion-limit</b> that
661 sets a limit on the amount of memory (usually stack) that is used (see the
662 discussion of these options above).
663 </P>
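<P>
To see why nested indefinite repeats are costly and what a resource limit
achieves, consider the toy backtracking matcher below. It is a hypothetical
illustration for the single pattern (a+)*\d, tried at the start of the line
only. It is not PCRE's matching algorithm, and the names
<i>match_nested_repeat</i> and <i>MatchLimitExceeded</i> are invented for this
sketch. On a line of n a's with no digit it needs about 2 to the power n steps
before it can report failure, so a step limit is the only practical way to stop
it.
</P>

```python
class MatchLimitExceeded(Exception):
    """Raised when the toy matcher uses more steps than allowed."""

def match_nested_repeat(line: str, limit: int) -> bool:
    # Toy matcher for (a+)*\d at position 0, counting backtracking steps.
    steps = 0

    def rest_matches(pos: int) -> bool:
        nonlocal steps
        steps += 1
        if steps > limit:
            raise MatchLimitExceeded(f"more than {limit} steps")
        # Find how far a run of a's extends from pos.
        end = pos
        while end < len(line) and line[end] == "a":
            end += 1
        # Try one more iteration of (a+), longest run first, then backtrack.
        for stop in range(end, pos, -1):
            if rest_matches(stop):
                return True
        # No further iteration: a digit must follow.
        return pos < len(line) and line[pos].isdigit()

    return rest_matches(0)
```

For example, <i>match_nested_repeat("aaaa5", 1000)</i> succeeds after two steps,
whereas a line of 25 a's exceeds a limit of 100000 steps and raises
<i>MatchLimitExceeded</i>.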
664 <br><a name="SEC11" href="#TOC1">DIAGNOSTICS</a><br>
665 <P>
666 Exit status is 0 if any matches were found, 1 if no matches were found, and 2
667 for syntax errors, overlong lines, non-existent or inaccessible files (even if
668 matches were found in other files) or too many matching errors. Using the
669 <b>-s</b> option to suppress error messages about inaccessible files does not
670 affect the return code.
671 </P>
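<P>
The exit status rules can be expressed as a short sketch, which may help when
interpreting the return code in a calling script. The function name
<i>exit_status</i> and its two boolean inputs are hypothetical. Here
<i>had_error</i> stands for any of the error conditions listed above.
</P>

```python
def exit_status(any_match: bool, had_error: bool) -> int:
    # Errors take precedence: the status is 2 even if other files matched.
    if had_error:
        return 2
    # Otherwise 0 means at least one match, 1 means none.
    return 0 if any_match else 1
```

Note that a caller cannot tell from a status of 2 alone whether any matches were
found.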
672 <br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
673 <P>
674 <b>pcrepattern</b>(3), <b>pcretest</b>(1).
675 </P>
676 <br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
677 <P>
678 Philip Hazel
679 <br>
680 University Computing Service
681 <br>
682 Cambridge CB2 3QH, England.
683 <br>
684 </P>
685 <br><a name="SEC14" href="#TOC1">REVISION</a><br>
686 <P>
687 Last updated: 04 March 2012
688 <br>
689 Copyright &copy; 1997-2012 University of Cambridge.
690 </P>
691 <p>
692 Return to the <a href="index.html">PCRE index page</a>.
693 </p>
