Diff of /code/trunk/doc/pcregrep.1

revision 91 by nigel, Sat Feb 24 21:41:34 2007 UTC revision 392 by ph10, Tue Mar 17 21:30:30 2009 UTC
# Line 11  pcregrep - a grep with Perl-compatible r Line 11  pcregrep - a grep with Perl-compatible r
11  grep commands do, but it uses the PCRE regular expression library to support  grep commands do, but it uses the PCRE regular expression library to support
12  patterns that are compatible with the regular expressions of Perl 5. See  patterns that are compatible with the regular expressions of Perl 5. See
13  .\" HREF  .\" HREF
14  \fBpcrepattern\fP  \fBpcrepattern\fP(3)
15  .\"  .\"
16  for a full description of syntax and semantics of the regular expressions that  for a full description of syntax and semantics of the regular expressions
17  PCRE supports.  that PCRE supports.
18  .P  .P
19  Patterns, whether supplied on the command line or in a separate file, are given  Patterns, whether supplied on the command line or in a separate file, are given
20  without delimiters. For example:  without delimiters. For example:
# Line 23  without delimiters. For example: Line 23  without delimiters. For example:
23  .sp  .sp
24  If you attempt to use delimiters (for example, by surrounding a pattern with  If you attempt to use delimiters (for example, by surrounding a pattern with
25  slashes, as is common in Perl scripts), they are interpreted as part of the  slashes, as is common in Perl scripts), they are interpreted as part of the
26  pattern. Quotes can of course be used on the command line because they are  pattern. Quotes can of course be used to delimit patterns on the command line
27  interpreted by the shell, and indeed they are required if a pattern contains  because they are interpreted by the shell, and indeed they are required if a
28  white space or shell metacharacters.  pattern contains white space or shell metacharacters.
29  .P  .P
30  The first argument that follows any option settings is treated as the single  The first argument that follows any option settings is treated as the single
31  pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present.  pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present.
# Line 39  For example: Line 39  For example:
39  .sp  .sp
40    pcregrep some-pattern /file1 - /file3    pcregrep some-pattern /file1 - /file3
41  .sp  .sp
42  By default, each line that matches the pattern is copied to the standard  By default, each line that matches a pattern is copied to the standard
43  output, and if there is more than one file, the file name is output at the  output, and if there is more than one file, the file name is output at the
44  start of each line. However, there are options that can change how  start of each line, followed by a colon. However, there are options that can
45  \fBpcregrep\fP behaves. In particular, the \fB-M\fP option makes it possible to  change how \fBpcregrep\fP behaves. In particular, the \fB-M\fP option makes it
46  search for patterns that span line boundaries. What defines a line boundary is  possible to search for patterns that span line boundaries. What defines a line
47  controlled by the \fB-N\fP (\fB--newline\fP) option.  boundary is controlled by the \fB-N\fP (\fB--newline\fP) option.
48  .P  .P
49  Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.  Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
50  BUFSIZ is defined in \fB<stdio.h>\fP.  BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
51    (specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to
52    each line in the order in which they are defined, except that all the \fB-e\fP
53    patterns are tried before the \fB-f\fP patterns.
54    .P
55    By default, as soon as one pattern matches (or fails to match when \fB-v\fP is
56    used), no further patterns are considered. However, if \fB--colour\fP (or
57    \fB--color\fP) is used to colour the matching substrings, or if
58    \fB--only-matching\fP, \fB--file-offsets\fP, or \fB--line-offsets\fP is used to
59    output only the part of the line that matched (either shown literally, or as an
60    offset), scanning resumes immediately following the match, so that further
61    matches on the same line can be found. If there are multiple patterns, they are
62    all tried on the remainder of the line, but patterns that follow the one that
63    matched are not tried on the earlier part of the line.
64    .P
65    This is the same behaviour as GNU grep, but it does mean that the order in
66    which multiple patterns are specified can affect the output when one of the
67    above options is used.
68    .P
69    Patterns that can match an empty string are accepted, but empty string
70    matches are not recognized. An example is the pattern "(super)?(man)?", in
71    which all components are optional. This pattern finds all occurrences of both
72    "super" and "man"; the output differs from matching with "super|man" when only
73    the matching substrings are being shown.
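The multiple-pattern and empty-match rules added in lines 50-73 of revision 392 can be modelled roughly as follows. This is a minimal sketch in Python, using the standard `re` module in place of PCRE; the function name `only_matching` is hypothetical, and the code is an illustration of the rules as the text states them, not pcregrep's actual implementation.

```python
import re


def only_matching(patterns, line):
    """Return the substrings an --only-matching style scan would report."""
    compiled = [re.compile(p) for p in patterns]
    found = []
    pos = 0
    while True:
        match = None
        for pat in compiled:  # patterns are tried in the order given
            # Empty-string matches are skipped, as the text says they
            # are not recognized.
            match = next(
                (m for m in pat.finditer(line, pos) if m.end() > m.start()),
                None,
            )
            if match is not None:
                break  # later patterns are not tried on this stretch
        if match is None:
            return found
        found.append(match.group(0))
        pos = match.end()  # scanning resumes immediately after the match


print(only_matching(["X", "Y"], "aYbXcY"))   # ['X', 'Y']
print(only_matching(["X|Y"], "aYbXcY"))      # ['Y', 'X', 'Y']
```

Under this model, giving X and Y as separate patterns skips the first Y because X is tried first on the whole remainder of the line, whereas the single pattern X|Y reports every occurrence in order. Likewise, "(super)?(man)?" applied to "superman and man" yields "superman" and "man", while "super|man" yields "super", "man", "man".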
74  .P  .P
75  If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set,  If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set,
76  \fBpcregrep\fP uses the value to set a locale when calling the PCRE library.  \fBpcregrep\fP uses the value to set a locale when calling the PCRE library.
77  The \fB--locale\fP option can be used to override this.  The \fB--locale\fP option can be used to override this.
78  .  .
79    .SH "SUPPORT FOR COMPRESSED FILES"
80    .rs
81    .sp
82    It is possible to compile \fBpcregrep\fP so that it uses \fBlibz\fP or
83    \fBlibbz2\fP to read files whose names end in \fB.gz\fP or \fB.bz2\fP,
84    respectively. You can find out whether your binary has support for one or both
85    of these file types by running it with the \fB--help\fP option. If the
86    appropriate support is not present, files are treated as plain text. The
87    standard input is always so treated.
88    .
89  .SH OPTIONS  .SH OPTIONS
90  .rs  .rs
91  .TP 10  .TP 10
# Line 93  If data is required, it must be given in Line 126  If data is required, it must be given in
126  equals sign.  equals sign.
127  .TP  .TP
128  \fB--colour=\fP\fIvalue\fP, \fB--color=\fP\fIvalue\fP  \fB--colour=\fP\fIvalue\fP, \fB--color=\fP\fIvalue\fP
129  This option specifies under what circumstances the part of a line that matched  This option specifies under what circumstances the parts of a line that matched
130  a pattern should be coloured in the output. The value may be "never" (the  a pattern should be coloured in the output. By default, the output is not
131  default), "always", or "auto". In the latter case, colouring happens only if  coloured. The value (which is optional, see above) may be "never", "always", or
132  the standard output is connected to a terminal. The colour can be specified by  "auto". In the latter case, colouring happens only if the standard output is
133  setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value  connected to a terminal. More resources are used when colouring is enabled,
134  of this variable should be a string of two numbers, separated by a semicolon.  because \fBpcregrep\fP has to search for all possible matches in a line, not
135  They are copied directly into the control string for setting colour on a  just one, in order to colour them all.
136  terminal, so it is your responsibility to ensure that they make sense. If  
137  neither of the environment variables is set, the default is "1;31", which gives  The colour that is used can be specified by setting the environment variable
138  red.  PCREGREP_COLOUR or PCREGREP_COLOR. The value of this variable should be a
139    string of two numbers, separated by a semicolon. They are copied directly into
140    the control string for setting colour on a terminal, so it is your
141    responsibility to ensure that they make sense. If neither of the environment
142    variables is set, the default is "1;31", which gives red.
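To illustrate how the two numbers end up in a terminal control string, here is a small Python sketch. It assumes the usual ANSI "select graphic rendition" form (ESC, "[", the value, "m") with a reset sequence after the match; the helper name `colour_match` and the precedence of PCREGREP_COLOUR over PCREGREP_COLOR are assumptions made for the example, not details taken from the man page.

```python
import os


def colour_match(text, environ=os.environ):
    """Wrap a matched substring in ANSI colour sequences."""
    # Assumption: PCREGREP_COLOUR is consulted before PCREGREP_COLOR.
    codes = (
        environ.get("PCREGREP_COLOUR")
        or environ.get("PCREGREP_COLOR")
        or "1;31"  # documented default, which gives red
    )
    # The value is copied verbatim between ESC[ and m; ESC[0m resets.
    return f"\x1b[{codes}m{text}\x1b[0m"
```

Because the value is copied without validation, a setting such as "4;32" (underlined green on most terminals) works, but a malformed value produces whatever the terminal makes of it.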
143  .TP  .TP
144  \fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP  \fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP
145  If an input path is not a regular file or a directory, "action" specifies how  If an input path is not a regular file or a directory, "action" specifies how
# Line 116  option), or "skip" (silently skip the pa Line 153  option), or "skip" (silently skip the pa
153  are read as if they were ordinary files. In some operating systems the effect  are read as if they were ordinary files. In some operating systems the effect
154  of reading a directory like this is an immediate end-of-file.  of reading a directory like this is an immediate end-of-file.
155  .TP  .TP
156  \fB-e\fP \fIpattern\fP, \fB--regex=\fP\fIpattern\fP,  \fB-e\fP \fIpattern\fP, \fB--regex=\fP\fIpattern\fP, \fB--regexp=\fP\fIpattern\fP
157  \fB--regexp=\fP\fIpattern\fP Specify a pattern to be matched. This option can  Specify a pattern to be matched. This option can be used multiple times in
158  be used multiple times in order to specify several patterns. It can also be  order to specify several patterns. It can also be used as a way of specifying a
159  used as a way of specifying a single pattern that starts with a hyphen. When  single pattern that starts with a hyphen. When \fB-e\fP is used, no argument
160  \fB-e\fP is used, no argument pattern is taken from the command line; all  pattern is taken from the command line; all arguments are treated as file
161  arguments are treated as file names. There is an overall maximum of 100  names. There is an overall maximum of 100 patterns. They are applied to each
162  patterns. They are applied to each line in the order in which they are defined  line in the order in which they are defined until one matches (or fails to
163  until one matches (or fails to match if \fB-v\fP is used). If \fB-f\fP is used  match if \fB-v\fP is used). If \fB-f\fP is used with \fB-e\fP, the command line
164  with \fB-e\fP, the command line patterns are matched first, followed by the  patterns are matched first, followed by the patterns from the file, independent
165  patterns from the file, independent of the order in which these options are  of the order in which these options are specified. Note that multiple use of
166  specified. Note that multiple use of \fB-e\fP is not the same as a single  \fB-e\fP is not the same as a single pattern with alternatives. For example,
167  pattern with alternatives. For example, X|Y finds the first character in a line  X|Y finds the first character in a line that is X or Y, whereas if the two
168  that is X or Y, whereas if the two patterns are given separately,  patterns are given separately, \fBpcregrep\fP finds X if it is present, even if
169  \fBpcregrep\fP finds X if it is present, even if it follows Y in the line. It  it follows Y in the line. It finds Y only if there is no X in the line. This
170  finds Y only if there is no X in the line. This really matters only if you are  really matters only if you are using \fB-o\fP to show the part(s) of the line
171  using \fB-o\fP to show the portion of the line that matched.  that matched.
172  .TP  .TP
173  \fB--exclude\fP=\fIpattern\fP  \fB--exclude\fP=\fIpattern\fP
174  When \fBpcregrep\fP is searching the files in a directory as a consequence of  When \fBpcregrep\fP is searching the files in a directory as a consequence of
175  the \fB-r\fP (recursive search) option, any files whose names match the pattern  the \fB-r\fP (recursive search) option, any regular files whose names match the
176  are excluded. The pattern is a PCRE regular expression. If a file name matches  pattern are excluded. Subdirectories are not excluded by this option; they are
177  both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no short  searched recursively, subject to the \fB--exclude_dir\fP and
178  form for this option.  \fB--include_dir\fP options. The pattern is a PCRE regular expression, and is
179    matched against the final component of the file name (not the entire path). If
180    a file name matches both \fB--include\fP and \fB--exclude\fP, it is excluded.
181    There is no short form for this option.
182    .TP
183    \fB--exclude_dir\fP=\fIpattern\fP
184    When \fBpcregrep\fP is searching the contents of a directory as a consequence
185    of the \fB-r\fP (recursive search) option, any subdirectories whose names match
186    the pattern are excluded. (Note that the \fB--exclude\fP option does not affect
187    subdirectories.) The pattern is a PCRE regular expression, and is matched
188    against the final component of the name (not the entire path). If a
189    subdirectory name matches both \fB--include_dir\fP and \fB--exclude_dir\fP, it
190    is excluded. There is no short form for this option.
191  .TP  .TP
192  \fB-F\fP, \fB--fixed-strings\fP  \fB-F\fP, \fB--fixed-strings\fP
193  Interpret each pattern as a list of fixed strings, separated by newlines,  Interpret each pattern as a list of fixed strings, separated by newlines,
# Line 156  present; they are tested before the file Line 205  present; they are tested before the file
205  is taken from the command line; all arguments are treated as file names. There  is taken from the command line; all arguments are treated as file names. There
206  is an overall maximum of 100 patterns. Trailing white space is removed from  is an overall maximum of 100 patterns. Trailing white space is removed from
207  each line, and blank lines are ignored. An empty file contains no patterns and  each line, and blank lines are ignored. An empty file contains no patterns and
208  therefore matches nothing.  therefore matches nothing. See also the comments about multiple patterns versus
209    a single pattern with alternatives in the description of \fB-e\fP above.
210    .TP
211    \fB--file-offsets\fP
212    Instead of showing lines or parts of lines that match, show each match as an
213    offset from the start of the file and a length, separated by a comma. In this
214    mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP
215    options are ignored. If there is more than one match in a line, each of them is
216    shown separately. This option is mutually exclusive with \fB--line-offsets\fP
217    and \fB--only-matching\fP.
218  .TP  .TP
219  \fB-H\fP, \fB--with-filename\fP  \fB-H\fP, \fB--with-filename\fP
220  Force the inclusion of the filename at the start of output lines when searching  Force the inclusion of the filename at the start of output lines when searching
221  a single file. By default, the filename is not shown in this case. For matching  a single file. By default, the filename is not shown in this case. For matching
222  lines, the filename is followed by a colon and a space; for context lines, a  lines, the filename is followed by a colon; for context lines, a hyphen
223  hyphen separator is used. If a line number is also being output, it follows the  separator is used. If a line number is also being output, it follows the file
224  file name without a space.  name.
225  .TP  .TP
226  \fB-h\fP, \fB--no-filename\fP  \fB-h\fP, \fB--no-filename\fP
227  Suppress the output filenames when searching multiple files. By default,  Suppress the output filenames when searching multiple files. By default,
228  filenames are shown when multiple files are searched. For matching lines, the  filenames are shown when multiple files are searched. For matching lines, the
229  filename is followed by a colon and a space; for context lines, a hyphen  filename is followed by a colon; for context lines, a hyphen separator is used.
230  separator is used. If a line number is also being output, it follows the file  If a line number is also being output, it follows the file name.
 name without a space.  
231  .TP  .TP
232  \fB--help\fP  \fB--help\fP
233  Output a brief help message and exit.  Output a help message, giving brief details of the command options and file
234    type support, and then exit.
235  .TP  .TP
236  \fB-i\fP, \fB--ignore-case\fP  \fB-i\fP, \fB--ignore-case\fP
237  Ignore upper/lower case distinctions during comparisons.  Ignore upper/lower case distinctions during comparisons.
238  .TP  .TP
239  \fB--include\fP=\fIpattern\fP  \fB--include\fP=\fIpattern\fP
240  When \fBpcregrep\fP is searching the files in a directory as a consequence of  When \fBpcregrep\fP is searching the files in a directory as a consequence of
241  the \fB-r\fP (recursive search) option, only those files whose names match the  the \fB-r\fP (recursive search) option, only those regular files whose names
242  pattern are included. The pattern is a PCRE regular expression. If a file name  match the pattern are included. Subdirectories are always included and searched
243  matches both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no  recursively, subject to the \fB--include_dir\fP and \fB--exclude_dir\fP
244  short form for this option.  options. The pattern is a PCRE regular expression, and is matched against the
245    final component of the file name (not the entire path). If a file name matches
246    both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no short
247    form for this option.
248    .TP
249    \fB--include_dir\fP=\fIpattern\fP
250    When \fBpcregrep\fP is searching the contents of a directory as a consequence
251    of the \fB-r\fP (recursive search) option, only those subdirectories whose
252    names match the pattern are included. (Note that the \fB--include\fP option
253    does not affect subdirectories.) The pattern is a PCRE regular expression, and
254    is matched against the final component of the name (not the entire path). If a
255    subdirectory name matches both \fB--include_dir\fP and \fB--exclude_dir\fP, it
256    is excluded. There is no short form for this option.
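The include/exclude rules described in revision 392 can be summarised in a short Python sketch. The function name `file_is_searched` is hypothetical, Python's `re` stands in for PCRE, and the use of an unanchored search is an assumption; the text only says that names "match the pattern".

```python
import os.path
import re


def file_is_searched(path, include=None, exclude=None):
    """Decide whether a file found during a recursive search is scanned."""
    # Only the final component of the path is tested, not the whole path.
    name = os.path.basename(path)
    # A name that matches the exclude pattern is always dropped, even if
    # it also matches the include pattern.
    if exclude is not None and re.search(exclude, name):
        return False
    # With an include pattern, only matching names are kept.
    if include is not None and not re.search(include, name):
        return False
    return True
```

The same decision would apply to subdirectory names with the \fB--include_dir\fP and \fB--exclude_dir\fP patterns. Note that an exclude pattern such as "^src" does not exclude "src/lib/foo.c", because only "foo.c" is tested.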
257  .TP  .TP
258  \fB-L\fP, \fB--files-without-match\fP  \fB-L\fP, \fB--files-without-match\fP
259  Instead of outputting lines from the files, just output the names of the files  Instead of outputting lines from the files, just output the names of the files
# Line 201  This option supplies a name to be used f Line 271  This option supplies a name to be used f
271  are being output. If not supplied, "(standard input)" is used. There is no  are being output. If not supplied, "(standard input)" is used. There is no
272  short form for this option.  short form for this option.
273  .TP  .TP
274    \fB--line-offsets\fP
275    Instead of showing lines or parts of lines that match, show each match as a
276    line number, the offset from the start of the line, and a length. The line
277    number is terminated by a colon (as usual; see the \fB-n\fP option), and the
278    offset and length are separated by a comma. In this mode, no context is shown.
279    That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are ignored. If there is
280    more than one match in a line, each of them is shown separately. This option is
281    mutually exclusive with \fB--file-offsets\fP and \fB--only-matching\fP.
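The two offset formats can be compared with a small Python sketch. It handles a single pattern and LF line endings only; the helper name `match_offsets` is hypothetical, and the zero-based offsets and character (rather than byte) counting are assumptions that the man page text does not settle.

```python
import re


def match_offsets(pattern, text):
    """Return (file_offsets, line_offsets) strings for every match."""
    pat = re.compile(pattern)
    file_offsets, line_offsets = [], []
    line_start = 0  # offset of the current line from the start of the file
    for lineno, line in enumerate(text.split("\n"), start=1):
        for m in pat.finditer(line):
            length = m.end() - m.start()
            if length == 0:
                continue  # empty matches are not reported
            # --file-offsets style: offset from start of file, then length
            file_offsets.append(f"{line_start + m.start()},{length}")
            # --line-offsets style: line number, colon, offset in line, length
            line_offsets.append(f"{lineno}:{m.start()},{length}")
        line_start += len(line) + 1  # +1 for the LF terminator
    return file_offsets, line_offsets
```

For the pattern "ab" and the text "ab\ncab ab\n", this gives "0,2", "4,2", "7,2" in the first format and "1:0,2", "2:1,2", "2:4,2" in the second, with each match on a line shown separately.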
282    .TP
283  \fB--locale\fP=\fIlocale-name\fP  \fB--locale\fP=\fIlocale-name\fP
284  This option specifies a locale to be used for pattern matching. It overrides  This option specifies a locale to be used for pattern matching. It overrides
285  the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no  the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no
# Line 220  the previous 8K characters (or all the p Line 299  the previous 8K characters (or all the p
299  are guaranteed to be available for lookbehind assertions.  are guaranteed to be available for lookbehind assertions.
300  .TP  .TP
301  \fB-N\fP \fInewline-type\fP, \fB--newline=\fP\fInewline-type\fP  \fB-N\fP \fInewline-type\fP, \fB--newline=\fP\fInewline-type\fP
302  The PCRE library supports three different character sequences for indicating  The PCRE library supports five different conventions for indicating
303  the ends of lines. They are the single-character sequences CR (carriage return)  the ends of lines. They are the single-character sequences CR (carriage return)
304  and LF (linefeed), and the two-character sequence CR, LF. When the library is  and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention,
305  built, a default line-ending sequence is specified. This is normally the  which recognizes any of the preceding three types, and an "any" convention, in
306  standard sequence for the operating system. Unless otherwise specified by this  which any Unicode line ending sequence is assumed to end a line. The Unicode
307  option, \fBpcregrep\fP uses the default. The possible values for this option  sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF
308  are CR, LF, or CRLF. This makes it possible to use \fBpcregrep\fP on files that  (formfeed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
309  have come from other environments without having to modify their line endings.  PS (paragraph separator, U+2029).
310  If the data that is being scanned does not agree with the convention set by  .sp
311  this option, \fBpcregrep\fP may behave in strange ways.  When the PCRE library is built, a default line-ending sequence is specified.
312    This is normally the standard sequence for the operating system. Unless
313    otherwise specified by this option, \fBpcregrep\fP uses the library's default.
314    The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This
315    makes it possible to use \fBpcregrep\fP on files that have come from other
316    environments without having to modify their line endings. If the data that is
317    being scanned does not agree with the convention set by this option,
318    \fBpcregrep\fP may behave in strange ways.
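The five conventions can be expressed as regular expressions for splitting input into lines. The following Python sketch is an illustration only: the names `NEWLINE_CONVENTIONS` and `split_lines` are hypothetical, and pcregrep itself relies on the PCRE library's newline handling rather than anything like this.

```python
import re

# One regular expression per value accepted by -N / --newline.
NEWLINE_CONVENTIONS = {
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "ANYCRLF": "\r\n|\r|\n",
    # CRLF, LF, VT, FF, CR, NEL, LS, PS
    "ANY": "\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]",
}


def split_lines(text, newline_type="LF"):
    """Split text into lines under the given newline convention."""
    lines = re.split(NEWLINE_CONVENTIONS[newline_type], text)
    if lines and lines[-1] == "":
        lines.pop()  # a final terminator does not start another line
    return lines


print(split_lines("a\r\nb\rc\n", "ANYCRLF"))  # ['a', 'b', 'c']
print(split_lines("a\r\nb\r\n", "LF"))        # ['a\r', 'b\r']
```

The second call shows one way a mismatched convention can cause surprises: with LF selected, each line of CRLF data keeps a trailing CR, so a pattern ending in $ may no longer match where expected.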
319  .TP  .TP
320  \fB-n\fP, \fB--line-number\fP  \fB-n\fP, \fB--line-number\fP
321  Precede each output line by its line number in the file, followed by a colon  Precede each output line by its line number in the file, followed by a colon
322  and a space for matching lines or a hyphen and a space for context lines. If  for matching lines or a hyphen for context lines. If the filename is also being
323  the filename is also being output, it precedes the line number.  output, it precedes the line number. This option is forced if
324    \fB--line-offsets\fP is used.
325  .TP  .TP
326  \fB-o\fP, \fB--only-matching\fP  \fB-o\fP, \fB--only-matching\fP
327  Show only the part of the line that matched a pattern. In this mode, no  Show only the part of the line that matched a pattern. In this mode, no
328  context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are  context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are
329  ignored.  ignored. If there is more than one match in a line, each of them is shown
330    separately. If \fB-o\fP is combined with \fB-v\fP (invert the sense of the
331    match to find non-matching lines), no output is generated, but the return code
332    is set appropriately. This option is mutually exclusive with
333    \fB--file-offsets\fP and \fB--line-offsets\fP.
334  .TP  .TP
335  \fB-q\fP, \fB--quiet\fP  \fB-q\fP, \fB--quiet\fP
336  Work quietly, that is, display nothing except error messages. The exit  Work quietly, that is, display nothing except error messages. The exit
# Line 274  the patterns are the ones that are found Line 365  the patterns are the ones that are found
365  Force the patterns to match only whole words. This is equivalent to having \eb  Force the patterns to match only whole words. This is equivalent to having \eb
366  at the start and end of the pattern.  at the start and end of the pattern.
367  .TP  .TP
368  \fB-x\fP, \fB--line-regex\fP, \fP--line-regexp\fP  \fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
369  Force the patterns to be anchored (each must start matching at the beginning of  Force the patterns to be anchored (each must start matching at the beginning of
370  a line) and in addition, require them to match entire lines. This is  a line) and in addition, require them to match entire lines. This is
371  equivalent to having ^ and $ characters at the start and end of each  equivalent to having ^ and $ characters at the start and end of each
# Line 339  in the first form, using an equals chara Line 430  in the first form, using an equals chara
430  it has no data.  it has no data.
431  .  .
432  .  .
433  .SH MATCHING ERRORS  .SH "MATCHING ERRORS"
434  .rs  .rs
435  .sp  .sp
436  It is possible to supply a regular expression that takes a very long time to  It is possible to supply a regular expression that takes a very long time to
# Line 361  suppress error messages about inaccessbl Line 452  suppress error messages about inaccessbl
452  code.  code.
453  .  .
454  .  .
455    .SH "SEE ALSO"
456    .rs
457    .sp
458    \fBpcrepattern\fP(3), \fBpcretest\fP(1).
459    .
460    .
461  .SH AUTHOR  .SH AUTHOR
462  .rs  .rs
463  .sp  .sp
464    .nf
465  Philip Hazel  Philip Hazel
 .br  
466  University Computing Service  University Computing Service
467  .br  Cambridge CB2 3QH, England.
468  Cambridge CB2 3QG, England.  .fi
469  .P  .
470  .in 0  .
471  Last updated: 06 June 2006  .SH REVISION
472  .br  .rs
473  Copyright (c) 1997-2006 University of Cambridge.  .sp
474    .nf
475    Last updated: 01 March 2009
476    Copyright (c) 1997-2009 University of Cambridge.
477    .fi
