Diff of /code/trunk/doc/pcreapi.3

--- revision 978 by ph10, Sun Jun 17 16:55:07 2012 UTC
+++ revision 1031 by ph10, Sat Sep 8 15:59:01 2012 UTC
@@ -1,4 +1,4 @@
-.TH PCREAPI 3 "04 May 2012" "PCRE 8.31"
+.TH PCREAPI 3 "07 September 2012" "PCRE 8.32"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .sp
@@ -422,11 +422,13 @@ unaligned)". If JIT support is not avail
   PCRE_CONFIG_NEWLINE
 .sp
 The output is an integer whose value specifies the default character sequence
-that is recognized as meaning "newline". The four values that are supported
-are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF, and -1 for ANY.
-Though they are derived from ASCII, the same values are returned in EBCDIC
-environments. The default should normally correspond to the standard sequence
-for your operating system.
+that is recognized as meaning "newline". The values that are supported in
+ASCII/Unicode environments are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for
+ANYCRLF, and -1 for ANY. In EBCDIC environments, CR, ANYCRLF, and ANY yield the
+same values. However, the value for LF is normally 21, though some EBCDIC
+environments use 37. The corresponding values for CRLF are 3349 and 3365. The
+default should normally correspond to the standard sequence for your operating
+system.
 .sp
   PCRE_CONFIG_BSR
 .sp
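A quick arithmetic check shows that each CRLF value above puts the CR code (13) in the high byte and the LF code in the low byte: 13 * 256 + 10 = 3338, 13 * 256 + 21 = 3349, and 13 * 256 + 37 = 3365. The C sketch below decodes the integer a program would get from `pcre_config(PCRE_CONFIG_NEWLINE, &value)`. The enum and the functions `describe_newline()` and `crlf_value()` are hypothetical helpers written for this example. They are not part of the PCRE API.

```c
/* Hypothetical classification of a PCRE_CONFIG_NEWLINE result. */
enum newline_kind {
    NLK_UNKNOWN,
    NLK_LF,
    NLK_CR,
    NLK_CRLF,
    NLK_ANYCRLF,
    NLK_ANY
};

/* Map the documented values for ASCII/Unicode and EBCDIC builds onto a
   newline kind. Any other value is reported as unknown. */
enum newline_kind describe_newline(int value)
{
    switch (value) {
    case 10:    /* LF in ASCII/Unicode */
    case 21:    /* LF in most EBCDIC environments (0x15) */
    case 37:    /* LF in some EBCDIC environments (0x25) */
        return NLK_LF;
    case 13:    /* CR is 0x0d in both ASCII and EBCDIC */
        return NLK_CR;
    case 3338:  /* CRLF, ASCII/Unicode */
    case 3349:  /* CRLF, EBCDIC with LF = 21 */
    case 3365:  /* CRLF, EBCDIC with LF = 37 */
        return NLK_CRLF;
    case -2:
        return NLK_ANYCRLF;
    case -1:
        return NLK_ANY;
    default:
        return NLK_UNKNOWN;
    }
}

/* The listed CRLF values equal CR * 256 + LF, that is, the CR code in
   the high byte and the LF code in the low byte. */
int crlf_value(int cr, int lf)
{
    return cr * 256 + lf;
}
```

For example, `describe_newline(3349)` yields `NLK_CRLF`, and `crlf_value(13, 21)` yields 3349.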
@@ -739,11 +741,23 @@ indicated by a single character (CR or L
 PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character
 CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies that any of the three
 preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies
-that any Unicode newline sequence should be recognized. The Unicode newline
-sequences are the three just mentioned, plus the single characters VT (vertical
-tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line
-separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit
-library, the last two are recognized only in UTF-8 mode.
+that any Unicode newline sequence should be recognized.
+.P
+In an ASCII/Unicode environment, the Unicode newline sequences are the three
+just mentioned, plus the single characters VT (vertical tab, U+000B), FF (form
+feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
+(paragraph separator, U+2029). For the 8-bit library, the last two are
+recognized only in UTF-8 mode.
+.P
+When PCRE is compiled to run in an EBCDIC (mainframe) environment, the code for
+CR is 0x0d, the same as ASCII. However, the character code for LF is normally
+0x15, though in some EBCDIC environments 0x25 is used. Whichever of these is
+not LF is made to correspond to Unicode's NEL character. EBCDIC codes are all
+less than 256. For more details, see the
+.\" HREF
+\fBpcrebuild\fP
+.\"
+documentation.
 .P
 The newline setting in the options word uses three bits that are treated
 as a number, giving eight possibilities. Currently only six are used (default
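The sketch below shows what recognizing these sequences involves. It works on already-decoded code points in an ASCII/Unicode environment. The function names are hypothetical, and the code is not how PCRE performs the check internally. It only mirrors the lists given above for the ANY and ANYCRLF conventions. A small helper for the EBCDIC rule is also included: whichever of 0x15 and 0x25 is not LF stands for NEL.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of the newline sequence starting at s[i] under the ANY
   convention (ASCII/Unicode environment), or 0 if there is none.
   The input is an array of already-decoded code points. */
size_t any_newline_length(const uint32_t *s, size_t len, size_t i)
{
    if (i >= len)
        return 0;
    switch (s[i]) {
    case 0x000D:                  /* CR, or CRLF if LF follows */
        return (i + 1 < len && s[i + 1] == 0x000A) ? 2 : 1;
    case 0x000A:                  /* LF */
    case 0x000B:                  /* VT */
    case 0x000C:                  /* FF */
    case 0x0085:                  /* NEL */
    case 0x2028:                  /* LS */
    case 0x2029:                  /* PS */
        return 1;
    default:
        return 0;
    }
}

/* Same, but under the ANYCRLF convention: only CR, LF, and CRLF. */
size_t anycrlf_newline_length(const uint32_t *s, size_t len, size_t i)
{
    if (i >= len)
        return 0;
    if (s[i] == 0x000D)
        return (i + 1 < len && s[i + 1] == 0x000A) ? 2 : 1;
    return s[i] == 0x000A ? 1 : 0;
}

/* EBCDIC: LF is 0x15 or 0x25, and whichever is not LF corresponds to
   NEL. Assumes lf_code is one of those two values. */
int ebcdic_nel_code(int lf_code)
{
    return lf_code == 0x15 ? 0x25 : 0x15;
}
```

A CR immediately followed by LF is consumed as one two-character sequence. This is why the functions return a length instead of a yes/no answer.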
@@ -960,12 +974,16 @@ below
 in the section on matching a pattern.
 .P
 If studying the pattern does not produce any useful information,
-\fBpcre_study()\fP returns NULL. In that circumstance, if the calling program
-wants to pass any of the other fields to \fBpcre_exec()\fP or
-\fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block.
+\fBpcre_study()\fP returns NULL by default. In that circumstance, if the
+calling program wants to pass any of the other fields to \fBpcre_exec()\fP or
+\fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block. However,
+if \fBpcre_study()\fP is called with the PCRE_STUDY_EXTRA_NEEDED option, it
+returns a \fBpcre_extra\fP block even if studying did not find any additional
+information. It may still return NULL, however, if an error occurs in
+\fBpcre_study()\fP.
 .P
 The second argument of \fBpcre_study()\fP contains option bits. There are three
-options:
+further options in addition to PCRE_STUDY_EXTRA_NEEDED:
 .sp
   PCRE_STUDY_JIT_COMPILE
   PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
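The caller-side fallback described above can be sketched as follows. When studying yields NULL and the program still wants to pass extra fields, it supplies a zeroed block of its own. The `struct sketch_extra` type, the `SKETCH_FLAG_MATCH_LIMIT` bit, and both functions are hypothetical stand-ins. Real code would use the `pcre_extra` type from pcre.h, whose `flags` field uses bits such as PCRE_EXTRA_MATCH_LIMIT to mark which fields are set. With PCRE_STUDY_EXTRA_NEEDED, a NULL result would instead indicate an error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two pcre_extra fields this sketch uses. */
struct sketch_extra {
    unsigned long flags;
    unsigned long match_limit;
};

#define SKETCH_FLAG_MATCH_LIMIT 1UL   /* hypothetical flag bit */

/* If studying produced no block (NULL), zero a caller-owned block and
   use that instead, so extra fields can still be passed on. */
struct sketch_extra *extra_or_fallback(struct sketch_extra *studied,
                                       struct sketch_extra *fallback)
{
    if (studied != NULL)
        return studied;
    memset(fallback, 0, sizeof *fallback);
    return fallback;
}

/* Store a match limit and mark that field as valid. */
void set_match_limit(struct sketch_extra *extra, unsigned long limit)
{
    extra->match_limit = limit;
    extra->flags |= SKETCH_FLAG_MATCH_LIMIT;
}
```

Because the flag bit records which fields are meaningful, the zeroed fallback block starts with every optional field switched off.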
@@ -974,7 +992,7 @@ options:
 If any of these are set, and the just-in-time compiler is available, the
 pattern is further compiled into machine code that executes much faster than
 the \fBpcre_exec()\fP interpretive matching function. If the just-in-time
-compiler is not available, these options are ignored. All other bits in the
+compiler is not available, these options are ignored. All undefined bits in the
 \fIoptions\fP argument must be zero.
 .P
 JIT compilation is a heavyweight optimization. It can take some time for
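The rule that undefined option bits must be zero amounts to a mask test. In this sketch, the `SK_*` constants and their values are hypothetical placeholders, not the real PCRE_STUDY_* values, which are defined in pcre.h. The names loosely mirror PCRE_STUDY_JIT_COMPILE, PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE, PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE, and PCRE_STUDY_EXTRA_NEEDED.

```c
/* Hypothetical option bits for this sketch only. */
#define SK_JIT_COMPILE               0x0001u
#define SK_JIT_PARTIAL_SOFT_COMPILE  0x0002u
#define SK_JIT_PARTIAL_HARD_COMPILE  0x0004u
#define SK_EXTRA_NEEDED              0x0008u

#define SK_JIT_BITS     (SK_JIT_COMPILE | SK_JIT_PARTIAL_SOFT_COMPILE | \
                         SK_JIT_PARTIAL_HARD_COMPILE)
#define SK_DEFINED_BITS (SK_JIT_BITS | SK_EXTRA_NEEDED)

/* Returns 1 if no undefined bit is set, 0 otherwise. */
int study_options_valid(unsigned int options)
{
    return (options & ~SK_DEFINED_BITS) == 0;
}

/* Returns 1 if JIT compilation would be attempted: a JIT option must be
   set and the JIT compiler must be available. Otherwise the JIT options
   are simply ignored. */
int wants_jit(unsigned int options, int jit_available)
{
    return jit_available && (options & SK_JIT_BITS) != 0;
}
```

Rejecting unknown bits up front leaves those bits free to be given a meaning in later releases without silently changing the behavior of existing callers.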
@@ -1022,10 +1040,9 @@ real application there should be tests f
 Studying a pattern does two things: first, a lower bound for the length of
 subject string that is needed to match the pattern is computed. This does not
 mean that there are any strings of that length that match, but it does
-guarantee that no shorter strings match. The value is used by
-\fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP to avoid wasting time by trying to
-match strings that are shorter than the lower bound. You can find out the value
-in a calling program via the \fBpcre_fullinfo()\fP function.
+guarantee that no shorter strings match. The value is used to avoid wasting
+time by trying to match strings that are shorter than the lower bound. You can
+find out the value in a calling program via the \fBpcre_fullinfo()\fP function.
 .P
 Studying a pattern is also useful for non-anchored patterns that do not have a
 single fixed starting character. A bitmap of possible starting bytes is
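A lower bound on match length saves work for a simple reason. A start offset with fewer than `min_len` characters remaining cannot begin a match, so a matcher can stop trying before it reaches the end of the subject. The function below is a hypothetical sketch of that arithmetic only. In a real program, the bound would be obtained with `pcre_fullinfo()` and PCRE_INFO_MINLENGTH.

```c
#include <stddef.h>

/* Number of start offsets, from `start` onward, that could still begin a
   match when every match is at least `min_len` characters long. Offsets
   with fewer than min_len characters left are skipped. */
size_t viable_start_count(size_t subject_len, size_t start, size_t min_len)
{
    if (start > subject_len || subject_len - start < min_len)
        return 0;
    /* Viable offsets run from start to subject_len - min_len inclusive. */
    return subject_len - min_len - start + 1;
}
```

For example, a 10-character subject with a lower bound of 3 leaves start offsets 0 to 7, so 8 offsets need to be tried instead of 11.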
@@ -2667,6 +2684,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 17 June 2012
+Last updated: 07 September 2012
 Copyright (c) 1997-2012 University of Cambridge.
 .fi
