Implement PCRE2_NEWLINE_NUL.
PhilipHazel committed May 26, 2017
1 parent 772d857 commit 3d80fa4
Showing 40 changed files with 1,274 additions and 1,105 deletions.
5 changes: 4 additions & 1 deletion CMakeLists.txt
@@ -160,7 +160,7 @@ SET(PCRE2GREP_MAX_BUFSIZE "1048576" CACHE STRING
"Buffer maximum size parameter for pcre2grep. See PCRE2GREP_MAX_BUFSIZE in config.h.in for details.")

SET(PCRE2_NEWLINE "LF" CACHE STRING
-"What to recognize as a newline (one of CR, LF, CRLF, ANY, ANYCRLF).")
+"What to recognize as a newline (one of CR, LF, CRLF, ANY, ANYCRLF, NUL).")

SET(PCRE2_HEAP_MATCH_RECURSE OFF CACHE BOOL
"Obsolete option: do not use")
@@ -344,6 +344,9 @@ ENDIF(PCRE2_NEWLINE STREQUAL "ANY")
IF(PCRE2_NEWLINE STREQUAL "ANYCRLF")
SET(NEWLINE_DEFAULT "5")
ENDIF(PCRE2_NEWLINE STREQUAL "ANYCRLF")
+IF(PCRE2_NEWLINE STREQUAL "NUL")
+SET(NEWLINE_DEFAULT "6")
+ENDIF(PCRE2_NEWLINE STREQUAL "NUL")

IF(NEWLINE_DEFAULT STREQUAL "")
MESSAGE(FATAL_ERROR "The PCRE2_NEWLINE variable must be set to one of the following values: \"LF\", \"CR\", \"CRLF\", \"ANY\", \"ANYCRLF\".")
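
The name-to-code mapping that this build logic sets up can be sketched in a few lines of Python. This is only an illustration, not part of the commit: the numeric codes are the ones listed in the configure.ac comment in this same commit, while `NEWLINE_CODES` and `newline_default` are hypothetical names. The sketch accepts either case because CMake uses upper-case names and configure uses lower-case ones.

```python
# Numeric codes as listed in the configure.ac comment in this commit:
# 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), 5 (ANYCRLF), 6 (NUL).
NEWLINE_CODES = {"CR": 1, "LF": 2, "CRLF": 3, "ANY": 4, "ANYCRLF": 5, "NUL": 6}


def newline_default(name: str) -> int:
    """Return the NEWLINE_DEFAULT code for a newline name (hypothetical helper)."""
    try:
        # Accept upper case (CMake style) and lower case (configure style).
        return NEWLINE_CODES[name.upper()]
    except KeyError:
        allowed = ", ".join(NEWLINE_CODES)
        raise ValueError(f"newline must be one of {allowed}; got {name!r}") from None


print(newline_default("NUL"))  # 6
```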
1 change: 1 addition & 0 deletions ChangeLog
@@ -169,6 +169,7 @@ all the tests can run with clang's sanitizing options.
33. Implement extra compile options in the compile context and add the first
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.

+34. Implement newline type PCRE2_NEWLINE_NUL.


Version 10.23 14-February-2017
5 changes: 5 additions & 0 deletions RunGrepTest
@@ -662,6 +662,11 @@ $valgrind $vjs $pcre2grep -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >
printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep
$valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep

+printf "abc\0def" >testNinputgrep
+
+printf "%c--------------------------- Test N7 ------------------------------\r\n" - >>testtrygrep
+$valgrind $vjs $pcre2grep -na --newline=nul "^(abc|def)" testNinputgrep | sed 's/\x00/ZERO/' >>testtrygrep
+
$cf $srcdir/testdata/grepoutputN testtrygrep
if [ $? != 0 ] ; then exit 1; fi
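
To see what Test N7 exercises, the following Python sketch treats NUL as the line terminator and reports numbered records that match at their start. It is an emulation using Python's `re` module, not pcre2grep's implementation, and `grep_nul_records` is a hypothetical helper; the input and pattern are the ones used in the test above.

```python
import re


def grep_nul_records(data: bytes, pattern: bytes):
    """Return (line_number, record) pairs for NUL-terminated records that match."""
    regex = re.compile(pattern)
    records = data.split(b"\0")
    # A trailing terminator leaves an empty final piece, which is not a record.
    if records and records[-1] == b"":
        records.pop()
    hits = []
    for number, record in enumerate(records, start=1):
        # Each record holds no NUL, so "^" anchors at the start of the record.
        if regex.search(record):
            hits.append((number, record))
    return hits


print(grep_nul_records(b"abc\0def", rb"^(abc|def)"))
# [(1, b'abc'), (2, b'def')]
```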

7 changes: 6 additions & 1 deletion configure.ac
@@ -189,6 +189,10 @@ AC_ARG_ENABLE(newline-is-any,
AS_HELP_STRING([--enable-newline-is-any],
[use any valid Unicode newline sequence]),
ac_pcre2_newline=any)
+AC_ARG_ENABLE(newline-is-nul,
+AS_HELP_STRING([--enable-newline-is-nul],
+[use NUL (binary zero) as newline character]),
+ac_pcre2_newline=nul)
enable_newline="$ac_pcre2_newline"

# Handle --enable-bsr-anycrlf
@@ -360,6 +364,7 @@ case "$enable_newline" in
crlf) ac_pcre2_newline_value=3 ;;
any) ac_pcre2_newline_value=4 ;;
anycrlf) ac_pcre2_newline_value=5 ;;
+nul) ac_pcre2_newline_value=6 ;;
*)
AC_MSG_ERROR([invalid argument \"$enable_newline\" to --enable-newline option])
;;
@@ -658,7 +663,7 @@ AC_DEFINE_UNQUOTED([NEWLINE_DEFAULT], [$ac_pcre2_newline_value], [
The value of NEWLINE_DEFAULT determines the default newline character
sequence. PCRE2 client programs can override this by selecting other values
at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY),
-and 5 (ANYCRLF).])
+5 (ANYCRLF), and 6 (NUL).])

if test "$enable_bsr_anycrlf" = "yes"; then
AC_DEFINE([BSR_ANYCRLF], [], [
1 change: 1 addition & 0 deletions doc/html/pcre2_config.html
@@ -57,6 +57,7 @@ <h1>pcre2_config man page</h1>
PCRE2_NEWLINE_CRLF
PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF
+PCRE2_NEWLINE_NUL
PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit
PCRE2_CONFIG_RECURSIONLIMIT Obsolete: use PCRE2_CONFIG_DEPTHLIMIT
PCRE2_CONFIG_STACKRECURSE Obsolete: always returns 0
1 change: 1 addition & 0 deletions doc/html/pcre2_pattern_info.html
@@ -71,6 +71,7 @@ <h1>pcre2_pattern_info man page</h1>
PCRE2_NEWLINE_CRLF
PCRE2_NEWLINE_ANY
PCRE2_NEWLINE_ANYCRLF
+PCRE2_NEWLINE_NUL
PCRE2_INFO_RECURSIONLIMIT Obsolete synonym for PCRE2_INFO_DEPTHLIMIT
PCRE2_INFO_SIZE Size of compiled pattern
</pre>
1 change: 1 addition & 0 deletions doc/html/pcre2_set_newline.html
@@ -35,6 +35,7 @@ <h1>pcre2_set_newline man page</h1>
PCRE2_NEWLINE_CRLF CR followed by LF only
PCRE2_NEWLINE_ANYCRLF Any of the above
PCRE2_NEWLINE_ANY Any Unicode newline sequence
+PCRE2_NEWLINE_NUL The NUL character (binary zero)
</pre>
The result is zero for success or PCRE2_ERROR_BADDATA if the second argument is
invalid.
9 changes: 6 additions & 3 deletions doc/html/pcre2api.html
@@ -783,8 +783,9 @@ <h1>pcre2api man page</h1>
This specifies which characters or character sequences are to be recognized as
newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only),
PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character
-sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), or
-PCRE2_NEWLINE_ANY (any Unicode newline sequence).
+sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above),
+PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the
+NUL character, that is a binary zero).
</P>
<P>
A pattern can override the value set in the compile context by starting with a
@@ -1106,6 +1107,7 @@ <h1>pcre2api man page</h1>
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
+PCRE2_NEWLINE_NUL The NUL character (binary zero)
</pre>
The default should normally correspond to the standard sequence for your
operating system.
@@ -2121,6 +2123,7 @@ <h1>pcre2api man page</h1>
PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF)
PCRE2_NEWLINE_ANY Any Unicode line ending
PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF
+PCRE2_NEWLINE_NUL The NUL character (binary zero)
</pre>
This identifies the character sequence that will be recognized as meaning
"newline" while matching.
@@ -3468,7 +3471,7 @@ <h1>pcre2api man page</h1>
</P>
<br><a name="SEC42" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 17 May 2017
+Last updated: 26 May 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>
10 changes: 6 additions & 4 deletions doc/html/pcre2grep.html
@@ -142,9 +142,11 @@ <h1>pcre2grep man page</h1>
<br><a name="SEC4" href="#TOC1">BINARY FILES</a><br>
<P>
By default, a file that contains a binary zero byte within the first 1024 bytes
-is identified as a binary file, and is processed specially. (GNU grep also
-identifies binary files in this manner.) See the <b>--binary-files</b> option
-for a means of changing the way binary files are handled.
+is identified as a binary file, and is processed specially. (GNU grep
+identifies binary files in this manner.) However, if the newline type is
+specified as "nul", that is, the line terminator is a binary zero, the test for
+a binary file is not applied. See the <b>--binary-files</b> option for a means
+of changing the way binary files are handled.
</P>
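
The documented default rule can be modelled roughly as below. This is a sketch of the behaviour described in the paragraph above, not pcre2grep's source, and it ignores the effect of <b>--binary-files</b>; `treated_as_binary` is a hypothetical name.

```python
def treated_as_binary(head: bytes, newline: str = "lf") -> bool:
    """Model of the documented default check (hypothetical helper)."""
    if newline.lower() == "nul":
        # NUL is the line terminator here, so zero bytes are expected
        # and the binary-file test is not applied.
        return False
    # Otherwise a binary zero within the first 1024 bytes marks the file as binary.
    return b"\0" in head[:1024]


print(treated_as_binary(b"abc\0def"))         # True
print(treated_as_binary(b"abc\0def", "nul"))  # False
```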
<br><a name="SEC5" href="#TOC1">OPTIONS</a><br>
<P>
@@ -934,7 +936,7 @@ <h1>pcre2grep man page</h1>
</P>
<br><a name="SEC15" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 11 April 2017
+Last updated: 26 May 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>
11 changes: 6 additions & 5 deletions doc/html/pcre2pattern.html
@@ -214,10 +214,10 @@ <h1>pcre2pattern man page</h1>
Newline conventions
</b><br>
<P>
-PCRE2 supports five different conventions for indicating line breaks in
+PCRE2 supports six different conventions for indicating line breaks in
strings: a single CR (carriage return) character, a single LF (linefeed)
-character, the two-character sequence CRLF, any of the three preceding, or any
-Unicode newline sequence. The
+character, the two-character sequence CRLF, any of the three preceding, any
+Unicode newline sequence, or the NUL character (binary zero). The
<a href="pcre2api.html"><b>pcre2api</b></a>
page has
<a href="pcre2api.html#newlines">further discussion</a>
@@ -226,13 +226,14 @@ <h1>pcre2pattern man page</h1>
</P>
<P>
It is also possible to specify a newline convention by starting a pattern
-string with one of the following five sequences:
+string with one of the following sequences:
<pre>
(*CR) carriage return
(*LF) linefeed
(*CRLF) carriage return, followed by linefeed
(*ANYCRLF) any of the three above
(*ANY) all Unicode newline sequences
+(*NUL) the NUL character (binary zero)
</pre>
These override the default and the options given to the compiling function. For
example, on a Unix system where LF is the default newline sequence, the pattern
@@ -3444,7 +3445,7 @@ <h1>pcre2pattern man page</h1>
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 18 April 2017
+Last updated: 26 May 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>
3 changes: 2 additions & 1 deletion doc/html/pcre2syntax.html
@@ -468,6 +468,7 @@ <h1>pcre2syntax man page</h1>
(*CRLF) carriage return followed by linefeed
(*ANYCRLF) all three of the above
(*ANY) any Unicode newline sequence
+(*NUL) the NUL character (binary zero)
</PRE>
</P>
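
How a leading newline-convention sequence selects the convention can be illustrated with a small Python sketch. It is a simplified model, not PCRE2's parser: it recognises only a single leading sequence from the list above and falls back to a caller-supplied default. `split_newline_prefix` and its `default` parameter are hypothetical.

```python
# The six leading sequences listed above, including the new (*NUL).
NEWLINE_PREFIXES = ("(*CR)", "(*LF)", "(*CRLF)", "(*ANYCRLF)", "(*ANY)", "(*NUL)")


def split_newline_prefix(pattern: str, default: str = "LF"):
    """Return (convention, rest_of_pattern) for a pattern string (hypothetical helper)."""
    for prefix in NEWLINE_PREFIXES:
        # The closing parenthesis keeps "(*CR)" from matching "(*CRLF)".
        if pattern.startswith(prefix):
            return prefix[2:-1], pattern[len(prefix):]
    # No recognised leading sequence: keep the default convention.
    return default, pattern


print(split_newline_prefix("(*NUL)^abc"))  # ('NUL', '^abc')
```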
<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
@@ -598,7 +599,7 @@ <h1>pcre2syntax man page</h1>
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 18 April 2017
+Last updated: 26 May 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>
10 changes: 5 additions & 5 deletions doc/html/pcre2test.html
@@ -182,7 +182,7 @@ <h1>pcre2test man page</h1>
linksize the configured internal link size (2, 3, or 4)
exit code is set to the link size
newline the default newline setting:
-CR, LF, CRLF, ANYCRLF, or ANY
+CR, LF, CRLF, ANYCRLF, ANY, or NUL
exit code is always 0
bsr the default setting for what \R matches:
ANYCRLF or ANY
@@ -367,8 +367,8 @@ <h1>pcre2test man page</h1>
</P>
<P>
The #newline_default command specifies a list of newline types that are
-acceptable as the default. The types must be one of CR, LF, CRLF, ANYCRLF, or
-ANY (in upper or lower case), for example:
+acceptable as the default. The types must be one of CR, LF, CRLF, ANYCRLF,
+ANY, or NUL (in upper or lower case), for example:
<pre>
#newline_default LF Any anyCRLF
</pre>
@@ -655,7 +655,7 @@ <h1>pcre2test man page</h1>
<P>
The <b>newline</b> modifier specifies which characters are to be interpreted as
newlines, both in the pattern and in subject lines. The type must be one of CR,
-LF, CRLF, ANYCRLF, or ANY (in upper or lower case).
+LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case).
</P>
<br><b>
Information about a pattern
@@ -1816,7 +1816,7 @@ <h1>pcre2test man page</h1>
</P>
<br><a name="SEC21" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 17 May 2017
+Last updated: 26 May 2017
<br>
Copyright &copy; 1997-2017 University of Cambridge.
<br>