pcre2grep: add --posix-pattern-file for compatibility with other grep #428

Merged (1 commit) on Jun 18, 2024

10 changes: 7 additions & 3 deletions ChangeLog
@@ -7,14 +7,18 @@ there is also the log of commit messages.
Version 10.45 xx-xxx-2024
-------------------------

-1. Change 6 of 10.44 broke 32-bit compiles because pcre2test's reporting of
-memory size was changed to the entire compiled data block, instead of just the
-pattern and tables data, so as to align with the new length restriction.
+1. Change 6 of 10.44 broke 32-bit tests because pcre2test's reporting of
+memory size was changed to the entire compiled data block, instead of just the
+pattern and tables data, so as to align with the new length restriction.
Because the block's header contains pointers, this meant the pcre2test output
was different in 32-bit mode. A patch by Carlo reverts to the previous state
and makes sure that any limit set by pcre2_set_max_pattern_compiled_length()
also avoids the internal struct overhead.

2. Add --posix-pattern-file to pcre2grep to allow processing of empty patterns
through the -f option, as well as patterns that end in space characters for
compatibility with other grep tools.


Version 10.44 07-June-2024
--------------------------

29 changes: 29 additions & 0 deletions RunGrepTest
@@ -861,6 +861,35 @@ echo "---------------------------- Test 153 -----------------------------" >>tes
(cd $srcdir; $valgrind $vjs $pcre2grep -nA3 --no-group-separator 'four' ./testdata/grepinputx) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 154 -----------------------------" >>testtrygrep
>testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep -f $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 155 -----------------------------" >>testtrygrep
echo "" >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep -f $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 156 -----------------------------" >>testtrygrep
echo "" >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep --posix-pattern-file --file $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 157 -----------------------------" >>testtrygrep
echo "spaces " >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep -o --posix-pattern-file --file=$builddir/testtemp1grep ./testdata/grepinputv >testtemp2grep && $valgrind $vjs $pcre2grep -q "s " testtemp2grep) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 158 -----------------------------" >>testtrygrep
echo "spaces." >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep -f $builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

echo "---------------------------- Test 159 -----------------------------" >>testtrygrep
printf "spaces.\015\012" >testtemp1grep
(cd $srcdir; $valgrind $vjs $pcre2grep --posix-pattern-file -f$builddir/testtemp1grep ./testdata/grepinputv) >>testtrygrep
echo "RC=$?" >>testtrygrep

# Now compare the results.

6 changes: 3 additions & 3 deletions doc/html/pcre2_set_max_pattern_compiled_length.html
@@ -27,9 +27,9 @@ <h1>pcre2_set_max_pattern_compiled_length man page</h1>
</b><br>
<P>
This function sets, in a compile context, the maximum size (in bytes) for the
-memory needed to hold the compiled version of a pattern that is compiled with
-this context. The result is always zero. If a pattern that is passed to
-<b>pcre2_compile()</b> with this context needs more memory, an error is
+memory needed to hold the compiled version of a pattern that is using this
+context. The result is always zero. If a pattern that is passed to
+<b>pcre2_compile()</b> referencing this context needs more memory, an error is
generated. The default is the largest number that a PCRE2_SIZE variable can
hold, which is effectively unlimited.
</P>

14 changes: 12 additions & 2 deletions doc/html/pcre2grep.html
@@ -391,9 +391,10 @@ <h1>pcre2grep man page</h1>
command line, no delimiters should be used. What constitutes a newline when
reading the file is the operating system's default interpretation of \n. The
<b>--newline</b> option has no effect on this option. Trailing white space is
-removed from each line, and blank lines are ignored. An empty file contains no
+removed from each line, and blank lines are ignored unless the
+<b>--posix-pattern-file</b> option is also provided. An empty file contains no
patterns and therefore matches nothing. Patterns read from a file in this way
-may contain binary zeros, which are treated as ordinary data characters.
+may contain binary zeros, which are treated as ordinary character literals.
<br>
<br>
If this option is given more than once, all the specified files are read. A
@@ -808,6 +809,15 @@ <h1>pcre2grep man page</h1>
allowing \w to match Unicode letters and digits.
</P>
<P>
<b>--posix-pattern-file</b>
When patterns are provided with the <b>-f</b> option, do not trim trailing
spaces or ignore empty lines, in the same way as other grep tools. To keep
the behaviour consistent with older versions, if the pattern read was
terminated with CRLF (as character literals) then both characters won't be
included as part of it, so if you really need to have a pattern ending in '\r',
use an escape sequence or provide it by a different method.
</P>
<P>
<b>-q</b>, <b>--quiet</b>
Work quietly, that is, display nothing except error messages. The exit
status indicates whether or not any matches were found.

13 changes: 11 additions & 2 deletions doc/pcre2grep.1
@@ -337,9 +337,10 @@ Read patterns from the file, one per line. As is the case with patterns on the
command line, no delimiters should be used. What constitutes a newline when
reading the file is the operating system's default interpretation of \en. The
\fB--newline\fP option has no effect on this option. Trailing white space is
-removed from each line, and blank lines are ignored. An empty file contains no
+removed from each line, and blank lines are ignored unless the
+\fB--posix-pattern-file\fP option is also provided. An empty file contains no
patterns and therefore matches nothing. Patterns read from a file in this way
-may contain binary zeros, which are treated as ordinary data characters.
+may contain binary zeros, which are treated as ordinary character literals.
.sp
If this option is given more than once, all the specified files are read. A
data line is output if any of the patterns match it. A file name can be given
@@ -701,6 +702,14 @@ option settings within patterns that affect individual classes. For example,
when in UCP mode, the sequence (?aP) restricts [:word:] to ASCII letters, while
allowing \ew to match Unicode letters and digits.
.TP
\fB--posix-pattern-file\fP
When patterns are provided with the \fB-f\fP option, do not trim trailing
spaces or ignore empty lines, in the same way as other grep tools. To keep
the behaviour consistent with older versions, if the pattern read was
terminated with CRLF (as character literals) then both characters won't be
included as part of it, so if you really need to have a pattern ending in '\er',
use an escape sequence or provide it by a different method.
.TP
\fB-q\fP, \fB--quiet\fP
Work quietly, that is, display nothing except error messages. The exit
status indicates whether or not any matches were found.

3 changes: 2 additions & 1 deletion src/config.h.in
@@ -145,7 +145,8 @@ sure both macros are undefined; an emulation function will then be used. */
/* Define to 1 if you have the <unistd.h> header file. */
#undef HAVE_UNISTD_H

-/* Define to 1 if the compiler supports simple visibility declarations. */
+/* Define to 1 if the compiler supports GCC compatible visibility
+declarations. */
#undef HAVE_VISIBILITY

/* Define to 1 if you have the <wchar.h> header file. */

41 changes: 38 additions & 3 deletions src/pcre2grep.c
@@ -290,6 +290,7 @@ static BOOL show_total_count = FALSE;
static BOOL silent = FALSE;
static BOOL utf = FALSE;
static BOOL posix_digit = FALSE;
static BOOL posix_pattern_file = FALSE;

static uint8_t utf8_buffer[8];

@@ -428,6 +429,7 @@ used to identify them. */
#define N_POSIX_DIGIT (-26)
#define N_GROUP_SEPARATOR (-27)
#define N_NO_GROUP_SEPARATOR (-28)
#define N_POSIX_PATFILE (-29)

static option_item optionlist[] = {
{ OP_NODATA, N_NULL, NULL, "", "terminate options" },
@@ -449,6 +451,7 @@ static option_item optionlist[] = {
{ OP_PATLIST, 'e', &match_patdata, "regex(p)=pattern", "specify pattern (may be used more than once)" },
{ OP_NODATA, 'F', NULL, "fixed-strings", "patterns are sets of newline-separated strings" },
{ OP_FILELIST, 'f', &pattern_files_data, "file=path", "read patterns from file" },
{ OP_NODATA, N_POSIX_PATFILE, NULL, "posix-pattern-file", "use POSIX semantics for pattern files" },
{ OP_FILELIST, N_FILE_LIST, &file_lists_data, "file-list=path","read files to search from file" },
{ OP_NODATA, N_FOFFSETS, NULL, "file-offsets", "output file offsets, not text" },
{ OP_STRING, N_GROUP_SEPARATOR, &group_separator, "group-separator=text", "set separator between groups of lines" },
@@ -1448,7 +1451,34 @@ while ((c = fgetc(f)) != EOF)
return yield;
}

/*************************************************
* Read one pattern from file *
*************************************************/

/* Wrap around read_one_line() to make sure any terminating '\n' is not
included in the pattern and empty patterns are correctly identified.

Arguments:
buffer the buffer to read into
length on entry, the maximum number of characters to read; on exit, the
number of characters in the pattern that was read
f the file

Returns: TRUE if a pattern was read into buffer
*/

static BOOL
read_pattern(char *buffer, PCRE2_SIZE *length, FILE *f)
{
*buffer = '\0';
*length = read_one_line(buffer, *length, f);
if (*length > 0 && buffer[*length-1] == '\n') *length = *length - 1;
if (posix_pattern_file && *length > 0 && buffer[*length-1] == '\r')
{
*length = *length - 1;
if (*length == 0) return TRUE;
}
return (*length > 0 || *buffer == '\n');
}
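
The terminator handling in read_pattern() can be tried outside pcre2grep with a small stand-alone sketch. This is not the project's code: read_one_line_sketch() is a hypothetical stand-in for pcre2grep's read_one_line(), the global posix_pattern_file flag is passed as a posix_mode parameter, size_t replaces PCRE2_SIZE, and first_pattern_length() is a helper invented here that feeds bytes through a temporary file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for read_one_line(): stores bytes up to and
   including the first '\n', stopping early at EOF or when the buffer is
   full, and returns the number of bytes stored. */
static size_t read_one_line_sketch(char *buffer, size_t size, FILE *f)
{
  size_t n = 0;
  int c;
  while (n < size && (c = fgetc(f)) != EOF)
    {
    buffer[n++] = (char)c;
    if (c == '\n') break;
    }
  return n;
}

/* Mirrors the logic of read_pattern() from the diff, with the global flag
   turned into a parameter (an assumption made for this sketch).
   Returns 1 if a pattern (possibly empty) was read, 0 at end of file. */
static int read_pattern_sketch(char *buffer, size_t *length, FILE *f,
                               int posix_mode)
{
  assert(buffer != NULL && length != NULL && *length > 0);
  *buffer = '\0';
  *length = read_one_line_sketch(buffer, *length, f);
  /* Drop the terminating LF, if there is one. */
  if (*length > 0 && buffer[*length - 1] == '\n') (*length)--;
  /* In POSIX mode also drop a CR that preceded the LF. */
  if (posix_mode && *length > 0 && buffer[*length - 1] == '\r')
    {
    (*length)--;
    if (*length == 0) return 1;   /* the line was just CRLF */
    }
  /* buffer[0] is still '\n' when the line held nothing but LF. */
  return *length > 0 || *buffer == '\n';
}

/* Hypothetical helper: writes data to a temporary file and returns the
   length of the first pattern read from it, -1 if no pattern was read,
   or -2 if the temporary file could not be set up. */
static long first_pattern_length(const char *data, size_t data_len,
                                 int posix_mode)
{
  char buffer[256];
  size_t length = sizeof(buffer);
  long result = -1;
  FILE *f = tmpfile();
  if (f == NULL) return -2;
  if (data_len > 0 && fwrite(data, 1, data_len, f) != data_len)
    {
    fclose(f);
    return -2;
    }
  rewind(f);
  if (read_pattern_sketch(buffer, &length, f, posix_mode))
    result = (long)length;
  fclose(f);
  return result;
}
```

Under these assumptions, first_pattern_length("spaces.\r\n", 9, 1) gives 7, while the same call with posix_mode set to 0 gives 8, because the CR is left in place for the caller's isspace() trimming loop. An empty file gives -1 in both modes, and a file holding a single LF gives 0 in both modes, so whether that empty pattern is kept is decided by the caller.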

/*************************************************
* Find end of line *
@@ -3598,6 +3628,7 @@ switch(letter)
case N_NOJIT: use_jit = FALSE; break;
case N_ALLABSK: extra_options |= PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK; break;
case N_NO_GROUP_SEPARATOR: group_separator = NULL; break;
case N_POSIX_PATFILE: posix_pattern_file = TRUE; break;
case 'a': binary_files = BIN_TEXT; break;
case 'c': count_only = TRUE; break;
case N_POSIX_DIGIT: posix_digit = TRUE; break;
@@ -3808,11 +3839,15 @@ else
filename = name;
}

-while ((patlen = read_one_line(buffer, sizeof(buffer), f)) > 0)
+while ((patlen = sizeof(buffer)) && read_pattern(buffer, &patlen, f))
{
-while (patlen > 0 && isspace((unsigned char)(buffer[patlen-1]))) patlen--;
+if (!posix_pattern_file)
+{
+while (patlen > 0 && isspace((unsigned char)(buffer[patlen-1]))) patlen--;
+}

linenumber++;
-if (patlen == 0) continue; /* Skip blank lines */
+if (!posix_pattern_file && patlen == 0) continue; /* Skip blank lines */

/* Note: this call to add_pattern() puts a pointer to the local variable
"buffer" into the pattern chain. However, that pointer is used only when
Expand Down
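
The per-line decision made inside the pattern-file loop in src/pcre2grep.c (trim and skip by default, keep everything with --posix-pattern-file) can be sketched as a pure function. The names keep_pattern_sketch() and kept_length() are hypothetical, the posix_mode parameter stands in for the global posix_pattern_file flag, and the input is assumed to have had its line terminator removed already.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Decides what the loop does with one line that has already lost its
   terminator. Returns 1 if the line is added as a pattern and 0 if it is
   skipped; *patlen is updated to the length that would be used. */
static int keep_pattern_sketch(const char *buffer, size_t *patlen,
                               int posix_mode)
{
  assert(buffer != NULL && patlen != NULL);
  if (!posix_mode)
    {
    /* Default behaviour: trailing white space is trimmed... */
    while (*patlen > 0 && isspace((unsigned char)buffer[*patlen - 1]))
      (*patlen)--;
    /* ...and a line that ends up empty is skipped. */
    if (*patlen == 0) return 0;
    }
  /* POSIX mode: the line is kept as-is, even when it is empty. */
  return 1;
}

/* Hypothetical helper for trying the function on a C string: returns the
   pattern length that would be used, or -1 if the line is skipped. */
static long kept_length(const char *line, int posix_mode)
{
  size_t patlen = strlen(line);
  if (!keep_pattern_sketch(line, &patlen, posix_mode)) return -1;
  return (long)patlen;
}
```

With this sketch, kept_length("spaces ", 0) gives 6 and kept_length("spaces ", 1) gives 7, which is the difference Test 157 relies on when it looks for "s " in the matched output. An empty line gives -1 by default and 0 in POSIX mode, and an empty pattern matches every line, as the Test 156 output shows.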
1 change: 1 addition & 0 deletions testdata/grepinputv
@@ -7,3 +7,4 @@ The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
trailing spaces
26 changes: 26 additions & 0 deletions testdata/grepoutput
@@ -464,6 +464,7 @@ The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
trailing spaces
RC=0
---------------------------- Test 52 ------------------------------
fox jumps
@@ -1169,6 +1170,7 @@ The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
trailing spaces
RC=0
---------------------------- Test 146 -----------------------------
(standard input):A123B
@@ -1253,3 +1255,27 @@ RC=0
36-sixteen
37-seventeen
RC=0
---------------------------- Test 154 -----------------------------
RC=1
---------------------------- Test 155 -----------------------------
RC=1
---------------------------- Test 156 -----------------------------
The quick brown
fox jumps
over the lazy dog.
This time it jumps and jumps and jumps.
This line contains \E and (regex) *meta* [characters].
The word is cat in this line
The caterpillar sat on the mat
The snowcat is not an animal
A buried feline in the syndicate
trailing spaces
RC=0
---------------------------- Test 157 -----------------------------
RC=0
---------------------------- Test 158 -----------------------------
trailing spaces
RC=0
---------------------------- Test 159 -----------------------------
trailing spaces
RC=0