segmentation fault occurs with -P (PCRE2 JIT matching) "\w" regex pattern #241

Closed
jhawthor opened this issue Nov 29, 2022 · 5 comments
Labels
problem Something isn't working due to a (minor) problem

Comments

@jhawthor

Hardware:
Apple MacBook Pro 16in, 2021
Chip Apple M1 Pro
Memory 16GB
macOS Ventura 13.0.1

Description of issue:
Using the default compression from macOS, I created a zip archive of a JavaScript application that contains PNG and ICO files. ugrep throws a segmentation fault when using a regex pattern search. When I remove the PNG files from the archive, the pattern search works correctly. The pattern also works correctly with the ICO files in the archive.

Error:
Execution command with full archive:
ugrep -P -z --zmax=1 -n '[\w-]+@([\w-]+.)+[\w-]+' EventTracke.zip
Error:
[1] 31161 segmentation fault ugrep -P -z --zmax=1 -n '[\w-]+@([\w-]+.)+[\w-]+' EventTracke.zip

Removing the png files from the archive:
ugrep -P -z --zmax=1 -n '[\w-]+@([\w-]+.*)+[\w-]+' EventTracke.zip
output:
{EventTracker/src/CommentCreateForm.js}:7:/// Author: dale@mywork.uk.com
{EventTracker/src/services/comments/index.js}:20:/// Modified By: dale@mywork.uk.com
No Error. Email is found.

@genivia-inc
Member

I'll work on this.

I reproduced this on macOS 12.5.1 with an M1 Pro: the command ugrep -P -z -n '[\w-]+@([\w-]+.)+[\w-]+' ugrep.zip fails the same way. Without -P it works fine, so the issue is perhaps related to PCRE2. Also, -P combined with -I (to ignore matches in binary files) works fine.

@genivia-inc genivia-inc added the problem Something isn't working due to a (minor) problem label Nov 29, 2022
@genivia-inc
Member

I had to divert my attention to another project first, thank you for your patience.

After debugging the issue with ./build.sh CXXFLAGS=-g CFLAGS=-g and lldb (LLDB is painful, compared to GDB!), I found that PCRE2 JIT crashes at this line in ugrep/include/reflex/pcre2matcher.h:382:

rc = pcre2_jit_match(opc_, reinterpret_cast<PCRE2_SPTR>(buf_), end_, pos_, flg, dat_, ctx_);

Without JIT everything works fine. To verify this, I tested this code that does the same thing to match the input with PCRE2, but without JIT:

rc = pcre2_match(opc_, reinterpret_cast<PCRE2_SPTR>(buf_), end_, pos_, flg, dat_, ctx_);

Is this a bug in PCRE2 JIT perhaps when running on an M1? It certainly looks that way. This needs further investigation to find out.

@genivia-inc
Member

A simplified test case with only -P perl matching with PCRE2:

ugrep -P "[\w-]+@([\w-]+.)+[\w-]+" ugrep/tests/archive2.tgz

This fails with EXC_BAD_ACCESS in the JIT code generated by PCRE2 for this regex pattern:

Process 52513 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1007fc3da)
    frame #0: 0x000000010070e490
->  0x10070e490: ldrh   w2, [x2, x0, lsl #1]
    0x10070e494: lsl    x0, x2, #3
    0x10070e498: lsl    x2, x2, #2
    0x10070e49c: add    x2, x2, x0
Target 0: (ugrep) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1007fc3da)
  * frame #0: 0x000000010070e490
    frame #1: 0x00000001007806a4 libpcre2-8.0.dylib`pcre2_jit_match_8 + 256

@genivia-inc genivia-inc changed the title segmentation fault occurs with zip file containing .png with "\w" regex pattern segmentation fault occurs with -P (PCRE2 JIT matching) "\w" regex pattern Dec 7, 2022
@genivia-inc genivia-inc changed the title segmentation fault occurs with -P (PCRE2 JIT matching) "\w" regex pattern segmentation fault occurs with -P (PCRE2 JIT matching) "\w" regex pattern but only on MacOS M1 Dec 7, 2022
@genivia-inc
Member

Thanks for your patience. I normally address problems right away, but wasn't able to do so this time due to several other important obligations.

After more testing, I am now convinced this is a problem with PCRE2 for this specific regex pattern when matching a binary file. When the regex pattern is compiled with pcre2_jit_compile(opc_, PCRE2_JIT_COMPLETE | PCRE2_JIT_PARTIAL_HARD), the problem persists. When PCRE2_JIT_PARTIAL_HARD is removed, the problem goes away. So something in the JIT code crashes in pcre2_jit_match() when the pattern is compiled with PCRE2_JIT_PARTIAL_HARD. I tried different PCRE2 parameters to narrow this down. This problem can also happen on other platforms besides MacOS M1.

I will create a POC by isolating the problem in a few lines of C++ code to submit to the PCRE2 folks for them to analyze and fix. There isn't much else I can do on my end.

@genivia-inc
Member

I've reported this JIT issue: PCRE2Project/pcre2#180

@genivia-inc genivia-inc changed the title segmentation fault occurs with -P (PCRE2 JIT matching) "\w" regex pattern but only on MacOS M1 segmentation fault occurs with -P (PCRE2 JIT matching) "\w" regex pattern Dec 28, 2022