PARTIAL_HARD with MATCH_INVALID_UTF does not give partial match on incomplete multibyte #239

Closed
jagprog5 opened this issue Apr 20, 2023 · 1 comment

Comments

@jagprog5

Suppose I create a match pattern from the bytes 0xE6, 0xBC, 0xA2. Together these bytes encode a single complete UTF-8 character.

I then compile the pattern with PARTIAL_HARD and MATCH_INVALID_UTF, and use the compiled pattern to match against a subject string consisting of only the first byte of the pattern: 0xE6.

I expect the match to give a partial match consisting of the entire subject string. Instead, it gives no match. Is this correct behavior?

@carenas
Contributor

carenas commented Apr 20, 2023

MATCH_INVALID_UTF means (ironically) that anything that is not perfectly valid UTF is ignored, which is why you can't get a match against an incomplete UTF subject.
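
To make "incomplete" concrete, here is a minimal Python sketch (not PCRE2 code) that checks whether a byte string ends in a truncated UTF-8 sequence. The helper names `expected_length` and `ends_with_truncated_sequence` are made up for illustration; the only facts relied on are the standard UTF-8 lead-byte ranges.

```python
def expected_length(lead: int) -> int:
    """Total sequence length announced by a UTF-8 lead byte (0 if not a lead byte)."""
    if lead < 0x80:
        return 1  # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or a byte that can never start a sequence


def ends_with_truncated_sequence(data: bytes) -> bool:
    """True if data ends in a lead byte followed by too few continuation bytes."""
    i = len(data) - 1
    trailing = 0
    # Step back over at most three trailing continuation bytes (10xxxxxx).
    while i >= 0 and trailing < 3 and (data[i] & 0xC0) == 0x80:
        i -= 1
        trailing += 1
    if i < 0:
        return False  # empty input, or nothing but continuation bytes
    need = expected_length(data[i])
    return need > 1 and trailing + 1 < need


print(expected_length(0xE6))                          # 3
print(ends_with_truncated_sequence(b"\xe6"))          # True
print(ends_with_truncated_sequence(b"\xe6\xbc\xa2"))  # False
```

The lead byte 0xE6 announces a three-byte sequence, so a subject holding only that byte is not valid UTF-8 on its own, while the full sequence 0xE6 0xBC 0xA2 decodes to U+6F22.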

If you are not in UTF mode (that is, using neither PCRE2_UTF nor PCRE2_MATCH_INVALID_UTF), you can get the partial match:

$ pcre2test
PCRE2 version 10.42 2022-12-11
  re> /e6 bc a2/hex
data> \xe6\=ph
Partial match: \xe6

Note that the hex modifier in pcre2test is only there to avoid the ambiguity of writing \x escapes in the pattern, so don't expect to use it in your own pattern string.
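
For a literal byte pattern, the non-UTF result above can be approximated with a toy model in Python. This is only a sketch of the idea behind hard partial matching (report a partial match as soon as the subject runs out while every byte so far agrees with the pattern). It is not how PCRE2 is implemented, and the function name `hard_partial_literal` is hypothetical.

```python
from typing import Optional, Tuple


def hard_partial_literal(pattern: bytes, subject: bytes) -> Tuple[str, Optional[bytes]]:
    """Toy byte-wise model of hard partial matching for a literal pattern."""
    for start in range(len(subject)):
        window = subject[start:start + len(pattern)]
        if window == pattern:
            return ("match", window)
        # The subject ended before the pattern did, and all bytes seen agree.
        if len(window) < len(pattern) and pattern.startswith(window):
            return ("partial", window)
    return ("no match", None)


print(hard_partial_literal(b"\xe6\xbc\xa2", b"\xe6"))
# ('partial', b'\xe6')
```

Because the comparison is done on raw bytes, the lone 0xE6 is just an ordinary prefix of the pattern, which mirrors the `Partial match: \xe6` line in the pcre2test session.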
