[FEA] Make regular expression behavior with $ and \r consistent with CPU #4170

Closed
andygrove opened this issue Nov 19, 2021 · 0 comments · Fixed by #4239

@andygrove
Contributor

Is your feature request related to a problem? Please describe.
Since rapidsai/cudf#9715 was merged, cuDF is consistent with Python: the '$' end-of-line regex anchor (without MULTILINE set) matches at the very end of a string, and also just before a trailing new-line if the string ends with one.

Java has similar behavior but also treats \r as a new-line character in this context, whereas cuDF and Python do not.

This means that a pattern such as a$ matches both a\n and a\r with Spark on CPU, but only matches a\n on the GPU.
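
To make the difference concrete, here is a minimal sketch using Python's `re` module, which the description above says cuDF now agrees with. It only shows the Python side; the Java results are the ones stated in this issue, not something this snippet runs. The sample inputs are my own choice for illustration.

```python
import re

# `$` without re.MULTILINE: matches at the very end of the string,
# or just before a single trailing "\n".
PATTERN = re.compile(r"a$")


def matches(text):
    """Return True if the pattern is found anywhere in text."""
    return PATTERN.search(text) is not None


# Illustrative inputs (not taken from the issue's test suite).
results = {text: matches(text) for text in ["a", "a\n", "a\r", "a\r\n"]}

for text, matched in results.items():
    print(repr(text), matched)
```

In Python, only `"a"` and `"a\n"` match here. Per the description above, Java would additionally match `"a\r"`, and that is the case where Spark on CPU and the GPU diverge.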

Describe the solution you'd like
I would like the behavior to be consistent.

We need one of the following: additional support in cuDF to emulate the Java behavior, our own version of the regex kernels, or a workaround in the plugin. I don't yet have any ideas for how to do a plugin workaround in a simple and low-risk way.
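
As a rough illustration of what a plugin-side workaround could look like, the sketch below rewrites a trailing `$` into an explicit lookahead that also accepts `\r`. This is only an assumption about one possible approach, shown with Python's `re`; it is not the fix that was eventually merged, and whether the cuDF regex engine accepts lookahead is not something this issue states. The names `JAVA_LIKE_EOL` and `with_java_like_eol` are hypothetical.

```python
import re

# Hypothetical stand-in for `$`: end of input, optionally preceded by
# one trailing "\r\n", "\r", or "\n". Other line terminators that Java
# recognizes are deliberately left out of this sketch.
JAVA_LIKE_EOL = r"(?=(?:\r\n|\r|\n)?\Z)"


def with_java_like_eol(pattern):
    """Replace a single trailing, unescaped `$` with JAVA_LIKE_EOL.

    Deliberately narrow: a `$` anywhere else in the pattern is left alone.
    """
    if not pattern.endswith("$"):
        return pattern
    body = pattern[:-1]
    # An odd number of backslashes before `$` means it is an escaped literal.
    backslashes = len(body) - len(body.rstrip("\\"))
    if backslashes % 2 == 1:
        return pattern
    return body + JAVA_LIKE_EOL


rewritten = with_java_like_eol("a$")
for text in ["a", "a\n", "a\r", "a\r\n", "a\rb"]:
    print(repr(text), re.search(rewritten, text) is not None)
```

With the rewritten pattern, `"a\r"` matches in addition to `"a"` and `"a\n"`, while `"a\rb"` still does not. A real transpiler would have to handle `$` in every position and inside character classes, which is where the risk mentioned above comes from.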

Describe alternatives you've considered
None

Additional context
None.

andygrove added the "feature request" and "? - Needs Triage" labels on Nov 19, 2021
sameerz removed the "? - Needs Triage" label on Nov 23, 2021
andygrove added this to the Nov 30 - Dec 10 milestone on Nov 30, 2021
andygrove self-assigned this on Nov 30, 2021