[FOLLOW ON] Combining regex parsing in transpiling and regex rewrite in rlike #10817

Closed
thirtiseven opened this issue May 15, 2024 · 0 comments · Fixed by #10822

thirtiseven commented May 15, 2024

          nit: Could we have a follow on issue to figure out how to parse the regexp once instead of multiple times?

Originally posted by @revans2 in #10715 (comment)

For regex operations like rlike, the input regex is first parsed into an AST and then transpiled into another regex string that cuDF supports. If the regex can't be transpiled, the operation falls back to the CPU. Transpiling is performed in tagExprForGpu.

#10715 introduced a regex rewrite for rlike, which also parses the regex string into an AST, this time in convertToGpu. That parse can be combined with the one done during transpiling to save time and make the code cleaner.

We can refactor the transpiler code to split it into two steps: regex string to AST, and AST to new regex string. The regex rewrite can then move into tagExprForGpu, reuse the same AST, and save the resulting optimization type in the Meta.
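
The proposed split can be illustrated with a small, language-neutral sketch. Everything in it is hypothetical and is not the spark-rapids API: `parse`, `transpile`, `choose_optimization`, the `Optimization` enum, and `RLikeMeta` are toy stand-ins, and the "unsupported" rule is made up for illustration. The point is only the shape: the pattern is parsed once in the tagging step, both the transpiler and the rewrite consume the same AST, and the conversion step reads the saved results without parsing again.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

# Hypothetical toy model; none of these names come from spark-rapids.
META_CHARS = set(".*+?()[]{}|\\^$")


class Unsupported(Exception):
    """Raised when the toy transpiler cannot handle a pattern."""


class Optimization(Enum):
    NONE = auto()
    STARTS_WITH = auto()
    CONTAINS = auto()


@dataclass(frozen=True)
class Ast:
    anchored_start: bool
    tokens: Tuple[str, ...]  # one entry per remaining pattern character


def parse(pattern: str) -> Ast:
    """Step 1: regex string -> AST. Counts calls so reuse is observable."""
    parse.calls += 1
    anchored = pattern.startswith("^")
    body = pattern[1:] if anchored else pattern
    return Ast(anchored, tuple(body))


parse.calls = 0


def transpile(ast: Ast) -> str:
    """Step 2: AST -> new regex string (toy rule: backslashes unsupported)."""
    if "\\" in ast.tokens:
        raise Unsupported("escape sequences are not supported in this sketch")
    return ("^" if ast.anchored_start else "") + "".join(ast.tokens)


def choose_optimization(ast: Ast) -> Optimization:
    """Rewrite decision made from the same AST, with no second parse."""
    if not ast.tokens or any(t in META_CHARS for t in ast.tokens):
        return Optimization.NONE
    return Optimization.STARTS_WITH if ast.anchored_start else Optimization.CONTAINS


class RLikeMeta:
    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        self.gpu_pattern: Optional[str] = None
        self.optimization = Optimization.NONE
        self.fallback_reason: Optional[str] = None

    def tag_expr_for_gpu(self) -> None:
        """Parse once, then transpile and pick the rewrite from one AST."""
        try:
            ast = parse(self.pattern)
            self.gpu_pattern = transpile(ast)
            self.optimization = choose_optimization(ast)
        except Unsupported as e:
            self.fallback_reason = str(e)  # would fall back to the CPU

    def convert_to_gpu(self) -> str:
        """Use only the saved results; no parsing happens here."""
        if self.fallback_reason is not None or self.gpu_pattern is None:
            raise RuntimeError("expression was not tagged as GPU-capable")
        if self.optimization is Optimization.STARTS_WITH:
            return f"starts_with({self.gpu_pattern[1:]!r})"
        if self.optimization is Optimization.CONTAINS:
            return f"contains({self.gpu_pattern!r})"
        return f"rlike({self.gpu_pattern!r})"
```

In this sketch, tagging `RLikeMeta("^abc")` and then converting it increments `parse.calls` exactly once, and a pattern the toy transpiler rejects records a fallback reason instead of an optimization type.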

@thirtiseven thirtiseven self-assigned this May 15, 2024
@thirtiseven thirtiseven changed the title [FOLLOWUP] Combining regex parsing in transpiling and regex rewrite in rlike` [FOLLOW ON] Combining regex parsing in transpiling and regex rewrite in rlike May 15, 2024
@sameerz sameerz added the performance A performance related task/issue label May 28, 2024
2 participants