Add Rlike support #3796

Merged
merged 10 commits into from
Oct 19, 2021

andygrove (Contributor) commented on Oct 12, 2021

Closes #2.

This PR implements the RLike expression, which works by calling cuDF's contains_re function.
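For readers unfamiliar with RLIKE, here is a minimal CPU-side sketch of the semantics the GPU path has to reproduce. It assumes RLIKE means "the pattern matches somewhere in the string" (Java regex `find`, which is my understanding of Spark's CPU behavior and of what a "contains" style function implies). The object and method names are hypothetical and are not part of this PR.

```scala
import java.util.regex.Pattern

// Hypothetical helper for illustration only; not code from this PR.
object RLikeSemantics {
  // Returns Some(true) when `pattern` matches anywhere inside `input`
  // (substring search), unlike String.matches, which requires the whole
  // input to match. A null input or pattern yields None, standing in for
  // a SQL null result.
  def rlike(input: String, pattern: String): Option[Boolean] = {
    if (input == null || pattern == null) None
    else Some(Pattern.compile(pattern).matcher(input).find())
  }
}
```

Under these assumptions, `RLikeSemantics.rlike("abc123", "[0-9]+")` is `Some(true)` even though `"abc123".matches("[0-9]+")` is `false`, because only part of the input needs to match.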

There are several known cases where results are not consistent with Apache Spark; these are documented in the compatibility guide (docs/compatibility.md).

The feature is disabled by default.

I have filed a follow-on issue #3797 for improving the support.

Signed-off-by: Andy Grove <andygrove@nvidia.com>
andygrove added the "feature request" label on Oct 12, 2021
andygrove added this to the Oct 4 - Oct 15 milestone on Oct 12, 2021
andygrove self-assigned this on Oct 12, 2021
andygrove mentioned this pull request on Oct 13, 2021
    val pattern = if (rhs.isValid) {
      rhs.getValue.asInstanceOf[UTF8String].toString
    } else {
      null
Collaborator

It does not look like we have tested the null case. Does CUDF do the correct thing in these cases?

Contributor Author

It looks like the null path can't be hit here. If I test with `expr RLIKE null`, the RLike expression itself is optimized out of the plan and replaced with a null literal. Also, we only attempt to run on the GPU if the rhs is a Literal. Are there any cases where isValid could be false? If not, perhaps I should remove this conditional?

Collaborator

I would keep the conditional and throw an exception instead with an explanation about how this is never reached. Just to be defensive.
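The suggestion could look roughly like the sketch below. It is self-contained, so it uses a hypothetical `ScalarLike` stand-in rather than the real scalar and `UTF8String` types from the snippet under review, and the exception type and message are assumptions, not what was actually committed.

```scala
// Hypothetical stand-in for a scalar that may be null (invalid).
// The real code works with a different scalar type; this is for illustration.
final case class ScalarLike(value: Option[String]) {
  def isValid: Boolean = value.isDefined
  def getValue: String = value.get
}

object RLikePattern {
  // Extracts the regex pattern from the right-hand side. The invalid (null)
  // branch is believed to be unreachable, so it fails loudly with an
  // explanation instead of silently passing null along.
  def patternOf(rhs: ScalarLike): String = {
    if (rhs.isValid) {
      rhs.getValue
    } else {
      throw new IllegalStateException(
        "RLike pattern is null; `expr RLIKE null` is expected to be replaced " +
          "with a null literal before this code runs")
    }
  }
}
```

Keeping the branch costs little, and if a future change ever makes the null path reachable, the failure points straight at the broken assumption.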

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove
Contributor Author

build

@andygrove andygrove merged commit 45c7e0a into NVIDIA:branch-21.12 Oct 19, 2021
@andygrove andygrove deleted the rlike branch November 30, 2021 21:00
Linked issue: [FEA] RLIKE support