Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patterns such (3?)+ should now fall back to CPU #4715

Merged
merged 10 commits into from
Feb 18, 2022

Conversation

NVnavkumar
Copy link
Collaborator

@NVnavkumar NVnavkumar commented Feb 7, 2022

Partial fix for #4487.

Previously patterns like (3?)+ or (\A)+ would hang in libcudf. In some cases, that has now been corrected upstream and will now thrown an exception as an unsupported pattern. This fix will intercept these patterns in the transpiler to throw the appropriate RegexUnsupportedException

Fuzz testing currently brings up more possible edge cases that break in the plugin, so this is only considered a partial fix at moment. Further testing can be performed to finally handle the remaining edge cases.

Signed-off-by: Navin Kumar <navink@nvidia.com>
Signed-off-by: Navin Kumar <navink@nvidia.com>
@sameerz sameerz added the task Work required that improves the product but is not user facing label Feb 11, 2022
@sameerz sameerz added this to the Feb 14 - Feb 25 milestone Feb 11, 2022
@NVnavkumar NVnavkumar changed the title [WIP] Patterns such (3?)+ should now fall back to CPU Patterns such (3?)+ should now fall back to CPU Feb 17, 2022
@NVnavkumar NVnavkumar marked this pull request as ready for review February 17, 2022 19:32
@NVnavkumar
Copy link
Collaborator Author

build

Signed-off-by: Navin Kumar <navink@nvidia.com>
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor style nits that aren't must-fix, lgtm.


private def isSupportedRepetitionBase(e: RegexAST): Boolean = {
e match {
case RegexEscaped(ch) if ch != 'd' && ch != 'w' => // example: "\B?"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Comments should be consistent

Suggested change
case RegexEscaped(ch) if ch != 'd' && ch != 'w' => // example: "\B?"
case RegexEscaped(ch) if ch != 'd' && ch != 'w' =>
// example: "\B?"

case (RegexChar(a), _) if "$^".contains(a) =>
// example: "$*"
throw new RegexUnsupportedException(nothingToRepeat)
case (_, QuantifierFixedLength(0)) | (_, QuantifierVariableLength(0,Some(0)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case (_, QuantifierFixedLength(0)) | (_, QuantifierVariableLength(0,Some(0)))
case (_, QuantifierFixedLength(0)) | (_, QuantifierVariableLength(0, Some(0)))

throw new RegexUnsupportedException(nothingToRepeat)

case _ =>
case (RegexGroup(_, term), QuantifierVariableLength(_,None))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case (RegexGroup(_, term), QuantifierVariableLength(_,None))
case (RegexGroup(_, term), QuantifierVariableLength(_, None))

@NVnavkumar NVnavkumar merged commit 203d273 into NVIDIA:branch-22.04 Feb 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants