[BUG] window on unbounded preceding and unbounded following can produce incorrect results. #4084

Closed
revans2 opened this issue Nov 11, 2021 · 2 comments · Fixed by #4116
Labels: bug (Something isn't working), cudf_dependency (An issue or PR with this label depends on a new feature in cudf), P0 (Must have for release)

@revans2
Collaborator

revans2 commented Nov 11, 2021

Describe the bug
I tried to do a very basic window operation, the simplest one I could think of.

spark.time(spark.range(5L).selectExpr("SUM(1) OVER()").show(truncate=false))

This produces nulls for all values on the GPU, but on the CPU it is all 5s. I thought this might be related to ordering, but it is not.

spark.time(spark.range(5L).selectExpr("SUM(1) OVER(ORDER BY id ROWS BETWEEN unbounded preceding and unbounded following)").collect())

does the same thing.

It gets even odder with COUNT: it returns values, but they are wrong, and casting them to a string produces nothing but empty strings.
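For reference, the CPU results can be modeled without Spark: with a frame that is unbounded on both sides, every row aggregates over the whole partition, so SUM(1) over five rows is 5 for every row (and COUNT(1) OVER() is likewise 5). Below is a minimal Scala sketch that can be pasted into the spark-shell or run as a Scala script. `rowsFrameSum` is a hypothetical helper, not Spark or cudf code, and it assumes `None` stands for UNBOUNDED.

```scala
// Hypothetical CPU-side model of a ROWS window frame (not Spark or cudf code).
// For row i the frame covers [i - preceding, i + following], clamped to the partition;
// None on either side means UNBOUNDED. Bounds are computed in Long so that very large
// offsets such as Int.MaxValue cannot wrap around.
def rowsFrameSum(values: IndexedSeq[Long], preceding: Option[Int], following: Option[Int]): IndexedSeq[Long] = {
  val n = values.length
  values.indices.map { i =>
    val start = preceding.fold(0L)(p => math.max(0L, i.toLong - p)).toInt
    val end = following.fold(n - 1L)(f => math.min(n - 1L, i.toLong + f)).toInt
    values.slice(start, end + 1).sum
  }
}

// SUM(1) over spark.range(5L): each of the 5 rows contributes the literal 1.
val ones = IndexedSeq.fill(5)(1L)

// OVER() and ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING cover the same rows here.
val unbounded = rowsFrameSum(ones, None, None)

// A bounded frame of one row on each side, for contrast.
val oneEachSide = rowsFrameSum(ones, Some(1), Some(1))

println(unbounded)
println(oneEachSide)
```

Computing the bounds in `Long` is deliberate: it makes `Some(Int.MaxValue)` on both sides behave exactly like `None`, which is the case where 32-bit arithmetic is most likely to go wrong.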

@revans2 revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify P0 Must have for release labels Nov 11, 2021
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Nov 12, 2021
@revans2 revans2 added the cudf_dependency An issue or PR with this label depends on a new feature in cudf label Nov 12, 2021
@revans2
Collaborator Author

revans2 commented Nov 12, 2021

This is a cudf bug

rapidsai/cudf#9672

I'll see if I can come up with a fix. If we cannot get one soon, we will have to fall back to the CPU for any row-based window query where the values could overflow (which might be all of them, or might be anything larger than Int.MaxValue / 2). I need to do some math to figure that out.
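The "Int.MaxValue / 2" guess can be checked with plain 32-bit arithmetic. Assume, for illustration only, that a window's extent is computed as `preceding + following` in a 32-bit `Int` and that unbounded sides are represented by very large offsets. Under that assumption, two non-negative offsets that are each at most `Int.MaxValue / 2` always sum to a valid `Int`, while larger offsets can wrap to a negative number. This is a hypothetical Scala sketch (`widthInt`, `widthLong`, and `fitsInInt` are made-up names), not how cudf actually computes window bounds.

```scala
// Hypothetical illustration only: not cudf's actual bound computation.
// Window extent computed in 32-bit Int arithmetic: silently wraps on overflow.
def widthInt(preceding: Int, following: Int): Int = preceding + following

// Same computation widened to 64-bit Long: cannot overflow for any pair of Int inputs.
def widthLong(preceding: Int, following: Int): Long = preceding.toLong + following

// True when the 32-bit result is exact, i.e. no wraparound happened.
def fitsInInt(preceding: Int, following: Int): Boolean =
  widthLong(preceding, following) == widthInt(preceding, following).toLong

// Largest per-side non-negative offset for which any two such offsets sum without overflow.
val safeMax = Int.MaxValue / 2

println(widthInt(Int.MaxValue, Int.MaxValue)) // wraps to a negative width
println(widthLong(Int.MaxValue, Int.MaxValue))
println(fitsInInt(safeMax, safeMax))
println(fitsInInt(safeMax + 1, safeMax + 1))
```

A wrapped, negative extent no longer describes a valid window. Widening to 64-bit or clamping to the partition size are common general ways to avoid this class of overflow; the sketch does not claim either is what the cudf fix does.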

@revans2
Collaborator Author

revans2 commented Nov 15, 2021

The CUDF fix was just merged. I'll put up some tests soon.
