Fix issue in GpuArrayExists where a parent view outlived the child #5232

Merged

abellina
Collaborator

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

Closes #5183.

This fixes a subtle bug in GpuArrayExists.legacyExists that manifests with the CUDA async allocator but is not caused by it.

In the old code, noNullsChildView was closed when replaceChildNullsByFalseView returned, yet the calling code (legacyExists) still assumed it was a valid child. With this change, the child view stays open for the duration of the existsReduce call.
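
To make the lifetime problem concrete, here is a minimal sketch in plain Python rather than the plugin's actual Scala code. `ChildView`, `ParentView`, `exists_reduce`, `replace_nulls_buggy`, and `exists_fixed` are all hypothetical names that only loosely mirror the ones above, and the sketch raises an error on a stale access where real device memory would not necessarily fail so visibly.

```python
class ChildView:
    """Hypothetical stand-in for a closeable child column view."""

    def __init__(self, values):
        self.values = list(values)
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


class ParentView:
    """Hypothetical parent view that refers to a child without owning it."""

    def __init__(self, child):
        self._child = child

    def child_values(self):
        if self._child.closed:
            raise RuntimeError("parent view used after its child was closed")
        return self._child.values


def exists_reduce(parent):
    # True if any element of the child is true.
    return any(parent.child_values())


def replace_nulls_buggy(values):
    # The child is closed as the with-block exits, so the returned parent dangles.
    with ChildView(False if v is None else v for v in values) as no_nulls_child:
        return ParentView(no_nulls_child)


def exists_fixed(values):
    # The reduction runs inside the with-block, while the child is still open.
    with ChildView(False if v is None else v for v in values) as no_nulls_child:
        return exists_reduce(ParentView(no_nulls_child))
```

In this sketch, `exists_reduce(replace_nulls_buggy([True, None]))` raises `RuntimeError`, while `exists_fixed([None, True])` returns `True` because the child is still open when the reduction reads it.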

Note that this bug appears to exist in 22.04 as well.

@abellina
Collaborator Author

@tgravescs @sameerz @jlowe FYI, this turned out not to be a cuDF bug, as I had originally thought. Should we backport this to 22.04?

@abellina added the bug label Apr 13, 2022
@abellina self-assigned this Apr 13, 2022
@abellina added this to the Apr 4 - Apr 15 milestone Apr 13, 2022
@abellina
Collaborator Author

build

@abellina changed the base branch from branch-22.06 to branch-22.04 April 13, 2022 13:43
@abellina
Collaborator Author

I retargeted this to 22.04, since it is technically a bug in 22.04 as well.

@jlowe
Member

jlowe commented Apr 13, 2022

build

Collaborator

@gerashegalov gerashegalov left a comment

The change LGTM.

As for 22.04, it is not a high priority because the call path is off by default on Spark 3.x. However, the fix is localized to this call path, so it is a strict improvement with no risk of breaking anything that was not already broken.

@abellina merged commit ab1ba06 into NVIDIA:branch-22.04 Apr 13, 2022
@abellina deleted the fix/gpu_array_exists_3vloff_issue branch April 13, 2022 21:02
Labels: bug

Issues this pull request may close: [BUG] UCX EGX integration test array_test.py::test_array_exists failures

3 participants