Fix shutdown bugs in the RAPIDS Shuffle Manager #2950

Merged: 3 commits, Jul 19, 2021

abellina (Collaborator) commented Jul 16, 2021

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

While debugging another issue, I noticed NullPointerExceptions (NPEs) in the logs during shutdown. This PR fixes two causes:

  • stop() in the shuffle manager can be called multiple times, so resources such as the bounce buffers are now protected against having .close() called more than once.
  • The worker and context can be null at various points during shutdown. The code now guards against that, and the shutdown order in UCX.scala is fixed so it doesn't happen.
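
For readers unfamiliar with the pattern, the two guards described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CountingResource` and `SketchManager` are hypothetical names standing in for things like the bounce buffers and the shuffle manager, and the close order shown is an assumption.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a closeable resource such as a bounce buffer pool.
// It only counts how many times close() was called.
class CountingResource extends AutoCloseable {
  var closeCount = 0
  override def close(): Unit = closeCount += 1
}

// Hypothetical manager; not the actual RAPIDS Shuffle Manager class.
class SketchManager(resource: CountingResource) {
  private val stopped = new AtomicBoolean(false)

  // Stays null if start() was never called, e.g. when startup failed early.
  @volatile private var worker: AutoCloseable = null

  def start(w: AutoCloseable): Unit = worker = w

  def stop(): Unit = {
    // compareAndSet succeeds only for the first caller, so repeated
    // stop() calls do not close the resources again.
    if (stopped.compareAndSet(false, true)) {
      // Option(null) is None, so a missing worker is skipped instead of
      // throwing a NullPointerException.
      Option(worker).foreach(_.close())
      worker = null
      resource.close()
    }
  }
}
```

With this shape, calling `stop()` twice, or calling it on a manager whose worker was never created, closes each resource at most once and does not throw.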

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@abellina added the bug (Something isn't working) and shuffle (things that impact the shuffle plugin) labels Jul 16, 2021
@abellina abellina added this to the July 5 - July 16 milestone Jul 16, 2021
jlowe (Member) commented Jul 16, 2021

build

pxLi (Collaborator) commented Jul 19, 2021

build

@abellina abellina merged commit ca35279 into NVIDIA:branch-21.08 Jul 19, 2021
@abellina abellina deleted the shuffle/fix_ucx_shutdown_bugs branch July 19, 2021 14:00