[BUG] GpuPartitioning should close CVs before releasing semaphore #6913

abellina
Collaborator

Contributes to #6746, found while investigating #6758.

This could matter a lot in some cases. Before this change, we copied the partitioned data to host, released the GPU semaphore, and only then closed the source columns. In some scenarios, N concurrent tasks could reach this point and release the semaphore, allowing N more tasks onto the GPU while the first N tasks' device buffers were still allocated.

The window between releasing the semaphore and closing the columns is short, so I am not sure how often we would hit this in practice.
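
To make the ordering concrete, here is a minimal Java sketch of the fixed sequence: copy to host, close the device columns, and only then release the semaphore. It is not the actual GpuPartitioning code. `DeviceColumn`, `DeviceMemory`, and `PartitionSketch` are hypothetical stand-ins. Device memory is modeled as a simple byte counter, and a one-permit `java.util.concurrent.Semaphore` stands in for the plugin's GPU semaphore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of device memory: a global count of allocated bytes.
final class DeviceMemory {
    static final AtomicLong allocatedBytes = new AtomicLong();
}

// Hypothetical stand-in for a GPU column vector; closing it frees its "device" bytes.
final class DeviceColumn implements AutoCloseable {
    private final long[] values;
    private boolean closed = false;

    DeviceColumn(long[] values) {
        this.values = values.clone();
        DeviceMemory.allocatedBytes.addAndGet(8L * values.length);
    }

    // Models a device-to-host copy by returning a host-side array.
    long[] copyToHost() {
        return values.clone();
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            DeviceMemory.allocatedBytes.addAndGet(-8L * values.length);
        }
    }
}

final class PartitionSketch {
    // Stand-in for the GPU semaphore: one permit means one task on the GPU at a time.
    static final Semaphore gpuSemaphore = new Semaphore(1);

    // Device bytes still allocated at the moment the permit was handed back.
    static long bytesHeldAtRelease = -1L;

    static List<long[]> partitionToHost(List<DeviceColumn> columns) {
        gpuSemaphore.acquireUninterruptibly();
        List<long[]> hostCopies = new ArrayList<>();
        try {
            for (DeviceColumn c : columns) {
                hostCopies.add(c.copyToHost());
            }
        } finally {
            try {
                // Free the device columns first...
                for (DeviceColumn c : columns) {
                    c.close();
                }
                bytesHeldAtRelease = DeviceMemory.allocatedBytes.get();
            } finally {
                // ...and only then let another task onto the GPU.
                gpuSemaphore.release();
            }
        }
        return hostCopies;
    }
}
```

The nested `try`/`finally` guarantees the permit is returned even if closing a column fails, while on the normal path no device bytes from this task remain allocated when the next task is admitted.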

@abellina abellina added bug Something isn't working reliability Features to improve reliability or bugs that severely impact the reliability of the plugin labels Oct 25, 2022
@abellina abellina self-assigned this Oct 25, 2022
@abellina abellina changed the title GpuPartitioning should close CVs before releasing semaphore [BUG] GpuPartitioning should close CVs before releasing semaphore Oct 25, 2022
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@abellina abellina force-pushed the oom/partitioner_should_close_gpu_columns_earlier branch from a1553ac to c665346 October 25, 2022 17:43
@abellina
Collaborator Author

build

Collaborator

@jbrennan333 jbrennan333 left a comment

This looks like a good change to me.

@abellina
Collaborator Author

build

@abellina abellina merged commit 0c478c3 into NVIDIA:branch-22.12 Oct 25, 2022
@abellina abellina deleted the oom/partitioner_should_close_gpu_columns_earlier branch October 25, 2022 21:46