Change unit tests that force ooms to specify the oom type (gpu|cpu) #10130

Merged: 2 commits merged into NVIDIA:branch-24.02 on Jan 3, 2024

Conversation

jbrennan333
Collaborator

Now that #10013 is merged, unit tests can differentiate between CPU and GPU when forcing OOMs.
As we add support for host memory retries, a forced OOM in a unit test could fire inside a host memory retry loop instead of the intended GPU retry loop. This already happened with GeneratedInternalRowToCudfRowIteratorRetrySuite, which was fixed by #10087.
To prevent this, this patch changes all of the unit tests to use the longer form of RmmSpark.forceRetryOOM and RmmSpark.forceSplitAndRetryOOM, explicitly specifying the OOM injection type (GPU or CPU).
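
To illustrate why the untyped form is fragile, here is a minimal, self-contained sketch. It does not call RmmSpark and does not reproduce its signatures: `FakeInjector`, `OomType`, `shouldThrow`, and `run` are hypothetical stand-ins invented for this example, and the assumption that a host allocation happens before the GPU allocation is only there to model the scenario described above.

```java
import java.util.ArrayList;
import java.util.List;

public class OomInjectionSketch {
    /** Hypothetical injection type, modeled on the (gpu|cpu) distinction in this PR. */
    enum OomType { CPU_OR_GPU, CPU, GPU }

    /** Hypothetical stand-in for an OOM injector; NOT the real RmmSpark class. */
    static final class FakeInjector {
        private int pending = 0;
        private OomType filter = OomType.CPU_OR_GPU;

        // Short form: no type given, so any allocation may consume the injected OOM.
        void forceRetryOOM(int numOOMs) {
            forceRetryOOM(numOOMs, OomType.CPU_OR_GPU);
        }

        // Long form: only allocations of the given type consume the injected OOM.
        void forceRetryOOM(int numOOMs, OomType type) {
            pending = numOOMs;
            filter = type;
        }

        /** Returns true when this allocation should fail with an injected retry OOM. */
        boolean shouldThrow(OomType allocType) {
            boolean matches = filter == OomType.CPU_OR_GPU || filter == allocType;
            if (pending > 0 && matches) {
                pending--;
                return true;
            }
            return false;
        }
    }

    /** Simulates code under test that allocates host memory first, then GPU memory. */
    static List<String> run(FakeInjector injector) {
        List<String> log = new ArrayList<>();
        log.add("host:" + injector.shouldThrow(OomType.CPU));
        log.add("gpu:" + injector.shouldThrow(OomType.GPU));
        return log;
    }

    public static void main(String[] args) {
        FakeInjector untyped = new FakeInjector();
        untyped.forceRetryOOM(1);
        System.out.println(run(untyped)); // [host:true, gpu:false]

        FakeInjector typed = new FakeInjector();
        typed.forceRetryOOM(1, OomType.GPU);
        System.out.println(run(typed)); // [host:false, gpu:true]
    }
}
```

In the untyped case the host allocation swallows the injected OOM, so the GPU retry path the test meant to exercise never sees it. Specifying the type keeps the injection on the intended path.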

Signed-off-by: Jim Brennan <jimb@nvidia.com>
@jbrennan333 added the feature request, test, and reliability labels on Dec 29, 2023
@jbrennan333 self-assigned this on Dec 29, 2023
@jbrennan333
Collaborator Author

build

gerashegalov
gerashegalov previously approved these changes Dec 30, 2023
Collaborator

@gerashegalov gerashegalov left a comment

LGTM

@sameerz removed the feature request label on Dec 31, 2023
@jbrennan333
Collaborator Author

build

@jbrennan333
Collaborator Author

Thanks @gerashegalov! I've updated copyrights. Can you take another look?

@jbrennan333 merged commit ed1fa9f into NVIDIA:branch-24.02 on Jan 3, 2024
38 of 39 checks passed
Labels
reliability, test