Add config to limit maximum RMM pool size #517

Merged
jlowe merged 6 commits into NVIDIA:branch-0.2 from rmm-pool-size-limit on Aug 10, 2020

Conversation

jlowe (Member) commented Aug 5, 2020

Signed-off-by: Jason Lowe jlowe@nvidia.com

Fixes #488.

This adds a new plugin config to artificially limit the amount of GPU memory that will be used by RMM in pool mode.

Note that this depends on rapidsai/cudf#5855 and must not be merged until that change is published in the cudf-0.15-SNAPSHOT artifact.
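
For illustration, a minimal usage sketch of how such a limit might be set from the Spark side. The config key `spark.rapids.memory.gpu.maxAllocFraction` is assumed from this PR's intent; the name that was actually merged may differ:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example: cap the RMM pool at 60% of GPU memory.
// The config key below is an assumption, not confirmed by this PR's diff.
val spark = SparkSession.builder()
  .appName("rmm-pool-limit-example")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.rapids.memory.gpu.maxAllocFraction", "0.6")
  .getOrCreate()
```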

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe added the "feature request" (New feature or request) label Aug 5, 2020
jlowe added this to the Aug 3 - Aug 14 milestone Aug 5, 2020
jlowe self-assigned this Aug 5, 2020
jlowe marked this pull request as draft August 5, 2020 20:12
jlowe (Member, Author) commented Aug 5, 2020

build

Review thread on docs/configs.md (outdated, resolved)
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
abellina previously approved these changes Aug 6, 2020
tgravescs previously approved these changes Aug 6, 2020
jlowe marked this pull request as ready for review August 7, 2020 16:13
jlowe (Member, Author) commented Aug 7, 2020

build

jlowe dismissed stale reviews from tgravescs and abellina via 68c8c57 August 7, 2020 19:04
jlowe (Member, Author) commented Aug 7, 2020

build

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
} else {
// Do not attempt to enforce any artificial pool limit based on queried GPU memory size
// if config indicates all GPU memory should be used.
0
Collaborator commented:

We should document this.

Collaborator commented:

Sorry, I should be more clear. In the case of UVM, setting it above 1.0 might be logical, so we should document in the config that this is happening.

jlowe (Member, Author) commented:

I think RMM's pool_memory_resource will still query the GPU and use its reported total memory as a maximum limit internally, although I'm unsure what that API does when UVM is being used. I'll still add a comment saying the maximum limit is disabled if the specified fraction is >= 1.
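
As a rough sketch of the behavior discussed above (illustrative names only, not the merged plugin code), the limit computation could look like the following, where 0 means no artificial limit is passed to RMM:

```scala
// Illustrative sketch: derive an upper bound for the RMM pool from the
// configured maximum allocation fraction.
def computePoolLimit(maxAllocFraction: Double, totalGpuMemoryBytes: Long): Long = {
  if (maxAllocFraction < 1.0) {
    // Cap the pool at the requested fraction of the queried GPU memory size.
    (maxAllocFraction * totalGpuMemoryBytes).toLong
  } else {
    // Do not attempt to enforce any artificial pool limit based on queried GPU
    // memory size if the config indicates all GPU memory (or more, under UVM)
    // may be used; 0 here means "no limit specified".
    0L
  }
}
```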

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe (Member, Author) commented Aug 7, 2020

build

jlowe (Member, Author) commented Aug 10, 2020

@revans2 this is ready for another look.

jlowe merged commit b83a480 into NVIDIA:branch-0.2 Aug 10, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add config to limit maximum RMM pool size

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Address review comments

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Do not attempt to specify a limit when max alloc fraction is 1

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Document artificial limit is not enforced when max alloc fraction == 1

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe deleted the rmm-pool-size-limit branch September 10, 2021 15:31
pxLi added a commit to pxLi/spark-rapids that referenced this pull request May 12, 2022
Signed-off-by: Peixin Li <pxli@nyu.edu>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Labels: feature request (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close these issues:
  [FEA] Ability to limit total GPU memory used (#488)
4 participants