Add config to limit maximum RMM pool size #517
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
} else {
  // Do not attempt to enforce any artificial pool limit based on queried GPU memory size
  // if config indicates all GPU memory should be used.
  0
We should document this
Sorry, I should have been clearer. With UVM, setting it above 1.0 might be logical, so we should document in the config that this is happening.
I think RMM's `pool_memory_resource` will still query the GPU and use its reported total memory as a maximum limit internally, although I'm unsure what that API does when UVM is being used. I'll still add a comment saying the maximum limit is disabled if the specified fraction >= 1.
@revans2 this is ready for another look.
* Add config to limit maximum RMM pool size
* Address review comments
* Do not attempt to specify a limit when max alloc fraction is 1
* Document artificial limit is not enforced when max alloc fraction == 1

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
Signed-off-by: Peixin Li <pxli@nyu.edu>
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
Fixes #488.
This adds a new plugin config to artificially limit the amount of GPU memory that will be used by RMM in pool mode.
Note that this depends on rapidsai/cudf#5855 and must not be merged until that change is published in the cudf-0.15-SNAPSHOT artifact.
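
For readers who want the behavior in one place, here is a minimal sketch of how a maximum-allocation fraction could be turned into a pool limit, based on the review thread above. It is not the plugin's actual code: the object name `PoolLimitSketch`, the method `computePoolLimit`, and the parameter names are hypothetical, and the total GPU memory is assumed to have been queried elsewhere. The only behavior taken from this PR is that a value of 0 means "no artificial limit" and that the limit is disabled when the fraction is >= 1.

```scala
// Hypothetical illustration only; names are not taken from the plugin source.
object PoolLimitSketch {

  /**
   * Compute the maximum RMM pool size in bytes.
   *
   * @param totalGpuMemory   total device memory in bytes (assumed to be queried elsewhere)
   * @param maxAllocFraction fraction of GPU memory the pool may grow to (assumed config value)
   * @return the limit in bytes, or 0 to indicate that no artificial limit is enforced
   */
  def computePoolLimit(totalGpuMemory: Long, maxAllocFraction: Double): Long = {
    require(totalGpuMemory > 0, "total GPU memory must be positive")
    require(maxAllocFraction > 0, "max alloc fraction must be positive")
    if (maxAllocFraction < 1.0) {
      // Cap the pool at the requested share of the device memory.
      (totalGpuMemory * maxAllocFraction).toLong
    } else {
      // Do not attempt to enforce any artificial pool limit if the config
      // indicates all GPU memory should be used (fraction >= 1).
      0L
    }
  }
}
```

With a 16 GiB device and a fraction of 0.5, this sketch yields an 8 GiB limit. Any fraction of 1 or more returns 0, leaving the effective maximum to whatever RMM enforces internally.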