Add config to limit maximum RMM pool size #517

Merged
jlowe merged 6 commits into NVIDIA:branch-0.2 from rmm-pool-size-limit on Aug 10, 2020

Conversation

jlowe (Member) commented Aug 5, 2020

Signed-off-by: Jason Lowe jlowe@nvidia.com

Fixes #488.

This adds a new plugin config to artificially limit the amount of GPU memory that will be used by RMM in pool mode.

Note that this depends on rapidsai/cudf#5855 and must not be merged until that change is published in the cudf-0.15-SNAPSHOT artifact.
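
For illustration, a minimal usage sketch of how such a limit might be set from the Spark side. The config key `spark.rapids.memory.gpu.maxAllocFraction` is assumed from this PR's intent; the name that was actually merged may differ:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example: cap the RMM pool at 60% of GPU memory.
// The config key below is an assumption, not confirmed by this PR's diff.
val spark = SparkSession.builder()
  .appName("rmm-pool-limit-example")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.rapids.memory.gpu.maxAllocFraction", "0.6")
  .getOrCreate()
```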

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe added the "feature request" (New feature or request) label Aug 5, 2020
jlowe added this to the Aug 3 - Aug 14 milestone Aug 5, 2020
jlowe self-assigned this Aug 5, 2020
jlowe marked this pull request as draft August 5, 2020 20:12
jlowe (Member, Author) commented Aug 5, 2020

build

Review thread on docs/configs.md (outdated, resolved)
Signed-off-by: Jason Lowe <jlowe@nvidia.com>
abellina previously approved these changes Aug 6, 2020
tgravescs previously approved these changes Aug 6, 2020
jlowe marked this pull request as ready for review August 7, 2020 16:13
jlowe (Member, Author) commented Aug 7, 2020

build

jlowe dismissed stale reviews from tgravescs and abellina via 68c8c57 August 7, 2020 19:04
jlowe (Member, Author) commented Aug 7, 2020

build

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
} else {
// Do not attempt to enforce any artificial pool limit based on queried GPU memory size
// if config indicates all GPU memory should be used.
0
Collaborator commented:

We should document this.

Collaborator commented:

Sorry, I should be more clear. In the case of UVM, setting it above 1.0 might be logical, so we should document in the config that this is happening.

jlowe (Member, Author) commented:

I think RMM's pool_memory_resource will still query the GPU and use its reported total memory as a maximum limit internally, although I'm unsure what that API does when UVM is being used. I'll still add a comment saying the maximum limit is disabled if the specified fraction is >= 1.
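
As a rough sketch of the behavior discussed above (illustrative names only, not the merged plugin code), the limit computation could look like the following, where 0 means no artificial limit is passed to RMM:

```scala
// Illustrative sketch: derive an upper bound for the RMM pool from the
// configured maximum allocation fraction.
def computePoolLimit(maxAllocFraction: Double, totalGpuMemoryBytes: Long): Long = {
  if (maxAllocFraction < 1.0) {
    // Cap the pool at the requested fraction of the queried GPU memory size.
    (maxAllocFraction * totalGpuMemoryBytes).toLong
  } else {
    // Do not attempt to enforce any artificial pool limit based on queried GPU
    // memory size if the config indicates all GPU memory (or more, under UVM)
    // may be used; 0 here means "no limit specified".
    0L
  }
}
```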

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe (Member, Author) commented Aug 7, 2020

build

jlowe (Member, Author) commented Aug 10, 2020

@revans2 this is ready for another look.

jlowe merged commit b83a480 into NVIDIA:branch-0.2 Aug 10, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add config to limit maximum RMM pool size

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Address review comments

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Do not attempt to specify a limit when max alloc fraction is 1

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* Document artificial limit is not enforced when max alloc fraction == 1

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
jlowe deleted the rmm-pool-size-limit branch September 10, 2021 15:31
pxLi added a commit to pxLi/spark-rapids that referenced this pull request May 12, 2022
Signed-off-by: Peixin Li <pxli@nyu.edu>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Labels: feature request (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close these issues:
  [FEA] Ability to limit total GPU memory used (#488)
4 participants