
default rmm alloc fraction to the max to avoid unnecessary fragmentation #2846

Merged: 3 commits into NVIDIA:branch-21.10 on Aug 16, 2021

Conversation

@rongou (Collaborator) commented Jun 29, 2021

With the current default of 0.9 for spark.rapids.memory.gpu.allocFraction, we first allocate 90% of GPU memory, and then when we run out of memory additional GPU memory is allocated. Because of how CUDA Unified Virtual Addressing (UVA) is set up, the subsequently allocated addresses are not contiguous with the initial pool, causing unnecessary fragmentation.

Helps with #2504 but doesn't completely fix it. Having a single contiguous GPU address space also helps with debugging and visualization.

Signed-off-by: Rong Ou rong.ou@gmail.com
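
For reference, a minimal, hypothetical snippet of how a user could keep the previous 90% behavior by overriding the new default. The config key is the one touched by this PR; the SparkConf wiring around it is only illustrative.

```scala
import org.apache.spark.SparkConf

// Hypothetical example: override the new default of 1.0 and pin the RMM
// pool back to 90% of free GPU memory, as before this PR.
object KeepOldAllocFraction {
  val conf: SparkConf = new SparkConf()
    .set("spark.rapids.memory.gpu.allocFraction", "0.9")
}
```

The same value could equally be passed with --conf on spark-submit.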

@rongou rongou added the labels "shuffle" (things that impact the shuffle plugin) and "ease of use" (Makes the product simpler to use or configure) Jun 29, 2021
@rongou rongou requested review from jlowe and abellina June 29, 2021 21:44
@rongou rongou self-assigned this Jun 29, 2021
@@ -330,7 +330,7 @@ object RapidsConf {
s"configured via $RMM_ALLOC_MAX_FRACTION_KEY.")
.doubleConf
.checkValue(v => v >= 0 && v <= 1, "The fraction value must be in [0, 1].")
-    .createWithDefault(0.9)
+    .createWithDefault(1)
Member commented:
Changing this default means a new warning now appears on startup with the default configs. For example:

21/06/30 14:38:36 WARN GpuDeviceManager: Initial RMM allocation (15568.75 MB) is larger than the adjusted maximum allocation (15136.5 MB), lowering initial allocation to the adjusted maximum allocation.

It would be nice to not emit warnings with the default configs on most setups. We should consider whether this should be a warning or just an info, or maybe change the conditions under which it is a warning.
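To make the condition concrete, here is a minimal, self-contained sketch of the startup check described above. The MB values come from the example log line; the variable names and the formulas (initial allocation as a fraction of free memory, adjusted maximum as the max allocation minus the reserve) are assumptions based on this thread, not the actual GpuDeviceManager code.

```scala
// Rough sketch only, not the real implementation.
object AllocWarningSketch extends App {
  // With allocFraction now defaulting to 1.0, the initial allocation is
  // essentially all free GPU memory on a mostly idle GPU.
  val initialAllocationMB = 15568.75
  // The maximum allocation (a fraction of total memory) minus the reserve.
  val adjustedMaxAllocationMB = 15136.5

  if (initialAllocationMB > adjustedMaxAllocationMB) {
    println(s"WARN GpuDeviceManager: Initial RMM allocation ($initialAllocationMB MB) " +
      s"is larger than the adjusted maximum allocation ($adjustedMaxAllocationMB MB), " +
      "lowering initial allocation to the adjusted maximum allocation.")
  }
}
```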

@revans2 (Collaborator) commented Jun 30, 2021
I would like to see us take a deep look at how we are calculating the RMM pool size, etc.

We have several configs associated with this:

  • spark.rapids.memory.gpu.allocFraction - 0 to 1, the fraction of free memory allocated for the RMM pool. The code uses free memory, but the docs say available memory.
  • spark.rapids.memory.gpu.maxAllocFraction - 0 to 1, the maximum fraction of total GPU memory that can be allocated for the RMM pool. This is a percentage of total memory, except we also subtract the reserve from it; but since the reserve comes out of total memory, if I already have 1 GiB in use the reserve is not doing anything.
  • spark.rapids.memory.gpu.minAllocFraction - 0 to 1, the minimum fraction of total GPU memory that can be allocated for the RMM pool before we throw an error saying it is too small. This is a percentage of total memory.
  • spark.rapids.memory.gpu.reserve - the number of bytes that have to be reserved for CUDA/cuDF to load kernels and such, so we don't crash because RMM has everything allocated.

The code that does the warnings is a bit confusing too:

  • If the initial allocation < the min allocation, throw an exception.
  • If the max allocation < the min allocation, throw an exception.
  • If the reserve amount >= the max allocation, throw an exception. (I'm not really sure what this is doing at all; if max and min are small but the reserve is big, that should be OK so long as everything fits.)
  • Adjust the max allocation by subtracting the reserve amount from it.
  • If the adjusted max allocation <= the initial allocation, output a warning.

So to me it feels like we should instead (a rough sketch follows this list):

  • Get the reserve amount.
  • Get the amount of available memory on the GPU by taking free memory and subtracting the reserve from it.
  • The min allocation is calculated from the total memory on the GPU.
  • If available <= the min allocation, throw an exception; this is not going to work.
  • The max allocation is a percentage of the available memory, not just the free memory.
  • The initial allocation is a percentage of the available memory.
  • If the initial allocation > the max allocation, output a warning and adjust the initial allocation down to the max allocation.

We will need to update the docs in the config section accordingly to better explain what is happening.
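
For concreteness, a rough sketch of that proposed flow. The function name, parameters, and error/warning text are illustrative assumptions; this is not the actual GpuDeviceManager code.

```scala
// Sketch only: illustrates the proposed ordering of checks, not the real code.
object PoolSizingSketch {
  def computePoolSizes(
      freeBytes: Long,
      totalBytes: Long,
      reserveBytes: Long,        // spark.rapids.memory.gpu.reserve
      allocFraction: Double,     // spark.rapids.memory.gpu.allocFraction
      maxAllocFraction: Double,  // spark.rapids.memory.gpu.maxAllocFraction
      minAllocFraction: Double   // spark.rapids.memory.gpu.minAllocFraction
  ): (Long, Long) = {
    // Available memory is free memory minus the reserve.
    val available = freeBytes - reserveBytes
    // The minimum allocation stays a fraction of *total* memory.
    val minAllocation = (minAllocFraction * totalBytes).toLong
    if (available <= minAllocation) {
      throw new IllegalArgumentException(
        s"Only $available bytes available for the RMM pool, need more than $minAllocation")
    }
    // Max and initial allocations become fractions of *available* memory.
    val maxAllocation = (maxAllocFraction * available).toLong
    val initialAllocation = (allocFraction * available).toLong
    if (initialAllocation > maxAllocation) {
      println(s"WARN initial allocation $initialAllocation is larger than " +
        s"max allocation $maxAllocation, lowering it to the max")
      (maxAllocation, maxAllocation)
    } else {
      (initialAllocation, maxAllocation)
    }
  }
}
```

The main difference from the current code is that the reserve is subtracted once, up front, when computing available memory, instead of only being subtracted from the max allocation.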

@rongou (Collaborator, Author) commented:
OK, I changed how the initial allocation is calculated to take the reserve amount into account. To me this seems to make sense, since the most you can allocate is free minus reserve.

@sameerz (Collaborator) commented Jul 14, 2021

Please retarget to the 21.10 branch when it is available.

@rongou rongou changed the base branch from branch-21.08 to branch-21.10 July 30, 2021 20:57
@revans2 (Collaborator) left a comment

This looks good to me.

@jlowe (Member) commented Aug 2, 2021

build

@rongou rongou merged commit 31182c3 into NVIDIA:branch-21.10 Aug 16, 2021
razajafri pushed a commit to razajafri/spark-rapids that referenced this pull request Aug 23, 2021
…ion (NVIDIA#2846)

* default rmm alloc fraction to the max to avoid unnecessary fragmentation

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* initial allocation takes into account of the reserve

Signed-off-by: Rong Ou <rong.ou@gmail.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>