default rmm alloc fraction to the max to avoid unnecessary fragmentation #2846

Merged (3 commits) on Aug 16, 2021

docs/configs.md (1 addition, 1 deletion)
@@ -31,7 +31,7 @@ Name | Description | Default Value
-----|-------------|--------------
<a name="alluxio.pathsToReplace"></a>spark.rapids.alluxio.pathsToReplace|List of paths to be replaced with corresponding alluxio scheme. Eg, when configureis set to "s3:/foo->alluxio://0.1.2.3:19998/foo,gcs:/bar->alluxio://0.1.2.3:19998/bar", which means: s3:/foo/a.csv will be replaced to alluxio://0.1.2.3:19998/foo/a.csv and gcs:/bar/b.csv will be replaced to alluxio://0.1.2.3:19998/bar/b.csv|None
<a name="cloudSchemes"></a>spark.rapids.cloudSchemes|Comma separated list of additional URI schemes that are to be considered cloud based filesystems. Schemes already included: dbfs, s3, s3a, s3n, wasbs, gs. Cloud based stores generally would be total separate from the executors and likely have a higher I/O read cost. Many times the cloud filesystems also get better throughput when you have multiple readers in parallel. This is used with spark.rapids.sql.format.parquet.reader.type|None
- <a name="memory.gpu.allocFraction"></a>spark.rapids.memory.gpu.allocFraction|The fraction of available GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|0.9
+ <a name="memory.gpu.allocFraction"></a>spark.rapids.memory.gpu.allocFraction|The fraction of available GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|1.0
<a name="memory.gpu.debug"></a>spark.rapids.memory.gpu.debug|Provides a log of GPU memory allocations and frees. If set to STDOUT or STDERR the logging will go there. Setting it to NONE disables logging. All other values are reserved for possible future expansion and in the mean time will disable logging.|NONE
<a name="memory.gpu.direct.storage.spill.batchWriteBuffer.size"></a>spark.rapids.memory.gpu.direct.storage.spill.batchWriteBuffer.size|The size of the GPU memory buffer used to batch small buffers when spilling to GDS. Note that this buffer is mapped to the PCI Base Address Register (BAR) space, which may be very limited on some GPUs (e.g. the NVIDIA T4 only has 256 MiB), and it is also used by UCX bounce buffers.|8388608
<a name="memory.gpu.direct.storage.spill.enabled"></a>spark.rapids.memory.gpu.direct.storage.spill.enabled|Should GPUDirect Storage (GDS) be used to spill GPU memory buffers directly to disk. GDS must be enabled and the directory `spark.local.dir` must support GDS. This is an experimental feature. For more information on GDS, see https://docs.nvidia.com/gpudirect-storage/.|false
@@ -171,8 +171,11 @@ object GpuDeviceManager extends Logging {
// Align workaround for https://github.com/rapidsai/rmm/issues/527
def truncateToAlignment(x: Long): Long = x & ~511L

- var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.free).toLong)
val minAllocation = truncateToAlignment((conf.rmmAllocMinFraction * info.total).toLong)
+ val maxAllocation = truncateToAlignment((conf.rmmAllocMaxFraction * info.total).toLong)
+ val reserveAmount = conf.rmmAllocReserve
+ var initialAllocation = truncateToAlignment(
+ (conf.rmmAllocFraction * (info.free - reserveAmount)).toLong)
if (initialAllocation < minAllocation) {
throw new IllegalArgumentException(s"The initial allocation of " +
s"${toMB(initialAllocation)} MB (calculated from ${RapidsConf.RMM_ALLOC_FRACTION} " +
@@ -181,7 +184,6 @@ object GpuDeviceManager extends Logging {
s"${RapidsConf.RMM_ALLOC_MIN_FRACTION} (=${conf.rmmAllocMinFraction}) " +
s"and ${toMB(info.total)} MB total memory)")
}
- val maxAllocation = truncateToAlignment((conf.rmmAllocMaxFraction * info.total).toLong)
if (maxAllocation < initialAllocation) {
throw new IllegalArgumentException(s"The initial allocation of " +
s"${toMB(initialAllocation)} MB (calculated from ${RapidsConf.RMM_ALLOC_FRACTION} " +
@@ -190,7 +192,6 @@ object GpuDeviceManager extends Logging {
s"${RapidsConf.RMM_ALLOC_MAX_FRACTION} (=${conf.rmmAllocMaxFraction}) " +
s"and ${toMB(info.total)} MB total memory)")
}
- val reserveAmount = conf.rmmAllocReserve
if (reserveAmount >= maxAllocation) {
throw new IllegalArgumentException(s"RMM reserve memory (${toMB(reserveAmount)} MB) " +
s"larger than maximum pool size (${toMB(maxAllocation)} MB). Check the settings for " +
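The new order of operations is easier to follow as a minimal, self-contained Scala sketch of the sizing and checks in the hunks above. This is not the plugin's code. `PoolSizingSketch`, `PoolSizes` and `compute` are hypothetical names. The inputs are plain parameters rather than `RapidsConf` and the device info. Errors come back as `Left` messages instead of thrown `IllegalArgumentException`s. The later step that lowers the initial allocation to the adjusted maximum does not appear in these hunks, so the sketch leaves it out.

```scala
// Hypothetical, self-contained restatement of the pool sizing shown in the diff above.
// Inputs are plain parameters; the real code reads them from RapidsConf and the device info.
object PoolSizingSketch {
  // Same 512-byte alignment workaround as truncateToAlignment in the diff.
  def truncateToAlignment(x: Long): Long = x & ~511L

  final case class PoolSizes(initial: Long, min: Long, max: Long, reserve: Long)

  /** Returns the pool sizes, or a message describing the first check that fails. */
  def compute(
      allocFraction: Double,
      minFraction: Double,
      maxFraction: Double,
      reserve: Long,
      freeBytes: Long,
      totalBytes: Long): Either[String, PoolSizes] = {
    val minAllocation = truncateToAlignment((minFraction * totalBytes).toLong)
    val maxAllocation = truncateToAlignment((maxFraction * totalBytes).toLong)
    // New in this change: the reserve is subtracted from free memory before applying the fraction.
    val initialAllocation =
      truncateToAlignment((allocFraction * (freeBytes - reserve)).toLong)
    if (initialAllocation < minAllocation) {
      Left(s"initial allocation $initialAllocation is below the minimum $minAllocation")
    } else if (maxAllocation < initialAllocation) {
      Left(s"maximum allocation $maxAllocation is below the initial $initialAllocation")
    } else if (reserve >= maxAllocation) {
      Left(s"reserve $reserve is not smaller than the maximum pool size $maxAllocation")
    } else {
      Right(PoolSizes(initialAllocation, minAllocation, maxAllocation, reserve))
    }
  }
}
```

Consider a hypothetical GPU with 16 GiB total, 15 GiB free and a 640 MiB reserve, called as `compute(1.0, 0.25, 1.0, 640L << 20, 15L << 30, 16L << 30)`. The initial allocation comes out to 14720 MiB (15 GiB minus the reserve). The old formula would have asked for all 15 GiB.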
@@ -330,7 +330,7 @@ object RapidsConf {
s"configured via $RMM_ALLOC_MAX_FRACTION_KEY.")
.doubleConf
.checkValue(v => v >= 0 && v <= 1, "The fraction value must be in [0, 1].")
- .createWithDefault(0.9)
+ .createWithDefault(1)
Review comment (Member):

Changing this default means a new warning now appears on startup with the default configs. For example:

21/06/30 14:38:36 WARN GpuDeviceManager: Initial RMM allocation (15568.75 MB) is larger than the adjusted maximum allocation (15136.5 MB), lowering initial allocation to the adjusted maximum allocation.

It would be nice not to emit warnings with the default configs on most setups. We should consider whether this should be a warning or just an info-level message, or change the conditions under which it is a warning.
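Here is a minimal sketch of the check behind that warning. It assumes, though the diff does not show this, that the adjusted maximum is the maximum pool size minus the reserve. It also assumes the initial allocation is simply lowered to that value. `InitialAllocationCheck`, `adjustedMaxMb` and `resolveInitial` are hypothetical names. The message text is copied from the log line above. Whether this belongs at WARN or INFO is the open question, so the sketch only returns the message.

```scala
// Hypothetical sketch of the check behind the startup warning quoted above.
// Assumption: adjusted maximum = maximum pool size - reserve (not shown in this diff).
object InitialAllocationCheck {
  /** Maximum pool size (MB) after carving out the reserve (MB). */
  def adjustedMaxMb(maxMb: Double, reserveMb: Double): Double = maxMb - reserveMb

  /**
   * Returns the initial allocation (MB) to use and, if it had to be lowered,
   * the warning text (wording copied from the log line above).
   */
  def resolveInitial(initialMb: Double, adjustedMax: Double): (Double, Option[String]) =
    if (initialMb > adjustedMax) {
      val msg = s"Initial RMM allocation ($initialMb MB) is larger than the adjusted " +
        s"maximum allocation ($adjustedMax MB), lowering initial allocation to the " +
        "adjusted maximum allocation."
      (adjustedMax, Some(msg))
    } else {
      (initialMb, None)
    }
}
```

`resolveInitial(15568.75, 15136.5)` returns `15136.5` together with the message, matching the log above. Any initial allocation at or below the adjusted maximum passes through with no message.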

@revans2 (Collaborator), Jun 30, 2021:

I would like us to take a deep look at how we are calculating the RMM pool size, etc.

We have several configs associated with this.

  • spark.rapids.memory.gpu.allocFraction - 0 to 1, the fraction of free memory allocated for the RMM pool. The code uses free memory, but the docs say available memory.
  • spark.rapids.memory.gpu.maxAllocFraction - 0 to 1, the maximum fraction of total GPU memory that can be allocated for the RMM pool. It is a fraction of total memory, except that we also subtract the reserve from it. But the reserve is also measured against total memory, so if I already have 1 GiB in use, the reserve is not doing anything.
  • spark.rapids.memory.gpu.minAllocFraction - 0 to 1, the smallest fraction of total GPU memory the RMM pool can be given before we throw an error saying it is too small. This is a fraction of total memory.
  • spark.rapids.memory.gpu.reserve - the number of bytes that must be reserved for CUDA/cuDF to load kernels, etc., so we don't crash because RMM has allocated everything.
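Here is a small worked example of the maxAllocFraction point. It uses hypothetical numbers: a 16 GiB GPU, a 640 MiB reserve and maxAllocFraction = 1. It follows the pre-change semantics described above. `headroomAtMax` is a hypothetical helper. As a simplification, it assumes the pool can never grow past the memory that is currently free.

```scala
// Hypothetical illustration of the maxAllocFraction point above (pre-change semantics):
// maximum = maxFraction * total, then the reserve is subtracted from it.
object ReserveFromTotalExample {
  val MiB: Long = 1L << 20
  val GiB: Long = 1L << 30

  /**
   * GPU memory left outside the pool if the pool grows as far as it can.
   * Simplification: the pool can never take more than what is currently free.
   */
  def headroomAtMax(maxFraction: Double, reserve: Long, free: Long, total: Long): Long = {
    val adjustedMax = (maxFraction * total).toLong - reserve
    val poolCeiling = math.min(adjustedMax, free)
    free - poolCeiling
  }
}
```

With nothing in use, 640 MiB stays outside the pool. With 1 GiB already in use, the adjusted maximum (15744 MiB) is larger than the 15 GiB that is free. Nothing is held back, and the reserve has no effect.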

The code that does the warnings is a bit confusing too.

  • if initial allocation < min allocation, throw an exception
  • if max allocation < initial allocation, throw an exception
  • if reserve amount >= max allocation, throw an exception (not really sure what this is doing at all: if max and min are small but reserve is big, that should be OK so long as everything fits)
  • adjust max allocation by subtracting the reserve amount from it.
  • if adjusted max allocation <= initial allocation, output a warning.

So it feels to me like we should instead:

  • get the reserved amount.
  • get the amount of available memory on the GPU by taking free memory and subtracting the reserve from it.
  • min allocation is calculated from the total memory on the GPU.
  • if available <= min allocation, throw an exception, because this is not going to work.
  • max allocation is a percentage of the available memory, not just the free memory.
  • initial allocation is a percentage of the available memory.
  • if initial allocation > max allocation, output a warning and adjust initial allocation down to max allocation.
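The proposed order could look roughly like the following Scala sketch. It is a hypothetical illustration of the steps listed above, not an implementation from this PR. `ProposedPoolSizing`, `Plan` and `plan` are made-up names. Reusing the 512-byte alignment workaround is an assumption, since the proposal does not mention it. The warning is returned as text rather than logged.

```scala
// Hypothetical sketch of the sizing order proposed in the review comment above.
object ProposedPoolSizing {
  // Assumption: reuse the diff's 512-byte alignment workaround.
  def truncateToAlignment(x: Long): Long = x & ~511L

  final case class Plan(initial: Long, max: Long, warning: Option[String])

  def plan(
      allocFraction: Double,
      minFraction: Double,
      maxFraction: Double,
      reserve: Long,
      free: Long,
      total: Long): Either[String, Plan] = {
    // Available memory: currently free memory minus the reserve kept for CUDA/cuDF.
    val available = free - reserve
    // The minimum is still measured against the GPU's total memory.
    val minAllocation = truncateToAlignment((minFraction * total).toLong)
    if (available <= minAllocation) {
      Left(s"available memory ($available bytes) is not above " +
        s"the minimum pool size ($minAllocation bytes)")
    } else {
      // Both the maximum and the initial size are fractions of available memory.
      val maxAllocation = truncateToAlignment((maxFraction * available).toLong)
      val initial = truncateToAlignment((allocFraction * available).toLong)
      if (initial > maxAllocation) {
        val msg = s"initial allocation ($initial bytes) lowered to the maximum ($maxAllocation bytes)"
        Right(Plan(maxAllocation, maxAllocation, Some(msg)))
      } else {
        Right(Plan(initial, maxAllocation, None))
      }
    }
  }
}
```

In this scheme, initial and max are computed from the same available amount. The warning can therefore only appear when allocFraction is set higher than maxAllocFraction. With both fractions at 1, it never fires.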

We will need to update the docs in the config section accordingly to better explain what is happening.

PR author (Collaborator):

OK, I changed how the initial allocation is calculated to take the reserve amount into account. This seems to make sense to me, since the most you can allocate is free memory minus the reserve.
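Here is a rough check of how this change affects the startup warning quoted earlier. Write $a$ for allocFraction, $m$ for maxAllocFraction, $F$ and $T$ for free and total GPU memory, and $R$ for the reserve. The check ignores the 512-byte alignment truncation. It assumes, as described in the review, that the adjusted maximum is $mT - R$, and treats the warning as firing when $I > M_{\mathrm{adj}}$, as the log message words it.

```math
\begin{aligned}
\text{before: } & I = aF,\; M_{\mathrm{adj}} = mT - R;\quad a = 1 \Rightarrow \left(I > M_{\mathrm{adj}} \iff F > mT - R\right) \\
\text{after: }  & I = a(F - R),\; M_{\mathrm{adj}} = mT - R;\quad a = 1 \Rightarrow \left(I > M_{\mathrm{adj}} \iff F - R > mT - R \iff F > mT\right)
\end{aligned}
```

Since $F \le T$, the condition after the change can never hold when $m = 1$. Before the change, with $m = 1$, it held whenever less than $R$ bytes were already in use ($F > T - R$). That is plausibly the state at executor startup.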


val RMM_ALLOC_MAX_FRACTION = conf(RMM_ALLOC_MAX_FRACTION_KEY)
.doc("The fraction of total GPU memory that limits the maximum size of the RMM pool. " +