
Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation #2429

Merged: 5 commits, Jun 3, 2021
docs/configs.md (2 additions, 1 deletion)
@@ -31,11 +31,12 @@
Name | Description | Default Value
-----|-------------|--------------
<a name="alluxio.pathsToReplace"></a>spark.rapids.alluxio.pathsToReplace|List of paths to be replaced with the corresponding Alluxio scheme. E.g., when the config is set to "s3:/foo->alluxio://0.1.2.3:19998/foo,gcs:/bar->alluxio://0.1.2.3:19998/bar", s3:/foo/a.csv will be replaced with alluxio://0.1.2.3:19998/foo/a.csv and gcs:/bar/b.csv will be replaced with alluxio://0.1.2.3:19998/bar/b.csv|None
<a name="cloudSchemes"></a>spark.rapids.cloudSchemes|Comma separated list of additional URI schemes that are to be considered cloud based filesystems. Schemes already included: dbfs, s3, s3a, s3n, wasbs, gs. Cloud based stores are generally totally separate from the executors and likely have a higher I/O read cost. Many times cloud filesystems also get better throughput when you have multiple readers in parallel. This is used with spark.rapids.sql.format.parquet.reader.type|None
-<a name="memory.gpu.allocFraction"></a>spark.rapids.memory.gpu.allocFraction|The fraction of total GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|0.9
+<a name="memory.gpu.allocFraction"></a>spark.rapids.memory.gpu.allocFraction|The fraction of available GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|0.9
<a name="memory.gpu.debug"></a>spark.rapids.memory.gpu.debug|Provides a log of GPU memory allocations and frees. If set to STDOUT or STDERR the logging will go there. Setting it to NONE disables logging. All other values are reserved for possible future expansion and in the meantime will disable logging.|NONE
<a name="memory.gpu.direct.storage.spill.batchWriteBuffer.size"></a>spark.rapids.memory.gpu.direct.storage.spill.batchWriteBuffer.size|The size of the GPU memory buffer used to batch small buffers when spilling to GDS. Note that this buffer is mapped to the PCI Base Address Register (BAR) space, which may be very limited on some GPUs (e.g. the NVIDIA T4 only has 256 MiB), and it is also used by UCX bounce buffers.|8388608
<a name="memory.gpu.direct.storage.spill.enabled"></a>spark.rapids.memory.gpu.direct.storage.spill.enabled|Should GPUDirect Storage (GDS) be used to spill GPU memory buffers directly to disk. GDS must be enabled and the directory `spark.local.dir` must support GDS. This is an experimental feature. For more information on GDS, see https://docs.nvidia.com/gpudirect-storage/.|false
<a name="memory.gpu.maxAllocFraction"></a>spark.rapids.memory.gpu.maxAllocFraction|The fraction of total GPU memory that limits the maximum size of the RMM pool. The value must be greater than or equal to the setting for spark.rapids.memory.gpu.allocFraction. Note that this limit will be reduced by the reserve memory configured in spark.rapids.memory.gpu.reserve.|1.0
<a name="memory.gpu.minAllocFraction"></a>spark.rapids.memory.gpu.minAllocFraction|The fraction of total GPU memory that limits the minimum size of the RMM pool. The value must be less than or equal to the setting for spark.rapids.memory.gpu.allocFraction.|0.25
<a name="memory.gpu.oomDumpDir"></a>spark.rapids.memory.gpu.oomDumpDir|The path to a local directory where a heap dump will be created if the GPU encounters an unrecoverable out-of-memory (OOM) error. The filename will be of the form: "gpu-oom-<pid>.hprof" where <pid> is the process ID.|None
<a name="memory.gpu.pool"></a>spark.rapids.memory.gpu.pool|Select the RMM pooling allocator to use. Valid values are "DEFAULT", "ARENA", and "NONE". With "DEFAULT", `rmm::mr::pool_memory_resource` is used; with "ARENA", `rmm::mr::arena_memory_resource` is used. If set to "NONE", pooling is disabled and RMM just passes through to CUDA memory allocation directly. Note: "ARENA" is the recommended pool allocator if CUDF is built with Per-Thread Default Stream (PTDS), as "DEFAULT" is known to be unstable (https://github.com/NVIDIA/spark-rapids/issues/1141)|ARENA
<a name="memory.gpu.pooling.enabled"></a>spark.rapids.memory.gpu.pooling.enabled|Should RMM act as a pooling allocator for GPU memory, or should it just pass through to CUDA memory allocation directly. DEPRECATED: please use spark.rapids.memory.gpu.pool instead.|true
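To make the interaction between the three fraction settings above concrete, here is a small illustrative sketch in Python; the function and numbers are hypothetical, not plugin code:

```python
GiB = 1024**3

def pool_sizes(total, free,
               alloc_fraction=0.9,       # of *available* memory (after this PR)
               min_alloc_fraction=0.25,  # of total memory
               max_alloc_fraction=1.0):  # of total memory
    """Hypothetical sketch of how the fraction configs combine at startup."""
    initial = int(alloc_fraction * free)
    minimum = int(min_alloc_fraction * total)
    maximum = int(max_alloc_fraction * total)
    return initial, minimum, maximum

# Example: a 16 GiB GPU with 12 GiB currently free.
initial, minimum, maximum = pool_sizes(total=16 * GiB, free=12 * GiB)
# initial is about 10.8 GiB, and must land inside [minimum, maximum].
```

The initial pool must fall between the minimum and maximum bounds, which are still computed from total memory.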
docs/tuning-guide.md (3 additions, 2 deletions)
@@ -56,8 +56,9 @@
Default value: `0.9`

 Allocating memory on a GPU can be an expensive operation. RAPIDS uses a pooling allocator
 called [RMM](https://github.com/rapidsai/rmm) to mitigate this overhead. By default, on startup
-the plugin will allocate `90%` (`0.9`) of the memory on the GPU and keep it as a pool that can
-be allocated from. If the pool is exhausted more memory will be allocated and added to the pool.
+the plugin will allocate `90%` (`0.9`) of the _available_ memory on the GPU and keep it as a pool
+that can be allocated from. If the pool is exhausted more memory will be allocated and added to
+the pool.
 Most of the time this is a huge win, but if you need to share the GPU with other
 [libraries](additional-functionality/ml-integration.md) that are not aware of RMM this can lead
 to memory issues, and you may need to disable pooling.
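A quick way to see why sizing the pool from available rather than total memory matters on a shared GPU; this is an illustrative sketch, not plugin code:

```python
GiB = 1024**3

def initial_pool_bytes(total, free, alloc_fraction=0.9):
    """Compare the old and new sizing rules for the initial RMM pool."""
    old = int(alloc_fraction * total)  # previous behavior: fraction of total memory
    new = int(alloc_fraction * free)   # new behavior: fraction of available memory
    return old, new

# Hypothetical 16 GiB GPU with 4 GiB already held by another library:
old, new = initial_pool_bytes(total=16 * GiB, free=12 * GiB)
# old (about 14.4 GiB) exceeds the 12 GiB actually free; new (about 10.8 GiB) fits.
```

Under the old rule the allocation could exceed free memory and fail at startup; under the new rule the pool shrinks to fit what is actually available.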
integration_tests/run_pyspark_from_build.sh (1 addition)
@@ -133,6 +133,7 @@ else
then
export PYSP_TEST_spark_rapids_memory_gpu_allocFraction=$MEMORY_FRACTION
export PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=$MEMORY_FRACTION
+export PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0
python "${RUN_TESTS_COMMAND[@]}" "${TEST_PARALLEL_OPTS[@]}" "${TEST_COMMON_OPTS[@]}"
else
"$SPARK_HOME"/bin/spark-submit --jars "${ALL_JARS// /,}" \
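The added line matters when the test script carves one GPU into several small pools. A hedged sketch of the idea; variable names such as `TEST_PARALLEL` are assumptions based on the script, not verified:

```shell
#!/bin/sh
# Sketch: split the GPU among parallel test workers by shrinking each pool.
TEST_PARALLEL=4
# Leave headroom: each worker gets a bit less than 1/N of the GPU.
MEMORY_FRACTION=$(awk -v n="$TEST_PARALLEL" 'BEGIN { print 1 / (n + 1) }')
export PYSP_TEST_spark_rapids_memory_gpu_allocFraction=$MEMORY_FRACTION
export PYSP_TEST_spark_rapids_memory_gpu_maxAllocFraction=$MEMORY_FRACTION
# Disable the minimum so a small per-worker pool does not trip the new
# minAllocFraction check (0.25 of total memory by default).
export PYSP_TEST_spark_rapids_memory_gpu_minAllocFraction=0
echo "$MEMORY_FRACTION"
```

Without the last export, a per-worker pool of 0.2 of the GPU would fall below the default 0.25 minimum and fail at startup.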
@@ -168,16 +168,24 @@ object GpuDeviceManager extends Logging {
// Align workaround for https://github.com/rapidsai/rmm/issues/527
def truncateToAlignment(x: Long): Long = x & ~511L

-    var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.total).toLong)
-    if (initialAllocation > info.free) {
-      logWarning(s"Initial RMM allocation (${toMB(initialAllocation)} MB) is " +
-        s"larger than free memory (${toMB(info.free)} MB)")
+    var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.free).toLong)
**Collaborator:** So what if free is tiny? Do we want a minimum? Sometimes I can see this having caught bad setups, meaning you tried to start 2 executors on the same GPU when you shouldn't have, so now this will hide those kinds of failures. I guess you will probably start seeing failures, but I'm wondering how hard they will be to debug.

**Collaborator:** Oops, I now see Bobby made a similar comment about a minimum.

**Contributor (author):** I had missed Bobby's previous comment somehow. I am working on addressing this now. I also noticed that some of the existing error messages need a little rework as well.

+    val minAllocation = truncateToAlignment((conf.rmmAllocMinFraction * info.total).toLong)
+    if (initialAllocation < minAllocation) {
+      throw new IllegalArgumentException(s"The initial allocation of " +
+        s"${toMB(initialAllocation)} MB (calculated from ${RapidsConf.RMM_ALLOC_FRACTION} " +
+        s"(=${conf.rmmAllocFraction}) and ${toMB(info.free)} MB free memory) was less than " +
+        s"the minimum allocation of ${toMB(minAllocation)} (calculated from " +
+        s"${RapidsConf.RMM_ALLOC_MIN_FRACTION} (=${conf.rmmAllocMinFraction}) " +
+        s"and ${toMB(info.total)} MB total memory)")
+    }
     val maxAllocation = truncateToAlignment((conf.rmmAllocMaxFraction * info.total).toLong)
     if (maxAllocation < initialAllocation) {
-      throw new IllegalArgumentException(s"${RapidsConf.RMM_ALLOC_MAX_FRACTION} " +
-        s"configured as ${conf.rmmAllocMaxFraction} which is less than the " +
-        s"${RapidsConf.RMM_ALLOC_FRACTION} setting of ${conf.rmmAllocFraction}")
+      throw new IllegalArgumentException(s"The initial allocation of " +
+        s"${toMB(initialAllocation)} MB (calculated from ${RapidsConf.RMM_ALLOC_FRACTION} " +
+        s"(=${conf.rmmAllocFraction}) and ${toMB(info.free)} MB free memory) was more than " +
+        s"the maximum allocation of ${toMB(maxAllocation)} (calculated from " +
+        s"${RapidsConf.RMM_ALLOC_MAX_FRACTION} (=${conf.rmmAllocMaxFraction}) " +
+        s"and ${toMB(info.total)} MB total memory)")
     }
     val reserveAmount = conf.rmmAllocReserve
     if (reserveAmount >= maxAllocation) {
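To summarize the new startup checks, here is a Python sketch of the same arithmetic; it is an illustrative mirror of the Scala above, not the actual implementation:

```python
def truncate_to_alignment(x: int) -> int:
    # Align down to a 512-byte boundary, mirroring the RMM workaround above.
    return x & ~511

def compute_initial_pool(total: int, free: int,
                         alloc_frac: float = 0.9,
                         min_frac: float = 0.25,
                         max_frac: float = 1.0) -> int:
    """Initial size comes from *free* memory; min/max bounds from *total*."""
    initial = truncate_to_alignment(int(alloc_frac * free))
    minimum = truncate_to_alignment(int(min_frac * total))
    maximum = truncate_to_alignment(int(max_frac * total))
    if initial < minimum:
        raise ValueError("initial allocation below minimum: is most of the "
                         "GPU already in use by another process?")
    if initial > maximum:
        raise ValueError("initial allocation above the configured maximum")
    return initial
```

With a mostly free GPU this returns roughly 90% of the free memory; with a nearly full GPU it fails fast, addressing the reviewer's concern about silently creating a tiny pool.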
@@ -320,10 +320,11 @@ object RapidsConf {
.createOptional

   private val RMM_ALLOC_MAX_FRACTION_KEY = "spark.rapids.memory.gpu.maxAllocFraction"
+  private val RMM_ALLOC_MIN_FRACTION_KEY = "spark.rapids.memory.gpu.minAllocFraction"
   private val RMM_ALLOC_RESERVE_KEY = "spark.rapids.memory.gpu.reserve"

   val RMM_ALLOC_FRACTION = conf("spark.rapids.memory.gpu.allocFraction")
-    .doc("The fraction of total GPU memory that should be initially allocated " +
+    .doc("The fraction of available GPU memory that should be initially allocated " +
       "for pooled memory. Extra memory will be allocated as needed, but it may " +
       "result in more fragmentation. This must be less than or equal to the maximum limit " +
       s"configured via $RMM_ALLOC_MAX_FRACTION_KEY.")
@@ -340,6 +341,13 @@
     .checkValue(v => v >= 0 && v <= 1, "The fraction value must be in [0, 1].")
     .createWithDefault(1)

+  val RMM_ALLOC_MIN_FRACTION = conf(RMM_ALLOC_MIN_FRACTION_KEY)
+    .doc("The fraction of total GPU memory that limits the minimum size of the RMM pool. " +
+      s"The value must be less than or equal to the setting for $RMM_ALLOC_FRACTION.")
+    .doubleConf
+    .checkValue(v => v >= 0 && v <= 1, "The fraction value must be in [0, 1].")
+    .createWithDefault(0.25)
+
   val RMM_ALLOC_RESERVE = conf(RMM_ALLOC_RESERVE_KEY)
     .doc("The amount of GPU memory that should remain unallocated by RMM and left for " +
       "system use such as memory needed for kernels, kernel launches or JIT compilation.")
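The fraction settings are validated both individually (each must be in [0, 1]) and relative to one another. A hypothetical Python sketch of the documented constraints, not the plugin's validation code:

```python
def validate_fractions(alloc=0.9, min_frac=0.25, max_frac=1.0):
    """Illustrative check of the documented ordering constraint:
    minAllocFraction <= allocFraction <= maxAllocFraction, each in [0, 1]."""
    for name, v in [("allocFraction", alloc),
                    ("minAllocFraction", min_frac),
                    ("maxAllocFraction", max_frac)]:
        if not (0.0 <= v <= 1.0):
            raise ValueError(f"{name}: the fraction value must be in [0, 1].")
    if not (min_frac <= alloc <= max_frac):
        raise ValueError(
            "expected minAllocFraction <= allocFraction <= maxAllocFraction")
    return True
```

The defaults (0.25, 0.9, 1.0) satisfy the ordering; a configuration such as allocFraction=0.1 with the default minimum of 0.25 would be rejected.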
@@ -1354,6 +1362,8 @@ class RapidsConf(conf: Map[String, String]) extends Logging {

   lazy val rmmAllocMaxFraction: Double = get(RMM_ALLOC_MAX_FRACTION)

+  lazy val rmmAllocMinFraction: Double = get(RMM_ALLOC_MIN_FRACTION)
+
   lazy val rmmAllocReserve: Long = get(RMM_ALLOC_RESERVE)

   lazy val hostSpillStorageSize: Long = get(HOST_SPILL_STORAGE_SIZE)
@@ -28,11 +28,13 @@ class GpuDeviceManagerSuite extends FunSuite with Arm {
     TrampolineUtil.cleanupAnyExistingSession()
     val totalGpuSize = Cuda.memGetInfo().total
     val initPoolFraction = 0.1
+    val minPoolFraction = 0.01
     val maxPoolFraction = 0.2
     val conf = new SparkConf()
       .set(RapidsConf.POOLED_MEM.key, "true")
       .set(RapidsConf.RMM_POOL.key, "ARENA")
       .set(RapidsConf.RMM_ALLOC_FRACTION.key, initPoolFraction.toString)
+      .set(RapidsConf.RMM_ALLOC_MIN_FRACTION.key, minPoolFraction.toString)
       .set(RapidsConf.RMM_ALLOC_MAX_FRACTION.key, maxPoolFraction.toString)
       .set(RapidsConf.RMM_ALLOC_RESERVE.key, "0")
     try {