When PTDS is enabled, print warning if the allocator is not ARENA (NVIDIA#1677)

* When PTDS is enabled, print warning if the allocator is not ARENA

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

* Apply suggestions from code review

Per-thread -> Per-Thread

Co-authored-by: Jason Lowe <jlowe@nvidia.com>

abellina and jlowe authored Feb 8, 2021
1 parent 625eb81 commit d893fd1
Showing 3 changed files with 9 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/configs.md
@@ -36,7 +36,7 @@ Name | Description | Default Value
<a name="memory.gpu.direct.storage.spill.enabled"></a>spark.rapids.memory.gpu.direct.storage.spill.enabled|Should GPUDirect Storage (GDS) be used to spill GPU memory buffers directly to disk. GDS must be enabled and the directory `spark.local.dir` must support GDS. This is an experimental feature. For more information on GDS, see https://docs.nvidia.com/gpudirect-storage/.|false
<a name="memory.gpu.maxAllocFraction"></a>spark.rapids.memory.gpu.maxAllocFraction|The fraction of total GPU memory that limits the maximum size of the RMM pool. The value must be greater than or equal to the setting for spark.rapids.memory.gpu.allocFraction. Note that this limit will be reduced by the reserve memory configured in spark.rapids.memory.gpu.reserve.|1.0
<a name="memory.gpu.oomDumpDir"></a>spark.rapids.memory.gpu.oomDumpDir|The path to a local directory where a heap dump will be created if the GPU encounters an unrecoverable out-of-memory (OOM) error. The filename will be of the form: "gpu-oom-<pid>.hprof" where <pid> is the process ID.|None
-<a name="memory.gpu.pool"></a>spark.rapids.memory.gpu.pool|Select the RMM pooling allocator to use. Valid values are "DEFAULT", "ARENA", and "NONE". With "DEFAULT", `rmm::mr::pool_memory_resource` is used; with "ARENA", `rmm::mr::arena_memory_resource` is used. If set to "NONE", pooling is disabled and RMM just passes through to CUDA memory allocation directly.|ARENA
+<a name="memory.gpu.pool"></a>spark.rapids.memory.gpu.pool|Select the RMM pooling allocator to use. Valid values are "DEFAULT", "ARENA", and "NONE". With "DEFAULT", `rmm::mr::pool_memory_resource` is used; with "ARENA", `rmm::mr::arena_memory_resource` is used. If set to "NONE", pooling is disabled and RMM just passes through to CUDA memory allocation directly. Note: "ARENA" is the recommended pool allocator if CUDF is built with Per-thread Default Stream (PTDS), as "DEFAULT" is known to be unstable (https://github.com/NVIDIA/spark-rapids/issues/1141)|ARENA
<a name="memory.gpu.pooling.enabled"></a>spark.rapids.memory.gpu.pooling.enabled|Should RMM act as a pooling allocator for GPU memory, or should it just pass through to CUDA memory allocation directly. DEPRECATED: please use spark.rapids.memory.gpu.pool instead.|true
<a name="memory.gpu.reserve"></a>spark.rapids.memory.gpu.reserve|The amount of GPU memory that should remain unallocated by RMM and left for system use such as memory needed for kernels, kernel launches or JIT compilation.|1073741824
<a name="memory.host.spillStorageSize"></a>spark.rapids.memory.host.spillStorageSize|Amount of off-heap host memory to use for buffering spilled GPU data before spilling to local disk|1073741824
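The `spark.rapids.memory.gpu.pool` row describes three accepted values, each selecting a different RMM memory resource, with `ARENA` as the default. A minimal Scala sketch of that mapping follows, assuming values are matched case-insensitively (as the `equalsIgnoreCase` checks in `GpuDeviceManager` do). `PoolChoice`, `parse`, `fromConf`, and `memoryResource` are hypothetical names for illustration, not spark-rapids APIs.

```scala
import java.util.Locale

// Hypothetical model of the values documented for spark.rapids.memory.gpu.pool.
// None of these names exist in spark-rapids; they only illustrate the table row.
sealed trait PoolChoice
case object DefaultPool extends PoolChoice // "DEFAULT" -> rmm::mr::pool_memory_resource
case object ArenaPool extends PoolChoice   // "ARENA"   -> rmm::mr::arena_memory_resource
case object NoPool extends PoolChoice      // "NONE"    -> no pooling, plain CUDA allocation

object PoolChoice {
  val ConfKey = "spark.rapids.memory.gpu.pool"
  val DefaultValue = "ARENA" // default listed in the config table

  // Case-insensitive parse; unrecognized values yield None.
  def parse(value: String): Option[PoolChoice] =
    value.toUpperCase(Locale.ROOT) match {
      case "DEFAULT" => Some(DefaultPool)
      case "ARENA"   => Some(ArenaPool)
      case "NONE"    => Some(NoPool)
      case _         => None
    }

  // Look the key up in a plain settings map, falling back to the documented default.
  def fromConf(settings: Map[String, String]): Option[PoolChoice] =
    parse(settings.getOrElse(ConfKey, DefaultValue))

  // RMM memory resource named in the docs for each choice; NONE uses no pool resource.
  def memoryResource(choice: PoolChoice): Option[String] = choice match {
    case DefaultPool => Some("rmm::mr::pool_memory_resource")
    case ArenaPool   => Some("rmm::mr::arena_memory_resource")
    case NoPool      => None
  }
}
```

For example, `PoolChoice.fromConf(Map.empty)` yields `Some(ArenaPool)` because the missing key falls back to the `ARENA` default.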
@@ -204,6 +204,11 @@ object GpuDeviceManager extends Logging {
     if (conf.isPooledMemEnabled) {
       init = conf.rmmPool match {
         case c if "default".equalsIgnoreCase(c) =>
+          if (Cuda.isPtdsEnabled) {
+            logWarning("Configuring the DEFAULT allocator with a CUDF built for " +
+              "Per-Thread Default Stream (PTDS). This is known to be unstable! " +
+              "We recommend you use the ARENA allocator when PTDS is enabled.")
+          }
           features += "POOLED"
           init | RmmAllocationMode.POOL
         case c if "arena".equalsIgnoreCase(c) =>
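The added lines only warn; they do not change which allocator is chosen, and DEFAULT still gets the POOL mode bit. A minimal, testable Scala sketch of that branch follows. It assumes the PTDS flag is passed in as a plain `Boolean` (standing in for `Cuda.isPtdsEnabled`), returns the warning instead of calling `logWarning`, and uses placeholder bit values for the allocation-mode flags. `PoolModeSketch`, `selectMode`, and its constants are hypothetical, and the bit values are not taken from `RmmAllocationMode`.

```scala
// Hypothetical sketch of the pool-selection branch in GpuDeviceManager.
// The flag values below are placeholders, not RmmAllocationMode's real constants.
object PoolModeSketch {
  val CudaDefault: Int = 0
  val Pool: Int = 1 << 0
  val Arena: Int = 1 << 2

  val PtdsWarning: String =
    "Configuring the DEFAULT allocator with a CUDF built for " +
      "Per-Thread Default Stream (PTDS). This is known to be unstable! " +
      "We recommend you use the ARENA allocator when PTDS is enabled."

  // Returns the combined mode bits plus an optional warning; the caller decides
  // whether to log it. ptdsEnabled stands in for Cuda.isPtdsEnabled.
  def selectMode(pool: String, ptdsEnabled: Boolean, init: Int = CudaDefault): (Int, Option[String]) =
    pool match {
      case c if "default".equalsIgnoreCase(c) =>
        // DEFAULT is still honored under PTDS; only a warning is attached.
        (init | Pool, if (ptdsEnabled) Some(PtdsWarning) else None)
      case c if "arena".equalsIgnoreCase(c) =>
        (init | Arena, None)
      case _ =>
        // "NONE" and unrecognized values: this sketch leaves init unchanged.
        (init, None)
    }
}
```

Returning the warning rather than logging it keeps the sketch testable without Spark's `Logging` trait: `selectMode("DEFAULT", ptdsEnabled = true)` yields the POOL bit together with the PTDS warning, while `selectMode("arena", ptdsEnabled = true)` yields the ARENA bit and no warning.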
@@ -371,7 +371,9 @@ object RapidsConf {
     .doc("Select the RMM pooling allocator to use. Valid values are \"DEFAULT\", \"ARENA\", and " +
       "\"NONE\". With \"DEFAULT\", `rmm::mr::pool_memory_resource` is used; with \"ARENA\", " +
       "`rmm::mr::arena_memory_resource` is used. If set to \"NONE\", pooling is disabled and RMM " +
-      "just passes through to CUDA memory allocation directly.")
+      "just passes through to CUDA memory allocation directly. Note: \"ARENA\" is the " +
+      "recommended pool allocator if CUDF is built with Per-Thread Default Stream (PTDS), " +
+      "as \"DEFAULT\" is known to be unstable (https://github.com/NVIDIA/spark-rapids/issues/1141)")
     .stringConf
     .createWithDefault("ARENA")
