Tweak RAPIDS Shuffle Manager configs for 21.08 (#2994)
* Add the CDH shuffle manager, which was missing from the docs, and remove the reference to the Spark 3.2.0 shuffle manager until it is supported

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

* Make UCX configuration clearer, by specifying minimum and recommended settings

* Move UCX_IB_RX_QUEUE_LEN to recommended list. Reorder for clarity

* Review comments: Make the shim part of the package generic
abellina authored Jul 22, 2021
1 parent ca51ed2 commit 2ddb450
Showing 1 changed file with 34 additions and 15 deletions.
49 changes: 34 additions & 15 deletions docs/additional-functionality/rapids-shuffle.md
@@ -290,23 +290,42 @@ In this section, we are using a docker container built using the sample dockerfi
| 3.0.3 | com.nvidia.spark.rapids.spark303.RapidsShuffleManager |
| 3.0.4 | com.nvidia.spark.rapids.spark304.RapidsShuffleManager |
| 3.1.1 | com.nvidia.spark.rapids.spark311.RapidsShuffleManager |
| 3.1.1 CDH | com.nvidia.spark.rapids.spark311cdh.RapidsShuffleManager |
| 3.1.2 | com.nvidia.spark.rapids.spark312.RapidsShuffleManager |
| 3.1.3 | com.nvidia.spark.rapids.spark313.RapidsShuffleManager |
| 3.2.0 | com.nvidia.spark.rapids.spark320.RapidsShuffleManager |

2. Recommended settings for UCX 1.10.1+
```shell
...
--conf spark.shuffle.manager=com.nvidia.spark.rapids.spark301.RapidsShuffleManager \
--conf spark.shuffle.service.enabled=false \
--conf spark.executorEnv.UCX_TLS=cuda_copy,cuda_ipc,rc,tcp \
--conf spark.executorEnv.UCX_ERROR_SIGNALS= \
--conf spark.executorEnv.UCX_RNDV_SCHEME=put_zcopy \
--conf spark.executorEnv.UCX_MAX_RNDV_RAILS=1 \
--conf spark.executorEnv.UCX_MEMTYPE_CACHE=n \
--conf spark.executorEnv.UCX_IB_RX_QUEUE_LEN=1024 \
--conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR}
```

2. Settings for UCX 1.10.1+:

Minimum configuration:

```shell
...
--conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
--conf spark.shuffle.service.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
--conf spark.executorEnv.UCX_ERROR_SIGNALS= \
--conf spark.executorEnv.UCX_MEMTYPE_CACHE=n
```

Recommended configuration:

```shell
...
--conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
--conf spark.shuffle.service.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
--conf spark.executorEnv.UCX_ERROR_SIGNALS= \
--conf spark.executorEnv.UCX_MEMTYPE_CACHE=n \
--conf spark.executorEnv.UCX_IB_RX_QUEUE_LEN=1024 \
--conf spark.executorEnv.UCX_TLS=cuda_copy,cuda_ipc,rc,tcp \
--conf spark.executorEnv.UCX_RNDV_SCHEME=put_zcopy \
--conf spark.executorEnv.UCX_MAX_RNDV_RAILS=1
```

Please replace `[shim package]` with the appropriate value. For example, the full class name for
Apache Spark 3.1.3 is: `com.nvidia.spark.rapids.spark313.RapidsShuffleManager`.
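
As a rough sketch of that substitution, the helper below builds the full class name from a Spark version string. The function name `shuffle_manager_class` is hypothetical and not part of the plugin. It assumes the shim package is `spark` followed by the version digits and an optional vendor suffix, which matches the rows shown in the table above (for example `spark311cdh`) but may not hold for every release.

```shell
# Hypothetical helper, not part of the RAPIDS plugin.
# Assumption: shim package = "spark" + version digits + optional vendor suffix.
shuffle_manager_class() {
  # $1: Spark version, e.g. 3.1.3
  # $2: optional vendor suffix, e.g. cdh (empty for Apache Spark)
  digits=$(printf '%s' "$1" | tr -d '.')
  printf 'com.nvidia.spark.rapids.spark%s%s.RapidsShuffleManager\n' "$digits" "${2:-}"
}

# Build the --conf argument for Apache Spark 3.1.3.
SHUFFLE_MANAGER="$(shuffle_manager_class 3.1.3)"
echo "--conf spark.shuffle.manager=${SHUFFLE_MANAGER}"
```

Always check the result against the table of supported versions, since a derived name for an unsupported version will not resolve to a real class.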

Please note `LD_LIBRARY_PATH` should optionally be set if the UCX library is installed in a
non-standard location.
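
One possible way to do this, shown only as a sketch, is to pass the variable to executors through Spark's `spark.executorEnv.*` mechanism, the same one used for the `UCX_*` variables. The install prefix `/opt/ucx/lib` is a hypothetical example, and whether the variable is best set this way or in the container image depends on the deployment.

```shell
# Hypothetical UCX install location; replace with the actual library directory.
UCX_LIB_DIR="/opt/ucx/lib"

# Extra argument to append to the spark-submit/spark-shell command line.
UCX_LIB_CONF="--conf spark.executorEnv.LD_LIBRARY_PATH=${UCX_LIB_DIR}"
echo "${UCX_LIB_CONF}"
```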