Skip to content

Commit

Permalink
Merge pull request #2997 from NVIDIA/branch-21.08
Browse files Browse the repository at this point in the history
[auto-merge] branch-21.08 to branch-21.10 [skip ci] [bot]
  • Loading branch information
nvauto authored Jul 22, 2021
2 parents 269ff40 + 2ddb450 commit 1f01135
Showing 1 changed file with 34 additions and 15 deletions.
49 changes: 34 additions & 15 deletions docs/additional-functionality/rapids-shuffle.md
Original file line number Diff line number Diff line change
Expand Up @@ -290,23 +290,42 @@ In this section, we are using a docker container built using the sample dockerfi
| 3.0.3 | com.nvidia.spark.rapids.spark303.RapidsShuffleManager |
| 3.0.4 | com.nvidia.spark.rapids.spark304.RapidsShuffleManager |
| 3.1.1 | com.nvidia.spark.rapids.spark311.RapidsShuffleManager |
| 3.1.1 CDH | com.nvidia.spark.rapids.spark311cdh.RapidsShuffleManager |
| 3.1.2 | com.nvidia.spark.rapids.spark312.RapidsShuffleManager |
| 3.1.3 | com.nvidia.spark.rapids.spark313.RapidsShuffleManager |
| 3.2.0 | com.nvidia.spark.rapids.spark320.RapidsShuffleManager |

2. Recommended settings for UCX 1.10.1+
```shell
...
--conf spark.shuffle.manager=com.nvidia.spark.rapids.spark301.RapidsShuffleManager \
--conf spark.shuffle.service.enabled=false \
--conf spark.executorEnv.UCX_TLS=cuda_copy,cuda_ipc,rc,tcp \
--conf spark.executorEnv.UCX_ERROR_SIGNALS= \
--conf spark.executorEnv.UCX_RNDV_SCHEME=put_zcopy \
--conf spark.executorEnv.UCX_MAX_RNDV_RAILS=1 \
--conf spark.executorEnv.UCX_MEMTYPE_CACHE=n \
--conf spark.executorEnv.UCX_IB_RX_QUEUE_LEN=1024 \
--conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR}
```

2. Settings for UCX 1.10.1+:

Minimum configuration:

```shell
...
--conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
--conf spark.shuffle.service.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
--conf spark.executorEnv.UCX_ERROR_SIGNALS= \
--conf spark.executorEnv.UCX_MEMTYPE_CACHE=n
```

Recommended configuration:

```shell
...
--conf spark.shuffle.manager=com.nvidia.spark.rapids.[shim package].RapidsShuffleManager \
--conf spark.shuffle.service.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.executor.extraClassPath=${SPARK_CUDF_JAR}:${SPARK_RAPIDS_PLUGIN_JAR} \
--conf spark.executorEnv.UCX_ERROR_SIGNALS= \
--conf spark.executorEnv.UCX_MEMTYPE_CACHE=n \
--conf spark.executorEnv.UCX_IB_RX_QUEUE_LEN=1024 \
--conf spark.executorEnv.UCX_TLS=cuda_copy,cuda_ipc,rc,tcp \
--conf spark.executorEnv.UCX_RNDV_SCHEME=put_zcopy \
--conf spark.executorEnv.UCX_MAX_RNDV_RAILS=1
```

Please replace `[shim package]` with the appropriate value. For example, the full class name for
Apache Spark 3.1.3 is: `com.nvidia.spark.rapids.spark313.RapidsShuffleManager`.

Please note `LD_LIBRARY_PATH` should optionally be set if the UCX library is installed in a
non-standard location.
Expand Down

0 comments on commit 1f01135

Please sign in to comment.