Add docs about Spark 3.1 in standalone modes not needing extra class path (NVIDIA#1699)

* Add comment about Spark 3.1 in standalone modes not needing extra class path

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* change to use ,

* clarify jar name
tgravescs authored Feb 10, 2021
1 parent bab272f commit 62823d9
Showing 1 changed file with 11 additions and 4 deletions.
15 changes: 11 additions & 4 deletions docs/get-started/getting-started-on-prem.md
@@ -114,6 +114,7 @@ machine or multiple machines for distributed setup.
The first step is to [Install Spark](#install-spark), the
[RAPIDS Accelerator for Spark jars](#download-the-rapids-jars), and the
[GPU discovery script](#install-the-gpu-discovery-script) on all the nodes you want to use.
See the note at the end of this section if using Spark 3.1.1 or above.
After that choose one of the nodes to be your master node and start the master. Note that the
master process does **not** need a GPU to function properly.
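
The startup steps above can be sketched as follows. This is a minimal illustration, assuming the Spark 3.1 `sbin` script names and an assumed discovery-script install path; adjust both for your environment:

```shell
# On the node chosen as master (no GPU required):
$SPARK_HOME/sbin/start-master.sh

# On each worker node, advertise its GPU(s) to the master.
# /opt/sparkRapidsPlugin/getGpusResources.sh is a hypothetical install
# location for the discovery script from the earlier step.
export SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=1 \
  -Dspark.worker.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh"
$SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
```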

@@ -176,6 +177,12 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.plugins=com.nvidia.spark.SQLPlugin
```

Note that if you are using Spark 3.1.1 or later, the RAPIDS Accelerator for Apache Spark plugin jar
and the cuDF jar do not need to be installed on every node. In that case the
`spark.executor.extraClassPath` and `spark.driver.extraClassPath` configs in the command above
can be replaced with `--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}`, which automatically
distributes the jars to the nodes for you.
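
As an illustration, a Spark 3.1.1+ standalone launch using `--jars` might look like the following. The jar file names and master host are placeholders, not values taken from this repository:

```shell
# Hypothetical locations; point these at wherever the jars were downloaded.
export SPARK_RAPIDS_DIR=/opt/sparkRapidsPlugin
export SPARK_CUDF_JAR=${SPARK_RAPIDS_DIR}/cudf.jar
export SPARK_RAPIDS_PLUGIN_JAR=${SPARK_RAPIDS_DIR}/rapids-4-spark.jar

# With Spark 3.1.1+, --jars ships both jars to the driver and executors,
# so no extraClassPath settings are required.
$SPARK_HOME/bin/spark-shell \
  --master spark://<master-host>:7077 \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
```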

## Running on YARN

YARN requires you to [Install Spark](#install-spark), the
@@ -223,7 +230,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
```

### YARN 2.10 with Isolation and GPU Scheduling Enabled
@@ -250,7 +257,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
```

### YARN without Isolation
@@ -288,7 +295,7 @@ $SPARK_HOME/bin/spark-shell \
--conf spark.resources.discoveryPlugin=com.nvidia.spark.ExclusiveModeGpuDiscoveryPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
```

## Running on Kubernetes
@@ -367,7 +374,7 @@ $SPARK_HOME/bin/spark-shell --master yarn \
--conf spark.task.resource.gpu.amount=0.166 \
--conf spark.executor.resource.gpu.amount=1 \
--files $SPARK_RAPIDS_DIR/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
```

## Example Join Operation
