Update PandasUDF doc #2089

Merged 13 commits into NVIDIA:branch-0.5 on Apr 20, 2021

Conversation

@wjxiz1992 (Collaborator) commented Apr 7, 2021

Fixes #2053.
Update the Pandas UDF doc with a more detailed description.

Signed-off-by: Allen Xu wjxiz1992@gmail.com


- **Share GPU with JVM**: Let the Python process share JVM GPU. The Python process could run on the
same GPU with JVM.
- **GPU Assignment(Scheduling) in Python Process**: Let the Python process share the same GPU with Spark executor JVM. Without this feature, some use case in PandasUDF(a.k.a an `independent` process) will likely to use other GPUs other than the one we want it to run on. e.g. user can launch a TensorFlow session inside Pandas UDF and the machine contains 8 GPUs. user launchs 8 Spark executors. Without this GPU sharing feature, TensorFlow will automatically use all 8 GPUs it can detects which will definitly conflict with existing Spark executor JVM processes.
Collaborator:

Suggested change
- **GPU Assignment(Scheduling) in Python Process**: Let the Python process share the same GPU with Spark executor JVM. Without this feature, some use case in PandasUDF(a.k.a an `independent` process) will likely to use other GPUs other than the one we want it to run on. e.g. user can launch a TensorFlow session inside Pandas UDF and the machine contains 8 GPUs. user launchs 8 Spark executors. Without this GPU sharing feature, TensorFlow will automatically use all 8 GPUs it can detects which will definitly conflict with existing Spark executor JVM processes.
- **GPU Assignment(Scheduling) in Python Process**: Let the Python process share the same GPU with Spark executor JVM. Without this feature, in a non-isolated environment, some use cases with PandasUDF (a.k.a an `independent` process) can try to use GPUs other than the one we want it to run on. For example, the user could launch a TensorFlow session inside Pandas UDF and the machine contains 8 GPUs. Without this GPU sharing feature, TensorFlow will automatically use all 8 GPUs which will conflict with existing Spark executor JVM processes.

Collaborator Author:

Done.

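A minimal, hypothetical sketch (not from the PR or the doc under review) of the scenario described in this thread: a Pandas UDF whose Python worker initializes a GPU framework. The function name `gpu_score` and the `CUDA_VISIBLE_DEVICES` check are illustrative assumptions only; with GPU scheduling for Pandas UDF enabled, the worker is expected to see just the executor's GPU rather than every device on the node.

```python
# Hypothetical illustration of the GPU assignment scenario discussed above.
# Assumes PySpark >= 3.0; the environment-variable check is an assumption,
# not a documented plugin guarantee.
import os
import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("double")
def gpu_score(v: pd.Series) -> pd.Series:
    # Without GPU sharing/assignment, a framework such as TensorFlow started
    # here could see and try to claim every GPU on the node, conflicting with
    # the Spark executor JVMs. With the feature enabled, the Python worker
    # should only see the executor's GPU.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>")
    print(f"Python worker visible GPUs: {visible}")  # goes to the worker log
    return v * 2.0  # placeholder for real GPU work
```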

```
--conf spark.rapids.sql.exec.ArrowEvalPythonExec=true \
--conf spark.rapids.sql.exec.MapInPandasExec=false \
--conf spark.rapids.sql.exec.WindowInPandasExec=true
```

Collaborator:

did you mean to have these false? or are these false due to performance?

Collaborator Author:

Removed these and added a status table for them for a better view.

These configs are the switches for each type of PandasUDF execution plan. Some of theme are set to false by default due to not supported or performance issue.
Collaborator:

s/theme/them/

Collaborator Author:

Removed as suggested.

```
--conf spark.rapids.python.memory.gpu.allocFraction=0.1 \
--conf spark.rapids.python.memory.gpu.maxAllocFraction= 0.2 \
```
Same to the [RMM pooling for JVM](../tuning-guide.md#pooled-memory), here the pooling serves the same way but for Python process. `half of the rest GPU memory` will be used by default if it is not specified.
Collaborator:

To be clear here, is this saying the Python process will assume it can use half the memory of the GPU? I think we should mention the other spark.rapids.memory.gpu.allocFraction setting here.

Collaborator Author:

Updated.

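A hedged sketch of how the Python-side pool settings in this snippet could be set alongside the JVM-side spark.rapids.memory.gpu.allocFraction mentioned in the review comment. The values are arbitrary placeholders rather than recommendations, and the snippet is not taken from the doc under review.

```python
# Illustrative only: sets both the JVM-side RMM pool fraction and the
# Python-worker pool fractions discussed above. The specific values are
# assumptions, chosen so the JVM pool leaves GPU memory for Python workers.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.rapids.memory.gpu.allocFraction", "0.5")          # JVM pool
    .config("spark.rapids.python.memory.gpu.allocFraction", "0.1")   # Python pool
    .config("spark.rapids.python.memory.gpu.maxAllocFraction", "0.2")
    .getOrCreate()
)
```
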
```
--conf spark.rapids.python.concurrentPythonWorkers=2 \
```
This parameter aims to limit the total concurrent running `Python process` in 1 Spark executor. This parameter is set to 0 by default which means there's not limit for concurrent Python workers. Note that for certain cases, setting this value too small may result a `hang` for your Spark job because a PandasUDF may produces multiple python process and each will try to acquire the python GPU process semaphore. This may bring a dead lock situation becasue a Spark job will not preceed until all its tasks are finished. For example, in the Pandas UDF stage, 2 tasks are running and each task launches 3 Python process and we set this parameter to 4.
Collaborator:

I'm a little confused by this: how do I know how many Python processes I need? On Spark I usually get one Python process per task. Can we add more detail here?

Collaborator Author:

I made a wrong description here, sorry for that. I discussed this with Liangcai today and restated it with a more specific example.

@wjxiz1992 (Collaborator Author):

Hi @jlowe @tgravescs @firestarman, I updated the doc; the main modifications are:

  1. Add an Exec support status table; this aims to help users know which exec switch they should turn on and see the support status.
  2. Remove unnecessary parts.
  3. Refine the concurrentPythonWorkers part with a more detailed example.

Item 3 is the most difficult to explain as it relies heavily on the implementation; please help check whether that part is clear for readers.
Thanks a lot!

| Spark Execution Plan | Use Case | Status |
|----------------------|----------|--------|
|ArrowEvalPythonExec|[Series to Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#series-to-series), [Iterator of Series to Iterator of Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#series-to-series) and [Iterator of Multiple Series to Iterator of Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#iterator-of-multiple-series-to-iterator-of-series)| supported|
|MapInPandasExec| [Map](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#map)| supported|
| WindowInPandasExec | [Window](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#series-to-scalar)| supported|
Collaborator:

nit: looks like we have extra spaces in a couple of these; we may also switch that column to be second and have the use case third.

Collaborator Author:

Switched.

Accelerator has a 1-1 mapping support for each of them. Not all PandasUDF types are data-transfer
accelerated at present:

| Spark Execution Plan | Use Case | Status |
Collaborator:

Can we change Status to be "data transfer accelerated" or "accelerated", just to be more clear?

Collaborator Author:

Used "data transfer accelerated".

![Python concurrent worker](/docs/img/concurrentPythonWorker.PNG)

In this case, each PandasUDF will launch a Python process. At this moment two python process
in each task acquired their semaphore but neither of them are able to proceed becasue both
Collaborator:

because is misspelled

Collaborator Author:

Done.

@jlowe self-requested a review on April 8, 2021 16:52
@jlowe (Member) left a comment:


Just a small update but otherwise lgtm.

@sameerz (Collaborator) left a comment:


Looks good to me, aside from the requested changes.

@sameerz added the documentation (Improvements or additions to documentation) label on Apr 8, 2021
@wjxiz1992 (Collaborator Author):

Added one more example for GpuArrowEvalPython as suggested by the team, along with some doc cleanup.

@firestarman (Collaborator):

LGTM

@jlowe previously approved these changes Apr 9, 2021
1. Make sure GPU exclusive mode is disabled. Note that this will not work if you are using exclusive
mode to assign GPUs under spark.
2. Currently the python files are packed into the spark rapids plugin jar.
1. Make sure GPU `exclusive` mode is disabled. Note that this will not work if you are using exclusive
Contributor:

For On-prem YARN instructions, we add a line "refer to nvidia-smi documentation".

Do we need to add the link or sample like this?
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS # Set GPU 0 to exclusive mode, run as root.

Also, if this is mandatory for YARN, should we say that more explicitly?

Collaborator Author:

Thanks for the review!
Added a line to show the command to set the GPU to Default mode.
I think there is no need to say whether it is for YARN or another environment. The mode here only applies to the GPU hardware: if the GPU is in Default mode, it stays in Default mode regardless of whether Spark runs on YARN, Standalone, or Local.

same GPU with JVM.
- **GPU Assignment(Scheduling) in Python Process**: Let the Python process share the same GPU with
Spark executor JVM. Without this feature, in a non-isolated environment, some use cases with
PandasUDF (a.k.a an `independent` process) can try to use GPUs other than the one we want it to
Collaborator:

Suggested change
PandasUDF (a.k.a an `independent` process) can try to use GPUs other than the one we want it to
Pandas UDF (an `independent` python daemon process) can try to use GPUs other than the one we want it to

@@ -174,31 +178,102 @@ To enable _GPU Scheduling for Pandas UDF_, you need to configure your spark job
```
--py-files ${SPARK_RAPIDS_PLUGIN_JAR}
```
Collaborator:

Could you also change the standalone part's rapids-4-spark_2.12-0.5.0-SNAPSHOT.jar to ${SPARK_RAPIDS_PLUGIN_JAR} ?


Please note the data transfer acceleration only supports scalar UDF and Scalar iterator UDF currently.
You could choose the exec you need to enable.
Please note: every type of PandasUDF on Spark is run by a specific Spark execution plan. RAPIDS
Collaborator:

Suggested change
Please note: every type of PandasUDF on Spark is run by a specific Spark execution plan. RAPIDS
Please note: every type of Pandas UDF on Spark is run by a specific Spark execution plan. RAPIDS

Please note the data transfer acceleration only supports scalar UDF and Scalar iterator UDF currently.
You could choose the exec you need to enable.
Please note: every type of PandasUDF on Spark is run by a specific Spark execution plan. RAPIDS
Accelerator has a 1-1 mapping support for each of them. Not all PandasUDF types are data-transfer
Collaborator:

Suggested change
Accelerator has a 1-1 mapping support for each of them. Not all PandasUDF types are data-transfer
Accelerator has a 1-1 mapping support for each of them. Not all Pandas UDF types are data-transfer


| Spark Execution Plan|Data Transfer Accelerated|Use Case|
|----------------------|----------|--------|
|ArrowEvalPythonExec|yes|[Series to Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#series-to-series), [Iterator of Series to Iterator of Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#series-to-series) and [Iterator of Multiple Series to Iterator of Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#iterator-of-multiple-series-to-iterator-of-series)| supported|
Collaborator:

Suggested change
|ArrowEvalPythonExec|yes|[Series to Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#series-to-series), [Iterator of Series to Iterator of Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#series-to-series) and [Iterator of Multiple Series to Iterator of Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#iterator-of-multiple-series-to-iterator-of-series)| supported|
|ArrowEvalPythonExec|yes|[Series to Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#series-to-series), [Iterator of Series to Iterator of Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#iterator-of-series-to-iterator-of-series) and [Iterator of Multiple Series to Iterator of Series](https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html#iterator-of-multiple-series-to-iterator-of-series)| supported|

deadlock situation because a Spark job will not proceed until all its tasks are finished.

For example, in a specific Spark Stage that contais 3 PandasUDFs, 2 Spark tasks are running and
each task launches 3 Python process while we set this `concurrentPythonWorkers` to 4.
Collaborator:

Suggested change
each task launches 3 Python process while we set this `concurrentPythonWorkers` to 4.
each task launches 3 Python processes while we set this `spark.rapids.python.concurrentPythonWorkers` to 4.


![Python concurrent worker](../img/concurrentPythonWorker.PNG)

In this case, each PandasUDF will launch a Python process. At this moment two Python process
Collaborator:

Suggested change
In this case, each PandasUDF will launch a Python process. At this moment two Python process
In this case, each Pandas UDF will launch a Python process. At this moment two Python processes

in each task(in light green) acquired their semaphore but neither of them are able to proceed
because both of them are waiting for their third semaphore to start the task.

Another example is to use ArrowEvalPythonExec, with the following code:
Collaborator:

Suggested change
Another example is to use ArrowEvalPythonExec, with the following code:
Another example is to use `ArrowEvalPythonExec` with the following code:

```
+- GpuArrowEvalPython
```
This means each Spark task will trigger 2 Python processes. In this case, if we set
`concurrentPythonWorkers=2`, it will also probably result in a hang as we allow 2 tasks running
Collaborator:

Suggested change
`concurrentPythonWorkers=2`, it will also probably result in a hang as we allow 2 tasks running
`spark.rapids.python.concurrentPythonWorkers=2`, it will also probably result in a hang as we allow 2 tasks running

This means each Spark task will trigger 2 Python processes. In this case, if we set
`concurrentPythonWorkers=2`, it will also probably result in a hang as we allow 2 tasks running
and each of them has 2 Python processes. Let's say Task_1_Process_1 and Task_2_Process_1
Collaborator:

Suggested change
and each of them has 2 Python processes. Let's say Task_1_Process_1 and Task_2_Process_1
and each of them spawns 2 Python processes. Let's say Task_1_Process_1 and Task_2_Process_1

Collaborator Author:

Thanks for the review! Updated.
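
For readers following this thread, here is a hypothetical PySpark sketch of the kind of query being discussed: two dependent Pandas UDFs which, depending on how Spark plans the stage, can end up in separate (Gpu)ArrowEvalPython nodes and therefore drive more than one Python worker per task. The UDF names `plus_one` and `times_two` are illustrative assumptions, not the snippet from the doc.

```python
# Hypothetical sketch: two dependent Pandas UDFs in one query. Depending on
# how Spark plans the stage, each task can end up driving two Python workers,
# which is the situation the concurrentPythonWorkers discussion is about.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, col

spark = SparkSession.builder.getOrCreate()

@pandas_udf("long")
def plus_one(v: pd.Series) -> pd.Series:
    return v + 1

@pandas_udf("long")
def times_two(v: pd.Series) -> pd.Series:
    return v * 2

df = spark.range(0, 1000)
# times_two consumes the output of plus_one, so the two UDFs cannot be
# evaluated together in a single Python worker invocation.
out = df.select(times_two(plus_one(col("id"))).alias("out"))
out.explain()  # with the plugin enabled, look for GpuArrowEvalPython nodes
```

Under that assumption, setting spark.rapids.python.concurrentPythonWorkers to 2 while two such tasks run concurrently could leave each task holding one worker slot and waiting for a second, which is the hang scenario described above.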

@sameerz previously approved these changes Apr 13, 2021
@wjxiz1992 requested a review from viadea on April 16, 2021 01:57
@viadea previously approved these changes Apr 18, 2021
@sameerz (Collaborator) commented Apr 20, 2021:

build

@sameerz merged commit a50f9dd into NVIDIA:branch-0.5 on Apr 20, 2021
@viadea (Collaborator) commented May 14, 2021:

@firestarman @wjxiz1992 Just found a possible typo here: `spark.rapids.python.gpu.enabled` should be changed to `spark.rapids.sql.python.gpu.enabled`. If you agree, do you want to create a PR to correct the 0.5 and 0.6 docs? Thanks.

@revans2 (Collaborator) commented May 14, 2021:

> @firestarman @wjxiz1992 Just found a possible typo here: `spark.rapids.python.gpu.enabled` should be changed to `spark.rapids.sql.python.gpu.enabled`. If you agree, do you want to create a PR to correct the 0.5 and 0.6 docs? Thanks.

Yes, it should be `spark.rapids.sql.python.gpu.enabled`. That is what is in the code for the config.

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Update PandasUDF doc

Update Pandas UDF doc with more details description

Signed-off-by: Allen Xu <wjxiz1992@gmail.com>

* Resolve comments

* More doc clean

* doc clean and table reformat

* doc clean

* doc clean

* Doc update

* resolve comments

* Resolve comments

* Resolve comments

* resolve comments

* resolve comments

* Resolve comments
Labels: documentation (Improvements or additions to documentation)

Successfully merging this pull request may close these issues.

[DOC] pandas udf section name confusing
8 participants