PythonRunner Changes [databricks] #10274

razajafri · 2024-01-25T01:16:47Z

This is a backport of the changes from branch-24.02 which are required for the plugin to build on Databricks 11.3 after changes from apache/spark#42385 were ported over by the Databricks team.

This PR consists of the following clean cherry-picks, which were applied in order from bottom to top.

fb8537cb2 (HEAD -> python-runner, private/python-runner) Fixed 330db Shims to Adopt the PythonRunner Changes [databricks] (#10232)
f8d4c7384 Do some refactor for the Python UDF code to try to reduce duplicate code. (#9902)
1936ae796 Fix a potential data corruption for Pandas UDF (#9942)
c6496bd73 Fix a hang for Pandas UDFs on DB 13.3[databricks] (#9833)
764a923b4 Download Maven from apache.org archives (#10225)

Some changes may not be needed, but I left them in so that each cherry-pick applies cleanly.

Fixes NVIDIA#10224

Replace broken install using apt by downloading Maven from apache.org.

Signed-off-by: Gera Shegalov <gera@apache.org>

fix NVIDIA#9493

fix NVIDIA#9844

The Python runner uses two separate threads to write data to and read data from the Python processes. On DB 13.3, however, it becomes single-threaded, so reading and writing run on the same thread, and the first read now always happens before the first write. The original BatchQueue waits on the first read until the first write is done, so it waits forever.

Changes made:

- Update the BatchQueue to support asking for a batch instead of waiting until one is inserted into the queue. This eliminates the ordering requirement between reading and writing.
- Introduce a new class named BatchProducer that works with the new BatchQueue to support peeking at the row count on demand on the reading side.
- Apply the new BatchQueue to the relevant plans.
- Update the Python runners to support writing one batch at a time for the single-threaded model.
- Found an issue with PythonUDAF and RunningWindowFunctionExec that may be a bug specific to DB 13.3, and added a test (test_window_aggregate_udf_on_cpu) for it.
- Other small refactors.

---------

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
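The ordering problem described above can be illustrated with a small sketch. This is not the plugin's implementation: `Batch`, `OnDemandProducer`, and all method names are hypothetical, and locking is left out on purpose to mirror the single-threaded model. The point it shows is that a reader asking for a row count pulls a batch from the input itself when nothing has been written yet, rather than waiting for a writer that can never run on the same thread.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a columnar batch; only the row count matters here.
final case class Batch(numRows: Int)

// Sketch only: names and structure are illustrative, not the plugin's classes.
final class OnDemandProducer(input: Iterator[Batch]) {
  // Batches whose results have not yet been read back from Python.
  private val awaitingResult = mutable.Queue.empty[Batch]
  // Batches pulled early by the reader that the writer still has to send.
  private val pulledEarly = mutable.Queue.empty[Batch]

  // Writer side: the next batch to send, preferring ones the reader pulled.
  def nextToWrite(): Option[Batch] =
    if (pulledEarly.nonEmpty) Some(pulledEarly.dequeue())
    else if (input.hasNext) {
      val b = input.next()
      awaitingResult.enqueue(b)
      Some(b)
    } else None

  // Reader side: row count of the oldest outstanding batch. If the writer
  // has not run yet, pull a batch from the input instead of blocking.
  def peekNumRows(): Option[Int] = {
    if (awaitingResult.isEmpty && input.hasNext) {
      val b = input.next()
      awaitingResult.enqueue(b)
      pulledEarly.enqueue(b)
    }
    awaitingResult.headOption.map(_.numRows)
  }

  // Reader side: drop the oldest batch once its result has been consumed.
  def finishOne(): Option[Batch] =
    if (awaitingResult.nonEmpty) Some(awaitingResult.dequeue()) else None
}
```

With this shape, calling `peekNumRows()` before any `nextToWrite()` returns immediately, and the batch it pulled is still handed to the writer afterwards, so either call order works.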

This PR moves the BatchQueue into the DataProducer so that it shares the same lock as the output iterator returned by asIterator, and makes moving a batch from the input iterator to the batch queue an atomic operation, which eliminates the race when appending batches to the queue.
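As a rough illustration of why the move has to be atomic, the sketch below guards both the input iterator and the queue with one lock, so that taking an element and appending it cannot be interleaved with another thread doing the same. `AtomicMover` and `AtomicMoveDemo` are hypothetical names used only for this example, and plain integers stand in for batches. If the two steps were not under a single lock, one thread could take the first element while another takes and enqueues the second one first, leaving the queue in a different order from the input.

```scala
import scala.collection.mutable

// Hypothetical holder, not the plugin's class: the input iterator and the
// queue are both guarded by one lock (this object's monitor).
final class AtomicMover[A](input: Iterator[A]) {
  private val queue = mutable.Queue.empty[A]

  // Take the next input element and append it to the queue in one atomic step.
  def moveOne(): Option[A] = synchronized {
    if (input.hasNext) {
      val a = input.next()
      queue.enqueue(a)
      Some(a)
    } else None
  }

  // Copy of the queue contents, taken under the same lock.
  def snapshot(): List[A] = synchronized(queue.toList)
}

object AtomicMoveDemo {
  // Drain 1..n with two competing threads and return the resulting queue order.
  def drainWithTwoThreads(n: Int): List[Int] = {
    val mover = new AtomicMover((1 to n).iterator)
    val workers = (1 to 2).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = while (mover.moveOne().isDefined) {}
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    mover.snapshot()
  }
}
```

Because every take-and-append happens under the same monitor, the queue order matches the input order regardless of how the two threads are scheduled.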

…ode. (NVIDIA#9902) Signed-off-by: Firestarman <firestarmanllc@gmail.com>

…DIA#10232)

This PR removes the old 330db shims in favor of the new shims, similar to the ones in 341db.

**Tests:** Ran udf_test.py on Databricks 11.3 and all tests passed.

fixes NVIDIA#10228

---------

Signed-off-by: raza jafri <rjafri@nvidia.com>

razajafri · 2024-01-25T16:30:47Z

build

jlowe · 2024-01-26T16:38:58Z

...spark311/scala/org/apache/spark/sql/rapids/execution/python/shims/GpuArrowPythonRunner.scala

+{"spark": "331"}
+{"spark": "332"}
+{"spark": "332cdh"}
+{"spark": "332db"}


Curious why this version of Databricks is here -- did it really not get the same changes but the version before and after did?

Ah, Databricks 12.2 doesn't have this change yet. Sounds like we're about to get broken again when it eventually does. Guess we'll backport to that one in the future when that happens.


gerashegalov and others added 5 commits January 24, 2024 18:42

Download Maven from apache.org archives (NVIDIA#10225)

764a923

Fixes NVIDIA#10224 Replace broken install using apt by downloading Maven from apache.org. Signed-off-by: Gera Shegalov <gera@apache.org>

Do some refactor for the Python UDF code to try to reduce duplicate c…

f8d4c73

…ode. (NVIDIA#9902) Signed-off-by: Firestarman <firestarmanllc@gmail.com>

razajafri requested review from jlowe, revans2, tgravescs, GaryShen2008 and NvTimLiu as code owners January 25, 2024 01:16

sameerz added the task Work required that improves the product but is not user facing label Jan 25, 2024

jlowe reviewed Jan 26, 2024


jlowe approved these changes Jan 26, 2024


razajafri merged commit dacc6fe into NVIDIA:branch-23.12 Jan 26, 2024
37 of 38 checks passed

razajafri deleted the python-runner branch January 26, 2024 17:00

NvTimLiu mentioned this pull request Jan 30, 2024

Upgrade version to 23.12.2-SNAPSHOT [databricks] #10323

Merged

NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this pull request Jan 30, 2024

update download page for v23.12.2 release

e788b9c

update download page to v23.12.2 for the Databricks hotfix: NVIDIA#10274 Signed-off-by: Tim Liu <timl@nvidia.com>

NvTimLiu mentioned this pull request Jan 30, 2024

update download page for v23.12.2 release [skip ci] #10329

Merged

NvTimLiu added a commit that referenced this pull request Jan 31, 2024

update download page for v23.12.2 release (#10329)

353c025

update download page to v23.12.2 for the Databricks hotfix: #10274 Signed-off-by: Tim Liu <timl@nvidia.com>

razajafri mentioned this pull request Feb 16, 2024

Plug-in Build Failing for Databricks 11.3 #10432

Closed

razajafri added a commit to razajafri/spark-rapids that referenced this pull request Feb 16, 2024

Revert "PythonRunner Changes [databricks] (NVIDIA#10274)"

55661be

This reverts commit dacc6fe.
