Build python output schema from udf expressions #1794

firestarman · 2021-02-23T03:47:14Z

Build the Python output schema from the Python UDF expressions instead of the plan result attributes, because the result attributes are NOT always equal to the Python output schema.

For example, on Databricks, when a query projects only one column from a Python UDF output that contains multiple result columns, the result attributes for the projected output hold only one attribute, while the output schema of the Python UDF still contains all of the columns.
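The mismatch can be illustrated with a minimal, Spark-free sketch. `Attr`, `Udf`, `schemaOf`, the column names, and the types below are hypothetical stand-ins chosen for illustration, not the real Spark or plugin classes; only the `udfs.map(_.resultAttribute)` shape is taken from this change.

```scala
// Hypothetical stand-ins for Spark's Attribute and PythonUDF (illustration only).
final case class Attr(name: String, dataType: String)
final case class Udf(resultAttribute: Attr)

object SchemaSketch {
  // Same idea as StructType.fromAttributes: one field per attribute, in order.
  def schemaOf(attrs: Seq[Attr]): Seq[(String, String)] =
    attrs.map(a => (a.name, a.dataType))

  // Two UDFs are evaluated by one exec node, so Python returns two columns.
  val udfs: Seq[Udf] = Seq(
    Udf(Attr("pythonUDF0", "array<bigint>")),
    Udf(Attr("pythonUDF1", "double")))

  // Assumed pruned plan output: only the single projected column remains.
  val resultAttrs: Seq[Attr] = Seq(Attr("pythonUDF0", "array<bigint>"))

  // Schema derived from the plan output misses a column Python still returns.
  val fromPlan: Seq[(String, String)] = schemaOf(resultAttrs)

  // Schema derived from the UDF expressions matches what Python returns.
  val fromUdfs: Seq[(String, String)] = schemaOf(udfs.map(_.resultAttribute))
}
```

In this sketch `fromPlan` describes one column while `fromUdfs` describes two, which is why the schema used to read the Python results is better derived from the UDF expressions.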

Closes #1644

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

Because the result attributes are NOT always equal to the Python output schema. For example, on Databricks, when projecting only one column from a Python UDF output that contains multiple result columns, there will be only one attribute in the result attributes for the projected output, but the output schema for this Python UDF contains multiple columns.

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

firestarman · 2021-02-23T03:47:26Z

build

firestarman · 2021-02-23T03:58:26Z

build

firestarman · 2021-02-23T04:24:43Z

build

revans2 · 2021-02-23T14:34:19Z

...gin/src/main/scala/org/apache/spark/sql/rapids/execution/python/GpuArrowEvalPythonExec.scala

@@ -555,7 +555,7 @@ case class GpuArrowEvalPythonExec(

    // cache in a local to avoid serializing the plan
    val inputSchema = child.output.toStructType
-    val pythonOutputSchema = StructType.fromAttributes(resultAttrs)
+    val pythonOutputSchema = StructType.fromAttributes(udfs.map(_.resultAttribute))


nit: Great catch on this, by the way. Could we add some comments here explaining what is happening and why we are not using the resultAttrs that was passed in? If I made this mistake before, then a comment will hopefully keep someone from breaking it again, especially if it only shows up on Databricks.

Good suggestion, updated.


firestarman · 2021-02-24T01:27:24Z

build


firestarman mentioned this pull request Feb 23, 2021

[BUG] test_window_aggregate_udf_array_from_python fails on databricks #1644

Closed

revans2 previously approved these changes Feb 23, 2021


Add comments for the change

6580c70

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

firestarman dismissed revans2’s stale review via 6580c70 February 24, 2021 01:25

revans2 approved these changes Feb 24, 2021


revans2 merged commit d2b6bfc into NVIDIA:branch-0.4 Feb 24, 2021

firestarman deleted the fix-py-out-schema branch February 25, 2021 01:18

sameerz added the bug Something isn't working label Mar 1, 2021

sameerz added this to the Feb 16 - Feb 26 milestone Mar 1, 2021

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Build python output schema from udf expressions (NVIDIA#1794)

59f405c

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Build python output schema from udf expressions (NVIDIA#1794)

49b4b4c

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
