Build python output schema from udf expressions #1794
The result attributes are NOT always equal to the Python output schema. For example, on Databricks, when only one column is projected from a Python UDF output that contains multiple result columns, the result attributes hold only one attribute (the projected output), but the output schema for this Python UDF still contains multiple columns.
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
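The mismatch can be modeled without Spark. The sketch below is hypothetical: `Attr`, `Field`, `PyUdf`, and `fromAttributes` are simplified stand-ins for Spark's `Attribute`, `StructField`, `PythonUDF`, and `StructType.fromAttributes`, and the pruned `resultAttrs` only imitates the Databricks behaviour described above rather than reproducing Databricks' actual planner. It assumes each UDF contributes exactly one column to what the Python side returns.

```scala
// Hypothetical, simplified stand-ins; NOT the real Spark or spark-rapids classes.
final case class Attr(name: String, dataType: String)
final case class Field(name: String, dataType: String)
final case class PyUdf(name: String, resultAttribute: Attr)

object PythonSchemaSketch {
  // Stand-in for StructType.fromAttributes: one field per attribute.
  def fromAttributes(attrs: Seq[Attr]): Seq[Field] =
    attrs.map(a => Field(a.name, a.dataType))

  // Assumption for this sketch: the Python side returns one column per UDF
  // it evaluates, regardless of which columns the parent plan projects.
  val udfs: Seq[PyUdf] = Seq(
    PyUdf("plus_one", Attr("pythonUDF0", "bigint")),
    PyUdf("to_upper", Attr("pythonUDF1", "string"))
  )

  // Assumed Databricks-style pruning: only the projected column survives
  // in the result attributes handed to the exec node.
  val resultAttrs: Seq[Attr] = Seq(Attr("pythonUDF1", "string"))

  // Old behaviour: schema built from the (possibly pruned) result attributes.
  val schemaFromResultAttrs: Seq[Field] = fromAttributes(resultAttrs)

  // New behaviour: schema built from every UDF's result attribute, so it has
  // one field for each column the Python side sends back.
  val schemaFromUdfs: Seq[Field] = fromAttributes(udfs.map(_.resultAttribute))

  def main(args: Array[String]): Unit = {
    println(s"from resultAttrs: $schemaFromResultAttrs")
    println(s"from udfs:        $schemaFromUdfs")
  }
}
```

In this model only the UDF-based schema lines up with the two columns the Python side produces; the schema built from `resultAttrs` has a single field and would not match the returned batches.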
build
@@ -555,7 +555,7 @@ case class GpuArrowEvalPythonExec(

     // cache in a local to avoid serializing the plan
     val inputSchema = child.output.toStructType
-    val pythonOutputSchema = StructType.fromAttributes(resultAttrs)
+    val pythonOutputSchema = StructType.fromAttributes(udfs.map(_.resultAttribute))
nit: Great catch on this, by the way. Could we add some comments here explaining what is happening and why we are not using the resultAttrs that were passed in? If I made this mistake before, a comment will hopefully keep someone from breaking it again, especially since it only shows up on Databricks.
Good suggestion, updated.
Build the Python output schema from the Python UDF expressions instead of the plan's result attributes, because the result attributes are NOT always equal to the Python output schema.
For example, on Databricks, when only one column is projected from a Python UDF output that contains multiple result columns, the result attributes hold only the one projected attribute, but the output schema for this Python UDF still contains multiple columns.
Closes #1644
Signed-off-by: Firestarman <firestarmanllc@gmail.com>