[BUG] legacy cast of a struct column to string with a single nested null column yields null instead of '[]' #2309

Closed
gerashegalov opened this issue Apr 29, 2021 · 2 comments · Fixed by #2395
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

gerashegalov commented Apr 29, 2021

Describe the bug
In Spark 3.0.x, and in Spark 3.1+ with legacy mode enabled, casting a struct whose single nested field is null to a string produces a null string instead of '[]'.

Steps/Code to reproduce bug

# assumes an active SparkSession bound to `spark`, as in the pyspark REPL
import pyspark.sql.functions as F
import pyspark.sql.types as T

# for Spark 3.1+
spark.conf.set('spark.sql.legacy.castComplexTypesToString.enabled', True)

data = [ (('gera',),), ((None,),), (None,) ]
df = spark.createDataFrame(data)
df.select(df._1.cast(T.StringType())).collect()

Broken output:

[Row(_1='[gera]'), Row(_1=None), Row(_1=None)]

Expected behavior
Correct output:

[Row(_1='[gera]'), Row(_1='[]'), Row(_1=None)]
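
For reference, the expected legacy semantics can be modeled in plain Python without a Spark session. This is only a sketch: the function name `legacy_struct_to_string` is hypothetical, and the handling of structs with more than one field (a "," separator, with a leading space only before non-null values) is an assumption about Spark's legacy formatting that this issue does not exercise.

```python
from typing import Optional, Sequence


def legacy_struct_to_string(struct: Optional[Sequence[object]]) -> Optional[str]:
    """Toy model of the legacy struct-to-string cast (hypothetical helper)."""
    if struct is None:
        # A null struct stays null; only null *fields* are rendered as empty.
        return None
    parts = []
    for i, field in enumerate(struct):
        if i > 0:
            # Assumption: fields are separated by "," even when a field is null.
            parts.append(",")
        if field is not None:
            # Assumption: non-null fields after the first get a leading space.
            parts.append((" " if i > 0 else "") + str(field))
    return "[" + "".join(parts) + "]"


# Same three rows as the repro above, as plain tuples.
rows = [("gera",), (None,), None]
print([legacy_struct_to_string(r) for r in rows])
# ['[gera]', '[]', None]
```

The key distinction is between a null struct, which stays null, and a non-null struct whose only field is null, which should still render its brackets.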

Environment details (please complete the following information)
Reproduces in a local REPL

Additional context
CI test failures in #2274

@gerashegalov gerashegalov added the bug Something isn't working label Apr 29, 2021
@gerashegalov
Collaborator Author

Non-legacy cast is correct.

@gerashegalov
Collaborator Author

related to #1604

@gerashegalov gerashegalov added P0 Must have for release ? - Needs Triage Need team to review and classify labels Apr 30, 2021
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label May 4, 2021
@sameerz sameerz added this to the May 10 - May 21 milestone May 4, 2021
gerashegalov added a commit to gerashegalov/spark-rapids that referenced this issue May 11, 2021
Fixes NVIDIA#2309 and NVIDIA#2315

Signed-off-by: Gera Shegalov <gera@apache.org>
gerashegalov added a commit that referenced this issue May 14, 2021
Refactors struct cast to string such that there is no need for a dedicated method handling the legacy mode cast. Fixes #2309 and #2315

Signed-off-by: Gera Shegalov <gera@apache.org>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this issue Jun 9, 2021
Refactors struct cast to string such that there is no need for a dedicated method handling the legacy mode cast. Fixes NVIDIA#2309 and NVIDIA#2315

Signed-off-by: Gera Shegalov <gera@apache.org>