[BUG] [Spark 4] Invalid results from Casting timestamps to integral types #11555

Status: Open
Opened by mythrocks (Collaborator) on Oct 1, 2024 · 0 comments
Labels: bug (Something isn't working), Spark 4.0+ (Spark 4.0+ issues)
Description
With ANSI off, when a TIMESTAMP column is cast to BYTE, the output from the spark-rapids plugin differs from that of Apache Spark 4.

Repro
Consider the following single-row dataframe containing one timestamp. When it is read back through the plugin on Spark 4, a null row is expected:

sql("select timestamp('4106-11-27 08:07:45.336457') as t").write.mode("overwrite").parquet("/tmp/myth/repro")

spark.conf.set("spark.sql.ansi.enabled", false)

spark.read.parquet("/tmp/myth/repro").selectExpr("CAST(t AS BYTE)").show

On Apache Spark 4, this results in a null row:

+----+
|   t|
+----+
|NULL|
+----+

With the RAPIDS plugin, the result is non-null:

+---+
|  t|
+---+
| 81|
+---+
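The value 81 is consistent with the pre-Spark-4 cast semantics: the timestamp is first converted to whole seconds since the epoch, and that long is then truncated to a signed byte. A minimal Python sketch of that arithmetic (an illustration, not plugin code, and assuming a UTC session timezone for the timestamp literal):

```python
from datetime import datetime, timedelta, timezone

# Reproduce the arithmetic by which Spark 3.x (and, per this report, the
# plugin) arrives at 81: CAST(timestamp AS BYTE) converts to epoch seconds,
# then keeps only the low 8 bits, like Scala's Long.toByte.
ts = datetime(4106, 11, 27, 8, 7, 45, 336457, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Exact integer division; the fractional .336457 seconds is floored away.
epoch_seconds = (ts - epoch) // timedelta(seconds=1)

# Signed 8-bit truncation of the epoch-second count.
low_byte = ((epoch_seconds + 128) % 256) - 128
print(epoch_seconds, low_byte)  # 67434192465 81
```

Spark 4 with ANSI off instead returns NULL for the out-of-range value, which is the discrepancy this issue tracks.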

Expected behaviour
The plugin's result matches Spark 3.x; Spark 4's behaviour (returning NULL for the out-of-range cast even with ANSI off) appears to be a departure from Spark 3.x.
Ideally, the plugin's behaviour should match that of the Spark version it is running against.
