[BUG] [Spark 4] Invalid results from Casting timestamps to integral types #11555

Status: Open
Opened by mythrocks (Collaborator) on Oct 1, 2024 · 0 comments
Labels: bug (Something isn't working), Spark 4.0+ (Spark 4.0+ issues)
Description
With ANSI off, when a TIMESTAMP column is cast to BYTE, the output from the spark-rapids plugin differs from that of Apache Spark 4.

Repro
Consider the following single-row dataframe containing one timestamp. When it is read back through the plugin on Spark 4, a null row is expected:

sql("select timestamp('4106-11-27 08:07:45.336457') as t").write.mode("overwrite").parquet("/tmp/myth/repro")

spark.conf.set("spark.sql.ansi.enabled", false)

spark.read.parquet("/tmp/myth/repro").selectExpr("CAST(t AS BYTE)").show

On Apache Spark 4, this results in a null row:

+----+
|   t|
+----+
|NULL|
+----+

With the RAPIDS plugin, the result is non-null:

+---+
|  t|
+---+
| 81|
+---+
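The value 81 is consistent with the pre-Spark-4 cast semantics: the timestamp is first converted to whole seconds since the epoch, and that long is then truncated to a signed byte. A minimal Python sketch of that arithmetic (an illustration, not plugin code, and assuming a UTC session timezone for the timestamp literal):

```python
from datetime import datetime, timedelta, timezone

# Reproduce the arithmetic by which Spark 3.x (and, per this report, the
# plugin) arrives at 81: CAST(timestamp AS BYTE) converts to epoch seconds,
# then keeps only the low 8 bits, like Scala's Long.toByte.
ts = datetime(4106, 11, 27, 8, 7, 45, 336457, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Exact integer division; the fractional .336457 seconds is floored away.
epoch_seconds = (ts - epoch) // timedelta(seconds=1)

# Signed 8-bit truncation of the epoch-second count.
low_byte = ((epoch_seconds + 128) % 256) - 128
print(epoch_seconds, low_byte)  # 67434192465 81
```

Spark 4 with ANSI off instead returns NULL for the out-of-range value, which is the discrepancy this issue tracks.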

Expected behaviour
The plugin's result matches Spark 3.x; Spark 4's behaviour (returning NULL for the out-of-range cast even with ANSI off) appears to be a departure from Spark 3.x.
Ideally, the plugin's behaviour should match that of the Spark version it is running against.
