[BUG] Rounding past the size that can be stored in a type produces incorrect results #4273

Closed
revans2 opened this issue Dec 2, 2021 · 3 comments · Fixed by #4420
Labels: bug (Something isn't working), P0 (Must have for release)


revans2 (Collaborator) commented Dec 2, 2021

There are some odd corner cases when rounding integers and floating point values. If you try to round past what the type can hold, Spark will typically produce 0. For integers some overflow is still possible, so it is not 100% consistent. With CUDF, integers can come back with values that are way off, and floating point values can come back as NaN. This is a real corner case, though, because we only support a constant scale value, so someone would have to get things rather messed up to hit it.

We have fixed this for Decimal though.
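
For context on the CPU side: assuming Spark's RoundBase evaluates integral rounding through BigDecimal (my reading of the Spark source, so treat it as an assumption), there is no intermediate overflow, and any scale past the type's range simply rounds everything down to 0:

// Sketch of the assumed CPU path: BigDecimal cannot overflow, so rounding a
// byte-sized value at scale -6 just collapses to 0.
import scala.math.BigDecimal.RoundingMode
BigDecimal(127).setScale(-6, RoundingMode.HALF_UP).toInt // 0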

@revans2 revans2 added bug Something isn't working ? - Needs Triage Need team to review and classify labels Dec 2, 2021
@Salonijain27 Salonijain27 added P0 Must have for release and removed ? - Needs Triage Need team to review and classify labels Dec 7, 2021
firestarman (Collaborator) commented Dec 15, 2021

@revans2 Hi, could you share how to reproduce this? I tried several cases via spark-shell, and CUDF seemed to work fine.

scala> sql("select round(127, -6)").show
21/12/15 05:40:17 WARN GpuOverrides: 
*Exec <ProjectExec> will run on GPU
  *Expression <Alias> 0 AS round(127, -6)#176 will run on GPU
  ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec

+--------------+
|round(127, -6)|
+--------------+
|             0|

scala> sql("select round(127Y, -1)").show
21/12/15 05:49:03 WARN GpuOverrides: 
*Exec <ProjectExec> will run on GPU
  *Expression <Alias> -126 AS round(127, -1)#225 will run on GPU
  ! <RDDScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.RDDScanExec

+--------------+
|round(127, -1)|
+--------------+
|          -126|
+--------------+

sperlingxx (Collaborator) commented

Hi @revans2, I failed to find corner cases where the GPU and Spark produce different results for int round and float round.

For int round, select round(2147483647, -1) produces -2147483646 on both GPU and CPU, and select round(100, -5) produces 0 consistently.
For float round, I could not find any cases that produce NaN values on GPU.
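
For what it's worth, the -2147483646 on both sides looks like the overflow revans2 mentioned rather than a rounding disagreement: assuming the CPU rounds through BigDecimal and then narrows back to Int, 2147483647 rounds up to 2147483650, which no longer fits and wraps:

// Hedged sketch of the wrap-around at the Int boundary (assumed CPU path).
import scala.math.BigDecimal.RoundingMode
BigDecimal(2147483647).setScale(-1, RoundingMode.HALF_UP).toInt // -2147483646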

Could you show some examples? Thanks.

revans2 (Collaborator, Author) commented Dec 16, 2021

scala> val df = Seq(1, 2, 3, 4).toDF("a")
scala> df.selectExpr("cast(a as byte)").selectExpr("a", "round(a, -100) as CPU_ROUND").repartition(1).orderBy("a").selectExpr("*", "round(a, -100) as GPU_ROUND").show()
+---+---------+---------+
|  a|CPU_ROUND|GPU_ROUND|
+---+---------+---------+
|  1|        0|        2|
|  2|        0|        3|
|  3|        0|        4|
|  4|        0|        5|
+---+---------+---------+

scala> df.selectExpr("cast(a as short)").selectExpr("a", "round(a, -100) as CPU_ROUND").repartition(1).orderBy("a").selectExpr("*", "round(a, -100) as GPU_ROUND").show()
+---+---------+---------+
|  a|CPU_ROUND|GPU_ROUND|
+---+---------+---------+
|  1|        0|        2|
|  2|        0|        3|
|  3|        0|        4|
|  4|        0|        5|
+---+---------+---------+

scala> df.selectExpr("cast(a as int)").selectExpr("a", "round(a, -100) as CPU_ROUND").repartition(1).orderBy("a").selectExpr("*", "round(a, -100) as GPU_ROUND").show()
+---+---------+-----------+
|  a|CPU_ROUND|  GPU_ROUND|
+---+---------+-----------+
|  1|        0|-2147483648|
|  2|        0|-2147483648|
|  3|        0|-2147483648|
|  4|        0|-2147483648|
+---+---------+-----------+

scala> df.selectExpr("cast(a as long)").selectExpr("a", "round(a, -100) as CPU_ROUND").repartition(1).orderBy("a").selectExpr("*", "round(a, -100) as GPU_ROUND").show()
+---+---------+--------------------+
|  a|CPU_ROUND|           GPU_ROUND|
+---+---------+--------------------+
|  1|        0|-9223372036854775808|
|  2|        0|-9223372036854775808|
|  3|        0|-9223372036854775808|
|  4|        0|-9223372036854775808|
+---+---------+--------------------+

scala> df.selectExpr("cast(a as float)").selectExpr("a", "round(a, -100) as CPU_ROUND").repartition(1).orderBy("a").selectExpr("*", "round(a, -100) as GPU_ROUND").show()
+---+---------+---------+
|  a|CPU_ROUND|GPU_ROUND|
+---+---------+---------+
|1.0|      0.0|      NaN|
|2.0|      0.0|      NaN|
|3.0|      0.0|      NaN|
|4.0|      0.0|      NaN|
+---+---------+---------+

scala> df.selectExpr("cast(a as double)").selectExpr("a", "round(a, -400) as CPU_ROUND").repartition(1).orderBy("a").selectExpr("*", "round(a, -400) as GPU_ROUND").show()
+---+---------+---------+
|  a|CPU_ROUND|GPU_ROUND|
+---+---------+---------+
|1.0|      0.0|      NaN|
|2.0|      0.0|      NaN|
|3.0|      0.0|      NaN|
|4.0|      0.0|      NaN|
+---+---------+---------+

scala> df.selectExpr("cast(a as float)").selectExpr("a", "round(a, 100) as CPU_ROUND").repartition(1).orderBy("a").selectExpr("*", "round(a, 100) as GPU_ROUND").show()
+---+---------+---------+
|  a|CPU_ROUND|GPU_ROUND|
+---+---------+---------+
|1.0|      1.0|      NaN|
|2.0|      2.0|      NaN|
|3.0|      3.0|      NaN|
|4.0|      4.0|      NaN|
+---+---------+---------+

scala> df.selectExpr("cast(a as double)").selectExpr("a", "round(a, 400) as CPU_ROUND").repartition(1).orderBy("a").selectExpr("*", "round(a, 400) as GPU_ROUND").show()
+---+---------+---------+
|  a|CPU_ROUND|GPU_ROUND|
+---+---------+---------+
|1.0|      1.0|      NaN|
|2.0|      2.0|      NaN|
|3.0|      3.0|      NaN|
|4.0|      4.0|      NaN|
+---+---------+---------+

It happens when you round past what the type can theoretically hold. The cutoff scale for each type is below (the sketch after the list recomputes them):

byte = -3
short = -5
int = -10
long = -19
float = +/-39
double = +/-309
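
These cutoffs are just floor(log10(MaxValue)) + 1 for each type; a quick, plugin-independent check in spark-shell:

// Recompute the cutoff scales above from each type's maximum value.
Seq(
  "byte"   -> Byte.MaxValue.toDouble,
  "short"  -> Short.MaxValue.toDouble,
  "int"    -> Int.MaxValue.toDouble,
  "long"   -> Long.MaxValue.toDouble,
  "float"  -> Float.MaxValue.toDouble,
  "double" -> Double.MaxValue
).foreach { case (name, max) =>
  println(f"$name%-6s -> -${math.floor(math.log10(max)).toInt + 1}")
}
// byte -> -3, short -> -5, int -> -10, long -> -19, float -> -39, double -> -309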

This is why it is a real corner case. No one should ever do it, because it makes no sense to, but we still want to cover all of the possibilities. While you are at it, there is some Decimal code that fixes/works around this same issue for us:

// This is going to overflow unless we do something else first. But for round to work all
// we actually need is 1 decimal place more than the target decimalPlaces, so we can cast
// to this first (which will truncate the extra information), and then round to the desired
// result
val intermediateDType = DType.create(input.getType.getTypeId, (-decimalPlaces) + 1)
withResource(input.castTo(intermediateDType)) { truncated =>
  truncated.round(decimalPlaces, mode)
}

I overthought things when I wrote it, or perhaps didn't think enough: ignoring the overflow, the correct result is always 0, so it might be nice to simplify it and just always return 0s in that case instead of casting to truncate followed by rounding.
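
A minimal sketch of that simplification, with heavy caveats: maxDigitsFor is an assumed helper (not real plugin code), and a real version would build the zero scalar in the input's own type rather than hardcoding INT32. Multiplying by a zero scalar returns all zeros while preserving nulls:

// Hypothetical shortcut: past the type's cutoff everything rounds to 0, so skip
// the cast-and-round and just zero the column (nulls stay null under mul).
if (-decimalPlaces > maxDigitsFor(input.getType)) { // maxDigitsFor: assumed helper
  withResource(Scalar.fromInt(0)) { zero => // assumes INT32 input
    input.mul(zero)
  }
} else {
  input.round(decimalPlaces, mode)
}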

@sperlingxx sperlingxx changed the title [BUG] Rouding past the size that can be stored in a type produces incorrect results [BUG] Rounding past the size that can be stored in a type produces incorrect results Dec 22, 2021