Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] getJsonObject gets infinity instead of zero for a very small float number #1923

Closed
res-life opened this issue Apr 3, 2024 · 8 comments · Fixed by #1944
Closed

[BUG] getJsonObject gets infinity instead of zero for a very small float number #1923

res-life opened this issue Apr 3, 2024 · 8 comments · Fixed by #1944
Assignees
Labels
bug Something isn't working

Comments

@res-life
Copy link
Collaborator

res-life commented Apr 3, 2024

Describe the bug
CPU gets zero for number: 9.299999257686047e-0005603333574677677
But GPU gets infinity.
The number is a very small value nearly zero, so zero is the correct answer.

Steps/Code to reproduce bug
get_json_object("9.299999257686047e-0005603333574677677", "$")
CPU: 0.0
GPU: infinity

Additional context
JNI uses cuDF string to double and double to string.
cuDF can not hanlde very small float value, need to discuss how to fix:

  • fix in JNI
  • fix in cuDF
@res-life res-life added bug Something isn't working ? - Needs Triage labels Apr 3, 2024
@thirtiseven
Copy link
Collaborator

For number normalization, first use stod to convert string to double, then convert back to string with ftos_converter:

double d_value = cudf::strings::detail::stod(cudf::string_view(current_token_start_pos, number_token_len))
return spark_rapids_jni::ftos_converter::double_normalization(d_value, destination)

It seems that cudf::strings::detail::stod will return inf for this case, because the precision is very large for a normal use of string to double. We could try to have a workaround in jni to fix this issue.

I tested that in jni's string to double kernel, this case will return null. maybe investigating on how jackson handles this issue can be helpful.

@revans2
Copy link
Collaborator

revans2 commented Apr 11, 2024

Can you please file an issue in CUDF about this? If they are getting it wrong I think they would like to know about it.

Also we need to file an issue in this repo as cast with this number is also doing the wrong thing.

Seq("9.299999257686047e-0005603333574677677", "1.0").toDF("s").repartition(1).selectExpr("CAST(s as double)").collect.foreach(println)

on CPU we get back 0.0 and 1.0
on the GPU we get back null and 1.0

Our cast uses

return cudf::jni::release_as_jlong(spark_rapids_jni::string_to_float(
internally. Might be good to see if we can use a single implementation instead of the one that #1944 added.

@res-life
Copy link
Collaborator Author

Haoyang filed one follow-up issue in cuDF repo: rapidsai/cudf#15508

@thirtiseven
Copy link
Collaborator

Filed rapidsai/cudf#15508 in cuDF.

I use cuDF’s implementation because our string to double does not have a single kernel to call easily, It needs some refactoring to be used for getJsonObject or other usage. And it is not a bit to bit match to Spark also, although it should be much closer than one in cuDF, we might want to fix it in a long term.

@revans2
Copy link
Collaborator

revans2 commented Apr 11, 2024

we might want to fix it in a long term.

Then please file an issue to do so.

@res-life
Copy link
Collaborator Author

we might want to fix it in a long term.

Then please file an issue to do so.

Filed an issue to track

@res-life
Copy link
Collaborator Author

Also we need to file an issue in this repo as cast with this number is also doing the wrong thing.

Haoyang filed #1960 to track.

@res-life
Copy link
Collaborator Author

Close this issue as PR #1944 merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants