Fix the overflow of container type when casting floats to decimal #5766

Merged: sperlingxx merged 2 commits into NVIDIA:branch-22.06 from fix_cast_fp_to_dec on Jun 8, 2022

Conversation

@sperlingxx (Collaborator) commented Jun 7, 2022

Fixes #5765

This PR fixes a potential overflow when casting float/double to decimal. The overflow occurs on the container decimal used for the HALF_UP round. The precision of the container decimal should increase along with the scale to handle corner cases like #5765.
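To make the corner case concrete, here is a minimal sketch of the idea in Scala (illustrative values and names, not the actual plugin code):

    import org.apache.spark.sql.types.DecimalType

    // Target: DECIMAL(8, 2). To perform a HALF_UP round, the value is first
    // cast to a container decimal with one extra digit of scale.
    val target = DecimalType(8, 2)

    // Overflow-prone container (before the fix): extra scale digit, same
    // precision. A value such as 999999.995 needs the 9-digit unscaled
    // value 999999995 at scale 3, which does not fit in precision 8.
    val before = DecimalType(target.precision, target.scale + 1)

    // Fixed container: the precision grows together with the scale, so the
    // pre-round value always fits.
    val after = DecimalType(target.precision + 1, target.scale + 1)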

Signed-off-by: sperlingxx lovedreamf@gmail.com

@sperlingxx (Collaborator, Author) commented:

build

@ttnghia (Collaborator) commented Jun 7, 2022

I assume that this should go into the 22.06 release.

andygrove previously approved these changes Jun 7, 2022

@andygrove (Contributor) left a comment:

The change looks reasonable to me, but I have not worked on the decimal implementation and am not an expert in this area.

      withResource(checked.castTo(containerType)) { container =>
    -   container.round(dt.scale, cudf.RoundMode.HALF_UP)
    +   withResource(container.round(dt.scale, cudf.RoundMode.HALF_UP)) { rd =>
    +     rd.castTo(targetType)
    +   }
      }
@jlowe (Member) commented on this diff:

Nit: doesn't this cast end up creating a redundant copy of the column in many (all?) cases? I'm wondering if we should check whether the resulting type from round is already the desired cudf type. I would expect cudf's round to deterministically produce a result type based on the input cast type, which makes me wonder why the cast is necessary here.

@sperlingxx (Collaborator, Author) replied:

@jlowe The cast here is for cases where the underlying cuDF decimal type got promoted to precision + 1 (from DECIMAL32 to DECIMAL64). For cases like that, I think we need to convert back to the original cuDF decimal type to stay aligned with the precision.

@jlowe (Member) replied:

We are pre-casting to the promoted precision but asking cudf to round to the originally desired precision. Do we know for certain that cudf does not already put the result in the appropriate type as part of the round implementation? Even if it does not, it is not always the case that the promotion cast actually did anything (e.g., it was DECIMAL32 before and after, because precision + 1 didn't actually need DECIMAL64).

This isn't a must-fix for proper functionality; it's more of a performance optimization for what may be a common case.
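(A hypothetical sketch of that suggestion, using the cudf Java API; the helper names are illustrative, not the merged code:)

    import ai.rapids.cudf.{ColumnVector, DType, RoundMode}

    // Minimal stand-in for the plugin's Arm.withResource helper.
    def withResource[T <: AutoCloseable, V](r: T)(block: T => V): V =
      try block(r) finally r.close()

    // Round to the target scale, then cast back only if the rounded column
    // is not already of the desired cudf type, avoiding a redundant copy.
    def roundToTarget(container: ColumnVector, scale: Int,
        targetType: DType): ColumnVector = {
      withResource(container.round(scale, RoundMode.HALF_UP)) { rounded =>
        if (rounded.getType.equals(targetType)) {
          rounded.incRefCount() // already the right type, no copy needed
        } else {
          rounded.castTo(targetType)
        }
      }
    }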

@ttnghia (Collaborator) commented Jun 8, 2022:

> Do we know for certain that cudf does not already put the result in the appropriate type as part of the round implementation?

I'm not sure I understand your question correctly. In cudf, the floating-point value is just shifted (multiplied/divided) by the scale value and then rounded toward zero to an integer for internal storage.
https://github.com/rapidsai/cudf/blob/fae22213537a003c74e2da9feac1bda38f562f6f/cpp/include/cudf/fixed_point/fixed_point.hpp#L230
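(For illustration, here is that shift-and-truncate behaviour mimicked in plain Scala; a sketch only, not the cudf code, which runs in C++ on the device:)

    // Mimics cudf's fixed_point construction from a double. A scale of -1
    // means one fractional digit, i.e. multiply by 10^1 before storing.
    def toFixedPoint(value: Double, scale: Int): Long = {
      val shifted = value * math.pow(10, -scale) // shift by the scale
      shifted.toLong                             // truncate toward zero
    }

    toFixedPoint(2.78, -1) // 27, i.e. 2.7 -- truncated, not rounded to 2.8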

@ttnghia (Collaborator) commented Jun 8, 2022:

Maybe cudf could fix the accuracy problem simply by not rounding toward zero. That simple fix would avoid issues like this on the plugin side, but I'm not sure whether that is the desired behavior in cudf. I'll file an issue.


@sameerz added the bug (Something isn't working) label on Jun 7, 2022
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
@sperlingxx sperlingxx requested a review from jlowe June 8, 2022 03:26
@sperlingxx (Collaborator, Author) commented:

Conducted a force push to rebase onto 22.06.

@sperlingxx (Collaborator, Author) commented:

build

@jlowe (Member) left a comment:

I'm OK with it as-is, but it would be good to have a follow-up for what appears to be an unnecessary post-round cast in some (all?) cases.

@sperlingxx sperlingxx merged commit 59d05d2 into NVIDIA:branch-22.06 Jun 8, 2022
@sperlingxx sperlingxx deleted the fix_cast_fp_to_dec branch June 8, 2022 15:58
@ttnghia (Collaborator) commented Jun 8, 2022

Regarding the reason behind this bug, I'm reposting an explanation from my cudf issue (rapidsai/cudf#11079 (comment)):


(reposting this message from slack)

There is a long history behind this and truncation is definitely the desired and intentional behaviour.

Here are the main motivations for truncation:

  1. Consistency with CNL. CNL is being proposed for the standard library and consistency with standard types is a goal of libcudf. Note that (IIRC) every other C++ fixed-point library has made the same choice.
  2. Relying on the behaviour of integer types in C++ greatly simplifies certain binary operations like division for fixed-point
  3. We can (and have) provided rounding functionality through the cudf::round API, so on the Spark side of things you are able to get whatever behaviour you want

Note that at one point we actually did have rounding functionality baked into floating-point construction for fixed-point. However, it wasn't actually to address the desire to be more "accurate" as @ttnghia is pointing out; it was to try to address inherent issues in floating point (such as 1.001 not being representable by floating point, meaning if you construct a fixed-point with scale -3, you end up with 1, missing the .001). @harrism and I had many discussions about this and both agreed that "ideally" fixed-point should be able to avoid this at the end of the day. However, it ended up presenting too many issues/complications and not actually comprehensively fixing all cases. And note, trying to be "better" was not consistent with what CNL and other C++ fixed-point libraries do.

So we decided at a certain point to just accept the inherent flaws of floating point and follow CNL. This was done in the following small PR: rapidsai/cudf#6544. If you take a look at the unit tests, it is very clear how the behaviour changed by looking at the before and after.

Furthermore, there was a bunch of "opposition research" done at one point into what other C++ fixed-point libraries do. Our goal is not to be similar to Julia or Go or Python, but to be a generic C++ backend that provides all the tools to do whatever you want in the front-end language (like cuDF or Spark-RAPIDS).
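(For a concrete feel of the 1.001 example above, here is the same construction mimicked in plain Scala; a sketch, not the cudf code:)

    // 1.001 is not exactly representable in binary floating point; the
    // nearest double is slightly below it. Shifting by 10^3 and truncating
    // toward zero therefore drops the fractional digits entirely:
    val shifted = 1.001 * 1000   // ~1000.9999999999999
    val stored  = shifted.toLong // 1000 -> fixed-point 1.000; the .001 is gone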

tgravescs added a commit that referenced this pull request Jun 8, 2022
* Correct the error message for test_mod_pmod_by_zero (#5781)

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Update the error string for test_cast_neg_to_decimal_err on 330[databricks] (#5784)

* Update the error string for test_cast_neg_to_decimal_err on 330

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* address comments

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

* Throw an exception when attempting to read columnar encrypted Parquet files on the GPU [databricks] (#5761)

* Throw useful message when parquet columnar encryption enabled

* update message

* fix message

* handle native encrypted

* move variable

* cleanup

* fix native check

* cleanup imports

* fix import order

* Sign off

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Shim the parquet crypto exception check

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* shim 320cdh

* Add test for parquet encryption

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* fix rounds over decimal in Spark 330+ (#5786)

Passes the datatype of round-like functions directly to GPU overrides, so as to adapt to different Spark versions.

Signed-off-by: sperlingxx <lovedreamf@gmail.com>

* Fix the overflow of container type when casting floats to decimal (#5766)

Fixes #5765

Fix the potential overflow when casting float/double to decimal. The overflow occurs on the container decimal for HALF_UP round.
 
Signed-off-by: sperlingxx <lovedreamf@gmail.com>

Co-authored-by: Liangcai Li <firestarmanllc@gmail.com>
Co-authored-by: Alfred Xu <lovedreamf@gmail.com>
Labels: bug Something isn't working

Successfully merging this pull request may close these issues:

[BUG] Container decimal overflow when casting float/double to decimal (#5765)