Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Min and Max aggregations involving infinity produce incorrect results #11352

Closed
ttnghia opened this issue Jul 26, 2022 · 1 comment · Fixed by #11357
Closed

[BUG] Min and Max aggregations involving infinity produce incorrect results #11352

ttnghia opened this issue Jul 26, 2022 · 1 comment · Fixed by #11357
Assignees
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.

Comments

@ttnghia
Copy link
Contributor

ttnghia commented Jul 26, 2022

Currently, min and max aggregations for numeric types firstly initialize the results by an identity value then call a device operator. The identity is:

  • cuda::std::numeric_limits<T>::max(), if the aggregation is min, and
  • cuda::std::numeric_limits<T>::lowest(), if the aggregation is max.

When the input is floating-point data that has infinity, the min and max are computed incorrectly. That's because min(max, infinity) == max and max(lowest, -infinity) == lowest. In other words, for floating-point numbers the lowest value is still bigger than -infinity and max is still smaller than infinity.

We should specialize the identity value for floating-point numbers.

@ttnghia ttnghia added bug Something isn't working Needs Triage Need team to review and classify labels Jul 26, 2022
@ttnghia ttnghia self-assigned this Jul 26, 2022
@bdice bdice added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Jul 26, 2022
@bdice
Copy link
Contributor

bdice commented Jul 26, 2022

template <typename T,
std::enable_if_t<!std::is_same_v<T, cudf::string_view> && !cudf::is_dictionary<T>() &&
!cudf::is_fixed_point<T>()>* = nullptr>
static constexpr T identity()
{
// chrono types do not have std::numeric_limits specializations and should use T::min()
// https://eel.is/c++draft/numeric.limits.general#6
if constexpr (cudf::is_chrono<T>()) return T::min();
return cuda::std::numeric_limits<T>::lowest();
}

This should probably read something like:

  template <typename T,
            std::enable_if_t<!std::is_same_v<T, cudf::string_view> && !cudf::is_dictionary<T>() &&
                             !cudf::is_fixed_point<T>()>* = nullptr>
  static constexpr T identity()
  {
    // chrono types do not have std::numeric_limits specializations and should use T::min()
    // https://eel.is/c++draft/numeric.limits.general#6
    if constexpr (cudf::is_chrono<T>()) {
        return T::min();
    } else if constexpr (cuda::std::numeric_limits<T>::has_infinity) {
        return -cuda::std::numeric_limits<T>::infinity();
    }
    return cuda::std::numeric_limits<T>::lowest();
  }

and similarly for the min identity with +inf.

rapids-bot bot pushed a commit that referenced this issue Jul 28, 2022
… in device operators `min` and `max` (#11357)

This fixes a bug of device operators `min` and `max` in generating the `identity` value for floating-point numbers. In particular:
 * `min::identity()` should return `cuda::std::numeric_limits<T>::infinity()` instead of `cuda::std::numeric_limits<T>::max()`, and
 * `max::identity()` should return `-cuda::std::numeric_limits<T>::infinity()` instead of `cuda::std::numeric_limits<T>::lowest()`.

Closes #11352.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Yunsong Wang (https://github.com/PointKernel)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #11357
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
2 participants