-
Notifications
You must be signed in to change notification settings - Fork 887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix inf/NaN comparisons for FLOAT orderby in window functions #13635
Fix inf/NaN comparisons for FLOAT orderby in window functions #13635
Conversation
This is a follow-up to rapidsai#13512 (which added support for floating point order-by columns in window functions), and rapidsai#13606 (which fixed how negative values are handled for floating point order-by). This commit fixes how `NaN` and `+/- Infinity` values are handled for floating point. Prior to this commit, the calculations for range window extents depended on the behaviour of `thrust::less<float>` and `thrust::greater<float>`, as well as addition/subtraction on `+/- Infinity`. This produced some unexpected results: 1. `thrust::less`/`greater` on `NaN` does not produce strict ordering. 2. Addition/Subtraction on the numerical values of `Infinity` could produce finite values that interfere with window extent calculations. Ideally, the results should have remained infinite. This commit adds custom comparators with `NaN` awareness, to better handle columns that contain `NaN`s. It also fixes range calculations where `Infinity` is involved. Tests have been added to cover ASC/DESC order sorting on `FLOAT`, with `NaN` and `Infinity` values.
Depends on rapidsai/cudf#13635. This commit adds support for floating point order-by columns in RANGE based window functions. Prior to this commit, when the `GpuWindowExec` was presented with an order-by column of floating-point type, the entire window operation would fall back to CPU execution. This should now execute entirely on the GPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few small comments. Mostly confused about ordering when one element is NaN.
Depends on rapidsai/cudf#13635. This commit adds support for floating point order-by columns in RANGE based window functions. Prior to this commit, when the `GpuWindowExec` was presented with an order-by column of floating-point type, the entire window operation would fall back to CPU execution. This should now execute entirely on the GPU.
Depends on rapidsai/cudf#13635. This commit adds support for floating point order-by columns in RANGE based window functions. Prior to this commit, when the `GpuWindowExec` was presented with an order-by column of floating-point type, the entire window operation would fall back to CPU execution. This should now execute entirely on the GPU.
Depends on rapidsai/cudf#13635. This commit adds support for floating point order-by columns in RANGE based window functions. Prior to this commit, when the `GpuWindowExec` was presented with an order-by column of floating-point type, the entire window operation would fall back to CPU execution. This should now execute entirely on the GPU. Example query: ```sql SELECT COUNT(1) OVER (PARTITION BY grp ORDER BY float_column RANGE BETWEEN 1.023 PRECEDING AND 3.14159 FOLLOWING) FROM my_float_table; ```
Depends on rapidsai/cudf#13635. This commit adds support for floating point order-by columns in RANGE based window functions. Prior to this commit, when the `GpuWindowExec` was presented with an order-by column of floating-point type, the entire window operation would fall back to CPU execution. This should now execute entirely on the GPU. Example query: ```sql SELECT COUNT(1) OVER (PARTITION BY grp ORDER BY float_column RANGE BETWEEN 1.023 PRECEDING AND 3.14159 FOLLOWING) FROM my_float_table; ``` Signed-off-by: MithunR <mythrocks@gmail.com>
1. Document why nan-aware comparators are required. 2. Reuse thrust::less and thrust::greater.
I've added tests for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the test coverage, thank you for taking the time to add the unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice change and the tests added are very welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. One small typo in comment block.
/merge |
Thank you for the reviews, all. I have merged this change. |
#8637) * Support Float order-by columns for RANGE window functions Depends on rapidsai/cudf#13635. This commit adds support for floating point order-by columns in RANGE based window functions. Prior to this commit, when the `GpuWindowExec` was presented with an order-by column of floating-point type, the entire window operation would fall back to CPU execution. This should now execute entirely on the GPU. Example query: ```sql SELECT COUNT(1) OVER (PARTITION BY grp ORDER BY float_column RANGE BETWEEN 1.023 PRECEDING AND 3.14159 FOLLOWING) FROM my_float_table; ``` Signed-off-by: MithunR <mythrocks@gmail.com>
Description
This is a follow-up to #13512 (which added support for floating point order-by columns in window functions), and #13606 (which fixed how negative values are handled for floating point order-by).
This commit fixes how
NaN
and+/- Infinity
values are handled for floating point.Prior to this commit, the calculations for range window extents depended on the behaviour of
thrust::less<float>
andthrust::greater<float>
, as well as addition/subtraction on+/- Infinity
. This produced some unexpected results:thrust::less
/greater
onNaN
does not produce strict ordering.Infinity
could produce finite values that interfere with window extent calculations. Ideally, the results should have remained infinite.This commit adds custom comparators with
NaN
awareness, to better handle columns that containNaN
s. It also fixes range calculations whereInfinity
is involved.Tests have been added to cover ASC/DESC order sorting on
FLOAT
, withNaN
andInfinity
values.Checklist