Refactor histogram reduction using cuco::static_map::insert_or_apply #16485
base: branch-24.12
// compute_row_frequencies does not provide stable ordering
thrust::sort_by_key(rmm::exec_policy(stream),
                    distinct_indices->begin(),
                    distinct_indices->end(),
                    distinct_counts->mutable_view().begin<int64_t>());
Will this sort impact the groupby histogram performance?
There is a slight regression of around 5% on small inputs (size 100,000), but the overall improvement outweighs the cost of this extra sort_by_key. The overall speedup is up to 30% as the input size increases, even with the extra sorting step.
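To make the role of the extra sort concrete, here is a minimal host-side sketch of what a sort-by-key step does: it orders the distinct row indices and moves each count along with its index, so the output order no longer depends on hash map iteration order. This is a plain C++ analogue, not the Thrust call from the diff; the name `sort_by_key_host` and the `int` key type are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical CPU analogue of a sort-by-key step: sorts `keys` ascending and
// applies the same permutation to `values`, keeping each count with its key.
void sort_by_key_host(std::vector<int>& keys, std::vector<std::int64_t>& values)
{
  assert(keys.size() == values.size());

  // Build the permutation that sorts the keys.
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  // Apply the permutation to both arrays.
  std::vector<int> sorted_keys(keys.size());
  std::vector<std::int64_t> sorted_values(values.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    sorted_keys[i]   = keys[order[i]];
    sorted_values[i] = values[order[i]];
  }
  keys.swap(sorted_keys);
  values.swap(sorted_values);
}
```

The cost of this step grows with the number of distinct rows rather than the number of input rows, which is consistent with the regression being visible only on small inputs.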
Improves histogram reduce aggregation performance by up to 50%.
Improves groupby_histogram performance by up to 30% in most cases (with minor regressions of around 5% on smaller inputs due to the extra sorting step).
  map.insert(pair_iter, pair_iter + input.num_rows(), key_hasher, key_equal, stream.value());
}

auto const key_equal = row_comp.equal_to<false>(has_nulls, null_equality::EQUAL, value_comp);
What if has_nested_columns is true?
Nested types are not yet supported in histogram aggregation, so the current implementation throws an error.
cudf/cpp/src/reductions/histogram.cu
Line 113 in 4a939c4
"Nested types are not yet supported in histogram aggregation.",
Description
Refactors the histogram reduce aggregation and groupby aggregation to use cuco's latest static_map insert_or_apply feature.
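As a rough illustration of the idea behind the refactor, the sketch below builds a histogram with an insert-or-apply pattern on the host: a key that is not yet present is inserted with a count of 1, and a key that already exists has `+= 1` applied to its stored count. It uses `std::unordered_map` as a stand-in and does not reproduce cuco's API or its concurrency behavior; the name `histogram_host` and the `int` key type are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical CPU analogue of an insert-or-apply histogram:
// one map operation per input row, no separate "find then update" pass.
std::unordered_map<int, std::int64_t> histogram_host(std::vector<int> const& rows)
{
  std::unordered_map<int, std::int64_t> counts;
  for (int key : rows) {
    // Try to insert (key, 1); if the key already exists, nothing is inserted.
    auto result = counts.emplace(key, 1);
    // "Apply" branch: the key was present, so combine with the stored count.
    if (!result.second) { result.first->second += 1; }
  }

  // Sanity check: the counts must sum to the number of input rows.
  std::int64_t total = 0;
  for (auto const& entry : counts) { total += entry.second; }
  assert(total == static_cast<std::int64_t>(rows.size()));

  return counts;
}
```

Calling `histogram_host({3, 1, 3, 3, 2, 1})` yields three entries, with key 3 counted three times, key 1 twice, and key 2 once. The iteration order of the resulting map is unspecified, which mirrors why an ordering step is needed when a stable output order is required.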