Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits #9616

Merged
merged 17 commits into from
Nov 11, 2021

Conversation

PointKernel
Copy link
Member

@PointKernel PointKernel commented Nov 5, 2021

Closes #9176

  • Update bitmask_and and bitmask_or to return both resulting mask and count of unset bits
  • Refactor related implementations to use new bitmask_and/or
  • Update unit tests

@PointKernel PointKernel added feature request New feature or request 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function breaking Breaking change labels Nov 5, 2021
@PointKernel PointKernel requested a review from a team November 5, 2021 00:50
@PointKernel PointKernel self-assigned this Nov 5, 2021
@PointKernel PointKernel requested review from hyperbolic2346 and bdice and removed request for a team November 5, 2021 00:50
@PointKernel PointKernel removed the improvement Improvement / enhancement to an existing function label Nov 5, 2021
@codecov
Copy link

codecov bot commented Nov 5, 2021

Codecov Report

Merging #9616 (8844c4e) into branch-21.12 (ab4bfaa) will decrease coverage by 0.10%.
The diff coverage is n/a.

❗ Current head 8844c4e differs from pull request most recent head 9baa00a. Consider uploading reports for the commit 9baa00a to get more accurate results
Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.12    #9616      +/-   ##
================================================
- Coverage         10.79%   10.68%   -0.11%     
================================================
  Files               116      117       +1     
  Lines             18869    19867     +998     
================================================
+ Hits               2036     2123      +87     
- Misses            16833    17744     +911     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.90% <0.00%> (-1.21%) ⬇️
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/hdf.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/abc.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/dlpack.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
... and 67 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f041a47...9baa00a. Read the comment docs.

Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

cpp/src/groupby/sort/aggregate.cpp Outdated Show resolved Hide resolved
@PointKernel PointKernel requested a review from a team as a code owner November 5, 2021 19:06
@PointKernel PointKernel changed the title [WIP] Update bitmask_and to return both resulting mask and count of valid bits Update bitmask_and to return a struct of resulting mask, count of valid bits, and count of null bits Nov 5, 2021
@PointKernel PointKernel added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Nov 5, 2021
Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the good work. This is a nice cleanup!

@bdice
Copy link
Contributor

bdice commented Nov 6, 2021

Quick high level feedback — proper review to follow.

I’m a little unsure about the new struct being named bitmask. It seems like we normally define the RMM device buffer as “the mask” and this name is very generic. Also I’m a little unsure about what the invariants of this type should be. I saw one instance where both the set bits and unset bits were zero, but I expected the invariant (set + unset = total). I feel like this struct might need to carry information about the number of total bits, since that cannot be known from only the device buffer (it’s a byte buffer but the relevant size is in bits and may not align). I’ll think some more and read this more closely on Monday.

edit: My concern about the invariants of the type might have been invalid. I think I overlooked that the set and unset bits were both zero in a case where the code exited early because the total number of rows was also zero. 😛 This looked fine in the other places I saw.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job @PointKernel. I have some suggestions and questions attached. I thought some more since my previous comment and I'm okay with naming the resulting struct bitmask (I couldn't come up with a better name).

cpp/include/cudf/detail/null_mask.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/detail/null_mask.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/detail/null_mask.cuh Outdated Show resolved Hide resolved
cpp/src/bitmask/null_mask.cu Outdated Show resolved Hide resolved
cpp/src/groupby/hash/groupby.cu Outdated Show resolved Hide resolved
@github-actions github-actions bot added the Java Affects Java cuDF API. label Nov 9, 2021
@PointKernel PointKernel changed the title Update bitmask_and to return a struct of resulting mask, count of valid bits, and count of null bits Update bitmask_and and bitmask_or to return a struct of resulting mask, valid count, and null count Nov 9, 2021
@PointKernel PointKernel changed the title Update bitmask_and and bitmask_or to return a struct of resulting mask, valid count, and null count Update bitmask_and and bitmask_or to return a struct of resulting mask and null count Nov 10, 2021
@PointKernel PointKernel changed the title Update bitmask_and and bitmask_or to return a struct of resulting mask and null count Update bitmask_and and bitmask_or to return a pair of resulting mask and null count Nov 10, 2021
Copy link
Contributor

@hyperbolic2346 hyperbolic2346 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this much better without struct bitmask. I also lament not being able to name the returns well, but I feel like a pair of two things is much easier to reason about here.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @PointKernel! I am really happy with how this turned out. I have a couple final comments, resolve them however you see fit. 🚀

cpp/include/cudf/detail/null_mask.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/null_mask.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/null_mask.hpp Outdated Show resolved Hide resolved
cpp/src/binaryop/binaryop.cpp Show resolved Hide resolved
mr);
ret_validity_buffers.push_back(std::move(new_mask));
return std::make_pair(
reinterpret_cast<bitmask_type const*>(ret_validity_buffers.back().data()), null_count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider static_cast here, which should be safer than reinterpret_cast for the void* returned by rmm::device_buffer's data() method.

Suggested change
reinterpret_cast<bitmask_type const*>(ret_validity_buffers.back().data()), null_count);
static_cast<bitmask_type const*>(ret_validity_buffers.back().data()), null_count);

Side note: There are two other reinterpret_casts in this file that I think are safe to delete (their values already seem to be the right types) if we wish to eliminate the appearance of potentially unsafe reinterpret_casts in this file.

cpp/include/cudf/detail/null_mask.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/null_mask.hpp Outdated Show resolved Hide resolved
@PointKernel PointKernel changed the title Update bitmask_and and bitmask_or to return a pair of resulting mask and null count Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits Nov 10, 2021
@PointKernel
Copy link
Member Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit ba2b51d into rapidsai:branch-21.12 Nov 11, 2021
@PointKernel PointKernel deleted the count-valid-bits branch November 11, 2021 18:49
rapids-bot bot pushed a commit that referenced this pull request Jan 12, 2022
This PR updates the function `cudf::detail::inplace_bitmask_and` to return the null count of the result. This change aligns `inplace_bitmask_and` with behavior changes introduced in #9616 to return null counts from functions acting on bitmasks. This will be helpful for #9621.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Robert Maynard (https://github.com/robertmaynard)

URL: #9904
@vyasr vyasr added 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs cuDF (Java) Reviewer labels Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond breaking Breaking change feature request New feature or request Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Update bitmask_and to also compute count of valid bits
6 participants