Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add support for select_k on CSR matrix #2140

Merged
merged 23 commits into from
Apr 8, 2024

Conversation

rhdong
Copy link
Member

@rhdong rhdong commented Jan 31, 2024

Approvers:

- This PR is one part of the feature of rapidsai#1969
- Add the API of 'select_k' accepting CSR as input
- Add the API of 'segmented_copy'

Authors:
  - James Rong (https://github.com/rhdong)

Approvers:
  - Ben Frederickson (https://github.com/benfred)
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)
@rhdong rhdong requested review from a team as code owners January 31, 2024 16:12
@rhdong rhdong added enhancement New feature or request feature request New feature or request non-breaking Non-breaking change 3 - Ready for Review labels Jan 31, 2024
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
I only have a concern regarding the use of the memory resources and device properties. In general, we should use raft "workspace resource" for all temporary allocations in raft algorithms. This gives a greater control, the explicit interface to query the available memory, and slightly better performance in some edge cases.
For your reference, there's another PR that fixes this for all other parts of matrix::select_k: #1786

cpp/include/raft/matrix/detail/matrix.cuh Outdated Show resolved Hide resolved
cpp/include/raft/matrix/detail/select_k-inl.cuh Outdated Show resolved Hide resolved
cpp/include/raft/matrix/detail/select_k-inl.cuh Outdated Show resolved Hide resolved
cpp/include/raft/matrix/detail/select_k-inl.cuh Outdated Show resolved Hide resolved
@rhdong
Copy link
Member Author

rhdong commented Jan 31, 2024

Thanks for the PR! I only have a concern regarding the use of the memory resources and device properties. In general, we should use raft "workspace resource" for all temporary allocations in raft algorithms. This gives a greater control, the explicit interface to query the available memory, and slightly better performance in some edge cases. For your reference, there's another PR that fixes this for all other parts of matrix::select_k: #1786

Thank you for pointing this out; let me fix it!

@cjnolet
Copy link
Member

cjnolet commented Mar 4, 2024

@rhdong can you fix the merge conflicts in this PR please?

@rhdong rhdong changed the base branch from branch-24.04 to branch-24.06 March 30, 2024 08:30
@rhdong
Copy link
Member Author

rhdong commented Apr 1, 2024

Hi @achirkin, @benfred, @cjnolet , when you have a moment, could you please help review this PR? Thank you!

Here is the benchmark on A100 with 80GB PCIE.

  • Try to reuse the CUDA code of select_k as much as possible without hurting its performance.

Original SelectK:

-------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------
SelectK/float/uint32_t/kAuto/8/manual_time                        0.629 ms        0.712 ms         1113 20000#500#256
SelectK/float/uint32_t/kAuto/17/manual_time                       0.296 ms        0.380 ms         2360 1000#10000#256
SelectK/float/uint32_t/kAuto/26/manual_time                       0.400 ms        0.483 ms         1751 100#100000#256
SelectK/float/uint32_t/kAuto/35/manual_time                       0.487 ms        0.570 ms         1439 10#1000000#256

After-optimized SelectK:

-------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------
SelectK/float/uint32_t/kAuto/8/manual_time                        0.623 ms        0.702 ms         1122 20000#500#256
SelectK/float/uint32_t/kAuto/17/manual_time                       0.322 ms        0.403 ms         2177 1000#10000#256
SelectK/float/uint32_t/kAuto/26/manual_time                       0.354 ms        0.436 ms         1978 100#100000#256
SelectK/float/uint32_t/kAuto/35/manual_time                       0.428 ms        0.508 ms         1634 10#1000000#256

SelectK_CSR(Sparisity 0.1/0.2/0.5):

-------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------
SelectKCsrTest<float, uint32_t>/8/manual_time                     0.104 ms        0.186 ms         6692 20000#500#256#0.1
SelectKCsrTest<float, uint32_t>/17/manual_time                    0.115 ms        0.198 ms         6125 1000#10000#256#0.1
SelectKCsrTest<float, uint32_t>/26/manual_time                    0.135 ms        0.219 ms         5178 100#100000#256#0.1
SelectKCsrTest<float, uint32_t>/35/manual_time                    0.214 ms        0.297 ms         3232 10#1000000#256#0.1

SelectKCsrTest<float, uint32_t>/78/manual_time                    0.175 ms        0.257 ms         4006 20000#500#256#0.2
SelectKCsrTest<float, uint32_t>/87/manual_time                    0.161 ms        0.245 ms         4361 1000#10000#256#0.2
SelectKCsrTest<float, uint32_t>/96/manual_time                    0.194 ms        0.277 ms         3616 100#100000#256#0.2
SelectKCsrTest<float, uint32_t>/105/manual_time                   0.268 ms        0.352 ms         2609 10#1000000#256#0.2

SelectKCsrTest<float, uint32_t>/148/manual_time                   0.370 ms        0.452 ms         1896 20000#500#256#0.5
SelectKCsrTest<float, uint32_t>/157/manual_time                   0.240 ms        0.323 ms         2922 1000#10000#256#0.5
SelectKCsrTest<float, uint32_t>/166/manual_time                   0.276 ms        0.359 ms         2541 100#100000#256#0.5
SelectKCsrTest<float, uint32_t>/175/manual_time                   0.353 ms        0.436 ms         2005 10#1000000#256#0.5

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it's nice to see much of the select-k code being reused with minimal modifications! Here's my input; mostly minor things, but please check the alignment of the buffers in the radix select.

cpp/include/raft/matrix/detail/select_k-inl.cuh Outdated Show resolved Hide resolved
cpp/test/matrix/select_k_csr.cu Outdated Show resolved Hide resolved
cpp/include/raft/matrix/copy.cuh Outdated Show resolved Hide resolved
cpp/include/raft/matrix/copy.cuh Outdated Show resolved Hide resolved
cpp/include/raft/matrix/detail/matrix.cuh Outdated Show resolved Hide resolved
@rhdong
Copy link
Member Author

rhdong commented Apr 2, 2024

Thanks, it's nice to see much of the select-k code being reused with minimal modifications! Here's my input; mostly minor things, but please check the alignment of the buffers in the radix select.

Hi @achirkin, Thank you for these valuable review comments, I will fix them as soon as possible!

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments, LGTM

@@ -44,6 +45,16 @@ void select_k(raft::resources const& handle,
bool sorted = false,
SelectAlgo algo = SelectAlgo::kAuto,
const IdxT* len_i = nullptr) RAFT_EXPLICIT;

template <typename T, typename IdxT>
void select_k(raft::resources const& handle,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, @rhdong, I do have a nitpick here, now that I finally have time to go through this PR. I'm okay doing this change in an immediate follow-on, but the sparse k-selection should be in raft/sparse/matrix, not in the dense APIs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cjnolet , thank you for the comment! I just moved the public API to the raft::sparse scope. Considering many shared source codes, I temporarily didn't move more code to the sparse directory like detail::select_k; if you prefer more, please let me know.

Copy link
Member

@cjnolet cjnolet Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @rhdong please move the impl details over to sparse also. This is introducing too much confusion having them not be co-located. Also, where possible we should have implementation details calling into public apis for different namespaces.

Copy link
Member Author

@rhdong rhdong Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @rhdong please move the impl details over to sparse also. This is introducing too much confusion having them not be co-located.

Sure, I will.

@cjnolet
Copy link
Member

cjnolet commented Apr 8, 2024

/merge

@rapids-bot rapids-bot bot merged commit 4a20d03 into rapidsai:branch-24.06 Apr 8, 2024
73 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge CMake cpp enhancement New feature or request feature request New feature or request non-breaking Non-breaking change Public API
Projects
Development

Successfully merging this pull request may close these issues.

3 participants