Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support scatter() for list columns #6768

Merged
merged 15 commits into from
Nov 21, 2020

Conversation

mythrocks
Copy link
Contributor

This change adds support for cudf::scatter() to be called on list columns. This implementation supports the following child types:

  1. Fixed width
  2. Strings
  3. List columns (containing further types)

Struct support will be added as a follow-on.
The implementation follows a similar approach to cudf::scatter() for string_view. A list_device_view was added to represent the device view of a list row.
(Note: Collapsing this into list_view is proving difficult because list_view is included in types.hpp, and the dependencies form a Gordian Knot.)

Supports lists of:
  1. fixed width data types
  2. strings
  3. lists (of all of the above, and lists(of ...))
1. Additional type checking for child columns
2. Moved list_device_view functions inline
@GPUtester
Copy link
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@codecov
Copy link

codecov bot commented Nov 14, 2020

Codecov Report

Merging #6768 (845ea85) into branch-0.17 (71d4c34) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           branch-0.17    #6768   +/-   ##
============================================
  Coverage        81.87%   81.87%           
============================================
  Files               96       96           
  Lines            16079    16079           
============================================
  Hits             13165    13165           
  Misses            2914     2914           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 71d4c34...845ea85. Read the comment docs.

@mythrocks mythrocks self-assigned this Nov 15, 2020
@mythrocks mythrocks added 3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond labels Nov 15, 2020
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a challenging PR. Nice work, it's overall very clean and understandable.

My suggestions are in two main categories:

  1. Keeping up with in-progress work to replace cudaStream_t with rmm::cuda_stream_view
  2. Better parallelism for the scatter offset computation.

cpp/include/cudf/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/list_device_view.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/list_device_view.cuh Outdated Show resolved Hide resolved
Copy link
Contributor

@devavret devavret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very proper. No further comment apart from the ones already posted my Mark.

cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
@mythrocks
Copy link
Contributor Author

@devavret, @harrism, would you be averse to our checking this version of scatter() for lists into 0.17?

The main thing pending here is @harrism's suggestion better parallelize child-column construction. For now, it's only as efficient as scatter() for string_view. As much as I'd like to sort this out immediately, the calculation is slightly non-trivial. We have a possible solution, but I'd like to tackle it outside of this issue. (Apropos, I have raised #6791. If time permits, I'll hit this next.)

@harrism
Copy link
Member

harrism commented Nov 19, 2020

Just need @devavret to click approve unless he has other changes to request.

@mythrocks
Copy link
Contributor Author

Thanks for the review, @harrism. I'll try get to #6791 at the earliest opportunity.

@mythrocks
Copy link
Contributor Author

Re-upped, after resolving conflicts.

@mythrocks
Copy link
Contributor Author

Thanks for the reviews, @harrism, @devavret!

I'm investigating the build failure indicated in the failing-check:

/var/lib/jenkins/workspace/rapidsai/gpuci-a100/cudf/prb/cudf-gpu-build/cpp/include/cudf/detail/scatter.cuh(149): error: function "cudf::detail::column_scatterer_impl<cudf::list_view, MapIterator>::operator() [with MapIterator=thrust::transform_iterator<cudf::detail::index_converter<cudf::size_type>, cudf::detail::input_indexalator, thrust::use_default, thrust::use_default>]" cannot be called with the given argument list
            argument types are: (const cudf::column_view, thrust::transform_iterator<cudf::detail::index_converter<cudf::size_type>, cudf::detail::input_indexalator, thrust::use_default, thrust::use_default>, thrust::transform_iterator<cudf::detail::index_converter<cudf::size_type>, cudf::detail::input_indexalator, thrust::use_default, thrust::use_default>, const cudf::column_view, rmm::cuda_stream_view, rmm::mr::device_memory_resource *)
            object type is: cudf::detail::column_scatterer_impl<cudf::list_view, thrust::transform_iterator<cudf::detail::index_converter<cudf::size_type>, cudf::detail::input_indexalator, thrust::use_default, thrust::use_default>>
          detected during:
            instantiation of "std::unique_ptr<cudf::column, std::default_delete<cudf::column>> cudf::detail::column_scatterer<MapIterator>::operator()<Element>(const cudf::column_view &, MapIterator, MapIterator, const cudf::column_view &, rmm::cuda_stream_view, rmm::mr::device_memory_resource *) const [with MapIterator=thrust::transform_iterator<cudf::detail::index_converter<cudf::size_type>, cudf::detail::input_indexalator, thrust::use_default, thrust::use_default>, Element=cudf::list_view]" 
/usr/include/c++/7/bits/stl_algo.h(4345): here
            instantiation of "_OIter std::transform(_IIter1, _IIter1, _IIter2, _OIter, _BinaryOperation) [with _IIter1=__gnu_cxx::__normal_iterator<const cudf::column_view *, std::vector<cudf::column_view, std::allocator<cudf::column_view>>>, _IIter2=__gnu_cxx::__normal_iterator<const cudf::column_view *, std::vector<cudf::column_view, std::allocator<cudf::column_view>>>, _OIter=__gnu_cxx::__normal_iterator<std::unique_ptr<cudf::column, std::default_delete<cudf::column>> *, std::vector<std::unique_ptr<cudf::column, std::default_delete<cudf::column>>, std::allocator<std::unique_ptr<cudf::column, std::default_delete<cudf::column>>>>>, _BinaryOperation=lambda [](const auto &, const auto &)->auto]" 
(298): here
            instantiation of "std::unique_ptr<cudf::table, std::default_delete<cudf::table>> cudf::detail::scatter(const cudf::table_view &, MapIterator, MapIterator, const cudf::table_view &, __nv_bool, rmm::cuda_stream_view, rmm::mr::device_memory_resource *) [with MapIterator=cudf::detail::input_indexalator]" 
/var/lib/jenkins/workspace/rapidsai/gpuci-a100/cudf/prb/cudf-gpu-build/cpp/src/copying/scatter.cu(260): here

@harrism
Copy link
Member

harrism commented Nov 20, 2020

Maybe because my cuda_stream_view PR merged?

@mythrocks
Copy link
Contributor Author

Maybe because my cuda_stream_view PR merged?

Yep, that was it. I'm rebuilding and rerunning my tests at my end. I'll post the corrected version shortly.

@mythrocks
Copy link
Contributor Author

mythrocks commented Nov 20, 2020

I've pushed a fix for the build break. I have to remind myself that the CI doesn't have REPEATABLE_READ for running the CI tests. :]

@mythrocks mythrocks merged commit b391cb7 into rapidsai:branch-0.17 Nov 21, 2020
@mythrocks mythrocks deleted the scatter-lists branch November 21, 2020 19:37
@mythrocks
Copy link
Contributor Author

I've committed this now. Thank you so much for the reviews, all.

harrism pushed a commit that referenced this pull request Nov 26, 2020
Addendum to #6768, to support scatter on columns of type list<struct>. The hard part was the tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants