Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for scatter() on lists-of-struct columns #6817

Merged
merged 10 commits into from
Nov 26, 2020

Conversation

mythrocks
Copy link
Contributor

@mythrocks mythrocks commented Nov 20, 2020

Addendum to #6768, to support scatter on columns of type list<struct>.

Depends on #6768.

The hard part was the tests.

(This branch is based off of that in #6768, and shows all commits from that PR. Only the last 4 commits are material to struct_view support.)

@GPUtester
Copy link
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@mythrocks mythrocks self-assigned this Nov 20, 2020
@mythrocks mythrocks added the 2 - In Progress Currently a work in progress label Nov 20, 2020
@mythrocks mythrocks marked this pull request as draft November 20, 2020 10:09
@mythrocks
Copy link
Contributor Author

mythrocks commented Nov 20, 2020

The for-loop at scatter.cuh#L662 couldn't be boiled down to a 2-sequence std::transform() without obfuscating the logic with a bunch of captures. I'm open to suggestions there. :/
Edit: How fortunate that @codereport is tagged on this too. :]

@mythrocks
Copy link
Contributor Author

I'll leave this as a draft PR until its dependency (#6768) is committed.

@codecov
Copy link

codecov bot commented Nov 21, 2020

Codecov Report

Merging #6817 (cc73681) into branch-0.17 (f3b0e06) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           branch-0.17    #6817   +/-   ##
============================================
  Coverage        81.94%   81.94%           
============================================
  Files               96       96           
  Lines            16164    16164           
============================================
  Hits             13246    13246           
  Misses            2918     2918           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f3b0e06...cc73681. Read the comment docs.

@mythrocks mythrocks changed the title Support scatter() for list<struct> Add support for scatter() on lists-of-struct columns Nov 21, 2020
@mythrocks
Copy link
Contributor Author

I'll leave this as a draft PR until its dependency (#6768) is committed.

#6768 has been committed. I'll wait for the CI to complete on this PR, and switch out of DRAFT mode.

@mythrocks mythrocks marked this pull request as ready for review November 22, 2020 00:49
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A few small comments.

cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/lists/detail/scatter.cuh Outdated Show resolved Hide resolved
1. Review: Switch loop to std::transform()
2. Review: Remove superfluous comments
3. Review: Correct the use of cuda_stream_view
@harrism harrism added the libcudf Affects libcudf (C++/CUDA) code. label Nov 24, 2020
@mythrocks mythrocks added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Nov 24, 2020
Copy link
Contributor

@codereport codereport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. For the unit tests, I find them pretty hard to read the way they are written, with all of the varying "parameter list indentation" and vertical lists for initializer lists, i.e.:

 auto expected_strings =
    strings_column_wrapper{{"eight",
                            "eight",
                            "eight",
                            "one",
                            "one",
                            "nine",
                            "nine",
                            "nine",
                            "nine",
                            "three",
                            "three",
                            "four",
                            "four",
                            "five",
                            "five"},
                           make_counting_transform_iterator(0, [](auto i) { return i != 1; })};

It's hard to give comments though on changes to make. Ideally though, I aim for unit tests that end up looking like this:

TYPED_TEST(FixedPointTypes, FixedPointSimple)
{
  using namespace numeric;
  using decimalXX  = TypeParam;
  using RepType    = cudf::device_storage_type_t<decimalXX>;
  using fp_wrapper = cudf::test::fixed_point_column_wrapper<RepType>;

  auto const mask     = cudf::test::fixed_width_column_wrapper<bool>{0, 1, 1, 1, 0, 0};
  auto const a        = fp_wrapper{{110, 220, 330, 440, 550, 660}, scale_type{-2}};
  auto const b        = fp_wrapper{{0, 0, 0, 0, 0, 0}, scale_type{-2}};
  auto const expected = fp_wrapper{{0, 220, 330, 440, 0, 0}, scale_type{-2}};
  auto const result   = cudf::copy_if_else(a, b, mask);

  CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result->view());
}

I might set up a follow up PR / give a better code talk at some point to share my thoughts / see what people think.

@mythrocks
Copy link
Contributor Author

@codereport, I agree about the indentation in the tests. These did not start out looking this way.I can try disabling the code formatter around the column definitions.

@codereport
Copy link
Contributor

@codereport, I agree about the indentation in the tests. These did not start out looking this way.I can try disabling the code formatter around the column definitions.

Yea, my recommendation would be to try and write the code in a way that plays well with clang-format. I intentionally keep my "literal" tests (aka tests that have input literally written out) small so that they don't line break. For "comprehensive" unit tests, i do the same with thrust iterators.

@mythrocks
Copy link
Contributor Author

mythrocks commented Nov 25, 2020

Yea, my recommendation would be to try and write the code in a way that plays well with clang-format.

I'm not sure to what degree this can be done, given the complexity of list<struct> construction. I'm in the process of giving this a go. I should have an updated version shortly.

P.S. I wasn't a fan of how clang-format lands up removing the list grouping in the child columns. I'm putting this back in manually.

@mythrocks
Copy link
Contributor Author

I should have an updated version shortly.

Done. Resolved merge conflicts, and added (arguably) better formatting for literal columns used in list<struct> tests.

@harrism
Copy link
Member

harrism commented Nov 26, 2020

I don't like turning clang-format off for more than a few lines. If someone comes in and changes some code and relies on clang-format to clean it up, we will end up with an accretion of terrible formatting. If it's too verbose to create a list<struct>, then use aliases and/or write a convenience wrapper.

@mythrocks
Copy link
Contributor Author

I don't like turning clang-format off for more than a few lines...

I've now restricted where clang-format is turned off to just the literal definitions, so that it's still clear where the list rows start/end.

@harrism
Copy link
Member

harrism commented Nov 26, 2020

I don't like turning clang-format off for more than a few lines...

I've now restricted where clang-format is turned off to just the literal definitions, so that it's still clear where the list rows start/end.

Thanks, that is much nicer.

@harrism harrism merged commit c34d9bf into rapidsai:branch-0.17 Nov 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants