add a resource adapter to align on a specified size #768

rongou · 2021-05-04T21:20:06Z

Fixes #762

include/rmm/mr/device/block_aligned_resource_adaptor.hpp

tests/mr/device/block_aligned_mr_tests.cpp

include/rmm/mr/device/aligned_resource_adaptor.hpp

tests/mr/device/aligned_mr_tests.cpp

include/rmm/mr/device/aligned_resource_adaptor.hpp

hyperbolic2346 · 2021-05-17T16:32:03Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+ * Since a larger alignment size has some additional overhead, the user can specify a threshold
+ * size. If an allocation's size falls below the threshold, it is aligned to the default size. Only
+ * allocations with a size above the threshold are aligned to the custom alignment size.


I still don't understand this part of the change. Why would you not want to align smaller allocations? If GDS requires it, it requires it. If you don't want it, use a different adaptor for those allocations. I get that a small allocation is wasteful if the alignment is high, but I don't see why we would want to return an allocation that wasn't aligned properly.

Roughly speaking, there are two types of allocations. The first is the small allocations needed for, say, a boolean mask. These have a pretty short lifetime and would never be written out to disk. The second type is the large buffers that hold columnar data. They are longer lived, can be used for shuffle, and may be spilled to host memory/disk or GDS when GPU memory is full. Only these large buffers that are read/written by GDS need to be 4k aligned.

If we count allocations by size, their frequencies tend to follow a Zipf/Pareto distribution, with smaller allocations appearing more frequently. If we 4k-aligned all of them, not only would we be wasting GPU memory, but we would also add overhead for keeping track of the mapping. In a long-running job that's already under memory pressure, this could add noticeable overhead.

To be clear, the smaller allocations are still aligned by upstream memory resources (both pool and arena mr align them to 256); they are just not aligned to the bigger size.

Should the allocations that don't care about alignment use a different adaptor then? One adaptor with no alignment requirements and one with alignment? It seems very odd to me to have an aligned allocator not align certain allocations. If the allocator aligns data to a specific alignment, I would expect that all allocations are aligned if they go through that adaptor.

In theory yes, but then you'd have the same branching logic in the client code. Right now for the cuDF JNI layer, we only have one device memory resource.

You can always set the threshold to 0 and then all allocations are aligned the same way. Having this here is really just a convenience for our use case.

BTW, GDS does handle unaligned buffers; it just copies them to internal bounce buffers, so it's slightly less efficient.
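A minimal sketch of the threshold rule under discussion, written as a free function rather than the adaptor's `do_allocate`. The name `needs_custom_alignment` is hypothetical, and 256 bytes is assumed as the default upstream alignment; the condition is the negation of the pass-through branch in `do_allocate` (`allocation_alignment_ == default_allocation_alignment || bytes < alignment_threshold_`).

```cpp
#include <cstddef>

// Assumed default alignment of the upstream resources (pool and arena align to 256).
constexpr std::size_t default_alignment = 256;

// Hypothetical helper: true when a request should get the custom alignment,
// false when do_allocate would pass it straight through to the upstream resource.
constexpr bool needs_custom_alignment(std::size_t bytes,
                                      std::size_t allocation_alignment,
                                      std::size_t alignment_threshold)
{
  // A custom alignment equal to the default needs no extra work.
  if (allocation_alignment == default_alignment) { return false; }
  // Small, short-lived buffers below the threshold keep the default alignment.
  return bytes >= alignment_threshold;
}
```

With a threshold of 0, `needs_custom_alignment(16, 4096, 0)` is true, so every request is aligned the same way; with a 64 KiB threshold, `needs_custom_alignment(16, 4096, 65536)` is false and a 16-byte mask keeps the default alignment.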

hyperbolic2346 · 2021-05-17T16:33:50Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+  /**
+   * @brief Construct an aligned resource adaptor using `upstream` to satisfy allocation requests.
+   *
+   * @throws `rmm::logic_error` if `upstream == nullptr`


Suggested change

* @throws `rmm::logic_error` if `upstream == nullptr`

* @throws `rmm::logic_error` if `upstream == nullptr`

* @throws `rmm::logic_error` if `allocation_alignment` is not a multiple of 256

* @throws `rmm::logic_error` if `alignment_threshold` is not a multiple of 256

hyperbolic2346 · 2021-05-17T16:34:39Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+      alignment_threshold_{alignment_threshold}
+  {
+    RMM_EXPECTS(nullptr != upstream, "Unexpected null upstream resource pointer.");
+    RMM_EXPECTS(allocation_alignment % 256 == 0, "Allocation alignment is not a multiple of 256.");


Why is this a requirement? The alignment supported could be arbitrary, but the use of detail::align_up forces this to be a power of 2. This code would pass for an alignment of 768 bytes, but fail in the call to detail::align_up.
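One way to close the gap the comment points out is to require a power of two instead of a multiple of 256. The sketch below is hypothetical: it throws `std::invalid_argument` instead of `rmm::logic_error` so it has no RMM dependency, and it assumes 256 as the default alignment. Any power of two that is at least 256 is automatically a multiple of 256, and a power of two is what a mask-based `align_up` needs.

```cpp
#include <cstddef>
#include <stdexcept>

// Assumed default CUDA allocation alignment (the constructor checks use 256).
constexpr std::size_t default_alignment = 256;

// True if `value` is a non-zero power of two.
constexpr bool is_pow2(std::size_t value) { return value != 0 && (value & (value - 1)) == 0; }

// Hypothetical stricter check: a power of two >= 256 is also a multiple of 256.
constexpr bool is_supported_alignment(std::size_t alignment)
{
  return is_pow2(alignment) && alignment >= default_alignment;
}

// Stand-in for the constructor's RMM_EXPECTS check.
inline void validate_alignment(std::size_t alignment)
{
  if (!is_supported_alignment(alignment)) {
    throw std::invalid_argument("alignment must be a power of two and at least 256");
  }
}
```

768 satisfies `% 256 == 0` but is rejected by `is_supported_alignment`, which is exactly the case that would otherwise fail later inside `detail::align_up`.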

hyperbolic2346 · 2021-05-17T16:36:26Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+   */
+  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
+  {
+    if (allocation_alignment_ == default_allocation_alignment || bytes < alignment_threshold_) {


This seems dangerous. What happens if CUDA changes the default alignment? This could very easily be forgotten and result in a bug.

This is just an adapter; the upstream memory resource should properly align the allocation.

Well, it will align it however it wants, which may not match the if statement here. In the admittedly unlikely event that CUDA started to return 16-byte-aligned memory, this code would still pass requests straight through to it when the user asked for 256-byte alignment, unless someone remembered to adjust it. At that point this would be incorrect.

This is certainly a pedantic complaint at best, but I have seen bugs like this in the past and I would like to avoid such things as they are difficult to track. If we can't find the default alignment from CUDA, I am ok with letting this go.

Good point. We have 256 hard coded all over the place. I added a constant in aligned.hpp and replaced all the hard-coded values.
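A sketch of how a single named constant can be paired with a debug-only guard on the pass-through path, so a change in the upstream's real alignment fails loudly instead of silently. `cuda_allocation_alignment`, `is_pointer_aligned`, and `checked_pass_through` are illustrative names: the first stands in for the constant the reply says was added to aligned.hpp, whose actual name may differ. The `inline constexpr` variable assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative stand-in for the single constant added to aligned.hpp.
inline constexpr std::size_t cuda_allocation_alignment{256};

// True if the address of `ptr` is a multiple of `alignment`.
inline bool is_pointer_aligned(void const* ptr, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical debug guard for the pass-through branch: if the upstream ever stops
// honoring the assumed default alignment, the assert fires (debug builds only).
inline void* checked_pass_through(void* upstream_ptr)
{
  assert(is_pointer_aligned(upstream_ptr, cuda_allocation_alignment) &&
         "upstream no longer returns cuda_allocation_alignment-aligned memory");
  return upstream_ptr;
}
```

Under `NDEBUG` the assert compiles away, so the guard costs nothing in release builds.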

hyperbolic2346 · 2021-05-17T16:37:11Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+      auto const address         = reinterpret_cast<std::size_t>(pointer);
+      auto const aligned_address = rmm::detail::align_up(address, allocation_alignment_);
+      void* aligned_pointer      = reinterpret_cast<void*>(aligned_address);
+      if (pointer != aligned_pointer) {


This check is a nice touch.
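The over-allocate, align-up, remember-the-original pattern in this hunk (together with the later "keep track of aligned pointers in a hash map" commit) can be exercised on the host. This is a sketch under stated assumptions, not the adaptor itself: `std::malloc` stands in for the device upstream, `alignof(std::max_align_t)` stands in for the 256-byte default, and the class name `aligned_host_allocator` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <new>
#include <unordered_map>

// Host-only sketch: over-allocate, round the address up, and remember the
// original pointer whenever rounding moved it.
class aligned_host_allocator {
 public:
  explicit aligned_host_allocator(std::size_t alignment) : alignment_{alignment}
  {
    assert(alignment_ != 0 && (alignment_ & (alignment_ - 1)) == 0);  // power of two
  }

  void* allocate(std::size_t bytes)
  {
    // Worst-case gap between a base_alignment-aligned address and the next boundary.
    std::size_t const padding = alignment_ > base_alignment ? alignment_ - base_alignment : 0;
    void* pointer = std::malloc(align_up(bytes, alignment_) + padding);
    if (pointer == nullptr) { throw std::bad_alloc{}; }

    auto const address    = reinterpret_cast<std::uintptr_t>(pointer);
    void* aligned_pointer = reinterpret_cast<void*>(align_up(address, alignment_));
    if (pointer != aligned_pointer) {
      std::lock_guard<std::mutex> lock(mtx_);
      pointers_.emplace(aligned_pointer, pointer);  // aligned -> original
    }
    return aligned_pointer;
  }

  void deallocate(void* aligned_pointer)
  {
    void* original = aligned_pointer;  // unchanged if rounding did not move it
    {
      std::lock_guard<std::mutex> lock(mtx_);
      auto const it = pointers_.find(aligned_pointer);
      if (it != pointers_.end()) {
        original = it->second;
        pointers_.erase(it);
      }
    }
    std::free(original);
  }

 private:
  static constexpr std::size_t base_alignment = alignof(std::max_align_t);

  static std::uintptr_t align_up(std::uintptr_t value, std::uintptr_t alignment)
  {
    return (value + alignment - 1) & ~(alignment - 1);
  }

  std::size_t alignment_;
  std::mutex mtx_;
  std::unordered_map<void*, void*> pointers_;
};
```

Only pointers that actually moved are stored, so when the upstream already returns a suitably aligned address the map stays empty. The mutex is there because a single allocator may be shared across threads.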

hyperbolic2346 · 2021-05-17T16:41:25Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+    if (this == &other)
+      return true;
+    else {
+      auto aligned_other = dynamic_cast<aligned_resource_adaptor<Upstream> const*>(&other);


I don't think this will do what you want. If the adaptors are not the exact same adaptor, you will check the upstreams for equality, but are not verifying this level at all. I would expect checks for the alignment values and thresholds.
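A sketch of the equality check the review asks for. It uses simplified, hypothetical stand-in classes (`resource`, `aligned_adaptor`) rather than RMM's `device_memory_resource` hierarchy so that it compiles on its own. Two adaptors compare equal only if their alignment and threshold match and their upstreams compare equal.

```cpp
#include <cassert>
#include <cstddef>

// Minimal polymorphic base, standing in for a memory resource interface.
struct resource {
  virtual ~resource() = default;
  bool is_equal(resource const& other) const noexcept { return do_is_equal(other); }

 private:
  virtual bool do_is_equal(resource const& other) const noexcept { return this == &other; }
};

class aligned_adaptor final : public resource {
 public:
  aligned_adaptor(resource* upstream, std::size_t alignment, std::size_t threshold)
    : upstream_{upstream}, alignment_{alignment}, threshold_{threshold}
  {
    assert(upstream_ != nullptr);
  }

 private:
  bool do_is_equal(resource const& other) const noexcept override
  {
    if (this == &other) { return true; }
    auto const* cast = dynamic_cast<aligned_adaptor const*>(&other);
    if (cast == nullptr) { return false; }
    // Compare this level's settings first, then the upstream resources.
    return alignment_ == cast->alignment_ && threshold_ == cast->threshold_ &&
           upstream_->is_equal(*cast->upstream_);
  }

  resource* upstream_;
  std::size_t alignment_;
  std::size_t threshold_;
};
```

Two adaptors over the same upstream with different thresholds now compare unequal, which the upstream-only comparison would have missed.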

hyperbolic2346 · 2021-05-17T16:43:07Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+   */
+  [[nodiscard]] std::pair<size_t, size_t> do_get_mem_info(cuda_stream_view stream) const override
+  {
+    return upstream_->get_mem_info(stream);


This isn't 100% accurate, and I think we should indicate that somewhere. If I have an alignment of 4 megs and attempt to allocate all the memory this reports as available, the allocation will probably fail due to the alignment padding. I think adding a comment in the docs above the function is sufficient, though.
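To make the caveat concrete: if the adaptor pads each request to `align_up(bytes, alignment) + (alignment - 256)` (the rule in `upstream_allocation_size`), the largest single request that fits in the reported free memory can be computed as below. The helper name is hypothetical, 256 is assumed as the default alignment, and fragmentation is ignored, so the free memory is treated as one contiguous range.

```cpp
#include <cstddef>

// Assumed default alignment the upstream already guarantees.
constexpr std::size_t default_alignment = 256;

// Round down to a power-of-two alignment.
constexpr std::size_t align_down(std::size_t value, std::size_t alignment)
{
  return value & ~(alignment - 1);
}

// Hypothetical helper: largest request that still fits in `free_bytes` after the
// adaptor pads it. Precondition: alignment is a power of two and >= default_alignment.
constexpr std::size_t max_aligned_request(std::size_t free_bytes, std::size_t alignment)
{
  std::size_t const padding = alignment - default_alignment;
  if (free_bytes < padding) { return 0; }
  return align_down(free_bytes - padding, alignment);
}
```

With 16 MiB reported free and a 4 MiB alignment, this gives 12 MiB: a 16 MiB request would need 16 MiB plus almost 4 MiB of padding from the upstream.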

hyperbolic2346 · 2021-05-17T16:44:45Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+   */
+  std::size_t upstream_allocation_size(std::size_t bytes) const
+  {
+    auto const aligned_size = detail::align_up(bytes, allocation_alignment_);


Why are we padding out the allocation to the alignment size? Is the expectation that this adaptor is the only one active on the GPU and we want the next allocation to come back aligned? If so, we're even more wasteful, because of the (alignment - default alignment) padding being added on the next line. I'm not understanding the value of this.

For GDS to consider a buffer to be 4k aligned, both the base pointer and size need to be multiples of 4k.

I would expect that the incoming request for GDS is aligned to 4k then, not that the rmm adaptor makes this assumption for all incoming requests.

The allocation requests could come from any operator, which may not be aware of GDS at all.
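The size rule described in this thread can be written out as arithmetic. This sketch assumes the upstream returns 256-byte-aligned pointers and that the alignment is a power of two; the free functions are illustrations, not RMM's `detail` helpers.

```cpp
#include <cstddef>

// Assumed alignment the upstream resource already guarantees.
constexpr std::size_t default_alignment = 256;

// Round up to a power-of-two alignment.
constexpr std::size_t align_up(std::size_t value, std::size_t alignment)
{
  return (value + alignment - 1) & ~(alignment - 1);
}

// Sketch of the size rule: make the length a multiple of `alignment` (GDS wants both the
// base pointer and the size aligned), then add worst-case room to slide a
// 256-byte-aligned base pointer up to the next `alignment` boundary.
constexpr std::size_t upstream_allocation_size(std::size_t bytes, std::size_t alignment)
{
  return align_up(bytes, alignment) + alignment - default_alignment;
}
```

For a 5000-byte request at 4096-byte alignment, this asks the upstream for 8192 + 3840 = 12032 bytes. That extra `alignment - 256` per allocation is the overhead that motivates the threshold for small allocations.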

hyperbolic2346 · 2021-05-17T16:58:53Z

tests/mr/device/aligned_mr_tests.cpp

+{
+  mock_resource mock;
+  auto construct_alignment = [](auto* r, std::size_t a) { aligned_adaptor mr{r, a}; };
+  EXPECT_THROW(construct_alignment(&mock, 255), rmm::logic_error);


I don't understand why we are mocking these things here. Why aren't we testing the actual resource?

We are; the mock is the upstream resource, which we don't really care about.

hyperbolic2346 · 2021-05-17T16:59:31Z

tests/mr/device/aligned_mr_tests.cpp

+  cuda_stream_view stream;
+  void* pointer = reinterpret_cast<void*>(123);
+  // device_memory_resource aligns to 8.
+  EXPECT_CALL(mock, do_allocate(8, stream)).WillOnce(Return(pointer));


This seems to circumvent the entire class we are testing. Am I reading this incorrectly?

The mock is the upstream resource.

harrism

I requested before that the tests should check that the resulting alignment of the allocations is correct. But now they just seem to be checking if mocked calls return mocked pointers.

harrism · 2021-05-18T04:18:59Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+  }
+
+  /**
+   * @brief Compare the upstream resource to another.


This doesn't just compare the upstream resource. It compares this resource to another.

Suggested change

* @brief Compare the upstream resource to another.

* @brief Compare this resource to another.

rongou

@harrism added a test to verify the pointer returned from a real memory resource.
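A test in the spirit harrism asked for checks the address that comes back rather than mock interactions. This sketch is not the test added in the PR: the helper `all_allocations_aligned` is hypothetical, and C++17 aligned `operator new` stands in for a real memory resource so it runs on a host without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <new>

// Hypothetical helper: allocate a spread of sizes (below, at, and above common
// alignments) and report whether every returned address is a multiple of `alignment`.
template <typename Allocate, typename Deallocate>
bool all_allocations_aligned(Allocate allocate, Deallocate deallocate, std::size_t alignment)
{
  assert(alignment != 0);
  for (std::size_t bytes : {1, 255, 256, 4095, 4096, 4097, 1 << 20}) {
    void* pointer      = allocate(bytes);
    bool const aligned = reinterpret_cast<std::uintptr_t>(pointer) % alignment == 0;
    deallocate(pointer, bytes);
    if (!aligned) { return false; }
  }
  return true;
}

// A real (non-mock) allocator to exercise it: C++17 aligned operator new.
inline bool operator_new_is_aligned(std::size_t alignment)
{
  std::align_val_t const align{alignment};
  return all_allocations_aligned(
    [align](std::size_t bytes) { return ::operator new(bytes, align); },
    [align](void* pointer, std::size_t) { ::operator delete(pointer, align); },
    alignment);
}
```

The same helper catches a misbehaving allocator: one that always hands back an address 16 bytes past a 4 KiB boundary fails the check.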

rongou · 2021-05-18T16:08:36Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+ * Since a larger alignment size has some additional overhead, the user can specify a threshold
+ * size. If an allocation's size falls below the threshold, it is aligned to the default size. Only
+ * allocations with a size above the threshold are aligned to the custom alignment size.


In theory yes, but then you'd have the same branching logic in the client code. Right now for the cuDF JNI layer, we only have one device memory resource.

You can always set the threshold to 0 and then all allocations are aligned the same way. Having this here is really just a convenience for our use case.

BTW, GDS does handle unaligned buffers; it just copies them to internal bounce buffers, so it's slightly less efficient.

rongou · 2021-05-18T16:21:54Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+   */
+  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
+  {
+    if (allocation_alignment_ == default_allocation_alignment || bytes < alignment_threshold_) {


Good point. We have 256 hard coded all over the place. I added a constant in aligned.hpp and replaced all the hard-coded values.

rongou · 2021-05-18T16:22:09Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+  }
+
+  /**
+   * @brief Compare the upstream resource to another.


rongou · 2021-05-18T16:23:37Z

include/rmm/mr/device/aligned_resource_adaptor.hpp

+   */
+  std::size_t upstream_allocation_size(std::size_t bytes) const
+  {
+    auto const aligned_size = detail::align_up(bytes, allocation_alignment_);


The allocation requests could come from any operator, which may not be aware of GDS at all.

rongou · 2021-05-19T19:09:54Z

@gpucibot merge

Depends on rapidsai/rmm#768.

Authors:
- Rong Ou (https://github.com/rongou)

Approvers:
- Jason Lowe (https://github.com/jlowe)

URL: #8266

add a resource adapter to align on block size

b9a9922

rongou added the feature request, 3 - Ready for review, non-breaking, and cpp labels May 4, 2021

rongou requested review from hyperbolic2346 and jrhemstad May 4, 2021 21:20

rongou requested review from a team as code owners May 4, 2021 21:20

rongou self-assigned this May 4, 2021

rongou requested a review from harrism May 4, 2021 21:20

github-actions bot added the CMake label May 4, 2021

jrhemstad reviewed May 4, 2021

include/rmm/mr/device/block_aligned_resource_adaptor.hpp (outdated, resolved)

jrhemstad reviewed May 4, 2021

include/rmm/mr/device/block_aligned_resource_adaptor.hpp (outdated, resolved)

harrism reviewed May 5, 2021

tests/mr/device/block_aligned_mr_tests.cpp (outdated, resolved)

rename to aligned_resource_adapter

9c99339

rongou changed the title ~~add a resource adapter to align on block size~~ add a resource adapter to align on a specified size May 5, 2021

clang format

1020688

hyperbolic2346 suggested changes May 6, 2021

rongou added 2 commits May 12, 2021 15:57

Merge remote-tracking branch 'upstream/branch-0.20' into block-align-…

e52427d

…adapter

make base pointer naturally aligned

b4d282e

rongou requested review from harrism, hyperbolic2346 and jrhemstad May 14, 2021 00:07

jrhemstad requested changes May 14, 2021

include/rmm/mr/device/aligned_resource_adaptor.hpp (outdated, resolved)

rongou added 3 commits May 14, 2021 10:06

Merge remote-tracking branch 'upstream/branch-0.20' into block-align-…

b82ff5e

…adapter

keep track of aligned pointers in a hash map

6319e70

clang format

4a323ca

rongou requested a review from jrhemstad May 14, 2021 21:14

jrhemstad approved these changes May 17, 2021

hyperbolic2346 suggested changes May 17, 2021

rongou added 2 commits May 17, 2021 11:34

Merge remote-tracking branch 'upstream/branch-21.06' into block-align…

fc8655c

…-adapter

address review comments

08d7357

rongou mentioned this pull request May 18, 2021

support RMM aligned resource adapter in JNI [skip ci] rapidsai/cudf#8266

Merged

harrism requested changes May 18, 2021

extract constant for cuda alignment size of 256

c283cb2

rongou commented May 18, 2021

clang format

ce01a8f

rongou requested review from hyperbolic2346, jrhemstad and harrism May 18, 2021 16:27

harrism approved these changes May 19, 2021

jrhemstad approved these changes May 19, 2021

Merge remote-tracking branch 'upstream/branch-21.06' into block-align…

c12afa6

…-adapter

hyperbolic2346 approved these changes May 19, 2021

rapids-bot bot merged commit 2a5aa46 into rapidsai:branch-21.06 May 19, 2021

rongou deleted the block-align-adapter branch May 20, 2021 16:00
