Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: AlignmentBuffer<> helper for block-oriented Hashes #3693

Merged
merged 6 commits into from
Sep 19, 2023

Conversation

reneme
Copy link
Collaborator

@reneme reneme commented Sep 12, 2023

Introduction and Motivation

As promised I am looking into a way to simplify the alignment code in block-oriented algorithms. To my mind, there is quite a bit of code duplication in many implementations. Also, its fairly critical code that handles raw memory, offsets and internal state that could easily go wrong.

In fact, I even found an issue in the existing code, while working on this. Below is the ::add_data() implementation for MDx_HashFunction in Botan 3.1.1. Latest master is slightly different but contains the same issue.

Let's assume this is called twice on a pristine instance of SHA-512 (block size 128 bytes):

  1. add_data({...}, 1)
  2. add_data({...}, 126)

The first call fills m_buffer with one byte and sets m_position = 1. Both is happening at buffer_insert at the very end of the function (because m_position was initially 0).
In the second call we append an additional 126 bytes to m_buffer (in the first buffer_insert(), as m_position is initially 1). The block still isn't full, so no compress_n() is invoked.
The issue: We again set m_position = 127 only at the very end after the second buffer_insert (that copies the 126 bytes once more into the same buffer location). The offset calculations check out, so there's no functional problem. But frankly, to me this doesn't look like intentional control flow.

Side note: one needs to know that buffer_insert() "fills m_buffer with as many bytes from input as possible, but at most m_buffer.size() - m_position".

void MDx_HashFunction::add_data(const uint8_t input[], size_t length) {
   const size_t block_len = static_cast<size_t>(1) << m_block_bits;

   m_count += length;

   if(m_position) {
      buffer_insert(m_buffer, m_position, input, length);

      if(m_position + length >= block_len) {
         compress_n(m_buffer.data(), 1);
         input += (block_len - m_position);
         length -= (block_len - m_position);
         m_position = 0;
      }
   }

   // Just in case the compiler can't figure out block_len is a power of 2
   const size_t full_blocks = length >> m_block_bits;
   const size_t remaining = length & (block_len - 1);

   if(full_blocks > 0) {
      compress_n(input, full_blocks);
   }

   buffer_insert(m_buffer, m_position, input + full_blocks * block_len, remaining);
   m_position += remaining;
}

Description of Proposed Improvement

This is an attempt on encapsulating the intermittent buffering into an opinionated helper class with a (hopefully) clear and minimal interface. @randombit I'd be grateful for feedback early-on. I'll describe the details of the helper class' API in Doxygen comments.

With the helper, MDx_HashFunction::add_data() would roughly look like that and not pose the issue described above:

void MDx_HashFunction::add_data(std::span<const uint8_t> input) {
   BufferSlicer in(input);

   while(!in.empty()) {
      if(const auto one_block = m_buffer.handle_unaligned_data(in)) {
         compress_n(one_block->data(), 1);
      }

      if(m_buffer.in_alignment()) {
         const auto [aligned_data, full_blocks] = m_buffer.aligned_data_to_process(in);
         if(full_blocks > 0) {
            compress_n(aligned_data.data(), full_blocks);
         }
      }
   }

   m_count += input.size();
}

A quick trail benchmark run (average of 4 runs ./botan speed SHA-1) :

Branch MiB/s
this1 1830
master 1829

@coveralls
Copy link

coveralls commented Sep 12, 2023

Coverage Status

coverage: 91.756% (+0.03%) from 91.726% when pulling 66438aa on Rohde-Schwarz:refactor/alignment_buffer into a8d4d18 on randombit:master.

@reneme reneme force-pushed the refactor/alignment_buffer branch 2 times, most recently from 53a1765 to 71d6866 Compare September 12, 2023 11:17
@randombit
Copy link
Owner

Haven't reviewed this carefully yet but the broad idea looks ok to me

@reneme
Copy link
Collaborator Author

reneme commented Sep 12, 2023

Haven't reviewed this carefully yet but the broad idea looks ok to me

That'll do for now, thanks. I'll push this forward tomorrow.

@reneme reneme force-pushed the refactor/alignment_buffer branch 2 times, most recently from 1b7434e to 028853c Compare September 13, 2023 09:37
@reneme
Copy link
Collaborator Author

reneme commented Sep 13, 2023

This converged nicely for all hash implementations. The HashFunction::add_data() implementations are now basically all equivalent and unifying that into a single base implementation (using an abstract compress() method) should be easy to do.

I'll add more unit tests and documentation for AlignmentBuffer<> after lunch and will remove the "Draft" status afterwards.

Future Work

Applying this to the hash implementations is probably enough complexity for this pull request. Other places (e.g. block cipher modes, MACs) might benefit from this refactoring as well, though.

The Keccak permutation state may or may not benefit from AlignmentBuffer<> as well. Though, with the Keccak state not being a simple byte buffer, there's more work to be done.

@randombit Are there more places?

@reneme reneme force-pushed the refactor/alignment_buffer branch 2 times, most recently from b5c4340 to f8dfdc3 Compare September 13, 2023 12:48
@reneme reneme marked this pull request as ready for review September 13, 2023 12:48
@reneme reneme force-pushed the refactor/alignment_buffer branch 3 times, most recently from 59d13ac to 6edaf07 Compare September 13, 2023 13:29
@reneme reneme mentioned this pull request Sep 13, 2023
@reneme reneme changed the title Refactor: AlignmentBuffer<> helper for block-oriented Algorithms Refactor: AlignmentBuffer<> helper for block-oriented Hashes Sep 13, 2023
Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice, thanks. I left a few nit comments plus some ideas for possible improvements, ptal.

src/lib/hash/blake2/blake2b.cpp Show resolved Hide resolved
src/lib/utils/alignment_buffer.h Outdated Show resolved Hide resolved
src/lib/utils/alignment_buffer.h Show resolved Hide resolved
* @tparam T the container type to use for the alignment buffer
*/
template <concepts::contiguous_container T>
class AlignmentBuffer {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about making defer_final_block a template parameter? Since I don't think we do (or would ever) have a use case where you would toggle between them at runtime.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, but perhaps more contentiously, if we made the buffer size a template parameter then

a) we might gain some minimal performance improvements due to the the block size being known to be fixed, eg when dividing by the buffer size.
b) (more importantly) we could skip using a container entirely and store the buffer in a simple array, avoiding the heap allocation.

This would be less flexible though, which might matter for some future case where we are doing runtime dispatch of the specific algorithm based on hardware support. Eg we want to align to 16 bytes if running regular C++ but 128 bytes if we're doing something with SIMD.

Also it would require exposing the internal block size in the headers of the various algorithms, but these headers are now internal only anyway so this doesn't seem so bad.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm certainly in favor of making things more "constexpr" here. Both the block size and the defer_final_block make sense as template parameters. Also I've been tinkering with a template-customizable append() function (to allow using this in Keccak where -- among other things -- we'd need to XOR instead of "overwrite"). The latter is certainly future work, though.

Regarding std::array<>: I had that in mind, but didn't go for it because we can't use the mlocked allocator for stack allocations. Is that something we can (and want to) live with?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the mlock aspect is that critical tbh. We spray intermediate values everywhere on the stack in the course of computing anything, and it's pretty unavoidable that someone with access to a core dump or swap file is going to be able to recover information.

If we go the array route should in any case scrub the memory on clear or destructor; that already puts us ahead of the game compared to most implementations I've seen.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I'll look into that tomorrow.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied both suggestions, making defer_final_block and the block_size into template parameters. The latter proved problematic for MDx_HashFunction as this info is currently a runtime information (behind a virtual method). Once #3550 is finished, that shouldn't be a problem anymore, though.

src/lib/utils/alignment_buffer.h Show resolved Hide resolved
@reneme
Copy link
Collaborator Author

reneme commented Sep 13, 2023

Thanks for the swift review! Re:BOTAN_STATE_CHECK(): I changed the asserts (and also used BOTAN_ARG_CHECK where applicable). But, is that really correct? This class is internal and tightly coupled with the buffered computation use-case. If any of those ever throw, it is because we made a programming mistake in the consumer implementations. No particularly hard feelings about this, though, just double-checking.

@randombit
Copy link
Owner

Hmm you have a point there. In a nominal sense Invalid_State is the correct throw ("A request was made on an object while the object was in a state where the operation cannot be performed.") but you're right that since this is an internal-only API it throwing implies a bug elsewhere within the library, rather than some usage error by the application.

@reneme
Copy link
Collaborator Author

reneme commented Sep 13, 2023

since this is an internal-only API it throwing implies a bug elsewhere within the library, rather than some usage error by the application.

I read that as "let's revert 5a3ff50", right?

This commit contains a few additional tests for the state checks. I'm strongly assuming that it is not sensible to try and check asserts in the unittests, though. Particularly after introducing ./configure.py --terminate-on-assert. 😅

@randombit
Copy link
Owner

Yeah let's revert that, sorry for the confusion.

@reneme reneme force-pushed the refactor/alignment_buffer branch 3 times, most recently from 8e70dfc to f28bffe Compare September 14, 2023 15:52
src/lib/utils/alignment_buffer.h Show resolved Hide resolved
src/lib/hash/blake2/blake2b.h Outdated Show resolved Hide resolved
Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good - left a couple of nitpicky comments

src/lib/hash/skein/skein_512.h Outdated Show resolved Hide resolved
src/lib/utils/alignment_buffer.h Outdated Show resolved Hide resolved
@reneme
Copy link
Collaborator Author

reneme commented Sep 17, 2023

For the record: I pushed another change after your approval (documenting the enum class AlignmentBufferFinalBlock). Please check and merge at your convenience.

Copy link
Owner

@randombit randombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@randombit randombit merged commit 1f26e75 into randombit:master Sep 19, 2023
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants