Refactor: AlignmentBuffer<> helper for block-oriented Hashes #3693
Conversation
Force-pushed from 4135030 to cd9f6b6
Force-pushed from 53a1765 to 71d6866
Haven't reviewed this carefully yet but the broad idea looks ok to me

That'll do for now, thanks. I'll push this forward tomorrow.
Force-pushed from 1b7434e to 028853c
This converged nicely for all hash implementations. I'll add more unit tests and documentation.

Future Work: Applying this to the hash implementations is probably enough complexity for this pull request. Other places (e.g. block cipher modes, MACs) might benefit from this refactoring as well, though. The Keccak permutation state may or may not benefit from it.

@randombit Are there more places?
Force-pushed from b5c4340 to f8dfdc3
Force-pushed from 59d13ac to 6edaf07
This is nice, thanks. I left a few nit comments plus some ideas for possible improvements, ptal.
 * @tparam T the container type to use for the alignment buffer
 */
template <concepts::contiguous_container T>
class AlignmentBuffer {
WDYT about making `defer_final_block` a template parameter? Since I don't think we do (or would ever) have a use case where you would toggle between them at runtime.
Similarly, but perhaps more contentiously, if we made the buffer size a template parameter then
a) we might gain some minimal performance improvements due to the block size being known to be fixed, eg when dividing by the buffer size.
b) (more importantly) we could skip using a container entirely and store the buffer in a simple array, avoiding the heap allocation.
This would be less flexible though, which might matter for some future case where we are doing runtime dispatch of the specific algorithm based on hardware support. Eg we want to align to 16 bytes if running regular C++ but 128 bytes if we're doing something with SIMD.
Also it would require exposing the internal block size in the headers of the various algorithms, but these headers are now internal only anyway so this doesn't seem so bad.
I'm certainly in favor of making things more "`constexpr`" here. Both the block size and the `defer_final_block` make sense as template parameters. Also I've been tinkering with a template-customizable `append()` function (to allow using this in Keccak where -- among other things -- we'd need to XOR instead of "overwrite"). The latter is certainly future work, though.

Regarding `std::array<>`: I had that in mind, but didn't go for it because we can't use the mlocked allocator for stack allocations. Is that something we can (and want to) live with?
I don't think the mlock aspect is that critical tbh. We spray intermediate values everywhere on the stack in the course of computing anything, and it's pretty unavoidable that someone with access to a core dump or swap file is going to be able to recover information.
If we go the array route we should in any case scrub the memory on `clear` or in the destructor; that already puts us ahead of the game compared to most implementations I've seen.
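A sketch of the scrubbing idea; `secure_scrub` below is an illustrative stand-in (Botan has its own `secure_scrub_memory`), using volatile writes so the compiler cannot elide the wipe:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative stand-in for a secure wipe: volatile writes keep the
// compiler from optimizing away stores to memory that is "dead" from
// its point of view (e.g. a buffer about to be destroyed).
inline void secure_scrub(uint8_t* ptr, std::size_t n) {
   volatile uint8_t* p = ptr;
   for(std::size_t i = 0; i != n; ++i) {
      p[i] = 0;
   }
}

// Sketch: a stack-based buffer that scrubs itself in clear() and in
// the destructor, as suggested above.
template <std::size_t BLOCK_SIZE>
class ScrubbedBuffer {
   public:
      ~ScrubbedBuffer() { clear(); }

      void clear() {
         secure_scrub(m_buffer.data(), m_buffer.size());
         m_position = 0;
      }

      uint8_t* data() { return m_buffer.data(); }
      std::size_t position() const { return m_position; }

   private:
      std::array<uint8_t, BLOCK_SIZE> m_buffer{};
      std::size_t m_position = 0;
};
```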
Fair enough. I'll look into that tomorrow.
👍
I applied both suggestions, making `defer_final_block` and the `block_size` into template parameters. The latter proved problematic for `MDx_HashFunction`, as this is currently runtime information (behind a virtual method). Once #3550 is finished, that shouldn't be a problem anymore, though.
Force-pushed from 6edaf07 to 5a3ff50
Thanks for the swift review!
Hmm, you have a point there. In a nominal sense
I read that as "let's revert 5a3ff50", right? This commit contains a few additional tests for the state checks. I'm strongly assuming that it is not sensible to try and check asserts in the unit tests, though. Particularly after introducing
Yeah, let's revert that, sorry for the confusion.
Force-pushed from 8e70dfc to f28bffe
Looks really good - left a couple of nitpicky comments
Force-pushed from 4d7f083 to 66bd11a
Force-pushed from 7738948 to 66438aa
For the record: I pushed another change after your approval (documenting the enum class
Thanks
Introduction and Motivation
As promised, I am looking into a way to simplify the alignment code in block-oriented algorithms. To my mind, there is quite a bit of code duplication in many implementations. Also, it's fairly critical code that handles raw memory, offsets and internal state and could easily go wrong.
In fact, I even found an issue in the existing code while working on this. Below is the `add_data()` implementation of `MDx_HashFunction` in Botan 3.1.1. Latest master is slightly different but contains the same issue.

Let's assume this is called twice on a pristine instance of SHA-512 (block size 128 bytes):

1. `add_data({...}, 1)`
2. `add_data({...}, 126)`
The first call fills `m_buffer` with one byte and sets `m_position = 1`. Both happen in the `buffer_insert` at the very end of the function (because `m_position` was initially 0).

In the second call we append an additional 126 bytes to `m_buffer` (in the first `buffer_insert()`, as `m_position` is initially 1). The block still isn't full, so no `compress_n()` is invoked.

The issue: we again set `m_position = 127` only at the very end, after the second `buffer_insert` (which copies the 126 bytes once more into the same buffer location). The offset calculations check out, so there's no functional problem. But frankly, to me this doesn't look like intentional control flow.

Side note: one needs to know that `buffer_insert()` "fills `m_buffer` with as many bytes from `input` as possible, but at most `m_buffer.size() - m_position`".
".Description of Proposed Improvement
This is an attempt at encapsulating the intermittent buffering into an opinionated helper class with a (hopefully) clear and minimal interface. @randombit I'd be grateful for early feedback. I'll describe the details of the helper class's API in Doxygen comments.
With the helper, `MDx_HashFunction::add_data()` reduces to a few calls into the helper and does not pose the issue described above.

A quick trial benchmark run (average of 4 runs of `./botan speed SHA-1`):