grpc: provide a mechanism for encoded message buffer recycling #6613
Conversation
When writing data to an HTTP/2 writer, the call completes as soon as the frame is added to the control buffer queue, so there is no way to know when the actual write finishes: it occurs asynchronously afterward. `transport.Options` now allows passing a callback that is invoked exactly once, after all the data has been committed to the underlying http2 layer.
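The shape of that write-completion callback can be sketched with a toy control-buffer loop. All names here (`Options`, `OnWriteDone`, `controlBuffer`) are illustrative stand-ins, not the actual grpc-go internals:

```go
package main

import (
	"fmt"
	"sync"
)

// Options carries an optional completion callback; OnWriteDone, if set,
// is invoked exactly once after the frame's data has been handed to the
// underlying writer.
type Options struct {
	OnWriteDone func()
}

type frame struct {
	data []byte
	done func()
}

// controlBuffer mimics the asynchronous queue: Write only enqueues; a
// background loop performs the actual write and then fires the callback.
type controlBuffer struct {
	ch chan frame
	wg sync.WaitGroup
}

func newControlBuffer() *controlBuffer {
	cb := &controlBuffer{ch: make(chan frame, 16)}
	cb.wg.Add(1)
	go func() {
		defer cb.wg.Done()
		for f := range cb.ch {
			_ = f.data // the real code would write to the http2 framer here
			if f.done != nil {
				f.done()
			}
		}
	}()
	return cb
}

func (cb *controlBuffer) Write(data []byte, opts Options) {
	cb.ch <- frame{data: data, done: opts.OnWriteDone}
}

func (cb *controlBuffer) Close() {
	close(cb.ch)
	cb.wg.Wait()
}

func main() {
	cb := newControlBuffer()
	done := make(chan struct{})
	// Write returns immediately; the callback tells us the data was flushed.
	cb.Write([]byte("payload"), Options{OnWriteDone: func() { close(done) }})
	<-done
	cb.Close()
	fmt.Println("write completed")
}
```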
Users may now provide a `SharedBufferPool` for the purpose of encoding messages.
Provide a callback to the transport layer that returns the encoded message buffer to the buffer pool once the message has been copied or sent over the wire and is no longer referenced anywhere. Note that the encoded message buffer is also shared with user code through handlers; handlers should not keep references to these buffers after they return.
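The recycling flow described above can be sketched as follows, assuming a pool shaped like the existing `grpc.SharedBufferPool` interface; `sendMsg` and its `write` parameter are hypothetical stand-ins for the real send path:

```go
package main

import (
	"fmt"
	"sync"
)

// SharedBufferPool matches the shape of grpc.SharedBufferPool.
type SharedBufferPool interface {
	Get(length int) []byte
	Put(buf *[]byte)
}

// simplePool is a toy implementation backed by sync.Pool.
type simplePool struct{ p sync.Pool }

func (s *simplePool) Get(length int) []byte {
	if b, ok := s.p.Get().(*[]byte); ok && cap(*b) >= length {
		return (*b)[:length]
	}
	return make([]byte, length)
}

func (s *simplePool) Put(buf *[]byte) { s.p.Put(buf) }

// sendMsg encodes into a pooled buffer and arranges for the buffer to be
// returned to the pool once the transport reports the write is done.
func sendMsg(pool SharedBufferPool, msg []byte, write func(data []byte, onDone func())) {
	buf := pool.Get(len(msg))
	copy(buf, msg)
	write(buf, func() {
		pool.Put(&buf) // safe: nothing references buf after this point
	})
}

func main() {
	pool := &simplePool{}
	var flushed func()
	// write mimics the async transport: it just records the callback.
	write := func(data []byte, onDone func()) { flushed = onDone }
	sendMsg(pool, []byte("hello"), write)
	flushed() // transport finished; the buffer goes back to the pool
	fmt.Println("buffer recycled")
}
```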
Codecs may now implement an additional interface, `BufferedCodec`. It gives codec writers an option to use pre-existing memory when marshaling messages, if possible. Implementing the interface is entirely optional, and no changes to existing codecs are necessary.
When the user provides a buffer pool for message encoding and the codec implements `codec.BufferedCodec`, `MarshalWithBuffer` is called instead of `Marshal`. We detect whether the codec supports memory reuse through a type assertion, akin to the technique `io.Copy` uses when the provided values implement `io.ReaderFrom` or `io.WriterTo`.
The implementation uses the `proto.Buffer` API that allows passing a pre-existing buffer for the library to marshal into.
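The detection technique can be illustrated with a toy codec. The interface names mirror the PR's proposal, but the bodies (and `stringCodec`) are illustrative assumptions:

```go
package main

import "fmt"

// Codec is the baseline marshaling interface.
type Codec interface {
	Marshal(v any) ([]byte, error)
}

// BufferedCodec is the optional extension: codecs that can marshal into
// a caller-supplied buffer implement it.
type BufferedCodec interface {
	Codec
	MarshalWithBuffer(v any, buf []byte) ([]byte, error)
}

// marshal prefers the buffered path when the codec supports it, via the
// same type-assertion technique io.Copy uses to detect io.ReaderFrom
// and io.WriterTo.
func marshal(c Codec, v any, buf []byte) ([]byte, error) {
	if bc, ok := c.(BufferedCodec); ok {
		return bc.MarshalWithBuffer(v, buf)
	}
	return c.Marshal(v)
}

// stringCodec is a toy codec implementing both interfaces.
type stringCodec struct{}

func (stringCodec) Marshal(v any) ([]byte, error) {
	return []byte(v.(string)), nil
}

func (stringCodec) MarshalWithBuffer(v any, buf []byte) ([]byte, error) {
	// Reuse buf's backing array instead of allocating a fresh slice.
	return append(buf[:0], v.(string)...), nil
}

func main() {
	buf := make([]byte, 0, 64)
	out, _ := marshal(stringCodec{}, "hello", buf)
	fmt.Println(string(out), cap(out) == cap(buf)) // → hello true (buffer reused)
}
```

A codec that does not implement `BufferedCodec` silently falls back to `Marshal`, which is why no changes to existing codecs are necessary.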
Asserts that passing a buffer pool combined with a compatible codec (protobuf in this case) results in buffer reuse.
I ran the default benchmarks on my branch as well. This compares runs without an encoder buffer pool (before) and with one (after):
Hey @HippoBaro, looks like we were working on the same thing (see #6608), though yours seems significantly further along. Either way, I just want to see this feature checked in regardless of how it's implemented, so feel free to steal whatever code you might find useful (if any) from my PR.
Add an option to use a shared encoder buffer pool during benchmarks.
Apologies! I didn't think of looking at closed PRs. I took a quick look at yours, and it goes one step further by enabling memory reuse when compressing. Mine doesn't, for the sake of keeping it simple, but that is also something I would like to get merged eventually.
No worries, better two brains on this than one!
@HippoBaro: This PR contains a bunch of changes. We appreciate the contribution, but it would be better if you could open an issue explaining the problem you are facing and your proposed solution. That way we can discuss options before having to look at code, and we can break the effort up into smaller, more manageable PRs. Thanks.
Thank you @easwars! I added an issue to discuss the solution: #6619. I will leave this implementation PR open for reference and make it reflect any changes to the approach we discuss.
Marking this as blocked on: #6619
I hope you don't mind, but I'd like to close this until we have more time to spend on these types of issues, and that could be some time (until next quarter). We want to look more holistically at buffer sharing/reuse before continuing down this path, and don't have the cycles to do so for now. Let's discuss further in the issues, and please save your branch for now in case we do decide we should implement this, or want to reference it for discussions.
RELEASE NOTES:
This PR adds a new public API that allows users to pass a buffer pool (implementing the pre-existing `grpc.SharedBufferPool`). When used in conjunction with a compatible `encoding.Codec`, the memory used to marshal messages may be reused indefinitely, significantly reducing garbage collection overhead for clients and servers dealing with a high number of messages or a high volume of data.
Motivation
We are currently working on a service that uses gRPC streaming to stream
(potentially) large files, in chunks, back to gRPC clients over the
network. We measured that the Go allocation volume per second is roughly
equal to the network throughput of the host. This creates GC cycles that
introduce latency spikes and prevent us from predictably saturating the
network at a reasonable CPU cost.
After investigation, we have isolated the source of most of these
allocations to protobuf slice creation during message serialization.
Results
Using this patch set, CPU usage was reduced by 27%, and by up to 64% in one (quite synthetic) extreme case. Allocation volume per second dropped by more than 99%, from 805 MiB/s to 5 MiB/s.
This PR also commits some benchmarks to demonstrate the effect in an
ideal scenario:
Notes
I am unsure whether this is a direction the project maintainers are comfortable pursuing, and this is my first time contributing. Please let me know if you'd rather I break the PR up into smaller ones (the commit log is clean and should make that easy), or if there is a better way to achieve this!
Additionally, there are a few things of note I considered and would
appreciate feedback on:
- Buffer pools can already be passed today, such as through the `grpc.RecvBufferPool` or `[With]SharedWriteBuffer` options;
- compressors could reuse buffers in much the same way as encoders do (some already store a buffer pool internally);
- the pooling logic only pays off when messages are larger than around 64 bytes;
- encoded buffers are passed to handlers, which may keep references to them. This strikes me as a bad practice that would lead to bugs, but I was unable to find written documentation about it.
Thank you!