speed up String::push and String::insert #124810
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
#[unstable(feature = "char_internals", reason = "exposed only for libstd", issue = "none")]
#[doc(hidden)]
#[inline]
pub unsafe fn encode_utf8_raw_unchecked(code: u32, dst: &mut [MaybeUninit<u8>]) -> &mut [u8] {
Pondering: How useful is it to be dealing in slices for this? Could this return, say, `(usize, [u8; 4])` and thus not ever need to worry about the indirections? That would presumably resolve the zeroing issue, since it'd just be shifting together a 32-bit number (since `[u8; 4]` is passed as `i32` in our LLVM ABI).
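The idea in this comment could be sketched roughly as follows. The function name `encode_utf8_array` is hypothetical and not part of the PR; the sketch only illustrates returning the length together with a fixed-size array instead of writing through a slice, and makes no claim about the resulting codegen.

```rust
/// Hypothetical helper (not from the PR): encodes `ch` as UTF-8 and returns
/// the encoded length together with a fixed 4-byte array.
/// Bytes past the returned length are zero.
fn encode_utf8_array(ch: char) -> (usize, [u8; 4]) {
    let code = ch as u32;
    match code {
        // 1 byte: 0xxxxxxx
        0..=0x7F => (1, [code as u8, 0, 0, 0]),
        // 2 bytes: 110xxxxx 10xxxxxx
        0x80..=0x7FF => (
            2,
            [0xC0 | (code >> 6) as u8, 0x80 | (code & 0x3F) as u8, 0, 0],
        ),
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        0x800..=0xFFFF => (
            3,
            [
                0xE0 | (code >> 12) as u8,
                0x80 | ((code >> 6) & 0x3F) as u8,
                0x80 | (code & 0x3F) as u8,
                0,
            ],
        ),
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        _ => (
            4,
            [
                0xF0 | (code >> 18) as u8,
                0x80 | ((code >> 12) & 0x3F) as u8,
                0x80 | ((code >> 6) & 0x3F) as u8,
                0x80 | (code & 0x3F) as u8,
            ],
        ),
    }
}
```

A caller would take `&bytes[..len]` from the returned pair to get the encoded bytes.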
Currently, some functions use the `encode_utf8` API, which requires a slice. Using a `[u8; 4]` makes it act as a number indeed, but it still needs to be `xor`ed with itself and is later moved to a buffer for `bcmp` in this case.
_ => self.vec.extend_from_slice(ch.encode_utf8(&mut [0; 4]).as_bytes()),
let len = self.len();
let ch_len = ch.len_utf8();
self.reserve(ch_len);
Related to the previous, I wonder about making this `.reserve(4)`, and just always copying the 4 bytes into the buffer, with only the `set_len` needing to use the actual length, so that it's always just one store rather than needing a variable number of stores depending on the data width.
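A minimal sketch of this suggestion, written against a plain `Vec<u8>` because the vector inside `String` is private. The function name `push_char_fixed_store` is an assumption for illustration, and the sketch uses the safe `encode_utf8` for the encoding step rather than the PR's helper.

```rust
/// Hypothetical sketch (not from the PR): always reserve 4 bytes and always
/// copy 4 bytes, then advance the length by the real encoded width only.
fn push_char_fixed_store(buf: &mut Vec<u8>, ch: char) {
    // Encode into a zeroed 4-byte scratch array; unused trailing bytes stay 0.
    let mut encoded = [0u8; 4];
    let ch_len = ch.encode_utf8(&mut encoded).len();

    // Guarantees capacity >= len + 4, whatever the real width of `ch` is.
    buf.reserve(4);
    let len = buf.len();

    // SAFETY: `reserve(4)` ensures the 4-byte copy stays inside the
    // allocation, `encoded` is a separate stack array so the ranges cannot
    // overlap, and `ch_len <= 4`, so every byte below the new length has
    // been initialized by the copy.
    unsafe {
        std::ptr::copy_nonoverlapping(encoded.as_ptr(), buf.as_mut_ptr().add(len), 4);
        buf.set_len(len + ch_len);
    }
}
```

The bytes written past the new length are harmless padding: they sit in spare capacity and are overwritten by the next push.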
Reserving 4 bytes and doing a single store makes little difference other than getting rid of the unsafe, but reserving 4 bytes and doing the same writes makes the non-reallocating path 20% shorter in instruction count. However, it may cause the string to take up extra space: say, an ASCII char is pushed to a 63-byte string, making it allocate 128 bytes. Is this acceptable?
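The capacity concern can be reproduced with a small experiment. This sketch assumes the 63-byte string was allocated with capacity 64, which the comment does not state explicitly; the helper name `capacity_after_reserve` is hypothetical. The exact capacity after growth is an implementation detail of the standard library, so the code only reports it rather than relying on a particular value.

```rust
/// Hypothetical helper: builds a 63-byte string with capacity 64 requested,
/// calls `reserve(additional)`, and returns the capacity before and after.
fn capacity_after_reserve(additional: usize) -> (usize, usize) {
    let mut s = String::with_capacity(64);
    s.push_str(&"a".repeat(63));
    let before = s.capacity();
    // `reserve` is a no-op when the spare capacity already covers `additional`.
    s.reserve(additional);
    (before, s.capacity())
}
```

With `additional = 1` the one spare byte is enough and the capacity stays the same; with `additional = 4` the string must grow to hold at least 67 bytes, even though only one byte is about to be pushed.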
It's a good question. I started a zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/String.3A.3Apush.20capacity.20guarantees/near/438525052
Worth noting here that this effectively just means that you'll have up to three extra bytes reserved always, since reserve takes into account existing capacity. It does slow down the case of, say, adding a newline to an existing string, but it would speed up repeated insertions, which are probably the bigger performance hit.
I had a variety of thoughts; let me know what you think.
Also, is there anything here for which it would make sense to have a codegen test to confirm what's happening? Or some other test to help confirm it's better?
A codegen check for the absence of
Insufficient permissions to issue commands to rust-timer.
@lincot: 🔑 Insufficient privileges: not in try users
☔ The latest upstream changes (presumably #116113) made this pull request unmergeable. Please resolve the merge conflicts.
There are merge commits (commits with multiple parents) in your changes. We have a no merge policy so these commits will need to be removed for this pull request to be merged. You can start a rebase with the following commands:
The following commits are merge commits:
Force-pushed from 9511918 to 89fa55e.
☔ The latest upstream changes (presumably #127840) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from 89fa55e to 2cb20b3.
☔ The latest upstream changes (presumably #130511) made this pull request unmergeable. Please resolve the merge conflicts.
Addresses the concerns described in #116235.
The performance gain comes mainly from avoiding temporary buffers.
Complex pattern matching in `encode_utf8` (introduced in #67569) has been simplified to a comparison and an exhaustive `match` in the `encode_utf8_raw_unchecked` helper function. It takes a slice of `MaybeUninit<u8>` because otherwise we'd have to construct a normal slice to uninitialized data, which is not desirable, I guess.
Several functions still have that unneeded zeroing, but a single instruction is not that important, I guess.
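For readers unfamiliar with the pattern, here is a rough sketch of encoding into a `MaybeUninit<u8>` slice and pushing without a temporary buffer. It is not the PR's code: the names `encode_utf8_uninit` and `push_char` are hypothetical, the helper is a safe, bounds-checked stand-in for `encode_utf8_raw_unchecked`, and it works on a `Vec<u8>` because `String`'s inner vector is private.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for the PR's helper: writes the UTF-8 encoding of
/// `ch` into possibly uninitialized memory and returns the initialized part.
fn encode_utf8_uninit(ch: char, dst: &mut [MaybeUninit<u8>]) -> &mut [u8] {
    let code = ch as u32;
    let len = ch.len_utf8();
    assert!(dst.len() >= len, "destination too small");
    match len {
        1 => {
            dst[0].write(code as u8);
        }
        2 => {
            dst[0].write(0xC0 | (code >> 6) as u8);
            dst[1].write(0x80 | (code & 0x3F) as u8);
        }
        3 => {
            dst[0].write(0xE0 | (code >> 12) as u8);
            dst[1].write(0x80 | ((code >> 6) & 0x3F) as u8);
            dst[2].write(0x80 | (code & 0x3F) as u8);
        }
        _ => {
            dst[0].write(0xF0 | (code >> 18) as u8);
            dst[1].write(0x80 | ((code >> 12) & 0x3F) as u8);
            dst[2].write(0x80 | ((code >> 6) & 0x3F) as u8);
            dst[3].write(0x80 | (code & 0x3F) as u8);
        }
    }
    // SAFETY: the first `len` elements were initialized just above, and
    // `MaybeUninit<u8>` has the same layout as `u8`.
    unsafe { std::slice::from_raw_parts_mut(dst.as_mut_ptr().cast::<u8>(), len) }
}

/// Hypothetical push: encodes straight into the vector's spare capacity,
/// so no zeroed scratch array is needed.
fn push_char(buf: &mut Vec<u8>, ch: char) {
    buf.reserve(ch.len_utf8());
    let written = encode_utf8_uninit(ch, buf.spare_capacity_mut()).len();
    // SAFETY: `written` bytes directly after the old length are initialized.
    unsafe { buf.set_len(buf.len() + written) };
}
```

Taking `&mut [MaybeUninit<u8>]` lets the caller pass `Vec::spare_capacity_mut()` directly, which avoids ever forming a `&mut [u8]` over uninitialized bytes.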
@rustbot label T-libs C-optimization A-str