Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use size_hint when using read_to_end on something that impl's Read #113927

Closed
v1gnesh opened this issue Jul 21, 2023 · 7 comments
Closed

Use size_hint when using read_to_end on something that impl's Read #113927

v1gnesh opened this issue Jul 21, 2023 · 7 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@v1gnesh
Copy link

v1gnesh commented Jul 21, 2023

In the default_read_to_end implementation, a parameter for size_hint is available.
However, when doing the actual read_to_end, it's initialized with None.

When using the the below pattern to implement a custom iterator over a file, when the max possible line length is known, it would be helpful to make use of the size_hint parameter, assuming it's a slightly faster path (?).

This will be, in spirit, a continuation of #89165, #89582, and #110655

impl<R: Read> Iterator for LineIter<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {

        let mut data = Vec::with_capacity(MAX_LINE_LENGTH);
        file.by_ref().take(x).read_to_end(&mut data)

In short, I'm asking if the ability to pass a size_hint of x (from the take) into the read_to_end function will be useful, and if it'd be slightly better performance wise?

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 21, 2023
@Jules-Bertholet
Copy link
Contributor

@rustbot label T-libs I-slow

@rustbot rustbot added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 21, 2023
@chenyukang
Copy link
Member

chenyukang commented Jul 23, 2023

If you means fs file, seems we added size_hint at here.

rust/library/std/src/fs.rs

Lines 766 to 769 in 1c44af9

fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
let size = buffer_capacity_required(self);
buf.reserve(size.unwrap_or(0));
io::default_read_to_end(self, buf, size)

For the default read_to_end , because we don't have a default method to get the guessing size?
I don't have the whole code snippet you are showing, if we want to passing the x from take, seems need to add an API such as read_to_end_with_hint(self, &mut data, x).

@v1gnesh
Copy link
Author

v1gnesh commented Jul 23, 2023

@chenyukang, yes I noticed the addition to fs has read_to_end.
I wanted to ask if a similar thing can be provided for use when reading larger than memory files (through an iter in this case).

x is an unsigned int that is calculated in the next() function. It is found at run-time as we iterate through a file, and contains the length of each Item.

It may be as simple as surfacing the default_read_to_end function, as that'll allow us to use Option<len> in place of None for the size_hint.

As a side note, I wonder if the ability to define functions with optional parameters will simplify things in general.
Certainly don't know anything about Rust's language design, etc. ... just curious.

@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 24, 2023
@saethlin
Copy link
Member

I am very confused by this issue, maybe you can clear things up.

In short, I'm asking if the ability to pass a size_hint of x (from the take) into the read_to_end function will be useful, and if it'd be slightly better performance wise?

My basic response to this is: Why would we do this? Isn't it just as much work to call buf.reserve(capacity) as it is to call read_to_end(&mut buf, capacity)? std::fs::read has very good reason to handle the reserving itself, there's no way for a caller to reserve for it. But I don't see why the change described here needs a library change.

I wanted to ask if a similar thing can be provided for use when reading larger than memory files (through an iter in this case).

If the file is larger than memory, surely the size hint would also be larger than memory, and thus acting on it would just OOM faster. So I don't see how a size hint helps here. If you want to iterate through a file larger than memory, use BufReader, and you can use that type to pick a capacity that suits your use case.

As a side note, I wonder if the ability to define functions with optional parameters will simplify things in general.
Certainly don't know anything about Rust's language design, etc. ... just curious.

You're looking for https://github.com/rust-lang/rfcs, you may find some interesting reading material by searching the PRs on that repo. I did a quick skim and I didn't find anything about optional arguments specifically, but I did find a lot of discussion on an RFC about keyword arguments.

@v1gnesh
Copy link
Author

v1gnesh commented Jul 25, 2023

If the file is larger than memory, surely the size hint would also be larger than memory, and thus acting on it would just OOM faster. So I don't see how a size hint helps here.

Ah, sorry, I could have been clearer.
The file can be larger than memory, but it is structured in such a way that creating a custom iterator works, because the file is like this

| item | item | item |
u8 u8 u8 u8 ............ u8 u8 u8 u8 ............ u8 u8 u8 u8 ............

... where those u8s represent the length (including itself) of each Item in the file.
This is the x I refer to in the first comment. File size is huge, but the size of each line/item is easy to get.

Does that help?

Because read_to_end writes only to a Vec, and I'm looking to avoid the chance of growing the Vec because the Item sizes are known, either using a size_hint, or writing to a sized array instead of a Vec (I'm assuming) would be better?
Or is there a better way that you'd recommend?

@saethlin
Copy link
Member

It sounds to me like you don't want read_to_end at all. You probably want the feature called read_buf so you can create a large uninit buffer, then call Read::read_buf_exact to fill the buffer.

But more generally, this sounds like you're suggesting a new API, in which case you should follow https://std-dev-guide.rust-lang.org/development/feature-lifecycle.html or try to start a conversation about your idea on https://internals.rust-lang.org/. This repo's issue tracker is not the right place for the kind of discussion you're trying to have; there's just not the right audience here. You can find the right audience for developing a new feature at those links.

@v1gnesh
Copy link
Author

v1gnesh commented Jul 25, 2023

Okay, thanks for the guidance! I'll close this off then.

@v1gnesh v1gnesh closed this as not planned Won't fix, can't repro, duplicate, stale Jul 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants