Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize File::read_to_end and read_to_string #89582

Merged
merged 1 commit into from
Oct 9, 2021

Commits on Oct 7, 2021

  1. Optimize File::read_to_end and read_to_string

    Reading a file into an empty vector or string buffer can incur
    unnecessary `read` syscalls and memory re-allocations as the buffer
    "warms up" and grows to its final size. This is perhaps a necessary evil
    with generic readers, but files can be read in smarter by checking the
    file size and reserving that much capacity.
    
    `std::fs::read` and `read_to_string` already perform this optimization:
    they open the file, reads its metadata, and call `with_capacity` with
    the file size. This ensures that the buffer does not need to be resized
    and an initial string of small `read` syscalls.
    
    However, if a user opens the `File` themselves and calls
    `file.read_to_end` or `file.read_to_string` they do not get this
    optimization.
    
    ```rust
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;
    ```
    
    I searched through this project's codebase and even here are a *lot* of
    examples of this. They're found all over in unit tests, which isn't a
    big deal, but there are also several real instances in the compiler and
    in Cargo. I've documented the ones I found in a comment here:
    
    rust-lang#89516 (comment)
    
    Most telling, the `Read` trait and the `read_to_end` method both show
    this exact pattern as examples of how to use readers. What this says to
    me is that this shouldn't be solved by simply fixing the instances of it
    in this codebase. If it's here it's certain to be prevalent in the wider
    Rust ecosystem.
    
    To that end, this commit adds specializations of `read_to_end` and
    `read_to_string` directly on `File`. This way it's no longer a minor
    footgun to start with an empty buffer when reading a file in.
    
    A nice side effect of this change is that code that accesses a `File` as
    a bare `Read` constraint or via a `dyn Read` trait object will benefit.
    For example, this code from `compiler/rustc_serialize/src/json.rs`:
    
    ```rust
    pub fn from_reader(rdr: &mut dyn Read) -> Result<Json, BuilderError> {
        let mut contents = Vec::new();
        match rdr.read_to_end(&mut contents) {
    ```
    
    Related changes:
    
    - I also added specializations to `BufReader` to delegate to
      `self.inner`'s methods. That way it can call `File`'s optimized
      implementations if the inner reader is a file.
    
    - The private `std::io::append_to_string` function is now marked
      `unsafe`.
    
    - `File::read_to_string` being more efficient means that the performance
      note for `io::read_to_string` can be softened. I've added @camelid's
      suggested wording from:
    
      rust-lang#80218 (comment)
    jkugelman committed Oct 7, 2021
    Configuration menu
    Copy the full SHA
    a990c76 View commit details
    Browse the repository at this point in the history