Reading file from disk redirected to stdin is faster than directly from disk #99458

Closed
Rudxain opened this issue Jul 19, 2022 · 5 comments

Comments

@Rudxain
Contributor

Rudxain commented Jul 19, 2022

I'm using the byte iterator to get the bytes from a file, and it seems this post is still relevant.

This is my benchmark code:
Direct:

head -c 10000000 /dev/urandom > rand; time ./xorsum rand

Out:

a22b487c4955107f rand

real    0m7.210s
user    0m2.398s
sys     0m4.810s

Redirect:

head -c 10000000 /dev/urandom > rand; time ./xorsum < rand

Out:

5ae99bfc1279743c -

real    0m0.330s
user    0m0.326s
sys     0m0.004s

xorsum is the crate I'm developing. The code I benchmarked is not exactly what is in my repo right now; I'll push my local changes in a few minutes, after some minor edits.
And yes, I ran cargo build -r to optimize it. Memory-mapping and file-caching shouldn't affect the benchmark results, because I always used random data.

Update: permalink to the commit

@Urgau
Member

Urgau commented Jul 19, 2022

Stdin buffers its input internally to reduce the number of syscalls needed to read it, but a bare File does not. In this case, wrap your File in BufReader::new, which is what Stdin uses internally.

Something like this should do it:

- xor_hasher(std::fs::File::open(&p_a)?.bytes(), sbox)
+ xor_hasher(std::io::BufReader::new(std::fs::File::open(&p_a)?).bytes(), sbox)
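
For readers without the xorsum source at hand, the same change can be sketched in a self-contained form. The names `xor_fold` and `xor_fold_file` are hypothetical stand-ins, since the real `xor_hasher` signature isn't shown in this thread; only `File`, `BufReader`, and `Read::bytes` come from the standard library.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// Hypothetical stand-in for the issue's `xor_hasher`: XOR-folds every byte
/// of `reader` into a single byte.
fn xor_fold<R: Read>(reader: R) -> io::Result<u8> {
    let mut acc = 0u8;
    for byte in reader.bytes() {
        acc ^= byte?; // propagate any I/O error
    }
    Ok(acc)
}

/// Opens `path` and hashes it through a `BufReader`, so `bytes()` pulls from
/// an in-memory buffer instead of asking the OS for one byte at a time.
fn xor_fold_file(path: &Path) -> io::Result<u8> {
    let file = File::open(path)?;
    xor_fold(BufReader::new(file))
}
```

Because `xor_fold` accepts any `Read`, the caller decides whether to add buffering, which matches the one-line diff above.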

@Rudxain
Contributor Author

Rudxain commented Jul 19, 2022

@Urgau Thank you so much! I got these timings now:

real    0m0.143s
user    0m0.135s
sys     0m0.008s

But I was wondering, what's the rationale behind the file byte iterator not using a BufReader by default? Is it related to low-level transparency? (like multi-threading being 1:1 mapping)

Rudxain added a commit to Rudxain/xorsum that referenced this issue Jul 19, 2022
Suggested by: rust-lang/rust#99458 (comment)

Fixes (partially) #16
@Urgau
Member

Urgau commented Jul 19, 2022

This has nothing to do with the Bytes iterator but more with File not doing buffered reads.

As for why Stdin does buffering, I have no idea; I find it a bit strange that it does (maybe it's considered higher-level than File).
As for why File doesn't do buffering, it's mainly because File is just a handle/reference to an open file on the filesystem, nothing more, nothing less.
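
To make the cost of unbuffered reads visible without timing anything, one option is to count how often `read` is called on the underlying reader. This is only a sketch: `CountingReader`, `drain_bytes`, and `count_read_calls` are hypothetical helpers, and each counted call merely stands in for the syscall a real `File` would make.

```rust
use std::cell::Cell;
use std::io::{self, BufReader, Read};

/// Hypothetical wrapper that counts calls to `read` on the inner reader.
struct CountingReader<'a, R> {
    inner: R,
    calls: &'a Cell<usize>,
}

impl<R: Read> Read for CountingReader<'_, R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls.set(self.calls.get() + 1);
        self.inner.read(buf)
    }
}

/// Drains `reader` one byte at a time and returns the number of items yielded.
fn drain_bytes<R: Read>(reader: R) -> usize {
    reader.bytes().count()
}

/// Returns (read calls without buffering, read calls with `BufReader`).
fn count_read_calls(data: &[u8]) -> (usize, usize) {
    let unbuffered = Cell::new(0);
    drain_bytes(CountingReader { inner: data, calls: &unbuffered });

    let buffered = Cell::new(0);
    drain_bytes(BufReader::new(CountingReader { inner: data, calls: &buffered }));

    (unbuffered.get(), buffered.get())
}
```

On a 10,000-byte slice, the unbuffered count should grow with the input length (about one call per byte), while the buffered count should stay at a handful of calls regardless, since `BufReader` refills its buffer in large chunks.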

@the8472
Member

the8472 commented Jul 19, 2022

maybe it's considered higher-level than File

It is. The stdio handles provide thread-safety (e.g. incrementally writing out messages without line-tearing) by wrapping the raw IO in a mutex. And since it already is higher-level it also provides buffering on top.
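
One practical consequence, sketched here under assumptions: since the stdin handle is already buffered behind a mutex, a program can lock it once and consume the buffer in chunks rather than byte by byte. `xor_fold_buffered` and `xor_fold_stdin` are hypothetical names; `Stdin::lock`, `BufRead::fill_buf`, and `BufRead::consume` are standard library calls.

```rust
use std::io::{self, BufRead};

/// XOR-folds all bytes of an already-buffered reader, working on whole
/// buffer chunks instead of single bytes.
fn xor_fold_buffered<R: BufRead>(mut reader: R) -> io::Result<u8> {
    let mut acc = 0u8;
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            return Ok(acc); // end of input
        }
        acc = chunk.iter().fold(acc, |a, b| a ^ b);
        let consumed = chunk.len();
        reader.consume(consumed); // mark the chunk as read
    }
}

/// Locks stdin once for the whole read, so the mutex is not re-acquired
/// on every call, and reuses the buffering stdin already provides.
fn xor_fold_stdin() -> io::Result<u8> {
    let stdin = io::stdin();
    let lock = stdin.lock();
    xor_fold_buffered(lock)
}
```

The same `xor_fold_buffered` would also accept a `BufReader<File>`, so the file and stdin paths could share one code path.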

#78515 aims to make stdout buffering switchable.

@Rudxain
Contributor Author

Rudxain commented Jul 19, 2022

Thank you all for the info, I think the issue can be closed for now. If there's any reason to keep it open, please let me know

Rudxain closed this as completed Jul 19, 2022