cp: fix possible OOM and partial write with large files #6694

Open
wants to merge 1 commit into base: main
Conversation

neyo8826

There are 2 fixes:

  • pwrite does not guarantee that all bytes will be written; write_all_at is a proper wrapper over it
  • sparse_copy_without_hole (triggered on ZFS) can eat RAM if the file is large and there are not enough holes; the write now happens in chunks of at most 16 MiB (as far as I tested, this saturates an SSD anyway without much CPU)

I have tested it by checking the sha256sum of a 4 GiB file.
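
For illustration, a minimal sketch of the chunked-write idea (this is not the exact code in the PR; copy_in_chunks, its signature, and the error handling are made up, and it assumes std::os::unix::fs::FileExt for write_all_at):

```rust
use std::fs::File;
use std::io::{self, Read};
use std::os::unix::fs::FileExt;

const CHUNK: usize = 16 * 1024 * 1024; // cap the buffer at 16 MiB to bound memory use

// Illustrative only: read the source in bounded chunks and write each chunk
// with write_all_at, which retries until every byte is written, instead of
// relying on a single pwrite over a whole-file buffer.
fn copy_in_chunks(src: &mut File, dst: &File, len: u64) -> io::Result<()> {
    let buf_len = CHUNK.min(len as usize).max(1); // small files get one small buffer
    let mut buf = vec![0u8; buf_len];
    let mut offset: u64 = 0;
    while offset < len {
        let want = buf_len.min((len - offset) as usize);
        src.read_exact(&mut buf[..want])?;
        dst.write_all_at(&buf[..want], offset)?;
        offset += want as u64;
    }
    Ok(())
}
```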

@@ -141,6 +141,8 @@ where
}
let src_fd = src_file.as_raw_fd();
let mut current_offset: isize = 0;
let step = std::cmp::min(size, 16 * 1024 * 1024) as usize; // 16 MiB

Review comment: please add a comment to explain why you are doing this

current_offset.try_into().unwrap(),
)
};
for i in (0..len).step_by(step) {

Review comment: same, please document this a bit more :) (i know it wasn't documented before :)

@sylvestre

thanks
did you do some benchmarking to see the impact on perf?
https://github.com/sharkdp/hyperfine/ is great for this
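
For example, a comparison against the system cp could look something like this (file names and the binary path are just placeholders):

```sh
hyperfine --warmup 1 \
  'cp big.bin /tmp/out.bin' \
  './target/release/coreutils cp big.bin /tmp/out.bin'
```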

@neyo8826

More docs :)

I have "benchmarked" it with my large file (technically not sparse, but heuristic triggered it I suppose).
I was slower even before implementing the chunking because now it actually writes the whole file (thank god for the checksumming in my build tool, that's how I discovered the bug).
After adding the chunking, the performance stayed the same, it was IO saturated (GCP high perf SSD)
For small files it should behave the same with 1 read and write call.

@sylvestre

any idea of the impact on a slow hard drive?

@neyo8826

Well, if the copy is between 2 HDDs, then it shouldn't matter.
If they are on the same drive, there will be more head movement, and the drive's write cache will matter a lot (8 MB to 1 GB?).
We do not sync manually, so even the FS cache will help.
I would like to see a proper benchmark with both consumer and enterprise HDDs, but it is not possible for me right now.

16 MiB seems like a lot; it should balance out anything with the FS's help (just my opinion).


GNU testsuite comparison:

GNU test failed: tests/cp/sparse-2. tests/cp/sparse-2 is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cp/sparse-extents-2. tests/cp/sparse-extents-2 is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)


GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)


GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
