Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use remove_dir_all in cargo clean to resolve race condition #11442

Merged
merged 5 commits into from
May 31, 2023

Conversation

trevyn
Copy link
Contributor

@trevyn trevyn commented Nov 30, 2022

Fixes #11441

@rustbot
Copy link
Collaborator

rustbot commented Nov 30, 2022

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ehuss (or someone else) soon.

Please see the contribution instructions for more information.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 30, 2022
Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it might introduce a bit more readdir calls underneath the hood, though I am fine with that. It was using remove_dir_all back to the day before this got merged.

@epage
Copy link
Contributor

epage commented Dec 1, 2022

I believe it might introduce a bit more readdir calls underneath the hood, though I am fine with that. It was using remove_dir_all back to the day before this got merged.

I had vague memories of another PR related to it but it just touched test code (#11333).

The thing that doesn't quite feel right to me is that this feels like a hack: "If new contents are added to the "removed" directory, let's try removing the contents again". How many "agains" are sufficient?. This is inherently a race condition that we can't win theoretically though we might be able to make it work practically. I'm just wondering if its an effort that is worth it.

@weihanglo
Copy link
Member

This is inherently a race condition that we can't win theoretically though we might be able to make it work practically.

My original point is back in the day Cargo didn't have a progress bar for cargo clean, it just remove_dir_all regardlessly. Despite that, I completely agree with you.

Just give a data from mine: I use cargo clean a lot everyday on macOS and never seen this issue.

@epage
Copy link
Contributor

epage commented Dec 1, 2022

My original point is back in the day Cargo didn't have a progress bar for cargo clean, it just remove_dir_all regardlessly. Despite that, I completely agree with you.

It did a remove_dir_all on the root which then does the same walking that the progress bar code does.

This brings up an interesting thought: This change is effectively a remove_dir_all that does another remove_dir_all on failure. If we feel this is needed here, do any of the other cases where we call remove_dir_all need this error recovery?

@trevyn
Copy link
Contributor Author

trevyn commented Dec 1, 2022

Apparently this is also hitting Windows: brson/wasm-opt-rs#116 (comment)

And the rabbit hole goes deep: rust-lang/rust#29497

Would it make sense to accept that removing contents and then calling remove_dir does not reliably remove a directory, and delegate any necessary complexity to std::fs::remove_dir_all?

In which case, paths::remove_dir_all is actually doing the same manual "remove contents then call fs::remove_dir" thing, instead of calling std::fs::remove_dir_all.

@ChrisDenton
Copy link
Member

ChrisDenton commented Dec 1, 2022

Note that, as far as I know, cargo clean was persistently failing in that case even with retries, where as std::fs::remove_dir_all succeeds first time. But unfortunately I've not had time to investigate the issue further. Also note that the Windows std code has improved a fair bit more since I last commented on rust-lang/rust#29497 (though there are still some more improvements to be made).

But yeah, remove_dir_all is highly specialized for different platforms (especially in light of CVE-2022-21658) whereas I guess the cargo code isn't?

@trevyn
Copy link
Contributor Author

trevyn commented Dec 24, 2022

r? @weihanglo

@rustbot rustbot assigned weihanglo and unassigned ehuss Dec 24, 2022
@weihanglo
Copy link
Member

But yeah, remove_dir_all is highly specialized for different platforms (especially in light of CVE-2022-21658) whereas I guess the cargo code isn't?

It is only a naive filesystem walk and remove. The purpose of Cargo doing this is to display the progress bar if needed and show info in verbose mode. There might be ways to improve but I don't see any alternative can remove Walkdir entirely.


@trevyn. The question epage raised might need a response to clarify in what situation a remove failure needs a recovery. I don't have a particular preference though.

This brings up an interesting thought: This change is effectively a remove_dir_all that does another remove_dir_all on failure. If we feel this is needed here, do any of the other cases where we call remove_dir_all need this error recovery?

@ChrisDenton
Copy link
Member

@weihanglo the cargo way has better progress bar and error messaging which is great. The standard library can't offer that currently (though I would like to expose some of the primitives so crates can build upon them in the future). But I was more responding to @epage concern that this would be basically just a remove_dir_all that calls remove_dir_all on failure. My point was that these methods are in practice different enough to warrant the attempt. Even if the fallback is, sadly, less user friendly whatever its advantages.

@rbtcollins
Copy link
Contributor

FWIW I'm very happy to work with folk to extend the remove_dir_all crate to have appropriate callbacks/what-have-you so that the progress bar can be built as the deletion proceeds without double-handling anything. Note that the cost of building the progress bar is itself most of the time of the deletion, so walking upfront isn't particular beneficial IMO.

@rbtcollins
Copy link
Contributor

Two other thoughts.

The 0.5 and lower versions of the remove_dir_all crate renamed deleted entries into a temp dir (in order to avoid retries on files that don't get removed from the directory atomically). This would have the effect that programs like rust-analyzer doing concurrent compilations into the target that is being cleaned would get an error, and recreate the deleted directory, rather than the remove-dir-all invocation itself failing ... until the unlink of the root directory itself.

If programs work with _at style calls, this approach doesn't help, because they will end up writing to the renamed directory. OTOH anything that is spawning lots of processes - like rust-analyzer - will invariably end up operating on paths, as passing capabilities around is very uncommon so far.

So one possible approach would be to :

  • rename the top level paths being deleted to 'path.clean_'
  • use the remove-dir-all implementation of choice to delete the newly named old directory
  • recover from interrupts by deleting any existing prefix.clean* paths in target/ as well as the current things to clean.

This would cause new compilations to operate in a fresh directory, and should be quite robust. We could put this back in the remove_dir_all crate, but theres enough app-specific logic there that I suspect just doing the rename and interrupt-safety work outside the recursive-delete workhorse would make sense.

@HamishPoole
Copy link

HamishPoole commented May 10, 2023

The thing that doesn't quite feel right to me is that this feels like a hack: "If new contents are added to the "removed" directory, let's try removing the contents again". How many "agains" are sufficient?. This is inherently a race condition that we can't win theoretically though we might be able to make it work practically. I'm just wondering if its an effort that is worth it.

This brings up an interesting thought: This change is effectively a remove_dir_all that does another remove_dir_all on failure. If we feel this is needed here, do any of the other cases where we call remove_dir_all need this error recovery?

Agree with @epage on this. Ran into this issue, #11441

Given a similar issue is in #11872

Root cause is clearly not MacOS related (still stumped why .DS_Store appeared in my target directory).

std::fs::remove_dir
std::fs::remove_dir_all
and
cargo_util::paths:remove_file

All clearly seem to fail on multiple platforms without logging a proper error.

error: could not remove build directory

Caused by:
  failed to remove directory `/Users/e/Documents/GitHub/turbo/target/wasm32-unknown-unknown`

Caused by:
  Directory not empty (os error 66)

std::remove_dir should panic here IMO. It doesn't. Adding panics to cargo clean is the wrong level of abstraction.

Fixing std::remove_dir_all to panic if it fails to remove a directory is the best solution I think.

As per #11441 (comment), the Finder issue was impossible for me to replicate.

@trevyn
Copy link
Contributor Author

trevyn commented May 28, 2023

@weihanglo : To address @epage 's question:

This change is effectively a remove_dir_all that does another remove_dir_all on failure. If we feel this is needed here, do any of the other cases where we call remove_dir_all need this error recovery?

in light of @ChrisDenton 's observation that:

[fs::]remove_dir_all is highly specialized for different platforms

I've added a fallback from paths::remove_dir_all to fs::remove_dir_all on error.

Please let me know if this is suitable, and if you have any other concerns!

@trevyn trevyn requested a review from weihanglo May 28, 2023 16:28
@epage
Copy link
Contributor

epage commented May 30, 2023

I've added a fallback from paths::remove_dir_all to fs::remove_dir_all on error.

I've not refreshed myself on the context of this PR but if there is a stronger variant for us to fallback to that will do the right thing for us with maybe some degraded quality of experience, that makes sense to me.

src/cargo/ops/cargo_clean.rs Outdated Show resolved Hide resolved
Copy link
Member

@weihanglo weihanglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

As a side note, we still look for a better way around remove_dir_all. The performance of the ad-hoc implementation in Cargo is not pretty good, especially for walking the filesystem.

@weihanglo
Copy link
Member

@bors r+

@bors
Copy link
Collaborator

bors commented May 31, 2023

📌 Commit ac3ccdd has been approved by weihanglo

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 31, 2023
@bors
Copy link
Collaborator

bors commented May 31, 2023

⌛ Testing commit ac3ccdd with merge 58b22f0...

@weihanglo weihanglo changed the title cargo clean: use remove_dir_all fix: use remove_dir_all in cargo clean to resolve race condition May 31, 2023
@bors
Copy link
Collaborator

bors commented May 31, 2023

☀️ Test successful - checks-actions
Approved by: weihanglo
Pushing 58b22f0 to master...

@bors bors merged commit 58b22f0 into rust-lang:master May 31, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 4, 2023
Update cargo

14 commits in f7b95e31642e09c2b6eabb18ed75007dda6677a0..b0fa79679e717cd077b7fc0fa4166f47107f1ba9
2023-05-30 19:25:02 +0000 to 2023-06-03 14:19:48 +0000
- Emit error when users try to use a toolchain via the `add` or `install` command (rust-lang/cargo#12226)
- Support "default" option for `build.jobs` (rust-lang/cargo#12222)
- Fix typo in changelog (rust-lang/cargo#12227)
- chore: Sort `-Z` flags match statement (rust-lang/cargo#12223)
- Update curl-sys (rust-lang/cargo#12218)
- Bump to 0.73.0; update changelog (rust-lang/cargo#12219)
- refactor: housekeeping for 1.70.0 (rust-lang/cargo#12217)
- nit: fix typo in changelog for 1.70 (rust-lang/cargo#12215)
- Remove a noop `.clone` (rust-lang/cargo#12213)
- refactor: compiler invocations (rust-lang/cargo#12211)
- cargo clean: use `remove_dir_all` (rust-lang/cargo#11442)
- Add a small note about indexes ignoring SemVer build metadata. (rust-lang/cargo#12206)
- Revert "chore: detect the channel a PR wants to merge into" (rust-lang/cargo#12204)
- Don't distinguish `Debuginfo::None` and `Debuginfo::Explicit(None)` (rust-lang/cargo#12205)

r? `@ghost`
RalfJung pushed a commit to RalfJung/miri that referenced this pull request Jun 11, 2023
Update cargo

14 commits in f7b95e31642e09c2b6eabb18ed75007dda6677a0..b0fa79679e717cd077b7fc0fa4166f47107f1ba9
2023-05-30 19:25:02 +0000 to 2023-06-03 14:19:48 +0000
- Emit error when users try to use a toolchain via the `add` or `install` command (rust-lang/cargo#12226)
- Support "default" option for `build.jobs` (rust-lang/cargo#12222)
- Fix typo in changelog (rust-lang/cargo#12227)
- chore: Sort `-Z` flags match statement (rust-lang/cargo#12223)
- Update curl-sys (rust-lang/cargo#12218)
- Bump to 0.73.0; update changelog (rust-lang/cargo#12219)
- refactor: housekeeping for 1.70.0 (rust-lang/cargo#12217)
- nit: fix typo in changelog for 1.70 (rust-lang/cargo#12215)
- Remove a noop `.clone` (rust-lang/cargo#12213)
- refactor: compiler invocations (rust-lang/cargo#12211)
- cargo clean: use `remove_dir_all` (rust-lang/cargo#11442)
- Add a small note about indexes ignoring SemVer build metadata. (rust-lang/cargo#12206)
- Revert "chore: detect the channel a PR wants to merge into" (rust-lang/cargo#12204)
- Don't distinguish `Debuginfo::None` and `Debuginfo::Explicit(None)` (rust-lang/cargo#12205)

r? `@ghost`
@ehuss ehuss added this to the 1.72.0 milestone Jun 22, 2023
bors added a commit that referenced this pull request Aug 15, 2023
[beta-1.72.0] bump cargo-util to 0.2.5

#11442 changed `cargo-util` at the last minute but we failed to bump the version.

And due to #12347, version bump check didn't work well at that time.

In order to make CI pass, the following PRs are also cherry-picked:

* #12450
* #12491
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Command-clean S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cargo clean: error: could not remove build directory
10 participants