Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock bootstrap (x.py) build directory #88310

Merged
merged 1 commit into from
Jan 1, 2022

Conversation

worldeva
Copy link
Contributor

@worldeva worldeva commented Aug 25, 2021

Closes #76661, closes #80849,
x.py creates a lock file at project_root/lock.db

r? @jyn514 , because he was one that told me about this~

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @jyn514 (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 25, 2021
@worldeva worldeva changed the title Lock for Bootstrap build directory Lock bootstrap (x.py) build directory Aug 25, 2021
@jyn514
Copy link
Member

jyn514 commented Aug 25, 2021

Happy to be responsible for this, but @Mark-Simulacrum will want to take a look too I'm sure.

@jyn514 jyn514 added the T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) label Aug 25, 2021
Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that this doesn't lock in bootstrap.py, but maybe that's ok - certainly I haven't seen any bugs reported about it.

// terminal 1
$ x check
downloading https://ci-artifacts.rust-lang.org/rustc-builds/db002a06ae9154a35d410550bc5132df883d7baa/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz
############################################################################### 100.0%
extracting /home/joshua/src/rust/rust/build/cache/llvm-db002a06ae9154a35d410550bc5132df883d7baa-False/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz

// terminal 2
$ x build
Updating only changed submodules
Submodules updated in 0.02 seconds
downloading https://ci-artifacts.rust-lang.org/rustc-builds/db002a06ae9154a35d410550bc5132df883d7baa/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz
###                                                                               4.9%^C

I don't feel qualified to review the flock code, but I also think I may not have to - I left a comment below.

I'm very happy with how this fixes the parallel builds problem :) thank you for tackling it!

@@ -0,0 +1,206 @@
//! Handles creating a file lock to avoid race conditions[1].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now the third flock implementation in-tree, there's also one for rustdoc ... maybe we should separate this out into a tiny crate or something? It would be hard to get cargo to use it since it's a submodule, but we could at least share it with rustdoc.

cc @rust-lang/cargo, do you have opinions on this? Would you be willing to separate the flock implementation into a different crate, and maybe put it on rust-lang/flock? I don't think there's any need to publish to crates.io but it would be nice to share at least between different tools.
cc @kinnison @rbtcollins, I'm sure rustup has its own implementation of flock too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rust-lang/rustup#988 (comment) and the top level summary are probably relevant.

tl;dr: don't want the cargo implementation, it is buggy; but also haven't gotten around to fixing the issue in rustup, which is complected with rust-lang/rustup#2441 making this a little gnarly to think through.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. x.py itself probably doesn't need an airtight flock implementation, I wouldn't expect people to be using NFS for builds. I totally get that rustup does have to deal with those scenarios though.

Does rustup's flock implementation have any downsides compared to cargo's? Could we start from that as the shared codebase instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After working a bit on standalone flock implementation for a bit, I wonder why we don't just use fcntl(2)? There is a issue with the lock being able to be acquired by two threads of the same process, but we don't seem to be doing that, and it works over nfs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a issue with the lock being able to be acquired by two threads of the same process

That would be an issue for the flock implementation in rustc itself though as multiple rustc sessions can run inside the same process when using custom drivers. For example rls used to do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be an issue for the flock implementation in rustc itself though as multiple rustc sessions can run inside the same process when using custom drivers. For example rls used to do this.

We could avoid that with a mutex, right? So any thread has to unlock a mutex just to call flock? Then you get both per-thread and per-process synchronization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly that flock will always succeed if a file has already been locked elsewhere inside the same process, that would require serializing all rustc sessions such that each session completely finishes until the next session locks any file, even if it isn't the same file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have one for rustup yet; we just have quite clear understanding of the problem space we're dealing with :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also fd-lock, which worked relatively well for me in the past (admittedly it was a small project). https://crates.io/crates/fd-lock/

That said, perhaps you need more control than it gives.

@Mark-Simulacrum
Copy link
Member

So... this might be a little "out there", but it feels simpler than the solution this suggests and avoids duplicating code from Cargo. Both Python 2.7.x and Python 3 natively have sqlite built-in, as far as I can tell, via the sqlite3 module.

That means that we could likely implement this by "just" leveraging sqlite transactions to block. This is sort of overkill, but I'd expect it to work pretty well across any platform or filesystem, since sqlite itself is pretty battle tested.

It's also true that the python script currently manages a bit of state via pretty adhoc files (e.g., for the download stamps). I wouldn't change those today, there's not much point, but in the future it might make sense to leverage a more structured format that sqlite trivially gives us for this storage and without creating a bunch of files all over the place.

I'm curious to hear what folks think of this idea. I haven't actually tried a sample implementation, either, but I'd naively expect it to work out to at most a couple dozen lines of Python -- not too complex. Since sqlite comes pre-compiled for us through python, this seems a nice solution that avoids some of the hassle we'd otherwise run into. If and/or when we migrate away from the python wrapper, we can explore alternatives (such as flock proposed by this PR), but for now this would seem better. It also avoids any parallelism issues in the python script's downloads and such, which also seems good -- I'm not sold those are bug free today.

I do have mixed feelings about this so I could be convinced the impl in this PR is something we should just go with, but if the sqlite-based solution actually works and isn't too hard, then it seems preferable to me personally on the grounds of not having to maintain pretty low-level platform-specific code in bootstrap -- which is a place I'd really prefer to avoid that.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 1, 2021
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 20, 2021
@jyn514
Copy link
Member

jyn514 commented Sep 25, 2021

Hi @worldeva - do you know when you'll get a chance to look at this again?

@apiraino apiraino added the T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. label Oct 14, 2021
@rust-log-analyzer

This comment has been minimized.

@worldeva
Copy link
Contributor Author

worldeva commented Dec 5, 2021

Sorry for the long delay for the tiny commit; life came up >,<
As per suggestion, I've removed all the stuff in bootstrap-proper and moved it over to a sqlite3 based bit in x.py, it hopefully should be good to go now!~

src/bootstrap/bootstrap.py Outdated Show resolved Hide resolved
src/bootstrap/bootstrap.py Outdated Show resolved Hide resolved
@rust-log-analyzer

This comment has been minimized.

@jyn514 jyn514 removed their assignment Dec 5, 2021
@jyn514 jyn514 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 5, 2021
Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me with the tidy lint fixed :)

src/bootstrap/bootstrap.py Outdated Show resolved Hide resolved
src/bootstrap/bootstrap.py Outdated Show resolved Hide resolved
src/bootstrap/bootstrap.py Outdated Show resolved Hide resolved
src/bootstrap/bootstrap.py Outdated Show resolved Hide resolved
src/bootstrap/bootstrap.py Outdated Show resolved Hide resolved

from time import time

def acquire_lock():
try:
con = sqlite3.Connection("xlock.db", timeout=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this creating the database in the working directory? I think we probably want it in build/ -- build.build_dir probably?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to creating it in build.rust_dir: The lock locks a lot more than just the stuff in build (submodules, etc), and the folder may not exist at the time of lock creation.

@Mark-Simulacrum
Copy link
Member

Mostly just some documentation, but I expect we can move ahead once that's added. It'd be good to do some spot checking locally to confirm this mostly works, and make a note in the documentation (and PR description) the location of the database file, so if there's some bug with this it's easy to delete it.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 7, 2021
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Dec 8, 2021
…Simulacrum

Lock bootstrap (x.py) build directory

Closes rust-lang#76661,  closes rust-lang#80849,
`x.py` creates a lock file at `project_root/lock.db`

r? ``@jyn514`` , because he was one that told me about this~
@matthiaskrgr
Copy link
Member

@bors r-

#91664 (comment)
Failed in a rollup.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 8, 2021
@worldeva
Copy link
Contributor Author

worldeva commented Dec 9, 2021

Should be fixed!~

pass
return curs
except ImportError:
print("warning: sqlite3 not available, skipping lock")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth asking users to file an issue if they see this (on rust-lang/rust) and making it clear that the lock isn't necessary for correctness of the results (something like "build directory lock", maybe)?

@worldeva worldeva force-pushed the bootstrap-locking branch 2 times, most recently from 2742b0d to f25e4bf Compare December 24, 2021 23:43
@worldeva
Copy link
Contributor Author

Done-
The one thing that I'm mildly worried about is "consider filing a issue" makes it seem more important than it is- should I perhaps make it even more explicit with a "this lock is only to prevent multiple concurrent bootstraps from running" or such?
(...Though we're straying into the bikeshedding territory now :p )

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 25, 2021
Prevent spurious build failures and other bugs caused by parallel runs of
x.py. We back the lock with sqlite, so that we have a cross-platform locking
strategy, and which can be done nearly first in the build process (from Python),
which helps move the lock as early as possible.
@Mark-Simulacrum
Copy link
Member

Slightly adjusted wording, but seems good to me otherwise. Thanks!

@bors r+

@bors
Copy link
Contributor

bors commented Dec 31, 2021

📌 Commit 1326bd6 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 31, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 1, 2022
…askrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#88310 (Lock bootstrap (x.py) build directory)
 - rust-lang#92097 (Implement split_at_spare_mut without Deref to a slice so that the spare slice is valid)
 - rust-lang#92412 (Fix double space in pretty printed TryBlock)
 - rust-lang#92420 (Fix whitespace in pretty printed PatKind::Range)
 - rust-lang#92457 (Sync rustc_codegen_gcc)
 - rust-lang#92460 ([rustc_builtin_macros] add indices to format_foreign::printf::Substitution::Escape)
 - rust-lang#92469 (Make tidy check for magic numbers that spell things)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 5a5c928 into rust-lang:master Jan 1, 2022
@rustbot rustbot added this to the 1.59.0 milestone Jan 1, 2022
@worldeva worldeva deleted the bootstrap-locking branch March 25, 2022 23:23
@worldeva worldeva restored the bootstrap-locking branch March 25, 2022 23:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet