
Parallelizing the install process + PoC! #8187

Open
bmartinn opened this issue May 3, 2020 · 56 comments
Labels
state: needs discussion (This needs some more discussion), type: enhancement (Improvements to functionality)

Comments

@bmartinn

bmartinn commented May 3, 2020

First of all, kudos guys on the v20+ release 👍; wheel caching is much better, and the addition of cached built wheels is an awesome time saver.

TL;DR
Reference implementation for pip install --parallel, which downloads, inspects, and installs packages in parallel, greatly accelerating installation of an entire project's environment (or any requirements.txt file).
A few speedup numbers (Ubuntu w/ python 3.6):

Requirements variant  | Parallel+D/L | PyPi v20.1+D/L | Speedup incl' Download | Parallel | PyPi v20.1 | Speedup Cached Wheels
--------------------- | ------------ | -------------- | ---------------------- | -------- | ---------- | ----------------------
A                     | 0:58.14      | 1:51.08        | x1.90                  | 0:16.16  | 0:33.57    | x2.08
B                     | 5:47.36      | 6:55.49        | x1.19                  | 0:52.19  | 1:12.07    | x1.37
C                     | 0:56.08      | 1:44.34        | x1.86                  | 0:14.36  | 0:29.21    | x2.01
D                     | 0:36.45      | 1:39.55        | x2.71                  | 0:14.59  | 0:33.20    | x2.22
--------------------- | ------------ | -------------- | ---------------------- | -------- | ---------- | ----------------------
Average               |              |                | x1.91                  |          |            | x1.92

Details:
We heavily rely on pip to set up environments on our ever-changing production systems, specifically when creating a new virtual environment or installing inside a Docker container. The full implementation of our ML/DL agent is open source and can be found here: https://github.com/allegroai/trains

With the introduction of pip 20.1 we decided to try to parallelize the pip install process in order to save precious spin-up time.

  1. The package resolving & download was moved into a ThreadPool (a global lock was added to RequirementSet). The only caveat is the download progress bar (more on that below).

The reason for using threads is that most of the speedup here comes from network and I/O, which parallelize well on Python threads. The second reason is the way RequirementSet constantly grows with every package discovered and its requirements; sharing the set among all threads is quite trivial, as opposed to sharing it across processes. (A rough sketch of the overall structure appears at the end of this comment.)

  2. The installation process for wheel files (only) was moved to a process pool, as installing wheels does not execute any script and can be parallelized without risk.

The reason for choosing processes over threads is that the limiting factor here is CPU cycles in the installation process. Also, wheel unpacking and copying are totally independent of one another and of the rest of the process, and order has no meaning at this stage, as requirement order only matters when building packages.

  3. Wheel unpacking was optimized to reuse the unpacked folders between the inspection step (resolver) on a cached wheel and the installation of the same wheel (previously the wheel was essentially unpacked twice).

  4. Solving the parallel progress bar: the progress bar now has a global lock; the first progress bar to acquire it is the one outputting progress to the tty. Once that progress bar is done, another instance can acquire the lock and continue reporting from its current progress stage.
    This looks something like the mock-up below (a small sketch of the lock follows it):

|████████████████████████████████| 101 kB 165 kB/s 
|█                           | 71 kB 881 kB/s eta 0:00:03
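For illustration, a minimal sketch of the lock idea described in item 4, assuming hypothetical class and variable names (this is not the actual PoC code): only the bar currently holding the global lock writes to the tty, while the other bars keep counting silently and take over rendering once the lock frees up.

```python
import sys
import threading

_render_lock = threading.Lock()  # shared by every progress bar instance


class SharedTtyBar:
    """Illustrative only: a progress bar that renders only while it owns the lock."""

    def __init__(self, total, label=""):
        self.total = total
        self.count = 0
        self.label = label
        self._owns_tty = False

    def update(self, n):
        self.count += n
        # Try to become the single rendering bar; never block on the lock.
        if not self._owns_tty and _render_lock.acquire(blocking=False):
            self._owns_tty = True
        if self._owns_tty:
            done = int(30 * self.count / self.total)
            sys.stdout.write(f"\r{self.label} |{'█' * done:<30}| {self.count}/{self.total}")
            sys.stdout.flush()

    def finish(self):
        if self._owns_tty:
            sys.stdout.write("\n")
            _render_lock.release()  # let another in-flight bar take over rendering
            self._owns_tty = False
```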

The full reference code can be found here
Usage example:

pip install --parallel -r requirements.txt
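For readers skimming the thread, here is a rough sketch of the overall shape described above: a thread pool for the network/I/O-bound resolve-and-download phase sharing a lock-protected requirement set, and a process pool for the CPU-bound wheel installs. The function names (resolve_and_download, install_wheel) are placeholders for illustration, not pip internals.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

requirement_set = set()                  # discovered requirements (shared)
requirement_set_lock = threading.Lock()  # the "global lock" mentioned in item 1


def resolve_and_download(req):
    """Placeholder: fetch metadata + wheel for req, return newly discovered deps."""
    new_deps = []  # ... network- and I/O-bound work goes here ...
    with requirement_set_lock:
        fresh = [d for d in new_deps if d not in requirement_set]
        requirement_set.update(fresh)
    return fresh


def install_wheel(wheel_path):
    """Placeholder: unpack/copy one wheel; CPU + disk only, no scripts executed."""
    ...


def parallel_install(root_requirements, wheel_paths):
    # Phase 1: resolve + download on threads (I/O bound).
    with ThreadPoolExecutor(max_workers=8) as pool:
        pending = list(root_requirements)
        while pending:
            results = pool.map(resolve_and_download, pending)
            pending = [dep for found in results for dep in found]
    # Phase 2: install wheels on processes (CPU bound, order independent).
    with ProcessPoolExecutor() as pool:
        list(pool.map(install_wheel, wheel_paths))
```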
The triage-new-issues bot added the S: needs triage label on May 3, 2020
@bmartinn
Author

bmartinn commented May 3, 2020

Benchmark test details.

Take the download (D/L) column times with a grain of salt; download speed is not stable enough between runs, but obviously parallel download is much more efficient.

Requirements variant B is somewhat of an outlier: it contains the torch==1.5.0 package, which is a 700+ MB package. Obviously the download and installation process is limited by the single slowest package, and that is the case for variant B. The reason for including it in the first place is that we use it quite often (and also TensorFlow, a package of almost the same size). For comparison, scikit-learn is only a few tens of MB.
The other variants are a more off-the-shelf mixture of packages.

Notice that in requirements variants A/B/C all the package versions are fixed.
Attached requirement files used for the benchmark:

reqsA.txt
reqsB.txt
reqsC.txt
reqsD.txt

@McSinyx
Contributor

McSinyx commented May 4, 2020

Thank you for sharing the benchmark, the design, as well as the proof of concept. I am glad to see the positive benchmark for larger collections (I suppose so by estimating the size of the others based on that of B; I'd love to have them to test in the future, though), compared to the ones tested under GH-3981.

I made a quick run and found that the PoC still has the issue with

We need to find a way to clean up correctly when SIGINT is issued. Since the progress bar is initialized in a different thread we cannot set the signal handler as InterruptibleMixin did before. That functionality is currently commented out and needs to be somehow restored before we merge this.

I am unaware whether this is because you and your team have not worked on that yet, or because there's still a technical obstacle to be solved. If the latter is the case, I'd love to know what you found in addition to the quote above.

I've also submitted a plan for solving GH-825 as part of GSoC; however, it's purely a plan. Theoretically it should work, but I can't be sure that it definitely will in practice. I'd be really grateful if you took a look at it and left some comments. Note that I'll only know whether it's approved by pip's maintainers tomorrow (May 5), so don't take it too seriously.

Given the situation in GH-8169, I wonder how Pool performs on a single-core machine, and whether there is any quirk that we're not yet aware of. I also suspect multiprocessing.ThreadPool has the same problem as multiprocessing.dummy.Pool; however, I'm not sure how to reproduce it (that's why the issue was opened in the first place).

@bmartinn
Author

bmartinn commented May 4, 2020

Thanks @McSinyx for the references!

To answer a few of your concerns:

  1. "have them to test in the future though [...] compared to the ones tested under #3981 (Download requirements in parallel)"

Requirement files added here

  2. Notice that the #3981 (Download requirements in parallel) implementation is quite outdated (more than 3 years old); the good people of PyPA actually did a great job with v20, simplifying the process of package download and dependency resolving. This is the main reason for us to tackle this issue now, as the solution is relatively straightforward.

  3. If you have a single core, Pool should not be used (the performance loss is negligible, but why would you create another subprocess, right :).
    See the if statement and program flow here.
    You could actually add multiprocessing.cpu_count() > 1 to that if statement, which will make sure that even if one uses the --parallel flag on a single-core machine, there is no performance loss.
    The same can be added here.

  4. Regarding the progress bar signal handler, see here (and the small sketch after this list).
    Basically only the main thread will replace the handler; any other thread will just ignore it.
    This means that Ctrl-C will be caught and processed, while still allowing multiple threads to report progress bars. To be honest, the self.finish() that is called in the signal handler seems quite redundant; the only thing it does is flush the tty and return the cursor, and both will happen anyhow when the process quits (just my 2 cents :)...
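A condensed sketch of the guards mentioned in items 3 and 4, with illustrative function names (not the PoC code):

```python
import multiprocessing
import signal
import threading


def should_use_process_pool(parallel_flag):
    # Item 3: even with --parallel, fall back to serial work on a single-core box.
    return parallel_flag and multiprocessing.cpu_count() > 1


def install_sigint_handler(handler):
    # Item 4: only the main thread may replace the SIGINT handler;
    # worker threads reporting progress simply skip this step.
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGINT, handler)
```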

On a different note, if you are willing to help out with this code and make it PR worthy, please don't be a stranger, feel free to push to the forked repo 😄

@raviv

raviv commented May 4, 2020

This is awesome.
Looking forward to the PR.

@uranusjr
Member

uranusjr commented May 4, 2020

Also note that pip is in the middle of integrating a new resolver, so make sure you're able to introduce the parallelism to the new resolver as well; otherwise all the good work would be lost once the transition is finished.

@bmartinn
Author

bmartinn commented May 4, 2020

@uranusjr thanks for the heads up!
Where can I find the new resolver?

EDIT:
Are you referring to this?
https://github.com/pypa/pip/blob/master/src/pip/_internal/resolution/resolvelib/resolver.py
If so, the implementation is almost the same; it should work without any real effort.
How do I make sure I'm using it (seems like the default is "legacy")?

@uranusjr
Member

uranusjr commented May 4, 2020

@bmartinn Your observations are correct. The new resolver can be enabled (in 20.1 or later) by passing --unstable-feature=resolver to pip.

@McSinyx
Contributor

McSinyx commented May 5, 2020

@bmartinn, thank you for the requirement files as well as explanations 2 and 3. As for termination, yesterday I noticed that after canceling in the middle of a parallel download my cursor disappeared, so there might be something wrong. However, to make the UI user-friendly, I believe it might be a good idea to have only the main thread take care of all the displaying, as in my proposal, which is now accepted (yay, and thanks to the PyPA maintainers and others for commenting during the drafting process).

To @uranusjr as well: I'm not sure how the proposal was reviewed, but if you haven't taken a look at it, could you please skim through the implementation idea (section 3.3.3) and check whether it fits your vision for the new resolver, i.e. whether it causes any trouble for the ongoing implementation of the resolver?

BTW, I feel the need to say that I've just gotten (physically) back to school this week and am still trying to get used to the new and denser schedule, so please bear with me if in the next one or two days I am not able to examine things carefully enough to meaningfully participate in the discussion.

@bmartinn
Author

bmartinn commented May 5, 2020

@McSinyx, following your suggestion, I pushed a fix for the Logger handler issue (basically forcing at least one instance to be created on the main thread), let me know if it solved the cursor problem.

@uranusjr I'm testing the --unstable-feature=resolver feature, and it seems to download tar.gz and not .whl files; are you switching to tar.gz?

@uranusjr
Member

uranusjr commented May 5, 2020

@bmartinn pip maintains a wheel cache and won’t redownload if you have downloaded a wheel previously. You probably need to supply --no-cache or something.

@bmartinn
Author

bmartinn commented May 5, 2020

Apologies @uranusjr, I was seeing the future package being downloaded as tar.gz, and apparently there is no wheel for it (it compiles); other packages behave as expected and download .whl files :)
Thanks!

@bmartinn
Author

bmartinn commented May 7, 2020

Hi @uranusjr
I updated the code here to support the new Resolver as well as the legacy one. I like the new resolver / resolution design, and I tried to make the parallelization as agnostic as possible: basically there is another Resolution instance running in the background, trying to guess ahead which package will need to be downloaded next.

This design should hold with the continuous development of the new Resolver/Resolution classes.

With your approval I'd like to push it as a PR :)

BTW:
It seems the new resolver is slower than the current default one, but I guess that's reasonable as it looks like it is still under development 👍
Currently with the new experimental Resolver, we still get a x1.7 factor speedup through the parallelization, so at least we have that.

EDIT:
@uranusjr
Do you think it is better to submit the resolvelib parallelization as a PR to resolvelib or here?

@uranusjr
Member

uranusjr commented May 7, 2020

I think it would be a reasonable addition, but some considerations are needed. threading (and by extension parts of multiprocessing) is not a universally available module (#8169), so a fallback to single-thread operation would be needed on such platforms. The main task (IMO) would be to come up with a reasonable abstraction so we don’t end up maintaining two partially duplicated pieces of code.
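For illustration, the fallback pattern usually looks something like the sketch below: guard the import and degrade to a plain serial map. This is a generic sketch, not pip's eventual utility.

```python
# Sketch: degrade gracefully where threading/multiprocessing is unavailable.
try:
    from multiprocessing.pool import ThreadPool
    _HAVE_THREADS = True
except ImportError:
    _HAVE_THREADS = False


def map_maybe_parallel(func, items, workers=4):
    if not _HAVE_THREADS:
        return [func(item) for item in items]  # serial fallback
    with ThreadPool(workers) as pool:
        return pool.map(func, items)
```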

@bmartinn
Author

bmartinn commented May 7, 2020

Sure, we can quite quickly protect the import and disable the feature automatically.
(notice that in my code, the parallelization is an additional flag anyhow)

The main task (IMO) would be to come up with a reasonable abstraction so we don’t end up maintaining two partially duplicated pieces of code.

You mean regarding resolvelib?

@uranusjr
Member

uranusjr commented May 7, 2020

I meant that within ResolveLib, the parallelisation needs to reduce duplicated resolving-logic implementations. In your current implementation, for example, there are two completely separate pieces of code for updating criteria in single-threaded and multi-threaded modes, which would be hard to maintain if we want to revise the resolution logic.

@bmartinn
Author

bmartinn commented May 7, 2020

Yes @uranusjr, you are correct. I put it there just so the execution path is easier to see (i.e. when viewing the diff); I'll extract it out into a function and make sure we call it, which is obviously easier to maintain :)

Still, with that in mind, should it be pushed to the pip or the resolvelib repository? It seems you already have a PR updating to the new resolvelib version, and the parallelization itself is fully contained in the new resolvelib, so I guess it makes the most sense (from a maintenance perspective as well) to PR it there and then just have it synced back into pip?!
What do you think?

EDIT:
@uranusjr I extracted the "initial requirements to criteria" method and added safeguards on multiprocessing (basically disabling the parallelism if multiprocessing is not supported)

@McSinyx
Contributor

McSinyx commented May 8, 2020

Hi @bmartinn, it took me quite a while to get back in shape 😄 IIUC, your branch has three separate enhancements:

  1. Parallel install by multiprocessing
  2. Parallel download by multithreading
  3. Cache wheel

I don't have enough understanding to comment on 3 (I like the idea, just haven't carefully read the code), but for the other two:

Parallel install

It's true that the current situation is much better now than in the past where download/resolve/build/install were coupled. I believe there shouldn't be any problem with throwing packages to be installed to a pool since currently we only display Installing collected packages and Successfully installed.

It'd also be nice if we could have parallel builds, but I think that's out of scope for this discussion. Three features in a single branch are already really hard for me to follow, and I believe pip's maintainers might have the same issue.

Parallel download

I have two main concerns over the design:

  1. It seems that every requirement resolution is fed to the pool. Does this risk re-resolving and re-downloading? Is there any way to avoid this without refactoring? If that's the case, I'll proceed with my GSoC work in a less trivial approach, and I'd love to have some help on the design as well as testing and feedback.
  2. The UX: either we group downloads into known-size collections so we can display a common progress bar, or we do it similarly to the install output (report the start and finish of each download).

@bmartinn
Author

bmartinn commented May 8, 2020

TL;DR
See "A few thoughts:" at the bottom

Hi @McSinyx
Yes, your observations are correct :)

  1. Regarding the install process: as you noticed, parallelizing the wheel unpacking/installation across multiple processes is not a problem, as the wheels are truly independent. Builds, however, can rely on other packages in the build process, so building them in parallel is a bit more tricky; hence, as a first step, I left the builds serial. (Notice that some work actually needs to be done there regardless of the parallelization, as the build process is not aware of dependencies, but it should be, especially when installing packages from git repos where you have to build them even if they are pure Python. We have seen multiple installations break due to that, and it would be great to see someone tackle the problem ;)
    A short note: the installation step can be very slow if you have multiple packages, and when all packages are cached it is the most time-consuming part of the pip install process. It can easily take 30 seconds before parallelization and a few seconds after...

  2. Regarding the download process: this is actually tied to the resolving process. When packages are resolved, they need to be downloaded in order to get all the dependencies from the wheel files. If you have a list of packages with fixed versions (that you know will not collide), then you only download each package once. But if you have to "resolve" some collisions, you might end up downloading several versions of a package until you manage to remove all collisions between dependency versions. More specifically, the current resolver (under the 'legacy' folder in master, but still the default one) will "optimistically" hope that all versions can co-exist with one another. Parallelizing this resolver is quite trivial after adding some protection mechanisms so that we can access a collection from multiple threads safely. Since each resolve downloads its own package, we essentially parallelize the download process (but also the inspection of the cache, etc.); this is quite efficient and easy to maintain.

The new Resolver is more sophisticated and tries to resolve collisions by iterating over the least dependent packages first, then slowly "climbing" the dependency tree. The actual implementation comes from the [resolvelib](https://github.com/sarugaku/resolvelib) package.
In order to parallelize the new resolver, we decided on a slightly different approach: we keep a copy of the Resolution class (which does most of the heavy lifting in terms of downloading / resolving specific versions), then we try to "guess ahead" what the main-thread Resolution will need and get those packages, and their dependencies, in advance. Every time the main Resolution instance collects a new package, it syncs its state to the background Resolution instance, so that the background can better guess what is needed. Notice that the actual decision making is done by the main instance; the background instance may download the "wrong" version of a package, but that's okay, because it's just in the background, so no harm done :) When the main Resolution then needs a specific package that was not prefetched, it just downloads it itself. (A rough sketch of this guess-ahead idea follows.)
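A rough sketch of that guess-ahead idea, with hypothetical names (fetch_distribution and the requirement keys stand in for the real resolver plumbing; this is not the PoC code):

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class GuessAheadPrefetcher:
    """Speculatively download what the main resolver will probably need next."""

    def __init__(self, fetch_distribution, max_workers=4):
        self._fetch = fetch_distribution          # placeholder downloader
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}                        # requirement key -> Future
        self._lock = threading.Lock()

    def notify_state(self, likely_requirements):
        # Called whenever the main Resolution pins a new package, so the
        # background worker can prefetch what will likely be needed next.
        with self._lock:
            for req in likely_requirements:
                if req not in self._futures:
                    self._futures[req] = self._pool.submit(self._fetch, req)

    def get(self, req):
        # The main resolver asks for a distribution; reuse the prefetched one
        # if the guess was right, otherwise fetch it directly.
        with self._lock:
            fut = self._futures.get(req)
        return fut.result() if fut is not None else self._fetch(req)
```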

UI design: the latest progress bar implementation (again, thank you for noticing the hide-cursor bug)
will give the user as much information as possible while still, hopefully, keeping the log readable.
Starting a new download is printed on a new line, a background download that finishes is printed as a full progress bar with all its info, and the current download is printed as the last line (like in a serial installation). The only issue with storing the background downloads and then printing them is the following: 1. the data is currently not passed to the progress bar itself, so it does not know which package it's reporting for; 2. if we do not report a package while it is being downloaded, it might reduce transparency...

A few thoughts:

  • I would like to have the "legacy" resolver parallelization and the parallel installation PR'ed, with the obvious approval from @uranusjr @pradyunsg
  • Regarding the wheel caching, I can't see any reason not to have that in the PR as well
  • I think the main challenge is the new Resolver. Currently it is twice as slow compared to the "legacy" one, and that is with packages with fixed versions. Even with parallelization introduced it is not very fast. I believe a lot of work can be done there, and since this is the future, IMHO this is where the GSoC efforts should be directed. I don't think the new Resolver parallelization I have on my branch should be PR'ed at the moment, but it can be a good starting point for you, and I will be happy to help :)
  • UI: I think your suggestion will be a great addition :)

@McSinyx
Contributor

McSinyx commented May 8, 2020

Regarding PRs, please make multiple ones, one for each feature, as per pip's guidelines. Since IMHO we have our mini roadmap here, it'd be easier to discuss with well-scoped patches attached. Personally I feel that the ones involving multithreading and multiprocessing will not be merged until we have a better understanding and/or consensus over the situation at GH-8169, but the caching should not suffer such a delay.

About the progress bar, as a user, frankly I find the current mock-up quite confusing. Even with a changing number of simultaneous downloads we could just redraw the entire screen, but (1) it's complicated and (2) it'd be a mess on half-broken terminals, which in my experience are quite common. That being said, if that's the way to go, I'll try my best to help make it happen. However, I still have a concern over duplicate downloads of future requirements, and I don't really know whether it's avoidable. We might need to refactor the following to do convenient hashing of the URL

_internal.operations.prepare.RequirementPreparer.prepare_linked_requirement:
    _internal.operations.prepare.unpack_url:
        _internal.operations.prepare.get_http_url:
            _internal.operations.prepare._download_http_url  # actual download

and I'm not a fan of while not ready: sleep for a short interval as a way of waiting in parallel for the same thing. Also, I'm starting to feel that I'm talking too abstractly here and I might not know all of what I'm saying, so please split your branch into PRs so we can discuss better.

I'd love to hear elaboration on the parallel build, but this might not be the right place, since there are so many things being discussed at the same time here already. AFAIK there's no ticket dedicated to it, so feel free to create one (or just tell me to do it).

@bmartinn
Author

bmartinn commented May 9, 2020

Thanks @McSinyx sounds like a plan to me :)
btw:
What do you mean by

I'm not a fan of while not ready: sleep for a short interval

Where did you see this pseudo code?

@McSinyx
Contributor

McSinyx commented May 10, 2020

Where did you see this pseudo code?

For the full context, what I was trying to say was

I still have concern over duplicate downloads of future ones [...] and I'm not a fan of while not ready: sleep for a short interval for parallel waiting for the same thing

e.g. if A is needed by both B and C and we don't control the collection of packages to be downloaded, then if, say, B initiates the download first and the download has not finished when C asks for A too, the thread resolving C would need to wait for that future A to finish. There could be another way that I'm not aware of, but if not, we would need to run idle wait loops for it, and personally I don't like that.
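One common way to avoid both the duplicate download and the sleep-polling is to memoize a single in-flight Future per download URL, so the second requester simply blocks on Future.result() instead of looping. A sketch under assumed names (download() is a stand-in, not pip's _download_http_url):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=8)
_in_flight = {}                 # url -> the single Future downloading it
_in_flight_lock = threading.Lock()


def download(url):
    """Stand-in for the real downloader."""
    ...


def fetch_once(url):
    # Whoever asks first starts the download; everyone else waits on the
    # same Future, so there is no `while not ready: sleep` polling.
    with _in_flight_lock:
        fut = _in_flight.get(url)
        if fut is None:
            fut = _pool.submit(download, url)
            _in_flight[url] = fut
    return fut.result()
```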

@pradyunsg
Member

Heads up: This is on my "take a look at" list, but I'll have my hands full right now at least for the next few days, with pip 20.1.1.

@pradyunsg
Member

pradyunsg commented May 31, 2020

I'm super excited that @bmartinn has done so much useful work on this, and filed a PR for this as well! It's much appreciated! I'm 100% onboard for doing this.

However, to use Brett Cannon's excellent analogy, you're gifting a puppy here and I'm wondering how accepting this will affect my life once the puppy is all grown up. I really think we should carefully consider how this affects various other parts of pip, before moving too far with this:

  1. consider the overall implications of introducing parallelization in pip (Add utilities for parallelization #8320).
  2. how this affects the "topological" order of the installation (see https://pip.pypa.io/en/stable/reference/pip_install/#installation-order)
  • the important characteristic is "don't end up with a broken installation"
  • we'd likely have to install things at the same "level" -- this should be pretty easy to do with the new resolver, no idea about the older resolver implementation. :)
  3. explore how this affects #8119 (warn before overwriting files owned by previously installed wheels) and other similar issues.
  4. the rollout has to be "smooth", with an opt-in initially so that we can get feedback from users who are willing to experiment first, before rolling it out generally.

I do have some thoughts on the specific implementation here (w.r.t. output etc) but I think we should address the more meaty design/technical concerns around the effects of such a change (like the ones above and possibly more!), prior to getting to the implementation/CLI details.

FWIW, 4 is covered by pip's existing --unstable-feature flag already, so I'm not too concerned about that TBH.


One thing I'll note here, is that there's a minor overlap with @McSinyx's GSoC project for this summer here (#825, parallel downloads), in that they both need parallelization. However, beyond the discussion about how pip does parallelization (#8320), I don't think there's any overlap between these two tasks, since the GSoC project is more about what happens during dependency resolution and this specific issue is more about what happens after the dependency resolution process is completed.

pradyunsg added the state: needs discussion and type: enhancement labels on May 31, 2020
The triage-new-issues bot removed the S: needs triage label on May 31, 2020
@bmartinn
Author

@chrahunt kudos on the quick toy implementation! very nice!

  1. I totally forgot that one can put a file in the "root" dist package folder. As you pointed out, quite nasty...
  2. More to the point, as @pfmoore mentioned, this is not about making the final setup work; there is no guarantee it will be in a usable state (serial or parallel installation). That said, our goal here, as in #8119 (warn before overwriting files owned by previously installed wheels), is to convey the broken state to the user (i.e. output a warning).
  3. @pfmoore you are correct, the term "race condition" is not very accurate; I think "atomic" is more appropriate.
    Specifically, if the implementation of "warn if unzipping would overwrite a file" is "if exists, then warn", the statement itself is not atomic, so two processes can bypass it even though they should not (both context-switch after the "if" and before the creation of the file).
  4. POSIX file systems support an atomic "create file if it does not exist", which is designed to solve exactly these kinds of use cases (files and pipes are quite often used to implement locking mechanisms).
  5. I'm not sure how #8119 (warn before overwriting files owned by previously installed wheels) implemented the warn-if-overwrite, but since the unzipping of the wheel is actually fully Pythonic, we could use os.open("filename", os.O_CREAT | os.O_EXCL), which will fail if the file already exists and is of course an atomic call (see the sketch after this list).
  6. This means you will be able to generate a warning if two packages are stepping on each other's files, but the order of the warnings is not guaranteed. For example, if we take @chrahunt's code and extend it to have two files in the root folder, you might end up with a warning that package A overwrote file1 and package B overwrote file2. Nonetheless we will get a warning, and the user will know they have a broken setup.
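A small sketch of item 5, assuming a hypothetical helper name (this is illustrative, not how #8119 is implemented): create the target file atomically and warn on collision instead of silently overwriting.

```python
import errno
import logging
import os

logger = logging.getLogger(__name__)


def write_file_or_warn(path, data, package_name):
    """Atomically create path; warn if another package already created it."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            logger.warning("%s would overwrite existing file %s", package_name, path)
            return False
        raise
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True
```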

WDYT?

@chrahunt
Member

chrahunt commented Jul 11, 2020

in my view, "race condition" implies "non-deterministic chance of getting a broken result rather than a success"

I think this view is fine - what matters isn't the definition but the fact that we're trying to avoid bad outcomes (for users, for maintenance burden, for debugging).

The situation I gave above was one in which a race condition can occur under a multi-process installation, not a race condition itself. There are places in the code where we make assumptions that could be rendered false and lead to failure in the face of a multi-process installation. See #8562 (comment) for an explicit example: now if this code overlaps with this code in separate processes, then the latter will raise an exception¹. We probably make similar assumptions in the rest of the wheel installation code, the internal utilities it uses, the 3rd-party libraries we use, and the standard library modules we use.

Using code in multiprocessing forces contributors to consider those situations (and more) on an ongoing basis (e.g. development, code review). That can be worth it, but we should be intentional about deciding it.

¹ In the proposed implementation an exception will be swallowed and we will try to install sequentially. IMO that approach leads to a much worse experience, because terminal failure of any one wheel installation will leave all but that one wheel installed. That environment would potentially be much more broken than today.

@pfmoore
Member

pfmoore commented Jul 11, 2020

Point taken.

If I now understand correctly, you're flagging a general problem with any form of parallel installation, not just a possible issue with this PR. That would imply that any proposal to do parallel installs needs to start with (some form of) a design document, discussing the trade-offs, the integrity guarantees that we want to make, and the explicit "undefined behaviour" situations where we don't offer guarantees and consider the use case unsupported.

That seems fair - although in practice, I suspect it will result in parallel installs being postponed for a relatively long time - in my experience open source workflows are not good at handling the sort of broad design questions we're talking about here. I'd love to be proved wrong, though!

@bmartinn
Author

@chrahunt I might be missing something here, but in all of the examples you will end up with a broken (i.e. overwritten) setup. This is bad, but unavoidable. The only thing you need is to make sure you convey it to the user.
Are we in agreement on this one?

If we are, then again the implementations you pointed to are all atomicity issues, not race conditions¹, and yes, they should probably be dealt with; I think we can assume Locks are supported. Anyhow, I think that at least to begin with, parallel installation / download / resolving should be turned off by default. It only makes sense to have this feature when pip is used to build an entire environment, and then the speed benefit is huge, and I hope it is safe to assume you do not have packages overwriting one another.

¹ An atomic operation assumes correctness outside of the atomic call as long as only a single atomic call is executed on the "system" at any given moment. A race condition exists when there is no guarantee of who will get to point A first, but that does not in itself imply incorrectness of the code; if it breaks correctness, then it falls under the need for an atomic section. In your example, even introducing an atomic section will not remove the race condition: there is no guarantee which package will be installed first, and that is okay, as long as we do not end up with two packages installed without knowing they overwrote one another.

@bmartinn
Author

That seems fair - although in practice, I suspect it will result in parallel installs being postponed for a relatively long time

@pfmoore I'm with you on this one, and IMHO this is exactly the opposite of the open-source spirit. If you try to "play it safe" and try to guarantee all scenarios, you are limiting your ability to introduce innovation. Pip dropping Python 2.7 and soon Python 3.5 support is a good example; we have to keep moving forward. I think that trying to protect against extreme scenarios which are broken to begin with misses the point of speeding up the process. And since accelerating the pip install process these days will affect so many users and systems, as they rely on pip to restore environments and do that very often, this is a notable goal. Just my 2 cents :)

BTW:
This is a draft implementation, and should be considered as a proof-of-concept :)

@pfmoore
Member

pfmoore commented Jul 11, 2020

IMHO this is exactly the opposite of the open-source spirit. If you try to "play it safe" and try to guarantee all scenarios, you are limiting your ability to introduce innovation.

This, I disagree with. The problem here is a combination of compatibility and support; it has nothing to do with an "open-source spirit". Pip is used so widely, and in so many different workflows and environments, that we have to be extremely careful not to break things - that's true just as much for an open source project as for a paid one. IMO, the reason open source has become so popular and common is precisely because volunteers and open source maintainers care about their projects and users, and take issues like backward compatibility seriously (typically more seriously than many commercial vendors).

Open source gives us more options and flexibility to innovate, but not at the expense of our users.

@chrahunt
Member

chrahunt commented Jul 11, 2020

you will end up with a broken (i.e. overwritten) setup. This is bad, but unavoidable. The only thing you need is to make sure you convey it to the user. Are we in agreement on this one?

We disagree here. The point of #8119 is to introduce a deprecation notice, which would turn into a hard failure. So I think a broken setup will be avoidable in that case.

Anyhow I think that at least to begin with, parallel installation / download / resolving, should be turned off by default. It only makes sense to have this feature when pip is used to build an entire environment, and then the speed benefit is huge, and I hope it is safe to assume you do not have packages overwriting one another.

I don't think this addresses the maintenance overhead I mentioned above. Once we support multi-processing here then it's something we need to think about during code review and development, regardless of whether it is the default. On a side note, having code that is not often run increases the chances that we will break it inadvertently.

If you try to "play it safe" and try to grantee all scenarios, you are limiting your ability to introduce innovation.

And that is totally OK! Everything is a trade-off. Having slightly slower installs that, IMO, are easier to reason about, review code for, contribute code to, and support users with may be more aligned with our desired outcomes.

And since I think that these days accelerating the pip install process will affect so many users and systems, as they relay on pip to restore environments and do that very often, this is a notable goal.

I agree 100%, this is an impactful thing to look into.

This is a draft implementation, and should be considered as a proof-of-concept :)

Noted!

@bmartinn
Author

bmartinn commented Jul 11, 2020

@pfmoore I think I failed to explain myself. I'm not saying we should break stuff or drop backwards compatibility; that would be horrible. My point is that a proper design document covering all angles is challenging and very hard to achieve in the asynchronous workflows of an open-source community. This is why so many times these things evolve organically and iteratively, where each step improves on the previous one. This is basically my suggestion with regard to parallelization in general. It also connects with @chrahunt's point: it might take a few iterations until we achieve stability, and that is exactly why I'm against pushing parallelization as the default behavior; I really think this is too complicated to land in a single PR.

We disagree here. The point of #8119 is to introduce a deprecation notice, which would turn into a hard failure. So I think a broken setup will be avoidable in that case.

@chrahunt, my bad, I was not aware this is a warning before installation, and not during/after. BTW, I would imagine that just listing files before unzipping them will take a toll on speed, but of course you can always parallelize the check itself... Anyhow, you are most definitely correct; the only way to implement warn-before-unzip is to test everything before installing. And again, I do not understand whether #8119 produces a warning before the specific wheel is unpacked, or whether it should stop the installation before any package is unpacked. But I guess that is a discussion that should happen in the #8119 thread.

I don't think this addresses the maintenance overhead I mentioned above. Once we support multi-processing here then it's something we need to think about during code review and development, regardless of whether it is the default. On a side note, having code that is not often run increases the chances that we will break it inadvertently.

It does not address the maintenance issue, and this is always challenging with multiprocessing workloads. That said, this is exactly why I think this feature should only be turned on with a specific flag when installing, since it will take time until it is stable.
I think a step in the right direction was taken by @McSinyx in #8320, which introduces parallelization utilities to be used throughout the code, in order to address atomic sections, races, etc.

@pradyunsg
Member

pradyunsg commented Jul 15, 2020

Looks like there's been some spirited meta-esque discussion here, containing words and phrases like "innovation" / "Everything is a trade-off" / "open source spirit" / "flexibility to innovate" etc., and on interpretations of the meanings of phrases. I don't want to add to that spirited discussion (mostly because I don't have the mental bandwidth for it right now).

I have faith that everyone here will take a moment to step away if they need to, and avoid escalating things.

The new resolver installs dependencies before dependents

Well... the old one did that too! However, that logic is difficult to understand (for me at least), without spending a lot of time frowning at the screen and going "huh?" multiple times over the course of 20 minutes.

(yes, I've spent a substantial amount of time on the old resolver. yes, I am eagerly waiting to delete all of that code.)

in order to solve any atomic-section, races, etc

Those utilities are not going to "solve" race conditions and logical/design issues. They do not avoid the need to consider the implications of parallel/async programming.

They avoid having the entire codebase riddled with calls to multiprocessing/multithreading etc, and provide a clean way to have fallback-logic when a platform doesn't have support for some parallelization routine. [note 1]


[note 1]: I went too far with an analogy here so I removed all of that text, but I wanted to note that the ending was: A nail gun's inactive-unless-against-a-surface logic isn't going to help if you put it on your foot and press the trigger. It's a nice package that removes the need for directly wielding a hammer, but that doesn't inherently make it always-safe. :)

pradyunsg changed the title from "Reference implementation for parallel pip install (multithread / multiprocess)" to "Parallelizing the install process + PoC!" on Jul 15, 2020
@pradyunsg
Member

pradyunsg commented Jul 15, 2020

I think everything I've said in #8187 (comment) is still true. Allow me to use more words to explain what I was trying to convey by referring to Brett's post:

This would be wonderful and I would love if we can make this speedup happen. But, we ought to consider both the benefits and drawbacks of it.

In an ideal case, we'd have no drawbacks when comparing parallel installs to serial installs. Sadly, we're not in an ideal world (gestures at everything) so I think we ought to figure out how parallel installs would differ from serial installs. Knowing this will help us when we're implementing this, reviewing the implementation, planning the rollout and communicating to users during the rollout.

Yes, all this is a lot of work (and perhaps "inertia" even), but I don't want us to release something that'll "not work" for a substantial chunk of the Python community. If someone disagrees and thinks we should not be careful about avoiding disruption, please say so and elaborate on why.

OTOH, I really don't care how we do this -- all I know is that I'm not volunteering to do this work myself. Whether this takes the form of prototyping where we discover things as we go, or a design document that we discuss before starting implementation, or some other workflow -- I really don't care, as long as I can provide feedback before merging the code into master (since master is where we make releases to end users).

I'll note that poetry recently implemented parallel installs and there's ideas that can be borrowed if we want to do so.

@bmartinn
Author

Thanks @pradyunsg ! That sounds like a plan to me.
I will gladly start a design document draft, as obviously this feature is dear to my heart :)
What would be the best way to have everyone contribute to the document (I'm not aware of any docs-feature built into GitHub ...) ?
Should I just open a repo with a markdown document, and everyone could edit it via commits ?

@pradyunsg
Member

@bmartinn Awesome! Thanks for being willing to champion this! ^>^

If you wanna collaboratively work on a design document, I'd suggest just picking a platform that lets you put text in a place and share it -- a GitHub Gist / HackMD / Etherpad / Google Docs / Dropbox Paper, etc. :)

The main difference between these platforms is the level of access control for various folks -- I think Google Docs with "view" access would likely work best, since folks can (publicly) suggest edits and make comments on specific parts of the content, while the author/owner is the only person who can actually make those changes. YMMV tho, and as long as it's possible to provide feedback on the content "near" it (i.e. not in this issue), I'm on board. :)

@McSinyx
Contributor

McSinyx commented Aug 9, 2020

I'm passing by just to mention that after GH-8562, caching unpacked wheels is no longer needed.

@SmartManoj

I'm passing by just to mention that after GH-8562, caching unpacked wheels is no longer needed.

What is the meaning of GH, and what is the expansion of the other labels?

@McSinyx
Contributor

McSinyx commented Oct 4, 2020

It's the same as #8562, but IMHO it looks classier, and I grew the habit of using it after watching projects with multiple issue trackers (not at the same time, though; they use GH-* to distinguish them from issues in another tracker).

@ctoth

ctoth commented Jun 8, 2021

After reading through the discussion here and in #825, #7819, #8320, #8161, #8169, and #8448, it's clear that this is a problem that some people want to solve.
It also seems like the core maintainers are somewhat opposed to this feature, or are at least not enthusiastic about it. Specifically, it seems like every one of these threads ends rather abruptly with no clear path forward.
How can I best contribute to parallelizing pip installs?

It seems to me that the amount of time saved, multiplied over the number of people who wait for pip installs, makes this a fantastic way to give people across the world back the one thing we never get more of: time.

@pfmoore
Member

pfmoore commented Jun 8, 2021

It also seems like the core maintainers are somewhat opposed to this feature, or are at least not enthusiastic about it.

I think that the core maintainers¹ are painfully aware of all of the pitfalls that may make this problem a lot harder than people expect. This may come across as a lack of enthusiasm, but it really isn't. However, we do tend to get burned out by needing to be the ones reining in all of the (understandable) enthusiasm of people wanting to contribute to this feature. Nobody likes being the party pooper all the time 🙂

Specifically, it seems like every one of these threads ends rather abruptly with no clear path forward.

That's quite possibly true - I haven't reviewed the various threads and issues you link to so I'll take your word on this. But that's likely to be more because the original contributors hit the problem of not having good answers to all of the potential questions, and gave up. That's totally understandable, and one of the biggest problems with any project on the scale of something like pip - larger pieces of work are difficult to complete, because they need more time than the typical "interested volunteer" has available.

How can I best contribute to parallelizing Pip installs?

Well, I don't want to suggest that there's any magic answer here - core maintainer time is extremely limited and you may find that it's hard to get anyone's attention. But if you are interested in giving it a try, here are some suggestions:

  1. Collect together all of the questions that have been brought up and discussed in the various threads (for example, I spotted one about "race conditions" where parallel installs risk breaking the environment if two packages both contain the same file). Having one place where all of the things someone needs to think about can be viewed together would be very helpful, compared to having to read through the content of 6 or more long threads.
  2. Come up with a design document that addresses those concerns. No code needed at this point, just a high level explanation of what can and can't be done in parallel, how we serialise things where needed, how we degrade gracefully on platforms that don't have multiprocessing/threading, etc.
  3. Review the existing proposals/PRs, and work out why they weren't sufficient, and how (or if) they could be improved to address the issues raised. Essentially, work out how we could build on what's already been done, rather than simply starting from scratch again.

If that doesn't burn you out, then we'll have made a good step forward, far better than someone just diving in with another proof of concept.

As I said, though, I think you'll find this is a much harder problem than you imagine. Hopefully, trying to tackle it won't leave you too disillusioned - there are lots of much more tractable problems that could use some work, and we would definitely be glad to see some more enthusiastic contributors pitching in!

Thanks for asking the question, and I'm sorry if this response feels more negative than you hoped... I really would like to see something like this happen, but it won't be a quick project 🙂

¹ I'm speaking for myself mainly, but I imagine the other core devs have similar feelings.

@bmartinn
Author

bmartinn commented Jun 8, 2021

@ctoth I'm hoping we are close to merging here: PR #8215. This is the lowest-hanging fruit in terms of parallelization, and it offers a great initial improvement over the standard serial installation process.
@pfmoore could you maybe help out? I think we need some guidance there, and hopefully we could see this joint effort merged 🤞

@pfmoore
Member

pfmoore commented Jun 9, 2021

@pfmoore could you maybe help out? I think we need some guidance there and hopefully we could see this joint effort merged

No, sorry. I'm extremely limited on time for pip at the moment, and what time I have is mostly restricted to drive-by comments on issues (like my comment here 🙂) or if I'm lucky, thinking about stuff that I've been meaning to work on myself.

@bmartinn
Author

@pfmoore no worries, I can totally understand 🙂

@tgross35

tgross35 commented Dec 7, 2021

@bmartinn sorry for the soft poke, but has there been any progress with this? Sad to see the nearly complete PR still open; my Docker builds would love the boost :)

@bmartinn
Author

bmartinn commented Dec 7, 2021

@tgross35 no worries, I really appreciate the poke, this is how we keep PRs alive :)
I have to admit I kind of lost track of the status of the PR ...
I think we need to decide on the next steps in terms of implementation; I remember a few pushbacks, but I'm not really sure whether they still apply. If you can help with that (i.e. figure out what's missing), I will adapt the code :)
wdyt?

@pfmoore
Member

pfmoore commented Dec 8, 2021

Next steps are probably to do the research I noted in this comment above.

@GuillaumeTong

Is there anyone still working on this? It's been nearly a year since any activity. The subject of this issue sounds like a very intuitive way to speed up pip processes, and the comments seem to indicate that it is nearly finished. It'd be a shame to let it go to waste.

@pradyunsg
Member

pradyunsg commented Oct 21, 2022

See the comment immediately before yours, which provides context on what the next steps here are.
