Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(experimental) Add partial-wheel-download functionality, to reduce time spent downloading wheels that are eventually discarded and allow parallel downloads #8448

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Jun 16, 2020

TODO

This change improves the time to resolve tensorflow==1.14.0 by 25% when --unstable-feature=shallow_wheels is specified, bringing the v2 resolver's performance up to par with the v1 resolver.

Problem

See discussion in #7819 (comment) and #8442 (comment).

We would like to be able to get dependency information from very large requirements without having to download the very large wheel file first, as that can take several minutes, and the size of the dependency information is extremely small in comparison. If we can avoid blocking on downloading entire wheel files, we can perform all the time-sensitive resolution logic first, and download everything in parallel at the end. Avoiding the download of dependencies in the first place also opens up the possibility of allowing a "dry-run" resolve in the future, as described in #53.

#7819 introduces the concept of "shallowly downloading" a remote wheel file, in order to obtain the contends of its METADATA entry. This can be done by reading the central directory record at the end of the zip file, which contains a listing of the binary offsets of all files in the zip archive next to their filenames (see https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT).

The new unstable resolver makes it very very easy to swap out parts of resolution logic, so we would like to try to "shallowly download" wheel files instead of downloading the entire file to read its metadata entries to make progress in the resolver.

Solution

  • Add pip._internal.network.shallow.{httpfile,zipfile,wheel} which spread out the logic of shallowly downloading wheels into individually-testable modules.
  • Create the ShallowWheelDistribution class and return it from RequirementPreparer.prepare_linked_requirement() when --unstable-feature=shallow_wheels is specified on the command line.

Result

This change improves the time to resolve tensorflow==1.14.0 by 25% when --unstable-feature=shallow_wheels is specified, bringing the v2 resolver's performance up to par with the v1 resolver.

> rm -rf tmp ~/Library/Caches/pip && time pip download --unstable-feature=resolver --unstable-feature=shallow_wheels -d tmp/ tensorflow==1.14.0 &>/dev/null
12.69s user 1.18s system 79% cpu 17.479 total
> rm -rf tmp ~/Library/Caches/pip && time pip download --unstable-feature=resolver -d tmp/ tensorflow==1.14.0 &>/dev/null
15.53s user 3.67s system 87% cpu 22.012 total
> rm -rf tmp ~/Library/Caches/pip && time pip download -d tmp/ tensorflow==1.14.0 &>/dev/null
11.07s user 3.63s system 85% cpu 17.176 total

Note that --unstable-feature=shallow_wheels does not appear to work when used with the v1 resolver. I haven't investigated why, since the v1 resolver is deprecated.

@cosmicexplorer cosmicexplorer marked this pull request as ready for review June 16, 2020 13:38
@cosmicexplorer cosmicexplorer force-pushed the add-httpfile-shallow-wheel-download-utils branch from e12a7c4 to feca106 Compare June 16, 2020 13:43
@cosmicexplorer
Copy link
Contributor Author

Let me know if people aren't ready for this yet -- I'm not trying to rush anything. I happened to find a neat way to link up the current state of the shallow downloading work to the in-progress unstable resolver in a usable way so I decided to create a PR.

@cosmicexplorer cosmicexplorer force-pushed the add-httpfile-shallow-wheel-download-utils branch 4 times, most recently from 7debab9 to 5759b82 Compare June 16, 2020 14:49
@cosmicexplorer cosmicexplorer force-pushed the add-httpfile-shallow-wheel-download-utils branch 4 times, most recently from 1dd0e15 to 1b9dfa9 Compare June 16, 2020 20:48
@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Jun 18, 2020
@McSinyx
Copy link
Contributor

McSinyx commented Jun 18, 2020

I'm stopping by to say I'm finalizing the magical lazy file-like object and it's been a lot of fun.

@cosmicexplorer cosmicexplorer force-pushed the add-httpfile-shallow-wheel-download-utils branch from 1b9dfa9 to addd3ca Compare June 19, 2020 05:43
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Jun 19, 2020
@cosmicexplorer cosmicexplorer force-pushed the add-httpfile-shallow-wheel-download-utils branch 3 times, most recently from c28b3a2 to 1891d79 Compare June 23, 2020 10:00
@cosmicexplorer cosmicexplorer changed the title Add experimental utilities to shallowly download wheel files to improve the performance of pip download Add shallow download utilities under --unstable-feature=shallow_wheels -- brings v2 resolver performance on par with v1 Jun 23, 2020
@cosmicexplorer cosmicexplorer changed the title Add shallow download utilities under --unstable-feature=shallow_wheels -- brings v2 resolver performance on par with v1 Add shallow download utilities under --unstable-feature=shallow_wheels -- makes v2 resolver as fast as v1 Jun 23, 2020
@cosmicexplorer cosmicexplorer force-pushed the add-httpfile-shallow-wheel-download-utils branch from 1891d79 to 6ce0a30 Compare June 23, 2020 10:50
@cosmicexplorer cosmicexplorer changed the title Add shallow download utilities under --unstable-feature=shallow_wheels -- makes v2 resolver as fast as v1 Add shallow download utilities under --unstable-feature=shallow_wheels for a 25% resolve speedup Jun 23, 2020
@cosmicexplorer cosmicexplorer force-pushed the add-httpfile-shallow-wheel-download-utils branch from 6ce0a30 to e9c6ec3 Compare June 23, 2020 12:31
make types pass

add --shallow-wheels cli arg

add news

rename news

make the metadata test pass on windows

use --shallow-wheels unconditionally and remove the cli arg

download all wheels at the end of the run

add a hack to avoid signal() erroring in a background thread

avoid using shallow wheels for non-remote file paths

add --unstable-feature=shallow_wheels!
@cosmicexplorer cosmicexplorer force-pushed the add-httpfile-shallow-wheel-download-utils branch from e9c6ec3 to 9df5c8f Compare June 23, 2020 17:17
@pradyunsg pradyunsg changed the title Add shallow download utilities under --unstable-feature=shallow_wheels for a 25% resolve speedup (experimental) Add partial-wheel-download functionality, to reduce time spent downloading wheels that are eventually discarded and allow parallel downloads Jun 23, 2020
@pradyunsg
Copy link
Member

pradyunsg commented Jun 24, 2020

Let's not touch the issue title any more. :)

(hello to the GitHub folks that I've sent here)

@ofek
Copy link
Sponsor Contributor

ofek commented Jun 30, 2020

Need rebase fyi

@McSinyx
Copy link
Contributor

McSinyx commented Jun 30, 2020

Per #8467 (comment), the patch to implement this feature can be reduced down quite a bit lot. @cosmicexplorer, I think there're 3 directions we can go:

  1. You reworking this PR into a smaller patch
  2. You dropping this and file another ticket
  3. You dropping this and I file another ticket

I'm happy either way, but if you're going to take care of this, keep in mind that the parallelization of download still needs some discussion involving the UI/UX (mainly the progress bar), i.e. please don't do it now. Additionally, as mentioned earlier, I'm somewhat employed to do this, so don't feel uncomfortable if you want to leave the task to me. I'm sure people evolving in pip's development and usage are already really appreciate you from sharing the idea to proof of concepts and full-fledged PR on this enhancement.

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@pradyunsg
Copy link
Member

@cosmicexplorer Let's close this, as per #7819 (comment)?

@cosmicexplorer
Copy link
Contributor Author

Yes!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs rebase or merge PR has conflicts with current master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants