Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard against mismatched --requirements-pex. #2392

Merged
merged 3 commits into from
Mar 28, 2024

Conversation

jsirois
Copy link
Member

@jsirois jsirois commented Mar 24, 2024

Pex cannot yet handle turning installed wheel chroots back into wheel
files (see #2299). As such, until that time, it must reject attempts
to combine PEXes with different wheel packaging.

Clarifies #2391

Pex cannot yet handle turning installed wheel chroots back into wheel
files (see pex-tool#2299). As such, until that time, it must reject attempts
to combine PEXes with different wheel packaging.

Clarifies pex-tool#2391
@@ -892,6 +892,27 @@ def build_pex(
"Adding distributions from pexes: {}".format(" ".join(options.requirements_pexes))
):
for requirements_pex in options.requirements_pexes:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the Pants-aware reviewers comment on why this option is used by Pants at all? Its an odd duck option in the PEX cli and smells like a hack for Pants sake, but I can't recall exactly why this was deemed necessary. Since PEX_PATH nets you the same thing, but not as a single file, I suspect this must have to do with creating a single file PEX more quickly than it might otherwise be created? I'd love to know a bit more to help evaluate if this can be axed for Pex 3.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pants uses this only to support "local dists" - which is how it supports building (unversioned, unpublished) local native code into wheels (via setup.py) and then packaging those into the final .pex file along with local Python code.

To quote you: "The local PEX is a hack and can be "broken" since its never exposed to or used by end users. It's just there to emulate building a wheel for each local dist and then adding that wheel to another PEX."

So some alternative way to add unpublished wheels directly to a pex would be fine. There is more context in that issue about the perf reasons that led to the current solution, and you have suggested an alternative there that Pants could adopt and is more robust, it just takes some work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the pointers. I'll have to re-read. Clearly though PEX already supports this, you -f dir/ for the dir containing the unpublished wheel and just do a normal resolve, lock, what have you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah @benjyw, so after re-reading, Pants gets itself into trouble by not performing a normal resolve. If it's at all possible to:

  1. Build the wheel
  2. Do the resolve with either pex -f find-links/ ... or else pex {normal reqs...} {wheel1} {wheel2} ... {wheeln) ....

That is the sane thing to do, IOW treat the wheels Pants builds like any other with no PEX hacks that attempt to stitch together resolves post-facto. In general, and this includes PEX_PATH, any attempt to cobble together a sys.path outside of running a single unified resolve is asking for all sorts of trouble as a user and all sorts of complication for Pex to support since Pex is not a resolver, it just uses one.

jsirois added a commit to jsirois/pex that referenced this pull request Mar 24, 2024
pex/bin/pex.py Outdated
Comment on lines 899 to 902
die(
"The {option} option was selected but the --requirements-pex "
"{requirements_pex} is built with {opposite_option}. Any --requirements-pex "
"you want to merge into the main PEX must be built with {option}.".format(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm a bit confused, as a test like the following seems to have matching options but still fails with the same "Is a directory" error:

pex --no-pre-install-wheels -o /tmp/first.pex cowsay
pex --no-pre-install-wheels -o /tmp/second.pex --requirements-pex /tmp/first.pex --layout packed

This error message suggests this should work. Am I misunderstanding it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works in the test in this PR (minus packed, your grid made layout appear irrelevant).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I read poorly. Back to drawing board.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huonw I don't repro:

:; ./dtox.sh inspect
jsirois@617430d8f0ac:/development/pex$ python -mpex --no-pre-install-wheels -o /tmp/first.pex cowsay
jsirois@617430d8f0ac:/development/pex$ python -mpex --no-pre-install-wheels -o /tmp/second.pex --requirements-pex /tmp/first.pex --layout packed
jsirois@617430d8f0ac:/development/pex$ PEX_SCRIPT=cowsay /tmp/second.pex/__main__.py -t Moo!
  ____
| Moo! |
  ====
    \
     \
       ^__^
       (oo)\_______
       (__)\       )\/\
           ||----w |
           ||     ||
jsirois@617430d8f0ac:/development/pex$
exit

Let me dig a bit more to see if I can gedanken whatever it is you're hitting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah @huonw I've got nothing. The new matrixed test passes locally and in CI as did the by-hand replication above; so I'm not sure what is different about our environments that I'm missing. I think this PR is forward progress though and, afaict, introduces no new bugs, just whittling down the existing buggy cases at worst; so I'll proceed. If you can find maybe a dockerable repro case (I used ./dtox.sh inspect above for this reason - so we can share ~the same env) then I can address in a follow up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, ok - Mac specific issue: https://github.com/pex-tool/pex/actions/runs/8457687342/job/23170224480?pr=2392#step:9:2348

Pain in the butt, but at least I have a repro.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I didn't cross my mind that it might be platform-specific. I'll have some time over the next few days so can help fix if doing loops through CI is too unpleasant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the issue is systemic to --requirements-pex X --no-pre-install-wheels; it's just masked on Linux. I'll add a fix patch to this PR tomorrow.

@jsirois jsirois merged commit 70b6d47 into pex-tool:main Mar 28, 2024
26 checks passed
@jsirois jsirois deleted the issues/2391 branch March 28, 2024 20:15
huonw added a commit to pantsbuild/pants that referenced this pull request Apr 3, 2024
This marks all local dist PEXes as internal-only, removing the ability
for them to be anything but internal. This is almost true already,
except for PEXes built via `PexFromTargetsRequest`, where the local
dists PEX used for building the "real" PEX has the same internal status
as that real PEX. In this case, the local dists PEX still isn't surfaced
to users, so it's appropriate for that one to be internal too.

This will probably be slightly faster in isolation (building a
`pex_binary` that uses in-repo `python_distribution`s will be able to
just copy them around with less zip-file manipulation, more often, by
creating packed-layout PEXes). However, the real motivation is
unblocking #20670, where having this PEX built with
`--no-pre-install-wheels` (as internal-only PEXes will, by default) is
required to support downstream PEXes using that argument, at least until
pex-tool/pex#2299 is fixed.

NB. there's still a separate consideration of changing how local dists
are incorporated, which isn't changed or considered here:
pex-tool/pex#2392 (comment)
huonw added a commit to pantsbuild/pants that referenced this pull request Apr 15, 2024
…s of internal pexes (#20670)

This has all internal PEXes be built with settings to improve
performance:

- with `--no-pre-install-wheels`, to package `.whl` directly rather than
unpack and install them. (NB. this requires Pex 2.3.0 to pick up
pex-tool/pex#2392)
- with `PEX_MAX_INSTALL_JOBS`, to use more concurrency for install, when
available

This is designed to be a performance improvement for any processing
where Pants synthesises a PEX internally, like `pants run
path/to/script.py` or `pants test ...`.
pex-tool/pex#2292 has benchmarks for the PEX
tool itself.

For benchmarks, I did some more purposeful ones with tensorflow (PyTorch
seems a bit awkward hard to set-up and Tensorflow is still huge), using
https://gist.github.com/huonw/0560f5aaa34630b68bfb7e0995e99285 . I did 3
runs each of two goals, with 2.21.0.dev4 and with `PANTS_SOURCE`
pointing to this PR, and pulled the numbers out by finding the relevant
log lines:

- `pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d)
test example_test.py`. This involves building 4 separate PEXes partially
in parallel, partially sequentially: `requirements.pex`,
`local_dists.pex` `pytest.pex`, and then `pytest_runner.pex`. The first
and last are the interesting ones for this test.
- `pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d)
run script.py`. This just builds the requirements into `script.pex`.

(NB. these are potentially unrealistic in they're running with all
caching turned off or cleared, so are truly a worst case. This means
they're downloading tensorflow wheels and all the others, each time,
which takes about 30s on my 100Mbit/s connection. Faster connections
will thus see a higher ratio of benefit.)

| goal                | period                       | before (s) | after (s) |
|---------------------|------------------------------|-----------:|----------:|
| `run script.py`     | building requirements        |      74-82 |     49-52 |
| `test some_test.py` | building requirements        |      67-71 |     30-36 |
|                     | building pytest runner       |        8-9 |     17-18 |
|                     | total to start running tests |      76-80 |     53-58 |
 
I also did more adhoc ones on a real-world work repo of mine, which
doesn't use any of the big ML libraries, just running some basic goals
once.

| goal                                              | period                                  | before (s) | after (s) |    |
|---------------------------------------------------|-----------------------------------------|-----------:|----------:|----|
| `pants export` on largest resolve                 | building requirements                   |         66 |        35 |    |
|                                                   | total                                   |         82 |        54 |    |
| "random" `pants test path/to/file.py` (1 attempt) | building requirements and pytest runner |          1 |        49 | 38 |

Fixes #15062
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants