Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate legacy #egg= requirements to PEP 508 direct URL specifier #9429

Closed
tedivm opened this issue Jan 5, 2021 · 17 comments
Closed

Migrate legacy #egg= requirements to PEP 508 direct URL specifier #9429

tedivm opened this issue Jan 5, 2021 · 17 comments
Labels
C: dependency resolution About choosing which dependencies to install C: direct url Direct URL references (PEP 440, PEP 508, PEP 610) state: awaiting PR Feature discussed, PR is needed type: enhancement Improvements to functionality

Comments

@tedivm
Copy link

tedivm commented Jan 5, 2021

Update by maintainer: See #9429 (comment) for the plan to fix this.

Original issue report below:


Environment

  • pip version: pip 20.3.3
  • Python version: 3.7
  • OS: docker (python:3.7 image)

Description

Our Application uses a "requirements.txt" file that contains a reference to a public github repository-

git+ssh://git@github.com/radaisystems/awsscrubber.git

By itself this works fine.

We recently added another library to it-

git+ssh://git@github.com/radaisystems/awsscrubber.git
git+ssh://git@github.com/radaisystems/private.git

This library has a setup.py file in it that also references the "awsscrubber" library (truncated version)-

setup(
    install_requires = [
        'awsscrubber @ git+https://github.com/radaisystems/awsscrubber.git'
    ]
)

By itself both of these have no problem running. As of pip 2.3.3 (and possibly earlier) this results in an error-

Cannot install awsscrubber 0.0.1 (from git+ssh://****@github.com/radaisystems/awsscrubber.git#egg=awsscrubber) and threeproc because these package versions have conflicting dependencies.
ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

The conflict is caused by:
    The user requested awsscrubber 0.0.1 (from git+https://****@github.com/radaisystems/awsscrubber.git#egg=awsscrubber)
    amalthealib 0.1.0 depends on awsscrubber 0.0.1 (from git+https://github.com/radaisystems/awsscrubber.git)

Despite these both pointing at the same repository, and being recognized by pip as being part of the same project.

To resolve this I've tried a few things-

  1. Add "egg" query values to both repos.
  2. Made sure to use "https" or "git" in both places.

Expected behavior

I expect it to install the awsscrubber library.

How to Reproduce

Steps to reproduce are described above, as is the specific error.

Work Arounds

  • Downgrading pip to an older version resolved the problem without causing any conflicts.
  • Having pip use --use-deprecated=legacy-resolver also removed the problem, again without causing any conflicts.
@sbidoul
Copy link
Member

sbidoul commented Jan 5, 2021

Hi! This should work if you use the exact same URL everywhere, so for instance awsscrubber @ git+https://github.com/radaisystems/awsscrubber.git in both setup.py and requirements.txt. Or declare awsscrubber without URL in setup.py and have name @ URL in requirements.txt (the resolver will give priority to the URL).

A similar issue was discussed in #9304.

@tedivm
Copy link
Author

tedivm commented Jan 5, 2021

Hi! This should work if you use the exact same URL everywhere, so for instance awsscrubber @ git+https://github.com/radaisystems/awsscrubber.git in both setup.py and requirements.txt.

I tried this, but the problem is that pip is changing the URL somewhere (it's adding the #egg query in some places), so it's not a matter of matching the lines but matching what PIP is ultimately going to transform those lines in. Turns out that's a huge page.

Or declare awsscrubber without URL in setup.py and have name @ URL in requirements.txt (the resolver will give priority to the URL).

This solution would break every other application that uses the awsscrubber library.

Is it safe to assume that the team considers this regression, which was not documented anywhere I can find, to be a feature and not a bug? It looks like the other issues opened about this are closed as it's expected to break in this scenario- can you point me to any documentation about this, and what the best practices are for referring to git repositories with pip?

@uranusjr
Copy link
Member

uranusjr commented Jan 5, 2021

I tried this, but the problem is that pip is changing the URL somewhere (it's adding the #egg query in some places), so it's not a matter of matching the lines but matching what PIP is ultimately going to transform those lines in. Turns out that's a huge page.

Can you provide a concrete example? I can’t reproduce this:

$ cat a/setup.py
from setuptools import setup
setup(
    name="a",
    install_requires=[
        "resolvelib @ git+https://github.com/sarugaku/resolvelib.git",
    ],
)
$ pip --version
pip 20.3.3 from ...
$ pip install "resolvelib @ git+ssh://git@github.com/sarugaku/resolvelib.git" "a @ git+ssh://git@github.com/uranusjr/pip-9429-package-a.git"
Collecting resolvelib@ git+ssh://git@github.com/sarugaku/resolvelib.git
  Cloning ssh://****@github.com/sarugaku/resolvelib.git to /private/var/folders/tz/3k1hrl1n60lbpkcf4fgxcjqr0000gn/T/pip-install-60ss39uw/resolvelib_c516ee3fe06c43daa27b8540ca1f24a4
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting a@ git+ssh://git@github.com/uranusjr/pip-9429-package-a.git
  Cloning ssh://****@github.com/uranusjr/pip-9429-package-a.git to /private/var/folders/tz/3k1hrl1n60lbpkcf4fgxcjqr0000gn/T/pip-install-60ss39uw/a_3c1242efac0c4239b3cb7bb98e0b47b6
Building wheels for collected packages: a, resolvelib
  Building wheel for a (setup.py) ... done
  Created wheel for a: filename=a-0.0.0-py3-none-any.whl size=981 sha256=c4b2b2f092589627390e178d231f3abefd0f2f3b68a235cfaab9e56033cd86a4
  Stored in directory: /private/var/folders/tz/3k1hrl1n60lbpkcf4fgxcjqr0000gn/T/pip-ephem-wheel-cache-5wdd72_g/wheels/35/03/70/48bc4b7c45537e7754a0d240ceb47b633ce3f49c968901f8cb
  Building wheel for resolvelib (PEP 517) ... done
  Created wheel for resolvelib: filename=resolvelib-0.5.5.dev0-py2.py3-none-any.whl size=12875 sha256=2f073c482f93b6090630ebf73758d9a0d2414a6cc6fcf37fab0e82aaf88cac9c
  Stored in directory: /private/var/folders/tz/3k1hrl1n60lbpkcf4fgxcjqr0000gn/T/pip-ephem-wheel-cache-5wdd72_g/wheels/33/90/17/05634857971f708b1b31ffc26ab2edbf35bad3df2697db330e
Successfully built a resolvelib
Installing collected packages: resolvelib, a
Successfully installed a-0.0.0 resolvelib-0.5.5.dev0

pip does not change the URL for me, and installation finishes as expected.

The test package a is available here.

@uranusjr uranusjr added S: awaiting response Waiting for a response/more information state: needs reproducer Need to reproduce issue labels Jan 5, 2021
@atugushev
Copy link
Contributor

@uranusjr

I've caught the same bug while I was working on the new resolver for pip-tools. Here is the reproducer:

# requirements.txt

https://files.pythonhosted.org/packages/6b/34/415834bfdafca3c5f451532e8a8d9ba89a21c9743a0c59fbd0205c7f9426/six-1.15.0.tar.gz
https://files.pythonhosted.org/packages/6b/34/415834bfdafca3c5f451532e8a8d9ba89a21c9743a0c59fbd0205c7f9426/six-1.15.0.tar.gz#egg=six
$ pip uninstall six -y; pip install -r requirements.txt --no-cache
Found existing installation: six 1.15.0
Uninstalling six-1.15.0:
  Successfully uninstalled six-1.15.0
Collecting https://files.pythonhosted.org/packages/6b/34/415834bfdafca3c5f451532e8a8d9ba89a21c9743a0c59fbd0205c7f9426/six-1.15.0.tar.gz (from -r requirements.txt (line 1))
  Downloading six-1.15.0.tar.gz (33 kB)
Collecting six
  Downloading six-1.15.0.tar.gz (33 kB)
ERROR: Cannot install six 1.15.0 (from https://files.pythonhosted.org/packages/6b/34/415834bfdafca3c5f451532e8a8d9ba89a21c9743a0c59fbd0205c7f9426/six-1.15.0.tar.gz#egg=six) and six 1.15.0 (from https://files.pythonhosted.org/packages/6b/34/415834bfdafca3c5f451532e8a8d9ba89a21c9743a0c59fbd0205c7f9426/six-1.15.0.tar.gz) because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested six 1.15.0 (from https://files.pythonhosted.org/packages/6b/34/415834bfdafca3c5f451532e8a8d9ba89a21c9743a0c59fbd0205c7f9426/six-1.15.0.tar.gz)
    The user requested six 1.15.0 (from https://files.pythonhosted.org/packages/6b/34/415834bfdafca3c5f451532e8a8d9ba89a21c9743a0c59fbd0205c7f9426/six-1.15.0.tar.gz#egg=six)

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

@uranusjr
Copy link
Member

uranusjr commented Jan 5, 2021

As mentioned above, this should work if you use the exact same URL everywhere. pip does not change the URL, so it’d fail if you add the #egg= part, and it’s succeed if you don’t. Your example is showing how it’d fail if you use different URLs.

@atugushev
Copy link
Contributor

atugushev commented Jan 5, 2021

Thanks for the clarification! Hmm, It could be an issue if sub-deps have cross dependencies with and without egg parts.

@tedivm
Copy link
Author

tedivm commented Jan 5, 2021

PIP should ignore the egg dependency when it matches the package name, otherwise groups (possibly in different companies) are going to have to coordinate how they define dependencies. The egg=PACKAGENAME is basically a noop and should be treated as such when making comparisons.

If the answer is that everyone should standardize on the packagename @ git+https://github.com/user/packagename.git instead of git+https://github.com/user/packagename.git#egg=packagename then I think this needs to be announced somewhere as the preferred standard. Otherwise it'll end up causing conflicts now that this NOOP is meaningful and enforced with a hard string comparison.

That being said, if you take a super close look at the error message I posted above there's this really interesting bit-

git+ssh://****@github.com/radaisystems/awsscrubber.git#egg=awsscrubber

That exists there despite the fact that I did not use an #egg statement at all. From what I can gather pip is taking the non-egg URL I supplied and rewriting it to include the egg somewhere.

@no-response no-response bot removed the S: awaiting response Waiting for a response/more information label Jan 5, 2021
@sbidoul
Copy link
Member

sbidoul commented Jan 5, 2021

From what I can gather pip is taking the non-egg URL I supplied and rewriting it to include the egg somewhere.

@tedivm could you provide a reproducer for that ? I have tried and could not reproduce, and I don't know a place in the codebase where pip would add the egg fragment by itself (which does not mean such a place could not be hidden somewhere :)

Here is what I tried from the publicly accessible part of your example, with pip 20.3.3:

$ cat setup.py
from setuptools import setup

setup(
    install_requires = [
        'awsscrubber @ git+https://github.com/radaisystems/awsscrubber.git'
    ]
)

$ cat reqs.txt
git+ssh://git@github.com/radaisystems/awsscrubber.git

$ pip install . -r reqs.txt
Processing /tmp/brol/awsscrubber
Collecting git+ssh://****@github.com/radaisystems/awsscrubber.git (from -r reqs.txt (line 1))
  Cloning ssh://****@github.com/radaisystems/awsscrubber.git to /tmp/pip-req-build-whuy95t5
Collecting awsscrubber@ git+https://github.com/radaisystems/awsscrubber.git
  Cloning https://github.com/radaisystems/awsscrubber.git to /tmp/pip-install-c4c_jhod/awsscrubber_866d3becdc3547519ec83c47573cd081
INFO: pip is looking at multiple versions of awsscrubber to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install awsscrubber 0.0.1 (from git+ssh://****@github.com/radaisystems/awsscrubber.git) and unknown==0.0.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested awsscrubber 0.0.1 (from git+ssh://****@github.com/radaisystems/awsscrubber.git)
    unknown 0.0.0 depends on awsscrubber 0.0.1 (from git+https://github.com/radaisystems/awsscrubber.git)

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

.

@sbidoul sbidoul added the S: awaiting response Waiting for a response/more information label Jan 6, 2021
@scullionw
Copy link

scullionw commented Jan 7, 2021

I had the same problem, and replacing #egg=package with package @ in both requirements.txt and setup.py fixed it for me.

This is non obvious behaviour though and there was no clue given in the pip error message.. In fact the first problem was the requirement installation taking absurd amounts of time and never finishing. It was only when debugging, when I removed all deps from requirement.txt except the last 2 that I started getting the conflicting dependency error.

So just for clarity, I'm was trying to install private package A, that depends on private package B and private package C. However B also depends on C. I was getting a conflicting dependency error, which would make sense if A needed a different version of C than B does, but this wasn't the case. It was the same version.

As I stated above, I fixed it by going from

packageA's requirements.txt:

git+ssh://git@github.com/Org/Repo@branch#egg=packageC
packageB

packageB's setup.py

install_requires=[
    packageC @ git+ssh://git@github.com/Org/Repo@branch#egg=packageC
]

to

packageA's requirements.txt:

packageC @ git+ssh://git@github.com/Org/Repo@branch
packageB

packageB's setup.py

install_requires=[
    packageC @ git+ssh://git@github.com/Org/Repo@branch
]

@uranusjr
Copy link
Member

uranusjr commented Jan 7, 2021

Special handling of #egg= sounds to me like the most reasonable thing pip can do. The #egg= fragment has traditionally been treated as a special case by various tools (e.g. setuptools), so any reasonable existing source wouldn’t give it meanings that breaks these legacy tools. We can add appropriate deprecating warnings to signal people to drop it and/or switch to PEP 508 syntax.

  1. If a requirement’s does not contain an egg fragment, use it as it.
  2. If a requirement contains an egg fragment, and is not in PEP 508 format, convert it into PEP 508 with a warning.
  3. If a PEP 508 direct URL requirement’s URL part contains an egg fragment containing an identical package name, from the egg fragment with a warning.
  4. If a PEP 508 direct URL requirement’s URL part contains an egg fragment containing a different package name, fail with an error (requirement invalid).

Does this sound reasonable?

@sbidoul
Copy link
Member

sbidoul commented Jan 8, 2021

@uranusjr I'm onboard with deprecating egg fragments, , with a looong deprecation period. I'm also on board with erroring if there is a mismatch between the PEP 508 name and the egg fragment.

  1. If a requirement contains an egg fragment, and is not in PEP 508 format, convert it into PEP 508 with a warning.

Can you elaborate on the kind of conversion you have in mind ?

Also, I think egg fragments remain required for editables in some cases ? Do we want, at some point, to support -e "name @ url" or -e "name @ localdir` (which is not PEP 508 compliant but necessary for editables) ? If we start emitting deprecations for egg fragments, an alternative for editables could be nice, for consistency.

@uranusjr
Copy link
Member

uranusjr commented Jan 8, 2021

Can you elaborate on the kind of conversion you have in mind ?

Something like

def _convert_to_pep508(url):
    egg_frag_pat = re.compile(r"[#&]egg=([^&]*)")
    parts = urllib.parse.urlparse(url)
    frag_match = egg_frag_pat.search(parts.fragment)
    if frag_match is None:
        return None
    parts_no_egg = parts._replace(fragment=...)  # String magic to remove the egg= part from parts.fragment
    return f"{frag_match.group(1)} @ {urllib.parse.urlunparse(parts_no_egg)}"

Also, I think egg fragments remain required for editables in some cases ?

Ah yes, editables. The parse_editable function really needs a revamp. name @ localdir is probably not needed because pip install -e localdir already works without egg=, so all we need is to make it recognise PEP 508.

@sbidoul
Copy link
Member

sbidoul commented Jan 8, 2021

Something like

Sorry I misphrased my question. I rather meant to ask when would we need to do such a conversion.

Regarding editable, I think the name is required if you want to do pip install -e ./a#egg=pkga -e ./b#egg=pkgb where pkgb depends on pkga. (I have not checked with the new resolver though)

@uranusjr
Copy link
Member

uranusjr commented Jan 8, 2021

when would we need to do such a conversion

I’m thinking when a Link is created for an InstallRequirement, IOW in pip._internal.req.constructors.

I believe the new resolver should be smart enough to figure out package names in your case. There may be some hard-checks somewhere (the current logic is a bunch of ad-hoc checks scattered all over the place), so some refactorings may be needed to clear the code path for the new resolver.

@sbidoul
Copy link
Member

sbidoul commented Jan 9, 2021

I believe the new resolver should be smart enough to figure out package names

I confirm it is. So there there are now alternatives to egg fragments for all use cases and we can deprecate it.

@uranusjr
Copy link
Member

uranusjr commented Jan 9, 2021

Nice. I’m modifying this issue to track the migration described in #9429 (comment)

@uranusjr uranusjr changed the title PIP is unable to install a public package from github if a dependency also refers to it Migrate legacy #egg= requirements to PEP 508 direct URL specifier Jan 9, 2021
@uranusjr uranusjr added C: dependency resolution About choosing which dependencies to install C: direct url Direct URL references (PEP 440, PEP 508, PEP 610) state: awaiting PR Feature discussed, PR is needed type: enhancement Improvements to functionality and removed S: awaiting response Waiting for a response/more information state: needs reproducer Need to reproduce issue labels Jan 9, 2021
@uranusjr
Copy link
Member

We actually came to the same conclusion in #1289. So I’ll use that to track instead since it contains a bit more cases to support the deprecation.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C: dependency resolution About choosing which dependencies to install C: direct url Direct URL references (PEP 440, PEP 508, PEP 610) state: awaiting PR Feature discussed, PR is needed type: enhancement Improvements to functionality
Projects
None yet
Development

No branches or pull requests

5 participants