Evolving the pip list json output #11223

Closed
sbidoul opened this issue Jul 2, 2022 · 9 comments · Fixed by #11245
Labels
state: needs discussion This needs some more discussion

Comments

@sbidoul
Member

sbidoul commented Jul 2, 2022

Currently, pip list --format=json produces a json array.

I think it could be valuable for pip list to output additional information, such as an environment object with PEP 508 environment markers, etc. Such a change would be backward-incompatible, because the current top-level array would have to become an object.

Note that such backward-incompatible changes will probably be very infrequent, since we'll presumably only want to add new information to the output. An example of a backward-compatible change is #11097, which adds a metadata property to the list items.

How should such backward-incompatible changes be addressed?

  • with a new format option such as --format=json+v2?
  • with a new command such as pip inspect?
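
To make the compatibility distinction concrete, here is a minimal sketch with made-up sample data. The name and version keys match what pip list --format=json emits today; the metadata placeholder value and the top-level object layout (with "environment" and "packages" keys) are assumptions for illustration only, not an agreed format.

```python
import json

# Today's shape: a JSON array with one object per installed project.
current = json.loads('[{"name": "pip", "version": "22.1.2"}]')

# Backward-compatible evolution: an extra per-item key (placeholder value).
extended = json.loads(
    '[{"name": "pip", "version": "22.1.2", "metadata": {"summary": "..."}}]'
)

# Backward-incompatible evolution (hypothetical layout): a top-level object.
wrapped = json.loads(
    '{"environment": {"python_version": "3.10"},'
    ' "packages": [{"name": "pip", "version": "22.1.2"}]}'
)


def installed_versions(data):
    """A typical consumer written against the current array format."""
    return {item["name"]: item["version"] for item in data}


print(installed_versions(current))   # works
print(installed_versions(extended))  # still works, the extra key is ignored

try:
    installed_versions(wrapped)
except TypeError as exc:
    # Iterating a dict yields its keys (strings), so item["name"] fails.
    print("consumer breaks on the object layout:", exc)
```

Consumers that only read the keys they know keep working when per-item keys are added, but any consumer that iterates the top-level value breaks once it becomes an object.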
@uranusjr
Member

uranusjr commented Jul 4, 2022

I don’t like json+v2 since it implies we’ll lose the prettier json key forever. Perhaps --use-feature=list-json-2022 is a better way to transition?

Introducing pip inspect is a good idea if it can have more use cases, but I can’t think of any right now.

@zhuofeng6

In my opinion, json+v2 is not a good format name because the '+' symbol is confusing. Maybe --format=json-list, --format=list-json, or --format=json-v2 would be better.

@pfmoore
Member

pfmoore commented Jul 8, 2022

Regarding incompatible changes to the JSON format, I'd prefer --format=json to remain the way of requesting JSON output permanently. We don't version other --format options (there's no expectation that we'd ever use a --format=columns-v2 flag, for example). If we need to make a backward incompatible change, I'd suggest that either we just follow our normal compatibility processes (use --use-feature for the transition) or we introduce a --format-version option that works the same way (allow 1 and 2 with 1 the default, then make 2 the default, then drop support for 1). I'm -1 on having more than a single version of the format, except over a transition period.
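
The --format-version transition described above could look roughly like the following sketch. It is a standalone argparse toy, not pip's code: --format-version is only a proposal from this discussion and is not an existing pip option, and the version 2 layout (a top-level object with "version" and "packages" keys) is a made-up placeholder.

```python
import argparse
import json

SUPPORTED_VERSIONS = (1, 2)  # after the transition, 1 would be dropped
DEFAULT_VERSION = 1          # later flipped to 2, before 1 is removed


def build_parser():
    parser = argparse.ArgumentParser(prog="list-sketch")
    parser.add_argument("--format", choices=["columns", "json"], default="columns")
    # Hypothetical option taken from the discussion; pip does not have it.
    parser.add_argument(
        "--format-version",
        type=int,
        choices=SUPPORTED_VERSIONS,
        default=DEFAULT_VERSION,
    )
    return parser


def render_json(packages, format_version):
    """Serialize the package list in the requested (hypothetical) version."""
    if format_version == 1:
        # Version 1: the current array of per-project objects.
        return json.dumps(packages)
    # Version 2: placeholder object layout, assumed for illustration.
    return json.dumps({"version": 2, "packages": packages})


args = build_parser().parse_args(["--format", "json", "--format-version", "2"])
print(render_json([{"name": "pip", "version": "22.1.2"}], args.format_version))
```

The point of the design is that --format=json stays the only spelling for JSON output, and the version switch exists only for the duration of the transition.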

But equally, I would say that adding environment data doesn't really fit the remit of pip list, so a new command (I'd call it something very generic like pip info - "inspect" feels like querying a package to me) would be better for that particular data. Like @uranusjr, though, I'd prefer it to have a bit more utility before adding it.

@sbidoul
Member Author

sbidoul commented Jul 8, 2022

I'd prefer it to have a bit more utility before adding it, though.

The reason I would like to add environment to the pip list output is to provide the CLI needed to build higher-level environment management tools that are installed in a different environment than the target environment they manage, and that rely only on pip being present in the target environment.

An important interface is querying what is installed. pip list --format=json, with an additional metadata field (#11097) and perhaps a few other additional fields, does that nicely.

Now, to further analyze the metadata in the context of the target environment, and in particular to process requires_dist metadata, it is important to have access to the environment values in order to evaluate environment markers.

@pfmoore
Member

pfmoore commented Jul 8, 2022

Sorry, what I meant was that I'd like a pip inspect/info command to return more data than just "the environment markers". Particularly as I'm not 100% sure what you had in mind for "the environment markers" and why that needs to be a pip command rather than a standalone utility. As far as I can see, pip mostly just uses the defaults from packaging when evaluating markers, so there's nothing particularly pip-specific here (unless I'm misunderstanding).

Anyway, this is separate from the question of "evolving the pip list json output". My view on that is:

  1. pip list returns a list of what's installed, and I'm happy to extend this by adding more "per project" data, but adding data that is global and not per project feels out of scope for pip list and should be a separate command.
  2. What such a command might look like can be debated when a concrete proposal is put forward.
  3. The "array of one object per project" format seems pretty robust for the core functionality of pip list (as I describe it in (1)).
  4. But in any case, evolving the pip list output can be done with our existing mechanisms (--use-feature).

If having to call pip multiple times to get the information that tools need is too costly, maybe the information gathering code should be broken out into a standalone library which can be called from the sorts of tool you're imagining, and pip can vendor that library. In all honesty, I think that's the best way forward in general - we really should be pushing as much functionality as we possibly can into reusable libraries, rather than locking people into calling pip as a CLI.

@sbidoul
Member Author

sbidoul commented Jul 8, 2022

I'd like a pip inspect/info command to return more data than just "the environment markers".

Can you elaborate on what you have in mind in terms of additional data?

Particularly as I'm not 100% sure what you had in mind for "the environment markers"

You can see it in the example provided in #10771.

we really should be pushing as much functionality as we possibly can into reusable libraries,

Agreed, and in this case, the library exists (packaging) and using it to get environment information is a one liner as illustrated in #10771.

rather than locking people into calling pip as a CLI.

The problem with libraries for environment management tools is that, if the library and the tool need to be installed in the target environment to function, there is a risk of version conflicts, whereas we want such tools not to perturb the environment they manage with their own dependencies.

So for such cases, relying on pip (which is quite ubiquitous in Python environments) to obtain the information via CLI makes a lot of sense.

About environment (packaging.markers.default_environment(), specifically), I think it makes sense as part of the pip list result, because it is necessary to interpret requires_dist metadata. So from that angle, it is coherent to return it at the same time.

That said, I agree that returning an array makes sense for pip list, and I'm also not comfortable with making a backward-incompatible change. On the other hand, adding a new command that does almost the same as pip list sounds like overkill. Maybe that's why I was initially leaning towards an option to request a variant JSON format.

@pfmoore
Member

pfmoore commented Jul 8, 2022

Can you elaborate on what you have in mind in terms of additional data?

No, I don't have anything in mind, simply that I don't think that just the environment data is sufficient to justify a whole new pip subcommand. But I do appreciate that saying on the one hand that environment data should be in a separate command but then saying that it's not enough to justify a separate command is not very helpful.

My logic is basically: pip list lists data related to projects. The environment data is not project data, so isn't a good fit for pip list. However, there's currently no good subcommand to use for environment data, and creating a whole new subcommand for just that amount of data seems excessive. I don't have a good solution, other than to say let's not decide just yet, but let the idea simmer for a while before doing anything.

You can see it in the example provided in #10771.

OK. Having that data in pip install --dry-run --report is a much better fit, and I have no problem with it being there. None of that data reflects anything about pip, though, so including it is simply a convenience.

So for such cases, relying on pip (which is quite ubiquitous in Python environments) to obtain the information via CLI makes a lot of sense.

Well, not really. I have a long-term hope that we can move to a situation where we can do pip install --environment XXX to install into a particular environment without needing pip to be present in that environment. And I don't think we're that far from being able to do that.

Sure, right now, pip is present in most environments. And relying on that is a practical solution for now. But it's not guaranteed to remain that way, and we shouldn't encourage people to think that it is.

About environment (packaging.markers.default_environment(), specifically), I think it makes sense as part of the pip list result, because it is necessary to interpret requires_dist metadata. So from that angle, it is coherent to return it at the same time.

Given that it's literally nothing more than a call to packaging,

python -c "import json; from packaging.markers import default_environment; print(json.dumps(default_environment()))"

is just as good. And if you want to assume nothing but pip, import from pip._vendor.packaging. Yes, we don't guarantee that, but for all practical purposes it will work, just like assuming pip is present will work.
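
For a sense of what that data contains, here is a stdlib-only sketch that computes a subset of the PEP 508 marker values, using the definitions given in that PEP. It is an approximation for illustration, not a replacement for packaging.markers.default_environment(); the function name basic_marker_environment is made up, and some keys (for example platform_release, platform_version and implementation_version) are left out.

```python
import json
import os
import platform
import sys


def basic_marker_environment():
    """Return a subset of the PEP 508 marker values using only the stdlib."""
    return {
        "os_name": os.name,
        "sys_platform": sys.platform,
        "platform_machine": platform.machine(),
        "platform_python_implementation": platform.python_implementation(),
        "platform_system": platform.system(),
        # Major.minor only, e.g. "3.10".
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        "python_full_version": platform.python_version(),
        "implementation_name": sys.implementation.name,
    }


print(json.dumps(basic_marker_environment(), indent=2))
```

Every value describes the interpreter and platform rather than pip itself, which is why the same information can be obtained without a pip subcommand.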

Overall, I think the idea of adding pip output options in support of "environment management" tools is probably something that should wait a little, and ideally be driven by specific problems encountered by actual tools. Trying to solve what feel like hypothetical (at this point in time) use cases is likely to end up with a design that satisfies no-one.

@sbidoul
Member Author

sbidoul commented Jul 9, 2022

I have a long-term hope that we can move to a situation where we can do pip install --environment XXX to install into a particular environment without needing pip to be present in that environment. And I don't think we're that far from being able to do that.

I hope we reach that goal one day too.

be driven by specific problems encountered by actual tools. Trying to solve what feel like hypothetical (at this point in time) use cases is likely to end up with a design that satisfies no-one.

IMO these are not hypothetical at all. I drive everything I do from concrete use cases such as this one. And after all, most such tools currently hack, patch or rip off pip in one way or another, so I personally believe that exposing some of the pip logic through a couple of CLI+JSON interfaces is a good and pragmatic way to help the ecosystem implement correct solutions without hacking the wheel over and over, especially when the maintenance cost for pip is low. And this is not in opposition to creating libraries, as these interfaces solve use cases where libraries are not applicable or are impractical.

Anyway, I'll let this rest for a while now. Thanks for your input, as always.

@pfmoore
Member

pfmoore commented Jul 9, 2022

I hope we reach that goal one day too.

Actually, it occurred to me that we may even be able to do this right now. I put together a very simple proof of concept and it seems to work. If you put the following script alongside a "lib" directory with pip installed into it (pip install pip --target lib) but with the bin and pip*.dist-info directory removed (so the bundled pip isn't visible in pip list) then it can be run from any Python interpreter to effectively act as a copy of pip in that environment.

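#!/usr/bin/env python

import os
import runpy
import sys

# Put the bundled "lib" directory (next to this script) first on sys.path
# so that "pip" is imported from there.
lib = os.path.join(os.path.dirname(__file__), "lib")
sys.path.insert(0, lib)

# Run pip as if invoked via "python -m pip", using the current interpreter.
runpy.run_module("pip", run_name="__main__")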

I don't think it would take much to turn this into a viable "standalone pip" application (I'd mostly just want to set up an executable wrapper for Windows). I've done some very basic testing - this would need a lot more real-world testing to make sure there aren't any problem edge cases, but it basically seems to work.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 17, 2022