-
Notifications
You must be signed in to change notification settings - Fork 13.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: simplify python dependencies management #27465
Conversation
f41bb40
to
68dd627
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #27465 +/- ##
==========================================
+ Coverage 67.30% 69.71% +2.40%
==========================================
Files 1909 1909
Lines 74678 74665 -13
Branches 8325 8325
==========================================
+ Hits 50262 52051 +1789
+ Misses 22366 20564 -1802
Partials 2050 2050
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
0433754
to
2025754
Compare
POC around simplifying our python dependency setup. Goal ultimately is to restore dependabot support and overall untangle complex nested dependencies. Here I'm aiming at having two bundles: a base/lean bundle, and a development bundle. We used to have testing/integration/docker/local and we're merging all this into development. While the benefits of simplifying are clear, the tradeoff is having a more bloated setup for development. My stance here that having a more bloated dev setup is better than having 3-4 that are confusing and half-broken at times.
05268a7
to
58d3e4a
Compare
8e02b2b
to
150bc7e
Compare
BASE_REQS = "requirements/base.txt" | ||
DEV_REQS = "requirements/development.txt" | ||
|
||
DEV_EXTRAS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this CLI essentially add another layer that we need to work though? I would have thought that the old compile-multi
would have just had a include file for these deps, no? I'm a little concerned that folks will need to come and update this python file whenever they need to tune a dep, so you'd need to add here AND in the toml
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, originally I just had a bash scripts that would run pip-compile --extra {whole_list_here}
, as we need those semantics somewhere. Those semantics used to live in .in
files and needed a new place.
Now I realized I needed a few things on tip of the bash script and thought a tiny CLI would be useful so that :
- I wanted a way to make sure
development.txt
is in sync withbase.txt
for the librairies that they share, and prevent any drift. In the past, that was insured by having the.txt
reference one another. I think I should add a CI check that runspip-compile-superset.py compare-versions
to make sure things don't drift for whatever reason. Happy to add this check part of this PR compile-deps
this subcommand should insure that consistency when changing inputs and/or bumping libs. I mean you could run both pip-compile manually. Basically that means we can update both environments pins atomically- I wanted to offer a way to expand upon what's here. Sure you could
pip install -r requirements/base.txt
andpip install
whatever else on top, but I wanted to offer a way to pip-compile your stuff alongside what's in here. I added docs as part of this PR both for CONTRIBUTING.md and in the producitonizing-related docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, but I think we should try to keep a single source of truth for this list. Would it not suffice to add it to the toml
file under the development
"optional" req's? Just trying to avoid the need to update dependency lists in multiple locations going forward. If there end up being multiple requirement sources in the toml file, perhaps this CLI would perform a linting function which would ensure version consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, clarifying here, but the list of --extra
here is referencing named GROUPs of requirements as specified in the pyproject.toml
, these groups are useful in their own rights as someone might want to pick one or compose a collection of those for their envrionments. So those are not individual libs, but named groups of libs pre-defined in pyproject.toml
, and listed in as required for our development.txt bundle
So this little array represent the list of groups we want to compose into our requirements/development.txt
. Now why not having one big group called development
with everything? Well people in the community might want to install different subsets of these groups, say mysql
, snowflake
and trino
. Another company may want redshift
and hive
.
To define our own development.txt, I picked everything that's needed in some form or fashion by CI, plus the stuff used in common dev workflows filed under a development
group. This contains libs like pylint / docker / mypy ...
"docker", | ||
"flask-testing", | ||
"freezegun", | ||
"greenlet>=2.0.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't greenlet
a requirement for gevent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, as requirements/development.txt
compile, greenlet
gets imported via all of these ->
greenlet==3.0.3
# via
# apache-superset (setup.py)
# gevent
# playwright
# shillelagh
the reason why it made this development
group is that somehow it was listed here https://github.com/apache/superset/blob/master/requirements/docker.in#L19 , so I merged that under development
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems related to #26396
@john-bodley this matured quite a bit since you look, would love to get your feedback on all this. I think it's viable as is and should address most of the origin goals while also addressing or mitigating concerns. |
Problems with the current setup:
|
Now starting to think re-writing something like dependabot might be easier than all this ... I think I may factor out everything that's good/clear about this PR that's not related to pip-compile-multi (reducing to 2 .in files, improvements in GHA action re-use, ...) |
Closing this for now in favor of simpler/less disruptive #27505 - might reconsider in the future. |
SUMMARY
POC around simplifying our python dependency setup. Goal ultimately is to:
Find more context here -> #27413
Simplify from 6 to 2 bundles ->
base.txt
anddevelopment.txt
Currently we have 6 bundles -> base/development/testing/integration/docker/local
Here I'm aiming at bringing it down to 2 ->
While the benefits of simplifying are clear, the tradeoff is having a more bloated setup for development. My stance here that having a more bloated dev setup is better than having 3-4 that are confusing and half-broken at times. Also note that while it's super important for the production dependencies to be pinned, it matters a bit less going into the longer tail of dependencies.
I'm certain the current setup with 6 requirements files is overly engineered/complex and should be simplified. Is 2 the sweet spot? Do we need 3 to strike the right balance? Let's POC with 2 and see how it impacts CI / dev enviornments
This is a bit of a POC, if successful, next step could be to move away from
pip-compile-multi and onto a simpler pip-compile setup, and restore
dependatot support.
What's in this PR
I got pulled deeper than I expected in this PR, resulting in related
side-quests that I took on and want to list out here so you understand
what's in the 40 files touched by this PR:
pyproject.toml
the now-standard way to manage deps and project semantics.This is compatible with dependabot and the ecosystem in general
requirements/pip-compile-superset.py
a simple utility/CLI that wraps pip-compilecommands and features around python dependency management, sure you can use pip-compile
directly, but this utility is a bit more opinionated (which files to generate with what input)
pip-compile-superset.py
requirements/*.in
files intopyproject.toml
setup-backend
github action to DRY the python-setup logic in one place.now that is used consistently across a dozen+ actions, we have more consistency and can
ultimately tweak for more perf
Mechanics
pyproject.toml
- as specified in PEP-517 & PEP-518pip-compile
torequirements/base.txt
requirements/development.txt
, by specifying many--extra
that point to group of "extra_requires". This file is a superset of the base file, and contrarily to before, re-itemizes each dependency (pip-compile-multi was able to point one to another - so it'd be DRYer)Composition in specific envs
How to use the existing constructs to build upon them in your own environment, like we do at Preset and Airbnb? What's the pattern to follow given these new mechanics. Listing out the needs/requirements around this:
base.txt
except where specfiedStep by step:
preset-prod-requirements.in
that references theextra
you want, first line should be something like-e .[hive,presto,mysql]
cryptography>=3.0.0,<4.0.0
orsome-lib-only-me-needs
development.txt
output file as your target.cp requirements/development.txt requirements/preset-prod-requirements.txt
so it becomes the basis for your environment.pip-compile -o preset-prod-requirements.txt preset-prod-requirements.in -c requirements/base.txt
. Note that the output here acts as an input as well, and that we're counting on the fact that our input file + the latest output file leads to a new deterministic output that fits the criteria aboveTESTING INSTRUCTIONS
Let CI play its course. This is a draft for now.