chore: simplify python dependencies management#27465
Conversation
f41bb40 to
68dd627
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #27465 +/- ##
==========================================
+ Coverage 67.30% 69.71% +2.40%
==========================================
Files 1909 1909
Lines 74678 74665 -13
Branches 8325 8325
==========================================
+ Hits 50262 52051 +1789
+ Misses 22366 20564 -1802
Partials 2050 2050
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
0433754 to
2025754
Compare
POC around simplifying our python dependency setup. Goal ultimately is to restore dependabot support and overall untangle complex nested dependencies. Here I'm aiming at having two bundles: a base/lean bundle, and a development bundle. We used to have testing/integration/docker/local and we're merging all this into development. While the benefits of simplifying are clear, the tradeoff is having a more bloated setup for development. My stance here that having a more bloated dev setup is better than having 3-4 that are confusing and half-broken at times.
05268a7 to
58d3e4a
Compare
8e02b2b to
150bc7e
Compare
| BASE_REQS = "requirements/base.txt" | ||
| DEV_REQS = "requirements/development.txt" | ||
|
|
||
| DEV_EXTRAS = [ |
There was a problem hiding this comment.
Doesn't this CLI essentially add another layer that we need to work though? I would have thought that the old compile-multi would have just had a include file for these deps, no? I'm a little concerned that folks will need to come and update this python file whenever they need to tune a dep, so you'd need to add here AND in the toml, right?
There was a problem hiding this comment.
Yes, originally I just had a bash scripts that would run pip-compile --extra {whole_list_here}, as we need those semantics somewhere. Those semantics used to live in .in files and needed a new place.
Now I realized I needed a few things on tip of the bash script and thought a tiny CLI would be useful so that :
- I wanted a way to make sure
development.txtis in sync withbase.txtfor the librairies that they share, and prevent any drift. In the past, that was insured by having the.txtreference one another. I think I should add a CI check that runspip-compile-superset.py compare-versionsto make sure things don't drift for whatever reason. Happy to add this check part of this PR compile-depsthis subcommand should insure that consistency when changing inputs and/or bumping libs. I mean you could run both pip-compile manually. Basically that means we can update both environments pins atomically- I wanted to offer a way to expand upon what's here. Sure you could
pip install -r requirements/base.txtandpip installwhatever else on top, but I wanted to offer a way to pip-compile your stuff alongside what's in here. I added docs as part of this PR both for CONTRIBUTING.md and in the producitonizing-related docs.
There was a problem hiding this comment.
Makes sense, but I think we should try to keep a single source of truth for this list. Would it not suffice to add it to the toml file under the development "optional" req's? Just trying to avoid the need to update dependency lists in multiple locations going forward. If there end up being multiple requirement sources in the toml file, perhaps this CLI would perform a linting function which would ensure version consistency.
There was a problem hiding this comment.
Hey, clarifying here, but the list of --extra here is referencing named GROUPs of requirements as specified in the pyproject.toml, these groups are useful in their own rights as someone might want to pick one or compose a collection of those for their envrionments. So those are not individual libs, but named groups of libs pre-defined in pyproject.toml, and listed in as required for our development.txt bundle
So this little array represent the list of groups we want to compose into our requirements/development.txt. Now why not having one big group called development with everything? Well people in the community might want to install different subsets of these groups, say mysql, snowflake and trino. Another company may want redshift and hive.
To define our own development.txt, I picked everything that's needed in some form or fashion by CI, plus the stuff used in common dev workflows filed under a development group. This contains libs like pylint / docker / mypy ...
| "docker", | ||
| "flask-testing", | ||
| "freezegun", | ||
| "greenlet>=2.0.2", |
There was a problem hiding this comment.
Isn't greenlet a requirement for gevent?
There was a problem hiding this comment.
yup, as requirements/development.txt compile, greenlet gets imported via all of these ->
greenlet==3.0.3
# via
# apache-superset (setup.py)
# gevent
# playwright
# shillelagh
the reason why it made this development group is that somehow it was listed here https://github.com/apache/superset/blob/master/requirements/docker.in#L19 , so I merged that under development
|
@john-bodley this matured quite a bit since you look, would love to get your feedback on all this. I think it's viable as is and should address most of the origin goals while also addressing or mitigating concerns. |
|
Problems with the current setup:
|
|
Now starting to think re-writing something like dependabot might be easier than all this ... I think I may factor out everything that's good/clear about this PR that's not related to pip-compile-multi (reducing to 2 .in files, improvements in GHA action re-use, ...) |
|
Closing this for now in favor of simpler/less disruptive #27505 - might reconsider in the future. |
SUMMARY
POC around simplifying our python dependency setup. Goal ultimately is to:
Find more context here -> #27413
Simplify from 6 to 2 bundles ->
base.txtanddevelopment.txtCurrently we have 6 bundles -> base/development/testing/integration/docker/local
Here I'm aiming at bringing it down to 2 ->
While the benefits of simplifying are clear, the tradeoff is having a more bloated setup for development. My stance here that having a more bloated dev setup is better than having 3-4 that are confusing and half-broken at times. Also note that while it's super important for the production dependencies to be pinned, it matters a bit less going into the longer tail of dependencies.
I'm certain the current setup with 6 requirements files is overly engineered/complex and should be simplified. Is 2 the sweet spot? Do we need 3 to strike the right balance? Let's POC with 2 and see how it impacts CI / dev enviornments
This is a bit of a POC, if successful, next step could be to move away from
pip-compile-multi and onto a simpler pip-compile setup, and restore
dependatot support.
What's in this PR
I got pulled deeper than I expected in this PR, resulting in related
side-quests that I took on and want to list out here so you understand
what's in the 40 files touched by this PR:
pyproject.tomlthe now-standard way to manage deps and project semantics.This is compatible with dependabot and the ecosystem in general
requirements/pip-compile-superset.pya simple utility/CLI that wraps pip-compilecommands and features around python dependency management, sure you can use pip-compile
directly, but this utility is a bit more opinionated (which files to generate with what input)
pip-compile-superset.pyrequirements/*.infiles intopyproject.tomlsetup-backendgithub action to DRY the python-setup logic in one place.now that is used consistently across a dozen+ actions, we have more consistency and can
ultimately tweak for more perf
Mechanics
pyproject.toml- as specified in PEP-517 & PEP-518pip-compiletorequirements/base.txtrequirements/development.txt, by specifying many--extrathat point to group of "extra_requires". This file is a superset of the base file, and contrarily to before, re-itemizes each dependency (pip-compile-multi was able to point one to another - so it'd be DRYer)Composition in specific envs
How to use the existing constructs to build upon them in your own environment, like we do at Preset and Airbnb? What's the pattern to follow given these new mechanics. Listing out the needs/requirements around this:
base.txtexcept where specfiedStep by step:
preset-prod-requirements.inthat references theextrayou want, first line should be something like-e .[hive,presto,mysql]cryptography>=3.0.0,<4.0.0orsome-lib-only-me-needsdevelopment.txtoutput file as your target.cp requirements/development.txt requirements/preset-prod-requirements.txtso it becomes the basis for your environment.pip-compile -o preset-prod-requirements.txt preset-prod-requirements.in -c requirements/base.txt. Note that the output here acts as an input as well, and that we're counting on the fact that our input file + the latest output file leads to a new deterministic output that fits the criteria aboveTESTING INSTRUCTIONS
Let CI play its course. This is a draft for now.