Use reproducible builds for provider packages #35685
This was not used so I removed it.
This is a follow-up to apache#35586 and depends on it. It moves the whole functionality of preparing provider packages to breeze, removing the need to do it in the Breeze CI image. Since we have Python breeze with its own environment managed via `pipx`, we can now make sure that all the necessary packages are installed in this environment and run package building in the same environment Breeze uses.

Previously we ran all the package building inside the CI image for two reasons:

* we could rely on the same version of build tools (wheel/setuptools) being installed in the CI image
* security of the provider package preparation, which used the pre-PEP-517 setuptools way of building packages that executed setup.py code

The second reason was about isolating execution of potentially arbitrary code in setup.py from the host environment in CI, where the host environment might have access to secrets and tokens that would allow PRs coming from forks to break out of the sandbox. The setup.py file was prepared by breeze using Jinja templates, but it was potentially possible to manipulate the provider package directory structure and get "Python" injection into the generated setup.py, so it was safer to run it in the isolated Breeze CI environment.

This PR makes it secure to run the build in the host environment. Instead of generating setup.cfg and setup.py, we generate a declarative pyproject.toml with all the necessary information and use a PEP-517 compliant way of building provider packages. No arbitrary code can be executed via setup.py on the host this way, so we can safely build provider packages there.

We are using flit as the build tool. It is one of the popular build tools, created by the Python Packaging team. It is simple and not too opinionated, and it supports PEP-517 as well as PEP-621, so most of the project metadata can be added to the PEP-621 compliant "project" section of pyproject.toml.

Together with this change we improve the process of generating the extracted sources for the providers. Originally we copied the whole sources of Airflow to a single directory (provider_packages) and ran provider package builds from that single directory, which made it impossible to parallelise such builds: all providers had to be built sequentially. We change the approach now. Instead of copying all airflow sources once to a single directory, we build providers in separate subdirectories of files/provider_packages/PROVIDER_ID and only copy the relevant sources there (i.e. only the provider's subfolder from "airflow/providers"). This is quite a bit faster (each provider is built using only its own sources, so just scanning the directory is faster), and it also allows package preparation to run in parallel because each provider is fully isolated from the others.

This PR also removes the no-longer-needed `prepare_providers_package.py` and the unneeded `provider_packages` folder previously used to prepare providers, as well as the bash script to build the providers and some unused bash functions.
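The per-provider isolation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code Breeze actually uses: the function names `prepare_provider_dir` and `prepare_all`, the `max_workers` default, and the mapping from a dotted provider id to a folder under `airflow/providers` are all hypothetical.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, Iterable


def prepare_provider_dir(source_root: Path, target_root: Path, provider_id: str) -> Path:
    """Copy only one provider's sources into its own build directory."""
    # Assumption: a dotted id such as "apache.hive" maps to
    # airflow/providers/apache/hive under the source root.
    relative = Path("airflow", "providers", *provider_id.split("."))
    build_dir = target_root / provider_id
    if build_dir.exists():
        # Start from a clean directory so stale files never leak into a build.
        shutil.rmtree(build_dir)
    shutil.copytree(source_root / relative, build_dir / relative)
    return build_dir


def prepare_all(
    source_root: Path,
    target_root: Path,
    provider_ids: Iterable[str],
    max_workers: int = 4,
) -> Dict[str, Path]:
    """Prepare every provider directory in parallel.

    Each provider writes only below target_root/<provider_id>, so the
    workers never touch each other's files.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            provider_id: pool.submit(prepare_provider_dir, source_root, target_root, provider_id)
            for provider_id in provider_ids
        }
        return {provider_id: future.result() for provider_id, future in futures.items()}
```

Because every worker owns its own subdirectory, the same pattern would extend to running the package build itself in parallel, one process per provider directory.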
Flit allows building reproducible packages (packages that can be compared bit-by-bit), provided that the source date epoch is set to a repeatable value when the package is built. This PR implements reproducibility of our builds by freezing the documentation preparation time in provider.yaml as "source date epoch" and always using it when building the package. This way anyone using breeze to build the package will get exactly the same binary package, which makes it much easier for a PMC member to verify that the packages are ready for release. We will no longer have to check the sources; PMC members will simply need to build the same packages locally using breeze and see if the generated packages are exactly the same. The "source-date-epoch" fields have been regenerated in this PR as well.

This PR also replaces the `lru_cache` method of storing the output of `get_provider_metadata_packages` with a custom-stored dictionary. Thanks to that, instead of invalidating the whole cache of provider metadata read from yaml files, we can refresh individual provider metadata entries after they have been updated. This saves a lot of time in validation: every time a provider yaml is updated we need to re-read it and re-validate it against the JSON schema, and with this change we only do that for the updated provider yaml. This saves about half a second per provider yaml update, so updating all providers is much faster.
Need to wait with PROD build until #35617 gets merged
Closing in favor of #35693 to run it from the Apache repository, to get the PROD image build working.
Flit allows building reproducible packages (packages that can
be compared bit-by-bit), provided that the source date epoch is
set to a repeatable value when the package is built. This PR implements
reproducibility of our builds by freezing the documentation
preparation time in provider.yaml as "source date epoch" and
always using it when building the package. This way anyone
using breeze to build the package will get exactly the same
binary package, which makes it much easier for a PMC member
to verify that the packages are ready for release.
We will no longer have to check the sources; PMC members will
simply need to build the same packages locally using breeze and
see if the generated packages are exactly the same.
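To illustrate why a fixed source date epoch makes archives comparable bit-by-bit, here is a small standalone sketch. It does not call flit and is not how flit is implemented internally; it only shows the general principle with the standard library. The helper names `build_archive` and `digest`, and the epoch value used in the usage note, are made up for the example.

```python
import hashlib
import io
import time
import zipfile
from typing import Dict


def build_archive(files: Dict[str, bytes], source_date_epoch: int) -> bytes:
    """Build a zip archive whose bytes depend only on its inputs."""
    # Every entry gets the same timestamp, derived from the epoch in UTC,
    # instead of the time at which the build happens to run.
    date_time = time.gmtime(source_date_epoch)[:6]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Sort names so the entry order does not depend on dict order.
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed file permissions
            archive.writestr(info, files[name])
    return buffer.getvalue()


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest used to compare two builds."""
    return hashlib.sha256(data).hexdigest()
```

Two builds of the same files with the same epoch, for instance `build_archive(files, 1700000000)` called twice, give identical digests, while changing the epoch changes the archive bytes. Comparing digests of locally built packages against the release candidates is the kind of check the description has in mind.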
Based on #35617 so it should only be merged after that one
(Only last commit counts)