-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Converts Dockerfiles to be standalone #22492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hello everyone. This is I think the last "serious" optimization of the way our Dockerfiles are constructed - enabled by "dockerfile:1.4" finally released few weeks ago. I was waiting for it to complete what I wanted to achieve from the very beginnig of my Wth this one a lot of problems we had with caching and speed of rebuildong the images on various machines will be gone and we will still continue having a very versatile and flexible Looking forward to merging it soon - together with #22225 it shoudl vastly improve waiting time of |
8267747 to
eed9972
Compare
|
cc: @Bowrna - this one results in a few changes to the part you do on the prod image building #21956 . I've done some of them here, but some more changes are needed to bring the changes also to the python PROD Build too:
|
|
cc: @dstandish @kaxil @jedcunningham @mik-laj - this one also contains something that I promised some time ago - detailed changelog for all 2.* Dockerfiles. I tried to describe in detail what was changed in which version but it might need some clarifications as I have too many assumptions on my Head. Any comments are appreciated. |
|
(and I could split some of those changes but not many BTW). They are all very tightly connected I am afraid. |
docs/docker-stack/docker-examples/customizing/add-build-essential-custom.sh
Outdated
Show resolved
Hide resolved
docs/docker-stack/docker-examples/customizing/own-requirements.sh
Outdated
Show resolved
Hide resolved
478c7d7 to
6821bf2
Compare
fe0ae0c to
c13301c
Compare
In order to build PROD image with inlined scripts, we need to merge a change to `main` to pass DOCKER_CONTEXT_FILES arg as parameter. Related to apache#22492
|
We need to merge #22492 in order to get this PR green. I've added a better error messaging for that case (the build shoudl fail now with a better error message). |
jedcunningham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a number of nits I noticed while looking at the docker_context_files arg.
eac66d0 to
058b989
Compare
|
All language corrections by @jedcunningham fixed :). Also I think all the "checks" shoudl be fix and the build should succeed this time. |
|
Thanks @jedcunningham as usual :) ❤️ |
|
@potiuk Could you explain to me(or point to docs) how inlining the script will remove the permission problem? thank you |
|
When scripts are inlined, they are created by the docker engine COPY <<"EOF" does it. This means that they are created with the default permissions of the engine (always the same in all engines). When the files were copied from the host (originally) permissions were copied from whatever was in the host. And this depended on umask setting in the host (and potentially on git configuration during the checkout). Practically, you could either have g+ or g- depending if your umask was 0002 or 0022 (those are most typical umask settings out there). Additionally there was another problem on Windows. When you check out code on windows filesystem, you also loose executable bit. This we handled by explicitly adding +x when needed. In case of inlining we would have to so it anyway (for example you can see we do it for 'pip' inlined in Docker file). |
058b989 to
36f72fe
Compare
|
Now it shoudl be REALLY Green :) |
|
Green :) |
|
I'd love to merge it. It will make our builds and Breeze Much faster with caching issues :) |
This change is one of the biggest optimizations to the Dockerfiles that from the very beginning was a goal, but it has been enabled by switching to buildkit and recent relase of support for the 1.4 dockerfile syntax. This syntax introduced two features: * heredocs * links for COPY commands Both changes allows to solve multiple problems: * COPY for build scripts suffer from permission problems. Depending on umask setting of the host, the scripts could have different group permissions and invalidate docker cache. Inlining the scripts (automatically by pre-commit) gets rid of the problem completely * COPY --link allows to optimize and parallelize builds for Dockerfile.ci embedded source code. This should speed up not only building the images locally but also it will allow to use more efficiently cache for the CI builds (in case no source code change, the builds will use pre-cached layers from the cache more efficiently (and in parallel) * The PROD Dockerfile is now completely standalone. You do not need to have any folders or files to build Airlfow image. At the same time the versatility and support for multiple ways on how you can build the image (as described in https://airflow.apache.org/docs/docker-stack/build.html is maintained (this was a goal from the very beginning of the PROD Dockerfile but it was not easily achievable - heredocs allow to inline scripts that are used for the build and the pre-commits will make sure that there is one source of truth and nicely editable scripts for both PROD and CI Dockerfile. The last point is really cool, because it allows our users to build custom dockerfiles without checking out the code of Airflow, it is enough to download the latest released Dockerfile and they can easily build the image. Overall - this change will vastly optimize build speed for both PROD and CI images in multiple scenarios.
36f72fe to
5b8bbe4
Compare
|
I'd really love that one to be merged. Finishing AIP-26 will be way easier with it :) https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-26+Production-ready+Airflow+Docker+Image |
|
I think we can merge this one (Regardless of the answer of legal). |
|
I'd love to get that one and #22225 in order to: a) speed up all PRs by 10 minutes (cache) |
|
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
This change is one of the biggest optimizations to the Dockerfiles
that from the very beginning was a goal, but it has been enabled
by switching to buildkit and recent relase of support for
the 1.4 dockerfile syntax. This syntax introduced two features:
Both changes allows to solve multiple problems:
COPY for build scripts suffer from permission problems. Depending
on umask setting of the host, the scripts could have different
group permissions and invalidate docker cache. Inlining the
scripts (automatically by pre-commit) gets rid of the problem
completely
COPY --link allows to optimize and parallelize builds for
Dockerfile.ci embedded source code. This should speed up
not only building the images locally but also it will allow
to use more efficiently cache for the CI builds (in case no
source code change, the builds will use pre-cached layers from
the cache more efficiently (and in parallel)
The PROD Dockerfile is now completely standalone. You do not
need to have any folders or files to build Airlfow image. At
the same time the versatility and support for multiple ways
on how you can build the image (as described in
https://airflow.apache.org/docs/docker-stack/build.html is
maintained (this was a goal from the very beginning of the
PROD Dockerfile but it was not easily achievable - heredocs
allow to inline scripts that are used for the build and the
pre-commits will make sure that there is one source of truth
and nicely editable scripts for both PROD and CI Dockerfile.
The last point is really cool, because it allows our users to
build custom dockerfiles without checking out the code of
Airflow, it is enough to download the latest released
Dockerfile and they can easily build the image.
Overall - this change will vastly optimize build speed for
both PROD and CI images in multiple scenarios.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.