Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Mar 23, 2022

This change is one of the biggest optimizations to the Dockerfiles
that from the very beginning was a goal, but it has been enabled
by switching to buildkit and recent relase of support for
the 1.4 dockerfile syntax. This syntax introduced two features:

  • heredocs
  • links for COPY commands

Both changes allows to solve multiple problems:

  • COPY for build scripts suffer from permission problems. Depending
    on umask setting of the host, the scripts could have different
    group permissions and invalidate docker cache. Inlining the
    scripts (automatically by pre-commit) gets rid of the problem
    completely

  • COPY --link allows to optimize and parallelize builds for
    Dockerfile.ci embedded source code. This should speed up
    not only building the images locally but also it will allow
    to use more efficiently cache for the CI builds (in case no
    source code change, the builds will use pre-cached layers from
    the cache more efficiently (and in parallel)

  • The PROD Dockerfile is now completely standalone. You do not
    need to have any folders or files to build Airlfow image. At
    the same time the versatility and support for multiple ways
    on how you can build the image (as described in
    https://airflow.apache.org/docs/docker-stack/build.html is
    maintained (this was a goal from the very beginning of the
    PROD Dockerfile but it was not easily achievable - heredocs
    allow to inline scripts that are used for the build and the
    pre-commits will make sure that there is one source of truth
    and nicely editable scripts for both PROD and CI Dockerfile.

The last point is really cool, because it allows our users to
build custom dockerfiles without checking out the code of
Airflow, it is enough to download the latest released
Dockerfile and they can easily build the image.

Overall - this change will vastly optimize build speed for
both PROD and CI images in multiple scenarios.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@potiuk
Copy link
Member Author

potiuk commented Mar 23, 2022

Hello everyone. This is I think the last "serious" optimization of the way our Dockerfiles are constructed - enabled by "dockerfile:1.4" finally released few weeks ago. I was waiting for it to complete what I wanted to achieve from the very beginnig of my Dockerfiles journey, and I am happy we got there finally :) .

Wth this one a lot of problems we had with caching and speed of rebuildong the images on various machines will be gone and we will still continue having a very versatile and flexible Dockerfile - only that customizing the image will be now WAY easier, as it will only require downloading the single Dockerfile which is self-contained now and way faster to rebuild in many circumstances.

Looking forward to merging it soon - together with #22225 it shoudl vastly improve waiting time of Breeze users and speed up the CI image builds significantly.

@potiuk potiuk force-pushed the inline-docker-scripts branch from 8267747 to eed9972 Compare March 23, 2022 17:55
@potiuk
Copy link
Member Author

potiuk commented Mar 23, 2022

cc: @Bowrna - this one results in a few changes to the part you do on the prod image building #21956 . I've done some of them here, but some more changes are needed to bring the changes also to the python PROD Build too:

  • I removed "fix_group_permissions" part - from both the old Breeze and the new one (so you do not need to do anything for that one). This is a good example of "code is not an asset but liability" - by inlining the scripts I could finally get rid of all the permission problems that various host configuration could cause and rebuilding the image with cache will be much more predictable, but also we could get rid of the "not-very-reliable" code that I added to workaround that problem before (not very succesfully often) :). - inlining the scripts to the image solves it completely

  • There is a need to add few changes (you will see it there)
    --build-arg DOCKERF_CONTEXT_FILES="docker-context-files" needs to be added in "prod build" command
    --cache:max should be added also in the CI image (because it is multi-segment image now)

@potiuk
Copy link
Member Author

potiuk commented Mar 23, 2022

cc: @dstandish @kaxil @jedcunningham @mik-laj - this one also contains something that I promised some time ago - detailed changelog for all 2.* Dockerfiles. I tried to describe in detail what was changed in which version but it might need some clarifications as I have too many assumptions on my Head. Any comments are appreciated.

@potiuk
Copy link
Member Author

potiuk commented Mar 23, 2022

(and I could split some of those changes but not many BTW). They are all very tightly connected I am afraid.

@potiuk potiuk force-pushed the inline-docker-scripts branch 3 times, most recently from 478c7d7 to 6821bf2 Compare March 23, 2022 18:51
@potiuk potiuk force-pushed the inline-docker-scripts branch 3 times, most recently from fe0ae0c to c13301c Compare March 23, 2022 21:59
potiuk added a commit to potiuk/airflow that referenced this pull request Mar 23, 2022
In order to build PROD image with inlined scripts, we need to merge
a change to `main` to pass DOCKER_CONTEXT_FILES arg as parameter.

Related to apache#22492
@potiuk
Copy link
Member Author

potiuk commented Mar 23, 2022

We need to merge #22492 in order to get this PR green. I've added a better error messaging for that case (the build shoudl fail now with a better error message).

Copy link
Member

@jedcunningham jedcunningham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a number of nits I noticed while looking at the docker_context_files arg.

@potiuk potiuk force-pushed the inline-docker-scripts branch from eac66d0 to 058b989 Compare March 24, 2022 07:27
@potiuk
Copy link
Member Author

potiuk commented Mar 24, 2022

All language corrections by @jedcunningham fixed :). Also I think all the "checks" shoudl be fix and the build should succeed this time.

@potiuk
Copy link
Member Author

potiuk commented Mar 24, 2022

Thanks @jedcunningham as usual :) ❤️

@Bowrna
Copy link
Contributor

Bowrna commented Mar 24, 2022

@potiuk Could you explain to me(or point to docs) how inlining the script will remove the permission problem? thank you

@potiuk
Copy link
Member Author

potiuk commented Mar 24, 2022

When scripts are inlined, they are created by the docker engine COPY <<"EOF" does it. This means that they are created with the default permissions of the engine (always the same in all engines).

When the files were copied from the host (originally) permissions were copied from whatever was in the host. And this depended on umask setting in the host (and potentially on git configuration during the checkout). Practically, you could either have g+ or g- depending if your umask was 0002 or 0022 (those are most typical umask settings out there).

Additionally there was another problem on Windows. When you check out code on windows filesystem, you also loose executable bit. This we handled by explicitly adding +x when needed. In case of inlining we would have to so it anyway (for example you can see we do it for 'pip' inlined in Docker file).

@potiuk potiuk force-pushed the inline-docker-scripts branch from 058b989 to 36f72fe Compare March 24, 2022 17:54
@potiuk
Copy link
Member Author

potiuk commented Mar 24, 2022

Now it shoudl be REALLY Green :)

@potiuk
Copy link
Member Author

potiuk commented Mar 24, 2022

Green :)

@potiuk
Copy link
Member Author

potiuk commented Mar 25, 2022

I'd love to merge it. It will make our builds and Breeze Much faster with caching issues :)

This change is one of the biggest optimizations to the Dockerfiles
that from the very beginning was a goal, but it has been enabled
by switching to buildkit and recent relase of support for
the 1.4 dockerfile syntax. This syntax introduced two features:

* heredocs
* links for COPY commands

Both changes allows to solve multiple problems:

* COPY for build scripts suffer from permission problems. Depending
  on umask setting of the host, the scripts could have different
  group permissions and invalidate docker cache. Inlining the
  scripts (automatically by pre-commit) gets rid of the problem
  completely

* COPY --link allows to optimize and parallelize builds for
  Dockerfile.ci embedded source code. This should speed up
  not only building the images locally but also it will allow
  to use more efficiently cache for the CI builds (in case no
  source code change, the builds will use pre-cached layers from
  the cache more efficiently (and in parallel)

* The PROD Dockerfile is now completely standalone. You do not
  need to have any folders or files to build Airlfow image. At
  the same time the versatility and support for multiple ways
  on how you can build the image (as described in
  https://airflow.apache.org/docs/docker-stack/build.html is
  maintained (this was a goal from the very beginning of the
  PROD Dockerfile but it was not easily achievable - heredocs
  allow to inline scripts that are used for the build and the
  pre-commits will make sure that there is one source of truth
  and nicely editable scripts for both PROD and CI Dockerfile.

The last point is really cool, because it allows our users to
build custom dockerfiles without checking out the code of
Airflow, it is enough to download the latest released
Dockerfile and they can easily build the image.

Overall - this change will vastly optimize build speed for
both PROD and CI images in multiple scenarios.
@potiuk potiuk force-pushed the inline-docker-scripts branch from 36f72fe to 5b8bbe4 Compare March 26, 2022 21:28
@potiuk
Copy link
Member Author

potiuk commented Mar 26, 2022

I'd really love that one to be merged. Finishing AIP-26 will be way easier with it :) https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-26+Production-ready+Airflow+Docker+Image

@potiuk
Copy link
Member Author

potiuk commented Mar 27, 2022

I think we can merge this one (Regardless of the answer of legal).

@potiuk
Copy link
Member Author

potiuk commented Mar 27, 2022

I'd love to get that one and #22225 in order to:

a) speed up all PRs by 10 minutes (cache)
b) make main succeed back (now all main builds are timing out on trying to build arm images with emulation.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Mar 27, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit 9cbab95 into apache:main Mar 27, 2022
@potiuk potiuk deleted the inline-docker-scripts branch March 27, 2022 17:19
@Bowrna Bowrna mentioned this pull request Mar 30, 2022
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools area:production-image Production image improvements and fixes changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge kind:documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants