@potiuk potiuk commented Jun 21, 2022

In the new Breeze, switching to using parallelism is a ... breeze.

This PR adds the capability of building the images in parallel in Breeze
locally (for the breeze command), and also uses this capability to build the
images in parallel in our CI. Our builds are always executed on
powerful, big machines with lots of CPU, and docker runs on an in-memory
filesystem with 32GB RAM, so it should be possible to run all builds in
parallel on a single machine rather than spin up parallel machines to
run the builds using the matrix strategy of GitHub Actions.

Generally speaking - this will either speed up the build steps or give a
4x cost saving on them, for all the "full tests needed" PRs as well as all
the main builds.

There are a number of savings and improvements we can achieve this way:

  1. less overhead for starting and running the machines
  2. it seems that with the new buildkit, the parallel builds are not
    suffering from some sequential locks (as they used to), so
    we basically do the same job using 25% of the resources for building
    the images.
  3. we will stop having random "one image failed to build" cases - they
    will all either fail or succeed.
  4. fewer checks in the output
  5. production builds will additionally gain from a single CI image
    being pulled in order to perform the preparation of the packages,
    and from a single package preparation step - it will save 4-5 minutes
    per image.

The disadvantage is less clear output from such a parallel build, as
outputs from multiple builds will be interleaved in one CI output.
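
To make the idea concrete, here is a minimal sketch of running several builds concurrently on one machine. It is not the Breeze implementation: `run_build`, `build_in_parallel` and the image names are hypothetical, and the "build" commands are stand-ins that only print a line. Unlike the behaviour described above, this sketch captures each build's output and prints it with a name prefix once that build finishes, which is one possible way to avoid interleaving.

```python
from __future__ import annotations

import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_build(name: str, command: list[str]) -> tuple[str, int, str]:
    """Run one build command and capture its output instead of streaming it."""
    result = subprocess.run(command, capture_output=True, text=True)
    return name, result.returncode, result.stdout + result.stderr


def build_in_parallel(commands: dict[str, list[str]]) -> dict[str, int]:
    """Run all commands concurrently on one machine; return the exit code per build."""
    return_codes: dict[str, int] = {}
    # One worker per build: the builds are external processes, so threads
    # are only used to wait for them.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as executor:
        futures = [
            executor.submit(run_build, name, command)
            for name, command in commands.items()
        ]
        for future in futures:
            name, returncode, output = future.result()
            # Prefix every line so the origin of each message stays clear.
            for line in output.splitlines():
                print(f"[{name}] {line}")
            return_codes[name] = returncode
    return return_codes


if __name__ == "__main__":
    # Hypothetical stand-ins for four image builds (assumption, not real Breeze commands).
    commands = {
        f"image-{index}": [sys.executable, "-c", f"print('building image {index}')"]
        for index in range(1, 5)
    }
    codes = build_in_parallel(commands)
    print("all succeeded" if all(code == 0 for code in codes.values()) else "some builds failed")
```

The caller decides what to do with the per-build exit codes; treating any non-zero code as a failure of the whole step matches the "all either fail or succeed" behaviour from point 3.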



@potiuk potiuk requested review from eladkal and ephraimbuddy June 21, 2022 10:30
@potiuk potiuk force-pushed the switch-to-image-building-in-parallel branch 7 times, most recently from 4b87729 to 319bada Compare June 21, 2022 12:23
@potiuk potiuk requested a review from uranusjr June 21, 2022 12:23

potiuk commented Jun 21, 2022

This one will save quite a lot of elapsed and build time on building our images in CI - especially for main builds.

@potiuk potiuk changed the title Switch to building images in parallell Switch to building images in parallel Jun 21, 2022
@potiuk potiuk force-pushed the switch-to-image-building-in-parallel branch 2 times, most recently from 2897f0f to 07b6fe0 Compare June 21, 2022 12:39
@potiuk potiuk added the full tests needed We need to run full set of tests for this PR to merge label Jun 21, 2022
@potiuk potiuk force-pushed the switch-to-image-building-in-parallel branch 2 times, most recently from d890b9a to cc33e91 Compare June 21, 2022 13:28
@potiuk potiuk requested a review from josh-fell June 21, 2022 14:14

potiuk commented Jun 21, 2022

All right, got some numbers:

For builds with "full tests" (main builds):

  • CI builds: instead of 4 machines running for 1:23 each, we have 1 machine running for 2:40 -> we save 2:40 of build time
  • PROD builds: instead of 4 x 8:32, we have 1 machine running for 9:47 - a whopping saving of 25 minutes (!) of build time
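
The arithmetic behind these figures can be checked with a small sketch. It assumes the times above are in minutes:seconds and that "build time" means total machine time summed over all machines. The helper names are hypothetical. The computed values come out close to, but not exactly equal to, the rounded numbers quoted in the comment.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def format_mm_ss(total_seconds: int) -> str:
    """Format a non-negative number of seconds as 'minutes:seconds'."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


def machine_time_saved(machines: int, per_machine: str, single_machine: str) -> str:
    """Machine time of N separate builds minus the time of one parallel build."""
    saved = machines * to_seconds(per_machine) - to_seconds(single_machine)
    return format_mm_ss(saved)


print(machine_time_saved(4, "1:23", "2:40"))  # CI builds   -> 2:52
print(machine_time_saved(4, "8:32", "9:47"))  # PROD builds -> 24:21
```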

@potiuk potiuk force-pushed the switch-to-image-building-in-parallel branch from cc33e91 to 8b847ed Compare June 21, 2022 14:28

potiuk commented Jun 21, 2022

occasional failures only

@potiuk potiuk merged commit 893d935 into main Jun 21, 2022
@ashb ashb deleted the switch-to-image-building-in-parallel branch June 22, 2022 09:47
potiuk added a commit to potiuk/airflow that referenced this pull request Jun 29, 2022
(cherry picked from commit 893d935)
@ephraimbuddy ephraimbuddy added this to the Airflow 2.3.3 milestone Jun 30, 2022
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jul 1, 2022
Labels

area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge
