@potiuk potiuk commented Mar 14, 2022

This PR starts an ARM EC2 instance and forwards a socket via SSH
so that it can be used by docker buildx build with the airflow_cache
multi-platform builders. The instance is then appended to the regular
builder, so the builds take up to 10 minutes rather than
1.5 hours.
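
As a rough illustration of the wiring described above (a sketch under assumptions, not the scripts from this PR), the Python below only assembles the command lines one might run. The local port 12357, the `ec2-user` login, the `tcp://localhost:...` endpoint form and the helper names are hypothetical; `airflow_cache` is the builder name mentioned above.

```python
from typing import List

BUILDER = "airflow_cache"  # builder name taken from the PR description


def ssh_forward_cmd(host: str, user: str = "ec2-user", local_port: int = 12357) -> List[str]:
    """Build an ssh command that forwards a local TCP port to the remote Docker socket.

    The user name and port are assumptions made for this sketch.
    """
    return [
        "ssh",
        "-N",  # no remote command, forwarding only
        "-L", f"{local_port}:/var/run/docker.sock",
        f"{user}@{host}",
    ]


def buildx_cmds(local_port: int = 12357) -> List[List[str]]:
    """Create the builder with the local node, then append the forwarded ARM engine."""
    return [
        ["docker", "buildx", "create", "--name", BUILDER],
        # Assumed endpoint form: the forwarded port exposed as a TCP Docker host.
        ["docker", "buildx", "create", "--name", BUILDER,
         "--append", f"tcp://localhost:{local_port}"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; the host name is a placeholder.
    for cmd in [ssh_forward_cmd("arm-instance.example.internal"), *buildx_cmds()]:
        print(" ".join(cmd))
```

With both nodes in one builder, a multi-platform `docker buildx build` can hand the ARM part of the build to the native ARM engine instead of emulating it.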

The instance is created only when the "main" build reaches the cache
build job/step.

The instances run for a maximum of 50 minutes and then self-terminate just
in case. The instance is also killed immediately when the job finishes
(whether successfully or not).
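
The two safety nets described above could look roughly like this. It is a minimal sketch under assumptions, not the code from this PR: the helper names, the use of a boot script with `shutdown`, and the injected `run` callable are all hypothetical. Only the 50-minute limit comes from the description.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List

MAX_LIFETIME_MINUTES = 50  # limit stated in the PR description


def user_data_script(minutes: int = MAX_LIFETIME_MINUTES) -> str:
    """Boot script that schedules a shutdown, so a forgotten instance stops by itself."""
    return f"#!/bin/bash\nshutdown -h +{minutes}\n"


def run_instances_cmd(image_id: str, instance_type: str, user_data_path: str) -> List[str]:
    """AWS CLI call that starts the instance (image and type are placeholders)."""
    return [
        "aws", "ec2", "run-instances",
        "--image-id", image_id,
        "--instance-type", instance_type,
        # Make an OS-level shutdown terminate the instance rather than just stop it.
        "--instance-initiated-shutdown-behavior", "terminate",
        "--user-data", f"file://{user_data_path}",
    ]


@contextmanager
def arm_instance(instance_id: str, run: Callable[[List[str]], None]) -> Iterator[str]:
    """Terminate the instance when the block exits, whether it succeeded or raised."""
    try:
        yield instance_id
    finally:
        run(["aws", "ec2", "terminate-instances", "--instance-ids", instance_id])
```

Passing `run` in as a parameter keeps the sketch testable without AWS credentials; in a real job it could be something like `subprocess.check_call`.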

Part of #15635



@potiuk potiuk force-pushed the optimize-multiplatform-builds branch 18 times, most recently from 43474e4 to f334abe Compare March 21, 2022 12:26
@potiuk potiuk marked this pull request as ready for review March 21, 2022 12:26
@potiuk potiuk requested review from ashb, kaxil and mik-laj as code owners March 21, 2022 12:26
@potiuk potiuk changed the title from "Optimize multiplatform builds" to "Optimize Multiplatform cache builds" Mar 21, 2022
potiuk commented Mar 21, 2022

This is an optimization (10x speedup) of building our multiplatform images on CI as cache for other builds. So far I have run it manually on my MacBook because of the time it took to build with emulation, but with this change the maximum total time of building the multiplatform images (both CI and PROD) on our CI is down to ~25 minutes from > 3 hours.

A completed, successful run (I limited it and removed the regular tests there) can be found here: https://github.com/apache/airflow/runs/5626562720?check_suite_focus=true

With this approach we can even start thinking about running selected tests on ARM in "main" without actually spinning up an ARM runner (that would be a nice step to follow, but it is not implemented yet). GitHub Actions also seem close to supporting ARM public runners, so we can eventually get it running for both self-hosted and public runners.

I added several protections against spinning up too many instances (including self-termination after 50 minutes), but once we merge it I will keep watching to make sure we are not running too many of those.

potiuk commented Mar 21, 2022

Hey @ashb -> I would love to merge this one, as main is currently failing because I had to disable qemu support for our runners (to make sure that the ARM engine will be used to build the images).

@potiuk potiuk force-pushed the optimize-multiplatform-builds branch from f334abe to 59b6e77 Compare March 22, 2022 10:44
@potiuk potiuk closed this Mar 22, 2022
@potiuk potiuk reopened this Mar 22, 2022
@potiuk potiuk force-pushed the optimize-multiplatform-builds branch from 59b6e77 to b198559 Compare March 22, 2022 15:40
potiuk commented Mar 22, 2022

Anyone? Cache refresh should be enabled soon.

potiuk commented Mar 23, 2022

:D ?

potiuk commented Mar 25, 2022

When we merge the two (this one and #22492), it will finally make our caching complete for CI and multi-platform dev builds :)

This PR starts an ARM EC2 instance and forwards a socket via SSH
so that it can be used by docker buildx build with the airflow_cache
multi-platform builders.

The instance is created only when the "main" build reaches the cache
build job/step. The instances run for a maximum of 50 minutes and
then self-terminate; the instance is also killed when the job
either succeeds or fails.
@potiuk potiuk force-pushed the optimize-multiplatform-builds branch from b198559 to 7d548aa Compare March 26, 2022 21:51
@github-actions github-actions bot added the "full tests needed" label (We need to run full set of tests for this PR to merge) Mar 27, 2022
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit f612a2f into main Mar 27, 2022
potiuk added a commit to potiuk/airflow that referenced this pull request Mar 28, 2022
The ARM instance should be started before cache build 🤦
The apache#22258 introduced optimized cache builds but the sequence
of steps was wrong :(
potiuk added a commit that referenced this pull request Mar 28, 2022
The ARM instance should be started before cache build 🤦
The #22258 introduced optimized cache builds but the sequence
of steps was wrong :(
@potiuk potiuk deleted the optimize-multiplatform-builds branch April 29, 2022 16:14
Labels

area:dev-tools, full tests needed (We need to run full set of tests for this PR to merge)
