Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented May 21, 2023

We used to have low number of Public runners on GitHub actions and Apache Software Foundation projects were competing for them, that's why we switched build-images workflow to run on self-hosted runners, and we had to add workflow-embedded list of committers to make sure that build-info also uses self-hosted runners.

Recently however the ASF has 900 runners and if anything, it's the public runners that are in abundance, while self-hosted runners are limited, which sometimes leads to "Wait for CI images" jobs timeout while waiting for self-hosted runners to build the image.

Those "wait-for-ci-images" jobs do almost nothing, just waiting, so using self-hosted runners for those is not needed, we can easily switch those to public runners as well.

This PR does the following changes:

  • build-info jobs always run on public runners
  • wait-for-ci-images job are always run on public runners
  • all the other jobs run on either self-hosted or public runners depending who the actor is (committers - run on self-hosted, non-committers, run on public runners)

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk potiuk force-pushed the build-info-images-on-public-runners branch 2 times, most recently from ca2f301 to f3b1620 Compare May 21, 2023 21:42
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally we will be able to get rid of this list in yaml (@ashb - ci-infra instruction fix will follow once we merge it).

Copy link
Member Author

@potiuk potiuk May 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will save quite a lot of "self-hosted" time for committer builds this way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wil save ~ 4 minutes of "feedback time" as this test-pytest-collection is also repeated in each test job.

@potiuk potiuk force-pushed the build-info-images-on-public-runners branch from f3b1620 to 3312c76 Compare May 21, 2023 21:56
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also save quite a lot of "self-hosted" time - building images will now be done on public runners for non-commiter runs.

@potiuk potiuk force-pushed the build-info-images-on-public-runners branch from 3312c76 to e88a41c Compare May 21, 2023 23:33
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committer list is now retrieved dynamically. We do not need to hard-code it at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, we have no permissions in the actions to read the list :(, I needed to re-hardcode it.

We used to have low number of Public runners on GitHub actions and
Apache Software Foundation projects were competing for them, that's
why we switched `build-images` workflow to run on self-hosted
runners, and we had to add workflow-embedded list of committers to
make sure that build-info also uses self-hosted runners.

Recently however the ASF has 900 runners and if anything, it's the
public runners that are in abundance, while self-hosted runners are
limited, which sometimes leads to "Wait for CI images" jobs timeout
while waiting for self-hosted runners to build the image.

Those "wait-for-ci-images" jobs do almost nothing, just waiting, so
using self-hosted runners for those is not needed, we can easily
switch those to public runners as well.

Also "wait-for-ci-images" currently performs "test-pytest-collection"
as an optimisation - preventing multiple "test" jobs from failing in
case of pytest collection fails, however with public runners, it adds
extra 4 minutes to tests, and failing pytest collection is rather rare
case, so we can safely remove this step.

This PR does the following changes:

* build-info jobs always run on public runners
* wait-for-ci-images job are always run on public runners
* wait-for-ci-images does not run test for pytest collection
* all the other jobs run on either self-hosted or public runners
  depending who the actor is (committers - run on self-hosted,
  non-committers, run on public runners)
@potiuk potiuk force-pushed the build-info-images-on-public-runners branch from e88a41c to dc39e3e Compare May 21, 2023 23:45
@potiuk
Copy link
Member Author

potiuk commented May 22, 2023

Let's see. I will merge and see how things will play out :)

Living on the edge ;)

@potiuk potiuk merged commit 0e8bff9 into apache:main May 22, 2023
@potiuk potiuk deleted the build-info-images-on-public-runners branch May 22, 2023 17:41
potiuk added a commit to potiuk/airflow that referenced this pull request May 24, 2023
The apache#31451 changed the way to decide which runners should be used
but it also lost the ability of controlling that via the
"use public runners" label.

This PR brings the capability back.
potiuk added a commit that referenced this pull request May 24, 2023
…31516)

The #31451 changed the way to decide which runners should be used
but it also lost the ability of controlling that via the
"use public runners" label.

This PR brings the capability back.
potiuk added a commit to potiuk/airflow that referenced this pull request Jun 8, 2023
After apache#31451 there were cases where re-running failed builds by
committers without rebasing the PR could have caused that the
runs were scheduled on self-hosted runners, but the runner
rejected the build because it came from non-trusted actor.

This PR brings back the calculation of runs-on based on PR user
login rather than on the actor of the build.
potiuk added a commit that referenced this pull request Jun 8, 2023
After #31451 there were cases where re-running failed builds by
committers without rebasing the PR could have caused that the
runs were scheduled on self-hosted runners, but the runner
rejected the build because it came from non-trusted actor.

This PR brings back the calculation of runs-on based on PR user
login rather than on the actor of the build.
potiuk added a commit to potiuk/airflow that referenced this pull request Jun 28, 2023
Onboardig steps included instructions to old place where the
commiter ids should be updated. They moved from ci.yaml to breeze's
global constants in apache#31451
eladkal pushed a commit that referenced this pull request Jun 28, 2023
Onboardig steps included instructions to old place where the
commiter ids should be updated. They moved from ci.yaml to breeze's
global constants in #31451
potiuk added a commit to apache/airflow-ci-infra that referenced this pull request Jun 28, 2023
It has never been used, it was a failed attempt to upgrade
workflows with list of author automatically - but with
apache/airflow#31451 and the change
in the process that we are asking committers to add themselves
to the list in Breeze, we do not need it any more.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants