Use explicit and easier to use runs-on approach for CI workflows #38601

potiuk · 2024-03-28T23:14:48Z

Depending on selective checks, but also on the job executed, we choose whether to run job on public runners or self-hosted runners. So far the set of labels to select the runners were passed in a bit inconsistent way. Outputs of selective checks can only be strings and the run-as accepts array of strings (labels) - so we were using fromJSON to convert between the two. And we used runs-on inputs on a number of our workflows to pass the selection.

However this meant that runs-on could be either string or array and that sometimes we passed public/self-hosted labels as strings directly and some of those were hard-coded.

This PR changes it consistently across the board to introduce consistent approach:

build info have no selective checks results yet, so for them runs-on is hardcoded
similarly for "windows" and release jobs that are manually run without running selective checks
selective checks will produce three outuputs - JSON stringiified array of labels:
- default (one that is selected depending on who runs the build)
- public (for cases where we want to force the builds to use public runners
- self-hosted (for cases where we want to force the builds to use self-hosted runners
all the outputs are named <type>-runs-on-as-string to make it clear they are all strings
all inputs of workflows expectings strings are named the same (with as-string suffix and prefix)
whenever a job is run, we pass "runs-on" parameter to be fromJSON with appropriate type we want to use passed as input

This will make it easier to reason on which job is using which type of runner and it will make it easier in the future to make it more flexible when we add ASF self-hosted runners and possibly our own K8S runners, or when we would want to change labels for public runners or self-hosted runners.

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

potiuk · 2024-03-29T13:27:15Z

This one also looks good-to-go with much easier way to manage what runners are used (and I am going to have some follow-ups once this is better organized).

potiuk · 2024-03-29T23:06:49Z

Green :)

potiuk · 2024-03-31T13:17:28Z

@hussein-awala :

I don't think the as-string suffix makes it clearer here, personally, I prefer to keep the name pattern -runs-on or -runs-on-as-json as we can use the string without any issue in most of the cases, but here we want to load an array from the string.

Yep. good point. I also reverted the sequenece and we have now:

      runs-on-as-json-default: ${{ steps.selective-checks.outputs.runs-on-as-json-default }}
      runs-on-as-json-self-hosted: ${{ steps.selective-checks.outputs.runs-on-as-json-self-hosted }}
      runs-on-as-json-public: ${{ steps.selective-checks.outputs.runs-on-as-json-public }}

potiuk · 2024-03-31T14:47:30Z

I think I figured it .... 🤯 🤯 🤯 🤯 🤯 🤯 🤯 🤯 🤯

    # NOTE!!!!! This has to be put in one line for runs-on to recognize the "fromJSON" properly !!!!
    # adding space before (with >) apparently turns the `runs-on` processed line into a string "Array"
    # instead of an array of strings.
    # yamllint disable-line rule:line-length
    runs-on: ${{ (inputs.platform == 'linux/amd64') && fromJSON(inputs.runs-on-as-json-public) || fromJSON(inputs.runs-on-as-json-self-hosted) }}

potiuk · 2024-03-31T14:57:04Z

Also I standardized fromJSON everywhere - in an attempt to see what is causing the problem if I put run-as in two lines of yaml rather than one 🤯

Depending on selective checks, but also on the job executed, we choose whether to run job on public runners or self-hosted runners. So far the set of labels to select the runners were passed in a bit inconsistent way. Outputs of selective checks can only be strings and the `run-as` accepts array of strings (labels) - so we were using fromJSON to convert between the two. And we used runs-on inputs on a number of our workflows to pass the selection. However this meant that runs-on could be either string or array and that sometimes we passed public/self-hosted labels as strings directly and some of those were hard-coded. This PR changes it consistently across the board to introduce consistent approach: * build info have no selective checks results yet, so for them runs-on is hardcoded * similarly for "windows" and release jobs that are manually run without running selective checks * selective checks will produce three outuputs - JSON stringiified array of labels: * default (one that is selected depending on who runs the build) * public (for cases where we want to force the builds to use public runners * self-hosted (for cases where we want to force the builds to use self-hosted runners * all the outputs are named `<type>-runs-on-as-string` to make it clear they are all strings * all inputs of workflows expectings strings are named the same (with as-string suffix and <type> prefix) * whenever a job is run, we pass "runs-on" parameter to be `fromJSON` with appropriate type we want to use passed as input This will make it easier to reason on which job is using which type of runner and it will make it easier in the future to make it more flexible when we add ASF self-hosted runners and possibly our own K8S runners, or when we would want to change labels for public runners or self-hosted runners.

) Depending on selective checks, but also on the job executed, we choose whether to run job on public runners or self-hosted runners. So far the set of labels to select the runners were passed in a bit inconsistent way. Outputs of selective checks can only be strings and the `run-as` accepts array of strings (labels) - so we were using fromJSON to convert between the two. And we used runs-on inputs on a number of our workflows to pass the selection. However this meant that runs-on could be either string or array and that sometimes we passed public/self-hosted labels as strings directly and some of those were hard-coded. This PR changes it consistently across the board to introduce consistent approach: * build info have no selective checks results yet, so for them runs-on is hardcoded * similarly for "windows" and release jobs that are manually run without running selective checks * selective checks will produce three outuputs - JSON stringiified array of labels: * default (one that is selected depending on who runs the build) * public (for cases where we want to force the builds to use public runners * self-hosted (for cases where we want to force the builds to use self-hosted runners * all the outputs are named `<type>-runs-on-as-string` to make it clear they are all strings * all inputs of workflows expectings strings are named the same (with as-string suffix and <type> prefix) * whenever a job is run, we pass "runs-on" parameter to be `fromJSON` with appropriate type we want to use passed as input This will make it easier to reason on which job is using which type of runner and it will make it easier in the future to make it more flexible when we add ASF self-hosted runners and possibly our own K8S runners, or when we would want to change labels for public runners or self-hosted runners. (cherry picked from commit 9da08a5)

potiuk requested review from ashb, jedcunningham and kaxil as code owners March 28, 2024 23:14

boring-cyborg bot added the area:dev-tools label Mar 28, 2024

potiuk requested review from Taragolis, aritra24, eladkal, ephraimbuddy, hussein-awala and pankajkoti and removed request for ashb, jedcunningham and kaxil March 28, 2024 23:14

potiuk force-pushed the consistent-approach-for-runs-on-usage branch 7 times, most recently from 90218a9 to 44befd2 Compare March 29, 2024 00:36

potiuk mentioned this pull request Mar 29, 2024

Use explicit and easier to use runs-on approach for CI workflows #38605

Closed

potiuk force-pushed the consistent-approach-for-runs-on-usage branch from 44befd2 to 472a2e7 Compare March 29, 2024 13:25

potiuk requested a review from jscheffl March 29, 2024 13:27

potiuk force-pushed the consistent-approach-for-runs-on-usage branch from 472a2e7 to a0b3924 Compare March 29, 2024 14:37

potiuk added the full tests needed We need to run full set of tests for this PR to merge label Mar 29, 2024

potiuk force-pushed the consistent-approach-for-runs-on-usage branch 2 times, most recently from 6ad6c1b to f5e2602 Compare March 29, 2024 16:43

potiuk added the canary When set on PR running from apache repo - behave as canary run label Mar 29, 2024

potiuk force-pushed the consistent-approach-for-runs-on-usage branch from b609874 to 6871245 Compare March 31, 2024 13:15

potiuk force-pushed the consistent-approach-for-runs-on-usage branch 13 times, most recently from f9987ef to e036e8c Compare March 31, 2024 14:46

potiuk force-pushed the consistent-approach-for-runs-on-usage branch from e036e8c to 8a55f69 Compare March 31, 2024 14:48

potiuk added this to the Airflow 2.9.0 milestone Apr 1, 2024

potiuk force-pushed the consistent-approach-for-runs-on-usage branch from 8a55f69 to 1a9c8b7 Compare April 1, 2024 08:21

potiuk merged commit 9da08a5 into apache:main Apr 1, 2024

potiuk deleted the consistent-approach-for-runs-on-usage branch April 1, 2024 09:29

ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use explicit and easier to use runs-on approach for CI workflows #38601

Use explicit and easier to use runs-on approach for CI workflows #38601

Uh oh!

potiuk commented Mar 28, 2024

Uh oh!

potiuk commented Mar 29, 2024

Uh oh!

potiuk commented Mar 29, 2024

Uh oh!

potiuk commented Mar 31, 2024

Uh oh!

potiuk commented Mar 31, 2024

Uh oh!

potiuk commented Mar 31, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Use explicit and easier to use runs-on approach for CI workflows #38601

Use explicit and easier to use runs-on approach for CI workflows #38601

Uh oh!

Conversation

potiuk commented Mar 28, 2024

Uh oh!

potiuk commented Mar 29, 2024

Uh oh!

potiuk commented Mar 29, 2024

Uh oh!

potiuk commented Mar 31, 2024

Uh oh!

potiuk commented Mar 31, 2024

Uh oh!

potiuk commented Mar 31, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants