Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Mar 28, 2024

Depending on selective checks, but also on the job executed, we choose whether to run job on public runners or self-hosted runners. So far the set of labels to select the runners were passed in a bit inconsistent way. Outputs of selective checks can only be strings and the run-as accepts array of strings (labels) - so we were using fromJSON to convert between the two. And we used runs-on inputs on a number of our workflows to pass the selection.

However this meant that runs-on could be either string or array and that sometimes we passed public/self-hosted labels as strings directly and some of those were hard-coded.

This PR changes it consistently across the board to introduce consistent approach:

  • build info have no selective checks results yet, so for them runs-on is hardcoded
  • similarly for "windows" and release jobs that are manually run without running selective checks
  • selective checks will produce three outuputs - JSON stringiified array of labels:
    • default (one that is selected depending on who runs the build)
    • public (for cases where we want to force the builds to use public runners
    • self-hosted (for cases where we want to force the builds to use self-hosted runners
  • all the outputs are named <type>-runs-on-as-string to make it clear they are all strings
  • all inputs of workflows expectings strings are named the same (with as-string suffix and prefix)
  • whenever a job is run, we pass "runs-on" parameter to be fromJSON with appropriate type we want to use passed as input

This will make it easier to reason on which job is using which type of runner and it will make it easier in the future to make it more flexible when we add ASF self-hosted runners and possibly our own K8S runners, or when we would want to change labels for public runners or self-hosted runners.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk potiuk force-pushed the consistent-approach-for-runs-on-usage branch 7 times, most recently from 90218a9 to 44befd2 Compare March 29, 2024 00:36
@potiuk potiuk force-pushed the consistent-approach-for-runs-on-usage branch from 44befd2 to 472a2e7 Compare March 29, 2024 13:25
@potiuk
Copy link
Member Author

potiuk commented Mar 29, 2024

This one also looks good-to-go with much easier way to manage what runners are used (and I am going to have some follow-ups once this is better organized).

@potiuk potiuk requested a review from jscheffl March 29, 2024 13:27
@potiuk potiuk force-pushed the consistent-approach-for-runs-on-usage branch from 472a2e7 to a0b3924 Compare March 29, 2024 14:37
@potiuk potiuk added the full tests needed We need to run full set of tests for this PR to merge label Mar 29, 2024
@potiuk potiuk force-pushed the consistent-approach-for-runs-on-usage branch 2 times, most recently from 6ad6c1b to f5e2602 Compare March 29, 2024 16:43
@potiuk
Copy link
Member Author

potiuk commented Mar 29, 2024

Green :)

@potiuk potiuk added the canary When set on PR running from apache repo - behave as canary run label Mar 29, 2024
@potiuk potiuk force-pushed the consistent-approach-for-runs-on-usage branch from b609874 to 6871245 Compare March 31, 2024 13:15
@potiuk
Copy link
Member Author

potiuk commented Mar 31, 2024

@hussein-awala :

I don't think the as-string suffix makes it clearer here, personally, I prefer to keep the name pattern -runs-on or -runs-on-as-json as we can use the string without any issue in most of the cases, but here we want to load an array from the string.

Yep. good point. I also reverted the sequenece and we have now:

      runs-on-as-json-default: ${{ steps.selective-checks.outputs.runs-on-as-json-default }}
      runs-on-as-json-self-hosted: ${{ steps.selective-checks.outputs.runs-on-as-json-self-hosted }}
      runs-on-as-json-public: ${{ steps.selective-checks.outputs.runs-on-as-json-public }}

@potiuk potiuk force-pushed the consistent-approach-for-runs-on-usage branch 13 times, most recently from f9987ef to e036e8c Compare March 31, 2024 14:46
@potiuk
Copy link
Member Author

potiuk commented Mar 31, 2024

I think I figured it .... 🤯 🤯 🤯 🤯 🤯 🤯 🤯 🤯 🤯

    # NOTE!!!!! This has to be put in one line for runs-on to recognize the "fromJSON" properly !!!!
    # adding space before (with >) apparently turns the `runs-on` processed line into a string "Array"
    # instead of an array of strings.
    # yamllint disable-line rule:line-length
    runs-on: ${{ (inputs.platform == 'linux/amd64') && fromJSON(inputs.runs-on-as-json-public) || fromJSON(inputs.runs-on-as-json-self-hosted) }}

@potiuk potiuk force-pushed the consistent-approach-for-runs-on-usage branch from e036e8c to 8a55f69 Compare March 31, 2024 14:48
@potiuk
Copy link
Member Author

potiuk commented Mar 31, 2024

Also I standardized fromJSON everywhere - in an attempt to see what is causing the problem if I put run-as in two lines of yaml rather than one 🤯

@potiuk potiuk added this to the Airflow 2.9.0 milestone Apr 1, 2024
Depending on selective checks, but also on the job executed, we
choose whether to run job on public runners or self-hosted runners.
So far the set of labels to select the runners were passed in a bit
inconsistent way. Outputs of selective checks can only be strings
and the `run-as` accepts array of strings (labels) - so we were
using fromJSON to convert between the two. And we used runs-on
inputs on a number of our workflows to pass the selection.

However this meant that runs-on could be either string or array and
that sometimes we passed public/self-hosted labels as strings
directly and some of those were hard-coded.

This PR changes it consistently across the board to introduce
consistent approach:

* build info have no selective checks results yet, so for them
  runs-on is hardcoded
* similarly for "windows" and release jobs that are manually run
  without running selective checks
* selective checks will produce three outuputs - JSON stringiified
  array of labels:
   * default (one that is selected depending on who runs the build)
   * public (for cases where we want to force the builds to use public
     runners
   * self-hosted (for cases where we want to force the builds to use
     self-hosted runners
* all the outputs are named `<type>-runs-on-as-string` to make it clear
  they are all strings
* all inputs of workflows expectings strings are named the same (with
  as-string suffix and <type> prefix)
* whenever a job is run, we pass "runs-on" parameter to be `fromJSON`
  with appropriate type we want to use passed as input

This will make it easier to reason on which job is using which type
of runner and it will make it easier in the future to make it more
flexible when we add ASF self-hosted runners and possibly our own
K8S runners, or when we would want to change labels for public runners
or self-hosted runners.
@potiuk potiuk force-pushed the consistent-approach-for-runs-on-usage branch from 8a55f69 to 1a9c8b7 Compare April 1, 2024 08:21
@potiuk potiuk merged commit 9da08a5 into apache:main Apr 1, 2024
@potiuk potiuk deleted the consistent-approach-for-runs-on-usage branch April 1, 2024 09:29
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 1, 2024
ephraimbuddy pushed a commit that referenced this pull request Apr 2, 2024
)

Depending on selective checks, but also on the job executed, we
choose whether to run job on public runners or self-hosted runners.
So far the set of labels to select the runners were passed in a bit
inconsistent way. Outputs of selective checks can only be strings
and the `run-as` accepts array of strings (labels) - so we were
using fromJSON to convert between the two. And we used runs-on
inputs on a number of our workflows to pass the selection.

However this meant that runs-on could be either string or array and
that sometimes we passed public/self-hosted labels as strings
directly and some of those were hard-coded.

This PR changes it consistently across the board to introduce
consistent approach:

* build info have no selective checks results yet, so for them
  runs-on is hardcoded
* similarly for "windows" and release jobs that are manually run
  without running selective checks
* selective checks will produce three outuputs - JSON stringiified
  array of labels:
   * default (one that is selected depending on who runs the build)
   * public (for cases where we want to force the builds to use public
     runners
   * self-hosted (for cases where we want to force the builds to use
     self-hosted runners
* all the outputs are named `<type>-runs-on-as-string` to make it clear
  they are all strings
* all inputs of workflows expectings strings are named the same (with
  as-string suffix and <type> prefix)
* whenever a job is run, we pass "runs-on" parameter to be `fromJSON`
  with appropriate type we want to use passed as input

This will make it easier to reason on which job is using which type
of runner and it will make it easier in the future to make it more
flexible when we add ASF self-hosted runners and possibly our own
K8S runners, or when we would want to change labels for public runners
or self-hosted runners.

(cherry picked from commit 9da08a5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants