feat: Workflow job based ephemeral runner scaling #721
…l in acceptance test
…ers cant be used as part of workflow_job based scaling
…f the autoscaling section
Hi @mumoshu, this looks fantastic! Thank you so much for this hard work! Do I understand that:
Yes
Mostly yes. To be clear, actions-runner-controller already has webhook-based autoscale, and you can use it today to avoid the API rate limit issue.
Yes, this is great. I already use the webhook autoscaler, it's great for scaling fast but it still produces a lot of calls to check whether the jobs are active or idle etc.
@sledigabel Just to be extra sure: does your HRA spec contain either
Perhaps you meant "runners are active or idle"? Then it's very likely you have In that case, you can already omit it in favor of webhook-based autoscale, as long as your
@mumoshu It runs both at the moment, but I think all the API calls are mostly coming from the checks: BTW on that number of calls I think there would be an opportunity to "save" some calls. I can create an issue separately for that. Thanks again for this!
Also, I love that ephemeral runners are now an upstream feature! :-)
This adds support for two upcoming enhancements on the GitHub side of self-hosted runners: ephemeral runners and `workflow_job` events. You can't use these yet.

These features are not yet generally available to all GitHub users. Please take this pull request as preparation for making them available to actions-runner-controller users as soon as possible after GitHub releases the necessary features on their end.
Ephemeral runners
The former, ephemeral runners, is basically the reliable alternative to `--once`, which we've been using when you enabled `ephemeral: true` (the default in actions-runner-controller). `--once` has been suffering from a race issue (#466). `--ephemeral` fixes that.

To enable ephemeral runners with `actions/runner`, you give `--ephemeral` to `config.sh`. This updated version of `actions-runner-controller` does it for you, by using `--ephemeral` instead of `--once` when you set `RUNNER_FEATURE_FLAG_EPHEMERAL=true`.

Please read the section "Ephemeral Runners" in the updated version of our README for more information.

Note that ephemeral runners are not released on GitHub yet, and `RUNNER_FEATURE_FLAG_EPHEMERAL=true` won't work at all until the feature gets released on GitHub. Stay tuned for an announcement from GitHub!

`workflow_job` events

`workflow_job` is the additional webhook event that corresponds to each GitHub Actions workflow job run. It provides `actions-runner-controller` with a solid foundation to improve our webhook-based autoscale.

Formerly, we've been exploiting webhook events like `check_run` for autoscaling. However, as none of our supported events included `labels`, you had to configure an HRA to only match relevant `check_run` events. It wasn't trivial.

In contrast, a `workflow_job` event payload contains the `labels` of the runners requested. `actions-runner-controller` is able to automatically decide which HRA to scale by filtering the corresponding RunnerDeployment by the `labels` included in the webhook payload. So all you need to do to use webhook-based autoscale is enable `workflow_job` on GitHub and expose actions-runner-controller's webhook server to the internet.

Note that the current implementation of `workflow_job` support works in two ways, increment and decrement. An increment happens when the webhook server receives a `workflow_job` event of `queued` status. A decrement happens when it receives a `workflow_job` event of `completed` status. The latter makes scaling down faster, so that you waste less money than before. You still don't suffer from flapping, as a scale-down is still subject to `scaleDownDelaySecondsAfterScaleOut`.

To enable the `workflow_job` webhook, go to your Webhook settings page on GitHub and check `Workflow jobs`.

Please read the section "Example 3: Scale on each `workflow_job` event" in the updated version of our README for more information on its usage.
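
As a rough illustration of the label matching and the increment/decrement behavior described above, here is a minimal Go sketch. It is not the code in this pull request: `WorkflowJob`, `RunnerGroup`, `matches`, and `applyEvent` are hypothetical names, the rule that a group matches when it offers every requested label is an assumption, and the `scaleDownDelaySecondsAfterScaleOut` delay is left out entirely.

```go
package main

import "fmt"

// WorkflowJob is a hypothetical, trimmed-down view of a workflow_job
// webhook payload: only the status and the requested runner labels.
type WorkflowJob struct {
	Status string   // e.g. "queued" or "completed"
	Labels []string // labels requested by the job
}

// RunnerGroup is a hypothetical stand-in for a RunnerDeployment plus its HRA.
type RunnerGroup struct {
	Name    string
	Labels  []string
	Desired int // desired number of runners
}

// matches reports whether the group offers every label the job requests.
// This subset rule is an assumption made for the sketch.
func matches(g RunnerGroup, job WorkflowJob) bool {
	offered := make(map[string]bool, len(g.Labels))
	for _, l := range g.Labels {
		offered[l] = true
	}
	for _, l := range job.Labels {
		if !offered[l] {
			return false
		}
	}
	return true
}

// applyEvent finds the first matching group, increments its desired count
// on "queued", decrements it on "completed" (never below zero), and
// returns the group's name. It returns "" when no group matches.
func applyEvent(groups []RunnerGroup, job WorkflowJob) string {
	for i := range groups {
		if !matches(groups[i], job) {
			continue
		}
		switch job.Status {
		case "queued":
			groups[i].Desired++
		case "completed":
			if groups[i].Desired > 0 {
				groups[i].Desired--
			}
		}
		return groups[i].Name
	}
	return ""
}

func main() {
	groups := []RunnerGroup{
		{Name: "linux", Labels: []string{"self-hosted", "linux"}},
		{Name: "gpu", Labels: []string{"self-hosted", "linux", "gpu"}},
	}
	job := WorkflowJob{Status: "queued", Labels: []string{"self-hosted", "gpu"}}
	name := applyEvent(groups, job)
	fmt.Println("scaled group:", name, "desired:", groups[1].Desired)
}
```

Because the requested labels travel with the event, no per-HRA event filter is needed in this sketch; the only per-group configuration is the set of labels the group offers.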