chore: new major release branch #4936

npalm · 2025-12-06T18:37:51Z

No description provided.

github-actions · 2025-12-06T18:38:08Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

…ingle invocation (#4603) Currently we restrict the `scale-up` Lambda to only handle a single event at a time. In very busy environments this can prove to be a bottleneck: there are calls to GitHub and AWS APIs that happen each time, and they can end up taking long enough that we can't process job queued events faster than they arrive. In our environment we are also using a pool, and typically we have responded to the alerts generated by this (SQS queue length growing) by expanding the size of the pool. This helps because we will more frequently find that we don't need to scale up, which allows the lambdas to exit a bit earlier, so we can get through the queue faster. But it makes the environment much less responsive to changes in usage patterns. At its core, this Lambda's task is to construct an EC2 `CreateFleet` call to create instances, after working out how many are needed. This is a job that can be batched. We can take any number of events, calculate the diff between our current state and the number of jobs we have, capping at the maximum, and then issue a single call. The thing to be careful about is how to handle partial failures, if EC2 creates some of the instances we wanted but not all of them. Lambda has a configurable function response type which can be set to `ReportBatchItemFailures`. In this mode, we return a list of failed messages from our handler and those are retried. We can make use of this to give back as many events as we failed to process. Now we're potentially processing multiple events in a single Lambda, one thing we should optimise for is not recreating GitHub API clients. We need one client for the app itself, which we use to find out installation IDs, and then one client for each installation which is relevant to the batch of events we are processing. This is done by creating a new client the first time we see an event for a given installation. We also remove the same `batch_size = 1` constraint from the `job-retry` Lambda. This Lambda is used to retry events that previously failed. However, instead of reporting failures to be retried, here we maintain the pre-existing fault-tolerant behaviour where errors are logged but explicitly do not cause message retries, avoiding infinite loops from persistent GitHub API issues or malformed events. Tests are added for all of this. Tests in a private repo (sorry) look good. This was running ephemeral runners with no pool, SSM high throughput enabled, the job queued check \_dis_abled, batch size of 200, wait time of 10 seconds. The workflow runs are each a matrix with 250 jobs. ![image](https://github.com/user-attachments/assets/0a656e99-8f1e-45e2-924b-0d5c1b6d6afb) --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Niek Palm <npalm@users.noreply.github.com> Co-authored-by: Niek Palm <niek.palm@philips.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

npalm requested review from a team as code owners December 6, 2025 18:37

npalm force-pushed the next branch from 7149b77 to 19a94eb Compare December 6, 2025 18:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: new major release branch #4936

chore: new major release branch #4936

Uh oh!

npalm commented Dec 6, 2025

Uh oh!

github-actions bot commented Dec 6, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

chore: new major release branch #4936

Are you sure you want to change the base?

chore: new major release branch #4936

Uh oh!

Conversation

npalm commented Dec 6, 2025

Uh oh!

github-actions bot commented Dec 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Scanned Files

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Dec 6, 2025 •

edited

Loading