ci(regression): build test Docker image once, share across shards #427
Merged
jrusso1020 merged 2 commits into main, Apr 22, 2026
Splits `regression.yml` into a `build-image` job + the existing `regression-shards` matrix. The build job produces a Docker tarball via `docker/build-push-action` with `outputs: type=docker,dest=...`, uploads it as a GHA artifact (retention 1 day, gzip level 1), and each shard downloads + `docker load`s it instead of rebuilding.

Measured on PR #419 regression runs before the change:
- Docker build step: ~234s per shard WITH GHA layer cache hit
- 11 shards × ~234s = ~43 min of runner time per PR just on redundant image builds

Cold-cache cases are much worse, and one is happening right now on PR #419 after release commit b6f50ce bumped every `packages/*/package.json`, invalidating the COPY layer that feeds `bun install --frozen-lockfile`. All 10 shards are currently 25-30+ min into a parallel rebuild, thundering-herding the same npm packages from 10 runners.

After this change:
- 1× build (~4 min warm, ~15 min cold) + 11× (download + `docker load`)
- Expected ~15-20s overhead per shard for artifact download + load
- Net savings: ~30-40 min of runner time per PR run on warm cache, substantially more on cold cache

The build job doesn't check out LFS: `Dockerfile.test` only COPYs source + package manifests, never the golden baselines, so the image build never needed LFS. Shards still need LFS for the `tests/**/output/output.mp4` baselines they validate against.
miguel-heygen approved these changes on Apr 22, 2026
Addresses CodeQL warning 'Workflow does not contain permissions'. Defaults the workflow GITHUB_TOKEN to `contents: read` only. The build-image job elevates to `actions: write` because `docker/build-push-action` with `cache-from/to: type=gha` uses the GitHub Actions cache API, which needs read+write on the actions scope.
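The permission layout described above could look roughly like the following. This is a sketch, not the actual workflow: the runner label and the `actions/checkout` version are assumptions, and all other steps are omitted.

```yaml
# Workflow-level default: GITHUB_TOKEN may only read repository contents.
permissions:
  contents: read

jobs:
  build-image:
    runs-on: ubuntu-latest        # assumption: runner label is not stated in the PR
    # A job-level permissions block replaces the workflow default instead of
    # merging with it, so `contents: read` is restated next to the elevated scope.
    permissions:
      contents: read
      actions: write              # needed for cache-from/cache-to: type=gha
    steps:
      - uses: actions/checkout@v4 # assumption: version
```

Jobs that declare no `permissions` block of their own, such as the shards, inherit the read-only workflow default.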
What
Splits `regression.yml` into two jobs:
- `build-image` (new): builds `Dockerfile.test` once, exports the image to a tarball via `docker/build-push-action@v6` with `outputs: type=docker,dest=...`, and uploads it as a GHA artifact.
- `regression-shards` (existing matrix of 11): downloads the artifact and runs `docker load -i <tar>` instead of rebuilding per shard.

GHA layer cache (`type=gha,scope=regression-test-image`) is preserved on the build job for warm-cache reuse across PRs.

Why
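Measured on PR #419's earlier regression runs:
- Docker build step: ~234s per shard, even with a GHA layer cache hit
- 11 shards × ~234s = ~43 min of runner time per PR spent on redundant image builds

Cold-cache cases are much worse, and one is happening right now on PR #419 as of this writing: release commit `b6f50ce` bumped every `packages/*/package.json`, which invalidates the Docker layer that feeds `bun install --frozen-lockfile`. All 10 shards are currently 25-30+ minutes into a parallel rebuild, thundering-herding npm from 10 runners simultaneously. `fast` finally finished its build at the 26-minute mark; 8 shards are still going.

After this PR:
- 1× build (~4 min warm, ~15 min cold) + 11× (download + `docker load`)
- Expected ~15-20s overhead per shard for artifact download + load
- … `needs.changes.outputs.code == 'true'`)

On cold cache, this is a ~15× runner-time reduction.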
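As a rough check on these figures, the runner-time totals can be worked out by hand. The inputs are assumptions taken from the ranges quoted in this PR: 11 shards, 240 s for a warm build, 15 min for a cold build, the 25 min lower bound for a cold per-shard build, and the 20 s upper bound for download + load.

```latex
\begin{align*}
\text{warm, before} &\approx 11 \times 234\,\text{s} = 2574\,\text{s} \approx 43\,\text{min} \\
\text{warm, after}  &\approx 240\,\text{s} + 11 \times 20\,\text{s} = 460\,\text{s} \approx 7.7\,\text{min} \\
\text{cold, before} &\approx 11 \times 25\,\text{min} = 275\,\text{min} \\
\text{cold, after}  &\approx 15\,\text{min} + 11 \times 20\,\text{s} \approx 18.7\,\text{min} \\
\text{cold ratio}   &\approx 275 / 18.7 \approx 14.7
\end{align*}
```

Under these assumptions the warm-cache saving is about 35 min per run, and the cold-cache ratio lands close to the quoted ~15×.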
How
Build job (new)
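A minimal sketch of what such a build job might look like, assuming `Dockerfile.test` sits at the repository root. The image tag, tarball path, artifact name, runner label, action versions other than `docker/build-push-action@v6`, and `mode=max` are hypothetical and not taken from the PR.

```yaml
jobs:
  build-image:
    runs-on: ubuntu-latest                      # assumption
    steps:
      # No `lfs: true` here: the image build never reads the golden baselines.
      - uses: actions/checkout@v4               # assumption: version
      - uses: docker/setup-buildx-action@v3     # assumption: version
      - uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile.test                 # assumption: located at repo root
          tags: regression-test:ci              # hypothetical tag
          # Export the built image as a tarball instead of pushing it.
          outputs: type=docker,dest=${{ runner.temp }}/regression-test-image.tar  # hypothetical path
          cache-from: type=gha,scope=regression-test-image
          cache-to: type=gha,scope=regression-test-image,mode=max  # mode=max is an assumption
      - uses: actions/upload-artifact@v4        # assumption: version
        with:
          name: regression-test-image           # hypothetical artifact name
          path: ${{ runner.temp }}/regression-test-image.tar
          retention-days: 1                     # purge after one day
          compression-level: 1                  # light gzip, favouring speed over size
```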
Shard job (existing, simplified)
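The simplified shard steps might look roughly like the sketch below. The artifact name and tarball path are hypothetical and must match whatever the build job uploads; the runner label and action versions are assumptions, and only the `fast` shard is listed because it is the only one named in this PR.

```yaml
jobs:
  regression-shards:
    needs: build-image                          # shards wait for the single image build
    runs-on: ubuntu-latest                      # assumption
    strategy:
      matrix:
        shard: [fast]                           # remaining shards omitted
    steps:
      - uses: actions/checkout@v4               # assumption: version
        with:
          lfs: true                             # shards still need the LFS baselines
      - uses: actions/download-artifact@v4
        with:
          name: regression-test-image           # hypothetical artifact name
          path: ${{ runner.temp }}
      - name: Load test image
        run: docker load -i "${{ runner.temp }}/regression-test-image.tar"  # hypothetical path
      # The shard's existing test steps would follow here, unchanged.
```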
Replaces the per-shard `docker/setup-buildx-action` + `docker/build-push-action` steps with an artifact download followed by `docker load -i <tar>`.

Tradeoffs
- Pro: Massive savings on cold cache, meaningful savings on warm cache, shards start faster (no buildx setup + layer cache restore).
- Con: Adds a sequential step: all shards now wait on `build-image`. Wall-clock for the fastest shard goes from "start at t=0, build is the bottleneck at ~4 min" to "start at t=build-time (~4 min), immediately load and run tests". Net wall-clock is typically faster because shards aren't fighting for buildx capacity, but the "time to first test output" moves right by ~4 min on warm-cache runs. On cold cache this cost is recovered many times over.
- Con: Artifact retention consumes storage (1 day, ~500 MB per run × N PRs). `retention-days: 1` caps it; artifacts older than 1 day are purged automatically.
- Con: If `build-image` fails, all shards fail. Currently the failure mode is equivalent (every shard would have failed the same Docker build independently), so no new failure surface.

Not changed
- `regression` summary job: now transitively depends on `build-image` via `regression-shards`, no explicit wiring needed.
- GHA layer cache (`type=gha`) preserved on the build job, so warm-cache rebuilds stay fast.
- … (`regression`) unchanged.

Test plan
- … `oxfmt --check`; validated structure by comparing to `docker/build-push-action@v6` docs for `outputs: type=docker,dest=...` and `actions/download-artifact@v4` docs

Validation after merge: open any PR that touches `packages/producer/**` and confirm the new `build-image` job appears, shards download the artifact, and regression completes. The first run after merge will also rebuild the GHA layer cache under the new job, so that run won't show the full savings; runs from the second onward will.