Skip to content

fix(router): rmi snapshot image on PR merge so it isn't leaked to disk#1231

Merged
zbigniewsobiecki merged 1 commit intodevfrom
fix/snapshot-rmi-on-pr-merge
Apr 29, 2026
Merged

fix(router): rmi snapshot image on PR merge so it isn't leaked to disk#1231
zbigniewsobiecki merged 1 commit intodevfrom
fix/snapshot-rmi-on-pr-merge

Conversation

@zbigniewsobiecki
Copy link
Copy Markdown
Member

Summary

pr-merged previously called invalidateSnapshot(projectId, workItemId) — a registry-only operation. The Docker image lingered in storage, and the periodic 5-minute cleanup loop iterates registry entries only — so the now-orphaned image was leaked permanently, freed only by docker image prune -af or host disk pressure.

This is a primary contributor to the 2026-04-29 prod disk-fill incident that put Redis into MISCONF write-block.

Fix

  • invalidateSnapshot(projectId, workItemId) now RETURNS the removed metadata (or undefined). Callers can docker rmi the underlying image; removing the registry entry without the image is the leak.
  • New invalidateAndRemoveSnapshot(projectId, workItemId) in snapshot-cleanup.ts chains invalidateSnapshot + the existing removeSnapshotImage (which gracefully handles 404 / 409 and Sentry-captures unexpected docker errors).
  • pr-merged.ts calls the new helper as fire-and-forget (void … .catch(logger.warn)) — preserves the non-blocking property of the original call site.
  • removeSnapshotImage is now exported so the new helper reuses it without duplicating the rmi block.

Test plan

  • tests/unit/router/snapshot-manager.test.ts — pins invalidateSnapshot returning metadata when an entry exists / undefined when not.
  • tests/unit/router/snapshot-cleanup.test.ts — pins invalidateAndRemoveSnapshot calling docker.getImage().remove() on the registered image, no-op when nothing is registered, swallows 404 / 409.
  • tests/unit/triggers/pr-merged.test.ts — mock target moved from snapshot-manager.invalidateSnapshot to snapshot-cleanup.invalidateAndRemoveSnapshot; existing assertions verify the helper is called with the right args on merged PRs and not on closed-but-not-merged or work-item-less PRs.
  • All 8717 unit tests pass.
  • npm run typecheck clean.
  • npm run lint clean.

🤖 Generated with Claude Code

`pr-merged` previously called `invalidateSnapshot(projectId, workItemId)`
(registry-only). The image lingered in Docker storage, and the periodic
5-minute cleanup loop iterates registry entries only — so the now-orphaned
image was leaked permanently, freed only by `docker image prune -af` or
host disk pressure. This is a primary contributor to the 2026-04-29 disk-
fill incident on prod.

Fix: split into two pieces.

- `invalidateSnapshot(projectId, workItemId)` now RETURNS the removed
  metadata (or undefined) so any caller can `docker rmi` the underlying
  image. Removing the registry entry without the image is the leak; the
  return shape forces callers to think about the image.

- New helper `invalidateAndRemoveSnapshot(projectId, workItemId)` in
  `snapshot-cleanup.ts` chains `invalidateSnapshot` + the existing
  `removeSnapshotImage` (which gracefully handles 404 / 409 + Sentry-
  captures unexpected docker errors). `pr-merged.ts` calls this helper as
  fire-and-forget (`void ... .catch(logger.warn)`) — preserves the
  non-blocking property of the original call site.

`removeSnapshotImage` is now exported (was module-private) so the new
helper can reuse it without duplicating the docker rmi block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zbigniewsobiecki zbigniewsobiecki merged commit 1c339b6 into dev Apr 29, 2026
8 checks passed
@zbigniewsobiecki zbigniewsobiecki deleted the fix/snapshot-rmi-on-pr-merge branch April 29, 2026 16:54
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 80.76923% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/triggers/github/pr-merged.ts 37.50% 5 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant