
fix(snapshots): preserve workspace in committed snapshot images #1055

Merged
zbigniewsobiecki merged 1 commit into dev from fix/snapshot-workspace-preservation
Mar 25, 2026


@zbigniewsobiecki
Member

Summary

  • Root cause: cleanupAgentResources deletes the workspace directory in the worker's finally block before the process exits. The router calls docker commit only after container.wait() resolves, which is after cleanup, so every committed snapshot image bakes in an empty /workspace. On the next trigger the router logs a snapshot hit, but findSnapshotWorkspaceDir returns null, so the run falls back to a full clone + npm install + setup.sh, defeating the purpose of snapshots.

  • Fix: introduce a CASCADE_SNAPSHOT_ENABLED=true worker env var (injected by the router whenever snapshotEnabled=true). The worker skips cleanupTempDir when this flag is present, leaving the workspace intact for docker commit.

  • The container-manager fallback path (fix(snapshots): gracefully recover when snapshot image is missing) is also updated to forward snapshotEnabled, so the fallback base-image run still bakes a valid workspace when it succeeds.
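
The worker-side guard described in the Fix bullet can be sketched roughly as follows. This is a minimal illustration, not the code from src/agents/shared/cleanup.ts: the function signature, the `isSnapshotEnabled` helper, the return value, and the injected deleter are all assumptions made so the sketch runs without touching the disk. Only the names cleanupAgentResources, cleanupTempDir, and CASCADE_SNAPSHOT_ENABLED come from the PR.

```typescript
// Sketch only: the real cleanupAgentResources and cleanupTempDir bodies are not shown in this PR.
type WorkerEnv = Record<string, string | undefined>;

// Hypothetical helper: true when the router asked the worker to keep the workspace.
function isSnapshotEnabled(env: WorkerEnv): boolean {
  return env["CASCADE_SNAPSHOT_ENABLED"] === "true";
}

// Hypothetical signature: the deleter is injected so the sketch has no filesystem dependency.
function cleanupAgentResourcesSketch(
  repoDir: string,
  env: WorkerEnv,
  cleanupTempDir: (dir: string) => void
): "preserved" | "deleted" {
  if (isSnapshotEnabled(env)) {
    // Leave the workspace in place so a later docker commit captures it.
    return "preserved";
  }
  cleanupTempDir(repoDir);
  return "deleted";
}
```

Comparing against the exact string "true" means that an unset variable, or any other value, keeps the original delete-on-exit behavior. Whether the real check is that strict is not stated in the PR.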

Changes

| File | Change |
| --- | --- |
| src/router/worker-env.ts | Add snapshotEnabled param → push CASCADE_SNAPSHOT_ENABLED=true |
| src/router/container-manager.ts | Pass snapshotEnabled to initial and fallback buildWorkerEnvWithProjectId calls |
| src/agents/shared/cleanup.ts | Skip cleanupTempDir when CASCADE_SNAPSHOT_ENABLED=true |
| tests/unit/agents/cleanup.test.ts | New test file covering all skip-deletion scenarios |
| tests/unit/router/worker-env.test.ts | New describe block for snapshotEnabled flag |
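
The router-side change to worker-env.ts might look something like the sketch below. It assumes the worker environment is built as an array of Docker-style `KEY=value` strings, which the word "push" suggests but the PR does not confirm. `buildWorkerEnvSketch` and its parameters are hypothetical stand-ins; the real buildWorkerEnvWithProjectId signature is not shown here.

```typescript
// Hypothetical stand-in for the router's env builder; the real signature is not shown in this PR.
// Assumption: the worker env is a list of Docker-style "KEY=value" strings.
function buildWorkerEnvSketch(
  baseEnv: readonly string[],
  snapshotEnabled: boolean = false
): string[] {
  const env = baseEnv.slice(); // copy so the caller's array is not mutated
  if (snapshotEnabled) {
    // The flag is only present when snapshots are enabled; otherwise the env is unchanged.
    env.push("CASCADE_SNAPSHOT_ENABLED=true");
  }
  return env;
}
```

Under this sketch, both the initial run and the fallback run would pass the same `snapshotEnabled` value, so a run that falls back to the base image still leaves a workspace behind for the commit.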

Test plan

  • npm run lint — clean
  • npm run typecheck — clean
  • npm test — 7164/7164 unit tests pass
  • CI green
  • After deploy: trigger a card with snapshotEnabled, observe "Committed container to snapshot image", then re-trigger. Logs should show "Snapshot reuse detected — skipping clone" instead of "Snapshot reuse requested but no workspace directory found — falling back to clone".

🤖 Generated with Claude Code

The snapshot workspace was being deleted by the worker's cleanup step
before the process exited, so every `docker commit` captured an empty
/workspace. On the next trigger the router logged a snapshot hit, but
`findSnapshotWorkspaceDir` returned null and the run fell back to a full
clone + setup, defeating the purpose of snapshots entirely.

Root cause: `cleanupAgentResources` calls `cleanupTempDir(repoDir)` in a
`finally` block; the router calls `docker commit` only after
`container.wait()` resolves — after the workspace is already gone.
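
The ordering problem can be reproduced with a small in-memory simulation. Everything in it is illustrative: the `ContainerFsSketch` type, the `/workspace/repo` path, and both function names are assumptions, and no real Docker calls are made. The worker's `finally` block runs before the simulated wait returns, so the simulated commit only sees what cleanup left behind.

```typescript
// In-memory stand-in for the container filesystem: a list of existing paths.
interface ContainerFsSketch {
  paths: string[];
}

const WORKSPACE_SKETCH = "/workspace/repo"; // hypothetical path, for illustration only

// Simulates the worker: create the workspace, do the work, clean up in `finally`.
function runWorkerSketch(fs: ContainerFsSketch, snapshotEnabled: boolean): void {
  fs.paths.push(WORKSPACE_SKETCH);
  try {
    // Agent work would happen here.
  } finally {
    if (!snapshotEnabled) {
      // Unconditional deletion here is what emptied /workspace before the fix.
      fs.paths = fs.paths.filter((p) => p !== WORKSPACE_SKETCH);
    }
  }
}

// Simulates the router: wait for the worker to exit, then commit whatever is left.
function runThenCommitSketch(snapshotEnabled: boolean): string[] {
  const fs: ContainerFsSketch = { paths: [] };
  runWorkerSketch(fs, snapshotEnabled); // stands in for container.wait()
  return fs.paths.slice(); // stands in for docker commit
}
```

With the flag off the committed path list comes back empty, and with it on the list contains the workspace path.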

Fix: introduce a `CASCADE_SNAPSHOT_ENABLED=true` worker env var (set by
the router whenever `snapshotEnabled=true`). When the worker sees this
flag it skips workspace deletion so the directory is present in the image
when the router commits the container.

- `worker-env.ts`: add `snapshotEnabled` param → push env var
- `container-manager.ts`: pass `snapshotEnabled` to both the initial and
  404-fallback `buildWorkerEnvWithProjectId` calls
- `cleanup.ts`: skip `cleanupTempDir` when `CASCADE_SNAPSHOT_ENABLED=true`
- tests: new `cleanup.test.ts` + expand `worker-env.test.ts`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov

codecov Bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@zbigniewsobiecki zbigniewsobiecki merged commit aac7f99 into dev Mar 25, 2026
9 checks passed
@zbigniewsobiecki zbigniewsobiecki deleted the fix/snapshot-workspace-preservation branch March 25, 2026 17:08