feat(worker): refresh snapshot workspace instead of recloning on reuse #1047

Merged
aaight merged 2 commits into dev from
feature/snapshot-workspace-refresh
Mar 24, 2026

Conversation

@aaight aaight commented Mar 24, 2026

Summary

  • Introduces CASCADE_SNAPSHOT_REUSE=true env flag injected by the router into worker containers that start from a reused snapshot image, so worker code can take a snapshot-specific path without affecting cold starts
  • Adds findSnapshotWorkspaceDir() to locate the baked-in workspace directory within the container image (by scanning /workspace/cascade-<projectId>-*)
  • Adds refreshSnapshotWorkspace() to update an existing checkout via git fetch origin + git reset --hard origin/<branch> + git checkout <branch> instead of cloning from scratch
  • setupRepository() now detects CASCADE_SNAPSHOT_REUSE=true and takes the refresh path when a snapshot workspace is found; falls back transparently to the clone path when no directory is found
  • Cold-start, manual-run, retry-run, and webhook-driven flows are completely unaffected (no CASCADE_SNAPSHOT_REUSE → no code-path change)
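The path selection and refresh sequence described above can be sketched roughly as follows. This is not the code from the PR: `chooseSetupPath`, `refreshCommands`, `runRefresh`, and the `GitRunner` type are hypothetical names used only for illustration, and the git runner is injected so the sketch has no process or filesystem dependencies. Only the `CASCADE_SNAPSHOT_REUSE` flag and the fetch / reset / checkout order come from the summary.

```typescript
// Hypothetical types and helpers; only CASCADE_SNAPSHOT_REUSE is taken from the PR text.
type SetupPath = "refresh" | "clone";
type GitRunner = (args: string[]) => number; // returns the process exit code

// Take the refresh path only when the flag is set AND a snapshot workspace was found.
// In every other case (flag absent, or no directory) fall back to the clone path.
function chooseSetupPath(
  env: Record<string, string | undefined>,
  snapshotDir: string | null,
): SetupPath {
  return env.CASCADE_SNAPSHOT_REUSE === "true" && snapshotDir !== null
    ? "refresh"
    : "clone";
}

// The git sequence listed in the summary: fetch, hard reset, checkout.
function refreshCommands(branch: string): string[][] {
  return [
    ["fetch", "origin"],
    ["reset", "--hard", `origin/${branch}`],
    ["checkout", branch],
  ];
}

// Run every command in order. A non-zero exit is recorded as a warning
// and does not stop the remaining commands from running.
function runRefresh(branch: string, run: GitRunner): string[] {
  const warnings: string[] = [];
  for (const args of refreshCommands(branch)) {
    const code = run(args);
    if (code !== 0) {
      warnings.push(`git ${args.join(" ")} exited with ${code}`);
    }
  }
  return warnings;
}
```

Injecting the runner keeps the decision logic testable without spawning git, which is one plausible way to cover the "fetch non-zero continues" and "reset non-zero continues" cases from the test plan.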

Key decisions

  • Explicit env flag over filesystem heuristics: CASCADE_SNAPSHOT_REUSE=true is set only when the router selects a snapshot image (workerImage !== routerConfig.workerImage), which makes the intent unambiguous instead of inferring it from filesystem state
  • Graceful fallback: If no matching workspace directory exists in the snapshot image (e.g., image is malformed), setup falls back to a full clone with a warning log rather than crashing
  • Non-zero git exits are warnings, not errors: Snapshot refresh tolerates transient fetch/reset failures (network blip) and continues — consistent with the existing setup.sh failure handling pattern
  • maybeWarmTsCache helper extracted: Keeps setupRepository within biome's cognitive-complexity limit (15)
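For the router side, a minimal sketch of the flag decision might look like the following. The `RouterConfig` shape, `isSnapshotReuse`, and `withSnapshotFlag` are assumptions made for this example; in particular, placing `snapshotEnabled` on the config object is a guess, and the actual `buildWorkerEnvWithProjectId` signature is not shown in this conversation.

```typescript
// Hypothetical config shape; where snapshotEnabled really lives is an assumption.
interface RouterConfig {
  workerImage: string;      // the default (cold-start) worker image
  snapshotEnabled: boolean; // whether snapshot images may be used at all
}

// A run counts as snapshot reuse when snapshots are enabled and the image
// selected for the worker differs from the default worker image.
function isSnapshotReuse(config: RouterConfig, selectedImage: string): boolean {
  return config.snapshotEnabled && selectedImage !== config.workerImage;
}

// Inject CASCADE_SNAPSHOT_REUSE=true only for snapshot-reuse runs.
// Cold starts get an unchanged copy of the base environment.
function withSnapshotFlag(
  baseEnv: Record<string, string>,
  snapshotReuse: boolean,
): Record<string, string> {
  return snapshotReuse
    ? { ...baseEnv, CASCADE_SNAPSHOT_REUSE: "true" }
    : { ...baseEnv };
}
```

Because the flag is simply absent on cold starts, worker code that checks for the exact string "true" keeps its existing behavior when the variable is missing.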

Test plan

  • All 6942 existing unit tests pass
  • findSnapshotWorkspaceDir — 5 new tests covering match, first-match, no-match, read-error, and custom workspace base
  • setupRepository snapshot-reuse path — 12 new tests covering: snapshot dir used, skip clone, git commands issued, prBranch respected, baseBranch fallback, no setup.sh, TS cache warming, fallback to clone when no dir, flag-absent cold start, no repo cold start, fetch non-zero continues, reset non-zero continues
  • buildWorkerEnvWithProjectId — 4 new tests confirming CASCADE_SNAPSHOT_REUSE=true is injected only when snapshotReuse=true
  • Lint passes (biome)
  • TypeScript type check passes (zero errors)

Card

https://trello.com/c/v88fCsqv/558-as-a-worker-i-want-to-refresh-a-reused-snapshot-workspace-instead-of-recloning-it-so-that-subsequent-runs-skip-redundant-setup

🕵️ claude-code · claude-sonnet-4-6 · run details

codecov Bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 93.25843% with 6 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/agents/shared/repository.ts | 92.68% | 6 Missing ⚠️

@nhopeatall nhopeatall left a comment

Summary

Well-structured change with solid test coverage and comprehensive graceful fallback handling. One correctness concern with the project-ID prefix matching in findSnapshotWorkspaceDir that could cause a snapshot directory to be matched to the wrong project.

Code Issues

Should Fix

  • src/agents/shared/repository.ts:33 (findSnapshotWorkspaceDir): startsWith(prefix), where prefix = "cascade-${projectId}-", can produce false positives when one project ID is a prefix of another.

    Example: If projectId = "foo", the prefix cascade-foo- will match both cascade-foo-1711234567890 (correct — project foo) and cascade-foo-bar-1711234567890 (wrong — project foo-bar). The first match wins via Array.find(), so the wrong directory could be returned depending on filesystem ordering.

    Since createTempDir in repo.ts always produces cascade-${projectId}-${Date.now()} where Date.now() is purely numeric, tighten the match to verify the suffix is all digits:

    const match = entries.find((e) => {
      if (!e.startsWith(prefix)) return false;
      const suffix = e.slice(prefix.length);
      return /^\d+$/.test(suffix);
    });

    This ensures only directories with a numeric timestamp suffix (the Date.now() part) after the project ID are matched, preventing cross-project collisions.
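    As a self-contained illustration of the suggested check, the sketch below applies the same prefix-plus-digits test to a plain list of directory names. `findSnapshotDirName` is a hypothetical stand-in, not the real findSnapshotWorkspaceDir, which reads the workspace directory itself.

```typescript
// Hypothetical stand-in for the matching step only: takes directory names
// as input instead of reading them from the filesystem.
function findSnapshotDirName(
  entries: string[],
  projectId: string,
): string | undefined {
  const prefix = `cascade-${projectId}-`;
  return entries.find((entry) => {
    if (!entry.startsWith(prefix)) return false;
    // Everything after the prefix must be the numeric Date.now() timestamp.
    const suffix = entry.slice(prefix.length);
    return /^\d+$/.test(suffix);
  });
}

// With project "foo", the "foo-bar" directory is listed first but is skipped
// because its suffix "bar-1711234567890" is not all digits.
const entries = ["cascade-foo-bar-1711234567890", "cascade-foo-1711234567890"];
const forFoo = findSnapshotDirName(entries, "foo");
const forFooBar = findSnapshotDirName(entries, "foo-bar");
```

    With the plain startsWith check, `forFoo` would have resolved to the foo-bar directory here, since it comes first in the list.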

Everything else looks good:

  • The explicit env flag approach (CASCADE_SNAPSHOT_REUSE) is cleaner than filesystem heuristics
  • Graceful fallback to clone when no snapshot dir is found is well-handled
  • Non-zero git exit tolerance is consistent with the existing setup.sh failure handling
  • The maybeWarmTsCache extraction is a sensible refactor
  • Router-side detection (workerImage !== routerConfig.workerImage) correctly identifies reuse
  • Test coverage is thorough — all paths exercised including error/fallback cases

🕵️ claude-code · claude-opus-4-6 · run details

…timestamp suffix

Prevents false-positive matches when one project ID is a prefix of another
(e.g. project "foo" matching directory "cascade-foo-bar-<timestamp>").
Now verifies the suffix after the project-ID prefix is all digits, matching
only directories created by createTempDir (cascade-<projectId>-<Date.now()>).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nhopeatall nhopeatall left a comment

Summary

LGTM — well-structured feature with clean separation between router-side detection and worker-side workspace refresh. The explicit env flag approach, graceful fallback to clone, and comprehensive test coverage make this solid.

Verified:

  • Snapshot reuse detection logic in container-manager.ts correctly gates on snapshotEnabled && workerImage !== routerConfig.workerImage
  • findSnapshotWorkspaceDir correctly matches cascade-<projectId>-<digits> to prevent cross-project prefix collisions (e.g., projectId foo won't match cascade-foo-bar-*)
  • refreshSnapshotWorkspace fetch → reset → checkout sequence is correct; non-zero exits are logged as warnings and tolerated, consistent with the existing setup.sh failure handling pattern
  • Cold-start path is completely unaffected — the CASCADE_SNAPSHOT_REUSE guard combined with the project.repo check ensures no code-path change for non-snapshot runs
  • maybeWarmTsCache extraction is a clean refactor that keeps behavior identical on both paths
  • Path construction via template literals is consistent with the existing repo.ts pattern (not using path.join)
  • All 7 CI checks pass
  • Test coverage is thorough: 21 new tests covering happy paths, edge cases, error tolerance, and fallback behavior

🕵️ claude-code · claude-opus-4-6 · run details

@aaight aaight merged commit 33fcc03 into dev Mar 24, 2026
9 checks passed