Summary
OpenCode storage can be corrupted when a crash or a full disk interrupts a write. This PR proposes atomic writes, a safe repair routine, and a restore path for quarantined files.
Problem
- Direct writes risk partial/truncated JSON.
- Leftover temp files accumulate; repair lacks coarse-grained coordination and reporting.
- Operators need dry-run and limits before touching data at scale.
Proposal
- Atomic write: write to a temp file, fsync(file), rename over the target, fsync(dir).
- Repair: dry-run, prefix/limits, safe temp cleanup (.oc-*.tmp), skip-when-locked, JSON report.
- Restore: bring files back from quarantine preserving structure.
- Tests for dry-run/restore and temp cleanup.
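The atomic-write sequence above can be sketched roughly as follows. This is a minimal illustration using Node's `fs` API, not the code in this PR: the function name `atomicWriteSync` and the random temp suffix are assumptions, and only the `.oc-*.tmp` naming pattern comes from the proposal.

```typescript
import {
  closeSync, fsyncSync, mkdtempSync, openSync, readdirSync,
  readFileSync, renameSync, rmSync, writeFileSync,
} from "node:fs";
import { randomBytes } from "node:crypto";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical helper: temp + fsync(file) + rename + fsync(dir).
function atomicWriteSync(target: string, data: string): void {
  const dir = dirname(target);
  // Temp file lives in the same directory so the rename stays on one filesystem.
  const tmp = join(dir, `.oc-${randomBytes(6).toString("hex")}.tmp`);
  const fd = openSync(tmp, "wx", 0o600); // "wx": fail if the temp name already exists
  try {
    try {
      writeFileSync(fd, data);
      fsyncSync(fd); // flush file contents before the rename makes them visible
    } finally {
      closeSync(fd);
    }
    renameSync(tmp, target); // atomically replaces the target on POSIX filesystems
  } catch (err) {
    rmSync(tmp, { force: true }); // do not leave a temp file behind on failure
    throw err;
  }
  try {
    // fsync the directory so the rename itself survives a crash.
    const dirFd = openSync(dir, "r");
    try {
      fsyncSync(dirFd);
    } finally {
      closeSync(dirFd);
    }
  } catch {
    // Best effort: some platforms cannot open or fsync a directory.
  }
}

// Usage: write into a throwaway directory and read the result back.
const demoDir = mkdtempSync(join(tmpdir(), "oc-demo-"));
const demoFile = join(demoDir, "session.json");
atomicWriteSync(demoFile, JSON.stringify({ ok: true }));
console.log(readFileSync(demoFile, "utf8"));
console.log(readdirSync(demoDir));
```

A reader of the target sees either the old contents or the new ones, never a truncated file. A crash before the rename leaves only a `.oc-*.tmp` file, which is what the repair routine's temp cleanup is meant to pick up.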
Non-goals
- Schema-level validation or semantic corruption detection.
- Cross-FS transactional guarantees (e.g., NFS/Windows beyond documented best effort).
- Retention policy for quarantine (can be follow-up).
Risks/Trade-offs
- rename + fsync(dir) adds slight I/O overhead.
- Repair writes a report file (operationally useful).
- Try-lock may skip files busy during repair; reported as skipped.
- Portability: Bun APIs with Node fsync fallback; best-effort on non-POSIX FS.
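The skip-when-locked behavior could look something like the sketch below. It assumes a lock represented by exclusive creation of a sibling `<file>.lock`; the actual lock mechanism is not described here, so `tryLock`, `repairOne`, and the `.lock` suffix are all hypothetical names chosen for illustration.

```typescript
import { closeSync, mkdtempSync, openSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type RepairOutcome = { file: string; status: "repaired" | "skipped" };

// Hypothetical try-lock: returns a release function, or null if the lock is held.
function tryLock(file: string): (() => void) | null {
  const lockPath = `${file}.lock`; // assumed naming, not taken from the PR
  try {
    closeSync(openSync(lockPath, "wx")); // "wx" fails with EEXIST if the lock exists
    return () => rmSync(lockPath, { force: true });
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return null;
    throw err;
  }
}

// Repair one file without blocking: a busy file is reported as skipped.
function repairOne(file: string, repair: (file: string) => void): RepairOutcome {
  const release = tryLock(file);
  if (release === null) return { file, status: "skipped" };
  try {
    repair(file);
    return { file, status: "repaired" };
  } finally {
    release();
  }
}

// Usage: one file is held by "another writer", the other is free.
const lockDir = mkdtempSync(join(tmpdir(), "oc-lock-"));
const busyFile = join(lockDir, "busy.json");
const idleFile = join(lockDir, "idle.json");
writeFileSync(busyFile, "{}");
writeFileSync(idleFile, "{}");
const held = tryLock(busyFile);
const outcomes = [busyFile, idleFile].map((f) => repairOne(f, () => {}));
held?.();
console.log(JSON.stringify(outcomes.map((o) => o.status)));
```

Because the lock attempt never waits, a repair run cannot stall behind an active session. The trade-off is that skipped files need a later pass, which is why they are listed in the report.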
Verification
- Unit tests added for dry-run, restore, and temp cleanup.
- Manual: run repair in a sandbox with XDG paths; confirm JSON report and quarantined files.
Open questions
- Retention policy for quarantine?
- Global maintenance lock for repair windows?