Storage resilience: atomic writes, safer temp cleanup, repair/restore tools #7733

@KakashiTech

Summary
OpenCode storage can be corrupted when a crash or a full disk interrupts a write. This issue proposes atomic writes, a safe repair routine, and a restore path for bringing quarantined files back.

Problem

  • Direct in-place writes can leave partial or truncated JSON if they are interrupted.
  • Leftover temp files accumulate; repair lacks coarse-grained coordination and reporting.
  • Operators need a dry-run mode and limits before touching data at scale.

Proposal

  • Atomic write: temp + fsync(file) + rename + fsync(dir).
  • Repair: dry-run, prefix/limits, safe temp cleanup (.oc-*.tmp), skip-when-locked, JSON report.
  • Restore: bring files back from quarantine preserving structure.
  • Tests for dry-run/restore and temp cleanup.
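
The atomic write sequence above (temp + fsync(file) + rename + fsync(dir)) could be sketched roughly as follows. This is a minimal illustration using Node's `fs/promises` API, not the actual OpenCode implementation; the function name `atomicWriteJson` and the random temp suffix are assumptions, and only the `.oc-*.tmp` naming comes from the proposal.

```typescript
import * as fs from "node:fs/promises";
import * as path from "node:path";
import { randomBytes } from "node:crypto";

// Hypothetical helper: name and signature are illustrative only.
export async function atomicWriteJson(target: string, value: unknown): Promise<void> {
  const dir = path.dirname(target);
  // The temp file lives in the same directory so rename never crosses filesystems.
  const tmp = path.join(dir, `.oc-${randomBytes(6).toString("hex")}.tmp`);

  // "wx" fails if the temp name already exists, so we never clobber another writer's temp.
  const file = await fs.open(tmp, "wx");
  try {
    try {
      await file.writeFile(JSON.stringify(value));
      await file.sync(); // fsync(file): flush the data before it becomes visible
    } finally {
      await file.close();
    }
    await fs.rename(tmp, target); // swap the complete file into place
  } catch (err) {
    await fs.unlink(tmp).catch(() => {}); // do not leave a stray temp on failure
    throw err;
  }

  // fsync(dir): persist the rename itself. Best effort, since opening a
  // directory for sync is not supported on every platform (e.g. Windows).
  try {
    const dirHandle = await fs.open(dir, "r");
    try {
      await dirHandle.sync();
    } finally {
      await dirHandle.close();
    }
  } catch {
    // Ignored on purpose: durability of the rename is best effort here.
  }
}
```

With this ordering, a reader sees either the old file or the complete new one. A crash before the rename leaves only a `.oc-*.tmp` file behind, which is what the temp cleanup step is meant to collect.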

Non-goals

  • Schema-level validation or semantic corruption detection.
  • Cross-FS transactional guarantees (e.g., NFS/Windows beyond documented best effort).
  • Retention policy for quarantine (can be follow-up).

Risks/Trade-offs

  • rename + fsync(dir) adds slight I/O overhead.
  • Repair writes a report file (operationally useful).
  • Try-lock may skip files busy during repair; reported as skipped.
  • Portability: Bun APIs with Node fsync fallback; best-effort on non-POSIX FS.
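
To illustrate the try-lock trade-off, here is one possible shape for skip-when-locked. The issue does not say how locks are implemented, so the sidecar `.lock` file created with exclusive-create semantics, the `withTryLock` name, and the result type are all assumptions made for this sketch.

```typescript
import * as fs from "node:fs/promises";

// Hypothetical result type: "skipped" entries would end up in the repair report.
type TryLockResult<T> =
  | { status: "done"; value: T }
  | { status: "skipped"; reason: string };

// Hypothetical helper: runs fn only if the lock can be taken immediately.
export async function withTryLock<T>(
  file: string,
  fn: () => Promise<T>,
): Promise<TryLockResult<T>> {
  const lockPath = `${file}.lock`; // assumed lock naming, not from the issue
  let handle: fs.FileHandle;
  try {
    // "wx" is an exclusive create: it fails with EEXIST if someone holds the lock.
    handle = await fs.open(lockPath, "wx");
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") {
      return { status: "skipped", reason: "locked" }; // do not wait, just report
    }
    throw err;
  }
  try {
    return { status: "done", value: await fn() };
  } finally {
    await handle.close();
    await fs.unlink(lockPath).catch(() => {}); // release the lock
  }
}
```

Because the lock attempt never blocks, a repair pass cannot stall behind a busy writer. The cost is that busy files stay unrepaired until a later run. A lock file left behind by a crashed process would also cause permanent skips, which this sketch does not handle.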

Verification

  • Unit tests added.
  • Manual: run repair in a sandbox with XDG paths; confirm JSON report and quarantined files.
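
For the manual check, a dry-run pass over leftover temps might look like the sketch below. It is illustrative only: the `cleanupTemps` name, the option names, the report fields, and the minimum-age guard are assumptions, and it scans a single directory rather than the full storage tree. Only the `.oc-*.tmp` pattern, dry-run, limits, and the JSON report come from the proposal.

```typescript
import * as fs from "node:fs/promises";
import * as path from "node:path";

// Hypothetical report shape; the real report format is not specified in the issue.
interface CleanupReport {
  dryRun: boolean;
  scanned: number;
  removed: string[];
  wouldRemove: string[];
}

const TEMP_PATTERN = /^\.oc-.*\.tmp$/; // matches the .oc-*.tmp naming from the proposal

export async function cleanupTemps(
  root: string,
  opts: { dryRun: boolean; limit: number; minAgeMs: number },
): Promise<CleanupReport> {
  const report: CleanupReport = { dryRun: opts.dryRun, scanned: 0, removed: [], wouldRemove: [] };
  const entries = await fs.readdir(root, { withFileTypes: true });
  for (const entry of entries) {
    if (!entry.isFile() || !TEMP_PATTERN.test(entry.name)) continue;
    // Stop once the operator-supplied limit of affected files is reached.
    if (report.removed.length + report.wouldRemove.length >= opts.limit) break;
    report.scanned++;
    const full = path.join(root, entry.name);
    const info = await fs.stat(full);
    // Assumed safety guard: young temps may belong to a write still in flight.
    if (Date.now() - info.mtimeMs < opts.minAgeMs) continue;
    if (opts.dryRun) {
      report.wouldRemove.push(full); // record only, touch nothing
    } else {
      await fs.unlink(full);
      report.removed.push(full);
    }
  }
  return report;
}
```

Serializing the result with `JSON.stringify(report, null, 2)` gives a report that can be compared between a dry run and the real run in the sandbox.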

Open questions

  • Retention policy for quarantine?
  • Global maintenance lock for repair windows?
