Skip to content

Conversation

@echobt
Copy link
Contributor

@echobt echobt commented Feb 4, 2026

Summary

Fixes #5299 and #5204 - File saves lack fsync for durability.

Problem

Files written without fsync can be lost if system crashes before OS flushes buffers.

Solution

Added sync_all() calls after file writes and directory sync for crash safety.

Changes

  • cortex-storage/sessions/storage.rs: Add fsync to save_session and save_session_sync
  • cortex-tui/session/storage.rs: Add fsync to save_meta and rewrite_history
  • Add directory fsync on Unix for proper crash recovery (ensures directory entries are persisted)

Fixes #5299 and #5204.

Files written without fsync can be lost if system crashes before OS flushes
buffers to disk. Added sync_all() calls after file writes and directory sync
for crash safety.

Changes:
- cortex-storage: Add fsync to save_session and save_session_sync
- cortex-tui: Add fsync to save_meta and rewrite_history
- Add directory fsync on Unix for proper crash recovery
@greptile-apps
Copy link

greptile-apps bot commented Feb 4, 2026

Greptile Overview

Greptile Summary

Added sync_all() (fsync) calls after file writes to prevent data loss on system crashes or unexpected shutdowns.

Key Changes:

  • Modified save_session, save_session_sync, append_message, and append_message_sync in cortex-storage to call sync_all() after writing
  • Modified save_meta and rewrite_history in cortex-tui to call sync_all() after writing
  • Added directory fsync on Unix systems to ensure directory entries are persisted

Implementation Pattern:
The changes follow the same pattern seen in the existing cortex-utils/file-safety crate's atomic_write function, ensuring consistency across the codebase.

Confidence Score: 4/5

  • Safe to merge with high confidence - adds critical data durability guarantees
  • The implementation follows established patterns from the codebase (cortex-utils/file-safety), adds necessary crash safety, and the changes are well-documented. The only minor concern is the potential performance impact of synchronous fsync calls, but this is the correct trade-off for data durability.
  • No files require special attention

Important Files Changed

Filename Overview
src/cortex-storage/src/sessions/storage.rs Added sync_all() (fsync) to save_session, save_session_sync, append_message, and append_message_sync with directory sync on Unix for crash safety
src/cortex-tui/src/session/storage.rs Added sync_all() (fsync) to save_meta and rewrite_history with directory sync on Unix for crash safety

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Storage as SessionStorage
    participant File as File System
    participant Disk as Physical Disk
    
    Note over App,Disk: save_session / save_meta (with fsync)
    
    App->>Storage: save_session(session)
    Storage->>File: open(path, write, truncate)
    File-->>Storage: file handle
    Storage->>File: write_all(content)
    Storage->>File: flush()
    Note over Storage,File: Flush to OS buffer
    Storage->>Disk: sync_all() (fsync)
    Note over File,Disk: Force write to physical disk
    Disk-->>Storage: ✓ data persisted
    
    alt Unix systems
        Storage->>File: open(parent_dir)
        Storage->>Disk: sync_all() on directory
        Note over Storage,Disk: Persist directory entry
    end
    
    Storage-->>App: Success
    
    Note over App,Disk: Without fsync (old behavior)
    
    App->>Storage: save_session_old(session)
    Storage->>File: write(path, content)
    Note over File: Data in OS buffer cache
    Storage-->>App: Success (premature!)
    
    alt System crash before OS flush
        Note over File,Disk: ❌ Data lost - never reached disk
    end
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

echobt added a commit that referenced this pull request Feb 4, 2026
This PR consolidates the following memory and storage fixes:
- #44: Add cleanup for stale file locks to prevent memory leak
- #45: Add cache size limits to prevent unbounded memory growth
- #47: Add fsync after file writes to prevent data loss
- #50: Bound ToolResponseStore size and cleanup consumed entries
- #51: Eliminate TOCTOU races in MCP server and plugin registry
- #52: Improve path validation and tilde expansion

Key changes:
- Added periodic cleanup of stale file locks
- Implemented LRU cache limits for config discovery and tokenizer
- Added fsync calls after critical file writes
- Created bounded ToolResponseStore with automatic cleanup
- Fixed time-of-check-time-of-use races
- Improved path validation security
@echobt
Copy link
Contributor Author

echobt commented Feb 4, 2026

Consolidated into #80 - fix: consolidated memory and storage improvements

@echobt echobt closed this Feb 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant