Skip to content

Conversation

@echobt
Copy link
Contributor

@echobt echobt commented Feb 4, 2026

Summary

This PR consolidates 6 memory and storage fixes into a single, cohesive change.

Included PRs:

Key Changes:

  • Added periodic cleanup of stale file locks to prevent memory leaks
  • Implemented LRU cache limits for config discovery and tokenizer caches
  • Added fsync calls after critical file writes for data durability
  • Created bounded ToolResponseStore with configurable limits and automatic cleanup
  • Fixed time-of-check-time-of-use (TOCTOU) races in MCP server and plugin registry
  • Improved path validation with secure tilde expansion

Files Modified:

  • src/cortex-common/src/file_locking.rs
  • src/cortex-resume/src/session_store.rs
  • src/cortex-engine/src/config/config_discovery.rs
  • src/cortex-engine/src/tokenizer.rs
  • src/cortex-storage/src/sessions/storage.rs
  • src/cortex-tui/src/session/storage.rs
  • src/cortex-engine/src/tools/response_store.rs (new)
  • src/cortex-cli/src/utils/paths.rs
  • src/cortex-engine/src/validation.rs
  • src/cortex-mcp-server/src/server.rs
  • src/cortex-plugins/src/registry.rs

Closes #44, #45, #47, #50, #51, #52

This PR consolidates the following memory and storage fixes:
- #44: Add cleanup for stale file locks to prevent memory leak
- #45: Add cache size limits to prevent unbounded memory growth
- #47: Add fsync after file writes to prevent data loss
- #50: Bound ToolResponseStore size and cleanup consumed entries
- #51: Eliminate TOCTOU races in MCP server and plugin registry
- #52: Improve path validation and tilde expansion

Key changes:
- Added periodic cleanup of stale file locks
- Implemented LRU cache limits for config discovery and tokenizer
- Added fsync calls after critical file writes
- Created bounded ToolResponseStore with automatic cleanup
- Fixed time-of-check-time-of-use races
- Improved path validation security
@greptile-apps
Copy link

greptile-apps bot commented Feb 4, 2026

Greptile Overview

Greptile Summary

This PR consolidates 6 critical memory and storage improvements into a cohesive change that significantly enhances system reliability and security.

Key Improvements

Memory Leak Prevention

  • File lock cleanup: FileLockManager now automatically removes stale lock entries when reaching 10,000 entries, preventing unbounded growth from accumulated lock references
  • Cache limits: Config discovery and tokenizer caches now enforce 1,000-entry limits with simple eviction, preventing memory exhaustion in long-running processes

Data Durability

  • fsync integration: All critical file writes now call sync_all() to ensure data reaches disk before continuing, preventing data loss on crash or forceful termination
  • Atomic writes: Session stores use write-to-temp-then-rename pattern with fsync for crash-safe updates
  • Directory sync: Unix systems also sync parent directories after file operations to ensure directory entries are persisted

Unbounded Growth Prevention

  • ToolResponseStore: New bounded storage (500 entries max) with TTL-based cleanup (5 min), automatic eviction, and statistics tracking prevents unbounded accumulation of tool execution results

Security Hardening

  • TOCTOU race fixes:
    • MCP server initialization now uses atomic check-and-set within single write lock
    • Plugin registration uses Entry API for atomic check-and-insert
    • Both changes prevent race conditions where concurrent operations could bypass validation
  • Path validation: Fixed to correctly use path components instead of string matching, preventing false positives on filenames containing ".."
  • Command validation: Added normalization to prevent bypass attacks via extra whitespace, quotes, or path variants (e.g., /bin/rm vs rm)

Code Quality

  • Comprehensive test coverage added for all new validation logic
  • Clear documentation of security considerations and design decisions
  • Consistent error handling patterns across all changes

Confidence Score: 5/5

  • This PR is safe to merge with high confidence - all changes are defensive improvements that enhance reliability and security
  • All changes are well-tested defensive improvements with clear documentation. The consolidated PR successfully addresses multiple memory and storage issues without introducing breaking changes. TOCTOU fixes use standard Rust patterns, fsync additions follow best practices for durability, and cache limits prevent unbounded growth. No logic errors or security vulnerabilities detected.
  • No files require special attention - all changes are clean and well-implemented

Important Files Changed

Filename Overview
src/cortex-common/src/file_locking.rs Added automatic cleanup of stale file locks (MAX_LOCK_ENTRIES=10,000) to prevent memory leaks in FileLockManager
src/cortex-engine/src/tools/response_store.rs New bounded storage for tool responses with configurable limits (MAX_STORE_SIZE=500), TTL, automatic cleanup, and statistics tracking
src/cortex-cli/src/utils/paths.rs Improved path validation using component analysis to correctly distinguish traversal from filenames containing "..", added comprehensive test coverage
src/cortex-engine/src/validation.rs Added command normalization to prevent bypass attacks via whitespace, quotes, and path variants
src/cortex-mcp-server/src/server.rs Fixed TOCTOU race in handle_initialize() by holding write lock during entire check-and-transition operation
src/cortex-plugins/src/registry.rs Fixed TOCTOU race in plugin registration using Entry API for atomic check-and-insert operation

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Cache as Config/Token Cache
    participant LockMgr as FileLockManager
    participant Store as ToolResponseStore
    participant FS as FileSystem
    participant MCP as MCP Server
    participant Plugin as Plugin Registry

    Note over Cache,Store: Memory Management Flow
    
    App->>Cache: Request config/token
    Cache->>Cache: Check if size >= 1000
    alt Cache Full
        Cache->>Cache: Evict first entry
    end
    Cache->>Cache: Insert new entry
    Cache-->>App: Return result

    App->>LockMgr: get_lock(path)
    LockMgr->>LockMgr: Check if locks >= 10,000
    alt Too many locks
        LockMgr->>LockMgr: cleanup_stale_entries()
        Note over LockMgr: Remove entries with<br/>Arc::strong_count == 1
    end
    LockMgr-->>App: Return Arc<Mutex>

    App->>Store: store(call_id, result)
    Store->>Store: Check if size >= 500
    alt Store Full
        Store->>Store: Evict oldest entry
    end
    Store->>Store: Insert new response
    Store-->>App: Return evicted status

    Note over App,FS: Data Durability Flow

    App->>FS: Write session data
    FS->>FS: Write to temp file
    FS->>FS: flush()
    FS->>FS: sync_all() [fsync]
    FS->>FS: Atomic rename
    alt Unix
        FS->>FS: Sync parent directory
    end
    FS-->>App: Write complete

    Note over MCP,Plugin: TOCTOU Race Prevention

    App->>MCP: initialize()
    MCP->>MCP: Acquire write lock
    MCP->>MCP: Check state == Uninitialized
    alt Already initialized
        MCP-->>App: Error: Already initialized
    else Not initialized
        MCP->>MCP: Set state = Initializing
        MCP->>MCP: Release lock
        MCP->>MCP: Complete initialization
        MCP-->>App: Success
    end

    App->>Plugin: register_plugin(id)
    Plugin->>Plugin: Acquire write lock
    Plugin->>Plugin: Check entry exists
    alt Already exists
        Plugin-->>App: Error: Already exists
    else New entry
        Plugin->>Plugin: Insert via Entry API
        Plugin->>Plugin: Release lock
        Plugin-->>App: Success
    end
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@echobt
Copy link
Contributor Author

echobt commented Feb 4, 2026

Closing this PR to consolidate into a single mega-PR combining all bug fixes. The changes will be included in a new consolidated PR.

@echobt echobt closed this Feb 4, 2026
echobt added a commit that referenced this pull request Feb 4, 2026
This PR consolidates all bug fixes and security improvements from PRs #69-88 into a single cohesive change.

## Categories

### Security Fixes
- Path traversal prevention in MCP and session storage
- Shell injection prevention in restore scripts
- Secure random temp files for external editor
- TOCTOU race condition fixes

### TUI Improvements
- Overflow prevention for u16 conversions
- Cursor positioning fixes in selection lists
- Unicode width handling for popups
- Empty section handling in help browser

### Error Handling
- Graceful semaphore and init failure handling
- Improved error propagation in middleware
- Better client access error handling
- SystemTime operation safety

### Memory and Storage
- Cache size limits to prevent unbounded growth
- File lock cleanup for memory leak prevention
- fsync after critical writes for durability
- Bounded ToolResponseStore with automatic cleanup

### Protocol Robustness
- Buffer size limits for StreamProcessor
- ToolState transition validation
- State machine documentation

### Numeric Safety
- Saturating operations to prevent overflow/underflow
- Safe UTF-8 string slicing throughout codebase

### Tools
- Parameter alias support for backward compatibility
- Handler name consistency fixes

## Files Modified
Multiple files across cortex-tui, cortex-engine, cortex-exec, cortex-common,
cortex-protocol, cortex-storage, cortex-mcp-server, and other crates.

Closes #69, #70, #71, #73, #75, #80, #82, #87, #88
echobt added a commit that referenced this pull request Feb 4, 2026
This PR consolidates all bug fixes and security improvements from PRs #69-88 into a single cohesive change.

## Categories

### Security Fixes
- Path traversal prevention in MCP and session storage
- Shell injection prevention in restore scripts
- Secure random temp files for external editor
- TOCTOU race condition fixes

### TUI Improvements
- Overflow prevention for u16 conversions
- Cursor positioning fixes in selection lists
- Unicode width handling for popups
- Empty section handling in help browser

### Error Handling
- Graceful semaphore and init failure handling
- Improved error propagation in middleware
- Better client access error handling
- SystemTime operation safety

### Memory and Storage
- Cache size limits to prevent unbounded growth
- File lock cleanup for memory leak prevention
- fsync after critical writes for durability
- Bounded ToolResponseStore with automatic cleanup

### Protocol Robustness
- Buffer size limits for StreamProcessor
- ToolState transition validation
- State machine documentation

### Numeric Safety
- Saturating operations to prevent overflow/underflow
- Safe UTF-8 string slicing throughout codebase

### Tools
- Parameter alias support for backward compatibility
- Handler name consistency fixes

## Files Modified
Multiple files across cortex-tui, cortex-engine, cortex-exec, cortex-common,
cortex-protocol, cortex-storage, cortex-mcp-server, and other crates.

Closes #69, #70, #71, #73, #75, #80, #82, #87, #88
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant