Skip to content

Conversation

@echobt
Copy link
Contributor

@echobt echobt commented Feb 4, 2026

Summary

Fixes #5208, #5210, #5244 - Session storage error handling.

Problems

  1. Errors silently ignored in list operations
  2. State modified before confirming save success
  3. Default implementation can panic

Solution

Added proper error propagation and conditional state updates.

Fixes #5292 and #5293 - ToolResponseStore memory issues.

Problem:
1. Store grows without limit
2. Consumed responses not removed

Solution:
- Added ToolResponseStore with configurable max size (default: 500 entries)
- Entries are removed when consumed via take() method (#5293)
- Automatic periodic cleanup of expired entries based on TTL
- Eviction of oldest entries when capacity is reached (#5292)

Features:
- MAX_STORE_SIZE constant (500) to prevent unbounded growth
- DEFAULT_TTL (5 minutes) for automatic expiration
- CLEANUP_INTERVAL (1 minute) for periodic cleanup
- get() for peeking without removal
- take() for consuming and removing entries
- cleanup_expired() and cleanup_read() for manual cleanup
- Stats tracking for monitoring store behavior
- server.rs: Hold write lock during entire state check and modification
  in handle_initialize to prevent concurrent initialization races
- registry.rs: Use HashMap entry API to atomically check-and-insert
  plugin registration to prevent duplicate registration races

Fixes #5262, #5260
@greptile-apps
Copy link

greptile-apps bot commented Feb 4, 2026

Greptile Overview

Greptile Summary

This PR fixes critical TOCTOU (Time-of-Check-Time-of-Use) race conditions in the MCP server initialization and plugin registration, while also introducing a bounded tool response storage system.

Key Changes

  • MCP Server (server.rs): Eliminated race condition in handle_initialize by holding write lock during entire state check and modification, preventing concurrent initialization attempts
  • Plugin Registry (registry.rs): Fixed plugin registration race using HashMap's entry API for atomic check-and-insert operations
  • Tool Response Store (response_store.rs): New bounded storage module with:
    • Maximum capacity of 500 entries to prevent unbounded memory growth
    • Automatic cleanup of consumed entries via take() method
    • TTL-based expiration (5 minutes default)
    • Periodic cleanup every minute
    • Comprehensive test coverage (11 tests)

Safety Analysis

All changes follow Rust best practices for concurrent programming:

  • Write locks held during critical sections to prevent races
  • Entry API used for atomic HashMap operations
  • Proper error propagation maintained
  • Well-documented rationale in comments

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - all changes are well-tested concurrency fixes
  • The TOCTOU race fixes use standard Rust patterns (write locks, entry API) correctly. The new ToolResponseStore has comprehensive tests and proper bounds checking. No breaking changes to public APIs.
  • No files require special attention - all implementations are correct and well-documented

Important Files Changed

Filename Overview
src/cortex-engine/src/tools/mod.rs Added public exports for new ToolResponseStore module and its types
src/cortex-engine/src/tools/response_store.rs New bounded tool response store with TTL, capacity limits, and comprehensive test coverage
src/cortex-mcp-server/src/server.rs Fixed TOCTOU race in handle_initialize by holding write lock during check-and-set operation
src/cortex-plugins/src/registry.rs Fixed TOCTOU race in plugin registration using HashMap entry API for atomic check-and-insert

Sequence Diagram

sequenceDiagram
    participant C1 as Client 1
    participant C2 as Client 2
    participant S as McpServer
    participant R as PluginRegistry
    participant Store as ToolResponseStore

    Note over S,R: TOCTOU Race Prevention

    rect rgb(200, 230, 255)
        Note over C1,S: MCP Server Initialization (server.rs)
        C1->>S: initialize()
        activate S
        Note over S: Acquire write lock
        S->>S: Check state == Uninitialized
        S->>S: Set state = Initializing
        Note over S: Release write lock
        S-->>C1: InitializeResult
        deactivate S
        
        C2->>S: initialize() (concurrent)
        activate S
        Note over S: Acquire write lock
        S->>S: Check state != Uninitialized
        S-->>C2: Error: Already initialized
        Note over S: Release write lock
        deactivate S
    end

    rect rgb(200, 255, 200)
        Note over C1,R: Plugin Registration (registry.rs)
        C1->>R: register(plugin_a)
        activate R
        Note over R: Acquire write lock
        R->>R: Entry API: check if exists
        R->>R: Insert if vacant
        Note over R: Release write lock
        R-->>C1: Ok()
        deactivate R
        
        C2->>R: register(plugin_a) (concurrent)
        activate R
        Note over R: Acquire write lock
        R->>R: Entry API: check if exists
        R-->>C2: Error: AlreadyExists
        Note over R: Release write lock
        deactivate R
    end

    rect rgb(255, 240, 200)
        Note over C1,Store: Tool Response Storage (response_store.rs)
        C1->>Store: store(call_id, tool_name, result)
        activate Store
        Store->>Store: Check capacity
        alt At capacity
            Store->>Store: Evict oldest entry
        end
        Store->>Store: Insert new response
        Store-->>C1: evicted flag
        deactivate Store
        
        C2->>Store: take(call_id)
        activate Store
        Store->>Store: Remove and return entry
        Store-->>C2: Some(result)
        deactivate Store
        
        Note over Store: Entry cleaned up after consumption
    end
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

echobt added a commit that referenced this pull request Feb 4, 2026
This PR consolidates the following error handling fixes:
- #48: Handle semaphore and init failures gracefully in async_utils
- #54: Improve error handling in session storage operations (includes TOCTOU race fixes)
- #55: Add validation for threshold, ratio, and token count fields
- #56: Replace unwrap with proper error handling for client access
- #57: Use unwrap_or_default for SystemTime operations
- #61: Handle invalid request-id header values gracefully
- #65: Improve error handling for timestamp and JSON operations in streaming

Key changes:
- Added graceful handling for semaphore and init failures
- Bound ToolResponseStore size and cleanup consumed entries
- Eliminated TOCTOU races in MCP server and plugin registry
- Replaced unwrap() with proper error handling throughout
- Added validation for config fields
- Improved error propagation in middleware
@echobt
Copy link
Contributor Author

echobt commented Feb 4, 2026

Consolidated into #73 - fix: consolidated error handling improvements

@echobt echobt closed this Feb 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant