-
Notifications
You must be signed in to change notification settings - Fork 3
fix(storage): improve error handling in session storage operations #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes #5292 and #5293 - ToolResponseStore memory issues. Problem: 1. Store grows without limit 2. Consumed responses not removed Solution: - Added ToolResponseStore with configurable max size (default: 500 entries) - Entries are removed when consumed via take() method (#5293) - Automatic periodic cleanup of expired entries based on TTL - Eviction of oldest entries when capacity is reached (#5292) Features: - MAX_STORE_SIZE constant (500) to prevent unbounded growth - DEFAULT_TTL (5 minutes) for automatic expiration - CLEANUP_INTERVAL (1 minute) for periodic cleanup - get() for peeking without removal - take() for consuming and removing entries - cleanup_expired() and cleanup_read() for manual cleanup - Stats tracking for monitoring store behavior
- server.rs: Hold write lock during entire state check and modification in handle_initialize to prevent concurrent initialization races - registry.rs: Use HashMap entry API to atomically check-and-insert plugin registration to prevent duplicate registration races Fixes #5262, #5260
Greptile OverviewGreptile SummaryThis PR fixes critical TOCTOU (Time-of-Check-Time-of-Use) race conditions in the MCP server initialization and plugin registration, while also introducing a bounded tool response storage system. Key Changes
Safety AnalysisAll changes follow Rust best practices for concurrent programming:
Confidence Score: 5/5
|
| Filename | Overview |
|---|---|
| src/cortex-engine/src/tools/mod.rs | Added public exports for new ToolResponseStore module and its types |
| src/cortex-engine/src/tools/response_store.rs | New bounded tool response store with TTL, capacity limits, and comprehensive test coverage |
| src/cortex-mcp-server/src/server.rs | Fixed TOCTOU race in handle_initialize by holding write lock during check-and-set operation |
| src/cortex-plugins/src/registry.rs | Fixed TOCTOU race in plugin registration using HashMap entry API for atomic check-and-insert |
Sequence Diagram
sequenceDiagram
participant C1 as Client 1
participant C2 as Client 2
participant S as McpServer
participant R as PluginRegistry
participant Store as ToolResponseStore
Note over S,R: TOCTOU Race Prevention
rect rgb(200, 230, 255)
Note over C1,S: MCP Server Initialization (server.rs)
C1->>S: initialize()
activate S
Note over S: Acquire write lock
S->>S: Check state == Uninitialized
S->>S: Set state = Initializing
Note over S: Release write lock
S-->>C1: InitializeResult
deactivate S
C2->>S: initialize() (concurrent)
activate S
Note over S: Acquire write lock
S->>S: Check state != Uninitialized
S-->>C2: Error: Already initialized
Note over S: Release write lock
deactivate S
end
rect rgb(200, 255, 200)
Note over C1,R: Plugin Registration (registry.rs)
C1->>R: register(plugin_a)
activate R
Note over R: Acquire write lock
R->>R: Entry API: check if exists
R->>R: Insert if vacant
Note over R: Release write lock
R-->>C1: Ok()
deactivate R
C2->>R: register(plugin_a) (concurrent)
activate R
Note over R: Acquire write lock
R->>R: Entry API: check if exists
R-->>C2: Error: AlreadyExists
Note over R: Release write lock
deactivate R
end
rect rgb(255, 240, 200)
Note over C1,Store: Tool Response Storage (response_store.rs)
C1->>Store: store(call_id, tool_name, result)
activate Store
Store->>Store: Check capacity
alt At capacity
Store->>Store: Evict oldest entry
end
Store->>Store: Insert new response
Store-->>C1: evicted flag
deactivate Store
C2->>Store: take(call_id)
activate Store
Store->>Store: Remove and return entry
Store-->>C2: Some(result)
deactivate Store
Note over Store: Entry cleaned up after consumption
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 files reviewed, no comments
This PR consolidates the following error handling fixes: - #48: Handle semaphore and init failures gracefully in async_utils - #54: Improve error handling in session storage operations (includes TOCTOU race fixes) - #55: Add validation for threshold, ratio, and token count fields - #56: Replace unwrap with proper error handling for client access - #57: Use unwrap_or_default for SystemTime operations - #61: Handle invalid request-id header values gracefully - #65: Improve error handling for timestamp and JSON operations in streaming Key changes: - Added graceful handling for semaphore and init failures - Bound ToolResponseStore size and cleanup consumed entries - Eliminated TOCTOU races in MCP server and plugin registry - Replaced unwrap() with proper error handling throughout - Added validation for config fields - Improved error propagation in middleware
|
Consolidated into #73 - fix: consolidated error handling improvements |
Summary
Fixes #5208, #5210, #5244 - Session storage error handling.
Problems
Solution
Added proper error propagation and conditional state updates.