-
Notifications
You must be signed in to change notification settings - Fork 3
fix(server): eliminate TOCTOU races in MCP server and plugin registry #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes #5292 and #5293 - ToolResponseStore memory issues. Problem: 1. Store grows without limit 2. Consumed responses not removed Solution: - Added ToolResponseStore with configurable max size (default: 500 entries) - Entries are removed when consumed via take() method (#5293) - Automatic periodic cleanup of expired entries based on TTL - Eviction of oldest entries when capacity is reached (#5292) Features: - MAX_STORE_SIZE constant (500) to prevent unbounded growth - DEFAULT_TTL (5 minutes) for automatic expiration - CLEANUP_INTERVAL (1 minute) for periodic cleanup - get() for peeking without removal - take() for consuming and removing entries - cleanup_expired() and cleanup_read() for manual cleanup - Stats tracking for monitoring store behavior
Greptile OverviewGreptile SummaryCritical Issue: PR description completely mismatches the actual changes. The description mentions fixing TOCTOU race conditions in What Actually ChangedThis PR adds a new bounded storage system (
Issues Found
TestingThe implementation includes comprehensive unit tests covering all major functionality including eviction, expiration, and stats tracking. Confidence Score: 3/5
|
| Filename | Overview |
|---|---|
| src/cortex-engine/src/tools/mod.rs | Added module declaration and public exports for new response_store module |
| src/cortex-engine/src/tools/response_store.rs | New bounded storage implementation for tool responses with TTL and eviction, but has unused config field and PR description mismatch |
Sequence Diagram
sequenceDiagram
participant Client
participant Store as ToolResponseStore
participant Responses as HashMap<String, StoredResponse>
participant Stats as StoreStats
Note over Client,Stats: Store Operation
Client->>Store: store(call_id, tool_name, result)
Store->>Store: maybe_cleanup()
Store->>Responses: write().await
alt At Capacity (len >= max_size)
Store->>Responses: find_oldest_key()
Store->>Responses: remove(oldest_key)
Note over Store: evicted = true
end
Store->>Responses: insert(call_id, StoredResponse)
Store->>Stats: write().await
Store->>Stats: total_stored += 1, evictions += 1?
Store-->>Client: evicted: bool
Note over Client,Stats: Get Operation (Peek)
Client->>Store: get(call_id)
Store->>Responses: write().await
alt Entry Exists
Store->>Responses: get_mut(call_id).read = true
Store->>Stats: write().await
Store->>Stats: reads += 1
Store-->>Client: Some(result.clone())
else Entry Not Found
Store-->>Client: None
end
Note over Client,Stats: Take Operation (Consume)
Client->>Store: take(call_id)
Store->>Responses: write().await
alt Entry Exists
Store->>Responses: remove(call_id)
Store->>Stats: write().await
Store->>Stats: takes += 1
Store-->>Client: Some(result)
else Entry Not Found
Store-->>Client: None
end
Note over Client,Stats: Periodic Cleanup
Store->>Store: maybe_cleanup()
Store->>Store: Check if CLEANUP_INTERVAL elapsed
alt Should Cleanup
Store->>Store: cleanup_expired()
Store->>Responses: write().await
Store->>Responses: retain(!is_expired(ttl))
Store->>Stats: expired_cleanups += removed
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 files reviewed, 1 comment
| /// Time-to-live for stored responses. | ||
| pub ttl: Duration, | ||
| /// Whether to remove entries on read (peek vs consume). | ||
| pub remove_on_read: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove_on_read config field is never used - the get() method at line 186 always just marks entries as read without checking this setting
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cortex-engine/src/tools/response_store.rs
Line: 72:72
Comment:
`remove_on_read` config field is never used - the `get()` method at line 186 always just marks entries as read without checking this setting
How can I resolve this? If you propose a fix, please make it concise.This PR consolidates the following memory and storage fixes: - #44: Add cleanup for stale file locks to prevent memory leak - #45: Add cache size limits to prevent unbounded memory growth - #47: Add fsync after file writes to prevent data loss - #50: Bound ToolResponseStore size and cleanup consumed entries - #51: Eliminate TOCTOU races in MCP server and plugin registry - #52: Improve path validation and tilde expansion Key changes: - Added periodic cleanup of stale file locks - Implemented LRU cache limits for config discovery and tokenizer - Added fsync calls after critical file writes - Created bounded ToolResponseStore with automatic cleanup - Fixed time-of-check-time-of-use races - Improved path validation security
|
Consolidated into #80 - fix: consolidated memory and storage improvements |
Summary
Fixes #5262 and #5260 - TOCTOU race conditions.
Problem
State checks and modifications not atomic, allowing races.
Solution
Hold locks during entire check-modify sequences to ensure atomicity.
Changes
handle_initializeto prevent concurrent initialization races