fix(config): add validation for threshold, ratio, and token count fields#55
Conversation
Fixes #5292 and #5293 - ToolResponseStore memory issues.

Problem:
1. Store grows without limit
2. Consumed responses are not removed

Solution:
- Added ToolResponseStore with configurable max size (default: 500 entries)
- Entries are removed when consumed via take() method (#5293)
- Automatic periodic cleanup of expired entries based on TTL
- Eviction of oldest entries when capacity is reached (#5292)

Features:
- MAX_STORE_SIZE constant (500) to prevent unbounded growth
- DEFAULT_TTL (5 minutes) for automatic expiration
- CLEANUP_INTERVAL (1 minute) for periodic cleanup
- get() for peeking without removal
- take() for consuming and removing entries
- cleanup_expired() and cleanup_read() for manual cleanup
- Stats tracking for monitoring store behavior
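The store/take/get behavior described above can be sketched in a minimal, single-threaded form. This is an illustrative model, not the PR's actual code: the real store is async (tokio), holds full `ToolResult` values rather than strings, and includes interval-based cleanup and stats; names like `StoredResponse` are stand-ins.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical constants mirroring the values described in the PR.
const MAX_STORE_SIZE: usize = 500;
const DEFAULT_TTL: Duration = Duration::from_secs(5 * 60);

struct StoredResponse {
    result: String, // stand-in for the real ToolResult type
    stored_at: Instant,
    read: bool,
}

struct ToolResponseStore {
    responses: HashMap<String, StoredResponse>,
    ttl: Duration,
}

impl ToolResponseStore {
    fn new() -> Self {
        Self { responses: HashMap::new(), ttl: DEFAULT_TTL }
    }

    /// Insert a response, evicting the oldest entry when at capacity (#5292).
    /// Returns true if an eviction occurred.
    fn store(&mut self, call_id: &str, result: &str) -> bool {
        self.cleanup_expired();
        let mut evicted = false;
        if self.responses.len() >= MAX_STORE_SIZE && !self.responses.contains_key(call_id) {
            // Evict the entry with the oldest timestamp.
            if let Some(oldest) = self
                .responses
                .iter()
                .min_by_key(|(_, r)| r.stored_at)
                .map(|(k, _)| k.clone())
            {
                self.responses.remove(&oldest);
                evicted = true;
            }
        }
        self.responses.insert(
            call_id.to_string(),
            StoredResponse {
                result: result.to_string(),
                stored_at: Instant::now(),
                read: false,
            },
        );
        evicted
    }

    /// Peek without removal; marks the entry as read.
    fn get(&mut self, call_id: &str) -> Option<String> {
        self.responses.get_mut(call_id).map(|r| {
            r.read = true;
            r.result.clone()
        })
    }

    /// Consume: remove the entry and return its result (#5293).
    fn take(&mut self, call_id: &str) -> Option<String> {
        self.responses.remove(call_id).map(|r| r.result)
    }

    /// Drop entries older than the TTL.
    fn cleanup_expired(&mut self) {
        let ttl = self.ttl;
        self.responses.retain(|_, r| r.stored_at.elapsed() < ttl);
    }
}
```

The key contrast is that `get()` clones and leaves the entry in place, while `take()` removes it, which is what keeps consumed responses from accumulating.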
Greptile Overview

Greptile Summary

PR Description Mismatch: The PR description mentions adding validation for threshold, ratio, and token count fields, which does not match this change.

Actual Changes: This PR adds a new bounded storage system for tool responses that addresses unbounded memory growth (#5292) and missing cleanup on read (#5293):
Issues Found:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| src/cortex-engine/src/tools/mod.rs | Added module declaration and public exports for new response_store module |
| src/cortex-engine/src/tools/response_store.rs | New bounded tool response storage with TTL cleanup and capacity management; minor issues: unused config field and import |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Store as ToolResponseStore
    participant HashMap as responses: HashMap
    participant Cleanup as Periodic Cleanup

    Note over Client,Cleanup: Storage Flow with Bounded Capacity
    Client->>Store: store(call_id, tool_name, result)
    Store->>Store: maybe_cleanup() - check if cleanup needed
    alt Cleanup interval exceeded
        Store->>HashMap: cleanup_expired() - remove stale entries
    end
    alt Store at capacity (>= MAX_STORE_SIZE)
        Store->>HashMap: find_oldest_key()
        HashMap-->>Store: oldest_key
        Store->>HashMap: remove(oldest_key) - evict oldest
        Note over Store: Eviction prevents unbounded growth
    end
    Store->>HashMap: insert(call_id, response)
    Store-->>Client: evicted: bool

    Note over Client,HashMap: Consuming Response (take)
    Client->>Store: take(call_id)
    Store->>HashMap: remove(call_id)
    HashMap-->>Store: Some(response)
    Store-->>Client: Some(ToolResult)
    Note over Store: Entry removed after consumption (#5293)

    Note over Client,HashMap: Peeking Response (get)
    Client->>Store: get(call_id)
    Store->>HashMap: get_mut(call_id)
    HashMap-->>Store: Some(response)
    Store->>Store: Mark response.read = true
    Store-->>Client: Some(ToolResult.clone())
    Note over HashMap: Entry remains in store
```
```rust
use std::time::{Duration, Instant};

use tokio::sync::RwLock;
use tracing::{debug, warn};
```
`warn` import is unused
```suggestion
use tracing::debug;
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
```rust
/// Time-to-live for stored responses.
pub ttl: Duration,
/// Whether to remove entries on read (peek vs consume).
pub remove_on_read: bool,
```
`remove_on_read` config field is never used - neither `get()` nor `take()` check this field to determine behavior
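One possible way to resolve this review comment is to have `get()` branch on the flag, consuming when `remove_on_read` is set and peeking otherwise. The types below are a minimal sketch, not the PR's actual API: the real store is async, guards its map with a `RwLock`, and holds `ToolResult` values rather than strings.

```rust
use std::collections::HashMap;

// Illustrative config and store types (hypothetical names).
struct StoreConfig {
    remove_on_read: bool,
}

struct Store {
    config: StoreConfig,
    responses: HashMap<String, String>,
}

impl Store {
    /// Honor the flag: consume when `remove_on_read` is set,
    /// otherwise peek and leave the entry in place.
    fn get(&mut self, call_id: &str) -> Option<String> {
        if self.config.remove_on_read {
            self.responses.remove(call_id)
        } else {
            self.responses.get(call_id).cloned()
        }
    }
}
```

The alternative fix is simpler still: drop the field entirely and let callers choose between `get()` and `take()` explicitly.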
This PR consolidates the following error handling fixes:
- #48: Handle semaphore and init failures gracefully in async_utils
- #54: Improve error handling in session storage operations (includes TOCTOU race fixes)
- #55: Add validation for threshold, ratio, and token count fields
- #56: Replace unwrap with proper error handling for client access
- #57: Use unwrap_or_default for SystemTime operations
- #61: Handle invalid request-id header values gracefully
- #65: Improve error handling for timestamp and JSON operations in streaming

Key changes:
- Added graceful handling for semaphore and init failures
- Bound ToolResponseStore size and cleanup consumed entries
- Eliminated TOCTOU races in MCP server and plugin registry
- Replaced unwrap() with proper error handling throughout
- Added validation for config fields
- Improved error propagation in middleware
Consolidated into #73 - fix: consolidated error handling improvements
Summary
Fixes #5132, #5125, #5127 - Configuration field validation.
Problems
Solution
Added validation and appropriate type constraints.
Changes:
- `threshold_percent` to enforce 0.0-1.0 range
- `sampling_ratio` to enforce 0.0-1.0 range; also validates env var parsing
- `model_context_window` and `model_auto_compact_token_limit` changed from `i64` to `u64` to prevent negative values
- `TokenUsageInfo` and `TaskStartedEvent` structs changed to use `u64` for consistency
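The range checks described above can be sketched as follows. Function names and error strings are illustrative, not the PR's actual API; the second helper shows why env var values need parse validation on top of the range check, since they arrive as strings.

```rust
// Hedged sketch of 0.0-1.0 range validation for fields like
// threshold_percent and sampling_ratio.
fn validate_unit_range(name: &str, value: f64) -> Result<f64, String> {
    // RangeInclusive::contains also rejects NaN, since NaN compares false.
    if (0.0..=1.0).contains(&value) {
        Ok(value)
    } else {
        Err(format!("{name} must be between 0.0 and 1.0, got {value}"))
    }
}

// Env var values arrive as strings, so parsing is validated too.
fn parse_sampling_ratio(raw: &str) -> Result<f64, String> {
    raw.parse::<f64>()
        .map_err(|e| format!("sampling_ratio is not a valid number: {e}"))
        .and_then(|v| validate_unit_range("sampling_ratio", v))
}
```

The `i64` to `u64` change mentioned above serves the same goal at the type level: a token count that cannot be negative needs no runtime sign check.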