fix(engine): add cache size limits to prevent unbounded memory growth #45
Changes from all commits
File: src/cortex-engine/src/config/config_discovery.rs

```diff
@@ -4,20 +4,36 @@
 //! with caching support for performance in monorepo environments.
 
 use std::collections::HashMap;
+use std::hash::Hash;
 use std::path::{Path, PathBuf};
 use std::sync::{LazyLock, RwLock};
 
 use tracing::{debug, trace};
 
+/// Maximum number of entries in each cache to prevent unbounded memory growth.
+const MAX_CACHE_SIZE: usize = 1000;
+
 /// Cache for discovered config paths.
 /// Key is the start directory, value is the found config path (or None).
 static CONFIG_CACHE: LazyLock<RwLock<HashMap<PathBuf, Option<PathBuf>>>> =
-    LazyLock::new(|| RwLock::new(HashMap::new()));
+    LazyLock::new(|| RwLock::new(HashMap::with_capacity(MAX_CACHE_SIZE)));
 
 /// Cache for project roots.
 /// Key is the start directory, value is the project root path.
 static PROJECT_ROOT_CACHE: LazyLock<RwLock<HashMap<PathBuf, Option<PathBuf>>>> =
-    LazyLock::new(|| RwLock::new(HashMap::new()));
+    LazyLock::new(|| RwLock::new(HashMap::with_capacity(MAX_CACHE_SIZE)));
 
+/// Insert a key-value pair into the cache with eviction when full.
+/// When the cache reaches MAX_CACHE_SIZE, removes an arbitrary entry before inserting.
+fn insert_with_eviction<K: Eq + Hash + Clone, V>(cache: &mut HashMap<K, V>, key: K, value: V) {
+    if cache.len() >= MAX_CACHE_SIZE {
+        // Remove first entry (simple eviction strategy)
+        if let Some(k) = cache.keys().next().cloned() {
+            cache.remove(&k);
+        }
+    }
+    cache.insert(key, value);
+}
+
```
|
Review comment on lines +28 to +36 (src/cortex-engine/src/config/config_discovery.rs):

HashMap iteration order is non-deterministic in Rust; `keys().next()` can return different entries across runs, making eviction unpredictable. Consider using a proper LRU cache (see `src/cortex-file-search/src/cache.rs` for an example with `access_order` tracking) or a simpler FIFO with `VecDeque` to track insertion order.
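The reviewer's FIFO suggestion can be sketched as follows. This is a minimal illustration, not code from the PR: the `FifoCache` type and the tiny capacity are hypothetical, and the real caches would keep their `LazyLock`/`RwLock` wrapping. A `VecDeque` records insertion order, so eviction always removes the oldest entry instead of an arbitrary one.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

// Small limit so eviction is easy to observe; the PR uses 1000.
const MAX_CACHE_SIZE: usize = 4;

/// FIFO cache sketch (hypothetical type, not the PR's code): the
/// VecDeque tracks insertion order, so eviction is deterministic,
/// unlike `HashMap::keys().next()`.
struct FifoCache<K: Eq + Hash + Clone, V> {
    map: HashMap<K, V>,
    order: VecDeque<K>,
}

impl<K: Eq + Hash + Clone, V> FifoCache<K, V> {
    fn new() -> Self {
        Self { map: HashMap::new(), order: VecDeque::new() }
    }

    fn insert(&mut self, key: K, value: V) {
        // Only evict when inserting a genuinely new key at capacity;
        // updating an existing key does not grow the map.
        if !self.map.contains_key(&key) {
            if self.map.len() >= MAX_CACHE_SIZE {
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
            self.order.push_back(key.clone());
        }
        self.map.insert(key, value);
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.map.get(key)
    }
}

fn main() {
    let mut cache = FifoCache::new();
    for i in 0..5 {
        cache.insert(i, i * 10);
    }
    // Key 0 was inserted first, so it is the entry evicted.
    assert!(cache.get(&0).is_none());
    assert_eq!(cache.get(&4), Some(&40));
}
```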
```diff
 /// Markers that indicate a project root directory.
 const PROJECT_ROOT_MARKERS: &[&str] = &[

@@ -57,9 +73,9 @@ pub fn find_up(start_dir: &Path, filename: &str) -> Option<PathBuf> {
     let result = find_up_uncached(start_dir, filename);
 
-    // Store in cache
+    // Store in cache with eviction when full
     if let Ok(mut cache) = CONFIG_CACHE.write() {
-        cache.insert(cache_key, result.clone());
+        insert_with_eviction(&mut cache, cache_key, result.clone());
     }
 
     result
```
|
```diff
@@ -169,9 +185,9 @@ pub fn find_project_root(start_dir: &Path) -> Option<PathBuf> {
     let result = find_project_root_uncached(start_dir);
 
-    // Store in cache
+    // Store in cache with eviction when full
     if let Ok(mut cache) = PROJECT_ROOT_CACHE.write() {
-        cache.insert(start_dir.to_path_buf(), result.clone());
+        insert_with_eviction(&mut cache, start_dir.to_path_buf(), result.clone());
     }
 
     result
```
File: src/cortex-engine/src/tokenizer.rs

```diff
@@ -3,9 +3,25 @@
 //! Provides token counting and text tokenization for various models.
 
 use std::collections::HashMap;
+use std::hash::Hash;
 
 use serde::{Deserialize, Serialize};
 
+/// Maximum number of entries in the token cache to prevent unbounded memory growth.
+const MAX_CACHE_SIZE: usize = 1000;
+
+/// Insert a key-value pair into the cache with eviction when full.
+/// When the cache reaches MAX_CACHE_SIZE, removes an arbitrary entry before inserting.
+fn insert_with_eviction<K: Eq + Hash + Clone, V>(cache: &mut HashMap<K, V>, key: K, value: V) {
+    if cache.len() >= MAX_CACHE_SIZE {
+        // Remove first entry (simple eviction strategy)
+        if let Some(k) = cache.keys().next().cloned() {
+            cache.remove(&k);
+        }
+    }
+    cache.insert(key, value);
+}
+
```

Review comment on line 16 (src/cortex-engine/src/tokenizer.rs):

Edge case: if the cache is exactly at `MAX_CACHE_SIZE` and we insert a key that already exists (updating its value), we'll unnecessarily evict an entry first. Consider checking if the key exists before eviction:

```rust
if cache.len() >= MAX_CACHE_SIZE && !cache.contains_key(&key) {
```

Review comment on lines +15 to +23 (src/cortex-engine/src/tokenizer.rs):

HashMap iteration order is non-deterministic in Rust; `keys().next()` returns an arbitrary entry, not necessarily the oldest. For token caching, an LRU policy would be more effective since frequently-used text patterns should remain cached. Consider using a `VecDeque` or the existing LRU pattern from `src/cortex-file-search/src/cache.rs`.
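The `contains_key` guard the first reviewer suggests can be folded into the helper as shown below. This is a standalone sketch, not the PR's actual code: it uses a small hypothetical capacity so the behavior is easy to observe, and drops the crate's surrounding context.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Small limit for demonstration; the PR uses 1000.
const MAX_CACHE_SIZE: usize = 3;

/// Variant of insert_with_eviction incorporating the reviewer's guard:
/// skip eviction when the key already exists, since updating a value
/// in place does not grow the map.
fn insert_with_eviction<K: Eq + Hash + Clone, V>(cache: &mut HashMap<K, V>, key: K, value: V) {
    if cache.len() >= MAX_CACHE_SIZE && !cache.contains_key(&key) {
        // Still an arbitrary eviction; iteration order is non-deterministic.
        if let Some(k) = cache.keys().next().cloned() {
            cache.remove(&k);
        }
    }
    cache.insert(key, value);
}

fn main() {
    let mut cache = HashMap::new();
    insert_with_eviction(&mut cache, "a", 1);
    insert_with_eviction(&mut cache, "b", 2);
    insert_with_eviction(&mut cache, "c", 3);
    // Updating an existing key at capacity no longer evicts anything.
    insert_with_eviction(&mut cache, "a", 10);
    assert_eq!(cache.len(), 3);
    assert_eq!(cache.get("a"), Some(&10));
}
```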
```diff
 /// Tokenizer type.
 #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
 #[serde(rename_all = "snake_case")]

@@ -58,7 +74,7 @@
 pub struct TokenCounter {
     /// Tokenizer type.
     tokenizer: TokenizerType,
-    /// Cache.
+    /// Cache with bounded size to prevent unbounded memory growth.
     cache: HashMap<u64, u32>,
 }

@@ -67,7 +83,7 @@ impl TokenCounter {
     pub fn new(tokenizer: TokenizerType) -> Self {
         Self {
             tokenizer,
-            cache: HashMap::new(),
+            cache: HashMap::with_capacity(MAX_CACHE_SIZE),
         }
     }

@@ -85,7 +101,7 @@ impl TokenCounter {
         }
 
         let count = self.count_uncached(text);
-        self.cache.insert(hash, count);
+        insert_with_eviction(&mut self.cache, hash, count);
         count
     }
```
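Both reviewers point toward the LRU pattern in `src/cortex-file-search/src/cache.rs` with its `access_order` tracking. A minimal sketch of that idea follows; the `LruCache` struct and names here are illustrative assumptions, not the actual cache from that file. A `VecDeque` records access order: `get` moves a key to the back, and eviction pops the least recently used key from the front.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

// Small limit so eviction is easy to observe; the PR uses 1000.
const MAX_CACHE_SIZE: usize = 2;

/// LRU sketch (hypothetical, modeled on the reviewers' suggestion):
/// access_order holds keys from least to most recently used.
struct LruCache<K: Eq + Hash + Clone, V> {
    map: HashMap<K, V>,
    access_order: VecDeque<K>,
}

impl<K: Eq + Hash + Clone, V> LruCache<K, V> {
    fn new() -> Self {
        Self { map: HashMap::new(), access_order: VecDeque::new() }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            // Move the key to the most-recently-used position.
            self.access_order.retain(|k| k != key);
            self.access_order.push_back(key.clone());
        }
        self.map.get(key)
    }

    fn insert(&mut self, key: K, value: V) {
        // Evict the least recently used entry only for genuinely new keys.
        if !self.map.contains_key(&key) && self.map.len() >= MAX_CACHE_SIZE {
            if let Some(lru) = self.access_order.pop_front() {
                self.map.remove(&lru);
            }
        }
        self.access_order.retain(|k| *k != key);
        self.access_order.push_back(key.clone());
        self.map.insert(key, value);
    }
}

fn main() {
    let mut cache = LruCache::new();
    cache.insert("a", 1);
    cache.insert("b", 2);
    cache.get(&"a"); // "a" becomes most recently used
    cache.insert("c", 3); // evicts "b", the least recently used
    assert!(cache.map.get("b").is_none());
    assert_eq!(cache.map.get("a"), Some(&1));
}
```

The linear `retain` scan keeps the sketch short; a production LRU would pair the map with a doubly linked list or a crate such as `lru` for O(1) updates.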