28 changes: 22 additions & 6 deletions src/cortex-engine/src/config/config_discovery.rs
@@ -4,20 +4,36 @@
//! with caching support for performance in monorepo environments.

use std::collections::HashMap;
use std::hash::Hash;
use std::path::{Path, PathBuf};
use std::sync::{LazyLock, RwLock};

use tracing::{debug, trace};

/// Maximum number of entries in each cache to prevent unbounded memory growth.
const MAX_CACHE_SIZE: usize = 1000;

/// Cache for discovered config paths.
/// Key is the start directory, value is the found config path (or None).
static CONFIG_CACHE: LazyLock<RwLock<HashMap<PathBuf, Option<PathBuf>>>> =
-    LazyLock::new(|| RwLock::new(HashMap::new()));
+    LazyLock::new(|| RwLock::new(HashMap::with_capacity(MAX_CACHE_SIZE)));

/// Cache for project roots.
/// Key is the start directory, value is the project root path.
static PROJECT_ROOT_CACHE: LazyLock<RwLock<HashMap<PathBuf, Option<PathBuf>>>> =
-    LazyLock::new(|| RwLock::new(HashMap::new()));
+    LazyLock::new(|| RwLock::new(HashMap::with_capacity(MAX_CACHE_SIZE)));

/// Insert a key-value pair into the cache with eviction when full.
/// When the cache reaches MAX_CACHE_SIZE, removes an arbitrary entry before inserting.
fn insert_with_eviction<K: Eq + Hash + Clone, V>(cache: &mut HashMap<K, V>, key: K, value: V) {
    if cache.len() >= MAX_CACHE_SIZE {
Review comment (src/cortex-engine/src/config/config_discovery.rs, line 29):

Edge case: if the cache is exactly at `MAX_CACHE_SIZE` and we insert a key that already exists (updating its value), we'll unnecessarily evict an entry first. Consider checking if the key exists before eviction:

```
if cache.len() >= MAX_CACHE_SIZE && !cache.contains_key(&key) {
```

        // Remove first entry (simple eviction strategy)
        if let Some(k) = cache.keys().next().cloned() {
            cache.remove(&k);
        }
    }
    cache.insert(key, value);
}
Comment on lines +28 to +36
Review comment:

HashMap iteration order is non-deterministic in Rust - `keys().next()` can return different entries across runs, making eviction unpredictable. Consider using a proper LRU cache (see `src/cortex-file-search/src/cache.rs` for an example with `access_order` tracking) or a simpler FIFO with `VecDeque` to track insertion order.

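The two review comments above combine into one fix: evict only when inserting a genuinely new key, and track insertion order explicitly so eviction is deterministic. A minimal FIFO-bounded sketch (the `FifoCache` type is illustrative, not the crate's actual implementation):

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

const MAX_CACHE_SIZE: usize = 1000;

/// FIFO-bounded map: evicts the oldest *inserted* key once full, so
/// eviction order is deterministic (unlike `HashMap::keys().next()`).
struct FifoCache<K: Eq + Hash + Clone, V> {
    map: HashMap<K, V>,
    order: VecDeque<K>,
}

impl<K: Eq + Hash + Clone, V> FifoCache<K, V> {
    fn new() -> Self {
        Self {
            map: HashMap::with_capacity(MAX_CACHE_SIZE),
            order: VecDeque::with_capacity(MAX_CACHE_SIZE),
        }
    }

    fn insert(&mut self, key: K, value: V) {
        // Updating an existing key must not trigger eviction.
        if !self.map.contains_key(&key) {
            if self.map.len() >= MAX_CACHE_SIZE {
                // Evict the oldest entry by insertion order.
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
            self.order.push_back(key.clone());
        }
        self.map.insert(key, value);
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.map.get(key)
    }
}
```

A full LRU would additionally move a key to the back of `order` on every `get`; FIFO keeps the hot path allocation-free at the cost of sometimes evicting a frequently-read entry.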

/// Markers that indicate a project root directory.
const PROJECT_ROOT_MARKERS: &[&str] = &[
@@ -57,9 +73,9 @@ pub fn find_up(start_dir: &Path, filename: &str) -> Option<PathBuf> {

    let result = find_up_uncached(start_dir, filename);

-    // Store in cache
+    // Store in cache with eviction when full
    if let Ok(mut cache) = CONFIG_CACHE.write() {
-        cache.insert(cache_key, result.clone());
+        insert_with_eviction(&mut cache, cache_key, result.clone());
    }

    result
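The `find_up` hunk above follows a read-then-write `RwLock` caching pattern: a shared read lock on the fast path, and a write lock only on a miss. It can be sketched in isolation (the `cached_lookup` helper and its closure parameter are hypothetical, introduced here only to show the locking shape):

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{LazyLock, RwLock};

static CACHE: LazyLock<RwLock<HashMap<PathBuf, Option<PathBuf>>>> =
    LazyLock::new(|| RwLock::new(HashMap::new()));

/// Read-lock fast path; write-lock only on a miss. Two threads may race
/// to compute the same result, but the second insert just overwrites an
/// identical value, which is benign for this kind of derived data.
fn cached_lookup(
    dir: &Path,
    compute: impl Fn(&Path) -> Option<PathBuf>,
) -> Option<PathBuf> {
    if let Ok(cache) = CACHE.read() {
        if let Some(hit) = cache.get(dir) {
            return hit.clone();
        }
    }
    let result = compute(dir);
    if let Ok(mut cache) = CACHE.write() {
        cache.insert(dir.to_path_buf(), result.clone());
    }
    result
}
```

Note that `if let Ok(...)` silently skips caching on a poisoned lock, matching the diff's behavior: a panic elsewhere degrades the cache rather than propagating.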
@@ -169,9 +185,9 @@ pub fn find_project_root(start_dir: &Path) -> Option<PathBuf> {

    let result = find_project_root_uncached(start_dir);

-    // Store in cache
+    // Store in cache with eviction when full
    if let Ok(mut cache) = PROJECT_ROOT_CACHE.write() {
-        cache.insert(start_dir.to_path_buf(), result.clone());
+        insert_with_eviction(&mut cache, start_dir.to_path_buf(), result.clone());
    }

    result
22 changes: 19 additions & 3 deletions src/cortex-engine/src/tokenizer.rs
@@ -3,9 +3,25 @@
//! Provides token counting and text tokenization for various models.

use std::collections::HashMap;
use std::hash::Hash;

use serde::{Deserialize, Serialize};

/// Maximum number of entries in the token cache to prevent unbounded memory growth.
const MAX_CACHE_SIZE: usize = 1000;

/// Insert a key-value pair into the cache with eviction when full.
/// When the cache reaches MAX_CACHE_SIZE, removes an arbitrary entry before inserting.
fn insert_with_eviction<K: Eq + Hash + Clone, V>(cache: &mut HashMap<K, V>, key: K, value: V) {
    if cache.len() >= MAX_CACHE_SIZE {
Review comment (src/cortex-engine/src/tokenizer.rs, line 16):

Edge case: if the cache is exactly at `MAX_CACHE_SIZE` and we insert a key that already exists (updating its value), we'll unnecessarily evict an entry first. Consider checking if the key exists before eviction:

```
if cache.len() >= MAX_CACHE_SIZE && !cache.contains_key(&key) {
```

        // Remove first entry (simple eviction strategy)
        if let Some(k) = cache.keys().next().cloned() {
            cache.remove(&k);
        }
    }
    cache.insert(key, value);
}
Comment on lines +15 to +23
Review comment:

HashMap iteration order is non-deterministic in Rust - `keys().next()` returns an arbitrary entry, not necessarily the oldest. For token caching, an LRU policy would be more effective since frequently-used text patterns should remain cached. Consider using a `VecDeque` or the existing LRU pattern from `src/cortex-file-search/src/cache.rs`.


/// Tokenizer type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
@@ -58,7 +74,7 @@ impl TokenizerType {
pub struct TokenCounter {
    /// Tokenizer type.
    tokenizer: TokenizerType,
-    /// Cache.
+    /// Cache with bounded size to prevent unbounded memory growth.
    cache: HashMap<u64, u32>,
}

@@ -67,7 +83,7 @@ impl TokenCounter {
    pub fn new(tokenizer: TokenizerType) -> Self {
        Self {
            tokenizer,
-            cache: HashMap::new(),
+            cache: HashMap::with_capacity(MAX_CACHE_SIZE),
        }
    }

@@ -85,7 +101,7 @@ impl TokenCounter {
        }

        let count = self.count_uncached(text);
-        self.cache.insert(hash, count);
+        insert_with_eviction(&mut self.cache, hash, count);
        count
    }