Skip to content

fix(common): add cleanup for stale file locks to prevent memory leak#44

Closed
echobt wants to merge 1 commit intomainfrom
fix/file-lock-leak
Closed

fix(common): add cleanup for stale file locks to prevent memory leak#44
echobt wants to merge 1 commit intomainfrom
fix/file-lock-leak

Conversation

@echobt
Copy link
Contributor

@echobt echobt commented Feb 4, 2026

Summary

Fixes #5145 and #5142 - File lock HashMaps grow without bound.

Problem

Lock entries accumulate forever without cleanup.

Solution

Added stale lock cleanup mechanism to release old entries.

Fixes #5145 and #5142 - File lock HashMaps grow without bound.

Problem: Lock entries accumulate forever without cleanup.

Solution: Added stale lock cleanup mechanism that removes entries where
no external reference exists (strong_count == 1). Cleanup triggers
automatically when the map reaches MAX_LOCK_ENTRIES (10,000).
@greptile-apps
Copy link

greptile-apps bot commented Feb 4, 2026

Greptile Overview

Greptile Summary

This PR addresses memory leaks caused by unbounded growth of file lock HashMaps by implementing automatic cleanup of stale lock entries.

Key Changes:

  • Added MAX_LOCK_ENTRIES constant (10,000) as cleanup threshold in both file_locking.rs and session_store.rs
  • Implemented cleanup_stale_entries() using Arc reference counting to identify and remove unused locks
  • Cleanup automatically triggers when HashMap size reaches the threshold
  • Added public cleanup() and lock_count() methods to FileLockManager for manual maintenance

Implementation Approach:
The cleanup mechanism leverages Arc::strong_count() to detect stale entries - when only the HashMap holds a reference (count == 1), the lock is no longer in use and can be safely removed. This is a sound approach that prevents memory leaks without breaking existing lock holders.

Confidence Score: 4/5

  • This PR is safe to merge with proper testing of the cleanup mechanism
  • The implementation correctly uses Arc reference counting for cleanup and is thread-safe. The threshold-based approach (10,000 entries) is reasonable. However, no tests were added for the new cleanup functionality, and the choice of threshold is not documented with justification
  • Both files should have tests added to verify cleanup behavior under load

Important Files Changed

Filename Overview
src/cortex-common/src/file_locking.rs Added automatic cleanup of stale lock entries using Arc reference counting, triggered at 10,000 entries
src/cortex-resume/src/session_store.rs Added same stale lock cleanup mechanism to session store's file lock manager

Sequence Diagram

sequenceDiagram
    participant Caller
    participant FileLockManager
    participant HashMap
    participant Arc
    
    Caller->>FileLockManager: get_lock(path)
    FileLockManager->>HashMap: lock().unwrap()
    HashMap-->>FileLockManager: MutexGuard<HashMap>
    
    alt locks.len() >= MAX_LOCK_ENTRIES (10,000)
        FileLockManager->>FileLockManager: cleanup_stale_entries(&mut locks)
        loop For each entry in HashMap
            FileLockManager->>Arc: strong_count(arc)
            Arc-->>FileLockManager: count
            alt count == 1
                FileLockManager->>HashMap: remove entry (stale)
            else count > 1
                FileLockManager->>HashMap: keep entry (in use)
            end
        end
    end
    
    FileLockManager->>HashMap: entry(path).or_insert_with(...)
    HashMap-->>FileLockManager: Arc<Mutex<()>>
    FileLockManager->>Arc: clone()
    Arc-->>FileLockManager: Arc<Mutex<()>> (incremented count)
    FileLockManager-->>Caller: Arc<Mutex<()>>
    
    Note over Caller,Arc: Caller uses lock for operations
    
    Caller->>Caller: drop Arc reference
    Note over Arc: strong_count decrements
    
    alt Last external reference dropped
        Note over HashMap: Entry becomes stale (count == 1)
        Note over HashMap: Will be cleaned up when threshold reached
    end
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +604 to +608
fn cleanup_stale_entries(
locks: &mut std::collections::HashMap<PathBuf, Arc<std::sync::Mutex<()>>>,
) {
locks.retain(|_, arc| Arc::strong_count(arc) > 1);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test that creates >10,000 lock entries and verifies cleanup behavior

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cortex-common/src/file_locking.rs
Line: 604:608

Comment:
Consider adding a test that creates >10,000 lock entries and verifies cleanup behavior

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

echobt added a commit that referenced this pull request Feb 4, 2026
This PR consolidates the following memory and storage fixes:
- #44: Add cleanup for stale file locks to prevent memory leak
- #45: Add cache size limits to prevent unbounded memory growth
- #47: Add fsync after file writes to prevent data loss
- #50: Bound ToolResponseStore size and cleanup consumed entries
- #51: Eliminate TOCTOU races in MCP server and plugin registry
- #52: Improve path validation and tilde expansion

Key changes:
- Added periodic cleanup of stale file locks
- Implemented LRU cache limits for config discovery and tokenizer
- Added fsync calls after critical file writes
- Created bounded ToolResponseStore with automatic cleanup
- Fixed time-of-check-time-of-use races
- Improved path validation security
@echobt
Copy link
Contributor Author

echobt commented Feb 4, 2026

Consolidated into #80 - fix: consolidated memory and storage improvements

@echobt echobt closed this Feb 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant