feat: KV Cache Manager - POSIX shared memory pools for LLM inference (#221) #236
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,130 @@ | ||||||
| # KV-Cache Manager | ||||||
|
|
||||||
| **Bounty:** cortexlinux/cortex#221 | ||||||
| **Author:** Yair Siegel | ||||||
| **Value:** $175 | ||||||
|
|
||||||
| ## Overview | ||||||
|
|
||||||
| User-space KV-cache management for LLM inference. Manages transformer key-value caches as first-class system resources with POSIX shared memory pools and multiple eviction policies. | ||||||
|
|
||||||
| ## Features | ||||||
|
|
||||||
| - **Bitmap Block Allocator**: Thread-safe first-fit allocation | ||||||
| - **4 Eviction Policies**: LRU, LFU, FIFO, Priority | ||||||
| - **Prefix-Based Sharing**: Share cache across requests with same prompt prefix | ||||||
| - **Persistence**: Save/restore cache to disk | ||||||
| - **Multi-Pool Management**: Create and manage multiple cache pools | ||||||
| - **Memory Tiers**: CPU, GPU, NVMe support | ||||||
|
|
||||||
| ## Usage | ||||||
|
|
||||||
| ```bash | ||||||
| # Create a cache pool | ||||||
| cortex cache create llama-cache --size 16G --tier cpu --policy lru | ||||||
|
|
||||||
| # Check status | ||||||
| cortex cache status llama-cache | ||||||
|
|
||||||
| # Evict 25% of entries | ||||||
| cortex cache evict llama-cache --percent 25 | ||||||
|
|
||||||
| # Persist to disk | ||||||
| cortex cache persist llama-cache --path /tmp/llama-cache.dat | ||||||
|
|
||||||
| # Restore from disk | ||||||
| cortex cache restore /tmp/llama-cache.dat | ||||||
|
|
||||||
| # List all pools | ||||||
| cortex cache status | ||||||
|
|
||||||
| # Delete pool | ||||||
| cortex cache delete llama-cache | ||||||
| ``` | ||||||
|
|
||||||
| ## Memory Layout | ||||||
|
|
||||||
| ``` | ||||||
| ┌──────────────────┐ | ||||||
| │ Header (4KB) │ Magic, version, config | ||||||
| ├──────────────────┤ | ||||||
| │ Bitmap (4KB) │ Free list (1 bit per block) | ||||||
| ├──────────────────┤ | ||||||
| │ Data Region │ KV tensors (4KB blocks) | ||||||
| └──────────────────┘ | ||||||
| ``` | ||||||
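A worked check of the layout above, as a sketch under stated assumptions. If the diagram's sizes are taken literally (4 KB bitmap, 1 bit per block, 4 KB blocks), the bitmap can track 32,768 blocks, or 128 MiB of data. A 16G pool like the one in the usage example would need 4,194,304 bits, a 512 KiB bitmap, so the bitmap presumably scales with pool size. The class `BitmapAllocatorSketch` and its methods are hypothetical names for illustration and are not taken from the PR's code.

```python
import threading

BLOCK_SIZE = 4096      # bytes per data block, from the diagram
BITMAP_BYTES = 4096    # bitmap size, from the diagram

# One bit per block: how much data a 4 KB bitmap can track.
MAX_BLOCKS = BITMAP_BYTES * 8              # 32768 blocks
MAX_DATA_BYTES = MAX_BLOCKS * BLOCK_SIZE   # 134217728 bytes = 128 MiB


class BitmapAllocatorSketch:
    """Hypothetical first-fit allocator; not the PR's implementation."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # bit 0 = free, 1 = used
        self.lock = threading.Lock()

    def _used(self, i: int) -> bool:
        return bool(self.bits[i // 8] >> (i % 8) & 1)

    def _mark(self, start: int, count: int, used: bool) -> None:
        for i in range(start, start + count):
            if used:
                self.bits[i // 8] |= 1 << (i % 8)
            else:
                self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def allocate(self, count: int) -> int:
        """Return the start of the first free run of `count` blocks, or -1."""
        if count <= 0:
            raise ValueError("count must be positive")
        with self.lock:
            run_start, run_len = 0, 0
            for i in range(self.num_blocks):
                if self._used(i):
                    run_len = 0          # run broken by a used block
                    continue
                if run_len == 0:
                    run_start = i        # a new free run begins here
                run_len += 1
                if run_len == count:
                    self._mark(run_start, count, True)
                    return run_start
            return -1

    def free(self, start: int, count: int) -> None:
        """Mark `count` blocks starting at `start` as free again."""
        with self.lock:
            self._mark(start, count, False)
```

First-fit keeps allocation simple at the cost of a linear scan; a real allocator might scan whole bytes or words at a time.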
|
|
||||||
| ## Eviction Policies | ||||||
|
|
||||||
| | Policy | Description | Use Case | | ||||||
| |--------|-------------|----------| | ||||||
| | LRU | Least Recently Used | General purpose, access pattern varies | | ||||||
| | LFU | Least Frequently Used | Hot/cold access patterns | | ||||||
| | FIFO | First In First Out | Streaming, time-based expiry | | ||||||
| | Priority | User-defined priority | Critical prompts, VIP users | | ||||||
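The four policies in the table differ only in which attribute decides the next victim. The sketch below illustrates that idea and is not the PR's eviction manager: the `Entry` fields, the `VICTIM_KEY` mapping, and `pick_victim` are hypothetical, and it assumes that under the Priority policy the entry with the lowest priority value is evicted first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entry:
    """Hypothetical cache entry metadata; not the PR's CacheEntry."""
    key: str
    created_at: float    # insertion time
    last_access: float   # time of most recent get
    access_count: int    # number of gets so far
    priority: int        # assumed: lower value is evicted first


# Each policy evicts the entry that minimises one attribute.
VICTIM_KEY: Dict[str, Callable[[Entry], float]] = {
    "lru": lambda e: e.last_access,     # least recently used
    "lfu": lambda e: e.access_count,    # least frequently used
    "fifo": lambda e: e.created_at,     # oldest insertion
    "priority": lambda e: e.priority,   # lowest user-defined priority
}


def pick_victim(entries: List[Entry], policy: str) -> Entry:
    """Return the entry the given policy would evict next."""
    return min(entries, key=VICTIM_KEY[policy])


entries = [
    Entry("a", created_at=1, last_access=9, access_count=2, priority=5),
    Entry("b", created_at=2, last_access=3, access_count=7, priority=1),
    Entry("c", created_at=3, last_access=5, access_count=1, priority=9),
]
```

With these sample entries, LRU and Priority both pick `b`, LFU picks `c`, and FIFO picks `a`, which shows how the same pool contents lead to different evictions depending on the policy.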
|
|
||||||
| ## Architecture | ||||||
|
|
||||||
| ``` | ||||||
| ┌─────────────────┐ ┌──────────────────┐ ┌────────────────┐ | ||||||
| │ CLI Interface │────▶│ Cache Store │────▶│ KVCachePool │ | ||||||
| └─────────────────┘ └──────────────────┘ └────────────────┘ | ||||||
| │ | ||||||
| ▼ | ||||||
| ┌──────────────────────────────┐ | ||||||
| │ ┌──────────┐ ┌─────────────┐ │ | ||||||
| │ │ Bitmap │ │ Eviction │ │ | ||||||
| │ │ Allocator│ │ Manager │ │ | ||||||
| │ └──────────┘ └─────────────┘ │ | ||||||
| │ ┌──────────────────────────┐ │ | ||||||
| │ │ Data Region (mmap) │ │ | ||||||
| │ └──────────────────────────┘ │ | ||||||
| └──────────────────────────────┘ | ||||||
| ``` | ||||||
|
|
||||||
| ## Tests | ||||||
|
|
||||||
| 49 unit tests covering: | ||||||
| - Size parsing and formatting utilities | ||||||
| - Cache entry dataclass | ||||||
| - Pool configuration | ||||||
| - Bitmap allocator (allocate, free, serialize) | ||||||
| - Eviction policies (LRU, LFU, FIFO, Priority) | ||||||
| - Pool operations (put, get, delete, evict) | ||||||
| - Prefix-based sharing | ||||||
| - Persistence and restore | ||||||
| - Cache store management | ||||||
| - End-to-end LLM workflows | ||||||
|
|
||||||
| ```bash | ||||||
| python -m pytest test_kv_cache_manager.py -v | ||||||
| ``` | ||||||
|
|
||||||
| ## Example: LLM Inference Cache | ||||||
|
|
||||||
| ```python | ||||||
| from kv_cache_manager import CachePoolConfig, KVCachePool | ||||||
|
||||||
| from kv_cache_manager import CachePoolConfig, KVCachePool | |
| from cortex.kernel_features.kv_cache.kv_cache_manager import CachePoolConfig, KVCachePool |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """\nKV Cache Manager - POSIX shared memory pools for LLM inference\n\nThis module provides user-space cache management for transformer key-value caches\nas first-class system resources with multiple eviction policies.\n\nBounty: cortexlinux/cortex#221\nAuthor: Yair Siegel\n"""\n\nfrom .kv_cache_manager import (\n KVCachePool,\n CacheStore,\n CachePoolConfig,\n CacheEntry,\n EvictionPolicy,\n KVCacheCLI,\n parse_size,\n format_size,\n)\n\n__all__ = [\n 'KVCachePool',\n 'CacheStore',\n 'CachePoolConfig',\n 'CacheEntry',\n 'EvictionPolicy',\n 'KVCacheCLI',\n 'parse_size',\n 'format_size',\n]\n\n__version__ = '1.0.0'\n | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """\nKV Cache Manager - POSIX shared memory pools for LLM inference\n\nThis module provides user-space cache management for transformer key-value caches\nas first-class system resources with multiple eviction policies.\n\nBounty: cortexlinux/cortex#221\nAuthor: Yair Siegel\n"""\n\nfrom .kv_cache_manager import (\n KVCachePool,\n CacheStore,\n CachePoolConfig,\n CacheEntry,\n EvictionPolicy,\n KVCacheCLI,\n parse_size,\n format_size,\n)\n\n__all__ = [\n 'KVCachePool',\n 'CacheStore',\n 'CachePoolConfig',\n 'CacheEntry',\n 'EvictionPolicy',\n 'KVCacheCLI',\n 'parse_size',\n 'format_size',\n]\n\n__version__ = '1.0.0'\n | |
| """ | |
| KV Cache Manager - POSIX shared memory pools for LLM inference | |
| This module provides user-space cache management for transformer key-value caches | |
| as first-class system resources with multiple eviction policies. | |
| Bounty: cortexlinux/cortex#221 | |
| Author: Yair Siegel | |
| """\nfrom .kv_cache_manager import (\n KVCachePool,\n CacheStore,\n CachePoolConfig,\n CacheEntry,\n EvictionPolicy,\n KVCacheCLI,\n parse_size,\n format_size,\n)\n\n__all__ = [\n 'KVCachePool',\n 'CacheStore',\n 'CachePoolConfig',\n 'CacheEntry',\n 'EvictionPolicy',\n 'KVCacheCLI',\n 'parse_size',\n 'format_size',\n]\n\n__version__ = '1.0.0'\n |
Copilot
AI
Dec 4, 2025
Syntax Error (in Python 3).
| """\nKV Cache Manager - POSIX shared memory pools for LLM inference\n\nThis module provides user-space cache management for transformer key-value caches\nas first-class system resources with multiple eviction policies.\n\nBounty: cortexlinux/cortex#221\nAuthor: Yair Siegel\n"""\n\nfrom .kv_cache_manager import (\n KVCachePool,\n CacheStore,\n CachePoolConfig,\n CacheEntry,\n EvictionPolicy,\n KVCacheCLI,\n parse_size,\n format_size,\n)\n\n__all__ = [\n 'KVCachePool',\n 'CacheStore',\n 'CachePoolConfig',\n 'CacheEntry',\n 'EvictionPolicy',\n 'KVCacheCLI',\n 'parse_size',\n 'format_size',\n]\n\n__version__ = '1.0.0'\n | |
| """ | |
| KV Cache Manager - POSIX shared memory pools for LLM inference | |
| This module provides user-space cache management for transformer key-value caches | |
| as first-class system resources with multiple eviction policies. | |
| Bounty: cortexlinux/cortex#221 | |
| Author: Yair Siegel | |
| """ | |
| from .kv_cache_manager import ( | |
| KVCachePool, | |
| CacheStore, | |
| CachePoolConfig, | |
| CacheEntry, | |
| EvictionPolicy, | |
| KVCacheCLI, | |
| parse_size, | |
| format_size, | |
| ) | |
| __all__ = [ | |
| 'KVCachePool', | |
| 'CacheStore', | |
| 'CachePoolConfig', | |
| 'CacheEntry', | |
| 'EvictionPolicy', | |
| 'KVCacheCLI', | |
| 'parse_size', | |
| 'format_size', | |
| ] | |
| __version__ = '1.0.0' |
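The error flagged above can be reproduced in isolation. In the committed file, the `\n` sequences are literal backslash-plus-`n` characters on a single physical line. Inside the docstring they are harmless escapes, but after the closing `"""` the backslash is read as a line-continuation character followed by stray text, which Python 3 rejects. The sketch below uses a shortened stand-in for the file contents; the `compiles` helper is a hypothetical name.

```python
# A single physical line with literal backslash-n sequences, shaped like the
# flagged __init__.py (shortened stand-in, not the full file).
broken = r'"""Docstring."""\n\n__version__ = "1.0.0"\n'


def compiles(source: str) -> bool:
    """Return True if `source` parses as a Python module."""
    try:
        compile(source, "<init>", "exec")
        return True
    except SyntaxError:
        return False


# Turning each literal backslash-n into a real newline yields valid source.
fixed = broken.replace("\\n", "\n")
```

Here `compiles(broken)` is false and `compiles(fixed)` is true, which matches the suggested change: the same statements, split across real lines.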
The test command in the documentation is inconsistent with the actual test framework. The README says to run the tests with `python -m pytest test_kv_cache_manager.py -v`, but the tests use `unittest`. While pytest can run unittest tests, the documentation should be consistent. Either document the unittest command (`python -m unittest test_kv_cache_manager.py -v`) or ensure pytest is the intended test runner.
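For context, a test written against `unittest.TestCase` is collected by both `python -m unittest` and `python -m pytest`, so the choice is about documentation rather than compatibility. The sketch below shows such a test. `parse_size_sketch` is a hypothetical stand-in written for this example; it is not the PR's `parse_size`, whose exact behavior is not shown in this diff.

```python
import unittest


def parse_size_sketch(text: str) -> int:
    """Hypothetical stand-in: parse strings like '16G' into bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


class TestParseSizeSketch(unittest.TestCase):
    """Plain unittest test case; pytest discovers it without changes."""

    def test_gigabytes(self):
        self.assertEqual(parse_size_sketch("16G"), 16 * 1024 ** 3)

    def test_plain_bytes(self):
        self.assertEqual(parse_size_sketch("4096"), 4096)
```

If the suite sticks to `unittest.TestCase` and avoids pytest-only features such as fixtures, documenting the `unittest` command keeps the README free of an extra dependency.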