130 changes: 130 additions & 0 deletions cortex/kernel_features/kv_cache/README.md
@@ -0,0 +1,130 @@
# KV-Cache Manager

**Bounty:** cortexlinux/cortex#221
**Author:** Yair Siegel
**Value:** $175

## Overview

User-space KV-cache management for LLM inference. It manages transformer key-value caches as first-class system resources, backed by POSIX shared-memory pools and supporting multiple eviction policies.
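To make "POSIX shared-memory pools" concrete, the sketch below uses Python's standard `multiprocessing.shared_memory` module (which wraps `shm_open`/`mmap` on Linux) to create a segment and attach to it by name. It illustrates the underlying mechanism only and is not this module's pool code; the 64 KiB size is arbitrary.

```python
from multiprocessing import shared_memory

POOL_SIZE = 64 * 1024  # 64 KiB, illustrative only

# Create a POSIX shared-memory segment (backed by /dev/shm on Linux).
# Leaving name unset lets Python generate a unique name, avoiding stale clashes.
owner = shared_memory.SharedMemory(create=True, size=POOL_SIZE)
try:
    owner.buf[:5] = b"hello"  # write into the segment

    # Another handle (normally in another process) attaches by name
    reader = shared_memory.SharedMemory(name=owner.name)
    data = bytes(reader.buf[:5])  # copy the bytes out before closing the view
    reader.close()
finally:
    owner.close()
    owner.unlink()  # remove the segment once every user is done

print(data)  # b'hello'
```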

## Features

- **Bitmap Block Allocator**: Thread-safe first-fit allocation
- **4 Eviction Policies**: LRU, LFU, FIFO, Priority
- **Prefix-Based Sharing**: Share cached entries across requests that have the same prompt prefix
- **Persistence**: Save/restore cache to disk
- **Multi-Pool Management**: Create and manage multiple cache pools
- **Memory Tiers**: CPU, GPU, NVMe support
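Prefix-based sharing can be pictured as keying cached KV data by a hash of the prompt prefix, so requests that start with the same tokens reuse one entry instead of recomputing it. The sketch below is a hypothetical in-memory illustration: `PrefixIndex`, `prefix_key`, SHA-256 keying, and token-ID lists are assumptions, not this module's API (the README's example performs prefix lookup through `pool.find_by_prefix`).

```python
import hashlib


def prefix_key(tokens: list[int]) -> str:
    """Hash a token prefix into a stable cache key (hypothetical keying scheme)."""
    raw = ",".join(map(str, tokens)).encode()
    return hashlib.sha256(raw).hexdigest()


class PrefixIndex:
    """Hypothetical index: prefix hash -> cached KV bytes, plus a sharer count."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}
        self._refcount: dict[str, int] = {}

    def get_or_put(self, tokens: list[int], compute) -> bytes:
        key = prefix_key(tokens)
        if key not in self._entries:
            # Cache miss: compute the KV data once for this prefix
            self._entries[key] = compute(tokens)
            self._refcount[key] = 0
        # Every request with this prefix shares the same stored entry
        self._refcount[key] += 1
        return self._entries[key]

    def sharers(self, tokens: list[int]) -> int:
        return self._refcount.get(prefix_key(tokens), 0)


calls = []


def fake_kv(tokens):
    calls.append(tokens)       # record how often KV data is actually computed
    return bytes(len(tokens))  # placeholder payload


index = PrefixIndex()
system_prompt = [1, 2, 3]
a = index.get_or_put(system_prompt, fake_kv)
b = index.get_or_put(system_prompt, fake_kv)
print(len(calls), index.sharers(system_prompt))  # 1 2
```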

## Usage

```bash
# Create a cache pool
cortex cache create llama-cache --size 16G --tier cpu --policy lru

# Check status
cortex cache status llama-cache

# Evict 25% of entries
cortex cache evict llama-cache --percent 25

# Persist to disk
cortex cache persist llama-cache --path /tmp/llama-cache.dat

# Restore from disk
cortex cache restore /tmp/llama-cache.dat

# List all pools
cortex cache status

# Delete pool
cortex cache delete llama-cache
```
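The `--size 16G` argument implies a human-readable size parser. The package exports `parse_size` and `format_size`, but their exact behaviour is not documented here, so the sketch below is a hypothetical stand-in that assumes binary (1024-based) units and the suffixes `K`, `M`, `G`, `T`.

```python
import re

# Assumption: binary units, so "16G" == 16 * 1024**3 bytes
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size_sketch(text: str) -> int:
    """Convert strings such as '16G' or '512MB' into a byte count (hypothetical)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", text.upper())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def format_size_sketch(num_bytes: int) -> str:
    """Render a byte count using the largest unit that keeps the value >= 1."""
    for unit in ("T", "G", "M", "K"):
        if num_bytes >= _UNITS[unit]:
            return f"{num_bytes / _UNITS[unit]:.1f}{unit}"
    return f"{num_bytes}B"


print(parse_size_sketch("16G"))          # 17179869184
print(format_size_sketch(16 * 1024**3))  # 16.0G
```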

## Memory Layout

```
┌──────────────────┐
│  Header (4KB)    │  Magic, version, config
├──────────────────┤
│  Bitmap (4KB)    │  Free list (1 bit per block)
├──────────────────┤
│  Data Region     │  KV tensors (4KB blocks)
└──────────────────┘
```
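Reading the diagram literally (a 4 KB header, a 4 KB bitmap, then 4 KB data blocks), block offsets follow from simple arithmetic. The sketch below assumes those sizes are fixed and packs a header with `struct`; the magic bytes `KVC1` and the `<4sII` field layout (magic, version, block size) are hypothetical, since the README does not specify the header encoding. It also shows that a bitmap fixed at 4 KB could track at most 32,768 blocks, i.e. 128 MiB of data.

```python
import struct

PAGE = 4096
HEADER_SIZE = PAGE   # "Header (4KB)" in the diagram
BITMAP_SIZE = PAGE   # "Bitmap (4KB)": 1 bit per block
BLOCK_SIZE = PAGE    # data region uses 4 KB blocks
DATA_OFFSET = HEADER_SIZE + BITMAP_SIZE

MAGIC = b"KVC1"       # hypothetical magic bytes
HEADER_FMT = "<4sII"  # hypothetical layout: magic, version, block size


def block_offset(index: int) -> int:
    """Byte offset of data block `index` from the start of the pool."""
    return DATA_OFFSET + index * BLOCK_SIZE


def max_blocks_for_bitmap() -> int:
    """Number of blocks a BITMAP_SIZE-byte bitmap can track (8 bits per byte)."""
    return BITMAP_SIZE * 8


pool = bytearray(DATA_OFFSET + 4 * BLOCK_SIZE)  # tiny 4-block pool in plain memory
struct.pack_into(HEADER_FMT, pool, 0, MAGIC, 1, BLOCK_SIZE)

magic, version, block_size = struct.unpack_from(HEADER_FMT, pool, 0)
print(magic, version, block_size)                     # b'KVC1' 1 4096
print(block_offset(0), block_offset(3))               # 8192 20480
print(max_blocks_for_bitmap() * BLOCK_SIZE // 2**20)  # 128 (MiB)
```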

## Eviction Policies

| Policy | Description | Use Case |
|--------|-------------|----------|
| LRU | Least Recently Used | General purpose; access patterns vary |
| LFU | Least Frequently Used | Hot/cold access patterns |
| FIFO | First In First Out | Streaming, time-based expiry |
| Priority | User-defined priority | Critical prompts, VIP users |
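The four policies differ only in which entry they choose as the next victim. A minimal sketch, assuming each entry records its insertion time, last access time, access count, and a user-assigned priority where a lower value is evicted first; these field names are hypothetical and may not match `CacheEntry`.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical fields; the real CacheEntry may name or store these differently
    key: str
    created_at: float   # insertion time (FIFO)
    last_access: float  # most recent access (LRU)
    access_count: int   # number of accesses (LFU)
    priority: int       # user-defined; lower = evicted first (assumption)


# Each policy maps an entry to a sort key; the smallest key is evicted first
POLICIES = {
    "lru": lambda e: e.last_access,
    "lfu": lambda e: e.access_count,
    "fifo": lambda e: e.created_at,
    "priority": lambda e: e.priority,
}


def pick_victims(entries: list[Entry], policy: str, count: int) -> list[str]:
    """Return the keys of the `count` entries the policy would evict first."""
    ranked = sorted(entries, key=POLICIES[policy])
    return [e.key for e in ranked[:count]]


entries = [
    Entry("a", created_at=1.0, last_access=9.0, access_count=1, priority=5),
    Entry("b", created_at=2.0, last_access=3.0, access_count=7, priority=1),
    Entry("c", created_at=3.0, last_access=6.0, access_count=4, priority=9),
]
# Prints, one per line: lru ['b'], lfu ['a'], fifo ['a'], priority ['b']
for name in POLICIES:
    print(name, pick_victims(entries, name, 1))
```

Sorting on every eviction keeps the sketch short; an implementation would more typically maintain an ordered structure (for example an `OrderedDict` for LRU or a heap for LFU and priority) so victims can be found without a full sort.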

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌────────────────┐
│  CLI Interface  │────▶│   Cache Store    │────▶│  KVCachePool   │
└─────────────────┘     └──────────────────┘     └────────────────┘
                                   ┌──────────────────────────────┐
                                   │ ┌──────────┐ ┌─────────────┐ │
                                   │ │  Bitmap  │ │  Eviction   │ │
                                   │ │ Allocator│ │  Manager    │ │
                                   │ └──────────┘ └─────────────┘ │
                                   │ ┌──────────────────────────┐ │
                                   │ │    Data Region (mmap)    │ │
                                   │ └──────────────────────────┘ │
                                   └──────────────────────────────┘
```
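The Bitmap Allocator box corresponds to the "thread-safe first-fit allocation" feature. A minimal sketch of that technique, assuming one bit per block (1 = in use) and a single lock guarding the bitmap; the `BitmapAllocator` class and its method names are hypothetical, not the module's API.

```python
import threading
from typing import Optional


class BitmapAllocator:
    """Hypothetical first-fit allocator: one bit per fixed-size block (1 = in use)."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # every block starts free
        self._lock = threading.Lock()  # serialises all bitmap updates

    def _is_used(self, i: int) -> bool:
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def _mark(self, i: int, used: bool) -> None:
        mask = 1 << (i % 8)
        if used:
            self.bits[i // 8] |= mask
        else:
            self.bits[i // 8] &= ~mask & 0xFF

    def allocate(self, count: int) -> Optional[int]:
        """Claim the first run of `count` free blocks; return its start or None."""
        with self._lock:
            run_start, run_len = 0, 0
            for i in range(self.num_blocks):
                if self._is_used(i):
                    run_start, run_len = i + 1, 0  # run broken; restart after i
                    continue
                run_len += 1
                if run_len == count:
                    for j in range(run_start, run_start + count):
                        self._mark(j, True)
                    return run_start
            return None

    def free(self, start: int, count: int) -> None:
        """Release `count` blocks beginning at `start`."""
        with self._lock:
            for j in range(start, start + count):
                self._mark(j, False)


alloc = BitmapAllocator(8)
a = alloc.allocate(3)  # blocks 0-2
b = alloc.allocate(2)  # blocks 3-4
alloc.free(a, 3)       # blocks 0-2 are free again
c = alloc.allocate(2)  # first fit reuses the hole at block 0
print(a, b, c)         # 0 3 0
```

Scanning one bit at a time costs O(n) per allocation; a faster variant would skip fully used bytes or words, but the first-fit rule stays the same.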

## Tests

49 unit tests covering:
- Size parsing and formatting utilities
- Cache entry dataclass
- Pool configuration
- Bitmap allocator (allocate, free, serialize)
- Eviction policies (LRU, LFU, FIFO, Priority)
- Pool operations (put, get, delete, evict)
- Prefix-based sharing
- Persistence and restore
- Cache store management
- End-to-end LLM workflows

```bash
python -m pytest test_kv_cache_manager.py -v
```

Copilot AI Dec 4, 2025, on `python -m pytest test_kv_cache_manager.py -v`:

The test command documentation is inconsistent with the actual test framework. The README states to run tests with `python -m pytest test_kv_cache_manager.py -v`, but the tests use `unittest`. While pytest can run unittest tests, the documentation should be consistent. Either document the unittest command (`python -m unittest test_kv_cache_manager.py -v`) or ensure pytest is the intended test runner.

Suggested change: replace `python -m pytest test_kv_cache_manager.py -v` with `python -m unittest test_kv_cache_manager.py -v`.

## Example: LLM Inference Cache

```python
from kv_cache_manager import CachePoolConfig, KVCachePool

# Create a pool for LLM inference
config = CachePoolConfig(
    name="llama-cache",
    size_bytes=16 * 1024**3,  # 16 GiB
    tier="gpu",
    eviction_policy="lru",
)
pool = KVCachePool(config)

# Cache the KV tensors of each layer
for layer in range(32):
    key = f"batch0_layer{layer}_kv"
    kv_tensor = get_kv_tensor(layer)  # placeholder for your model code; returns a NumPy array
    pool.put(key, kv_tensor.tobytes(),
             layer_index=layer,
             sequence_length=2048)

# Retrieve a cached entry
cached = pool.get("batch0_layer0_kv")

# Share cached entries that have the same prompt prefix
shared = pool.find_by_prefix("system_prompt_hash")
```

Copilot AI Dec 4, 2025, on `from kv_cache_manager import CachePoolConfig, KVCachePool`:

The example code imports from `kv_cache_manager` directly, but in the context of the package structure (`cortex.kernel_features.kv_cache`), the import should be `from cortex.kernel_features.kv_cache import CachePoolConfig, KVCachePool` or `from cortex.kernel_features.kv_cache.kv_cache_manager import ...`. The current import will only work if the file is run from the same directory. Update to show the correct package-relative import.

Suggested change: replace `from kv_cache_manager import CachePoolConfig, KVCachePool` with `from cortex.kernel_features.kv_cache.kv_cache_manager import CachePoolConfig, KVCachePool`.
1 change: 1 addition & 0 deletions cortex/kernel_features/kv_cache/__init__.py
@@ -0,0 +1 @@
"""\nKV Cache Manager - POSIX shared memory pools for LLM inference\n\nThis module provides user-space cache management for transformer key-value caches\nas first-class system resources with multiple eviction policies.\n\nBounty: cortexlinux/cortex#221\nAuthor: Yair Siegel\n"""\n\nfrom .kv_cache_manager import (\n KVCachePool,\n CacheStore,\n CachePoolConfig,\n CacheEntry,\n EvictionPolicy,\n KVCacheCLI,\n parse_size,\n format_size,\n)\n\n__all__ = [\n 'KVCachePool',\n 'CacheStore',\n 'CachePoolConfig',\n 'CacheEntry',\n 'EvictionPolicy',\n 'KVCacheCLI',\n 'parse_size',\n 'format_size',\n]\n\n__version__ = '1.0.0'\n

Copilot AI Dec 4, 2025

The docstring contains raw newline characters (\n) instead of actual line breaks. This will display as literal \n characters rather than formatting the text on multiple lines. Consider using a proper multi-line docstring with triple quotes and actual line breaks.

Suggested change, replacing the file's single line with:
"""
KV Cache Manager - POSIX shared memory pools for LLM inference
This module provides user-space cache management for transformer key-value caches
as first-class system resources with multiple eviction policies.
Bounty: cortexlinux/cortex#221
Author: Yair Siegel
"""\nfrom .kv_cache_manager import (\n KVCachePool,\n CacheStore,\n CachePoolConfig,\n CacheEntry,\n EvictionPolicy,\n KVCacheCLI,\n parse_size,\n format_size,\n)\n\n__all__ = [\n 'KVCachePool',\n 'CacheStore',\n 'CachePoolConfig',\n 'CacheEntry',\n 'EvictionPolicy',\n 'KVCacheCLI',\n 'parse_size',\n 'format_size',\n]\n\n__version__ = '1.0.0'\n

Copilot AI Dec 4, 2025

Syntax Error (in Python 3).

Suggested change, replacing the file's single line with:

"""
KV Cache Manager - POSIX shared memory pools for LLM inference

This module provides user-space cache management for transformer key-value caches
as first-class system resources with multiple eviction policies.

Bounty: cortexlinux/cortex#221
Author: Yair Siegel
"""

from .kv_cache_manager import (
    KVCachePool,
    CacheStore,
    CachePoolConfig,
    CacheEntry,
    EvictionPolicy,
    KVCacheCLI,
    parse_size,
    format_size,
)

__all__ = [
    'KVCachePool',
    'CacheStore',
    'CachePoolConfig',
    'CacheEntry',
    'EvictionPolicy',
    'KVCacheCLI',
    'parse_size',
    'format_size',
]

__version__ = '1.0.0'
