
feat(lmcache, paged_attn): implement ZenFS with mmap-based blob storage and recovery. Works with a specialized paged_attn kernel #459

Merged
chenghuaWang merged 8 commits into UbiquitousLearning:v2 from chenghuaWang:v2
Oct 6, 2025

Conversation

@chenghuaWang
Collaborator

feat(lmcache, paged_attn): implement ZenFS with mmap-based blob storage and recovery. Works with a specialized paged_attn kernel

  • Add ZenFSBlobMMAPFile class for cross-platform memory-mapped file operations
  • Implement ZenFileSystem with blob allocation, access, and persistence logic
  • Support anonymous and file-backed mmap modes on both POSIX and Windows systems
  • Introduce JSON-based index management for filesystem metadata and recovery
  • Add prefetching and purging hints for performance optimization
  • Enable disk-based tensor storage with automatic memory mapping
  • Include HaHaHash utility for consistent hashing of filesystem components

Add documentation files for both paged_attn and paged_attn_x kernel
implementations. The paged_attn README describes its usage with lazy_llm
pruning methods, while paged_attn_x README explains its cooperation with
nn/lmcache/aux_page components.

chenghuaWang and others added 3 commits September 28, 2025 15:45
…ery support

chenghuaWang and others added 5 commits October 1, 2025 16:34
- Introduce architecture tags (x86, ARM, any) and vector dot product
  template for paged attention kernels
- Implement forward pass template for BSHD format with FA2-style loop
- Update setup scripts to initialize submodules recursively
- Include CPU architecture detection and conditional compilation headers
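The FA2-style forward-pass loop mentioned above visits each KV block once while maintaining a running max and softmax denominator (the "online softmax" recurrence). A minimal Python sketch of that recurrence for a single query row; the real kernel is a C++ template parameterized over architecture-tagged vector dot products, and the function name here is illustrative:

```python
import math

def paged_attention_row(q, kv_pages, scale):
    """Online-softmax attention over paged KV storage (sketch).
    q: query vector; kv_pages: list of pages, each a list of
    (key_vec, value_vec) pairs. Returns the attention output for q."""
    d = len(q)
    m = -math.inf        # running max logit (for numerical stability)
    l = 0.0              # running softmax denominator
    acc = [0.0] * d      # running weighted sum of values
    for page in kv_pages:
        for k, v in page:
            # Scaled dot product between query and key.
            s = scale * sum(qi * ki for qi, ki in zip(q, k))
            m_new = max(m, s)
            # Rescale previous partial sums when the running max changes.
            corr = math.exp(m - m_new) if m != -math.inf else 0.0
            w = math.exp(s - m_new)
            l = l * corr + w
            acc = [a * corr + w * vi for a, vi in zip(acc, v)]
            m = m_new
    return [a / l for a in acc]
```

Because the rescaling is folded into the loop, no second pass over the KV pages is needed, which is what makes the block-at-a-time paged layout cheap to traverse.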
- Rename `aux_page` to `prefix_cache` and update related paths
- Introduce `RadixTree`-based cache management for efficient prefix caching
- Refactor `PagedAttnOp` to support prefix cache context
- Add ARM-specific optimizations for vectorized operations in paged attention
- Update CODEOWNERS and documentation to reflect new module structure
- Implement token-level allocator and TLB utilities for virtual page addressing
- Add necessary data structures and hashing utilities for cache indexing
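The `RadixTree`-based cache above reuses KV pages for any request sharing a token prefix with an earlier one. A token-level trie sketch of the longest-prefix lookup (the real structure is a C++ radix tree; `PrefixCache` and the `page` payload are illustrative names):

```python
class RadixNode:
    def __init__(self):
        self.children = {}   # token -> RadixNode
        self.page = None     # cached KV page id for this position, if any

class PrefixCache:
    """Token-level trie sketch of radix-tree prefix matching."""

    def __init__(self):
        self.root = RadixNode()

    def insert(self, tokens, pages):
        """Record the KV page backing each token position of a sequence."""
        node = self.root
        for tok, page in zip(tokens, pages):
            node = node.children.setdefault(tok, RadixNode())
            node.page = page

    def longest_prefix(self, tokens):
        """Return the cached pages for the longest stored prefix of tokens."""
        node, pages = self.root, []
        for tok in tokens:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            pages.append(nxt.page)
            node = nxt
        return pages
```

A new request then only computes attention for the suffix beyond the matched prefix, reusing the returned pages directly.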
…h TLB support

Introduce `_HiCPUAllocator` and `_GPUAllocator` implementations for managing
virtual addresses across memory and disk layers. Add `PrefixCacheAllocator`
to route allocation requests based on device type. Include TLB integration
for fast physical address lookup. Update ZenFS initialization and file-backed
mapping logic to support custom file paths and fix bit calculation errors.
Add new test executable for prefix cache functionality. Refactor rotary embedding
function signature for better formatting. Add Session header for future service
integration. Enable configuration options for prefix cache allocator and radix tree.
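The TLB integration described above caches virtual-page translations so hot lookups skip the slower page-table walk; the "bit calculation" fixes concern splitting a virtual address into page number and offset. A minimal sketch, assuming a 4 KiB page size (the real allocators and TLB are C++; `TLB` and its fields are illustrative):

```python
PAGE_BITS = 12                 # assumed: 4 KiB pages
PAGE_SIZE = 1 << PAGE_BITS
OFFSET_MASK = PAGE_SIZE - 1

class TLB:
    """Sketch of a translation lookaside buffer: caches virtual-page ->
    physical-page mappings in front of the authoritative page table."""

    def __init__(self, page_table):
        self.page_table = page_table   # vpn -> ppn (the slow path)
        self.cache = {}                # TLB entries (the fast path)

    def translate(self, vaddr):
        # Split the virtual address into page number and in-page offset.
        vpn, offset = vaddr >> PAGE_BITS, vaddr & OFFSET_MASK
        ppn = self.cache.get(vpn)
        if ppn is None:
            ppn = self.page_table[vpn]  # "page walk" on a TLB miss
            self.cache[vpn] = ppn       # fill the TLB for next time
        return (ppn << PAGE_BITS) | offset
```

Getting `PAGE_BITS` and `OFFSET_MASK` consistent is exactly the kind of arithmetic the commit's bit-calculation fix addresses: an off-by-one in the mask silently corrupts every in-page offset.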
- Add `Service`, `RequestPool`, `ResponsePool`, and `SessionPool` to manage
  asynchronous requests and responses
- Introduce `Session` base class and `NoneSession` fallback implementation
- Create `sendRequest` and `getResponse` APIs for external bindings
- Add Qwen3 model support with RoPE and attention modules for service usage
- Include unit test for the new service functionality

The service layer now supports multi-threaded request handling and session
management, preparing the engine for integration with high-level servers
via language bindings.
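The request/response flow described above — callers enqueue work via `sendRequest` and block on `getResponse` while worker threads drain the pool — can be sketched as follows. The class and method names mirror the PR description, but the Python internals are illustrative, not the actual C++ `Service`/`RequestPool`/`ResponsePool` implementation:

```python
import queue
import threading

class Service:
    """Sketch of an asynchronous service: a worker thread drains a
    request pool and deposits results into a response pool."""

    def __init__(self, handler):
        self.requests = queue.Queue()   # stand-in for RequestPool
        self.responses = {}             # stand-in for ResponsePool
        self.done = {}                  # request id -> completion event
        self.handler = handler          # per-request work (e.g. inference)
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def sendRequest(self, req_id, payload):
        # Register the completion event before enqueueing, so a caller
        # can immediately wait on getResponse without racing the worker.
        self.done[req_id] = threading.Event()
        self.requests.put((req_id, payload))

    def getResponse(self, req_id):
        self.done[req_id].wait()        # block until the worker finishes
        return self.responses.pop(req_id)

    def _run(self):
        while True:
            req_id, payload = self.requests.get()
            self.responses[req_id] = self.handler(payload)
            self.done[req_id].set()
```

Decoupling submission from retrieval this way is what lets a high-level server hand requests to the engine from binding threads and collect responses later.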
This commit adds a README file in the pymllm directory to formally request
a PyPI organization for the mllm project. The file includes correspondence
with PyPI staff and verification details regarding project affiliation.
chenghuaWang merged commit 74b02d1 into UbiquitousLearning:v2 on Oct 6, 2025
0 of 2 checks passed
