feat(lmcache, paged_attn): implement ZenFS with mmap-based blob storage and recovery. Coworks with special paged_attn kernel #459
Merged
chenghuaWang merged 8 commits into UbiquitousLearning:v2 on Oct 6, 2025
…ery support

- Add ZenFSBlobMMAPFile class for cross-platform memory-mapped file operations
- Implement ZenFileSystem with blob allocation, access, and persistence logic
- Support anonymous and file-backed mmap modes on both POSIX and Windows systems
- Introduce JSON-based index management for filesystem metadata and recovery
- Add prefetching and purging hints for performance optimization
- Enable disk-based tensor storage with automatic memory mapping
- Include HaHaHash utility for consistent hashing of filesystem components
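To illustrate the anonymous mmap mode and the prefetch hint mentioned in this commit, here is a minimal POSIX-only sketch. `AnonBlob` and `blobRoundTrip` are hypothetical names chosen for illustration; this is not the `ZenFSBlobMMAPFile` class from the PR, and the Windows and file-backed paths are left out.

```cpp
// Hypothetical sketch: AnonBlob is an illustrative name, not the PR's class.
// POSIX only (Linux/macOS); Windows and file-backed mappings are omitted.
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

class AnonBlob {
 public:
  explicit AnonBlob(std::size_t bytes) : size_(bytes) {
    assert(bytes > 0);  // mmap rejects zero-length mappings
    void* p = ::mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    data_ = p;
  }
  ~AnonBlob() {
    if (data_ != nullptr) ::munmap(data_, size_);
  }
  // The mapping is owned by exactly one object.
  AnonBlob(const AnonBlob&) = delete;
  AnonBlob& operator=(const AnonBlob&) = delete;

  // Prefetch hint: ask the kernel to make the pages resident soon.
  // Returns true if the kernel accepted the hint.
  bool prefetch() { return ::madvise(data_, size_, MADV_WILLNEED) == 0; }

  void* data() { return data_; }
  std::size_t size() const { return size_; }

 private:
  void* data_ = nullptr;
  std::size_t size_ = 0;
};

// Usage: write a small payload into the blob and read it back.
inline bool blobRoundTrip() {
  AnonBlob blob(4096);
  std::memcpy(blob.data(), "zen", 4);  // 3 characters plus the terminator
  blob.prefetch();
  return std::strcmp(static_cast<const char*>(blob.data()), "zen") == 0;
}
```

A purge hint could presumably be issued the same way with `MADV_DONTNEED`, but its effect on anonymous private mappings is platform-specific, so it is not shown here.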
Add documentation files for both paged_attn and paged_attn_x kernel implementations. The paged_attn README describes its usage with lazy_llm pruning methods, while paged_attn_x README explains its cooperation with nn/lmcache/aux_page components.
- Introduce architecture tags (x86, ARM, any) and vector dot product template for paged attention kernels
- Implement forward pass template for BSHD format with FA2-style loop
- Update setup scripts to initialize submodules recursively
- Include CPU architecture detection and conditional compilation headers
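The combination of architecture tags and a dot product template could look roughly like the sketch below. All names (`AnyArchTag`, `X86ArchTag`, `ArmArchTag`, `VecDot`, `NativeArchTag`, `dot`) are assumptions made for illustration and may not match the PR's headers. Only an AArch64 NEON specialization is shown; x86 falls back to the scalar loop in this sketch.

```cpp
// Hypothetical sketch of tag-dispatched dot products; names are illustrative.
#include <cassert>
#include <cstddef>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

struct AnyArchTag {};
struct X86ArchTag {};
struct ArmArchTag {};

// Generic fallback: a scalar loop that is valid on every architecture.
template <typename ArchTag, typename T>
struct VecDot {
  static T run(const T* a, const T* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    T acc = T(0);
    for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];
    return acc;
  }
};

#if defined(__aarch64__)
// ARM specialization: 4 floats per iteration with fused multiply-add.
template <>
struct VecDot<ArmArchTag, float> {
  static float run(const float* a, const float* b, std::size_t n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
      acc = vfmaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    }
    float sum = vaddvq_f32(acc);             // horizontal add of the 4 lanes
    for (; i < n; ++i) sum += a[i] * b[i];   // scalar tail
    return sum;
  }
};
using NativeArchTag = ArmArchTag;
#elif defined(__x86_64__) || defined(_M_X64)
using NativeArchTag = X86ArchTag;  // no SIMD specialization in this sketch
#else
using NativeArchTag = AnyArchTag;
#endif

// Convenience entry point that picks the tag detected at compile time.
template <typename T>
T dot(const T* a, const T* b, std::size_t n) {
  return VecDot<NativeArchTag, T>::run(a, b, n);
}
```

Dispatching on a tag type keeps the kernel body free of `#if` blocks: the conditional compilation is confined to the specializations and the `NativeArchTag` alias.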
- Rename `aux_page` to `prefix_cache` and update related paths
- Introduce `RadixTree`-based cache management for efficient prefix caching
- Refactor `PagedAttnOp` to support prefix cache context
- Add ARM-specific optimizations for vectorized operations in paged attention
- Update CODEOWNERS and documentation to reflect new module structure
- Implement token-level allocator and TLB utilities for virtual page addressing
- Add necessary data structures and hashing utilities for cache indexing
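The idea behind prefix caching is to find the longest already-cached prefix of a new token sequence so that its pages can be reused. The sketch below shows that lookup with a plain, uncompressed trie over token IDs; a radix tree additionally merges single-child chains into one edge. `PrefixTrie` and its methods are hypothetical names, not the PR's `RadixTree` API.

```cpp
// Hypothetical sketch: an uncompressed token trie, not the PR's RadixTree.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

class PrefixTrie {
 public:
  // Record a token sequence so later requests can reuse its prefix.
  void insert(const std::vector<std::int32_t>& tokens) {
    Node* cur = &root_;
    for (std::int32_t t : tokens) {
      std::unique_ptr<Node>& child = cur->children[t];
      if (!child) child = std::make_unique<Node>();
      cur = child.get();
    }
  }

  // Return how many leading tokens of `tokens` are already cached.
  std::size_t matchPrefix(const std::vector<std::int32_t>& tokens) const {
    const Node* cur = &root_;
    std::size_t matched = 0;
    for (std::int32_t t : tokens) {
      auto it = cur->children.find(t);
      if (it == cur->children.end()) break;
      cur = it->second.get();
      ++matched;
    }
    assert(matched <= tokens.size());
    return matched;
  }

 private:
  struct Node {
    std::unordered_map<std::int32_t, std::unique_ptr<Node>> children;
  };
  Node root_;
};
```

In a real cache each node would also carry the page or virtual address holding that token's KV data; that payload is omitted here to keep the lookup logic visible.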
…h TLB support

Introduce `_HiCPUAllocator` and `_GPUAllocator` implementations for managing virtual addresses across memory and disk layers. Add `PrefixCacheAllocator` to route allocation requests based on device type. Include TLB integration for fast physical address lookup. Update ZenFS initialization and file-backed mapping logic to support custom file paths and fix bit calculation errors. Add new test executable for prefix cache functionality. Refactor rotary embedding function signature for better formatting. Add Session header for future service integration. Enable configuration options for prefix cache allocator and radix tree.
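As a rough picture of what a TLB in front of a virtual-to-physical page mapping does, the sketch below splits an address into a page number and an offset and caches recent translations in a small direct-mapped table. The 12-bit page size, the 16-entry table, and the names `TinyTLB` and `TlbEntry` are assumptions for illustration; the PR's actual address layout is not described in the commit message.

```cpp
// Hypothetical sketch of a direct-mapped TLB; sizes and names are assumed.
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct TlbEntry {
  std::uint64_t vpn = 0;   // virtual page number
  std::uint64_t base = 0;  // physical base address of the page
  bool valid = false;
};

class TinyTLB {
 public:
  static constexpr std::uint32_t kPageBits = 12;  // assumed 4 KiB pages
  static constexpr std::uint64_t kOffsetMask =
      (std::uint64_t{1} << kPageBits) - 1;
  static constexpr std::size_t kEntries = 16;

  // Install or replace a mapping in the backing page table.
  void mapPage(std::uint64_t vpn, std::uint64_t phys_base) {
    assert((phys_base & kOffsetMask) == 0);  // base must be page-aligned
    page_table_[vpn] = phys_base;
    TlbEntry& e = entries_[vpn % kEntries];
    if (e.valid && e.vpn == vpn) e.valid = false;  // drop stale cached entry
  }

  // Translate a virtual address; returns false if the page is unmapped.
  bool translate(std::uint64_t vaddr, std::uint64_t& paddr) {
    const std::uint64_t vpn = vaddr >> kPageBits;
    const std::uint64_t offset = vaddr & kOffsetMask;
    TlbEntry& e = entries_[vpn % kEntries];
    if (e.valid && e.vpn == vpn) {
      ++hits_;
    } else {
      ++misses_;
      auto it = page_table_.find(vpn);  // slow path: consult the page table
      if (it == page_table_.end()) return false;
      e.vpn = vpn;
      e.base = it->second;
      e.valid = true;
    }
    paddr = e.base + offset;
    return true;
  }

  std::size_t hits() const { return hits_; }
  std::size_t misses() const { return misses_; }

 private:
  std::array<TlbEntry, kEntries> entries_;
  std::unordered_map<std::uint64_t, std::uint64_t> page_table_;
  std::size_t hits_ = 0;
  std::size_t misses_ = 0;
};
```

Deriving the offset mask from the page-bit count in one place, as `kOffsetMask` does, is one way to avoid the kind of shift and mask mismatches that bit calculation fixes usually address.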
- Add `Service`, `RequestPool`, `ResponsePool`, and `SessionPool` to manage asynchronous requests and responses
- Introduce `Session` base class and `NoneSession` fallback implementation
- Create `sendRequest` and `getResponse` APIs for external bindings
- Add Qwen3 model support with RoPE and attention modules for service usage
- Include unit test for the new service functionality

The service layer now supports multi-threaded request handling and session management, preparing the engine for integration with high-level servers via language bindings.
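A request/response pool pair of this kind is commonly built on a mutex-guarded queue with a condition variable. The sketch below shows that pattern under stated assumptions: `BlockingPool`, `EchoService`, `Request`, `Response`, `submit`, `step`, and `fetch` are all hypothetical names, and the signatures of the PR's own `sendRequest` and `getResponse` are not given in the commit message.

```cpp
// Hypothetical sketch of a thread-safe request/response pool; names are
// illustrative and do not mirror the PR's Service classes.
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

template <typename T>
class BlockingPool {
 public:
  void push(T item) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push_back(std::move(item));
    }
    cv_.notify_one();  // wake one waiting consumer
  }

  // Blocks until an item is available, then removes and returns it.
  T pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !queue_.empty(); });
    assert(!queue_.empty());
    T item = std::move(queue_.front());
    queue_.pop_front();
    return item;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<T> queue_;
};

struct Request {
  int id;
  std::string prompt;
};

struct Response {
  int id;
  std::string text;
};

class EchoService {
 public:
  void submit(Request r) { requests_.push(std::move(r)); }

  // One worker step: take a request and publish a response for it.
  // A real worker thread would call this in a loop and run the model here.
  void step() {
    Request r = requests_.pop();
    responses_.push(Response{r.id, "echo: " + r.prompt});
  }

  Response fetch() { return responses_.pop(); }

 private:
  BlockingPool<Request> requests_;
  BlockingPool<Response> responses_;
};
```

Keeping requests and responses in separate pools lets the caller thread and the worker thread block independently, which is the usual reason for this split.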
This commit adds a README file in the pymllm directory to formally request a PyPI organization for the mllm project. The file includes correspondence with PyPI staff and verification details regarding project affiliation.