feat(lmcache, paged_attn): implement ZenFS with mmap-based blob storage and recovery. Coworks with special paged_attn kernel by chenghuaWang · Pull Request #459 · UbiquitousLearning/mllm

chenghuaWang · 2025-09-28T07:54:37Z

feat(lmcache, paged_attn): implement ZenFS with mmap-based blob storage and recovery. Coworks with special paged_attn kernel

- Add ZenFSBlobMMAPFile class for cross-platform memory-mapped file operations
- Implement ZenFileSystem with blob allocation, access, and persistence logic
- Support anonymous and file-backed mmap modes on both POSIX and Windows systems
- Introduce JSON-based index management for filesystem metadata and recovery
- Add prefetching and purging hints for performance optimization
- Enable disk-based tensor storage with automatic memory mapping
- Include HaHaHash utility for consistent hashing of filesystem components
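The two mmap modes listed above can be illustrated with a minimal, POSIX-only sketch. `BlobMapping` is a hypothetical class, not the PR's `ZenFSBlobMMAPFile`, and mapping the prefetch and purge hints onto `madvise(MADV_WILLNEED)` and `madvise(MADV_DONTNEED)` is an assumption about how such hints could be issued. The Windows path and the JSON index are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical sketch (POSIX only), not the ZenFSBlobMMAPFile API from this PR.
class BlobMapping {
 public:
  // Anonymous mode: memory-only blob; contents are lost when it is unmapped.
  static BlobMapping anonymous(std::size_t size) {
    void* p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) throw std::runtime_error("anonymous mmap failed");
    return BlobMapping(p, size, -1);
  }

  // File-backed mode: MAP_SHARED writes land in the file, so a later run can remap them.
  static BlobMapping fileBacked(const std::string& path, std::size_t size) {
    int fd = ::open(path.c_str(), O_RDWR | O_CREAT, 0644);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    if (::ftruncate(fd, static_cast<off_t>(size)) != 0) {
      ::close(fd);
      throw std::runtime_error("ftruncate failed: " + path);
    }
    void* p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
      ::close(fd);
      throw std::runtime_error("file mmap failed: " + path);
    }
    return BlobMapping(p, size, fd);
  }

  BlobMapping(BlobMapping&& o) noexcept : ptr_(o.ptr_), size_(o.size_), fd_(o.fd_) {
    o.ptr_ = nullptr;
    o.fd_ = -1;
  }
  BlobMapping(const BlobMapping&) = delete;
  BlobMapping& operator=(const BlobMapping&) = delete;
  BlobMapping& operator=(BlobMapping&&) = delete;

  ~BlobMapping() {
    if (ptr_ != nullptr) ::munmap(ptr_, size_);
    if (fd_ >= 0) ::close(fd_);
  }

  char* bytes() { return static_cast<char*>(ptr_); }
  std::size_t size() const { return size_; }

  // Prefetch hint: ask the kernel to start reading these pages in.
  void prefetch() { ::madvise(ptr_, size_, MADV_WILLNEED); }
  // Purge hint: drop resident pages. On Linux, private anonymous pages read back as
  // zeros afterwards, while shared file pages are reloaded from the page cache or file.
  void purge() { ::madvise(ptr_, size_, MADV_DONTNEED); }
  // Flush dirty pages of a file-backed blob to disk (no-op for anonymous blobs).
  void sync() {
    if (fd_ >= 0) ::msync(ptr_, size_, MS_SYNC);
  }

 private:
  BlobMapping(void* p, std::size_t size, int fd) : ptr_(p), size_(size), fd_(fd) {}
  void* ptr_;
  std::size_t size_;
  int fd_;
};
```

Because the file-backed mapping uses `MAP_SHARED`, bytes written through it and flushed with `msync` remain in the file after unmapping, which is the property a recovery path would rely on; the anonymous mapping gives no such guarantee.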

Add documentation files for both paged_attn and paged_attn_x kernel
implementations. The paged_attn README describes its usage with lazy_llm
pruning methods, while paged_attn_x README explains its cooperation with
nn/lmcache/aux_page components.


- Introduce architecture tags (x86, ARM, any) and vector dot product template for paged attention kernels
- Implement forward pass template for BSHD format with FA2-style loop
- Update setup scripts to initialize submodules recursively
- Include CPU architecture detection and conditional compilation headers
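The tag dispatch and FA2-style loop described above might look roughly like the following sketch. The tag types, `VecDot`, and `attendOneQuery` are hypothetical names; besides the portable fallback, only an AArch64 NEON specialization is shown. K and V are assumed to be stored in BSHD order, so element (b, s, h, d) sits at `((b*S + s)*H + h)*D + d`. The loop visits one key at a time (block size 1) instead of FA2's key tiles and keeps only the online-softmax recurrence: a running maximum `m`, a running denominator `l`, and an accumulator rescaled by `exp(m_old - m_new)` whenever the maximum grows.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical architecture tags (illustrative names, not the PR's).
struct AnyTag {};
struct ArmTag {};
struct X86Tag {};

// Portable fallback: scalar dot product for any tag without a specialization.
template <typename Tag>
struct VecDot {
  static float run(const float* a, const float* b, int n) {
    float s = 0.f;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
  }
};

#if defined(__aarch64__)
// AArch64 NEON specialization: 4-wide fused multiply-add, then a horizontal sum.
template <>
struct VecDot<ArmTag> {
  static float run(const float* a, const float* b, int n) {
    float32x4_t acc = vdupq_n_f32(0.f);
    int i = 0;
    for (; i + 4 <= n; i += 4) acc = vfmaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    float s = vaddvq_f32(acc);
    for (; i < n; ++i) s += a[i] * b[i];  // scalar tail
    return s;
  }
};
using NativeTag = ArmTag;
#else
using NativeTag = AnyTag;  // X86Tag has no specialization in this sketch, so use the fallback
#endif

// One query vector q[D] attending over all S keys of (batch b, head h) in BSHD K/V.
template <typename Tag>
void attendOneQuery(const float* q, const float* k, const float* v, float* out,
                    int S, int H, int D, int b, int h) {
  assert(S > 0);
  const float scale = 1.0f / std::sqrt(static_cast<float>(D));
  float m = -std::numeric_limits<float>::infinity();  // running max of scores
  float l = 0.f;                                      // running softmax denominator
  std::vector<float> acc(static_cast<std::size_t>(D), 0.f);
  for (int s = 0; s < S; ++s) {
    const std::size_t off = ((static_cast<std::size_t>(b) * S + s) * H + h) * D;
    const float score = VecDot<Tag>::run(q, k + off, D) * scale;
    const float mNew = std::max(m, score);
    const float corr = std::exp(m - mNew);  // exp(-inf) == 0 on the first key
    const float p = std::exp(score - mNew);
    l = l * corr + p;
    for (int d = 0; d < D; ++d) acc[d] = acc[d] * corr + p * v[off + d];
    m = mNew;
  }
  for (int d = 0; d < D; ++d) out[d] = acc[d] / l;
}
```

Keeping the SIMD choice in a template parameter lets the same loop body compile once per architecture, with the preprocessor only deciding which tag is "native".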

- Rename `aux_page` to `prefix_cache` and update related paths
- Introduce `RadixTree`-based cache management for efficient prefix caching
- Refactor `PagedAttnOp` to support prefix cache context
- Add ARM-specific optimizations for vectorized operations in paged attention
- Update CODEOWNERS and documentation to reflect new module structure
- Implement token-level allocator and TLB utilities for virtual page addressing
- Add necessary data structures and hashing utilities for cache indexing
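A token-level radix tree for prefix caching could be sketched as follows, assuming each cached token maps to one KV slot id. `TokenRadixTree`, `insert`, and `matchPrefix` are hypothetical names, not the PR's `RadixTree` interface, and eviction and reference counting are omitted. Edges hold runs of tokens and are split when a new sequence diverges partway along an edge; `matchPrefix` returns the slot ids of the longest cached prefix, so a request would only need to prefill the remaining suffix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical sketch of a token-level radix tree; not the PR's RadixTree interface.
class TokenRadixTree {
 public:
  // Record that tokens[i] is cached in KV slot slots[i]. Existing prefixes keep their slots.
  void insert(const std::vector<std::int32_t>& tokens, const std::vector<std::int32_t>& slots) {
    assert(tokens.size() == slots.size());
    Node* node = &root_;
    std::size_t i = 0;
    while (i < tokens.size()) {
      auto it = node->children.find(tokens[i]);
      if (it == node->children.end()) {  // no edge starts with this token: add a leaf edge
        auto leaf = std::make_unique<Node>();
        leaf->edge.assign(tokens.begin() + i, tokens.end());
        leaf->slots.assign(slots.begin() + i, slots.end());
        node->children.emplace(tokens[i], std::move(leaf));
        return;
      }
      Node* child = it->second.get();
      const std::size_t j = commonLen(*child, tokens, i);  // j >= 1 by construction
      if (j < child->edge.size()) {  // diverged mid-edge: split into [0, j) and [j, end)
        auto mid = std::make_unique<Node>();
        mid->edge.assign(child->edge.begin(), child->edge.begin() + j);
        mid->slots.assign(child->slots.begin(), child->slots.begin() + j);
        std::unique_ptr<Node> tail = std::move(it->second);
        tail->edge.erase(tail->edge.begin(), tail->edge.begin() + j);
        tail->slots.erase(tail->slots.begin(), tail->slots.begin() + j);
        const std::int32_t tailKey = tail->edge.front();
        mid->children.emplace(tailKey, std::move(tail));
        it->second = std::move(mid);
        child = it->second.get();
      }
      i += j;
      node = child;
    }
  }

  // Slot ids of the longest cached prefix of `tokens` (may end partway along an edge).
  std::vector<std::int32_t> matchPrefix(const std::vector<std::int32_t>& tokens) const {
    std::vector<std::int32_t> out;
    const Node* node = &root_;
    std::size_t i = 0;
    while (i < tokens.size()) {
      auto it = node->children.find(tokens[i]);
      if (it == node->children.end()) break;
      const Node* child = it->second.get();
      const std::size_t j = commonLen(*child, tokens, i);
      out.insert(out.end(), child->slots.begin(), child->slots.begin() + j);
      if (j < child->edge.size()) break;
      i += j;
      node = child;
    }
    return out;
  }

 private:
  struct Node {
    std::vector<std::int32_t> edge;   // token run labelling the edge into this node
    std::vector<std::int32_t> slots;  // KV slot id for each token in `edge`
    std::map<std::int32_t, std::unique_ptr<Node>> children;  // keyed by first edge token
  };

  // Number of leading tokens of child's edge that match tokens[i...].
  static std::size_t commonLen(const Node& child, const std::vector<std::int32_t>& tokens,
                               std::size_t i) {
    std::size_t j = 0;
    while (j < child.edge.size() && i + j < tokens.size() && child.edge[j] == tokens[i + j]) ++j;
    return j;
  }

  Node root_;
};
```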

…h TLB support

Introduce `_HiCPUAllocator` and `_GPUAllocator` implementations for managing virtual addresses across memory and disk layers. Add `PrefixCacheAllocator` to route allocation requests based on device type. Include TLB integration for fast physical address lookup. Update ZenFS initialization and file-backed mapping logic to support custom file paths and fix bit calculation errors. Add a new test executable for prefix cache functionality. Refactor rotary embedding function signature for better formatting. Add Session header for future service integration. Enable configuration options for prefix cache allocator and radix tree.
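The TLB idea can be illustrated with a direct-mapped cache placed in front of a page table. This is a hypothetical sketch: `DirectMappedTLB`, its page size, and its entry count are assumptions, not the PR's TLB utilities. A virtual address is split into a virtual page number (`vaddr >> PageBits`) and an in-page offset (`vaddr & ((1 << PageBits) - 1)`); a hit skips the page-table lookup entirely.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical sketch: direct-mapped TLB caching virtual-page -> physical-page translations.
template <unsigned PageBits, std::size_t Entries>
class DirectMappedTLB {
  static_assert(PageBits > 0 && PageBits < 64, "page size must fit in 64-bit addresses");

 public:
  static constexpr std::uint64_t kOffsetMask = (std::uint64_t{1} << PageBits) - 1;

  // Install or change a mapping; drop any stale cached copy of it.
  void map(std::uint64_t vpn, std::uint64_t ppn) {
    pageTable_[vpn] = ppn;
    Entry& e = entries_[vpn % Entries];
    if (e.valid && e.vpn == vpn) e.valid = false;
  }

  // Translate a virtual address; std::nullopt means the page is unmapped.
  std::optional<std::uint64_t> translate(std::uint64_t vaddr) {
    const std::uint64_t vpn = vaddr >> PageBits;
    const std::uint64_t offset = vaddr & kOffsetMask;
    Entry& e = entries_[vpn % Entries];
    if (e.valid && e.vpn == vpn) {  // TLB hit
      ++hits;
      return (e.ppn << PageBits) | offset;
    }
    ++misses;  // TLB miss: walk the page table and refill this slot
    auto it = pageTable_.find(vpn);
    if (it == pageTable_.end()) return std::nullopt;
    e = Entry{true, vpn, it->second};
    return (it->second << PageBits) | offset;
  }

  std::size_t hits = 0;
  std::size_t misses = 0;

 private:
  struct Entry {
    bool valid = false;
    std::uint64_t vpn = 0;
    std::uint64_t ppn = 0;
  };
  std::array<Entry, Entries> entries_{};
  std::unordered_map<std::uint64_t, std::uint64_t> pageTable_;
};
```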

- Add `Service`, `RequestPool`, `ResponsePool`, and `SessionPool` to manage asynchronous requests and responses
- Introduce `Session` base class and `NoneSession` fallback implementation
- Create `sendRequest` and `getResponse` APIs for external bindings
- Add Qwen3 model support with RoPE and attention modules for service usage
- Include unit test for the new service functionality

The service layer now supports multi-threaded request handling and session management, preparing the engine for integration with high-level servers via language bindings.
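The request and response pools described above follow a producer/consumer pattern. A minimal sketch, assuming a single worker thread and string payloads: `MiniService`, `send`, and `get` are hypothetical and do not reproduce the signatures of the PR's `sendRequest`/`getResponse`, and the echo reply stands in for model inference.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical sketch, not mllm's Service/RequestPool/ResponsePool API.
class MiniService {
 public:
  MiniService() : worker_([this] { run(); }) {}
  ~MiniService() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stop_ = true;
    }
    reqCv_.notify_all();
    worker_.join();  // worker drains queued requests before exiting
  }

  // Enqueue a request (the "request pool") and return its id.
  std::uint64_t send(std::string prompt) {
    std::lock_guard<std::mutex> lk(mu_);
    const std::uint64_t id = nextId_++;
    requests_.emplace_back(id, std::move(prompt));
    reqCv_.notify_one();
    return id;
  }

  // Block until the response for `id` appears in the "response pool", then take it.
  std::string get(std::uint64_t id) {
    std::unique_lock<std::mutex> lk(mu_);
    respCv_.wait(lk, [&] { return responses_.count(id) > 0; });
    std::string out = std::move(responses_[id]);
    responses_.erase(id);
    return out;
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lk(mu_);
    for (;;) {
      reqCv_.wait(lk, [&] { return stop_ || !requests_.empty(); });
      if (requests_.empty()) return;  // stop_ is set and nothing is left to process
      auto [id, prompt] = std::move(requests_.front());
      requests_.pop_front();
      lk.unlock();  // do the "inference" without holding the lock
      std::string reply = "echo:" + prompt;  // placeholder for model inference
      lk.lock();
      responses_.emplace(id, std::move(reply));
      respCv_.notify_all();
    }
  }

  std::mutex mu_;
  std::condition_variable reqCv_;
  std::condition_variable respCv_;
  std::deque<std::pair<std::uint64_t, std::string>> requests_;
  std::unordered_map<std::uint64_t, std::string> responses_;
  std::uint64_t nextId_ = 0;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after all other members exist
};
```

`get` blocks until the worker posts a response for that id, so callers may collect responses in any order; in this sketch a call with an id that was never issued would block indefinitely.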

This commit adds a README file in the pymllm directory to formally request a PyPI organization for the mllm project. The file includes correspondence with PyPI staff and verification details regarding project affiliation.

chenghuaWang and others added 3 commits September 28, 2025 15:45

Merge branch 'UbiquitousLearning:v2' into v2

cc20c1e

chenghuaWang requested review from oreomaker and yirongjie as code owners September 28, 2025 07:54

chenghuaWang and others added 5 commits October 1, 2025 16:34

docs(pymllm): add README for PyPI organization request

000471e

chenghuaWang merged commit 74b02d1 into UbiquitousLearning:v2 Oct 6, 2025
0 of 2 checks passed

chenghuaWang mentioned this pull request Nov 1, 2025

Development Roadmap (2025 H2) #460

Closed

chenghuaWang merged 8 commits into UbiquitousLearning:v2 from chenghuaWang:v2

chenghuaWang commented Sep 28, 2025
