refactor(mocker): replace vllm block manager with kvbm-logical by dreamtalen · Pull Request #8451 · ai-dynamo/dynamo

dreamtalen · 2026-04-21T18:20:40Z

Overview:

Reopen #6059 with refactoring: replace the mocker's vLLM block manager with kvbm-logical.

Details:

New vLLM G1 backend: kvbm_logical::BlockManager<G1> with a Lineage inactive pool.
Deleted manual vLLM block manager & evictor.
Added a HashMap<PositionalLineageHash, SequenceHash> to bridge kvbm-logical's PositionalLineageHash to the router's SequenceHash.
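
To illustrate the bridging map described above, here is a minimal sketch. The `PositionalLineageHash` newtype, the `SequenceHash` alias, and the `HashBridge` helper are hypothetical stand-ins and not the types from kvbm-logical or the router; the sketch only shows how a side table could translate evicted block hashes into the hashes the router tracks.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: the real types live in kvbm-logical and the router
// crates and may carry more structure than a plain integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PositionalLineageHash(pub u128);
pub type SequenceHash = u64;

/// Side table from the block manager's hash to the router's hash.
#[derive(Default)]
pub struct HashBridge {
    plh_to_seq: HashMap<PositionalLineageHash, SequenceHash>,
}

impl HashBridge {
    /// Record the pairing when a block is registered.
    pub fn record(&mut self, plh: PositionalLineageHash, seq: SequenceHash) {
        self.plh_to_seq.insert(plh, seq);
    }

    /// Translate evicted PLHs into router hashes and forget those entries.
    /// PLHs that were never recorded are skipped rather than treated as errors.
    pub fn take_evicted(&mut self, evicted: &[PositionalLineageHash]) -> Vec<SequenceHash> {
        evicted
            .iter()
            .filter_map(|plh| self.plh_to_seq.remove(plh))
            .collect()
    }
}
```

Removing the entry at translation time keeps the table from growing without bound in this sketch; whether the actual backend does the same is not stated in this PR description.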

Where should the reviewer start?

lib/mocker/src/kv_manager/kvbm_backend.rs: sole backend. Start from the KvManager struct, then process_use / process_promote → process_destroy / process_deref / emit_evicted_events.
lib/mocker/src/common/protocols.rs: MoveBlock variant changes (extra plhs field on Use, PLH on Promote) and MockerEvictionBackend.
lib/mocker/src/common/sequence.rs: positional_lineage_hashes() and the prefix-caching-off randomization path.

Related Issues:

Absorbs feat: integrate kvbm + mocker #6059 — commit has Co-authored-by: Ryan Olson.
Relates to feat(mocker): KVBM G2 offload for on/offline replay #8184 — the next commit on that branch lands G1↔G2 offload on top of this refactor.

Summary by CodeRabbit

Release Notes

New Features
- Added support for multiple eviction backend strategies (Lineage, LRU, MultiLRU) for KV cache management.
- Enhanced block promotion and allocation tracking with positional lineage hashing support.
Refactor
- Reimplemented KV cache block manager backend with improved state management and capacity handling.
- Streamlined block lifecycle operations and eviction detection mechanisms.

copy-pr-bot · 2026-04-21T18:20:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-21T18:32:09Z

Walkthrough

The PR replaces the vLLM-based KV cache manager with a kvbm-logical backend, removing the previous HashCache and LRUEvictor implementations. It introduces new protocol types (G1, MockerEvictionBackend) and updates MoveBlock message structures to incorporate PositionalLineageHash for lineage-aware eviction strategy support.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dependency Addition: `lib/mocker/Cargo.toml` | Added `kvbm-logical` workspace dependency. |
| Cache Module Removal: `lib/mocker/src/cache/hash_cache.rs`, `lib/mocker/src/cache/mod.rs` | Deleted `HashCache` type with its public API (ref-count tracking, active/inactive pools, LRU eviction); removed submodule declaration and re-export from cache module while retaining `RadixCache`. |
| Common Module Cleanup: `lib/mocker/src/common/evictor.rs`, `lib/mocker/src/common/mod.rs` | Removed `LRUEvictor<T>` implementation (priority counter and `BTreeSet`-based queue) and its module export; deprecated prior eviction infrastructure. |
| Protocol & Sequence Updates: `lib/mocker/src/common/protocols.rs`, `lib/mocker/src/common/sequence.rs` | Added `G1` marker type and `MockerEvictionBackend` enum (Lineage/Lru/MultiLru); updated `MoveBlock::Use` and `MoveBlock::Promote` variants with `PositionalLineageHash` arguments; added `positional_lineage_hashes()` method to `ActiveSequence` and integrated PLH generation into block allocation/promotion logic. |
| KV Manager Migration: `lib/mocker/src/kv_manager/kvbm_backend.rs`, `lib/mocker/src/kv_manager/mod.rs`, `lib/mocker/src/kv_manager/vllm_backend.rs` | Replaced vLLM KV cache backend with new kvbm-logical backend; new `KvManager` wraps `kvbm_logical::BlockManager<G1>`, translates `MoveBlock` protocol into RAII block lifecycle, supports multiple eviction strategies (Lineage/Lru/MultiLru with optional `TinyLFUTracker`), handles Use/Destroy/Deref/Promote with block matching via PLH registry, and emits KV events for router synchronization. Removed 650-line vLLM implementation. |
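
The summary above mentions translating the `MoveBlock` protocol into an RAII block lifecycle. As a generic illustration of that idea, the sketch below uses a hypothetical `Pool` and `BlockGuard` (neither is part of kvbm-logical): holding the guard keeps a block allocated, and dropping it returns the block to the pool without an explicit free call.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical fixed-capacity pool that only tracks how many blocks are free.
pub struct Pool {
    free: Rc<Cell<usize>>,
}

/// Owning handle for one block. Dropping it returns the block to the pool.
pub struct BlockGuard {
    free: Rc<Cell<usize>>,
}

impl Pool {
    pub fn new(capacity: usize) -> Self {
        Pool { free: Rc::new(Cell::new(capacity)) }
    }

    /// Hand out a block, or `None` when the pool is exhausted.
    pub fn allocate(&self) -> Option<BlockGuard> {
        let n = self.free.get();
        if n == 0 {
            return None;
        }
        self.free.set(n - 1);
        Some(BlockGuard { free: Rc::clone(&self.free) })
    }

    /// Number of blocks currently available for allocation.
    pub fn free_blocks(&self) -> usize {
        self.free.get()
    }
}

impl Drop for BlockGuard {
    fn drop(&mut self) {
        // Give the block back as soon as the last owner lets go of it.
        self.free.set(self.free.get() + 1);
    }
}
```

Under this pattern a Deref or Destroy style signal can be modeled as simply dropping the stored guard, which is one plausible reading of "RAII block lifecycle" here; the real backend's ownership model may differ.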

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly summarizes the main refactoring change: replacing the vllm block manager with kvbm-logical implementation, which aligns perfectly with the core objective of the changeset. |
| Description check | ✅ Passed | The PR description follows the template with Overview, Details, and Where should the reviewer start sections. It comprehensively explains the refactoring, key changes, and provides clear guidance for reviewing the main affected files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/mocker/src/common/sequence.rs (1)

230-258: ⚠️ Potential issue | 🟠 Major

Randomize promoted block PLHs when prefix caching is disabled.

Line 236 always uses the deterministic token-derived PLH. With enable_prefix_caching == false, two requests that generate the same completed block can still collide in process_promote() via match_blocks(&[plh]), reusing a block despite randomized last_seq_hash.

🐛 Proposed fix

             let last_block_hash = last_complete.block_hash();
-            let last_plh = last_complete.positional_lineage_hash();
+            let last_plh = if self.enable_prefix_caching {
+                last_complete.positional_lineage_hash()
+            } else {
+                PositionalLineageHash::new(
+                    last_seq_hash,
+                    None,
+                    self.block_hashes.len() as u64,
+                )
+            };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/mocker/src/common/sequence.rs` around lines 230 - 258, The code always
sets last_plh from last_complete.positional_lineage_hash(), causing collisions
when enable_prefix_caching is false; change the assignment of last_plh to be
conditional like last_seq_hash: use last_complete.positional_lineage_hash() only
when self.enable_prefix_caching is true, otherwise generate a random value
(e.g., random::<u64>()) so the Promote signal (MoveBlock::Promote) uses a
randomized PLH and avoids match_blocks(&[plh]) reusing the same block in
process_promote().
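
The collision described in this comment can be illustrated with a small sketch. It simplifies the lookup key to a plain `u64` (the real `PositionalLineageHash` is a richer type), and both `random_u64` and `lookup_hash` are hypothetical helper names. With prefix caching on, identical content maps to an identical key so blocks can be shared; with it off, each call gets an unrelated key so a cache lookup should not match an earlier block.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Produce an unpredictable u64 using only the standard library.
/// Every `RandomState::new()` yields a differently keyed hasher, so the
/// finished value of an empty hasher differs from call to call.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Pick the key used to look a completed block up in the cache.
/// Assumption: keys are simplified to u64 for this sketch.
pub fn lookup_hash(token_derived: u64, enable_prefix_caching: bool) -> u64 {
    if enable_prefix_caching {
        // Deterministic: equal token content yields an equal key.
        token_derived
    } else {
        // Randomized: equal token content should not match earlier blocks.
        random_u64()
    }
}
```

A production implementation would more likely draw from a dedicated random number generator; `RandomState` is used here only to keep the sketch free of external crates.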

🧹 Nitpick comments (1)

lib/mocker/src/kv_manager/kvbm_backend.rs (1)
490-499: Include inactive blocks or narrow this method’s contract.

The doc says this counts blocks absent from active and inactive pools, but the implementation only checks active maps. Inactive cached full blocks will be reported as new, which can skew admission/preemption estimates.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/mocker/src/kv_manager/kvbm_backend.rs` around lines 490 - 499, The
doc/behavior mismatch in probe_new_blocks: update the implementation of
probe_new_blocks (function probe_new_blocks and its match on
UniqueBlock::FullBlock/PartialBlock) so it also checks the inactive maps instead
of only active maps—i.e., change the predicates to ensure the block is absent
from both active_full and inactive_full for FullBlock, and absent from both
active_partial and inactive_partial for PartialBlock; alternatively, if you
prefer to narrow the contract, update the docstring to state the method only
checks active pools and leave logic as-is.
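
The two contracts discussed in this nitpick can be compared side by side. The sketch below models each pool as a `HashSet<u64>`, which is an assumption made for illustration; the function names are hypothetical and do not mirror the real `probe_new_blocks` signature.

```rust
use std::collections::HashSet;

/// Count requested blocks that are cached in neither pool.
pub fn count_new_blocks(
    requested: &[u64],
    active: &HashSet<u64>,
    inactive: &HashSet<u64>,
) -> usize {
    requested
        .iter()
        .filter(|h| !active.contains(*h) && !inactive.contains(*h))
        .count()
}

/// Narrower contract: only the active pool is consulted, so a block that
/// sits in the inactive pool is still counted as new.
pub fn count_new_blocks_active_only(requested: &[u64], active: &HashSet<u64>) -> usize {
    requested.iter().filter(|h| !active.contains(*h)).count()
}
```

For a request that touches one active block, one inactive block, and one unseen block, the first function reports one new block and the second reports two, which is the skew in admission estimates the comment warns about.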

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/mocker/src/kv_manager/kvbm_backend.rs`:
- Around line 211-216: The Stored event may be using the wrong tokens_hash
because publish_kv_event() currently slices the tail of local_hashes based on
num_blocks; instead, when handling partial success (blocks_stored contains the
successfully allocated prefix) compute the local_hashes slice that corresponds
to the actually stored prefix (e.g. take local_hashes[..blocks_stored.len()])
and pass that into event_data/tokens_hash logic; update the same logic paths
where local_hashes is sliced (including the block handling around
publish_kv_event(), is_store branch, and any other occurrences in the file) to
use the prefix length derived from blocks_stored rather than subtracting from
the end. Ensure you reference publish_kv_event(), blocks_stored, local_hashes,
and the event_data construction when making the change.
- Around line 402-420: process_destroy currently removes full blocks from
active_full and emits a Removed event via publish_kv_event, but the underlying
kvbm-logical block can still be matched later in the inactive registry and
reactivated (e.g., via PLH) without emitting a Stored event, leaving the router
out of sync; update process_destroy (and the analogous logic at lines ~324-338)
so that when handling UniqueBlock::FullBlock you either mark the seq_hash as
non-matchable in the inactive registry (prevent future matches) or ensure
reactivation emits a Stored event: specifically, in process_destroy (and the
similar destroy path) after removing from active_full and before calling
publish_kv_event, clear or flag the corresponding entry in the inactive matcher
registry to prevent matching, or record that the router mapping was removed and
ensure publish_kv_event or the reactivation path emits a Stored for any
subsequent Use/reactivation of that same SequenceHash so the router receives a
consistent Stored/Removed pair.
- Around line 296-351: The loop in process_use (UniqueBlock::FullBlock branch)
currently does plh.unwrap_or_default() and silently stages with
PositionalLineageHash::default() when plhs is too short; instead, validate that
plhs.get(plh_idx) is Some before proceeding: if plhs.get(plh_idx) is None,
reject the Use event by breaking/returning early (do not allocate or stage), and
do not use a default PLH. Update the code paths that use plh_idx (increment only
when a real PLH was consumed) and ensure no staging/allocation happens unless a
real PLH was present; reference symbols: process_use, plhs, plh_idx,
UniqueBlock::FullBlock, block_manager.allocate_blocks, mutable.stage.
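
The first inline comment above contrasts slicing the prefix of `local_hashes` with slicing its tail after a partial allocation. A minimal sketch of the difference follows, assuming hashes are plain `u64` values and that `stored` is the number of blocks that were actually allocated; both function names are hypothetical.

```rust
/// Hashes to report when only the first `stored` blocks were allocated.
/// The length is clamped so an oversized count cannot cause a panic.
pub fn stored_prefix(local_hashes: &[u64], stored: usize) -> &[u64] {
    &local_hashes[..stored.min(local_hashes.len())]
}

/// Tail-based slice, shown only for contrast: after a partial success it
/// reports the last `stored` hashes, which belong to blocks that were not
/// necessarily allocated.
pub fn stored_tail(local_hashes: &[u64], stored: usize) -> &[u64] {
    let start = local_hashes.len().saturating_sub(stored);
    &local_hashes[start..]
}
```

With four hashes and two stored blocks, the prefix version yields the first two hashes while the tail version yields the last two, so the two approaches only agree when every block was stored.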

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ec50491a-a53a-4977-8035-560930a85fa5

📥 Commits

Reviewing files that changed from the base of the PR and between b688597 and 64c0a63.

⛔ Files ignored due to path filters (2)

Cargo.lock is excluded by !**/*.lock
lib/bindings/python/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (10)

lib/mocker/Cargo.toml
lib/mocker/src/cache/hash_cache.rs
lib/mocker/src/cache/mod.rs
lib/mocker/src/common/evictor.rs
lib/mocker/src/common/mod.rs
lib/mocker/src/common/protocols.rs
lib/mocker/src/common/sequence.rs
lib/mocker/src/kv_manager/kvbm_backend.rs
lib/mocker/src/kv_manager/mod.rs
lib/mocker/src/kv_manager/vllm_backend.rs

💤 Files with no reviewable changes (5)

lib/mocker/src/common/mod.rs
lib/mocker/src/cache/mod.rs
lib/mocker/src/kv_manager/vllm_backend.rs
lib/mocker/src/cache/hash_cache.rs
lib/mocker/src/common/evictor.rs

PeaBrane · 2026-04-21T20:29:21Z

/ok to test e27e89f

Co-authored-by: Ryan Olson <rolson@nvidia.com> Signed-off-by: Yongming Ding <yongmingd@nvidia.com>

PeaBrane

Rechecked against the current PR head; I still see five logic/overhead issues worth addressing or documenting.

PeaBrane

One test-coverage suggestion that would help here: I think it would be worth bolting a router-side cleanup assertion onto the existing 8-case scheduler matrix. Using the existing RouterIndexerHarness / LocalKvIndexer path is probably enough for this. Run the scheduler to completion, forward its KV events into the side indexer, flush, call RouterIndexerHarness::assert_no_event_errors() so invalid Stored/Removed application fails the test without scraping warnings/logs, and then assert the indexer is empty at the end (e.g. dump_events().is_empty() via a small harness helper). That would give an end-to-end check that the emitted KV event stream is both valid for the router and leaves the router-visible KV index in a clean state after the request set drains, without needing to duplicate the full cross-variant coverage that already lives in lib/kv-router/src/indexer/tests.rs.

Signed-off-by: Yongming Ding <yongmingd@nvidia.com>

dreamtalen · 2026-04-22T01:26:18Z

Small follow-up on the scheduler-matrix test suggestion:

Wired RouterIndexerHarness into the 8-case scheduler matrix test. assert_no_event_errors() now guards the whole matrix. Did NOT add dump_events().is_empty() because under the current mocker contract Deref doesn't emit Removed, so the router tree is expected to retain entries after a run.
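
The distinction between the two assertions can be shown with a toy model. `ToyIndexer` and `KvEvent` below are hypothetical stand-ins and not the `RouterIndexerHarness` or `LocalKvIndexer` API; the rule that a Removed for an unknown hash counts as an error is also an assumption made for the sketch.

```rust
use std::collections::HashSet;

/// Simplified KV event carrying a single block hash.
pub enum KvEvent {
    Stored(u64),
    Removed(u64),
}

/// Toy router-side index that records invalid event applications.
#[derive(Default)]
pub struct ToyIndexer {
    blocks: HashSet<u64>,
    errors: Vec<String>,
}

impl ToyIndexer {
    pub fn apply(&mut self, event: &KvEvent) {
        match event {
            KvEvent::Stored(h) => {
                self.blocks.insert(*h);
            }
            KvEvent::Removed(h) => {
                // Assumed rule: removing a hash that was never stored is invalid.
                if !self.blocks.remove(h) {
                    self.errors.push(format!("Removed for unknown hash {h}"));
                }
            }
        }
    }

    /// True if any event could not be applied cleanly.
    pub fn has_errors(&self) -> bool {
        !self.errors.is_empty()
    }

    /// True if no blocks remain indexed.
    pub fn is_empty(&self) -> bool {
        self.blocks.is_empty()
    }
}
```

In this model a stream of Stored events with only some matching Removed events is error-free yet leaves entries behind, which mirrors why an error check can hold while an emptiness check would fail when Deref emits no Removed.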

dreamtalen · 2026-04-22T06:16:59Z

/ok to test 3e06664

pull-request-size Bot added the size/XXL label Apr 21, 2026

github-actions Bot added the refactor label Apr 21, 2026

dreamtalen force-pushed the yongmingd/mocker-kvbm-g1 branch from 373b38c to 64c0a63 Compare April 21, 2026 18:24

dreamtalen marked this pull request as ready for review April 21, 2026 18:25

dreamtalen requested a review from a team as a code owner April 21, 2026 18:25

dreamtalen force-pushed the yongmingd/mocker-kvbm-g1 branch from 64c0a63 to ac9862d Compare April 21, 2026 18:29

coderabbitai Bot reviewed Apr 21, 2026

Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs Outdated

Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs Outdated

Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs

dreamtalen force-pushed the yongmingd/mocker-kvbm-g1 branch from ac9862d to e27e89f Compare April 21, 2026 20:12

refactor(mocker): replace vllm block manager with kvbm-logical

9c0cdb3

Co-authored-by: Ryan Olson <rolson@nvidia.com> Signed-off-by: Yongming Ding <yongmingd@nvidia.com>

dreamtalen force-pushed the yongmingd/mocker-kvbm-g1 branch from e27e89f to 9c0cdb3 Compare April 21, 2026 21:08

PeaBrane reviewed Apr 21, 2026

This comment was marked as duplicate.

address comments

3e06664

Signed-off-by: Yongming Ding <yongmingd@nvidia.com>

PeaBrane approved these changes Apr 22, 2026

copy-pr-bot Bot temporarily deployed to GITLAB April 22, 2026 06:17 Inactive

dreamtalen enabled auto-merge (squash) April 22, 2026 06:40

dreamtalen merged commit 36b4208 into main Apr 22, 2026
187 of 191 checks passed

dreamtalen deleted the yongmingd/mocker-kvbm-g1 branch April 22, 2026 16:59

This was referenced Apr 22, 2026

feat: integrate kvbm + mocker #6059

Closed

[FEATURE]: KVBM-Mocker integration: multi-tier KV cache offload simulation #8190

Open
