
refactor(mocker): replace vllm block manager with kvbm-logical#8451

Merged
dreamtalen merged 2 commits into main from yongmingd/mocker-kvbm-g1 on Apr 22, 2026

@dreamtalen
Contributor

@dreamtalen dreamtalen commented Apr 21, 2026

Overview:

Reopens #6059 with refactoring: replaces the mocker's vLLM block manager with kvbm-logical.

Details:

  • New vLLM G1 backend: kvbm-logical::BlockManager<G1> with Lineage inactive pool.
  • Deleted manual vLLM block manager & evictor.
  • Added a HashMap<PositionalLineageHash, SequenceHash> to bridge kvbm-logical's PositionalLineageHash to the router's SequenceHash.
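
The bridging map in the last bullet can be pictured with a minimal sketch. The types below are hypothetical stand-ins (the real `PositionalLineageHash` and `SequenceHash` are not shown in this thread), and `PlhBridge`, `register`, and `take_evicted` are illustrative names, not the PR's API. The idea is only that an eviction reported in PLH terms gets translated into the SequenceHash the router knows about.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: the real types live in kvbm-logical and the router.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PositionalLineageHash(pub u64);
pub type SequenceHash = u64;

/// Illustrative bridge: remembers which router-facing SequenceHash
/// each registered PLH corresponds to.
#[derive(Default)]
pub struct PlhBridge {
    plh_to_seq: HashMap<PositionalLineageHash, SequenceHash>,
}

impl PlhBridge {
    /// Record the mapping when a block is registered.
    pub fn register(&mut self, plh: PositionalLineageHash, seq: SequenceHash) {
        self.plh_to_seq.insert(plh, seq);
    }

    /// Translate evicted PLHs into SequenceHashes, in input order.
    /// PLHs with no recorded mapping are skipped; translated entries
    /// are removed from the map.
    pub fn take_evicted(&mut self, evicted: &[PositionalLineageHash]) -> Vec<SequenceHash> {
        evicted
            .iter()
            .filter_map(|plh| self.plh_to_seq.remove(plh))
            .collect()
    }

    /// Number of mappings still tracked.
    pub fn len(&self) -> usize {
        self.plh_to_seq.len()
    }
}
```

In this sketch, registering two PLHs and then reporting one of them as evicted yields a single SequenceHash and leaves one mapping behind.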

Where should the reviewer start?

  1. lib/mocker/src/kv_manager/kvbm_backend.rs: sole backend. Start from the KvManager struct, then process_use / process_promote / process_destroy / process_deref / emit_evicted_events.
  2. lib/mocker/src/common/protocols.rs: MoveBlock variant changes (extra plhs field on Use, PLH on Promote) and MockerEvictionBackend.
  3. lib/mocker/src/common/sequence.rs: positional_lineage_hashes() and the prefix-caching-off randomization path.

Related Issues:

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for multiple eviction backend strategies (Lineage, LRU, MultiLRU) for KV cache management.
    • Enhanced block promotion and allocation tracking with positional lineage hashing support.
  • Refactor

    • Reimplemented KV cache block manager backend with improved state management and capacity handling.
    • Streamlined block lifecycle operations and eviction detection mechanisms.

@copy-pr-bot

copy-pr-bot Bot commented Apr 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@dreamtalen dreamtalen force-pushed the yongmingd/mocker-kvbm-g1 branch from 373b38c to 64c0a63 Compare April 21, 2026 18:24
@dreamtalen dreamtalen marked this pull request as ready for review April 21, 2026 18:25
@dreamtalen dreamtalen requested a review from a team as a code owner April 21, 2026 18:25
@dreamtalen dreamtalen force-pushed the yongmingd/mocker-kvbm-g1 branch from 64c0a63 to ac9862d Compare April 21, 2026 18:29
@coderabbitai
Contributor

coderabbitai Bot commented Apr 21, 2026

Walkthrough

The PR replaces the vLLM-based KV cache manager with a kvbm-logical backend, removing the previous HashCache and LRUEvictor implementations. It introduces new protocol types (G1, MockerEvictionBackend) and updates MoveBlock message structures to incorporate PositionalLineageHash for lineage-aware eviction strategy support.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dependency Addition: lib/mocker/Cargo.toml | Added kvbm-logical workspace dependency. |
| Cache Module Removal: lib/mocker/src/cache/hash_cache.rs, lib/mocker/src/cache/mod.rs | Deleted HashCache type with its public API (ref-count tracking, active/inactive pools, LRU eviction); removed submodule declaration and re-export from cache module while retaining RadixCache. |
| Common Module Cleanup: lib/mocker/src/common/evictor.rs, lib/mocker/src/common/mod.rs | Removed LRUEvictor<T> implementation (priority counter and BTreeSet-based queue) and its module export; deprecated prior eviction infrastructure. |
| Protocol & Sequence Updates: lib/mocker/src/common/protocols.rs, lib/mocker/src/common/sequence.rs | Added G1 marker type and MockerEvictionBackend enum (Lineage/Lru/MultiLru); updated MoveBlock::Use and MoveBlock::Promote variants with PositionalLineageHash arguments; added positional_lineage_hashes() method to ActiveSequence and integrated PLH generation into block allocation/promotion logic. |
| KV Manager Migration: lib/mocker/src/kv_manager/kvbm_backend.rs, lib/mocker/src/kv_manager/mod.rs, lib/mocker/src/kv_manager/vllm_backend.rs | Replaced vLLM KV cache backend with new kvbm-logical backend; new KvManager wraps kvbm_logical::BlockManager<G1>, translates MoveBlock protocol into RAII block lifecycle, supports multiple eviction strategies (Lineage/Lru/MultiLru with optional TinyLFUTracker), handles Use/Destroy/Deref/Promote with block matching via PLH registry, and emits KV events for router synchronization. Removed 650-line vLLM implementation. |
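
As a rough picture of what an "RAII block lifecycle" means in general, here is a minimal, self-contained sketch. It is not kvbm-logical's API: `Pool`, `BlockHandle`, and the free/inactive split are assumptions made for illustration. A block is held through a handle, and dropping the handle is what returns the block to an inactive (reusable) pool, so no explicit release call is needed.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

// Hypothetical pool state: never-used blocks plus released (inactive) blocks.
#[derive(Default)]
struct PoolState {
    free: Vec<u32>,
    inactive: VecDeque<u32>,
}

/// Illustrative pool; not the kvbm-logical BlockManager.
pub struct Pool(Rc<RefCell<PoolState>>);

/// Owning handle: the block counts as active while this value is alive.
pub struct BlockHandle {
    id: u32,
    pool: Rc<RefCell<PoolState>>,
}

impl Pool {
    pub fn new(capacity: u32) -> Self {
        let state = PoolState {
            free: (0..capacity).collect(),
            inactive: VecDeque::new(),
        };
        Pool(Rc::new(RefCell::new(state)))
    }

    /// Take a free block; if none is free, reuse the oldest inactive one.
    /// Returns None when every block is currently held by a handle.
    pub fn allocate(&self) -> Option<BlockHandle> {
        let mut state = self.0.borrow_mut();
        let id = match state.free.pop() {
            Some(id) => id,
            None => state.inactive.pop_front()?,
        };
        Some(BlockHandle { id, pool: Rc::clone(&self.0) })
    }

    pub fn inactive_len(&self) -> usize {
        self.0.borrow().inactive.len()
    }
}

impl BlockHandle {
    pub fn id(&self) -> u32 {
        self.id
    }
}

impl Drop for BlockHandle {
    /// Releasing is implicit: dropping the handle parks the block
    /// at the back of the inactive queue.
    fn drop(&mut self) {
        self.pool.borrow_mut().inactive.push_back(self.id);
    }
}
```

With a pool of capacity 1, a second `allocate()` fails while the first handle is alive and succeeds again once that handle has been dropped.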

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly summarizes the main refactoring change: replacing the vllm block manager with kvbm-logical implementation, which aligns perfectly with the core objective of the changeset. |
| Description check | ✅ Passed | The PR description follows the template with Overview, Details, and Where should the reviewer start sections. It comprehensively explains the refactoring, key changes, and provides clear guidance for reviewing the main affected files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/mocker/src/common/sequence.rs (1)

230-258: ⚠️ Potential issue | 🟠 Major

Randomize promoted block PLHs when prefix caching is disabled.

Line 236 always uses the deterministic token-derived PLH. With enable_prefix_caching == false, two requests that generate the same completed block can still collide in process_promote() via match_blocks(&[plh]), reusing a block despite randomized last_seq_hash.

🐛 Proposed fix
             let last_block_hash = last_complete.block_hash();
-            let last_plh = last_complete.positional_lineage_hash();
+            let last_plh = if self.enable_prefix_caching {
+                last_complete.positional_lineage_hash()
+            } else {
+                PositionalLineageHash::new(
+                    last_seq_hash,
+                    None,
+                    self.block_hashes.len() as u64,
+                )
+            };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/mocker/src/common/sequence.rs` around lines 230 - 258, The code always
sets last_plh from last_complete.positional_lineage_hash(), causing collisions
when enable_prefix_caching is false; change the assignment of last_plh to be
conditional like last_seq_hash: use last_complete.positional_lineage_hash() only
when self.enable_prefix_caching is true, otherwise generate a random value
(e.g., random::<u64>()) so the Promote signal (MoveBlock::Promote) uses a
randomized PLH and avoids match_blocks(&[plh]) reusing the same block in
process_promote().
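
The collision described in this finding can be illustrated with a toy model. Everything here is an assumption for illustration: `token_derived_key` stands in for a token-derived PLH, and a process-wide counter stands in for the random value the suggestion mentions (it keeps the sketch dependency-free while still giving a fresh key per call).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in for randomness: a counter that hands out a fresh value per call.
static NEXT_UNIQUE: AtomicU64 = AtomicU64::new(1);

/// Hypothetical token-derived key: same tokens and position give the same key.
fn token_derived_key(tokens: &[u32], position: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    tokens.hash(&mut hasher);
    position.hash(&mut hasher);
    hasher.finish()
}

/// Key attached to a promoted block in this toy model.
/// With prefix caching on, identical content should match, so the key is
/// deterministic. With it off, each promotion gets a key nothing else shares.
fn promote_key(tokens: &[u32], position: u64, enable_prefix_caching: bool) -> u64 {
    if enable_prefix_caching {
        token_derived_key(tokens, position)
    } else {
        NEXT_UNIQUE.fetch_add(1, Ordering::Relaxed)
    }
}
```

Two requests that complete an identical block get equal keys when caching is on, which is the intended reuse, and distinct keys when it is off, which is what prevents an accidental match.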
🧹 Nitpick comments (1)
lib/mocker/src/kv_manager/kvbm_backend.rs (1)

490-499: Include inactive blocks or narrow this method’s contract.

The doc says this counts blocks absent from active and inactive pools, but the implementation only checks active maps. Inactive cached full blocks will be reported as new, which can skew admission/preemption estimates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/mocker/src/kv_manager/kvbm_backend.rs` around lines 490 - 499, The
doc/behavior mismatch in probe_new_blocks: update the implementation of
probe_new_blocks (function probe_new_blocks and its match on
UniqueBlock::FullBlock/PartialBlock) so it also checks the inactive maps instead
of only active maps—i.e., change the predicates to ensure the block is absent
from both active_full and inactive_full for FullBlock, and absent from both
active_partial and inactive_partial for PartialBlock; alternatively, if you
prefer to narrow the contract, update the docstring to state the method only
checks active pools and leave logic as-is.
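
A minimal sketch of the first option (checking both pools) follows. The `UniqueBlock` enum and the four sets are hypothetical stand-ins named after the identifiers in the comment; the real backend's fields and types may differ.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the mocker's UniqueBlock.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum UniqueBlock {
    FullBlock(u64),
    PartialBlock(u64),
}

/// Hypothetical pool bookkeeping, one set per pool.
#[derive(Default)]
pub struct Pools {
    pub active_full: HashSet<u64>,
    pub inactive_full: HashSet<u64>,
    pub active_partial: HashSet<u64>,
    pub inactive_partial: HashSet<u64>,
}

/// Count blocks that are absent from both the active and inactive pools,
/// i.e. the ones that would need fresh capacity.
pub fn probe_new_blocks(pools: &Pools, blocks: &[UniqueBlock]) -> usize {
    blocks
        .iter()
        .filter(|block| match block {
            UniqueBlock::FullBlock(h) => {
                !pools.active_full.contains(h) && !pools.inactive_full.contains(h)
            }
            UniqueBlock::PartialBlock(id) => {
                !pools.active_partial.contains(id) && !pools.inactive_partial.contains(id)
            }
        })
        .count()
}
```

In this model a full block that sits only in the inactive pool is not counted as new, which is the difference the comment is pointing at.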
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/mocker/src/kv_manager/kvbm_backend.rs`:
- Around line 211-216: The Stored event may be using the wrong tokens_hash
because publish_kv_event() currently slices the tail of local_hashes based on
num_blocks; instead, when handling partial success (blocks_stored contains the
successfully allocated prefix) compute the local_hashes slice that corresponds
to the actually stored prefix (e.g. take local_hashes[..blocks_stored.len()])
and pass that into event_data/tokens_hash logic; update the same logic paths
where local_hashes is sliced (including the block handling around
publish_kv_event(), is_store branch, and any other occurrences in the file) to
use the prefix length derived from blocks_stored rather than subtracting from
the end. Ensure you reference publish_kv_event(), blocks_stored, local_hashes,
and the event_data construction when making the change.
- Around line 402-420: process_destroy currently removes full blocks from
active_full and emits a Removed event via publish_kv_event, but the underlying
kvbm-logical block can still be matched later in the inactive registry and
reactivated (e.g., via PLH) without emitting a Stored event, leaving the router
out of sync; update process_destroy (and the analogous logic at lines ~324-338)
so that when handling UniqueBlock::FullBlock you either mark the seq_hash as
non-matchable in the inactive registry (prevent future matches) or ensure
reactivation emits a Stored event: specifically, in process_destroy (and the
similar destroy path) after removing from active_full and before calling
publish_kv_event, clear or flag the corresponding entry in the inactive matcher
registry to prevent matching, or record that the router mapping was removed and
ensure publish_kv_event or the reactivation path emits a Stored for any
subsequent Use/reactivation of that same SequenceHash so the router receives a
consistent Stored/Removed pair.
- Around line 296-351: The loop in process_use (UniqueBlock::FullBlock branch)
currently does plh.unwrap_or_default() and silently stages with
PositionalLineageHash::default() when plhs is too short; instead, validate that
plhs.get(plh_idx) is Some before proceeding: if plhs.get(plh_idx) is None,
reject the Use event by breaking/returning early (do not allocate or stage), and
do not use a default PLH. Update the code paths that use plh_idx (increment only
when a real PLH was consumed) and ensure no staging/allocation happens unless a
real PLH was present; reference symbols: process_use, plhs, plh_idx,
UniqueBlock::FullBlock, block_manager.allocate_blocks, mutable.stage.
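
The first inline comment (prefix versus tail slicing on partial success) comes down to which end of `local_hashes` is selected. The sketch below uses plain `u64` slices as stand-ins for the real hash types; `stored_prefix` and `stored_tail` are illustrative helpers, not functions from the PR.

```rust
/// Hashes for the blocks that were actually stored, assuming the stored
/// blocks are the leading prefix of the request (the partial-success case).
pub fn stored_prefix<'a>(local_hashes: &'a [u64], blocks_stored: &[u64]) -> &'a [u64] {
    // Clamp so a longer blocks_stored cannot index out of bounds.
    let n = blocks_stored.len().min(local_hashes.len());
    &local_hashes[..n]
}

/// Tail-based selection, shown only for contrast: it picks the last
/// `num_blocks` hashes, which is wrong when only a prefix was stored.
pub fn stored_tail(local_hashes: &[u64], num_blocks: usize) -> &[u64] {
    let start = local_hashes.len().saturating_sub(num_blocks);
    &local_hashes[start..]
}
```

With four local hashes and two stored blocks, the prefix helper returns the first two hashes while the tail helper returns the last two; the two only agree when every block was stored.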

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ec50491a-a53a-4977-8035-560930a85fa5

📥 Commits

Reviewing files that changed from the base of the PR and between b688597 and 64c0a63.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • lib/mocker/Cargo.toml
  • lib/mocker/src/cache/hash_cache.rs
  • lib/mocker/src/cache/mod.rs
  • lib/mocker/src/common/evictor.rs
  • lib/mocker/src/common/mod.rs
  • lib/mocker/src/common/protocols.rs
  • lib/mocker/src/common/sequence.rs
  • lib/mocker/src/kv_manager/kvbm_backend.rs
  • lib/mocker/src/kv_manager/mod.rs
  • lib/mocker/src/kv_manager/vllm_backend.rs
💤 Files with no reviewable changes (5)
  • lib/mocker/src/common/mod.rs
  • lib/mocker/src/cache/mod.rs
  • lib/mocker/src/kv_manager/vllm_backend.rs
  • lib/mocker/src/cache/hash_cache.rs
  • lib/mocker/src/common/evictor.rs

Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs Outdated
Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs Outdated
Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs
@dreamtalen dreamtalen force-pushed the yongmingd/mocker-kvbm-g1 branch from ac9862d to e27e89f Compare April 21, 2026 20:12
@PeaBrane
Contributor

/ok to test e27e89f

Co-authored-by: Ryan Olson <rolson@nvidia.com>
Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
@dreamtalen dreamtalen force-pushed the yongmingd/mocker-kvbm-g1 branch from e27e89f to 9c0cdb3 Compare April 21, 2026 21:08
Contributor

@PeaBrane PeaBrane left a comment


Rechecked against the current PR head; I still see five logic/overhead issues worth addressing or documenting.

Comment thread lib/mocker/src/common/sequence.rs Outdated
Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs Outdated
Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs
Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs Outdated
Comment thread lib/mocker/src/kv_manager/kvbm_backend.rs Outdated
Contributor

@PeaBrane PeaBrane left a comment


One test-coverage suggestion that would help here: I think it would be worth bolting a router-side cleanup assertion onto the existing 8-case scheduler matrix. Using the existing RouterIndexerHarness / LocalKvIndexer path is probably enough for this. Run the scheduler to completion, forward its KV events into the side indexer, flush, call RouterIndexerHarness::assert_no_event_errors() so invalid Stored/Removed application fails the test without scraping warnings/logs, and then assert the indexer is empty at the end (e.g. dump_events().is_empty() via a small harness helper). That would give an end-to-end check that the emitted KV event stream is both valid for the router and leaves the router-visible KV index in a clean state after the request set drains, without needing to duplicate the full cross-variant coverage that already lives in lib/kv-router/src/indexer/tests.rs.
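
The shape of that check can be sketched with a toy side indexer. This is not `RouterIndexerHarness` or `LocalKvIndexer`: `KvEvent`, `SideIndexer`, and the validity rule used here (a Removed for a hash that was never stored counts as an error) are assumptions made for illustration only.

```rust
use std::collections::HashSet;

/// Hypothetical KV event, reduced to a single block hash.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum KvEvent {
    Stored(u64),
    Removed(u64),
}

/// Toy side indexer: applies events and records invalid ones
/// instead of logging them.
#[derive(Default)]
pub struct SideIndexer {
    index: HashSet<u64>,
    errors: Vec<String>,
}

impl SideIndexer {
    pub fn apply(&mut self, event: &KvEvent) {
        match event {
            KvEvent::Stored(hash) => {
                self.index.insert(*hash);
            }
            KvEvent::Removed(hash) => {
                // Toy rule: removing an unknown hash is an event error.
                if !self.index.remove(hash) {
                    self.errors.push(format!("Removed for unknown hash {hash}"));
                }
            }
        }
    }

    /// Fails the test if any applied event was invalid.
    pub fn assert_no_event_errors(&self) {
        assert!(self.errors.is_empty(), "event errors: {:?}", self.errors);
    }

    pub fn error_count(&self) -> usize {
        self.errors.len()
    }

    /// True when no entries remain in the index.
    pub fn is_empty(&self) -> bool {
        self.index.is_empty()
    }
}
```

A test built on this would feed every emitted event through `apply`, call `assert_no_event_errors()`, and then decide separately whether `is_empty()` is an expectation that the emitter's contract actually supports.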


Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
@dreamtalen
Contributor Author

dreamtalen commented Apr 22, 2026

Small follow-up on the scheduler-matrix test suggestion:

Wired RouterIndexerHarness into the 8-case scheduler matrix test. assert_no_event_errors() now guards the whole matrix. Did NOT add dump_events().is_empty() because under the current mocker contract Deref doesn't emit Removed, so the router tree is expected to retain entries after a run.

@dreamtalen
Contributor Author

/ok to test 3e06664

@dreamtalen dreamtalen enabled auto-merge (squash) April 22, 2026 06:40
@dreamtalen dreamtalen merged commit 36b4208 into main Apr 22, 2026
187 of 191 checks passed
@dreamtalen dreamtalen deleted the yongmingd/mocker-kvbm-g1 branch April 22, 2026 16:59