refactor(mocker): replace vllm block manager with kvbm-logical#8451
Conversation
373b38c to
64c0a63
Compare
64c0a63 to
ac9862d
Compare
WalkthroughThe PR replaces the vLLM-based KV cache manager with a kvbm-logical backend, removing the previous Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/mocker/src/common/sequence.rs (1)
230-258:⚠️ Potential issue | 🟠 MajorRandomize promoted block PLHs when prefix caching is disabled.
Line 236 always uses the deterministic token-derived PLH. With
enable_prefix_caching == false, two requests that generate the same completed block can still collide inprocess_promote()viamatch_blocks(&[plh]), reusing a block despite randomizedlast_seq_hash.🐛 Proposed fix
let last_block_hash = last_complete.block_hash(); - let last_plh = last_complete.positional_lineage_hash(); + let last_plh = if self.enable_prefix_caching { + last_complete.positional_lineage_hash() + } else { + PositionalLineageHash::new( + last_seq_hash, + None, + self.block_hashes.len() as u64, + ) + };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/mocker/src/common/sequence.rs` around lines 230 - 258, The code always sets last_plh from last_complete.positional_lineage_hash(), causing collisions when enable_prefix_caching is false; change the assignment of last_plh to be conditional like last_seq_hash: use last_complete.positional_lineage_hash() only when self.enable_prefix_caching is true, otherwise generate a random value (e.g., random::<u64>()) so the Promote signal (MoveBlock::Promote) uses a randomized PLH and avoids match_blocks(&[plh]) reusing the same block in process_promote().
🧹 Nitpick comments (1)
lib/mocker/src/kv_manager/kvbm_backend.rs (1)
490-499: Include inactive blocks or narrow this method’s contract.The doc says this counts blocks absent from active and inactive pools, but the implementation only checks active maps. Inactive cached full blocks will be reported as new, which can skew admission/preemption estimates.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/mocker/src/kv_manager/kvbm_backend.rs` around lines 490 - 499, The doc/behavior mismatch in probe_new_blocks: update the implementation of probe_new_blocks (function probe_new_blocks and its match on UniqueBlock::FullBlock/PartialBlock) so it also checks the inactive maps instead of only active maps—i.e., change the predicates to ensure the block is absent from both active_full and inactive_full for FullBlock, and absent from both active_partial and inactive_partial for PartialBlock; alternatively, if you prefer to narrow the contract, update the docstring to state the method only checks active pools and leave logic as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/mocker/src/kv_manager/kvbm_backend.rs`:
- Around line 211-216: The Stored event may be using the wrong tokens_hash
because publish_kv_event() currently slices the tail of local_hashes based on
num_blocks; instead, when handling partial success (blocks_stored contains the
successfully allocated prefix) compute the local_hashes slice that corresponds
to the actually stored prefix (e.g. take local_hashes[..blocks_stored.len()])
and pass that into event_data/tokens_hash logic; update the same logic paths
where local_hashes is sliced (including the block handling around
publish_kv_event(), is_store branch, and any other occurrences in the file) to
use the prefix length derived from blocks_stored rather than subtracting from
the end. Ensure you reference publish_kv_event(), blocks_stored, local_hashes,
and the event_data construction when making the change.
- Around line 402-420: process_destroy currently removes full blocks from
active_full and emits a Removed event via publish_kv_event, but the underlying
kvbm-logical block can still be matched later in the inactive registry and
reactivated (e.g., via PLH) without emitting a Stored event, leaving the router
out of sync; update process_destroy (and the analogous logic at lines ~324-338)
so that when handling UniqueBlock::FullBlock you either mark the seq_hash as
non-matchable in the inactive registry (prevent future matches) or ensure
reactivation emits a Stored event: specifically, in process_destroy (and the
similar destroy path) after removing from active_full and before calling
publish_kv_event, clear or flag the corresponding entry in the inactive matcher
registry to prevent matching, or record that the router mapping was removed and
ensure publish_kv_event or the reactivation path emits a Stored for any
subsequent Use/reactivation of that same SequenceHash so the router receives a
consistent Stored/Removed pair.
- Around line 296-351: The loop in process_use (UniqueBlock::FullBlock branch)
currently does plh.unwrap_or_default() and silently stages with
PositionalLineageHash::default() when plhs is too short; instead, validate that
plhs.get(plh_idx) is Some before proceeding: if plhs.get(plh_idx) is None,
reject the Use event by breaking/returning early (do not allocate or stage), and
do not use a default PLH. Update the code paths that use plh_idx (increment only
when a real PLH was consumed) and ensure no staging/allocation happens unless a
real PLH was present; reference symbols: process_use, plhs, plh_idx,
UniqueBlock::FullBlock, block_manager.allocate_blocks, mutable.stage.
---
Outside diff comments:
In `@lib/mocker/src/common/sequence.rs`:
- Around line 230-258: The code always sets last_plh from
last_complete.positional_lineage_hash(), causing collisions when
enable_prefix_caching is false; change the assignment of last_plh to be
conditional like last_seq_hash: use last_complete.positional_lineage_hash() only
when self.enable_prefix_caching is true, otherwise generate a random value
(e.g., random::<u64>()) so the Promote signal (MoveBlock::Promote) uses a
randomized PLH and avoids match_blocks(&[plh]) reusing the same block in
process_promote().
---
Nitpick comments:
In `@lib/mocker/src/kv_manager/kvbm_backend.rs`:
- Around line 490-499: The doc/behavior mismatch in probe_new_blocks: update the
implementation of probe_new_blocks (function probe_new_blocks and its match on
UniqueBlock::FullBlock/PartialBlock) so it also checks the inactive maps instead
of only active maps—i.e., change the predicates to ensure the block is absent
from both active_full and inactive_full for FullBlock, and absent from both
active_partial and inactive_partial for PartialBlock; alternatively, if you
prefer to narrow the contract, update the docstring to state the method only
checks active pools and leave logic as-is.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ec50491a-a53a-4977-8035-560930a85fa5
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.locklib/bindings/python/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
lib/mocker/Cargo.tomllib/mocker/src/cache/hash_cache.rslib/mocker/src/cache/mod.rslib/mocker/src/common/evictor.rslib/mocker/src/common/mod.rslib/mocker/src/common/protocols.rslib/mocker/src/common/sequence.rslib/mocker/src/kv_manager/kvbm_backend.rslib/mocker/src/kv_manager/mod.rslib/mocker/src/kv_manager/vllm_backend.rs
💤 Files with no reviewable changes (5)
- lib/mocker/src/common/mod.rs
- lib/mocker/src/cache/mod.rs
- lib/mocker/src/kv_manager/vllm_backend.rs
- lib/mocker/src/cache/hash_cache.rs
- lib/mocker/src/common/evictor.rs
ac9862d to
e27e89f
Compare
|
/ok to test e27e89f |
Co-authored-by: Ryan Olson <rolson@nvidia.com> Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
e27e89f to
9c0cdb3
Compare
PeaBrane
left a comment
There was a problem hiding this comment.
Rechecked against the current PR head; I still see five logic/overhead issues worth addressing or documenting.
There was a problem hiding this comment.
One test-coverage suggestion that would help here: I think it would be worth bolting a router-side cleanup assertion onto the existing 8-case scheduler matrix. Using the existing RouterIndexerHarness / LocalKvIndexer path is probably enough for this. Run the scheduler to completion, forward its KV events into the side indexer, flush, call RouterIndexerHarness::assert_no_event_errors() so invalid Stored/Removed application fails the test without scraping warnings/logs, and then assert the indexer is empty at the end (e.g. dump_events().is_empty() via a small harness helper). That would give an end-to-end check that the emitted KV event stream is both valid for the router and leaves the router-visible KV index in a clean state after the request set drains, without needing to duplicate the full cross-variant coverage that already lives in lib/kv-router/src/indexer/tests.rs.
Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
Wired |
|
/ok to test 3e06664 |
Overview:
Reopen #6059 with refactoring: replace the mocker's LLM block manager with kvbm-logical.
Details:
kvbm-logical::BlockManager<G1>withLineageinactive pool.HashMap<PositionalLineageHash, SequenceHash>to bridge kvbm-logical's PositionalLineageHash to the router's SequenceHash.Where should the reviewer start?
KvManagerstruct, thenprocess_use/process_promote→process_destroy/process_deref/emit_evicted_eventsMoveBlockvariant changes (extraplhsfield onUse, PLH onPromote) andMockerEvictionBackend.positional_lineage_hashes()and the prefix-caching-off randomization path.Related Issues:
Co-authored-by: Ryan Olson.Summary by CodeRabbit
Release Notes
New Features
Refactor