
feat(mocker): KVBM G2 offload for on/offline replay#8184

Merged
dreamtalen merged 2 commits into main from yongmingd/replay-kvbm-engine-2
Apr 30, 2026

Conversation

@dreamtalen
Contributor

@dreamtalen dreamtalen commented Apr 14, 2026

Overview:

Absorbed #8033
This PR adds optional KVBM-backed G1↔G2 offload simulation to the vLLM mocker, for both online and offline replay.

The current shape intentionally uses the same in-process kvbm-engine stack in both modes:
OffloadEngine + InstanceLeader + PipelineBuilder + a mock Worker.

Live mode drives the offload engine with wall-clock time. Offline replay drives the same hot path with replay virtual time.
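Driving one hot path with two time sources can be sketched as a small clock abstraction. This is a hypothetical illustration: the `Clock` trait, `WallClock`, `VirtualClock`, and `ready_transfers` are assumed names, not types from this PR. The step that checks which transfers are due is written once; only the time source differs between live and offline modes.

```rust
use std::cell::Cell;
use std::time::Instant;

/// Time source for the offload hot path (hypothetical trait, not the mocker's API).
pub trait Clock {
    fn now_ms(&self) -> f64;
}

/// Live mode: wall-clock milliseconds since construction.
pub struct WallClock {
    start: Instant,
}

impl WallClock {
    pub fn new() -> Self {
        Self { start: Instant::now() }
    }
}

impl Clock for WallClock {
    fn now_ms(&self) -> f64 {
        self.start.elapsed().as_secs_f64() * 1000.0
    }
}

/// Offline replay: time moves only when the replay driver advances it.
pub struct VirtualClock {
    now: Cell<f64>,
}

impl VirtualClock {
    pub fn new() -> Self {
        Self { now: Cell::new(0.0) }
    }

    /// Advance monotonically; requests to move backwards are ignored.
    pub fn advance_to(&self, t_ms: f64) {
        if t_ms > self.now.get() {
            self.now.set(t_ms);
        }
    }
}

impl Clock for VirtualClock {
    fn now_ms(&self) -> f64 {
        self.now.get()
    }
}

/// A hot-path step written once against `Clock`: count transfers whose
/// completion deadline has been reached.
pub fn ready_transfers<C: Clock>(clock: &C, deadlines_ms: &[f64]) -> usize {
    let now = clock.now_ms();
    deadlines_ms.iter().filter(|&&d| d <= now).count()
}
```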

Details:

This PR introduces a kvbm-offload feature on dynamo-mocker and exposes it to Python as mocker-kvbm-offload.

Main pieces:

  • lib/mocker/src/kvbm_offload/engine.rs

    • Builds an in-process kvbm-engine::OffloadEngine and InstanceLeader.
  • lib/mocker/src/kvbm_offload/worker.rs

    • Implements kvbm-engine worker traits without moving real memory.
  • lib/mocker/src/kvbm_offload/bandwidth_sharing_model.rs

    • Deterministic processor-sharing bandwidth model.
    • Concurrent transfers on the same link share throughput.
  • lib/mocker/src/kv_manager/kvbm_backend.rs

    • G2→G1 swap-in now reserves destination G1 slots before starting transfer bandwidth reservation.
  • lib/mocker/src/scheduler/vllm/core.rs

    • Ticks the offload engine at pass start.
    • Parks requests waiting on G2→G1 swap-in and promotes them once the handle completes.
  • lib/kvbm-engine/src/offload/*

    • Adds small support hooks needed by the mock worker path: queue notification instead of fixed polling.
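A deterministic processor-sharing model can be sketched as an event loop over arrivals and completions: while n transfers are active on a link, each progresses at bandwidth / n. The function below is a hypothetical illustration (`processor_sharing_finish_times`, with transfers given as `(arrival_ms, bytes)` and bandwidth in bytes per ms); it is not the contents of bandwidth_sharing_model.rs.

```rust
/// Finish times (ms) for transfers sharing one link under egalitarian
/// processor sharing. Each transfer is `(arrival_ms, bytes)`.
/// Hypothetical sketch, not the mocker's implementation.
pub fn processor_sharing_finish_times(transfers: &[(f64, f64)], bw_bytes_per_ms: f64) -> Vec<f64> {
    let n = transfers.len();
    let mut finish = vec![f64::NAN; n];
    let mut remaining: Vec<f64> = transfers.iter().map(|&(_, bytes)| bytes).collect();

    // Visit arrivals in time order.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| transfers[a].0.total_cmp(&transfers[b].0));

    let mut next = 0; // index into `order` of the next arrival
    let mut active: Vec<usize> = Vec::new();
    let mut now = 0.0_f64;

    while next < n || !active.is_empty() {
        if active.is_empty() {
            // Idle link: jump straight to the next arrival.
            now = now.max(transfers[order[next]].0);
        }
        // Admit everything that has arrived by `now`.
        while next < n && transfers[order[next]].0 <= now {
            active.push(order[next]);
            next += 1;
        }

        // Every active transfer gets an equal share of the link.
        let rate = bw_bytes_per_ms / active.len() as f64;
        let min_remaining = active.iter().map(|&i| remaining[i]).fold(f64::INFINITY, f64::min);
        let t_finish = now + min_remaining / rate;
        let t_arrival = if next < n { transfers[order[next]].0 } else { f64::INFINITY };

        // Advance to whichever comes first: a completion or a new arrival.
        let t_next = t_finish.min(t_arrival);
        let progressed = (t_next - now) * rate;
        for &i in &active {
            remaining[i] -= progressed;
        }
        now = t_next;

        // Retire finished transfers (small tolerance for float error).
        active.retain(|&i| {
            if remaining[i] <= 1e-6 {
                finish[i] = now;
                false
            } else {
                true
            }
        });
    }
    finish
}
```

Two equal transfers that start together both finish at twice the solo time, while a transfer that arrives partway through another slows it down only for the overlap.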

Use example:

cd lib/bindings/python
maturin develop --features mocker-kvbm-offload --uv --release

python3 -m dynamo.replay mooncake_trace_1000.jsonl \
  --replay-mode offline \
  --num-workers 1 \
  --trace-block-size 512 \
  --extra-engine-args '{"num_g2_blocks":10000,"num_gpu_blocks":8192,"kv_bytes_per_token":131072}'

Where should the reviewer start?

  • lib/mocker/src/kvbm_offload/*
  • lib/mocker/src/kv_manager/kvbm_backend.rs
  • lib/mocker/src/scheduler/vllm/core.rs
  • lib/kvbm-engine/src/offload/*

Related Issues:

Relates to #8190, #6383

Summary by CodeRabbit

Release Notes

  • New Features

    • Added KV cache offload support for multi-tier memory with configurable parameters: number of G2 blocks, offload batch size, and bandwidth limits.
    • Introduced virtual-time replay mode for offline KV cache offload simulation.
  • Tests

    • Updated unit tests to validate new KV cache offload configuration parameters.

@copy-pr-bot

copy-pr-bot Bot commented Apr 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the feat label Apr 14, 2026
@dreamtalen dreamtalen force-pushed the yongmingd/replay-kvbm-engine-2 branch 2 times, most recently from 88aabb6 to b7c678b on April 14, 2026 20:40
@dreamtalen dreamtalen marked this pull request as ready for review April 14, 2026 20:56
@dreamtalen dreamtalen requested review from a team and PeaBrane as code owners April 14, 2026 20:56
@coderabbitai
Contributor

coderabbitai Bot commented Apr 14, 2026

Walkthrough

This PR adds KVBM (KV Block Manager) G1↔G2 offload functionality to simulate hierarchical KV cache memory with configurable parameters, transfer delays, and both live async and offline replay modes. Three new configuration arguments are introduced, accompanied by corresponding Rust and Python binding updates, a new KVBM orchestration module, and integration into the scheduler and KV manager systems.

Changes

Cohort / File(s) Summary
CLI & Configuration Setup
components/src/dynamo/mocker/args.py, components/src/dynamo/mocker/config.py, components/src/dynamo/mocker/tests/unit/test_config.py
Added three new CLI arguments (--num-g2-blocks, --kvbm-offload-batch-size, --kvbm-bandwidth-g1-g2) with defaults and corresponding config builder updates; unit test extended to validate JSON payload includes new fields.
Cargo Features & Core Types
lib/bindings/python/Cargo.toml, lib/mocker/Cargo.toml, lib/mocker/src/common/protocols.rs
Added mocker-kvbm and kvbm Cargo features, declared optional KVBM dependencies (kvbm-engine, kvbm-logical, kvbm-physical, velo, futures), and extended MockEngineArgs struct with three new fields and JSON parsing logic.
Python Bindings
lib/bindings/python/rust/llm/replay.rs, lib/bindings/python/src/dynamo/_core.pyi
Updated MockEngineArgs Python constructor signature and type stubs to accept three new parameters; updated dump_json() to include serialized KVBM configuration fields.
KVBM Block Manager
lib/kvbm-logical/src/manager/mod.rs
Added a public has_blocks() method for non-destructive existence checks of sequence hashes against the inactive pool.
KV Manager KVBM Integration
lib/mocker/src/kv_manager/mod.rs, lib/mocker/src/kv_manager/vllm_backend.rs
Added conditional kvbm_offload module export; extended KvManager with offload engine state, batch slot tracking, virtual-time support, and updated process() signature to accept now_ms parameter for time-aware event handling and offload completion scheduling.
KVBM Offload Engine
lib/mocker/src/kv_manager/kvbm_offload.rs
New 776-line module implementing KvbmOffloadConfig, MockWorker, and MockOffloadEngine with dual build paths (async/live vs. sync/offline replay), transfer delay simulation based on bandwidth, batch scheduling, and G2 presence queries via InstanceLeader or direct BlockManager access.
Scheduler Integration
lib/mocker/src/scheduler/mod.rs, lib/mocker/src/scheduler/vllm/core.rs, lib/mocker/src/scheduler/vllm/live.rs
Added init_kvbm_offload() initialization method on EngineCore; extended VllmCore with pending swap-in tracking, offload engine forwarding, and updated execute_pass_internal to poll swap-in completion, advance virtual time, and propagate now_ms through all KV event processing; live scheduler now asynchronously initializes offload engine on startup.
Replay System Updates
lib/mocker/src/replay/offline/core.rs, lib/mocker/src/replay/offline/state.rs, lib/mocker/src/scheduler/vllm/tests.rs
Updated ReplayWorkerCore and OfflineWorkerState to clone args and conditionally call init_kvbm_offline(); updated test calls to KvManager.process() to include new now_ms parameter.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Docstring Coverage: ✅ Passed. Docstring coverage is 88.37%, which meets the required threshold of 80.00%.
  • Description check: ✅ Passed. The PR description follows the template structure with all required sections (Overview, Details, Where should the reviewer start, Related Issues) completed and substantive content provided.
  • Title check: ✅ Passed. The PR title 'feat(mocker): KVBM G2 offload for on/offline replay' accurately summarizes the main change: adding KVBM G2 offload functionality for both online and offline replay modes in the mocker component.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
lib/kvbm-logical/src/manager/mod.rs (1)

264-268: Consider a batched inactive-pool existence API to reduce per-hash overhead.

has_blocks currently performs one inactive_pool.has_block call per hash. If this path is hot, a single batched lookup in InactivePool can reduce lock churn and improve offline replay throughput.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/kvbm-logical/src/manager/mod.rs` around lines 264 - 268, has_blocks
currently calls InactivePool::has_block in a loop, causing per-hash lock
overhead; add a batched existence API on InactivePool (e.g.,
InactivePool::has_blocks or has_many that takes &[SequenceHash] and returns
Vec<bool> or a HashSet of present hashes), implement the internal lookup under a
single lock/scan to reduce churn, then modify Manager::has_blocks to call the
new batched method (keeping the public signature of Manager::has_blocks) so
callers get the same Vec<bool> while benefiting from the single-shot lookup;
ensure tests covering both single and multiple hashes are updated accordingly.
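The batched lookup suggested above amounts to holding the pool lock once for the whole slice. A minimal sketch, assuming a `Mutex<HashSet<SequenceHash>>` stand-in for the inactive pool and `u64` sequence hashes (both hypothetical, not kvbm-logical's actual `InactivePool`):

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical stand-in for a sequence hash.
pub type SequenceHash = u64;

/// Minimal inactive pool guarded by a single lock (hypothetical).
pub struct InactivePool {
    blocks: Mutex<HashSet<SequenceHash>>,
}

impl InactivePool {
    pub fn new(hashes: impl IntoIterator<Item = SequenceHash>) -> Self {
        Self { blocks: Mutex::new(hashes.into_iter().collect()) }
    }

    /// Per-hash check: acquires the lock once per call.
    pub fn has_block(&self, hash: SequenceHash) -> bool {
        self.blocks.lock().unwrap().contains(&hash)
    }

    /// Batched check: acquires the lock once for the whole slice and
    /// returns results in input order.
    pub fn has_blocks(&self, hashes: &[SequenceHash]) -> Vec<bool> {
        let guard = self.blocks.lock().unwrap();
        hashes.iter().map(|h| guard.contains(h)).collect()
    }
}
```

Because results come back in input order, a caller that previously looped over `has_block` gets the same `Vec<bool>`.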
lib/mocker/src/kv_manager/vllm_backend.rs (1)

152-182: LGTM!

The complete_ready_offloads method correctly iterates pending offloads and completes those whose deadline has arrived. The Arc clone is lightweight (just refcount increment).

Optional: Minor simplification opportunity

The drain + collect pattern could be simplified using retain:

-let mut still_pending = Vec::new();
-for offload in self.pending_offloads.drain(..) {
-    if now_ms >= offload.complete_at_ms {
-        engine.complete_offload(offload.block_id, offload.seq_hash);
-        completed += 1;
-    } else {
-        still_pending.push(offload);
-    }
-}
-self.pending_offloads = still_pending;
+self.pending_offloads.retain(|offload| {
+    if now_ms >= offload.complete_at_ms {
+        engine.complete_offload(offload.block_id, offload.seq_hash);
+        completed += 1;
+        false
+    } else {
+        true
+    }
+});

Since Vec::retain takes an FnMut closure, completed can be incremented directly inside it; no Cell or moved counter is needed. The current approach is also clear and works correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/mocker/src/kv_manager/vllm_backend.rs` around lines 152 - 182,
complete_ready_offloads currently drains pending_offloads and rebuilds a vector;
you can simplify by using Vec::retain to keep items whose complete_at_ms is in
the future and call engine.complete_offload for items being completed, while
incrementing a local counter captured by the FnMut closure; specifically,
in complete_ready_offloads use &self.offload_engine (clone Arc as needed), call
retain on self.pending_offloads and inside the closure check now_ms >=
offload.complete_at_ms to call engine.complete_offload(offload.block_id,
offload.seq_hash) and increment the counter, then use the counter for the
tracing::debug call.
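Because `Vec::retain` accepts an `FnMut` closure, the completion counter can be a plain local that the closure increments. A minimal sketch, assuming a hypothetical `PendingOffload` struct with the fields named in the comment and a callback standing in for `engine.complete_offload`:

```rust
/// Hypothetical pending-offload record; field names follow the review comment.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingOffload {
    pub block_id: u32,
    pub seq_hash: u64,
    pub complete_at_ms: f64,
}

/// Completes every offload whose deadline has passed, keeps the rest, and
/// returns how many completed. `complete` stands in for engine.complete_offload.
pub fn complete_ready_offloads(
    pending: &mut Vec<PendingOffload>,
    now_ms: f64,
    mut complete: impl FnMut(u32, u64),
) -> usize {
    let mut completed = 0;
    // `retain` takes an FnMut closure, so it may mutate `completed` directly.
    pending.retain(|offload| {
        if now_ms >= offload.complete_at_ms {
            complete(offload.block_id, offload.seq_hash);
            completed += 1;
            false // drop: offload finished
        } else {
            true // keep: still in flight
        }
    });
    completed
}
```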
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/mocker/src/kv_manager/kvbm_offload.rs`:
- Around line 38-40: Offline replay is ignoring
KvbmOffloadConfig.offload_batch_size, causing virtual evictions to always use
transfer_delay_ms(1); propagate offload_batch_size into the sync engine setup
(the code path that builds the SyncEngine/virtual eviction) and use it to
compute batched transfer latency when marking N evicted blocks ready: compute
number_of_batches = ceil(evicted_count / offload_batch_size) and apply
transfer_delay_ms = per_batch_transfer_ms * number_of_batches (or equivalent
batching formula used by the live KVBM pipeline) instead of using a fixed 1 ms;
update the SyncEngine construction/site that currently hardcodes
transfer_delay_ms(1) to accept and use offload_batch_size from
KvbmOffloadConfig.
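The batching formula from this comment can be written as a small helper. This is a hypothetical sketch (`batched_transfer_delay_ms` is not a function in the mocker) and assumes batches are transferred back to back at a fixed per-batch cost:

```rust
/// Batched offload latency: evicted blocks are grouped into batches of
/// `offload_batch_size`, each costing `per_batch_transfer_ms`.
/// Hypothetical helper, not the mocker's code.
pub fn batched_transfer_delay_ms(
    evicted_count: usize,
    offload_batch_size: usize,
    per_batch_transfer_ms: f64,
) -> f64 {
    // Treat a zero batch size as 1 instead of dividing by zero.
    let batch = offload_batch_size.max(1);
    // Integer ceil(evicted_count / batch).
    let number_of_batches = (evicted_count + batch - 1) / batch;
    number_of_batches as f64 * per_batch_transfer_ms
}
```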

In `@lib/mocker/src/scheduler/mod.rs`:
- Around line 153-182: The init_kvbm_offline function currently ignores
num_g2_blocks > 0 for non-Vllm engines (Sglang), making invalid KVBM configs
silently accepted; change init_kvbm_offline to fail fast instead of no-op:
update init_kvbm_offline signature to return Result<(), E> (or propagate an
existing error type), check early if args.num_g2_blocks > 0 and match self — if
Self::Vllm proceed as before, but if Self::Sglang return Err (or panic if you
prefer) with a clear message ("KVBM config requires Vllm engine; found Sglang")
so callers/config normalization will catch the invalid config. Ensure references
to init_kvbm_offline, Self::Vllm, Self::Sglang, and args.num_g2_blocks are
updated where this function is called.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c4ea5b97-1fa7-4769-8544-7119c3e31de6

📥 Commits

Reviewing files that changed from the base of the PR and between 07c7cc8 and b7c678b.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • components/src/dynamo/mocker/args.py
  • components/src/dynamo/mocker/config.py
  • components/src/dynamo/mocker/tests/unit/test_config.py
  • lib/bindings/python/Cargo.toml
  • lib/bindings/python/rust/llm/replay.rs
  • lib/bindings/python/src/dynamo/_core.pyi
  • lib/kvbm-logical/src/manager/mod.rs
  • lib/mocker/Cargo.toml
  • lib/mocker/src/common/protocols.rs
  • lib/mocker/src/kv_manager/kvbm_offload.rs
  • lib/mocker/src/kv_manager/mod.rs
  • lib/mocker/src/kv_manager/vllm_backend.rs
  • lib/mocker/src/replay/offline/core.rs
  • lib/mocker/src/replay/offline/state.rs
  • lib/mocker/src/scheduler/mod.rs
  • lib/mocker/src/scheduler/vllm/core.rs
  • lib/mocker/src/scheduler/vllm/live.rs
  • lib/mocker/src/scheduler/vllm/tests.rs

@PeaBrane
Contributor

Would want @jthomson04 to have a look as well, to see if there's any interaction with his remote indexing work.

Contributor

@PeaBrane PeaBrane left a comment


High-level review pass — the diff is a bit hard to read in its current shape, and I'd like the refactors below landed first before I do a more comprehensive review of the featural / logical bits (correctness, causality, test coverage). The architecture itself looks sound; most of the friction is readability and a few parallel-method pairs that double the surface area without adding expressiveness.

  1. Extract KvManager::try_batch_swap_in from lib/mocker/src/kv_manager/vllm_backend.rs:295–431. Return an enum like { NoHits, Scheduled { allocated, defer } } so process() becomes linear again. This is the biggest readability win on the branch.

  2. Extract contiguous_g2_prefix_hits(remaining, batch_results) as a free pure function from vllm_backend.rs:341–354. The batch_idx / FullBlock / first-miss-break logic deserves its own unit test.

  3. KvbmOffloadConfig::from_args(&MockEngineArgs) -> Option<Self>. Dedups live.rs:125–133 and scheduler/mod.rs:168–174. Both sites reconstruct the same config with the same block_size * bpt boilerplate.

  4. Collapse the async/virtual method pairs on MockOffloadEngine (lib/mocker/src/kv_manager/kvbm_offload.rs):

    • enqueue_g1_eviction(bid, sh, now_ms) — one method; branch on self.offload_engine.is_some() internally.
    • start_swap_in(num_blocks, now_ms) — same.
    • Merge MockWorker::transfer_delay and MockOffloadEngine::transfer_delay_ms into one helper (one returns Duration, the other f64 ms — gratuitous).
    • SwapInHandle::is_complete(now_ms) — single method, live ignores now_ms. Kills the two panic paths.

    Cuts the public surface roughly in half and removes the "which mode am I in?" cognitive load at every call site.

  5. Hoist virtual-time bookkeeping onto MockOffloadEngine. Currently KvManager owns pending_offloads + drain_pending_offloads + pending_offload_deadlines + complete_ready_offloads + virtual_time. These are all engine concerns, not cache concerns.

    Shape:

    • engine.record_eviction(bid, sh, now_ms) — does the virtual-time branch internally.
    • engine.tick(now_ms) — called at pass start; replaces complete_ready_offloads.
    • engine.earliest_pending_deadline() -> Option<f64> — feeds the stall-advance in core.rs:471–480.

    Payoff: KvManager stops knowing about virtual time entirely for offloads. The virtual_time: bool flag either moves onto the engine or disappears (inferred from offload_engine being sync vs async). Removes ~50 lines of #[cfg] fields and methods from KvManager.

  6. (Optional extension of #5) Move pending_swap_ins off VllmCore onto the engine too, with engine.tick(now_ms) -> Vec<PromotionReady { uuid, reused_input_tokens }>. Completes the story — all KVBM state lives in one place. VllmCore just iterates the returned promotions, does prepend_waiting, and reports admits. Worth it only if #5 alone still leaves too much KVBM logic in VllmCore.
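Item 2 is the easiest to isolate. A sketch of `contiguous_g2_prefix_hits` as a pure function, assuming a hypothetical `G2Lookup` enum in place of the real batch result type; the first miss ends the reusable prefix, and the count is capped at `remaining`:

```rust
/// Hypothetical per-block result of a G2 lookup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum G2Lookup {
    /// The block is resident in G2 as a full block.
    FullBlock,
    /// Not usable from G2; ends the reusable prefix.
    Miss,
}

/// Number of leading blocks (at most `remaining`) that are G2 hits,
/// stopping at the first miss.
pub fn contiguous_g2_prefix_hits(remaining: usize, batch_results: &[G2Lookup]) -> usize {
    batch_results
        .iter()
        .take(remaining)
        .take_while(|&&r| r == G2Lookup::FullBlock)
        .count()
}
```

With no dependency on `KvManager`, each edge case (leading miss, empty batch, cap at `remaining`) gets a one-line unit test.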

@dreamtalen
Contributor Author

@PeaBrane thanks for the feedback! Makes sense, will ping once refactor is completed

Contributor

@PeaBrane PeaBrane left a comment


Note for posterity — per-worker virtual time only holds because G2 is per-worker

Writing this down explicitly so that a future reader (human or AI) who extends this to shared storage doesn't trip on it. (We should also briefly document this somewhere in the code, if it isn't already.)

Current situation — fine as-is. The offline virtual-time machinery is entirely contained inside one worker: pending_offloads, pending_swap_ins, and BlockManager<G2> all live on that worker's KvManager. No PendingOffload deadline ever needs to be visible to another worker. This works because G2 is modeled as per-worker host memory (each KvManager owns its own BlockManager<G2> sized by num_g2_blocks), so no worker ever needs to query another worker's G2 state. The only externally observable effects — token completion timestamps in the trace and the synthetic Stored/Removed events going to the router — are already routed through existing per-worker pumps (TraceCollector, EnginePassResult.kv_events) and don't require cross-worker coordination. The router tracks per-worker radix trees, so each worker announces its own tier state independently.

This assumption breaks once shared storage is introduced. If someone later adds a G3 tier that's a genuinely shared pool (CXL fabric, RDMA host-memory pool, shared NVMe, NDS-style global cache), worker B at virtual time t will need to be able to observe "block X landed in G3 at t' ≤ t because worker A offloaded it." At that point the pending-completion queue cannot remain a private Vec<PendingOffload> on one KvManager — it has to move up to a shared structure indexed by virtual time. The natural shape is:

  • A single BlockManager<G3> (or equivalent shared map) owned by the offline harness, not per-worker.
  • A global virtual-time event queue of G3 operations keyed by complete_at_ms. Workers append on evict; workers drain up to now_ms at pass-start.
  • G3 find_in_tiers becomes a query against the shared state. "Is this block ready yet?" is answered by whether the global queue has advanced past the block's complete_at_ms.

Architecturally this would look much more like the KV event pump looks today (a shared, virtual-time-ordered stream of tier mutations) than like the current per-worker G2 plumbing. A clean way to hook it in: extend EnginePassResult with something like tier_events: Vec<TierEvent { tier, op, block, complete_at_ms }>, have the offline coordinator merge those into a global priority queue, and re-dispatch completions at the right virtual time. Same pattern as kv_events, just with a different consumer (a shared tier manager instead of the router).
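The global virtual-time queue described above could start as a min-heap keyed on `complete_at_ms`. This is a hypothetical sketch: `TierEvent`, `TierOp`, and `GlobalTierQueue` are illustrative names, and a real harness would also need a deterministic tie-break for events with equal timestamps.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Hypothetical tier operation and event.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TierOp {
    Store,
    Remove,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TierEvent {
    pub worker: usize,
    pub block: u64,
    pub op: TierOp,
    pub complete_at_ms: f64,
}

/// Orders events by completion time using a total order over f64.
struct ByTime(TierEvent);

impl PartialEq for ByTime {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for ByTime {}
impl PartialOrd for ByTime {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for ByTime {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.complete_at_ms.total_cmp(&other.0.complete_at_ms)
    }
}

/// Shared, virtual-time-ordered queue: workers push on evict; the harness
/// drains everything due by `now_ms` at pass start.
#[derive(Default)]
pub struct GlobalTierQueue {
    heap: BinaryHeap<Reverse<ByTime>>, // Reverse turns the max-heap into a min-heap
}

impl GlobalTierQueue {
    pub fn push(&mut self, event: TierEvent) {
        self.heap.push(Reverse(ByTime(event)));
    }

    /// Pops all events with complete_at_ms <= now_ms, earliest first.
    pub fn drain_until(&mut self, now_ms: f64) -> Vec<TierEvent> {
        let mut due = Vec::new();
        while self.heap.peek().map_or(false, |r| (r.0).0.complete_at_ms <= now_ms) {
            if let Some(Reverse(ByTime(event))) = self.heap.pop() {
                due.push(event);
            }
        }
        due
    }

    pub fn len(&self) -> usize {
        self.heap.len()
    }
}
```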

Bottom line: nothing to do here. The per-worker containment is correct for the current model and clean. But we cannot assume it still works once G3 shared storage is added — at that point the virtual-time offload bookkeeping needs to be refactored onto whatever cross-worker time-ordered machinery the harness has, which today is effectively only the router event pump but would need to be generalized.

cc @ryanolson in case you have ideas on parallelization (PDES, parallel discrete-event simulation) over shared block usage / transfer

Contributor

@jthomson04 jthomson04 left a comment


A couple concerns here in terms of correctness.

Offline-mode logic mirrors kvbm internals by hand

build_sync() bypasses OffloadEngine, InstanceLeader, and PipelineBuilder, reimplementing three contracts that will silently drift if kvbm changes:

  • complete_offload mirrors TransferExecutor's post-transfer sequence (allocate_blocks → stage → register_block → drop).
  • scan_matches vs match_blocks — relies on a specific semantic difference between two closely-related kvbm APIs.
  • offload_batch_size is inert offline — delays are computed per single block.
  • No CI test runs the same trace through live + offline and asserts equivalence, so drift would pass unnoticed.

No bandwidth contention

transfer_delay = bytes / bandwidth_gbps is computed per transfer with no shared-resource state. Concurrent offloads and swap-ins all get full peak bandwidth; bursty evictions finish as fast as a single one. Under-estimates TTFT under offload pressure.

GPU slot freed before the offload completes

release_block_id returns the slot to block_id_pool immediately on eviction — long before the simulated host transfer might be finished. The new allocator pulls the same slot right back while the "transfer" is still in flight. Effective G1 capacity is inflated, and the scheduler can admit work that a real system wouldn't. Impacts scheduling decisions, not just timing.
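One way to avoid the inflated G1 capacity is to keep an evicted slot reserved until its simulated transfer completes. A minimal sketch, assuming a hypothetical `BlockIdPool` with an in-flight map from block id to completion time (not the mocker's actual `block_id_pool`):

```rust
use std::collections::HashMap;

/// Hypothetical G1 block-id pool that keeps an evicted slot reserved until
/// its simulated G1->G2 transfer completes.
pub struct BlockIdPool {
    free: Vec<u32>,
    in_flight: HashMap<u32, f64>, // block id -> transfer completion time (ms)
}

impl BlockIdPool {
    pub fn new(num_blocks: u32) -> Self {
        // Reversed so that `pop` hands out id 0 first.
        Self { free: (0..num_blocks).rev().collect(), in_flight: HashMap::new() }
    }

    pub fn allocate(&mut self) -> Option<u32> {
        self.free.pop()
    }

    /// Start an offload: the slot stays unavailable until `complete_at_ms`.
    pub fn evict(&mut self, id: u32, complete_at_ms: f64) {
        self.in_flight.insert(id, complete_at_ms);
    }

    /// Return slots whose transfer finished by `now_ms` to the free list.
    pub fn tick(&mut self, now_ms: f64) {
        let free = &mut self.free;
        self.in_flight.retain(|&id, &mut done_at| {
            if done_at <= now_ms {
                free.push(id);
                false
            } else {
                true
            }
        });
    }
}
```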

Some final thoughts

On benchmarks with high kv pressure or long context, results from offline replay will likely be radically different than reality. This current approach will also make it very difficult to integrate offline replay with G3.

@dreamtalen
Contributor Author

@jthomson04 thanks for the reviews. I'm refactoring with KVBM-logical as the G1 manager, which should unify some offline paths. Will ping you when ready.

@dreamtalen dreamtalen marked this pull request as draft April 20, 2026 21:02
@dreamtalen dreamtalen force-pushed the yongmingd/replay-kvbm-engine-2 branch 3 times, most recently from ed993d8 to 46fd899 on April 28, 2026 20:17
@dreamtalen
Contributor Author

Hi @PeaBrane @jthomson04, I pushed a large refactor based on your feedback. The two biggest changes are:

  • Added a simple processor-sharing bandwidth model, so concurrent transfers on the same link share bandwidth
  • Reworked live/offline replay to use the same kvbm-engine offload path (OffloadEngine + InstanceLeader + PipelineBuilder + mock Worker). Offline now drives the same hot path with virtual time; live drives it with wall-clock time.

This should address most of the previous concerns, but the change is now chunky. Would you mind doing a high-level architecture/readability pass first? I’m also happy to do a quick walkthrough if that’s easier.

@dreamtalen dreamtalen force-pushed the yongmingd/replay-kvbm-engine-2 branch from 46fd899 to 44b0516 on April 28, 2026 21:56
@PeaBrane
Contributor

@dreamtalen can you move this PR back to ready for review and trigger the CIs if needed?

@dreamtalen dreamtalen marked this pull request as ready for review April 29, 2026 18:35
@dreamtalen
Contributor Author

/ok to test 44b0516

Contributor

@PeaBrane PeaBrane left a comment


Approving this iteration. A few follow-ups I would like tracked:

  1. Please hook G2 KV events into the router/storage-tier event protocol. When blocks land in G2, emit HostPinned-tier Stored events; when they leave G2, emit the matching lower-tier Removed events. This can be a separate PR.

  2. In the new transfer hot path, consider using FxHashMap for the TransferId-keyed maps instead of std::HashMap.

  3. cc @rolson for visibility on the network/bandwidth modeling bits; worth coordinating after the velo network math pieces are refactored.
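For follow-up 1, the mapping from G2 mutations to tier-tagged events might look like this. The enum and function names (`StorageTier`, `TierKvEvent`, `on_g2_stored`, `on_g2_removed`) are hypothetical, and the sketch tags both directions with the HostPinned tier; the router's real storage-tier event protocol may use different types, fields, and tier labels for removals.

```rust
/// Hypothetical storage tiers and tier-tagged KV events.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageTier {
    Device,     // G1
    HostPinned, // G2
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TierKvEvent {
    Stored { tier: StorageTier, block_hashes: Vec<u64> },
    Removed { tier: StorageTier, block_hashes: Vec<u64> },
}

/// Blocks that just landed in G2 are announced as HostPinned-tier Stored.
pub fn on_g2_stored(block_hashes: Vec<u64>) -> Option<TierKvEvent> {
    if block_hashes.is_empty() {
        return None; // nothing to announce
    }
    Some(TierKvEvent::Stored { tier: StorageTier::HostPinned, block_hashes })
}

/// Blocks that left G2 are announced as HostPinned-tier Removed.
pub fn on_g2_removed(block_hashes: Vec<u64>) -> Option<TierKvEvent> {
    if block_hashes.is_empty() {
        return None;
    }
    Some(TierKvEvent::Removed { tier: StorageTier::HostPinned, block_hashes })
}
```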

@jthomson04 jthomson04 self-requested a review April 29, 2026 18:53
@dreamtalen dreamtalen force-pushed the yongmingd/replay-kvbm-engine-2 branch from 419c4e3 to 79a4777 on April 29, 2026 19:32
@dreamtalen
Contributor Author

/ok to test 79a4777

@dreamtalen dreamtalen changed the title feat(mocker): KVBM G2 offload for offline replay feat(mocker): KVBM G2 offload for on/offline replay Apr 29, 2026
Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
@dreamtalen dreamtalen force-pushed the yongmingd/replay-kvbm-engine-2 branch from 79a4777 to 23b4fed on April 29, 2026 21:33
@dreamtalen
Contributor Author

/ok to test 23b4fed

@dreamtalen dreamtalen merged commit f332454 into main Apr 30, 2026
221 of 227 checks passed
@dreamtalen dreamtalen deleted the yongmingd/replay-kvbm-engine-2 branch April 30, 2026 00:15
furionw pushed a commit that referenced this pull request May 2, 2026
Signed-off-by: Yongming Ding <yongmingd@nvidia.com>