chore(kv-router): make sequences stop doing token math#8260
Conversation
Signed-off-by: PeaBrane <yanrpei@gmail.com>
WalkthroughThe changes remove Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lib/kv-router/src/scheduling/queue.rs (1)
300-334: Consider centralizingPrefillLoadHintconstruction.This
effective_isl/ estimator-fallback logic is now copied here, inlib/llm/src/kv_router.rs, and inlib/mocker/src/replay/offline/components/router.rs. A shared helper indynamo_kv_routerwould make the token math and warning behavior much less likely to drift across the three admission paths.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/kv-router/src/scheduling/queue.rs` around lines 300 - 334, The PrefillLoadHint construction and effective_isl/estimator-fallback logic in prefill_load_hint_for is duplicated across modules; factor this into a single helper in the dynamo_kv_router crate (e.g., a function like compute_prefill_load_hint or PrefillLoadHint::from_params) that accepts isl_tokens, overlap_blocks, block_size, track_prefill_tokens and an Option<&PrefillLoadEstimator>, performs the prefix/effective_isl math, calls estimator.predict_prefill_duration, logs the warning on Err, and returns Option<PrefillLoadHint>; then replace the in-place logic in prefill_load_hint_for and the other two call sites to call this new helper, keeping the same behavior and tracing messages (use the existing symbols prefill_load_estimator, predict_prefill_duration, PrefillLoadHint).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lib/kv-router/src/scheduling/queue.rs`:
- Around line 300-334: The PrefillLoadHint construction and
effective_isl/estimator-fallback logic in prefill_load_hint_for is duplicated
across modules; factor this into a single helper in the dynamo_kv_router crate
(e.g., a function like compute_prefill_load_hint or
PrefillLoadHint::from_params) that accepts isl_tokens, overlap_blocks,
block_size, track_prefill_tokens and an Option<&PrefillLoadEstimator>, performs
the prefix/effective_isl math, calls estimator.predict_prefill_duration, logs
the warning on Err, and returns Option<PrefillLoadHint>; then replace the
in-place logic in prefill_load_hint_for and the other two call sites to call
this new helper, keeping the same behavior and tracing messages (use the
existing symbols prefill_load_estimator, predict_prefill_duration,
PrefillLoadHint).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 20ea3008-6bb0-44bb-b1bb-099c94c1adcc
📒 Files selected for processing (10)
lib/bench/kv_router/active_sequences_bench.rslib/kv-router/src/protocols.rslib/kv-router/src/scheduling/queue.rslib/kv-router/src/sequences/README.mdlib/kv-router/src/sequences/multi_worker.rslib/kv-router/src/sequences/single.rslib/kv-router/src/sequences/topology.rslib/llm/src/kv_router.rslib/llm/src/kv_router/sequence.rslib/mocker/src/replay/offline/components/router.rs
💤 Files with no reviewable changes (1)
- lib/kv-router/src/protocols.rs
Signed-off-by: PeaBrane <yanrpei@gmail.com>
…ompt trie Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
This moves prefill token math out of the sequence write model so callers compute
PrefillLoadHintonce and the sequence path just applies it. It also adds a short README for the local sequence DAG and its intentionally eventually consistent read projection.flowchart TD A["Routing event<br/>AddRequest / MarkPrefillCompleted / Free"] B["WorkerTable + RequestIndex<br/>find the authoritative worker-local state"] C["ActiveSequences<br/>authoritative write model"] D["PromptRegistry<br/>derived read model"] E["Scheduler reads projected load"] A --> B B --> C C --> D D -. read .-> ESummary by CodeRabbit
Documentation
Refactor
Tests