
chore(kv-router): make sequences stop doing token math #8260

Merged
PeaBrane merged 8 commits into main from rupei/sequence-prefill-hints-dag on Apr 17, 2026


@PeaBrane PeaBrane commented Apr 16, 2026

This moves prefill token math out of the sequence write model so callers compute PrefillLoadHint once and the sequence path just applies it. It also adds a short README for the local sequence DAG and its intentionally eventually consistent read projection.

flowchart TD
    A["Routing event<br/>AddRequest / MarkPrefillCompleted / Free"]
    B["WorkerTable + RequestIndex<br/>find the authoritative worker-local state"]
    C["ActiveSequences<br/>authoritative write model"]
    D["PromptRegistry<br/>derived read model"]
    E["Scheduler reads projected load"]

    A --> B
    B --> C
    C --> D
    D -. read .-> E
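To make the flow in the diagram concrete, here is a minimal sketch of a write model that applies a precomputed hint and a read model derived from it. Everything here is illustrative: `WriteModel`, `ReadModel`, `RoutingEvent`, the `u64` request ids, and the field types on `PrefillLoadHint` are assumptions made for this example, not the actual `ActiveSequences` or `PromptRegistry` types.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Stand-in for the PR's `PrefillLoadHint`. Field names come from the PR;
/// the field types are assumed for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PrefillLoadHint {
    pub initial_effective_prefill_tokens: usize,
    pub expected_prefill_duration: Option<Duration>,
}

/// Routing events named in the flowchart, with simplified (hypothetical) payloads.
pub enum RoutingEvent {
    AddRequest { request_id: u64, hint: Option<PrefillLoadHint> },
    MarkPrefillCompleted { request_id: u64 },
    Free { request_id: u64 },
}

/// Toy write model: it stores the hint it is given and does no token math.
#[derive(Default)]
pub struct WriteModel {
    prefill: HashMap<u64, PrefillLoadHint>,
}

/// Toy read model: a snapshot derived from the write model.
#[derive(Debug, Default, PartialEq)]
pub struct ReadModel {
    pub pending_prefill_tokens: usize,
}

impl WriteModel {
    /// Apply one routing event to the authoritative state.
    pub fn apply(&mut self, event: RoutingEvent) {
        match event {
            RoutingEvent::AddRequest { request_id, hint } => {
                // The caller already computed the hint; just record it.
                if let Some(hint) = hint {
                    self.prefill.insert(request_id, hint);
                }
            }
            RoutingEvent::MarkPrefillCompleted { request_id }
            | RoutingEvent::Free { request_id } => {
                // Either event means the request no longer has pending prefill.
                self.prefill.remove(&request_id);
            }
        }
    }

    /// Build a fresh projection. A snapshot taken earlier is not updated by
    /// later events, which is the sense in which reads are eventually consistent.
    pub fn project(&self) -> ReadModel {
        ReadModel {
            pending_prefill_tokens: self
                .prefill
                .values()
                .map(|h| h.initial_effective_prefill_tokens)
                .sum(),
        }
    }
}
```

In this sketch a scheduler holding an older `ReadModel` keeps seeing the old load until the projection is refreshed, while the write model is always current.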

Summary by CodeRabbit

  • Documentation

    • Added architectural documentation for the sequence state model, detailing the routing state pipeline and consistency guarantees.
  • Refactor

    • Refactored prefill load tracking to use a unified hint structure instead of separate field parameters, improving state management consistency across the system.
  • Tests

    • Updated test suite to align with refactored prefill load hint handling.

Signed-off-by: PeaBrane <yanrpei@gmail.com>
@PeaBrane PeaBrane requested a review from a team as a code owner April 16, 2026 02:27
@github-actions github-actions Bot added the chore, documentation, and router labels Apr 16, 2026

coderabbitai Bot commented Apr 16, 2026

Walkthrough

The changes remove isl and overlap fields from SequenceRequest and ActiveSequenceEventData::AddRequest, consolidating prefill load metadata into a PrefillLoadHint structure containing initial_effective_prefill_tokens and expected_prefill_duration. Related prefill estimation logic is refactored across multiple modules, and test callsites are updated to use the new field structure.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Core Struct & Enum Updates**<br>`lib/kv-router/src/protocols.rs`, `lib/kv-router/src/sequences/multi_worker.rs` | Removed `isl: usize` and `overlap: u32` fields from `SequenceRequest` and the `ActiveSequenceEventData::AddRequest` enum variant payload. `SequenceRequest` now carries prefill metadata solely through `prefill_load_hint`. |
| **Prefill Load Estimation Refactoring**<br>`lib/kv-router/src/scheduling/queue.rs`, `lib/llm/src/kv_router.rs`, `lib/mocker/src/replay/offline/components/router.rs` | Refactored `prefill_load_hint_for` to compute `expected_prefill_duration` via a nested match on estimator and prediction results, consolidating error handling into a single warning log with unified `PrefillLoadHint` construction. Removed `isl` and `overlap` field assignments in `SequenceRequest` creation. |
| **Sequence Management Logic**<br>`lib/kv-router/src/sequences/single.rs`, `lib/kv-router/src/sequences/topology.rs` | Removed the `add_request(...)` test method and the `new_tokens(...)` utility method. Updated the `add_request_with_prefill_tracking(...)` signature to remove the `isl` and `overlap` parameters. Test callsites refactored to compute prefill tokens via the added `added_prefill_tokens(...)` helper and explicit `PrefillLoadHint` construction. The `block_size` field moved to test-only scope within `ActiveSequences`. |
| **Test Updates & Benchmarks**<br>`lib/bench/kv_router/active_sequences_bench.rs`, `lib/llm/src/kv_router/sequence.rs` | Updated bench and integration test code to remove `isl`/`overlap` field assignments and construct `prefill_load_hint` via the `tracking_hint(...)` helper with `initial_effective_prefill_tokens` and `expected_prefill_duration: None`. |
| **Documentation**<br>`lib/kv-router/src/sequences/README.md` | Added comprehensive documentation of the Sequence State Model describing routing event pipelines, authority boundaries across `topology.rs`, `request_maps.rs`, `single.rs`, and `prompt_registry.rs`, and eventual consistency semantics for prefill tracking updates. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | Title accurately describes the main change: moving prefill token calculations out of the sequence write model. |
| Description check | ✅ Passed | Description covers objectives and includes a diagram, but lacks specific details about which files changed and guidance for reviewer focus areas. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 84.09%, which is sufficient. The required threshold is 80.00%. |



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
lib/kv-router/src/scheduling/queue.rs (1)

300-334: Consider centralizing PrefillLoadHint construction.

This effective_isl / estimator-fallback logic is now copied here, in lib/llm/src/kv_router.rs, and in lib/mocker/src/replay/offline/components/router.rs. A shared helper in dynamo_kv_router would make the token math and warning behavior much less likely to drift across the three admission paths.
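One possible shape for such a shared helper is sketched below. It is not the code in the PR: the function name `compute_prefill_load_hint` is taken from the suggestion in this review, while the trait signature for `predict_prefill_duration`, the `String` error type, the field types, the formula `effective_isl = isl_tokens - overlap_blocks * block_size`, and the `PerTokenEstimator` example are all assumptions. The sketch uses `eprintln!` where the real code would presumably emit a tracing warning.

```rust
use std::time::Duration;

/// Stand-in for the PR's `PrefillLoadHint`; field types are assumed.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PrefillLoadHint {
    pub initial_effective_prefill_tokens: usize,
    pub expected_prefill_duration: Option<Duration>,
}

/// Assumed estimator interface; the real `PrefillLoadEstimator` may differ.
pub trait PrefillLoadEstimator {
    fn predict_prefill_duration(&self, effective_isl: usize) -> Result<Duration, String>;
}

/// Hypothetical estimator used only to exercise the helper.
pub struct PerTokenEstimator {
    pub per_token: Duration,
}

impl PrefillLoadEstimator for PerTokenEstimator {
    fn predict_prefill_duration(&self, effective_isl: usize) -> Result<Duration, String> {
        if effective_isl > u32::MAX as usize {
            return Err("effective_isl does not fit in u32".to_string());
        }
        self.per_token
            .checked_mul(effective_isl as u32)
            .ok_or_else(|| "duration overflow".to_string())
    }
}

/// Single place for the token math and the estimator fallback.
pub fn compute_prefill_load_hint(
    isl_tokens: usize,
    overlap_blocks: u32,
    block_size: usize,
    track_prefill_tokens: bool,
    estimator: Option<&dyn PrefillLoadEstimator>,
) -> Option<PrefillLoadHint> {
    if !track_prefill_tokens {
        return None;
    }

    // Tokens already covered by cached prefix blocks need no prefill (assumed formula).
    let prefix_tokens = (overlap_blocks as usize).saturating_mul(block_size);
    let effective_isl = isl_tokens.saturating_sub(prefix_tokens);

    // Nested match: no estimator, or a failed prediction, both fall back to `None`.
    let expected_prefill_duration = match estimator {
        None => None,
        Some(est) => match est.predict_prefill_duration(effective_isl) {
            Ok(duration) => Some(duration),
            Err(err) => {
                eprintln!("warning: prefill duration prediction failed: {err}");
                None
            }
        },
    };

    Some(PrefillLoadHint {
        initial_effective_prefill_tokens: effective_isl,
        expected_prefill_duration,
    })
}
```

With a helper of this shape, each admission path would pass its own inputs and receive the same hint for the same inputs, so the subtraction and the warning behavior live in one place.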

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/kv-router/src/scheduling/queue.rs` around lines 300 - 334, The
PrefillLoadHint construction and effective_isl/estimator-fallback logic in
prefill_load_hint_for is duplicated across modules; factor this into a single
helper in the dynamo_kv_router crate (e.g., a function like
compute_prefill_load_hint or PrefillLoadHint::from_params) that accepts
isl_tokens, overlap_blocks, block_size, track_prefill_tokens and an
Option<&PrefillLoadEstimator>, performs the prefix/effective_isl math, calls
estimator.predict_prefill_duration, logs the warning on Err, and returns
Option<PrefillLoadHint>; then replace the in-place logic in
prefill_load_hint_for and the other two call sites to call this new helper,
keeping the same behavior and tracing messages (use the existing symbols
prefill_load_estimator, predict_prefill_duration, PrefillLoadHint).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 20ea3008-6bb0-44bb-b1bb-099c94c1adcc

📥 Commits

Reviewing files that changed from the base of the PR and between 7d46806 and 460a78a.

📒 Files selected for processing (10)
  • lib/bench/kv_router/active_sequences_bench.rs
  • lib/kv-router/src/protocols.rs
  • lib/kv-router/src/scheduling/queue.rs
  • lib/kv-router/src/sequences/README.md
  • lib/kv-router/src/sequences/multi_worker.rs
  • lib/kv-router/src/sequences/single.rs
  • lib/kv-router/src/sequences/topology.rs
  • lib/llm/src/kv_router.rs
  • lib/llm/src/kv_router/sequence.rs
  • lib/mocker/src/replay/offline/components/router.rs
💤 Files with no reviewable changes (1)
  • lib/kv-router/src/protocols.rs

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
@PeaBrane PeaBrane merged commit d94b350 into main Apr 17, 2026
90 checks passed
@PeaBrane PeaBrane deleted the rupei/sequence-prefill-hints-dag branch April 17, 2026 22:20

Labels

chore, documentation, router, size/XL


2 participants