fix(prefill-router): persist endpoint after handshake so decode rebuild reactivates by yifjiang · Pull Request #8965 · ai-dynamo/dynamo

yifjiang · 2026-05-01T03:09:22Z

Summary

Fixes a routing-chain stall in the disaggregated frontend that turns transient decode-pod restarts into permanent inference outages. After all decode pods in a namespace briefly disappear (canary livenessProbe kills two replicas in quick succession, or a manual rollout), every subsequent inference request returns:

HTTP 500 ValueError: Disaggregated params are required for decode mode

…until the frontend pod itself is restarted. Pods stay 1/1 Running, prefill workers are clean, but routing has degenerated to aggregated mode (bare requests forwarded to decode workers) and stays that way.

The PR also addresses a corner case discovered during fix iteration: decode registers before prefill, then is removed before prefill ever arrives — leaving a stale DecodeWaiting entry that orphans the next decode rebuild.

Two patterns, both fixed

Pattern 1 — decode-rebuild stall after handshake completes

decode registers → prefill registers → handshake completes (map empty)
→ all decode pods removed (canary kills both, or kubectl delete)
→ new decode pods register → rebuild's PrefillRouter stalls in DecodeWaiting forever

Post-handshake the activator map was empty, so the rebuild's register_prefill_router fell into the None branch, created a fresh DecodeWaiting(tx), and waited forever (prefill workers hadn't changed; nobody calls activate_prefill_router again). PrefillRouter::generate falls back to aggregated mode and forwards bare requests to decode → trtllm 500s, vllm/sglang silent agg-mode degradation.

Pattern 2 — stale DecodeWaiting after decode-first race

decode registers first → activator stores DecodeWaiting(sender)
→ decode WorkerSet removed before prefill ever arrives
→ activator state preserved on decode teardown (good for Pattern 1's PrefillReady cache)
  …but DecodeWaiting's receiver is now dead (held by the dropped PrefillRouter)
→ new decode pod registers → register_prefill_router sees stale DecodeWaiting → returns None
→ rebuilt WorkerSet has no PrefillRouter
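
Both failure sequences can be replayed in a toy model. The sketch below is a simplified, std-only stand-in and not the code in model_manager.rs: `ToyActivators`, its `register`/`activate` methods, the `String` key and the `Endpoint` struct are all hypothetical, and `std::sync::mpsc` replaces the oneshot channel. It assumes the behaviour described above: a completed handshake leaves the map empty, and a stored waiter blocks any later registration for the same key.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the real `Endpoint` type.
#[derive(Clone, Debug, PartialEq)]
pub struct Endpoint(pub String);

/// Toy activator map that stores only waiting decode senders.
#[derive(Default)]
pub struct ToyActivators {
    waiting: HashMap<String, Sender<Endpoint>>,
}

impl ToyActivators {
    /// Decode side: park a sender and return the receiver, or `None` if a
    /// waiter is already stored for this key.
    pub fn register(&mut self, key: &str) -> Option<Receiver<Endpoint>> {
        if self.waiting.contains_key(key) {
            return None;
        }
        let (tx, rx) = channel();
        self.waiting.insert(key.to_string(), tx);
        Some(rx)
    }

    /// Prefill side: wake the waiter and remember nothing afterwards.
    pub fn activate(&mut self, key: &str, endpoint: Endpoint) {
        if let Some(tx) = self.waiting.remove(key) {
            // A send error only means the waiting router is already gone.
            let _ = tx.send(endpoint);
        }
    }
}
```

Pattern 1 in this model: `register`, then `activate`, then `register` again returns a receiver that never yields, because nothing calls `activate` a second time. Pattern 2: `register`, drop the receiver without removing the entry, then `register` again returns `None`.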

How to reproduce

Both patterns can be exercised on a single Kubernetes namespace running 1 prefill + 1–2 decode workers + a frontend. No special engine config required.

Pattern 1 reproducer (decode-rebuild stall)

Setup: 1P + 2D + canary livenessProbe wired to /health. Required env on each worker: DYN_HEALTH_CHECK_ENABLED=true, DYN_CANARY_WAIT_TIME=10, DYN_HEALTH_CHECK_REQUEST_TIMEOUT=3, plus livenessProbe: {httpGet: {path: /health, port: system}, failureThreshold: 6, periodSeconds: 10}. Apply the DGD and wait for all 4 pods Ready, then 60 s for worker discovery to settle.

# Pre-trigger sanity probe
curl -X POST http://<frontend>/v1/chat/completions -d '{...}'    # → HTTP 200

# Trigger: kubectl-delete BOTH decodes simultaneously
kubectl delete pod decodeworker-0 decodeworker-1 --grace-period=0 --force

# Wait ~2 min for replacement decodes to be Ready, then probe again:
curl -X POST http://<frontend>/v1/chat/completions -d '{...}'
# without fix: HTTP 500 ValueError: Disaggregated params are required for decode mode
# with fix:    HTTP 200 ✓

In production this trigger fires when the canary livenessProbe kills both decode replicas in close succession (we observed ~1.5 min apart on a real incident), bringing the namespace's decode count to 0 briefly.

Pattern 2 reproducer (decode-first race / stale DecodeWaiting)

Setup: 1P + 1D + frontend. Make the PREFILL image temporarily invalid (e.g., nvcr.io/.../dynamo-trtllm:DOESNOTEXIST) so it never registers; decode + frontend images are real. Apply the DGD and wait for frontend + decode Ready (prefill stays in ImagePullBackOff). The activator now holds DecodeWaiting — decode arrived first, is waiting for prefill to register.

# Trigger: kubectl-delete the decode pod while still DecodeWaiting
kubectl delete pod decodeworker-0 --grace-period=0 --force

# Wait for replacement decode Ready. Now patch the prefill image to valid:
kubectl patch dgd <name> --type=json -p \
  '[{"op":"replace","path":"/spec/services/PrefillWorker/extraPodSpec/mainContainer/image","value":"nvcr.io/.../dynamo-trtllm:<real-tag>"}]'

# Wait for prefill Ready, then probe:
curl -X POST http://<frontend>/v1/chat/completions -d '{...}'
# without fix: HTTP 500 ValueError: Disaggregated params are required for decode mode
# with fix:    HTTP 200 ✓

Frontend log signatures

| State | Cold-start handshake | After teardown | Recovery |
| --- | --- | --- | --- |
| Without fix (P1) | `Activated prefill router` + `Activating prefill router` | `Removed WorkerSet (no remaining instances in namespace)` | (silent: no second `Activating prefill router` ever) |
| With fix (P1) | same | same | `Activating prefill router` + `Prefill router activated successfully` (cache hand-off) |
| Without fix (P2) | `No prefill endpoint for namespace yet, storing sender` | `Removed WorkerSet` | `ERROR Decode WorkerSet already registered for this prefill router`; rebuilt WorkerSet has no PrefillRouter |
| With fix (P2) | same | `Removed stale DecodeWaiting activator on decode WorkerSet teardown` (debug) | second `Activating prefill router` after prefill arrives |

Fix (8 commits)

| Commit | What |
| --- | --- |
| `cde9b578a6` | `activate_prefill_router` DecodeWaiting branch: persist `PrefillReady` after waking the original receiver |
| `ec33872bdb` | watcher: scope `remove_prefill_activator` to prefill-component teardown only |
| `07142a7d6a` | refactor `PrefillReady(oneshot::Receiver<Endpoint>)` → `PrefillReady(Endpoint)`; consumer hands out fresh receiver synthesized from cache and re-inserts atomically |
| `d4088241c0` | `PrefillReady(Endpoint)` → `PrefillReady(Box<Endpoint>)` for `clippy::large_enum_variant` |
| `3bc7891aff` | cargo fmt for boxed inserts |
| `ef1dadcdf5` | add `remove_decode_prefill_waiter` helper + watcher decode-teardown call (Pattern 2 fix) |
| `47b6f0b91c` | refactor both functions to `Entry` API for atomic per-key state transitions; ungate decode-waiter cleanup on `removed.is_some()` (addresses dynamo-ops review) |
| `96ecac2204` | cargo fmt for entry-API refactor |

State-machine invariants are now unambiguous:

DecodeWaiting(sender) = a live decode router is waiting on the receiver — cleared on decode teardown
PrefillReady(Box<Endpoint>) = a prefill endpoint is cached for future decode rebuilds — cleared only on prefill teardown
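
The two invariants correspond to two cleanup helpers with different reach. The following is a minimal sketch under stated assumptions: the helper names come from this PR, but the bodies, the `bool` return values, the `Activators` alias and the `Endpoint` stand-in are hypothetical, and a plain `HashMap` replaces the concurrent map used in the real code.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Sender;

/// Hypothetical stand-in for the real `Endpoint` type.
pub struct Endpoint(pub String);

pub enum PrefillActivationState {
    /// A live decode router is waiting on the matching receiver.
    DecodeWaiting(Sender<Endpoint>),
    /// A prefill endpoint is cached for future decode rebuilds.
    PrefillReady(Box<Endpoint>),
}

/// Assumed shape of the activator map: one entry per key.
pub type Activators = HashMap<String, PrefillActivationState>;

/// Decode teardown: drop the entry only if it is a waiting decode sender.
/// A cached `PrefillReady` entry is left untouched. Returns whether an
/// entry was removed.
pub fn remove_decode_prefill_waiter(map: &mut Activators, key: &str) -> bool {
    if matches!(map.get(key), Some(PrefillActivationState::DecodeWaiting(_))) {
        map.remove(key);
        true
    } else {
        false
    }
}

/// Prefill teardown: the cached endpoint is stale, so drop whatever is stored.
pub fn remove_prefill_activator(map: &mut Activators, key: &str) -> bool {
    map.remove(key).is_some()
}
```

Calling `remove_decode_prefill_waiter` on a vacant key or on a `PrefillReady` entry is a no-op in this sketch, which is the property that lets decode teardown call it unconditionally.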

Verification

End-to-end verified on head-tot-rc13-pr8965-2026-05-02-x86-b200:

| # | Reproducer | Result |
| --- | --- | --- |
| 1 | burst-sweep, canary OFF | 3/3 RECOVERY at T+30 / 90 / 120 s ✅ |
| 2 | multi-decode restart (Pattern 1) | 10 / 10 ✅ |
| 3 | decode-first race (Pattern 2) | 5 / 5 ✅ |
| 4 | burst-sweep, canary ON | 3/3 RECOVERY at T+60 / 120 / 180 s ✅ |

Without these fixes, Patterns 1 and 2 produce 100% HTTP 500 "Disaggregated params are required for decode mode" (trtllm) or silent agg-mode degradation (vllm/sglang). The latest two commits (entry-API atomicity + ungated cleanup) are logic-equivalent to the prior verified head; re-verification on a fresh image build is pending.

Test plan

- Manual reproduction on a real Kubernetes cluster across all four reproducers; this confirms each fires before the fix and recovers after.
- Unit test for the new PrefillReady-branch refresh-and-reactivate behavior. Currently blocked on the existing test-mod note ("activate_prefill_router requires an Endpoint, so we test the registration state machine and cleanup only", model_manager.rs:1094-1095). A small #[cfg(test)] shim that constructs a stub Endpoint would unblock this.
- Integration test in tests/fault_tolerance/ mirroring test_canary_rank_pause.py: spin up a real engine + decode worker, kill the decode rank, wait for restart, send an inference request, assert it returns 200.
- Run with --enforce-disagg to confirm the existing strict-mode error path still fires correctly when prefill is actually gone (this PR shouldn't change that).

Related

The rebuild trigger pattern was first observed in production with head-tot-rc13-pr13495 on the GB200 NVCF disagg shadow function: canary livenessProbe killed both decode replicas within ~1.5 minutes of each other, bringing the namespace's decode count to 0 briefly. After the new pods came up healthy, every subsequent request hit this bug for the next 75 minutes until the function was redeployed via NVCF version cutover.

Pre-fix mitigations (without kubectl access in NVCF): raise DYN_HEALTH_CHECK_REQUEST_TIMEOUT above prefill P99 (≥ 130 s for the prod 480B), or drop the canary livenessProbe entirely. Both remove the trigger but don't fix the underlying state-machine gap.

🤖 Generated with Claude Code

Summary by CodeRabbit

Refactor
- Optimized internal routing state machine to improve endpoint caching efficiency and reduce unnecessary re-registrations during system operations.
- Enhanced worker set deletion cleanup logic to ensure more precise resource management and system stability.

…ld reactivates

When all decode pods in a namespace temporarily go away (e.g., the DYN_HEALTH_CHECK_ENABLED canary livenessProbe kills two replicas in quick succession, or a manual rollout deletes the decode pods), the watcher tears down the decode WorkerSet via remove_worker_set + remove_prefill_activator. Once the new decode pods register fresh instance_ids in ETCD, the watcher rebuilds the WorkerSet. The new PrefillRouter calls register_prefill_router, which now finds the activator map empty and falls into the None branch, returning a fresh DecodeWaiting(tx). Prefill workers haven't changed, so no second activate_prefill_router call ever fires. The new PrefillRouter sits on that oneshot::Receiver forever; prefill_router OnceLock stays None; PrefillRouter::generate falls back to aggregated mode and forwards bare requests to decode workers, which reject them with "Disaggregated params are required for decode mode" until the frontend pod itself is restarted.

The fix mirrors what the existing prefill-rejoin path at lines 740-770 already does: persist the prefill endpoint as PrefillReady(rx) after a successful handshake so future decode WorkerSet rebuilds find it and activate immediately. Two changes:

1. DecodeWaiting branch: after waking the original receiver, also store a fresh PrefillReady(rx) carrying a clone of the endpoint. Endpoint is #[derive(Clone)].
2. PrefillReady branch: previously errored with "already activated". Change to refresh the cached endpoint and reactivate any deactivated decode-side router. This handles two cases: (a) duplicate activate_prefill_router calls, (b) prefill rejoin after the DecodeWaiting branch's persist step has left a PrefillReady entry.

The None branch (lines 740-789) is unchanged and still handles the case where the activator map was cleared between the original handshake and a prefill rejoin.

Manually reproduced on dynamo-trtllm:head-tot-rc13-2026-04-29-x86-b200 in a 1P+2D Kubernetes deployment. After "kubectl delete pod" of both decode pods, the post-restart inference success rate is 0/10 (HTTP 500 ValueError: Disaggregated params required for decode mode). With this patch the same scenario should return to 1/1 readiness with inference probes succeeding immediately after the new decode pods register their instance_ids.

Verification:

- An integration test in tests/fault_tolerance/ should drive the full end-to-end scenario (real engine, kill decode rank, verify inference succeeds after rebuild). This is left for a follow-up since the existing test_canary_rank_pause.py harness only covers the canary's pause detection and not the WorkerSet-rebuild routing path.
- Rust unit tests for the new PrefillReady-branch behavior are also follow-up work; they need a way to construct a stub Endpoint without a DistributedRuntime.

github-actions · 2026-05-01T03:09:31Z

🌿 Fern Docs Preview: https://nvidia-preview-81d7f35c-99c9-4cb0-a5c9-2c8140580363.docs.buildwithfern.com/dynamo/dev

Companion to the activate_prefill_router persist-on-handshake fix in the previous commit. That commit caches the prefill endpoint as PrefillReady(rx) in prefill_router_activators after a successful handshake so future decode WorkerSet rebuilds find it and activate immediately. But the watcher's component-teardown path (handle_delete_helper) was clearing prefill_router_activators on ANY WorkerSet removal, including decode-component removal, which wiped the cache exactly when we needed it.

Empirically verified on a 1P+2D Kubernetes deployment with the activate_prefill_router fix alone: after `kubectl delete pod` of both decode pods, the frontend log shows the cached PrefillReady was set during cold start ("Activated prefill router for decode WorkerSet") and then cleared by the watcher ("Removed WorkerSet (no remaining instances in namespace)"). Post-rebuild inference returns 0/10 with "Disaggregated params required for decode mode", the same as without the fix.

Move remove_prefill_activator inside the `supports_prefill()` branch so it only fires when the prefill component is gone (where the cached endpoint is genuinely stale). Decode-component teardown leaves the cache intact; the next handle_add that creates a new decode WorkerSet calls register_prefill_router and finds PrefillReady, activating without needing prefill workers to re-register.

Both the activator cleanup AND the existing decode-side deactivate_prefill_router_for_decode call are now under the same prefill-teardown guard, which is the correct semantic: neither should fire on decode-side removal.

…persists

The previous v2 fix (commit cde9b57 + ec33872) only kept PrefillReady alive across decode rebuilds when decode happened to register first at cold start. When prefill won the race instead, the existing register_prefill_router PrefillReady-arm consumed the cached oneshot::Receiver and emptied the map, leaving the same buggy state.

Empirically reproduced on -v2 image: frontend log shows "Stored prefill endpoint for future decode WorkerSet registration" (the prefill-first path) at cold start, then post-rebuild requests still return 0/10 with the disagg-params error.

The right fix is to cache the Endpoint directly instead of wrapping it in a oneshot::Receiver:

      enum PrefillActivationState {
    -     PrefillReady(oneshot::Receiver<Endpoint>),
    +     PrefillReady(Endpoint),
      }

The Endpoint is Clone (#[derive(Debug, Clone)]), so register_prefill_router synthesizes a fresh oneshot::channel and primes the receiver with a clone of the cached endpoint, then re-inserts PrefillReady with the endpoint so the next call finds it again. activate_prefill_router inserts PrefillReady(endpoint) directly without the oneshot dance.

Both race winners now leave a durable PrefillReady cache:

- Decode arrives first → activate_prefill_router DecodeWaiting branch wakes original receiver, then inserts PrefillReady(endpoint)
- Prefill arrives first → activate_prefill_router None branch inserts PrefillReady(endpoint); decode's register_prefill_router consumes, hands out fresh rx with clone, AND re-inserts PrefillReady(endpoint)

Decode rebuild (next register_prefill_router for the same key) finds PrefillReady regardless of cold-start race, hands out a fresh rx, and re-caches. Prefill teardown still clears the cache via remove_prefill_activator (in the watcher's prefill-only branch from the previous commit).

Companion to ec33872 (watcher: scope remove_prefill_activator to prefill teardown). Both commits together fix the "all decodes restart → frontend stuck in aggregated mode" wedge.
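
The cached-endpoint hand-off described in this commit can be sketched as follows. This is an illustration under assumptions, not the code in model_manager.rs: the `Activators` struct, the `String` key and the `Endpoint` stand-in are hypothetical, `std::sync::mpsc` replaces the oneshot channel, and a plain `HashMap` replaces the concurrent map. The sketch also reads the cached endpoint in place instead of removing and re-inserting it.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the real `Endpoint` type.
#[derive(Clone, Debug, PartialEq)]
pub struct Endpoint(pub String);

pub enum PrefillActivationState {
    DecodeWaiting(Sender<Endpoint>),
    PrefillReady(Endpoint),
}

#[derive(Default)]
pub struct Activators {
    map: HashMap<String, PrefillActivationState>,
}

impl Activators {
    /// Decode side. Returns a receiver that yields the prefill endpoint.
    pub fn register_prefill_router(&mut self, key: &str) -> Option<Receiver<Endpoint>> {
        match self.map.entry(key.to_string()) {
            Entry::Vacant(slot) => {
                // Decode arrived first: park a sender until prefill shows up.
                let (tx, rx) = channel();
                slot.insert(PrefillActivationState::DecodeWaiting(tx));
                Some(rx)
            }
            Entry::Occupied(slot) => match slot.get() {
                PrefillActivationState::PrefillReady(endpoint) => {
                    // Cache hit: prime a fresh channel with a clone and
                    // leave the cached endpoint in the map for next time.
                    let (tx, rx) = channel();
                    tx.send(endpoint.clone()).ok()?;
                    Some(rx)
                }
                // A decode router is already waiting on this key.
                PrefillActivationState::DecodeWaiting(_) => None,
            },
        }
    }

    /// Prefill side. Caches the endpoint, then wakes a waiting decode router.
    pub fn activate_prefill_router(&mut self, key: &str, endpoint: Endpoint) {
        let previous = self.map.insert(
            key.to_string(),
            PrefillActivationState::PrefillReady(endpoint.clone()),
        );
        if let Some(PrefillActivationState::DecodeWaiting(tx)) = previous {
            // A send error only means the waiting router is already gone.
            let _ = tx.send(endpoint);
        }
    }
}
```

In this model every `register_prefill_router` call after the first `activate_prefill_router` gets an already-primed receiver, whichever side won the cold-start race, so a rebuilt decode side does not depend on prefill registering again.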

coderabbitai · 2026-05-01T17:36:09Z

Walkthrough

Refactors the prefill/decode rendezvous state machine from storing oneshot::Receiver to caching actual Endpoint objects. Updates deletion cleanup logic to distinguish whether a WorkerSet was removed and whether the removed component is the PREFILL component, applying cache clearing only when appropriate.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| State Machine Refactoring (`lib/llm/src/discovery/model_manager.rs`) | Refactors prefill/decode rendezvous from storing `oneshot::Receiver` to caching `Endpoint` objects. `register_prefill_router` now creates fresh oneshot channels on cache hits and reinserts cached endpoints. `activate_prefill_router` caches the endpoint after the `DecodeWaiting` handshake; on an existing `PrefillReady` entry it refreshes the cached endpoint and reactivates any deactivated decode-side router. Removes oneshot sender/channel plumbing in non-cached paths. |
| Cleanup Logic Refinement (`lib/llm/src/discovery/watcher.rs`) | Updates deletion cleanup to distinguish whether a `WorkerSet` was actually removed and whether the removed component is `PREFILL`. Activator cache is now cleared only when both conditions are met (deletion occurred and component supports prefill), while preserving decode-side deactivation behavior. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix(prefill-router): persist endpoint after handshake so decode rebuild reactivates' accurately summarizes the main change: persisting the prefill endpoint after handshake to enable decode rebuild reactivation, directly addressing the core fix for the routing-chain stall. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description is comprehensive and follows the template structure with all required sections: Overview (Summary), Details (detailed pattern analysis, reproducers, fix breakdown, verification), and Related Issues references. It includes root cause, reproduction steps, verification results, and a thorough test plan. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/discovery/model_manager.rs`:
- Around line 49-56: The PrefillActivationState enum's PrefillReady(Endpoint)
variant must be boxed to satisfy clippy::large_enum_variant; change it to
PrefillReady(Box<Endpoint>) in the enum definition (PrefillActivationState) and
update all places that construct or pattern-match it: when destructuring
PrefillReady(endpoint) dereference the box (e.g., *endpoint) to access Endpoint,
when sending or cloning use (*endpoint).clone() or clone the inner value, and
when constructing PrefillReady wrap the endpoint with Box::new(...) (e.g.,
PrefillReady(Box::new(endpoint.clone()))). Ensure every usage site that
previously assumed owning Endpoint is adjusted to handle Box<Endpoint>.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 67dc080 and 07142a7.

📒 Files selected for processing (2)

lib/llm/src/discovery/model_manager.rs
lib/llm/src/discovery/watcher.rs

…arge_enum_variant

CodeRabbit / CI Clippy flagged `PrefillReady(Endpoint)` as creating a large size disparity vs `DecodeWaiting(oneshot::Sender<Endpoint>)`. Box the cached Endpoint so the enum variants stay balanced.

Touches:

- enum definition: `PrefillReady(Endpoint)` → `PrefillReady(Box<Endpoint>)`
- register_prefill_router PrefillReady arm: `endpoint.clone()` → `(*endpoint).clone()` to dereference the Box for the oneshot::Sender (which still carries Endpoint, not Box<Endpoint>)
- 4 construction sites: wrap with `Box::new(endpoint)`
- destructure-and-reinsert path keeps the Box (no change needed)

Logic is unchanged; only the storage indirection changes. Behavior verified end-to-end on -v3 image (commits 1+2+3) was 10/10. This follow-up satisfies the lint without touching any state-machine code paths.
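
The effect of boxing on the enum's size can be checked directly with `std::mem::size_of`. The sketch below uses a hypothetical `Endpoint` with a deliberately large 256-byte payload and `std::sync::mpsc::Sender` in place of the oneshot sender; the real types and sizes will differ.

```rust
use std::mem::size_of;
use std::sync::mpsc::Sender;

/// Hypothetical stand-in for a large `Endpoint`; the 256-byte payload is an
/// assumption chosen only to make the size gap visible.
pub struct Endpoint {
    pub payload: [u8; 256],
}

/// Layout before boxing: the enum must be big enough for a whole `Endpoint`.
pub enum Unboxed {
    DecodeWaiting(Sender<Endpoint>),
    PrefillReady(Endpoint),
}

/// Layout after boxing: the large value lives on the heap and the variant
/// stores only a pointer to it.
pub enum Boxed {
    DecodeWaiting(Sender<Endpoint>),
    PrefillReady(Box<Endpoint>),
}

/// Returns (unboxed, boxed) sizes in bytes for the two layouts.
pub fn enum_sizes() -> (usize, usize) {
    (size_of::<Unboxed>(), size_of::<Boxed>())
}
```

The trade-off is one heap allocation per cached endpoint in exchange for a small enum, which is cheap here because the entry is written only on handshake and refresh.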

CI rust-tests cargo-fmt check flagged the four `.insert(key, PrefillActivationState::PrefillReady(Box::new(endpoint)));` lines as too long for the single-line form. Apply rustfmt's preferred multi-line wrap. Pure formatting; no behavior change.

…ardown

Companion to commit 2 ("watcher: scope remove_prefill_activator to prefill teardown only"). That commit deliberately stopped clearing activator state on decode teardown so the cached `PrefillReady` survives across decode rebuilds. The unintended consequence: a `DecodeWaiting(sender)` entry left over from a decode that registered before any prefill also persists, even though its `oneshot::Receiver` was dropped along with the PrefillRouter inside the removed WorkerSet.

The next decode rebuild's `register_prefill_router` then finds the stale `DecodeWaiting`, hits the `Some(DecodeWaiting)` arm at model_manager.rs:737, emits `Decode WorkerSet already registered for this prefill router`, and returns `None`. The rebuilt WorkerSet ends up with no PrefillRouter at all. When prefill finally registers, `activate_prefill_router` wakes the orphaned receiver and activates the *old* router (still kept alive by the spawn task's Arc), so logs *look* like success ("Activating prefill router" / "Prefill router activated successfully") while the rebuilt WorkerSet stays empty. Inference requests bypass the missing PrefillRouter and decode workers reject them with "Disaggregated params required for decode mode".

Reproducer:

1. Start frontend; decode registers; prefill held back (e.g., invalid image, replicas=0, impossible nodeSelector). Activator → DecodeWaiting.
2. Force-delete the decode pod. Watcher tears down the WorkerSet. With commit 2 alone, DecodeWaiting stays in the map (stale).
3. Replacement decode pod registers. register_prefill_router finds stale DecodeWaiting → returns None → rebuilt WorkerSet has no PrefillRouter. Frontend log: ERROR "Decode WorkerSet already registered for this prefill router".
4. Allow prefill to register. activate_prefill_router activates the orphaned old router; rebuilt WorkerSet still empty.
5. Inference: HTTP 500 "Disaggregated params are required for decode mode".

Verified end-to-end on `head-tot-rc13-pr8965-2026-05-01-x86-b200` (d104, 2026-05-01): 5/5 inference probes return the expected disagg-params 500 with the predicted log signature.

Fix:

- Add `ModelManager::remove_decode_prefill_waiter` that atomically removes the activator entry only if it's `DecodeWaiting`. `PrefillReady` cache entries are preserved.
- Watcher's `handle_delete_helper` now calls `remove_decode_prefill_waiter` on decode-component teardown (when `!supports_prefill()`), preserving PR 8965's primary contribution while plugging the stale-waiter gap.

State-machine invariants are now:

- DecodeWaiting → "a live decode router is waiting on the receiver"
- PrefillReady → "a prefill endpoint is cached for future decode rebuilds"

…code-waiter cleanup

Addresses two correctness issues raised by dynamo-ops review on PR 8965.

1. **Race on PrefillReady re-insert** (model_manager.rs)

   Both `register_prefill_router` and `activate_prefill_router` previously used a `remove → process → insert` pattern on `prefill_router_activators`. Between the `remove` and the `insert` the entry is gone from the map, so a concurrent `remove_prefill_activator` call (called by the watcher on prefill-component teardown) hits the empty map, skips its cleanup, and we then re-insert a stale `PrefillReady` for a prefill that's already gone. The next decode rebuild's lookup finds the resurrected cache and activates a `PrefillRouter` against a dead endpoint.

   Refactor both functions to use DashMap's `Entry` API. The shard lock is held for the duration of the OccupiedEntry, so any concurrent `remove_prefill_activator` serializes after us and observes the entry it needs to clear. The PrefillReady-consume path in `register_prefill_router` now reads-and-clones in place without removing; the DecodeWaiting/PrefillReady → PrefillReady transition in `activate_prefill_router` uses `OccupiedEntry::insert(...)` for an atomic value swap that returns the old value.

2. **Decode-waiter cleanup gated on `removed.is_some()`** (watcher.rs)

   `handle_delete_helper`'s decode branch only called `remove_decode_prefill_waiter` when `remove_worker_set` returned `Some(_)`. If decode registered (creating a `DecodeWaiting` activator entry via `register_prefill_router`) but `handle_add_helper` failed later (e.g., on `kv_chooser_for`, monitor setup, or `build_routed_pipeline`) before `add_worker_set`, no WorkerSet ever landed in the manager, and on later teardown `remove_worker_set` returns `None`, leaving the stale `DecodeWaiting` entry orphaned in the activator map.

   Drop the `removed.is_some()` guard. The helper itself is state-safe (uses `DashMap::remove_if(|_, v| matches!(v, DecodeWaiting(_)))`) so calling it on a key that's vacant or holds `PrefillReady` is a no-op. The prefill-teardown branch keeps its `removed.is_some()` guard because `remove_prefill_activator` blanket-removes both states, and calling it without a real WorkerSet teardown could orphan a live decode-side `DecodeWaiting`.

Behavior is unchanged for all four reproducers (#1 burst-sweep canary OFF, #2 Pattern 1 multi-decode restart, #3 Pattern 2 stale-DecodeWaiting, #4 burst-sweep canary ON); will re-verify on a fresh build.
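
The resulting guard structure on the delete path can be summarised as a small decision function. This is only a sketch of the rules stated in this commit message: `ActivatorCleanup` and `cleanup_for_delete` are hypothetical names, and the real `handle_delete_helper` performs the removals itself and does more than this.

```rust
/// What the delete path should do to the activator map (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ActivatorCleanup {
    /// Prefill component gone: drop the entry whatever state it holds.
    RemovePrefillActivator,
    /// Decode component gone: drop the entry only if it is `DecodeWaiting`.
    RemoveDecodeWaiter,
    /// Leave the activator map untouched.
    Nothing,
}

/// `supports_prefill` says whether the deleted component is the prefill one;
/// `worker_set_removed` mirrors `removed.is_some()` from `remove_worker_set`.
pub fn cleanup_for_delete(supports_prefill: bool, worker_set_removed: bool) -> ActivatorCleanup {
    match (supports_prefill, worker_set_removed) {
        // Blanket removal stays guarded: without a real teardown it could
        // orphan a live decode-side waiter.
        (true, true) => ActivatorCleanup::RemovePrefillActivator,
        (true, false) => ActivatorCleanup::Nothing,
        // The conditional removal is a no-op on vacant or `PrefillReady`
        // entries, so it runs whether or not a WorkerSet was removed.
        (false, _) => ActivatorCleanup::RemoveDecodeWaiter,
    }
}
```

The asymmetry is deliberate: the decode-side helper checks the stored state before removing, so it is safe to call unconditionally, while the prefill-side helper removes whatever it finds.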

CI rust-tests caught a one-liner fmt nit at model_manager.rs:765 — rustfmt prefers the let-binding on a single line at this width. Pure cosmetic.

PeaBrane

LGTM. The state-machine change looks right to me: the initial decode-first and prefill-first rendezvous paths still work, and keeping the prefill endpoint cached across decode WorkerSet rebuilds addresses the stall without adding request-path overhead.

One follow-up I’d like to see, if practical: a regression/integration test for decode-first activation followed by a decode WorkerSet rebuild while prefill stays alive, asserting that the rebuilt decode side activates immediately from the cached PrefillReady endpoint. The new DecodeWaiting cleanup tests cover one half of the lifecycle; this would lock down the core behavior this PR is fixing.

pull-request-size Bot added the size/M label May 1, 2026

github-actions Bot added fix external-contribution Pull request is from an external contributor labels May 1, 2026

pull-request-size Bot added size/L and removed size/M labels May 1, 2026

yifjiang marked this pull request as ready for review May 1, 2026 17:31

yifjiang requested a review from a team as a code owner May 1, 2026 17:31

coderabbitai Bot reviewed May 1, 2026

Comment thread lib/llm/src/discovery/model_manager.rs

dynamo-ops reviewed May 4, 2026

Comment thread lib/llm/src/discovery/model_manager.rs Outdated

Comment thread lib/llm/src/discovery/watcher.rs Outdated

chore(prefill-router): cargo fmt fix on activate_prefill_router rebase

96ecac2

Merge remote-tracking branch 'upstream/main' into fix/prefill-router-…

a2e9e75

…decode-rebuild-stall Signed-off-by: yifjiang <19356972+yifjiang@users.noreply.github.com>

yifjiang force-pushed the fix/prefill-router-decode-rebuild-stall branch from 0947b7b to a2e9e75 Compare May 5, 2026 16:38

fix(docs): remove stale reproducer path reference

cafa04c

Signed-off-by: yifjiang <19356972+yifjiang@users.noreply.github.com>

ci: skip lychee cache on pull requests

7ef27df

Signed-off-by: yifjiang <19356972+yifjiang@users.noreply.github.com>

yifjiang requested a review from a team as a code owner May 5, 2026 16:48

github-actions Bot added the actions label May 5, 2026

fix(docs): update vllm omni qwen3 tts link

2661f44

Signed-off-by: yifjiang <19356972+yifjiang@users.noreply.github.com>

github-actions Bot added the documentation Improvements or additions to documentation label May 5, 2026

fix(tests): ignore unrelated agent trace records

ee135cb

Signed-off-by: yifjiang <19356972+yifjiang@users.noreply.github.com>

Merge remote-tracking branch 'upstream/main' into fix/prefill-router-…

32d19f5

…decode-rebuild-stall Signed-off-by: yifjiang <19356972+yifjiang@users.noreply.github.com> # Conflicts: # docs/backends/vllm/vllm-omni.md

chore: remove agent trace test workaround

3a0ffc6

Signed-off-by: yifjiang <19356972+yifjiang@users.noreply.github.com>

PeaBrane approved these changes May 6, 2026

yifjiang merged commit 7ae69a3 into ai-dynamo:main May 7, 2026
159 of 162 checks passed

yifjiang merged 15 commits into ai-dynamo:main from yifjiang:fix/prefill-router-decode-rebuild-stall

3 participants
