fix(planner): backfill max_num_batched_tokens from discovery for VirtualConnector by jthomson04 · Pull Request #8042 · ai-dynamo/dynamo

jthomson04 · 2026-04-09T20:03:40Z

Summary

The VirtualConnector doesn't implement get_worker_info, so max_num_batched_tokens was always None — blocking both load-based and throughput-based scaling in agg and prefill mode.
Engines already publish max_num_batched_tokens in their ModelDeploymentCard via discovery, and the FPM subscriber already watches those MDC events. This change captures the card_json on Added events and exposes it via get_model_cards().
The planner now backfills WorkerInfo.max_num_batched_tokens from the first available model card when the connector didn't provide the value. No-ops if the value is already set (e.g. Kubernetes connector).

Test plan

Added unit tests for _populate_worker_info_from_discovery: backfill happy path, no-op when already set, no-op without subscriber, skip incomplete cards
Verify cargo check -p dynamo-py3 passes (confirmed locally)
Deploy planner with "mode": "agg" using VirtualConnector and confirm max_num_batched_tokens is populated from discovery and scaling proceeds

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved throughput and load-based scaling to automatically discover missing worker configuration from model deployment metadata. The system now uses complete information when adjusting resource allocation and batch processing limits.
Tests
- Added comprehensive test coverage validating configuration discovery and backfill behavior in various scenarios.

coderabbitai · 2026-04-09T20:08:42Z

Walkthrough

The changes add support for backfilling max_num_batched_tokens from discovered model deployment cards when not explicitly provided. A new private method is integrated into planner scaling loops to refresh worker information from FPM subscriber model cards, with corresponding Rust-side tracking, type stub declarations, and test coverage.

Changes

Cohort / File(s)	Summary
Core Planner Logic `components/src/dynamo/planner/core/base.py`, `components/src/dynamo/planner/core/agg.py`, `components/src/dynamo/planner/core/prefill.py`	Added new private method `_populate_worker_info_from_discovery()` to backfill missing `max_num_batched_tokens` from model cards with early-exit guards; integrated invocations before batching limit retrieval in throughput and load-based scaling loops.
FPM Subscriber `lib/bindings/python/rust/llm/fpm.rs`	Added model card tracking in `FpmEventSubscriber` via concurrent map keyed by worker ID; updated MDC discovery task to capture and remove card JSON on model instance events; exposed new public method `get_model_cards()` returning filtered snapshot by worker ID.
Type Stubs `lib/bindings/python/src/dynamo/_core.pyi`	Updated Python type stubs to declare new `FpmEventSubscriber.get_model_cards(self) -> dict[str, str]` method with documented runtime error on missing `start_tracking()` invocation.
Tests `components/src/dynamo/planner/tests/unit/test_load_based_scaling.py`	Added `TestPopulateWorkerInfoFromDiscovery` test suite validating successful backfill from model cards, no-op behavior for already-set values, unset FPM subscriber, and graceful skipping of malformed cards.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description covers the problem, solution, and test plan, but is missing the required template structure with Overview/Details/Where should the reviewer start sections and Related Issues.	Restructure the description to follow the template: add Overview, Details, and Where should the reviewer start sections; add Related Issues section with GitHub issue reference.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: backfilling max_num_batched_tokens from discovery for VirtualConnector, which is the core objective of this PR.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tedzhouhk · 2026-04-16T20:51:58Z

Here's a review comment you can paste based on our conversation:

Thanks for unblocking VirtualConnector scaling — this works, but I think it solves the symptom in the wrong layer and is worth reshaping before merge.

The architectural concern

max_num_batched_tokens is a WorkerInfo field, and WorkerInfo population is the connector's contract (resolve_worker_info in monitoring/worker_info.py, KubernetesConnector.get_worker_info in connectors/kubernetes.py:491). The root cause here is simply that VirtualConnector doesn't implement get_worker_info. Fixing it by threading a per-field getter through FpmEventSubscriber (whose documented purpose is Forward Pass Metrics) couples two unrelated concerns and only patches one of several missing fields — total_kv_blocks, kv_cache_block_size, max_num_seqs, and context_length come from the same card_json.runtime_config (see kubernetes.py:453-487) and are all None on VirtualConnector today. Each new field would need another get_<field>() on the subscriber; that's a clear sign the abstraction is off.

Suggested shape

Factor the MDC plumbing into two pieces:

A shared worker_info_from_mdc(card_json, sub_component_type) helper (basically lift _build_worker_info_from_mdc out of kubernetes.py) — pure function, no K8s/discovery dependencies. Also pull the LoRA-card filter and prefill/decode heuristic into the same module; they're properties of MDC, not of K8s.
A small McdSource protocol with two implementations:
- KubeCrdSource — wraps today's _extract_mdc_entries.
- DiscoverySource — reads MDC from the existing discovery watch.

Then VirtualConnector.get_worker_info composes DiscoverySource + worker_info_from_mdc. K8s keeps its DGD enrichment (k8s_name, container-arg model-name fallback) as a thin wrapper around the shared extractor.

copy-pr-bot · 2026-04-17T15:54:46Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jthomson04 · 2026-04-17T18:39:49Z

/ok to test 20fb262

…ualConnector The VirtualConnector does not implement get_worker_info, so max_num_batched_tokens was always None — blocking both load-based and throughput-based scaling in agg (and prefill) mode. Engines already publish max_num_batched_tokens in their ModelDeploymentCard via the discovery plane, and the FPM subscriber already watches those MDC events. The subscriber now extracts max_num_batched_tokens from each worker's runtime_config at discovery time and resolves the minimum across all known workers, exposing it via a narrow get_max_num_batched_tokens() -> Option<u64> binding. NativePlannerBase backfills WorkerInfo.max_num_batched_tokens from the subscriber when the connector didn't provide it, and updates the state machine's capabilities via a new update_capabilities() method so subsequent scaling decisions pick up the value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

Narrow model_name_fallback except to (RuntimeError, OSError, ValueError) per python-guidelines.md, and hoist unittest.mock / planner imports in test_load_based_scaling.py to module top. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jthomson04 requested a review from a team as a code owner April 9, 2026 20:03

jthomson04 requested a review from a team April 9, 2026 20:03

jthomson04 requested a review from a team as a code owner April 9, 2026 20:03

pull-request-size Bot added the size/L label Apr 9, 2026

github-actions Bot added fix planner labels Apr 9, 2026

This comment was marked as resolved.

Sign in to view

jthomson04 force-pushed the jthomson04/planner-discovery-backfill branch from a143136 to fc88d16 Compare April 9, 2026 20:18

copy-pr-bot Bot temporarily deployed to GITLAB April 9, 2026 20:18 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 9, 2026 20:24 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 9, 2026 20:48 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 9, 2026 22:31 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 10, 2026 17:35 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 10, 2026 17:39 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 10, 2026 17:57 Inactive

jthomson04 requested a review from tedzhouhk April 10, 2026 18:22

jthomson04 requested a review from a team as a code owner April 14, 2026 16:34

copy-pr-bot Bot temporarily deployed to GITLAB April 14, 2026 16:34 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 14, 2026 16:37 Inactive

jthomson04 force-pushed the jthomson04/planner-discovery-backfill branch from cf74241 to 7d69b37 Compare April 16, 2026 18:07

copy-pr-bot Bot temporarily deployed to GITLAB April 16, 2026 18:07 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 16, 2026 18:08 Inactive

jthomson04 force-pushed the jthomson04/planner-discovery-backfill branch from 7d69b37 to c35acbc Compare April 17, 2026 05:16

pull-request-size Bot added size/XXL and removed size/L labels Apr 17, 2026

copy-pr-bot Bot temporarily deployed to GITLAB April 17, 2026 05:16 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 17, 2026 05:22 Inactive

jthomson04 force-pushed the jthomson04/planner-discovery-backfill branch from c35acbc to 20fb262 Compare April 17, 2026 15:54

This comment was marked as resolved.

Sign in to view

copy-pr-bot Bot temporarily deployed to GITLAB April 17, 2026 18:39 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 17, 2026 19:02 Inactive

tedzhouhk approved these changes Apr 20, 2026

View reviewed changes

jthomson04 and others added 3 commits April 20, 2026 11:09

refactor

0592372

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

jthomson04 force-pushed the jthomson04/planner-discovery-backfill branch from 0382e05 to a16693f Compare April 20, 2026 18:09

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 18:09 Inactive

copy-pr-bot Bot had a problem deploying to GITLAB April 20, 2026 19:49 Failure

jthomson04 merged commit 4419009 into main Apr 20, 2026
89 of 90 checks passed

jthomson04 deleted the jthomson04/planner-discovery-backfill branch April 20, 2026 19:54

dmitry-tokarev-nv mentioned this pull request Apr 21, 2026

fix(planner/tests): scope test_advisory_mode stubs so they don't leak #8418

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(planner): backfill max_num_batched_tokens from discovery for VirtualConnector#8042

fix(planner): backfill max_num_batched_tokens from discovery for VirtualConnector#8042
jthomson04 merged 3 commits into
mainfrom
jthomson04/planner-discovery-backfill

jthomson04 commented Apr 9, 2026 •

edited by devin-ai-integration Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 9, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

This comment was marked as resolved.

Uh oh!

tedzhouhk commented Apr 16, 2026

Uh oh!

copy-pr-bot Bot commented Apr 17, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

jthomson04 commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jthomson04 commented Apr 9, 2026 • edited by devin-ai-integration Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

This comment was marked as resolved.

Uh oh!

tedzhouhk commented Apr 16, 2026

Uh oh!

copy-pr-bot Bot commented Apr 17, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

jthomson04 commented Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jthomson04 commented Apr 9, 2026 •

edited by devin-ai-integration Bot

Loading

coderabbitai Bot commented Apr 9, 2026 •

edited

Loading