fix(planner): normalize model_name case in KubernetesConnector comparisons #8384

Merged
tedzhouhk merged 3 commits into main from hzhou/dyn-2747-model-name-case
Apr 20, 2026

@tedzhouhk tedzhouhk commented Apr 20, 2026

Summary

Fixes two case-sensitive comparisons in KubernetesConnector.get_model_name(). One of them caused Planner to enter CrashLoopBackOff in active mode when the deployment model name retained its original casing (e.g. Qwen/Qwen3-0.6B); the other was a latent instance of the same pattern in the prefill/decode check.

Supersedes #8360 — includes brluo's original fix plus one additional normalization for the sibling prefill/decode comparison.

Root cause

  • self.user_provided_model_name is lowercased in __init__ (line 58).
  • But model_name retrieved from the DGD at line 227 was compared as-is → spurious UserProvidedModelNameMismatchError.
  • The sibling comparison at line 206 (prefill vs decode) had the same case-sensitive pattern, which would bite later if prefill/decode ever reported the same model with different casing (e.g. MDC display_name vs container-arg parsing).
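The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual connector code: `ConnectorSketch` and `check_before_fix` are hypothetical names, and the exception class is a local stand-in for the Planner exception of the same name.

```python
class UserProvidedModelNameMismatchError(Exception):
    """Local stand-in for the Planner exception of the same name."""


class ConnectorSketch:
    """Hypothetical, simplified stand-in for KubernetesConnector."""

    def __init__(self, model_name: str) -> None:
        # Mirrors the lowercasing that the PR describes in __init__.
        self.user_provided_model_name = model_name.lower()

    def check_before_fix(self, deployment_model_name: str) -> str:
        # Pre-fix behavior: the deployment-derived name is compared as-is,
        # so any uppercase character makes the comparison fail.
        if deployment_model_name != self.user_provided_model_name:
            raise UserProvidedModelNameMismatchError(
                f"{deployment_model_name!r} != {self.user_provided_model_name!r}"
            )
        return deployment_model_name


# The user and the deployment agree on the model, yet the check raises.
connector = ConnectorSketch("Qwen/Qwen3-0.6B")
try:
    connector.check_before_fix("Qwen/Qwen3-0.6B")
    raised = False
except UserProvidedModelNameMismatchError:
    raised = True

print(raised)  # True
```

In this sketch the error appears even though both inputs are the identical string, because only one side of the comparison was lowercased.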

Changes

  • Commit 1 (brluo): normalize model_name with .lower() before comparing it with user_provided_model_name.
  • Commit 2: normalize both sides of the prefill/decode comparison.
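As a rough illustration of the two changes, the sketch below applies `.lower()` in both comparisons. It is a simplified assumption about the shape of the logic, not the code in kubernetes.py: the function name `resolve_model_name`, its parameters, and the choice to return the lowercased name are all hypothetical, and the exception classes are local stand-ins.

```python
from typing import Optional


class DeploymentModelNameMismatchError(Exception):
    """Local stand-in: prefill and decode report different models."""


class UserProvidedModelNameMismatchError(Exception):
    """Local stand-in: deployment model differs from the configured one."""


def resolve_model_name(
    prefill_model_name: Optional[str],
    decode_model_name: str,
    user_provided_model_name: Optional[str],
) -> str:
    """Hypothetical helper showing both case-insensitive comparisons.

    user_provided_model_name is assumed to be lowercased already,
    as the PR describes for __init__.
    """
    # Commit 2: normalize both sides of the prefill/decode comparison.
    if (
        prefill_model_name is not None
        and prefill_model_name.lower() != decode_model_name.lower()
    ):
        raise DeploymentModelNameMismatchError(
            f"{prefill_model_name!r} vs {decode_model_name!r}"
        )

    # Commit 1: normalize the deployment-derived name before comparing it
    # with the (already lowercased) user-provided name.
    model_name = decode_model_name.lower()
    if (
        user_provided_model_name is not None
        and model_name != user_provided_model_name
    ):
        raise UserProvidedModelNameMismatchError(
            f"{model_name!r} vs {user_provided_model_name!r}"
        )
    return model_name


# Names that differ only in casing are now accepted.
resolved = resolve_model_name(
    "qwen/qwen3-0.6b", "Qwen/Qwen3-0.6B", "qwen/qwen3-0.6b"
)
print(resolved)  # qwen/qwen3-0.6b
```

Names that differ in more than casing still raise, so the checks keep catching genuine mismatches.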

Test Plan

Validated end-to-end on a single-node Kubernetes cluster (L20 GPU), per brluo's original validation:

  • Planner enters Running state (no CrashLoopBackOff) with model_name: "Qwen/Qwen3-0.6B" + scaling_mode: "active"
  • Under load, Planner scales VllmDecodeWorker from 1→2
  • Advisory mode continues to work unchanged

Closes #8359

Summary by CodeRabbit

Bug Fixes

  • Model name validation is now case-insensitive. Validation errors no longer occur when model names differ only in capitalization, either between the prefill and decode model configurations or between the user-provided model name and the deployment-derived value.

brluobt and others added 2 commits April 20, 2026 11:16
…ith user-provided name

When model_name is provided in Planner config, it is normalized to lowercase
in __init__ (self.user_provided_model_name = model_name.lower()), but the
model name retrieved from the deployment was not normalized before comparison.
This caused a spurious UserProvidedModelNameMismatchError in active mode when
the deployment model name retained its original casing (e.g. Qwen/Qwen3-0.6B
vs qwen/qwen3-0.6b).

Fixes #8359

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…comparison

The sibling comparison at get_model_name() was also case-sensitive. If the
prefill and decode services in a DGD ever report the same model with
different casing (e.g. MDC display_name vs container-arg parsing), it would
spuriously raise DeploymentModelNameMismatchError. Normalize both sides to
match the user-provided-name comparison already fixed in the previous commit.

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

coderabbitai Bot commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 03f6dd06-5f72-4e5b-ab96-2682d0240baf

📥 Commits

Reviewing files that changed from the base of the PR and between 2618812 and d1466c2.

📒 Files selected for processing (1)
  • components/src/dynamo/planner/connectors/kubernetes.py

Walkthrough

Updated KubernetesConnector.get_model_name() to perform case-insensitive model name comparisons. Added .lower() calls to two equality checks, enabling the method to correctly handle model names with differing capitalization without raising spurious validation errors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Case-insensitive model name validation: components/src/dynamo/planner/connectors/kubernetes.py | Added .lower() calls to two model name equality comparisons in get_model_name(): (1) when comparing prefill_model_name vs decode_model_name, and (2) when comparing deployment-derived model_name vs user_provided_model_name. Aligns comparison logic with the lowercase normalization already applied during initialization. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: normalizing model_name case in KubernetesConnector comparisons. |
| Description check | ✅ Passed | The description covers all required sections: Summary (root cause), Changes (what was fixed), and Test Plan (validation results). Includes issue reference. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #8359: implements case-insensitive comparisons via .lower() on both model_name sources, matching the root cause analysis and required fixes. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to KubernetesConnector.get_model_name() case-sensitivity fixes; no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@tedzhouhk tedzhouhk enabled auto-merge (squash) April 20, 2026 18:24
@tedzhouhk tedzhouhk merged commit 073da7b into main Apr 20, 2026
65 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/dyn-2747-model-name-case branch April 20, 2026 18:58
nv-nmailhot pushed a commit that referenced this pull request Apr 20, 2026
…isons (cherry-pick of #8384) (#8401)

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: brluobt <brluo@nvidia.com>

Development

Successfully merging this pull request may close these issues.

bug(planner): KubernetesConnector.get_model_name() case-mismatch causes active mode CrashLoopBackOff

3 participants