fix(planner): normalize model_name case in KubernetesConnector comparisons #8384
…ith user-provided name

When model_name is provided in Planner config, it is normalized to lowercase in __init__ (self.user_provided_model_name = model_name.lower()), but the model name retrieved from the deployment was not normalized before comparison. This caused a spurious UserProvidedModelNameMismatchError in active mode when the deployment model name retained its original casing (e.g. Qwen/Qwen3-0.6B vs qwen/qwen3-0.6b).

Fixes #8359

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…comparison

The sibling comparison at get_model_name() was also case-sensitive. If the prefill and decode services in a DGD ever report the same model with different casing (e.g. MDC display_name vs container-arg parsing), it would spuriously raise DeploymentModelNameMismatchError. Normalize both sides to match the user-provided-name comparison already fixed in the previous commit.

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Summary
Fixes a pair of case-sensitivity bugs in KubernetesConnector.get_model_name() that caused Planner to enter CrashLoopBackOff in active mode when the deployment model name retained its original casing (e.g. Qwen/Qwen3-0.6B).

Supersedes #8360: includes brluo's original fix plus one additional normalization for the sibling prefill/decode comparison.
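To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the actual KubernetesConnector code: the exception class is a local stand-in named after the one mentioned in this PR, and `check_user_model_name` with its `normalize` flag is a hypothetical helper introduced only to show the before/after behavior.

```python
class UserProvidedModelNameMismatchError(Exception):
    """Local stand-in for the Planner exception of the same name."""


def check_user_model_name(deployment_name: str, user_name: str, normalize: bool) -> str:
    """Hypothetical helper: compare a deployment model name with the user-provided one."""
    # The user-provided name is lowercased once, as the PR describes for __init__.
    user_provided = user_name.lower()
    # The fix: lowercase the deployment-side name as well before comparing.
    candidate = deployment_name.lower() if normalize else deployment_name
    if candidate != user_provided:
        raise UserProvidedModelNameMismatchError(
            f"deployment reports {candidate!r}, user provided {user_provided!r}"
        )
    return candidate


# With normalization, names that differ only in case compare equal.
fixed = check_user_model_name("Qwen/Qwen3-0.6B", "Qwen/Qwen3-0.6B", normalize=True)

# Without it, identical inputs still raise, because only one side was lowercased.
try:
    check_user_model_name("Qwen/Qwen3-0.6B", "Qwen/Qwen3-0.6B", normalize=False)
    raised = False
except UserProvidedModelNameMismatchError:
    raised = True
```

Note that in the unnormalized case the user and the deployment supply the exact same string; the mismatch comes purely from lowercasing one side of the comparison.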
Root cause
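- self.user_provided_model_name is lowercased in __init__ (line 58). model_name retrieved from the DGD at line 227 was compared as-is → spurious UserProvidedModelNameMismatchError.
- The sibling prefill/decode comparison was also case-sensitive, so the same model reported with different casing (e.g. MDC display_name vs container-arg parsing) would spuriously raise DeploymentModelNameMismatchError.

Changes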
- Apply model_name.lower() before comparing with user_provided_model_name.
- Normalize both sides of the sibling prefill/decode comparison in the same way.

Test Plan
Validated end-to-end on a single-node Kubernetes cluster (L20 GPU), per brluo's original validation:
- Planner pod in Running state (no CrashLoopBackOff) with model_name: "Qwen/Qwen3-0.6B" + scaling_mode: "active"
- VllmDecodeWorker scaled from 1→2

Closes #8359
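The end-to-end validation could be complemented by a unit-level regression check for the prefill/decode comparison. The sketch below is an assumption-laden illustration, not code from this PR: `resolve_model_name` is a hypothetical helper, the exception class is a local stand-in, returning the lowercased name is an assumption about the desired behavior, and "example/other-model" is a made-up name.

```python
class DeploymentModelNameMismatchError(Exception):
    """Local stand-in for the Planner exception of the same name."""


def resolve_model_name(prefill_name: str, decode_name: str) -> str:
    """Hypothetical helper: return the shared model name, ignoring case."""
    # Normalize both sides so casing differences are not treated as a mismatch.
    prefill, decode = prefill_name.lower(), decode_name.lower()
    if prefill != decode:
        raise DeploymentModelNameMismatchError(
            f"prefill reports {prefill!r}, decode reports {decode!r}"
        )
    return prefill


def test_casing_difference_is_not_a_mismatch() -> None:
    # Same model, different casing on the two services.
    assert resolve_model_name("Qwen/Qwen3-0.6B", "qwen/qwen3-0.6b") == "qwen/qwen3-0.6b"


def test_different_models_still_raise() -> None:
    # Genuinely different names must still be rejected.
    try:
        resolve_model_name("Qwen/Qwen3-0.6B", "example/other-model")
    except DeploymentModelNameMismatchError:
        return
    raise AssertionError("expected DeploymentModelNameMismatchError")


# Run the checks directly; under pytest the test_ functions are collected as-is.
test_casing_difference_is_not_a_mismatch()
test_different_models_still_raise()
```

The second test matters as much as the first: normalization should only remove false positives, not hide a real mismatch between services.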