feat(recipes): add DSv4 Pro SGLang recipes for GB200 (agg + disagg) #8960
Adapted from recipes/glm-5-nvfp4/sglang/disagg/deploy.yaml for DeepSeek-V4-Pro on dynamo-gcp-dev-01 (4x GB200 nodes, ComputeDomain).
Includes:
- DGD + ComputeDomain + ResourceClaimTemplate for 1P+1D (TP8 each)
- Namespace-local NATS workaround (system NATS unreachable from GPU pods)
- Model download job for GCP Filestore
Status: NIXL backend init passes but KV transfer fails at runtime (NIXL KVReceiver Exception on decode, NIXL_ERR_BACKEND on prefill). Needs investigation with NIXL team / GLM-5 recipe author.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validated 7/7 smoke tests on dynamo-gcp-dev-02 (GB200 w0e rack). Includes GKE RDMA annotations, ComputeDomain, namespace-local NATS workaround, and memory tuning for stable multi-node TP8 startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Working 2-node TP8 aggregated config validated on AWS GB200. Includes NCCL Socket transport config for non-NVLink multi-node.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Walkthrough
Introduces Kubernetes deployment manifests for DeepSeek V4 Pro using the SGLang backend on GB200 infrastructure. Adds both aggregated and disaggregated deployment configurations, with supporting resources for model downloading and NATS messaging broker setup.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Actionable comments posted: 3
🧹 Nitpick comments (7)
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml (2)
8-9: ⚡ Quick win
Hardcoded developer namespace.
Replace `kprashanth` with `<your-namespace>` for consistency with other recipe files.
Proposed fix:
     metadata:
       name: download-dsv4-pro
    -  namespace: kprashanth
    +  namespace: <your-namespace>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml` around lines 8 - 9, The recipe has a hardcoded developer namespace value — in the YAML for the resource named "download-dsv4-pro" replace the literal namespace value "kprashanth" with the placeholder "<your-namespace>" so it matches other recipe files; update the "namespace:" field accordingly and keep the rest of the document unchanged.
10-11: 💤 Low value
Consider adding `activeDeadlineSeconds` and resource limits for the ~865GB download.
Downloading ~865GB without a deadline risks indefinite pod runtime on network issues. Adding `activeDeadlineSeconds` and CPU/memory `limits` would bound resource consumption.
Also applies to: 55-58
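As a rough illustration of the hardening suggested above, a Job along these lines would bound both runtime and resource use. This is a sketch under stated assumptions: the `backoffLimit` value, the 12-hour deadline, the container name, the image placeholder, and every CPU/memory figure are hypothetical and do not come from the recipe.

```yaml
# Illustrative sketch only. All numeric values, the container name, and the
# image placeholder are assumptions, not values taken from model-download.yaml.
apiVersion: batch/v1
kind: Job
metadata:
  name: download-dsv4-pro
  namespace: <your-namespace>
spec:
  backoffLimit: 3                  # assumed retry budget
  activeDeadlineSeconds: 43200     # assumed 12-hour cap on total Job runtime
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: downloader         # hypothetical container name
          image: <downloader-image>
          resources:
            requests:              # assumed sizing for a network-bound download
              cpu: "4"
              memory: 8Gi
            limits:
              cpu: "8"
              memory: 16Gi
```

Once `activeDeadlineSeconds` elapses, Kubernetes terminates the Job's pods and marks the Job failed, so the deadline should comfortably exceed the expected download time on the slowest link the recipe targets.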
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml` around lines 10 - 11, The Job spec currently only sets backoffLimit (spec.backoffLimit) and must be hardened for the ~865GB download: add an activeDeadlineSeconds field under the Job spec to bound total runtime (e.g., a sensible number of hours), and add resource requests and limits for the download container under spec.template.spec.containers[] (set cpu and memory requests and limits to prevent runaway usage); update any duplicate Job entries (the other block at lines ~55-58) the same way so both Jobs include activeDeadlineSeconds and container resources.limits/requests.
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml (2)
7-8: ⚡ Quick win
Hardcoded developer namespace reduces recipe reusability.
The namespace `kprashanth` is hardcoded. For a reusable recipe, consider using a placeholder like `<your-namespace>` (consistent with `deploy.yaml`) or documenting the required substitution.
Proposed fix:
     metadata:
       name: nats-local
    -  namespace: kprashanth
    +  namespace: <your-namespace>
       labels:
         app: nats-local

     metadata:
       name: nats-local
    -  namespace: kprashanth
    +  namespace: <your-namespace>
     spec:
Also applies to: 45-47
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml` around lines 7 - 8, The namespace field is hardcoded as "namespace: kprashanth", which reduces reusability; replace that hardcoded value with a reusable placeholder (e.g., "namespace: <your-namespace>") or a templated variable consistent with deploy.yaml, and apply the same change to the other occurrences around lines 45–47; ensure the manifest around the "name: nats-local" resource uses the placeholder and add a brief comment or README note indicating the user must substitute their namespace.
36-41: 💤 Low value
Consider adding a liveness probe for NATS resilience.
The pod has a readiness probe but no liveness probe. If NATS hangs without crashing, Kubernetes won't restart it. A liveness probe would improve reliability.
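A minimal sketch of what that could look like, reusing the `/healthz` endpoint on port 8222 that the existing readiness probe targets. The container name and all timing values are assumptions chosen for illustration and would need tuning against how long NATS actually takes to start.

```yaml
# Illustrative fragment of the pod spec in nats-local.yaml.
# The container name and every timing value are assumptions.
containers:
  - name: nats                     # hypothetical container name
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8222
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8222
      initialDelaySeconds: 15      # assumed; keep it above NATS startup time
      periodSeconds: 20            # assumed
      failureThreshold: 3          # assumed; tolerates brief blips before a restart
```

With these settings the kubelet would restart the container only after three consecutive failed checks, which trades slower detection of a hang for less risk of restart flapping.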
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml` around lines 36 - 41, Add a liveness probe to the NATS container to ensure Kubernetes restarts it if it becomes unresponsive; currently only readinessProbe (httpGet path: /healthz port: 8222) is defined. In the pod spec add a livenessProbe section (similar to readinessProbe) using httpGet to /healthz on port 8222 and tune timing (e.g., initialDelaySeconds a bit larger than readiness, periodSeconds and failureThreshold to avoid flapping) so the kubelet can detect and restart hung NATS processes.
recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml (2)
34-35: ⚡ Quick win
Wildcard tolerations are overly permissive.
Using `- operator: Exists` tolerates all taints, which differs from the explicit tolerations in the disaggregated recipe. This could lead to pods scheduling on unintended nodes. Consider using explicit tolerations for consistency.
Example from disagg recipe:
    tolerations:
      - key: dedicated
        operator: Equal
        value: user-workload
        effect: NoExecute
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      - key: kubernetes.io/arch
        operator: Equal
        value: arm64
        effect: NoSchedule
Also applies to: 67-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml` around lines 34 - 35, The tolerations block currently uses a wildcard `operator: Exists` which is overly permissive; replace the wildcard toleration with explicit tolerations matching the disaggregated recipe (e.g., add entries for key: dedicated with operator: Equal and value: user-workload and effect: NoExecute; key: nvidia.com/gpu with operator: Exists and effect: NoSchedule; and key: kubernetes.io/arch with operator: Equal, value: arm64 and effect: NoSchedule) so the pod only tolerates intended taints—update the tolerations section in deploy.yaml where the current `tolerations:` and `- operator: Exists` appear (also apply the same change to the second occurrence) to mirror those explicit entries.
39-41: ⚖️ Poor tradeoff
Running containers as root (UID 0).
Both Frontend and decode containers run as `runAsUser: 0`. While this may be required for the current setup, it's flagged by static analysis (CKV_K8S_23). If root is not strictly necessary, consider running as a non-root user for improved security posture.
Also applies to: 112-114
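If root turns out not to be required, a pod-level `securityContext` along the following lines would satisfy the non-root check. This is only a sketch: UID/GID 1000 is an assumed value, and it works only if that user exists in the container image and can read the model and cache paths the workers use.

```yaml
# Illustrative pod-level securityContext. UID/GID 1000 is an assumption and
# must correspond to a user that actually exists in the container image.
securityContext:
  runAsNonRoot: true   # kubelet refuses to start containers that would run as UID 0
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000        # mounted volumes are made group-accessible to this GID
```

Whether the SGLang workers tolerate a non-root UID on this setup is not established in this PR, so any such change would need to be validated on the target cluster before it replaces `runAsUser: 0`.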
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml` around lines 39 - 41, The securityContext for the Frontend and decode containers currently sets runAsUser: 0 and runAsGroup: 0 (running as root); change these to run as a non-root UID/GID and/or enable runAsNonRoot: true. Update the container- or pod-level securityContext entries (the securityContext blocks referencing runAsUser/runAsGroup) to use a non-zero UID/GID that exists in the container image (or modify the image to create that user), and consider adding fsGroup or runAsNonRoot: true as appropriate so static analysis (CKV_K8S_23) no longer flags the pods for running as root.
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/deploy.yaml (1)
95-213: 💤 Low value
Note: Significant duplication between decode and prefill worker specs.
The decode and prefill workers share nearly identical configurations (RDMA annotations, resources, tolerations, env vars). This is acceptable for recipe readability, but if maintaining multiple recipes becomes burdensome, consider extracting common configurations.
Also applies to: 215-333
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml`:
- Around line 1-5: Add the missing SPDX copyright header and the required
file-level namespace declaration to the top of deploy.yaml so CI passes; update
the file that defines the DynamoGraphDeployment (kind: DynamoGraphDeployment,
metadata.name: sglang-dsv4-pro-agg) by inserting the standard SPDX header line
(e.g., "SPDX-License-Identifier: <license>") and the repository/recipe namespace
comment/block used by other recipe files immediately above the existing
apiVersion/kind block.
In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml`:
- Around line 1-4: This file is missing the required SPDX copyright header;
prepend the standard SPDX header used across the repo to the top of
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml
(above the existing comment block starting "# Download DeepSeek-V4-Pro..."),
e.g., add the SPDX-License-Identifier line and any SPDX-FileCopyrightText entry
that the project standard requires so CI recognizes the license.
In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml`:
- Around line 1-3: This file (nats-local.yaml) is missing the SPDX copyright
header required by CI; add the same SPDX header block used in deploy.yaml at the
very top of the file (i.e., the SPDX-FileCopyrightText and
SPDX-License-Identifier lines) so the file matches the repository header pattern
and the pipeline stops failing.
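For reference, the header the inline comments ask for would sit as YAML comments above the first key of each file. The sketch below uses placeholders for the year, copyright holder, and license identifier, because the repository's actual values are not stated in this review; copy them from an existing recipe file rather than from here.

```yaml
# SPDX-FileCopyrightText: <year> <copyright holder>
# SPDX-License-Identifier: <license-id>

# Placeholders above are assumptions. The existing manifest continues
# unchanged below the header (apiVersion omitted in this fragment).
kind: DynamoGraphDeployment
metadata:
  name: sglang-dsv4-pro-agg
```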
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1a32ea19-ee3a-4a92-b4d6-9c0a45a0dc06
📒 Files selected for processing (4)
recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/deploy.yaml
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml
…download.yaml Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com> Signed-off-by: Krishnan Prashanth <140860868+KrishnanPrash@users.noreply.github.com>
…download.yaml Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com> Signed-off-by: Krishnan Prashanth <140860868+KrishnanPrash@users.noreply.github.com>
- Add SPDX headers to all files (CI blocker)
- Replace internal dtokarev image with public sglang-runtime tag
- Remove cluster-specific tolerations, nodeSelector, imagePullSecrets
- Replace hardcoded namespace with <your-namespace> placeholders
- Move GKE RDMA annotations to README as cluster-specific overlay
- Add README with cluster-specific RDMA config (GKE, EFA, IB)
- Keep nats-local.yaml as optional workaround (documented in README)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NATS workaround documented in README as optional GCP step. Model download is a standard pre-deployment step, not recipe-specific.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explains why LD_PRELOAD + NCCL overrides are needed (bundled pynccl 2.27.7 lacks Socket transport) and marks for removal in future images.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addressed and resolved the review comments; dismissing the review because Dmitry is OOO. Any remaining concerns will be addressed in a follow-up PR.
…8960) Signed-off-by: Krishnan Prashanth <140860868+KrishnanPrash@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Overview:
Add aggregated and disaggregated serving recipes for DeepSeek-V4-Pro on GB200. The disagg recipe splits inference into prefill (TP8, 2 nodes) and decode (TP8, 2 nodes) with NIXL KV transfer over GKE RDMA. The agg recipe runs 2-node TP8 over NCCL Socket.
Summary by CodeRabbit
Release Notes