feat(recipes): add DSv4 Pro SGLang recipes for GB200 (agg + disagg) by KrishnanPrash · Pull Request #8960 · ai-dynamo/dynamo

KrishnanPrash · 2026-05-01T01:00:18Z

Overview:

Add aggregated and disaggregated serving recipes for DeepSeek-V4-Pro on GB200. The disagg recipe splits inference into prefill (TP8, 2 nodes) and decode (TP8, 2 nodes) workers, with NIXL KV transfer over GKE RDMA. The agg recipe runs 2-node TP8 over the NCCL Socket transport.

Summary by CodeRabbit

Release Notes

New Features
- Added deployment configurations for DeepSeek V4 Pro with SGLang backend on GB200 accelerators
- Support for both aggregated and disaggregated prefill/decode inference architectures
- Implemented automated model downloading from Hugging Face with shared caching
- Introduced NATS message queue infrastructure for inter-service communication

Adapted from recipes/glm-5-nvfp4/sglang/disagg/deploy.yaml for DeepSeek-V4-Pro on dynamo-gcp-dev-01 (4x GB200 nodes, ComputeDomain).

Includes:
- DGD + ComputeDomain + ResourceClaimTemplate for 1P+1D (TP8 each)
- Namespace-local NATS workaround (system NATS unreachable from GPU pods)
- Model download job for GCP Filestore

Status: NIXL backend init passes but KV transfer fails at runtime (NIXL KVReceiver Exception on decode, NIXL_ERR_BACKEND on prefill). Needs investigation with NIXL team / GLM-5 recipe author.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Validated 7/7 smoke tests on dynamo-gcp-dev-02 (GB200 w0e rack). Includes GKE RDMA annotations, ComputeDomain, namespace-local NATS workaround, and memory tuning for stable multi-node TP8 startup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Working 2-node TP8 aggregated config validated on AWS GB200. Includes NCCL Socket transport config for non-NVLink multi-node. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-01T01:04:01Z

Walkthrough

Introduces Kubernetes deployment manifests for DeepSeek V4 Pro using SGLang backend on GB200 infrastructure. Adds both aggregated and disaggregated deployment configurations with supporting resources for model downloading and NATS messaging broker setup.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Aggregated Deployment `recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml` | New `DynamoGraphDeployment` manifest configuring Frontend service and decode worker component with HF model caching, NCCL/GLOO networking, GPU resource allocation, health probes, and DeepSeek V4 Pro model inference parameters. |
| Disaggregated Deployment `recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/deploy.yaml` | Complete deployment manifest with `DynamoGraphDeployment`, `ComputeDomain`, and `ResourceClaimTemplate` resources for prefill/decode disaggregation. Configures NATS/NCCL/UCX/NIXL networking, multi-node scheduling across 2-4 nodes, compute-domain channel resource claims, and disaggregation-specific parameters. |
| Supporting Resources `recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml`, `recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml` | Introduces Kubernetes Job for downloading DeepSeek V4 Pro model from Hugging Face with persistent caching. Adds NATS v2.10 Pod with JetStream and corresponding Service for cluster-wide message brokering. |
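
For orientation, a NATS Pod with JetStream plus a matching Service could look roughly like the sketch below. This is not the PR's actual `nats-local.yaml`: the image tag, container name, `args`, and port layout are assumptions, while the resource name `nats-local`, the `app: nats-local` label, and the `/healthz` readiness check on port 8222 are taken from the review comments further down.

```yaml
# Hypothetical sketch of a namespace-local NATS broker; not the PR's manifest.
apiVersion: v1
kind: Pod
metadata:
  name: nats-local
  namespace: <your-namespace>
  labels:
    app: nats-local
spec:
  containers:
    - name: nats                     # assumed container name
      image: nats:2.10               # assumed tag for "NATS v2.10"
      args: ["-js", "-m", "8222"]    # -js enables JetStream; -m sets the HTTP monitoring port
      ports:
        - containerPort: 4222        # client connections
        - containerPort: 8222        # monitoring endpoint, serves /healthz
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8222
---
apiVersion: v1
kind: Service
metadata:
  name: nats-local
  namespace: <your-namespace>
spec:
  selector:
    app: nats-local                  # routes to the Pod above
  ports:
    - name: client
      port: 4222
      targetPort: 4222
```

With a Service like this, workers in the same namespace would reach the broker at `nats://nats-local:4222`, assuming the default client port.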

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The PR description covers the overview and key details but lacks guidance on where reviewers should start and does not follow the complete template structure with all required sections. | Add 'Where should the reviewer start?' section highlighting critical files and clarify 'Related Issues' section or explicitly note if no related issues exist. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the primary change: adding DeepSeek-V4-Pro SGLang recipes for GB200 in both aggregated and disaggregated configurations. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (7)

recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml (2)
8-9: ⚡ Quick win

Hardcoded developer namespace.

Replace kprashanth with <your-namespace> for consistency with other recipe files.
Proposed fix
 metadata:
   name: download-dsv4-pro
-  namespace: kprashanth
+  namespace: <your-namespace>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml`
around lines 8 - 9, The recipe has a hardcoded developer namespace value — in
the YAML for the resource named "download-dsv4-pro" replace the literal
namespace value "kprashanth" with the placeholder "<your-namespace>" so it
matches other recipe files; update the "namespace:" field accordingly and keep
the rest of the document unchanged.
10-11: 💤 Low value

Consider adding activeDeadlineSeconds and resource limits for the ~865GB download.

Downloading ~865GB without a deadline risks indefinite pod runtime on network issues. Adding activeDeadlineSeconds and CPU/memory limits would bound resource consumption.

Also applies to: 55-58
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml`
around lines 10 - 11, The Job spec currently only sets backoffLimit
(spec.backoffLimit) and must be hardened for the ~865GB download: add an
activeDeadlineSeconds field under the Job spec to bound total runtime (e.g., a
sensible number of hours), and add resource requests and limits for the download
container under spec.template.spec.containers[] (set cpu and memory requests and
limits to prevent runaway usage); update any duplicate Job entries (the other
block at lines ~55-58) the same way so both Jobs include activeDeadlineSeconds
and container resources.limits/requests.
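
A minimal sketch of the hardening suggested above, assuming a single download container. Only the Job name and the `backoffLimit` field are taken from the review; the deadline, the container name, the image placeholder, and every CPU/memory figure are illustrative assumptions that would need tuning to the cluster's actual download throughput.

```yaml
# Hypothetical hardening sketch for the model download Job; all values are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: download-dsv4-pro
  namespace: <your-namespace>
spec:
  backoffLimit: 3                  # assumed retry count
  activeDeadlineSeconds: 43200     # assumed 12-hour cap on total Job runtime, retries included
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: model-download     # hypothetical container name
          image: <download-image>  # placeholder; use the image from the recipe
          resources:
            requests:              # what the scheduler reserves for the pod
              cpu: "4"
              memory: 8Gi
            limits:                # hard ceiling enforced at runtime
              cpu: "8"
              memory: 16Gi
```

Note that `activeDeadlineSeconds` applies to the Job as a whole: once it elapses, running pods are terminated and the Job is marked failed, so the value should comfortably exceed the expected time for an ~865GB transfer.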
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml (2)
7-8: ⚡ Quick win

Hardcoded developer namespace reduces recipe reusability.

The namespace kprashanth is hardcoded. For a reusable recipe, consider using a placeholder like <your-namespace> (consistent with deploy.yaml) or documenting the required substitution.
Proposed fix
 metadata:
   name: nats-local
-  namespace: kprashanth
+  namespace: <your-namespace>
   labels:
     app: nats-local
 metadata:
   name: nats-local
-  namespace: kprashanth
+  namespace: <your-namespace>
 spec:
Also applies to: 45-47
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml`
around lines 7 - 8, The namespace field is hardcoded as "namespace: kprashanth",
which reduces reusability; replace that hardcoded value with a reusable
placeholder (e.g., "namespace: <your-namespace>") or a templated variable
consistent with deploy.yaml, and apply the same change to the other occurrences
around lines 45–47; ensure the manifest around the "name: nats-local" resource
uses the placeholder and add a brief comment or README note indicating the user
must substitute their namespace.
36-41: 💤 Low value

Consider adding a liveness probe for NATS resilience.

The pod has a readiness probe but no liveness probe. If NATS hangs without crashing, Kubernetes won't restart it. A liveness probe would improve reliability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml`
around lines 36 - 41, Add a liveness probe to the NATS container to ensure
Kubernetes restarts it if it becomes unresponsive; currently only readinessProbe
(httpGet path: /healthz port: 8222) is defined. In the pod spec add a
livenessProbe section (similar to readinessProbe) using httpGet to /healthz on
port 8222 and tune timing (e.g., initialDelaySeconds a bit larger than
readiness, periodSeconds and failureThreshold to avoid flapping) so the kubelet
can detect and restart hung NATS processes.
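
As a rough illustration of the suggestion above, the container-level fragment below pairs the existing readiness probe with a liveness probe on the same endpoint. The `/healthz` path and port 8222 come from the review comment; the timing values are assumptions chosen only to show the fields involved.

```yaml
# Fragment of the NATS container spec; timing values are illustrative assumptions.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8222
livenessProbe:
  httpGet:
    path: /healthz
    port: 8222
  initialDelaySeconds: 15   # assumed; gives NATS time to start before the first check
  periodSeconds: 10         # assumed probe interval
  failureThreshold: 3       # assumed; restart only after three consecutive failures
```

Keeping `failureThreshold` above 1 is the usual way to avoid restarting the container on a single slow response.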
recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml (2)
34-35: ⚡ Quick win

Wildcard tolerations are overly permissive.

Using - operator: Exists tolerates all taints, which differs from the explicit tolerations in the disaggregated recipe. This could lead to pods scheduling on unintended nodes. Consider using explicit tolerations for consistency.
Example from disagg recipe
tolerations:
  - key: dedicated
    operator: Equal
    value: user-workload
    effect: NoExecute
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  - key: kubernetes.io/arch
    operator: Equal
    value: arm64
    effect: NoSchedule
Also applies to: 67-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml` around
lines 34 - 35, The tolerations block currently uses a wildcard `operator:
Exists` which is overly permissive; replace the wildcard toleration with
explicit tolerations matching the disaggregated recipe (e.g., add entries for
key: dedicated with operator: Equal and value: user-workload and effect:
NoExecute; key: nvidia.com/gpu with operator: Exists and effect: NoSchedule; and
key: kubernetes.io/arch with operator: Equal, value: arm64 and effect:
NoSchedule) so the pod only tolerates intended taints—update the tolerations
section in deploy.yaml where the current `tolerations:` and `- operator: Exists`
appear (also apply the same change to the second occurrence) to mirror those
explicit entries.
39-41: ⚖️ Poor tradeoff

Running containers as root (UID 0).

Both Frontend and decode containers run as runAsUser: 0. While this may be required for the current setup, it's flagged by static analysis (CKV_K8S_23). If root is not strictly necessary, consider running as a non-root user for improved security posture.

Also applies to: 112-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml` around
lines 39 - 41, The securityContext for the Frontend and decode containers
currently sets runAsUser: 0 and runAsGroup: 0 (running as root); change these to
run as a non-root UID/GID and/or enable runAsNonRoot: true—update the container-
or pod-level securityContext entries (the securityContext blocks referencing
runAsUser/runAsGroup) to use a non-zero UID/GID that exists in the container
image (or modify the image to create that user), and consider adding fsGroup or
runAsNonRoot: true as appropriate so static analysis (CKV_K8S_23) no longer
flags the pods for running as root.
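
A sketch of what a non-root pod-level `securityContext` could look like, assuming the container image provides a user with UID/GID 1000 (an assumption; the review does not say which users exist in the image). Where this block sits inside the `DynamoGraphDeployment` depends on the manifest's pod spec layout, which is not shown here.

```yaml
# Hypothetical pod-level securityContext; UID/GID 1000 is an assumed value.
securityContext:
  runAsNonRoot: true   # kubelet refuses to start containers that would run as UID 0
  runAsUser: 1000      # assumed non-root UID present in the image
  runAsGroup: 1000     # assumed matching GID
  fsGroup: 1000        # assumed; group ownership applied to mounted volumes
```

If the workload genuinely needs root (for example, to load device libraries), keeping `runAsUser: 0` and documenting the reason may be the more honest tradeoff.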
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/deploy.yaml (1)

95-213: 💤 Low value

Note: Significant duplication between decode and prefill worker specs.

The decode and prefill workers share nearly identical configurations (RDMA annotations, resources, tolerations, env vars). This is acceptable for recipe readability, but if maintaining multiple recipes becomes burdensome, consider extracting common configurations.

Also applies to: 215-333

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml`:
- Around line 1-5: Add the missing SPDX copyright header and the required
file-level namespace declaration to the top of deploy.yaml so CI passes; update
the file that defines the DynamoGraphDeployment (kind: DynamoGraphDeployment,
metadata.name: sglang-dsv4-pro-agg) by inserting the standard SPDX header line
(e.g., "SPDX-License-Identifier: <license>") and the repository/recipe namespace
comment/block used by other recipe files immediately above the existing
apiVersion/kind block.

In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml`:
- Around line 1-4: This file is missing the required SPDX copyright header;
prepend the standard SPDX header used across the repo to the top of
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml
(above the existing comment block starting "# Download DeepSeek-V4-Pro..."),
e.g., add the SPDX-License-Identifier line and any SPDX-FileCopyrightText entry
that the project standard requires so CI recognizes the license.

In `@recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml`:
- Around line 1-3: This file (nats-local.yaml) is missing the SPDX copyright
header required by CI; add the same SPDX header block used in deploy.yaml at the
very top of the file (i.e., the SPDX-FileCopyrightText and
SPDX-License-Identifier lines) so the file matches the repository header pattern
and the pipeline stops failing.
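
A minimal sketch of the header placement the inline comments ask for. The two SPDX tag names, the resource kind, and `metadata.name` come from the comments above; the bracketed values are placeholders, since the review does not quote the repository's exact header text.

```yaml
# SPDX-FileCopyrightText: <copyright holder and year, per the repository standard>
# SPDX-License-Identifier: <license identifier, per the repository standard>

# Existing manifest content follows unchanged below the header.
kind: DynamoGraphDeployment
metadata:
  name: sglang-dsv4-pro-agg
```

Because the header lines are YAML comments, adding them does not change how the manifest is parsed.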

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1a32ea19-ee3a-4a92-b4d6-9c0a45a0dc06

📥 Commits

Reviewing files that changed from the base of the PR and between c39a477 and b9de79f.

📒 Files selected for processing (4)

recipes/deepseek-v4/deepseek-v4-pro/sglang/agg-gb200/deploy.yaml
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/deploy.yaml
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/model-download.yaml
recipes/deepseek-v4/deepseek-v4-pro/sglang/disagg-gb200/nats-local.yaml

…download.yaml Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com> Signed-off-by: Krishnan Prashanth <140860868+KrishnanPrash@users.noreply.github.com>

- Add SPDX headers to all files (CI blocker)
- Replace internal dtokarev image with public sglang-runtime tag
- Remove cluster-specific tolerations, nodeSelector, imagePullSecrets
- Replace hardcoded namespace with <your-namespace> placeholders
- Move GKE RDMA annotations to README as cluster-specific overlay
- Add README with cluster-specific RDMA config (GKE, EFA, IB)
- Keep nats-local.yaml as optional workaround (documented in README)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-01T06:01:20Z

🌿 Fern Docs Preview: https://nvidia-preview-ea6e9a77-5ef5-4bb4-97ce-598b43c4b4a4.docs.buildwithfern.com/dynamo/dev

NATS workaround documented in README as optional GCP step. Model download is a standard pre-deployment step, not recipe-specific. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

biswapanda

one minor comment

Explains why LD_PRELOAD + NCCL overrides are needed (bundled pynccl 2.27.7 lacks Socket transport) and marks for removal in future images. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
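
The kind of override this commit describes might look like the container `env` fragment below. It is a sketch under assumptions, not the recipe's actual configuration: the library path and interface name are placeholders, and the choice of variables is a guess at what "NCCL overrides" covers. `LD_PRELOAD`, `NCCL_NET`, and `NCCL_SOCKET_IFNAME` are standard dynamic-linker and NCCL environment variables.

```yaml
# Hypothetical env fragment; the path and interface values are placeholders.
env:
  - name: LD_PRELOAD
    value: <path-to-replacement-libnccl.so>   # placeholder; preloads an NCCL build with Socket transport
  - name: NCCL_NET
    value: Socket                             # forces NCCL to use its Socket network transport
  - name: NCCL_SOCKET_IFNAME
    value: <network-interface>                # placeholder; interface NCCL should bind to
```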

Addressed and resolved the comments; dismissing the review because Dmitry is OOO. If there are any concerns, I will address them in a follow-up PR.

…8960) Signed-off-by: Krishnan Prashanth <140860868+KrishnanPrash@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>

KrishnanPrash and others added 3 commits April 30, 2026 12:59

feat(recipes): add DSv4 Pro GB200 aggregated serving recipe

b9de79f

KrishnanPrash requested review from a team as code owners May 1, 2026 01:00

pull-request-size Bot added the size/XL label May 1, 2026

github-actions Bot added the feat label May 1, 2026

coderabbitai Bot reviewed May 1, 2026
