Skip to content

feat: Update Dynamo k8s deployment example to use ModelExpress#4112

Closed
nv-hwoo wants to merge 9 commits into
mainfrom
hwoo/update-dynamo-deploy-yaml
Closed

feat: Update Dynamo k8s deployment example to use ModelExpress#4112
nv-hwoo wants to merge 9 commits into
mainfrom
hwoo/update-dynamo-deploy-yaml

Conversation

@nv-hwoo
Copy link
Copy Markdown
Contributor

@nv-hwoo nv-hwoo commented Nov 5, 2025

Summary

As the title says, the PR integrates ModelExpress into Dynamo k8s deployments. Deployment now starts ModelExpress server with PVC attached (shared with the Dynamo workers) so that models are managed in a central location and shared across different workers. This avoids duplicate model download happening on each worker.

Details:

  • Start modelexpress server along with frontend and workers for model management
  • Add PVC manifest shared across modelexpress server and workers
  • Update README docs with the new workflow

Summary by CodeRabbit

  • New Features

    • Integrated ModelExpress service for shared model caching across all deployment architectures.
    • Added persistent volume support with 256Gi model cache storage allocation.
    • Implemented cache sharing across inference workers and backend services.
  • Documentation

    • Updated deployment guides for SGLang, TensorRT-LLM, and vLLM backends.
    • Added PVC creation instructions to deployment workflows.

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
@nv-hwoo nv-hwoo requested a review from a team November 5, 2025 00:49
@nv-hwoo nv-hwoo requested review from a team as code owners November 5, 2025 00:49
@github-actions github-actions Bot added the feat label Nov 5, 2025
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Nov 5, 2025

Walkthrough

Multiple backend deployment configurations (SGLang, TRT-LLM, vLLM) enhanced with ModelExpress shared model caching service. Changes introduce persistent volume claims, environment variables for cache paths, and volume mounts across decode/prefill components. Documentation updated to reflect new architecture and deployment prerequisites.

Changes

Cohort / File(s) Summary
Documentation & Setup
examples/backends/sglang/deploy/README.md, examples/backends/trtllm/deploy/README.md, examples/backends/vllm/deploy/README.md
Added ModelExpress references to architecture diagrams, introduced PVC documentation block, and included deployment steps for applying model-cache-pvc manifest prior to deployments.
PVC Manifests
examples/backends/sglang/deploy/model_cache_pvc.yaml, examples/backends/trtllm/deploy/model_cache_pvc.yaml, examples/backends/vllm/deploy/model_cache_pvc.yaml
New Kubernetes PersistentVolumeClaim manifests (256Gi, ReadWriteMany access, csi-mounted-fs-path-sc storage class) enabling shared model cache across workers.
SGLang Deployments
examples/backends/sglang/deploy/agg.yaml, examples/backends/sglang/deploy/agg_logging.yaml, examples/backends/sglang/deploy/agg_router.yaml, examples/backends/sglang/deploy/disagg.yaml, examples/backends/sglang/deploy/disagg-multinode.yaml, examples/backends/sglang/deploy/disagg_planner.yaml
Added ModelExpress service with readinessProbe, resources, and startup script; introduced model-cache-pvc references; added cache directory environment variables; extended decode/prefill/planner components with volume mounts for shared cache access.
TRT-LLM Deployments
examples/backends/trtllm/deploy/agg.yaml, examples/backends/trtllm/deploy/agg-with-config.yaml, examples/backends/trtllm/deploy/agg_router.yaml, examples/backends/trtllm/deploy/disagg.yaml, examples/backends/trtllm/deploy/disagg-multinode.yaml, examples/backends/trtllm/deploy/disagg_router.yaml, examples/backends/trtllm/deploy/disagg_planner.yaml
Integrated ModelExpress service with configuration (envFromSecret, replicas, resource limits); wired persistent volume claims; mounted model-cache-pvc across TRTLLMWorker, prefill, and decode components; established cache initialization via startup scripts.
vLLM Deployments
examples/backends/vllm/deploy/agg.yaml, examples/backends/vllm/deploy/agg_kvbm.yaml, examples/backends/vllm/deploy/agg_router.yaml, examples/backends/vllm/deploy/disagg.yaml, examples/backends/vllm/deploy/disagg-multinode.yaml, examples/backends/vllm/deploy/disagg_kvbm.yaml, examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml, examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml, examples/backends/vllm/deploy/disagg_planner.yaml, examples/backends/vllm/deploy/disagg_router.yaml
Added ModelExpress service component with configuration for port, logging, database path; introduced pvcs and envs fields at spec level; mounted model-cache-pvc in VllmDecodeWorker and VllmPrefillWorker for shared cache; updated Frontend image references.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring attention:

  • ModelExpress integration consistency: Verify startup script logic across all files creates cache directories, writes config.yaml correctly, and launches servers with proper PID handling
  • Volume mount wiring: Confirm model-cache-pvc is correctly mounted at /model across all decode/prefill/planner/frontend components in each backend
  • Environment variable propagation: Check MODEL_EXPRESS_CACHE_DIRECTORY, HF_HUB_CACHE, and MODEL_EXPRESS_URL are consistently defined and referenced
  • PVC creation semantics: Verify create: false is correctly applied in all deployment YAML files (PVC managed separately via model_cache_pvc.yaml)
  • Backend-specific variations: Watch for subtle differences between SGLang, TRT-LLM, and vLLM configurations (e.g., different namespaces, image tags, resource allocations)

Poem

🐰 A cache unfolded, shared with care,
ModelExpress dancing through the air,
Volumes mounted, PVCs aligned,
With vLLM, TRT, SGLang combined!
Hopping through deployments, disaggregated so fine,
Shared model memories—now they're mine! 🥕✨

Pre-merge checks

✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: integrating ModelExpress into the Dynamo Kubernetes deployment examples for centralized model management.
Description check ✅ Passed The PR description covers the Overview and Details sections, explaining the integration of ModelExpress, PVC usage, and documentation updates. However, it lacks the 'Where should the reviewer start?' section and does not reference related GitHub issues.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (8)
examples/backends/sglang/deploy/model_cache_pvc.yaml (1)

11-11: Verify storage class availability.

Same note as vllm PVC—the storageClassName: csi-mounted-fs-path-sc must exist in your target cluster. See vllm PVC review for verification steps.

examples/backends/trtllm/deploy/disagg_router.yaml (1)

60-78: Apply same error handling to ModelExpress startup script (duplicate issue).

This file has identical startup script robustness issue as disagg-multinode.yaml. Apply the same error-handling fixes to lines 60–78.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/sglang/deploy/disagg_planner.yaml (1)

59-77: Apply error handling to ModelExpress startup script (duplicate issue #3).

This file has the same startup script robustness gaps. Apply consistent error handling across all deployment files.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/vllm/deploy/disagg_kvbm.yaml (1)

57-75: Apply error handling to ModelExpress startup script (duplicate issue #4).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/vllm/deploy/disagg.yaml (1)

56-74: Apply error handling to ModelExpress startup script (duplicate issue #5).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/trtllm/deploy/disagg.yaml (1)

57-75: Apply error handling to ModelExpress startup script (duplicate issue #6).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/sglang/deploy/disagg-multinode.yaml (1)

65-83: Apply error handling to ModelExpress startup script (duplicate issue #7).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/trtllm/deploy/agg-with-config.yaml (1)

78-96: Apply error handling to ModelExpress startup script (duplicate issue #8).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

🧹 Nitpick comments (14)
examples/backends/vllm/deploy/README.md (1)

94-104: Clarify mandatory storage class customization requirement.

The note at Line 104 is good, but the storage class requirement deserves more prominence since the deployment will fail silently if the class doesn't exist. Consider adding a warning box or emphasizing this in the prerequisites section.

Consider restructuring like this:

### Persistent Volume Claim (PVC)

All templates expect a pre-created PVC named `model-cache-pvc` for the shared model cache.

⚠️ **IMPORTANT**: The default `storageClassName: csi-mounted-fs-path-sc` must exist in your cluster. If unavailable, edit `model_cache_pvc.yaml` and set `storageClassName` to one that exists in your cluster (check with `kubectl get storageclass`).

Apply the PVC once per namespace:

```bash
kubectl apply -f model_cache_pvc.yaml -n $NAMESPACE

</blockquote></details>
<details>
<summary>examples/backends/sglang/deploy/README.md (1)</summary><blockquote>

`85-95`: **Clarify mandatory storage class customization requirement.**

Same recommendation as vllm README (line 94-104): emphasize that the storage class must exist in the target cluster. Storage class mismatch is a common deployment failure.

</blockquote></details>
<details>
<summary>examples/backends/trtllm/deploy/model_cache_pvc.yaml (1)</summary><blockquote>

`9-16`: **Storage size may require customization based on model sizes.**

The 256Gi capacity assumes models fit within this limit. While reasonable as a default, consider documenting the sizing guidelines and providing guidance on resizing for larger models.

</blockquote></details>
<details>
<summary>examples/backends/sglang/deploy/disagg.yaml (1)</summary><blockquote>

`61-75`: **Add error handling to ModelExpress startup script.**

The shell script lacks error handling for critical operations. If `mkdir -p` or `cat > config.yaml` fails, the server will start with incomplete configuration.


```diff
- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)

61-75: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.

- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/sglang/deploy/agg_router.yaml (1)

64-78: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.

- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/vllm/deploy/disagg_planner.yaml (1)

62-77: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.

- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/vllm/deploy/agg.yaml (1)

61-75: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.

- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/trtllm/deploy/agg.yaml (1)

61-75: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.

- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)

57-75: Add PodSecurityContext to ModelExpress service and verify shell script robustness.

The startup script at lines 57-75 uses backgrounding (&) with wait, which should work but relies on proper environment variable expansion. More importantly, the ModelExpress service is missing a PodSecurityContext that should be set at the pod level.

Per the learnings on Dynamo's security posture, add a PodSecurityContext with runAsUser, runAsGroup, and fsGroup set to 1000 to the extraPodSpec:

      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:

Additionally, verify that:

  1. The shell variable $MODEL_EXPRESS_CACHE_DIRECTORY is correctly expanded from the global envs when the container starts.
  2. The wait $SERVER_PID pattern properly keeps the container alive; if modelexpress-server exits unexpectedly, the container will exit, which is the desired behavior.
  3. The relative path ./modelexpress-server resolves correctly; consider using an absolute path or ensuring the working directory is set appropriately.
examples/backends/sglang/deploy/agg_logging.yaml (1)

29-80: Add PodSecurityContext to ModelExpress service.

The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).

Apply this diff:

      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:
examples/backends/vllm/deploy/disagg-multinode.yaml (1)

35-86: Add PodSecurityContext to ModelExpress service.

The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).

Apply this diff:

      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:
examples/backends/vllm/deploy/agg_kvbm.yaml (1)

27-78: Add PodSecurityContext to ModelExpress service.

The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).

Apply this diff:

    ModelExpress:
      envFromSecret: hf-token-secret
      dynamoNamespace: vllm-agg-kvbm
      componentType: frontend
      readinessProbe:
        tcpSocket:
          port: 8000
        initialDelaySeconds: 10
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3
      replicas: 1
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:
examples/backends/vllm/deploy/agg_router.yaml (1)

30-81: Add PodSecurityContext to ModelExpress service.

The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).

Apply this diff:

    ModelExpress:
      envFromSecret: hf-token-secret
      dynamoNamespace: vllm-agg-router
      componentType: frontend
      readinessProbe:
        tcpSocket:
          port: 8000
        initialDelaySeconds: 10
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3
      replicas: 1
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83a3fe4 and 6c9c40b.

📒 Files selected for processing (29)
  • examples/backends/sglang/deploy/README.md (4 hunks)
  • examples/backends/sglang/deploy/agg.yaml (3 hunks)
  • examples/backends/sglang/deploy/agg_logging.yaml (3 hunks)
  • examples/backends/sglang/deploy/agg_router.yaml (3 hunks)
  • examples/backends/sglang/deploy/disagg-multinode.yaml (4 hunks)
  • examples/backends/sglang/deploy/disagg.yaml (4 hunks)
  • examples/backends/sglang/deploy/disagg_planner.yaml (4 hunks)
  • examples/backends/sglang/deploy/model_cache_pvc.yaml (1 hunks)
  • examples/backends/trtllm/deploy/README.md (5 hunks)
  • examples/backends/trtllm/deploy/agg-with-config.yaml (3 hunks)
  • examples/backends/trtllm/deploy/agg.yaml (3 hunks)
  • examples/backends/trtllm/deploy/agg_router.yaml (3 hunks)
  • examples/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
  • examples/backends/trtllm/deploy/disagg.yaml (4 hunks)
  • examples/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
  • examples/backends/trtllm/deploy/disagg_router.yaml (4 hunks)
  • examples/backends/trtllm/deploy/model_cache_pvc.yaml (1 hunks)
  • examples/backends/vllm/deploy/README.md (4 hunks)
  • examples/backends/vllm/deploy/agg.yaml (3 hunks)
  • examples/backends/vllm/deploy/agg_kvbm.yaml (4 hunks)
  • examples/backends/vllm/deploy/agg_router.yaml (3 hunks)
  • examples/backends/vllm/deploy/disagg-multinode.yaml (4 hunks)
  • examples/backends/vllm/deploy/disagg.yaml (4 hunks)
  • examples/backends/vllm/deploy/disagg_kvbm.yaml (4 hunks)
  • examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (4 hunks)
  • examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml (4 hunks)
  • examples/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
  • examples/backends/vllm/deploy/disagg_router.yaml (4 hunks)
  • examples/backends/vllm/deploy/model_cache_pvc.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-10-16T13:35:33.710Z
Learnt from: grahamking
Repo: ai-dynamo/dynamo PR: 3659
File: lib/llm/src/common/checked_file.rs:113-124
Timestamp: 2025-10-16T13:35:33.710Z
Learning: In the dynamo project, model deployment cards stored in etcd are cleared by lease expiration, so there's no persistence of old card data across system restarts or upgrades.

Applied to files:

  • examples/backends/sglang/deploy/disagg_planner.yaml
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • examples/backends/trtllm/deploy/disagg_planner.yaml
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

  • examples/backends/vllm/deploy/agg_kvbm.yaml
  • examples/backends/vllm/deploy/agg.yaml
📚 Learning: 2025-06-11T21:18:00.425Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.

Applied to files:

  • examples/backends/vllm/deploy/agg_kvbm.yaml
📚 Learning: 2025-07-18T16:04:31.771Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.

Applied to files:

  • examples/backends/sglang/deploy/README.md
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.

Applied to files:

  • examples/backends/sglang/deploy/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (11)
examples/backends/vllm/deploy/model_cache_pvc.yaml (1)

11-11: Storage class should follow established project patterns for consistency.

The hardcoded csi-mounted-fs-path-sc is inconsistent with how other examples and recipes handle storage. The recipe files and documentation examples use placeholders (e.g., "your-storage-class-name") with comments explaining expected values. Additionally, deploy/utils/manifests/pvc.yaml omits storageClassName entirely to use the cluster's default storage class.

Consider aligning this with one of these patterns:

  • Use a placeholder with clarifying comment: storageClassName: "your-storage-class-name" # e.g., csi-mounted-fs-path-sc, nfs, efs
  • Omit storageClassName to rely on the cluster's default (verified by pre-deployment checks)
  • Document which specific storage class is required and why in accompanying README
examples/backends/trtllm/deploy/model_cache_pvc.yaml (1)

11-11: Verify storage class availability in target clusters.

The storageClassName: csi-mounted-fs-path-sc is non-standard and environment-specific. Deployments will fail if this storage class doesn't exist in the cluster.

  1. Is this storage class documented as a prerequisite for Dynamo Cloud installations?
  2. Should this be configurable or use a more portable storage class as a fallback?
  3. Does the README adequately warn users about this prerequisite?
examples/backends/sglang/deploy/disagg.yaml (1)

20-26: Consider adding volumeMount for Frontend service.

The decode and prefill services mount the model-cache-pvc, but the Frontend service does not. Verify if this is intentional. If the Frontend also needs access to cached models or shared configuration, add the volumeMount.

examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)

20-26: Consider adding volumeMount for Frontend service.

The VllmDecodeWorker and VllmPrefillWorker mount the model-cache-pvc, but the Frontend service does not. Verify if this is intentional.

examples/backends/sglang/deploy/agg_router.yaml (1)

20-26: Consider adding volumeMount for Frontend service.

The decode service mounts the model-cache-pvc, but the Frontend service does not. Verify if this is intentional.

examples/backends/trtllm/deploy/README.md (2)

116-124: Clear documentation of PVC prerequisite.

The README well documents the requirement to pre-create the model-cache-pvc. This is critical for successful deployments and is appropriately emphasized.


189-194: Verify PVC manifest filename consistency.

The deployment instructions reference model_cache_pvc.yaml. Confirm this filename matches across all backend deployment directories (SGLang, TRT-LLM, vLLM).

examples/backends/vllm/deploy/disagg_kvbm.yaml (1)

46-49: Add missing imagePullPolicy for consistency.

Other deployment files in this PR include imagePullPolicy: IfNotPresent for the ModelExpress container (e.g., disagg-multinode.yaml line 123). Add it here to ensure consistent pull behavior across all deployments.

       extraPodSpec:
         mainContainer:
           image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
+          imagePullPolicy: IfNotPresent
           env:

Likely an incorrect or invalid review comment.

examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)

12-18: Verify service DNS naming for MODEL_EXPRESS_URL.

The MODEL_EXPRESS_URL at line 18 assumes the Kubernetes service DNS name follows the pattern {dynamoNamespace}-modelexpress. Ensure that the Dynamo operator generates service names using this convention (service name ModelExpress with dynamoNamespace vllm-disagg-kvbm-tp2vllm-disagg-kvbm-tp2-modelexpress).

examples/backends/sglang/deploy/agg_logging.yaml (1)

107-110: Verify the --skip-tokenizer-init flag addition.

The --skip-tokenizer-init flag is added to the decode service at line 107, which differs from the vLLM deployments in the PR. Confirm this flag is appropriate for this SGLang logging deployment and consider documenting its purpose in a comment if it's specific to this configuration.

examples/backends/vllm/deploy/disagg-multinode.yaml (1)

9-18: Verify MODEL_EXPRESS_URL naming for multinode deployment.

The file name is disagg-multinode.yaml but the MODEL_EXPRESS_URL at line 18 is http://vllm-disagg-modelexpress:8000, omitting the -multinode suffix. Confirm this is intentional—if the multinode variant should have a distinct ModelExpress service, the URL should be http://vllm-disagg-multinode-modelexpress:8000.

Comment thread examples/backends/sglang/deploy/agg.yaml
Comment thread examples/backends/trtllm/deploy/agg_router.yaml
Comment thread examples/backends/trtllm/deploy/disagg_planner.yaml
Comment on lines +131 to +149
command:
- /bin/sh
- -c
args:
- |
echo "Setting up Model Express configuration..."

mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
server_endpoint: http://localhost:8000
timeout_secs: null
EOF

./modelexpress-server &

SERVER_PID=$!
echo "Server started with PID: $SERVER_PID"
wait $SERVER_PID
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add error handling to the ModelExpress startup script.

The initialization script lacks error checks. If mkdir fails or the config file write fails, the server won't start correctly, but errors go undetected because the background process launch succeeds.

Apply this diff to add error handling:

           args:
             - |
               echo "Setting up Model Express configuration..."
-
-              mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
+              
+              if ! mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY; then
+                echo "Failed to create cache directory" >&2
+                exit 1
+              fi
+              
-              cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+              if ! cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
               local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
               server_endpoint: http://localhost:8000
               timeout_secs: null
-              EOF
-
+              EOF
+              then
+                echo "Failed to write config.yaml" >&2
+                exit 1
+              fi
+              
               ./modelexpress-server &
-
+              
               SERVER_PID=$!
               echo "Server started with PID: $SERVER_PID"
               wait $SERVER_PID
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
command:
- /bin/sh
- -c
args:
- |
echo "Setting up Model Express configuration..."
mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
server_endpoint: http://localhost:8000
timeout_secs: null
EOF
./modelexpress-server &
SERVER_PID=$!
echo "Server started with PID: $SERVER_PID"
wait $SERVER_PID
command:
- /bin/sh
- -c
args:
- |
echo "Setting up Model Express configuration..."
if ! mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY; then
echo "Failed to create cache directory" >&2
exit 1
fi
if ! cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
server_endpoint: http://localhost:8000
timeout_secs: null
EOF
then
echo "Failed to write config.yaml" >&2
exit 1
fi
./modelexpress-server &
SERVER_PID=$!
echo "Server started with PID: $SERVER_PID"
wait $SERVER_PID
🤖 Prompt for AI Agents
In examples/backends/trtllm/deploy/disagg-multinode.yaml around lines 131-149,
the startup script for Model Express runs mkdir, writes the config and launches
the server with no failure checks; add explicit error handling: fail fast (e.g.,
set -o errexit -o pipefail) or check the exit status after mkdir and the config
heredoc write and echo a clear error and exit non‑zero if they fail, write the
config to a temp file and mv it into place only on success to avoid partial
files, start the modelexpress-server and verify it actually launched (check exit
code and that $SERVER_PID is non-empty) and trap EXIT/ERR to kill the server on
failures so the pod/container exits with a non‑zero status instead of silently
continuing.

Comment thread examples/backends/vllm/deploy/disagg_router.yaml
Comment on lines +46 to +48
extraPodSpec:
mainContainer:
image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add missing imagePullPolicy for consistency.

Like disagg_kvbm.yaml, this file is missing imagePullPolicy: IfNotPresent. Add it to maintain consistency with other deployments.

       extraPodSpec:
         mainContainer:
           image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
+          imagePullPolicy: IfNotPresent
           env:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
extraPodSpec:
mainContainer:
image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
extraPodSpec:
mainContainer:
image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
imagePullPolicy: IfNotPresent
🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/disagg.yaml around lines 46 to 48, the
mainContainer block is missing an imagePullPolicy setting; add imagePullPolicy:
IfNotPresent under mainContainer (alongside image) to match disagg_kvbm.yaml and
maintain consistency across deployments.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Dec 6, 2025

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions Bot added the Stale label Dec 6, 2025
@github-actions
Copy link
Copy Markdown
Contributor

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.

@github-actions github-actions Bot closed this Dec 11, 2025
@github-actions github-actions Bot deleted the hwoo/update-dynamo-deploy-yaml branch December 11, 2025 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant