feat: Update Dynamo k8s deployment example to use ModelExpress by nv-hwoo · Pull Request #4112 · ai-dynamo/dynamo

nv-hwoo · 2025-11-05T00:49:51Z

Summary

As the title says, the PR integrates ModelExpress into Dynamo k8s deployments. Deployment now starts ModelExpress server with PVC attached (shared with the Dynamo workers) so that models are managed in a central location and shared across different workers. This avoids duplicate model download happening on each worker.

Details:

Start modelexpress server along with frontend and workers for model management
Add PVC manifest shared across modelexpress server and workers
Update README docs with the new workflow

Summary by CodeRabbit

New Features
- Integrated ModelExpress service for shared model caching across all deployment architectures.
- Added persistent volume support with 256Gi model cache storage allocation.
- Implemented cache sharing across inference workers and backend services.
Documentation
- Updated deployment guides for SGLang, TensorRT-LLM, and vLLM backends.
- Added PVC creation instructions to deployment workflows.

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

coderabbitai · 2025-11-05T00:57:15Z

Walkthrough

Multiple backend deployment configurations (SGLang, TRT-LLM, vLLM) enhanced with ModelExpress shared model caching service. Changes introduce persistent volume claims, environment variables for cache paths, and volume mounts across decode/prefill components. Documentation updated to reflect new architecture and deployment prerequisites.

Changes

Cohort / File(s)	Summary
Documentation & Setup `examples/backends/sglang/deploy/README.md`, `examples/backends/trtllm/deploy/README.md`, `examples/backends/vllm/deploy/README.md`	Added ModelExpress references to architecture diagrams, introduced PVC documentation block, and included deployment steps for applying model-cache-pvc manifest prior to deployments.
PVC Manifests `examples/backends/sglang/deploy/model_cache_pvc.yaml`, `examples/backends/trtllm/deploy/model_cache_pvc.yaml`, `examples/backends/vllm/deploy/model_cache_pvc.yaml`	New Kubernetes PersistentVolumeClaim manifests (256Gi, ReadWriteMany access, csi-mounted-fs-path-sc storage class) enabling shared model cache across workers.
SGLang Deployments `examples/backends/sglang/deploy/agg.yaml`, `examples/backends/sglang/deploy/agg_logging.yaml`, `examples/backends/sglang/deploy/agg_router.yaml`, `examples/backends/sglang/deploy/disagg.yaml`, `examples/backends/sglang/deploy/disagg-multinode.yaml`, `examples/backends/sglang/deploy/disagg_planner.yaml`	Added ModelExpress service with readinessProbe, resources, and startup script; introduced model-cache-pvc references; added cache directory environment variables; extended decode/prefill/planner components with volume mounts for shared cache access.
TRT-LLM Deployments `examples/backends/trtllm/deploy/agg.yaml`, `examples/backends/trtllm/deploy/agg-with-config.yaml`, `examples/backends/trtllm/deploy/agg_router.yaml`, `examples/backends/trtllm/deploy/disagg.yaml`, `examples/backends/trtllm/deploy/disagg-multinode.yaml`, `examples/backends/trtllm/deploy/disagg_router.yaml`, `examples/backends/trtllm/deploy/disagg_planner.yaml`	Integrated ModelExpress service with configuration (envFromSecret, replicas, resource limits); wired persistent volume claims; mounted model-cache-pvc across TRTLLMWorker, prefill, and decode components; established cache initialization via startup scripts.
vLLM Deployments `examples/backends/vllm/deploy/agg.yaml`, `examples/backends/vllm/deploy/agg_kvbm.yaml`, `examples/backends/vllm/deploy/agg_router.yaml`, `examples/backends/vllm/deploy/disagg.yaml`, `examples/backends/vllm/deploy/disagg-multinode.yaml`, `examples/backends/vllm/deploy/disagg_kvbm.yaml`, `examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml`, `examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml`, `examples/backends/vllm/deploy/disagg_planner.yaml`, `examples/backends/vllm/deploy/disagg_router.yaml`	Added ModelExpress service component with configuration for port, logging, database path; introduced pvcs and envs fields at spec level; mounted model-cache-pvc in VllmDecodeWorker and VllmPrefillWorker for shared cache; updated Frontend image references.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring attention:

ModelExpress integration consistency: Verify startup script logic across all files creates cache directories, writes config.yaml correctly, and launches servers with proper PID handling
Volume mount wiring: Confirm model-cache-pvc is correctly mounted at /model across all decode/prefill/planner/frontend components in each backend
Environment variable propagation: Check MODEL_EXPRESS_CACHE_DIRECTORY, HF_HUB_CACHE, and MODEL_EXPRESS_URL are consistently defined and referenced
PVC creation semantics: Verify create: false is correctly applied in all deployment YAML files (PVC managed separately via model_cache_pvc.yaml)
Backend-specific variations: Watch for subtle differences between SGLang, TRT-LLM, and vLLM configurations (e.g., different namespaces, image tags, resource allocations)

Poem

🐰 A cache unfolded, shared with care,
ModelExpress dancing through the air,
Volumes mounted, PVCs aligned,
With vLLM, TRT, SGLang combined!
Hopping through deployments, disaggregated so fine,
Shared model memories—now they're mine! 🥕✨

Pre-merge checks

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: integrating ModelExpress into the Dynamo Kubernetes deployment examples for centralized model management.
Description check	✅ Passed	The PR description covers the Overview and Details sections, explaining the integration of ModelExpress, PVC usage, and documentation updates. However, it lacks the 'Where should the reviewer start?' section and does not reference related GitHub issues.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

♻️ Duplicate comments (8)

examples/backends/sglang/deploy/model_cache_pvc.yaml (1)

11-11: Verify storage class availability.

Same note as vllm PVC—the storageClassName: csi-mounted-fs-path-sc must exist in your target cluster. See vllm PVC review for verification steps.

examples/backends/trtllm/deploy/disagg_router.yaml (1)

60-78: Apply same error handling to ModelExpress startup script (duplicate issue).

This file has identical startup script robustness issue as disagg-multinode.yaml. Apply the same error-handling fixes to lines 60–78.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/sglang/deploy/disagg_planner.yaml (1)

59-77: Apply error handling to ModelExpress startup script (duplicate issue #3).

This file has the same startup script robustness gaps. Apply consistent error handling across all deployment files.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/vllm/deploy/disagg_kvbm.yaml (1)

57-75: Apply error handling to ModelExpress startup script (duplicate issue #4).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/vllm/deploy/disagg.yaml (1)

56-74: Apply error handling to ModelExpress startup script (duplicate issue #5).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/trtllm/deploy/disagg.yaml (1)

57-75: Apply error handling to ModelExpress startup script (duplicate issue #6).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/sglang/deploy/disagg-multinode.yaml (1)

65-83: Apply error handling to ModelExpress startup script (duplicate issue #7).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

examples/backends/trtllm/deploy/agg-with-config.yaml (1)

78-96: Apply error handling to ModelExpress startup script (duplicate issue #8).

Same startup script robustness gaps. Apply consistent error handling.

See the fix suggested in disagg-multinode.yaml at lines 131–149.

🧹 Nitpick comments (14)

examples/backends/vllm/deploy/README.md (1)
94-104: Clarify mandatory storage class customization requirement.

The note at Line 104 is good, but the storage class requirement deserves more prominence since the deployment will fail silently if the class doesn't exist. Consider adding a warning box or emphasizing this in the prerequisites section.

Consider restructuring like this:
### Persistent Volume Claim (PVC)

All templates expect a pre-created PVC named `model-cache-pvc` for the shared model cache.

⚠️ **IMPORTANT**: The default `storageClassName: csi-mounted-fs-path-sc` must exist in your cluster. If unavailable, edit `model_cache_pvc.yaml` and set `storageClassName` to one that exists in your cluster (check with `kubectl get storageclass`).

Apply the PVC once per namespace:

```bash
kubectl apply -f model_cache_pvc.yaml -n $NAMESPACE
</blockquote></details>
<details>
<summary>examples/backends/sglang/deploy/README.md (1)</summary><blockquote>

`85-95`: **Clarify mandatory storage class customization requirement.**

Same recommendation as vllm README (line 94-104): emphasize that the storage class must exist in the target cluster. Storage class mismatch is a common deployment failure.

</blockquote></details>
<details>
<summary>examples/backends/trtllm/deploy/model_cache_pvc.yaml (1)</summary><blockquote>

`9-16`: **Storage size may require customization based on model sizes.**

The 256Gi capacity assumes models fit within this limit. While reasonable as a default, consider documenting the sizing guidelines and providing guidance on resizing for larger models.

</blockquote></details>
<details>
<summary>examples/backends/sglang/deploy/disagg.yaml (1)</summary><blockquote>

`61-75`: **Add error handling to ModelExpress startup script.**

The shell script lacks error handling for critical operations. If `mkdir -p` or `cat > config.yaml` fails, the server will start with incomplete configuration.


```diff
- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)
61-75: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.
- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/sglang/deploy/agg_router.yaml (1)
64-78: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.
- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/vllm/deploy/disagg_planner.yaml (1)
62-77: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.
- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/vllm/deploy/agg.yaml (1)
61-75: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.
- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/trtllm/deploy/agg.yaml (1)
61-75: Add error handling to ModelExpress startup script.

The shell script lacks error handling for critical operations. If mkdir -p or cat > config.yaml fails, the server will start with incomplete configuration.
- args:
-   - |
-     echo "Setting up Model Express configuration..."
-
-     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+   - |
+     echo "Setting up Model Express configuration..."
+
+     mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+     cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
      local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
      server_endpoint: http://localhost:8000
      timeout_secs: null
      EOF
-
+     || { echo "Failed to create config.yaml"; exit 1; }
+
      ./modelexpress-server &
examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)
57-75: Add PodSecurityContext to ModelExpress service and verify shell script robustness.

The startup script at lines 57-75 uses backgrounding (&) with wait, which should work but relies on proper environment variable expansion. More importantly, the ModelExpress service is missing a PodSecurityContext that should be set at the pod level.

Per the learnings on Dynamo's security posture, add a PodSecurityContext with runAsUser, runAsGroup, and fsGroup set to 1000 to the extraPodSpec:
      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:
Additionally, verify that:

The shell variable $MODEL_EXPRESS_CACHE_DIRECTORY is correctly expanded from the global envs when the container starts.

The wait $SERVER_PID pattern properly keeps the container alive; if modelexpress-server exits unexpectedly, the container will exit, which is the desired behavior.

The relative path ./modelexpress-server resolves correctly; consider using an absolute path or ensuring the working directory is set appropriately.
examples/backends/sglang/deploy/agg_logging.yaml (1)
29-80: Add PodSecurityContext to ModelExpress service.

The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).

Apply this diff:
      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:
examples/backends/vllm/deploy/disagg-multinode.yaml (1)
35-86: Add PodSecurityContext to ModelExpress service.

The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).

Apply this diff:
      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:
examples/backends/vllm/deploy/agg_kvbm.yaml (1)
27-78: Add PodSecurityContext to ModelExpress service.

The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).

Apply this diff:
    ModelExpress:
      envFromSecret: hf-token-secret
      dynamoNamespace: vllm-agg-kvbm
      componentType: frontend
      readinessProbe:
        tcpSocket:
          port: 8000
        initialDelaySeconds: 10
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3
      replicas: 1
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:
examples/backends/vllm/deploy/agg_router.yaml (1)
30-81: Add PodSecurityContext to ModelExpress service.

The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).

Apply this diff:
    ModelExpress:
      envFromSecret: hf-token-secret
      dynamoNamespace: vllm-agg-router
      componentType: frontend
      readinessProbe:
        tcpSocket:
          port: 8000
        initialDelaySeconds: 10
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3
      replicas: 1
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          cpu: "4"
          memory: "16Gi"
      extraPodSpec:
+       securityContext:
+         runAsUser: 1000
+         runAsGroup: 1000
+         fsGroup: 1000
        mainContainer:

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83a3fe4 and 6c9c40b.

📒 Files selected for processing (29)

examples/backends/sglang/deploy/README.md (4 hunks)
examples/backends/sglang/deploy/agg.yaml (3 hunks)
examples/backends/sglang/deploy/agg_logging.yaml (3 hunks)
examples/backends/sglang/deploy/agg_router.yaml (3 hunks)
examples/backends/sglang/deploy/disagg-multinode.yaml (4 hunks)
examples/backends/sglang/deploy/disagg.yaml (4 hunks)
examples/backends/sglang/deploy/disagg_planner.yaml (4 hunks)
examples/backends/sglang/deploy/model_cache_pvc.yaml (1 hunks)
examples/backends/trtllm/deploy/README.md (5 hunks)
examples/backends/trtllm/deploy/agg-with-config.yaml (3 hunks)
examples/backends/trtllm/deploy/agg.yaml (3 hunks)
examples/backends/trtllm/deploy/agg_router.yaml (3 hunks)
examples/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
examples/backends/trtllm/deploy/disagg.yaml (4 hunks)
examples/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
examples/backends/trtllm/deploy/disagg_router.yaml (4 hunks)
examples/backends/trtllm/deploy/model_cache_pvc.yaml (1 hunks)
examples/backends/vllm/deploy/README.md (4 hunks)
examples/backends/vllm/deploy/agg.yaml (3 hunks)
examples/backends/vllm/deploy/agg_kvbm.yaml (4 hunks)
examples/backends/vllm/deploy/agg_router.yaml (3 hunks)
examples/backends/vllm/deploy/disagg-multinode.yaml (4 hunks)
examples/backends/vllm/deploy/disagg.yaml (4 hunks)
examples/backends/vllm/deploy/disagg_kvbm.yaml (4 hunks)
examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (4 hunks)
examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml (4 hunks)
examples/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
examples/backends/vllm/deploy/disagg_router.yaml (4 hunks)
examples/backends/vllm/deploy/model_cache_pvc.yaml (1 hunks)

🧰 Additional context used

🧠 Learnings (6)

📚 Learning: 2025-10-16T13:35:33.710Z

Learnt from: grahamking
Repo: ai-dynamo/dynamo PR: 3659
File: lib/llm/src/common/checked_file.rs:113-124
Timestamp: 2025-10-16T13:35:33.710Z
Learning: In the dynamo project, model deployment cards stored in etcd are cleared by lease expiration, so there's no persistence of old card data across system restarts or upgrades.

Applied to files:

examples/backends/sglang/deploy/disagg_planner.yaml

📚 Learning: 2025-08-30T20:43:10.091Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

examples/backends/trtllm/deploy/disagg_planner.yaml

📚 Learning: 2025-09-04T19:03:06.643Z

Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

examples/backends/vllm/deploy/agg_kvbm.yaml
examples/backends/vllm/deploy/agg.yaml

📚 Learning: 2025-06-11T21:18:00.425Z

Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.

Applied to files:

examples/backends/vllm/deploy/agg_kvbm.yaml

📚 Learning: 2025-07-18T16:04:31.771Z

Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.

Applied to files:

examples/backends/sglang/deploy/README.md

📚 Learning: 2025-10-24T04:21:08.751Z

Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.

Applied to files:

examples/backends/sglang/deploy/README.md

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: trtllm (arm64)
GitHub Check: trtllm (amd64)
GitHub Check: operator (arm64)
GitHub Check: vllm (amd64)
GitHub Check: sglang (arm64)
GitHub Check: vllm (arm64)
GitHub Check: sglang (amd64)
GitHub Check: operator (amd64)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (11)

examples/backends/vllm/deploy/model_cache_pvc.yaml (1)

11-11: Storage class should follow established project patterns for consistency.

The hardcoded csi-mounted-fs-path-sc is inconsistent with how other examples and recipes handle storage. The recipe files and documentation examples use placeholders (e.g., "your-storage-class-name") with comments explaining expected values. Additionally, deploy/utils/manifests/pvc.yaml omits storageClassName entirely to use the cluster's default storage class.

Consider aligning this with one of these patterns:

Use a placeholder with clarifying comment: storageClassName: "your-storage-class-name" # e.g., csi-mounted-fs-path-sc, nfs, efs

Omit storageClassName to rely on the cluster's default (verified by pre-deployment checks)

Document which specific storage class is required and why in accompanying README

examples/backends/trtllm/deploy/model_cache_pvc.yaml (1)

11-11: Verify storage class availability in target clusters.

The storageClassName: csi-mounted-fs-path-sc is non-standard and environment-specific. Deployments will fail if this storage class doesn't exist in the cluster.

Is this storage class documented as a prerequisite for Dynamo Cloud installations?

Should this be configurable or use a more portable storage class as a fallback?

Does the README adequately warn users about this prerequisite?

examples/backends/sglang/deploy/disagg.yaml (1)

20-26: Consider adding volumeMount for Frontend service.

The decode and prefill services mount the model-cache-pvc, but the Frontend service does not. Verify if this is intentional. If the Frontend also needs access to cached models or shared configuration, add the volumeMount.

examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)

20-26: Consider adding volumeMount for Frontend service.

The VllmDecodeWorker and VllmPrefillWorker mount the model-cache-pvc, but the Frontend service does not. Verify if this is intentional.

examples/backends/sglang/deploy/agg_router.yaml (1)

20-26: Consider adding volumeMount for Frontend service.

The decode service mounts the model-cache-pvc, but the Frontend service does not. Verify if this is intentional.

examples/backends/trtllm/deploy/README.md (2)

116-124: Clear documentation of PVC prerequisite.

The README well documents the requirement to pre-create the model-cache-pvc. This is critical for successful deployments and is appropriately emphasized.

189-194: Verify PVC manifest filename consistency.

The deployment instructions reference model_cache_pvc.yaml. Confirm this filename matches across all backend deployment directories (SGLang, TRT-LLM, vLLM).
examples/backends/vllm/deploy/disagg_kvbm.yaml (1)
46-49: Add missing imagePullPolicy for consistency.

Other deployment files in this PR include imagePullPolicy: IfNotPresent for the ModelExpress container (e.g., disagg-multinode.yaml line 123). Add it here to ensure consistent pull behavior across all deployments.
       extraPodSpec:
         mainContainer:
           image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
+          imagePullPolicy: IfNotPresent
           env:
Likely an incorrect or invalid review comment.
examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)

12-18: Verify service DNS naming for MODEL_EXPRESS_URL.

The MODEL_EXPRESS_URL at line 18 assumes the Kubernetes service DNS name follows the pattern {dynamoNamespace}-modelexpress. Ensure that the Dynamo operator generates service names using this convention (service name ModelExpress with dynamoNamespace vllm-disagg-kvbm-tp2 → vllm-disagg-kvbm-tp2-modelexpress).

examples/backends/sglang/deploy/agg_logging.yaml (1)

107-110: Verify the --skip-tokenizer-init flag addition.

The --skip-tokenizer-init flag is added to the decode service at line 107, which differs from the vLLM deployments in the PR. Confirm this flag is appropriate for this SGLang logging deployment and consider documenting its purpose in a comment if it's specific to this configuration.

examples/backends/vllm/deploy/disagg-multinode.yaml (1)

9-18: Verify MODEL_EXPRESS_URL naming for multinode deployment.

The file name is disagg-multinode.yaml but the MODEL_EXPRESS_URL at line 18 is http://vllm-disagg-modelexpress:8000, omitting the -multinode suffix. Confirm this is intentional—if the multinode variant should have a distinct ModelExpress service, the URL should be http://vllm-disagg-multinode-modelexpress:8000.

coderabbitai · 2025-11-05T00:57:19Z

+          command:
+            - /bin/sh
+            - -c
+          args:
+            - |
+              echo "Setting up Model Express configuration..."
+
+              mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
+              cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+              local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
+              server_endpoint: http://localhost:8000
+              timeout_secs: null
+              EOF
+
+              ./modelexpress-server &
+
+              SERVER_PID=$!
+              echo "Server started with PID: $SERVER_PID"
+              wait $SERVER_PID


⚠️ Potential issue | 🟠 Major

Add error handling to the ModelExpress startup script.

The initialization script lacks error checks. If mkdir fails or the config file write fails, the server won't start correctly, but errors go undetected because the background process launch succeeds.

Apply this diff to add error handling:

args: - | echo "Setting up Model Express configuration..." - - mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY + + if ! mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY; then + echo "Failed to create cache directory" >&2 + exit 1 + fi + - cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF + if ! cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF local_path: $MODEL_EXPRESS_CACHE_DIRECTORY server_endpoint: http://localhost:8000 timeout_secs: null - EOF - + EOF + then + echo "Failed to write config.yaml" >&2 + exit 1 + fi + ./modelexpress-server & - + SERVER_PID=$! echo "Server started with PID: $SERVER_PID" wait $SERVER_PID

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

command:

- /bin/sh

- -c

args:

- |

echo "Setting up Model Express configuration..."

mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY

cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF

local_path: $MODEL_EXPRESS_CACHE_DIRECTORY

server_endpoint: http://localhost:8000

timeout_secs: null

EOF

./modelexpress-server &

SERVER_PID=$!

echo "Server started with PID: $SERVER_PID"

wait $SERVER_PID

command:

- /bin/sh

- -c

args:

- |

echo "Setting up Model Express configuration..."

if ! mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY; then

echo "Failed to create cache directory" >&2

exit 1

fi

if ! cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF

local_path: $MODEL_EXPRESS_CACHE_DIRECTORY

server_endpoint: http://localhost:8000

timeout_secs: null

EOF

then

echo "Failed to write config.yaml" >&2

exit 1

fi

./modelexpress-server &

SERVER_PID=$!

echo "Server started with PID: $SERVER_PID"

wait $SERVER_PID

🤖 Prompt for AI Agents

In examples/backends/trtllm/deploy/disagg-multinode.yaml around lines 131-149, the startup script for Model Express runs mkdir, writes the config and launches the server with no failure checks; add explicit error handling: fail fast (e.g., set -o errexit -o pipefail) or check the exit status after mkdir and the config heredoc write and echo a clear error and exit non‑zero if they fail, write the config to a temp file and mv it into place only on success to avoid partial files, start the modelexpress-server and verify it actually launched (check exit code and that $SERVER_PID is non-empty) and trap EXIT/ERR to kill the server on failures so the pod/container exits with a non‑zero status instead of silently continuing.

coderabbitai · 2025-11-05T00:57:20Z

+      extraPodSpec:
+        mainContainer:
+          image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag


🛠️ Refactor suggestion | 🟠 Major

Add missing imagePullPolicy for consistency.

Like disagg_kvbm.yaml, this file is missing imagePullPolicy: IfNotPresent. Add it to maintain consistency with other deployments.

extraPodSpec: mainContainer: image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag + imagePullPolicy: IfNotPresent env:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

extraPodSpec:

mainContainer:

image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag

extraPodSpec:

mainContainer:

image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag

imagePullPolicy: IfNotPresent

🤖 Prompt for AI Agents

In examples/backends/vllm/deploy/disagg.yaml around lines 46 to 48, the mainContainer block is missing an imagePullPolicy setting; add imagePullPolicy: IfNotPresent under mainContainer (alongside image) to match disagg_kvbm.yaml and maintain consistency across deployments.

github-actions · 2025-12-06T09:34:04Z

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2025-12-11T09:38:25Z

This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.

nv-hwoo added 9 commits November 4, 2025 16:26

add modelexpress server

9e82db9

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

add modelexpress-server

06fed2b

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

extract out pvc creation

64269d5

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

update readme

50bae8b

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

update sglang deployment

fd0335c

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

update trtllm deployments

211cc62

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

specify storageclass for RWX volume capability

6262a4f

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

refeactor env vars

7b8bac6

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

update workspace dir

6c9c40b

Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>

nv-hwoo requested a review from a team November 5, 2025 00:49

nv-hwoo requested review from a team as code owners November 5, 2025 00:49

pull-request-size Bot added the size/XXL label Nov 5, 2025

github-actions Bot added the feat label Nov 5, 2025

coderabbitai Bot reviewed Nov 5, 2025

View reviewed changes

nv-hwoo requested review from KavinKrishnan, biswapanda and grahamking November 5, 2025 00:58

coderabbitai Bot mentioned this pull request Nov 12, 2025

fix: typo from mountPoint to mountPath #4252

Merged

github-actions Bot added the Stale label Dec 6, 2025

github-actions Bot closed this Dec 11, 2025

github-actions Bot deleted the hwoo/update-dynamo-deploy-yaml branch December 11, 2025 09:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Update Dynamo k8s deployment example to use ModelExpress#4112

feat: Update Dynamo k8s deployment example to use ModelExpress#4112
nv-hwoo wants to merge 9 commits into
mainfrom
hwoo/update-dynamo-deploy-yaml

nv-hwoo commented Nov 5, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Nov 5, 2025

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Nov 5, 2025

Uh oh!

Uh oh!

coderabbitai Bot Nov 5, 2025

Uh oh!

github-actions Bot commented Dec 6, 2025

Uh oh!

github-actions Bot commented Dec 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nv-hwoo commented Nov 5, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Details:

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Nov 5, 2025

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Nov 5, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Nov 5, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Dec 6, 2025

Uh oh!

github-actions Bot commented Dec 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

nv-hwoo commented Nov 5, 2025 •

edited by coderabbitai Bot

Loading