feat: Update Dynamo k8s deployment example to use ModelExpress#4112
nv-hwoo wants to merge 9 commits into
Signed-off-by: Hyunjae Woo <hwoo@nvidia.com>
Walkthrough
Multiple backend deployment configurations (SGLang, TRT-LLM, vLLM) are enhanced with the ModelExpress shared model caching service. The changes introduce persistent volume claims, environment variables for cache paths, and volume mounts across decode/prefill components. Documentation is updated to reflect the new architecture and deployment prerequisites.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (2 passed)
Actionable comments posted: 6
♻️ Duplicate comments (8)
examples/backends/sglang/deploy/model_cache_pvc.yaml (1)
11-11: Verify storage class availability. Same note as the vLLM PVC: `storageClassName: csi-mounted-fs-path-sc` must exist in your target cluster. See the vLLM PVC review for verification steps.
examples/backends/trtllm/deploy/disagg_router.yaml (1)
60-78: Apply the same error handling to the ModelExpress startup script (duplicate issue). This file has the same startup-script robustness issue as disagg-multinode.yaml. Apply the same error-handling fixes to lines 60–78.
See the fix suggested in disagg-multinode.yaml at lines 131–149.
examples/backends/sglang/deploy/disagg_planner.yaml (1)
59-77: Apply error handling to ModelExpress startup script (duplicate issue #3). This file has the same startup-script robustness gaps. Apply consistent error handling across all deployment files.
See the fix suggested in disagg-multinode.yaml at lines 131–149.
examples/backends/vllm/deploy/disagg_kvbm.yaml (1)
57-75: Apply error handling to ModelExpress startup script (duplicate issue #4). Same startup-script robustness gaps. Apply consistent error handling.
See the fix suggested in disagg-multinode.yaml at lines 131–149.
examples/backends/vllm/deploy/disagg.yaml (1)
56-74: Apply error handling to ModelExpress startup script (duplicate issue #5). Same startup-script robustness gaps. Apply consistent error handling.
See the fix suggested in disagg-multinode.yaml at lines 131–149.
examples/backends/trtllm/deploy/disagg.yaml (1)
57-75: Apply error handling to ModelExpress startup script (duplicate issue #6). Same startup-script robustness gaps. Apply consistent error handling.
See the fix suggested in disagg-multinode.yaml at lines 131–149.
examples/backends/sglang/deploy/disagg-multinode.yaml (1)
65-83: Apply error handling to ModelExpress startup script (duplicate issue #7). Same startup-script robustness gaps. Apply consistent error handling.
See the fix suggested in examples/backends/trtllm/deploy/disagg-multinode.yaml at lines 131–149.
examples/backends/trtllm/deploy/agg-with-config.yaml (1)
78-96: Apply error handling to ModelExpress startup script (duplicate issue #8). Same startup-script robustness gaps. Apply consistent error handling.
See the fix suggested in disagg-multinode.yaml at lines 131–149.
🧹 Nitpick comments (14)
examples/backends/vllm/deploy/README.md (1)
94-104: Clarify mandatory storage class customization requirement. The note at line 104 is good, but the storage class requirement deserves more prominence, since the deployment will fail silently if the class doesn't exist. Consider adding a warning box or emphasizing this in the prerequisites section.
Consider restructuring like this:
````markdown
### Persistent Volume Claim (PVC)

All templates expect a pre-created PVC named `model-cache-pvc` for the shared model cache.

⚠️ **IMPORTANT**: The default `storageClassName: csi-mounted-fs-path-sc` must exist in your cluster. If it is unavailable, edit `model_cache_pvc.yaml` and set `storageClassName` to one that exists in your cluster (check with `kubectl get storageclass`).

Apply the PVC once per namespace:

```bash
kubectl apply -f model_cache_pvc.yaml -n $NAMESPACE
```
````

examples/backends/sglang/deploy/README.md (1)
85-95: Clarify mandatory storage class customization requirement. Same recommendation as the vLLM README (lines 94-104): emphasize that the storage class must exist in the target cluster. A storage class mismatch is a common deployment failure.

examples/backends/trtllm/deploy/model_cache_pvc.yaml (1)
9-16: Storage size may require customization based on model sizes. The 256Gi capacity assumes models fit within this limit. While reasonable as a default, consider documenting sizing guidelines and providing guidance on resizing for larger models.

examples/backends/sglang/deploy/disagg.yaml (1)
61-75: Add error handling to ModelExpress startup script. The shell script lacks error handling for critical operations. If `mkdir -p` or `cat > config.yaml` fails, the server will start with incomplete configuration.

```diff
       args:
         - |
           echo "Setting up Model Express configuration..."
 
-          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF || { echo "Failed to create config.yaml"; exit 1; }
           local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
           server_endpoint: http://localhost:8000
           timeout_secs: null
           EOF
 
           ./modelexpress-server &
```

examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)
61-75: Add error handling to ModelExpress startup script. The shell script lacks error handling for critical operations. If `mkdir -p` or `cat > config.yaml` fails, the server will start with incomplete configuration.

```diff
       args:
         - |
           echo "Setting up Model Express configuration..."
 
-          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF || { echo "Failed to create config.yaml"; exit 1; }
           local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
           server_endpoint: http://localhost:8000
           timeout_secs: null
           EOF
 
           ./modelexpress-server &
```

examples/backends/sglang/deploy/agg_router.yaml (1)
64-78: Add error handling to ModelExpress startup script. The shell script lacks error handling for critical operations. If `mkdir -p` or `cat > config.yaml` fails, the server will start with incomplete configuration.

```diff
       args:
         - |
           echo "Setting up Model Express configuration..."
 
-          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF || { echo "Failed to create config.yaml"; exit 1; }
           local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
           server_endpoint: http://localhost:8000
           timeout_secs: null
           EOF
 
           ./modelexpress-server &
```

examples/backends/vllm/deploy/disagg_planner.yaml (1)
62-77: Add error handling to ModelExpress startup script. The shell script lacks error handling for critical operations. If `mkdir -p` or `cat > config.yaml` fails, the server will start with incomplete configuration.

```diff
       args:
         - |
           echo "Setting up Model Express configuration..."
 
-          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF || { echo "Failed to create config.yaml"; exit 1; }
           local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
           server_endpoint: http://localhost:8000
           timeout_secs: null
           EOF
 
           ./modelexpress-server &
```

examples/backends/vllm/deploy/agg.yaml (1)
61-75: Add error handling to ModelExpress startup script. The shell script lacks error handling for critical operations. If `mkdir -p` or `cat > config.yaml` fails, the server will start with incomplete configuration.

```diff
       args:
         - |
           echo "Setting up Model Express configuration..."
 
-          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF || { echo "Failed to create config.yaml"; exit 1; }
           local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
           server_endpoint: http://localhost:8000
           timeout_secs: null
           EOF
 
           ./modelexpress-server &
```

examples/backends/trtllm/deploy/agg.yaml (1)
61-75: Add error handling to ModelExpress startup script. The shell script lacks error handling for critical operations. If `mkdir -p` or `cat > config.yaml` fails, the server will start with incomplete configuration.

```diff
       args:
         - |
           echo "Setting up Model Express configuration..."
 
-          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
-          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+          mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY || { echo "Failed to create cache directory"; exit 1; }
+          cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF || { echo "Failed to create config.yaml"; exit 1; }
           local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
           server_endpoint: http://localhost:8000
           timeout_secs: null
           EOF
 
           ./modelexpress-server &
```

examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)
57-75: Add PodSecurityContext to ModelExpress service and verify shell script robustness. The startup script at lines 57-75 runs the server in the background (`&`) and then calls `wait`, which should work but relies on proper environment variable expansion. More importantly, the ModelExpress service is missing a PodSecurityContext, which should be set at the pod level.
Per the learnings on Dynamo's security posture, add a PodSecurityContext with `runAsUser`, `runAsGroup`, and `fsGroup` set to 1000 to the `extraPodSpec`:

```diff
       extraPodSpec:
+        securityContext:
+          runAsUser: 1000
+          runAsGroup: 1000
+          fsGroup: 1000
         mainContainer:
```

Additionally, verify that:
- The shell variable `$MODEL_EXPRESS_CACHE_DIRECTORY` is correctly expanded from the global envs when the container starts.
- The `wait $SERVER_PID` pattern properly keeps the container alive; if `modelexpress-server` exits unexpectedly, the container will exit, which is the desired behavior.
- The relative path `./modelexpress-server` resolves correctly; consider using an absolute path or ensuring the working directory is set appropriately.

examples/backends/sglang/deploy/agg_logging.yaml (1)
29-80: Add PodSecurityContext to ModelExpress service. The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).
Apply this diff:
```diff
       extraPodSpec:
+        securityContext:
+          runAsUser: 1000
+          runAsGroup: 1000
+          fsGroup: 1000
         mainContainer:
```

examples/backends/vllm/deploy/disagg-multinode.yaml (1)
35-86: Add PodSecurityContext to ModelExpress service. The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).
Apply this diff:
```diff
       extraPodSpec:
+        securityContext:
+          runAsUser: 1000
+          runAsGroup: 1000
+          fsGroup: 1000
         mainContainer:
```

examples/backends/vllm/deploy/agg_kvbm.yaml (1)
27-78: Add PodSecurityContext to ModelExpress service. The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).
Apply this diff:
```diff
     ModelExpress:
       envFromSecret: hf-token-secret
       dynamoNamespace: vllm-agg-kvbm
       componentType: frontend
       readinessProbe:
         tcpSocket:
           port: 8000
         initialDelaySeconds: 10
         periodSeconds: 5
         timeoutSeconds: 3
         failureThreshold: 3
       replicas: 1
       resources:
         requests:
           cpu: "4"
           memory: "16Gi"
         limits:
           cpu: "4"
           memory: "16Gi"
       extraPodSpec:
+        securityContext:
+          runAsUser: 1000
+          runAsGroup: 1000
+          fsGroup: 1000
         mainContainer:
```

examples/backends/vllm/deploy/agg_router.yaml (1)
30-81: Add PodSecurityContext to ModelExpress service. The ModelExpress service is missing a PodSecurityContext consistent with Dynamo's security best practices (learnings reference).
Apply this diff:
```diff
     ModelExpress:
       envFromSecret: hf-token-secret
       dynamoNamespace: vllm-agg-router
       componentType: frontend
       readinessProbe:
         tcpSocket:
           port: 8000
         initialDelaySeconds: 10
         periodSeconds: 5
         timeoutSeconds: 3
         failureThreshold: 3
       replicas: 1
       resources:
         requests:
           cpu: "4"
           memory: "16Gi"
         limits:
           cpu: "4"
           memory: "16Gi"
       extraPodSpec:
+        securityContext:
+          runAsUser: 1000
+          runAsGroup: 1000
+          fsGroup: 1000
         mainContainer:
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
- examples/backends/sglang/deploy/README.md (4 hunks)
- examples/backends/sglang/deploy/agg.yaml (3 hunks)
- examples/backends/sglang/deploy/agg_logging.yaml (3 hunks)
- examples/backends/sglang/deploy/agg_router.yaml (3 hunks)
- examples/backends/sglang/deploy/disagg-multinode.yaml (4 hunks)
- examples/backends/sglang/deploy/disagg.yaml (4 hunks)
- examples/backends/sglang/deploy/disagg_planner.yaml (4 hunks)
- examples/backends/sglang/deploy/model_cache_pvc.yaml (1 hunks)
- examples/backends/trtllm/deploy/README.md (5 hunks)
- examples/backends/trtllm/deploy/agg-with-config.yaml (3 hunks)
- examples/backends/trtllm/deploy/agg.yaml (3 hunks)
- examples/backends/trtllm/deploy/agg_router.yaml (3 hunks)
- examples/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
- examples/backends/trtllm/deploy/disagg.yaml (4 hunks)
- examples/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
- examples/backends/trtllm/deploy/disagg_router.yaml (4 hunks)
- examples/backends/trtllm/deploy/model_cache_pvc.yaml (1 hunks)
- examples/backends/vllm/deploy/README.md (4 hunks)
- examples/backends/vllm/deploy/agg.yaml (3 hunks)
- examples/backends/vllm/deploy/agg_kvbm.yaml (4 hunks)
- examples/backends/vllm/deploy/agg_router.yaml (3 hunks)
- examples/backends/vllm/deploy/disagg-multinode.yaml (4 hunks)
- examples/backends/vllm/deploy/disagg.yaml (4 hunks)
- examples/backends/vllm/deploy/disagg_kvbm.yaml (4 hunks)
- examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (4 hunks)
- examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml (4 hunks)
- examples/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
- examples/backends/vllm/deploy/disagg_router.yaml (4 hunks)
- examples/backends/vllm/deploy/model_cache_pvc.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-10-16T13:35:33.710Z
Learnt from: grahamking
Repo: ai-dynamo/dynamo PR: 3659
File: lib/llm/src/common/checked_file.rs:113-124
Timestamp: 2025-10-16T13:35:33.710Z
Learning: In the dynamo project, model deployment cards stored in etcd are cleared by lease expiration, so there's no persistence of old card data across system restarts or upgrades.
Applied to files:
examples/backends/sglang/deploy/disagg_planner.yaml
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
Applied to files:
examples/backends/trtllm/deploy/disagg_planner.yaml
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.
Applied to files:
examples/backends/vllm/deploy/agg_kvbm.yaml
examples/backends/vllm/deploy/agg.yaml
📚 Learning: 2025-06-11T21:18:00.425Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.
Applied to files:
examples/backends/vllm/deploy/agg_kvbm.yaml
📚 Learning: 2025-07-18T16:04:31.771Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
Applied to files:
examples/backends/sglang/deploy/README.md
📚 Learning: 2025-10-24T04:21:08.751Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 3858
File: recipes/deepseek-r1/model-cache/model-download.yaml:18-32
Timestamp: 2025-10-24T04:21:08.751Z
Learning: In the recipes directory structure, model-specific recipes (e.g., recipes/deepseek-r1/, recipes/llama-3-70b/) contain hardcoded model names and revisions in their Kubernetes manifests (like model-download.yaml). Each recipe directory is deployment-specific and self-contained, so hardcoding model-specific values is the intended design pattern.
Applied to files:
examples/backends/sglang/deploy/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: trtllm (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (11)
examples/backends/vllm/deploy/model_cache_pvc.yaml (1)
11-11: Storage class should follow established project patterns for consistency. The hardcoded `csi-mounted-fs-path-sc` is inconsistent with how other examples and recipes handle storage. The recipe files and documentation examples use placeholders (e.g., `"your-storage-class-name"`) with comments explaining expected values. Additionally, `deploy/utils/manifests/pvc.yaml` omits `storageClassName` entirely to use the cluster's default storage class.
Consider aligning this with one of these patterns:
- Use a placeholder with a clarifying comment: `storageClassName: "your-storage-class-name" # e.g., csi-mounted-fs-path-sc, nfs, efs`
- Omit `storageClassName` to rely on the cluster's default (verified by pre-deployment checks)
- Document which specific storage class is required, and why, in the accompanying README
examples/backends/trtllm/deploy/model_cache_pvc.yaml (1)
11-11: Verify storage class availability in target clusters. The `storageClassName: csi-mounted-fs-path-sc` is non-standard and environment-specific. Deployments will fail if this storage class doesn't exist in the cluster.
- Is this storage class documented as a prerequisite for Dynamo Cloud installations?
- Should this be configurable or use a more portable storage class as a fallback?
- Does the README adequately warn users about this prerequisite?
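One way to act on these questions is a pre-flight check before applying the PVC. The sketch below is a hypothetical helper (`has_storage_class` is not part of Dynamo or kubectl) that reads the output of `kubectl get storageclass -o name`, whose entries look like `storageclass.storage.k8s.io/<name>`, and succeeds only if the requested class is listed. It matches whole names, so a class such as `csi-mounted-fs-path-sc-v2` does not count as a match.

```sh
# Hypothetical pre-flight helper, not part of Dynamo or kubectl.
# Reads `kubectl get storageclass -o name` output on stdin and succeeds only
# if the storage class named in $1 is listed.
has_storage_class() {
  wanted=$1
  # Strip the "storageclass.storage.k8s.io/" prefix, then require an exact,
  # whole-line, fixed-string match on the remaining class name.
  sed 's#^.*/##' | grep -qxF "$wanted"
}

# Example usage (requires cluster access):
#   kubectl get storageclass -o name | has_storage_class csi-mounted-fs-path-sc \
#     || echo "csi-mounted-fs-path-sc not found; edit storageClassName in model_cache_pvc.yaml" >&2
```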
examples/backends/sglang/deploy/disagg.yaml (1)
20-26: Consider adding a volumeMount for the Frontend service. The `decode` and `prefill` services mount the `model-cache-pvc`, but the `Frontend` service does not. Verify whether this is intentional. If the Frontend also needs access to cached models or shared configuration, add the volumeMount.
examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (1)
20-26: Consider adding a volumeMount for the Frontend service. The `VllmDecodeWorker` and `VllmPrefillWorker` mount the `model-cache-pvc`, but the `Frontend` service does not. Verify whether this is intentional.
examples/backends/sglang/deploy/agg_router.yaml (1)
20-26: Consider adding a volumeMount for the Frontend service. The `decode` service mounts the `model-cache-pvc`, but the `Frontend` service does not. Verify whether this is intentional.
examples/backends/trtllm/deploy/README.md (2)
116-124: Clear documentation of PVC prerequisite. The README documents well the requirement to pre-create the model-cache-pvc. This is critical for successful deployments and is appropriately emphasized.
189-194: Verify PVC manifest filename consistency. The deployment instructions reference `model_cache_pvc.yaml`. Confirm this filename matches across all backend deployment directories (SGLang, TRT-LLM, vLLM).
examples/backends/vllm/deploy/disagg_kvbm.yaml (1)
46-49: Add missing `imagePullPolicy` for consistency. Other deployment files in this PR include `imagePullPolicy: IfNotPresent` for the ModelExpress container (e.g., disagg-multinode.yaml line 123). Add it here to ensure consistent pull behavior across all deployments.

```diff
       extraPodSpec:
         mainContainer:
           image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
+          imagePullPolicy: IfNotPresent
           env:
```

Likely an incorrect or invalid review comment.
examples/backends/vllm/deploy/disagg_kvbm_tp2.yaml (1)
12-18: Verify service DNS naming for MODEL_EXPRESS_URL. The MODEL_EXPRESS_URL at line 18 assumes the Kubernetes service DNS name follows the pattern `{dynamoNamespace}-modelexpress`. Ensure that the Dynamo operator generates service names using this convention (service name `ModelExpress` with dynamoNamespace `vllm-disagg-kvbm-tp2` → `vllm-disagg-kvbm-tp2-modelexpress`).
examples/backends/sglang/deploy/agg_logging.yaml (1)
107-110: Verify the `--skip-tokenizer-init` flag addition. The `--skip-tokenizer-init` flag is added to the decode service at line 107, which differs from the vLLM deployments in the PR. Confirm this flag is appropriate for this SGLang logging deployment and consider documenting its purpose in a comment if it's specific to this configuration.
examples/backends/vllm/deploy/disagg-multinode.yaml (1)
9-18: Verify MODEL_EXPRESS_URL naming for multinode deployment. The file name is `disagg-multinode.yaml`, but the MODEL_EXPRESS_URL at line 18 is `http://vllm-disagg-modelexpress:8000`, omitting the `-multinode` suffix. Confirm this is intentional: if the multinode variant should have a distinct ModelExpress service, the URL should be `http://vllm-disagg-multinode-modelexpress:8000`.
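Both naming comments rest on the same assumed convention. As a hedged sketch, the hypothetical helper below encodes the pattern the review assumes (`<dynamoNamespace>-<service name, lowercased>`, port 8000 by default). It is not taken from the Dynamo operator, so confirm the actual Service names (for example with `kubectl get svc -n $NAMESPACE`) before relying on it.

```sh
# Hypothetical helper encoding the naming convention the review *assumes*:
# "<dynamoNamespace>-<service name, lowercased>". Not taken from the Dynamo
# operator; verify against the Services it actually creates.
expected_modelexpress_url() {
  dynamo_ns=$1
  service=$2
  port=${3:-8000}   # Default to the port used by the example manifests.
  svc_lower=$(printf '%s' "$service" | tr '[:upper:]' '[:lower:]')
  printf 'http://%s-%s:%s\n' "$dynamo_ns" "$svc_lower" "$port"
}

# Example: compare against MODEL_EXPRESS_URL in a manifest.
#   expected_modelexpress_url vllm-disagg-kvbm-tp2 ModelExpress
```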
examples/backends/trtllm/deploy/disagg-multinode.yaml, lines 131-149:

```yaml
          command:
            - /bin/sh
            - -c
          args:
            - |
              echo "Setting up Model Express configuration..."

              mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
              cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
              local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
              server_endpoint: http://localhost:8000
              timeout_secs: null
              EOF

              ./modelexpress-server &

              SERVER_PID=$!
              echo "Server started with PID: $SERVER_PID"
              wait $SERVER_PID
```
Add error handling to the ModelExpress startup script.
The initialization script lacks error checks. If `mkdir` fails or the config file write fails, the server won't start correctly, but the errors go undetected because the script keeps running and the background launch of the server still succeeds.
Apply this diff to add error handling:
```diff
           args:
             - |
               echo "Setting up Model Express configuration..."
 
-              mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY
+              if ! mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY; then
+                echo "Failed to create cache directory" >&2
+                exit 1
+              fi
+
-              cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
+              if ! cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
               local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
               server_endpoint: http://localhost:8000
               timeout_secs: null
               EOF
+              then
+                echo "Failed to write config.yaml" >&2
+                exit 1
+              fi
 
               ./modelexpress-server &
 
               SERVER_PID=$!
               echo "Server started with PID: $SERVER_PID"
               wait $SERVER_PID
```

📝 Committable suggestion
```yaml
          command:
            - /bin/sh
            - -c
          args:
            - |
              echo "Setting up Model Express configuration..."
              if ! mkdir -p $MODEL_EXPRESS_CACHE_DIRECTORY; then
                echo "Failed to create cache directory" >&2
                exit 1
              fi
              if ! cat > $MODEL_EXPRESS_CACHE_DIRECTORY/config.yaml << EOF
              local_path: $MODEL_EXPRESS_CACHE_DIRECTORY
              server_endpoint: http://localhost:8000
              timeout_secs: null
              EOF
              then
                echo "Failed to write config.yaml" >&2
                exit 1
              fi
              ./modelexpress-server &
              SERVER_PID=$!
              echo "Server started with PID: $SERVER_PID"
              wait $SERVER_PID
```
🤖 Prompt for AI Agents
In examples/backends/trtllm/deploy/disagg-multinode.yaml around lines 131-149, the startup script for Model Express runs mkdir, writes the config and launches the server with no failure checks; add explicit error handling: fail fast (e.g., set -o errexit -o pipefail) or check the exit status after mkdir and the config heredoc write and echo a clear error and exit non‑zero if they fail, write the config to a temp file and mv it into place only on success to avoid partial files, start the modelexpress-server and verify it actually launched (check exit code and that $SERVER_PID is non-empty) and trap EXIT/ERR to kill the server on failures so the pod/container exits with a non‑zero status instead of silently continuing.
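The prompt above can be sketched in POSIX `sh`, which is what the manifests invoke via `/bin/sh -c`. The function names `write_modelexpress_config` and `run_server` are hypothetical, and the config keys are copied from the existing script. This is one possible shape under those assumptions, not a tested replacement for the manifests. It uses a `TERM`/`INT` trap rather than `ERR`, since `trap ... ERR` is a bash extension that plain `sh` may not support.

```sh
# Hypothetical helpers (not part of ModelExpress or Dynamo).

# Write config.yaml to a temp file first, then mv it into place, so a failed
# write never leaves a partial config.yaml behind.
write_modelexpress_config() {
  cache_dir=$1
  mkdir -p "$cache_dir" || { echo "Failed to create cache directory: $cache_dir" >&2; return 1; }
  tmp_file="$cache_dir/config.yaml.tmp.$$"
  cat > "$tmp_file" << EOF || { echo "Failed to write $tmp_file" >&2; rm -f "$tmp_file"; return 1; }
local_path: $cache_dir
server_endpoint: http://localhost:8000
timeout_secs: null
EOF
  mv "$tmp_file" "$cache_dir/config.yaml" || { echo "Failed to install config.yaml" >&2; rm -f "$tmp_file"; return 1; }
}

# Start the given command in the background, kill it if this shell receives
# TERM or INT, and return the command's exit status.
run_server() {
  "$@" &
  server_pid=$!
  echo "Server started with PID: $server_pid"
  trap 'kill "$server_pid" 2>/dev/null' TERM INT
  wait "$server_pid"
  rc=$?
  trap - TERM INT
  return "$rc"
}
```

Inside the manifest's `args` block, the inline steps would become `write_modelexpress_config "$MODEL_EXPRESS_CACHE_DIRECTORY" || exit 1` followed by `run_server ./modelexpress-server`. Because `run_server` is the last command, its return value becomes the container's exit status, so a crashed server surfaces as a non-zero exit.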
examples/backends/vllm/deploy/disagg.yaml, lines 46-48:

```yaml
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
```
🛠️ Refactor suggestion | 🟠 Major
Add missing imagePullPolicy for consistency.
Like disagg_kvbm.yaml, this file is missing imagePullPolicy: IfNotPresent. Add it to maintain consistency with other deployments.
```diff
       extraPodSpec:
         mainContainer:
           image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
+          imagePullPolicy: IfNotPresent
           env:
```

📝 Committable suggestion
```yaml
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/modelexpress-server:my-tag
          imagePullPolicy: IfNotPresent
```
🤖 Prompt for AI Agents
In examples/backends/vllm/deploy/disagg.yaml around lines 46 to 48, the mainContainer block is missing an imagePullPolicy setting; add imagePullPolicy: IfNotPresent under mainContainer (alongside image) to match disagg_kvbm.yaml and maintain consistency across deployments.
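Since the same omission appears in more than one manifest, a quick textual check can list the remaining offenders. The sketch below is a hypothetical helper (`missing_pull_policy`) that assumes `imagePullPolicy` sits on the line directly after the `modelexpress-server` image, as in the suggested diffs. It pattern-matches lines rather than parsing YAML, so treat its output as a hint rather than a verdict.

```sh
# Hypothetical consistency check: print each manifest in which a
# modelexpress-server image line is not immediately followed by an
# imagePullPolicy line. Textual only; it does not parse YAML.
missing_pull_policy() {
  for f in "$@"; do
    awk '
      # On the line after an image line, require imagePullPolicy.
      want { if ($0 !~ /imagePullPolicy:/) bad = 1; want = 0 }
      /image: .*modelexpress-server/ { want = 1 }
      # Also flag an image line that is the last line of the file.
      END { if (bad || want) exit 1 }
    ' "$f" || echo "$f"
  done
}
```

For example, `missing_pull_policy examples/backends/*/deploy/*.yaml` prints each file that still needs the field; files without a ModelExpress image are not printed.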
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information.
Summary
As the title says, this PR integrates ModelExpress into the Dynamo k8s deployments. Each deployment now starts a ModelExpress server with a PVC attached (shared with the Dynamo workers), so that models are managed in a central location and shared across workers. This avoids a duplicate model download on each worker.
Details:
Summary by CodeRabbit
New Features
Documentation