**Walkthrough**

Adds a Triton Inference Server example to the custom inference runtime docs, including a full `ClusterServingRuntime` YAML configured for NVIDIA GPUs, startup commands/env vars, resource settings, a `startupProbe`, `supportedModelFormats`, usage steps, and an update to the runtime comparison table.
Sequence Diagram(s): (omitted)

**Estimated code review effort**: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents

In `docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`:
- Around lines 329-333: Add a Kubernetes `startupProbe` entry to the Triton runtime YAML so the pod is not considered ready until the model finishes loading. Specifically, insert a `startupProbe` block (mirroring the pattern used in the other runtimes) immediately before the `supportedModelFormats` section in the Triton runtime example (near the `runAsUser` key and before `supportedModelFormats: - name: triton`) to probe the model server endpoint until it is healthy.
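One way to sketch such a probe, assuming Triton's default HTTP port (8000) and its standard readiness endpoint `/v2/health/ready` — both are assumptions to be checked against the runtime's actual configuration:

```yaml
# Hedged sketch: port and thresholds are assumptions, not taken from the PR.
# Triton reports ready on /v2/health/ready once the model repository is loaded.
startupProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  periodSeconds: 10
  failureThreshold: 30  # tolerate up to ~5 minutes of model loading
```

This holds off readiness (and any liveness checks) until Triton has finished loading models, matching the probe pattern the other runtime examples in the doc already use.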
🧹 Nitpick comments (2)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (2)
**308-312: Unused environment variable `MODEL_REPO`.**

The `MODEL_REPO` environment variable is defined on lines 311-312 but is not used anywhere in the container command (lines 302-307). Either remove it or use it in the command if it's intended for some purpose.

🔧 Suggested fix: Remove unused environment variable

```diff
       env:
         - name: OMP_NUM_THREADS
           value: "1"
-        - name: MODEL_REPO
-          value: '{{ index .Annotations "aml-model-repo" }}'
       image: 152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3
```
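If `MODEL_REPO` is actually intended for use, the alternative fix is to reference it from the container command rather than delete it. A hedged sketch, assuming Triton's `--model-repository` flag and Kubernetes `$(VAR)` expansion of same-container env vars in `args`:

```yaml
# Sketch only: surrounding fields abbreviated. The annotation template is
# copied from the PR; wiring it into the flag is an assumption about intent.
containers:
  - args:
      - tritonserver
      - --model-repository=$(MODEL_REPO)  # kubelet expands $(MODEL_REPO) from env below
    env:
      - name: MODEL_REPO
        value: '{{ index .Annotations "aml-model-repo" }}'
```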
**313-313: Internal registry image may not be accessible to users.**

The image `152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3` appears to reference an internal registry. Consider adding a comment similar to the other examples, or use the official NVIDIA NGC image reference (e.g., `nvcr.io/nvidia/tritonserver:25.02-py3`) for better accessibility.

🔧 Suggested fix: Use official NVIDIA image

```diff
-      image: 152-231-registry.alauda.cn:60070/mlops/tritonserver:25.02-py3
+      image: nvcr.io/nvidia/tritonserver:25.02-py3 # Replace with your actual image if needed
```
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx — comment resolved
Deploying alauda-ai with Cloudflare Pages

| | |
| --- | --- |
| Latest commit: | 217ba40 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://49a3ab18.alauda-ai.pages.dev |
| Branch Preview URL: | https://add-triton-rt.alauda-ai.pages.dev |
/test-pass