Conversation
Add MLServer runtime Signed-off-by: Yuan Fang <yuanfang@alauda.io>
**Walkthrough**

Reorganized the custom inference runtime docs: removed inline Xinference-only examples, added a tabbed "Configuration Examples for Runtimes" section (MLServer and Xinference for GPU/NPU/CPU), generalized publishing steps and headings, updated example filenames, and expanded environment-variable guidance (including `MODEL_FAMILY`).
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant User
    participant Docs as Documentation
    participant ControlPlane as Inference Service API
    participant RuntimePod as Custom Runtime Pod
    User->>Docs: Read publish flow & runtime templates
    Docs-->>User: Provide MLServer/Xinference YAML examples
    User->>ControlPlane: Submit InferenceService using chosen runtime YAML
    ControlPlane->>RuntimePod: Create Pod with env vars, probes, resources
    RuntimePod-->>ControlPlane: Startup probe/health checks pass
    ControlPlane-->>User: Inference service becomes available
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 8
🧹 Nitpick comments (5)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (5)
45-56: Good reorg of Step 1; minor copy edit. The step reads clearly and properly points to the new examples. One tiny nit: "examples below" is correct now, thanks for updating the tip.
- Minor copy edit suggestion in the step title later (“Set Environment Variables(if needed)” → “Set Environment Variables (if needed)”) — see separate comment.
71-75: Add a space before the parenthesis in the step title. Spacing: "Set Environment Variables(if needed)" → "Set Environment Variables (if needed)".

Apply:

```diff
-4. **Set Environment Variables(if needed)**:
+4. **Set Environment Variables (if needed)**:
```
227-233: Consider adding a `securityContext` for consistency. The MLServer example hardens the container. For parity, add the same `securityContext` to the Xinference examples unless your runtime requires elevated privileges.
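A minimal sketch of what such a hardening block could look like (the field values here are assumptions, not copied from the PR's MLServer example):

```yaml
# Illustrative only — mirror whatever the MLServer example in this PR actually sets.
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL
```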
279-279: Avoid installing packages at container startup. `pip install transformers~=4.49.0` in the startup script introduces a network dependency, longer cold starts, and potential reproducibility issues. Bake this into the image instead.
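One way to bake the dependency in at build time, sketched against a hypothetical base image (the image name and tag are assumptions, not taken from the PR):

```dockerfile
# Hypothetical base image — substitute the runtime image this PR actually uses.
FROM xprobe/xinference:latest

# Pin and install at build time so container startup stays offline and reproducible.
RUN pip install --no-cache-dir "transformers~=4.49.0"
```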
356-366: CPU template looks good; minor label nit. For CPU, consider omitting `cpaas.io/cuda-version` entirely rather than setting it to an empty string.

Apply:

```diff
-    cpaas.io/cuda-version: ""
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- `docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx` (1 hunks)
🔇 Additional comments (4)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (4)
52-53: LGTM: kubectl apply example is correct. Example file naming now matches the generalized runtime guidance.
121-129: LGTM: startupProbe path is consistent with the V2 ready endpoint. The probe should work with MLServer's V2 endpoints on port 8080.
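For context, a probe against the V2 ready endpoint typically has this shape (the threshold and period values below are assumptions, not the PR's actual numbers):

```yaml
startupProbe:
  httpGet:
    path: /v2/health/ready   # KServe V2 / MLServer readiness endpoint
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```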
137-142: Verify supported model formats vs. MLServer runtime. Declaring both `mlflow` and `transformers` under one MLServer runtime may not reflect what the provided image actually supports out of the box. SKLearn via MLServer is fine; MLflow is typically supported; `transformers` may require a custom implementation. If the image doesn't include a Transformers runtime, consider removing it or adding the appropriate implementation.
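The field in question looks roughly like this in a ClusterServingRuntime spec (the version strings are placeholders, not values from the PR):

```yaml
# Illustrative shape only — trim the entries to what the image actually ships.
supportedModelFormats:
  - name: sklearn
    version: "1"
    autoSelect: true
  - name: mlflow
    version: "2"
  # - name: transformers   # only if the image bundles a Transformers runtime
```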
447-456: Environment variable guidance is clear. The `MODEL_FAMILY` explanation and example are helpful and necessary for Xinference.
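As a reminder of the shape such guidance describes, an env entry might look like this (the value `qwen2-instruct` is just an example family name, not taken from the PR):

```yaml
env:
  - name: MODEL_FAMILY
    value: qwen2-instruct   # must match a model family name Xinference recognizes
```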
```yaml
labels:
  modelClass: mlserver_sklearn.SKLearnModel
containers:
```
🛠️ Refactor suggestion
Non-standard field under spec: `labels` → move `modelClass` to an env var.
`spec.labels` is not a valid field for the KServe ClusterServingRuntime spec and will be ignored by the API. If you intended to configure MLServer's model implementation, use the environment variable `MLSERVER_MODEL_IMPLEMENTATION`.
Apply:

```diff
-  labels:
-    modelClass: mlserver_sklearn.SKLearnModel
+  # Configure MLServer model implementation via environment variables instead of spec.labels
```

Then, in the container env list, add:

```diff
       env:
+        - name: MLSERVER_MODEL_IMPLEMENTATION
+          value: mlserver_sklearn.SKLearnModel
         - name: MLSERVER_MODEL_URI
           value: /mnt/models
         - name: MLSERVER_MODEL_NAME
           value: '{{.Name}}'
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents

```
In docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
around lines 96 to 98, the example places modelClass under spec.labels, which is
not a valid KServe ClusterServingRuntime field; remove the spec.labels block and
instead set the MLServer model implementation by adding an environment variable
entry in the container spec named MLSERVER_MODEL_IMPLEMENTATION with the value
mlserver_sklearn.SKLearnModel; update the container's env list accordingly so
the runtime reads the model implementation from the env var.
```
```shell
# 1. Check the number of available NPUs to decide whether to enable multi-NPU tensor parallelism.

GPU_COUNT=$(python3 -c "import torch_npu; print(torch_npu.npu.device_count())")

echo "Starting serving model name: ${MODEL_NAME}, num gpus: ${GPU_COUNT}"

if [ ${GPU_COUNT} -lt 1 ]; then
  echo "No GPUs found. Please check if the container have aquired any GPU device"
  exit 1
fi
```
💡 Verification agent
🧩 Analysis chain
NPU variable naming and messages refer to GPU.
- Use `NPU_COUNT` and NPU terminology.
- Grammar fix: "have aquired" → "has acquired".
- Verify whether `--n-gpu` is the correct flag for NPU parallelism in Xinference. If not, adjust accordingly.
Apply:

```diff
- # 1. Check the number of available NPUs to decide whether to enable multi-NPU tensor parallelism.
-
- GPU_COUNT=$(python3 -c "import torch_npu; print(torch_npu.npu.device_count())")
-
- echo "Starting serving model name: ${MODEL_NAME}, num gpus: ${GPU_COUNT}"
-
- if [ ${GPU_COUNT} -lt 1 ]; then
-   echo "No GPUs found. Please check if the container have aquired any GPU device"
+ # 1. Check the number of available NPUs to decide whether to enable multi-NPU tensor parallelism.
+ NPU_COUNT=$(python3 -c "import torch_npu; print(torch_npu.npu.device_count())")
+ echo "Starting serving model uid: ${MODEL_UID}, num NPUs: ${NPU_COUNT}"
+ if [ "${NPU_COUNT}" -lt 1 ]; then
+   echo "No NPUs found. Please check if the container has acquired any NPU device"
    exit 1
  fi
@@
- xinference launch --model_path ${MODEL_PATH} --model-engine ${MODEL_ENGINE} -u ${MODEL_UID} -n ${MODEL_FAMILY} --n-gpu ${GPU_COUNT} -e http://127.0.0.1:8080 $@
+ xinference launch --model_path "${MODEL_PATH}" --model-engine "${MODEL_ENGINE}" -u "${MODEL_UID}" -n "${MODEL_FAMILY}" --n-gpu "${NPU_COUNT}" -e http://127.0.0.1:8080 "$@"
```

Please confirm whether `xinference launch` supports a dedicated NPU parallelism flag (replace `--n-gpu` if needed).
Also applies to: 334-335
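The quoting point can be seen in a tiny illustrative sketch (the values here are hypothetical, not from the PR):

```shell
# Hypothetical failure mode: the Python probe prints nothing, so the variable is empty.
GPU_COUNT=""

# Unquoted, the test expands to `[ -lt 1 ]`, which errors out instead of failing cleanly.
if ! [ $GPU_COUNT -lt 1 ] 2>/dev/null; then
  echo "unquoted check errored out"
fi

# Quoted with a default, the check degrades to 0 and takes the error branch as intended.
if [ "${GPU_COUNT:-0}" -lt 1 ]; then
  echo "no devices found"
fi
```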
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
FILE="docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx"
echo "File path: $FILE"
if [ -f "$FILE" ]; then
  echo "File exists: yes"
else
  echo "File exists: no"
  exit 0
fi
echo
echo "---- lines 300-360 of the file ----"
sed -n '300,360p' "$FILE" || true
echo
echo "---- Searching repo for related terms ----"
rg -n --hidden -S "torch_npu.npu.device_count|torch_npu|--n-gpu|--n-npu|--npu|\\bnpu\\b|NPU_COUNT|npu_count|MODEL_UID|MODEL_NAME|xinference launch|\\bxinference\\b" || true
echo
echo "---- Searching specifically for occurrences of '--n-gpu' ----"
rg -n --hidden -S -- '--n-gpu' || true
```

Length of output: 9765
🌐 Web query:

> Does Xinference CLI support a dedicated NPU parallelism flag (e.g., `--n-npu` or `--npu`), or does it use `--n-gpu` for accelerator count? Where is this documented (xinference CLI launch flags)?
💡 Result:
As of the latest available documentation, Xinference CLI does not provide a dedicated NPU-specific parallelism flag such as --n-npu or --npu for specifying the number of NPUs to use. Instead, the typical accelerator count flag used in Xinference is --n-gpu, which is for GPU parallelism; there is no mention of a direct NPU count flag in the core CLI launch flags or model launch interfaces documented for Xinference[5][1][3].
Documentation details:
- The official user and internal docs outline commands like `xinference launch` and related entry points[1]. They include flags for model management, GPUs (`--n-gpu`), and engine selection, but do not specify a flag for NPUs specifically.
- When launching models, you select the inference engine via `--model-engine` (e.g., vllm, SGLang, llama.cpp), but dedicated NPU control is not outlined in the available Xinference CLI flags[5].
- Models can be launched with various engine-specific parameters, which might pass directly to the underlying inference engine, but for acceleration hardware, Xinference currently documents only GPU-related knobs[5].

Where is this documented?
- See: Xinference Documentation – Using Xinference (Latest)[5].
- For internals and flags: Xinference Internals[1].
- Custom model launching and engine arguments are described, but again, only general engine flags and possible GPU options (`--gpu_memory_utilization`, etc.), not NPU-specific flags[3][5].
Summary Table: Xinference CLI Hardware Parallelism Flags

| Accelerator | Xinference Flag | Documented Source |
|---|---|---|
| GPU | `--n-gpu` | [5] |
| NPU | No dedicated flag | [1][5][3] |
Additional Notes:
Other machine learning inference tools (e.g., TFLite) do support dedicated NPU/GPU selection flags (like --use_nnapi=true and --nnapi_accelerator_name=...)[2], but such options are not present in the Xinference CLI currently. If your use case requires NPU scheduling, it might depend on specific engine integration or external system configuration, and you would need to check for updates or consult engine-level documentation.
If Xinference adds explicit NPU flags in future releases, they would be listed in the CLI help output or documented in the official CLI flag documentation[1][5]. As of August 2025, this is not present.
Citations:
- 1: https://inference.readthedocs.io/en/v1.4.0/development/xinference_internals.html
- 2: https://community.nxp.com/t5/i-MX-Processors/How-to-tell-when-inference-is-ran-by-GPU-or-NPU-on-iMX8MPlus/m-p/1278808/?profile.language=en
- 3: https://inference.readthedocs.io/en/v0.14.2/models/custom.html
- 4: https://huggingface.co/datasets/fdaudens/hf-blog-posts-split/viewer
- 5: https://inference.readthedocs.io/en/latest/getting_started/using_xinference.html
- 6: https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/latest?count=200&page=1
- 7: https://peter.sh/experiments/chromium-command-line-switches/
- 8: https://huggingface.co/datasets/fdaudens/hf-blog-posts/viewer/default/train
- 9: https://grafana.com/docs/alloy/latest/reference/cli/run/
- 10: https://news.ycombinator.com/item?id=40417828
Rename GPU vars/messages to NPU in the NPU runtime block; keep `--n-gpu`.
Verified changes to make (very short): rename `GPU_COUNT` -> `NPU_COUNT`, update messaging to NPU, fix grammar, and quote variables in the xinference command. Confirmed: Xinference does not provide a dedicated NPU flag; keep using `--n-gpu` and pass the NPU count.
Files / locations:
- docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx — lines ~318-327: rename variable and messages.
- docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx — lines ~334-335: use NPU_COUNT in xinference launch and quote args.
Suggested diff:

```diff
- # 1. Check the number of available NPUs to decide whether to enable multi-NPU tensor parallelism.
-
- GPU_COUNT=$(python3 -c "import torch_npu; print(torch_npu.npu.device_count())")
-
- echo "Starting serving model name: ${MODEL_NAME}, num gpus: ${GPU_COUNT}"
-
- if [ ${GPU_COUNT} -lt 1 ]; then
-   echo "No GPUs found. Please check if the container have aquired any GPU device"
+ # 1. Check the number of available NPUs to decide whether to enable multi-NPU tensor parallelism.
+ NPU_COUNT=$(python3 -c "import torch_npu; print(torch_npu.npu.device_count())")
+ echo "Starting serving model uid: ${MODEL_UID}, num NPUs: ${NPU_COUNT}"
+ if [ "${NPU_COUNT}" -lt 1 ]; then
+   echo "No NPUs found. Please check if the container has acquired any NPU device"
    exit 1
  fi
@@
- xinference launch --model_path ${MODEL_PATH} --model-engine ${MODEL_ENGINE} -u ${MODEL_UID} -n ${MODEL_FAMILY} --n-gpu ${GPU_COUNT} -e http://127.0.0.1:8080 $@
+ xinference launch --model_path "${MODEL_PATH}" --model-engine "${MODEL_ENGINE}" -u "${MODEL_UID}" -n "${MODEL_FAMILY}" --n-gpu "${NPU_COUNT}" -e http://127.0.0.1:8080 "$@"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
# 1. Check the number of available NPUs to decide whether to enable multi-NPU tensor parallelism.
NPU_COUNT=$(python3 -c "import torch_npu; print(torch_npu.npu.device_count())")
echo "Starting serving model uid: ${MODEL_UID}, num NPUs: ${NPU_COUNT}"
if [ "${NPU_COUNT}" -lt 1 ]; then
  echo "No NPUs found. Please check if the container has acquired any NPU device"
  exit 1
fi

xinference launch --model_path "${MODEL_PATH}" --model-engine "${MODEL_ENGINE}" -u "${MODEL_UID}" -n "${MODEL_FAMILY}" --n-gpu "${NPU_COUNT}" -e http://127.0.0.1:8080 "$@"
```
🤖 Prompt for AI Agents

```
In docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
around lines 318-327 and 334-335, rename the GPU-specific variable and messages
to NPU (change GPU_COUNT -> NPU_COUNT), update the echo messages to reference
NPU and fix grammar, and in the xinference launch lines replace the GPU variable
with NPU_COUNT while keeping the --n-gpu flag and quoting the variable (e.g.,
"--n-gpu" "$NPU_COUNT") so the count is passed correctly and safely.
```
This reverts commit cffa695.
Refactor extend runtimes
Add MLServer runtime