[chart] Update InferenceService template for PD mode support#430

Merged
slin1237 merged 2 commits into main from helm-n/1 on Dec 13, 2025

@slin1237
Collaborator

  • Remove runtime spec (auto-selected by operator)
  • Add PD mode auto-detection (model names ending with "-pd")
  • PD mode: requires engine, decoder, and router
  • Non-PD mode: requires engine, optional router (no decoder)
  • Support both flat (minReplicas) and nested (engine.minReplicas) config
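
Helm renders charts with Go's text/template, so here is a minimal Go sketch of the rules listed above. It covers which components each mode renders and how a replica count could be read from either the flat or the nested config shape. The names `Values`, `componentsFor` and `engineMinReplicas` are hypothetical and are not the chart's helpers. PD mode is passed as a plain boolean because the second commit of this PR replaces name-suffix detection with an explicit flag. The precedence of the nested key over the flat key is an assumption.

```go
package main

import "fmt"

// Values is a hypothetical, simplified view of the chart's values. Numbers
// are assumed to be int here; values decoded from YAML may arrive as float64.
type Values map[string]any

// componentsFor returns the InferenceService components to render:
// PD mode needs engine, decoder and router; non-PD mode needs only the
// engine, with an optional router and never a decoder.
func componentsFor(pdMode, routerEnabled bool) []string {
	if pdMode {
		return []string{"engine", "decoder", "router"}
	}
	if routerEnabled {
		return []string{"engine", "router"}
	}
	return []string{"engine"}
}

// engineMinReplicas accepts both config shapes: nested (engine.minReplicas)
// and flat (minReplicas). Letting the nested key win when both are set is an
// assumption of this sketch.
func engineMinReplicas(v Values, def int) int {
	if engine, ok := v["engine"].(map[string]any); ok {
		if n, ok := engine["minReplicas"].(int); ok {
			return n
		}
	}
	if n, ok := v["minReplicas"].(int); ok {
		return n
	}
	return def
}

func main() {
	flat := Values{"minReplicas": 2}
	nested := Values{"engine": map[string]any{"minReplicas": 3}}

	fmt.Println(componentsFor(true, false), engineMinReplicas(flat, 1))      // [engine decoder router] 2
	fmt.Println(componentsFor(false, true), engineMinReplicas(nested, 1))    // [engine router] 3
	fmt.Println(componentsFor(false, false), engineMinReplicas(Values{}, 1)) // [engine] 1
}
```

Accepting both shapes lets existing flat values files keep working while new ones can group settings per component.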

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

@github-actions github-actions Bot added the helm Helm chart changes label Dec 12, 2025
- Remove 10 -pd model entries from values.yaml (they're not separate models)
- Remove 10 -pd entries from model registry in _helpers.tpl
- Add pdMode field: when true, InferenceService includes decoder and router
- Update InferenceService to use explicit pdMode flag (not model name suffix)
- Update README with pdMode documentation and supported PD models list
- Update model count from 176 to 165

PD mode supported models: kimi-k2-instruct, deepseek-rdma, llama-3-1-70b-instruct,
llama-3-2-1b-instruct, llama-3-2-3b-instruct, llama-3-3-70b-instruct,
llama-4-maverick-17b-128e-instruct-fp8, llama-4-scout-17b-16e-instruct,
mistral-7b-instruct, mixtral-8x7b-instruct
@github-actions github-actions Bot added the documentation Documentation changes label Dec 13, 2025
@slin1237 slin1237 merged commit 9466e5e into main Dec 13, 2025
27 checks passed
@slin1237 slin1237 deleted the helm-n/1 branch December 13, 2025 18:21
truddy0 pushed a commit that referenced this pull request Dec 16, 2025
* [chart] Update InferenceService template for PD mode support

- Remove runtime spec (auto-selected by operator)
- Add PD mode auto-detection (model names ending with "-pd")
- PD mode: requires engine, decoder, and router
- Non-PD mode: requires engine, optional router (no decoder)
- Support both flat (minReplicas) and nested (engine.minReplicas) config

* [chart] Replace -pd model entries with pdMode configuration option

- Remove 10 -pd model entries from values.yaml (they're not separate models)
- Remove 10 -pd entries from model registry in _helpers.tpl
- Add pdMode field: when true, InferenceService includes decoder and router
- Update InferenceService to use explicit pdMode flag (not model name suffix)
- Update README with pdMode documentation and supported PD models list
- Update model count from 176 to 165

PD mode supported models: kimi-k2-instruct, deepseek-rdma, llama-3-1-70b-instruct,
llama-3-2-1b-instruct, llama-3-2-3b-instruct, llama-3-3-70b-instruct,
llama-4-maverick-17b-128e-instruct-fp8, llama-4-scout-17b-16e-instruct,
mistral-7b-instruct, mixtral-8x7b-instruct
slin1237 added a commit that referenced this pull request Dec 22, 2025

Labels

documentation (Documentation changes), helm (Helm chart changes)

1 participant