
[Runtime] Add Qwen 3.6 35B A3B 1M context runtime#609

Open: YouNeedCryDear wants to merge 3 commits into main from feat/qwen-3-6-1m-context

@YouNeedCryDear (Collaborator)

What this PR does

Adds OME configuration for serving Qwen 3.6 35B A3B with a 1M context window:

  • Adds the qwen3-6-35b-a3b ClusterBaseModel pointing to hf://Qwen/Qwen3.6-35B-A3B.
  • Adds the vllm-qwen-3-6-35b-a3b ClusterServingRuntime, with vLLM settings for the Qwen3_5Moe architecture: 2-way tensor parallelism, long-context RoPE overrides, multimodal input limits, Qwen reasoning parsing, and tool-call parsing.
  • Registers the runtime in config/runtimes/kustomization.yaml.
  • Adds a sample InferenceService for the Qwen namespace.
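As a rough illustration of how the runtime settings listed above map onto vLLM's command line, here is a minimal Python sketch. `build_vllm_args` is a hypothetical helper, not part of OME or this PR. The parser names (`qwen3`, `qwen3_coder`) and the `max_model_len` value are assumptions, and the PR's actual RoPE override and multimodal-limit values are not reproduced here.

```python
import json
import shlex
from typing import Optional


def build_vllm_args(
    model: str,
    max_model_len: int,
    tensor_parallel_size: int = 2,
    reasoning_parser: str = "qwen3",        # assumed parser name
    tool_call_parser: str = "qwen3_coder",  # assumed parser name
    hf_overrides: Optional[dict] = None,
) -> list:
    """Assemble a `vllm serve` argument list (hypothetical helper, not OME's API)."""
    args = [
        "vllm", "serve", model,
        # Shard the model across GPUs; the PR uses 2-way tensor parallelism.
        "--tensor-parallel-size", str(tensor_parallel_size),
        # Upper bound on prompt + generated tokens for a single request.
        "--max-model-len", str(max_model_len),
        # Separate the model's reasoning trace from the final answer.
        "--reasoning-parser", reasoning_parser,
        # Allow the model to emit tool calls and parse them into structured output.
        "--enable-auto-tool-choice",
        "--tool-call-parser", tool_call_parser,
    ]
    if hf_overrides:
        # Model-config overrides (e.g. RoPE scaling) are passed as a JSON string.
        args += ["--hf-overrides", json.dumps(hf_overrides)]
    return args


# 1_010_000 is an assumed value for a ~1M-token window, not taken from the PR.
args = build_vllm_args("Qwen/Qwen3.6-35B-A3B", max_model_len=1_010_000)
print(shlex.join(args))
```

Keeping the overrides as a Python dict and serializing them with `json.dumps` avoids hand-quoting JSON inside YAML or shell strings.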

Why we need it

This enables OME to select and deploy a dedicated runtime for Qwen 3.6 35B A3B, including the long-context configuration required for 1M-token serving.
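Long-context serving of this kind commonly relies on RoPE scaling such as YaRN, where the scaling factor is the ratio of the target window to the model's native window. The sketch below assumes YaRN and a native window of 262,144 tokens. Neither is stated in this PR, so treat it as arithmetic under those assumptions, not as the runtime's actual override.

```python
import math


def yarn_factor(target_context: int, native_context: int) -> float:
    """Ratio by which RoPE positions must be stretched to reach target_context."""
    return target_context / native_context


# Assumption: a native window of 262,144 tokens (not stated in this PR).
NATIVE_CONTEXT = 262_144

print(yarn_factor(1_048_576, NATIVE_CONTEXT))             # 2**20 tokens -> 4.0
print(math.ceil(yarn_factor(1_010_000, NATIVE_CONTEXT)))  # ~1M tokens -> 4
```

Under these assumptions, a factor of 4.0 covers both common readings of "1M tokens".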

Fixes: N/A

How to test

Not run locally; this PR was created from an already-pushed commit.

Checklist

  • Tests added/updated (if applicable)
  • Docs updated (if applicable)
  • make test passes locally

github-actions (bot) added the labels runtime (Runtime configuration changes), models (Model configuration changes), and config (Configuration changes) on May 12, 2026.
YouNeedCryDear marked this pull request as ready for review on May 12, 2026 01:18.