Is your feature request related to a problem? Please describe.
Attempting to export `Qwen/Qwen3.6-35B-A3B` (and other Qwen3.5/3.6 MoE variants) using `QEfficient.cloud.export` fails at the config loading stage because `qwen3_5_moe` is not a recognized architecture in the `transformers` version that QEfficient currently pins:

```
KeyError: 'qwen3_5_moe'
ValueError: The checkpoint you are trying to load has model type `qwen3_5_moe`
but Transformers does not recognize this architecture.
```
The `qwen3_5_moe` architecture was introduced in `transformers>=5.3.0`, but QEfficient's current dependency tree pins `transformers<5.x`, making it impossible to export or compile Qwen3.5/3.6 MoE models for AIC100 without breaking QEfficient's own internal imports.
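For illustration, the failure mode can be sketched with a stand-in for the registry lookup that `AutoConfig` performs internally. The mapping contents and the `resolve_config` helper below are assumptions for demonstration, not transformers' actual code:

```python
def resolve_config(model_type, config_mapping):
    """Mimic AutoConfig's registry lookup: an unknown model_type
    surfaces as the KeyError/ValueError pair shown above."""
    try:
        return config_mapping[model_type]
    except KeyError:
        raise ValueError(
            f"The checkpoint you are trying to load has model type `{model_type}` "
            "but Transformers does not recognize this architecture."
        ) from None

# A transformers<5.x install ships a mapping without the new key,
# so the new checkpoint's config cannot even be parsed:
PINNED_MAPPING = {"qwen2": "Qwen2Config", "qwen2_moe": "Qwen2MoeConfig"}
```

Under this sketch, `resolve_config("qwen3_5_moe", PINNED_MAPPING)` raises the `ValueError` quoted in the traceback, while `qwen2_moe` resolves normally.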
Describe the solution you'd like
- Add `qwen3_5_moe` / `Qwen3_5MoeForCausalLM` to QEfficient's supported model registry (`QEfficient/utils/model_registery.py`)
- Add the necessary AIC100-specific PyTorch transforms for the Qwen3.5 MoE attention and MoE routing layers in `QEfficient/transformers/models/`
- Bump the `transformers` dependency pin to `>=5.3.0` (while fixing the internal import breakages caused by symbols removed in `transformers>=5.4.0`, such as `AwqBackendPackingMethod`, `AWQLinearVersion`, `HybridCache`, and `Qwen2RMSNorm` from `qwen2_5_vl`)
- Add an ONNX export config for the `text-generation-with-past` task for this architecture
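The registry change in the first bullet might look something like the sketch below. The actual structure of `QEfficient/utils/model_registery.py` may differ; the dict layout and the `register_architecture` helper are assumptions for illustration only:

```python
# Hypothetical stand-in for QEfficient's supported-model registry,
# mapping a transformers model_type to its ForCausalLM class name.
SUPPORTED_ARCHITECTURES = {
    "qwen2_moe": "Qwen2MoeForCausalLM",
}

def register_architecture(model_type, causal_lm_class, registry=SUPPORTED_ARCHITECTURES):
    """Add a (model_type -> ForCausalLM class) entry; keep an existing entry if present."""
    registry.setdefault(model_type, causal_lm_class)
    return registry

# The requested addition:
register_architecture("qwen3_5_moe", "Qwen3_5MoeForCausalLM")
```

The AIC100-specific transforms and the ONNX export config would still need to be wired up separately, as listed above.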
Describe alternatives you've considered
- Manually patching the installed QEfficient venv files with `sed` to stub out missing imports — this is unsustainable, as each fix exposes another broken import down the chain
- Using `optimum-cli export onnx` directly — same blocker, as `optimum` also has a `transformers>=5.x` compatibility issue
- Downgrading to a supported Qwen variant (e.g., `Qwen2-57B-A14B`) — undesirable, as Qwen3.5/3.6 MoE offers significantly better performance per parameter
Additional context
- Model: `Qwen/Qwen3.6-35B-A3B` (35B total params, 3.6B active, `qwen3_5_moe` architecture)
- Target platform: AIC100 (ai-ABPI-130, Ubuntu 22.04)
- `transformers==5.3.0` works for config loading; `>=5.4.0` has a regression — see huggingface/transformers#45310 ("[BUG] transformers>=5.4.0, Qwen3.5 Moe from_pretrained error")
- The Qwen3.5/3.6 MoE family (`35B-A3B`, `30B-A3B`, `122B-A14B`) is increasingly popular for on-device and edge inference, making AIC100 support particularly valuable