Is your feature request related to a problem? Please describe.
Attempting to export `Qwen/Qwen3.6-35B-A3B` (and other Qwen3.5/3.6 MoE variants) using `QEfficient.cloud.export` fails at the config loading stage because `qwen3_5_moe` is not a recognized architecture in the `transformers` version that QEfficient currently pins:

```
KeyError: 'qwen3_5_moe'
ValueError: The checkpoint you are trying to load has model type `qwen3_5_moe`
but Transformers does not recognize this architecture.
```
The `qwen3_5_moe` architecture was introduced in `transformers>=5.3.0`, but QEfficient's current dependency tree pins `transformers<5.x`, making it impossible to export or compile Qwen3.5/3.6 MoE models for AIC100 without breaking QEfficient's own internal imports.
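For illustration, the failure mode can be sketched with a stand-in for the registry lookup that `AutoConfig` performs internally. The mapping contents and the `resolve_config` helper below are assumptions for demonstration, not transformers' actual code:

```python
def resolve_config(model_type, config_mapping):
    """Mimic AutoConfig's registry lookup: an unknown model_type
    surfaces as the KeyError/ValueError pair shown above."""
    try:
        return config_mapping[model_type]
    except KeyError:
        raise ValueError(
            f"The checkpoint you are trying to load has model type `{model_type}` "
            "but Transformers does not recognize this architecture."
        ) from None

# A transformers<5.x install ships a mapping without the new key,
# so the new checkpoint's config cannot even be parsed:
PINNED_MAPPING = {"qwen2": "Qwen2Config", "qwen2_moe": "Qwen2MoeConfig"}
```

Under this sketch, `resolve_config("qwen3_5_moe", PINNED_MAPPING)` raises the `ValueError` quoted in the traceback, while `qwen2_moe` resolves normally.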
Describe the solution you'd like
- Add `qwen3_5_moe` / `Qwen3_5MoeForCausalLM` to QEfficient's supported model registry (`QEfficient/utils/model_registery.py`)
- Add the necessary AIC100-specific PyTorch transforms for the Qwen3.5 MoE attention and MoE routing layers in `QEfficient/transformers/models/`
- Bump the `transformers` dependency pin to `>=5.3.0` (while fixing the internal import breakages caused by symbols removed in `transformers>=5.4.0`, such as `AwqBackendPackingMethod`, `AWQLinearVersion`, `HybridCache`, and `Qwen2RMSNorm` from `qwen2_5_vl`)
- Add an ONNX export config for the `text-generation-with-past` task for this architecture
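The registry change in the first bullet might look something like the sketch below. The actual structure of `QEfficient/utils/model_registery.py` may differ; the dict layout and the `register_architecture` helper are assumptions for illustration only:

```python
# Hypothetical stand-in for QEfficient's supported-model registry,
# mapping a transformers model_type to its ForCausalLM class name.
SUPPORTED_ARCHITECTURES = {
    "qwen2_moe": "Qwen2MoeForCausalLM",
}

def register_architecture(model_type, causal_lm_class, registry=SUPPORTED_ARCHITECTURES):
    """Add a (model_type -> ForCausalLM class) entry; keep an existing entry if present."""
    registry.setdefault(model_type, causal_lm_class)
    return registry

# The requested addition:
register_architecture("qwen3_5_moe", "Qwen3_5MoeForCausalLM")
```

The AIC100-specific transforms and the ONNX export config would still need to be wired up separately, as listed above.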
Describe alternatives you've considered
- Manually patching the installed QEfficient venv files with `sed` to stub out missing imports — this is unsustainable, as each fix exposes another broken import down the chain
- Using `optimum-cli export onnx` directly — same blocker, as `optimum` also has a `transformers>=5.x` compatibility issue
- Downgrading to a supported Qwen variant (e.g., `Qwen2-57B-A14B`) — undesirable, as Qwen3.5/3.6 MoE offers significantly better performance per parameter
Additional context
- Model: `Qwen/Qwen3.6-35B-A3B` (35B total params, 3.6B active, `qwen3_5_moe` architecture)
- Target platform: AIC100 (ai-ABPI-130, Ubuntu 22.04)
- `transformers==5.3.0` works for config loading; `>=5.4.0` has a regression — see huggingface/transformers#45310 ("[BUG] transformers>=5.4.0, Qwen3.5 Moe from_pretrained error")
- The Qwen3.5/3.6 MoE family (`35B-A3B`, `30B-A3B`, `122B-A14B`) is increasingly popular for on-device and edge inference, making AIC100 support particularly valuable