Qwen2/3 MoE + GGUF model support (restored)#42854
Merged
SunMarc merged 1 commit into huggingface:main on Dec 17, 2025
Conversation
This commit restores Qwen2/3 MoE + GGUF support in Transformers v5. In this version, handling of MoE tensors is changed significantly, so support for all MoE + GGUF models (in practice, only the Qwen2/3 MoE models) that worked in Transformers v4 is now broken. This commit adopts the new tensor handling and extends `TensorProcessor` so it can handle not only tensor data but also tensor mappings. In the process, the Qwen2/3 MoE-specific hack is moved to `Qwen2MoeTensorProcessor`, making the main function look more model-agnostic. This is fully tested on the Qwen2 MoE model `Qwen1.5-MoE-A2.7B` and partially on the Qwen3 MoE model `Qwen3-30B-A3B-Thinking-2507` (due to memory constraints). Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
688c1bf to d82151b
SunMarc approved these changes on Dec 17, 2025
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request on Jan 23, 2026
This commit restores Qwen2/3 MoE + GGUF support in Transformers v5. In this version, handling of MoE tensors is changed significantly, so support for all MoE + GGUF models (in practice, only the Qwen2/3 MoE models) that worked in Transformers v4 is now broken. This commit adopts the new tensor handling and extends `TensorProcessor` so it can handle not only tensor data but also tensor mappings. In the process, the Qwen2/3 MoE-specific hack is moved to `Qwen2MoeTensorProcessor`, making the main function look more model-agnostic. This is fully tested on the Qwen2 MoE model `Qwen1.5-MoE-A2.7B` and partially on the Qwen3 MoE model `Qwen3-30B-A3B-Thinking-2507` (due to memory constraints). Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
JoursBleu added a commit to JoursBleu/transformers that referenced this pull request on Mar 8, 2026
- Add minimax-m2 GGUF config mapping (MoE fields: expert_count, expert_used_count)
- Add MiniMaxM2TensorProcessor with preprocess_name() and perform_fallback_tensor_mapping() for w1/w2/w3 -> gate/down/up expert tensor splitting (follows new TensorProcessor API from huggingface#42854)
- Add GGUFQwen2Converter for minimax_m2 tokenizer
- Add model_type (minimax_m2 <-> minimax-m2) and architecture mappings
- Add MiniMax-M2 to supported models in gguf.md
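The commit above adopts the `TensorProcessor` hooks introduced by this PR for MiniMax-M2's w1/w2/w3 expert tensors. As a rough, assumed illustration of that renaming (not the actual code from that branch):

```python
# Illustration only: an assumed shape of the w1/w2/w3 -> gate/down/up renaming
# described in the commit message above, not code taken from that branch.
_W_TO_PROJ = {"w1": "gate_proj", "w2": "down_proj", "w3": "up_proj"}


def rename_expert_projection(name: str) -> str:
    """Replace a trailing w1/w2/w3 segment with the HF projection name."""
    for w, proj in _W_TO_PROJ.items():
        if name.endswith(f".{w}.weight"):
            return name[: -len(f"{w}.weight")] + f"{proj}.weight"
    return name
```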
What does this PR do?
This commit restores Qwen2/3 MoE + GGUF support in Transformers v5.
In this version, handling of MoE tensors is changed significantly, so support for all MoE + GGUF models (in practice, only the Qwen2/3 MoE models supported in Transformers v4) is now broken in v5.
This commit adopts the new tensor handling and extends `TensorProcessor` so it can handle not only tensor data but also tensor mappings. In the process, the Qwen2/3 MoE-specific hack is moved to `Qwen2MoeTensorProcessor`, making the main function look more model-agnostic.
This is fully tested on the Qwen2 MoE model `Qwen1.5-MoE-A2.7B` (14.3B total parameters) and partially on the Qwen3 MoE model `Qwen3-30B-A3B-Thinking-2507` (due to memory constraints).
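As a rough illustration of the new division of labor, here is a minimal, hypothetical sketch of a processor that adjusts tensor mappings as well as tensor data. The hook names (`preprocess_name`, `perform_fallback_tensor_mapping`) follow what this PR and a follow-up commit describe, but the signatures, tensor-name patterns, and splitting logic below are assumptions, not the actual implementation.

```python
# Hypothetical sketch only: signatures and tensor-name patterns are assumed,
# not copied from the actual GGUF integration code.
import re

import numpy as np


class TensorProcessor:
    """Base processor: pass tensor names and data through unchanged."""

    def preprocess_name(self, name):
        # Hook to rename a GGUF tensor before the generic name mapping runs.
        return name

    def perform_fallback_tensor_mapping(self, name, tensor, parsed_parameters):
        # Hook for tensors the generic mapping cannot place.
        # Returning False means "not handled by this processor".
        return False


class Qwen2MoeTensorProcessor(TensorProcessor):
    """Sketch: split GGUF expert tensors stacked on the first axis."""

    _EXPERT_RE = re.compile(r"blk\.(\d+)\.ffn_(gate|up|down)_exps\.weight")
    _PROJ = {"gate": "gate_proj", "up": "up_proj", "down": "down_proj"}

    def perform_fallback_tensor_mapping(self, name, tensor, parsed_parameters):
        match = self._EXPERT_RE.fullmatch(name)
        if match is None:
            return False
        layer, proj = match.groups()
        # GGUF stores one stacked tensor per layer/projection; the Hugging Face
        # checkpoint expects one weight per expert module instead.
        for expert_idx, expert_weight in enumerate(tensor):
            hf_name = (
                f"model.layers.{layer}.mlp.experts.{expert_idx}."
                f"{self._PROJ[proj]}.weight"
            )
            parsed_parameters[hf_name] = np.asarray(expert_weight)
        return True
```

With the model-specific splitting behind these hooks, the main GGUF loading loop only needs to call `preprocess_name()` and fall back to `perform_fallback_tensor_mapping()` for anything its generic mapping cannot place, without Qwen-specific branches.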
Future Possibilities
Portions of this change are written to be model-agnostic and easily replaceable.
If we decide to add GGUF support for more MoE models, it would be better to have either a mix-in or a utility. In that case, part of `Qwen2MoeTensorProcessor` can be copied into it with small modifications.
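One possible shape for that mix-in, sketched here purely as an illustration (the class and method names are made up, not part of this PR):

```python
# Illustrative sketch of the mix-in idea above; names are hypothetical.
class StackedExpertSplitMixin:
    """Reusable splitting of a GGUF tensor whose first axis stacks all experts."""

    def split_experts(self, stacked_tensor, hf_name_template, parsed_parameters):
        # hf_name_template should contain an "{expert}" placeholder, e.g.
        # "model.layers.0.mlp.experts.{expert}.up_proj.weight".
        for expert_idx, expert_weight in enumerate(stacked_tensor):
            parsed_parameters[hf_name_template.format(expert=expert_idx)] = expert_weight
```

A future MoE processor would then only supply the GGUF-to-HF name translation and delegate the splitting to the mix-in.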
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@Cyrilvallez @SunMarc @MekkCyber