
Qwen2/3 MoE + GGUF model support (restored)#42854

Merged
SunMarc merged 1 commit into huggingface:main from a4lg:gguf-support-v5-qwen23-moe
Dec 17, 2025

Conversation

@a4lg
Contributor

@a4lg a4lg commented Dec 13, 2025

What does this PR do?

This commit restores Qwen2/3 MoE + GGUF support in Transformers v5.

In v5, handling of MoE tensors changed significantly, so support for all MoE + GGUF models (in practice, only Qwen2/3 MoE models) that worked in Transformers v4 is now broken.

This commit adopts the new tensor handling and extends `TensorProcessor` with the ability to handle not only tensor data but also tensor mappings.
In the process, the Qwen2/3 MoE-specific hack is moved into `Qwen2MoeTensorProcessor`, making the main function look more model-agnostic.
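The split between a model-agnostic base class and a model-specific subclass can be pictured with a minimal sketch. Nothing below is the actual Transformers code: the class name `TensorProcessor` and the hook names `preprocess_name` / `perform_fallback_tensor_mapping` come from this PR and the follow-up MiniMax commit, but every signature and the example name rewrite are assumptions.

```python
# Illustrative sketch only. `TensorProcessor`, `preprocess_name`, and
# `perform_fallback_tensor_mapping` are names from this PR; the
# signatures and default behavior below are assumptions.

class TensorProcessor:
    """Model-agnostic default: leave names and mappings untouched."""

    def preprocess_name(self, name: str) -> str:
        return name

    def perform_fallback_tensor_mapping(self, name: str, tensor):
        # By default, a tensor maps one-to-one onto its own name.
        return {name: tensor}


class Qwen2MoeTensorProcessor(TensorProcessor):
    """Isolates the Qwen2/3 MoE-specific hack from the main loader."""

    def preprocess_name(self, name: str) -> str:
        # Hypothetical rewrite of a fused GGUF expert-tensor name
        # into an HF-style per-expert prefix.
        return name.replace("ffn_gate_exps", "mlp.experts.gate_proj")


processor = Qwen2MoeTensorProcessor()
print(processor.preprocess_name("blk.0.ffn_gate_exps.weight"))
# -> blk.0.mlp.experts.gate_proj.weight
```

With this shape, the main GGUF loading loop only ever calls the base-class hooks, and each model contributes its quirks through a subclass.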

This is fully tested on the Qwen2 MoE model `Qwen1.5-MoE-A2.7B` (14.3B total parameters) and partially on the Qwen3 MoE model `Qwen3-30B-A3B-Thinking-2507` (due to memory constraints).

Future Possibilities

Portions of this change are written to be model-agnostic and easily replaceable.
If we decide to add GGUF support for more MoE models, it would be better to have either a mix-in or a utility; part of `Qwen2MoeTensorProcessor` could be copied into it with small modifications.
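A hedged sketch of what such a mix-in could look like. Every class, method, and GGUF suffix below is invented for illustration; only the idea of sharing the `Qwen2MoeTensorProcessor` splitting logic across models comes from the text above.

```python
# Hypothetical mix-in: nothing here exists in transformers.
import numpy as np

class MoeExpertSplitMixin:
    """Shared utility: split a fused (n_experts, out, in) GGUF tensor
    into per-expert tensors keyed by HF-style names."""

    # Each model's processor supplies its own suffix table, mapping a
    # fused GGUF tensor-name suffix to the per-expert projection name.
    EXPERT_SUFFIX_MAP: dict = {}

    def split_experts(self, name, fused):
        for gguf_suffix, hf_name in self.EXPERT_SUFFIX_MAP.items():
            if name.endswith(gguf_suffix):
                prefix = name[: -len(gguf_suffix)]
                return {
                    f"{prefix}experts.{i}.{hf_name}": expert
                    for i, expert in enumerate(fused)
                }
        return {name: fused}  # not a fused expert tensor; pass through


class Qwen2MoeProcessor(MoeExpertSplitMixin):
    EXPERT_SUFFIX_MAP = {
        "ffn_gate_exps": "gate_proj",
        "ffn_up_exps": "up_proj",
        "ffn_down_exps": "down_proj",
    }


fused = np.zeros((4, 8, 16))  # 4 experts, each (8, 16)
split = Qwen2MoeProcessor().split_experts("blk.0.ffn_gate_exps", fused)
print(len(split), sorted(split)[0])
# -> 4 blk.0.experts.0.gate_proj
```

A new MoE model would then only need to declare its own suffix table rather than re-implement the splitting loop.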

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Cyrilvallez @SunMarc @MekkCyber

This commit restores Qwen2/3 MoE + GGUF support in Transformers v5.

In v5, handling of MoE tensors changed significantly, so support for
all MoE + GGUF models (in practice, only Qwen2/3 MoE models) that
worked in Transformers v4 is now broken.

This commit adopts the new tensor handling and extends
`TensorProcessor` with the ability to handle not only tensor data
but also tensor mappings.  In the process, the Qwen2/3 MoE-specific
hack is moved into `Qwen2MoeTensorProcessor`, making the main function
look more model-agnostic.

This is fully tested on Qwen2 MoE `Qwen1.5-MoE-A2.7B` and partially on
Qwen3 MoE `Qwen3-30B-A3B-Thinking-2507` (due to memory constraints).

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
@a4lg force-pushed the gguf-support-v5-qwen23-moe branch from 688c1bf to d82151b on December 13, 2025 at 00:05
Member

@SunMarc SunMarc left a comment


Thanks for fixing this! The CI is a bit red due to v5 and the various changes related to the modeling and the tokenizer. We will make sure to fix those before the release!

@SunMarc SunMarc enabled auto-merge (squash) December 17, 2025 13:26
@SunMarc SunMarc merged commit c67ec2c into huggingface:main Dec 17, 2025
25 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
JoursBleu added a commit to JoursBleu/transformers that referenced this pull request Mar 8, 2026
- Add minimax-m2 GGUF config mapping (MoE fields: expert_count, expert_used_count)
- Add MiniMaxM2TensorProcessor with preprocess_name() and
  perform_fallback_tensor_mapping() for w1/w2/w3 -> gate/down/up expert
  tensor splitting (follows new TensorProcessor API from huggingface#42854)
- Add GGUFQwen2Converter for minimax_m2 tokenizer
- Add model_type (minimax_m2 <-> minimax-m2) and architecture mappings
- Add MiniMax-M2 to supported models in gguf.md
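The w1/w2/w3 -> gate/down/up expert-tensor splitting the commit above describes can be pictured with a small numpy sketch. The w-index convention (w1 = gate, w2 = down, w3 = up, as in llama.cpp's FFN naming), the tensor shapes, and the function itself are assumptions for illustration; the real logic lives in the `MiniMaxM2TensorProcessor` the commit adds.

```python
import numpy as np

# Assumed convention: w1 = gate_proj, w2 = down_proj, w3 = up_proj.
W_TO_PROJ = {"w1": "gate_proj", "w2": "down_proj", "w3": "up_proj"}

def split_expert_tensor(name, fused):
    """Split a fused (n_experts, out, in) tensor named e.g. 'blk.3.w1'
    into per-expert HF-style entries (hypothetical naming)."""
    prefix, _, w = name.rpartition(".")
    proj = W_TO_PROJ[w]
    return {f"{prefix}.experts.{i}.{proj}": t for i, t in enumerate(fused)}

fused = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # 2 experts, each (3, 4)
split = split_expert_tensor("blk.3.w1", fused)
print(list(split))
# -> ['blk.3.experts.0.gate_proj', 'blk.3.experts.1.gate_proj']
print(split["blk.3.experts.0.gate_proj"].shape)
# -> (3, 4)
```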


3 participants