Add mistral3 architecture support for Ministral 3B #440

Open
insecure-erasure wants to merge 1 commit into city96:main from insecure-erasure:mistral3-support

@insecure-erasure

Fixes #439

Adds support for the mistral3 architecture (official K-quant GGUFs from mistralai/Ministral-3-3B-Instruct-2512-GGUF). Without this, loading them raises ValueError: Unexpected text model architecture type: 'mistral3'.
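
For readers unfamiliar with this kind of check, here is a minimal sketch of an architecture allow-list that would produce the error quoted above. The name TXT_ARCH_LIST comes from the commit message; the set contents and the helper check_text_arch are assumptions made for illustration, not the repository's actual code.

```python
# Hypothetical allow-list: the real TXT_ARCH_LIST may hold other entries.
# The PR adds "mistral3" alongside existing architectures such as "llama".
TXT_ARCH_LIST = {"llama", "mistral3"}


def check_text_arch(arch: str) -> str:
    """Return arch unchanged if it is supported, otherwise raise ValueError."""
    if arch not in TXT_ARCH_LIST:
        # {arch!r} renders the name in quotes, matching the reported message.
        raise ValueError(f"Unexpected text model architecture type: {arch!r}")
    return arch
```

With "mistral3" missing from the set, check_text_arch("mistral3") would raise the ValueError from the description; with it present, the call simply returns the name.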

Also fixes the hardcoded shape (131072, 5120) in gguf_tekken_tokenizer_loader so that it also accepts (131072, 3072) for Ministral 3B. This issue affects any GGUF of this model, regardless of origin.
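
The shape fix can be pictured as replacing a single hardcoded comparison with a membership test. This is only a sketch: the two shapes are taken from the description, while TEKKEN_EMBED_SHAPES and is_tekken_embedding are hypothetical names, and the real check inside gguf_tekken_tokenizer_loader may be structured differently.

```python
# Shapes quoted in the PR description. Per the commit message, the second
# entry of each tuple is the model's hidden_size.
TEKKEN_EMBED_SHAPES = {
    (131072, 5120),  # previously the only accepted shape
    (131072, 3072),  # Ministral 3B (hidden_size 3072)
}


def is_tekken_embedding(shape) -> bool:
    """Return True if shape matches one of the known embedding shapes."""
    # tuple() lets callers pass lists or other shape-like sequences.
    return tuple(shape) in TEKKEN_EMBED_SHAPES
```

Keeping the accepted shapes in one set means a future model size would need one new entry rather than another branch.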

Tested with ERNIE-Image Turbo (NVFP4) + official Q5_K_M on 8 GB VRAM.

@insecure-erasure force-pushed the mistral3-support branch 6 times, most recently from c3e0046 to f0b733f (April 18, 2026 07:28)
- Add mistral3 to TXT_ARCH_LIST
- Extend gguf_clip_loader branch to handle mistral3 alongside llama
- Fix hardcoded temb shape (131072, 5120) to also accept (131072, 3072)
  for Ministral 3B hidden_size 3072
- Apply llama_permute and tekken tokenizer reconstruction to mistral3

Fixes city96#439
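
As background for the llama_permute item in the commit message: the sketch below assumes that the GGUF conversion interleaves the two halves of each attention head's rows in the Q/K projection weights, and that the loader has to undo this. It works on plain row lists rather than tensors, and permute_rows and unpermute_rows are hypothetical names, not the functions used in this PR.

```python
def permute_rows(rows, n_head):
    """Interleave the two halves of each head's rows (assumed conversion step)."""
    per_head = len(rows) // n_head
    half = per_head // 2
    out = []
    for h in range(n_head):
        head = rows[h * per_head:(h + 1) * per_head]
        for i in range(half):
            out.append(head[i])         # row from the first half
            out.append(head[half + i])  # matching row from the second half
    return out


def unpermute_rows(rows, n_head):
    """Undo permute_rows: split each head back into first and second halves."""
    per_head = len(rows) // n_head
    out = []
    for h in range(n_head):
        head = rows[h * per_head:(h + 1) * per_head]
        out.extend(head[0::2])  # even positions came from the first half
        out.extend(head[1::2])  # odd positions came from the second half
    return out
```

Under these assumptions, unpermute_rows(permute_rows(rows, n_head), n_head) returns the original row order, which is the property a loader-side reverse permutation needs.
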
@muljanis45

This is a duplicate of #436

If this is about code factoring, wouldn't it be better to discuss it with the author of that PR? For me, the other PR already works fine.

@insecure-erasure
Author

Yes, I agree. I didn't notice that PR; I only checked the issue and started working on this one right away. I don't have strong opinions about coding style in a project that doesn't belong to me, so it seems fair to let the original author of that PR know about this duplicate.

Development

Successfully merging this pull request may close these issues.

Support mistral3 architecture
