Add mistral3 architecture support for Ministral 3B #440
Open
insecure-erasure wants to merge 1 commit into city96:main
- Add mistral3 to TXT_ARCH_LIST
- Extend gguf_clip_loader branch to handle mistral3 alongside llama
- Fix hardcoded temb shape (131072, 5120) to also accept (131072, 3072) for Ministral 3B hidden_size 3072
- Apply llama_permute and tekken tokenizer reconstruction to mistral3

Fixes city96#439
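For readers unfamiliar with the loader, the first two bullets amount to widening an allow-list of text-encoder architectures and routing the new entry through the existing llama path. The following is a minimal sketch of that idea, not the repository's actual code. `TXT_ARCH_LIST` and the error text come from this PR; `check_text_arch`, `LLAMA_LIKE_ARCHS`, and the list entries other than `llama` and `mistral3` are hypothetical stand-ins.

```python
# Illustrative sketch only; the real loader in the repository may differ.
# TXT_ARCH_LIST is named in this PR. Entries other than "llama" and
# "mistral3" are placeholders (assumption).
TXT_ARCH_LIST = {"t5", "t5encoder", "llama", "mistral3"}

# Architectures assumed to share the llama code path (hypothetical helper set).
LLAMA_LIKE_ARCHS = {"llama", "mistral3"}


def check_text_arch(arch: str) -> bool:
    """Validate the architecture string; return True if it takes the llama path."""
    if arch not in TXT_ARCH_LIST:
        # Same message format as the error quoted in the PR description.
        raise ValueError(f"Unexpected text model architecture type: {arch!r}")
    return arch in LLAMA_LIKE_ARCHS


print(check_text_arch("mistral3"))  # True
```

Keeping the llama-like architectures in one set means a further llama-derived architecture would need only one extra entry, assuming it really shares the same tensor layout.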
This is a duplicate of #436. If it's about code factoring, wouldn't it be better to discuss it with the author of that PR? For me, the other PR already works fine.
Author
Yes, I agree. I didn't notice that PR; I only checked the issue and started working on this one right away. I don't have strong opinions about coding style in a project that doesn't belong to me, so it seems fair to let the original author of that PR know about this duplicate.
Fixes #439
Adds support for the `mistral3` architecture (official K-quant GGUFs from `mistralai/Ministral-3-3B-Instruct-2512-GGUF`). Without this, loading them raises `ValueError: Unexpected text model architecture type: 'mistral3'`.

Also fixes the hardcoded shape `(131072, 5120)` in `gguf_tekken_tokenizer_loader` to also accept `(131072, 3072)` for Ministral 3B. This one affects any GGUF of this model regardless of origin.
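The shape fix can be pictured as replacing a single hardcoded comparison with a membership test over the known embedding shapes. This is a hedged sketch under that assumption, not the actual body of `gguf_tekken_tokenizer_loader`. The two shapes are taken from this PR; `KNOWN_TEMB_SHAPES` and `is_tekken_embedding` are hypothetical names.

```python
# Illustrative sketch only; not the repository's gguf_tekken_tokenizer_loader.
# Both shapes come from this PR; the names below are hypothetical.
KNOWN_TEMB_SHAPES = {
    (131072, 5120),  # previously hardcoded shape
    (131072, 3072),  # Ministral 3B, hidden_size 3072
}


def is_tekken_embedding(shape) -> bool:
    """Return True if the token-embedding shape matches one of the known shapes."""
    # tuple() lets callers pass a list or another sequence of ints.
    return tuple(shape) in KNOWN_TEMB_SHAPES


print(is_tekken_embedding((131072, 3072)))  # True
print(is_tekken_embedding([32000, 4096]))   # False
```

A set of accepted shapes keeps the check explicit; supporting another hidden size would then be a one-line addition rather than another hardcoded comparison.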
Tested with ERNIE-Image Turbo (NVFP4) + official Q5_K_M on 8 GB VRAM.