
model: add llama 4 scaling for mistral-large (deepseek arch) #17744

Merged

ngxson merged 1 commit into ggml-org:master from ngxson:xsn/mistral_large_scaling on Dec 7, 2025
Conversation

@ngxson (Contributor) commented Dec 3, 2025

Continuation of #17730.

This should allow Mistral Large to go past the 16K context length (hopefully someone with enough VRAM can verify whether this works).
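For context, Llama 4 applies a position-dependent attention temperature scale so that attention logits don't flatten out at long context lengths; this PR reuses that mechanism for Mistral Large's DeepSeek-style architecture. Below is a minimal sketch of that scaling formula. The default `floor_scale` and `attn_scale` values are Llama 4's published hyperparameters and are assumptions here; the values actually used for Mistral Large come from the model's metadata and may differ.

```python
import math

def llama4_attn_scale(pos: int, floor_scale: float = 8192.0, attn_scale: float = 0.1) -> float:
    """Per-position attention temperature scale, Llama 4 style (sketch).

    pos: absolute token position.
    floor_scale, attn_scale: model hyperparameters; the defaults here
    are Llama 4's values, not necessarily Mistral Large's.
    """
    # The scale is 1.0 for positions below floor_scale and then grows
    # logarithmically, sharpening attention at long context lengths.
    return math.log(math.floor((pos + 1) / floor_scale) + 1.0) * attn_scale + 1.0
```

With these assumed defaults, positions under 8192 are unscaled (factor 1.0), and the factor increases slowly past that, which is consistent with the PR's goal of making generation coherent beyond 16K tokens.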

github-actions bot added the model (Model specific) label on Dec 3, 2025
@DocShotgun (Contributor) commented:
I cherry-picked this commit onto the current latest master 08f9d3c and loaded a 4.94bpw quant of Mistral Large 3 675B Instruct with 32k sequence length, and it does produce coherent text both on a short prompt of 1.5k tokens and a longer prompt of around 19k tokens.

@ngxson (Contributor, Author) commented Dec 7, 2025

@DocShotgun thanks for testing. I guess it's good to merge then.

@ngxson ngxson merged commit 4d37262 into ggml-org:master Dec 7, 2025
76 of 80 checks passed
0Marble pushed a commit to 0Marble/llama.cpp that referenced this pull request Dec 18, 2025
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

Labels

model Model specific

3 participants