llama-quant : fix the verification of attention layers for encoder-decoder models by DamonFool · Pull Request #16023 · ggml-org/llama.cpp

DamonFool · 2025-09-16T10:54:37Z

llama.cpp fails to quantize T5 models whose encoder and decoder have different numbers of blocks.
The failure comes from the attention-layer sanity check in src/llama-quant.cpp, which asserts:

GGML_ASSERT((qs.n_attention_wv == n_attn_layer - pruned_attention_w) && "n_attention_wv is unexpected");

The original check assumed that the encoder and decoder have the same number of blocks.
For models where the two block counts differ, the expected value of n_attention_wv is therefore wrong and the assertion fires.
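The arithmetic behind the check can be illustrated with a small standalone sketch. It assumes a T5-style layout in which each encoder block has one self-attention and each decoder block has one self-attention plus one cross-attention, so equal block counts n would give 3n attention V projections. The helper names are hypothetical, not llama.cpp functions, and this is not the code changed in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not llama.cpp functions.
// Assumption (T5-style layout): each encoder block has one self-attention, and
// each decoder block has one self-attention plus one cross-attention, so the
// blocks contribute one or two attention V projections respectively.

// Expectation derived from a single block count: implicitly assumes the
// decoder has as many blocks as the encoder (n + 2n = 3n).
int32_t expected_wv_equal_blocks(int32_t n_layer) {
    assert(n_layer >= 0);
    return 3 * n_layer;
}

// Expectation that counts encoder and decoder blocks separately.
int32_t expected_wv(int32_t n_enc, int32_t n_dec) {
    assert(n_enc >= 0 && n_dec >= 0);
    return n_enc + 2 * n_dec;
}

// Mirrors the shape of the sanity check: the number of V projections found
// must equal the expected count minus any pruned attention layers.
bool wv_check_passes(int32_t n_attention_wv, int32_t expected, int32_t pruned) {
    return n_attention_wv == expected - pruned;
}
```

For an illustrative model with a 12-block encoder and a 4-block decoder, a tensor scan would find 12 + 2 × 4 = 20 V projections. `wv_check_passes(20, expected_wv_equal_blocks(12), 0)` compares 20 against 36 and fails, while `wv_check_passes(20, expected_wv(12, 4), 0)` passes. When the two block counts are equal, both expectations agree.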

Testing: flan-t5-small, t5-small, and a T5 model with unequal encoder and decoder block counts.

DamonFool · 2025-09-17T01:35:02Z

Hi @CISC, there is another encoder-decoder PR here: #16002.

The simple example is a very good starting point for people integrating llama.cpp into their apps.
It would be helpful to also support encoder-decoder models in that example.
I hope you are fine with it.
Thanks.

DamonFool · 2025-09-17T07:59:38Z

Thanks @CISC.

llama-quant : fix the verification of attention layers for encoder-de…

22ccb6a

…coder models Signed-off-by: Jie Fu <jiefu@tencent.com>

CISC reviewed Sep 16, 2025

Comment thread src/llama-quant.cpp

CISC approved these changes Sep 17, 2025

CISC merged commit 745cbcf into ggml-org:master Sep 17, 2025
47 of 48 checks passed

DamonFool deleted the llama-quant-t5 branch September 17, 2025 07:59

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

llama-quant : fix the verification of attention layers for encoder-de…

8e10bbb

…coder models (#16023) Signed-off-by: Jie Fu <jiefu@tencent.com>

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

llama-quant : fix the verification of attention layers for encoder-de…

355eefe

…coder models (ggml-org#16023) Signed-off-by: Jie Fu <jiefu@tencent.com>

CISC merged 1 commit into ggml-org:master from DamonFool:llama-quant-t5