
model : refactor QKV into common build_qkv and create_tensor_qkv helpers#21245

Merged
CISC merged 2 commits into ggml-org:master from JoursBleu:refactor/build-qkv-helper on Apr 16, 2026

Conversation

@JoursBleu
Contributor

@JoursBleu JoursBleu commented Apr 1, 2026

Overview

llama.cpp currently contains 112 model files in src/models/.

We modified the 85 applicable model files. The changes abstract the duplicated Q/K/V tensor loading and graph-building code into two reusable helpers, following the create_tensor_gate_up_exps pattern (#19139).

create_tensor_qkv (llama-model.cpp): tries the fused wqkv/bqkv tensors first (TENSOR_NOT_REQUIRED | TENSOR_SKIP_IF_VIRTUAL), then falls back to separate wq/wk/wv. Also supports adding biases.

build_qkv (llama-graph.h/.cpp): returns {Qcur, Kcur, Vcur} as 3D tensors. Fused case: a single fused QKV matmul followed by a ggml_view_3d split. Separate case: three separate matmuls followed by ggml_reshape_3d.

Test: test-llama-archs — all OK, 0 FAIL. Zero diff on llama-arch.cpp.

The remaining 27 models are not modified for the following reasons:

| Reason | Count | Models |
| --- | --- | --- |
| Non-attention (SSM/linear/RNN) | 10 | mamba, mamba-base, rwkv6, rwkv6-base, rwkv6qwen2, rwkv7, rwkv7-base, arwkv7, delta-net-base, wavtokenizer-dec |
| MLA attention | 4 | deepseek2, minicpm3, minimax-m2, plm |
| Graph directly uses layer.wqkv (non-standard layout) | 3 | cogvlm, openelm, plamo2 |
| Q+gate joint projection | 4 | qwen35, qwen35moe, qwen3next, plamo3 |
| n_embd_head_k != n_embd_head_v | 2 | step35-iswa, mimo2-iswa |
| No fused wqkv_enc | 1 | t5-enc |
| Other special architectures | 3 | olmo2, olmoe, kimi-linear |

Additional information

Based on the discussion in #20628 (@am17an, @ngxson), the plan is:

  1. This PR: does not modify any logic; it simply extracts the redundant code into the two helpers above and adds handling for the fused QKV case.
  2. Future PR: add --fuse-qkv to convert_hf_to_gguf.py.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - used as a translation tool for translating the PR description

@github-actions github-actions Bot added the model Model specific label Apr 1, 2026
@JoursBleu JoursBleu force-pushed the refactor/build-qkv-helper branch 3 times, most recently from da129d5 to 26e72e0 Compare April 1, 2026 04:18
@JoursBleu JoursBleu marked this pull request as ready for review April 1, 2026 06:01
@JoursBleu JoursBleu requested a review from CISC as a code owner April 1, 2026 06:01
@JoursBleu JoursBleu force-pushed the refactor/build-qkv-helper branch from 26e72e0 to bcc69fd Compare April 1, 2026 09:05
@JoursBleu
Contributor Author

hi @CISC,

  • Removed the has_bias flag.
  • Bias tensors are now always created with TENSOR_NOT_REQUIRED.
  • Fixed the incomplete conversions and typos mentioned above.

@JoursBleu
Contributor Author

@CISC Done:

  • Removed unnecessary comments and restored the comments that should be retained.
  • Restored the manually created bias tensors for JAIS2.

@JoursBleu
Contributor Author

@CISC Done:

  • Removed the Vcur reshape in afmoe.cpp.

Member

@CISC CISC left a comment


OP is inaccurate, there's nothing special about these:

  • nemotron-h: just add build_qkv in llm_build_nemotron_h::build_attention_layer
  • granite-hybrid: just add build_qkv in llm_build_granite_hybrid::build_attention_layer
  • olmo/mpt/dbrx: use build_qkv, add clamping
  • gemma3n-iswa: just do build_qkv
  • t5-dec/t5-enc: do build_qkv on normal self-attention
  • bert: use build_qkv
  • lfm2: do build_qkv in build_attn_block

@JoursBleu JoursBleu marked this pull request as draft April 2, 2026 12:56
@CISC
Member

CISC commented Apr 4, 2026

I meant move the clamping to build_qkv.

@JoursBleu JoursBleu force-pushed the refactor/build-qkv-helper branch from 09d8066 to 04506d4 Compare April 6, 2026 01:27
@JoursBleu JoursBleu force-pushed the refactor/build-qkv-helper branch 2 times, most recently from 050b5a9 to 623ed29 Compare April 9, 2026 01:17
@JoursBleu JoursBleu marked this pull request as ready for review April 9, 2026 01:17
@JoursBleu
Contributor Author

@CISC Done:

  • Extended build_qkv to bert, mpt, dbrx, olmo, lfm2, nemotron-h, granite-hybrid, gemma3n-iswa, t5-dec, and t5-enc.
  • Clamping is now handled internally in build_qkv using hparams.f_clamp_kqv.

@JoursBleu JoursBleu force-pushed the refactor/build-qkv-helper branch from 623ed29 to ccd1f60 Compare April 10, 2026 13:39
@JoursBleu JoursBleu force-pushed the refactor/build-qkv-helper branch from ccd1f60 to 67a8492 Compare April 11, 2026 05:29
@CISC CISC requested review from ggerganov and ngxson April 11, 2026 09:36
@JoursBleu JoursBleu force-pushed the refactor/build-qkv-helper branch from 67a8492 to d8bf733 Compare April 12, 2026 07:18
@JoursBleu
Contributor Author

JoursBleu commented Apr 13, 2026

@ngxson @am17an @ggerganov This PR is ready. Could you take a look when you have time?

Contributor

@am17an am17an left a comment


Good job!

@JoursBleu JoursBleu force-pushed the refactor/build-qkv-helper branch from d8bf733 to 51dbd8c Compare April 16, 2026 09:05
@ggerganov ggerganov added the merge ready A maintainer can use this label to indicate that they consider the changes final and ready to merge. label Apr 16, 2026
@CISC CISC merged commit 9db77a0 into ggml-org:master Apr 16, 2026
48 of 50 checks passed
cnsiva pushed a commit to saas-home/llama.cpp that referenced this pull request Apr 17, 2026
…ers (ggml-org#21245)

* model : refactor QKV into common build_qkv and create_tensor_qkv helpers

* model : extend build_qkv to bert/mpt/dbrx/olmo/lfm2/nemotron-h/granite-hybrid/gemma3n-iswa/t5-dec and fix wqkv_s

Labels

merge ready A maintainer can use this label to indicate that they consider the changes final and ready to merge. model Model specific

4 participants