
vulkan: fix min subgroup 16 condition for mmid subgroup optimization #15565

Merged
0cc4m merged 1 commit into master from 0cc4m/vulkan-mmid-subgroup-fix
Aug 25, 2025

Conversation

0cc4m (Contributor) commented on Aug 25, 2025

This fixes a bug in the selection condition for the mmid subgroup optimization introduced in #15524.

@MrLavender please give it a try, it should be working for you with this fix.

0cc4m requested a review from jeffbolznv on Aug 25, 2025
github-actions bot added the labels Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Aug 25, 2025
MrLavender commented

Yes that works, thank you! :)

This is a fantastic win: with flash attention, the ROCm backend now beats Vulkan only at small pp sizes, where the difference doesn't really matter, and ROCm performance degrades very quickly as pp size increases.

llama-bench -fa 0,1 -p 512,1024,2048,4096 -m gpt-oss-20b-mxfp4.gguf 

Vulkan

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp512 | 976.97 ± 8.17 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp1024 | 969.50 ± 1.57 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp2048 | 910.61 ± 1.93 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp4096 | 849.46 ± 2.31 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | tg128 | 125.82 ± 0.05 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp512 | 980.48 ± 4.63 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp1024 | 975.67 ± 7.09 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp2048 | 961.13 ± 3.07 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp4096 | 929.77 ± 3.94 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | tg128 | 123.70 ± 0.10 |

ROCm

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 1712.99 ± 14.24 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp1024 | 1652.80 ± 4.07 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp2048 | 1537.22 ± 4.35 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp4096 | 1376.64 ± 2.84 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 95.96 ± 0.19 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp512 | 1244.60 ± 3.71 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp1024 | 1078.54 ± 3.17 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp2048 | 884.51 ± 2.27 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp4096 | 640.66 ± 0.59 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | tg128 | 93.37 ± 0.25 |

0cc4m merged commit 4d917cd into master on Aug 25, 2025
48 checks passed
0cc4m deleted the 0cc4m/vulkan-mmid-subgroup-fix branch on August 31, 2025
rasbid pushed a commit to rasbid/llama.cpp that referenced this pull request Oct 11, 2025
- Make subgroup_min_size_16 condition less restrictive for GCN (subgroup_max_size >= 8)
- Add GCN-specific pipeline configurations with 64 subgroup sizes
- Enable more aggressive subgroup usage for GCN architecture
- Target: orders of magnitude performance improvement like PR ggml-org#15565
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026