
vulkan: fix min subgroup 16 condition for mmid subgroup optimization #15565

Merged
0cc4m merged 1 commit into master from 0cc4m/vulkan-mmid-subgroup-fix
Aug 25, 2025

Conversation

0cc4m (Contributor) commented on Aug 25, 2025

This fixes a bug in the selection condition for the mmid subgroup optimization introduced in #15524.

@MrLavender please give it a try, it should be working for you with this fix.

0cc4m requested a review from jeffbolznv on Aug 25, 2025
github-actions bot added the labels Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Aug 25, 2025
MrLavender commented

Yes that works, thank you! :)

This is a fantastic win: with flash attention, the ROCm backend now beats Vulkan only at small pp sizes, where the difference doesn't really matter, and ROCm performance degrades very quickly as pp size increases.

llama-bench -fa 0,1 -p 512,1024,2048,4096 -m gpt-oss-20b-mxfp4.gguf 

Vulkan

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp512 | 976.97 ± 8.17 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp1024 | 969.50 ± 1.57 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp2048 | 910.61 ± 1.93 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp4096 | 849.46 ± 2.31 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | tg128 | 125.82 ± 0.05 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp512 | 980.48 ± 4.63 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp1024 | 975.67 ± 7.09 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp2048 | 961.13 ± 3.07 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp4096 | 929.77 ± 3.94 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | tg128 | 123.70 ± 0.10 |

ROCm

| model | size | params | backend | ngl | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 1712.99 ± 14.24 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp1024 | 1652.80 ± 4.07 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp2048 | 1537.22 ± 4.35 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp4096 | 1376.64 ± 2.84 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 95.96 ± 0.19 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp512 | 1244.60 ± 3.71 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp1024 | 1078.54 ± 3.17 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp2048 | 884.51 ± 2.27 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp4096 | 640.66 ± 0.59 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | tg128 | 93.37 ± 0.25 |

0cc4m merged commit 4d917cd into master on Aug 25, 2025
48 checks passed
0cc4m deleted the 0cc4m/vulkan-mmid-subgroup-fix branch on August 31, 2025
rasbid pushed a commit to rasbid/llama.cpp that referenced this pull request Oct 11, 2025
- Make subgroup_min_size_16 condition less restrictive for GCN (subgroup_max_size >= 8)
- Add GCN-specific pipeline configurations with 64 subgroup sizes
- Enable more aggressive subgroup usage for GCN architecture
- Target: orders of magnitude performance improvement like PR ggml-org#15565
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026