
vulkan: use graphics queue on AMD#20551

Merged
0cc4m merged 2 commits into master from 0cc4m/vulkan-amd-queue
Mar 15, 2026

Conversation

@0cc4m
Contributor

@0cc4m 0cc4m commented Mar 14, 2026

I'm not sure why, but the graphics queue is slightly faster for tg on AMD than the compute queue, and this also fixes the partial offload issue I addressed in #19976, so the second queue no longer has to be enabled by default. I got the idea from @zedbytes reporting that tg goes up when running with RADV_DEBUG=nocompute.

AMD RX 9070 XT
| model | size | params | ngl | fa | test | t/s (before) | t/s (after) | diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 20 | 1 | pp512 | 2288.04 ± 2.42 | 2225.76 ± 2.31 | -2.7% |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 20 | 1 | tg128 | 24.33 ± 0.04 | 24.58 ± 0.05 | +1.0% |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 99 | 1 | pp512 | 4886.26 ± 105.08 | 4901.77 ± 102.66 | +0.3% |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 99 | 1 | tg128 | 115.78 ± 0.02 | 121.39 ± 0.02 | +4.8% |
| qwen3moe 30B.A3B Q2_K - Medium | 10.48 GiB | 30.53 B | 20 | 1 | pp512 | 736.21 ± 9.37 | 735.19 ± 7.51 | -0.1% |
| qwen3moe 30B.A3B Q2_K - Medium | 10.48 GiB | 30.53 B | 20 | 1 | tg128 | 39.53 ± 0.10 | 40.36 ± 0.21 | +2.1% |
| qwen3moe 30B.A3B Q2_K - Medium | 10.48 GiB | 30.53 B | 99 | 1 | pp512 | 3383.58 ± 29.26 | 3425.38 ± 28.68 | +1.2% |
| qwen3moe 30B.A3B Q2_K - Medium | 10.48 GiB | 30.53 B | 99 | 1 | tg128 | 200.45 ± 1.89 | 220.41 ± 1.46 | +10.0% |
AMD Radeon Pro VII
| model | size | params | ngl | fa | test | t/s (before) | t/s (after) | diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 20 | 1 | pp512 | 636.62 ± 9.07 | 615.62 ± 0.79 | -3.3% |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 20 | 1 | tg128 | 38.35 ± 0.09 | 38.20 ± 0.01 | -0.4% |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 99 | 1 | pp512 | 830.30 ± 1.51 | 834.44 ± 1.05 | +0.5% |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | 99 | 1 | tg128 | 102.45 ± 0.64 | 100.28 ± 0.24 | -2.1% |
| qwen3moe 30B.A3B Q2_K - Medium | 10.48 GiB | 30.53 B | 20 | 1 | pp512 | 289.76 ± 3.59 | 287.75 ± 3.10 | -0.7% |
| qwen3moe 30B.A3B Q2_K - Medium | 10.48 GiB | 30.53 B | 20 | 1 | tg128 | 34.57 ± 0.32 | 34.05 ± 1.20 | -1.5% |
| qwen3moe 30B.A3B Q2_K - Medium | 10.48 GiB | 30.53 B | 99 | 1 | pp512 | 749.65 ± 5.42 | 762.52 ± 5.89 | +1.7% |
| qwen3moe 30B.A3B Q2_K - Medium | 10.48 GiB | 30.53 B | 99 | 1 | tg128 | 94.70 ± 0.46 | 97.55 ± 0.20 | +3.0% |
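For readers unfamiliar with Vulkan queue selection, the change described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual llama.cpp code: the function name and structure are invented, and the flag constants merely mirror the values of Vulkan's `VK_QUEUE_GRAPHICS_BIT` (0x1) and `VK_QUEUE_COMPUTE_BIT` (0x2) so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values mirror VK_QUEUE_GRAPHICS_BIT and VK_QUEUE_COMPUTE_BIT.
constexpr uint32_t QUEUE_GRAPHICS_BIT = 0x1;
constexpr uint32_t QUEUE_COMPUTE_BIT  = 0x2;

// Hypothetical helper: pick a queue family for compute work.
// `flags[i]` holds the capability bits of family i, as a Vulkan backend
// would obtain from vkGetPhysicalDeviceQueueFamilyProperties.
// With prefer_graphics = true (this PR's behavior on AMD), a family that
// supports both graphics and compute wins over a compute-only (async
// compute) family; with prefer_graphics = false it is the other way round.
// Returns UINT32_MAX if no family supports compute at all.
uint32_t pick_compute_family(const std::vector<uint32_t> & flags, bool prefer_graphics) {
    uint32_t fallback = UINT32_MAX;
    for (uint32_t i = 0; i < flags.size(); i++) {
        if (!(flags[i] & QUEUE_COMPUTE_BIT)) {
            continue; // family cannot run compute shaders
        }
        const bool has_graphics = (flags[i] & QUEUE_GRAPHICS_BIT) != 0;
        if (has_graphics == prefer_graphics) {
            return i; // the preferred kind of family
        }
        if (fallback == UINT32_MAX) {
            fallback = i; // remember the other kind in case nothing better exists
        }
    }
    return fallback;
}
```

On a typical AMD layout (family 0 = graphics+compute, family 1 = compute-only), `prefer_graphics = true` selects family 0, which is roughly what `RADV_DEBUG=nocompute` forced before this PR by hiding the compute-only family entirely.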

@0cc4m 0cc4m requested a review from jeffbolznv March 14, 2026 16:16
@github-actions github-actions Bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 14, 2026
@zedbytes

Nice one! I am guessing that after this change RADV_DEBUG=nocompute is set by default for AMD GPUs?

@0cc4m
Contributor Author

0cc4m commented Mar 15, 2026

It won't be needed after this change. nocompute disables the compute queue, so the backend used the graphics queue instead; that is what made it faster (but don't ask me why). This PR changes the backend to use the graphics queue even when a compute-only queue is available.

@0cc4m 0cc4m merged commit 1a3d8ed into master Mar 15, 2026
81 of 82 checks passed
@0cc4m 0cc4m deleted the 0cc4m/vulkan-amd-queue branch March 15, 2026 07:18
@winstonma

winstonma commented Mar 15, 2026

I just installed this version and I am using llama-server with the Immersive Translate addon to translate web articles for me. I set the addon to send an API request to llama-server every 10 seconds to do some translation.

With this change, my KDE desktop now lags when the LLM is processing. I also tried playing a YouTube video in Firefox and it dropped about 30% of the frames while playing 1080p video. Does this happen on your side?

Sorry, I forgot to provide more info: my laptop is running an AMD Ryzen AI 360 with an 880M iGPU.

@Neutralized

I downloaded the latest release and my tg t/s dropped about 40% compared to release b8333! I use two Radeon cards, a 9070 XT and a 6700 XT. I would suggest reverting this change until it has been tested further.

@0cc4m
Copy link
Copy Markdown
Contributor Author

0cc4m commented Mar 15, 2026

> I downloaded the latest release and my tg t/s dropped about 40% compared to release b8333! I use two Radeon cards, a 9070 XT and a 6700 XT. I would suggest reverting this change until it has been tested further.

Linux or Windows?

@Neutralized

Windows

@Neutralized

#20597

Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
* vulkan: use graphics queue on AMD for slightly better performance

* disable async transfer queue on AMD
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* vulkan: use graphics queue on AMD for slightly better performance

* disable async transfer queue on AMD
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* vulkan: use graphics queue on AMD for slightly better performance

* disable async transfer queue on AMD

Labels

ggml (changes relating to the ggml tensor library for machine learning), Vulkan (Issues specific to the Vulkan backend)

5 participants