vulkan: disable mmvq on Intel Windows driver #20672

Merged: 0cc4m merged 2 commits into master from 0cc4m/vulkan-intel-windows-mmvq-tune2 on Mar 17, 2026

Conversation

@0cc4m
Contributor

@0cc4m 0cc4m commented Mar 17, 2026

Fixes #17628

@savvadesogle This disables MMVQ entirely on Intel Windows, which should remove the need to use the env var. Please try it.

@0cc4m 0cc4m requested a review from a team as a code owner March 17, 2026 09:24
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 17, 2026
@savvadesogle

Hello Ruben!

Thank you very much, and I'm sorry for bothering you with my Intel GPU...

(screenshot: performance results)

Review thread on ggml/src/ggml-vulkan/ggml-vulkan.cpp (outdated)
@0cc4m
Contributor Author

0cc4m commented Mar 17, 2026

Thank you very much, and I'm sorry for bothering you with my Intel GPU...

No worries, it's good to get some performance data from Intel Windows, and to see it working decently well.

@0cc4m 0cc4m merged commit 892e3c3 into master Mar 17, 2026
49 checks passed
@gustrd
Contributor

gustrd commented Mar 17, 2026

Some additional info: disabling MMVQ also helps with stability, avoiding driver errors on Windows (Arc 140V).

@0cc4m 0cc4m deleted the 0cc4m/vulkan-intel-windows-mmvq-tune2 branch March 18, 2026 05:10
@zero-one-soft

I noticed something with this post: my Intel B580 does not get nearly as high a score as the A770.
(screenshots: benchmark results)

Am I doing something wrong, or is it just a lack of processing power?

@savvadesogle

Hi @zero-one-soft
Try selecting the device:

set GGML_VK_VISIBLE_DEVICES=0

And which driver version do you have?

@zero-one-soft

Hi @savvadesogle
I am using Intel driver 32.0.101.8531. I ran set GGML_VK_VISIBLE_DEVICES=0 and now get the following:
(screenshot: benchmark results)
I am also experiencing random crashes where my PC freezes for a couple of seconds, then the screen goes dark and llama-server (or the bench app) just stops. On smaller models it does not happen too often, but with gpt-oss-20b it seems to happen more often.

I also noticed that with the SYCL binaries it is even worse running the benchmark on gpt-oss-20b: I get around 4 tokens/s.

Smaller models still seem OK.
(screenshot: benchmark results)

@savvadesogle

savvadesogle commented Mar 18, 2026

@zero-one-soft
Try opening an issue and testing earlier builds.

PS: the speed on the smaller model is just great!

@savvadesogle

savvadesogle commented Mar 18, 2026

@zero-one-soft
It's very strange that you still see two devices after using the GGML_VK_VISIBLE_DEVICES variable. Double-check that you're entering the command correctly.
(screenshots: command usage)

You can also try another command in a new CMD window:

llama-bench -m model -ngl 100 -fa 0,1 -sm none -mg 0

@0cc4m
Contributor Author

0cc4m commented Mar 18, 2026

The iGPU will not be used automatically unless no dGPU is available, so it is not necessary to hide it.

@savvadesogle

Thanks Ruben, I didn't know that. Sorry for the misinformation. @0cc4m

@zero-one-soft

@savvadesogle
I ran the set command in PowerShell, which is why it did not do anything. In CMD it works, but it does nothing for the speed issue.
(screenshot: benchmark results)

I have had situations where my iGPU and my CPU were also processing something, so offloading to the dGPU is important; I tend to use -mg 0 in all of my commands just to make sure.

I will try your advice about using other builds, or I will just use the smaller models. I was just curious why you managed to get such great performance on your A770 while my B580 struggles with that model :)
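For reference, CMD and PowerShell use different syntax for environment variables, which is why a CMD-style `set` appears to do nothing in PowerShell. A quick Windows-only sketch:

```bat
:: CMD (cmd.exe): no spaces around "=", applies to the current session only
set GGML_VK_VISIBLE_DEVICES=0

:: PowerShell equivalent (run in PowerShell, not CMD):
::   $env:GGML_VK_VISIBLE_DEVICES = "0"
```

In both shells the variable only affects processes launched afterwards from that same session.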

@savvadesogle

I was just curious why you managed to get such great performance on your A770 and my B580 struggles with that model :)

@zero-one-soft
Unfortunately, I don't know... We have a cool guy here and his name is Ruben! He's the master of Vulkan 💪.

But I managed to get a little more (#17628 (comment)).
(screenshot: benchmark results)

Come to our OpenArc community Discord (https://github.com/SearchSavior/OpenArc); we have people there with B580 and B50/B60 cards, and if you need to, you can ask Jianyu (maintainer of the SYCL backend) directly about the SYCL backend.

OpenVINO, llama.cpp, vLLM etc on Intel GPUs.
https://discord.gg/Pq9ZNyd9

@gustrd
Contributor

gustrd commented Mar 18, 2026

I am also experiencing random crashes where my pc would just freeze for a couple of seconds then the screen goes dark and the llama.server just stops , or the bench app
on smaller models it does not happen to often but with the gpt-oss-20b it seems to happen more often

I think this kind of crash is called a TDR (Timeout Detection and Recovery). It also happens with my Arc 140V on Windows. It happens less with MMVQ disabled, but I still had some even so. It seems to stop only if I disable COOPMAT.

For sure it's an issue with the Intel driver. I am not sure if there are other ways to avoid it.
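If the crashes are TDRs, one commonly cited workaround is raising Windows' GPU timeout via the documented `TdrDelay` registry value. This is a sketch, not verified on this hardware; it edits the registry, so use it at your own risk, and a reboot is required:

```bat
:: Raise the GPU hang timeout from the default 2 s to 10 s (reboot required)
reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 10 /f
```

This does not fix the underlying driver hang; it only gives long-running kernels more time before Windows resets the GPU.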

@savvadesogle

I think this kind of crash is called TDR (Timeout Detection and Recovery)

@gustrd
(screenshot: power management settings)
Does disabling energy saving help (+ASPM)?

@zero-one-soft

It seems my crashes happen more with the bigger models than the smaller ones.
I installed the newest Intel driver today; the version is now 32.0.101.8626, with the Vulkan driver at 1.4.340.
So far I am not getting crashes, so fingers crossed it has stopped now.

Link State Power Management is off on my machine; is that correct?

@gustrd
Contributor

gustrd commented Mar 18, 2026

@savvadesogle , thank you!

I really believe you pinpointed the issue. After changing the power management setting, I spent a whole day of intensive inference without a single TDR.

It should be published somewhere.
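For anyone wanting to apply the same fix from the command line, PCI Express Link State Power Management can be turned off for the active power plan with `powercfg`. This is a Windows-only sketch using the documented `SUB_PCIEXPRESS`/`ASPM` aliases; an elevated prompt may be required:

```bat
:: 0 = Off; set for both AC and battery, then re-apply the active scheme
powercfg /setacvalueindex SCHEME_CURRENT SUB_PCIEXPRESS ASPM 0
powercfg /setdcvalueindex SCHEME_CURRENT SUB_PCIEXPRESS ASPM 0
powercfg /setactive SCHEME_CURRENT
```

The same setting is reachable in the GUI under Power Options > Advanced settings > PCI Express > Link State Power Management.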

Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
* vulkan: disable mmvq on Intel Windows driver

* improve comment
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* vulkan: disable mmvq on Intel Windows driver

* improve comment
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* vulkan: disable mmvq on Intel Windows driver

* improve comment

Labels

ggml (changes relating to the ggml tensor library for machine learning), Vulkan (Issues specific to the Vulkan backend)


Development

Successfully merging this pull request may close these issues.

Misc. bug: Vulkan's performance degradation(TG) on A770 from b7194 and FA problem

5 participants