vulkan: disable mmvq on Intel Windows driver #20672

Merged: 0cc4m merged 2 commits into master from 0cc4m/vulkan-intel-windows-mmvq-tune2 on Mar 17, 2026

Conversation

@0cc4m
Contributor

@0cc4m 0cc4m commented Mar 17, 2026

Fixes #17628

@savvadesogle This disables MMVQ entirely on Intel Windows, which should remove the need to use the env var. Please try it.

@0cc4m 0cc4m requested a review from a team as a code owner March 17, 2026 09:24
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 17, 2026
@savvadesogle

Hello Ruben!

Thank you very much, and I'm sorry for bothering you with my Intel GPU...

(screenshot: performance results)

Review thread on ggml/src/ggml-vulkan/ggml-vulkan.cpp (outdated)
@0cc4m
Contributor Author

0cc4m commented Mar 17, 2026

Thank you very much, and I'm sorry for bothering you with my Intel GPU...

No worries, it's good to get some performance data from Intel Windows, and to see it working decently well.

@0cc4m 0cc4m merged commit 892e3c3 into master Mar 17, 2026
49 checks passed
@gustrd
Contributor

gustrd commented Mar 17, 2026

Some additional info: disabling MMVQ also helps with stability, avoiding driver errors on Windows (Arc 140V).

@0cc4m 0cc4m deleted the 0cc4m/vulkan-intel-windows-mmvq-tune2 branch March 18, 2026 05:10
@zero-one-soft

I noticed something with this post: my Intel B580 does not get nearly as high a score as the A770.
(screenshots: benchmark results)

Am I doing something wrong, or is it just a lack of processing power?

@savvadesogle

Hi @zero-one-soft
Try selecting the device:

set GGML_VK_VISIBLE_DEVICES=0

And which driver version do you have?

@zero-one-soft

Hi @savvadesogle
I am using Intel driver 32.0.101.8531. I ran set GGML_VK_VISIBLE_DEVICES=0 and now get the following:
(screenshot: benchmark results)
I am also experiencing random crashes where my PC freezes for a couple of seconds, then the screen goes dark and llama-server (or the bench app) just stops. On smaller models it does not happen too often, but with gpt-oss-20b it seems to happen more often.

I also noticed that with the SYCL binaries it is even worse running the benchmark on gpt-oss-20b: I get around 4 tokens/s.

Smaller models still seem OK.
(screenshot: benchmark results)

@savvadesogle

savvadesogle commented Mar 18, 2026

@zero-one-soft
Try opening an issue and testing earlier builds.

PS: the speed on the smaller model is just great!

@savvadesogle

savvadesogle commented Mar 18, 2026

@zero-one-soft
It's very strange that you still see two devices after using the GGML_VK_VISIBLE_DEVICES variable. Double-check that you're entering the command correctly.
(screenshots: command usage)

You can also try another command in a new CMD window:

llama-bench -m model -ngl 100 -fa 0,1 -sm none -mg 0

@0cc4m
Contributor Author

0cc4m commented Mar 18, 2026

The iGPU will not be used automatically unless no dGPU is available, so it is not necessary to hide it.

@savvadesogle

Thanks Ruben, I didn't know that. Sorry for the misinformation. @0cc4m

@zero-one-soft

@savvadesogle
I ran the set command in PowerShell, which is why it did not do anything. In CMD it works, but it does nothing for the speed issue.
(screenshot: benchmark results)

I have had situations where my iGPU and my CPU were also processing something, so offloading to the dGPU is important; I tend to use -mg 0 in all of my commands just to make sure.

I will try your advice about using other builds, or I will just use the smaller models. I was just curious why you managed to get such great performance on your A770 while my B580 struggles with that model :)
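For reference, CMD and PowerShell use different syntax for environment variables, which is why a CMD-style `set` appears to do nothing in PowerShell. A quick Windows-only sketch:

```bat
:: CMD (cmd.exe): no spaces around "=", applies to the current session only
set GGML_VK_VISIBLE_DEVICES=0

:: PowerShell equivalent (run in PowerShell, not CMD):
::   $env:GGML_VK_VISIBLE_DEVICES = "0"
```

In both shells the variable only affects processes launched afterwards from that same session.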

@savvadesogle

I was just curious why you managed to get such great performance on your A770 and my B580 struggles with that model :)

@zero-one-soft
Unfortunately, I don't know... We have a cool guy here and his name is Ruben! He's the master of Vulkan 💪.

But I managed to get a little more (#17628 (comment)).
(screenshot: benchmark results)

Come to our OpenArc community Discord (https://github.com/SearchSavior/OpenArc); we have people there with B580 and B50/B60 cards, and if you need to, you can ask Jianyu (maintainer of the SYCL backend) directly about the SYCL backend.

OpenVINO, llama.cpp, vLLM etc on Intel GPUs.
https://discord.gg/Pq9ZNyd9

@gustrd
Contributor

gustrd commented Mar 18, 2026

I am also experiencing random crashes where my pc would just freeze for a couple of seconds then the screen goes dark and the llama.server just stops , or the bench app
on smaller models it does not happen to often but with the gpt-oss-20b it seems to happen more often

I think this kind of crash is called a TDR (Timeout Detection and Recovery). It also happens with my Arc 140V on Windows. It happens less with MMVQ disabled, but I still had some even so. It seems to stop only if I disable COOPMAT.

For sure it's an issue with the Intel driver. I am not sure if there are other ways to avoid it.
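If the crashes are TDRs, one commonly cited workaround is raising Windows' GPU timeout via the documented `TdrDelay` registry value. This is a sketch, not verified on this hardware; it edits the registry, so use it at your own risk, and a reboot is required:

```bat
:: Raise the GPU hang timeout from the default 2 s to 10 s (reboot required)
reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 10 /f
```

This does not fix the underlying driver hang; it only gives long-running kernels more time before Windows resets the GPU.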

@savvadesogle

I think this kind of crash is called TDR (Timeout Detection and Recovery)

@gustrd
(screenshot: power management settings)
Does disabling energy saving help (+ASPM)?

@zero-one-soft

It seems my crashes happen more with the bigger models than the smaller ones.
I installed the newest Intel driver today; the version is now 32.0.101.8626, with the Vulkan driver at 1.4.340.
So far I am not getting crashes, so fingers crossed it has stopped now.

Link State Power Management is off on my machine; is that correct?

@gustrd
Contributor

gustrd commented Mar 18, 2026

@savvadesogle , thank you!

I really believe you pinpointed the issue. After changing the power management setting, I spent a whole day of intensive inference without a single TDR.

It should be published somewhere.
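For anyone wanting to apply the same fix from the command line, PCI Express Link State Power Management can be turned off for the active power plan with `powercfg`. This is a Windows-only sketch using the documented `SUB_PCIEXPRESS`/`ASPM` aliases; an elevated prompt may be required:

```bat
:: 0 = Off; set for both AC and battery, then re-apply the active scheme
powercfg /setacvalueindex SCHEME_CURRENT SUB_PCIEXPRESS ASPM 0
powercfg /setdcvalueindex SCHEME_CURRENT SUB_PCIEXPRESS ASPM 0
powercfg /setactive SCHEME_CURRENT
```

The same setting is reachable in the GUI under Power Options > Advanced settings > PCI Express > Link State Power Management.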

Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
* vulkan: disable mmvq on Intel Windows driver

* improve comment
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* vulkan: disable mmvq on Intel Windows driver

* improve comment
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* vulkan: disable mmvq on Intel Windows driver

* improve comment

Labels

ggml (changes relating to the ggml tensor library for machine learning), Vulkan (Issues specific to the Vulkan backend)


Development

Successfully merging this pull request may close these issues.

Misc. bug: Vulkan's performance degradation(TG) on A770 from b7194 and FA problem

5 participants