Vulkan: Optimize Matmul parameters for AMD GPUs with Coopmat support #18749
|
There are some small but consistent improvements on RDNA3 as well with this.
|
|
Thank you for testing it! I was hoping for more; I guess it is too different from the 8060S. There's probably some more tuning that can be done for RDNA3/4 dGPUs, but I don't have the hardware for that. |
|
You should try this for mul_mat_id as well! |
|
9070xt
|
I tried, but didn't find a good parameter set yet. I'll keep trying. @characharm Is that Windows or Linux? |
|
@0cc4m Windows. I rebooted between tests for accuracy. The numbers are stable. |
|
Thank you for testing, I wish the drivers would behave more similarly. I'll disable the change on Windows. |
|
Can you also test a dense model, though? Those should be more affected than MoE. |
|
|
@characharm Please check if #18763 restores your performance. |
|
No, the performance is the same. I compared it with the CI build, so the problem is not in my build. |
|
I think these tile sizes are also used for matmul id in some cases, so that could explain the effect on gpt-oss. |
|
@characharm Sorry, I missed disabling the large tile size. Try again, please.
No, I didn't enable the large tile for mul_mat_id, so unless the check somewhere is wrong, it should not be used at all. |
|
Not sure whether this helps, but Vulkan performance on RX 7600 has been going down. I don't use such a big model on this GPU, but interesting that it has degraded.
build: 7d77f07 (7108)
build: c9ced49 (7710) |
|
Can you test a model that actually fits into your GPU? That would likely give more usable data. |
Luckily, I had copied the results from an old build. Yeah, even for smaller models, there has been degradation. Not sure whether the kernel upgrade played a role. Kernel: 6.6 (don't remember the exact patch)
build: dd5e8ca (6916) Kernel: 6.17.9-200
build: c9ced49 (7710) |
|
Can you add more information about your setup? What OS, what driver, what does your device info string say, etc? |
…gml-org#18749) * vulkan: Enable and optimize large matmul parameter combination for AMD * limit tuning to AMD GPUs with coopmat support * use tx_m values instead of _l
OS: Fedora 42 |
|
My guess would be that your driver is too old. For good Mesa coopmat performance you usually want 25.3 or higher. But I didn't want to cause an issue for older versions. |
25.1.9 is the latest. No updates are available for Fedora 42. |
|
Mesa 25.1.9, despite being the newest version for Fedora 42, is not good enough. If upgrading to Fedora 43 is not an option for you at the moment, you could try a newer Mesa build from the che/mesa COPR repo. For example, a merge request providing a significant PP (prompt processing) speed improvement was merged into the Mesa repo in August and has been available since release 25.2.x or 25.3.x (not sure which). |
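To check an installed Mesa version against the "25.3 or higher" guideline mentioned above, a comparison along these lines might help. This is only a sketch: `MesaVersion`, `parse_mesa_version` and `meets_minimum` are hypothetical names made up for illustration, and it assumes you already have the version as a plain string such as `25.1.9`.

```cpp
#include <cstdio>

// Hypothetical helper type, not part of llama.cpp or Mesa.
struct MesaVersion {
    int major;
    int minor;
};

// Parses the leading "major.minor" of a string such as "25.1.9".
// Returns false if the string does not start with two dot-separated integers.
bool parse_mesa_version(const char * s, MesaVersion & out) {
    return std::sscanf(s, "%d.%d", &out.major, &out.minor) == 2;
}

// True if v is at least min_major.min_minor (the patch level is ignored).
bool meets_minimum(const MesaVersion & v, int min_major, int min_minor) {
    return v.major > min_major ||
           (v.major == min_major && v.minor >= min_minor);
}
```

With the numbers from this thread, `25.1.9` parses to major 25, minor 1, which falls below the suggested 25.3 minimum.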
I tuned this on AMD Radeon 8060S, but a brief test also showed improvements on AMD RX 9060 XT.
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 0 = AMD Radeon RX 9060 XT (RADV GFX1200) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
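The gating described in this PR (AMD only, coopmat required), together with the Windows exclusion proposed in the thread, could be sketched roughly as follows. This is not the code from the PR: `DeviceInfo`, its fields, and `use_tuned_amd_matmul` are hypothetical names used only for illustration. The one hard fact assumed here is AMD's PCI vendor ID, `0x1002`.

```cpp
#include <cstdint>

// Hypothetical device description; the fields are illustrative and do not
// mirror the actual ggml-vulkan structures.
struct DeviceInfo {
    uint32_t vendor_id;   // PCI vendor ID reported by the Vulkan driver
    bool     coopmat;     // true if cooperative matrix support is available
    bool     is_windows;  // true when running on a Windows driver
};

constexpr uint32_t VENDOR_ID_AMD = 0x1002;  // AMD's PCI vendor ID

// Returns true when the tuned matmul parameters should be used:
// AMD GPU, coopmat support present, and not running on Windows.
bool use_tuned_amd_matmul(const DeviceInfo & dev) {
    return dev.vendor_id == VENDOR_ID_AMD && dev.coopmat && !dev.is_windows;
}
```

Under these assumptions, both devices listed above (RADV on Linux, `matrix cores: KHR_coopmat`) would take the tuned path, while a Windows setup like the 9070 XT one reported in the thread would keep the previous parameters.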