vulkan: Fix data races in coopmat1 mul_mat(_id) by jeffbolznv · Pull Request #20084 · ggml-org/llama.cpp

jeffbolznv · 2026-03-03T16:50:30Z

Add barriers between coopmat store and regular loads. We sort of got away with this because it was the same subgroup accessing the values, but it's still a race and may not work.

I added shared memory data race detection for coopmat1 (KhronosGroup/Vulkan-ValidationLayers#11780) and this fixes the issues it found. No performance regressions on my system.
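
As a rough CPU-side analogy for the store, barrier, load pattern described above (not the shader change itself), the C++ sketch below has each thread write one slot of a shared buffer, wait at a barrier, and then read a neighbour's slot. `OneShotBarrier` and `store_then_load_neighbour` are hypothetical names invented for this illustration, and the C++ memory model only approximates the Vulkan one (it has no scopes or storage classes).

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical one-shot barrier for a fixed number of threads (illustration only).
class OneShotBarrier {
public:
    explicit OneShotBarrier(int count) : expected_(count), arrived_(0) {}

    void arrive_and_wait() {
        // Release half: publish this thread's earlier plain writes.
        arrived_.fetch_add(1, std::memory_order_acq_rel);
        // Acquire half: once every thread has arrived, their writes are visible.
        while (arrived_.load(std::memory_order_acquire) < expected_) {
            std::this_thread::yield();
        }
    }

private:
    const int expected_;
    std::atomic<int> arrived_;
};

// Each thread stores one slot of a shared staging buffer, then loads the slot
// written by its neighbour. Returns the sum of all values that were loaded.
int store_then_load_neighbour(int num_threads) {
    std::vector<int> staging(num_threads, 0);  // plain, non-atomic shared data
    std::vector<int> seen(num_threads, 0);
    OneShotBarrier barrier(num_threads);
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            staging[t] = 100 + t;                      // store phase
            barrier.arrive_and_wait();                 // without this, the load below is a data race
            seen[t] = staging[(t + 1) % num_threads];  // load phase
        });
    }
    for (auto& th : threads) {
        th.join();
    }

    int sum = 0;
    for (int v : seen) {
        sum += v;
    }
    return sum;
}
```

Calling `store_then_load_neighbour(4)` from a test harness returns 406 (100 + 101 + 102 + 103). Removing the barrier call may still appear to work on a given machine, which mirrors the "we sort of got away with this" observation, but the program would then contain a data race.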

0cc4m · 2026-03-06T07:41:15Z

I do see small performance regressions from this change on AMD. Considering even without barriers it seems to work, since the accesses are limited within subgroups, we don't need full barriers here. A perfect place for subgroupMemoryBarrierShared()? The overall differences are too small for me to be sure, but it does seem to help.

jeffbolznv · 2026-03-06T15:29:21Z

subgroupMemoryBarrierShared wouldn't be sufficient, but controlBarrier(gl_ScopeSubgroup, gl_ScopeSubgroup, gl_StorageSemanticsShared, gl_SemanticsAcquireRelease) (https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_memory_scope_semantics.txt#L443) ought to be.

I'm surprised this shows up as a real perf hit since this is just in the epilogue of the shader.

0cc4m · 2026-03-06T16:16:27Z

Why isn't it enough? It's about shared memory that's only used by the subgroup.

jeffbolznv · 2026-03-06T16:33:01Z

subgroupMemoryBarrierShared is only an OpMemoryBarrier. Synchronization requires a release on the storing thread, an acquire on the loading thread, and an "edge" between those that can be provided either by a control barrier or an atomic store and an atomic load that sees the stored value (this is in the "Synchronizes-With" section of the memory model appendix). When using a control barrier, the release and acquire can be performed by the same control barrier (similarly, the release and acquire can optionally be folded into the corresponding atomics). A subgroup control barrier ought to be relatively cheap compared to a workgroup control barrier.
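
The atomic variant of that "edge" can be sketched on the CPU with C++ atomics. This is an analogy under the C++ memory model, not Vulkan shader code; `release_acquire_handoff` is a hypothetical name used only for this example. The storing thread does a release store to a flag, and the loading thread spins on an acquire load until it sees the stored value, which is what makes the plain write to `payload` visible.

```cpp
#include <atomic>
#include <thread>

// Hands a plain (non-atomic) value from one thread to another using a
// release store and an acquire load on a flag. Returns the value observed.
int release_acquire_handoff() {
    int payload = 0;                  // plain data, analogous to shared memory
    std::atomic<bool> ready{false};   // the atomic that provides the edge
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                   // plain store
        ready.store(true, std::memory_order_release);   // release on the storing thread
    });

    std::thread consumer([&] {
        // Acquire on the loading thread; the loop exits only once the load
        // sees the value written by the release store.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;                             // plain load, now race-free
    });

    producer.join();
    consumer.join();
    return observed;
}
```

Here the release and acquire are folded into the atomic operations themselves. A fence with no such atomic pair (or control barrier) between the two threads would not establish the ordering, which is the point made about a bare memory barrier.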

0cc4m · 2026-03-07T06:15:10Z

Yeah, then let's use the minimal controlBarrier. My tests aren't fully conclusive, but it seems to make a small difference.

jeffbolznv · 2026-03-07T18:05:30Z

OK, changed to subgroup control barriers.

CISC · 2026-03-09T13:34:51Z

This seems to be causing MUL_MAT failures:
https://github.com/ggml-org/llama.cpp/actions/runs/22820245245/job/66191745551#step:9:32598

0cc4m · 2026-03-09T13:44:24Z

Damn, I missed that. But I don't see these failures locally, with llvmpipe. That's odd.

jeffbolznv · 2026-03-09T13:54:17Z

Do you have a recent version of llvmpipe that supports coopmat1?

I think we should just revert this for now. I wonder if llvmpipe has some bug in its handling of this unusual barrier instruction.

0cc4m · 2026-03-09T13:56:48Z

Ah yeah, on Arch I can trigger it with a newer llvmpipe version. But I don't think llvmpipe coopmat makes much sense anyways, so we can also just disable coopmat in the CI.

Commit eb65386: vulkan: Fix data races in coopmat1 mul_mat(_id)

jeffbolznv requested a review from 0cc4m as a code owner March 3, 2026 16:50

github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Mar 3, 2026

Commit 2fb720d: switch to subgroup control barriers

0cc4m approved these changes Mar 8, 2026

0cc4m merged commit cd18a50 into ggml-org:master Mar 8, 2026
70 of 73 checks passed

jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Mar 9, 2026

Commit b0d4ea2: vulkan: partial revert ggml-org#20084

0cc4m merged 2 commits into ggml-org:master from jeffbolznv:coopmat1_races
