Skip to content

vulkan: Fix data races in coopmat1 mul_mat(_id)#20084

Merged
0cc4m merged 2 commits intoggml-org:masterfrom
jeffbolznv:coopmat1_races
Mar 8, 2026
Merged

vulkan: Fix data races in coopmat1 mul_mat(_id)#20084
0cc4m merged 2 commits intoggml-org:masterfrom
jeffbolznv:coopmat1_races

Conversation

@jeffbolznv
Copy link
Copy Markdown
Contributor

Add barriers between coopmat store and regular loads. We sort of got away with this because it was the same subgroup accessing the values, but it's still a race and may not work.

I added shared memory data race detection for coopmat1 (KhronosGroup/Vulkan-ValidationLayers#11780) and this fixes the issues it found. No performance regressions on my system.

Add barriers between coopmat store and regular loads. We sort of got away with
this because it was the same subgroup accessing the values, but it's still a
race and may not work.
@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner March 3, 2026 16:50
@github-actions github-actions Bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Mar 3, 2026
@0cc4m
Copy link
Copy Markdown
Contributor

0cc4m commented Mar 6, 2026

I do see small performance regressions from this change on AMD. Considering even without barriers it seems to work, since the accesses are limited within subgroups, we don't need full barriers here. A perfect place for subgroupMemoryBarrierShared()? The overall differences are too small for me to be sure, but it does seem to help.

@jeffbolznv
Copy link
Copy Markdown
Contributor Author

subgroupMemoryBarrierShared wouldn't be sufficient, but controlBarrier(gl_ScopeSubgroup, gl_ScopeSubgroup, gl_StorageSemanticsShared, gl_SemanticsAcquireRelease) (https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_memory_scope_semantics.txt#L443) ought to be.

I'm surprised this shows up as a real perf hit since this is just in the epilogue of the shader.

@0cc4m
Copy link
Copy Markdown
Contributor

0cc4m commented Mar 6, 2026

Why isn't it enough? It's about shared memory that's only used by the subgroup.

@jeffbolznv
Copy link
Copy Markdown
Contributor Author

subgroupMemoryBarrierShared is only an OpMemoryBarrier. Synchronization requires a release on the storing thread, an acquire on the loading thread, and an "edge" between those that can be provided either by a control barrier or an atomic store and an atomic load that sees the stored value (this is in the "Synchronizes-With" section of the memory model appendix). When using a control barrier, the release and acquire can be performed by the same control barrier (similarly, the release and acquire can optionally be folded into the corresponding atomics). A subgroup control barrier ought to be relatively cheap compared to a workgroup control barrier.

@0cc4m
Copy link
Copy Markdown
Contributor

0cc4m commented Mar 7, 2026

Yeah, then let's use the minimal controlBarrier. My tests aren't fully conclusive, but it seems to make a small difference.

@jeffbolznv
Copy link
Copy Markdown
Contributor Author

OK, changed to subgroup control barriers.

@0cc4m 0cc4m merged commit cd18a50 into ggml-org:master Mar 8, 2026
70 of 73 checks passed
@CISC
Copy link
Copy Markdown
Member

CISC commented Mar 9, 2026

@0cc4m
Copy link
Copy Markdown
Contributor

0cc4m commented Mar 9, 2026

Damn, I missed that. But I don't see these failures locally, with llvmpipe. That's odd.

@jeffbolznv
Copy link
Copy Markdown
Contributor Author

Do you have a recent version of llvmpipe that supports coopmat1?

I think we should just revert this for now. I wonder if llvmpipe has some bug in its handling of this unusual barrier instruction.

@0cc4m
Copy link
Copy Markdown
Contributor

0cc4m commented Mar 9, 2026

Ah yeah, on Arch I can trigger it with a newer llvmpipe version. But I don't think llvmpipe coopmat makes much sense anyways, so we can also just disable coopmat in the CI.

jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Mar 9, 2026
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 10, 2026
* vulkan: Fix data races in coopmat1 mul_mat(_id)

Add barriers between coopmat store and regular loads. We sort of got away with
this because it was the same subgroup accessing the values, but it's still a
race and may not work.

* switch to subgroup control barriers
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
* vulkan: Fix data races in coopmat1 mul_mat(_id)

Add barriers between coopmat store and regular loads. We sort of got away with
this because it was the same subgroup accessing the values, but it's still a
race and may not work.

* switch to subgroup control barriers
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* vulkan: Fix data races in coopmat1 mul_mat(_id)

Add barriers between coopmat store and regular loads. We sort of got away with
this because it was the same subgroup accessing the values, but it's still a
race and may not work.

* switch to subgroup control barriers
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* vulkan: Fix data races in coopmat1 mul_mat(_id)

Add barriers between coopmat store and regular loads. We sort of got away with
this because it was the same subgroup accessing the values, but it's still a
race and may not work.

* switch to subgroup control barriers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants