gpuav: Support coopmat in shared memory data race pass. #11780

Merged
spencer-lunarg merged 4 commits into KhronosGroup:main from jeffbolznv:shmem_coopmat
Mar 3, 2026
Conversation

@jeffbolznv
Contributor

No description provided.

@jeffbolznv jeffbolznv requested a review from a team as a code owner March 2, 2026 21:49
@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 666500.

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 666513.

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build # 22645 running.

Comment thread layers/gpuav/instrumentation/shared_memory_data_race.cpp Outdated
Comment thread layers/gpuav/instrumentation/shared_memory_data_race.cpp
Comment thread layers/gpuav/instrumentation/shared_memory_data_race.cpp Outdated
Comment thread layers/gpuav/spirv/module.cpp
Comment thread layers/gpuav/spirv/type_manager.cpp Outdated
Comment thread layers/gpuav/spirv/shared_memory_data_race_pass.cpp Outdated
@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 666602.

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build # 22647 running.

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build # 22647 failed.

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 666699.

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build # 22648 running.

if ((ret & (STORE_BIT | ATOMIC_BIT)) != 0) {
error_payload = ErrorPayload(
inst_offset,
SpecConstantLinkShaderId | (kErrorGroup_SharedMemoryDataRace << kErrorGroup_Shift) | (kErrorSubCode_SharedMemoryDataRace_RaceOnLoad << kErrorSubCode_Shift),
Contributor

Happy to do this in a follow-up PR, but I think this should be a different error message, so it is clearer that the error didn't come from a load but from a coop-load.

Contributor Author

The message will have the OpCooperativeMatrixLoadKHR opcode in it, so it's fairly clear.
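
For readers following the snippet under review: the error word ORs a shader ID with an error group and a sub-code, each shifted into its own bit field, and only the sub-code selects the RaceOnLoad message. A minimal sketch of that packing follows. The field widths, masks, and function names here are assumptions for illustration; the real `kErrorGroup_Shift` and `kErrorSubCode_Shift` values are defined in the layer sources and are not shown in this thread.

```cpp
#include <cstdint>

// Hypothetical field layout (18 + 6 + 8 bits), chosen only for illustration.
constexpr uint32_t kShaderIdMask = 0x0003FFFFu;  // low 18 bits: shader ID
constexpr uint32_t kGroupShift = 18;             // next 6 bits: error group
constexpr uint32_t kGroupMask = 0x3Fu;
constexpr uint32_t kSubCodeShift = 24;           // top 8 bits: error sub-code
constexpr uint32_t kSubCodeMask = 0xFFu;

// Pack the three fields into one 32-bit word, masking each so that an
// out-of-range value cannot spill into a neighbouring field.
constexpr uint32_t PackErrorWord(uint32_t shader_id, uint32_t group, uint32_t sub_code) {
    return (shader_id & kShaderIdMask) |
           ((group & kGroupMask) << kGroupShift) |
           ((sub_code & kSubCodeMask) << kSubCodeShift);
}

// Decoders, as a host-side reader of the error buffer might use them.
constexpr uint32_t UnpackShaderId(uint32_t word) { return word & kShaderIdMask; }
constexpr uint32_t UnpackGroup(uint32_t word) { return (word >> kGroupShift) & kGroupMask; }
constexpr uint32_t UnpackSubCode(uint32_t word) { return (word >> kSubCodeShift) & kSubCodeMask; }
```

With this layout, a separate coop-load message would only need a new sub-code value; the packing itself would not change.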

return false;
}

// relies on some subgroup functionality
Contributor

... I don't think we need this; if the user doesn't have coop-mat, it will not link in any subgroup stuff

Also, SPV_KHR_cooperative_matrix relies on SPIR-V 1.3, so you couldn't have a coop-mat without it anyway

Contributor Author

I did this to fix PositiveGpuAVBufferDeviceAddress.AtomicsWorkgroups which was failing in the previous CI run. Now that the sharedmemdatarace shader uses subgroup stuff, the inputs like SubgroupId get linked in, but the groupnonuniform capability is removed here:

    // Vulkan 1.1 is required, so if incoming SPIR-V is 1.0, might need to adjust it
    const uint32_t spirv_version_1_0 = 0x00010000;
    if (header_.version == spirv_version_1_0) {
        // SPV_KHR_storage_buffer_storage_class is needed, but glslang removes it from linking functions
        AddExtension("SPV_KHR_storage_buffer_storage_class");

        // Subgroups where added in Vulkan 1.1, so SPIR-V 1.0 can't use them
        // This is a bad hack around for someone using a SPIR-V 1.0
        RemoveCapability(spv::CapabilityGroupNonUniform);
    }

It looks like the new failure is something different, so I guess this fix worked (I saw spirv-val failures, not sure how that is configured in CI)
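
The quoted check compares the module header against `0x00010000` because the SPIR-V version word stores the major version in bits 16-23 and the minor version in bits 8-15. A minimal sketch of decoding that word, plus a guard of the kind described above, is shown here. The function names are hypothetical and the guard is only an assumption about the shape of the fix, not the code from this PR.

```cpp
#include <cstdint>

// SPIR-V version word layout: 0x00MMmm00 (MM = major, mm = minor).
constexpr uint32_t MakeSpirvVersion(uint32_t major, uint32_t minor) {
    return (major << 16) | (minor << 8);
}
constexpr uint32_t SpirvMajor(uint32_t version_word) { return (version_word >> 16) & 0xFFu; }
constexpr uint32_t SpirvMinor(uint32_t version_word) { return (version_word >> 8) & 0xFFu; }

// Mirrors the quoted condition: only SPIR-V 1.0 modules have the
// GroupNonUniform capability stripped after linking.
constexpr bool StripsGroupNonUniform(uint32_t version_word) {
    return version_word == MakeSpirvVersion(1, 0);
}

// Hypothetical guard: an instrumentation pass whose linked helper relies on
// subgroup built-ins is skipped when that capability would be removed.
constexpr bool CanRunSubgroupDependentPass(uint32_t version_word) {
    return !StripsGroupNonUniform(version_word);
}
```

Under these assumptions, a SPIR-V 1.0 module is left uninstrumented by the pass, while 1.3 and later modules proceed as usual.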

Contributor

... I've debated just enforcing SPIR-V 1.3... not going to worry too much about this for now

Contributor

@spencer-lunarg spencer-lunarg left a comment

Some comments/ideas, but they can be applied in a follow-up PR if needed

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build # 22648 failed.

@jeffbolznv
Contributor Author

I can't reproduce the current CI failure locally (NegativeGpuAVSharedMemoryDataRace.CoopMatStoreLoad). Is there anything useful in the logs?

@spencer-lunarg
Contributor

Note: Google Test filter = NegativeGpuAVSharedMemoryDataRace.CoopMatStoreLoad
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from NegativeGpuAVSharedMemoryDataRace
[ RUN      ] NegativeGpuAVSharedMemoryDataRace.CoopMatStoreLoad
Driver Name = NVIDIA
Driver Info = 550.78
/home/lunarg/.jenkins/vz3/Debug64/Vulkan-ValidationLayers/tests/framework/error_monitor.cpp:309: Failure
Failed
Validation Error: [ SharedMemoryDataRace-RaceOnLoad ] | MessageID = 0xf32d0b17
vkCmdDispatch(): A data race was detected on the shared memory variable "arr" in local invocation index 1 while performing a load operation. (Likely against local invocation index 0)
Stage = Compute.  Global invocation ID (x, y, z) = (1, 0, 0)
Command buffer (0x5bbba7df6e20)
	Compute Dispatch Index 0
Shader Module (0x140000000014) (internal ID 1)
SPIR-V Instruction: %42 = OpLoad %6 %41 48 19
(Unable to find shader source, build shader with debug info to get source information)

Objects: 3
    [0] VkQueue 0x5bbba6f81a90
    [1] VkCommandBuffer 0x5bbba7df6e20
    [2] VkPipeline 0x1b000000001b



[  FAILED  ] NegativeGpuAVSharedMemoryDataRace.CoopMatStoreLoad (1579 ms)
[----------] 1 test from NegativeGpuAVSharedMemoryDataRace (1579 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (1851 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] NegativeGpuAVSharedMemoryDataRace.CoopMatStoreLoad

 1 FAILED TEST

... not sure what is going on; it should have reported the SharedMemoryDataRace error only once due to the limit

Not sure what is going on either... worst case, we can silence it until I have access to the machine to look into it more

@jeffbolznv
Contributor Author

I added printfs and was lucky enough to see it happen once. There can intermittently be both RaceOnStore and RaceOnLoad, and the duplicate message limit applies per-VUID.

[ RUN      ] NegativeGpuAVSharedMemoryDataRace.CoopMatStoreLoad
vuid_text VALIDATION-SETTINGS
vuid_text WARNING-CreateInstance-status-message
vuid_text WARNING-CreateInstance-debug-warning
vuid_text WARNING-Setting-Limit-Adjusted
vuid_text WARNING-Setting-Limit-Adjusted
vuid_text SharedMemoryDataRace-RaceOnStore
vuid_text SharedMemoryDataRace-RaceOnLoad
C:\github\jeffbolznv\Vulkan-ValidationLayers\tests\framework\error_monitor.cpp(309): error: Failed
Validation Error: [ SharedMemoryDataRace-RaceOnLoad ] | MessageID = 0xf32d0b17
vkCmdDispatch(): A data race was detected on the shared memory variable "arr" in local invocation index 1 while performing a load operation. (Likely against local invocation index 0)
Stage = Compute.  Global invocation ID (x, y, z) = (1, 0, 0)
Command buffer (0x23d005870d0)
        Compute Dispatch Index 0
Shader Module (0x140000000014) (internal ID 1)
SPIR-V Instruction: %42 = OpLoad %6 %41 48 19
(Unable to find shader source, build shader with debug info to get source information)

Objects: 3
    [0] VkQueue 0x23d004214d0
    [1] VkCommandBuffer 0x23d005870d0
    [2] VkPipeline 0x1b000000001b



vuid_text SharedMemoryDataRace-RaceOnLoad
vuid_text SharedMemoryDataRace-RaceOnLoad
vuid_text SharedMemoryDataRace-RaceOnLoad
vuid_text SharedMemoryDataRace-RaceOnLoad

Any clever ideas what to do about this? Seems like it could affect some other tests, too.
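
To illustrate why a per-VUID limit lets a second error through, here is a minimal sketch of a duplicate-message filter keyed on the VUID string. `DuplicateLimiter` and `CountReported` are hypothetical stand-ins rather than the validation layers' real classes, and the limit of 1 is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical duplicate-message filter: each distinct VUID gets its own budget.
class DuplicateLimiter {
  public:
    explicit DuplicateLimiter(uint32_t limit) : limit_(limit) {}

    // Returns true if a message with this VUID should still be reported.
    bool ShouldReport(const std::string& vuid) {
        uint32_t& count = seen_[vuid];  // value-initialized to 0 on first use
        if (count >= limit_) return false;
        ++count;
        return true;
    }

  private:
    uint32_t limit_;
    std::unordered_map<std::string, uint32_t> seen_;
};

// Counts how many messages from a sequence survive the filter.
inline uint32_t CountReported(DuplicateLimiter& limiter, const std::vector<std::string>& vuids) {
    uint32_t reported = 0;
    for (const std::string& vuid : vuids) {
        if (limiter.ShouldReport(vuid)) ++reported;
    }
    return reported;
}
```

Feeding this filter the sequence from the log above (one RaceOnStore followed by several RaceOnLoad) with a limit of 1 reports two messages, one per VUID, so a test that expects exactly one error sees an extra one.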

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build queued with queue ID 666868.

@jeffbolznv
Contributor Author

I changed this and a few other tests to have a single race. This uncovered another bug, so I guess that worked out well.

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build # 22649 running.

@ci-tester-lunarg
Collaborator

CI Vulkan-ValidationLayers build # 22649 passed.

@spencer-lunarg spencer-lunarg merged commit 8a324d3 into KhronosGroup:main Mar 3, 2026
21 checks passed