[MetaxGPU][test] adjust warp size in mhc_pre_big_fuse #4
yesuweiYYYY wants to merge 1 commit into MetaX-MACA:dev from
Code Review
This pull request increases the thread count from 96 to 128 and updates the thread binding threshold for shared memory operations from 32 to 64. A critical race condition was identified where threads reading from shared memory might do so before the writing threads have finished, necessitating the addition of a synchronization barrier.
  T.copy(mixes, mixes_shared, disable_tma=True)

- if T.get_thread_binding() < 32:
+ if T.get_thread_binding() < 64:
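As a rough illustration of how the branch threshold lines up with warp boundaries, the sketch below maps thread indices to warp indices in plain Python. It is not TileLang code; `warps_touched` is a hypothetical helper, and the warp sizes 32 and 64 are the two assumptions named in the review, not values confirmed for the target hardware.

```python
def warps_touched(num_threads: int, threshold: int, warp_size: int):
    """Return the warp indices used by each side of `if tid < threshold`.

    The first set holds warps with at least one thread in the `if` branch,
    the second set holds warps with at least one thread in the `else` branch.
    """
    taken = {tid // warp_size for tid in range(num_threads) if tid < threshold}
    other = {tid // warp_size for tid in range(num_threads) if tid >= threshold}
    return taken, other


# Compare the configuration before (96 threads, threshold 32)
# and after (128 threads, threshold 64) this change.
for warp_size in (32, 64):
    for num_threads, threshold in ((96, 32), (128, 64)):
        taken, other = warps_touched(num_threads, threshold, warp_size)
        split = sorted(taken & other)  # warps whose threads diverge
        print(warp_size, num_threads, threshold, sorted(taken), sorted(other), split)
```

Under these assumptions, the old threshold of 32 splits warp 0 across both branches when the warp size is 64, while the new threshold of 64 falls on a warp boundary for either warp size. In every case the two branches also cover different warps, so any hand-off through shared memory between them needs a block-level barrier.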
There is a race condition between the threads writing to mixes_shared (threads 0-63) and the threads reading from it (threads 64-127 in the else block). Since these two groups belong to different warps (assuming a warp size of 32 or 64), a synchronization barrier is required to ensure that the data written to shared memory by the first group is visible to the second group before it is accessed. Adding T.syncthreads() before the second conditional block will resolve this issue and likely fix the data mismatch mentioned in the PR description.
- if T.get_thread_binding() < 64:
+ T.syncthreads()
+ if T.get_thread_binding() < 64:
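The effect of the barrier can be sketched with a CPU-side analogy using Python's `threading.Barrier`. This is not the kernel from this PR and does not use TileLang: `run_block`, `shared`, and `out` are hypothetical names, and the one-to-one mapping between writer and reader threads is an assumption made only to keep the example small.

```python
import threading

NUM_THREADS = 128  # thread count after this change
THRESHOLD = 64     # threads below this index take the first branch


def run_block(mixes):
    """Simulate one thread block: threads 0..63 write, threads 64..127 read."""
    shared = [None] * THRESHOLD               # stands in for mixes_shared
    out = [None] * THRESHOLD
    barrier = threading.Barrier(NUM_THREADS)  # stands in for a block-wide sync

    def worker(tid):
        if tid < THRESHOLD:
            shared[tid] = mixes[tid]          # writer group fills shared memory
        # Every thread waits here, so all writes above are complete
        # before any thread continues into the read below.
        barrier.wait()
        if tid >= THRESHOLD:
            out[tid - THRESHOLD] = shared[tid - THRESHOLD]  # reader group

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


mixes = [float(i) for i in range(THRESHOLD)]
result = run_block(mixes)
```

Note that the barrier sits outside both conditionals, so all 128 threads reach it. This mirrors the suggestion above, which places the synchronization call before the `if` rather than inside one branch. Without the `barrier.wait()` line, a reader could observe `None` in `shared`, which is the analogue of the stale read described in the review.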
Fix the 20% data mismatch bug in mhc_pre_big_fuse.py on MACA.