[MetaxGPU][test] adjust warp size in mhc_pre_big_fuse #4

Open
yesuweiYYYY wants to merge 1 commit into MetaX-MACA:dev from yesuweiYYYY:dev_pr2

Conversation

@yesuweiYYYY

Fix the 20% data-mismatch bug in mhc_pre_big_fuse.py on MACA.

@gemini-code-assist (bot) left a comment

Code Review

This pull request increases the thread count from 96 to 128 and updates the thread binding threshold for shared memory operations from 32 to 64. A critical race condition was identified where threads reading from shared memory might do so before the writing threads have finished, necessitating the addition of a synchronization barrier.
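The relationship between the two constants can be sketched in plain Python (this is not TileLang code). It assumes the MACA target uses 64-lane warps; the PR title implies a warp-size adjustment, but the page does not state the value. The helper names `warps_split_by_threshold` and `partial_warp_lanes` are hypothetical.

```python
def warps_split_by_threshold(num_threads, threshold, warp_size):
    """Return indices of warps whose lanes fall on both sides of `tid < threshold`."""
    split = []
    num_warps = (num_threads + warp_size - 1) // warp_size
    for w in range(num_warps):
        first = w * warp_size
        last = min(first + warp_size, num_threads) - 1
        # The warp is split if the threshold lands strictly inside its lane range.
        if first < threshold <= last:
            split.append(w)
    return split


def partial_warp_lanes(num_threads, warp_size):
    """Number of active lanes in a final, partially filled warp (0 if none)."""
    return num_threads % warp_size


# Assumed warp size of 64 (not stated on this page).
print(warps_split_by_threshold(96, 32, 64))   # old values: warp 0 is split
print(warps_split_by_threshold(128, 64, 64))  # new values: no warp is split
print(partial_warp_lanes(96, 64), partial_warp_lanes(128, 64))
```

Under that assumption, the old values (96 threads, threshold 32) leave a half-filled last warp and a branch that divides warp 0, while the new values (128 threads, threshold 64) align both with warp boundaries.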

T.copy(mixes, mixes_shared, disable_tma=True)

- if T.get_thread_binding() < 32:
+ if T.get_thread_binding() < 64:
critical

There is a race condition between the threads writing to mixes_shared (threads 0-63) and the threads reading from it (threads 64-127 in the else block). Since these two groups belong to different warps (assuming a warp size of 32 or 64), a synchronization barrier is required to ensure that the data written to shared memory by the first group is visible to the second group before it is accessed. Adding T.syncthreads() before the second conditional block will resolve this issue and likely fix the data mismatch mentioned in the PR description.

Suggested change
- if T.get_thread_binding() < 64:
+ T.syncthreads()
+ if T.get_thread_binding() < 64:
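The write-then-barrier-then-read pattern described in this comment can be illustrated with a CPU-side analogy. This is a minimal sketch using Python's `threading.Barrier` as a stand-in for a block-wide sync; it is not TileLang or GPU code, the kernel body is not shown on this page, and `run_block`, `shared`, and `seen` are hypothetical names.

```python
import threading

NUM_THREADS = 128  # block size after this PR
SPLIT = 64         # thread-binding threshold after this PR


def run_block(num_threads=NUM_THREADS, split=SPLIT):
    shared = [None] * split        # stands in for mixes_shared
    seen = [None] * num_threads    # what each reader thread observed
    barrier = threading.Barrier(num_threads)  # stands in for a block-wide sync

    def worker(tid):
        if tid < split:
            shared[tid] = tid * 2  # writer group fills its slot
        # Every thread reaches the barrier, so it sits outside the branch.
        barrier.wait()
        if tid >= split:
            seen[tid] = shared[tid - split]  # reader group consumes the data

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen[split:]


if __name__ == "__main__":
    values = run_block()
    print(None in values)  # False: every reader saw a completed write
```

The barrier is placed before the conditional rather than inside one branch because all threads in the block must reach it; on GPUs, a block-wide barrier executed by only some threads of a block is undefined behavior.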
