Drain cp.async jobs after main loop#2008

Merged
jacobhinkle merged 2 commits into main from drain_cp_async_jobs on Mar 28, 2024

Conversation

@jacobhinkle
Collaborator

This places a cp.async.wait_group 0 instruction immediately after any circular buffer main loop, which is the approach CUTLASS takes when pipelining GEMMs (see mma_multistage.h#L664-L665). The previous fix for #2000 (#2001) is reverted.

This is an alternative to #2005.

Fixes #2000
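
A minimal sketch of the idea, not the code nvFuser generates: a hand-written circular-buffered loop that drains all outstanding asynchronous copies before the epilogue reuses shared memory. The kernel name, the sizes, and the toy reduction are hypothetical. It uses the `__pipeline_*` primitives from `<cuda_pipeline.h>`, where `__pipeline_wait_prior(0)` should lower to `cp.async.wait_group 0` on sm_80 and newer. In this simplified loop the tail commit groups are empty, so the drain mainly marks where the instruction belongs; it matters when a prefetch issued in the last iterations can still be in flight.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cuda_pipeline.h>

constexpr int kStages = 3;     // circular buffer depth (illustrative)
constexpr int kThreads = 128;  // each thread copies one float per tile
constexpr int kTiles = 8;      // number of main-loop iterations

__global__ void pipelined_sum(const float* in, float* out) {
  __shared__ float smem[kStages][kThreads];
  const int tid = threadIdx.x;

  // Prologue: start loading the first kStages-1 tiles.
  for (int s = 0; s < kStages - 1; ++s) {
    if (s < kTiles) {
      __pipeline_memcpy_async(&smem[s][tid], &in[s * kThreads + tid], sizeof(float));
    }
    __pipeline_commit();
  }

  float acc = 0.0f;
  for (int k = 0; k < kTiles; ++k) {
    // Prefetch the tile kStages-1 iterations ahead into the slot freed last iteration.
    const int next = k + kStages - 1;
    if (next < kTiles) {
      __pipeline_memcpy_async(&smem[next % kStages][tid],
                              &in[next * kThreads + tid], sizeof(float));
    }
    __pipeline_commit();
    // Allow kStages-1 groups to stay pending; tile k is then complete.
    __pipeline_wait_prior(kStages - 1);
    __syncthreads();
    acc += smem[k % kStages][(tid + 1) % kThreads];  // read a neighbor's element
    __syncthreads();  // slot k % kStages may be overwritten next iteration
  }

  // Drain: wait for every outstanding async copy before smem is reused.
  __pipeline_wait_prior(0);
  __syncthreads();

  // Epilogue reuses the circular buffer's shared memory.
  float* epilogue = &smem[0][0];
  epilogue[tid] = acc;
  __syncthreads();
  out[tid] = epilogue[kThreads - 1 - tid];
}

int main() {
  std::vector<float> h_in(kTiles * kThreads), h_out(kThreads);
  for (size_t i = 0; i < h_in.size(); ++i) h_in[i] = static_cast<float>(i % 7);

  float *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, h_in.size() * sizeof(float));
  cudaMalloc(&d_out, h_out.size() * sizeof(float));
  cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float), cudaMemcpyHostToDevice);

  pipelined_sum<<<1, kThreads>>>(d_in, d_out);
  if (cudaDeviceSynchronize() != cudaSuccess) {
    std::printf("kernel failed\n");
    return 1;
  }
  cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float), cudaMemcpyDeviceToHost);

  // Host reference: out[t] holds the sum accumulated by thread kThreads-1-t.
  bool ok = true;
  for (int t = 0; t < kThreads; ++t) {
    const int src = kThreads - 1 - t;
    float expected = 0.0f;
    for (int k = 0; k < kTiles; ++k) expected += h_in[k * kThreads + (src + 1) % kThreads];
    ok = ok && (h_out[t] == expected);
  }
  std::printf(ok ? "ok\n" : "mismatch\n");
  cudaFree(d_in);
  cudaFree(d_out);
  return ok ? 0 : 1;
}
```

Placing the wait once after the loop keeps the steady-state iterations unchanged, so the extra synchronization is paid only once per kernel.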

@jacobhinkle jacobhinkle requested a review from zasdfgbnm March 27, 2024 19:09
@jacobhinkle
Collaborator Author

!build --diff-bench

@jacobhinkle
Collaborator Author

Fixed the binary test. The Python test failure is in complex div correctness and is unrelated. Code diffs look normal. I'm not sure why codediff 1/5 failed to upload, but the artifact has a normal-looking HTML file. I'll merge once clang-build passes.

@jacobhinkle jacobhinkle merged commit 8ad56c6 into main Mar 28, 2024
@jacobhinkle jacobhinkle deleted the drain_cp_async_jobs branch March 28, 2024 00:51

Development

Successfully merging this pull request may close these issues.

Matmul circular buffering interferes with epilogue smem reuse

2 participants