Replace QH16 bf16 kernel with a new one that does not use ptr_RP #2999
JohnNikolay84 wants to merge 3 commits into main from
Conversation
Force-pushed from ce19134 to 46d6983. Co-authored-by: Cursor <cursoragent@cursor.com>
Motivation
#2729 introduced a new QH64 kernel that does not write directly to ptr_RP; instead, it writes the split data into ptr_R/logits.
As #2983 points out, other kernels such as MLA_A16W16_1TG_4W_32mx1_16nx1_Coex0_Msk1_QH16.co do not follow the same convention and instead write into a null pointer.
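For orientation, a minimal sketch of the two output conventions. The shapes and buffer layout below are illustrative assumptions, not the kernels' actual layouts:

```python
import torch

# Illustrative shapes only (assumptions, not the real kernel parameters).
num_splits, batch, nhead, v_head_dim = 4, 2, 16, 512

# Packed convention: a single workspace buffer (ptr_RP) carries both the
# per-split partial outputs and the per-split logits/LSE values.
ptr_RP = torch.empty(num_splits, batch, nhead, v_head_dim + 1, dtype=torch.float32)

# Split convention used by the QH64 kernel from #2729: partial outputs go to
# ptr_R and the per-split logits go to a separate buffer, which the combine
# step then reduces across splits.
ptr_R = torch.empty(num_splits, batch, nhead, v_head_dim, dtype=torch.float32)
logits = torch.empty(num_splits, batch, nhead, dtype=torch.float32)
```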
Technical Details
This change introduces a new kernel for nhead=32 bf16 that follows the same convention as the QH64 kernel. However, I have not been able to find a kernel with a 32x32x16 MFMA layout, so I am using the 16x16x32 one instead.
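Conceptually, the new kernel slots into the bf16 decode path alongside the QH64 one. The dispatch sketch below is an assumption for illustration: the table, keys, placeholder kernel names, and helper are not aiter's actual API.

```python
import torch

# Hypothetical dispatch sketch (illustrative assumptions, not aiter's real lookup).
MLA_DECODE_BF16_KERNELS = {
    # nhead=64: QH64 kernel from #2729, writes split data into ptr_R/logits.
    64: "<QH64 kernel>.co",
    # nhead=32: kernel added by this PR, same ptr_R/logits convention, built on
    # the 16x16x32 MFMA layout because no 32x32x16 variant was found.
    32: "<new kernel replacing MLA_A16W16_1TG_4W_32mx1_16nx1_Coex0_Msk1_QH16.co>",
}

def pick_mla_decode_kernel(nhead: int, dtype: torch.dtype) -> str:
    """Return the kernel object name for the given head count and dtype."""
    if dtype is not torch.bfloat16 or nhead not in MLA_DECODE_BF16_KERNELS:
        raise ValueError(f"no MLA decode kernel for nhead={nhead}, dtype={dtype}")
    return MLA_DECODE_BF16_KERNELS[nhead]
```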
Test Plan
Run a new test in aiter and make sure it passes the torch reference (see the sketch after this list).
Run DeepSeek in TP4 and make sure it does not crash.
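A minimal sketch of the first step, assuming a hypothetical Python entry point mla_decode_bf16 for the new kernel; the real aiter test uses its own harness, shapes, and tolerances.

```python
import torch

# Hypothetical test sketch: `mla_decode_bf16` stands in for whatever entry
# point aiter exposes for the new kernel and is an assumption, not the real API.
def torch_reference(q, kv, scale):
    # Plain fp32 softmax(Q @ K^T * scale) @ V as the reference.
    attn = torch.softmax((q.float() @ kv.float().transpose(-1, -2)) * scale, dim=-1)
    return attn @ kv.float()

def check_new_kernel(mla_decode_bf16):
    torch.manual_seed(0)
    q = torch.randn(2, 32, 512, dtype=torch.bfloat16, device="cuda")    # batch, nhead, head_dim
    kv = torch.randn(2, 1024, 512, dtype=torch.bfloat16, device="cuda")  # batch, seqlen, head_dim
    scale = 512 ** -0.5
    out = mla_decode_bf16(q, kv, softmax_scale=scale)
    torch.testing.assert_close(out.float(), torch_reference(q, kv, scale), rtol=2e-2, atol=2e-2)
```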
Test Result
Submission Checklist