[BugFix] Correct index_map selection for transposed A matrix in MFMA Layout with k_dim==4 and open rocm-ci for gemmsr
#1627
…A when k_dim=4

Previously, when `k_dim=4`, the index_map always used the non-transposed layout for matrix A, regardless of the transposed flag. This caused precision issues for transposed GEMM operations on ROCm. Re-enable the previously skipped test cases for `trans_A=True` with float dtype.
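To illustrate why a layout mismatch of this kind breaks a load, here is a minimal sketch. It assumes a 64-lane wavefront in which each lane holds one element of a 16x4 A tile. The functions `map_16x4` and `map_4x16` are hypothetical stand-ins, not TileLang's actual layout functions. Applying the non-transposed formula to a tile stored transposed (4 rows by 16 columns) no longer assigns exactly one element to each lane.

```python
# Hypothetical lane mappings for one 64-lane wavefront.
# These names and formulas are illustrative assumptions, not TileLang's API.
WAVE_SIZE = 64


def map_16x4(i, j):
    """Tile stored as 16 rows (M) x 4 columns (K): lane = k * 16 + m."""
    return j * 16 + i


def map_4x16(i, j):
    """Tile stored as 4 rows (K) x 16 columns (M): lane = k * 16 + m."""
    return i * 16 + j


def lanes_for_tile(index_map, rows, cols):
    """Lane id chosen by index_map for every (row, col) of the stored tile."""
    return [index_map(i, j) for i in range(rows) for j in range(cols)]


def is_valid_assignment(lanes):
    """A valid assignment hits each of the 64 lanes exactly once."""
    return sorted(lanes) == list(range(WAVE_SIZE))


# A transposed A tile is stored as 4 x 16 (K x M).
correct = lanes_for_tile(map_4x16, 4, 16)
wrong = lanes_for_tile(map_16x4, 4, 16)  # non-transposed formula on a transposed tile

print(is_valid_assignment(correct))  # True
print(is_valid_assignment(wrong))    # False
```

In this toy model the mismatched map even produces lane ids beyond 63. The real symptom described in this PR is a precision mismatch rather than an out-of-range lane.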
📝 Walkthrough

Enables additional GEMM SR test cases (int8 and float variants) and makes MFMA ldmatrix index-map selection transposed-aware for `k_dim == 4`, adjusting layout choices based on the transposed state.
cc @Gongen-Ali
Pull request overview
This PR fixes a bug in the MFMA layout index map selection for transposed A matrices when k_dim==4 (used for float32 types), and re-enables previously disabled ROCm CI tests that were affected by this bug.
Key Changes:
- Adds proper transpose handling for the A matrix when `k_dim==4` in the `get_ldmatrix_index_map` method
- Re-enables 4 test cases for int8 and float32 GEMM operations that were previously disabled due to precision issues on ROCm
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tilelang/intrinsics/mfma_macro_generator.py | Fixes index_map selection to use appropriate layout functions (4x16 vs 16x4) based on transpose flag for A matrix when k_dim==4, mirroring the existing logic for k_dim==16/32/64 |
| testing/python/tilelibrary/test_tilelang_tilelibrary_gemm.py | Re-enables 4 test cases (2 int8, 2 float32) with various transpose combinations that were previously commented out |
Fix the index_map selection so that the MFMA layout is correct for float32 MFMA with a transposed A matrix. The problem was only found for `k_dim == 4`.
Also re-enable the corresponding CI tests that were previously disabled by #1443.
The previously generated code for `gemm_sr(128, 128, 128, True, False, T.float, T.float, T.float32, 128, 128, 32, 2, 128)` is:
The correct code should be: