ahmed-bsod/gemm a8w8 gluon #1684
base: main
fa00faf to ce1e6fa
Pull request overview
This PR adds gluon kernel implementations for GEMM_A8W8 operations, including both a standard version and a preshuffled weight version. The implementation supports int8 and FP8 (e4m3/e5m2) input types with various output types (bf16, fp16, fp32, int32).
- Added two new gluon kernel variants, `gemm_a8w8` and `gemm_a8w8_preshuffle`, for the AMD CDNA4 architecture
- Extended test coverage to validate all three implementations (triton, gluon, gluon_shuffle) across multiple data types and configurations
- Updated the benchmark suite to support the gluon implementations, with command-line flags for easy performance comparison
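For readers unfamiliar with the operation, the following is a minimal pure-Python sketch of what an A8W8 GEMM typically computes: 8-bit inputs, integer accumulation, then dequantization to a wider output type. The function name `gemm_a8w8_reference`, the N x K weight layout, and the per-row and per-column scales are assumptions made for illustration; none of them is taken from this PR's kernel code.

```python
def gemm_a8w8_reference(x, w, x_scale, w_scale):
    """Hypothetical reference for an int8 x int8 GEMM with dequantization.

    x: M x K nested list of int8 values (activations).
    w: N x K nested list of int8 values (weights; layout is an assumption).
    x_scale: M floats, one per activation row (assumed scaling scheme).
    w_scale: N floats, one per weight row (assumed scaling scheme).
    Returns an M x N nested list of floats.
    """
    k = len(x[0])
    for row in x + w:
        if len(row) != k:
            raise ValueError("all rows of x and w must share the same K")
        if any(not -128 <= v <= 127 for v in row):
            raise ValueError("inputs must fit in int8")

    out = []
    for m, x_row in enumerate(x):
        out_row = []
        for n, w_row in enumerate(w):
            # Accumulate in integers first, mirroring an int32 accumulator.
            acc = sum(a * b for a, b in zip(x_row, w_row))
            # Dequantize with the assumed per-row / per-column scales.
            out_row.append(acc * x_scale[m] * w_scale[n])
        out.append(out_row)
    return out


y = gemm_a8w8_reference(
    x=[[1, 2], [3, 4]],
    w=[[5, 6], [7, 8]],
    x_scale=[0.5, 1.0],
    w_scale=[1.0, 2.0],
)
```

A reference of this kind is only useful for checking small shapes by hand; it says nothing about how the gluon kernels tile, shuffle, or schedule the computation.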
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `aiter/ops/triton/gluon/gemm_a8w8.py` | New file implementing gluon-based GEMM_A8W8 kernels with standard and preshuffled weight variants |
| `op_tests/triton_tests/gemm/basic/test_gemm_a8w8.py` | Extended test suite to parametrize over implementation types and added support for int32 output dtype |
| `op_tests/op_benchmarks/triton/bench_gemm_a8w8.py` | Added command-line arguments for gluon and shuffle flags to enable performance benchmarking |
| `aiter/ops/triton/configs/gemm/gluon/gfx950-GEMM-A8W8.json` | Configuration file with tuned kernel parameters for gfx950 architecture |
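The benchmark change above can be pictured with a small `argparse` sketch that maps two boolean flags to one of the three implementation names used in the tests (triton, gluon, gluon_shuffle). The flag spellings `--gluon` and `--shuffle`, the helper names, and the rule that `--shuffle` requires `--gluon` are all assumptions for illustration; the actual arguments in `bench_gemm_a8w8.py` may differ.

```python
import argparse


def build_parser():
    # Hypothetical flags; the real benchmark script may name them differently.
    parser = argparse.ArgumentParser(description="GEMM A8W8 benchmark (sketch)")
    parser.add_argument("--gluon", action="store_true",
                        help="benchmark the gluon kernel instead of triton")
    parser.add_argument("--shuffle", action="store_true",
                        help="use the preshuffled-weight gluon kernel")
    return parser


def select_impl(args):
    """Map parsed flags to an implementation name (assumed mapping)."""
    if args.shuffle and not args.gluon:
        # Assumption: the preshuffled variant exists only for gluon.
        raise ValueError("--shuffle requires --gluon in this sketch")
    if args.gluon:
        return "gluon_shuffle" if args.shuffle else "gluon"
    return "triton"


parser = build_parser()
default_impl = select_impl(parser.parse_args([]))
gluon_impl = select_impl(parser.parse_args(["--gluon"]))
shuffle_impl = select_impl(parser.parse_args(["--gluon", "--shuffle"]))
```

Keeping the flag-to-name mapping in one function lets a benchmark and a parametrized test share the same implementation names.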
af5de67 to acc815a
d069611 to 52fcd9c
52fcd9c to ea4a722
Motivation
Added gluon kernels for GEMM_A8W8, in both a standard and a preshuffled-weight variant.
Test Plan
Ran the `test_gemm_a8w8.py` script to confirm that all functional tests pass.
Test Result
Tests passed 🔥