A4w4_asm_pro_max_v2 #741
Pull Request Overview
This PR adds two new tile shapes (192x256 and 224x256) to the A4W4 GEMM assembly kernels and improves the tuning infrastructure. The extra tile configurations give the tuner more candidates to choose from, and the tuning process now picks up kernels dynamically rather than from a fixed list.
- Added two new tile shapes (192x256 and 224x256) to the assembly kernel configuration
- Enhanced condition checks for split-K operations to ensure proper validation
- Improved tuning infrastructure to dynamically use all available kernels instead of hardcoded tiles
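
To illustrate the last two points, here is a minimal sketch of how a tuner could enumerate kernels from a CSV instead of a hardcoded tile list, then filter them with a split-K check. It is not the code from this PR. The column names (`tile_m`, `tile_n`, `split_k`, `kernel_name`), the sample rows, the `k_align` value, and the divisibility rule are all assumptions made for illustration; the real `f4gemm_bf16_per1x32Fp4.csv` layout and the validation in `gemm_a4w4_blockscale_tune.py` may differ.

```python
import csv
import io

# Hypothetical CSV layout and rows; the real kernel CSV may use other columns.
SAMPLE_CSV = """tile_m,tile_n,split_k,kernel_name
192,256,0,example_kernel_192x256
224,256,1,example_kernel_224x256
256,256,1,example_kernel_256x256
"""


def load_kernels(csv_text):
    """Parse every kernel row so new tile shapes are picked up automatically."""
    kernels = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        kernels.append({
            "tile_m": int(row["tile_m"]),
            "tile_n": int(row["tile_n"]),
            "split_k": bool(int(row["split_k"])),
            "kernel_name": row["kernel_name"],
        })
    return kernels


def is_valid_split(kernel, k, num_splits, k_align=256):
    """Assumed rule: splitting K requires kernel support and aligned chunks."""
    if num_splits == 1:
        return True  # no split requested, every kernel qualifies
    if not kernel["split_k"]:
        return False  # kernel was not built with split-K support
    return k % (num_splits * k_align) == 0


def candidate_kernels(kernels, k, num_splits):
    """Return the names of all kernels that pass the split-K check."""
    return [kn["kernel_name"] for kn in kernels
            if is_valid_split(kn, k, num_splits)]


kernels = load_kernels(SAMPLE_CSV)
print(candidate_kernels(kernels, k=4096, num_splits=2))
```

Because the candidate list is derived from the CSV rows, adding a tile shape to the file is enough for the tuner to consider it; no tile list in the tuning script needs to change.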
Reviewed Changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| hsa/gfx950/f4gemm/f4gemm_bf16_per1x32Fp4.csv | Adds kernel configurations for new 192x256 and 224x256 tile shapes |
| op_tests/test_gemm_a4w4.py | Improves split-K validation and adds comprehensive test cases for tuning |
| csrc/ck_gemm_a4w4_blockscale/gemm_a4w4_blockscale_tune.py | Enhances tuning to dynamically use all available kernels and fixes split-K validation |
| aiter/ops/gemm_op_a4w4.py | Updates type annotations for better consistency |
| aiter/configs/a4w4_blockscale_untuned_gemm_test.csv | Adds new test configuration file with comprehensive test cases |
| aiter/configs/a4w4_blockscale_tuned_gemm_test.csv | Adds new empty tuned configuration file |
Add new tile shapes 224x256 and 192x256