@zufayu zufayu commented Jul 29, 2025

Add new tile shapes 224x256 and 192x256.

Copilot AI review requested due to automatic review settings July 29, 2025 09:56
Copilot AI left a comment

Pull Request Overview

This PR adds two new tile shapes (192x256 and 224x256) to the A4W4 GEMM assembly kernels and updates the tuning infrastructure. The tuner gains more tile configurations to choose from and now considers all available kernels rather than a hardcoded list of tiles.

  • Added two new tile shapes (192x256 and 224x256) to the assembly kernel configuration
  • Enhanced condition checks for split-K operations to ensure proper validation
  • Improved tuning infrastructure to dynamically use all available kernels instead of hardcoded tiles
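
The last two points can be illustrated with a minimal sketch. It is not the code from this PR: the CSV column names (`tile_m`, `tile_n`, `splitK`, `knl_name`), the kernel names, the `k_align` value, and the divisibility rule for split-K are all assumptions made for illustration. The idea is that the tuner reads every kernel row from the configuration file instead of keeping a hardcoded tile list, and filters out kernels that cannot serve a given split-K request.

```python
import csv
import io

# Hypothetical CSV layout; the real f4gemm_bf16_per1x32Fp4.csv columns may differ.
SAMPLE_CSV = """tile_m,tile_n,splitK,knl_name
128,256,0,kernel_128x256
192,256,1,kernel_192x256
224,256,1,kernel_224x256
"""


def load_kernels(csv_text):
    """Return one dict per kernel row, with tile sizes parsed as ints."""
    kernels = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        kernels.append({
            "tile_m": int(row["tile_m"]),
            "tile_n": int(row["tile_n"]),
            "split_k_ok": row["splitK"] == "1",
            "name": row["knl_name"],
        })
    return kernels


def candidates(kernels, k, split_k, k_align=256):
    """List the kernel names usable for a given K and split-K factor.

    split_k == 1 means no split. Assumed rule (not taken from the PR):
    when splitting, the kernel must support split-K and every K-slice
    must be a whole multiple of k_align.
    """
    usable = []
    for kern in kernels:
        if split_k > 1:
            if not kern["split_k_ok"]:
                continue
            if k % (split_k * k_align) != 0:
                continue
        usable.append(kern["name"])
    return usable


kernels = load_kernels(SAMPLE_CSV)
no_split = candidates(kernels, k=4096, split_k=1)
with_split = candidates(kernels, k=4096, split_k=2)
```

Because the candidate list is derived from the file, adding a row for a new tile shape makes it available to the tuner without touching the tuning script.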

Reviewed Changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| hsa/gfx950/f4gemm/f4gemm_bf16_per1x32Fp4.csv | Adds kernel configurations for new 192x256 and 224x256 tile shapes |
| op_tests/test_gemm_a4w4.py | Improves split-K validation and adds comprehensive test cases for tuning |
| csrc/ck_gemm_a4w4_blockscale/gemm_a4w4_blockscale_tune.py | Enhances tuning to dynamically use all available kernels and fixes split-K validation |
| aiter/ops/gemm_op_a4w4.py | Updates type annotations for better consistency |
| aiter/configs/a4w4_blockscale_untuned_gemm_test.csv | Adds new test configuration file with comprehensive test cases |
| aiter/configs/a4w4_blockscale_tuned_gemm_test.csv | Adds new empty tuned configuration file |

Comments suppressed due to low confidence (1)

@valarLip valarLip merged commit c8557a0 into main Jul 31, 2025
13 checks passed
@valarLip valarLip deleted the a4w4_asm_pro_max_v2 branch July 31, 2025 08:43
5 participants