@yzhou103

Motivation

Update the a8w8 bpreshuffle ASM code and add it to the tuning flow.

Technical Details

  1. Add codegen for the ASM a8w8 bpreshuffle int8 kernels.
  2. Add the ASM a8w8 bpreshuffle int8 kernels to gemm_a8w8_bpreshuffle_tune.py.
  3. Refactor gemm_a8w8_bpreshuffle_tune to support int8 tuning, and add q_dtype_w to distinguish quantization methods.
  4. Update the tuned shapes for ASM a8w8 bpreshuffle int8.
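
As a rough illustration of item 3, a tuner could use a q_dtype_w column to restrict a run to one quantization method. This is a minimal sketch, not the code from this PR: the M/N/K column names, the dtype spellings, and the `load_shapes` helper are all assumptions made for the example.

```python
import csv
import io

# Hypothetical contents of an untuned-shapes CSV. The M/N/K column names and
# the dtype spellings ("fp8", "int8") are assumptions for illustration only.
UNTUNED_CSV = """M,N,K,q_dtype_w
1,1280,8192,fp8
32,1280,8192,int8
64,8192,1024,int8
"""


def load_shapes(text, q_dtype_w=None):
    """Return (M, N, K, q_dtype_w) tuples, optionally filtered by weight dtype."""
    shapes = []
    for row in csv.DictReader(io.StringIO(text)):
        # Skip rows for other quantization methods when a filter is given.
        if q_dtype_w is not None and row["q_dtype_w"] != q_dtype_w:
            continue
        shapes.append((int(row["M"]), int(row["N"]), int(row["K"]), row["q_dtype_w"]))
    return shapes


int8_shapes = load_shapes(UNTUNED_CSV, q_dtype_w="int8")
all_shapes = load_shapes(UNTUNED_CSV)
```

Keeping the dtype as a column rather than as a separate file means one shape list can drive both fp8 and int8 tuning runs.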

Test Plan

python op_tests/test_gemm_a8w8.py
aiter/csrc/ck_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings October 11, 2025 03:32

Copilot AI left a comment

Pull Request Overview

This PR adds ASM A8W8 bpreshuffle int8 codegen and integrates it into the tuning system. The changes extend the existing A8W8 bpreshuffle GEMM implementation to support int8 quantization alongside fp8, introducing new ASM kernels and updating the tuning framework to handle multiple quantization data types.

  • Adds ASM int8 kernel configuration and codegen for A8W8 bpreshuffle GEMM
  • Refactors tuning framework to support both fp8 and int8 quantization methods via q_dtype_w parameter
  • Updates kernel selection logic and API signatures to support new ASM kernels
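
The kernel selection mentioned in the last bullet can be pictured as a lookup keyed by shape and weight dtype, with a fallback when a shape has not been tuned. The sketch below is only an illustration of that idea: the table layout, kernel names, and timings are hypothetical and are not taken from this PR or from asm_gemm_a8w8.cu.

```python
# Hypothetical tuned-results table keyed by (M, N, K, q_dtype_w). Kernel names
# and timings (microseconds) are invented for the example, not measured.
TUNED = {
    (32, 1280, 8192, "int8"): [("asm_i8_tile_a", 41.0), ("ck_i8_default", 55.5)],
    (32, 1280, 8192, "fp8"): [("ck_fp8_default", 48.2)],
}

# Hypothetical per-dtype fallback used when a shape has no tuned entry.
DEFAULT_KERNEL = {"int8": "ck_i8_default", "fp8": "ck_fp8_default"}


def select_kernel(m, n, k, q_dtype_w):
    """Pick the fastest tuned kernel for a shape, else a per-dtype default."""
    candidates = TUNED.get((m, n, k, q_dtype_w))
    if not candidates:
        return DEFAULT_KERNEL[q_dtype_w]
    # Each candidate is (kernel_name, time_us); choose the lowest time.
    name, _time_us = min(candidates, key=lambda c: c[1])
    return name
```

For example, `select_kernel(32, 1280, 8192, "int8")` picks the faster of the two candidates, while an untuned shape falls back to the default for its dtype.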

Reviewed Changes

Copilot reviewed 14 out of 17 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| hsa/gfx942/i8gemm/i8gemm_bf16_perTokenI8.csv | New kernel configuration for int8 ASM kernels |
| hsa/gfx942/i8gemm/codegen.py | Code generator for ASM i8gemm kernel configurations |
| csrc/py_itfs_cu/asm_gemm_a8w8.cu | Major refactoring of ASM GEMM interface with kernel selection logic |
| csrc/include/rocm_ops.hpp | Updated Python binding parameters for new ASM interface |
| csrc/include/asm_gemm_a8w8.h | Updated function signature for new parameters |
| csrc/ck_gemm_a8w8_bpreshuffle/gen_instances.py | Added filtering for int8 dtype in tuning |
| csrc/ck_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py | Major refactoring to support both fp8 and int8 tuning |
| csrc/ck_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.cu | Updated to support BFloat16 output |
| csrc/ck_gemm_a8w8_bpreshuffle/README.md | Documentation updates for new q_dtype_w parameter |
| aiter/utility/base_tuner.py | Base tuner improvements for result handling |
| aiter/ops/gemm_op_a8w8.py | Updated GEMM operations to use new configuration system |
| aiter/jit/optCompilerConfig.json | Added blob generation command for i8gemm |
| aiter/configs/asm_a8w8_gemm.csv | Updated ASM kernel configurations |
| aiter/configs/a8w8_bpreshuffle_untuned_gemm.csv | Added q_dtype_w column and int8 test cases |
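
A CSV-driven code generator of the kind listed above typically turns each configuration row into one entry of a generated source table. The following is a hedged sketch of that pattern only: the column names, the `KernelConfig` struct name, and the `generate_table` helper are hypothetical and do not reproduce the schema of i8gemm_bf16_perTokenI8.csv or the output of codegen.py.

```python
import csv
import io

# Hypothetical kernel-config rows. The column names are illustrative and are
# not the real schema of the CSV added in this PR.
CONFIG_CSV = """knl_name,tile_m,tile_n,co_name
i8gemm_example_a,64,128,example_a.co
i8gemm_example_b,128,256,example_b.co
"""


def generate_table(text, table_name="kernel_configs"):
    """Render CSV rows as the source text of a C++ initializer list."""
    lines = [f"static const KernelConfig {table_name}[] = {{"]
    for row in csv.DictReader(io.StringIO(text)):
        # One brace-initialized entry per CSV row: name, code object, tile sizes.
        lines.append(
            f'    {{"{row["knl_name"]}", "{row["co_name"]}", '
            f'{int(row["tile_m"])}, {int(row["tile_n"])}}},'
        )
    lines.append("};")
    return "\n".join(lines)


generated = generate_table(CONFIG_CSV)
```

Generating the table from a CSV keeps the kernel list editable as data, so adding a kernel does not require hand-editing the C++ side.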

@yzhou103 yzhou103 changed the title from "A8w8 asm codegen and tune" to "[Mi35x] A8w8 asm codegen and tune" on Oct 13, 2025
@yzhou103 yzhou103 changed the title from "[Mi35x] A8w8 asm codegen and tune" to "A8w8 asm codegen and tune" on Oct 16, 2025
@yzhou103 yzhou103 requested review from valarLip and zufayu October 20, 2025 10:06
@valarLip valarLip self-assigned this Oct 21, 2025
@valarLip valarLip merged commit 564527f into main Oct 23, 2025
16 checks passed
@valarLip valarLip deleted the a8w8_asm_codegen branch October 23, 2025 02:08

4 participants