@ahmed-bsod

Motivation

Added gluon kernels for GEMM_A8W8: a standard variant and a preshuffled-weight variant.

Test Plan

Ran the test_gemm_a8w8.py script to confirm that all functional tests pass.

Test Result

Tests passed 🔥

@ahmed-bsod ahmed-bsod requested review from a team and Copilot December 18, 2025 16:06
@ahmed-bsod ahmed-bsod force-pushed the ahmed-bsod/gemm_a8w8_gluon branch from fa00faf to ce1e6fa Compare December 18, 2025 16:06
@ahmed-bsod ahmed-bsod changed the title from "Ahmed bsod/gemm a8w8 gluon" to "ahmed-bsod/gemm a8w8 gluon" Dec 18, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR adds gluon kernel implementations for GEMM_A8W8 operations, including both a standard version and a preshuffled weight version. The implementation supports int8 and FP8 (e4m3/e5m2) input types with various output types (bf16, fp16, fp32, int32).

  • Added two new gluon kernel variants: gemm_a8w8 and gemm_a8w8_preshuffle for AMD CDNA4 architecture
  • Extended test coverage to validate all three implementations (triton, gluon, gluon_shuffle) across multiple data types and configurations
  • Updated benchmark suite to support gluon implementations with command-line flags for easy performance comparison
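
For readers unfamiliar with the operation, the following is a minimal reference sketch of what an A8W8 GEMM computes: 8-bit inputs, integer accumulation, then rescaling to a floating-point output. It is not taken from this PR. The function name `gemm_a8w8_reference`, the (N, K) weight layout, and the per-row/per-column scale arguments are all assumptions made for illustration, and plain Python lists are used so the sketch runs without dependencies.

```python
def gemm_a8w8_reference(x_q, w_q, x_scale, w_scale):
    """Hypothetical reference for an int8 x int8 GEMM with dequantization.

    x_q:     M x K nested list of ints in [-128, 127] (quantized activations)
    w_q:     N x K nested list of ints in [-128, 127] (quantized weights;
             the (N, K) layout is an assumption, not confirmed by the PR)
    x_scale: M floats, one scale per activation row (assumed)
    w_scale: N floats, one scale per weight row / output column (assumed)
    Returns an M x N nested list of floats.
    """
    out = []
    for x_row, xs in zip(x_q, x_scale):
        out_row = []
        for w_row, ws in zip(w_q, w_scale):
            # Accumulate in integer arithmetic, as an int32 accumulator would.
            acc = sum(a * b for a, b in zip(x_row, w_row))
            # Rescale the integer accumulator to the floating-point output.
            out_row.append(acc * xs * ws)
        out.append(out_row)
    return out


# Small usage example: M=2, K=2, N=3.
x_q = [[1, 2], [3, 4]]
w_q = [[1, 0], [0, 1], [1, 1]]
x_scale = [0.5, 2.0]
w_scale = [1.0, 2.0, 0.5]
result = gemm_a8w8_reference(x_q, w_q, x_scale, w_scale)
```

A reference of this shape is the kind of ground truth a functional test could compare kernel output against, regardless of which implementation (triton, gluon, gluon_shuffle) produced it.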

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| aiter/ops/triton/gluon/gemm_a8w8.py | New file implementing gluon-based GEMM_A8W8 kernels with standard and preshuffled weight variants |
| op_tests/triton_tests/gemm/basic/test_gemm_a8w8.py | Extended test suite to parametrize over implementation types and added support for int32 output dtype |
| op_tests/op_benchmarks/triton/bench_gemm_a8w8.py | Added command-line arguments for gluon and shuffle flags to enable performance benchmarking |
| aiter/ops/triton/configs/gemm/gluon/gfx950-GEMM-A8W8.json | Configuration file with tuned kernel parameters for gfx950 architecture |

@ahmed-bsod ahmed-bsod force-pushed the ahmed-bsod/gemm_a8w8_gluon branch 2 times, most recently from af5de67 to acc815a Compare December 18, 2025 16:43
@ahmed-bsod ahmed-bsod force-pushed the ahmed-bsod/gemm_a8w8_gluon branch 6 times, most recently from d069611 to 52fcd9c Compare December 19, 2025 17:56
@ahmed-bsod ahmed-bsod force-pushed the ahmed-bsod/gemm_a8w8_gluon branch from 52fcd9c to ea4a722 Compare January 2, 2026 17:10