@willzhou-amd willzhou-amd commented Jul 28, 2025

Adds fused GEMM + activation + gating options to reduce memory-movement overhead in a feed-forward (FF) block.

Changes:

  • Add an activation string parameter to gemm_a16w16 to fuse an activation function with the GEMM.
  • Add gemm_a16w16_gating.py, which treats the first half of the (M, N) output along the N dimension as the gating activations and the second half as the layer activations, producing a tensor of shape (M, N//2). This kernel also takes an activation parameter for the gating activation.
  • Write tests.
  • Write a fully fused end-to-end FF kernel (to land in a follow-up PR).
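For reference, the unfused semantics of the gating kernel described above can be sketched in NumPy. This is a hypothetical reference implementation, not the Triton kernel itself; the function name mirrors the kernel file name, and the split order (gating half first along N) follows the PR description.

```python
import numpy as np

def gemm_a16w16_gating_reference(x, w, activation="silu"):
    """Unfused reference for fused GEMM + gating.

    x: (M, K) activations, w: (K, N) weights with N even.
    The first N//2 output columns are the gating branch, the
    second N//2 are the layer branch; the result is (M, N//2).
    """
    y = x @ w                       # plain (M, N) GEMM
    n = y.shape[-1] // 2
    gate, up = y[:, :n], y[:, n:]   # split along the N dimension
    if activation == "silu":
        gate = gate * (1.0 / (1.0 + np.exp(-gate)))  # silu(x) = x * sigmoid(x)
    elif activation == "relu":
        gate = np.maximum(gate, 0.0)
    else:
        raise ValueError(f"unsupported activation: {activation}")
    return gate * up                # gated output, shape (M, N//2)
```

The fused kernel computes the same result in one pass, avoiding materializing the full (M, N) intermediate in global memory.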

Benchmarks on 350x: [benchmark image]

Post-tuning: [benchmark image]

Benchmarks on 300x: [benchmark image]

Multikernel-ff is the current baseline (i.e., launching two Aiter Triton GEMMs).

@willzhou-amd willzhou-amd self-assigned this Jul 28, 2025
@rahulbatra85 rahulbatra85 left a comment


Please add logging for the new op

@rahulbatra85 rahulbatra85 merged commit 4822e67 into main Aug 2, 2025
14 checks passed
@rahulbatra85 rahulbatra85 deleted the willz/fused-ff-gemms branch August 2, 2025 01:50
