Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 18, 2025

The accumulator dtype logic incorrectly promoted fp16/bf16 to int32 instead of float32, causing large RMSE errors in GEMM outputs. The flawed pattern acc_dtype = tl.int32 if C.type.element_ty != tl.int8 else tl.float32 (note the inverted logic) only handled int8 vs everything else.

Changes

  • New helper function iris.get_accumulator_dtype() in iris/util.py with proper promotion rules:

    • int8/int16/int32 → int32 (prevent overflow)
    • fp16/bf16 → fp32 (prevent precision loss)
    • fp32 → fp32, fp64 → fp64
  • Replaced 19 instances across iris/ccl/ and examples/ with calls to the helper

  • Unit tests in tests/unittests/test_get_accumulator_dtype.py

Usage

@triton.jit
def kernel(C, ...):
    # Before: acc_dtype = tl.float32 if C.type.element_ty != tl.int8 else tl.int32
    # After:
    acc_dtype = iris.get_accumulator_dtype(C.type.element_ty)
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=acc_dtype)
Original prompt

This section details on the original issue you should resolve

<issue_title>Error in promotion of data type for accumulation</issue_title>
<issue_description>This piece of code in the iris/x/ kernels leads to error for many datatypes. Eg: fp16 get's accumulated into int32 leading to large RMSE errors in the output.

# Determine accumulator dtype based on output type
acc_dtype = tl.int32 if C.type.element_ty != tl.int8 else tl.float32

Eg. used here:
https://github.com/ROCm/iris/blob/1a09fae0572c2e2484f5abb0c09214a02ed4500b/iris/x/all_gather_gemm.py#L119C1-L119C73</issue_description>

<agent_instructions>Introduce a helper promote accumulator datatype helper function that can be consistently used across examples. </agent_instructions>

Comments on the Issue (you are @copilot in this section)

@mawad-amd CoPilot suggestion is also incomplete here https://github.com//pull/296/changes#r2604284884

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits December 18, 2025 20:13
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Copilot AI changed the title [WIP] Add helper function to promote accumulator data type Add get_accumulator_dtype helper for proper dtype promotion in GEMM kernels Dec 18, 2025
Copilot AI requested a review from mawad-amd December 18, 2025 20:19
Copy link
Collaborator

@mawad-amd mawad-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me.

@mawad-amd mawad-amd marked this pull request as ready for review December 18, 2025 20:57
@mawad-amd mawad-amd requested a review from BKP as a code owner December 18, 2025 20:57
Copy link

@aamarnat aamarnat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@mawad-amd
Copy link
Collaborator

Actually, ignore this PR for a second. I just noticed we need to rebase #296

Base automatically changed from muhosama/ccl-more to main December 19, 2025 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants