@archana-ramalingam archana-ramalingam commented Jan 6, 2026

Motivation

CMS for 256x192x32 TN TF32

Technical Details

Problem size: [4096, 3072, 1, 8192]

Tensile:

  • Non CMS: 559.367 us
  • CMS: 466.562 us
  • Speedup: 19.89%

hipblaslt-bench:

  • Non CMS (MT256x192x32_MI16x16x1): [ 643.908 us, 652.948 us, 653.238 us ]
  • CMS: 577.128 us
  • Custom MT256x256x32 (Winner): 549.898 us

Speedup:

  • vs Non CMS kernels: 11.57% to 13.18% improvement
  • vs Custom MT256x256x32 kernel: 4.9% regression
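
For reference, the percentages above can be reproduced from the raw timings with a short script. This is a minimal sketch, assuming "speedup" means baseline time divided by new time, minus one, and that the regression figure is the CMS time relative to the custom kernel's time; the function names `speedup_pct` and `slowdown_pct` are illustrative and not part of Tensile or hipblaslt-bench. Under those assumptions the results agree with the reported figures to within rounding in the last digit.

```python
def speedup_pct(baseline_us, new_us):
    """Percent speedup of new over baseline, as (baseline / new - 1) * 100 (assumed definition)."""
    return (baseline_us / new_us - 1.0) * 100.0


def slowdown_pct(reference_us, new_us):
    """Percent by which new is slower than reference, as (new / reference - 1) * 100 (assumed definition)."""
    return (new_us / reference_us - 1.0) * 100.0


# Timings in microseconds, copied from the PR description.
tensile_non_cms_us = 559.367
tensile_cms_us = 466.562
bench_non_cms_us = [643.908, 652.948, 653.238]
bench_cms_us = 577.128
bench_custom_us = 549.898

tensile_speedup = speedup_pct(tensile_non_cms_us, tensile_cms_us)
bench_speedups = [speedup_pct(t, bench_cms_us) for t in bench_non_cms_us]
custom_regression = slowdown_pct(bench_custom_us, bench_cms_us)

print(f"Tensile: CMS vs Non CMS: {tensile_speedup:.2f}%")
print(f"hipblaslt-bench: CMS vs Non CMS: {min(bench_speedups):.2f}% to {max(bench_speedups):.2f}%")
print(f"hipblaslt-bench: CMS vs Custom MT256x256x32: {custom_regression:.2f}% slower")
```

Note that the two definitions are not symmetric: measuring the regression against the CMS time instead of the custom kernel's time would give a slightly smaller figure.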

Test Plan

Test Result

Submission Checklist

@talumbau talumbau left a comment

LGTM 👍 I did notice this in the trace:

[image: trace screenshot]

but it looks like these stalls aren't too bad overall. Really nice improvement!
