@CuiCu-618
Contributor

Motivation

Technical Details

  • Fuse the top-K selection (k-select) with softmax so that exp() is evaluated only for the K selected elements rather than for all N
  • Widen vector loads to 32 bytes
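
For readers unfamiliar with the fusion, here is a minimal CPU sketch in Python of why it can work. It is not the kernel from this PR. It assumes the selected weights are renormalized to sum to 1, in which case the softmax over all N followed by renormalization equals a softmax over just the K selected logits. Because exp() is monotonic, the top-K indices can be chosen on the raw logits first. The function names `topk_softmax_reference` and `topk_softmax_fused` are hypothetical and used only for illustration.

```python
import heapq
import math


def topk_softmax_reference(logits, k):
    """Unfused baseline: softmax over all N, pick top-K, renormalize."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # N calls to exp()
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = heapq.nlargest(k, range(len(logits)), key=lambda i: probs[i])
    norm = sum(probs[i] for i in idx)
    return idx, [probs[i] / norm for i in idx]


def topk_softmax_fused(logits, k):
    """Fused sketch (assumes renormalized output): select, then exp() K times."""
    # exp() is monotonic, so ranking raw logits gives the same top-K indices.
    idx = heapq.nlargest(k, range(len(logits)), key=lambda i: logits[i])
    m = logits[idx[0]]  # largest logit, subtracted for numerical stability
    exps = [math.exp(logits[i] - m) for i in idx]  # only K calls to exp()
    total = sum(exps)
    return idx, [e / total for e in exps]
```

If the output is not renormalized, the full softmax denominator is still needed, so this shortcut would not apply as written.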

Test Plan

Test Result

MI300A

N=64, K=5-8: ~1.12x to 1.53x kernel speedup
N=256, K=5-8: ~0.97x to 1.13x kernel speedup
N=512, K=5-8: ~0.98x to 1.76x kernel speedup

Submission Checklist

@valarLip valarLip merged commit 79a951a into ROCm:main Aug 12, 2025
10 of 11 checks passed
@CuiCu-618 CuiCu-618 deleted the cucui/opt_topksoftmax branch August 12, 2025 16:29