[TRITON] Add Triton Topk Kernel #458
@hubertlu-tw Can you please run the black linter tool locally and fix the issues?
Sure. I have run the two linters below on the scripts I added and tested them locally. Thanks!
@rahulbatra85 and @vgokhale could you please review the PR and let me know what I need to refactor or add? Thanks!
rahulbatra85 left a comment
Please see some of my comments.
Thanks!
@hubertlu-tw Let me know whenever you have changes made. Thanks!
@rahulbatra85 I have refactored the code based on your suggestions. Thank you very much.
For the input tensors used in the following workload (hidden_size=128256, K=8):
With the topk kernel change, it shows about a 9.63% improvement in request throughput and a 40% improvement in TTFT (time to first token).
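
For readers unfamiliar with the operation being benchmarked: a top-k kernel returns, for each row of the input, the K largest values and their positions. The snippet below is only a pure-Python sketch of that row-wise semantics, useful as a mental model or as a tiny correctness oracle. It is not the Triton kernel from this PR; the function name `topk_reference` is hypothetical, and the tie-breaking rule (lower index first) is an assumption made for determinism, not something the PR specifies.

```python
import heapq


def topk_reference(rows, k):
    """Return (values, indices) of the k largest entries of each row, largest first.

    Hypothetical helper for illustration; not part of the PR.
    """
    values, indices = [], []
    for row in rows:
        # Rank positions by (value, -index) so that equal values are
        # resolved toward the lower index (an assumed tie-breaking rule).
        top = heapq.nlargest(k, range(len(row)), key=lambda i: (row[i], -i))
        indices.append(top)
        values.append([row[i] for i in top])
    return values, indices


# Two small rows with K=2; the first row contains a tie at 0.7.
vals, idx = topk_reference([[0.1, 0.7, 0.2, 0.7], [3.0, -1.0, 2.0, 0.5]], k=2)
```

A GPU implementation would not loop row by row like this, but its output for a given row and K can be checked against such a reference, subject to how ties are ordered.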

For the input tensors used in another internal workload (hidden_size=16, K=2):

To run the performance benchmark and analyze the peak memory usage of the kernel for various input shapes and k,
To run the roofline model for the kernel for various input shapes and k,
To run the unit test of the kernel