@hubertlu-tw hubertlu-tw commented May 21, 2025

For the input tensors used in the following workload (hidden_size=128256, K=8):

python3 -m sglang.launch_server --model meta-llama/Meta-Llama-3-8B-Instruct --speculative-algo EAGLE --speculative-draft lmsys/sglang-EAGLE-LLaMA3-Instruct-8B --speculative-num-steps 5 --speculative-eagle-topk 8 --speculative-num-draft-tokens 64 --dtype float16 --port 30000

python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 1024 --random-output 1024 --num-prompts 100 --request-rate 4

With the topk kernel change, this workload shows about a 9.63% improvement in request throughput and a 40% improvement in TTFT (time to first token).
image
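For readers unfamiliar with the operation being optimized, the following is a minimal pure-Python sketch of row-wise top-k semantics (the k largest values per row plus their positions). It is only a reference illustration, not the Triton kernel from this PR; the function name `topk_rows` is hypothetical, and ties are broken in favor of the lower index as an assumption.

```python
import heapq


def topk_rows(rows, k):
    """Return (values, indices) of the k largest entries in each row.

    Results are sorted in descending order by value. This is a reference
    sketch only; `topk_rows` is a hypothetical name, not part of aiter.
    """
    all_values, all_indices = [], []
    for row in rows:
        # nlargest over (index, value) pairs, ranked by value.
        pairs = heapq.nlargest(k, enumerate(row), key=lambda p: p[1])
        all_values.append([value for _, value in pairs])
        all_indices.append([index for index, _ in pairs])
    return all_values, all_indices


values, indices = topk_rows([[0.1, 0.7, 0.2, 0.9], [5.0, 1.0, 3.0, 4.0]], k=2)
print(values)   # [[0.9, 0.7], [5.0, 4.0]]
print(indices)  # [[3, 1], [0, 3]]
```

In the first workload above, each row would have 128256 entries with k=8; in the second, 16 entries with k=2.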

For the input tensors used in another internal workload (hidden_size=16, K=2):
image

To run the performance benchmark and analyze the kernel's peak memory usage for various input shapes and values of k:

aiter/op_tests/op_benchmarks/triton# python bench_topk.py --roofline

To run the roofline model for the kernel for various input shapes and values of k:

aiter/op_tests/op_benchmarks/triton# python bench_topk.py --roofline
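As background on what a roofline analysis estimates, here is a rough sketch of the standard roofline formula: attainable throughput is the smaller of the compute peak and memory bandwidth times arithmetic intensity. How `bench_topk.py` computes its roofline is not shown in this thread, so everything below is an assumption, including the function names, the traffic model (read every input once, write k values and k indices per row), and the 8-byte index size.

```python
def topk_bytes_moved(batch, hidden_size, k, value_bytes=2, index_bytes=8):
    """Estimate the minimum memory traffic of a top-k call, in bytes.

    Assumes each input element is read once and that k values plus k
    indices are written per row. value_bytes=2 matches float16;
    index_bytes=8 is an assumed int64 index type.
    """
    read_bytes = batch * hidden_size * value_bytes
    write_bytes = batch * k * (value_bytes + index_bytes)
    return read_bytes + write_bytes


def roofline_attainable(ops, bytes_moved, peak_ops_per_s, peak_bytes_per_s):
    """Attainable throughput (ops/s) under the roofline model."""
    intensity = ops / bytes_moved  # operations per byte of memory traffic
    return min(peak_ops_per_s, peak_bytes_per_s * intensity)


# One float16 row from the first workload in this PR (hidden_size=128256, K=8).
print(topk_bytes_moved(1, 128256, 8))  # 256592
```

Because top-k does little work per element read, this kind of estimate typically places it on the bandwidth-limited side of the roofline. The device peaks passed to `roofline_attainable` would need to come from the actual hardware specification.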

To run the unit tests for the kernel:

aiter/op_tests/triton# pytest test_topk.py
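The contents of `test_topk.py` are not shown in this thread. As a hedged illustration of what a top-k correctness check commonly looks like, the sketch below compares a candidate implementation against a sort-based reference on random input. All names here are hypothetical, and `candidate_topk` is a pure-Python stand-in for the kernel under test.

```python
import heapq
import random


def reference_topk(row, k):
    """Sort-based reference: indices of the k largest values, descending."""
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    return [row[i] for i in order], order


def candidate_topk(row, k):
    """Stand-in for the implementation under test (hypothetical)."""
    pairs = heapq.nlargest(k, enumerate(row), key=lambda p: p[1])
    return [value for _, value in pairs], [index for index, _ in pairs]


def check_topk(hidden_size, k, seed=0):
    """Return True if the candidate agrees with the reference on random data."""
    rng = random.Random(seed)
    row = [rng.random() for _ in range(hidden_size)]
    values, indices = candidate_topk(row, k)
    ref_values, _ = reference_topk(row, k)
    # Compare values rather than indices so that ties cannot cause
    # spurious failures, then confirm each index points at its value.
    assert values == ref_values
    assert all(row[i] == v for i, v in zip(indices, values))
    return True


# Shapes loosely modeled on the two workloads mentioned in this PR.
check_topk(hidden_size=16, k=2)
check_topk(hidden_size=1024, k=8)
```

Comparing values first and then validating indices against the input is one common way to keep such a test robust when the input contains duplicate values.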

@hubertlu-tw hubertlu-tw self-assigned this May 21, 2025
@hubertlu-tw hubertlu-tw changed the title from "Add Triton Topk Kernel" to "[TRITON] Add Triton Topk Kernel" May 22, 2025
@rahulbatra85
Contributor

@hubertlu-tw Can you please run the black linter tool locally and fix the issues?
pip install black
black [name of the file]

@hubertlu-tw
Contributor Author

> @hubertlu-tw Can you please run the black linter tool locally and fix the issues?
> pip install black
> black [name of the file]

Sure. I have run the two linters below on the scripts I added and tested them locally.

pip install black
black [name of the file]

pip install ruff
ruff check bench_topk.py --unsafe-fixes --fix [name of the file]

Thanks!

@hubertlu-tw
Contributor Author

@rahulbatra85 and @vgokhale could you please review the PR and let me know what I need to refactor or add? Thanks!

Contributor

@rahulbatra85 rahulbatra85 left a comment

Please see my comments.
Thanks!

@rahulbatra85
Contributor

@hubertlu-tw Let me know once you have made the changes. Thanks!

@hubertlu-tw
Contributor Author

@rahulbatra85 I have refactored the code based on your suggestions. Thank you very much.

@valarLip valarLip merged commit d765e80 into ROCm:main Jun 20, 2025
10 checks passed