[TRITON]: Benchmarking scripts updates #650
Merged
Conversation
Force-pushed from fb3b15e to 54f9478.
rahulbatra85 approved these changes on Jul 18, 2025.
azaidy approved these changes on Jul 18, 2025.
azaidy (Contributor) left a comment:
LGTM!
cagrikymk pushed a commit that referenced this pull request on Jul 30, 2025:
* Modify op_benchmark directory structure to add bench_tests/ and bench_model.py
* Why? The idea is to have a master script that benchmarks the full set of associated kernels when given a model name. It's a little cleaner to place all kernel benchmarking scripts in /kernels and have the bench_model script call them.
* How? See `bench_model.py`. Pytests are in bench_tests/
* Update table formatting for bench_gemm_a8w8 and add tests for bench_gemm_a8w8 benchmarking.
* Add tensor parallel in bench_gemm_a8w8.py
* Add -no_glu arg, fix error in tensor parallelism, and reset folder structure
* Fix argparse & tensor parallel bug
* Update bench_gemm_a8w8_blockscale.py and add repeated code to benchmark_utils
* Consolidate bench fn
* Consolidate bench fn: int8 blockscale
* Unify argparse for MHA benchmarking
* Update configs for mha bench
* Broadcast updates to bench_batched_gemm_afp4wfp4.py
* Fix issue with arg names in bench_batched_gemm_afp4wfp4
* Add stride shape upcasting
* Broadcast changes to batch_gemm_afp4wfp4_pre_quant
* Improve code reuse + fix benchmarking FLOP computation bug
* Fix shape order to allow plots to display properly
* Sweep through moe, extend_attn, prefill, rmsnorm, rope to fix bugs and add --model arg
* Add --model and --shape support to bench_routing.py
* Add MOE information to deepseek model config
* Revert linting changes in the CK dir
* Revert linting changes to ck dir
* Black linting change
* Fix f-string issue
* Add --model support to bench_topk.py & set int64 stride flag in mha
* Undo linting changes to csrc
* Add informative error when trying to benchmark non-MoE models
* Format with Black
* Support model flag for bench_gemm_a16w16
* Add --layout flag support to int8 and fp16 GEMMs + set graph axes to logscale
* Add --layout support to afp4wfp4 GEMM
* Fix function naming in bench_gemm_afp4wfp4
* Replace missing comma
* Add --layout support to batched afp4wfp4 pre quant gemm
* Remove merge duplicates
* Undo linting changes that removed CN comments
* Fix bug with -M flag
* Add --layout support to a8w8 blockscale gemm
* Add --layout support to batched afp4wfp4 GEMM
* Formatting changes
* Formatting changes
* Debug shape issue that causes segfault when K > M
* Black linting change
* Fix issue where running batched GEMM benchmarking scripts with no args would yield a shape failure
* Linting changes
* Add -o flag and other fixes for benchmark scripts
* Fix moe_routing_sigmoid benchmark
* add Mi350 config json for extend attention
* Linting fixes
* More formatting fixes
* batched_gemm mxfp4 fixes
* Linting changes
* Fix batched_gemm_afp4wfp4_pre_quant benchmark

---------

Co-authored-by: Rahul Batra <rahbatra@amd.com>
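The first few bullets describe the master-script idea behind `bench_model.py`: given a model name, run every per-kernel benchmark script associated with that model. The sketch below illustrates one way such a dispatcher could be wired up; the kernels/ directory layout, the script names, and the model-to-kernel mapping are illustrative assumptions, not the actual contents of the repo.

```python
# Hypothetical sketch of a bench_model-style dispatcher. Script names and the
# MODEL_KERNELS mapping are placeholders, not the repo's real layout.
import argparse
import subprocess
import sys
from pathlib import Path

KERNELS_DIR = Path(__file__).parent / "kernels"  # assumed location of per-kernel scripts

# Assumed mapping from model name to the kernel benchmarks relevant to it.
MODEL_KERNELS = {
    "llama3-8B": ["bench_gemm_a8w8.py", "bench_mha.py", "bench_rmsnorm.py", "bench_rope.py"],
    "deepseek": ["bench_gemm_a8w8.py", "bench_moe.py", "bench_mha.py"],
}


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Benchmark every kernel associated with a given model."
    )
    parser.add_argument("--model", required=True, choices=sorted(MODEL_KERNELS))
    args = parser.parse_args()

    for script in MODEL_KERNELS[args.model]:
        # Forward the --model flag to each per-kernel benchmark script.
        cmd = [sys.executable, str(KERNELS_DIR / script), "--model", args.model]
        print(f"Running: {' '.join(cmd)}")
        subprocess.run(cmd, check=True)  # fail fast if any kernel benchmark errors out


if __name__ == "__main__":
    main()
```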
Feature left over from #594.
Example usage (TN is default layout):
python op_tests/op_benchmarks/triton/bench_gemm_afp4wfp4.py --model llama3-8B
python op_tests/op_benchmarks/triton/bench_gemm_afp4wfp4.py --model llama3-8B --layout NN
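For context, the snippet below sketches how a per-kernel benchmark script might expose the --model and --layout flags shown in the commands above. Only the flag names come from the example usage; the parsing details and the way a "T"/"N" character maps to a transposed operand are assumptions for illustration, not the exact implementation in bench_gemm_afp4wfp4.py.

```python
# Illustrative sketch: a unified argparse interface plus one plausible
# interpretation of the layout string ("T" = operand stored transposed).
# The real scripts may use a different convention.
import argparse

import torch


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="GEMM benchmark (sketch)")
    parser.add_argument("--model", type=str, default=None,
                        help="Benchmark the GEMM shapes used by this model")
    parser.add_argument("--layout", type=str, default="TN",
                        choices=["TT", "TN", "NT", "NN"],
                        help="Storage layout of the A and B operands (TN is the default)")
    return parser.parse_args()


def make_operands(M: int, N: int, K: int, layout: str, dtype=torch.float16):
    """Allocate A (M x K) and B (K x N); a 'T' marks that operand as stored transposed."""
    a = torch.randn((K, M), dtype=dtype).T if layout[0] == "T" else torch.randn((M, K), dtype=dtype)
    b = torch.randn((N, K), dtype=dtype).T if layout[1] == "T" else torch.randn((K, N), dtype=dtype)
    return a, b
```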