enable tp for benchmark #43750

Merged
remi-or merged 4 commits into huggingface:main from sywangyi:tp_benchmark
Mar 19, 2026
@sywangyi
Contributor

@sywangyi sywangyi commented Feb 5, 2026

Enable tensor parallelism (TP) in benchmark_v2 so that large models can run.

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@sywangyi
Contributor Author

sywangyi commented Feb 5, 2026

@remi-or, please help review. Thanks very much.

@sywangyi
Contributor Author

Hi @remi-or, any thoughts on enabling TP and CP in the benchmark tool?

Collaborator

@remi-or remi-or left a comment

Some nits, but otherwise lgtm. Have you tried out the benchmarking with TP? And if so, how were the results? I am curious what applications you are targeting this for 🙂 !

Comment thread benchmark_v2/framework/benchmark_config.py Outdated
Comment thread benchmark_v2/framework/benchmark_config.py Outdated
Comment thread benchmark_v2/framework/benchmark_runner.py Outdated
Comment thread benchmark_v2/framework/benchmark_runner.py
Comment thread benchmark_v2/framework/benchmark_runner.py
@sywangyi
Contributor Author

sywangyi commented Mar 5, 2026

> Some nits, but otherwise lgtm. Have you tried out the benchmarking with TP? And if so, how were the results? I am curious what applications you are targeting this for 🙂 !

Actually, I would like to use this benchmark tool on XPU for a broader range of models, so I need to run bigger models (such as the MoE series). A single card cannot run such models because of memory limitations, so I need to run with EP and TP across multiple cards. Also, TP and EP support in the kernels path is only on our radar for now.

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
@remi-or
Collaborator

remi-or commented Mar 5, 2026

Ok, this looks good! I would like to test on my end before merging, will do so soon. My question was: did you manage to run the benchmarker in a distributed setting on your end? Or is this a small change needed for that but not enough to enable the feature? Thanks

@sywangyi
Contributor Author

sywangyi commented Mar 5, 2026

Yes, I tested on my side using torchrun --nproc-per-node 2 run_benchmarks.py --enable-tp, and this change is enough to enable the feature.
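
For readers unfamiliar with this kind of launch, here is a minimal sketch of how a script started by torchrun might turn a flag like --enable-tp into model-loading arguments. It is not the code from this PR: parse_args and build_load_kwargs are hypothetical helpers, and the only pieces taken from real tools are the WORLD_SIZE environment variable that torchrun sets for each process and the tp_plan="auto" argument accepted by from_pretrained in transformers.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse a hypothetical benchmark CLI (only --enable-tp comes from the thread)."""
    parser = argparse.ArgumentParser(description="Illustrative benchmark entry point")
    parser.add_argument(
        "--enable-tp",
        action="store_true",
        help="shard the model across the ranks started by torchrun",
    )
    return parser.parse_args(argv)


def build_load_kwargs(enable_tp, env=None):
    """Return extra keyword arguments for model loading (hypothetical helper).

    torchrun exports WORLD_SIZE to every process it starts; if the variable
    is missing we assume a plain single-process run.
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    kwargs = {}
    # Only request tensor parallelism when more than one process is running.
    if enable_tp and world_size > 1:
        # tp_plan="auto" asks transformers to apply the model's predefined TP plan.
        kwargs["tp_plan"] = "auto"
    return kwargs


if __name__ == "__main__":
    args = parse_args()
    print(build_load_kwargs(args.enable_tp))
```

The resulting dictionary could then be passed along as AutoModelForCausalLM.from_pretrained(model_id, **kwargs). Whether benchmark_v2 wires the flag this way is an assumption; the merged diff is the authority.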

Collaborator

@remi-or remi-or left a comment

LGTM! Thanks for your patience.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@remi-or remi-or added this pull request to the merge queue Mar 19, 2026
Merged via the queue into huggingface:main with commit 62c46ce Mar 19, 2026
17 checks passed

3 participants