Labels
enhancement, module: benchmark, module: user experience, triaged
Description
🐛 Describe the bug
As shown on the dashboard, avg_inference_latency (ms) is skipped for LLMs, and only generate_time (ms) is reported instead.
Looking at an iOS run, for example, an LLM job runs three tests on-device to report different metrics:
- test_load_llama_3_2_1b_llama3_fb16_pte_iOS_17_2_1_iPhone15_4
- test_forward_llama_3_2_1b_llama3_fb16_pte_iOS_17_2_1_iPhone15_4
- test_generate_llama_3_2_1b_llama3_fb16_pte_tokenizer_model_iOS_17_2_1_iPhone15_4
A non-LLM job, by contrast, runs only the first two tests (test_load_ and test_forward_).
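To make the difference between the two job types concrete, here is a minimal sketch that groups test names by the phase prefix seen in the names above. The helper names `phases_run` and `looks_like_llm_job` are hypothetical and not part of the benchmark infrastructure, and treating the presence of a test_generate_ test as the marker of an LLM job is an assumption drawn only from this description.

```python
import re

# Phase prefixes as they appear in the test names quoted in this issue.
_PHASE_PATTERN = re.compile(r"^test_(load|forward|generate)_")


def phases_run(test_names):
    """Return the set of benchmark phases found in a list of test names."""
    found = set()
    for name in test_names:
        match = _PHASE_PATTERN.match(name)
        if match:
            found.add(match.group(1))
    return found


def looks_like_llm_job(test_names):
    # Assumption: only LLM jobs include a test_generate_ test.
    return "generate" in phases_run(test_names)


llm_tests = [
    "test_load_llama_3_2_1b_llama3_fb16_pte_iOS_17_2_1_iPhone15_4",
    "test_forward_llama_3_2_1b_llama3_fb16_pte_iOS_17_2_1_iPhone15_4",
    "test_generate_llama_3_2_1b_llama3_fb16_pte_tokenizer_model_iOS_17_2_1_iPhone15_4",
]
print(sorted(phases_run(llm_tests)))  # ['forward', 'generate', 'load']
```

Since the forward phase shows up in both job types, a check like this could help confirm which jobs should be producing a forward-pass latency metric.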
See detailed jobs here:
- LLM: https://github.com/pytorch/executorch/actions/runs/13403521306/job/37441009799
- non-LLM: https://github.com/pytorch/executorch/actions/runs/13403521306/job/37441008720
Three things to clarify in this task:
1. Because test_forward_* is reported for both LLM and non-LLM jobs, why isn't it reported to the dashboard?
2. Let's annotate each metric in the DB so users know exactly what each one measures.
3. Confirm whether Android measures and reports exactly the same metrics. See "Report avg_inference_latency from Android LLM benchmark app" #8578.
Versions
trunk
