feat: Enable benchmark-mode module inventory/export across all CausalLM architectures#906
Conversation
@vbaddi - can we restructure this as below? We can create Attention and MoE/FFN benchmarks and use ONNX symbols to set the fields. Some fields can come from the model card's config.json, like dm/dh. Maybe I didn't fully understand the table you gave above.
Thanks @anujgupt-github. These are all configurable from the config or model card that is passed in; whatever needs to be edited can either be changed in the config or passed as args. The table is basically dummy inputs running on QAic, providing the numbers for those modules (via sess.run()).
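A minimal sketch of the per-module timing loop described above: repeatedly invoking a session's run() on fixed dummy inputs and averaging the latency. The `run_fn` callable stands in for a bound session call (e.g. `lambda feeds: sess.run(None, feeds)`); the helper name and signature are illustrative, not the PR's actual code.

```python
import time

def benchmark_module(run_fn, dummy_feeds, n_iters=100):
    """Time repeated invocations of a session run() on fixed dummy inputs.

    run_fn stands in for something like `lambda feeds: sess.run(None, feeds)`.
    Returns mean latency per invocation in milliseconds.
    """
    run_fn(dummy_feeds)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(n_iters):
        run_fn(dummy_feeds)
    return (time.perf_counter() - start) / n_iters * 1e3
```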
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi do we need all these changes in modeling auto? Can we have a separate component for exporting the model using the map you have built in SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS? Since it's not a mainline change, we can keep it minimal.
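The suggestion above could be sketched as a standalone export helper that dispatches on the architecture map, keeping the benchmark path out of the mainline modeling code. The map contents and function name here are illustrative; only `SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS` is named in the PR.

```python
# Illustrative subset; the real SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS in the PR
# covers many more architectures.
SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS = {
    "llama": "LlamaForCausalLM",
    "mistral": "MistralForCausalLM",
}

def export_benchmark_model(architecture, export_fn):
    """Look up the runtime model id for an architecture and hand it to the
    export callable, instead of branching inside modeling auto."""
    try:
        model_id = SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS[architecture]
    except KeyError:
        raise ValueError(f"Benchmark export not supported for: {architecture}")
    return export_fn(model_id)
```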
@@ -0,0 +1,20 @@
# -----------------------------------------------------------------------------
Please move this to the scripts/microbenchmarks folder and also add a README file.
WIP: This PR extends `enable_benchmark=True` support in `QEFFAutoModelForCausalLM` to all CausalLM models.

What changed
- Benchmark-mode module inventory/export enabled across CausalLM architectures (gptj, mistral, mixtral, mpt, phi, phi3, qwen2, starcoder2, granite, olmo2).
- Per-module benchmark specs provided via `get_benchmark_module_specs(...)` for CausalLM models.
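As a rough sketch of how `get_benchmark_module_specs(...)` might derive dummy-input shapes from fields in the model card's config.json (as discussed in the conversation above, e.g. dm/dh): the field names follow Hugging Face config conventions, but the spec layout is an assumption, not the PR's actual implementation.

```python
def get_benchmark_module_specs(config, batch_size=1, seq_len=128):
    """Derive illustrative Attention and FFN/MoE micro-benchmark specs
    from config.json-style fields; layout is hypothetical."""
    dm = config["hidden_size"]                # model dim (dm)
    dh = dm // config["num_attention_heads"]  # per-head dim (dh)
    return {
        "attention": {
            "inputs": {"hidden_states": (batch_size, seq_len, dm)},
            "num_heads": config["num_attention_heads"],
            "head_dim": dh,
        },
        "ffn": {
            "inputs": {"hidden_states": (batch_size, seq_len, dm)},
            "intermediate_size": config["intermediate_size"],
        },
    }
```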
Example benchmark output (Llama)
Input/Output shape section in report
Validation