feat: Enable benchmark-mode module inventory/export across all CausalLM architectures#906
Conversation
@vbaddi - can we restructure this as below? We can create Attention and MoE/FFN benchmarks and use ONNX symbols to set the fields. Some fields can come from the model card's config.json, like dm/dh. Maybe I didn't fully understand the table you gave above.
Thanks @anujgupt-github. These are all configurable from the config or model card that is passed in; whatever needs to be edited can either be changed in the config or passed as args. The table is basically dummy inputs running on QAic, providing the numbers for those modules (via sess.run()).
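A minimal sketch of the per-module timing loop described above: repeatedly invoking a session's run() on fixed dummy inputs and averaging the latency. The `run_fn` callable stands in for a bound session call (e.g. `lambda feeds: sess.run(None, feeds)`); the helper name and signature are illustrative, not the PR's actual code.

```python
import time

def benchmark_module(run_fn, dummy_feeds, n_iters=100):
    """Time repeated invocations of a session run() on fixed dummy inputs.

    run_fn stands in for something like `lambda feeds: sess.run(None, feeds)`.
    Returns mean latency per invocation in milliseconds.
    """
    run_fn(dummy_feeds)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(n_iters):
        run_fn(dummy_feeds)
    return (time.perf_counter() - start) / n_iters * 1e3
```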
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi do we need all these changes in modeling auto? Can we have a separate component for exporting the model using the map you have built in SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS? Since it's not a mainline change, we can keep it minimal.
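The suggestion above could be sketched as a standalone export helper that dispatches on the architecture map, keeping the benchmark path out of the mainline modeling code. The map contents and function name here are illustrative; only `SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS` is named in the PR.

```python
# Illustrative subset; the real SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS in the PR
# covers many more architectures.
SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS = {
    "llama": "LlamaForCausalLM",
    "mistral": "MistralForCausalLM",
}

def export_benchmark_model(architecture, export_fn):
    """Look up the runtime model id for an architecture and hand it to the
    export callable, instead of branching inside modeling auto."""
    try:
        model_id = SUPPORTED_CAUSAL_RUNTIME_MODEL_IDS[architecture]
    except KeyError:
        raise ValueError(f"Benchmark export not supported for: {architecture}")
    return export_fn(model_id)
```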
@@ -0,0 +1,20 @@
# -----------------------------------------------------------------------------
Please move this to the scripts/microbenchmarks folder and also add a README file.
WIP: This PR extends `enable_benchmark=True` support in `QEFFAutoModelForCausalLM` to all CausalLM models.

What changed
- Benchmark-mode module inventory/export enabled across CausalLM architectures (gptj, mistral, mixtral, mpt, phi, phi3, qwen2, starcoder2, granite, olmo2).
- Per-module benchmark specs provided via `get_benchmark_module_specs(...)` for CausalLM models.
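As a rough sketch of how `get_benchmark_module_specs(...)` might derive dummy-input shapes from fields in the model card's config.json (as discussed in the conversation above, e.g. dm/dh): the field names follow Hugging Face config conventions, but the spec layout is an assumption, not the PR's actual implementation.

```python
def get_benchmark_module_specs(config, batch_size=1, seq_len=128):
    """Derive illustrative Attention and FFN/MoE micro-benchmark specs
    from config.json-style fields; layout is hypothetical."""
    dm = config["hidden_size"]                # model dim (dm)
    dh = dm // config["num_attention_heads"]  # per-head dim (dh)
    return {
        "attention": {
            "inputs": {"hidden_states": (batch_size, seq_len, dm)},
            "num_heads": config["num_attention_heads"],
            "head_dim": dh,
        },
        "ffn": {
            "inputs": {"hidden_states": (batch_size, seq_len, dm)},
            "intermediate_size": config["intermediate_size"],
        },
    }
```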
Example benchmark output (Llama)
Input/Output shape section in report
Validation