Rope Fix for a single subfunction signature #880

Merged
quic-hemagnih merged 13 commits into quic:main from abhishek-singh591:rope_fix_subfunction
Mar 26, 2026

Conversation

@abhishek-singh591 (Contributor) commented Mar 24, 2026

Summary

This PR introduces the Rotary Position Embedding (RoPE) fix, ensuring that models generate a single unified subfunction signature during ONNX export.
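
As a quick way to confirm the single-signature behavior, here is a minimal sketch (not part of this PR; the path "model.onnx" is a placeholder) that lists every ONNX function in an exported model along with its input/output arity, so a decoder block that exports with two different signatures shows up as two differing entries:

```
# Minimal sketch, assuming an already-exported model at "model.onnx" (placeholder path).
# Each entry in model.functions is one exported subfunction; after the fix, the
# repeated decoder block should appear as a single function signature.
import onnx

model = onnx.load("model.onnx")
for fn in model.functions:
    print(f"{fn.name}: {len(fn.input)} inputs, {len(fn.output)} outputs")
```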

Models Status After Applying the Fix

Models now producing a single subfunction signature

All causal LMs covered by the associated test file now export with a single subfunction signature, except those listed below.

Models still producing two different subfunction signatures

The following models continue to emit multiple subfunction signatures and require additional investigation:

  • Phi-1
  • StarCoder2
  • CodeGen

Models with issues unrelated to the RoPE fix

These models have separate problems that need to be addressed independently:

  • Granite-MoE
  • GPT-OSS
  • Mixtral

Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
ochougul and others added 7 commits March 25, 2026 09:35
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
@quic-hemagnih quic-hemagnih merged commit 895d987 into quic:main Mar 26, 2026
5 checks passed
quic-akuruvil pushed a commit to quic-akuruvil/efficient_transformers that referenced this pull request Mar 27, 2026
quic-rishinr pushed a commit that referenced this pull request Apr 3, 2026
… align the subfunction changes. (#882)

This PR rebases dev/rebase_transformers_v4_57_3 onto main and consolidates our transformer rebase changes with the PR #880 subfunction/KV alignment, keeping the branch simpler and unifying the subfunction approach.

### What changed

- KV/subfunction alignment:
  - Applied PR #880-style wrapper changes for causal model families to reduce divergence from mainline.
  - Removed local resolve_kv_seq_len usage from the remaining wrappers (grok_1, molmo) to match the cache-native pattern used elsewhere.
  - Removed the now-unused helper resolve_kv_seq_len from QEfficient/utils/_utils.py.
- Unit test updates:
  - Added a new quickcheck unit test for use_onnx_subfunctions=True that validates decoder-block subfunction cardinality per causal model (see the sketch after this list).
  - Important: the test counts only decoder model block functions (via get_submodules_for_export()), not all ONNX helper functions, so the assertion tracks the intended behavior.
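
A rough sketch of what such a cardinality check can look like (illustrative only: the real test filters decoder block functions via get_submodules_for_export(), whereas this sketch matches on a placeholder block name and a placeholder export path):

```
# Illustrative sketch only; "exported_model.onnx" and "DecoderLayer" are
# placeholders, and the real test filters via get_submodules_for_export().
import onnx


def count_decoder_block_functions(onnx_path: str, block_name: str) -> int:
    """Count exported ONNX functions whose names refer to the decoder block."""
    model = onnx.load(onnx_path)
    return sum(block_name in fn.name for fn in model.functions)


def test_single_decoder_block_subfunction():
    # With use_onnx_subfunctions=True, expect exactly one decoder-block subfunction.
    assert count_decoder_block_functions("exported_model.onnx", "DecoderLayer") == 1
```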

### Decoder-block subfunction status (causal model list)

- Single decoder-block subfunction: falcon, gpt2, gptj, granite, llama, mistral, mpt, olmo2, phi3, qwen2
- Multiple decoder-block subfunctions: codegen, gpt_oss, mixtral, phi (Phi-1), starcoder2

### Tests verified

```
python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto
Result after subfunction-count + KV-helper cleanup: 75 passed, 1 skipped
```

cc: @anujgupt-github @quic-hemagnih

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Co-authored-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Co-authored-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Co-authored-by: vtirumal <vtirumal@qti.qualcomm.com>
tv-karthikeya pushed a commit to qcdipankar/efficient-transformers that referenced this pull request Apr 21, 2026
quic-rishinr added a commit that referenced this pull request Apr 22, 2026
…bfunctions for cached text models (#928)

## Summary

This change moves layer-invariant RoPE cos/sin indexing out of repeated
decoder-layer subfunctions and into model-level forward paths.

For cached decoder models, we were repeatedly doing:

```
cos = cos[position_ids].unsqueeze(1)
sin = sin[position_ids].unsqueeze(1)
```

inside each decoder attention block. With ONNX subfunctions enabled, that indexing becomes part of the exported repeated subfunction body and contributes to the on-device regression we observed after the single-subfunction RoPE fix work in #880.

This patch hoists that work once per forward pass and passes the
already-shaped cos/sin tensors into each decoder layer.

## What changed

Applied the refactor to the applicable QEff model families that thread
static cached RoPE tensors through repeated decoder layers, including:

- Llama
- Llama SwiftKV
- Gemma
- Gemma2
- Mistral
- Falcon
- GPT-OSS
- Granite
- GraniteMoE
- Mllama text path
- Mixtral
- Olmo2
- Phi3
- Qwen2
- Qwen3
- Qwen3 MoE
- Qwen2.5 VL text path
- Qwen3 VL text path
- Qwen3 VL MoE text path

For the Qwen VL text towers, the same idea is applied to the
indexed/interleaved MRoPE preparation: the already-indexed cos/sin
tensors are prepared once before the decoder-layer loop and reused
across layers.

## Tests

Added a TinyLlama regression test to assert that export with
subfunctions still produces a single decoder-layer ONNX function.

Verified:

`python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto`

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>