# Rope Fix for a single subfunction signature #880

**Merged**: quic-hemagnih merged 13 commits into quic:main on Mar 26, 2026
Conversation
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
…uic_abhishek into rope_fix_subfunction
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
quic-hemagnih approved these changes on Mar 26, 2026
quic-akuruvil pushed a commit to quic-akuruvil/efficient_transformers that referenced this pull request on Mar 27, 2026:
## Summary

This PR introduces the **Rotary Position Embedding (RoPE) fix**, ensuring that models generate a **single unified subfunction signature** during ONNX export.

## Model Status After Applying the Fix

### Models now producing a single subfunction signature

_All causal LMs tested in the associated test file are functioning correctly, except those listed below._

### Models still producing **two different subfunction signatures**

The following models continue to emit multiple subfunction signatures and require additional investigation:

- [ ] Phi-1
- [ ] StarCoder2
- [ ] CodeGen

### Models with issues **unrelated** to the RoPE fix

These models have separate problems that need to be addressed independently:

- [ ] Granite-MoE
- [ ] GPT-OSS
- [ ] Mixtral

---

Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Co-authored-by: Onkar Chougule <ochougul@qti.qualcomm.com>
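One way to verify the single-signature property on an exported model is to inspect the ONNX function library directly. The following is a minimal sketch, not from this PR: the `model.onnx` file name and the `DecoderLayer` name filter are assumptions for illustration.

```python
# Minimal sketch (not from this PR): count distinct decoder-block
# subfunction signatures in an exported model. The file name and the
# "DecoderLayer" name filter are assumptions for illustration.
import onnx

model = onnx.load("model.onnx", load_external_data=False)

# Repeated decoder blocks exported as subfunctions appear in
# model.functions; a unified signature means one shared function
# rather than several per-layer variants.
signatures = {
    (fn.name, tuple(fn.input), tuple(fn.output))
    for fn in model.functions
    if "DecoderLayer" in fn.name
}
print(f"distinct decoder-block subfunction signatures: {len(signatures)}")
```

A fixed model should report exactly one signature here; the models in the checklist above would still report more than one.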
quic-rishinr pushed a commit that referenced this pull request on Apr 3, 2026:
… align the subfunction changes. (#882)

This PR rebases dev/rebase_transformers_v4_57_3 onto main and consolidates our transformer rebase changes with the PR #880 subfunction/KV alignment, so we keep the branch simpler and unify the subfunction approach.

### What changed

- KV/subfunction alignment:
  - Applied PR #880-style wrapper changes for causal model families to reduce divergence from mainline.
  - Removed local `resolve_kv_seq_len` usage from the remaining wrappers (grok_1, molmo) to match the cache-native pattern used elsewhere.
  - Removed the now-unused helper `resolve_kv_seq_len` from QEfficient/utils/_utils.py.
- Unit test updates:
  - Added a new quickcheck unit test for `use_onnx_subfunctions=True` that validates decoder-block subfunction cardinality per causal model (a hedged sketch of this check follows below).
  - Important: the test counts only decoder model block functions (via `get_submodules_for_export()`), not all ONNX helper functions, so the assertion tracks the intended behavior.

### Decoder-block subfunction status (causal model list)

- Single decoder-block subfunction: falcon, gpt2, gptj, granite, llama, mistral, mpt, olmo2, phi3, qwen2
- Multiple decoder-block subfunctions: codegen, gpt_oss, mixtral, phi (Phi-1), starcoder2

### Tests verified

```
python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto
Result after subfunction-count + KV-helper cleanup: 75 passed, 1 skipped
```

cc: @anujgupt-github @quic-hemagnih

---

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Co-authored-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Co-authored-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Co-authored-by: vtirumal <vtirumal@qti.qualcomm.com>
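The sketch referenced above: a minimal version of the quickcheck-style test might look like the following. This is not the repo's actual test code; the TinyLlama model card, the assumption that `export()` accepts `use_onnx_subfunctions` directly, and the `DecoderLayer` name filter are all illustrative.

```python
# Hedged sketch of the quickcheck-style test described above, not the
# repo's actual code. Assumptions: the TinyLlama model card, that
# export() takes use_onnx_subfunctions directly, and the "DecoderLayer"
# name filter used to select decoder-block functions.
import onnx
from QEfficient import QEFFAutoModelForCausalLM


def test_single_decoder_block_subfunction():
    qeff_model = QEFFAutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    )
    onnx_path = qeff_model.export(use_onnx_subfunctions=True)

    model = onnx.load(str(onnx_path), load_external_data=False)
    # Count only decoder-block functions, not every ONNX helper
    # function, so the assertion tracks the intended behavior.
    decoder_fns = [fn for fn in model.functions if "DecoderLayer" in fn.name]
    assert len(decoder_fns) == 1
```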
tv-karthikeya pushed a commit to qcdipankar/efficient-transformers that referenced this pull request on Apr 21, 2026.
quic-rishinr added a commit that referenced this pull request on Apr 22, 2026:
…bfunctions for cached text models (#928)

## Summary

This change moves layer-invariant RoPE cos/sin indexing out of the repeated decoder-layer subfunctions and into the model-level forward paths. For cached decoder models, we were repeatedly doing:

```
cos = cos[position_ids].unsqueeze(1)
sin = sin[position_ids].unsqueeze(1)
```

inside each decoder attention block. With ONNX subfunctions enabled, that indexing becomes part of the exported repeated subfunction body and contributes to the on-device regression we observed after the single-subfunction Rope Fix work (#880). This patch hoists that work once per forward pass and passes the already-shaped cos/sin tensors into each decoder layer.

## What changed

Applied the refactor to the applicable QEff model families that thread static cached RoPE tensors through repeated decoder layers, including:

- Llama
- Llama SwiftKV
- Gemma
- Gemma2
- Mistral
- Falcon
- GPT-OSS
- Granite
- GraniteMoE
- Mllama text path
- Mixtral
- Olmo2
- Phi3
- Qwen2
- Qwen3
- Qwen3 MoE
- Qwen2.5 VL text path
- Qwen3 VL text path
- Qwen3 VL MoE text path

For the Qwen VL text towers, the same idea is applied to the indexed/interleaved MRoPE preparation: the already-indexed cos/sin tensors are prepared once before the decoder-layer loop and reused across layers.

## Tests

Added a TinyLlama regression test to assert that export with subfunctions still produces a single decoder-layer ONNX function. Verified:

```
python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto
```

---

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
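The shape of the hoisting refactor, on a toy decoder rather than the actual QEff modeling code (the tensor shapes, the stand-in rotary op, and every name below are illustrative assumptions):

```python
# Illustrative sketch of the hoisting refactor on a toy decoder; names
# and shapes are assumptions, not the exact QEff implementation.
import torch


def toy_rotary(hidden, cos, sin):
    # Stand-in for the real rotary application, which rotates q/k halves.
    return hidden * cos + hidden * sin


def layer_before(hidden, cos_cache, sin_cache, position_ids):
    # Before: each layer gathers from the static caches, so the indexing
    # is traced into every repeated decoder subfunction body.
    cos = cos_cache[position_ids].unsqueeze(1)
    sin = sin_cache[position_ids].unsqueeze(1)
    return toy_rotary(hidden, cos, sin)


def model_after(hidden, cos_cache, sin_cache, position_ids, num_layers=4):
    # After: index once in the model forward and pass the already-shaped
    # tensors down; the per-layer subfunction body only consumes them.
    cos = cos_cache[position_ids].unsqueeze(1)
    sin = sin_cache[position_ids].unsqueeze(1)
    for _ in range(num_layers):
        hidden = toy_rotary(hidden, cos, sin)
    return hidden


batch, seq, dim, max_pos = 1, 8, 16, 32
hidden = torch.randn(batch, 1, seq, dim)
cos_cache = torch.randn(max_pos, dim)
sin_cache = torch.randn(max_pos, dim)
position_ids = torch.arange(seq).unsqueeze(0)  # (batch, seq)
out = model_after(hidden, cos_cache, sin_cache, position_ids)
```

Because the per-layer function no longer contains the gather, the exported subfunction body stays identical across layers and the indexing cost is paid once per forward pass.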
## Summary

This PR introduces the **Rotary Position Embedding (RoPE) fix**, ensuring that models generate a **single unified subfunction signature** during ONNX export.

## Model Status After Applying the Fix

### Models now producing a single subfunction signature

_All causal LMs tested in the associated test file are functioning correctly, except those listed below._

### Models still producing **two different subfunction signatures**

The following models continue to emit multiple subfunction signatures and require additional investigation:

- [ ] Phi-1
- [ ] StarCoder2
- [ ] CodeGen

### Models with issues **unrelated** to the RoPE fix

These models have separate problems that need to be addressed independently:

- [ ] Granite-MoE
- [ ] GPT-OSS
- [ ] Mixtral