Rope Fix for a single subfunction signature #880

Merged
quic-hemagnih merged 13 commits into quic:main from abhishek-singh591:rope_fix_subfunction
Mar 26, 2026

Conversation

@abhishek-singh591 (Contributor) commented Mar 24, 2026

Summary

This PR introduces the Rotary Position Embedding (RoPE) fix, ensuring that models generate a single unified subfunction signature during ONNX export.
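
As a quick way to confirm the single-signature behavior, here is a minimal sketch (not part of this PR; the path "model.onnx" is a placeholder) that lists every ONNX function in an exported model along with its input/output arity, so a decoder block that exports with two different signatures shows up as two differing entries:

```
# Minimal sketch, assuming an already-exported model at "model.onnx" (placeholder path).
# Each entry in model.functions is one exported subfunction; after the fix, the
# repeated decoder block should appear as a single function signature.
import onnx

model = onnx.load("model.onnx")
for fn in model.functions:
    print(f"{fn.name}: {len(fn.input)} inputs, {len(fn.output)} outputs")
```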

Models Status After Applying the Fix

Models now producing a single subfunction signature

All causal LMs covered by the associated test file now export with a single subfunction signature, except those listed below.

Models still producing two different subfunction signatures

The following models continue to emit multiple subfunction signatures and require additional investigation:

  • Phi-1
  • StarCoder2
  • CodeGen

Models with issues unrelated to the RoPE fix

These models have separate problems that need to be addressed independently:

  • Granite-MoE
  • GPT-OSS
  • Mixtral

Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
ochougul and others added 7 commits March 25, 2026 09:35
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
@quic-hemagnih quic-hemagnih merged commit 895d987 into quic:main Mar 26, 2026
5 checks passed
quic-akuruvil pushed a commit to quic-akuruvil/efficient_transformers that referenced this pull request Mar 27, 2026
quic-rishinr pushed a commit that referenced this pull request Apr 3, 2026
… align the subfunction changes. (#882)

This PR rebases dev/rebase_transformers_v4_57_3 onto main and consolidates our transformer rebase changes with the PR #880 subfunction/KV alignment, keeping the branch simpler and unifying the subfunction approach.

### What changed

- KV/subfunction alignment:
  - Applied PR #880-style wrapper changes for causal model families to reduce divergence from mainline.
  - Removed local resolve_kv_seq_len usage from the remaining wrappers (grok_1, molmo) to match the cache-native pattern used elsewhere.
  - Removed the now-unused helper resolve_kv_seq_len from QEfficient/utils/_utils.py.
- Unit test updates:
  - Added a new quickcheck unit test for use_onnx_subfunctions=True that validates decoder-block subfunction cardinality per causal model (see the sketch after this list).
  - Important: the test counts only decoder model block functions (via get_submodules_for_export()), not all ONNX helper functions, so the assertion tracks the intended behavior.
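
A rough sketch of what such a cardinality check can look like (illustrative only: the real test filters decoder block functions via get_submodules_for_export(), whereas this sketch matches on a placeholder block name and a placeholder export path):

```
# Illustrative sketch only; "exported_model.onnx" and "DecoderLayer" are
# placeholders, and the real test filters via get_submodules_for_export().
import onnx


def count_decoder_block_functions(onnx_path: str, block_name: str) -> int:
    """Count exported ONNX functions whose names refer to the decoder block."""
    model = onnx.load(onnx_path)
    return sum(block_name in fn.name for fn in model.functions)


def test_single_decoder_block_subfunction():
    # With use_onnx_subfunctions=True, expect exactly one decoder-block subfunction.
    assert count_decoder_block_functions("exported_model.onnx", "DecoderLayer") == 1
```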

### Decoder-block subfunction status (causal model list)

- Single decoder-block subfunction: falcon, gpt2, gptj, granite, llama, mistral, mpt, olmo2, phi3, qwen2
- Multiple decoder-block subfunctions: codegen, gpt_oss, mixtral, phi (Phi-1), starcoder2

### Tests verified

```
python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto
Result after subfunction-count + KV-helper cleanup: 75 passed, 1 skipped
```

cc: @anujgupt-github @quic-hemagnih

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Co-authored-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Co-authored-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Co-authored-by: vtirumal <vtirumal@qti.qualcomm.com>
tv-karthikeya pushed a commit to qcdipankar/efficient-transformers that referenced this pull request Apr 21, 2026
quic-rishinr added a commit that referenced this pull request Apr 22, 2026
…bfunctions for cached text models (#928)

## Summary

This change moves layer-invariant RoPE cos/sin indexing out of repeated
decoder-layer subfunctions and into model-level forward paths.

For cached decoder models, we were repeatedly doing:

```
cos = cos[position_ids].unsqueeze(1)
sin = sin[position_ids].unsqueeze(1)
```

inside each decoder attention block. With ONNX subfunctions enabled, that indexing becomes part of the exported repeated subfunction body and contributes to the on-device regression we observed after the single-subfunction RoPE fix work in #880.

This patch hoists that work once per forward pass and passes the
already-shaped cos/sin tensors into each decoder layer.

## What changed

Applied the refactor to the applicable QEff model families that thread
static cached RoPE tensors through repeated decoder layers, including:

- Llama
- Llama SwiftKV
- Gemma
- Gemma2
- Mistral
- Falcon
- GPT-OSS
- Granite
- GraniteMoE
- Mllama text path
- Mixtral
- Olmo2
- Phi3
- Qwen2
- Qwen3
- Qwen3 MoE
- Qwen2.5 VL text path
- Qwen3 VL text path
- Qwen3 VL MoE text path

For the Qwen VL text towers, the same idea is applied to the
indexed/interleaved MRoPE preparation: the already-indexed cos/sin
tensors are prepared once before the decoder-layer loop and reused
across layers.

## Tests

Added a TinyLlama regression test to assert that export with
subfunctions still produces a single decoder-layer ONNX function.

Verified:

`python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto`

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>