
feat(rope_fix): Hoist layer-invariant RoPE indexing out of decoder subfunctions for cached text models #928

Merged
quic-rishinr merged 5 commits into quic:main from
vbaddi:dev/fix_rope_subfunctions
Apr 22, 2026

vbaddi commented on Apr 20, 2026

Summary

This change moves layer-invariant RoPE cos/sin indexing out of repeated decoder-layer subfunctions and into model-level forward paths.

For cached decoder models, we were repeatedly doing:

cos = cos[position_ids].unsqueeze(1)
sin = sin[position_ids].unsqueeze(1)

inside each decoder attention block. With ONNX subfunctions enabled, that indexing becomes part of the repeated subfunction body in the exported graph, and it contributes to the on-device regression we observed after the single-subfunction Rope Fix work (#880).

This patch hoists that work so it runs once per forward pass, and passes the already-shaped cos/sin tensors into each decoder layer.
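To make the before/after pattern concrete, here is a minimal pure-Python sketch. It is not the QEff code: nested lists stand in for tensors, and `build_rope_tables`, `index_rope`, `forward_per_layer`, and `forward_hoisted` are hypothetical names introduced only for illustration. `index_rope` mimics `table[position_ids].unsqueeze(1)` by returning a `[batch][1][seq][dim]` nesting.

```python
import math


def build_rope_tables(max_positions, head_dim, base=10000.0):
    """Precompute static cos/sin tables with one row per position."""
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    cos, sin = [], []
    for pos in range(max_positions):
        angles = [pos * f for f in inv_freq]
        # Duplicate the half-width row to cover the full head dimension.
        cos.append([math.cos(a) for a in angles] * 2)
        sin.append([math.sin(a) for a in angles] * 2)
    return cos, sin


def index_rope(table, position_ids):
    """List analogue of table[position_ids].unsqueeze(1)."""
    return [[[table[p] for p in row]] for row in position_ids]


def forward_per_layer(num_layers, cos_table, sin_table, position_ids):
    """Old pattern: every decoder layer repeats the same gather."""
    outputs, gathers = [], 0
    for _ in range(num_layers):
        cos = index_rope(cos_table, position_ids)
        sin = index_rope(sin_table, position_ids)
        gathers += 1
        outputs.append((cos, sin))
    return outputs, gathers


def forward_hoisted(num_layers, cos_table, sin_table, position_ids):
    """New pattern: gather once at model level, reuse in every layer."""
    cos = index_rope(cos_table, position_ids)
    sin = index_rope(sin_table, position_ids)
    outputs = [(cos, sin) for _ in range(num_layers)]
    return outputs, 1


cos_table, sin_table = build_rope_tables(max_positions=16, head_dim=4)
position_ids = [[0, 1, 2], [5, 6, 7]]  # batch of 2, sequence length 3
old, old_gathers = forward_per_layer(4, cos_table, sin_table, position_ids)
new, new_gathers = forward_hoisted(4, cos_table, sin_table, position_ids)
```

Both variants hand every layer the same values. The hoisted one performs the gather once instead of once per layer, which is what keeps the indexing out of the repeated subfunction body.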

What changed

Applied the refactor to the applicable QEff model families that thread static cached RoPE tensors through repeated decoder layers, including:

  • Llama
  • Llama SwiftKV
  • Gemma
  • Gemma2
  • Mistral
  • Falcon
  • GPT-OSS
  • Granite
  • GraniteMoE
  • Mllama text path
  • Mixtral
  • Olmo2
  • Phi3
  • Qwen2
  • Qwen3
  • Qwen3 MoE
  • Qwen2.5 VL text path
  • Qwen3 VL text path
  • Qwen3 VL MoE text path

For the Qwen VL text towers, the same idea is applied to the indexed/interleaved MRoPE preparation: the already-indexed cos/sin tensors are prepared once before the decoder-layer loop and reused across layers.

Tests

Added a TinyLlama regression test to assert that export with subfunctions still produces a single decoder-layer ONNX function.
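A check of this kind could look roughly like the sketch below. It is not the test added in this PR. It assumes the function names have already been read from the exported model (with the `onnx` package, for example via `[f.name for f in onnx.load(path).functions]`), and that decoder-layer functions can be recognised by a `DecoderLayer` substring. The helper names and the example function names are hypothetical.

```python
def count_decoder_layer_functions(function_names, marker="DecoderLayer"):
    """Count exported ONNX functions whose name contains `marker`."""
    return sum(1 for name in function_names if marker in name)


def assert_single_decoder_function(function_names, marker="DecoderLayer"):
    """Fail if the export did not collapse the layers into one function."""
    count = count_decoder_layer_functions(function_names, marker)
    assert count == 1, f"expected 1 decoder-layer function, found {count}"


# Hypothetical name lists, standing in for names read from an exported model.
good_export = ["ExampleDecoderLayer", "ExampleRMSNorm"]
bad_export = ["ExampleDecoderLayer", "ExampleDecoderLayer_1", "ExampleRMSNorm"]

assert_single_decoder_function(good_export)
```

If layer-specific work leaks into the subfunction body, the exporter may emit more than one decoder-layer function, as in `bad_export`, and the assertion fails.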

Verified with:

python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto

vbaddi added the enhancement label on Apr 20, 2026
vbaddi and others added 4 commits April 21, 2026 16:12
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
…ding docstrings, Updated test

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
quic-rishinr force-pushed the dev/fix_rope_subfunctions branch from 1fe8719 to 3705f65 on April 21, 2026 10:42
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
quic-rishinr merged commit 15f8231 into quic:main on Apr 22, 2026
5 checks passed
