
Ft experimental v1 from quic-akuruvil#97

Open
quic-akuruvil wants to merge 33 commits into qraniumcitest:main from quic-akuruvil:ft_experimental_v1

Conversation

@quic-akuruvil

No description provided.

@quic-akuruvil force-pushed the ft_experimental_v1 branch 3 times, most recently from e9e7a7f to c5b43e5 on April 16, 2026 at 10:38
abukhoy and others added 4 commits April 17, 2026 13:32
In this PR, I have created three test pipelines: dummy-model execution, few-layers execution, and full-model execution.

---------

Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Model: Qwen/Qwen3-VL-30B-A3B-Instruct

Adding changes to fix the disagg mode 3 QPC output issue

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
…code/Vision/Encoder/Embedding) (quic#904)

## Summary
The backend compiler team requested a new specializations.json format where each entry carries a meaningful graph name (e.g. "Prefill", "Decode").

## Changes
- **`QEfficient/utils/_utils.py`** — new `_infer_specialization_name()` and `to_named_specializations()` helpers
- **`QEfficient/base/modeling_qeff.py`** — `_compile()` uses the new format
- **`QEfficient/compile/qnn_compiler.py`** — QNN path uses the new format
- **`QEfficient/compile/compile_helper.py`** — legacy `create_and_dump_specializations()` uses the new format

## Name inference rules
| Keys present | Assigned name |
|---|---|
| `vision_size` / `img_size` / `grid_*`, no `seq_len` | `Vision` |
| `encoder_ctx_len`, no `seq_len` | `Encoder` |
| `sequence_length`, no `seq_len` | `Embedding` |
| `seq_len != 1` | `Prefill` |
| `seq_len == 1` | `Decode` |
| anything else | `Graph_N` |
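The rules in the table above can be sketched as a small helper. This is an illustrative reimplementation only, not the actual `_infer_specialization_name()` from `QEfficient/utils/_utils.py`, whose signature and details may differ:

```python
# Illustrative sketch of the name-inference rules; the real helper in
# QEfficient/utils/_utils.py may differ in detail.
def infer_specialization_name(spec: dict, index: int) -> str:
    """Assign a graph name to one specializations.json entry."""
    if "seq_len" in spec:
        # seq_len == 1 is the token-by-token decode graph.
        return "Decode" if int(spec["seq_len"]) == 1 else "Prefill"
    if any(k in ("vision_size", "img_size") or k.startswith("grid_") for k in spec):
        return "Vision"
    if "encoder_ctx_len" in spec:
        return "Encoder"
    if "sequence_length" in spec:
        return "Embedding"
    # Fallback for entries that match none of the rules.
    return f"Graph_{index}"
```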

## Testing
21 unit tests added to `tests/unit_test/models/test_model_quickcheck.py`
covering causal LM, continuous batching, VLM vision/language, Whisper,
encoder/decoder, text embedding, and end-to-end JSON roundtrip.

cc: @anujgupt-github @quic-rishinr

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
…bfunctions for cached text models (quic#928)

## Summary

This change moves layer-invariant RoPE cos/sin indexing out of repeated
decoder-layer subfunctions and into model-level forward paths.

For cached decoder models, we were repeatedly doing:

```python
cos = cos[position_ids].unsqueeze(1)
sin = sin[position_ids].unsqueeze(1)
```

inside each decoder attention block. With ONNX subfunctions enabled, that indexing becomes part of the exported repeated subfunction body and contributes to the on-device regression we observed after the single-subfunction RoPE fix work (quic#880).

This patch hoists that work once per forward pass and passes the
already-shaped cos/sin tensors into each decoder layer.
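As a toy illustration of the hoisting, with plain Python lists standing in for tensors (these functions are illustrative, not the QEff model code):

```python
# Toy sketch: hoisting layer-invariant cos/sin indexing out of the
# per-layer loop. Plain lists stand in for tensors.

def run_layers_unhoisted(cos_table, sin_table, position_ids, num_layers):
    """Before: every decoder layer re-indexes the full cos/sin tables,
    so the indexing is repeated in each exported subfunction body."""
    outputs = []
    for _ in range(num_layers):
        cos = [cos_table[p] for p in position_ids]
        sin = [sin_table[p] for p in position_ids]
        outputs.append((cos, sin))
    return outputs

def run_layers_hoisted(cos_table, sin_table, position_ids, num_layers):
    """After: index once per forward pass and pass the already-shaped
    values into every layer."""
    cos = [cos_table[p] for p in position_ids]
    sin = [sin_table[p] for p in position_ids]
    return [(cos, sin) for _ in range(num_layers)]
```

Both versions produce identical per-layer inputs; the hoisted form simply moves the indexing outside the repeated body.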

## What changed

Applied the refactor to the applicable QEff model families that thread
static cached RoPE tensors through repeated decoder layers, including:

- Llama
- Llama SwiftKV
- Gemma
- Gemma2
- Mistral
- Falcon
- GPT-OSS
- Granite
- GraniteMoE
- Mllama text path
- Mixtral
- Olmo2
- Phi3
- Qwen2
- Qwen3
- Qwen3 MoE
- Qwen2.5 VL text path
- Qwen3 VL text path
- Qwen3 VL MoE text path

For the Qwen VL text towers, the same idea is applied to the
indexed/interleaved MRoPE preparation: the already-indexed cos/sin
tensors are prepared once before the decoder-layer loop and reused
across layers.

## Tests

Added a TinyLlama regression test to assert that export with
subfunctions still produces a single decoder-layer ONNX function.

Verified:

`python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto`

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
sudheepm-wq and others added 22 commits April 24, 2026 11:08
… v4.57.3 compatibility (quic#933)

## Problem
After the transformers v4.57.3 rebase (commit 7acb860), the following error occurs:

```
TypeError: QEffLlama4VisionModel.forward() got an unexpected keyword argument 'vision_feature_layer'
```

## Root Cause Analysis
1. In transformers v4.57.3, `Llama4ForConditionalGeneration.get_image_features()` now passes additional parameters (`vision_feature_layer` and `vision_feature_select_strategy`) to the vision model's forward method via `**kwargs`.

2. The call chain:
   - `QEffLlama4EncoderWrapper.forward()` calls `self.model.get_image_features()`
   - `get_image_features()` (from the transformers library) calls `self.vision_model(**kwargs)`
   - `QEffLlama4VisionModel.forward()` was not accepting `**kwargs`

3. Since `QEffLlama4VisionModel` overrides `forward()` but did not accept `**kwargs`, it raised a `TypeError` when these unexpected arguments were passed.

## Solution
Added **kwargs parameter to QEffLlama4VisionModel.forward() method
signature.
This allows the method to accept vision_feature_layer and
vision_feature_select_strategy
parameters from the parent class's get_image_features() method, even
though they're
not used in the QEff implementation.
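A minimal standalone illustration of the failure mode and the fix; these toy classes are stand-ins, not the actual QEff or transformers implementations:

```python
# Toy illustration: a forward() without **kwargs fails when the caller
# forwards extra keyword arguments; adding **kwargs absorbs them.
class VisionModelBroken:
    def forward(self, pixel_values):
        return pixel_values

class VisionModelFixed:
    def forward(self, pixel_values, **kwargs):
        # vision_feature_layer / vision_feature_select_strategy arrive
        # via **kwargs and are simply ignored here.
        return pixel_values

def get_image_features(vision_model, pixel_values):
    # Mimics transformers v4.57.3 forwarding extra keyword arguments
    # down to the vision model.
    return vision_model.forward(
        pixel_values,
        vision_feature_layer=-1,
        vision_feature_select_strategy="default",
    )
```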

## Impact
- Backward compatible change
- Fixes ONNX export failures for Llama4 vision models
- Maintains compatibility with transformers v4.57.3 API

Tested with: examples/image_text_to_text/models/llama4/single_image.py

Signed-off-by: sudheepm <sudheepm@qti.qualcomm.com>
Co-authored-by: sudheepm <sudheepm@qti.qualcomm.com>
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
- Added a logger that logs to both console and file; the code is similar to the existing QEff finetuning logger code.
- Also added dist_utils, a utility module for use in distributed training.
- Added logger test cases for sanity checks.
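A console-plus-file logger of this kind can be sketched with the standard `logging` module. This is a hypothetical sketch, not the actual module added in this PR, which may configure levels and formats differently:

```python
import logging

def build_logger(name: str, log_file: str) -> logging.Logger:
    """Sketch of a logger that writes to both console and a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    # One handler for the console, one for the file, same format.
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```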

---------

Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Cherry-picking PRs 697, 658, 667, 666, 656, 652, 647, 649, 645

---------

Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
…c#872)

We are only cherry-picking PRs 787, 791, 813, and 795, skipping the rebase of PR 785, and cherry-picking experimental-related branches from PRs 692 and 747.

---------

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Co-authored-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Adding a config file to support the style-remix dataset

---------

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
…ntation (quic#893)

1) Added unit test cases for Pipeline Parallelism
2) Added documentation on how to run these tests
3) Created a constants file

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Added a test case comparing loss and metrics for different SDKs against the stable SDK

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Updating the PP CLI command as per the latest changes in the config manager. Going forward, this command should also be updated whenever the single-SoC CLI command changes.

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Ann Kuruvilla and others added 6 commits April 27, 2026 10:39
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Added the following support for easy visualization of training and validation statistics:

1. A train_logger callback function that captures per-epoch time, per-epoch loss, and per-epoch perplexity.
2. The function also captures the number of trainable parameters and the number of samples in the training and eval datasets.
3. All of these are logged to a log file whose path the user can set via the --log_file_path flag in the input config .yaml file.
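A callback of this shape can be sketched as follows. The class and hook names (`on_epoch_begin`, `on_epoch_end`) are assumptions for illustration, not the actual QEff callback API:

```python
import math
import time

class TrainLoggerCallback:
    """Sketch of a callback collecting per-epoch time, loss, and
    perplexity. Hook names are illustrative, not the real API."""

    def __init__(self):
        self.records = []
        self._start = None

    def on_epoch_begin(self, epoch):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, loss):
        self.records.append({
            "epoch": epoch,
            "time_sec": time.perf_counter() - self._start,
            "loss": loss,
            # Perplexity derived from the mean cross-entropy loss.
            "perplexity": math.exp(loss),
        })
```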

Signed-off-by: abhamidi <abhamidi@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Anusha V.S Bhamidipati <abhamidi@qti.qualcomm.com>
…ults in trainer config

Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>