
Ft experimental v1 from quic-akuruvil#97

Open
quic-akuruvil wants to merge 33 commits into qraniumcitest:main from quic-akuruvil:ft_experimental_v1

Conversation

@quic-akuruvil

No description provided.

@quic-akuruvil force-pushed the ft_experimental_v1 branch 3 times, most recently from e9e7a7f to c5b43e5 on April 16, 2026 at 10:38
abukhoy and others added 4 commits April 17, 2026 13:32
In this PR, I have created three test pipelines: dummy-model execution, few-layers execution, and full-model execution.

---------

Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Model: Qwen/Qwen3-VL-30B-A3B-Instruct

Adding changes to fix the disagg mode 3 QPC output issue

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
…code/Vision/Encoder/Embedding) (quic#904)

## Summary
The backend compiler team requested a new specializations.json format where each entry carries a meaningful graph name (e.g. "Prefill", "Decode").

## Changes
- **`QEfficient/utils/_utils.py`** — new `_infer_specialization_name()` and `to_named_specializations()` helpers
- **`QEfficient/base/modeling_qeff.py`** — `_compile()` uses the new format
- **`QEfficient/compile/qnn_compiler.py`** — QNN path uses the new format
- **`QEfficient/compile/compile_helper.py`** — legacy `create_and_dump_specializations()` uses the new format

## Name inference rules
| Keys present | Assigned name |
|---|---|
| `vision_size` / `img_size` / `grid_*`, no `seq_len` | `Vision` |
| `encoder_ctx_len`, no `seq_len` | `Encoder` |
| `sequence_length`, no `seq_len` | `Embedding` |
| `seq_len != 1` | `Prefill` |
| `seq_len == 1` | `Decode` |
| anything else | `Graph_N` |
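The rules in the table above can be sketched as a small helper. This is an illustrative reimplementation only, not the actual `_infer_specialization_name()` from `QEfficient/utils/_utils.py`, whose signature and details may differ:

```python
# Illustrative sketch of the name-inference rules; the real helper in
# QEfficient/utils/_utils.py may differ in detail.
def infer_specialization_name(spec: dict, index: int) -> str:
    """Assign a graph name to one specializations.json entry."""
    if "seq_len" in spec:
        # seq_len == 1 is the token-by-token decode graph.
        return "Decode" if int(spec["seq_len"]) == 1 else "Prefill"
    if any(k in ("vision_size", "img_size") or k.startswith("grid_") for k in spec):
        return "Vision"
    if "encoder_ctx_len" in spec:
        return "Encoder"
    if "sequence_length" in spec:
        return "Embedding"
    # Fallback for entries that match none of the rules.
    return f"Graph_{index}"
```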

## Testing
21 unit tests added to `tests/unit_test/models/test_model_quickcheck.py`
covering causal LM, continuous batching, VLM vision/language, Whisper,
encoder/decoder, text embedding, and end-to-end JSON roundtrip.

cc: @anujgupt-github @quic-rishinr

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
…bfunctions for cached text models (quic#928)

## Summary

This change moves layer-invariant RoPE cos/sin indexing out of repeated
decoder-layer subfunctions and into model-level forward paths.

For cached decoder models, we were repeatedly doing:

```python
cos = cos[position_ids].unsqueeze(1)
sin = sin[position_ids].unsqueeze(1)
```

inside each decoder attention block. With ONNX subfunctions enabled, that indexing becomes part of the exported repeated subfunction body and contributes to the on-device regression we observed after the single-subfunction RoPE fix work (quic#880).

This patch hoists that work once per forward pass and passes the
already-shaped cos/sin tensors into each decoder layer.
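As a toy illustration of the hoisting, with plain Python lists standing in for tensors (these functions are illustrative, not the QEff model code):

```python
# Toy sketch: hoisting layer-invariant cos/sin indexing out of the
# per-layer loop. Plain lists stand in for tensors.

def run_layers_unhoisted(cos_table, sin_table, position_ids, num_layers):
    """Before: every decoder layer re-indexes the full cos/sin tables,
    so the indexing is repeated in each exported subfunction body."""
    outputs = []
    for _ in range(num_layers):
        cos = [cos_table[p] for p in position_ids]
        sin = [sin_table[p] for p in position_ids]
        outputs.append((cos, sin))
    return outputs

def run_layers_hoisted(cos_table, sin_table, position_ids, num_layers):
    """After: index once per forward pass and pass the already-shaped
    values into every layer."""
    cos = [cos_table[p] for p in position_ids]
    sin = [sin_table[p] for p in position_ids]
    return [(cos, sin) for _ in range(num_layers)]
```

Both versions produce identical per-layer inputs; the hoisted form simply moves the indexing outside the repeated body.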

## What changed

Applied the refactor to the applicable QEff model families that thread
static cached RoPE tensors through repeated decoder layers, including:

- Llama
- Llama SwiftKV
- Gemma
- Gemma2
- Mistral
- Falcon
- GPT-OSS
- Granite
- GraniteMoE
- Mllama text path
- Mixtral
- Olmo2
- Phi3
- Qwen2
- Qwen3
- Qwen3 MoE
- Qwen2.5 VL text path
- Qwen3 VL text path
- Qwen3 VL MoE text path

For the Qwen VL text towers, the same idea is applied to the
indexed/interleaved MRoPE preparation: the already-indexed cos/sin
tensors are prepared once before the decoder-layer loop and reused
across layers.

## Tests

Added a TinyLlama regression test to assert that export with
subfunctions still produces a single decoder-layer ONNX function.

Verified:

`python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto`

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
sudheepm-wq and others added 22 commits April 24, 2026 11:08
… v4.57.3 compatibility (quic#933)

## Problem
After the transformers v4.57.3 rebase (commit 7acb860), the following error occurs:

```
TypeError: QEffLlama4VisionModel.forward() got an unexpected keyword argument 'vision_feature_layer'
```

## Root Cause Analysis
1. In transformers v4.57.3, `Llama4ForConditionalGeneration.get_image_features()` now passes additional parameters (`vision_feature_layer` and `vision_feature_select_strategy`) to the vision model's forward method via `**kwargs`.

2. The call chain:
   - `QEffLlama4EncoderWrapper.forward()` calls `self.model.get_image_features()`
   - `get_image_features()` (from the transformers library) calls `self.vision_model(**kwargs)`
   - `QEffLlama4VisionModel.forward()` was not accepting `**kwargs`

3. Since `QEffLlama4VisionModel` overrides `forward()` but did not accept `**kwargs`, it raised a `TypeError` when these unexpected arguments were passed.

## Solution
Added **kwargs parameter to QEffLlama4VisionModel.forward() method
signature.
This allows the method to accept vision_feature_layer and
vision_feature_select_strategy
parameters from the parent class's get_image_features() method, even
though they're
not used in the QEff implementation.
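A minimal standalone illustration of the failure mode and the fix; these toy classes are stand-ins, not the actual QEff or transformers implementations:

```python
# Toy illustration: a forward() without **kwargs fails when the caller
# forwards extra keyword arguments; adding **kwargs absorbs them.
class VisionModelBroken:
    def forward(self, pixel_values):
        return pixel_values

class VisionModelFixed:
    def forward(self, pixel_values, **kwargs):
        # vision_feature_layer / vision_feature_select_strategy arrive
        # via **kwargs and are simply ignored here.
        return pixel_values

def get_image_features(vision_model, pixel_values):
    # Mimics transformers v4.57.3 forwarding extra keyword arguments
    # down to the vision model.
    return vision_model.forward(
        pixel_values,
        vision_feature_layer=-1,
        vision_feature_select_strategy="default",
    )
```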

## Impact
- Backward compatible change
- Fixes ONNX export failures for Llama4 vision models
- Maintains compatibility with transformers v4.57.3 API

Tested with: examples/image_text_to_text/models/llama4/single_image.py

Signed-off-by: sudheepm <sudheepm@qti.qualcomm.com>
Co-authored-by: sudheepm <sudheepm@qti.qualcomm.com>
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
- Added a logger that logs to both console and file; the code is similar to the existing QEff finetuning logger code.
- Also added dist_utils, a utility module for use in distributed training.
- Added logger test cases for sanity checks.
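A console-plus-file logger of this kind can be sketched with the standard `logging` module. This is a hypothetical sketch, not the actual module added in this PR, which may configure levels and formats differently:

```python
import logging

def build_logger(name: str, log_file: str) -> logging.Logger:
    """Sketch of a logger that writes to both console and a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    # One handler for the console, one for the file, same format.
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```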

---------

Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Cherry-picking PRs 697, 658, 667, 666, 656, 652, 647, 649, 645

---------

Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
…c#872)

We are only cherry-picking PRs 787, 791, 813, and 795, skipping the rebase of PR 785, and cherry-picking experimental-related branches from PRs 692 and 747.

---------

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Co-authored-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Adding a config file to support the style-remix dataset

---------

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
…ntation (quic#893)

1) Added unit test cases for Pipeline Parallelism
2) Added documentation on how to run these tests
3) Created a constants file

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Added a test case comparing loss and metrics for different SDKs against the stable SDK

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Updating the PP CLI command as per the latest changes in the config manager. Going forward, this command should also be updated whenever the single-SoC CLI command changes.

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Ann Kuruvilla and others added 6 commits April 27, 2026 10:39
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Added the following support for easy visualization of training and validation statistics:

1. A train_logger callback function that captures per-epoch time, per-epoch loss, and per-epoch perplexity.
2. The function also captures the number of trainable parameters and the number of samples in the training and eval datasets.
3. All of these are logged to a log file whose path the user can set via the --log_file_path flag in the input config .yaml file.
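A callback of this shape can be sketched as follows. The class and hook names (`on_epoch_begin`, `on_epoch_end`) are assumptions for illustration, not the actual QEff callback API:

```python
import math
import time

class TrainLoggerCallback:
    """Sketch of a callback collecting per-epoch time, loss, and
    perplexity. Hook names are illustrative, not the real API."""

    def __init__(self):
        self.records = []
        self._start = None

    def on_epoch_begin(self, epoch):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, loss):
        self.records.append({
            "epoch": epoch,
            "time_sec": time.perf_counter() - self._start,
            "loss": loss,
            # Perplexity derived from the mean cross-entropy loss.
            "perplexity": math.exp(loss),
        })
```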

Signed-off-by: abhamidi <abhamidi@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Anusha V.S Bhamidipati <abhamidi@qti.qualcomm.com>
…ults in trainer config

Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>