[QEff.finetuning] Hf config update#795

Merged
quic-akuruvil merged 52 commits into quic:ft_experimental from tchawada:final_hf
Mar 4, 2026
tchawada commented Feb 17, 2026

Added QAIC validation to config_manager.py and introduced a default value for prompt_func in the Alpaca dataset configuration. Updated the documentation accordingly.

To run the integrated test with DDP, use the following command:
QAIC_VISIBLE_DEVICES=0,1 torchrun --nproc-per-node=2 -m pytest -q QEfficient/finetune/experimental/tests/test_integrated.py
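
The command above starts two processes with two devices visible. A minimal sketch of a pre-flight check for that pairing follows. `check_visible_devices` is a hypothetical helper for illustration, not part of QEfficient and not the validation added in config_manager.py; it only assumes that `QAIC_VISIBLE_DEVICES` holds a comma-separated list of device IDs, as in the command.

```python
import os


def check_visible_devices(nproc_per_node: int, env=None) -> list[int]:
    """Hypothetical helper: parse QAIC_VISIBLE_DEVICES and confirm there is
    at least one visible device for each process torchrun will start."""
    env = os.environ if env is None else env
    raw = env.get("QAIC_VISIBLE_DEVICES", "")
    # Ignore empty tokens so that an unset or empty variable yields [].
    devices = [int(tok) for tok in raw.split(",") if tok.strip()]
    if len(devices) < nproc_per_node:
        raise ValueError(
            f"{nproc_per_node} processes requested but only "
            f"{len(devices)} device(s) visible: {devices}"
        )
    return devices
```

Calling `check_visible_devices(2, {"QAIC_VISIBLE_DEVICES": "0,1"})` mirrors the command above; passing the environment explicitly keeps the check easy to test.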

ochougul and others added 30 commits January 6, 2026 14:57
carry over patch quic#693

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Co-authored-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Co-authored-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
Updating README, custom script for 2-layer instruction for Wan

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Added step-by-step instructions for multi-node finetuning.

---------

Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Add support for multi-node Distributed Data Parallel (DDP) training to
the QEfficient finetuning pipeline. This enables scaling training across
multiple nodes while keeping the existing single-node behavior
unchanged.

Commands for DDP across 2 servers:
On the master (primary) node, use --node-rank=0:
QAIC_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=2 --nproc-per-node=4 --seed 0 --node-rank=0 --master_addr=<MASTER_NODE_IP> --master_port=8000 -m QEfficient.cloud.finetune --device qaic --enable_ddp --model_name "meta-llama/Llama-3.2-1B" --dataset alpaca_dataset --train_batch_size 1 --val_batch_size 1 --num_epochs 1 --max_train_step 200 --max_eval_step 50

On node 1, use --node-rank=1:
QAIC_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes=2 --nproc-per-node=4 --seed 0 --node-rank=1 --master_addr=<MASTER_NODE_IP> --master_port=8000 -m QEfficient.cloud.finetune --device qaic --enable_ddp --model_name "meta-llama/Llama-3.2-1B" --dataset alpaca_dataset --train_batch_size 1 --val_batch_size 1 --num_epochs 1 --max_train_step 200 --max_eval_step 50
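
For orientation, the two commands above start 2 nodes with 4 processes each. torchrun exposes each process's position through environment variables such as `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The sketch below works out the expected values under the conventional node-major layout; that layout is an assumption here, since the actual assignment is made by torchrun's rendezvous, and the function names are illustrative only.

```python
def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of processes across all nodes."""
    return nnodes * nproc_per_node


def global_rank(node_rank: int, local_rank: int, nproc_per_node: int) -> int:
    """Global rank under a node-major layout (assumed, not guaranteed)."""
    return node_rank * nproc_per_node + local_rank


# Values taken from the commands above: --nnodes=2 --nproc-per-node=4.
NNODES, NPROC = 2, 4
layout = {
    node: [global_rank(node, local, NPROC) for local in range(NPROC)]
    for node in range(NNODES)
}
```

Under that assumption, node 0 hosts global ranks 0 to 3, node 1 hosts ranks 4 to 7, and the world size is 8.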

---------

Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
QEfficient should not pass `-mdp-load-partition-config` when
`-mdp-dump-partition-config` is provided in the compiler_options of the
compile API.
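
The rule above can be sketched as a small helper. This is an illustration under stated assumptions, not the QEfficient implementation: `partition_config_args` is a hypothetical name, the options are assumed to be keyed by the literal flag, and the `flag=value` form is assumed.

```python
DUMP_FLAG = "-mdp-dump-partition-config"
LOAD_FLAG = "-mdp-load-partition-config"


def partition_config_args(compiler_options: dict[str, str], load_path: str) -> list[str]:
    """Hypothetical helper: choose which partition-config argument to pass.

    If the caller asked the compiler to dump a partition config, do not
    also ask it to load one.
    """
    if DUMP_FLAG in compiler_options:
        return [f"{DUMP_FLAG}={compiler_options[DUMP_FLAG]}"]
    return [f"{LOAD_FLAG}={load_path}"]
```

With a dump path present in the options, only the dump flag is emitted; otherwise the load flag is emitted with the given path.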

---------

Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Handled the edge case where the number of samples in a dataset is less than 20.
Corrected the dataset link in grammar_dataset.py.

Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Since CCL is deactivated by default, the CCL lists (ccl_prefill and
ccl_decode) should be None by default. In the infer.py script these lists
were not None, which activated CCL by default. This PR fixes that
issue.
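
A minimal sketch of the intended default, assuming an argparse-based script: the flag spellings and the `ccl_enabled` helper are hypothetical and are not taken from infer.py. The point is that `None`, rather than a pre-filled list, signals that CCL was not requested.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI fragment; flag names are illustrative only."""
    parser = argparse.ArgumentParser()
    # default=None means "CCL not requested". A list default would make
    # the feature look enabled to the check below.
    parser.add_argument("--ccl-prefill", type=int, nargs="+", default=None)
    parser.add_argument("--ccl-decode", type=int, nargs="+", default=None)
    return parser


def ccl_enabled(args: argparse.Namespace) -> bool:
    """CCL counts as active only if at least one list was supplied."""
    return args.ccl_prefill is not None or args.ccl_decode is not None
```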

---------

Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
In this PR:
1) We have modified the code to support PP+DDP on multi-server setup
2) Added preprocessing file for grammar dataset
3) Modified the naming convention for output dir to include the node
rank of the server

---------

Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Added a default NPI file for Gemma3.

1. The NPI file is now applied by default, so users no longer need to
pass it explicitly as an extra argument in the example script.

---------

Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Amit Raj <amitraj@qti.qualcomm.com>
Co-authored-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Co-authored-by: Amit Raj <amitraj@qti.qualcomm.com>
…WQ and FP8 models. (quic#735)

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Removed OpenGVLab/InternVL2_5-1B and OpenGVLab/InternVL3_5-1B test due
to a compiler issue to unblock the CI

---------

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Updated Qeff version to mainline

---------

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Reverts quic#741

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
The decode-only GPT-OSS model was failing when executing subfunctions
because a dynamic dim value was apparently being considered during the
reduce-sum calculation. This caused incorrect tensor reduction and
resulted in compilation errors.
The fix replaces the reduction logic with an einsum-based computation,
ensuring stable and deterministic summation regardless of dimension
shape.

---------

Signed-off-by: asmigosw <asmigosw@qti.qualcomm.com>
- updated the random sampling gold text, ids for InternVL2_5-1B

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Support to skip export, compilation if qpc already exists
 - Updated Flux, wan configs, pipelines with qpc_path changes
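
The skip-if-present behavior can be sketched as follows. This is a generic illustration, not the pipeline code: `get_or_build_qpc` is a hypothetical helper and `build` stands in for the export and compile steps.

```python
from pathlib import Path
from typing import Callable


def get_or_build_qpc(qpc_path: str, build: Callable[[], None]) -> tuple[Path, bool]:
    """Hypothetical helper: run `build` only when `qpc_path` is missing.

    Returns the path and a flag telling whether a build was performed.
    """
    path = Path(qpc_path)
    if path.exists():
        return path, False  # reuse the existing artifact
    build()  # stand-in for export + compilation
    return path, True
```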

---------

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
The sliding window (SW) issue occurred when prompt + generation length exceeded SW.

Fix
1. Cache updated with HybridSlidingWindowCache in cache utils
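
To illustrate why the case prompt + generation length > SW needs care, here is a generic ring-buffer sketch of a sliding-window cache. It is not the HybridSlidingWindowCache code; the function name and the modulo slot rule are assumptions chosen for illustration.

```python
def retained_positions(total_len: int, sliding_window: int) -> list[int]:
    """Token positions still held after writing `total_len` tokens into a
    ring buffer with `sliding_window` slots (slot = position % window)."""
    slots = [None] * sliding_window
    for pos in range(total_len):
        # Once pos >= sliding_window, this overwrites the oldest entry.
        slots[pos % sliding_window] = pos
    return sorted(p for p in slots if p is not None)
```

While the total length fits in the window, every position is kept; beyond that, only the most recent `sliding_window` positions survive, so any code that indexes the cache by absolute position has to account for the wrap-around.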

---------

Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Fix gemma3 to support cb with new SW code

Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
This PR fixes subfunction-based export issues for the following models:

1. `bigcode/starcoder`  
2. `ibm-granite/granite-20b-code-base-8k`  
3. `ibm-granite/granite-20b-code-instruct-8k`  
4. `Qwen3-30B-A3B-Instruct-2507`  
5. `Mixtral-8x7B`

In addition, it updates the Causal LM subfunction test file to make it
more robust and resilient across models.

---------

Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Updated the mainline version to 1.22.0.dev0

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
qaic-exec is going to be deprecated. Updated the code to use the new
qaic-compile for the compile API.

---------

Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
- skip subfn handling in export utils for diffusers, we handle this in
export() of diffuser models

---------

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Co-authored-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
quic-akuruvil commented Feb 26, 2026

Added QAIC validation to config_manager.py and introduced a default value for prompt_func in the Alpaca dataset configuration. Updated same in documentation.

To run integrated_test for DDP use following command: QAIC_VISIBLE_DEVICES=20,21 torchrun --nproc-per-node=2 -m pytest -q /home/tchawada/optimizer_task/final/QEff_tanisha/QEfficient/finetune/experimental/tests/test_integrated.py

Please use the QEfficient relative path here and remove the absolute local path: QEfficient/finetune/experimental/tests/test_integrated.py

logger.warning(f"Training loss: {train_result.training_loss:.4f}")
logger.warning(f"Evaluation loss: {eval_result['eval_loss']:.4f}")
assert abs(train_result.training_loss - eval_result["eval_loss"]) < TRAIN_EVAL_EPOCH_LOSS_DIFF_THRESHOLD

quic-akuruvil Feb 26, 2026

In the mainline tests we also used to match against the reference loss values (the reference being the previous stable SDK, added in reference_data.py). Please include that assertion too.
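
A minimal sketch of how the two checks could sit side by side. The threshold, the tolerance, and the `check_losses` helper are hypothetical values and names for illustration; the real constants and reference losses live in the test suite and are not shown in this thread.

```python
import math

# Hypothetical values; the real ones are defined in the test suite.
TRAIN_EVAL_EPOCH_LOSS_DIFF_THRESHOLD = 0.5
REFERENCE_LOSS_TOLERANCE = 1e-2


def check_losses(train_loss: float, eval_loss: float,
                 ref_train_loss: float, ref_eval_loss: float) -> None:
    # Existing check from the snippet above: train and eval loss stay close.
    assert abs(train_loss - eval_loss) < TRAIN_EVAL_EPOCH_LOSS_DIFF_THRESHOLD
    # Requested check: losses match the reference values within a tolerance.
    assert math.isclose(train_loss, ref_train_loss, abs_tol=REFERENCE_LOSS_TOLERANCE)
    assert math.isclose(eval_loss, ref_eval_loss, abs_tol=REFERENCE_LOSS_TOLERANCE)
```

An absolute tolerance is used here because loss values are compared on the same scale as the reference; a relative tolerance would be an equally reasonable choice.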

Contributor

def test_finetune_assert(


# Optimizer configuration
optimizers:
-  optimizer_name: "adamw"
+  optimizer_name: "AdamW"
Contributor

Are we using AdamW everywhere in code as well?

Contributor Author

Yes
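
The question about consistent spelling matters most if the name is looked up by exact match. The sketch below shows that failure mode with a hypothetical registry and stand-in classes; whether QEfficient resolves optimizer names this way is not stated in this thread.

```python
class AdamW:  # stand-in for a real optimizer class
    pass


class SGD:  # stand-in for a real optimizer class
    pass


# Hypothetical registry keyed by the exact name used in the YAML config.
OPTIMIZERS = {"AdamW": AdamW, "SGD": SGD}


def resolve_optimizer(optimizer_name: str) -> type:
    """Exact-match lookup: "AdamW" resolves, "adamw" does not."""
    try:
        return OPTIMIZERS[optimizer_name]
    except KeyError:
        known = ", ".join(sorted(OPTIMIZERS))
        raise ValueError(
            f"Unknown optimizer {optimizer_name!r}; expected one of: {known}"
        ) from None
```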

SEQ_CLS = "AutoModelForSequenceClassification"
SEQ_2_SEQ_LM = "AutoModelForSeq2SeqLM"


Contributor

L42 till end of file.
If these constants are used only in test cases, it would be better to move them to the constants.py file in the tests directory.

Contributor Author

Okay

default=1,
metadata={"help": "Number of workers for the DataLoader."},
)
remove_samples_with_empty_columns: bool = field(
Contributor

This is an additional argument. In our case we will always need this. Is there any reason to expose it to the user?

Contributor Author

In the dataset file we take this from kwargs, so I added it. Should I remove it?
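
For context, a trimmed-down sketch of what exposing the flag as a config field looks like, and how a dataset could consume it. `DatasetConfig`, `filter_samples`, the `True` default, and the help text for the new field are assumptions for illustration; only the field name and the `num_workers` field come from the snippet above.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetConfig:  # hypothetical, trimmed-down config class
    num_workers: int = field(
        default=1,
        metadata={"help": "Number of workers for the DataLoader."},
    )
    remove_samples_with_empty_columns: bool = field(
        default=True,  # assumed default, not confirmed by the PR
        metadata={"help": "Drop samples that have an empty value in any column."},
    )


def filter_samples(samples: list[dict], config: DatasetConfig) -> list[dict]:
    """Keep only samples whose columns are all non-empty, if enabled."""
    if not config.remove_samples_with_empty_columns:
        return samples
    return [s for s in samples if all(v not in (None, "") for v in s.values())]
```

If the behavior is always required, as the reviewer suggests, the alternative is to drop the field and filter unconditionally inside the dataset.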

Comment thread QEfficient/finetune/experimental/core/config_manager.py
Comment thread QEfficient/finetune/experimental/tests/test_config.yaml
Comment thread QEfficient/finetune/experimental/tests/test_dataset.py
Comment thread QEfficient/finetune/experimental/tests/test_integrated.py Outdated
Comment thread QEfficient/utils/device_utils.py
Comment thread QEfficient/cloud/finetune_experimental.py
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
quic-akuruvil left a comment

Add a sample config here, with comments describing the config, e.g. sft_single_device_config.yaml in QEfficient/finetune/experimental/configs/sample_config.yaml

tchawada added 3 commits March 3, 2026 16:05
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Comment thread docs/source/hf_finetune.md
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Comment thread docs/source/hf_finetune.md Outdated
Contributor

Is this (infer) tested and verified? @tchawada

tchawada added 2 commits March 4, 2026 14:58
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
@quic-akuruvil quic-akuruvil merged commit 0c49669 into quic:ft_experimental Mar 4, 2026
3 checks passed
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 6, 2026
To run integrated_test for DDP use following command:
QAIC_VISIBLE_DEVICES=0,1 torchrun --nproc-per-node=2 -m pytest -q
QEfficient/finetune/experimental/tests/test_integrated.py

---------

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 6, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 6, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 8, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 8, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 8, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 9, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 10, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 17, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 23, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 23, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 23, 2026
smedhe pushed a commit to smedhe/QEff_Sharvari that referenced this pull request Mar 24, 2026