[QEff. Finetune]: Added Base dataset class and SFT dataset classes along with its test cases. #647

Merged
quic-meetkuma merged 5 commits into quic:ft_experimental from quic-dhirajku:ft_datasets
Dec 5, 2025

@quic-dhirajku
Edited the SFTDataset class to enable custom dataset loading.
Updated the dataset.py file to only enable support for SFTDataset type.
Created test file to check the functionalities.

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Updated the dataset.py file to only enable support for SFTDataset types.
Created test file to check the functionalities accordingly.

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
@quic-meetkuma changed the title from "Ft datasets" to "[QEff. Finetune]: Added Base dataset class and SFT dataset classes along with its test cases." on Dec 2, 2025
Comment thread QEfficient/finetune/experimental/tests/test_dataset.py Outdated
@quic-meetkuma left a comment

Overall it has really good changes and extended test cases. Some minor polishing is needed before merge. Thanks. :)

Comment thread QEfficient/finetune/experimental/core/dataset.py Outdated
Comment thread QEfficient/finetune/experimental/core/dataset.py Outdated
Comment thread QEfficient/finetune/experimental/core/dataset.py
raise RuntimeError("Either provide completion_template or completion_func in the config.")

# Call parent class __init__ which will call _initialize_dataset
super().__init__(dataset_name, split, seed, **kwargs)

good cleanup in init
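
For readers following the thread, here is a minimal sketch of the constructor pattern the quoted lines suggest: the subclass validates its own config first, then defers to the parent `__init__`, which in turn calls `_initialize_dataset`. The class bodies, the default `seed`, the keyword names other than `completion_template` and `completion_func`, and the placeholder rows are all assumptions for illustration, not the code from this PR. The exact condition guarding the `RuntimeError` is also assumed.

```python
from abc import ABC, abstractmethod


class BaseDataset(ABC):
    """Hypothetical base class: stores common arguments, then delegates loading."""

    def __init__(self, dataset_name, split, seed=42, **kwargs):
        self.dataset_name = dataset_name
        self.split = split
        self.seed = seed
        self.kwargs = kwargs
        # Subclasses decide how the data is actually loaded.
        self._initialize_dataset()

    @abstractmethod
    def _initialize_dataset(self):
        """Populate self.dataset."""


class SFTDataset(BaseDataset):
    """Hypothetical SFT dataset: validates config before the parent loads data."""

    def __init__(self, dataset_name, split, seed=42,
                 completion_template=None, completion_func=None, **kwargs):
        # Assumed check: at least one way to build the completion is required.
        if completion_template is None and completion_func is None:
            raise RuntimeError(
                "Either provide completion_template or completion_func in the config."
            )
        # Attributes must be set before super().__init__, because the parent
        # constructor immediately calls _initialize_dataset.
        self.completion_template = completion_template
        self.completion_func = completion_func
        super().__init__(dataset_name, split, seed, **kwargs)

    def _initialize_dataset(self):
        # Placeholder rows; a real implementation would load JSON or a HF dataset.
        self.dataset = [{"question": "2+2?", "answer": "4"}]
```

The ordering matters in this pattern: anything `_initialize_dataset` reads has to exist on `self` before the `super().__init__` call.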

self.dataset = splitted_dataset["train"]
else:
# Load dataset from HuggingFace
db = load_dataset_builder(self.dataset_name)

This is a good addition over load_dataset.
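
One plausible reason a builder helps here is that `load_dataset_builder` exposes dataset metadata (for example split names through `builder.info.splits`, which may be empty for some datasets) without loading the data itself. The sketch below shows only the decision that such metadata enables. The helper name `resolve_split` and its return convention are hypothetical and not taken from this PR.

```python
def resolve_split(available_splits, requested, fallback="train"):
    """Decide which split to load and whether a manual split is still needed.

    available_splits: iterable of split names, e.g. taken from builder metadata.
    Returns (split_to_load, needs_manual_split).
    """
    available = list(available_splits)
    if requested in available:
        # The dataset already ships the requested split; load it directly.
        return requested, False
    if fallback in available:
        # Only the fallback split exists; load it and carve out the
        # requested part afterwards.
        return fallback, True
    raise ValueError(
        f"Neither '{requested}' nor '{fallback}' found in splits: {available}"
    )
```

With this shape, a dataset that only publishes `train` can still serve a `test` request by loading `train` and splitting it afterwards.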

Comment thread QEfficient/finetune/experimental/tests/test_dataset.py Outdated
Comment thread QEfficient/finetune/experimental/tests/test_dataset.py Outdated
Comment thread QEfficient/finetune/experimental/tests/test_dataset.py Outdated
Comment thread QEfficient/finetune/experimental/tests/test_dataset.py Outdated
Comment thread QEfficient/finetune/experimental/tests/test_dataset.py Outdated
Reduced the use of MagicMock for creating datasets to a minimum.
Couldn't find a dummy HF dataset for the SFT task, so a dummy dataset is used for that purpose.

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
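
As an illustration of the approach described in this commit message, a test can write a tiny dataset to a temporary JSON file instead of mocking the dataset object. This is a sketch under assumptions: the helper names, the file name, and the `question`/`answer` fields are hypothetical and not taken from the PR's test file.

```python
import json
import os
import tempfile


def write_dummy_sft_json(directory, num_rows=4):
    """Write a small question/answer dataset to a JSON file and return its path."""
    rows = [
        {"question": f"What is {i} + {i}?", "answer": str(2 * i)}
        for i in range(num_rows)
    ]
    path = os.path.join(directory, "dummy_sft.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle)
    return path


def load_rows(path):
    """Read the rows back; stands in for the dataset class under test."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Usage: the temporary directory is removed automatically after the block.
with tempfile.TemporaryDirectory() as tmp_dir:
    dummy_path = write_dummy_sft_json(tmp_dir, num_rows=3)
    loaded = load_rows(dummy_path)
```

A real file on disk exercises the loading path end to end, which a MagicMock stand-in would bypass.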

@quic-meetkuma left a comment

Minor comments; once those are addressed we can merge. Really good suite of test cases, covering almost all the cases of the SFTDataset class.

Thanks

Comment thread QEfficient/finetune/experimental/core/dataset.py
Comment thread QEfficient/finetune/experimental/core/dataset.py Outdated
Comment thread QEfficient/finetune/experimental/core/dataset.py Outdated
Comment thread QEfficient/finetune/experimental/core/dataset.py Outdated
Moved apply_train_test_split to dataset_utils.py.
Added a validity check for the JSON file path, along with a test for it.
The _setup_template method no longer modifies self.dataset directly; the same applies to apply_train_test_split.

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
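
The changes in this commit message can be sketched as two side-effect-free helpers: one that validates a JSON path up front, and one that returns the requested portion of a split rather than mutating the caller's data. The signatures, the ratio semantics, and the use of plain lists are assumptions for illustration. The actual `apply_train_test_split` in dataset_utils.py may differ.

```python
import os
import random


def validate_json_path(json_file_path):
    """Raise early if the path is empty or does not point to an existing file."""
    if not json_file_path or not os.path.isfile(json_file_path):
        raise FileNotFoundError(
            f"JSON file not found or invalid: '{json_file_path}'"
        )
    return json_file_path


def apply_train_test_split(rows, split_ratio, split, seed):
    """Return the 'train' or 'test' portion of rows without modifying rows.

    split_ratio is the assumed fraction of rows assigned to 'train'.
    """
    if split not in ("train", "test"):
        raise ValueError(f"Unknown split: '{split}'")
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must be strictly between 0 and 1")
    # Shuffle indices, not the data, so the input list stays untouched.
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(rows) * split_ratio)
    chosen = indices[:cut] if split == "train" else indices[cut:]
    return [rows[i] for i in chosen]
```

Because the same seed yields the same shuffle, calling the helper once for `train` and once for `test` gives disjoint portions that together cover the input.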

@quic-meetkuma left a comment

LGTM. Let us merge it.

@quic-akuruvil left a comment

LGTM

@quic-meetkuma merged commit 5cd3fd1 into quic:ft_experimental on Dec 5, 2025
3 of 4 checks passed
quic-dhirajku added a commit to quic-dhirajku/efficient-transformers that referenced this pull request Jan 2, 2026
…ong with its test cases. (quic#647)

Edited the SFTDataset class to enable custom dataset loading.
Updated the dataset.py file to only enable support for SFTDataset type.
Created test file to check the functionalities.

---------

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
quic-dhirajku added a commit that referenced this pull request Jan 2, 2026
…ong with its test cases. (#647)
