[QEff. Finetune]: Added Base dataset class and SFT dataset classes along with its test cases. #647
Merged
quic-meetkuma merged 5 commits into quic:ft_experimental on Dec 5, 2025
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Updated the dataset.py file to only enable support for SFTDataset types. Created test file to check the functionalities accordingly. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
quic-meetkuma (Contributor) requested changes on Dec 2, 2025 and left a comment:
Overall it has really good changes and extended test cases. Some minor polishing is needed before merge. Thanks. :)
        raise RuntimeError("Either provide completion_template or completion_func in the config.")

    # Call parent class __init__ which will call _initialize_dataset
    super().__init__(dataset_name, split, seed, **kwargs)

        self.dataset = splitted_dataset["train"]
    else:
        # Load dataset from HuggingFace
        db = load_dataset_builder(self.dataset_name)
Contributor
This is a good addition over load_dataset.
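For context, a minimal sketch of why a builder lookup can be preferable to calling load_dataset directly: load_dataset_builder exposes dataset metadata, such as the declared split names, without downloading the data, so an unknown split can be rejected early. The helper names `resolve_split` and `available_splits` are hypothetical and not taken from this PR; how SFTDataset actually uses the builder is not visible in the quoted diff.

```python
from typing import Iterable, List


def resolve_split(requested: str, available: Iterable[str]) -> str:
    """Return `requested` if it is among `available`, else raise ValueError."""
    names = list(available)
    if requested not in names:
        raise ValueError(f"Split '{requested}' not found; available splits: {names}")
    return requested


def available_splits(dataset_name: str) -> List[str]:
    """List split names declared in the dataset metadata.

    Hypothetical helper: requires the `datasets` package and network access.
    """
    # Imported lazily so the pure helper above works without the library installed.
    from datasets import load_dataset_builder

    builder = load_dataset_builder(dataset_name)
    # `info.splits` can be None when the metadata does not declare any splits.
    return list(builder.info.splits or {})
```

With this shape, the split check can be unit-tested without touching the Hub, while the metadata lookup stays in one small function.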
Reduced the use of MagicMock for creating datasets to a minimum. Couldn't find a dummy HF dataset for the SFT task, so the tests use a dummy dataset for that purpose. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
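One way to build such a dummy dataset without mocks is to write a few records to a temporary JSON file and read them back through a normal file-loading path. The sketch below only illustrates that idea and is not the PR's test code; the field names `question` and `answer` and the helper `write_dummy_sft_json` are hypothetical.

```python
import json
import os
import tempfile


def write_dummy_sft_json(directory: str, num_rows: int = 4) -> str:
    """Write a tiny prompt/completion dataset to a JSON file and return its path."""
    rows = [
        {"question": f"What is {i} + {i}?", "answer": str(2 * i)}
        for i in range(num_rows)
    ]
    path = os.path.join(directory, "dummy_sft.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f)
    return path


# Round-trip the file to confirm it contains real records rather than mock objects.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = write_dummy_sft_json(tmp_dir)
    with open(json_path, encoding="utf-8") as f:
        loaded_rows = json.load(f)
```

Because the rows are real data on disk, tests built this way exercise parsing and templating code instead of asserting on mock calls.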
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
quic-meetkuma (Contributor) requested changes on Dec 5, 2025 and left a comment:
Minor comments; once they are addressed we can merge. Really good suite of test cases, covering almost all the cases of the SFTDataset class.
Thanks
Moved apply_train_test_split to dataset_utils.py. Added a check for JSON file path validity, along with a test for it. The _setup_template method doesn't modify self.dataset directly, and neither does apply_train_test_split. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
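The two ideas in this commit message, validating the JSON path up front and splitting without mutating the input, can be sketched as below. This is an illustration on plain Python lists under stated assumptions, not the code in dataset_utils.py: the names `validate_json_path` and `split_rows`, their signatures, and the accepted file extensions are all hypothetical.

```python
import os
import random
from typing import List, Sequence, Tuple


def validate_json_path(json_file_path: str) -> str:
    """Raise early if the path is not an existing .json/.jsonl file (assumed rule)."""
    if not json_file_path.endswith((".json", ".jsonl")):
        raise ValueError(f"Expected a .json or .jsonl file, got: {json_file_path}")
    if not os.path.isfile(json_file_path):
        raise FileNotFoundError(f"JSON file not found: {json_file_path}")
    return json_file_path


def split_rows(
    rows: Sequence[dict], test_fraction: float, seed: int
) -> Tuple[List[dict], List[dict]]:
    """Return (train, test) lists; `rows` itself is left unmodified."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    # A seeded local RNG keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(indices)
    num_test = int(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:num_test]]
    train = [rows[i] for i in indices[num_test:]]
    return train, test
```

Returning new objects instead of assigning to an attribute makes the helper easy to test in isolation and leaves the caller to decide which part to keep.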
quic-meetkuma (Contributor) approved these changes on Dec 5, 2025 and left a comment:
LGTM. Let us merge it.
quic-dhirajku
added a commit
to quic-dhirajku/efficient-transformers
that referenced
this pull request
Jan 2, 2026
…ong with its test cases. (quic#647) Edited the SFTDataset class to enable custom dataset loading. Updated the dataset.py file to only enable support for SFTDataset type. Created test file to check the functionalities. --------- Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
quic-dhirajku
added a commit
that referenced
this pull request
Jan 2, 2026
…ong with its test cases. (#647) Edited the SFTDataset class to enable custom dataset loading. Updated the dataset.py file to only enable support for SFTDataset type. Created test file to check the functionalities. --------- Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Edited the SFTDataset class to enable custom dataset loading.
Updated the dataset.py file to only enable support for SFTDataset type.
Created test file to check the functionalities.
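The RuntimeError message quoted in the review ("Either provide completion_template or completion_func in the config.") suggests that a completion can be produced either from a format string or from a user-supplied function. A minimal sketch of that choice follows. It assumes `completion_func` is already a callable and that the template wins if both are given; neither detail is confirmed by the excerpt, and `build_completion_formatter` is a hypothetical name.

```python
from typing import Callable, Optional


def build_completion_formatter(
    completion_template: Optional[str] = None,
    completion_func: Optional[Callable[[dict], str]] = None,
) -> Callable[[dict], str]:
    """Return a function that maps one dataset example to its completion text."""
    if completion_template is None and completion_func is None:
        # Same message as the one quoted in the review diff.
        raise RuntimeError(
            "Either provide completion_template or completion_func in the config."
        )
    if completion_template is not None:
        # Assumption: the template takes precedence when both are supplied.
        return lambda example: completion_template.format(**example)
    return completion_func
```

For example, `build_completion_formatter(completion_template="Answer: {answer}")` returns a formatter that fills the `answer` field of each example into the template.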