Fix: Allow the user to select whether to drop the last sentence #51
Merged
abhinadduri (Collaborator) reviewed on Jul 23, 2025 and left a comment:
To avoid dropping the same cell sentence each time, can we randomize the order of the mini-batches each epoch?
Author (Collaborator) replied:
As I understand it, after merging #50 we create new batches every epoch, so the contents of the last partial batch change from epoch to epoch. I therefore don't think we are dropping the same set of sentences every epoch. To make sure, I could add a unit test that checks this.
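The check proposed above could look roughly like the following. This is a minimal sketch rather than the project's actual test: `leftover_cells` is a hypothetical helper, and the per-epoch reshuffle seeded by the epoch number is an assumption standing in for whatever batch regeneration #50 introduced.

```python
import random
from typing import FrozenSet, List


def leftover_cells(n_cells: int, cell_sentence_len: int, epoch: int) -> FrozenSet[int]:
    """Return the cells that land in the trailing partial sentence for one epoch."""
    order: List[int] = list(range(n_cells))
    # Hypothetical per-epoch reshuffle, seeded by the epoch number for reproducibility.
    random.Random(epoch).shuffle(order)
    n_leftover = n_cells % cell_sentence_len
    return frozenset(order[n_cells - n_leftover:]) if n_leftover else frozenset()


leftovers = [leftover_cells(n_cells=10, cell_sentence_len=4, epoch=e) for e in range(20)]
# If batches are rebuilt each epoch, the partial sentence should not be identical every time.
varies = len(set(leftovers)) > 1
print(varies)
```

A real test would build the batches through the data module itself and compare the last partial batch across epochs in the same way.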
abhinadduri approved these changes on Aug 1, 2025.
This PR adds a configurable `drop_last` parameter to the `PerturbationDataModule` class, giving users control over whether incomplete sentence sets are dropped during data loading.

Changes
- New parameter: Added a `drop_last: bool = False` parameter to `PerturbationDataModule.__init__()`.
- Documentation: Updated the docstring to describe the new parameter's behavior.
- Implementation: The parameter now controls the `drop_last` behavior in `PerturbationBatchSampler` instead of being hardcoded to `False`.

Behavior
- When `drop_last=True`, sentence sets smaller than `cell_sentence_len` are dropped during data loading.
- The default value remains `False` to preserve backward compatibility, so incomplete sentence sets are kept by default.

This PR passes all existing unit tests.
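To illustrate the behavior described above, here is a minimal, self-contained sketch. It does not use the project's `PerturbationDataModule` or `PerturbationBatchSampler`; `make_sentences` is a hypothetical helper that only mimics the stated semantics of `drop_last` with respect to `cell_sentence_len`.

```python
from typing import List, Sequence


def make_sentences(
    cells: Sequence[int], cell_sentence_len: int, drop_last: bool = False
) -> List[List[int]]:
    """Split cell indices into consecutive sentence sets of up to cell_sentence_len cells."""
    sentences = [
        list(cells[i:i + cell_sentence_len])
        for i in range(0, len(cells), cell_sentence_len)
    ]
    # With drop_last=True, discard a trailing set shorter than cell_sentence_len.
    if drop_last and sentences and len(sentences[-1]) < cell_sentence_len:
        sentences.pop()
    return sentences


cells = list(range(10))
kept = make_sentences(cells, cell_sentence_len=4)  # default keeps the short set
dropped = make_sentences(cells, cell_sentence_len=4, drop_last=True)
print(kept)     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(dropped)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With the default `drop_last=False`, the two leftover cells still form a short sentence set, which matches the backward-compatible behavior the PR describes.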