Skip to content

Fix: Allow the user to select whether to drop the last sentence#51

Merged
Rive-001 merged 5 commits intomainfrom
fix/drop-last-batch
Aug 1, 2025
Merged

Fix: Allow the user to select whether to drop the last sentence#51
Rive-001 merged 5 commits intomainfrom
fix/drop-last-batch

Conversation

@Rive-001
Copy link
Collaborator

This PR adds a configurable drop_last parameter to the PerturbationDataModule class, providing users with control over whether incomplete sentence sets should be dropped during data loading.

Changes

New parameter: Added drop_last: bool = False parameter to PerturbationDataModule.__init__()
Documentation: Updated docstring to describe the new parameter's behavior
Implementation: The parameter now controls the drop_last behavior in PerturbationBatchSampler instead of being hardcoded to False

Behavior

When drop_last=True, sentence sets that are smaller than cell_sentence_len will be dropped during data loading.

The default value remains False to preserve backward compatibility, meaning incomplete sentence sets will be kept by default.

This PR passes all existing unit tests

@Rive-001 Rive-001 requested review from abhinadduri and beabevi July 21, 2025 20:20
@Rive-001 Rive-001 self-assigned this Jul 21, 2025
Copy link
Collaborator

@abhinadduri abhinadduri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid dropping the same cell sentence each time, can we randomize the order of the mini batch each epoch?

@Rive-001
Copy link
Collaborator Author

to avoid dropping the same cell sentence each time, can we randomize the order of the mini batch each epoch?

As I understand, after merging #50 , we're creating new batches every epoch. So the contents of the last partial batch would change every epoch. Therefore I don't think we're dropping the same set of sentences every epoch. But to make sure, I could add a unit test to check this.

@Rive-001 Rive-001 requested a review from abhinadduri July 23, 2025 17:25
@Rive-001 Rive-001 merged commit 0255efe into main Aug 1, 2025
8 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants