MTF optimize dataloading #298

Merged
thomasw21 merged 7 commits into thomas/mtf_train_script from thomas/mtf_optimize_dataloading on Jul 4, 2022

@thomasw21
Member

No description provided.

 - Create size API for MTF dataset
 - Use new size API to build packed index much faster
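
As a rough illustration of why a size API can speed up building a packed index: if each sample's token count can be read without loading the sample, the packing pass only touches a small array of integers. The sketch below is not the code from this PR. `ToyDataset`, its `size` method, and `build_packed_index` are hypothetical names, and the greedy packing rule is an assumption made for illustration.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in for an indexed dataset (not the real MTF dataset)."""

    def __init__(self, samples):
        self._samples = samples
        # Sizes are stored up front so they can be queried without reading tokens.
        self._sizes = np.array([len(s) for s in samples], dtype=np.int64)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, index):
        # In a real on-disk dataset this is the expensive call.
        return self._samples[index]

    def size(self, index):
        # Hypothetical size API: token count of one sample, no token loading.
        return int(self._sizes[index])


def build_packed_index(dataset, seq_length):
    """Greedily group consecutive samples into rows of at most seq_length tokens.

    Returns a list of (start, end) sample ranges with end exclusive.
    A sample longer than seq_length gets a row of its own; truncating it
    is assumed to happen elsewhere.
    """
    rows = []
    start = 0
    used = 0
    for i in range(len(dataset)):
        size = dataset.size(i)  # only sizes are read, never the samples
        if used > 0 and used + size > seq_length:
            rows.append((start, i))
            start = i
            used = 0
        used += size
    if used > 0:
        rows.append((start, len(dataset)))
    return rows


toy = ToyDataset([[1] * 3, [1] * 4, [1] * 2, [1] * 8, [1] * 1])
packed = build_packed_index(toy, seq_length=8)
print(packed)
```

In this toy version the packing loop never calls `__getitem__`, which is the point of exposing sizes separately.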
@thomasw21 changed the base branch from main to thomas/mtf_train_script July 3, 2022 21:20
@thomasw21 marked this pull request as ready for review July 3, 2022 22:08
@thomasw21 requested a review from Muennighoff July 3, 2022 22:12
@Muennighoff
Collaborator

Nice, I'm re-creating an indexed training dataset with this branch to compare it to the previous one.

@Muennighoff left a comment

Great, this reduces the time from 247 min to 8 min for me on the P3T0 training set.

@thomasw21
Member Author

Let's go!

@thomasw21 merged commit f2df771 into thomas/mtf_train_script Jul 4, 2022
@thomasw21 deleted the thomas/mtf_optimize_dataloading branch July 4, 2022 08:41
2 participants