Rectify Missing Dataloader Preparation Call in PaddingFree Plugin Method#63

Merged
fabianlim merged 1 commit into foundation-model-stack:main from achew010:fix-padding-free-collator
Aug 6, 2024
Conversation

@achew010
Contributor

@achew010 achew010 commented Aug 6, 2024

Description

This PR fixes a missing dataloader preparation call in the PaddingFree Plugin that caused the number of optimization steps in distributed experiments to be computed incorrectly.

With this fix, the relative performance improvement of padding-free over padded training approximately matches across the two implementations:

  • FMS-Acceleration PaddingFree plugin: 24%
  • Natively supported padding-free in Transformers: 26%
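To illustrate why the missing preparation call matters: when Accelerate prepares a dataloader, it shards the data across devices, so the per-device dataloader length (and hence the computed number of optimization steps) is divided by the number of devices. The sketch below is purely illustrative (the function and its parameters are hypothetical, not the plugin's actual code), using the settings from the tables in this PR:

```python
# Hypothetical sketch (not the plugin's actual code): how skipping the
# dataloader preparation call inflates the optimization-step count.
def num_optimization_steps(num_samples, per_device_batch_size, grad_accum,
                           num_devices, dataloader_prepared):
    batches = num_samples // per_device_batch_size
    if dataloader_prepared:
        # A prepared dataloader yields only this device's shard of batches.
        batches //= num_devices
    # Each optimization step consumes `grad_accum` batches.
    return batches // grad_accum

# Settings from the tables below: ~20K samples, batch size 4, accum 2, 8 GPUs.
correct = num_optimization_steps(20_000, 4, 2, 8, dataloader_prepared=True)   # 312
wrong   = num_optimization_steps(20_000, 4, 2, 8, dataloader_prepared=False)  # 2500
```

Without the preparation call, the trainer sees the full (unsharded) dataloader length and schedules roughly `num_devices` times too many steps.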

ORCA Math 20K Subset - 1 Epoch

Transformers

| Model | DataProcess | Grad Accum | Per Device Batch Size | Num Devices | Time | Throughput (token/s) / device |
|---|---|---|---|---|---|---|
| Mistral-7B | Padded | 2 | 4 | 8 | 812 | 1216 |
| Mistral-7B | Transformers Native PaddingFree | 2 | 4 | 8 | 596 | 1658 |

FMS-Acceleration Padding-Free Plugin

| Model | DataProcess | Grad Accum | Per Device Batch Size | Num Devices | Time | Throughput (token/s) / device |
|---|---|---|---|---|---|---|
| Mistral-7B | Padded | 2 | 4 | 8 | 818 | 1208 |
| Mistral-7B | PaddingFree Plugin | 2 | 4 | 8 | 621 | 1568 |

@achew010 achew010 requested a review from fabianlim as a code owner August 6, 2024 09:26
Signed-off-by: 1000960000 user <aaron.chew1@ibm.com>
@achew010 achew010 force-pushed the fix-padding-free-collator branch from 9d1cb39 to 8a3e8bf Compare August 6, 2024 09:33
@fabianlim
Contributor

fabianlim commented Aug 6, 2024

What's the difference between "Padded-Longest" and "Transformers Padded"? If there is no difference, please edit the description to make it consistent.

@fabianlim fabianlim merged commit b159600 into foundation-model-stack:main Aug 6, 2024