Introduce Padding-Free Plugin to FMS-Acceleration #57
Merged
fabianlim merged 9 commits into foundation-model-stack:main (Aug 1, 2024)
Conversation
Force-pushed from 3f08e09 to decc009
fabianlim reviewed (Jul 29, 2024)
fabianlim requested changes (Jul 29, 2024)
fabianlim reviewed (Jul 29, 2024)
fabianlim reviewed (Jul 29, 2024)
Contributor
Make sure to go through this checklist: https://github.com/foundation-model-stack/fms-acceleration/tree/main/plugins/framework#adding-new-plugins. For the benches, maybe we can think about how to make a separate set from the current set. Since this plugin is completely separate from the other plugins, that way we do not have to rerun all the benches every time. This will require some changes to the benchmarking. Maybe one simple solution is to just have a difference …
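If the "separate set" idea were realized as a simple tag-and-filter over benchmark scenarios, it could look like the sketch below. This is purely hypothetical; the scenario names and the actual benchmarking scripts in this repo may differ:

```python
# Hypothetical sketch of keeping plugin benches in separate sets,
# so a new plugin does not force a rerun of every existing bench.
SCENARIOS = [
    {"name": "accelerated-peft-bnb", "set": "core"},      # made-up name
    {"name": "fused-ops-kernels", "set": "core"},         # made-up name
    {"name": "padding-free", "set": "padding-free"},
]

def select_scenarios(scenarios: list[dict], bench_set: str) -> list[dict]:
    """Return only the scenarios belonging to the requested bench set."""
    return [s for s in scenarios if s["set"] == bench_set]

# Run only the new plugin's benches, leaving the core set untouched.
print(select_scenarios(SCENARIOS, "padding-free"))
```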
Force-pushed from 71321a1 to 3238801
fabianlim reviewed ×5 (Aug 1, 2024)
Force-pushed from 915ba17 to bff3128
Signed-off-by: 1000960000 user <aaron.chew1@ibm.com>
Force-pushed from 66f9cc2 to c9e355a
fabianlim approved these changes (Aug 1, 2024)
fabianlim added a commit that referenced this pull request (Aug 2, 2024):
* edits to readme
* Apply suggestions from code review
* more readme changes

Signed-off-by: 1000960000 user <aaron.chew1@ibm.com>
Co-authored-by: Yu Chin Fabian Lim <fabianlim@users.noreply.github.com>
Description
This PR introduces a new padding-free plugin for FMS-Acceleration, which allows users to speed up their fine-tuning by performing the attention computation without padding. It can be activated through the sft_trainer CLI by passing the plugin argument padding_free, e.g. --padding_free huggingface.
Note: this currently uses a fork of fms-hf-tuning to support the new sft_trainer argument.
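For intuition, the core idea behind padding-free attention is to flatten a batch of variable-length sequences into one long sequence, with position IDs that restart at each example boundary, so no compute is spent on pad tokens. The sketch below illustrates the general collation technique only; it is not the plugin's actual implementation:

```python
# Minimal sketch of padding-free ("flattened") batch collation.
# Illustration of the general technique, not the plugin's code.
from typing import Dict, List

def flatten_batch(batch: List[List[int]]) -> Dict[str, List[int]]:
    """Concatenate variable-length sequences into one padding-free sequence.

    position_ids restart at 0 for each example; position-aware attention
    implementations use these to recover per-example boundaries
    without any pad tokens.
    """
    input_ids: List[int] = []
    position_ids: List[int] = []
    for seq in batch:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))
    return {"input_ids": input_ids, "position_ids": position_ids}

# A padded batch of these 3 examples would occupy 3 * 5 = 15 slots;
# the flattened form uses only 5 + 2 + 3 = 10 real tokens.
batch = [[101, 7, 8, 9, 102], [101, 102], [101, 42, 102]]
print(flatten_batch(batch))
```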
Test
The following comparison is between a padded example and a padding-free example.
We observe a 27% increase in runtime efficiency through the padding-free plugin while processing the same number of tokens.
The improvement is dataset dependent: we see different performance improvements across datasets (see the reference PR), possibly due to varying sequence-length distributions in each dataset (longer sequences lead to larger throughputs and more improvement).
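To see why the gain is dataset dependent, note that the fraction of a padded batch occupied by pad tokens depends entirely on the sequence-length distribution. A small illustrative calculation (the lengths below are made up, not measured from the benchmark datasets):

```python
def padding_overhead(lengths: list[int]) -> float:
    """Fraction of a batch padded to the longest sequence that is padding."""
    padded_slots = len(lengths) * max(lengths)
    real_tokens = sum(lengths)
    return 1 - real_tokens / padded_slots

# Near-uniform lengths: almost no padding for padding-free to remove.
print(padding_overhead([512, 510, 508, 511]))  # ~0.003
# Highly skewed lengths: over half the padded batch is padding.
print(padding_overhead([512, 64, 128, 96]))    # ~0.61
```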
Note: the throughput reported in the SFTTrainer metrics includes padding tokens when padding=True (see here). We therefore use train_runtime to compare instead.
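Given that, a fair throughput comparison divides the same count of real (non-pad) tokens by each run's train_runtime. A sketch of that arithmetic (the numbers below are placeholders chosen to illustrate a 27% improvement, not the measured results):

```python
def effective_throughput(real_tokens: int, train_runtime_s: float) -> float:
    """Tokens/sec counted over real (non-pad) tokens only."""
    return real_tokens / train_runtime_s

# Placeholder values for illustration only.
real_tokens = 10_000_000
padded_runtime, padding_free_runtime = 1000.0, 787.0  # seconds

print(effective_throughput(real_tokens, padded_runtime))        # 10000.0 tok/s
print(effective_throughput(real_tokens, padding_free_runtime))  # ~12706.5 tok/s
print(f"{padded_runtime / padding_free_runtime - 1:.0%} faster")  # 27% faster
```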
Alpaca
Padded Experiment
Reproduce
Result
Padding-Free Experiment
Reproduce
Result