ShardFormer/Sequence Parallelism Optimization #5533
Merged
Changes from all commits (51 commits)
0593f04 (KKZ20): sequence parallel optimization
fe5fac2 (KKZ20): validate sequence parallel in llama (code to be polished)
ee95f94 (KKZ20): shardformer api writing
98a2eeb (KKZ20): integrate sequence parallel in ShardFormer
28c11b7 (KKZ20): fix pp bugs and sp bugs for LLaMA model
cd41e42 (KKZ20): integrating ring-based sequence parallelism into ShardFormer
391dc64 (KKZ20): fix bugs when using sp and flash attention together
13fc14c (KKZ20): fix operation function name
83e6044 (KKZ20): support flash attention for Ulysses-style sp
7557691 (KKZ20): clarify sp process group
9698a87 (KKZ20): fix compatibility bugs in moe plugin
7a31083 (KKZ20): fix fused linear bugs
74457df (KKZ20): fix linear layer test
858f55d (KKZ20): support gpt model all-to-all sp
0b115b4 (KKZ20): modify shard data dimension (meant to be dim=-1)
d146040 (linsj20): support Megatron-style sp and distributed attn for llama model
362b5b6 (KKZ20): finish sp mode 3 support for gpt
7293b16 (KKZ20): using all_to_all_single when batch size is 1
65db8b2 (linsj20): support mode 2 sp in gpt2 (#5)
e72bd87 (KKZ20): polish code
2076bcf (KKZ20): enable distributed attn mask when using sp mode 2 and 3 in llama
bb18577 (KKZ20): automatically enable flash attn when using sp mode 2 and 3 in llama
9788fd8 (KKZ20): inplace attn mask
544a06d (KKZ20): add zero2 support for sequence parallel
c3d0e83 (KKZ20): polish code
9f2f1fe (KKZ20): fix bugs
33963a3 (KKZ20): fix gemini checkpoint io
700c26d (KKZ20): loosen tensor checking atol and rtol
9a36add (KKZ20): add comment
0e0ac18 (KKZ20): fix llama layernorm grad
cbb3025 (KKZ20): fix zero grad
3391d3e (KKZ20): fix zero grad
cc28bd4 (KKZ20): fix conflict
1a3825d (KKZ20): update split and gather autograd func
76a22da (linsj20): sequence parallel: inside text split (#6)
7e80cc4 (KKZ20): polish code (part 1)
eff6978 (KKZ20): polish code (part 2)
26f7bf8 (KKZ20): polish code (part 2.5)
2beac05 (linsj20): polish code (part 3)
e5dcd93 (KKZ20): polish code
ace07c9 (linsj20): fix Ulysses-style ZeRO
56a5ba8 (KKZ20): fix llama and gpt sp
2a30925 (KKZ20): Merge branch 'main' into rebase/sp
93c958f (KKZ20): polish code
48580c7 (linsj20): move Ulysses grad sync to ddp (#9)
aea4fb6 (KKZ20): remove zero_stage and unbind the grad sync for alltoall sp
07ae37b (linsj20): add 2d group creation test
145e879 (KKZ20): remove useless code
7c31455 (KKZ20): change shard config not to enable sp when enable_all_optimizations
794800a (KKZ20): add sp warnings for several models
daec9e8 (KKZ20): remove useless code
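Several commits above (858f55d, 83e6044, 7293b16) concern Ulysses-style (all-to-all) sequence parallelism: activations sharded along the sequence dimension are re-sharded along the attention-head dimension so each rank can run full-sequence attention on its own subset of heads. A minimal single-process NumPy sketch of that reshard, simulating the ranks in a list — the helper name is hypothetical, and the real ShardFormer code uses `torch.distributed` all-to-all collectives instead:

```python
import numpy as np

def ulysses_all_to_all(shards, num_heads):
    """Simulate the Ulysses-style all-to-all reshard on one process.

    shards[r] is simulated rank r's slice, shape (seq_len // P, num_heads, head_dim),
    i.e. sharded along the sequence dimension.  After the exchange each rank
    holds the full sequence but only num_heads // P heads:
    shape (seq_len, num_heads // P, head_dim).
    """
    P = len(shards)
    heads_per_rank = num_heads // P
    out = []
    for r in range(P):
        # Rank r receives, from every rank s, that rank's sequence slice
        # restricted to rank r's head group, then concatenates along seq.
        pieces = [s[:, r * heads_per_rank:(r + 1) * heads_per_rank, :]
                  for s in shards]
        out.append(np.concatenate(pieces, axis=0))
    return out
```

In a real run each list entry lives on a different device and the per-rank slicing plus concatenation is one collective call; the total data moved per rank is unchanged, only the shard axis flips from sequence to heads.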
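Commit cd41e42 integrates ring-based sequence parallelism, in which key/value blocks rotate one hop around the ranks per step while each rank folds every arriving block into a numerically stable running softmax over its local queries. A hypothetical single-process NumPy sketch of that accumulation (plain log-sum-exp bookkeeping, no real communication; not the ShardFormer implementation):

```python
import numpy as np

def full_attention(q, k, v):
    """Reference: plain softmax attention over the whole sequence."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def ring_attention(q_chunks, k_chunks, v_chunks):
    """Simulate ring-style sequence parallelism on one process.

    Simulated rank r owns q_chunks[r] and initially k/v chunk r.  At each
    of the P steps a new key/value block "arrives" from the ring and is
    folded into a running max (m), normalizer (l), and numerator (acc).
    """
    P = len(q_chunks)
    d = q_chunks[0].shape[-1]
    outs = []
    for r in range(P):
        q = q_chunks[r]
        m = np.full((q.shape[0], 1), -np.inf)   # running row-wise max
        l = np.zeros((q.shape[0], 1))           # running softmax normalizer
        acc = np.zeros_like(q)                  # running weighted-value sum
        for step in range(P):
            src = (r + step) % P                # KV block arriving this step
            s = q @ k_chunks[src].T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
            scale = np.exp(m - m_new)           # rescale old partial results
            p = np.exp(s - m_new)
            l = l * scale + p.sum(axis=-1, keepdims=True)
            acc = acc * scale + p @ v_chunks[src]
            m = m_new
        outs.append(acc / l)
    return outs
```

Because softmax is invariant to the order in which blocks are folded in, concatenating the per-rank outputs reproduces full attention exactly; only the communication pattern (peer-to-peer rotation instead of all-to-all) differs from the Ulysses mode.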