Adding the new feature of FPDT #6462
Merged
74 commits
c076827
fix the bug of deepspeed sequence parallel working with batch size la…
1b8a8c1
Merge branch 'master' into master
samadejacobs ed34e89
apply yapf formatting
89b119e
Formatting fixes
loadams 7db5798
Merge branch 'microsoft:master' into master
YJHMITWEB 0beff24
add FPDT
4522ed7
Merge branch 'master' into master
YJHMITWEB c15d1d8
Merge branch 'master' into master
tjruwase 69f3892
modify streams
8ef9f5a
modify streams
b43c5ec
Merge branch 'master' into master
loadams a55d1f5
remove duplication of alltoall
1cbd59d
Merge branch 'master' of github.com:YJHMITWEB/DeepSpeed
6bfd76f
remove duplication of pos
4eeadca
fix format
8994991
Merge branch 'master' into master
tohtana 128286c
fix format and add unit test for fpdt
386f606
Merge branch 'master' of github.com:YJHMITWEB/DeepSpeed
ebea5b0
add einops
5c8eec8
add flashattn
a7e175a
Merge branch 'master' into master
tohtana 764a572
add requirements for flash-attn in FPDT
14e91b8
Merge branch 'master' of github.com:YJHMITWEB/DeepSpeed
77dcd38
Merge branch 'master' into master
tohtana 7a5c29c
Merge branch 'master' into master
tohtana 534cb93
skip test when fa is unavailable
972ddda
formatting
37bc694
add workflow to run a6000 tests
8f8aaa0
Merge branch 'master' into FPDT
ac7baf6
revert world sizes for tests
0d2b624
Merge pull request #1 from YJHMITWEB/tohtana/merge_FPDT
tohtana 8935529
update workflow
edd2e05
update image version
464d117
remove --no-build-isolation
7389f66
remove requirements file for flash-attn
5f859be
remove flash-attn requirements from setup.py
56cb647
fix pip command
164f459
modify unit test for fpdt
3eb816d
modify unit test for fpdt
2ae68dc
modify unit test for fpdt
b1b2688
modify unit test for fpdt
67aa3df
modify unit test for fpdt
42461d2
modify unit test for fpdt
d637d60
modify unit test for fpdt
907c79d
modify unit test for fpdt
8f5d039
modify unit test for fpdt
02c2fbf
modify unit test for fpdt
f570213
modify unit test for fpdt
5b8c419
add condition for using fpdt offloading
bd090c8
add condition for using fpdt offloading
e48e85b
add flash-attn version check
af24777
Merge branch 'master' into master
tohtana ebaf56c
add unit test directory as test trigger
9e811b8
add cron for test and reporting for nightly CI failures
a7522da
add multiGPU fpdt unit test
209adab
add multiGPU fpdt unit test
dbeea8a
add multiGPU fpdt unit test
845e42d
add multiGPU fpdt unit test
8b2549c
add multiGPU fpdt unit test
058c973
add multiGPU fpdt unit test
0dcc234
add multiGPU fpdt unit test
d1be5d3
add multiGPU fpdt unit test
3a0feba
add multiGPU fpdt unit test
8c57812
add multiGPU fpdt unit test
43decf6
add multiGPU fpdt unit test
d39585c
add multiGPU fpdt unit test
389b1a3
add multiGPU fpdt unit test
958f3bf
add multiGPU fpdt unit test
af025c5
add multiGPU fpdt unit test
2230377
Merge branch 'master' into master
tohtana 7636690
Merge branch 'master' into master
tohtana 10cdc5f
Merge branch 'master' into master
loadams b2175a4
Merge branch 'master' into master
loadams 1e801ce
Merge branch 'master' into master
loadams
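
Two commits in the list above ("skip test when fa is unavailable" and "add flash-attn version check") describe guarding the FPDT tests on FlashAttention. The PR's actual guard is not shown on this page, so the following is only a minimal sketch of how such a check could look. The helper names and the `min_version` parameter are hypothetical; the only facts assumed are that the `flash-attn` distribution is imported as `flash_attn` and that `importlib.metadata` can report its installed version.

```python
import importlib.util
from importlib import metadata
from typing import Optional, Tuple


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a version string.

    "2.6.0.post1" -> (2, 6, 0). Suffixes such as "rc1" or "post1" are ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def flash_attn_skip_reason(min_version: str) -> Optional[str]:
    """Return None if flash-attn looks usable, otherwise a skip reason.

    `min_version` is a caller-supplied threshold (hypothetical; the PR's
    real minimum version is not stated on this page).
    """
    if not module_available("flash_attn"):
        return "flash_attn is not installed"
    try:
        installed = metadata.version("flash-attn")
    except metadata.PackageNotFoundError:
        return "flash-attn distribution metadata not found"
    if version_tuple(installed) < version_tuple(min_version):
        return f"flash-attn {installed} is older than required {min_version}"
    return None
```

In a pytest suite, the returned reason could feed `pytest.mark.skipif(reason is not None, reason=reason or "")`, so the test is reported as skipped rather than failed on machines without FlashAttention.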
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| name: nv-flash-attn | ||
| | ||
| on: | ||
| workflow_dispatch: | ||
| pull_request: | ||
| paths: | ||
| - 'deepspeed/sequence/**' | ||
| - 'tests/unit/sequence_parallelism/**' | ||
| - '.github/workflows/nv-flash-attn.yml' | ||
| schedule: | ||
| - cron: "0 0 * * *" | ||
| | ||
| concurrency: | ||
| group: ${{ github.workflow }}-${{ github.ref }} | ||
| cancel-in-progress: true | ||
| | ||
| jobs: | ||
| unit-tests: | ||
| runs-on: [self-hosted, nvidia, a6000] | ||
| container: | ||
| image: nvcr.io/nvidia/pytorch:24.03-py3 | ||
| ports: | ||
| - 80 | ||
| options: --gpus all --shm-size "8G" | ||
| | ||
| steps: | ||
| - uses: actions/checkout@v4 | ||
| | ||
| - name: Check container state | ||
| run: | | ||
| ldd --version | ||
| nvcc --version | ||
| nvidia-smi | ||
| python -c "import torch; print('torch:', torch.__version__, torch)" | ||
| python -c "import torch; print('CUDA available:', torch.cuda.is_available())" | ||
| - name: Install transformers | ||
| run: | | ||
| git clone --depth=1 https://github.com/huggingface/transformers | ||
| cd transformers | ||
| git rev-parse --short HEAD | ||
| python -m pip install . | ||
| - name: Install deepspeed | ||
| run: | | ||
| python -m pip install .[dev] | ||
| ds_report | ||
| - name: Install FlashAttention | ||
| run: | | ||
| python -m pip install flash-attn | ||
| - name: Python environment | ||
| run: | | ||
| python -m pip list | ||
| - name: Unit tests | ||
| run: | | ||
| unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch | ||
| cd tests | ||
| python -m pytest --color=yes --durations=0 --verbose -rF unit/sequence_parallelism/test_ulysses.py --torch_ver="2.3" --cuda_ver="12" | ||
| - name: Open GitHub issue if nightly CI fails | ||
| if: ${{ failure() && (github.event_name == 'schedule') }} | ||
| uses: JasonEtco/create-an-issue@v2 | ||
| env: | ||
| GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
| with: | ||
| filename: .github/ISSUE_TEMPLATE/ci_failure_report.md | ||
| update_existing: true | ||
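
The `pull_request.paths` filter in the workflow above restricts pull-request runs to changes under three patterns. The sketch below approximates that matching in Python to show which changed files would trigger the job. It is an illustration, not GitHub's implementation: it handles only `*` (any characters except `/`) and `**` (any characters including `/`), and the function names and example file paths are hypothetical. The `workflow_dispatch` and `schedule` triggers are not subject to this filter.

```python
import re
from typing import Iterable, List

# Patterns copied from the workflow's `pull_request.paths` section.
PATHS: List[str] = [
    "deepspeed/sequence/**",
    "tests/unit/sequence_parallelism/**",
    ".github/workflows/nv-flash-attn.yml",
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a path filter into a regex (simplified: `*` and `**` only)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # `**` crosses directory boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # `*` stays within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def pull_request_triggers(changed_files: Iterable[str]) -> bool:
    """Return True if any changed file matches at least one path pattern."""
    regexes = [pattern_to_regex(p) for p in PATHS]
    return any(r.fullmatch(f) for f in changed_files for r in regexes)


if __name__ == "__main__":
    # Example paths are illustrative only.
    print(pull_request_triggers(["tests/unit/sequence_parallelism/test_ulysses.py"]))
    print(pull_request_triggers(["README.md"]))
```

A pull request that touches only unrelated files would not start this job, which keeps the A6000 runner free for changes that actually affect sequence parallelism.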