[booster] gemini plugin support shard checkpoint#3610

Merged
ver217 merged 28 commits into hpcaitech:main from flybird11111:main on May 5, 2023

Conversation

@flybird11111 flybird11111 commented Apr 20, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 #3609

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

The Gemini plugin now supports sharded checkpoints, which avoids writing a single large checkpoint file.
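
To illustrate the general idea of sharding a checkpoint by size, here is a minimal, library-independent sketch. It is not the code in this PR: the function names `shard_state_dict` and `build_index`, the `max_shard_size` parameter, and the shard file-name pattern are all hypothetical, and plain integers stand in for tensor sizes in bytes.

```python
from typing import Dict, Iterator, List


def shard_state_dict(sizes: Dict[str, int], max_shard_size: int) -> Iterator[Dict[str, int]]:
    """Yield groups of entries whose combined size stays within max_shard_size.

    `sizes` maps a parameter name to its size in bytes (hypothetical stand-in
    for real tensors). An entry larger than the limit gets a shard of its own.
    """
    current: Dict[str, int] = {}
    current_size = 0
    for name, nbytes in sizes.items():
        # Close the current shard if adding this entry would exceed the limit.
        if current and current_size + nbytes > max_shard_size:
            yield current
            current, current_size = {}, 0
        current[name] = nbytes
        current_size += nbytes
    if current:
        yield current


def build_index(shards: List[Dict[str, int]]) -> Dict[str, str]:
    """Map each parameter name to the (hypothetical) file holding its shard."""
    total = len(shards)
    return {
        name: f"model-{i:05d}-of-{total:05d}.bin"
        for i, shard in enumerate(shards, start=1)
        for name in shard
    }


sizes = {"embed.weight": 40, "layer0.weight": 30, "layer0.bias": 5, "head.weight": 50}
shards = list(shard_state_dict(sizes, max_shard_size=64))
index = build_index(shards)
```

An index like this lets a loader open only the shard files it needs, instead of reading one monolithic file.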

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

Comment thread colossalai/booster/plugin/gemini_plugin.py Outdated
Comment thread colossalai/checkpoint_io/utils.py Outdated
Comment thread colossalai/booster/plugin/gemini_plugin.py Outdated
Comment thread tests/test_zero/test_gemini/test_zeroddp_state_dict_shard.py Outdated
Comment thread colossalai/booster/plugin/gemini_plugin.py Outdated
Comment thread colossalai/checkpoint_io/general_checkpoint_io.py Outdated
Comment thread colossalai/booster/plugin/gemini_plugin.py Outdated
@flybird11111 flybird11111 changed the title gemini plugin support shard checkpoint [API Refactoring]gemini plugin support shard checkpoint Apr 24, 2023
@ver217 ver217 added the labels "Run Build and Test" and "API" (related to API changes) Apr 25, 2023
@ver217 ver217 changed the title [API Refactoring]gemini plugin support shard checkpoint [booster] gemini plugin support shard checkpoint Apr 25, 2023
Comment thread tests/test_checkpoint_io/test_general_checkpoint_io.py Outdated
Comment thread colossalai/checkpoint_io/utils.py
Comment thread tests/test_checkpoint_io/test_general_checkpoint_io.py
Comment thread colossalai/checkpoint_io/general_checkpoint_io.py Outdated
Comment thread colossalai/checkpoint_io/general_checkpoint_io.py Outdated
Comment thread examples/language/gpt/gemini/train_gpt_demo.py Outdated
Comment thread colossalai/checkpoint_io/general_checkpoint_io.py Outdated

github-actions Bot commented May 5, 2023

The code coverage for the changed files is 85%.

Click me to view the complete report
Name                                                           Stmts   Miss  Cover
----------------------------------------------------------------------------------
colossalai/booster/plugin/gemini_plugin.py                       115     16    86%
colossalai/checkpoint_io/checkpoint_io_base.py                    68     16    76%
colossalai/checkpoint_io/general_checkpoint_io.py                 69      7    90%
colossalai/checkpoint_io/index_file.py                            65     16    75%
colossalai/checkpoint_io/utils.py                                157     33    79%
colossalai/zero/gemini/gemini_ddp.py                             400     66    84%
tests/test_checkpoint_io/test_general_checkpoint_io.py           141      3    98%
tests/test_zero/test_gemini/test_zeroddp_state_dict_shard.py      42      1    98%
----------------------------------------------------------------------------------
TOTAL                                                           1057    158    85%
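
For readers checking the figures, the Cover column can be reproduced from Stmts and Miss. The sketch below assumes the percentage is (Stmts - Miss) / Stmts rounded to the nearest integer with halves rounded up; the rounding rule and the helper name `cover_percent` are assumptions, not taken from the report tool.

```python
from typing import List, Tuple


def cover_percent(stmts: int, miss: int) -> int:
    """Covered statements as an integer percentage (assumed round-half-up)."""
    covered = stmts - miss
    # Integer arithmetic avoids floating-point surprises at exact .5 values.
    return (200 * covered + stmts) // (2 * stmts)


# (Stmts, Miss) pairs copied from the report above, in row order.
rows: List[Tuple[int, int]] = [
    (115, 16), (68, 16), (69, 7), (65, 16),
    (157, 33), (400, 66), (141, 3), (42, 1),
]

per_file = [cover_percent(s, m) for s, m in rows]
total_stmts = sum(s for s, _ in rows)
total_miss = sum(m for _, m in rows)
total_cover = cover_percent(total_stmts, total_miss)
```

Under that assumption the per-file values and the totals (1057 statements, 158 missed, 85%) agree with the table.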

@ver217 ver217 merged commit 307894f into hpcaitech:main May 5, 2023

Labels

API (related to API changes)

2 participants