
[zero] support shard optimizer state dict of zero #4194

Merged
ver217 merged 3 commits into hpcaitech:feature/zero from Gy-Lu:feature/zero on Jul 11, 2023

@Gy-Lu (Contributor) commented Jul 7, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

closed #4186

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@Gy-Lu Gy-Lu added the compatibility related to compatibility label Jul 7, 2023
@Gy-Lu Gy-Lu changed the title [zero] support shard optimizer of zero [zero] support shard optimizer state dict of zero Jul 7, 2023
@Gy-Lu Gy-Lu linked an issue Jul 7, 2023 that may be closed by this pull request
@github-actions (Contributor) commented

The code coverage for the changed files is 87%.

Click me to view the complete report
Name                                                            Stmts   Miss  Cover
-----------------------------------------------------------------------------------
colossalai/booster/booster.py                                      64     10    84%
colossalai/booster/plugin/gemini_plugin.py                        110      9    92%
colossalai/booster/plugin/low_level_zero_plugin.py                125     10    92%
colossalai/booster/plugin/plugin_base.py                           40     10    75%
colossalai/booster/plugin/torch_ddp_plugin.py                      66      2    97%
colossalai/booster/plugin/torch_fsdp_plugin.py                     97     13    87%
colossalai/zero/low_level/_utils.py                               119     66    45%
colossalai/zero/low_level/bookkeeping/bucket_store.py              51      0   100%
colossalai/zero/low_level/bookkeeping/gradient_store.py            31      1    97%
colossalai/zero/low_level/bookkeeping/parameter_store.py           16      0   100%
colossalai/zero/low_level/low_level_optim.py                      309     20    94%
tests/test_booster/test_plugin/test_low_level_zero_plugin.py       59      6    90%
tests/test_checkpoint_io/test_low_level_zero_checkpoint_io.py      46      1    98%
tests/test_zero/test_low_level/test_grad_acc.py                    92     31    66%
tests/test_zero/test_low_level/test_zero1_2.py                    102      1    99%
tests/test_zero/test_low_level/test_zero_ckpt.py                   69      5    93%
tests/test_zero/test_low_level/test_zero_init.py                   40      4    90%
tests/test_zero/test_low_level/test_zero_tp.py                     66      1    98%
-----------------------------------------------------------------------------------
TOTAL                                                            1502    190    87%

@github-actions (Contributor) commented

The code coverage for the changed files is 80%.

Click me to view the complete report
Name                                                            Stmts   Miss  Cover
-----------------------------------------------------------------------------------
colossalai/booster/booster.py                                      64     17    73%
colossalai/booster/plugin/gemini_plugin.py                        110     60    45%
colossalai/booster/plugin/low_level_zero_plugin.py                125     10    92%
colossalai/booster/plugin/plugin_base.py                           40     10    75%
colossalai/booster/plugin/torch_ddp_plugin.py                      66     26    61%
colossalai/booster/plugin/torch_fsdp_plugin.py                     97     48    51%
colossalai/zero/low_level/_utils.py                               119     66    45%
colossalai/zero/low_level/bookkeeping/bucket_store.py              51      0   100%
colossalai/zero/low_level/bookkeeping/gradient_store.py            31      1    97%
colossalai/zero/low_level/bookkeeping/parameter_store.py           16      0   100%
colossalai/zero/low_level/low_level_optim.py                      312     20    94%
tests/test_booster/test_plugin/test_low_level_zero_plugin.py       59      6    90%
tests/test_checkpoint_io/test_low_level_zero_checkpoint_io.py      46      1    98%
tests/test_zero/test_low_level/test_grad_acc.py                    92     31    66%
tests/test_zero/test_low_level/test_zero1_2.py                    102      1    99%
tests/test_zero/test_low_level/test_zero_ckpt.py                   69      5    93%
tests/test_zero/test_low_level/test_zero_init.py                   40      4    90%
tests/test_zero/test_low_level/test_zero_tp.py                     66      1    98%
-----------------------------------------------------------------------------------
TOTAL                                                            1505    307    80%
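
The TOTAL rows of both reports can be cross-checked from their own Stmts and Miss columns. The following is a minimal sketch, assuming the Cover column is the share of statements that were not missed, rounded to the nearest whole percent; the function name `cover_percent` is hypothetical and not part of the coverage tooling.

```python
def cover_percent(stmts: int, miss: int) -> int:
    """Return the percentage of statements executed, as a whole number.

    Assumption: Cover = (Stmts - Miss) / Stmts, rounded to the nearest percent.
    """
    if stmts == 0:
        # Nothing to cover; treat an empty file as fully covered.
        return 100
    return round(100 * (stmts - miss) / stmts)


# TOTAL rows copied from the two reports above.
first_report = cover_percent(1502, 190)
second_report = cover_percent(1505, 307)
print(first_report, second_report)
```

Under that assumption the second report's total works out to 80%, matching its TOTAL row.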

@ver217 ver217 merged commit 2d7bc68 into hpcaitech:feature/zero Jul 11, 2023
ver217 pushed a commit to ver217/ColossalAI that referenced this pull request Jul 13, 2023
* support shard optimizer of zero

* polish code

* support sync grad manually
ver217 pushed a commit that referenced this pull request Jul 31, 2023
* support shard optimizer of zero

* polish code

* support sync grad manually

Labels

compatibility related to compatibility

Development

Successfully merging this pull request may close these issues.

[zero] add support for shard state dict save/load
