[checkpointio] sharded optimizers checkpointing feature adapted for DDP plugin #4002

Merged
ver217 merged 1 commit into hpcaitech:develop from Fridge003:feature/optimizer-checkpoint on Jun 16, 2023

@Fridge003
Contributor

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234
#3986

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

Adapted the newly developed sharded optimizer checkpointing feature to TorchDDPPlugin.
Tidied up argument passing in several booster IO methods.
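
For readers new to the idea, here is a minimal sketch of what sharded optimizer checkpointing generally looks like: the per-parameter state is split into size-limited shard files, and a small index file records which shard holds which parameter. This is an illustration only, not the code in this PR. The function names, the file names, and the `max_shard_bytes` parameter are all hypothetical, and `pickle` on plain Python lists stands in for tensor serialization.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_sharded_optimizer(state_dict, ckpt_dir, max_shard_bytes):
    """Write optimizer state as several shard files plus a JSON index.

    Hypothetical helper for illustration; file names are made up.
    """
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)

    # Hyperparameters are small, so they go into one separate file.
    (ckpt_dir / "param_groups.pkl").write_bytes(pickle.dumps(state_dict["param_groups"]))

    shards, current, current_size = [], {}, 0
    for param_id, param_state in state_dict["state"].items():
        size = len(pickle.dumps(param_state))
        # Start a new shard when adding this entry would exceed the limit.
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = {}, 0
        current[param_id] = param_state
        current_size += size
    if current:
        shards.append(current)

    # The index maps every parameter id to the shard file that stores it.
    weight_map = {}
    for i, shard in enumerate(shards, start=1):
        name = f"optim-{i:05d}.pkl"
        (ckpt_dir / name).write_bytes(pickle.dumps(shard))
        for param_id in shard:
            weight_map[str(param_id)] = name

    index = {"param_groups": "param_groups.pkl", "weight_map": weight_map}
    (ckpt_dir / "optim.index.json").write_text(json.dumps(index, indent=2))
    return len(shards)


def load_sharded_optimizer(ckpt_dir):
    """Rebuild the full optimizer state dict from the index and its shards."""
    ckpt_dir = Path(ckpt_dir)
    index = json.loads((ckpt_dir / "optim.index.json").read_text())
    state = {}
    for name in sorted(set(index["weight_map"].values())):
        state.update(pickle.loads((ckpt_dir / name).read_bytes()))
    param_groups = pickle.loads((ckpt_dir / index["param_groups"]).read_bytes())
    return {"state": state, "param_groups": param_groups}


# Toy Adam-like state for four parameters (lists stand in for tensors).
toy_state = {
    "state": {
        i: {"step": 1, "exp_avg": [0.0] * 100, "exp_avg_sq": [0.0] * 100}
        for i in range(4)
    },
    "param_groups": [{"lr": 1e-3, "params": [0, 1, 2, 3]}],
}

with tempfile.TemporaryDirectory() as tmp:
    num_shards = save_sharded_optimizer(toy_state, tmp, max_shard_bytes=3000)
    restored = load_sharded_optimizer(tmp)
```

Loading goes through the index rather than globbing the directory, so only one shard needs to be in memory at a time while saving, and stray files in the checkpoint folder are ignored.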

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@Fridge003 Fridge003 requested a review from ver217 June 15, 2023 10:03
Comment thread colossalai/booster/booster.py Outdated
Comment thread colossalai/booster/booster.py Outdated
Comment thread colossalai/booster/plugin/torch_ddp_plugin.py Outdated
Comment thread colossalai/booster/plugin/torch_ddp_plugin.py Outdated
Comment thread colossalai/checkpoint_io/utils.py
Comment thread colossalai/checkpoint_io/utils.py Outdated
@github-actions
Contributor

The code coverage for the changed files is 87%.

Name                                                       Stmts   Miss  Cover
------------------------------------------------------------------------------
colossalai/booster/booster.py                                 64     10    84%
colossalai/booster/plugin/gemini_plugin.py                   110      9    92%
colossalai/booster/plugin/torch_ddp_plugin.py                 66      5    92%
colossalai/booster/plugin/torch_fsdp_plugin.py                97     13    87%
colossalai/checkpoint_io/checkpoint_io_base.py                66      9    86%
colossalai/checkpoint_io/general_checkpoint_io.py            102      8    92%
colossalai/checkpoint_io/index_file.py                        70     17    76%
colossalai/checkpoint_io/utils.py                            250     43    83%
colossalai/zero/gemini/gemini_ddp.py                         402     67    83%
tests/test_checkpoint_io/test_general_checkpoint_io.py       108      0   100%
tests/test_checkpoint_io/test_torch_ddp_checkpoint_io.py      55      0   100%
------------------------------------------------------------------------------
TOTAL                                                       1390    181    87%

@Fridge003 Fridge003 force-pushed the feature/optimizer-checkpoint branch from 3a3d0aa to 4d6fd73 Compare June 16, 2023 02:08
@github-actions
Contributor

The code coverage for the changed files is 87%.

Name                                                                  Stmts   Miss  Cover
-----------------------------------------------------------------------------------------
colossalai/booster/booster.py                                            64     10    84%
colossalai/booster/mixed_precision/fp16_torch.py                         46      2    96%
colossalai/booster/mixed_precision/mixed_precision_base.py                9      1    89%
colossalai/booster/plugin/gemini_plugin.py                              110      9    92%
colossalai/booster/plugin/low_level_zero_plugin.py                       94      8    91%
colossalai/booster/plugin/plugin_base.py                                 40     10    75%
colossalai/booster/plugin/torch_ddp_plugin.py                            66      5    92%
colossalai/booster/plugin/torch_fsdp_plugin.py                           97     13    87%
colossalai/checkpoint_io/checkpoint_io_base.py                           66      9    86%
colossalai/checkpoint_io/general_checkpoint_io.py                       105      8    92%
colossalai/checkpoint_io/index_file.py                                   70     17    76%
colossalai/checkpoint_io/utils.py                                       243     43    82%
colossalai/zero/gemini/gemini_ddp.py                                    402     67    83%
tests/test_autochunk/test_autochunk_diffuser/test_autochunk_unet.py      36     10    72%
tests/test_checkpoint_io/test_general_checkpoint_io.py                  108      0   100%
tests/test_checkpoint_io/test_torch_ddp_checkpoint_io.py                 55      0   100%
-----------------------------------------------------------------------------------------
TOTAL                                                                  1611    212    87%

@Fridge003 Fridge003 force-pushed the feature/optimizer-checkpoint branch from 4d6fd73 to d0888af Compare June 16, 2023 05:37
@github-actions
Contributor

The code coverage for the changed files is 84%.

Name                                                         Stmts   Miss  Cover
--------------------------------------------------------------------------------
colossalai/booster/booster.py                                   64     10    84%
colossalai/booster/mixed_precision/fp16_torch.py                46     24    48%
colossalai/booster/mixed_precision/mixed_precision_base.py       9      1    89%
colossalai/booster/plugin/gemini_plugin.py                     110      9    92%
colossalai/booster/plugin/low_level_zero_plugin.py              94      8    91%
colossalai/booster/plugin/plugin_base.py                        40     10    75%
colossalai/booster/plugin/torch_ddp_plugin.py                   66      5    92%
colossalai/booster/plugin/torch_fsdp_plugin.py                  97     13    87%
colossalai/checkpoint_io/checkpoint_io_base.py                  66      9    86%
colossalai/checkpoint_io/general_checkpoint_io.py              105      8    92%
colossalai/checkpoint_io/index_file.py                          70     17    76%
colossalai/checkpoint_io/utils.py                              243     43    82%
colossalai/zero/gemini/gemini_ddp.py                           402     97    76%
tests/test_checkpoint_io/test_general_checkpoint_io.py         108      0   100%
tests/test_checkpoint_io/test_torch_ddp_checkpoint_io.py        55      0   100%
--------------------------------------------------------------------------------
TOTAL                                                         1575    254    84%

@Fridge003 Fridge003 force-pushed the feature/optimizer-checkpoint branch from d0888af to 90cdf2d Compare June 16, 2023 05:51
@github-actions
Contributor

The code coverage for the changed files is 58%.

Name                                                         Stmts   Miss  Cover
--------------------------------------------------------------------------------
colossalai/booster/booster.py                                   64     10    84%
colossalai/booster/mixed_precision/fp16_torch.py                46     24    48%
colossalai/booster/mixed_precision/mixed_precision_base.py       9      1    89%
colossalai/booster/plugin/gemini_plugin.py                     110     60    45%
colossalai/booster/plugin/low_level_zero_plugin.py              94      8    91%
colossalai/booster/plugin/plugin_base.py                        40     10    75%
colossalai/booster/plugin/torch_ddp_plugin.py                   66      3    95%
colossalai/booster/plugin/torch_fsdp_plugin.py                  97     48    51%
colossalai/checkpoint_io/checkpoint_io_base.py                  66      9    86%
colossalai/checkpoint_io/general_checkpoint_io.py              105      9    91%
colossalai/checkpoint_io/index_file.py                          70     17    76%
colossalai/checkpoint_io/utils.py                              243     76    69%
colossalai/zero/gemini/gemini_ddp.py                           402    348    13%
tests/test_checkpoint_io/test_torch_ddp_checkpoint_io.py        55      0   100%
--------------------------------------------------------------------------------
TOTAL                                                         1467    623    58%
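
As a quick check on how the headline percentages relate to the tables, the sketch below recomputes each report's TOTAL row as (Stmts - Miss) / Stmts. It assumes plain rounding to a whole percent, which matches these four rows; the coverage tool's own rounding may differ in edge cases. The helper name `coverage_percent` is made up for this example.

```python
def coverage_percent(stmts, miss):
    """Percentage of statements executed, rounded to a whole number."""
    return round(100 * (stmts - miss) / stmts)


# (Stmts, Miss) from the TOTAL row of each of the four reports above.
totals = [(1390, 181), (1611, 212), (1575, 254), (1467, 623)]
percentages = [coverage_percent(s, m) for s, m in totals]
print(percentages)  # [87, 87, 84, 58]
```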

@ver217 ver217 added the API related to API changes label Jun 16, 2023
@ver217 ver217 merged commit 822c3d4 into hpcaitech:develop Jun 16, 2023
ver217 pushed a commit to ver217/ColossalAI that referenced this pull request Jul 13, 2023