
[checkpointio] Sharded Optimizer Checkpoint for Gemini Plugin #4196

Merged
ver217 merged 3 commits into hpcaitech:feature/gemini-checkpoint from Fridge003:feature/optimizer-checkpoint
Jul 13, 2023

Conversation

@Fridge003
Contributor

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

resolved #4035

📝 What does this PR do?

Summarize your work here.
If you have any plots, diagrams, screenshots, or tables, please attach them here.

Implement the sharded optimizer checkpointing feature for the Gemini plugin.
For the detailed design, please refer to #4140.
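
For readers unfamiliar with the idea, here is a minimal, framework-free sketch of what "sharded" optimizer checkpointing generally means: per-parameter optimizer states are grouped into shards under a size limit, and an index records which shard file holds each parameter. This is not the code in this PR. The helper names (`shard_states`, `build_index`), the fp32 size assumption, and the `optimizer-0000N.bin` file naming are all hypothetical, chosen only for illustration; see #4140 for the actual design.

```python
from typing import Dict, Iterator, List

# Assumption for illustration: every state element is stored as fp32.
BYTES_PER_ELEMENT = 4

# One parameter's optimizer state, e.g. {"exp_avg": [...], "exp_avg_sq": [...]}.
ParamState = Dict[str, List[float]]


def state_size(state: ParamState) -> int:
    """Approximate size in bytes of one parameter's optimizer state."""
    return sum(len(values) for values in state.values()) * BYTES_PER_ELEMENT


def shard_states(
    states: Dict[int, ParamState], max_shard_bytes: int
) -> Iterator[Dict[int, ParamState]]:
    """Greedily group per-parameter states into shards under a size limit.

    A state larger than the limit still gets a shard of its own.
    """
    shard: Dict[int, ParamState] = {}
    shard_bytes = 0
    for param_id, state in states.items():
        size = state_size(state)
        # Close the current shard if adding this state would exceed the limit.
        if shard and shard_bytes + size > max_shard_bytes:
            yield shard
            shard, shard_bytes = {}, 0
        shard[param_id] = state
        shard_bytes += size
    if shard:
        yield shard


def build_index(
    shards: List[Dict[int, ParamState]], prefix: str = "optimizer"
) -> Dict[int, str]:
    """Map each parameter id to the (hypothetical) shard file that holds it."""
    index: Dict[int, str] = {}
    for shard_no, shard in enumerate(shards, start=1):
        filename = f"{prefix}-{shard_no:05d}.bin"
        for param_id in shard:
            index[param_id] = filename
    return index


if __name__ == "__main__":
    # Three parameters with Adam-like states of 32, 16, and 32 bytes.
    demo_states = {
        0: {"exp_avg": [0.0] * 4, "exp_avg_sq": [0.0] * 4},
        1: {"exp_avg": [0.0] * 2, "exp_avg_sq": [0.0] * 2},
        2: {"exp_avg": [0.0] * 4, "exp_avg_sq": [0.0] * 4},
    }
    demo_shards = list(shard_states(demo_states, max_shard_bytes=48))
    print(build_index(demo_shards))
```

Loading would reverse the process: read the index, then open only the shard files needed for the parameters held locally, which keeps peak memory bounded by the shard size rather than by the full optimizer state.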

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@Fridge003 Fridge003 requested a review from ver217 July 7, 2023 09:58
@Fridge003 Fridge003 force-pushed the feature/optimizer-checkpoint branch from 93c3725 to c627f3e Compare July 10, 2023 05:13
@github-actions
Contributor

The code coverage for the changed files is 93%.

Click me to view the complete report
Name                                                        Stmts   Miss  Cover
-------------------------------------------------------------------------------
colossalai/booster/plugin/gemini_plugin.py                    162     14    91%
colossalai/zero/gemini/gemini_optimizer.py                    408     38    91%
tests/test_checkpoint_io/test_gemini_checkpoint_io.py          84      0   100%
tests/test_checkpoint_io/test_gemini_torch_compability.py     116      0   100%
-------------------------------------------------------------------------------
TOTAL                                                         770     52    93%

@Fridge003 Fridge003 force-pushed the feature/optimizer-checkpoint branch from c627f3e to eabc175 Compare July 11, 2023 05:58
@Fridge003 Fridge003 changed the base branch from main to feature/gemini-checkpoint July 11, 2023 07:45
@github-actions
Contributor

The code coverage for the changed files is 91%.

Click me to view the complete report
Name                                                        Stmts   Miss  Cover
-------------------------------------------------------------------------------
colossalai/booster/plugin/gemini_plugin.py                    145     14    90%
colossalai/checkpoint_io/general_checkpoint_io.py              92      8    91%
colossalai/checkpoint_io/utils.py                             264     44    83%
colossalai/zero/gemini/gemini_optimizer.py                    406     37    91%
tests/test_checkpoint_io/test_gemini_checkpoint_io.py          84      0   100%
tests/test_checkpoint_io/test_gemini_torch_compability.py     116      0   100%
-------------------------------------------------------------------------------
TOTAL                                                        1107    103    91%

@ver217 ver217 merged commit e0eeccf into hpcaitech:feature/gemini-checkpoint Jul 13, 2023
Fridge003 pushed a commit to Fridge003/ColossalAI that referenced this pull request Jul 20, 2023
…ech#4196)

* sharded optimizer checkpoint for gemini plugin

* modify test to reduce testing time

* update doc

Development

Successfully merging this pull request may close these issues.

[checkpoint] support optimizer checkpointing feature for gemini

2 participants