[gemini] Fix a bug in chunk configuration searching #4056

Merged
Fridge003 merged 1 commit into hpcaitech:main from Fridge003:hotfix/fix_gemini_chunk_config_searching
Jun 25, 2023
Conversation

Contributor

@Fridge003 commented on Jun 21, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

#4052

📝 What does this PR do?

Summarize your work here.
If you have any plots, diagrams, screenshots, or tables, please attach them here.

Because the chunk_size attribute in Gemini is measured in number of elements, not bytes, the modification proposed in #4052 can't be adopted.

Instead, to avoid confusion, the arguments search_range_mb, min_chunk_size_mb and search_interval_bytes are renamed to search_range_m, min_chunk_size_m and search_interval. With the byte-style suffixes gone, it is clear that chunk_size is measured in number of elements rather than number of bytes.

Some tests are also updated to use the renamed arguments.
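
To illustrate why the unit matters, here is a minimal sketch of how an element count and a byte count diverge for the same chunk. It is not code from this PR: the helper names are hypothetical, and it assumes the "_m" suffix means multiples of 1024**2 elements, which the PR description does not state explicitly.

```python
# Sketch only: helper names are hypothetical and not part of Colossal-AI.
# Assumption: the "_m" suffix means multiples of 1024**2 elements.
ELEMENTS_PER_M = 1024 ** 2


def chunk_elements(min_chunk_size_m: float) -> int:
    """Convert a chunk size given in 'm' units into a number of elements."""
    return round(min_chunk_size_m * ELEMENTS_PER_M)


def chunk_bytes(num_elements: int, bytes_per_element: int) -> int:
    """Memory footprint of a chunk; it depends on the element width."""
    return num_elements * bytes_per_element


# The same element count occupies a different number of bytes per dtype.
n = chunk_elements(32)
fp16_bytes = chunk_bytes(n, 2)  # 2 bytes per half-precision element
fp32_bytes = chunk_bytes(n, 4)  # 4 bytes per single-precision element
```

Under these assumptions, a single min_chunk_size_m value corresponds to different byte sizes depending on the parameter dtype, which is why a "_mb" suffix would mislead readers.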

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@ver217 added the bug (Something isn't working) label on Jun 21, 2023
@Fridge003 force-pushed the hotfix/fix_gemini_chunk_config_searching branch from 833a88d to 852f9ca on June 25, 2023 03:33
@Fridge003 force-pushed the hotfix/fix_gemini_chunk_config_searching branch from 852f9ca to 0bb0b48 on June 25, 2023 05:43
@github-actions
Contributor

The code coverage for the changed files is 89%.

Complete report:
Name                                                                           Stmts   Miss  Cover
--------------------------------------------------------------------------------------------------
colossalai/booster/plugin/gemini_plugin.py                                       110      9    92%
colossalai/zero/gemini/chunk/search_utils.py                                      89      1    99%
colossalai/zero/gemini/chunk/utils.py                                             31      4    87%
colossalai/zero/gemini/gemini_ddp.py                                             402     67    83%
tests/test_auto_parallel/test_tensor_shard/test_compatibility_with_gemini.py      80     80     0%
tests/test_checkpoint_io/test_gemini_checkpoint_io.py                             80      0   100%
tests/test_tensor/test_tp_with_zero.py                                            98      1    99%
tests/test_zero/test_gemini/test_fwd_bwd.py                                      116      1    99%
tests/test_zero/test_gemini/test_gemini_use_rmt.py                                73      1    99%
tests/test_zero/test_gemini/test_grad_clip.py                                     81      1    99%
tests/test_zero/test_gemini/test_inference.py                                     99      1    99%
tests/test_zero/test_gemini/test_optim.py                                        130      1    99%
tests/test_zero/test_gemini/test_search.py                                        70      1    99%
tests/test_zero/test_gemini/test_zeroddp_state_dict.py                            82      5    94%
tests/test_zero/test_gemini/test_zeroddp_state_dict_shard.py                      42      1    98%
tests/test_zero/test_gemini/test_zerooptim_state_dict.py                          67      1    99%
--------------------------------------------------------------------------------------------------
TOTAL                                                                           1650    175    89%

@Fridge003 merged commit 2c8ae37 into hpcaitech:main on Jun 25, 2023
@Fridge003 deleted the hotfix/fix_gemini_chunk_config_searching branch on June 25, 2023 09:44