[shardformer] support module saving and loading#4062

Merged
FoolPlayer merged 2 commits into hpcaitech:feature/shardformer from FrankLeeeee:hotfix/dtensor-api on Jun 22, 2023

@FrankLeeeee
Contributor

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

Fixed #4061

📝 What does this PR do?

Summarize your work here.
If you have any plots, diagrams, screenshots, or tables, please attach them here.

This PR adds support for state_dict and load_state_dict in shardformer layers. The work can be summarized as follows:

  1. Refactored the dtensor API to use plain torch.Tensor instead of a tensor subclass.
  2. Implemented utility functions for distributed tensors.
  3. Implemented state_dict and load_state_dict for the parallel module.
  4. Added tests for saving and loading weights of shardformer layers.
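
To illustrate the save/load round trip that the weight loading and saving tests are meant to cover, here is a minimal sketch in plain Python. It is not Colossal-AI's API: `ShardedLinearWeight` and `gather_state_dict` are hypothetical names, nested lists stand in for tensors, and a list of objects stands in for the ranks. It also assumes that loading splits a full weight into per-rank row shards and that saving gathers the shards back into the full weight, which the PR description does not spell out.

```python
from typing import Dict, List

Matrix = List[List[float]]


class ShardedLinearWeight:
    """Hypothetical stand-in for a row-sharded parameter held by one rank."""

    def __init__(self, rank: int, world_size: int) -> None:
        self.rank = rank
        self.world_size = world_size
        self.local_rows: Matrix = []

    def load_state_dict(self, state: Dict[str, Matrix]) -> None:
        # The checkpoint holds the full weight; keep only this rank's rows.
        full = state["weight"]
        if len(full) % self.world_size != 0:
            raise ValueError("row count must be divisible by world_size")
        chunk = len(full) // self.world_size
        start = self.rank * chunk
        self.local_rows = [row[:] for row in full[start:start + chunk]]


def gather_state_dict(shards: List[ShardedLinearWeight]) -> Dict[str, Matrix]:
    # Stand-in for an all-gather: concatenate the shards in rank order.
    ordered = sorted(shards, key=lambda s: s.rank)
    return {"weight": [row[:] for s in ordered for row in s.local_rows]}


# Simulate two ranks loading the same full checkpoint.
full_weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
ranks = [ShardedLinearWeight(rank=r, world_size=2) for r in range(2)]
for shard in ranks:
    shard.load_state_dict({"weight": full_weight})

# Saving gathers the shards back into the original full weight.
restored = gather_state_dict(ranks)
```

A test in this style would check that each rank holds only its own slice after loading and that the gathered state matches the original checkpoint.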

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@FrankLeeeee FrankLeeeee linked an issue Jun 21, 2023 that may be closed by this pull request
@FrankLeeeee FrankLeeeee self-assigned this Jun 21, 2023
@FrankLeeeee
Contributor Author

Some test screenshots:

[Screenshot 2023-06-21 at 16 17 14] [Screenshot 2023-06-21 at 15 37 29]

@FoolPlayer FoolPlayer merged commit 50ee6c0 into hpcaitech:feature/shardformer Jun 22, 2023
@FrankLeeeee FrankLeeeee deleted the hotfix/dtensor-api branch June 22, 2023 06:45
FrankLeeeee added a commit that referenced this pull request Jun 26, 2023
* [shardformer] support module saving and loading

* polish code
flybird11111 pushed a commit to flybird11111/ColossalAI that referenced this pull request Jul 3, 2023
* [shardformer] support module saving and loading

* polish code
FrankLeeeee added a commit that referenced this pull request Jul 4, 2023
* [shardformer] support module saving and loading

* polish code
ver217 pushed a commit to ver217/ColossalAI that referenced this pull request Jul 13, 2023
* [shardformer] support module saving and loading

* polish code
Development

Successfully merging this pull request may close these issues.

[shardformer] support layer state_dict and load_state_dict