
[shardformer] Support Mistral for Shardformer #4836

Closed
eric8607242 wants to merge 2 commits into hpcaitech:main from eric8607242:feature/mistral

Conversation

@eric8607242
Contributor

@eric8607242 eric8607242 commented Sep 28, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

#4835

📝 What does this PR do?

Hi, I added a new policy and model to support Mistral with ShardFormer.
The current policy supports tensor parallel, fused normalization, and flash attention.

Although the minimum version requirement for Mistral is currently transformers==4.34.0.dev0, it is still very exciting to make ColossalAI support such an impressive model!
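For context, a ShardFormer tensor-parallel policy essentially maps decoder-layer submodules onto column/row-parallel linear layers. The sketch below is illustrative only and is not the code in this diff; the import paths and description classes are assumed to match the ColossalAI shardformer interfaces of this period, and a complete policy would additionally need attribute replacement (e.g. dividing the attention head counts per rank) plus the fused-normalization and flash-attention hooks.

```python
# Illustrative sketch only (not the code in this PR). Assumes transformers>=4.34
# (MistralDecoderLayer) and the colossalai.shardformer policy interfaces.
from transformers.models.mistral.modeling_mistral import MistralDecoderLayer

from colossalai.shardformer.layer import Linear1D_Col, Linear1D_Row
from colossalai.shardformer.policies.base_policy import (
    ModulePolicyDescription,
    Policy,
    SubModuleReplacementDescription,
)


class MistralPolicySketch(Policy):
    def config_sanity_check(self):
        pass

    def preprocess(self):
        return self.model

    def module_policy(self):
        policy = {}
        if self.shard_config.enable_tensor_parallelism:
            policy[MistralDecoderLayer] = ModulePolicyDescription(
                sub_module_replacement=[
                    # "in" projections become column-parallel, "out" projections row-parallel
                    SubModuleReplacementDescription("self_attn.q_proj", Linear1D_Col),
                    SubModuleReplacementDescription("self_attn.k_proj", Linear1D_Col),
                    SubModuleReplacementDescription("self_attn.v_proj", Linear1D_Col),
                    SubModuleReplacementDescription("self_attn.o_proj", Linear1D_Row),
                    SubModuleReplacementDescription("mlp.gate_proj", Linear1D_Col),
                    SubModuleReplacementDescription("mlp.up_proj", Linear1D_Col),
                    SubModuleReplacementDescription("mlp.down_proj", Linear1D_Row),
                ],
            )
            # A real policy would also shrink per-rank attributes here
            # (e.g. num_heads, num_key_value_heads) via attribute_replacement.
        return policy

    def postprocess(self):
        return self.model
```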

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@eric8607242 eric8607242 changed the title [shardformer] Add Mistral support for Shardformer [shardformer] Support Mistral for Shardformer Sep 30, 2023
@github-actions
Contributor

github-actions Bot commented Oct 2, 2023

The code coverage for the changed files is 27%.

Complete report:
Name                                            Stmts   Miss  Cover
-------------------------------------------------------------------
colossalai/shardformer/layer/normalization.py      50     10    80%
colossalai/shardformer/modeling/mistral.py         42     42     0%
colossalai/shardformer/policies/mistral.py         58     58     0%
-------------------------------------------------------------------
TOTAL                                             150    110    27%

@Fridge003
Contributor

Hi, have you tested the correctness of your code?

@eric8607242
Contributor Author

eric8607242 commented Oct 7, 2023

Hi @Fridge003,

I have successfully fine-tuned Mistral-7B with the HybridParallelPlugin (flash attention + tensor parallelism + fused normalization) using my code.
Did you encounter any issues running with this setting?
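For anyone who wants to reproduce this setting, here is a minimal sketch of the plugin configuration; the concrete sizes and precision are example values, not taken from this PR.

```python
# Minimal sketch of the setting described above; assumes processes are started with
# `torchrun` and that this ColossalAI version exposes these HybridParallelPlugin flags.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch(config={})

plugin = HybridParallelPlugin(
    tp_size=2,                       # tensor parallelism
    pp_size=1,                       # pipeline parallelism is not supported by this PR
    enable_flash_attention=True,
    enable_fused_normalization=True,
    precision="bf16",
)
booster = Booster(plugin=plugin)
# model / optimizer / dataloader are then wrapped with booster.boost(...)
```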

@Fridge003
Contributor

Hi, @eric8607242. Usually when we implement a policy for a new model, the corresponding tests should also be added under the folder tests/test_shardformer/test_model. Since our CI doesn't use the newest version of the transformers library, this pull request might bypass the CI tests. I want to make sure that the code in this PR doesn't cause errors.

@eric8607242
Contributor Author

Hi @Fridge003,

I see.
This code does not raise any errors in my testing.
As I mentioned above, Mistral-7B can be fine-tuned successfully.

However, pipeline parallelism is not yet supported in this PR.

@zhoumengbo

Hello, why has this code not been merged into the main branch yet? Is there something wrong? Could you provide the complete fine-tuning code for Mistral?

@eric8607242
Contributor Author

Hi @zhoumengbo,
Unfortunately, I have no idea why this PR has not been merged yet.

I modified the fine-tuning code from https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/hybridparallelism/finetune.py
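Roughly, the adaptation amounts to swapping the GPT-2 model and tokenizer in that example for Mistral before boosting. A hedged sketch is below; the checkpoint name and hyperparameters are placeholders, not this PR's code.

```python
# Sketch of adapting the linked GPT-2 finetune example to Mistral (illustrative only).
# Assumes transformers>=4.34 and that distributed init (colossalai.launch_from_torch)
# has already happened, as in the original example.
import torch
from transformers import AutoTokenizer, MistralForCausalLM

from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MistralForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

plugin = HybridParallelPlugin(tp_size=2, pp_size=1,
                              enable_flash_attention=True,
                              enable_fused_normalization=True)
booster = Booster(plugin=plugin)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# booster.boost returns the sharded model/optimizer, as in the GPT example.
model, optimizer, *_ = booster.boost(model, optimizer)
```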

@flybird11111 flybird11111 marked this pull request as draft November 10, 2023 02:25
@flybird11111 flybird11111 marked this pull request as ready for review November 10, 2023 02:25
@flybird11111 flybird11111 requested a review from a team as a code owner November 10, 2023 02:25
@flybird11111
Contributor

> Hi @zhoumengbo, Unfortunately, I have no idea why this PR has not been merged yet.
>
> I modified the fine-tuning code from https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/hybridparallelism/finetune.py

Hi, this PR lacks unit tests; we would greatly appreciate it if you could help add them. Alternatively, you could submit it to the feature/shardformer branch, and we will enhance the tests for this PR. Thank you.

@flybird11111 flybird11111 marked this pull request as draft November 10, 2023 03:13
@flybird11111 flybird11111 marked this pull request as ready for review November 10, 2023 03:13
@eric8607242 eric8607242 marked this pull request as draft November 14, 2023 02:07
@eric8607242 eric8607242 marked this pull request as ready for review November 14, 2023 02:11
@eric8607242 eric8607242 changed the base branch from main to feature/shardformer November 23, 2023 14:42
@eric8607242 eric8607242 changed the base branch from feature/shardformer to main November 23, 2023 14:43
@eric8607242
Contributor Author

Closing this PR, as I have created a new PR targeting feature/shardformer in #5103.
