[shardformer] Support Mistral for Shardformer #4836
eric8607242 wants to merge 2 commits into hpcaitech:main
Conversation
The code coverage for the changed files is 27%.
Hi, have you tested the correctness of your code?
Hi @Fridge003, I have successfully fine-tuned Mistral-7B with the HybridParallelPlugin (flash attention + tensor parallelism + fused normalization) using my code.
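For reference, a minimal sketch of the plugin setup this describes; the `tp_size` value, checkpoint name, and optimizer hyperparameters below are illustrative assumptions, not the exact configuration from that run:

```python
# Minimal sketch: a HybridParallelPlugin with the features this PR enables for
# Mistral (tensor parallelism, flash attention, fused normalization).
# Launch under a distributed launcher, e.g. `torchrun --nproc_per_node=2 finetune.py`.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from transformers import AutoModelForCausalLM

colossalai.launch_from_torch(config={})

plugin = HybridParallelPlugin(
    tp_size=2,   # tensor parallel degree (assumed)
    pp_size=1,   # pipeline parallelism is not covered by this PR
    enable_flash_attention=True,
    enable_fused_normalization=True,
)
booster = Booster(plugin=plugin)

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# boost() wraps the model so that the Shardformer Mistral policy is applied.
model, optimizer, *_ = booster.boost(model, optimizer)
```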
Hi @eric8607242. Usually when we implement a policy for a new model, the corresponding tests should also be added under the tests folder.
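For illustration, a test of this kind typically compares the sharded model's forward output against the unsharded reference. A hypothetical sketch (the `shard_fn` hook and the tiny config sizes below are illustrative, not the actual shardformer test harness):

```python
# Hypothetical correctness check: shard a tiny Mistral model and verify the
# forward pass matches the unsharded reference. `shard_fn` stands in for
# whatever applies the policy; it is not a real API.
import torch
from transformers import MistralConfig, MistralForCausalLM

def check_mistral_forward_equivalence(shard_fn):
    config = MistralConfig(hidden_size=128, intermediate_size=256,
                           num_hidden_layers=2, num_attention_heads=4,
                           num_key_value_heads=4)
    torch.manual_seed(0)
    reference = MistralForCausalLM(config).eval()

    candidate = MistralForCausalLM(config)
    candidate.load_state_dict(reference.state_dict())  # same weights
    sharded = shard_fn(candidate.eval())               # apply the policy

    input_ids = torch.randint(0, config.vocab_size, (1, 16))
    with torch.no_grad():
        ref_logits = reference(input_ids).logits
        out_logits = sharded(input_ids).logits
    assert torch.allclose(ref_logits, out_logits, atol=1e-4)
```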
Hi @Fridge003, I see. But in this PR, pipeline parallelism is not yet supported.
Hello, why hasn't the code been merged into the main branch? Is there something wrong? Could you provide the complete fine-tuning code for Mistral?
Hi @zhoumengbo, I adapted the fine-tuning code from https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/hybridparallelism/finetune.py
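In essence, the adaptation swaps the GPT-2 model and tokenizer in that example for Mistral's. A sketch of the changed part (the checkpoint name is an assumption; the rest of the example is reused as-is):

```python
# Model-loading change relative to the linked GPT fine-tuning example:
# swap in a Mistral checkpoint and tokenizer; the dataset, booster setup,
# and training loop follow the example unchanged.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
```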
Hi, this PR lacks unit tests; we would greatly appreciate it if you could help add them. Alternatively, you could submit it to the feature/shardformer branch, and we will enhance the tests for this PR. Thank you.
Closing this PR, as I have created a new PR targeting the feature/shardformer branch.
📌 Checklist before creating the PR
The title follows the standard format: [doc/gemini/tensor/...]: A concise description

🚨 Issue number
#4835
📝 What does this PR do?
Hi, I added a new policy and model to support Mistral with ShardFormer.
The current policy supports tensor parallel, fused normalization, and flash attention.
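For readers who want to try the policy without the full plugin stack, a minimal sketch of driving it through ShardFormer directly (treat the exact flags and setup as assumptions based on ShardConfig at the time of this PR):

```python
# Sketch: applying the new Mistral policy via ShardFormer. Run under a
# distributed launcher, e.g. `torchrun --nproc_per_node=2 shard.py`.
import colossalai
from colossalai.shardformer import ShardConfig, ShardFormer
from transformers import MistralForCausalLM

colossalai.launch_from_torch(config={})  # set up the default process group

model = MistralForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

shard_config = ShardConfig(
    enable_tensor_parallelism=True,   # uses the default (world) process group
    enable_fused_normalization=True,
    enable_flash_attention=True,
)
shard_former = ShardFormer(shard_config=shard_config)
# optimize() looks up the registered policy for the model class and shards it.
sharded_model, shared_params = shard_former.optimize(model)
```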
Although the minimum version requirement for Mistral is currently transformers==4.34.0.dev0, it is still very exciting to make ColossalAI support such impressive models!