🐛 Describe the bug
I'm working on the RLHF implementation in `application/ChatGPT`.
Can I use LoRA in my actor/critic and turn `shard_init` on, while also tensor-parallelizing my model after initialization? I've tried several combinations of the above functionalities and found problems under the following settings:
- LoRA + shard_init + TP: I modified the constructor of `LoRALinear` to give it the correct weight shape, but something still went wrong and a `NotImplementedError` was raised in `colossalai/nn/_ops/embedding.py:130`.
- shard_init + TP: the same `NotImplementedError`.
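To make the `LoRALinear` constructor change concrete, here is a minimal sketch of what I mean by "providing it the correct weight shape": the layer takes explicit `(in_features, out_features)`, so a tensor-parallel wrapper could pass the already-sharded dimensions. All names and initializations below are my own assumptions, not the actual application code:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Hypothetical sketch: the constructor accepts explicit dimensions so the
    # caller (e.g. a TP wrapper) controls the weight shape it sees.
    def __init__(self, in_features: int, out_features: int, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        # Low-rank adapters share the (possibly sharded) dimensions.
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.normal_(self.lora_A, std=0.02)
        self.scaling = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the scaled low-rank update.
        return x @ self.weight.t() + (x @ self.lora_A.t() @ self.lora_B.t()) * self.scaling
```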

Also, a beginner question: when I use `shard_init`, I cannot set `tp_degree` different from `nproc_per_node`; doing so raises an error.

Meanwhile, (LoRA + TP) alone runs fine, so I suspect the problem is that my implementation of tensor parallelization conflicts with the provided scripts.
So I'm providing the TP code and my running scripts first. Since the LoRA problem involves another file, maybe we can look at it after (shard_init + TP) runs correctly.
Reproduction
my_train_dummy.py.zip
I also modified `GPTActor` and `GPTCritic` by adding a `lora_rank` input, like the following code:
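A minimal sketch of that modification (the real `GPTActor`/`GPTCritic` wrap a Hugging Face GPT-2; here a stub stands in so the snippet is self-contained, and the wrapper mirrors rather than copies the application code):

```python
class StubGPT2:
    # Stand-in for the real GPT-2 backbone, used only to keep this sketch runnable.
    def __init__(self, config=None):
        self.config = config

class GPTActor:
    # Assumed change: accept `lora_rank` and keep it so LoRA layers
    # can be injected into the underlying model; 0 disables LoRA.
    def __init__(self, pretrained=None, config=None, lora_rank: int = 0):
        self.model = StubGPT2(config)
        self.lora_rank = lora_rank

class GPTCritic(GPTActor):
    # The critic forwards `lora_rank` the same way via the shared constructor.
    pass

actor = GPTActor(lora_rank=8)
critic = GPTCritic(lora_rank=8)
```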

Environment
torch 1.13.1
torchaudio 0.13.1
torchvision 0.14.1
colossalai 0.2.5