[BUG]: Unstable GPU utilization when testing pipeline parallelism with HybridParallelPlugin

### 🐛 Describe the bug

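When I test pipeline parallelism with HybridParallelPlugin, GPU utilization is unstable and frequently drops below 50%, whereas Megatron generally stays above 90%. Is the pipeline communication path not yet optimized?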

The code is based on https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/llama2/benchmark.py; I only modified the data-loading part.

Some of my parameters:
plugin = HybridParallelPlugin(
    tp_size=1,
    pp_size=2,
    enable_flash_attention=True,
    enable_fused_normalization=True,
    enable_jit_fused=True,
    microbatch_size=2,
    precision='bf16',
    zero_stage=1,
)

batch_size = 32
context length = 4096
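
For reference, here is a rough estimate of the ideal pipeline bubble for this configuration. It is only a sketch under assumptions that are not stated above: that `batch_size=32` is the number of samples a single pipeline processes per step, that both stages take equal time per microbatch, and that the schedule is a non-interleaved GPipe/1F1B-style one. The helper name `pipeline_bubble_fraction` is hypothetical and not part of ColossalAI.

```python
def pipeline_bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle fraction of an ideal non-interleaved schedule: (p - 1) / (m + p - 1)."""
    return (num_stages - 1) / (num_microbatches + num_stages - 1)


pp_size = 2
microbatch_size = 2
batch_size = 32  # assumption: samples fed to one pipeline per step

# Number of microbatches the pipeline processes in one step.
num_microbatches = batch_size // microbatch_size
bubble = pipeline_bubble_fraction(pp_size, num_microbatches)

print(f"microbatches per step: {num_microbatches}")
print(f"ideal bubble fraction: {bubble:.1%}")
```

Under these assumptions the bubble is roughly 6% of each step, so the schedule's bubble alone would probably not account for utilization dropping below 50%.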

Throughput is also low, at only 4.8 samples/sec:
![image](https://github.com/hpcaitech/ColossalAI/assets/9442170/01ac9b2d-d311-4782-9495-48f891997973)

Below is a screenshot of the GPU utilization:
![image](https://github.com/hpcaitech/ColossalAI/assets/9442170/dbd189dd-a861-4c70-8909-6a17e2ed66cc)

### Environment

NCCL version 2.18.1+cuda12.1, ColossalAI 3.2, PyTorch 2.1
