### 🐛 Describe the bug
**Problem**
Running `ChatGPT/examples/train_prompts.py`, I found that training sometimes hangs when using Gemini.
This occurs intermittently when changing the batch size.
**Possible reason**
I found that the padding policy is to pad to the longest sequence in the batch.
In the DDP scheme, different processes may receive inputs of different lengths due to random sampling, so they may run different numbers of generation steps.
Gemini requires communication during the forward pass, so differing numbers of forward steps lead to differing numbers of communication calls, and this asymmetric communication causes the hang.
**Possible solution**
Change the padding policy to `'max_length'`; see the Hugging Face tokenizer docs for details.
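For illustration, here is a minimal sketch of the proposed change, assuming a Hugging Face tokenizer (the model name, `max_length` value, and prompts are placeholders, not taken from the repo):

```python
# Minimal sketch of the proposed fix, assuming a Hugging Face tokenizer.
# "gpt2", max_length=96, and the prompts are illustrative placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default

prompts = ["What is the capital of France?", "Explain DDP in one sentence."]

# padding='max_length' pads every sample to the same fixed length, so all
# DDP ranks see equal-length inputs and run the same number of generation
# steps, instead of padding=True (pad to the longest sequence in the batch).
batch = tokenizer(
    prompts,
    padding="max_length",
    max_length=96,
    truncation=True,
    return_tensors="pt",
)
```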
In addition, when early stopping is enabled, we should also account for DDP and ensure that the number of generation steps is the same on every process, for example as sketched below.
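One possible way to do this (a sketch, not the repo's implementation) is to reduce the per-rank stop flag across processes so that no rank stops early alone; `should_stop` and `locally_finished` are hypothetical names:

```python
# Hedged sketch of keeping early stopping consistent across DDP ranks.
# Assumes an initialized torch.distributed process group with a CUDA backend;
# `locally_finished` is a hypothetical per-rank flag, not an API of the repo.
import torch
import torch.distributed as dist

def should_stop(locally_finished: bool) -> bool:
    flag = torch.tensor([1 if locally_finished else 0], device="cuda")
    # Reduce with MIN: the loop stops only once *every* rank has finished,
    # so all processes execute the same number of generation steps and
    # issue the same number of communication calls.
    dist.all_reduce(flag, op=dist.ReduceOp.MIN)
    return bool(flag.item())
```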
### Environment
No response