Trainer is using DataParallel on parallelized models  #9577

@jncasey

Description

Environment info

  • transformers version: 4.2.0
  • Platform: Ubuntu 20.04
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.7.1 / CUDA 11.2
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help

@sgugger @stas00

Information

I'm trying out the 4.2.0 release with a training script that had been working in 4.1.1.

I'm parallelizing my model over two GPUs, and I had been using the --model_parallel training arg in the previous version. Now that it's no longer supported, I removed the arg from my training command, but I'm getting an error as though DataParallel is still being applied and the model isn't being detected as parallelized:
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1

I did some debugging, and everything seems okay with my model (trainer.is_model_parallel returns True). But trainer.args.n_gpu is still 2.

I admit that I don't totally understand what's happening in the trainer code, but could there be an error on line 289?
self.args._n_gpu = 1

Should that be self.args.n_gpu = 1, without the leading underscore?
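For context on why the underscore might matter, here's a minimal, hypothetical sketch (not the actual transformers TrainingArguments code) of the two ways a class can expose n_gpu, and how that changes whether a Trainer-style write to the private attribute takes effect:

```python
# Hypothetical illustration only -- class names and attributes are made up
# to show the pattern, not copied from transformers.

class ArgsWithProperty:
    """n_gpu is a property that re-reads _n_gpu on every access."""
    def __init__(self, detected_gpus):
        self._n_gpu = detected_gpus  # e.g. torch.cuda.device_count()

    @property
    def n_gpu(self):
        return self._n_gpu


class ArgsWithCachedValue:
    """n_gpu is computed once at init and never re-read."""
    def __init__(self, detected_gpus):
        self._n_gpu = detected_gpus
        self.n_gpu = detected_gpus  # stale copy, frozen at init time


# With a live property, writing to _n_gpu (as the trainer does) is visible:
a = ArgsWithProperty(detected_gpus=2)
a._n_gpu = 1
print(a.n_gpu)  # -> 1

# With a cached value, the same write is silently ignored, which would
# leave n_gpu at 2 and trigger the DataParallel path described above:
b = ArgsWithCachedValue(detected_gpus=2)
b._n_gpu = 1
print(b.n_gpu)  # -> 2, still the stale value
```

So whether `self.args._n_gpu = 1` works depends entirely on whether the public `n_gpu` reads the private attribute back at call time.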

To reproduce

Steps to reproduce the behavior:

  1. Parallelize a model
  2. Train on a machine with multiple GPUs
