Hi,
I really like the text_to_image training script and have had really good results with it. Now I want to train a model with multiple GPUs.
My current understanding:
I want to use a data parallel setup, so each GPU gets its own batches and the gradients are synced at the end of each step. This should not result in a significant increase in training time per step (right?).
So far, when training with 2 GPUs, the training time is doubled compared to a single-GPU setup. The loss converges faster because of the larger effective batch size, but the overall throughput per hour does not increase that way.
Please correct me if I am wrong.
If I am correct, how do I enable the data parallel (DP) setup in accelerate, or should I fall back to torch and its DataParallel wrapper?
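For reference, here is a minimal sketch of how I would expect a multi-GPU data parallel launch with accelerate to look; the GPU count, the script name (`train_text_to_image.py`), and the script arguments are assumptions about my setup, not something I have confirmed:

```bash
# One-time interactive setup: choose multi-GPU when prompted,
# or skip this and pass the flags directly to `accelerate launch`.
accelerate config

# Launch the training script on 2 GPUs of this machine.
# --multi_gpu enables the distributed data parallel setup,
# --num_processes should match the number of GPUs.
accelerate launch --multi_gpu --num_processes 2 \
  train_text_to_image.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --train_data_dir="./my_dataset" \
  --train_batch_size=4
```

Is this the intended way to get the data parallel behavior, or is additional configuration needed?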