[trainer] make generate work with multigpu#8716
Conversation
It did, thank you! This is a very ambiguous situation for a user who wants to use the HF Trainer in their code: when to use the wrapped model vs. the unwrapped one, and what happens in each case, is far from obvious. Here are some possible solutions to resolve this ambiguity:
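For context, here is a minimal toy sketch of the ambiguity being discussed (a stand-in model, not the actual Trainer code; the `ToyModel` class and its `generate` method are illustrative only): `nn.DataParallel` forwards `forward()` calls to the underlying model, but it does not expose custom methods like `generate()`, so calling them on the wrapper fails.

```python
import torch
from torch import nn


class ToyModel(nn.Module):
    """Stand-in for a transformers model that defines a custom generate() method."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

    def generate(self, x):
        # stand-in for a real generate() implementation
        return self.forward(x).argmax(dim=-1)


model = ToyModel()
wrapped = nn.DataParallel(model)

# forward() goes through the wrapper fine (falls back to the module on CPU)...
_ = wrapped(torch.randn(2, 4))

# ...but custom methods are NOT forwarded by the wrapper:
print(hasattr(wrapped, "generate"))         # False
print(hasattr(wrapped.module, "generate"))  # True: the real model is at .module
```

This is why code that held on to the wrapped model and called `generate` on it broke under multi-GPU, while the same code worked on a single GPU where no wrapping happens.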
We can certainly improve the documentation and the debugging experience. I think I prefer solution 2, since 1 is too magic (and so will probably make things harder to debug) and 3 is not compatible with the regular `Trainer` usage.
Did you mean to say "needs the wrapped model"? Unless I'm misreading what you wrote, the 3rd solution is the right one, since the Trainer doesn't do anything with the wrapped model. I don't know, though, whether this is so everywhere. The 4th solution is passing
Except it won't be wrapped per se most of the time, which is very confusing to the user. Currently it should be called
I meant the wrapped model, sorry.
I'm getting this issue too, using a T5 model on multiple GPUs. Is this supposed to be resolved? I've never seen this before. I've tried with 4.10.0 as well as the current master branch.
Is it possible you somehow have a really old version installed? If not, as always, we need a way to reproduce the problem as the first step, and ideally in a new issue so that it can be tracked. But you can also see the fix in this PR and try to trace it to where the problem occurs. Thank you.
This PR:
Chances are that this would be the same problem with any other `model.foo` calls, as this is not the first time this has happened. I.e. the base model class most likely needs to be made aware of `DataParallel` and transparently get the `model` at the calling point.

@sgugger, @LysandreJik, @patrickvonplaten
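A hedged sketch of what "transparently get the `model` at the calling point" could look like; `unwrap_model` is a hypothetical helper name used here for illustration, not necessarily what the PR or the library implements:

```python
from torch import nn


def unwrap_model(model: nn.Module) -> nn.Module:
    """Strip DataParallel-style wrappers so methods like generate()
    can be called on the underlying model (hypothetical helper)."""
    wrappers = (nn.DataParallel, nn.parallel.DistributedDataParallel)
    while isinstance(model, wrappers):
        # both wrappers store the real model on the .module attribute
        model = model.module
    return model


inner = nn.Linear(4, 2)
wrapped = nn.DataParallel(inner)
assert unwrap_model(wrapped) is inner   # wrapper stripped
assert unwrap_model(inner) is inner     # unwrapped models pass through
```

Any call site that might receive either a wrapped or an unwrapped model could then do `unwrap_model(model).generate(...)` instead of assuming one or the other.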
Fixes: #8713