[trainer] remove `--model_parallel` #9451
Conversation
LysandreJik left a comment
Indeed, LGTM! We should have been more attentive during the review, no harm done.
@sgugger for info, this was removed here: 9f675b0#diff-ed55888e6665791fe92cc8fc0c499da54f4ace6738551cd9a2591881cda076deL245-L248
Thanks for putting it back. Since we're in a PR on this test alone, can we "fix" it to ignore the
2 things:
I changed this code last night, and it doesn't work when I try it. It doesn't look like it ever worked... i.e. MP works when set up manually but doesn't work in the trainer. p.s. I tagged you on that discussion - not sure if you saw it.
That's not a discovery on my side, that is exactly why I keep saying that the `--model_parallel` argument should be removed.
That would be a big breaking change in the API, and beginners actually want to have the parallelism work out of the box when they have several GPUs, so I don't see why we'd change something that works.
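As a simplified, self-contained sketch of the out-of-the-box behavior being defended here (the function name and return labels are mine, not the Trainer's API; the real decision lives inside `Trainer`):

```python
def pick_strategy(n_gpu, model_parallel_enabled=False):
    """Toy decision mirroring the Trainer's default: with several visible
    GPUs and no model parallelism, DataParallel kicks in with no flag."""
    if model_parallel_enabled:
        return "model_parallel"
    if n_gpu > 1:
        return "data_parallel"  # model wrapped in torch.nn.DataParallel automatically
    return "single_device"

print(pick_strategy(1))  # single_device
print(pick_strategy(4))  # data_parallel
```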
It doesn't work
OK, then the flag should be there with the default On? Surely a user should be able to not run DP, and it's not possible at the moment.
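One known workaround, assuming the goal is just to keep `nn.DataParallel` from engaging: restrict the devices the process can see (`train.py` here is a placeholder for whatever script invokes the trainer):

```shell
# With a single visible GPU, the trainer sees n_gpu == 1 and
# DataParallel is never engaged. train.py is a hypothetical script name.
CUDA_VISIBLE_DEVICES=0 python train.py
```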
OK, so I did remove `--model_parallel`. The problem is that I first need to figure out how to make MP work in the trainer at all; it doesn't look like it was ever tried or tested, as it fails for me.
FWIW, I suspect t5 MP wasn't tested/made to work with the trainer.
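For reference, t5's manual MP path works via `model.parallelize(device_map)`, where `device_map` maps each device id to a list of transformer block indices. A hypothetical helper sketching that layout (the helper name and the even-split policy are mine, not from transformers):

```python
def make_device_map(num_layers, num_devices):
    """Evenly spread transformer block indices over devices, in the
    {device_id: [block_index, ...]} shape that t5's parallelize() expects."""
    per_device = -(-num_layers // num_devices)  # ceiling division
    return {
        d: list(range(d * per_device, min((d + 1) * per_device, num_layers)))
        for d in range(num_devices)
    }

print(make_device_map(12, 4))
# {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7, 8], 3: [9, 10, 11]}
```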
OK, I committed the bulk of it, and @sgugger will push some magic to deal with `--model_parallel`. Tests should be failing, I think, until he does that.
So now I can see I can jokingly blame my initial mistake on @sgugger since he wanted it removed all along, and so I unconsciously did it during rebasing and he unconsciously saw this as the right thing to do during the review ;) It's all Freud's fault anyway ;)
I added a wrapper first, but it looked out of place, so I introduced and documented a new attribute: `is_model_parallel`.
@sgugger, I must be doing something wrong - that docstring section of
and here is why I removed it. The tests were failing with:
Thank you for fixing the docs, @sgugger!
```python
if hasattr(model, "is_parallelizable") and model.is_parallelizable and model.model_parallel:
    self.is_model_parallel = True
else:
    self.is_model_parallel = False
```
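The check above can be exercised with stand-in objects; a minimal sketch (the class and function names are mine, and the real check runs on a `PreTrainedModel` inside `Trainer.__init__`):

```python
class ParallelizedModel:
    """Stand-in for a model that supports MP and has it switched on."""
    is_parallelizable = True
    model_parallel = True

class PlainModel:
    """Stand-in for a model with no MP support at all."""
    pass

def detect_model_parallel(model):
    # Mirrors the Trainer snippet: flag MP only when the model both
    # supports it (is_parallelizable) and has it enabled (model_parallel).
    if hasattr(model, "is_parallelizable") and model.is_parallelizable and model.model_parallel:
        return True
    return False

print(detect_model_parallel(ParallelizedModel()))  # True
print(detect_model_parallel(PlainModel()))         # False
```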
Per @sgugger's request, removing `--model_parallel` in trainer, as it was never tested or made to work with the trainer. We will get back to it in the future.
This PR doesn't introduce breaking changes, since `--model_parallel` never worked (well, other than in my MP PRs that have been parked for now, since they are very inefficient and we are looking for a better approach rather than waste time on sorting those out).

@LysandreJik, @sgugger