[trainer] --model_parallel hasn't been implemented for most models #9347
LysandreJik merged 6 commits into huggingface:master
Conversation
```python
if self.args.model_parallel:
    # XXX: ideally this register should be maintained elsewhere so that the trainer could just do
    # if model.model_parallel_is_supported()
    mp_supported = ["gpt2", "t5"]
```
Maybe we can check like this for now:
```python
if not hasattr(model, "model_parallel"):
    raise ValueError(
        f"{model.config.model_type} implementation currently doesn't support model parallelism, "
        "therefore the --model_parallel cl arg cannot be used"
    )
```
patrickvonplaten left a comment
Thanks for the PR! I agree that we should add an assert like this.
Also, I don't think keeping a list we'd manually have to extend is the best way to go here. Maybe just checking whether the model has the attribute `model_parallel` is good enough for now... Wdyt?
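The attribute-based check Patrick suggests could be sketched roughly like this (`FakeConfig`/`FakeModel` are stand-ins for illustration, not real transformers classes):

```python
# Sketch of the hasattr-based gate: instead of maintaining a hardcoded list of
# model types that support MP, fail loudly when the model lacks the attribute.
# FakeConfig and FakeModel are hypothetical stand-ins for real model classes.

class FakeConfig:
    model_type = "bert"

class FakeModel:
    config = FakeConfig()
    # deliberately no `model_parallel` attribute -> MP unsupported

def check_model_parallel_support(model):
    """Raise if the model's implementation doesn't support model parallelism."""
    if not hasattr(model, "model_parallel"):
        raise ValueError(
            f"{model.config.model_type} implementation currently doesn't support "
            "model parallelism, therefore the --model_parallel cl arg cannot be used"
        )

try:
    check_model_parallel_support(FakeModel())
except ValueError as e:
    print(e)
```

The upside over the list is that a model gains MP support the moment its implementation grows the attribute, with no Trainer change needed.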
@alexorona proposed to have the

I see this PR as a quick band-aid since we released the new cl arg w/o checking that it always works. We will surely improve it as we generalize MP, and not leave it this way. This is definitely not how it'll remain in the long run.
LysandreJik left a comment
LGTM as a hotfix.
Regarding `model_parallel`, if that's to be a method in `PreTrainedModel` and cannot be used to distinguish between models which are parallelizable and models which are not, I think we can go ahead and add a flag for parallelizable models, like `model.parallelizable`.
Having a way to distinguish between parallelizable models and non-parallelizable models sounds like a must as we continue adding parallelization support.
sgugger left a comment
LGTM with the suggestion from Patrick.
So should we merge this one as a hot-fix? An absolute yes to `model.parallelizable`.

And also, what do you think about tests? Currently we hardcode a list of parallelizable models (transformers/tests/test_modeling_t5.py, line 491 in 086718a). Should it remain this way, or should we automatically derive those from the model by iterating over `all_model_classes` (transformers/tests/test_modeling_t5.py, line 489 in 086718a) and automatically deriving which are parallelizable? Less code to write in the future.
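The test-side idea above could be sketched like this (dummy classes standing in for the real model classes and the test's `all_model_classes` tuple):

```python
# Sketch: derive the parallelizable subset from all_model_classes by filtering
# on the flag, instead of hardcoding a second list in each test file.
# The Dummy* classes are hypothetical stand-ins for real model classes.

class Dummy:
    is_parallelizable = False

class DummyT5(Dummy):
    is_parallelizable = True   # pretend this one implements MP

class DummyT5Encoder(Dummy):
    pass                       # inherits False

all_model_classes = (DummyT5, DummyT5Encoder)

# derived automatically -- no second hardcoded list to keep in sync
all_parallelizable_model_classes = tuple(
    c for c in all_model_classes if getattr(c, "is_parallelizable", False)
)

print([c.__name__ for c in all_parallelizable_model_classes])
```

With this, adding MP support to a model automatically pulls it into the parallelization tests.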
I'd rather merge as a hotfix the proper check, and then worry about the tests in a follow-up PR (I think we should have a combination of a flag (like for pruning) and checking the models having the attributes there).
It no longer will be hot, but yes, I will code that ;) thank you for the feedback, @sgugger
I'm not sure what you mean here. An example would be helpful to understand what you propose.
The class

OK, I added if you prefer w/o
sgugger left a comment
I'm fine with this design but it differs from what we were talking about, so we should check the others are fine with it too before merging.
- **base_model_prefix** (:obj:`str`) -- A string indicating the attribute associated to the base model in derived classes of the same architecture adding modules on top of the base model.
- **_is_parallelizable** (:obj:`bool`) -- A flag indicating whether this model supports model parallelization.
A private flag should not appear in the public documentation.
Ah, right! So put the doc for the property then? This seems to be different from the others - how should I document it then? Or perhaps just make it into a public member? What is the standard?
Why not have just this class attribute be public and no property?
I thought it was cleaner since it should be read-only, but it's fine as a non-property. Changed.
I don't think we have a "way" so that's why I'm never sure when something should be a property or a public attribute.
thank you for the feedback/ideas, @sgugger.
TBH `base_model_prefix` is the same, it should be read-only in theory but we have it as a simple class attribute, so let's stay simple for this new attribute too :-)
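The property-vs-attribute tradeoff discussed above can be sketched as follows (class names are hypothetical, chosen only to contrast the two styles):

```python
# Sketch of the two designs discussed: a read-only property vs a plain class
# attribute. A property with no setter rejects assignment on instances; a bare
# class attribute is simpler but writable. Names here are illustrative only.

class WithProperty:
    _is_parallelizable = False

    @property
    def is_parallelizable(self):
        return self._is_parallelizable

class WithAttribute:
    is_parallelizable = False  # simple, but not enforced read-only

m = WithProperty()
try:
    m.is_parallelizable = True  # no setter defined -> AttributeError
except AttributeError:
    print("read-only via property")

print(WithAttribute.is_parallelizable)
```

The thread settled on the simpler plain attribute, matching how `base_model_prefix` is already handled.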
Yes, of course. That's why it is no longer a hotfix, but it seems to be fine - only one user has filed an issue about using a non-working `--model_parallel`.
So since the only change I proposed is from
Yes, let's wait for him to review this tomorrow morning (he's on European time for the next month or so).
LysandreJik left a comment
LGTM, this is very clean!
Apparently we unleashed `--model_parallel` in trainer w/o checking if the model supports MP (most don't). This PR:

As we are gradually starting to build MP support, a cleaner solution will be made in the future, but for now this is good enough to prevent misleading false expectations as reported in #9336.
(Also for the future, I'm not sure whether it'd be better to check `model.config.architectures`, which would be more precise than checking `model_type`, since it's the `architectures` that may or may not support MP within the same `model_type` - but that's a different discussion.)

Fixes: #9336
@patrickvonplaten, @sgugger