[Seq2Seq Trainer] Make sure padding is implemented for models without pad_token#8043
Conversation
LGTM!
@patrickvonplaten, @sshleifer
Ohoh :-/ can you narrow down the commit that caused the slow-down? I took a look again at https://github.com/huggingface/transformers/pull/7809/files and this line I added could be problematic.
Yeah, this line is actually called at every step -> can you check whether removing the line helps?
It's still very slow even after removing that line. I'll try to find the exact commit which is causing this slowdown. |
sshleifer
left a comment
LGTM, would prefer stuff moved to init, but don't feel strongly.
```python
if self.config.pad_token_id is None and self.config.eos_token_id is not None:
    logger.warn(
        f"The `config.pad_token_id` is `None`. Using `config.eos_token_id` = {self.config.eos_token_id} for padding.."
    )
```
what if eos_token_id is None? Should we raise?
Might be a bit too edge-casy, but `eos_token_id` could be None, in which case padding would never take place.
should we raise early in that case?
What I meant is that there are models, like openai-gpt or ctrl, that have neither an `eos_token_id` nor a `pad_token_id` => the way it is implemented now, these models could still make use of Seq2SeqTrainer because they would never require padding (they never finish early, so they always generate up to `max_length`). So I'd just leave it as it is - or, if you think that models without an EOS token should not use Seq2SeqTrainer, we could raise as well - up to you!
Didn't understand that they always go to `max_length` - your implem makes total sense. Thanks for clarifying.
```python
pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else self.config.eos_token_id

if pad_token_id is None:
    raise ValueError(
        f"Make sure that either `config.pad_token_id` or `config.eos_token_id` is defined if tensor has to be padded to `max_length`={max_length}"
    )
```
Should we check in __init__ for faster failures?
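Putting the two quoted snippets together, the fallback-then-fail logic can be sketched roughly like this (a minimal sketch: `pad_ids_to_max_len` is an illustrative name, not the PR's actual helper, and the real code operates on tensors rather than Python lists):

```python
def pad_ids_to_max_len(ids, max_length, pad_token_id, eos_token_id):
    # Fall back to the EOS token when no dedicated pad token is configured.
    pad_id = pad_token_id if pad_token_id is not None else eos_token_id
    if pad_id is None:
        # Neither token is defined -> padding is impossible, so fail loudly.
        raise ValueError(
            "Make sure that either `config.pad_token_id` or `config.eos_token_id` "
            f"is defined if tensor has to be padded to `max_length`={max_length}"
        )
    # Extend the sequence with the chosen pad id up to max_length.
    return ids + [pad_id] * (max_length - len(ids))
```

For example, a model with only an EOS token (say id 2) would pad `[5, 6, 2]` to `[5, 6, 2, 2, 2]` for `max_length=5`.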
What does this PR do?
This PR adds padding for models without a padding token as well. The logic is the following:

If the model predicts targets < `max_length` => the model has to have at least an `eos_token_id`. If the model has no `config.pad_token_id` defined, then it simply uses the `config.eos_token_id` for padding.

If the model has no `config.eos_token_id` => the model cannot generate predictions shorter than `max_length`. In this case padding will never happen.

@sshleifer @patil-suraj - you guys were right -> the `Trainer` requires padding in any case (also if the model has no padding token). Could you guys review this PR and see if these fixes in the Seq2Seq Trainer are ok for you?