
[Seq2Seq Trainer] Make sure padding is implemented for models without pad_token#8043

Merged
patrickvonplaten merged 9 commits into huggingface:master from patrickvonplaten:fix_seq2seq_trainer
Oct 26, 2020

Conversation

patrickvonplaten (Contributor) commented Oct 26, 2020

What does this PR do?

This PR adds padding for models without a padding token as well. The logic is the following:

  1. If the model predicts targets shorter than max_length, it has to have at least an eos_token_id. If the model has no config.pad_token_id defined, then it simply uses config.eos_token_id for padding.

  2. If the model has no config.eos_token_id either, it cannot generate predictions shorter than max_length. In this case padding will never happen.
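The fallback described above can be sketched with a small helper (the function name and bare-int signature are illustrative, not the PR's actual code):

```python
def resolve_pad_token_id(pad_token_id, eos_token_id):
    """Pick the token id used for padding generated sequences.

    Falls back to eos_token_id when pad_token_id is undefined; returns
    None when neither exists (padding can then never be required,
    because such a model always generates up to max_length).
    """
    if pad_token_id is not None:
        return pad_token_id
    # No pad token: reuse the EOS token for padding, if there is one.
    return eos_token_id

# Illustrative config values:
# resolve_pad_token_id(0, 2)       -> 0    (pad token wins)
# resolve_pad_token_id(None, 2)    -> 2    (fall back to EOS)
# resolve_pad_token_id(None, None) -> None (padding never happens)
```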

@sshleifer @patil-suraj - you guys were right -> the Trainer requires padding in any case (even if the model has no padding token).
Could you guys review this PR and see if these fixes in Seq2Seq Trainer are ok for you?

patil-suraj (Contributor):

LGTM!

Comment thread: examples/seq2seq/seq2seq_trainer.py (Outdated)

patil-suraj commented Oct 26, 2020

@patrickvonplaten , @sshleifer
I am seeing a major slowdown on TPU V3-8,
last time (9e68d07) sshleifer/student_marian_en_ro_6_3 finished 1 epoch in ~6 mins,
now on this branch it's showing ~1hr 20 mins


patrickvonplaten commented Oct 26, 2020

> @patrickvonplaten , @sshleifer
> I am seeing a major slowdown on TPU V3-8,
> last time (9e68d07) sshleifer/student_marian_en_ro_6_3 finished 1 epoch in ~6 mins,
> now on this branch it's showing ~1hr 20 mins

Ohoh :-/ Can you narrow down the commit that caused the slow-down? I took another look at https://github.com/huggingface/transformers/pull/7809/files and this line I added could be problematic: `inputs = copy.deepcopy(inputs)`.

patrickvonplaten (Contributor, Author):

> @patrickvonplaten , @sshleifer
> I am seeing a major slowdown on TPU V3-8,
> last time (9e68d07) sshleifer/student_marian_en_ro_6_3 finished 1 epoch in ~6 mins,
> now on this branch it's showing ~1hr 20 mins
>
> Ohh, can you narrow down the commit that caused the slow-down? I took a look again at https://github.com/huggingface/transformers/pull/7809/files and this line I added could be problematic: `inputs = copy.deepcopy(inputs)`.

Yeah, this line is actually called at every step -> can you check whether removing the copy operation speeds the Seq2Seq trainer up again? I've been a bit sloppy there I think :-/
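A minimal stand-in (plain dicts and lists instead of real torch tensors) shows why the deepcopy is expensive: it duplicates every nested payload on every training step, while a shallow copy only duplicates the dict itself:

```python
import copy

# Stand-in for a training batch; the real code holds torch tensors.
inputs = {"input_ids": [[0, 1, 2, 3]], "attention_mask": [[1, 1, 1, 1]]}

deep = copy.deepcopy(inputs)   # duplicates every nested object, every step
shallow = dict(inputs)         # copies only the dict; payloads are shared

print(deep["input_ids"] is inputs["input_ids"])     # False: new storage
print(shallow["input_ids"] is inputs["input_ids"])  # True: shared storage
```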


patil-suraj commented Oct 26, 2020

It's still very slow even after removing that line. I'll try to find the exact commit which is causing this slowdown.

sshleifer (Contributor) left a comment:

LGTM, would prefer stuff moved to init, but don't feel strongly.

if self.config.pad_token_id is None and self.config.eos_token_id is not None:
    logger.warn(
        f"The `config.pad_token_id` is `None`. Using `config.eos_token_id` = {self.config.eos_token_id} for padding."
    )
Contributor:

What if eos_token_id is None? Should we raise?

Contributor Author:

Might be a bit of an edge case, but eos_token_id could be None, in which case padding would never take place.

Contributor:

Should we raise early in that case?

Contributor Author:

What I meant is that there are models, like openai-gpt or ctrl, that have neither an eos_token_id nor a pad_token_id => the way it is implemented now, these models can still make use of Seq2SeqTrainer because they never require padding (they never finish early). So I'd just leave it as it is - or, if you think that models without an EOS token should not use Seq2SeqTrainer, we could raise as well - up to you!
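A toy decoding loop (hypothetical names, not the library's generate implementation) illustrates why a model without an EOS id always fills max_length, so padding is never required:

```python
def toy_generate(step_fn, max_length, eos_token_id=None):
    """Greedy decode until EOS (if defined) or until max_length."""
    tokens = []
    for _ in range(max_length):
        tok = step_fn(tokens)
        tokens.append(tok)
        if eos_token_id is not None and tok == eos_token_id:
            break  # the only possible exit before max_length
    return tokens

# A "model" that happens to emit token 2 on the third step.
step = lambda toks: 2 if len(toks) == 2 else 7

# With eos_token_id=2 the sequence stops early and would need padding:
# toy_generate(step, 5, eos_token_id=2) -> [7, 7, 2]
# Without an EOS id it always fills max_length, so padding never happens:
# toy_generate(step, 5) -> [7, 7, 2, 7, 7]
```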

Contributor:

I didn't understand that they always go to max_length; your implementation makes total sense. Thanks for clarifying.

Comment thread: examples/seq2seq/seq2seq_trainer.py (Outdated)
Comment on lines +219 to +224
pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else self.config.eos_token_id

if pad_token_id is None:
    raise ValueError(
        f"Make sure that either `config.pad_token_id` or `config.eos_token_id` is defined if tensor has to be padded to `max_length`={max_length}"
    )
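The check guards a padding step roughly like the following list-based sketch (the real trainer pads torch tensors; the helper name is illustrative):

```python
def pad_to_max_len(seqs, max_length, pad_token_id, eos_token_id):
    """Right-pad each generated sequence to max_length."""
    pad_id = pad_token_id if pad_token_id is not None else eos_token_id
    if pad_id is None:
        raise ValueError(
            "Make sure that either `config.pad_token_id` or `config.eos_token_id` "
            f"is defined if tensor has to be padded to `max_length`={max_length}"
        )
    return [seq + [pad_id] * (max_length - len(seq)) for seq in seqs]

# pad_to_max_len([[5, 2]], 4, pad_token_id=None, eos_token_id=2)
# -> [[5, 2, 2, 2]]  (the EOS id is reused as the pad id)
```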
Contributor:

Should we check in __init__ for faster failures?

@patrickvonplaten patrickvonplaten merged commit 664c7ec into huggingface:master Oct 26, 2020


3 participants