
[examples/seq2seq] support label smoothing#9844

Merged
patil-suraj merged 6 commits into huggingface:master from patil-suraj:fix-s2s-lbl-smooth
Feb 5, 2021

Conversation

@patil-suraj
Contributor

@patil-suraj patil-suraj commented Jan 27, 2021

What does this PR do?

Add support for label smoothing by adding a prepare_decoder_input_ids_from_labels method to all seq2seq models, which lets us prepare decoder_input_ids outside the model.

For context: with label smoothing we don't pass labels to the model (to avoid calculating the loss twice, which degrades speed, see #9713), so we need to pass decoder_input_ids explicitly.

@sgugger, @patrickvonplaten, what do you think about adding prepare_decoder_input_ids_from_labels to every seq2seq model? There are already shift_tokens_right/_shift_right methods, but IMO those names are confusing when used outside the model.
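For reference, the shift these methods perform can be sketched roughly like this (a pure-Python illustration with made-up token ids, not the actual tensor implementation in transformers):

```python
# Sketch of "shift tokens right": turn labels into decoder_input_ids.
# Pure-Python illustration; the models do this on torch tensors.
def shift_tokens_right(labels, pad_token_id, decoder_start_token_id):
    shifted = []
    for row in labels:
        new_row = [decoder_start_token_id] + row[:-1]
        # -100 marks positions the loss ignores; the decoder input
        # must contain real token ids, so map them back to pad.
        shifted.append([pad_token_id if t == -100 else t for t in new_row])
    return shifted

labels = [[5, 6, 7, -100], [8, 9, -100, -100]]
print(shift_tokens_right(labels, pad_token_id=0, decoder_start_token_id=2))
# [[2, 5, 6, 7], [2, 8, 9, 0]]
```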

Comment thread examples/seq2seq/run_seq2seq.py Outdated
Collaborator

@sgugger sgugger left a comment


Thanks a lot for doing this! I like it a lot!
I don't know if the shift methods are used for something else in the seq2seq methods, but if this was their only use, we could maybe deprecate them?

Comment thread examples/seq2seq/run_seq2seq.py Outdated
Comment thread examples/seq2seq/run_seq2seq.py Outdated
@patil-suraj
Contributor Author

patil-suraj commented Feb 4, 2021

I don't know if the shift methods are used for something else in the seq2seq methods, but if this was their only use, we could maybe deprecate them?

Those are used for exactly the same purpose, preparing decoder_input_ids by shifting labels, and they're mostly used inside the models, so yes, I think we could deprecate them.

@patil-suraj patil-suraj changed the title [WIP][examples/seq2seq] support label smoothing and enc/emb freezing [examples/seq2seq] support label smoothing and enc/emb freezing Feb 5, 2021
Comment thread examples/seq2seq/run_seq2seq.py Outdated
Comment thread examples/seq2seq/run_seq2seq.py Outdated
Comment thread examples/seq2seq/run_seq2seq.py Outdated
Comment thread examples/seq2seq/run_seq2seq.py Outdated
Comment thread src/transformers/models/bart/modeling_bart.py Outdated
Contributor

@patrickvonplaten patrickvonplaten left a comment


LGTM!

One thing I'd change, however: don't allow passing tokenizer.pad_token_id to prepare_decoder_input_ids_from_labels => I think the model should always have its own pad_token_id defined in the config. What do you think?

@patil-suraj
Contributor Author

I agree we could remove the pad_token_id argument.

Comment thread examples/seq2seq/run_seq2seq.py Outdated
Contributor Author

@patil-suraj patil-suraj Feb 5, 2021


@sgugger

Just realized this: here labels is a list, but we need a tensor for the prepare_decoder_input_ids_from_labels method. And we can't turn it into a tensor here since it's not padded.

Thinking more about this, I would say we remove the ignore_pad_token_for_loss argument; the name is rather confusing. In the previous script it was used to specify that pad tokens should be ignored as-is (by setting the loss's ignore_index to pad_token_id) rather than by replacing them with -100 (because of the label smoothing issues). So in either case the pad tokens were ignored, and I don't think there is any reason not to ignore them.

So IMO we should

  • remove the ignore_pad_token_for_loss argument
  • set padding to True when pad_to_max_length is False
  • replace pad with -100 in the pre-processing function
  • and remove/deprecate the DataCollatorForSeq2Seq
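The replacement in the third bullet could be sketched like this (illustrative pure-Python; the real preprocessing would operate on tokenizer output):

```python
# Sketch: after padding, swap pad tokens in the labels for -100 so the
# loss (and label smoothing) skips them.
def replace_pad_in_labels(label_ids, pad_token_id, ignore_index=-100):
    return [[ignore_index if t == pad_token_id else t for t in row]
            for row in label_ids]

padded_labels = [[11, 12, 0, 0], [13, 0, 0, 0]]
print(replace_pad_in_labels(padded_labels, pad_token_id=0))
# [[11, 12, -100, -100], [13, -100, -100, -100]]
```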

Collaborator


set padding to True when pad_to_max_length is False

This will pad to the maximum length of the batches sent to the map method of the dataset, not the maximum length of a training batch, so this does not work. DataCollatorForSeq2Seq is needed until the datasets v2 release lets us do that padding as a transform on the dataset (soon!), so we can't deprecate it just yet.

Contributor Author


Aah, yes. Then do you think we could pass the prepare_decoder_input_ids_from_labels method to DataCollatorForSeq2Seq and prepare the decoder_input_ids there?

Collaborator


I'm happy with that solution, yes. We can pass the model to DataCollatorForSeq2Seq and do the check for the method there (better to pass the object than a function).
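A rough sketch of that design (the class and field names here are illustrative, not the final transformers API):

```python
class Seq2SeqCollatorSketch:
    """Illustrative collator: pads labels with -100, then asks the
    model (if it exposes the hook) to derive decoder_input_ids."""

    def __init__(self, pad_token_id, model=None, ignore_index=-100):
        self.pad_token_id = pad_token_id
        self.model = model
        self.ignore_index = ignore_index

    def __call__(self, features):
        max_len = max(len(f["labels"]) for f in features)
        labels = [f["labels"] + [self.ignore_index] * (max_len - len(f["labels"]))
                  for f in features]
        batch = {"labels": labels}
        # We pass the model object, not a bare function, so the check
        # for the hook happens here in the collator.
        if self.model is not None and hasattr(
                self.model, "prepare_decoder_input_ids_from_labels"):
            batch["decoder_input_ids"] = (
                self.model.prepare_decoder_input_ids_from_labels(labels))
        return batch
```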



Hey @sgugger , @patil-suraj
I am trying to run run_seq2seq.py to train mT5 for a translation task, but I am getting the following error:

ImportError: cannot import name 'DataCollatorForSeq2Seq' from 'transformers' (unknown location)

It seems that DataCollatorForSeq2Seq has already been removed from the transformers package, right?

P.S. my transformers version: 4.2.2 installed using pip

Contributor Author


Hey @Arman-IMRSV,

Please use issues to report bugs.

@patil-suraj patil-suraj requested a review from sgugger February 5, 2021 16:17
@patil-suraj patil-suraj changed the title [examples/seq2seq] support label smoothing and enc/emb freezing [examples/seq2seq] support label smoothing Feb 5, 2021
Collaborator

@sgugger sgugger left a comment


Thanks a lot for the work here!

@patil-suraj patil-suraj merged commit 1cd1651 into huggingface:master Feb 5, 2021
@patil-suraj patil-suraj deleted the fix-s2s-lbl-smooth branch February 5, 2021 17:53