[examples/seq2seq] support label smoothing #9844
patil-suraj merged 6 commits into huggingface:master
Conversation
sgugger
left a comment
Thanks a lot for doing this! I like it a lot!
I don't know if the shift methods are used for something else in the seq2seq models, but if this was their only use, we could maybe deprecate them?
Those are used for exactly the same reason.
patrickvonplaten
left a comment
LGTM!
One thing I'd change, however, would be to not allow passing tokenizer.pad_token_id to prepare_decoder_input_ids_from_labels => I think the model should always have its own pad_token_id defined in the config. What do you think?
I agree we could remove the
Just realized this: here labels is a list, but we need a tensor for the prepare_decoder_input_ids_from_labels method. And we can't turn it into a tensor here since it's not padded.
Thinking more about this, I would say we remove the ignore_pad_token_for_loss argument; the name is rather confusing. In the previous script, it was used to specify that pad tokens should be ignored as-is (by setting the loss's ignore_index to pad_token_id) rather than replacing them with -100 (because of the label smoothing issues). So in either case the pad tokens were ignored, and I don't think there is any reason not to ignore them.
So IMO we should:
- remove the ignore_pad_token_for_loss argument
- set padding to True when pad_to_max_length is False
- replace pad with -100 in the pre-processing function
- remove/deprecate DataCollatorForSeq2Seq
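A minimal sketch of the "replace pad with -100" step proposed above, assuming the tokenized labels come back as plain lists of ids padded with the tokenizer's pad_token_id (the helper name `mask_pad_tokens` is hypothetical, not from the PR):

```python
# Hypothetical helper: replace pad tokens in a batch of label id lists
# with -100 so the cross-entropy loss ignores those positions.
def mask_pad_tokens(labels, pad_token_id):
    """Return labels with every pad_token_id swapped for -100."""
    return [
        [tok if tok != pad_token_id else -100 for tok in label]
        for label in labels
    ]

# e.g. with pad_token_id = 0:
# mask_pad_tokens([[5, 7, 0, 0]], 0) -> [[5, 7, -100, -100]]
```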
> set padding to True when pad_to_max_length is False

This will pad to the maximum length of the batches sent to the map method of the dataset, not the maximum length of a training batch, so this does not work. DataCollatorForSeq2Seq is needed until we have the datasets v2 release to do that padding as a transform on the dataset (soon!), so we can't deprecate it just now.
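A toy illustration of the point above, assuming `map` processes examples in fixed chunks (e.g. the default batch_size of 1000): padding inside the preprocessing function pads to the longest sequence in the map chunk, which generally differs from the DataLoader batches seen at train time. The function name `pad_chunk` is illustrative, not from the library:

```python
# Illustrative only: what padding=True effectively does within one
# chunk of examples handed to datasets' map(batched=True).
def pad_chunk(sequences, pad_id=0):
    """Pad all sequences in this chunk to the chunk's own max length."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]

# One map chunk containing a long outlier pads every row to its length,
# even if a training-time batch only ever contains the short rows.
```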
Aah, yes. Then do you think we could pass the prepare_decoder_input_ids_from_labels method to DataCollatorForSeq2Seq and prepare the decoder_input_ids there?
I'm happy with that solution, yes. We can pass the model to DataCollatorForSeq2Seq and do the check for the method there (better to pass the object than a function).
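A hedged sketch of the design agreed on above: pass the model object to the collator and, if it exposes prepare_decoder_input_ids_from_labels, use it to build decoder_input_ids from the labels. The class below is a simplified stand-in, not the real DataCollatorForSeq2Seq (which also handles padding):

```python
# Simplified stand-in for the agreed design: the collator receives the
# model object and checks for the method, rather than receiving a bare
# function.
class SimpleSeq2SeqCollator:
    def __init__(self, model=None):
        self.model = model

    def __call__(self, batch):
        # Only prepare decoder_input_ids when the model supports it.
        if self.model is not None and hasattr(
            self.model, "prepare_decoder_input_ids_from_labels"
        ):
            batch["decoder_input_ids"] = (
                self.model.prepare_decoder_input_ids_from_labels(
                    labels=batch["labels"]
                )
            )
        return batch
```

Passing the model (rather than a function) keeps the check in one place and lets the collator degrade gracefully for models without the method.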
Hey @sgugger , @patil-suraj
I am trying to run "run_seq2seq.py" to train mT5 for a translation task, but I am getting the following error:
ImportError: cannot import name 'DataCollatorForSeq2Seq' from 'transformers' (unknown location)
It seems that DataCollatorForSeq2Seq is already removed from the transformers package, right?
P.S. my transformers version: 4.2.2 installed using pip
Hey @Arman-IMRSV,
Please use issues to report bugs.
force-pushed from 2984ca4 to bc9fee8
sgugger
left a comment
Thanks a lot for the work here!
What does this PR do?
Adds support for label smoothing by adding a prepare_decoder_input_ids_from_labels method to all seq2seq models, which will let us prepare decoder_input_ids outside the model.
For context, we need to pass decoder_input_ids for label smoothing because we don't pass labels, to avoid calculating the loss twice, which leads to speed degradation, see #9713.
@sgugger, @patrickvonplaten what do we think about adding prepare_decoder_input_ids_from_labels to every seq2seq model? There are already shift_tokens_right/_shift_right methods, but the name is a bit confusing IMO to use outside the model.
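A hypothetical sketch of what such a method could do for a typical seq2seq model: shift the labels one position to the right, prepend the decoder start token, and replace any -100 (ignored-loss) positions with the pad token so the decoder never sees -100. In a real model, `decoder_start_token_id` and `pad_token_id` would come from the model's config rather than being passed in:

```python
# Hypothetical standalone version; real models would read the two token
# ids from their config instead of taking them as arguments.
def prepare_decoder_input_ids_from_labels(
    labels, decoder_start_token_id, pad_token_id
):
    """Shift labels right and sanitize -100 entries for the decoder."""
    # Drop the last label, prepend the decoder start token.
    shifted = [[decoder_start_token_id] + label[:-1] for label in labels]
    # The decoder must never receive -100; use the pad token instead.
    return [
        [tok if tok != -100 else pad_token_id for tok in row]
        for row in shifted
    ]

# prepare_decoder_input_ids_from_labels([[5, 7, -100]], 0, 1)
# -> [[0, 5, 7]]
```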