Add separated decoder_head_mask for T5 Models #9634
patrickvonplaten merged 8 commits into huggingface:master
Conversation
* Add decoder_head_mask args into T5Model and T5ForConditionalGeneration
* Slightly change the order of input args to be in accordance with the convention from BART-based models introduced within PR huggingface#9569
* Separate head_mask and decoder_head_mask args in TF T5 models
* Update test_forward_signature in tests/test_modeling_tf_common.py w.r.t. the changed order of input args
Great, that looks nice! Let's first merge #9569 and then rebase this PR so that it passes all tests :-)
LysandreJik
left a comment
Cool, welcome feature!
This is a slight breaking change, isn't it? Users who previously passed head_mask had it control both the encoder and the decoder. From now on, specifying head_mask only controls the encoder, leaving the decoder to be controlled by decoder_head_mask.
Can we do a deprecation cycle, where if no decoder_head_mask is given, we set it to the value of head_mask? Having a FutureWarning there would be nice, too.
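The suggested deprecation cycle can be sketched as a small helper (the helper name is hypothetical; the models themselves would inline this logic in `forward`):

```python
import warnings

def resolve_decoder_head_mask(head_mask, decoder_head_mask):
    """Sketch of the suggested deprecation cycle: if only `head_mask` is
    given, reuse it for the decoder and emit a FutureWarning."""
    if head_mask is not None and decoder_head_mask is None:
        warnings.warn(
            "`head_mask` was split into `head_mask` and `decoder_head_mask`; "
            "`decoder_head_mask` currently defaults to `head_mask`.",
            FutureWarning,
        )
        decoder_head_mask = head_mask
    return decoder_head_mask
```

With this fallback, old code that passes only `head_mask` keeps its previous behaviour, while new code can pass both arguments independently.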
Thanks for fixing this! I have one note/question: This seems to only apply to self-attention heads, not heads in the cross-attention module, right? Is this intentional?
* Add FutureWarnings for T5 and TFT5 models warning a user that the input argument `head_mask` was split into two arguments - `head_mask` and `decoder_head_mask`
* Add default behaviour - `decoder_head_mask` is set to copy `head_mask`
@talkhaldi Thank you very much for pointing this out. It seems you're right and this was not intentional on my part. It'll be fixed in another commit.
* Make proper usage of head_mask and decoder_head_mask in cross_attention
* Fix conditions for raising FutureWarning
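The cross-attention fix above can be sketched as follows (a simplified, hypothetical decoder-layer helper; the real models pass per-layer mask slices into each block):

```python
def decoder_layer_masks(layer_idx, decoder_head_mask):
    """Return the head-mask slices a decoder layer would consume.

    Hypothetical sketch: after this fix, the decoder's cross-attention
    is masked with `decoder_head_mask` as well, instead of receiving no
    mask at all.
    """
    layer_mask = None if decoder_head_mask is None else decoder_head_mask[layer_idx]
    self_attn_mask = layer_mask
    cross_attn_mask = layer_mask  # previously the cross-attention went unmasked
    return self_attn_mask, cross_attn_mask
```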
Hey @patrickvonplaten and @LysandreJik. I've added some
LysandreJik
left a comment
LGTM! Thanks for making the change @stancld.
```python
if head_mask is not None and decoder_head_mask is None:
    if self.config.num_layers == self.config.num_decoder_layers:
        warning_msg = """
        The input argument `head_mask` was split into two arguments `head_mask` and `decoder_head_mask`.
        Currently, `decoder_head_mask` is set to copy `head_mask`, but this feature is deprecated and will be
        removed in future versions. If you do not want to use any `decoder_head_mask` now, please set
        `decoder_head_mask = torch.ones(num_layers, num_heads)`.
        """
        warnings.warn(warning_msg, FutureWarning)
        decoder_head_mask = head_mask
```
sgugger
left a comment
Thanks for your PR, it's very clean!
```python
# FutureWarning: head_mask was separated into two input args - head_mask, decoder_head_mask
if head_mask is not None and decoder_head_mask is None:
    if self.config.num_layers == self.config.num_decoder_layers:
        warning_msg = """
```
I see this message is used twice; maybe it could be refactored into a private constant?
That's a good point, thanks!
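The refactor could look like the sketch below; the constant name and the helper function are illustrative, not necessarily what was merged:

```python
import warnings

# Module-level constant shared by every place that emits the warning,
# so the message text is not duplicated across model classes
# (the constant name here is illustrative).
_HEAD_MASK_WARNING_MSG = (
    "The input argument `head_mask` was split into two arguments `head_mask` "
    "and `decoder_head_mask`. Currently, `decoder_head_mask` is set to copy "
    "`head_mask`, but this feature is deprecated and will be removed in "
    "future versions."
)

def warn_head_mask_split():
    """Emit the shared FutureWarning from one place."""
    warnings.warn(_HEAD_MASK_WARNING_MSG, FutureWarning)
```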
| """ | ||
| # FutureWarning: head_mask was separated into two input args - head_mask, decoder_head_mask | ||
| if head_mask is not None and decoder_head_mask is None: | ||
| warning_msg = """ |
Fix issue #9632

This PR separates `head_mask` and `decoder_head_mask` for T5 models, and thus makes it possible to specify different head masks for the encoder and the decoder.

Description:
* Replace the single `head_mask` with a separated pair `head_mask` and `decoder_head_mask` for the T5 models: `T5Model`, `T5ForConditionalGeneration`, `TFT5Model`, `TFT5ForConditionalGeneration`
* The order of input arguments follows the convention: `"input_ids", "attention_mask", "decoder_input_ids", "decoder_attention_mask", "head_mask", "decoder_head_mask", "encoder_outputs"`
* This change currently breaks `test_forward_signature` in `tests/test_modeling_common.py`. This problem will be diminished once PR Add head_mask/decoder_head_mask for BART #9569 is merged.

Reviewer: @patrickvonplaten (the code is ready for review)
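The new argument order can be checked with a small `inspect`-based sketch, similar in spirit to what `test_forward_signature` does (the `forward` stub below is hypothetical, standing in for the real model's method):

```python
import inspect

def forward(input_ids, attention_mask=None, decoder_input_ids=None,
            decoder_attention_mask=None, head_mask=None,
            decoder_head_mask=None, encoder_outputs=None):
    """Stub mirroring the expected leading argument order for T5 forward."""
    return None

# The first seven parameter names should match the documented convention.
expected = ["input_ids", "attention_mask", "decoder_input_ids",
            "decoder_attention_mask", "head_mask", "decoder_head_mask",
            "encoder_outputs"]
assert list(inspect.signature(forward).parameters)[: len(expected)] == expected
```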