Add head_mask/decoder_head_mask for BART #9569
patrickvonplaten merged 7 commits into huggingface:master
Conversation
This branch implements `head_mask` and `decoder_head_mask` for BART-based models. Full list below:
- BART
- MBart
- Blenderbot
- BlenderbotSmall
- Marian
- Pegasus

Everything is accompanied by updated testing.
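To make the new arguments concrete, here is a hedged sketch of how such masks are intended to be used. The model call is shown only as a comment (a real call needs `transformers` and `torch` installed), and `make_head_mask` is a hypothetical plain-Python helper, not code from this PR. A mask entry of 1.0 keeps a head; 0.0 disables it.

```python
# Intended usage (schematic; not executed here):
#
#   outputs = model(
#       input_ids,
#       attention_mask=attention_mask,
#       decoder_input_ids=decoder_input_ids,
#       decoder_attention_mask=decoder_attention_mask,
#       head_mask=encoder_mask,          # shape (encoder_layers, num_heads)
#       decoder_head_mask=decoder_mask,  # shape (decoder_layers, num_heads)
#   )

def make_head_mask(num_layers, num_heads, disabled=()):
    """Build a num_layers x num_heads mask; `disabled` lists (layer, head) pairs."""
    mask = [[1.0] * num_heads for _ in range(num_layers)]
    for layer, head in disabled:
        mask[layer][head] = 0.0
    return mask

# Disable head 0 of the first encoder layer, keep everything else.
encoder_mask = make_head_mask(num_layers=6, num_heads=8, disabled=[(0, 0)])
print(encoder_mask[0][:2])  # [0.0, 1.0]
```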
Thanks for opening a new PR. Let me know if you need a review. (It's also OK if I go into the PR and fix some things if you're stuck :-) )
@patrickvonplaten I hope this PR is again ready for review. The only thing remaining to resolve is the issue in `test_headmasking`.
* Fix test_headmasking for BART-like models, which have only 2 layers in each module.
The condition
```
self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0)
```
is, therefore, invalid for encoder-decoder models considering
the `head_mask`
```
head_mask = torch.ones(
self.model_tester.num_hidden_layers,
self.model_tester.num_attention_heads,
device=torch_device,
)
head_mask[0, 0] = 0
head_mask[-1, :-1] = 0
```
specified in the `test_headmasking` test.
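To make the failure concrete, here is a plain-Python stand-in for the `head_mask` above (`build_test_head_mask` is a hypothetical helper for illustration, not code from the PR), showing why the last-layer assertion only breaks when a module has 2 layers:

```python
def build_test_head_mask(num_layers, num_heads):
    """Plain-Python version of the head_mask built in test_headmasking."""
    mask = [[1.0] * num_heads for _ in range(num_layers)]
    mask[0][0] = 0.0                   # head_mask[0, 0] = 0
    for head in range(num_heads - 1):  # head_mask[-1, :-1] = 0
        mask[-1][head] = 0.0
    return mask

# BERT-style test setup: 5 layers, so layer 1 is untouched and head 0 of
# attentions[1] can stay non-zero -- the assertNotEqual passes.
assert build_test_head_mask(5, 4)[1][0] == 1.0

# BART encoder/decoder in the test: 2 layers, so layer 1 is also the LAST
# layer, where head 0 is masked -- the assertNotEqual must fail.
assert build_test_head_mask(2, 4)[1][0] == 0.0
```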
Hey @patrickvonplaten. I would like to inform you that I fixed the assertion that pointed to the last layer of the encoder/decoder (encoder-decoder models have only 2 layers in each module, while BERT has 5 layers during testing). At the end of the day, this condition was invalid for BART-based models considering the `head_mask`. I hope this PR is now ready for review.
```
  is_encoder_decoder = True
  test_pruning = False
- test_head_masking = False
+ test_head_masking = True
```
```
if model.config.is_encoder_decoder:
    signature = inspect.signature(model.forward)
    arg_names = [*signature.parameters.keys()]
    if "decoder_head_mask" in arg_names:  # necessary differentiation because of T5 model
```
good for me for now - could you maybe open an issue saying that T5 should separate "head_mask" and "decoder_head_mask" and ping me on it? Then we can clean this up for T5 at a later stage :-)
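The signature check above can be sketched in isolation. The two forward functions below are hypothetical stand-ins for the BART-like and T5-like signatures discussed here, not the real model code:

```python
import inspect

def bart_like_forward(input_ids, attention_mask=None,
                      head_mask=None, decoder_head_mask=None):
    ...

def t5_like_forward(input_ids, attention_mask=None, head_mask=None):
    ...

def accepts_decoder_head_mask(forward_fn):
    # Same pattern as in the test: read the parameter names off the signature.
    arg_names = [*inspect.signature(forward_fn).parameters.keys()]
    return "decoder_head_mask" in arg_names

print(accepts_decoder_head_mask(bart_like_forward))  # True
print(accepts_decoder_head_mask(t5_like_forward))    # False
```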
patrickvonplaten left a comment
This is awesome! Amazing work @stancld - very clean and nice comments. If you want it would be awesome if you could open an issue regarding T5 having only a "head_mask", but no "decoder_head_mask" and ping me on it so that we can fix this in a follow-up PR :-)
Otherwise, LGTM!
sgugger left a comment
Looks great! Thanks for adding this!
LysandreJik left a comment
This is great, very clean implementation! Thanks for implementing the tests, too.
LGTM!
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add decoder_head_mask for PyTorch T5 model
* Add decoder_head_mask args into T5Model and T5ForConditionalGeneration
* Slightly change the order of input args to be in accordance with the convention from BART-based models introduced within the PR #9569
* Make style for modeling_t5.py
* Add decoder_head_mask for TF T5 models
* Separate head_mask and decoder_head_mask args in TF T5 models
* Slightly change the order of input args to follow the convention of BART-based models updated in PR #9569
* Update test_forward_signature in tests/test_modeling_tf_common.py w.r.t. the changed order of input args
* Add FutureWarnings for T5 and TFT5 models, warning a user that the input argument `head_mask` was split into two arguments: `head_mask` and `decoder_head_mask`
* Add default behaviour: `decoder_head_mask` is set to copy `head_mask`
* Fix T5 modeling and FutureWarning
* Make proper usage of head_mask and decoder_head_mask in cross_attention
* Fix conditions for raising FutureWarning
* Reformat FutureWarning in T5 modeling
* Refactor the warning message
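The default behaviour described above ("`decoder_head_mask` is set to copy `head_mask`", with a `FutureWarning`) can be sketched as follows. `resolve_head_masks` is a hypothetical helper for illustration; the actual logic lives inside the T5 modeling code:

```python
import warnings

def resolve_head_masks(head_mask=None, decoder_head_mask=None):
    # If only head_mask is given, warn and reuse it for the decoder.
    if head_mask is not None and decoder_head_mask is None:
        warnings.warn(
            "`head_mask` was split into `head_mask` and `decoder_head_mask`; "
            "defaulting `decoder_head_mask` to `head_mask`.",
            FutureWarning,
        )
        decoder_head_mask = head_mask
    return head_mask, decoder_head_mask

head_mask, decoder_head_mask = resolve_head_masks(head_mask=[1.0, 0.0])
print(decoder_head_mask)  # [1.0, 0.0]
```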
* Add head_mask/decoder_head_mask for TF BART models
* Add head_mask and decoder_head_mask input arguments for TF BART-based models as a TF counterpart to the PR #9569
* Add test_headmasking functionality to tests/test_modeling_tf_common.py
* TODO: Add a test to verify that we can get a gradient back for importance score computation
* Remove redundant #TODO note from tests/test_modeling_tf_common.py
* Fix assertions
* Make style
* Fix ...Model input args and adjust one new test
* Add back head_mask and decoder_head_mask to BART-based ...Model after the last commit
* Remove head_mask and decoder_head_mask from input_dict in TF test_train_pipeline_custom_model as these two have a different shape than other input args (necessary for passing this test)
* Revert adding global_rng in test_modeling_tf_common.py
This PR implements `head_mask` and `decoder_head_mask` for PyTorch BART-based models. For the full list, please see below:
- BART
- MBart
- Blenderbot
- BlenderbotSmall
- Marian
- Pegasus

This PR is a follow-up to the closed PR #9404.
Motivation:
According to HuggingFace's website, "There is a growing field of study concerned with investigating the inner working of large-scale transformers like BERT (that some call “BERTology”)." This PR makes it possible to mask attention heads in encoder and decoder modules exactly as for BERT, and thus creates an opportunity to study the importance of attention heads in encoder-decoder BERT-like models.
Description
New arguments `head_mask` and `decoder_head_mask` are passed to all the BART-based `...Model`, `...ForConditionalGeneration`, and `...ForQuestionAnswering` models after the four arguments `input_ids`, `attention_mask`, `decoder_input_ids`, `decoder_attention_mask`, so that testing and the whole pipeline remain smooth.

This PR also contains an updated `test_headmasking`, which currently works fine with one problem: BART-based models do not satisfy the assertion discussed above. Fixing this problem is currently underway.
Reviewer: @patrickvonplaten