Add head_mask and decoder_head_mask to TF LED #9988

Merged
LysandreJik merged 4 commits into huggingface:master from stancld:TF_LED_head_mask
Feb 9, 2021

Conversation

Contributor

@stancld stancld commented Feb 3, 2021

This PR implements head_mask and decoder_head_mask for TF LED (and for Longformer, as there's a copy dependency). It is a follow-up to the open issue #9814.
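For readers unfamiliar with head masking, the following is a minimal sketch of the general idea rather than the code from this PR. It uses NumPy instead of TensorFlow to stay dependency-light, and the helper names (`softmax`, `apply_layer_head_mask`), the `head_axis` parameter, and the tensor layout in the demo are assumptions made for illustration only. A per-layer mask with one entry per head is broadcast over the attention probabilities after the softmax, so a mask value of 0 silences that head and a value of 1 leaves it unchanged.

```python
import numpy as np


def softmax(scores, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = scores - np.max(scores, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def apply_layer_head_mask(attn_probs, layer_head_mask, head_axis=1):
    """Scale each head's attention probabilities by its mask entry.

    Hypothetical helper: `head_axis` names the axis of `attn_probs` that
    indexes attention heads; `layer_head_mask` has shape (num_heads,).
    """
    attn_probs = np.asarray(attn_probs)
    layer_head_mask = np.asarray(layer_head_mask, dtype=attn_probs.dtype)
    num_heads = attn_probs.shape[head_axis]
    if layer_head_mask.shape != (num_heads,):
        raise ValueError(
            f"Head mask should have shape ({num_heads},), "
            f"got {layer_head_mask.shape}"
        )
    # Reshape the mask so it broadcasts along every axis except the head axis.
    broadcast_shape = [1] * attn_probs.ndim
    broadcast_shape[head_axis] = num_heads
    return attn_probs * layer_head_mask.reshape(broadcast_shape)


# Assumed layout for the demo: (batch, num_heads, query_len, key_len).
rng = np.random.default_rng(0)
scores = rng.normal(size=(2, 4, 3, 5))
probs = softmax(scores, axis=-1)
layer_head_mask = np.array([1.0, 0.0, 1.0, 1.0])  # disable the second head
masked = apply_layer_head_mask(probs, layer_head_mask, head_axis=1)
```

Passing `head_axis` explicitly keeps the sketch independent of any particular layout, since local and global attention tensors may place the head dimension on different axes.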

Motivation: This PR is a part of an endeavour to enable the usage of head_mask and decoder_head_mask for all encoder-decoder transformers following the recent work on BART-like models (#9639).


Fixes: #9814

Reviewers: @jplu @patrickvonplaten @LysandreJik @sgugger

@stancld stancld changed the title from "Add head_mask and decoder_head_mask to TF LED" to "[WIP]: Add head_mask and decoder_head_mask to TF LED" Feb 3, 2021
@stancld stancld changed the title from "[WIP]: Add head_mask and decoder_head_mask to TF LED" to "Add head_mask and decoder_head_mask to TF LED" Feb 4, 2021
Collaborator

@sgugger sgugger left a comment

LGTM, thanks for adding this!

# compute global attn probs
global_attn_probs_float = tf.nn.softmax(global_attn_scores, axis=-1)

# apply layer head maskin
Contributor

Suggested change
# apply layer head maskin
# apply layer head masking

Contributor

@patrickvonplaten patrickvonplaten left a comment

Looks great! Awesome that you've added the head masking also for both the local and global attention. Longformer is a tricky model so great job!

@LysandreJik LysandreJik requested a review from jplu February 8, 2021 21:00
Contributor

@jplu jplu left a comment

LGTM! Thanks a lot for the great work on this!

Member

@LysandreJik LysandreJik left a comment

Great, thanks @stancld!

@LysandreJik LysandreJik merged commit e7381c4 into huggingface:master Feb 9, 2021
@stancld stancld deleted the TF_LED_head_mask branch February 9, 2021 18:55

Development

Successfully merging this pull request may close these issues.

Missing head_mask and decoder_head_mask arguments in encoder-decoder models

5 participants