
LED #9278

Merged
patrickvonplaten merged 35 commits into huggingface:master from patrickvonplaten:Led
Jan 5, 2021

Conversation

@patrickvonplaten (Contributor) commented Dec 23, 2020

What does this PR do?

Adds LongformerEncoderDecoder (LED) from @ibeltagy - see: https://github.com/allenai/longformer#longformer

Todo:

  • Important: position embeddings have to be cut to correctly convert original Bart-like checkpoints to LED. The reason is that Bart uses a position-embedding hack because of which embedding indices 0 and 1 are never used, resulting in an embedding matrix of length 1026 instead of 1024, see:
    # Bart is set up so that if padding_idx is specified then offset the embedding ids by 2
    All LED checkpoints are hence cut to remove this hack in LED:

import torch
from transformers import LEDForConditionalGeneration

model = LEDForConditionalGeneration.from_pretrained("./led-base-16384")
# Drop the two never-used rows (indices 0 and 1) from both position embedding matrices
model.model.encoder.embed_positions.weight = torch.nn.Parameter(model.model.encoder.embed_positions.weight[2:, :])
model.model.decoder.embed_positions.weight = torch.nn.Parameter(model.model.decoder.embed_positions.weight[2:, :])
model.save_pretrained("./led-base-16384")
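The offset hack mentioned above can be sketched as follows (a minimal illustration of the pattern, not the transformers source; the class name and sizes here are assumptions for demonstration):

```python
import torch

class OffsetPositionalEmbedding(torch.nn.Embedding):
    """Sketch of Bart-style learned positional embeddings: every position
    id is shifted by 2, so rows 0 and 1 of the weight matrix are never
    used and the matrix holds max_positions + 2 rows."""

    def __init__(self, max_positions: int, embed_dim: int):
        self.offset = 2
        super().__init__(max_positions + self.offset, embed_dim)

    def forward(self, position_ids: torch.Tensor) -> torch.Tensor:
        # Positions 0..max_positions-1 map to rows 2..max_positions+1.
        return super().forward(position_ids + self.offset)

emb = OffsetPositionalEmbedding(1024, 16)
print(emb.weight.shape)  # torch.Size([1026, 16])

# Cutting off the two unused rows yields a plain matrix of length 1024,
# which is what the checkpoint-conversion snippet above does for LED.
cut = emb.weight[2:, :]
print(cut.shape)  # torch.Size([1024, 16])
```

This is why a converted LED checkpoint no longer needs any offset at lookup time: after the cut, position id `i` indexes row `i` directly.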

TODO after PR is merged:

  • Correctly add # Copied from ... statements from Bart and Longformer (this probably requires the Bart refactor to be merged first)
  • Open issue regarding problems with TF save_model test
  • Correct templates: delete unnecessary test for tf bart; add gradient checkpointing by default in PT

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors which may be interested in your PR.

@patrickvonplaten patrickvonplaten changed the title LED [WIP]LED Dec 23, 2020
@patrickvonplaten patrickvonplaten changed the title [WIP]LED [WIP] LED Dec 23, 2020
Comment thread src/transformers/models/led/modeling_led.py
return outputs


# Copied from transformers.models.longformer.modeling_longformer.LongformerSelfAttention with Longformer->LEDEncoder
Contributor Author

@ibeltagy this line ensures that the whole class has to be exactly the same as the corresponding class in LongformerSelfAttention or else the tests would throw an error. This is our safety check to make sure that the original class (which in other libs would just be imported and re-used here) cannot change without this class being changed accordingly as well.
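The idea behind such a # Copied from check can be sketched as follows (a hedged sketch of the mechanism only, not the actual transformers utility; the source strings and function name here are made up for illustration): take the original source, apply the name substitution given in the marker, and require the result to match the copy exactly.

```python
# Toy stand-ins for the original and the copied class source.
ORIGINAL = """class LongformerSelfAttention:
    def forward(self, x):
        # Longformer sliding-window attention
        return x
"""

COPY = """class LEDEncoderSelfAttention:
    def forward(self, x):
        # LEDEncoder sliding-window attention
        return x
"""

def is_faithful_copy(original_src: str, copied_src: str, old: str, new: str) -> bool:
    """Return True if copied_src equals original_src with old renamed to new,
    i.e. the copy has not drifted from the class it was copied from."""
    return original_src.replace(old, new) == copied_src

print(is_faithful_copy(ORIGINAL, COPY, "Longformer", "LEDEncoder"))  # True
```

Any edit to the original that is not mirrored in the copy (or vice versa) makes the comparison fail, which is exactly the safety net described in the comment above.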

Comment thread tests/test_modeling_tf_led.py
@patrickvonplaten patrickvonplaten changed the title [WIP] LED LED Dec 27, 2020
@jplu (Contributor) left a comment

Awesome work!!! Just left a few tiny comments.

Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py
@LysandreJik (Member) left a comment

Great job implementing this! Happy to hear the model templates made your life easier, too.

Left a few nits, but LGTM!

PS: That 16384 embedding size is incredible!

Comment thread docs/source/model_doc/led.rst
Comment thread src/transformers/models/led/modeling_led.py Outdated
Comment thread src/transformers/models/led/modeling_led.py Outdated
Comment thread src/transformers/models/led/modeling_led.py Outdated
Comment thread src/transformers/models/led/modeling_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py
Comment thread src/transformers/models/led/modeling_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py
Comment thread tests/test_modeling_tf_led.py
@sgugger (Collaborator) left a comment

Looks great to me! Thanks for adding this new model!

Comment thread src/transformers/models/led/modeling_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
Comment thread src/transformers/models/led/modeling_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
Comment thread src/transformers/models/led/modeling_led.py Outdated
Comment thread src/transformers/models/led/modeling_tf_led.py Outdated
@jplu (Contributor) commented Jan 5, 2021

@patrickvonplaten when you have time, can you fix the conflicts and apply to LED the same updates that were merged into Longformer? Thanks!

@patrickvonplaten patrickvonplaten merged commit 189387e into huggingface:master Jan 5, 2021
@patrickvonplaten patrickvonplaten deleted the Led branch January 5, 2021 12:14

4 participants