Add tie_weights() to LM heads and set bias in set_output_embeddings()#28948
Add tie_weights() to LM heads and set bias in set_output_embeddings()#28948amyeroberts merged 5 commits intohuggingface:mainfrom
Conversation
amyeroberts
left a comment
There was a problem hiding this comment.
Thanks for digging into this and fixing for all these models!
Just two things I think we need to do before merging:
- Run the slow model tests for these models to confirm this change is backwards compatible.
- Apply the test to all models.
There was a problem hiding this comment.
This should be applied to all the models. Could you move the test to ModelTesterMixin?
There was a problem hiding this comment.
Done.
I also noticed that there were other non-slow tests in ModelTesterMixin that loaded/saved pretrained models.
I had a hunch that the saving/loading should actually be pretty fast, and so ran the tests without the @slow tag. The before/after runs on CircleCI showed similar timings for tests_torch, so my 2 cents is that we don't need to mark this test as @slow.
Before run (tests_torch ran in 6m58s):
https://app.circleci.com/pipelines/github/huggingface/transformers/84140/workflows/0998b8ba-9336-49e5-9680-2ddd86443669
After run (tests_torch ran in 6m48s):
https://app.circleci.com/pipelines/github/huggingface/transformers/84451/workflows/f5efa9af-c079-4303-8611-574f8a3bf7bd
The bias were not tied correctly in some LM heads, and this change should fix that.
eafefb4 to
e40f605
Compare
There was a problem hiding this comment.
This came from an existing comment:
It seems like the DetaForObjectDetection model has logic to manipulate the params during init, and so it simply doesn't support initializing on a meta device.
With that said, should we add some kind of flag in PreTrainedModel that would throw an error when trying to initial on meta device for models that do not support it? So far, it seems like DetaModel is the only one, so maybe not worth the effort to do so.
There was a problem hiding this comment.
Let's keep as-is for the moment, until we have more affected models. DETR and DETR-like models are special but I think they're the only case.
…nnot init on meta device
59b1fa2 to
e454391
Compare
amyeroberts
left a comment
There was a problem hiding this comment.
Great work - thanks for adding this!
There was a problem hiding this comment.
Let's keep as-is for the moment, until we have more affected models. DETR and DETR-like models are special but I think they're the only case.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
…ddings() (huggingface#28948)" This reverts commit 725f4ad.
What does this PR do?
This fixes a bug from the wrong bias in prediction heads in some situations. The
predictions.biasneeds to be tied topredictions.decoder.biasinsidetie_weights().Repro Steps:
test_modeling_bert.pyand run it:The issue was uncovered in #28802.
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @amyeroberts