Support new Marian models #15831
Conversation
```python
# if word embeddings are not tied, make sure that lm head is resized as well
if (
    self.config.share_encoder_decoder_embeddings
    and self.get_output_embeddings() is not None
    and not self.config.tie_word_embeddings
):
    old_lm_head = self.get_output_embeddings()
    new_lm_head = self._get_resized_lm_head(old_lm_head, new_num_tokens)
    self.set_output_embeddings(new_lm_head)
```
This will only resize the lm_head if embeddings are shared.
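When the embeddings are *not* shared, the `lm_head` resize would instead have to happen in the separate `resize_decoder_token_embeddings` path this PR introduces. A minimal sketch of how that path could handle it, reusing the same helpers as the snippet above (the exact body is an assumption, not the PR's code):

```python
def resize_decoder_token_embeddings(self, new_num_tokens):
    # sketch: only meaningful when encoder/decoder embeddings are NOT shared
    if self.config.share_encoder_decoder_embeddings:
        raise ValueError(
            "`resize_decoder_token_embeddings` cannot be used when "
            "`config.share_encoder_decoder_embeddings` is True; "
            "use `resize_token_embeddings` instead."
        )

    # resize the decoder input embeddings
    old_embeddings = self.get_decoder_input_embeddings()
    new_embeddings = self._get_resized_embeddings(old_embeddings, new_num_tokens)
    self.set_decoder_input_embeddings(new_embeddings)

    # mirror the snippet above: resize the lm_head when it is not tied
    if self.get_output_embeddings() is not None and not self.config.tie_word_embeddings:
        old_lm_head = self.get_output_embeddings()
        new_lm_head = self._get_resized_lm_head(old_lm_head, new_num_tokens)
        self.set_output_embeddings(new_lm_head)

    # keep the config in sync with the new decoder vocab size
    self.config.decoder_vocab_size = new_num_tokens
```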
```python
# if embeddings are shared this will return shared embeddings otherwise decoder embed_tokens
word_embeddings = self.get_decoder().get_input_embeddings()
self._tie_or_clone_weights(output_embeddings, word_embeddings)
```
We always return decoder embeddings here. This should work for both cases, shared or not shared.
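For context, a sketch of how the `tie_weights` override could look around the quoted lines (the guard and the method structure are assumptions; only the three quoted lines come from the diff):

```python
def tie_weights(self):
    # only tie when the config asks for tied word embeddings
    output_embeddings = self.get_output_embeddings()
    if output_embeddings is not None and self.config.tie_word_embeddings:
        # if embeddings are shared this will return the shared embeddings,
        # otherwise the decoder embed_tokens
        word_embeddings = self.get_decoder().get_input_embeddings()
        self._tie_or_clone_weights(output_embeddings, word_embeddings)
```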
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
```python
def get_decoder_input_embeddings(self):
    if self.config.share_encoder_decoder_embeddings:
        raise ValueError(
```
Why raise an error here? It's totally fine to just return `self.get_input_embeddings()` in this case, no?
Still don't think we need to raise here ;-)
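A sketch of the alternative the reviewer is suggesting, i.e. falling back to the shared embeddings instead of raising (hypothetical body, not the PR's code):

```python
def get_decoder_input_embeddings(self):
    # when encoder and decoder share embeddings, just return the shared table
    if self.config.share_encoder_decoder_embeddings:
        return self.get_input_embeddings()
    return self.get_decoder().get_input_embeddings()
```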
patrickvonplaten left a comment
Overall, I'm in favor of adding the new Marian checkpoints the way it is shown here. The change from a Marian model that always force-ties the encoder and decoder embeddings to one that can switch between force-tied and untied encoder input embeddings and encoder output embeddings is the better option here IMO, even though it goes a bit against our philosophy of not changing existing model code.
The main reasons why I'm in favor of the approach as it's implemented now are (with the feedback given below):

- All the changes of this PR are also applicable to existing Marian V1 checkpoints. More specifically, all Marian V1 checkpoints can be loaded here with `share_encoder_decoder_embeddings=False` and then fine-tuned with embeddings not being tied (see the sketch below).
- Marian V2 comes from the exact same library as Marian V1 and is the same model. Creating a new name here (Marian V2) could confuse users.
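A sketch of that workflow on an existing V1 checkpoint (`Helsinki-NLP/opus-mt-en-de` is just an example checkpoint; passing the flag through `from_pretrained` relies on unused kwargs being forwarded to the config):

```python
from transformers import MarianMTModel

# load an existing Marian V1 checkpoint with untied encoder/decoder embeddings,
# then fine-tune it as usual
model = MarianMTModel.from_pretrained(
    "Helsinki-NLP/opus-mt-en-de",
    share_encoder_decoder_embeddings=False,
)
assert model.config.share_encoder_decoder_embeddings is False
```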
Thoughts @LysandreJik @sgugger ?
sgugger left a comment
Ok for me. It's really pushing the test for a new model to its limit, but I understand the arguments to keep it in the same model.
patrickvonplaten left a comment
Looks good to me in general.
Left a couple of comments.
Also, given that a bunch of new model checkpoints will be added here, let's maybe add a slow integration test as well?
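A sketch of what such a slow integration test could look like (the checkpoint name and the assertions are placeholders, not real values):

```python
import unittest

from transformers import MarianMTModel, MarianTokenizer
from transformers.testing_utils import require_torch, slow


@require_torch
class NewMarianIntegrationTest(unittest.TestCase):
    @slow
    def test_untied_embeddings_generation(self):
        # placeholder checkpoint name for one of the new Marian models
        checkpoint = "org/new-marian-checkpoint"
        tokenizer = MarianTokenizer.from_pretrained(checkpoint)
        model = MarianMTModel.from_pretrained(checkpoint)

        # the new checkpoints are expected not to share embeddings
        self.assertFalse(model.config.share_encoder_decoder_embeddings)

        inputs = tokenizer(["Hello, how are you?"], return_tensors="pt")
        generated = model.generate(**inputs)
        decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)

        # placeholder assertion; a real test would compare against a known translation
        self.assertEqual(len(decoded), 1)
```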
What does this PR do?
This PR updates the Marian model:
- allow not sharing the embeddings between the encoder, the decoder, and the `lm_head`
- separate vocabs in the tokenizer for the `src` and `tgt` language

To support this, the PR introduces the following new methods:
- `get_decoder_input_embeddings` and `set_decoder_input_embeddings`: to get and set the decoder embeddings when the embeddings are not shared. These methods will raise an error if the embeddings are shared.
- `resize_decoder_token_embeddings`: to resize only the decoder embeddings. Will raise an error if the embeddings are shared.
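A short usage sketch for these methods, assuming a checkpoint whose embeddings are not shared (the checkpoint path is a placeholder):

```python
from transformers import MarianMTModel

model = MarianMTModel.from_pretrained("path/to/unshared-embeddings-checkpoint")

if not model.config.share_encoder_decoder_embeddings:
    # inspect the decoder-side embedding table
    decoder_embeddings = model.get_decoder_input_embeddings()
    print(decoder_embeddings.weight.shape)

    # grow only the decoder vocabulary, e.g. after adding target-language tokens
    model.resize_decoder_token_embeddings(model.config.decoder_vocab_size + 8)
```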
This PR also adds two new config attributes to `MarianConfig`:
- `share_encoder_decoder_embeddings`: to indicate if the embeddings should be shared or not
- `decoder_vocab_size`: to specify the vocab size for the decoder when the embeddings are not shared.
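For illustration, a minimal sketch of setting the new attributes when building a config (the vocab sizes are made-up values):

```python
from transformers import MarianConfig

config = MarianConfig(
    vocab_size=58101,                        # encoder / source vocab size (made up)
    decoder_vocab_size=43563,                # separate decoder / target vocab size (made up)
    share_encoder_decoder_embeddings=False,  # keep encoder and decoder embeddings untied
)
```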
And the following methods from the `PreTrainedModel` class are overridden to support these changes:
- `tie_weights`
- `_resize_token_embeddings`

Fixes #15109