Add position ids in forward pass to opt model #33121
ArthurZucker merged 13 commits into huggingface:main
Conversation
ArthurZucker left a comment
Hey! In general, the trick is that a new argument needs to be added at the end of the forward signature, otherwise you break the model for people who call the model directly with positional arguments.
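To make the concern concrete, here is a small illustrative sketch (not code from the PR; the argument names are just examples): a caller that passes arguments positionally silently mis-binds them if a new parameter is inserted mid-signature, but keeps working if the parameter is appended at the end.

```python
# Hypothetical example of why a new argument must be appended, not inserted.

def forward_old(input_ids, attention_mask=None, head_mask=None):
    return (input_ids, attention_mask, head_mask)

# BAD: position_ids inserted in the middle of the signature.
def forward_inserted(input_ids, attention_mask=None, position_ids=None, head_mask=None):
    return (input_ids, attention_mask, head_mask)

# GOOD: position_ids appended at the end.
def forward_appended(input_ids, attention_mask=None, head_mask=None, position_ids=None):
    return (input_ids, attention_mask, head_mask)

ids, mask, head = [1, 2, 3], [1, 1, 1], [0]

# An existing caller that passes head_mask positionally:
assert forward_old(ids, mask, head) == (ids, mask, head)
# With the inserted argument, head silently lands in position_ids:
assert forward_inserted(ids, mask, head) == (ids, mask, None)
# With the appended argument, the old positional call still works:
assert forward_appended(ids, mask, head) == (ids, mask, head)
```

This is why the review asks for `position_ids` to go at the end of the argument list.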
```python
_CONFIG_FOR_DOC = "BioGptConfig"
...
# Copied from transformers.models.opt.modeling_opt.OPTLearnedPositionalEmbedding with OPT->BioGpt
```
Let's just add a TODO here, or also update that model.
Should I restore the comment as well? It causes a failure when using make fixup.
I tried looking into updating BioGPT, but it seems to involve a lot of code copied from different models, so I didn't know if I should touch it. I can work on it next.
```diff
  super().__init__(num_embeddings + self.offset, embedding_dim)

- def forward(self, attention_mask: torch.LongTensor, past_key_values_length: int = 0):
+ def forward(self, position_ids: torch.LongTensor):
```
that is kind of a breaking change for this module 😓
Why is this a problem? The weights are the same, so loading should work, and this module should not be used by outside code, so it is not supposed to break anything.
Yeah but it has caused issue in the past 😉
OK, how do you think I should do it? The module needs to get position_ids to work with packed sentences. Should I add position_ids as the last argument with None as the default?
I tried to keep the API as similar as possible. Thanks for the feedback!
ArthurZucker left a comment
cc @gante WDYT about this? In general, IMO we should just run the basic position_ids init, though taking padding into account should be "alright": it's already done in generate, and this would help forward and training.
We just need to be careful, as we also want to support packing.
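For context on the packing concern, here is an illustrative sketch (not code from the PR): when several short sequences are packed into one row, the position ids must restart at 0 at every sequence boundary, so the basic `0..seq_len-1` init is not enough.

```python
import torch

# Two sequences of lengths 3 and 4 packed into a single row of length 7.
lengths = [3, 4]

# Correct packed positions restart at each sequence boundary.
packed_position_ids = torch.cat([torch.arange(n) for n in lengths])
print(packed_position_ids)  # tensor([0, 1, 2, 0, 1, 2, 3])

# The basic position ids init runs straight through, giving the second
# sequence wrong (continuing) positions.
naive_position_ids = torch.arange(sum(lengths))
print(naive_position_ids)  # tensor([0, 1, 2, 3, 4, 5, 6])
```

This is why a model that only derives positions internally cannot support packing; the caller has to be able to pass explicit position_ids.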
About the forward call for the embedding layer: I think it has to take position_ids as an argument, otherwise it will not work with packed sentences.
@ArthurZucker I am thinking that maybe the best solution for the embedding layer is to add position_ids as an argument to the forward pass with a default of None. This is probably backward compatible, but will still help with packed sentences. The downside is that the code will probably not be very clean.
gante left a comment
@ArthurZucker I'm pro position_ids, as it standardizes OPT wrt other models 🙌
@avishaiElmakies Thank you for adding the fix 🤗 Have a look at the unresolved comments (you'd be surprised how easy it is to break code for external libraries; Hyrum's law definitely applies to transformers)
@gante, thanks! Happy to contribute. I would love some guidance on the last two comments: what should I do with the position_ids in the embedding module? In my opinion, it should be able to take position_ids so it works with packed sentences, maybe as a last argument with a default of None plus a check. I would also love some guidance on the one-liners.
@ArthurZucker I would love some guidance here so I can finish and move on to other models.
@ArthurZucker I made the changes you suggested and refactored the embedding class to be backward compatible. I would love some feedback.
ArthurZucker left a comment
Feel free to merge @gante if it's alright with you! 🤗 and thanks for your contribution!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* start working on adding position ids
* add docs
* Refactor modeling_biogpt.py and modeling_opt.py for code consistency
* fix 2 PR comments
* move position_ids to end of args
* remove trailing white space
* add comment with TODO
* bug fix gradient checkpointing
* fixup
* missed on position_ids
* remove _attention_to_position_ids and refactor embedding class
* remove redundent code

---------

Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
What does this PR do?
This pull request adds position_ids to the forward of OPT, in a similar fashion to gemma and llama (#32937). Some models didn't have an argument for position_ids in their forward pass.
There are two main reasons we would like all LM models to accept position ids.
https://github.com/huggingface/transformers/blob/v4.44.1/src/transformers/modeling_flash_attention_utils.py#L270
This PR handles only OPT, so I can start small and get some feedback.
changes:
- position_ids defaults to None
- create position_ids based on the attention mask (similar to the original version, so it should work the same if position_ids are not given)

a few notes:
- make fixup.

feature-request #32937
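The fallback described above can be sketched as follows (illustrative only, not the merged code): a cumulative sum over the attention mask, so that left padding does not shift the positions of the real tokens.

```python
import torch

# One left-padded row and one unpadded row.
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])

# Derive positions from the mask: padding does not advance the counter.
position_ids = attention_mask.cumsum(dim=-1) - 1
position_ids = position_ids.clamp(min=0)  # padding slots get dummy position 0
print(position_ids)
# tensor([[0, 0, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```

When callers pass explicit position_ids (e.g. for packed sentences), this derivation is skipped entirely.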
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
@ArthurZucker would love some feedback.