Fix llama sin_cached/cos_cached backward compatibility #29299
fxmarty wants to merge 2 commits into huggingface:main
Conversation
```python
inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64).float().to(device) / self.dim))
self.register_buffer("inv_freq", inv_freq, persistent=False)

# TODO: Remove in 4.40.
```
why 4.40 here? This kind of version-dependent removal would be for deprecating a feature, but AFAICT from the PR description we don't have an implemented fix which replaces this
@amyeroberts I just followed 7d312ad: the sin_cached attribute will be removed in 4.40. cc @gante
huh - OK. Won't this mean things still break though?
I don't think we can remove them, no 💔
@fxmarty The extent of the fix may depend on the following question: are the libraries downstream broken because a) of the lack of the tensors, or because b) the lack of the tensors AND their values? The PR as it stands would fix a), but it probably wouldn't fix b). Full story of how this came to be:
Note to ourselves: non-permanent buffers can't be treated as common variables for deprecation purposes 😬
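A minimal sketch of the point above (module names are hypothetical, not the transformers code): a plain attribute can be deprecated via a warning property, but a non-persistent buffer that is only registered inside `forward()` simply does not exist right after `__init__`, so there is nothing for downstream `hasattr`/`getattr` checks to find.

```python
import warnings

import torch
from torch import nn


class Old(nn.Module):
    """Deprecating a plain attribute works: a property can warn on access."""

    def __init__(self):
        super().__init__()
        self._value = torch.ones(2)

    @property
    def value(self):
        warnings.warn("`value` is deprecated", FutureWarning)
        return self._value


class Lazy(nn.Module):
    """A non-persistent buffer registered only in forward() does not exist
    after init, so there is nothing to deprecate-warn on at load time."""

    def forward(self, x):
        self.register_buffer("sin_cached", x.sin(), persistent=False)
        return x


lazy = Lazy()
assert not hasattr(lazy, "sin_cached")  # downstream code reading it here breaks
lazy(torch.zeros(3))
assert hasattr(lazy, "sin_cached")  # only available after the first forward pass
```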
#29198 will add them at init time.
ArthurZucker left a comment
Let's not duplicate the work
Was not aware #29198 was a fix for that, nice! Note that with
Feel free to comment over there
The `_sin_cached` & `_cos_cached` are never set in the init (compare to https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/llama/modeling_llama.py#L134-L136), which yields errors in external packages as backward compatibility is broken (e.g. in https://github.com/AutoGPTQ/AutoGPTQ/blob/6b55300dd83326504ee6e02b730fa4451adfa479/auto_gptq/modeling/_utils.py#L95-L96). IMO this should be in a patch release.
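A minimal sketch of the fix direction discussed here (class name, sequence length, and dimensions are illustrative, not the actual transformers implementation): populate the sin/cos caches once at init, alongside `inv_freq`, so that external packages reading them right after model load keep working.

```python
import torch
from torch import nn


class RotaryEmbeddingSketch(nn.Module):
    """Hypothetical RoPE-like module: caches are filled at init, not lazily,
    so `sin_cached` / `cos_cached` are readable before the first forward."""

    def __init__(self, dim: int = 8, base: float = 10000.0, max_positions: int = 16):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        # Populate the caches eagerly; registered as non-persistent buffers
        # so they are attributes of the module but stay out of the state_dict.
        t = torch.arange(max_positions, dtype=torch.float32)
        freqs = torch.outer(t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("sin_cached", emb.sin(), persistent=False)
        self.register_buffer("cos_cached", emb.cos(), persistent=False)


rope = RotaryEmbeddingSketch()
assert hasattr(rope, "sin_cached")  # would fail if the caches were set only in forward()
```

Because the buffers are non-persistent, they never appear in checkpoints, so eager initialization changes nothing for serialization; it only restores attribute access for downstream readers.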