Fix spurious position_ids warnings for at least 40 architectures#45437
Fix spurious position_ids warnings for at least 40 architectures#45437tomaarsen merged 2 commits intohuggingface:mainfrom
Conversation
As these are now handled by the generic case
|
[For maintainers] Suggested jobs to run (before merge) run-slow: owlv2, owlvit |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45437&sha=9bdb99 |
zucchini-nlp
left a comment
There was a problem hiding this comment.
Ig the failing tests aren't related
| # Same idea for `position_ids`: used to be a persistent buffer, now `persistent=False` in most models. | ||
| has_position_ids_buffers = any(buffer.endswith("position_ids") for buffer, _ in self.named_buffers()) | ||
| if has_position_ids_buffers: | ||
| additional_unexpected_patterns.append(r"(^|\.)position_ids$") |
There was a problem hiding this comment.
fine by me, I don't think there is any model where position ids are so special and must be loaded from ckpt
There was a problem hiding this comment.
Exactly, I think there shouldn't be any. Plus, I believe this is only ignoring warnings for where the model isn't expecting it, but the checkpoint has it extra, so these are always cases where the weights are just being ignored (i.e. no actual changes in behaviour, just less warnings).
What does this PR do?
Supersedes #45385
Code Agent Policy
I used an agent to track down how many architectures this affected
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Details
A lot of my older BERT-adjacent models have started getting warnings:
This warning is reasonable: BERT doesn't need any
position_idsto be stored in the weights, because it's literally just antorch.arange. After the v5 release, these warnings started popping up. The warnings aren't useful: users might get freaked out, not realising that their models will still work 100% correctly.I found out that this is an issue on a lot of architectures, so I propose a generic fix, just like with
rotary_emb.inv_freq.I asked an agent to find all architectures with a
persistent=Falseposition_ids, then use my https://github.com/huggingface/sentence-transformers/blob/main/tests/base/modules/transformer/transformers_tiny_models.json mapping to grab old (and tiny) models that might emit this warning:embeddings.position_idstext_model.embeddings.position_idstext_model.roberta.embeddings.position_ids,vision_model.embeddings.position_idsbert.embeddings.position_idsembeddings.position_idstext_model.embeddings.position_idsembeddings.position_idschar_embeddings.position_idstext_model.embeddings.position_ids,vision_model.embeddings.position_idstext_model.embeddings.position_ids,vision_model.embeddings.position_idstext_model.embeddings.position_ids,vision_model.embeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idstext_model.embeddings.position_idsembeddings.position_idsembeddings.position_ids,image_encoder.vision_model.embeddings.position_idstext_model.embeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idstransformer.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsembeddings.text_embeddings.position_idsembeddings.position_idstext_model.embeddings.position_ids,vision_model.embeddings.position_idsposition_idsembeddings.position_idsembeddings.position_idsembeddings.position_idsIn short: it found 54 checkpoints, and 45 of those emitted the
position_idswarnings. Now, none of them do. This should clean up a lot of the warnings that users are experiencing with my older https://huggingface.co/sentence-transformers models.Sidenote:
bros,clap,mraregisterposition_idsas a persistent buffer. We can (or perhaps should?) probably set these to no longer be persistent.Who can review?
@Cyrilvallez @zucchini-nlp