perf: avoid recomputing rotary_emb for each layer in some Google and ModernBERT models #45555
vasqu merged 2 commits into huggingface:main
I checked gemma4; it already does this correctly. But I'll check whether some other Google models are still doing it the unoptimized way |
vasqu
left a comment
Careful approval, just checking with run-slow + update us if you find other models :D
|
run-slow: gemma3 |
|
This comment contains models: ["models/gemma3"] |
|
Yes, sounds good! Can you wait for the slow tests to return results before pushing? Other than that, feel free to add all of them here, and let's adjust the PR title |
gemma3): Avoid recomputing rotary_emb for each layer |
[For maintainers] Suggested jobs to run (before merge) run-slow: gemma3, gemma3n, modernbert, modernbert_decoder, t5gemma2 |
|
run-slow: gemma3, gemma3n, modernbert, modernbert_decoder, t5gemma2 |
|
This comment contains models: ["models/gemma3", "models/gemma3n", "models/modernbert", "models/modernbert_decoder", "models/t5gemma2"] |
|
Awesome, merging this :) |

What does this PR do?
Following #45144 (comment)
See the picture of that comment (#45144 (comment)) for details. Once the thread is resolved, the comment link will no longer open the comment directly and will just point to the linked PR.
Code Agent Policy
The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.
PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.
This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read
CONTRIBUTING.md.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
It was discussed with @vasqu and also @zucchini-nlp.
I kept Raushan's suggestion of just using set() (we could also use
self.rotary_emb.layer_types; it's a minor detail either way).
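For readers who have not followed the linked thread, here is a minimal sketch of the general idea, not the actual diff. It assumes a model whose layers each have a layer type and whose rotary embedding depends only on the position ids and that type. `ToyRotaryEmbedding`, `forward_per_layer`, `forward_hoisted`, and the base values are hypothetical stand-ins in pure Python, not Transformers APIs.

```python
import math


class ToyRotaryEmbedding:
    """Hypothetical stand-in for a rotary embedding module (not the real class)."""

    def __init__(self, head_dim, bases):
        self.head_dim = head_dim
        self.bases = bases  # illustrative: one RoPE base per layer type
        self.calls = 0      # counts how many times cos/sin tables are built

    def __call__(self, position_ids, layer_type):
        self.calls += 1
        base = self.bases[layer_type]
        inv_freq = [base ** (-2 * i / self.head_dim) for i in range(self.head_dim // 2)]
        cos = [[math.cos(p * f) for f in inv_freq] for p in position_ids]
        sin = [[math.sin(p * f) for f in inv_freq] for p in position_ids]
        return cos, sin


def forward_per_layer(rotary_emb, layer_types, position_ids):
    # Unoptimized pattern: every layer rebuilds its own cos/sin tables.
    return [rotary_emb(position_ids, lt) for lt in layer_types]


def forward_hoisted(rotary_emb, layer_types, position_ids):
    # Optimized pattern: build the tables once per distinct layer type
    # (via set()), then let each layer look up the shared result.
    position_embeddings = {lt: rotary_emb(position_ids, lt) for lt in set(layer_types)}
    return [position_embeddings[lt] for lt in layer_types]


# Illustrative configuration: five sliding-window layers and one full-attention layer.
layer_types = ["sliding_attention"] * 5 + ["full_attention"]
position_ids = list(range(8))
bases = {"sliding_attention": 10_000.0, "full_attention": 1_000_000.0}

naive = ToyRotaryEmbedding(head_dim=4, bases=bases)
hoisted = ToyRotaryEmbedding(head_dim=4, bases=bases)

naive_out = forward_per_layer(naive, layer_types, position_ids)
hoisted_out = forward_hoisted(hoisted, layer_types, position_ids)

print(naive.calls, hoisted.calls)  # 6 2
print(naive_out == hoisted_out)    # True
```

In this toy setup the hoisted version builds the tables once per distinct layer type instead of once per layer, while each layer still receives the same values.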