🚨 [v5] Refactor RoPE for layer types #39847
Merged — zucchini-nlp merged 103 commits into huggingface:main (Oct 17, 2025)
Conversation
ArthurZucker reviewed Aug 5, 2025
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Contributor
[For maintainers] Suggested jobs to run (before merge): run-slow: apertus, arcee, aria, bamba, bitnet, blt, chameleon
Member (Author)
Just to be sure, let the slow CI run on some important models.
Contributor
This comment contains run-slow, running the specified jobs: models: ['models/gemma', 'models/gemma3', 'models/gpt2', 'models/lfm2_vl', 'models/llama', 'models/llava', 'models/mistral', 'models/olmo', 'models/paligemma', 'models/qwen2_5_omni', 'models/qwen2_vl']
Member (Author)
🤞🏻
Contributor
This PR caused QwenLM/Qwen3-Omni#93
Collaborator
yonigozlan pushed a commit to yonigozlan/transformers that referenced this pull request on Oct 21, 2025:
* update * batch update model code * typos * too many diffs, dump * dump again * another dump * fix copies * make `rope_scaling_dict` self attr * fix a few more tests * another update * fix a few more tests, hopefully last ones * fox copies * fix copies again * fix newly added models, I hate rebasing on main * update config files * modular files * fix rope utils test * docstring has to be indented more, why? * oops forgot to update some modualr files * copy from doesn't copy decorators? * fix overriden test as well * add a new test * fix failing tests again * update docstrings * fix phi3 * fix two models * fix copies * forgot to add * stupid bug from modular conversion * fix slow tests * update to call rotary emb once per model forward * 3K tests failing?! * update * update more models * fix copies * fix the rest of tests hopefully * fix after rebase * fix the rope tests * fix docs omni * change a bit * models with layer types * why it was deleted? * fix a few tests * fix last test! * delete extra empty lines * add a test case * more changes * fix models * typing hint for nested rope params * missed when resolving conflicts * delete layer types and fix typo * fix copies * fix copies * update docs text * docs * huuge update all models * fix copies * rename attr to align with new format * delete redundant rope tests * trigger ci * update the case * this is why i hate rebasing * maybe fixed? * oops * now fix? * fix last tests and copies * fix copies? 
* fix minimax and gemma3n * update typo * deprecation end version * final fix copies :fingers-crossed: * oh my, add the docs in toctree * oke, this is really the last fix * fix copies and hope that tests won't start failing again * use rope scaling if saved * fix slow tests * fix cwm and unrelated deepseek * fix last * update * hope it works now, it took so long * lets keep None for now, I will try to remove after checking tests * some more fixes, i find and replace does not always find all cases * last fix of tests * arthur's comment for extra foreward kwargs * delete unused code * fix slow qwen tests * delete layer types from models * faulty modular conversion * fix qwen omni * fix copies and style * address my comment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request on Oct 23, 2025
This was referenced Dec 13, 2025
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request on Jan 23, 2026
alvarobartt added a commit to huggingface/text-embeddings-inference that referenced this pull request on Feb 18, 2026
What does this PR do?
This PR enables RoPE layers to compute different frequencies for different layer types, which will help us support models like ModernBert without monkey-patching the config on the fly.
Main changes:
- `rope_parameters` is a required attribute if the model has RoPE layers. The attribute must be a dict containing `rope_theta` and, optionally, other parameters that configure RoPE. In case we want different params per layer type, it should be a nested dict of the format `{"full_attn": {**rope_params}, "sliding_attn": {**different_rope_params}}`.
- `rope_scaling` is deprecated in favor of `rope_parameters` and raises a warning; the latter name is more descriptive.
- `eager_attention_forward`, and copied with modular in each file.
- `inv_freq` for each type. If the given layer type has no RoPE parameters saved in the config (e.g. `config.rope_scaling` has no key == `"sliding_window"`), we raise an error.
- `TypedDict`. It will make our lives easier when we decide to enforce strict type validation on configs.

The changes are BC: we will support old-format config files and standardize them when initializing the config class. The best way to review is to start from `modeling_rope_utils.py` -> all llama model files -> gemma2 and gemma3 model files -> tests.
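To make the flat-vs-nested `rope_parameters` format concrete, here is a minimal sketch of how per-layer-type parameters could be looked up. This is not the actual transformers implementation; the helper name and the flat-dict check are assumptions, but the error-on-missing-layer-type behavior mirrors what the PR description says:

```python
def resolve_rope_parameters(rope_parameters: dict, layer_type: str = "full_attn") -> dict:
    """Toy resolver (hypothetical helper, for illustration only)."""
    if "rope_theta" in rope_parameters:
        # Flat dict: the same RoPE parameters apply to every layer type.
        return rope_parameters
    if layer_type not in rope_parameters:
        # Mirrors the PR: a layer type with no saved parameters is an error.
        raise KeyError(f"No RoPE parameters saved for layer type {layer_type!r}")
    return rope_parameters[layer_type]

# Nested format from the PR description: one entry per layer type.
nested = {
    "full_attn": {"rope_theta": 1_000_000.0},
    "sliding_attn": {"rope_theta": 10_000.0},
}
```

A flat dict is returned unchanged, while a nested dict is indexed by layer type, so callers can stay oblivious to which format the config used.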
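Since the changes are described as backward compatible, a rough sketch of how a legacy config could be standardized at init time follows. The function name and merge details are assumptions, not the actual `transformers` code; it only illustrates folding the deprecated `rope_scaling` (plus a top-level `rope_theta`) into the new `rope_parameters` attribute with a deprecation warning:

```python
import warnings

def standardize_rope_config(config_dict: dict) -> dict:
    """Illustrative BC shim (hypothetical): fold legacy `rope_theta` /
    `rope_scaling` keys into the new `rope_parameters` dict."""
    if "rope_parameters" in config_dict:
        return config_dict  # already in the new format
    rope_parameters = {}
    if "rope_scaling" in config_dict:
        warnings.warn(
            "`rope_scaling` is deprecated in favor of `rope_parameters`",
            FutureWarning,
        )
        rope_parameters.update(config_dict.pop("rope_scaling") or {})
    if "rope_theta" in config_dict:
        rope_parameters["rope_theta"] = config_dict.pop("rope_theta")
    if rope_parameters:
        config_dict["rope_parameters"] = rope_parameters
    return config_dict
```

Running this once while initializing the config class means the rest of the model code only ever sees the new-format `rope_parameters` dict.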