Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active #45414
Merged
ArthurZucker merged 2 commits into main, Apr 13, 2026
When `kernels` is installed, `@use_kernelized_func` attaches a `rotary_fn` child `nn.Module` to attention layers. DeepSpeed ZeRO-3's parameter coordinator traces the module graph at init and expects every registered submodule to be invoked during forward. The model's forward still calls the plain Python `apply_rotary_pos_emb`, so `rotary_fn` is never executed and the trace desynchronizes, raising `IndexError: pop from an empty deque` on the second forward. Skip attaching the kernelized submodule when ZeRO-3 is enabled; users running under ZeRO-3 fall back to the Python implementation, which is what they were getting before #41147. Fixes #45137
vasqu (Contributor) approved these changes on Apr 13, 2026 and left a comment:
Discussed internally: Disables kernels for deepspeed for now but that's better than no working version at all
ArthurZucker added a commit that referenced this pull request on Apr 13, 2026:
* Fix `IndexError: pop from an empty deque` under DeepSpeed ZeRO-3
* Add dates to new model cards to satisfy check-repository-consistency
Summary
Fixes #45137. Re-opened from #45395 on a same-repo branch so CI can run.
Since #41147, attention layers are decorated with `@use_kernelized_func(apply_rotary_pos_emb)`, which attaches a `rotary_fn` child `nn.Module` at init when the `kernels` library is available. DeepSpeed ZeRO-3's parameter coordinator traces the module graph at init and expects every registered submodule to fire during forward. The attention forward still calls the plain Python `apply_rotary_pos_emb`, so `rotary_fn` is never invoked and the parameter-fetch trace desynchronizes, raising `IndexError: pop from an empty deque` on the second forward (reproducible via TRL's RLOO/GRPO trainers under ZeRO-3, see huggingface/trl#4899).
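To make the mismatch concrete, here is a minimal pure-Python sketch of the condition described above: a child that is registered on a module but never called in forward. `ToyModule`, `ToyAttention`, `python_rotary`, and `never_invoked` are hypothetical names for illustration only; this is not DeepSpeed's coordinator or the transformers code, and it does not reproduce the `IndexError` itself.

```python
class ToyModule:
    """Minimal stand-in for nn.Module: tracks children and records calls."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def walk(self):
        # Yield this module and every registered descendant.
        yield self
        for child in self.children:
            yield from child.walk()

    def __call__(self, x, fired):
        fired.add(self.name)  # record that this module ran during forward
        return self.forward(x, fired)

    def forward(self, x, fired):
        return x


def python_rotary(x):
    # Placeholder for the plain-Python apply_rotary_pos_emb path.
    return x


class ToyAttention(ToyModule):
    def __init__(self, attach_rotary_fn):
        super().__init__("attention")
        self.proj = self.add_child(ToyModule("proj"))
        if attach_rotary_fn:
            # Registered as a child, but forward() below never calls it.
            self.rotary_fn = self.add_child(ToyModule("rotary_fn"))

    def forward(self, x, fired):
        x = self.proj(x, fired)
        return python_rotary(x)  # plain function, not self.rotary_fn


def never_invoked(root, x=0):
    """Run one forward pass and list registered modules that did not fire."""
    fired = set()
    root(x, fired)
    return [m.name for m in root.walk() if m.name not in fired]


print(never_invoked(ToyAttention(attach_rotary_fn=True)))   # ['rotary_fn']
print(never_invoked(ToyAttention(attach_rotary_fn=False)))  # []
```

With the child attached, the set of registered modules and the set of modules that actually ran disagree, which is the situation the PR description says ZeRO-3 does not tolerate.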
Fix
Skip attaching the kernelized submodule when `is_deepspeed_zero3_enabled()` is true. Under ZeRO-3 the Python `apply_rotary_pos_emb` path is used (same behavior as before #41147). Non-ZeRO-3 users are unaffected.
The second commit refreshes dates on three model cards (`pp_chart2table`, `slanext`, `uvdoc`) that were missing them on `main`, which is required for `check-repository-consistency` to pass.
Test plan
* `IndexError: pop from an empty deque`
* `kernelize` still replaces `rotary_fn` when not under ZeRO-3
* `make style` + `check-repository-consistency` pass