Fix IndexError with DeepSpeed ZeRO-3 when kernels rotary is active #45414

Merged
ArthurZucker merged 2 commits into main from fix-zero3 on Apr 13, 2026

@ArthurZucker
Collaborator

Summary

Fixes #45137. Re-opened from #45395 on a same-repo branch so CI can run.

Since #41147, attention layers are decorated with `@use_kernelized_func(apply_rotary_pos_emb)`, which attaches a `rotary_fn` child `nn.Module` at init when the `kernels` library is available. DeepSpeed ZeRO-3's parameter coordinator traces the module graph at init and expects every registered submodule to fire during forward. The attention forward still calls the plain Python `apply_rotary_pos_emb`, so `rotary_fn` is never invoked and the parameter-fetch trace desynchronizes, raising:

IndexError: pop from an empty deque
  at deepspeed/runtime/zero/partitioned_param_coordinator.py

on the second forward (reproducible via TRL's RLOO/GRPO trainers under ZeRO-3, see huggingface/trl#4899).
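
To make the mismatch concrete, here is a minimal pure-Python sketch of a child that is registered at init but never called during forward. `ToyModule`, `ToyAttention`, `plain_rotary` and `never_invoked` are hypothetical names used only for illustration. The sketch does not reproduce `torch.nn.Module` or DeepSpeed's coordinator; it only shows how such a child could be spotted.

```python
# Toy stand-ins only: this is NOT torch.nn.Module or DeepSpeed code.
class ToyModule:
    """Minimal module that registers children and counts its own calls."""

    def __init__(self):
        self.children = {}  # name -> ToyModule, filled by register()
        self.calls = 0

    def register(self, name, child):
        self.children[name] = child
        return child

    def __call__(self, x):
        self.calls += 1
        return self.forward(x)

    def forward(self, x):
        return x


def plain_rotary(x):
    """Hypothetical stand-in for the plain Python rotary function."""
    return x


class ToyAttention(ToyModule):
    def __init__(self):
        super().__init__()
        self.proj = self.register("proj", ToyModule())
        # Registered at init, mirroring the attached `rotary_fn` child.
        self.rotary_fn = self.register("rotary_fn", ToyModule())

    def forward(self, x):
        x = self.proj(x)
        # The forward calls the plain function, so `rotary_fn` never fires.
        return plain_rotary(x)


def never_invoked(module, prefix=""):
    """Return dotted names of registered children whose call count is zero."""
    missing = []
    for name, child in module.children.items():
        full = f"{prefix}{name}"
        if child.calls == 0:
            missing.append(full)
        missing.extend(never_invoked(child, prefix=f"{full}."))
    return missing


attn = ToyAttention()
attn(1.0)
unused = never_invoked(attn)
print(unused)
```

On a real model, a similar check could probably be built with forward hooks on each submodule, but that variant is not shown or tested here.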

Fix

Skip attaching the kernelized submodule when `is_deepspeed_zero3_enabled()` returns true. Under ZeRO-3 the Python `apply_rotary_pos_emb` path is used (the same behavior as before #41147). Non-ZeRO-3 users are unaffected.
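
The shape of this guard can be sketched in plain Python, under stated assumptions. `zero3_enabled` is a hand-toggled stand-in for the real ZeRO-3 check. `KernelRotary`, `attach_rotary_unless_zero3` and `ToyLayer` are hypothetical names. This is not the actual `use_kernelized_func` implementation, only an illustration of skipping the attachment at init.

```python
import functools

# Hypothetical stand-in for the real ZeRO-3 check; toggled by hand here.
_ZERO3 = {"enabled": False}


def zero3_enabled():
    return _ZERO3["enabled"]


class KernelRotary:
    """Hypothetical stand-in for the kernel-backed child module."""


def attach_rotary_unless_zero3(cls):
    """Class decorator sketch: attach `rotary_fn` at init unless ZeRO-3 is on."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def __init__(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        if zero3_enabled():
            return  # skip the child; the plain Python path is used instead
        self.rotary_fn = KernelRotary()

    cls.__init__ = __init__
    return cls


@attach_rotary_unless_zero3
class ToyLayer:
    def __init__(self):
        self.hidden = 8


default_layer = ToyLayer()   # built with the ZeRO-3 flag off
_ZERO3["enabled"] = True
zero3_layer = ToyLayer()     # built with the ZeRO-3 flag on
_ZERO3["enabled"] = False

print(hasattr(default_layer, "rotary_fn"), hasattr(zero3_layer, "rotary_fn"))
```

Because the check runs at init, a layer built while the flag is on never registers the extra child. Nothing is then left unvisited during forward.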

The second commit refreshes dates on three model cards (`pp_chart2table`, `slanext`, `uvdoc`) that were missing them on `main`; this is required for `check-repository-consistency` to pass.

Test plan

When `kernels` is installed, `@use_kernelized_func` attaches a
`rotary_fn` child `nn.Module` to attention layers. DeepSpeed ZeRO-3's
parameter coordinator traces the module graph at init and expects
every registered submodule to be invoked during forward. The model's
forward still calls the plain Python `apply_rotary_pos_emb`, so
`rotary_fn` is never executed and the trace desynchronizes, raising
`IndexError: pop from an empty deque` on the second forward.

Skip attaching the kernelized submodule when ZeRO-3 is enabled; users
running under ZeRO-3 fall back to the Python implementation, which is
what they were getting before #41147.

Fixes #45137
Contributor

@vasqu left a comment

Discussed internally: this disables kernels for DeepSpeed for now, but that's better than having no working version at all.

@ArthurZucker added the `for patch` label (Tag issues / labels that should be included in the next patch) Apr 13, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker merged commit 52f2268 into main Apr 13, 2026
37 of 40 checks passed
@ArthurZucker deleted the fix-zero3 branch April 13, 2026 16:38
ArthurZucker added a commit that referenced this pull request Apr 13, 2026
…45414)

* Fix `IndexError: pop from an empty deque` under DeepSpeed ZeRO-3

* Add dates to new model cards to satisfy check-repository-consistency
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026