
Fix GPT-OSS TP IndexError and unwrapping DTensor#42356

Closed
akshan-main wants to merge 5 commits into huggingface:main from akshan-main:fix-gpt-oss-tp

Conversation

@akshan-main

@akshan-main akshan-main commented Nov 24, 2025

[GPT-OSS] Fix Tensor Parallelism IndexError and DTensor casting

What does this PR do?

This PR fixes two specific issues preventing GPT-OSS models from training with Tensor Parallelism (TP) and FSDP, as reported in #41819.

The changes are:

  1. Fix IndexError in TP Hooks:

    • Bug: The tensor_parallel hooks in transformers expect the input tensor (hidden states) to be passed as the first positional argument (args[0]). The GptOssDecoderLayer was previously passing hidden_states as a keyword argument, causing the hook to fail with IndexError: tuple index out of range.
    • Fix: Updated GptOssDecoderLayer.forward to pass hidden_states as the first positional argument.
  2. Fix DTensor Casting in Eager Attention:

    • Bug: When TP is enabled, module.sinks is wrapped as a DTensor. The eager_attention_forward function attempts to torch.cat this with attn_weights (a local tensor), causing a crash.
    • Fix: Added a check to detect if sinks is a DTensor and unwrap it before concatenation.
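The two fixes above can be sketched in isolation. This is a minimal illustration, not the actual transformers code: `tp_pre_hook`, `DTensor`, and `maybe_unwrap` here are hypothetical stand-ins for the real hook, `torch.distributed.tensor.DTensor`, and the unwrapping logic added by the PR.

```python
# Fix 1: the TP pre-forward hook indexes args[0], so hidden_states must be
# passed positionally. An empty args tuple is what raised
# "IndexError: tuple index out of range".
def tp_pre_hook(module, args, kwargs):
    hidden_states = args[0]  # fails if hidden_states arrived via kwargs only
    return args, kwargs

hidden = [1.0, 2.0]
# Broken call site: layer.forward(hidden_states=hidden)  -> args == ()
# Fixed call site:  layer.forward(hidden, ...)           -> args == (hidden,)
args, kwargs = tp_pre_hook(None, (hidden,), {})
assert args[0] is hidden

# Fix 2: unwrap a DTensor before concatenating it with a local tensor.
class DTensor:  # stand-in for torch.distributed.tensor.DTensor
    def __init__(self, local):
        self._local = local

    def to_local(self):
        return self._local

def maybe_unwrap(sinks):
    # Name-based check mirrors the PR and avoids a hard dependency on
    # torch.distributed being initialized.
    if type(sinks).__name__ == "DTensor":
        return sinks.to_local()
    return sinks

print(maybe_unwrap(DTensor([0.5])))  # [0.5]
print(maybe_unwrap([0.5]))           # [0.5]
```

With a real model, `maybe_unwrap` would sit just before the `torch.cat` of sinks with `attn_weights` in `eager_attention_forward`.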

Status:
I have applied these changes to modular_gpt_oss.py and ran `make fix-copies`.

Fixes #41819

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case. (Linked above)
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@3outeille
@ArthurZucker

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt_oss

@akshan-main
Author

Hey @3outeille @ArthurZucker! It'd be great if this PR could be reviewed; let me know if you'd like me to integrate anything else.


sinks = module.sinks.reshape(1, -1, 1, 1).expand(query.shape[0], -1, query.shape[-2], -1)
sinks = module.sinks
if type(sinks).__name__ == "DTensor":
Member


we would like to not have DTensor logic in the modeling. For example, sinks is supposed to use local_rowwise (cf https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt_oss/configuration_gpt_oss.py#L41), which is supposed to not return a DTensor (cf https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/tensor_parallel.py#L1171), but somehow doesn't work

I think the cleanest way to handle this without modifying the HF modeling would be to understand why sinks is still a DTensor after local_rowwise
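A small diagnostic along these lines can help narrow down why a parameter survives as a DTensor after the TP plan runs: scan the model for parameters whose runtime type is still `DTensor`. This is a hedged sketch; the `DTensor` class and the `(name, param)` pairs below are stand-ins, and on a real model you would pass `model.named_parameters()` instead.

```python
class DTensor:  # stand-in for torch.distributed.tensor.DTensor
    pass

def find_dtensor_params(named_params):
    """Return the names of parameters whose runtime type is DTensor."""
    return [
        name
        for name, param in named_params
        if type(param).__name__ == "DTensor"
    ]

# Hypothetical parameter names for illustration only.
params = [
    ("model.layers.0.self_attn.sinks", DTensor()),   # still wrapped
    ("model.layers.0.mlp.router.weight", object()),  # plain local tensor
]
print(find_dtensor_params(params))  # ['model.layers.0.self_attn.sinks']
```

Running this right after the TP plan is applied would show whether only `sinks` is affected or whether other parameters (e.g. the router weights mentioned later in this thread) are wrapped too.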


Hi @akshan-main, @3outeille, attn_weights should also be a DTensor, right, if the model is prepared for tp_auto? When accelerate uses the _prepare_tp function, it first prepares the model by converting all the model parameters to DTensor.


sinks = module.sinks.reshape(1, -1, 1, 1).expand(query.shape[0], -1, query.shape[-2], -1)
sinks = module.sinks
if type(sinks).__name__ == "DTensor":
Member


same statement as above about DTensor and local_rowwise

@3outeille
Member

3outeille commented Nov 28, 2025

sorry context switching so hard that I forgot to click on submit the reviews lol

@akshan-main
Author

sorry context switching so hard that I forgot to click on submit the reviews lol

haha understandable! I will work on trying to fix this

@quic-akuruvil

quic-akuruvil commented Dec 11, 2025

Hi @akshan-main, I am also seeing the DTensor type for self.weights and self.bias in the GptOssTopKRouter module, and at multiple other places.

@3outeille
Member

@akshan-main @quic-akuruvil encountered the issue again and had to fix it: #42906

@3outeille 3outeille closed this Dec 19, 2025

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IndexError: tuple index out of range when using Tensor Parallelism with FSDP2 on GPT-OSS 20B (tensor_parallel.py, line 510)

3 participants