[WIP] Fix DeepseekV3ModelTest::test_torch_compile_for_training by ivarflakstad · Pull Request #39012 · huggingface/transformers

ivarflakstad · 2025-06-24T17:49:40Z

DeepseekV3ModelTest::test_torch_compile_for_training fails with torch._dynamo.exc.Unsupported: Dynamic shape operator. I am attempting to remedy this with torch._dynamo.config flags.

PyTorch struggled to capture the graph around the data-dependent check if token_indices.numel() > 0, so I switched to looping only over the experts that actually appear in the expert mask. This should be an improvement regardless of graph capture, because experts that receive no tokens are no longer visited at all.
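The following is a minimal eager-mode sketch of the two dispatch strategies described above. It assumes a DeepSeek-style router that produces topk_indices and topk_weights of shape (num_tokens, top_k), plus a list of experts. The names moe_all_experts and moe_matched_experts are hypothetical, and this is not the PR's code. The first function mirrors the original pattern, which visits every expert and skips those with no routed tokens via token_indices.numel() > 0. The second function first finds which experts are hit in the expert mask and loops only over those. Both should produce identical outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def moe_all_experts(hidden, topk_indices, topk_weights, experts):
    """Visit every expert and skip the ones that received no tokens."""
    out = torch.zeros_like(hidden)
    # (num_tokens, top_k, num_experts) -> (num_experts, num_tokens, top_k)
    expert_mask = F.one_hot(topk_indices, num_classes=len(experts)).permute(2, 0, 1)
    for expert_idx in range(len(experts)):
        token_idx, slot_idx = torch.where(expert_mask[expert_idx])
        if token_idx.numel() > 0:  # data-dependent branch that hurt graph capture
            weights = topk_weights[token_idx, slot_idx].unsqueeze(-1)
            out.index_add_(0, token_idx, experts[expert_idx](hidden[token_idx]) * weights)
    return out


def moe_matched_experts(hidden, topk_indices, topk_weights, experts):
    """Visit only the experts that appear in the expert mask."""
    out = torch.zeros_like(hidden)
    expert_mask = F.one_hot(topk_indices, num_classes=len(experts)).permute(2, 0, 1)
    # An expert is "hit" if at least one (token, slot) pair is routed to it.
    expert_hit = (expert_mask.sum(dim=(1, 2)) > 0).nonzero().squeeze(-1)
    for expert_idx in expert_hit.tolist():
        token_idx, slot_idx = torch.where(expert_mask[expert_idx])
        weights = topk_weights[token_idx, slot_idx].unsqueeze(-1)
        out.index_add_(0, token_idx, experts[expert_idx](hidden[token_idx]) * weights)
    return out


torch.manual_seed(0)
num_experts, top_k, hidden_dim, num_tokens = 8, 2, 16, 5
experts = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts)])
hidden = torch.randn(num_tokens, hidden_dim)
router_probs = torch.randn(num_tokens, num_experts).softmax(dim=-1)
topk_weights, topk_indices = torch.topk(router_probs, top_k, dim=-1)

with torch.no_grad():
    a = moe_all_experts(hidden, topk_indices, topk_weights, experts)
    b = moe_matched_experts(hidden, topk_indices, topk_weights, experts)

print(torch.allclose(a, b))  # True
```

The sketch only demonstrates that the two loops are equivalent in eager mode. Calling .tolist() on the hit indices is itself a data-dependent step, so no claim is made here about how the PR's version behaves under torch.compile.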

HuggingFaceDocBuilderDev · 2025-06-24T18:07:00Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ivarflakstad · 2025-06-24T18:27:34Z

run-slow: deepseek_v3

github-actions · 2025-06-24T18:29:05Z

This comment contains run-slow, running the specified jobs:

models: ['models/deepseek_v3']
quantizations: [] ...

ivarflakstad · 2025-06-24T18:45:19Z

run-slow: deepseek_v3

github-actions · 2025-06-24T18:46:46Z

This comment contains run-slow, running the specified jobs:

models: ['models/deepseek_v3']
quantizations: [] ...

ivarflakstad · 2025-06-24T20:16:03Z

run-slow: deepseek_v3

github-actions · 2025-06-24T20:17:39Z

This comment contains run-slow, running the specified jobs:

models: ['models/deepseek_v3']
quantizations: [] ...

ivarflakstad · 2025-06-24T20:32:19Z

run-slow: deepseek_v3

github-actions · 2025-06-24T20:33:49Z

This comment contains run-slow, running the specified jobs:

models: ['models/deepseek_v3']
quantizations: [] ...

ArthurZucker

You need to run python utils/modular_model_converter.py --files deepseek3 to propagate the change to the modeling file, otherwise the test will never be fixed.

ArthurZucker · 2025-07-01T16:18:17Z

        expert_mask = expert_mask.permute(2, 0, 1)

-        for expert_idx in range(len(self.experts)):
+        # Loop over all available experts in the model and perform the computation on each expert


available or matched experts?

Attempt to fix by capturing dynamic output shapes and scalar outputs

1c19052

ivarflakstad force-pushed the fix-deepspeed-torch-compile-for-training-test branch from 1932a52 to 1c19052 on June 24, 2025 17:53

Merge branch 'main' into fix-deepspeed-torch-compile-for-training-test

cfef33b

ivarflakstad added 2 commits June 24, 2025 20:15

Merge branch 'main' into fix-deepspeed-torch-compile-for-training-test

52d6326

Merge branch 'main' into fix-deepspeed-torch-compile-for-training-test

412c317

Capture only dynamic output shapes

0949bfc

Loop only over matching experts

a871c0b

Revert capture to evaluate new moe impl

afe6fe9

ivarflakstad added 2 commits June 27, 2025 18:07

Move changes to modular

51e6009

Merge branch 'main' into fix-deepspeed-torch-compile-for-training-test

6dc97e6

ArthurZucker reviewed Jul 1, 2025


This was referenced Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42 (Open)

Cumulative defect fixes from recent Transformers PRs evalstate/transformers#43 (Open)

ivarflakstad wants to merge 9 commits into main from fix-deepspeed-torch-compile-for-training-test
