[WIP] Fix DeepseekV3ModelTest::test_torch_compile_for_training#39012

Draft
ivarflakstad wants to merge 9 commits into main from fix-deepspeed-torch-compile-for-training-test
Conversation

@ivarflakstad
Member

@ivarflakstad ivarflakstad commented Jun 24, 2025

DeepseekV3ModelTest::test_torch_compile_for_training fails with torch._dynamo.exc.Unsupported: Dynamic shape operator. Attempting to remedy this by setting torch._dynamo.config flags.

PyTorch was struggling to capture the graph when we used if token_indices.numel() > 0:, so I switched to looping only over the experts that are actually matched in the expert mask. This should be an improvement regardless of graph capture.
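A rough sketch of the "loop only over matched experts" idea, run in eager mode. TinyMoE, expert_hit, and the tensor shapes are hypothetical names and choices for illustration, not the DeepseekV3 modeling code, and the sketch makes no claim about torch.compile behaviour (nonzero is itself a data-dependent-shape operator).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    """Hypothetical toy mixture-of-experts layer for illustration only."""

    def __init__(self, hidden=8, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(num_experts)])

    def forward(self, hidden_states, topk_indices, topk_weights):
        # hidden_states: (tokens, hidden); topk_indices, topk_weights: (tokens, top_k)
        out = torch.zeros_like(hidden_states)
        # (tokens, top_k, num_experts) -> (num_experts, tokens, top_k)
        expert_mask = F.one_hot(topk_indices, num_classes=len(self.experts))
        expert_mask = expert_mask.permute(2, 0, 1)
        # Indices of experts that were selected by at least one token.
        expert_hit = expert_mask.sum(dim=(1, 2)).nonzero().flatten()
        for expert_idx in expert_hit.tolist():
            # Every expert visited here has tokens, so no numel() > 0 check is needed.
            token_idx, slot_idx = torch.where(expert_mask[expert_idx])
            weights = topk_weights[token_idx, slot_idx].unsqueeze(-1)
            expert_out = self.experts[expert_idx](hidden_states[token_idx])
            out.index_add_(0, token_idx, expert_out * weights)
        return out
```

Experts that received no tokens are never visited, so the emptiness check disappears from the loop body instead of being evaluated once per expert.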

@ivarflakstad ivarflakstad force-pushed the fix-deepspeed-torch-compile-for-training-test branch from 1932a52 to 1c19052 on June 24, 2025 17:53
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ivarflakstad
Member Author

run-slow: deepseek_v3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/deepseek_v3']
quantizations: [] ...

@ivarflakstad
Member Author

run-slow: deepseek_v3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/deepseek_v3']
quantizations: [] ...

@ivarflakstad
Member Author

run-slow: deepseek_v3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/deepseek_v3']
quantizations: [] ...

@ivarflakstad
Member Author

run-slow: deepseek_v3

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/deepseek_v3']
quantizations: [] ...

Collaborator

@ArthurZucker ArthurZucker left a comment

You need to run python utils/modular_model_converter.py --files deepseek3 to regenerate the modeling file; otherwise the test will never be fixed.

expert_mask = expert_mask.permute(2, 0, 1)

for expert_idx in range(len(self.experts)):
# Loop over all available experts in the model and perform the computation on each expert
Collaborator

Available or matched experts?

3 participants