[WIP] Fix DeepseekV3ModelTest::test_torch_compile_for_training #39012
ivarflakstad wants to merge 9 commits into main
Conversation
1932a52 to 1c19052
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: deepseek_v3

This comment contains run-slow, running the specified jobs: models: ['models/deepseek_v3']
ArthurZucker left a comment

You need to run `python utils/modular_model_converter.py --files deepseek3` to regenerate the modeling file, otherwise the test will never be fixed.
    expert_mask = expert_mask.permute(2, 0, 1)

    for expert_idx in range(len(self.experts)):
        # Loop over all available experts in the model and perform the computation on each expert
available or matched experts?
`DeepseekV3ModelTest::test_torch_compile_for_training` fails with `torch._dynamo.exc.Unsupported: Dynamic shape operator`. I first attempted to remedy this with `torch._dynamo.config` flags. PyTorch was struggling to capture the graph when we were using `if token_indices.numel() > 0:`, so I switched to looping only over the experts that are matched in the expert mask. Should be an improvement regardless of graph capture.
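The idea in the description can be sketched as follows. This is a minimal toy version, not the actual DeepseekV3 code: `moe_forward`, the lambda experts, and the tensor shapes are all illustrative. The key point is that the per-expert token lookup is driven by a precomputed one-hot `expert_mask` (permuted as in the diff above), and the loop visits only experts that received at least one token, instead of guarding each expert body with a data-dependent `if token_indices.numel() > 0:`.

```python
import torch
import torch.nn.functional as F

def moe_forward(hidden_states, experts, selected_experts, num_experts):
    """Toy MoE dispatch (illustrative, not the transformers implementation).

    hidden_states:     (num_tokens, hidden_dim)
    experts:           list of callables, one per expert
    selected_experts:  (num_tokens, top_k) integer expert ids; assumes each
                       token selects a given expert at most once
    """
    out = torch.zeros_like(hidden_states)
    # One-hot routing mask, permuted to (num_experts, num_tokens, top_k)
    # as in the `expert_mask.permute(2, 0, 1)` line from the diff.
    expert_mask = F.one_hot(selected_experts, num_classes=num_experts).permute(2, 0, 1)
    # Indices of experts that matched at least one token; this replaces the
    # data-dependent `if token_indices.numel() > 0:` guard inside the loop.
    expert_hit = (expert_mask.sum(dim=(-1, -2)) > 0).nonzero().flatten().tolist()
    for expert_idx in expert_hit:
        # Tokens routed to this expert (slot index unused in this sketch).
        token_idx, _slot = torch.where(expert_mask[expert_idx])
        out[token_idx] += experts[expert_idx](hidden_states[token_idx])
    return out
```

Note the sketch omits the routing weights a real MoE layer would apply per slot; the point is only the mask-driven loop structure.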