🚨 fix + tests dense & MoE TP all reduce (decoder only)#43722
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…ed GPU management
- Updated `run_dense_tests.sh` and `run_moe_tests.sh` to support parallel execution of tests using available GPU pairs.
- Changed variable names for clarity, replacing `NUM_GPUS` with `GPUS_PER_TEST`.
- Enhanced output messages to reflect the number of parallel test slots and GPU usage.
- Implemented logic to handle skipped tests and updated result reporting to include skipped counts.
- Removed `TensorParallelTesterMixin` from `CausalLMModelTest` and integrated it into `ModelTesterMixin` for better structure in test classes.
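As a rough illustration of the scheduling idea (the real logic lives in the bash scripts; the Python below, the model list, and `TOTAL_GPUS` are placeholders rather than the actual implementation), the scripts carve the visible GPUs into slots of `GPUS_PER_TEST` devices and run one pytest invocation per slot in parallel:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: split the available GPUs into slots of GPUS_PER_TEST
# devices and run one pytest invocation per slot in parallel, one model each.
GPUS_PER_TEST = 2
TOTAL_GPUS = 8
SLOTS = [list(range(i, i + GPUS_PER_TEST)) for i in range(0, TOTAL_GPUS, GPUS_PER_TEST)]


def run_model_tests(model: str, slot: list) -> int:
    """Run the TP tests of a single model on a dedicated pair of GPUs."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in slot)
    return subprocess.run(
        ["python", "-m", "pytest", f"tests/models/{model}", "-k", "test_tp_"],
        env=env,
    ).returncode


models = ["apertus", "deepseek_v3", "dots1", "gemma3"]  # illustrative subset
return_codes = []
# Process the models in waves of len(SLOTS) so each slot runs at most one job at a time.
for start in range(0, len(models), len(SLOTS)):
    batch = models[start : start + len(SLOTS)]
    with ThreadPoolExecutor(max_workers=len(batch)) as pool:
        return_codes += list(pool.map(run_model_tests, batch, SLOTS[: len(batch)]))

# pytest exit code 5 means "no tests were collected", reported here as skipped.
skipped = sum(code == 5 for code in return_codes)
failed = sum(code not in (0, 5) for code in return_codes)
print(f"parallel slots: {len(SLOTS)}  skipped: {skipped}  failed: {failed}")
```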
Cyrilvallez left a comment
Just a few very early thoughts!
…lecting for mergeModuleList
- Modified `run_dense_tests.sh` and `run_moe_tests.sh` to change the pytest keyword from "test_tensor_parallel" to "test_tp_" for improved test targeting.
- Cleaned up comments and removed unused code in `test_tensor_parallel_mixin.py` to streamline the testing process and enhance readability.
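For context on the keyword switch: `pytest -k` matches substrings of collected test names, so prefixing the TP tests with `test_tp_` lets the scripts target exactly those tests. A minimal, purely illustrative sketch (not the actual `test_tensor_parallel_mixin.py` contents):

```python
# test_tp_sketch.py -- illustrative only, not the actual TensorParallelTesterMixin.
# `pytest -k "test_tp_"` collects test_tp_forward and test_tp_generate below,
# while test_forward_without_tp is filtered out -- the point of moving from the
# broader "test_tensor_parallel" keyword to the "test_tp_" prefix.
import pytest
import torch

needs_two_gpus = pytest.mark.skipif(
    torch.cuda.device_count() < 2, reason="tensor parallelism needs at least 2 GPUs"
)


@needs_two_gpus
def test_tp_forward():
    assert torch.cuda.device_count() >= 2


@needs_two_gpus
def test_tp_generate():
    assert torch.cuda.device_count() >= 2


def test_forward_without_tp():
    assert True  # not matched by `-k "test_tp_"`
```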
cc @SunMarc, this is valid, but I'd be happy if you could have a look
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma3, gemma3n, glm4_moe, glm4_moe_lite, glm_moe_dsa
Workflow Run ⚙️💔
[For maintainers] Suggested jobs to run (before merge) run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma3, gemma3n, glm4_moe, glm4_moe_lite, glm_moe_dsa
run-slow: apertus, deepseek_v2, deepseek_v3, dots1, ernie4_5_moe, exaone4, exaone_moe, flex_olmo, gemma3, gemma3n, glm4_moe, glm4_moe_lite, glm_moe_dsa

This comment contains models: ["models/apertus", "models/deepseek_v2", "models/deepseek_v3", "models/dots1", "models/ernie4_5_moe", "models/exaone4", "models/exaone_moe", "models/flex_olmo", "models/gemma3", "models/gemma3n", "models/glm4_moe", "models/glm4_moe_lite", "models/glm_moe_dsa"]
CI Results / Commit Info
Model CI Report: ❌ 3 new failed tests from this PR 😭
Let's make sure it works for decoder-only models first (we skip VLM + encoder-decoder for now).
Introduction, forward, backward, and generation (with the convert mapping triggering) are tested against a TP vs non-TP baseline.
`./run_dense_tests.sh results_dense`
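For reference, a hedged sketch of what the TP vs non-TP comparison boils down to on the generation path (the checkpoint, token count, and file name are placeholders; the real checks live in the TP tester mixin). It assumes the chosen model has a tensor-parallel plan and is launched with `torchrun --nproc-per-node 2 compare_tp.py`:

```python
# compare_tp.py -- illustrative sketch only, not the actual test code.
# Launch with: torchrun --nproc-per-node 2 compare_tp.py
import torch
import torch.distributed as dist
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"  # assumption: any small decoder-only checkpoint with a TP plan

tokenizer = AutoTokenizer.from_pretrained(model_id)

# TP model: weights are sharded across ranks, activations are stitched back with all-reduces.
tp_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, tp_plan="auto")
inputs = tokenizer("Tensor parallel sanity check:", return_tensors="pt").to(tp_model.device)

# Every rank takes part in the collective generate call.
tp_ids = tp_model.generate(**inputs, max_new_tokens=20, do_sample=False)

if dist.get_rank() == 0:
    # Non-TP baseline on a single device; greedy decoding should produce the same tokens
    # (bf16 reduction-order differences can occasionally flip a late token, hence only a sketch).
    baseline = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(tp_model.device)
    ref_ids = baseline.generate(**inputs, max_new_tokens=20, do_sample=False)
    assert torch.equal(tp_ids, ref_ids), "TP generation diverged from the non-TP baseline"
    print(tokenizer.decode(tp_ids[0], skip_special_tokens=True))

dist.barrier()
dist.destroy_process_group()
```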