fix backward pass test in tensor parallel for Dense model #42811
Merged

3outeille merged 2 commits into v5-test_tensor_parallel_moe on Dec 11, 2025
Conversation
Commit 5f548ed was merged into v5-test_tensor_parallel_moe. 12 of 17 checks passed.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker pushed a commit that referenced this pull request on Dec 16, 2025:
* begin Moe test tensor parallel
* create tiny moe model + fix test tensor parallel Moe eaeaae
* create tiny moe model + fix test tensor parallel Moe eaeaae
* fix tensor parallel MoE test
* fix tensor parallel MoE test
* fix backward pass test in tensor parallel for Dense model (#42811)
* fix
* linting
* use mixtral instead for testing
* fix dtensor and tensor mismatch
* linting
* checkout test tensor parallel to be like main
* avoid hack and create class instead
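The test being fixed above checks that a tensor-parallel model reproduces the unsharded model's results. The core property it relies on can be illustrated with a minimal, dependency-free sketch (this is illustrative only, not the transformers test code): column-wise sharding a linear layer's weight and gathering the per-rank output slices must give exactly the unsharded output.

```python
def matmul(x, w):
    # Naive matrix multiply: x and w are lists of rows.
    return [[sum(x[i][k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def split_cols(w, world_size):
    # Column-wise sharding: each rank keeps a contiguous slice of output columns.
    n = len(w[0]) // world_size
    return [[row[r * n:(r + 1) * n] for row in w] for r in range(world_size)]

x = [[1.0, 2.0]]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)                        # unsharded reference output
shards = split_cols(w, world_size=2)       # what each "rank" would hold
partials = [matmul(x, s) for s in shards]  # per-rank forward passes
# Gathering the per-rank slices along the column axis restores the full output.
gathered = [[v for p in partials for v in p[i]] for i in range(len(x))]
assert gathered == full
```

The backward-pass test exercises the same equivalence for gradients: after the appropriate gather or all-reduce, each rank must see the gradient the unsharded model would compute.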
ArthurZucker pushed a commit that referenced this pull request on Dec 17, 2025:
* begin Moe test tensor parallel
* create tiny moe model + fix test tensor parallel Moe eaeaae
* create tiny moe model + fix test tensor parallel Moe eaeaae
* fix tensor parallel MoE test
* fix tensor parallel MoE test
* fix backward pass test in tensor parallel for Dense model (#42811)
* fix
* linting
* use mixtral instead for testing
* fix dtensor and tensor mismatch
* linting
* checkout test tensor parallel to be like main
* avoid hack and create class instead
* fix loading ep
* add moe test
* now EP inference works again but pass still fails
* Add ColwiseParallelReplicate and RowwiseParallelReplicate classes for replicated layouts
* clean
* eaza
* aeaeaea
* eaeaa
* linting
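The "replicated layouts" mentioned above (ColwiseParallelReplicate / RowwiseParallelReplicate are class names from the commit messages; the sketch below is not the transformers implementation) rest on the row-wise sharding identity: splitting the weight along its input dimension makes each rank compute a partial sum, and an all-reduce over those partials replicates the full output on every rank. Here the all-reduce is simulated by plain addition:

```python
def matmul(x, w):
    # Naive matrix multiply: x and w are lists of rows.
    return [[sum(x[i][k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

x = [[1.0, 2.0, 3.0, 4.0]]
w = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0],
     [7.0, 8.0]]

# Row-wise sharding: each rank gets a slice of the weight's input rows and
# the matching slice of the activation's features.
x_shards = [[[1.0, 2.0]], [[3.0, 4.0]]]
w_shards = [w[:2], w[2:]]
partials = [matmul(xs, ws) for xs, ws in zip(x_shards, w_shards)]
# Summing the partials plays the role of the all_reduce: afterwards every
# rank holds a replicated copy of the full output.
out = [[a + b for a, b in zip(partials[0][i], partials[1][i])]
       for i in range(len(x))]
assert out == matmul(x, w)
```

This is why a row-wise layer can return a replicated tensor after a single collective, while a column-wise layer needs a gather to reassemble its output slices.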
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request on Jan 23, 2026:
* begin Moe test tensor parallel
* create tiny moe model + fix test tensor parallel Moe eaeaae
* create tiny moe model + fix test tensor parallel Moe eaeaae
* fix tensor parallel MoE test
* fix tensor parallel MoE test
* fix backward pass test in tensor parallel for Dense model (huggingface#42811)
* fix
* linting
* use mixtral instead for testing
* fix dtensor and tensor mismatch
* linting
* checkout test tensor parallel to be like main
* avoid hack and create class instead
SangbumChoi pushed another commit to SangbumChoi/transformers that referenced this pull request on Jan 23, 2026:
* begin Moe test tensor parallel
* create tiny moe model + fix test tensor parallel Moe eaeaae
* create tiny moe model + fix test tensor parallel Moe eaeaae
* fix tensor parallel MoE test
* fix tensor parallel MoE test
* fix backward pass test in tensor parallel for Dense model (huggingface#42811)
* fix
* linting
* use mixtral instead for testing
* fix dtensor and tensor mismatch
* linting
* checkout test tensor parallel to be like main
* avoid hack and create class instead
* fix loading ep
* add moe test
* now EP inference works again but pass still fails
* Add ColwiseParallelReplicate and RowwiseParallelReplicate classes for replicated layouts
* clean
* eaza
* aeaeaea
* eaeaa
* linting
3outeille added a commit that referenced this pull request on Jan 30, 2026:
* begin Moe test tensor parallel
* create tiny moe model + fix test tensor parallel Moe eaeaae
* create tiny moe model + fix test tensor parallel Moe eaeaae
* fix tensor parallel MoE test
* fix tensor parallel MoE test
* fix backward pass test in tensor parallel for Dense model (#42811)
* fix
* linting
* use mixtral instead for testing
* fix dtensor and tensor mismatch
* linting
* checkout test tensor parallel to be like main
* avoid hack and create class instead
* fix loading ep
* add moe test
* now EP inference works again but pass still fails
* linting
* now load from checkpoint. Creating a nn.Parameter for param_value will not transfer its attributes (especially _is_hf_initialized)
* forward now works (add LocalPackedColwise + don't use EP router)
* for now test in float32
* don't do all_reduce manually for GatherParallel. Convert to dtensor approach
* Remove dtensor dependency in Tensor Parallel (#43157)
* dense test is passing
* Refactor tensor parallel implementation by removing unused partition_tensor methods
* keep removing dependencies on Dtensor
* rename test file
* Update tensor parallel plans to use "colwise_gather_output" across multiple models
* Remove unused "gather" references and update tensor parallel plans to "colwise_gather_output" in multiple model configurations
* Refactor tensor parallel plans in Fbgemm and FineGrained quantizers by removing unused configurations and comments related to "gather" operations
* add 'split_input' option in RowwiseParallel + replace rowwise_replicate with 'rowwise_split_input'
* Add PackedColwiseParallel and PackedRowwiseParallel + update configuration plans
* mixing files and some fixes for tp and tp_plan
* clean tensor parallel api
* linting
* linting
* Refactor core model loading and tensor parallel utilities. Improved parameter handling in `set_param_for_module` and updated tensor sharding functions. Removed deprecated code and added new utility functions for block size calculations.
* code quality
* make fixup
* tp works for dense and moe in float32 only
* fix merge conflicts that broke TP
* revert parsing for tp plan
* all reduce after experts
* compile compatible dist ops
* fix gate_up_proj gradient test by doing splitting that takes into account that it is fused + all_reduce to get full gradient before functional.linear
* fix moe backward fp32
* remove functional.Linear to use nn.Linear in experts (this way we attach hooks)
* moe works with tied embedding as well
* style
* all tests pass
* make fix-up
* typo
* use transformer seed + pytest parametrized
* Moved weight and bias dim mapping to ParallelInterface
* simplified shard_tensor signature
* sync shard_tensor logic with the one in origin/main
* add function check to avoid mismatch check during set_param_for_module
* remove disable. I was on an older torch version
* Add pytest skip condition for tensor parallel tests requiring PyTorch >= 2.9
* linting
* linting
* fixing remaining modular
* linting
* Refactor get_expected_sharded_shape to be only one call
* Remove redundant prepare_module_tp method from TensorParallelLayer subclasses
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
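The gate_up_proj fix above hinges on the weight being fused: gate and up projections are packed along the output dimension, so a naive contiguous column split would give one rank only gate columns and another only up columns. A "packed" split (in the spirit of the PackedColwiseParallel name from the commit messages; this sketch is illustrative, not the transformers implementation) shards each half separately and repacks per rank:

```python
def packed_split(cols, world_size):
    # cols: indices of the fused output columns; first half is gate, second is up.
    h = len(cols) // 2
    gate, up = cols[:h], cols[h:]
    n = h // world_size
    # Shard gate and up separately, then repack the matching slices per rank.
    return [gate[r * n:(r + 1) * n] + up[r * n:(r + 1) * n]
            for r in range(world_size)]

cols = list(range(8))                # columns 0-3: gate, columns 4-7: up
naive = [cols[:4], cols[4:]]         # contiguous split: rank 0 gets only gate
packed = packed_split(cols, world_size=2)
# packed -> [[0, 1, 4, 5], [2, 3, 6, 7]]: every rank holds matching gate/up
# slices, so the fused gated activation can be computed locally on each rank.
```

The same per-half bookkeeping must be applied when splitting the gradient of the fused weight, which is what the gradient test above was catching.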
No description provided.