
fix backward pass test in tensor parallel for Dense model #42811

Merged
3outeille merged 2 commits into v5-test_tensor_parallel_moe from fix-test_tensor_parallel_backward_dense on Dec 11, 2025

Conversation

@3outeille
Member

No description provided.

@3outeille 3outeille changed the base branch from main to v5-test_tensor_parallel_moe December 11, 2025 14:06
@3outeille 3outeille merged commit 5f548ed into v5-test_tensor_parallel_moe Dec 11, 2025
12 of 17 checks passed
@3outeille 3outeille deleted the fix-test_tensor_parallel_backward_dense branch December 11, 2025 14:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker pushed a commit that referenced this pull request Dec 16, 2025
ArthurZucker pushed a commit that referenced this pull request Dec 17, 2025
* Add ColwiseParallelReplicate and RowwiseParallelReplicate classes for replicated layouts
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
3outeille added a commit that referenced this pull request Jan 30, 2026
* begin MoE test tensor parallel

* create tiny moe model + fix test tensor parallel MoE

fix tensor parallel MoE test

* fix backward pass test in tensor parallel for Dense model (#42811) (a parity-test sketch follows this group)

* fix

* linting

* use mixtral instead for testing

* fix dtensor and tensor mismatch

* linting

* checkout test tensor parallel to be like main

* avoid hack and create class instead
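
The group above boils down to checking that gradients under tensor parallelism match a single-process run. A minimal sketch of such a backward-parity check, assuming Hugging Face-style models with a `.logits` output (illustrative only, not the exact test in this PR):

```python
# Sketch of a backward-parity check between a TP-sharded model and a
# single-process reference model. Names and the loss are assumptions.
import torch

def assert_grads_close(tp_model, ref_model, batch):
    # Run the same scalar loss through both models.
    tp_model(**batch).logits.float().pow(2).mean().backward()
    ref_model(**batch).logits.float().pow(2).mean().backward()
    for (name, p_tp), (_, p_ref) in zip(
        tp_model.named_parameters(), ref_model.named_parameters()
    ):
        grad = p_tp.grad
        # DTensor gradients must be gathered into a plain tensor before comparing.
        if hasattr(grad, "full_tensor"):
            grad = grad.full_tensor()
        torch.testing.assert_close(grad, p_ref.grad, msg=name)
```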

* fix loading ep

* add moe test

* now EP inference works again but the backward pass still fails

* linting

* now load from checkpoint: creating an nn.Parameter from param_value will not transfer its attributes (especially _is_hf_initialized)
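
The attribute issue is easy to reproduce: wrapping a tensor in `nn.Parameter` builds a new object, so python attributes set on the original tensor (such as the `_is_hf_initialized` flag used during loading) are silently dropped. A toy illustration:

```python
import torch
from torch import nn

t = torch.zeros(4)
t._is_hf_initialized = True  # marker attribute set on the raw tensor

p = nn.Parameter(t)  # wrapping creates a new tensor object
print(hasattr(p, "_is_hf_initialized"))  # False: the attribute did not transfer
```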

* forward now works (add LocalPackedColwise + don't use EP router)

* for now test in float32

* don't do the all_reduce manually for GatherParallel; convert to the DTensor approach
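
For context, the "manual" pattern being replaced looks roughly like an all_reduce in a forward hook (a sketch; the real GatherParallel internals in transformers may differ):

```python
import torch.distributed as dist

def _all_reduce_output(module, inputs, output):
    # Sum the partial outputs produced by each tensor-parallel rank.
    dist.all_reduce(output, op=dist.ReduceOp.SUM)
    return output

# registered on the layer whose output is partial on each rank:
# layer.register_forward_hook(_all_reduce_output)
```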

* Remove dtensor dependency in Tensor Parallel (#43157)

* dense test is passing

* Refactor tensor parallel implementation by removing unused partition_tensor methods

* keep removing dependencies on Dtensor

* rename test file

* Update tensor parallel plans to use "colwise_gather_output" across multiple models

* Remove unused "gather" references and update tensor parallel plans to "colwise_gather_output" in multiple model configurations.

* Refactor tensor parallel plans in Fbgemm and FineGrained quantizers by removing unused configurations and comments related to "gather" operations.

* add 'split_input' option in RowwiseParallel + replace rowwise_replicate with 'rowwise_split_input'
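
The idea behind a rowwise layer with split input, as a sketch (function name and shapes are illustrative, not the transformers API):

```python
import torch
import torch.distributed as dist

def rowwise_linear(x_shard, w_shard, bias=None):
    # x_shard: [..., in_features // world_size], input split along the last dim
    # w_shard: [out_features, in_features // world_size]
    y = torch.nn.functional.linear(x_shard, w_shard)  # partial sums on each rank
    dist.all_reduce(y, op=dist.ReduceOp.SUM)          # combine partials into the full output
    if bias is not None:
        y = y + bias  # add the bias once, after the reduction
    return y
```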

* Add PackedColwiseParallel and PackedRowwiseParallel + Update configuration plans

* fixing files and some fixes for tp and tp_plan

* clean tensor parallel API

* linting

* linting

* Refactor core model loading and tensor parallel utilities: improve parameter handling in `set_param_for_module`, update the tensor sharding functions, remove deprecated code, and add utility functions for block size calculations

* code quality

* make fixup

* tp works for dense and moe in float32 only

* fix merge conflicts that broke TP

* revert parsing for tp plan

* all reduce after experts

* compile compatible dist ops

* fix gate_up_proj gradient test by doing a split that takes into account that it is fused + all_reduce to get the full gradient before functional.linear
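
A sketch of why the fused weight needs special handling: a plain chunk over the fused dimension would give some ranks only gate rows and others only up rows, so each half is split separately (layout assumption: gate rows stacked on top of up rows):

```python
import torch

def shard_fused_gate_up(weight, rank, world_size):
    # weight: [2 * intermediate_size, hidden_size]
    gate, up = weight.chunk(2, dim=0)           # undo the fusion first
    return torch.cat(
        [gate.chunk(world_size, dim=0)[rank],   # this rank's slice of the gate half
         up.chunk(world_size, dim=0)[rank]],    # the matching slice of the up half
        dim=0,
    )
```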

* fix moe backward fp32

* remove functional.linear and use nn.Linear in experts (this way we can attach hooks)

* moe works with tied embeddings as well

* style

* all tests pass

* make fix-up

* typo

* use the transformers seed + pytest parametrize
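
As a sketch of that test setup (the parameter grid is a placeholder):

```python
import pytest
from transformers import set_seed

@pytest.mark.parametrize("dtype", ["float32"])  # placeholder values
def test_tensor_parallel_backward(dtype):
    set_seed(42)  # transformers seed helper: TP and reference runs get identical init
    ...
```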

* Moved weight and bias dim mapping to ParallelInterface

* simplified shard_tensor signature

* sync shard_tensor logic with the one in origin/main

* add a function check to avoid a mismatch check during set_param_for_module

* remove disable; I was on an older torch version

* Add pytest skip condition for tensor parallel tests requiring PyTorch >= 2.9
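
Such a skip condition could look like this (a sketch; the exact marker in the PR may differ):

```python
import pytest
import torch
from packaging import version

requires_torch_2_9 = pytest.mark.skipif(
    version.parse(torch.__version__) < version.parse("2.9"),
    reason="tensor parallel tests require PyTorch >= 2.9",
)
```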

* linting

* linting

* fixing remaining modular

* linting

* Refactor get_expected_sharded_shape to be only one call

* Remove redundant prepare_module_tp method from TensorParallelLayer subclasses

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>