@yuanlehome yuanlehome commented Nov 26, 2025

Motivation

💡 If this PR is a cherry-pick, the PR title must add the [Cherry-Pick] label at the very beginning and append the original PR ID at the end, for example [Cherry-Pick][CI] Add check trigger and logic(#5191)


  1. As parallelism strategies have multiplied, the name nranks used in some layers is no longer appropriate and should be renamed to the precise tp_size.
  2. The bias addition after a row parallel linear must happen before the all reduce, but the V1 loader did not handle this case (illustrated by the sketch below).
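
To make item 2 concrete, here is a minimal sketch, in plain NumPy with hypothetical shapes, of why the row-parallel bias has to be scaled when every tensor-parallel rank adds it before the all_reduce (the sum across ranks):

```python
import numpy as np

rng = np.random.default_rng(0)
tp_size = 2
x = rng.random((4, 8))   # activations, hidden size 8
w = rng.random((8, 6))   # full weight of a row-parallel linear
b = rng.random(6)        # full bias

# Row parallel: each rank holds a slice of the input dimension.
x_shards = np.split(x, tp_size, axis=1)
w_shards = np.split(w, tp_size, axis=0)

# If every rank adds the full bias before the all_reduce (a sum over ranks),
# the bias ends up counted tp_size times.
wrong = sum(xs @ ws + b for xs, ws in zip(x_shards, w_shards))

# Dividing the bias by tp_size at load time restores the correct result.
right = sum(xs @ ws + b / tp_size for xs, ws in zip(x_shards, w_shards))

reference = x @ w + b
print(np.allclose(wrong, reference))   # False
print(np.allclose(right, reference))   # True
```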

Modifications

  1. Rename nranks to tp_size in linear.py.
  2. Under the V1 loader, add a tp_row_bias attribute to bias parameters so the bias is divided by tp_size at load time, which keeps the value correct after the all reduce (see the loader sketch below).
  3. Move the all_reduce operations scattered across the individual MoE backends into the MoE layer itself (see the MoE layer sketch below).
  4. Clean up some of the skip_quant logic.
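
A minimal sketch of what the V1-loader handling in item 2 might look like; the helper name and the surrounding loader structure are illustrative, not the exact FastDeploy code:

```python
import paddle

def load_row_parallel_bias(param, loaded_bias, tp_size: int):
    """Illustrative loader hook: a bias parameter tagged with tp_row_bias is
    scaled by 1/tp_size when loaded, so the per-rank partial outputs still
    sum to the correct value after the all_reduce."""
    if getattr(param, "tp_row_bias", False) and tp_size > 1:
        loaded_bias = loaded_bias / tp_size
    param.set_value(paddle.to_tensor(loaded_bias, dtype=param.dtype))
```

And a rough sketch of item 3: instead of every MoE backend issuing its own all_reduce, the MoE layer performs a single reduction after calling into the backend. Class and method names here are illustrative, and paddle.distributed.all_reduce stands in for whatever tensor-parallel collective the codebase actually uses:

```python
import paddle.distributed as dist

class FusedMoELayer:
    """Illustrative MoE layer that owns the tensor-parallel reduction."""

    def __init__(self, quant_method, tp_size: int):
        self.quant_method = quant_method  # backend-specific MoE implementation
        self.tp_size = tp_size

    def forward(self, hidden_states, gate_logits):
        # Backends return per-rank partial outputs and no longer reduce themselves.
        out = self.quant_method.apply(hidden_states, gate_logits)
        if self.tp_size > 1:
            # One centralized all_reduce in the layer, regardless of backend.
            dist.all_reduce(out)
        return out
```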

Usage or Command

None.

Accuracy Tests

None.

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests; if none are added, please explain why in this PR.
  • Provide accuracy results.
  • If the current PR targets the release branch, make sure it has first been submitted to the develop branch and then cherry-picked to the release branch with the [Cherry-Pick] PR tag.


paddle-bot bot commented Nov 26, 2025

Thanks for your contribution!

Copilot AI left a comment

Pull request overview

This PR refactors tensor parallelism-related code by standardizing variable naming and improving bias handling for distributed training. The changes focus on code consistency and correctness without altering the core functionality.

  • Standardizes variable naming from nranks to tp_size across multiple modules for better clarity
  • Introduces special handling for row-parallel bias division in tensor parallelism
  • Removes unused variables and simplifies conditional logic in RowParallelLinear

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Summary per file:
  • fastdeploy/model_executor/utils.py: Adds tp_row_bias attribute handling to divide the bias by the tensor-parallel size during weight loading
  • fastdeploy/model_executor/models/qwen3moe.py: Removes the unused self.nranks variable
  • fastdeploy/model_executor/models/qwen3.py: Renames nranks to tp_size for consistency
  • fastdeploy/model_executor/models/qwen2.py: Removes the unused self.nranks variable
  • fastdeploy/model_executor/models/ernie4_5_moe.py: Removes the unused self.nranks variable
  • fastdeploy/model_executor/layers/mtp_linear.py: Renames self.nranks to self.tp_size
  • fastdeploy/model_executor/layers/lm_head.py: Renames self.nranks to self.tp_size
  • fastdeploy/model_executor/layers/linear.py: Renames variables, removes an unused field, adds bias attribute handling, and simplifies logic
  • fastdeploy/model_executor/layers/backends/intel_hpu/attention/hpu_attn_backend.py: Renames self.nranks to self.tp_size


codecov-commenter commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 70.73171% with 12 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@214942e).

Files with missing lines Patch % Lines
fastdeploy/model_executor/layers/linear.py 65.38% 6 Missing and 3 partials ⚠️
fastdeploy/model_executor/utils.py 33.33% 1 Missing and 1 partial ⚠️
fastdeploy/model_executor/layers/mtp_linear.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5247   +/-   ##
==========================================
  Coverage           ?   59.92%           
==========================================
  Files              ?      317           
  Lines              ?    38774           
  Branches           ?     5843           
==========================================
  Hits               ?    23234           
  Misses             ?    13703           
  Partials           ?     1837           
Flag Coverage Δ
GPU 59.92% <70.73%> (?)

Flags with carried forward coverage won't be shown.


@yuanlehome changed the title from "Refine bias and ranks" to "[Optimization] Refine row parallel bias and nranks and moe all_reduce" on Nov 26, 2025
@@ -211,7 +211,7 @@ def __init__(
self.speculate_max_draft_token_num: int = llm_config.speculative_config.num_speculative_tokens
self.keep_pd_step_flag: bool = llm_config.speculative_config.model_type == "mtp"
self.rank: int = llm_config.parallel_config.tensor_parallel_rank
Collaborator


The name self.rank is also imprecise; it should simply be self.tp_rank.

@gongshaotian gongshaotian left a comment


LGTM

@yuanlehome yuanlehome merged commit cb56d46 into PaddlePaddle:develop Nov 26, 2025
15 of 19 checks passed