@yuanlehome yuanlehome commented Nov 26, 2025

Motivation

💡 If this PR is a cherry-pick, the PR title must add the [Cherry-Pick] label at the very beginning and append the original PR ID at the end, for example [Cherry-Pick][CI] Add check trigger and logic(#5191)


  1. As parallelism strategies have multiplied, the name nranks used in some layers is no longer appropriate and should be renamed to the precise tp_size.
  2. The bias addition after a row parallel linear must happen before the all reduce, but the V1 loader did not handle this case (illustrated by the sketch below).
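
To make item 2 concrete, here is a minimal sketch, in plain NumPy with hypothetical shapes, of why the row-parallel bias has to be scaled when every tensor-parallel rank adds it before the all_reduce (the sum across ranks):

```python
import numpy as np

rng = np.random.default_rng(0)
tp_size = 2
x = rng.random((4, 8))   # activations, hidden size 8
w = rng.random((8, 6))   # full weight of a row-parallel linear
b = rng.random(6)        # full bias

# Row parallel: each rank holds a slice of the input dimension.
x_shards = np.split(x, tp_size, axis=1)
w_shards = np.split(w, tp_size, axis=0)

# If every rank adds the full bias before the all_reduce (a sum over ranks),
# the bias ends up counted tp_size times.
wrong = sum(xs @ ws + b for xs, ws in zip(x_shards, w_shards))

# Dividing the bias by tp_size at load time restores the correct result.
right = sum(xs @ ws + b / tp_size for xs, ws in zip(x_shards, w_shards))

reference = x @ w + b
print(np.allclose(wrong, reference))   # False
print(np.allclose(right, reference))   # True
```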

Modifications

  1. Rename nranks to tp_size in linear.py.
  2. Under the V1 loader, add a tp_row_bias attribute to bias parameters so the bias is divided by tp_size at load time, which keeps the value correct after the all reduce (see the loader sketch below).
  3. Move the all_reduce operations scattered across the individual MoE backends into the MoE layer itself (see the MoE layer sketch below).
  4. Clean up some of the skip_quant logic.
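
A minimal sketch of what the V1-loader handling in item 2 might look like; the helper name and the surrounding loader structure are illustrative, not the exact FastDeploy code:

```python
import paddle

def load_row_parallel_bias(param, loaded_bias, tp_size: int):
    """Illustrative loader hook: a bias parameter tagged with tp_row_bias is
    scaled by 1/tp_size when loaded, so the per-rank partial outputs still
    sum to the correct value after the all_reduce."""
    if getattr(param, "tp_row_bias", False) and tp_size > 1:
        loaded_bias = loaded_bias / tp_size
    param.set_value(paddle.to_tensor(loaded_bias, dtype=param.dtype))
```

And a rough sketch of item 3: instead of every MoE backend issuing its own all_reduce, the MoE layer performs a single reduction after calling into the backend. Class and method names here are illustrative, and paddle.distributed.all_reduce stands in for whatever tensor-parallel collective the codebase actually uses:

```python
import paddle.distributed as dist

class FusedMoELayer:
    """Illustrative MoE layer that owns the tensor-parallel reduction."""

    def __init__(self, quant_method, tp_size: int):
        self.quant_method = quant_method  # backend-specific MoE implementation
        self.tp_size = tp_size

    def forward(self, hidden_states, gate_logits):
        # Backends return per-rank partial outputs and no longer reduce themselves.
        out = self.quant_method.apply(hidden_states, gate_logits)
        if self.tp_size > 1:
            # One centralized all_reduce in the layer, regardless of backend.
            dist.all_reduce(out)
        return out
```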

Usage or Command

None.

Accuracy Tests

None.

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests; if none are added, please explain why in this PR.
  • Provide accuracy results.
  • If the current PR targets the release branch, make sure it has first been submitted to the develop branch and then cherry-picked to the release branch with the [Cherry-Pick] PR tag.


paddle-bot bot commented Nov 26, 2025

Thanks for your contribution!

Copilot AI left a comment

Pull request overview

This PR refactors tensor parallelism-related code by standardizing variable naming and improving bias handling for distributed training. The changes focus on code consistency and correctness without altering the core functionality.

  • Standardizes variable naming from nranks to tp_size across multiple modules for better clarity
  • Introduces special handling for row-parallel bias division in tensor parallelism
  • Removes unused variables and simplifies conditional logic in RowParallelLinear

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Summary per file:
  • fastdeploy/model_executor/utils.py: Adds tp_row_bias attribute handling to divide the bias by the tensor-parallel size during weight loading
  • fastdeploy/model_executor/models/qwen3moe.py: Removes the unused self.nranks variable
  • fastdeploy/model_executor/models/qwen3.py: Renames nranks to tp_size for consistency
  • fastdeploy/model_executor/models/qwen2.py: Removes the unused self.nranks variable
  • fastdeploy/model_executor/models/ernie4_5_moe.py: Removes the unused self.nranks variable
  • fastdeploy/model_executor/layers/mtp_linear.py: Renames self.nranks to self.tp_size
  • fastdeploy/model_executor/layers/lm_head.py: Renames self.nranks to self.tp_size
  • fastdeploy/model_executor/layers/linear.py: Renames variables, removes an unused field, adds bias attribute handling, and simplifies logic
  • fastdeploy/model_executor/layers/backends/intel_hpu/attention/hpu_attn_backend.py: Renames self.nranks to self.tp_size


codecov-commenter commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 70.73171% with 12 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@214942e).

Files with missing lines Patch % Lines
fastdeploy/model_executor/layers/linear.py 65.38% 6 Missing and 3 partials ⚠️
fastdeploy/model_executor/utils.py 33.33% 1 Missing and 1 partial ⚠️
fastdeploy/model_executor/layers/mtp_linear.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5247   +/-   ##
==========================================
  Coverage           ?   59.92%           
==========================================
  Files              ?      317           
  Lines              ?    38774           
  Branches           ?     5843           
==========================================
  Hits               ?    23234           
  Misses             ?    13703           
  Partials           ?     1837           
Flag Coverage Δ
GPU 59.92% <70.73%> (?)

Flags with carried forward coverage won't be shown.


@yuanlehome changed the title from "Refine bias and ranks" to "[Optimization] Refine row parallel bias and nranks and moe all_reduce" on Nov 26, 2025
@@ -211,7 +211,7 @@ def __init__(
self.speculate_max_draft_token_num: int = llm_config.speculative_config.num_speculative_tokens
self.keep_pd_step_flag: bool = llm_config.speculative_config.model_type == "mtp"
self.rank: int = llm_config.parallel_config.tensor_parallel_rank
Collaborator


The name self.rank is also imprecise; it should simply be self.tp_rank.

@gongshaotian gongshaotian left a comment


LGTM

@yuanlehome yuanlehome merged commit cb56d46 into PaddlePaddle:develop Nov 26, 2025
15 of 19 checks passed