[Optimization] Refine row parallel bias and nranks and moe all_reduce #5247
Conversation
Thanks for your contribution!
Pull request overview
This PR refactors tensor parallelism-related code by standardizing variable naming and improving bias handling for distributed training. The changes focus on code consistency and correctness without altering the core functionality.
- Standardizes variable naming from `nranks` to `tp_size` across multiple modules for better clarity
- Introduces special handling for row-parallel bias division in tensor parallelism (see the sketch below)
- Removes unused variables and simplifies conditional logic in `RowParallelLinear`
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Summary per file:
| File | Description |
|---|---|
| fastdeploy/model_executor/utils.py | Adds tp_row_bias attribute handling to divide bias by tensor parallel size during weight loading (see the sketch after this table) |
| fastdeploy/model_executor/models/qwen3moe.py | Removes unused self.nranks variable |
| fastdeploy/model_executor/models/qwen3.py | Renames nranks to tp_size for consistency |
| fastdeploy/model_executor/models/qwen2.py | Removes unused self.nranks variable |
| fastdeploy/model_executor/models/ernie4_5_moe.py | Removes unused self.nranks variable |
| fastdeploy/model_executor/layers/mtp_linear.py | Renames self.nranks to self.tp_size |
| fastdeploy/model_executor/layers/lm_head.py | Renames self.nranks to self.tp_size |
| fastdeploy/model_executor/layers/linear.py | Renames variables, removes unused field, adds bias attribute handling, and simplifies logic |
| fastdeploy/model_executor/layers/backends/intel_hpu/attention/hpu_attn_backend.py | Renames self.nranks to self.tp_size |
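The utils.py row above mentions a `tp_row_bias` attribute consumed during weight loading. Assuming it is simply a flag that a row-parallel layer sets on its bias parameter and that the loader then checks, the pattern might look like this sketch (the parameter class, function name, and loader shape are illustrative, not FastDeploy's actual code):

```python
import numpy as np


class FakeParam:
    """Stand-in for a framework parameter that can carry extra attributes."""

    def __init__(self, shape):
        self.data = np.zeros(shape)
        self.tp_row_bias = False  # a row-parallel layer would set this to True


def load_weight(param: FakeParam, checkpoint_value: np.ndarray, tp_size: int) -> None:
    """Copy a checkpoint tensor into a parameter, pre-scaling biases
    tagged as row-parallel so the post-all_reduce sum stays correct."""
    if getattr(param, "tp_row_bias", False):
        checkpoint_value = checkpoint_value / tp_size
    param.data[...] = checkpoint_value


# Usage: the row-parallel layer tags its bias, the loader reacts to the tag.
bias = FakeParam((3,))
bias.tp_row_bias = True
load_weight(bias, np.ones(3), tp_size=2)
assert np.allclose(bias.data, 0.5)
```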
Codecov Report

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             develop    #5247   +/-   ##
===========================================
  Coverage           ?   59.92%
===========================================
  Files              ?      317
  Lines              ?    38774
  Branches           ?     5843
===========================================
  Hits               ?    23234
  Misses             ?    13703
  Partials           ?     1837
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
```diff
@@ -211,7 +211,7 @@ def __init__(
         self.speculate_max_draft_token_num: int = llm_config.speculative_config.num_speculative_tokens
         self.keep_pd_step_flag: bool = llm_config.speculative_config.model_type == "mtp"
         self.rank: int = llm_config.parallel_config.tensor_parallel_rank
```
The name `self.rank` is also imprecise; it should be called `self.tp_rank` directly.
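For illustration, the suggested rename might look like the following; the class name and config layout here are assumed from the hunk above, not taken from the actual file:

```python
class ProposerMeta:
    """Illustrative container mirroring the hunk above (names assumed)."""

    def __init__(self, llm_config) -> None:
        # Renaming self.rank to self.tp_rank makes the attribute say what
        # it stores: this worker's tensor-parallel rank.
        self.tp_rank: int = llm_config.parallel_config.tensor_parallel_rank
```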
gongshaotian left a comment:
LGTM
Motivation
Modifications
Usage or Command
None.
Accuracy Tests
None.
Checklist
- Choose a PR title tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For PRs targeting the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.