[Feature] Support eplb for fd #4599

rainyfly · 2025-10-27T06:47:58Z

Motivation

Support EPLB.

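To keep the load balanced across experts in the MoE layers, shared experts and heavily loaded fine-grained experts are replicated multiple times across different GPUs in the cluster, so that the GPUs can serve more of the hot data (the tokens routed to shared experts).

EPLB replicates high-load experts (the Redundant Experts Strategy) and heuristically adjusts expert placement so that load stays balanced across GPUs. This addresses the compute waste caused by uneven expert load under expert parallelism. A hierarchical load-balancing policy can also be used in the prefill stage, where the expert-parallel size is smaller.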
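As a rough illustration of the redundant-experts idea described above, here is a minimal sketch, not the code in this PR. It assumes per-expert load counts are already known and that every GPU hosts the same number of physical experts. The function names `replicate_experts` and `place_replicas` are hypothetical, and the greedy heuristic is one simple choice among many.

```python
import heapq


def replicate_experts(loads, num_redundant):
    """Start with one replica per logical expert, then give each redundant
    slot to the expert whose per-replica load is currently highest."""
    replicas = [1] * len(loads)
    # heapq is a min-heap, so store negated per-replica load.
    heap = [(-load, idx) for idx, load in enumerate(loads)]
    heapq.heapify(heap)
    for _ in range(num_redundant):
        _, idx = heapq.heappop(heap)
        replicas[idx] += 1
        heapq.heappush(heap, (-loads[idx] / replicas[idx], idx))
    return replicas


def place_replicas(loads, replicas, num_gpus):
    """Greedy placement: heaviest replica first, onto the least-loaded GPU
    with a free slot, preferring GPUs that do not already host that expert."""
    total = sum(replicas)
    if total % num_gpus != 0:
        raise ValueError("physical experts must divide evenly across GPUs")
    slots = total // num_gpus
    # One (per-replica load, expert id) entry for every physical replica.
    items = sorted(
        ((loads[e] / replicas[e], e)
         for e in range(len(loads)) for _ in range(replicas[e])),
        key=lambda item: (-item[0], item[1]),
    )
    placement = [[] for _ in range(num_gpus)]
    gpu_load = [0.0] * num_gpus
    for share, expert in items:
        candidates = [g for g in range(num_gpus) if len(placement[g]) < slots]
        g = min(candidates,
                key=lambda g: (expert in placement[g], gpu_load[g], g))
        placement[g].append(expert)
        gpu_load[g] += share
    return placement, gpu_load


loads = [90, 30, 20, 20]              # made-up token counts per logical expert
replicas = replicate_experts(loads, num_redundant=2)
placement, gpu_load = place_replicas(loads, replicas, num_gpus=3)
print(replicas, placement, gpu_load)
```

In this made-up example the hot expert ends up with three replicas, one on each GPU, so no single GPU has to absorb its full load of 90.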

paddle-bot · 2025-10-27T06:48:05Z

Thanks for your contribution!

yuanlehome · 2025-11-03T03:14:38Z

fastdeploy/worker/worker_process.py

+            rank_expert_list, logical_to_physical_map, expert_count
+        )
+        # TO BE FIXED
+        self.worker.get_model().update_state_dict(state_dicts)


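Note: this should later be changed to overwrite the original weights with copy_ rather than replacing the original Tensor object.

Got it, I'll change this later.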
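The difference the reviewer points to can be sketched without any framework. The `FakeParam` class below is a hypothetical stand-in for a tensor, not Paddle's API. It only illustrates why an in-place overwrite keeps existing references valid while swapping the object does not.

```python
class FakeParam:
    """Hypothetical stand-in for a framework tensor that owns a buffer."""

    def __init__(self, values):
        self.data = list(values)

    def copy_(self, other):
        # Overwrite the existing buffer; the object's identity is unchanged.
        if len(other.data) != len(self.data):
            raise ValueError("shape mismatch")
        self.data[:] = other.data
        return self


def replace_weight(state, name, new_param):
    # Rebinds the dict entry to a new object; older references go stale.
    state[name] = new_param


def overwrite_weight(state, name, new_param):
    # Copies values into the existing object; all references see the update.
    state[name].copy_(new_param)


state = {"experts.0.weight": FakeParam([1.0, 2.0])}
held = state["experts.0.weight"]      # e.g. a layer holding the parameter

replace_weight(state, "experts.0.weight", FakeParam([9.0, 9.0]))
after_replace = list(held.data)       # the held reference still has old values

state = {"experts.0.weight": held}    # reset for the in-place variant
overwrite_weight(state, "experts.0.weight", FakeParam([9.0, 9.0]))
after_copy = list(held.data)          # the held reference sees the new values
print(after_replace, after_copy)
```

Anything that captured the original object (a layer attribute, a pre-built graph, a pointer handed to a kernel) only observes new weights in the in-place variant.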

yuanlehome

Please make the PR title and description as detailed as possible.

rainyfly added 2 commits October 27, 2025 14:46

support eplb

63fa08f

support eplb

2cde62c

kevincheng2 previously approved these changes Oct 31, 2025

Merge branch 'develop' into support_eplb_for_fd

fffbc00

kevincheng2 dismissed their stale review via fffbc00 October 31, 2025 07:31

kevincheng2 approved these changes Nov 3, 2025

yuanlehome reviewed Nov 3, 2025

yuanlehome approved these changes Nov 3, 2025

Jiang-Jia-Jun added the skip-ci: coverage label Nov 3, 2025

Jiang-Jia-Jun merged commit f83d0cf into PaddlePaddle:develop Nov 3, 2025
24 of 28 checks passed

kevincheng2 mentioned this pull request Nov 3, 2025

[Feature] Support eplb for ep #4786

Merged

5 tasks

4 participants
