
@kevincheng2 (Collaborator) commented Nov 3, 2025

Motivation

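EPLB keeps the load balanced across GPUs by replicating heavily loaded experts (the Redundant Experts Strategy) and heuristically adjusting how experts are assigned. This removes the compute waste that uneven expert load causes under expert parallelism. The hierarchical load-balancing policy can also be used in the prefill stage, where the expert-parallel size is smaller.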

Add EPLB support to FD.
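The replicate-then-place idea described in the Motivation can be sketched in a few lines of Python. This is a minimal greedy illustration under stated assumptions, not the rebalancing algorithm in fastdeploy/eplb, whose code this PR page does not show. The function names are hypothetical, per-expert load is assumed to be a token count, placement is flat rather than hierarchical, and every GPU is assumed to get the same number of expert slots.

```python
import heapq
from typing import List


def replicate_experts(load: List[float], num_redundant: int) -> List[int]:
    """Hypothetical: hand out redundant slots one at a time to the expert
    with the highest load per replica. Every expert starts with one replica."""
    counts = [1] * len(load)
    # heapq is a min-heap, so store the negated per-replica load.
    heap = [(-w, e) for e, w in enumerate(load)]
    heapq.heapify(heap)
    for _ in range(num_redundant):
        _, e = heapq.heappop(heap)
        counts[e] += 1
        heapq.heappush(heap, (-load[e] / counts[e], e))
    return counts


def place_replicas(load: List[float], counts: List[int], num_gpus: int) -> List[List[int]]:
    """Hypothetical greedy placement: heaviest replica first, onto the
    least-loaded GPU that still has a free slot. All GPUs get equal slots.
    This sketch does not prevent two replicas of one expert sharing a GPU."""
    total = sum(counts)
    if total % num_gpus:
        raise ValueError("replica count must divide evenly across GPUs")
    slots = total // num_gpus
    # Each replica carries an equal share of its expert's load.
    replicas = sorted(
        ((load[e] / counts[e], e) for e in range(len(load)) for _ in range(counts[e])),
        reverse=True,
    )
    gpu_load = [0.0] * num_gpus
    placement: List[List[int]] = [[] for _ in range(num_gpus)]
    for w, e in replicas:
        free = [g for g in range(num_gpus) if len(placement[g]) < slots]
        g = min(free, key=lambda i: gpu_load[i])
        placement[g].append(e)
        gpu_load[g] += w
    return placement
```

With `load = [90, 10, 10, 10, 10, 10]` and two redundant slots, the hot expert 0 ends up with three replicas, and after placing 8 replicas on 4 GPUs the busiest GPU carries 40 units. Without replication, whichever GPU hosts expert 0 would carry at least 90.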

Modifications

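  1. Remove the environment-variable configuration path; EPLB parameters are now passed as launch arguments.
  2. Add routes for state synchronization across DP ranks and across machines: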
    @app.post("/rearrange_experts")
    @app.post("/get_per_expert_tokens_stats")
    @app.post("/check_redundant")
    
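  3. Manage shared variables uniformly through FD's IPCSignal; IPCSignal gains a creation mode that takes an explicit shm_size.
  4. Support EPLB in mixed DP, EP, and TP deployments.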
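Item 3 can be pictured with Python's standard `multiprocessing.shared_memory` module. FD's IPCSignal interface is not shown in this PR, so every function name below is hypothetical. The sketch only illustrates the general pattern: create a named block with an explicit byte size (the role `shm_size` plays), then attach to it by name from another handle. Here the block holds per-expert int32 counters.

```python
from multiprocessing import shared_memory
from typing import List

INT32_BYTES = 4  # each counter is a C int ("i"), 4 bytes on mainstream platforms


def create_counter_block(num_experts: int) -> shared_memory.SharedMemory:
    """Hypothetical stand-in for an IPCSignal created with an explicit shm_size."""
    shm_size = num_experts * INT32_BYTES
    return shared_memory.SharedMemory(create=True, size=shm_size)


def write_counts(shm: shared_memory.SharedMemory, counts: List[int]) -> None:
    # Release the cast view before close(); live views make close() raise BufferError.
    with shm.buf.cast("i") as view:
        for i, c in enumerate(counts):
            view[i] = c


def read_counts(name: str, num_experts: int) -> List[int]:
    """Attach by name, as a second process would, and copy the counters out."""
    peer = shared_memory.SharedMemory(name=name)
    try:
        with peer.buf.cast("i") as view:
            # Some platforms round the block up to a page, so keep only our slots.
            return view.tolist()[:num_experts]
    finally:
        peer.close()


def demo() -> List[int]:
    shm = create_counter_block(4)
    try:
        write_counts(shm, [7, 0, 3, 12])
        return read_counts(shm.name, 4)
    finally:
        shm.close()
        shm.unlink()  # only the creator unlinks the segment
```

Sizing the block explicitly, rather than deriving it from an existing array, lets a creator reserve space before the data exists. That is presumably why an explicit `shm_size` option is useful, though the PR does not spell out the motivation.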

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
     ...
     --enable-eplb \
     --eplb-config '{"redundant_experts_num": 8, "redundant_expert_async_load_model_shmem_size_gb": 10}'
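The `--eplb-config` value is a JSON object. Below is a minimal parsing sketch. It assumes a hypothetical `EPLBSettings` dataclass that holds only the two keys used above; the real EPLBConfig in fastdeploy/config.py may define more fields and different defaults. It also assumes the `_gb` suffix means GiB (2**30 bytes).

```python
import json
from dataclasses import dataclass, fields

GIB = 1024 ** 3  # assumption: "gb" in the key name means GiB


@dataclass
class EPLBSettings:
    """Hypothetical mirror of the two keys shown in the launch command."""
    redundant_experts_num: int = 0
    redundant_expert_async_load_model_shmem_size_gb: int = 0

    @property
    def shmem_size_bytes(self) -> int:
        return self.redundant_expert_async_load_model_shmem_size_gb * GIB


def parse_eplb_config(raw: str) -> EPLBSettings:
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("eplb-config must be a JSON object")
    known = {f.name for f in fields(EPLBSettings)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"unknown eplb-config keys: {sorted(unknown)}")
    return EPLBSettings(**data)
```

Rejecting unknown keys is a design choice of this sketch: a misspelled key such as `redundant_expert_num` fails loudly instead of silently falling back to a default. The PR does not say whether FastDeploy does the same.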

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot (bot) commented Nov 3, 2025

Thanks for your contribution!

kevincheng2 mentioned this pull request Nov 3, 2025
kevincheng2 mentioned this pull request Nov 11, 2025
codecov-commenter commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 32.73196% with 261 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@95f3c8c).

Files with missing lines Patch % Lines
fastdeploy/entrypoints/engine_client.py 5.88% 143 Missing and 1 partial ⚠️
fastdeploy/worker/worker_process.py 19.23% 40 Missing and 2 partials ⚠️
fastdeploy/eplb/experts_manager.py 37.03% 31 Missing and 3 partials ⚠️
fastdeploy/inter_communicator/ipc_signal.py 52.00% 11 Missing and 1 partial ⚠️
fastdeploy/entrypoints/openai/api_server.py 35.29% 11 Missing ⚠️
fastdeploy/config.py 76.00% 5 Missing and 1 partial ⚠️
fastdeploy/engine/args_utils.py 78.57% 2 Missing and 1 partial ⚠️
fastdeploy/engine/common_engine.py 25.00% 2 Missing and 1 partial ⚠️
fastdeploy/eplb/async_expert_loader.py 66.66% 2 Missing and 1 partial ⚠️
fastdeploy/eplb/utils.py 92.00% 1 Missing and 1 partial ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #4782   +/-   ##
==========================================
  Coverage           ?   58.64%           
==========================================
  Files              ?      317           
  Lines              ?    38657           
  Branches           ?     5810           
==========================================
  Hits               ?    22669           
  Misses             ?    14175           
  Partials           ?     1813           
Flag Coverage Δ
GPU 58.64% <32.73%> (?)
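The Codecov figures are internally consistent. A quick check, assuming Codecov's usual ratio of hits over hits + misses + partials (partial lines count as uncovered), and reading the 261 "missing" patch lines as misses plus partials:

```latex
\begin{aligned}
\text{project} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
  = \frac{22669}{22669 + 14175 + 1813} = \frac{22669}{38657} \approx 58.64\% \\
N_{\text{patch}} &= \frac{261}{1 - 0.3273196} \approx 388,
  \qquad \frac{388 - 261}{388} = \frac{127}{388} \approx 32.732\%
\end{aligned}
```

So the patch has roughly 388 coverable lines, of which 127 are hit.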

Flags with carried forward coverage won't be shown.


yuanlehome previously approved these changes Nov 21, 2025
Contributor

Copilot AI left a comment


Pull request overview

This PR adds EPLB (Expert Parallelism Load Balancer) functionality to the FastDeploy API server, enabling load balancing across experts in MoE models through redundant expert allocation and heuristic adjustments. The implementation replaces environment variable configuration with command-line parameters and introduces new API endpoints for cross-DP/cross-machine state synchronization.

Key Changes:

  • Introduces EPLBConfig for centralized EPLB configuration management
  • Adds three new API endpoints: /rearrange_experts, /get_per_expert_tokens_stats, and /check_redundant
  • Refactors shared memory management to use IPCSignal for all EPLB-related state
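Here is a client-side sketch for the three routes, using only the standard library. Only the paths and the POST method come from this PR. The following are all hypothetical: the empty-JSON default payload, the base URL, and the assumption that each DP rank reports per-expert token counts as an equal-length list of integers.

```python
import json
import urllib.request
from typing import List, Optional

# Route paths as registered with @app.post in this PR.
EPLB_ROUTES = ("/rearrange_experts", "/get_per_expert_tokens_stats", "/check_redundant")


def build_eplb_request(base_url: str, route: str,
                       payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to one of the EPLB routes."""
    if route not in EPLB_ROUTES:
        raise ValueError(f"unknown EPLB route: {route}")
    body = json.dumps(payload or {}).encode("utf-8")  # payload shape is an assumption
    return urllib.request.Request(
        base_url.rstrip("/") + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def merge_expert_stats(per_rank: List[List[int]]) -> List[int]:
    """Sum per-expert token counts reported by each DP rank (assumed list format)."""
    if not per_rank:
        return []
    n = len(per_rank[0])
    if any(len(r) != n for r in per_rank):
        raise ValueError("ranks disagree on the number of experts")
    return [sum(col) for col in zip(*per_rank)]
```

Sending one of these requests would look like `urllib.request.urlopen(build_eplb_request("http://127.0.0.1:8000", "/get_per_expert_tokens_stats"))`. The response schema is not documented on this page, so decoding it is left open.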

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 21 comments.

File Description
fastdeploy/config.py Adds EPLBConfig class with parameter-based initialization replacing environment variables
fastdeploy/eplb/*.py Core EPLB implementation including expert manager, async loader, and rebalancing algorithms
fastdeploy/inter_communicator/* Extends IPCSignal to support custom shm_size and adds RearrangeExpertStatus enum
fastdeploy/worker/worker_process.py Refactors EPLB initialization and execution into separate methods
fastdeploy/entrypoints/* Adds EPLB API endpoints and integrates EPLB configuration into engine client
fastdeploy/engine/*.py Passes EPLB configuration to worker processes via command-line arguments
tests/eplb/* Comprehensive unit tests for EPLB functionality
requirements.txt Adds cuda-python dependency
Comments suppressed due to low confidence (1)

fastdeploy/entrypoints/engine_client.py:772

  • Except block directly handles BaseException.
            except:
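Copilot's note concerns a bare `except:`, which also catches BaseException subclasses such as `KeyboardInterrupt` and `SystemExit`. The code at engine_client.py:772 is not shown here, so the sketch below contrasts the two forms using hypothetical helpers.

```python
import logging

logger = logging.getLogger(__name__)


def call_bare(fn):
    try:
        return fn()
    except:  # noqa: E722 - catches BaseException, swallowing SystemExit/KeyboardInterrupt
        return None


def call_narrow(fn):
    try:
        return fn()
    except Exception as exc:  # ordinary errors only; SystemExit/KeyboardInterrupt propagate
        logger.warning("request failed: %s", exc)
        return None


def raise_system_exit():
    raise SystemExit(1)
```

`call_bare(raise_system_exit)` quietly returns `None`, so a shutdown request is lost. `call_narrow` lets `SystemExit` through while still handling errors such as `ZeroDivisionError`.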

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 8e4e3ff into PaddlePaddle:develop Nov 24, 2025
14 of 18 checks passed
@kevincheng2 kevincheng2 deleted the eplb branch January 19, 2026 03:45

5 participants