[Feature] support eplb in api_server #4782
Conversation
Thanks for your contribution!
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

| Coverage Diff | develop | #4782 | +/- |
|---|---|---|---|
| Coverage | ? | 58.64% | |
| Files | ? | 317 | |
| Lines | ? | 38657 | |
| Branches | ? | 5810 | |
| Hits | ? | 22669 | |
| Misses | ? | 14175 | |
| Partials | ? | 1813 | |

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds EPLB (Expert Parallelism Load Balancer) functionality to the FastDeploy API server, enabling load balancing across experts in MoE models through redundant expert allocation and heuristic adjustments. The implementation replaces environment variable configuration with command-line parameters and introduces new API endpoints for cross-DP/cross-machine state synchronization.
Key Changes:
- Introduces EPLBConfig for centralized EPLB configuration management
- Adds three new API endpoints: /rearrange_experts, /get_per_expert_tokens_stats, and /check_redundant (see the sketch after this list)
- Refactors shared memory management to use IPCSignal for all EPLB-related state
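To make the new endpoints concrete, a minimal client sketch is below; the HTTP methods, host/port, and response shapes are assumptions for illustration, not taken from this PR:

```python
# Hypothetical client calls against the EPLB endpoints added to the API server.
# Host/port, HTTP methods, and payload shapes are illustrative assumptions.
import requests

BASE_URL = "http://localhost:8000"  # assumed api_server address

# Read per-expert token statistics collected by the workers.
stats = requests.get(f"{BASE_URL}/get_per_expert_tokens_stats", timeout=10)
print(stats.json())

# Ask the server to rearrange (rebalance) experts across DP ranks / machines.
resp = requests.post(f"{BASE_URL}/rearrange_experts", timeout=10)
print(resp.status_code, resp.text)

# Inspect the current redundant-expert status.
print(requests.get(f"{BASE_URL}/check_redundant", timeout=10).json())
```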
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 21 comments.
Show a summary per file
| File | Description |
|---|---|
| fastdeploy/config.py | Adds EPLBConfig class with parameter-based initialization replacing environment variables |
| fastdeploy/eplb/*.py | Core EPLB implementation including expert manager, async loader, and rebalancing algorithms |
| fastdeploy/inter_communicator/* | Extends IPCSignal to support custom shm_size and adds RearrangeExpertStatus enum |
| fastdeploy/worker/worker_process.py | Refactors EPLB initialization and execution into separate methods |
| fastdeploy/entrypoints/* | Adds EPLB API endpoints and integrates EPLB configuration into engine client |
| fastdeploy/engine/*.py | Passes EPLB configuration to worker processes via command-line arguments (see the sketch after this table) |
| tests/eplb/* | Comprehensive unit tests for EPLB functionality |
| requirements.txt | Adds cuda-python dependency |
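As a rough illustration of the parameter-based configuration noted for fastdeploy/config.py and fastdeploy/engine/*.py above, the sketch below builds an EPLB config from command-line flags that could be forwarded to a worker process; the flag names and fields are hypothetical, not the ones defined in this PR:

```python
# Hypothetical sketch of parameter-based EPLB configuration passed via CLI flags.
# Flag names and fields are illustrative assumptions, not FastDeploy's actual EPLBConfig.
import argparse
from dataclasses import dataclass

@dataclass
class EPLBConfigSketch:
    enable_eplb: bool = False          # turn expert load balancing on/off
    num_redundant_experts: int = 0     # extra replicas for hot experts
    rebalance_interval_s: int = 60     # how often to consider rearranging experts

def parse_eplb_args(argv=None) -> EPLBConfigSketch:
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-eplb", action="store_true")
    parser.add_argument("--num-redundant-experts", type=int, default=0)
    parser.add_argument("--rebalance-interval-s", type=int, default=60)
    args = parser.parse_args(argv)
    return EPLBConfigSketch(args.enable_eplb, args.num_redundant_experts, args.rebalance_interval_s)

print(parse_eplb_args(["--enable-eplb", "--num-redundant-experts", "2"]))
```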
Comments suppressed due to low confidence (1)
fastdeploy/entrypoints/engine_client.py:772
- Except block directly handles BaseException.
except:
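For context on this comment: a bare `except:` in Python also catches `BaseException` subclasses such as `KeyboardInterrupt` and `SystemExit`. A minimal illustration of the narrower form (not the actual code at engine_client.py:772):

```python
# A bare "except:" catches BaseException (including KeyboardInterrupt and SystemExit).
# Catching Exception instead lets interpreter-level signals propagate.
import logging

logger = logging.getLogger(__name__)

def risky_call():
    # placeholder for the operation guarded at engine_client.py:772
    raise ValueError("example failure")

try:
    value = risky_call()
except Exception as exc:  # narrower than a bare "except:"
    logger.warning("operation failed: %s", exc)
    value = None
```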
8f451df
Motivation
EPLB keeps load balanced across GPUs by replicating heavily loaded experts (the Redundant Experts Strategy) and heuristically adjusting the expert assignment. This addresses the waste of compute resources in expert parallelism caused by uneven expert load. A hierarchical load-balancing strategy can also be applied in the prefill stage, where the expert-parallel size is smaller.
This PR adds EPLB support to FastDeploy (fd).
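To make the replicate-and-rebalance idea above concrete, here is an illustrative greedy sketch: give extra replicas to the hottest experts, then place replicas on GPUs so that estimated per-GPU load stays even. The heuristic and function names are assumptions for illustration; the PR's actual algorithms live in fastdeploy/eplb/*.py.

```python
# Illustrative greedy rebalancing sketch (not the algorithm shipped in fastdeploy/eplb/*.py).
# Given per-expert token counts, replicate the hottest experts into redundant slots,
# then distribute the resulting replicas across GPUs to even out estimated load.
import heapq

def rebalance(token_counts: list[int], num_gpus: int, num_redundant: int) -> list[list[int]]:
    num_experts = len(token_counts)
    # Each expert starts with one replica; extra replicas go to the hottest experts.
    replicas = [1] * num_experts
    for _ in range(num_redundant):
        hottest = max(range(num_experts), key=lambda e: token_counts[e] / replicas[e])
        replicas[hottest] += 1

    # Per-replica load: an expert's traffic is split evenly across its replicas.
    replica_loads = []
    for e in range(num_experts):
        replica_loads += [(token_counts[e] / replicas[e], e)] * replicas[e]

    # Greedy bin packing: place the heaviest remaining replica on the lightest GPU.
    replica_loads.sort(reverse=True)
    heap = [(0.0, gpu) for gpu in range(num_gpus)]  # (estimated load, gpu_id)
    heapq.heapify(heap)
    placement = [[] for _ in range(num_gpus)]
    for load, expert in replica_loads:
        gpu_load, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (gpu_load + load, gpu))
    return placement

# Example: 8 experts, 2 GPUs, 2 redundant slots.
print(rebalance([900, 100, 80, 60, 50, 40, 30, 20], num_gpus=2, num_redundant=2))
```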
Modifications
Usage or Command
Accuracy Tests
Checklist
- PR title tag (one of): [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a PR targeting the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.