[Models] Add forward_meta to moe models' forward function #5138
Wanglongzhi2001 merged 9 commits into PaddlePaddle:develop
Conversation
Thanks for your contribution!
Pull Request Overview
This PR adds the forward_meta parameter to MoE (Mixture of Experts) models' forward functions so that MoE phase information is available during forward computation. The change is needed because forward_meta.moe_phase.phase is read in the fused MoE backend to choose between the prefill and decode execution paths (a sketch of this dispatch follows the key changes below).
Key Changes:
- Updated the core MoE layer to accept and propagate the forward_meta parameter through the computation pipeline
- Modified all MoE and MLP forward methods across multiple model architectures to include the forward_meta parameter
- Updated the speculative decoding module to pass forward_meta to empty_input_forward calls
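A minimal sketch of how a backend might branch on that phase (the class name, helper methods, and the "prefill" literal below are assumptions for illustration, not the code in this PR):

```python
# Hypothetical sketch of the phase dispatch described above. The real logic
# lives in fastdeploy/model_executor/layers/moe/fused_moe_backend_base.py;
# the phase literal and helper names here are assumed.
class MoEMethodSketch:
    def apply(self, layer, x, gate_out, forward_meta):
        # forward_meta.moe_phase.phase selects the execution path.
        if forward_meta.moe_phase.phase == "prefill":  # literal value assumed
            return self._apply_prefill(layer, x, gate_out)
        return self._apply_decode(layer, x, gate_out)

    def _apply_prefill(self, layer, x, gate_out):
        raise NotImplementedError  # throughput-oriented path (placeholder)

    def _apply_decode(self, layer, x, gate_out):
        raise NotImplementedError  # latency-oriented path (placeholder)
```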
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| fastdeploy/model_executor/layers/moe/moe.py | Added forward_meta parameter to FusedMoE forward method and propagated it to quant_method.apply calls |
| fastdeploy/model_executor/layers/moe/fused_moe_backend_base.py | Added forward_meta parameter to MoEMethodBase.apply and used it to check moe_phase (see the sketch after this table) |
| fastdeploy/model_executor/models/qwen3moe.py | Updated Qwen3MoeBlock and Qwen3MLP forward signatures to include forward_meta |
| fastdeploy/model_executor/models/qwen2.py | Updated Qwen2MLP forward signature to include forward_meta |
| fastdeploy/model_executor/models/gpt_oss.py | Updated GptOssMoe forward signature to include and propagate forward_meta |
| fastdeploy/model_executor/models/glm4_moe.py | Updated Glm4MoeMLP and Glm4Moe forward signatures to include forward_meta |
| fastdeploy/model_executor/models/ernie4_5_vl/ernie4_5_vl_moe.py | Updated Ernie4_5_VLMoE and related classes to include forward_meta; updated empty_input_forward calls |
| fastdeploy/model_executor/models/ernie4_5_mtp.py | Updated empty_input_forward signature to accept forward_meta parameter |
| fastdeploy/model_executor/models/ernie4_5_moe.py | Updated Ernie4_5_MLP and Ernie4_5_MoE forward signatures to include forward_meta; updated empty_input_forward calls |
| fastdeploy/model_executor/models/deepseek_v3.py | Updated DeepSeekV3MLP and DeepSeekV3MoE forward signatures to include forward_meta |
| fastdeploy/spec_decode/mtp.py | Updated empty_input_forward call to pass forward_meta parameter |
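Taken together, these changes amount to threading forward_meta from each model's forward call down through FusedMoE into the quantization backend. A rough sketch of the layer-level shape (class name simplified; the quant_method.apply argument order is an assumption):

```python
# Simplified illustration of the propagation added in moe.py; the real FusedMoE
# class and the quant_method interface carry many more arguments.
class FusedMoESketch:
    def __init__(self, quant_method):
        self.quant_method = quant_method

    def forward(self, x, gate, forward_meta):
        # forward_meta is now passed through so the backend can read
        # moe_phase (and, in follow-up work, chunked-MoE state).
        return self.quant_method.apply(self, x, gate, forward_meta)
```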
Comments suppressed due to low confidence (1)
fastdeploy/model_executor/layers/moe/moe.py:615
- Call to method FusedMoE.forward_split_allgather with too many arguments; should be no more than 2.
out = self.forward_split_allgather(x, gate, forward_meta)
-    def forward(self, hidden_states: paddle.Tensor, vl_moe_meta: VLMoEMeta):
+    def forward(self, hidden_states: paddle.Tensor, forward_meta: ForwardMeta, vl_moe_meta: VLMoEMeta):
         if self.num_shared_experts > 0:
             shared_experts_out = self.shared_experts(hidden_states)
Missing forward_meta parameter in the shared_experts call. The shared_experts is an instance of Ernie4_5_VLMLP (which inherits from Ernie4_5_MLP) and now requires forward_meta as the second parameter. The call should be: shared_experts_out = self.shared_experts(hidden_states, forward_meta)
Suggested change:
-            shared_experts_out = self.shared_experts(hidden_states)
+            shared_experts_out = self.shared_experts(hidden_states, forward_meta)
             and token_num >= self.tp_size
         ):
-            out = self.forward_split_allgather(x, gate)
+            out = self.forward_split_allgather(x, gate, forward_meta)
The forward_split_allgather method is being called with forward_meta parameter (line 615), but its method signature at line 576 doesn't include this parameter. The signature should be updated to: def forward_split_allgather(self, x: paddle.Tensor, gate: nn.Layer, forward_meta: ForwardMeta): and the parameter should be passed to quant_method.apply at line 591.
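A hedged sketch of the fix the reviewer is describing; the tensor-parallel split/allgather details are elided and the helper names are hypothetical:

```python
class FusedMoESplitSketch:
    # Thread forward_meta through the tensor-parallel split/allgather path so it
    # reaches quant_method.apply. _split_across_tp_ranks and
    # _allgather_across_tp_ranks stand in for the real communication logic.
    def forward_split_allgather(self, x, gate, forward_meta):
        x_local = self._split_across_tp_ranks(x)
        out_local = self.quant_method.apply(self, x_local, gate, forward_meta)
        return self._allgather_across_tp_ranks(out_local)
```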
             forward_meta=forward_meta,
             hidden_states=hidden_states,
Inconsistent parameter ordering: The method signature has forward(self, hidden_states: paddle.Tensor, forward_meta: ForwardMeta) (line 98), but the call uses forward_meta=forward_meta, hidden_states=hidden_states (lines 356-357). While this works with keyword arguments, it's inconsistent with the positional order. Consider using positional order: self.mlp(hidden_states, forward_meta) for better consistency with the method signature.
Suggested change:
-            forward_meta=forward_meta,
-            hidden_states=hidden_states,
+            hidden_states,
+            forward_meta,
             forward_meta=forward_meta,
         )
         if self.num_shared_experts > 0:
             s_x = self.shared_experts(hidden_states)
Missing forward_meta parameter in the shared_experts call. The shared_experts is an instance of Ernie4_5_MLP which now requires forward_meta as the second parameter (line 98). The call should be: s_x = self.shared_experts(hidden_states, forward_meta)
Suggested change:
-            s_x = self.shared_experts(hidden_states)
+            s_x = self.shared_experts(hidden_states, forward_meta)
-    def forward(self, x):
+    def forward(self, x, forward_meta):
         shared_experts_out = self.shared_experts(x)
Missing forward_meta parameter in the shared_experts call. The shared_experts is an instance of Glm4MoeMLP which now requires forward_meta as the second parameter (line 88). The call should be: shared_experts_out = self.shared_experts(x, forward_meta)
Suggested change:
-        shared_experts_out = self.shared_experts(x)
+        shared_experts_out = self.shared_experts(x, forward_meta)
         shared_experts_out = self.shared_experts(hidden_states)
         moe_out = self.experts(hidden_states, self.gate)
Missing forward_meta parameter in both shared_experts and experts calls. Both methods now require forward_meta. The calls should be:
shared_experts_out = self.shared_experts(hidden_states, forward_meta)
moe_out = self.experts(hidden_states, self.gate, forward_meta)
Suggested change:
-        shared_experts_out = self.shared_experts(hidden_states)
-        moe_out = self.experts(hidden_states, self.gate)
+        shared_experts_out = self.shared_experts(hidden_states, forward_meta)
+        moe_out = self.experts(hidden_states, self.gate, forward_meta)
Force-pushed from beca8e9 to 0f27bd8
Codecov Report

❌ Patch coverage is

@@           Coverage Diff            @@
##           develop    #5138   +/-   ##
==========================================
  Coverage         ?   59.50%
==========================================
  Files            ?      325
  Lines            ?    40273
  Branches         ?     6097
==========================================
  Hits             ?    23965
  Misses           ?    14402
  Partials         ?     1906
==========================================

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Force-pushed from c1fc576 to 43f6457
         # chunked MoE related
         moe_num_chunk: int = 1
         max_moe_num_chunk: int = 1
OK, will fix that in the next PR.
             self.forward_meta.moe_num_chunk = (token_num + chunk_size - 1) // chunk_size
         else:
             self.fd_config.parallel_config.moe_num_chunk = 1
             self.forward_meta.moe_num_chunk = 1
Should the EP phase change also go into the meta?
Yes. That part also involves changes to the EP Runner and needs some design work; it will be handled in the next PR.
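For context, the snippet under discussion computes the chunk count as a ceiling division of the token count by the chunk size. A standalone sketch (the enabling condition is an assumption for illustration):

```python
# Standalone sketch of the chunk-count logic shown in the snippet above; the
# condition for enabling chunked MoE is assumed.
def compute_moe_num_chunk(token_num: int, chunk_size: int, chunking_enabled: bool) -> int:
    if chunking_enabled and token_num > chunk_size:
        # Ceiling division, e.g. 10_000 tokens with chunk_size=4096 -> 3 chunks.
        return (token_num + chunk_size - 1) // chunk_size
    return 1
```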
Force-pushed from 43f6457 to 591e408
…le#5138)
* [Models] Add forward_meta to moe models' forward function
* fix missing param
* fix
* fix
* fix forward_meta
* fix test and remove chunked MoE releated in config
* fix test
* fix
* fix
Motivation
In some scenarios, such as chunked MoE, we need to update the state of MoE. It's reasonable to write this state variable in forward_meta, so we need to add the forward_meta parameter to FusedMoE's forward function.
Modifications
Add forward_meta to moe models' forward function.
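The pattern applied across the model files looks roughly like the following (a sketch; real blocks such as Qwen3MoeBlock or Glm4Moe carry gating, norms, and more structure):

```python
# Hypothetical MoE block following the pattern this PR applies across models:
# every forward now receives forward_meta and threads it to the fused experts
# and to any shared-expert MLP.
class MoEBlockSketch:
    def __init__(self, experts, gate, shared_experts=None, num_shared_experts=0):
        self.experts = experts                # e.g. a FusedMoE layer
        self.gate = gate
        self.shared_experts = shared_experts  # e.g. an MLP, or None
        self.num_shared_experts = num_shared_experts

    def forward(self, hidden_states, forward_meta):
        out = self.experts(hidden_states, self.gate, forward_meta)
        if self.num_shared_experts > 0:
            out = out + self.shared_experts(hidden_states, forward_meta)
        return out
```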
Usage or Command
No change
Accuracy Tests
No change.
Checklist
- Choose a PR title tag from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If submitting to a release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.