
feat(moe): NSP-blocked expert dispatch for Qwen3MOE and GPT-OSS prefill #935

Open
vbaddi wants to merge 7 commits into quic:main from vbaddi:feat/prefill_moe

vbaddi (Contributor) commented Apr 21, 2026

Adds NSP-parallel expert-blocked dispatch to the chunked prefill MoE path for Qwen3MOE and GPT-OSS, replacing the sequential per-expert loop with a batched packed-prefix approach.
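The difference between the two dispatch styles can be illustrated with a small pure-Python sketch. This is only one possible reading of "packed-prefix" and not the code in this PR: the function names, the scalar "experts", and the use of INT32_MAX as padding are all assumptions made for illustration, and routing weights are omitted. Each expert gets a fixed-size row of token indices, with the valid indices packed at the front and padding behind, so that every expert sees the same static shape.

```python
from typing import Callable, List, Set

INVALID = 2**31 - 1  # INT32_MAX, used here as a hypothetical "no token" marker


def per_expert_loop(tokens: List[float], routed: List[Set[int]],
                    experts: List[Callable[[float], float]]) -> List[float]:
    """Baseline: visit experts one at a time and process their tokens."""
    out = [0.0] * len(tokens)
    for e, fn in enumerate(experts):
        for t, chosen in enumerate(routed):
            if e in chosen:
                out[t] += fn(tokens[t])
    return out


def packed_prefix_dispatch(tokens: List[float], routed: List[Set[int]],
                           experts: List[Callable[[float], float]]) -> List[float]:
    """Illustrative variant: build one fixed-size index row per expert first."""
    num_tokens = len(tokens)
    packed = []
    for e in range(len(experts)):
        ids = [t for t, chosen in enumerate(routed) if e in chosen]
        # Valid token ids form the prefix; the tail is padded with INVALID.
        packed.append(ids + [INVALID] * (num_tokens - len(ids)))

    out = [0.0] * num_tokens
    # Every row has the same length, so the rows could be handled as one batch.
    for e, row in enumerate(packed):
        for idx in row:
            if idx == INVALID:
                break  # everything after the packed prefix is padding
            out[idx] += experts[e](tokens[idx])
    return out


tokens = [1.0, 2.0, 3.0]
routed = [{0}, {0, 1}, {1}]  # token -> set of selected expert ids
experts = [lambda x: x * 2, lambda x: x + 10]
print(per_expert_loop(tokens, routed, experts))
print(packed_prefix_dispatch(tokens, routed, experts))
```

Both functions produce the same per-token sums; the packed form only changes how the work is laid out, which is what would let rows be processed in parallel.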

Configuration:
  export EXPERT_BLOCKING_NUM_NSP=16   # default: 1 NSP per expert (best perf at T=256)
  export EXPERT_BLOCKING_NUM_NSP=8    # 2 NSPs per expert
  export EXPERT_BLOCKING_NUM_NSP=2    # for testing

Falls back to the original per-expert loop if num_experts % EXPERT_BLOCKING_NUM_NSP != 0.

Testing:
  EXPERT_BLOCKING_NUM_NSP=2 pytest tests/transformers/models/test_moe_prefill_blocked.py -v
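The environment-variable handling and the divisibility fallback described above could be sketched roughly as follows. The helper name `resolve_expert_blocking`, its `env` parameter, and the idea of returning an experts-per-group count are hypothetical and not taken from the PR; only the variable name, the default of 16, and the modulo check come from the description.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_expert_blocking(
    num_experts: int,
    env: Mapping[str, str] = os.environ,
    default_num_nsp: int = 16,
) -> Optional[Tuple[int, int]]:
    """Return (num_nsp, experts_per_group), or None to signal the fallback path.

    Hypothetical helper: the grouping returned here is an assumption about
    how experts might be split, not the PR's actual logic.
    """
    num_nsp = int(env.get("EXPERT_BLOCKING_NUM_NSP", default_num_nsp))
    # The blocked path only applies when experts divide evenly.
    if num_nsp <= 0 or num_experts % num_nsp != 0:
        return None  # caller would use the original per-expert loop
    return num_nsp, num_experts // num_nsp


print(resolve_expert_blocking(128, env={"EXPERT_BLOCKING_NUM_NSP": "8"}))
print(resolve_expert_blocking(30, env={"EXPERT_BLOCKING_NUM_NSP": "16"}))
```

Passing `env` explicitly keeps the check easy to unit-test without mutating the process environment.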

Update (0429):
Set EXPERT_BLOCKING_PACKED_CHUNK_SIZE=256 for a chunk PL of 512:
  export EXPERT_BLOCKING_PACKED_CHUNK_SIZE=256

vbaddi added the enhancement label Apr 21, 2026
Comment thread QEfficient/transformers/models/gpt_oss/modeling_gpt_oss.py Outdated
Comment thread QEfficient/customop/ctx_scatter_gather.py
vbaddi added 6 commits April 30, 2026 07:17
Add expert-blocked NSP-parallel prefill forward to QEffPrefillChunkedQwen3MoeSparseMoeBlock
and QEffPrefillOnlyChunkedGptOssMLP. Controlled via EXPERT_BLOCKING_NUM_NSP env var.
Fix CtxScatterFunc3D/CtxGatherFunc3D eager forward for INT32_MAX sentinel handling.
Add disagg-mode tests for both models with tiny configs.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
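To make the sentinel fix in the commit message above more concrete, here is a minimal plain-Python sketch of gather and scatter that ignore an INT32_MAX index. It is not the CtxScatterFunc3D/CtxGatherFunc3D code: the function names, the 1D list layout, and the choice to fill gathered sentinel slots with zero are assumptions made for illustration.

```python
from typing import List

INT32_MAX = 2**31 - 1  # sentinel meaning "no valid position" in this sketch


def gather_with_sentinel(data: List[float], indices: List[int],
                         fill: float = 0.0) -> List[float]:
    """Read data[i] for each index; sentinel slots yield `fill` instead."""
    return [fill if i == INT32_MAX else data[i] for i in indices]


def scatter_with_sentinel(data: List[float], indices: List[int],
                          updates: List[float]) -> List[float]:
    """Write updates into a copy of data; sentinel slots are skipped."""
    out = list(data)
    for i, u in zip(indices, updates):
        if i == INT32_MAX:
            continue  # padding entry, nothing to write
        out[i] = u
    return out


print(gather_with_sentinel([10.0, 20.0, 30.0], [2, INT32_MAX, 0]))
print(scatter_with_sentinel([0.0, 0.0, 0.0], [1, INT32_MAX], [5.0, 9.0]))
```

Without the explicit check, an eager implementation that indexes directly with INT32_MAX would raise an out-of-range error rather than treating the slot as padding.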
…prefill

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
vbaddi force-pushed the feat/prefill_moe branch from a0fe82c to 6b049bc April 30, 2026 01:49
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
4 participants