feat(moe): NSP-blocked expert dispatch for Qwen3MOE and GPT-OSS prefill by vbaddi · Pull Request #935 · quic/efficient-transformers

vbaddi · 2026-04-21T21:01:31Z

Adds NSP-parallel expert-blocked dispatch to the chunked prefill MoE path for Qwen3MOE and GPT-OSS, replacing the sequential per-expert loop with a batched packed-prefix approach.
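As a rough illustration of that change, the sketch below contrasts a sequential per-expert loop with a packed-prefix dispatch in plain Python. It is a toy under stated assumptions, not the PR's code: tokens are scalars, experts are plain callables, and the names `moe_sequential`, `moe_packed` and `pack_positions` are hypothetical. The reading of "packed prefix" used here (tokens routed to an expert are compacted into the leading rows of a buffer via a running sum of the routing mask) is also an assumption.

```python
from itertools import accumulate


def moe_sequential(tokens, routes, experts):
    """Reference path: visit experts one by one, touching only routed tokens.

    routes[t] is a dict {expert_index: routing_weight} for token t.
    """
    out = [0.0] * len(tokens)
    for e, expert in enumerate(experts):
        for t, x in enumerate(tokens):
            weight = routes[t].get(e)
            if weight is not None:
                out[t] += weight * expert(x)
    return out


def pack_positions(mask):
    """Running sum of a 0/1 mask gives each selected token its packed row."""
    running = list(accumulate(mask))
    return [running[t] - 1 if mask[t] else None for t in range(len(mask))]


def moe_packed(tokens, routes, experts):
    """Packed path (assumed reading): scatter, run expert on the prefix, gather."""
    out = [0.0] * len(tokens)
    for e, expert in enumerate(experts):
        mask = [1 if e in r else 0 for r in routes]
        pos = pack_positions(mask)
        count = sum(mask)
        # Scatter routed tokens and their weights into a dense prefix.
        packed_x = [0.0] * count
        packed_w = [0.0] * count
        for t, p in enumerate(pos):
            if p is not None:
                packed_x[p] = tokens[t]
                packed_w[p] = routes[t][e]
        # The expert only ever sees the dense, packed rows.
        packed_y = [w * expert(x) for x, w in zip(packed_x, packed_w)]
        # Gather the results back to the original token order.
        for t, p in enumerate(pos):
            if p is not None:
                out[t] += packed_y[p]
    return out


tokens = [1.0, 2.0, 3.0, 4.0]
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
routes = [{0: 0.5, 1: 0.5}, {1: 1.0}, {2: 0.25, 3: 0.75}, {0: 0.5, 3: 0.5}]
print(moe_packed(tokens, routes, experts))
```

In this toy the per-expert iterations of `moe_packed` are independent of each other, which is the property that would let a group of experts be processed side by side rather than one after another.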

Configuration:
  export EXPERT_BLOCKING_NUM_NSP=16   # default: 1 NSP per expert (best perf at T=256)
  export EXPERT_BLOCKING_NUM_NSP=8    # 2 NSPs per expert
  export EXPERT_BLOCKING_NUM_NSP=2    # for testing

Falls back to the original per-expert loop if num_experts % EXPERT_BLOCKING_NUM_NSP != 0.

Test:
  EXPERT_BLOCKING_NUM_NSP=2 pytest tests/transformers/models/test_moe_prefill_blocked.py -v
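The fallback rule above can be sketched as a small helper. This is a minimal illustration, not the code in the PR: the function name `resolve_expert_blocking` and its `env` parameter are hypothetical, and the default of 16 is taken from the "default" comment in the configuration block.

```python
import os


def resolve_expert_blocking(num_experts, default_num_nsp=16, env=None):
    """Return (use_blocked_path, num_nsp) for a given expert count.

    Hypothetical helper: reads EXPERT_BLOCKING_NUM_NSP and applies the
    divisibility rule stated in the PR description.
    """
    env = os.environ if env is None else env
    num_nsp = int(env.get("EXPERT_BLOCKING_NUM_NSP", default_num_nsp))
    # Guard against zero/negative values before taking the modulus.
    if num_nsp <= 0 or num_experts % num_nsp != 0:
        return False, num_nsp  # fall back to the per-expert loop
    return True, num_nsp


print(resolve_expert_blocking(128, env={"EXPERT_BLOCKING_NUM_NSP": "8"}))
```

Passing `env` explicitly keeps the helper easy to test; in normal use it would read from the process environment.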

Update (0429):
  export EXPERT_BLOCKING_PACKED_CHUNK_SIZE=256   # for a chunk PL of 512

Add expert-blocked NSP-parallel prefill forward to QEffPrefillChunkedQwen3MoeSparseMoeBlock and QEffPrefillOnlyChunkedGptOssMLP, controlled via the EXPERT_BLOCKING_NUM_NSP env var. Fix CtxScatterFunc3D/CtxGatherFunc3D eager forward for INT32_MAX sentinel handling. Add disagg-mode tests for both models with tiny configs.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
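To make the INT32_MAX sentinel point concrete, here is a small plain-Python sketch of scatter and gather helpers that treat INT32_MAX as "no position". The commit message does not say what the fix does, so the behavior shown (skip the write on scatter, return a fill value on gather) is an assumption, and `scatter_with_sentinel` and `gather_with_sentinel` are hypothetical names rather than the repository's custom ops.

```python
INT32_MAX = 2**31 - 1  # sentinel meaning "this entry has no valid position"


def scatter_with_sentinel(buffer, positions, updates):
    """Write updates[i] to buffer[positions[i]], skipping sentinel positions."""
    out = list(buffer)
    for p, u in zip(positions, updates):
        if p == INT32_MAX:
            continue  # an unguarded eager write would index out of range here
        out[p] = u
    return out


def gather_with_sentinel(buffer, positions, fill=0.0):
    """Read buffer[positions[i]], returning `fill` for sentinel positions."""
    return [fill if p == INT32_MAX else buffer[p] for p in positions]


print(scatter_with_sentinel([0.0, 0.0, 0.0], [0, INT32_MAX, 2], [1.0, 9.0, 3.0]))
print(gather_with_sentinel([1.0, 2.0, 3.0], [2, INT32_MAX, 0]))
```

Without such a guard, an eager (non-compiled) implementation that indexes directly with the sentinel would raise an out-of-range error, which is one plausible reason the eager forward needed a fix.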

vbaddi assigned vbaddi and quic-mamta Apr 21, 2026

vbaddi added the enhancement New feature or request label Apr 21, 2026

anujgupt-github reviewed Apr 21, 2026

Comment thread QEfficient/transformers/models/gpt_oss/modeling_gpt_oss.py Outdated

ochougul reviewed Apr 23, 2026

Comment thread QEfficient/customop/ctx_scatter_gather.py

vbaddi added 6 commits April 30, 2026 07:17

nit: weights re-route fixes

a5bd93a

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit: weights re-route fixes v1

c4ef4c8

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit(0423): gpt oss moe fixed and nit

290839e

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit(0424): ctx batch idx cast to int32

2804851

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

nit(0429): qwen3_moe, gpt_oss: port cumsum scatter-gather-update MoE …

6b049bc

…prefill Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

vbaddi force-pushed the feat/prefill_moe branch from a0fe82c to 6b049bc April 30, 2026 01:49

nit(0429): update modeling files

1ae7b23

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>

vbaddi wants to merge 7 commits into quic:main from vbaddi:feat/prefill_moe

4 participants
