Update sharded_moe.py to support top2 gate with Tutel #6948
xenshinu wants to merge 4 commits into deepspeedai:master
Conversation
@microsoft-github-policy-service agree company="University of Michigan"
Not sure if this check is needed; I didn't see any specific check for a non-zero mask.
Hi @xenshinu - FYI, you'll need to update this PR from the CLA to the DCO requirements if you're able to.
Signed-off-by: Xueshen Liu <liuxs@umich.edu>
@xenshinu - looks like the formatting check is failing, could you run the pre-commit formatter?
@xenshinu - curious if you would be able to follow up on this?
Thanks for the reminder. I'll push the top-k version soon and fix the formatting issue.
Tutel has been forced off for k > 1 since #2053.
Given that routing each token to multiple experts is very common, and the gather and scatter operations without Tutel are quite inefficient, I added Tutel support to the top-2 gate and tested it on the pipeline engine. This can actually be done for any k; I'll push that later when I have time to test.
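For context, here is a minimal sketch (not the exact diff in this PR) of what a Tutel-enabled top-2 gate can return, assuming Tutel is installed and following the conventions of the existing `use_tutel` branch in `top1gating`: per-k lists of expert indices, capacity locations, and gate weights, with `tutel_moe.fast_cumsum_sub_one` replacing the plain cumsum. The function name is illustrative, and gumbel noise on the second choice plus the auxiliary load-balancing loss are omitted to keep the Tutel-specific parts visible:

```python
# Illustrative sketch only; requires Tutel to be installed.
import torch
import torch.nn.functional as F
from tutel import moe as tutel_moe  # assumed available, as in sharded_moe.py


def top2gating_tutel_sketch(logits: torch.Tensor, capacity: int):
    """Top-2 gate returning per-k lists in the form Tutel's dispatcher consumes."""
    gates = F.softmax(logits, dim=1)          # [S, E] token-to-expert scores
    num_experts = int(gates.shape[1])

    # First and second expert choice per token.
    indices1_s = torch.argmax(gates, dim=1)
    mask1 = F.one_hot(indices1_s, num_classes=num_experts)
    logits_except1 = logits.masked_fill(mask1.bool(), float("-inf"))
    indices2_s = torch.argmax(logits_except1, dim=1)
    mask2 = F.one_hot(indices2_s, num_classes=num_experts)

    # Slot of each token inside its expert's capacity buffer.
    # Tutel's fused kernel replaces torch.cumsum(mask, dim=0) - 1.
    locations1 = tutel_moe.fast_cumsum_sub_one(mask1)
    locations2 = tutel_moe.fast_cumsum_sub_one(mask2)
    # Second-choice tokens are placed after all first-choice tokens.
    locations2 += torch.sum(mask1, dim=0, keepdim=True)

    # Drop tokens that overflow the expert capacity.
    mask1 *= torch.lt(locations1, capacity)
    mask2 *= torch.lt(locations2, capacity)

    # Scalar location and renormalized gate weight per token and choice.
    locations1_s = torch.sum(locations1 * mask1, dim=1)
    locations2_s = torch.sum(locations2 * mask2, dim=1)
    gates1_s = (gates * mask1).sum(dim=1)
    gates2_s = (gates * mask2).sum(dim=1)
    denom_s = torch.clamp(gates1_s + gates2_s, min=torch.finfo(gates.dtype).eps)
    gates1_s = gates1_s / denom_s
    gates2_s = gates2_s / denom_s

    # One list entry per k, mirroring top1gating's use_tutel return shape.
    return ([indices1_s, indices2_s],
            [locations1_s, locations2_s],
            [gates1_s, gates2_s])
```

On the `MOELayer` side, the existing Tutel path already builds a `tutel_moe.fast_dispatcher` and feeds it `update(indices_, locations_, gates_, capacity=C)` followed by `encode`/`decode`, so in principle extending the gate's return values to these per-k lists is what lets the layer skip the einsum-based dispatch and combine masks for k = 2.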