Skip non-selected experts for mixtral and qwen2_moe #32429
ArthurZucker merged 7 commits into huggingface:main
Conversation
@ArthurZucker, I encountered a conflict between torch.fx and a dynamic for loop in my implementation. I haven't found a concise solution yet. Could you help, or should I just give up on this?

Thanks for the info. I am not too familiar with `torch.fx`. It seems there is a trade-off between enabling FX tracing and skipping experts.
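For context, a minimal self-contained sketch of the conflict (illustrative names, not the actual modeling code): `torch.fx` traces with `Proxy` values, so a loop whose iteration count depends on tensor data cannot be symbolically traced.

```python
import torch

class SkipMoE(torch.nn.Module):
    def forward(self, expert_mask):  # (num_experts, top_k, tokens)
        # Indices of experts that received at least one token.
        hit = (expert_mask.sum(dim=(-1, -2)) > 0).nonzero(as_tuple=True)[0]
        total = expert_mask.new_zeros(())
        for expert_idx in hit.tolist():  # iteration count depends on data
            total = total + expert_idx
        return total

mask = torch.randint(0, 2, (8, 2, 4))
SkipMoE()(mask)  # eager mode works fine
# torch.fx.symbolic_trace(SkipMoE())  # fails: a Proxy cannot be iterated,
# because the number of loop iterations is unknown at trace time
```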
ArthurZucker
left a comment
#31173 is related as well.
Yeah, if fx is failing I'm not 100% sure we want to do this, but we do need benches of some sort regarding reduced IO, because for qwen, with a lot of experts, it can be significant.
ArthurZucker
left a comment
Still down to get this merged! Would you mind just producing small benches of before / after?
same problem.

Thanks for the PR 🤗
ArthurZucker
left a comment
Actually, I would like to get rid of the `tolist()`, which introduces a device sync.
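To illustrate the concern (a sketch assuming a CUDA device; variable names are made up): `.tolist()` copies the index tensor to the host, which forces the GPU stream to synchronize before the copy can happen.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
expert_mask = torch.randint(0, 2, (8, 2, 4), device=device)

hit = (expert_mask.sum(dim=(-1, -2)) > 0).nonzero(as_tuple=True)[0]

# Device-to-host copy: the CPU blocks here until all queued GPU kernels
# finish, stalling the pipeline once per MoE layer.
hit_list = hit.tolist()

# Keeping the indices on-device avoids the sync, but then the Python loop
# needs some other mechanism (e.g. a static loop over all experts).
```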
* Skip non-selected experts for mixtral and qwen2_moe
* Fix: tensor tolist()
* WIP: tokenization test
* fix modular source of truth
* nits

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
- Replace data-dependent .nonzero() operation with static expert loop
- Resolves GuardOnDataDependentSymNode error during torch.export
- Maintains identical functionality while enabling export compatibility
- Fixes issue introduced in PR huggingface#32429
- Add tests for torch.export compatibility
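A sketch of the shape of that follow-up fix (simplified; not the exact modeling code): the loop bounds go back to a static `range`, so `torch.export` never has to guard on a value that only exists at runtime.

```python
import torch

num_experts = 8
expert_mask = torch.randint(0, 2, (num_experts, 2, 4))  # (experts, top_k, tokens)

# Before: loop bounds derived from tensor data -- torch.export raises
# GuardOnDataDependentSymNode because the number of iterations is symbolic.
# hit = (expert_mask.sum(dim=(-1, -2)) > 0).nonzero(as_tuple=True)[0]
# for expert_idx in hit.tolist(): ...

# After: a static loop over all experts; non-selected experts see an empty
# index set, so the computed output is identical.
for expert_idx in range(num_experts):
    top_k_idx, token_idx = torch.where(expert_mask[expert_idx])
```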
Fixes #32283
This PR avoids redundant computation for some MoE models (mixtral and qwen2_moe).
The current implementation loops over all the experts and inevitably loads every expert's weights, which incurs extra IO cost.
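A hedged sketch of the optimization (heavily simplified: routing weights are omitted and `experts` is a stand-in `ModuleList`, not the real Mixtral/Qwen2-MoE blocks): only experts that were routed at least one token are executed, so the skipped experts' weights are never read.

```python
import torch
from torch import nn

tokens, hidden, num_experts, top_k = 4, 16, 8, 2
experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_experts))
hidden_states = torch.randn(tokens, hidden)

router_logits = torch.randn(tokens, num_experts)
selected = torch.topk(router_logits, top_k, dim=-1).indices        # (tokens, top_k)
expert_mask = nn.functional.one_hot(selected, num_experts).permute(2, 1, 0)

out = torch.zeros_like(hidden_states)
# Only visit experts that received at least one token; with many experts
# (e.g. 60 in Qwen2-MoE) most are skipped, saving weight-load IO.
hit = (expert_mask.sum(dim=(-1, -2)) > 0).nonzero(as_tuple=True)[0]
for expert_idx in hit.tolist():
    _, top_x = torch.where(expert_mask[expert_idx])                # token indices
    out[top_x] += experts[expert_idx](hidden_states[top_x])
```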
@ArthurZucker