fix(granitemoe*): Only create block_sparse_moe if num_local_experts > 0 (#42036)
Rocketknight1 merged 7 commits into huggingface:main from the GraniteMoeAsDenseFix branch
Conversation
Looks like I need to regenerate the `modeling_granitemoehybrid.py` file.
Interestingly, when I run the tests I still see the problem. Looking a little deeper, this seems to be caused by the inheritance from `GraniteMoe`. The part that I'm still confused about is why regenerating the modeling file didn't fix it.
It appears that putting creation of `self.block_sparse_moe` behind a `config.num_local_experts > 0` check does the trick.
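For illustration, here is a minimal, hedged sketch of that guard. The attribute and config names (`block_sparse_moe`, `shared_mlp`, `has_experts`, `num_local_experts`) follow this thread; `nn.Linear` is just a stand-in for the real MoE and MLP modules, and only the conditional creation is the point:

```python
import torch.nn as nn


class DecoderLayerInitSketch(nn.Module):
    """Sketch: only build the MoE block when experts are configured."""

    def __init__(self, num_local_experts: int, hidden_size: int = 8):
        super().__init__()
        self.has_experts = num_local_experts > 0
        if self.has_experts:
            # Stand-in for the real block-sparse MoE module.
            self.block_sparse_moe = nn.Linear(hidden_size, hidden_size)
        # The dense/shared MLP path is created unconditionally.
        self.shared_mlp = nn.Linear(hidden_size, hidden_size)
```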
Force-pushed the GraniteMoeAsDenseFix branch from e1e87db to be876f8.
🤦 Ok, all that was because of a bad copy-paste somewhere that had me creating a completely incorrect block for one of the classes.
Force-pushed the GraniteMoeAsDenseFix branch from be876f8 to 1257ced.
fix(granitemoehybid): Only set self.block_sparse_moe if num_local_experts > 0
Branch: GraniteMoeAsDenseFix
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Branch: GraniteMoeAsDenseFix Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Force-pushed the GraniteMoeAsDenseFix branch from 1257ced to f0cebbb.
One more redo: Based on advice from @ArthurZucker, since there are no models using either `GraniteMoe` or `GraniteMoeShared` without experts, I've limited the conditional to the `GraniteMoeHybrid` side.
`return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)`
`class GraniteFlashAttentionKwargs(TypedDict, total=False):`
It looks like these just got moved during the regeneration. I'm not sure if they should be included (to enforce consistency with the generation script) or excluded (to minimize the size of the change).
Don't stress about the modeling file: if the modular file is correct, then `make fix-copies` should handle it all.
Hi @gabe-l-hart, thanks for the PR! You can get the code style tests to pass by running the repo's style fixes locally. Overall it looks good to me, and it does seem like the zero-experts case was accidentally deleted. Will wait for @ArthurZucker to confirm before merging!
@Rocketknight1 Thanks! I'll get it cleaned up and hopefully green today
Branch: GraniteMoeAsDenseFix Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Yep, looks good now! If you don't get core maintainer approval within a few days, ping me and I'll see what I can do
@Rocketknight1 @ArthurZucker I lost track of this one. I just resolved the conflict with the GH web UI, but I didn't look to see if there are other changes that would conflict (i.e. this being solved at a higher level in the class hierarchy). Do we still need this PR to support non-hybrid/non-MoE configurations with `GraniteMoeHybrid`?
[For maintainers] Suggested jobs to run (before merge): run-slow: granitemoehybrid
Rocketknight1 left a comment:
Yes, I'm happy to approve now! The change makes sense, and given the existence of `has_experts`, the 0-expert path is clearly intended but just broken.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
fix(granitemoe*): Only create block_sparse_moe if num_local_experts > 0 (huggingface#42036)

* fix(granitemoehybid): Only set self.block_sparse_moe if num_local_experts > 0
  Branch: GraniteMoeAsDenseFix
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix(granitemoehybrid): Regenerate modeling_granitemoehybrid.py
  Branch: GraniteMoeAsDenseFix
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* style: Fix import order
  Branch: GraniteMoeAsDenseFix
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* make fix-copies

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Branch: GraniteMoeAsDenseFix
What does this PR do?
With the introduction of `modular_granitemoe.py` in #40132, the conditional that allowed `GraniteMoe` to also encapsulate dense models as a degenerate case was accidentally removed. This is never actually needed for the `GraniteMoe` architecture directly, but `GraniteMoe` is reused in `GraniteMoeShared` and then `GraniteMoeHybrid`, which do need this ability to also encapsulate dense FFN blocks in place of the MoE block.
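To make the degenerate dense case concrete, here is a hedged, self-contained sketch of how a layer can fall back to its dense FFN when no experts are configured. The attribute names (`block_sparse_moe`, `shared_mlp`, `has_experts`, `num_local_experts`) come from this PR; the module bodies and the residual wiring are simplified illustrations rather than the exact upstream code.

```python
import torch
import torch.nn as nn


class DenseFFN(nn.Module):
    """Simple FFN used as a stand-in for both the MoE block and the shared MLP."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.up = nn.Linear(hidden_size, intermediate_size)
        self.down = nn.Linear(intermediate_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(torch.nn.functional.silu(self.up(x)))


class MoeOrDenseLayerSketch(nn.Module):
    """Sketch of a decoder-layer FFN block that degrades gracefully to dense."""

    def __init__(self, hidden_size: int, intermediate_size: int, num_local_experts: int):
        super().__init__()
        self.has_experts = num_local_experts > 0
        if self.has_experts:
            # Placeholder for the real block-sparse MoE module.
            self.block_sparse_moe = DenseFFN(hidden_size, intermediate_size)
        # Dense FFN path, present in both the MoE and the 0-expert case.
        self.shared_mlp = DenseFFN(hidden_size, intermediate_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.has_experts:
            # MoE output combined with the shared MLP output.
            return self.block_sparse_moe(hidden_states) + self.shared_mlp(hidden_states)
        # 0-expert case: self.block_sparse_moe was never created, so it is never touched.
        return self.shared_mlp(hidden_states)


# With num_local_experts=0 the layer behaves as a plain dense FFN block.
layer = MoeOrDenseLayerSketch(hidden_size=8, intermediate_size=16, num_local_experts=0)
out = layer(torch.randn(1, 4, 8))
```

The point of the guard is that a 0-expert configuration never instantiates (and therefore never initializes or loads weights for) an MoE module it would never use.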
Who can review?
@ArthurZucker I believe this came in with your PR for MoE in vLLM, so I'd love your sanity check on this fix.