Add padding-free to Granite hybrid moe models #39677
ArthurZucker merged 7 commits into huggingface:main
Conversation
Force-pushed from 90b14f3 to b5d755b.
I verified that
There are test failures:

FAILED tests/models/granitemoe/test_modeling_granitemoe.py::GraniteMoeModelTest::test_causal_lm_can_accept_kwargs - TypeError: forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/granitemoeshared/test_modeling_granitemoeshared.py::GraniteMoeSharedModelTest::test_causal_lm_can_accept_kwargs - TypeError: forward() got an unexpected keyword argument 'num_items_in_batch'
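For context, the pattern behind these failures: the `test_causal_lm_can_accept_kwargs` tests pass Trainer-style kwargs (here `num_items_in_batch`) through the model, so every `forward()` in the call chain has to accept and forward `**kwargs` rather than a fixed argument list. A minimal sketch with hypothetical toy modules (not the actual Granite classes):

```python
import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Stand-in for the decoder stack; **kwargs carries attention/loss kwargs down."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, hidden_states, **kwargs):
        # Inner layers would consume padding-free kwargs (cu_seq_lens etc.) here.
        return self.proj(hidden_states)


class ToyForCausalLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = ToyBackbone()
        self.lm_head = nn.Linear(8, 32)

    def forward(self, hidden_states, labels=None, **kwargs):
        # Accepting **kwargs avoids "forward() got an unexpected keyword argument".
        logits = self.lm_head(self.model(hidden_states, **kwargs))
        loss = None
        if labels is not None:
            # num_items_in_batch is what the Trainer passes so the loss is
            # normalized correctly under gradient accumulation.
            num_items = kwargs.get("num_items_in_batch", labels.numel())
            loss = nn.functional.cross_entropy(
                logits.view(-1, 32), labels.view(-1), reduction="sum"
            ) / num_items
        return loss, logits


model = ToyForCausalLM()
x = torch.randn(2, 4, 8)
labels = torch.randint(0, 32, (2, 4))
loss, _ = model(x, labels=labels, num_items_in_batch=8)  # no TypeError
```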
Yeah @ArthurZucker, trying to figure out those failures now; they're passing fine for me locally.
Force-pushed from b5d755b to 9cf8634.
Oh, I see, it's
One left!
yeah I forgot to run the modular util. Will ping when I think it's all good |
Force-pushed from 1ae4bc2 to 7eb0852.
class DeepseekVLHybridForConditionalGeneration(DeepseekVLHybridPreTrainedModel, GenerationMixin):
    _tied_weights_keys = ["model.language_model.embed_tokens.weight", "lm_head.weight"]
    _supports_static_cache = True
    _can_compile_fullgraph = True
These were introduced by running `python utils/modular_model_converter.py`. Not sure if I should keep them?
yep our fault, don't worry!
these broke some other tests, so I'm undoing the changes
[For maintainers] Suggested jobs to run (before merge): run-slow: bamba, granitemoe, granitemoehybrid, granitemoeshared
Don't worry about the other failing tests; they should be unrelated!
Okay, merging! Thanks for adding this support!
Thanks @ArthurZucker! Super quick.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* start fixing kwarg handling
* fmt
* updates padding free tests
* docs
* add missing kwargs modeling_granitemoe.py
* run modular util
* rm unrelated changes from modular util
What does this PR do?
Enables padding-free training for Granite hybrid MoE models, analogously to #35861.

Previously, `**kwargs` were not passed down correctly, and attempts at padding-free training were silently wrong. The padding-free correctness tests were also updated to verify that the losses for the padded and padding-free paths agree.
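To make the test update concrete, here is a hedged sketch of what such a padding-free correctness check compares, using illustrative token ids (the real tests live under tests/models/ and run the actual Granite models with flash attention): the same sequences are fed once as a padded batch and once packed into a single row with restarting position_ids, and the losses are asserted to agree.

```python
import torch

# Two sequences of different lengths.
seq_a = torch.tensor([5, 6, 7])  # length 3
seq_b = torch.tensor([8, 9])     # length 2

# Padded batch: shape (2, 3); the attention mask marks the pad position.
padded_ids = torch.tensor([[5, 6, 7], [8, 9, 0]])
attention_mask = torch.tensor([[1, 1, 1], [1, 1, 0]])

# Padding-free batch: shape (1, 5); position_ids restart at each sequence
# boundary so the flash-attention path can recover the per-sequence blocks.
packed_ids = torch.cat([seq_a, seq_b]).unsqueeze(0)
position_ids = torch.tensor([[0, 1, 2, 0, 1]])

# With a real model (e.g. GraniteMoeHybridForCausalLM loaded with
# attn_implementation="flash_attention_2"), the check would be roughly:
# loss_padded = model(padded_ids, attention_mask=attention_mask, labels=padded_labels).loss
# loss_packed = model(packed_ids, position_ids=position_ids, labels=packed_labels).loss
# torch.testing.assert_close(loss_padded, loss_packed, rtol=1e-3, atol=1e-3)
```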
CC @vasqu @ArthurZucker @fabianlim @Swanand-Kadhe
Fixes # (issue)
Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.