Add padding-free to Granite hybrid moe models #39677
ArthurZucker merged 7 commits into huggingface:main
Conversation
Force-pushed from 90b14f3 to b5d755b.
I verified that
There are test failures:

FAILED tests/models/granitemoe/test_modeling_granitemoe.py::GraniteMoeModelTest::test_causal_lm_can_accept_kwargs - TypeError: forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/granitemoeshared/test_modeling_granitemoeshared.py::GraniteMoeSharedModelTest::test_causal_lm_can_accept_kwargs - TypeError: forward() got an unexpected keyword argument 'num_items_in_batch'
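For context, the pattern behind these failures: the `test_causal_lm_can_accept_kwargs` tests pass Trainer-style kwargs (here `num_items_in_batch`) through the model, so every `forward()` in the call chain has to accept and forward `**kwargs` rather than a fixed argument list. A minimal sketch with hypothetical toy modules (not the actual Granite classes):

```python
import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Stand-in for the decoder stack; **kwargs carries attention/loss kwargs down."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, hidden_states, **kwargs):
        # Inner layers would consume padding-free kwargs (cu_seq_lens etc.) here.
        return self.proj(hidden_states)


class ToyForCausalLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = ToyBackbone()
        self.lm_head = nn.Linear(8, 32)

    def forward(self, hidden_states, labels=None, **kwargs):
        # Accepting **kwargs avoids "forward() got an unexpected keyword argument".
        logits = self.lm_head(self.model(hidden_states, **kwargs))
        loss = None
        if labels is not None:
            # num_items_in_batch is what the Trainer passes so the loss is
            # normalized correctly under gradient accumulation.
            num_items = kwargs.get("num_items_in_batch", labels.numel())
            loss = nn.functional.cross_entropy(
                logits.view(-1, 32), labels.view(-1), reduction="sum"
            ) / num_items
        return loss, logits


model = ToyForCausalLM()
x = torch.randn(2, 4, 8)
labels = torch.randint(0, 32, (2, 4))
loss, _ = model(x, labels=labels, num_items_in_batch=8)  # no TypeError
```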
Yeah @ArthurZucker, trying to figure out those failures now; they're passing fine for me locally.
Force-pushed from b5d755b to 9cf8634.
Oh, I see, it's
One left!
yeah I forgot to run the modular util. Will ping when I think it's all good |
Force-pushed from 1ae4bc2 to 7eb0852.
class DeepseekVLHybridForConditionalGeneration(DeepseekVLHybridPreTrainedModel, GenerationMixin):
    _tied_weights_keys = ["model.language_model.embed_tokens.weight", "lm_head.weight"]
    _supports_static_cache = True
    _can_compile_fullgraph = True
These were introduced by running `python utils/modular_model_converter.py`. Not sure if I should keep them?
yep our fault, don't worry!
these broke some other tests, so I'm undoing the changes
[For maintainers] Suggested jobs to run (before merge): run-slow: bamba, granitemoe, granitemoehybrid, granitemoeshared
Don't worry about the other failing tests; they should be unrelated!
Okay, merging! Thanks for adding this support!
Thanks @ArthurZucker! Super quick.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* start fixing kwarg handling
* fmt
* updates padding free tests
* docs
* add missing kwargs modeling_granitemoe.py
* run modular util
* rm unrelated changes from modular util
What does this PR do?
Enables padding-free training for Granite hybrid MoE models, analogously to #35861.

Previously, `**kwargs` were not passed down correctly, and attempts at padding-free training were silently wrong. The padding-free correctness tests were also updated to verify that the losses for the padded and padding-free paths agree.
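To make the test update concrete, here is a hedged sketch of what such a padding-free correctness check compares, using illustrative token ids (the real tests live under tests/models/ and run the actual Granite models with flash attention): the same sequences are fed once as a padded batch and once packed into a single row with restarting position_ids, and the losses are asserted to agree.

```python
import torch

# Two sequences of different lengths.
seq_a = torch.tensor([5, 6, 7])  # length 3
seq_b = torch.tensor([8, 9])     # length 2

# Padded batch: shape (2, 3); the attention mask marks the pad position.
padded_ids = torch.tensor([[5, 6, 7], [8, 9, 0]])
attention_mask = torch.tensor([[1, 1, 1], [1, 1, 0]])

# Padding-free batch: shape (1, 5); position_ids restart at each sequence
# boundary so the flash-attention path can recover the per-sequence blocks.
packed_ids = torch.cat([seq_a, seq_b]).unsqueeze(0)
position_ids = torch.tensor([[0, 1, 2, 0, 1]])

# With a real model (e.g. GraniteMoeHybridForCausalLM loaded with
# attn_implementation="flash_attention_2"), the check would be roughly:
# loss_padded = model(padded_ids, attention_mask=attention_mask, labels=padded_labels).loss
# loss_packed = model(packed_ids, position_ids=position_ids, labels=packed_labels).loss
# torch.testing.assert_close(loss_padded, loss_packed, rtol=1e-3, atol=1e-3)
```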
CC @vasqu @ArthurZucker @fabianlim @Swanand-Kadhe
Fixes # (issue)
Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.