Add padding-free to Granite hybrid moe models #39677

Merged
ArthurZucker merged 7 commits into huggingface:main from
garrett361:granite-moe-padding-free
Jul 25, 2025

Conversation

@garrett361
Contributor

@garrett361 garrett361 commented Jul 25, 2025

What does this PR do?

Enables padding-free training for Granite hybrid MoE models, analogously to #35861

Previously, **kwargs were not being passed down correctly, and attempts at padding-free training were silently wrong.

The padding-free correctness tests were also updated to verify that the losses for the padded and padding-free models agree.
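For context, a padding-free batch packs every sequence into a single row and distinguishes sequences by restarting position_ids, rather than padding each row and masking. A minimal sketch of the idea (the function name and dict keys here are illustrative, not the actual transformers API):

```python
# Illustrative sketch: build padding-free inputs from variable-length sequences.
# Real padding-free training in transformers uses flash-attention kwargs
# (e.g. cumulative sequence lengths); this only shows the packing idea.
def to_padding_free(sequences):
    """Flatten variable-length sequences into one row, with per-sequence
    position_ids so the model can tell where each sequence starts."""
    input_ids, position_ids, seq_lens = [], [], []
    for seq in sequences:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))  # restart positions per sequence
        seq_lens.append(len(seq))
    return {"input_ids": input_ids, "position_ids": position_ids, "seq_lens": seq_lens}

batch = to_padding_free([[1, 2, 3], [4, 5]])
# No pad tokens anywhere: 5 real tokens instead of a padded 2x3 (or larger) batch.
```

Because there are no pad tokens, every FLOP is spent on real tokens, which is the point of padding-free training; the cost is that the per-sequence boundary information must be threaded through the model as extra kwargs, which is exactly what this PR fixes.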

CC @vasqu @ArthurZucker @fabianlim @Swanand-Kadhe

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@garrett361 garrett361 force-pushed the granite-moe-padding-free branch from 90b14f3 to b5d755b Compare July 25, 2025 16:31
@garrett361
Contributor Author

I verified that GraniteMoeHybridForCausalLM and BambaForCausalLM both pass the updated padding-free tests
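The intuition behind the updated correctness check can be sketched in plain Python: the mean loss over the non-padded tokens of a padded batch should equal the mean loss over the same tokens packed padding-free. The per-token loss values below are made up for illustration:

```python
# Per-token losses for a padded batch of two sequences (0.0 marks pad slots)
# versus the same tokens flattened padding-free. A correct padding-free
# implementation should yield the same mean loss either way.
padded = [[0.5, 0.2, 0.3, 0.0], [0.4, 0.1, 0.0, 0.0]]
mask = [[1, 1, 1, 0], [1, 1, 0, 0]]

# Padded path: average only over real (unmasked) tokens.
padded_loss = sum(
    loss for row, m in zip(padded, mask) for loss, keep in zip(row, m) if keep
) / sum(sum(m) for m in mask)

# Padding-free path: only real tokens exist, so a plain mean suffices.
flat = [0.5, 0.2, 0.3, 0.4, 0.1]
flat_loss = sum(flat) / len(flat)

assert abs(padded_loss - flat_loss) < 1e-9  # losses agree
```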

Collaborator

@ArthurZucker ArthurZucker left a comment

LGTM! 🤗

@ArthurZucker
Collaborator

There is

FAILED tests/models/granitemoe/test_modeling_granitemoe.py::GraniteMoeModelTest::test_causal_lm_can_accept_kwargs - TypeError: forward() got an unexpected keyword argument 'num_items_in_batch'
FAILED tests/models/granitemoeshared/test_modeling_granitemoeshared.py::GraniteMoeSharedModelTest::test_causal_lm_can_accept_kwargs - TypeError: forward() got an unexpected keyword argument 'num_items_in_batch'
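The failure mode here is plain Python: a forward() signature without a **kwargs catch-all rejects any extra keyword the Trainer passes, such as num_items_in_batch. A minimal reproduction, independent of transformers (function names are illustrative):

```python
# Illustrative: why forward() needs a **kwargs catch-all.
def forward_without_kwargs(input_ids, attention_mask=None):
    return input_ids  # rejects any unknown keyword with a TypeError

def forward_with_kwargs(input_ids, attention_mask=None, **kwargs):
    return input_ids  # accepts extras and can pass them to submodules

try:
    forward_without_kwargs([1, 2], num_items_in_batch=4)
except TypeError as err:
    print(err)  # "... got an unexpected keyword argument 'num_items_in_batch'"

forward_with_kwargs([1, 2], num_items_in_batch=4)  # works
```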

@garrett361
Contributor Author

garrett361 commented Jul 25, 2025

yeah @ArthurZucker, trying to figure out those failures now; they pass fine for me locally

❯ pytest -rsfP tests/models/granitemoehybrid/test_modeling_granitemoehybrid.py -k accept_kwargs
===================================================================================================== test session starts =====================================================================================================
platform linux -- Python 3.11.13, pytest-7.4.4, pluggy-1.6.0
rootdir: /u/goon/github/garrett361/transformers
configfile: pyproject.toml
plugins: anyio-4.9.0, order-1.3.0, timeout-2.4.0, rerunfailures-15.1, xdist-3.8.0, asyncio-0.23.8, rich-0.2.0, hypothesis-6.136.4
asyncio: mode=Mode.STRICT
collected 457 items / 455 deselected / 2 selected

tests/models/granitemoehybrid/test_modeling_granitemoehybrid.py::BambaModelTest::test_causal_lm_can_accept_kwargs PASSED                                                                                                [ 50%]
tests/models/granitemoehybrid/test_modeling_granitemoehybrid.py::GraniteMoeHybridModelTest::test_causal_lm_can_accept_kwargs PASSED                                                                                     [100%]

====================================================================================================== warnings summary =======================================================================================================
.venv/lib/python3.11/site-packages/_pytest/config/__init__.py:1373
  /u/goon/github/garrett361/transformers/.venv/lib/python3.11/site-packages/_pytest/config/__init__.py:1373: PytestConfigWarning: Unknown config option: asyncio_default_fixture_loop_scope

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================================================================================================== PASSES ============================================================================================================
_______________________________________________________________________________________ BambaModelTest.test_causal_lm_can_accept_kwargs _______________________________________________________________________________________
---------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
The fast path for Bamba will be used when running the model on a GPU
Bamba requires an initialized `HybridMambaAttentionDynamicCache` to return a cache. None was provided, so no cache will be returned.
_________________________________________________________________________________ GraniteMoeHybridModelTest.test_causal_lm_can_accept_kwargs __________________________________________________________________________________
---------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
The fast path for GraniteMoeHybrid will be used when running the model on a GPU
GraniteMoeHybrid requires an initialized `HybridMambaAttentionDynamicCache` to return a cache. Because one was not provided, no cache will be returned.
======================================================================================== 2 passed, 455 deselected, 1 warning in 8.65s =========================================================================================

@garrett361 garrett361 force-pushed the granite-moe-padding-free branch from b5d755b to 9cf8634 Compare July 25, 2025 17:25
@garrett361
Contributor Author

Oh, I see, it's tests/models/granitemoe/test_modeling_granitemoe.py, not tests/models/granitemoehybrid/test_modeling_granitemoehybrid.py. Will fix.

@ArthurZucker
Collaborator

One left!

@garrett361
Contributor Author

yeah I forgot to run the modular util. Will ping when I think it's all good

@garrett361 garrett361 force-pushed the granite-moe-padding-free branch from 1ae4bc2 to 7eb0852 Compare July 25, 2025 18:00
class DeepseekVLHybridForConditionalGeneration(DeepseekVLHybridPreTrainedModel, GenerationMixin):
_tied_weights_keys = ["model.language_model.embed_tokens.weight", "lm_head.weight"]
_supports_static_cache = True
_can_compile_fullgraph = True
Contributor Author

These were induced by running `python utils/modular_model_converter.py`. Not sure if I should keep them?

Collaborator

yep our fault, don't worry!

Contributor Author

these broke some other tests, so I'm undoing the changes

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: bamba, granitemoe, granitemoehybrid, granitemoeshared

@ArthurZucker
Collaborator

Don't worry about other failing tests they should be unrelated!

@ArthurZucker
Collaborator

Okay, merging! Thanks for adding this support!

@ArthurZucker ArthurZucker merged commit 97f8c71 into huggingface:main Jul 25, 2025
14 of 19 checks passed
@garrett361
Contributor Author

Thanks @ArthurZucker! Super quick.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* start fixing kwarg handling

* fmt

* updates padding free tests

* docs

* add missing kwargs modeling_granitemoe.py

* run modular util

* rm unrelated changes from modular util