casually dropping the most capable open weights on the planet#45192

Merged
Cyrilvallez merged 5 commits into huggingface:main from RyanMullins:my-third-model
Apr 2, 2026

Conversation

@RyanMullins
Contributor


What does this PR do?

model previously unable to use tools

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents, and we are bottlenecked by our ability to review and respond to them. As a result,
we ask that new users not submit pure code agent PRs at this time.
You may use code agents for drafting or to help you diagnose issues. We also ask that autonomous "OpenClaw"-like agents
not open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker @Cyrilvallez @eustlb @zucchini-nlp @Rocketknight1

RyanMullins and others added 2 commits April 2, 2026 10:24
---------

Co-authored-by: Douglas Reid <dougreid@google.com>
Co-authored-by: Luciano Martins <lucianomartins@google.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Phil Culliton <philculliton@google.com>
Co-authored-by: Sara Smoot <sarasmoot@google.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Eustache Le Bihan <eustache.lebihan@huggingface.co>
Co-authored-by: Joshua Lochner <joshua@huggingface.co>
Co-authored-by: Matthew Carrigan <matt@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Jeff Dean <jeff@google.com>
@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, gemma3n, gemma4

Collaborator

@ArthurZucker ArthurZucker left a comment


🚀

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@Cyrilvallez Cyrilvallez left a comment


🚀

@ngxson
Member

ngxson commented Apr 2, 2026

🚀

@Cyrilvallez Cyrilvallez merged commit 91b1ab1 into huggingface:main Apr 2, 2026
23 of 28 checks passed
ArthurZucker pushed a commit that referenced this pull request Apr 2, 2026
* model previously unable to use tools



---------

Co-authored-by: Douglas Reid <dougreid@google.com>
Co-authored-by: Luciano Martins <lucianomartins@google.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Phil Culliton <philculliton@google.com>
Co-authored-by: Sara Smoot <sarasmoot@google.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Eustache Le Bihan <eustache.lebihan@huggingface.co>
Co-authored-by: Joshua Lochner <joshua@huggingface.co>
Co-authored-by: Matthew Carrigan <matt@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Jeff Dean <jeff@google.com>

* fix sign: commit was not added before

* re-add latest commit about rms norm

* preemptively skip the integration tests for now

---------

Co-authored-by: Douglas Reid <dougreid@google.com>
Co-authored-by: Luciano Martins <lucianomartins@google.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Phil Culliton <philculliton@google.com>
Co-authored-by: Sara Smoot <sarasmoot@google.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Eustache Le Bihan <eustache.lebihan@huggingface.co>
Co-authored-by: Joshua Lochner <joshua@huggingface.co>
Co-authored-by: Matthew Carrigan <matt@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Jeff Dean <jeff@google.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
danielhanchen added a commit to unslothai/unsloth that referenced this pull request Apr 2, 2026
Gemma-4 support landed in transformers main
(huggingface/transformers#45192). Update the version pin from
5.5.0.dev0 to 5.5.0 across loader, Studio version switcher,
and the MLX installer. Also fix install_gemma4_mlx.sh which
referenced a non-existent v5.5-release branch -- pin it to
the correct commit (91b1ab1) instead.
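The pinning fix described in the commit above amounts to installing from an exact commit rather than a release branch. A hedged one-line sketch (repo URL and pip/git usage assumed; the hash 91b1ab1 is the merge commit of this PR):

```shell
# Hedged sketch: install transformers pinned to the exact merge commit
# (91b1ab1) instead of the non-existent v5.5-release branch.
pip install "git+https://github.com/huggingface/transformers.git@91b1ab1"
```

Pinning to a commit hash stays reproducible even if branch names change or are deleted.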
danielhanchen added a commit to unslothai/unsloth that referenced this pull request Apr 2, 2026
@robertgshaw2-redhat

elite pr title :)

@emidoots

emidoots commented Apr 2, 2026

very exciting, congrats!

pass


@unittest.skip("Integration Tests are not up-to-date yet! TODO Cyril: update me pretty pretty please!")
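The @unittest.skip decorator quoted above is the standard unittest mechanism for temporarily disabling tests, as done here for the integration tests. A minimal self-contained sketch (the class and test names are hypothetical, not the actual Gemma4 suite):

```python
import unittest

class Gemma4IntegrationTestSketch(unittest.TestCase):
    # Hypothetical stand-in for the skipped integration tests.

    @unittest.skip("Integration tests are not up-to-date yet")
    def test_generation(self):
        self.fail("never executes: unittest records this test as skipped")

    def test_fast_check(self):
        # Non-skipped tests in the same class still run normally.
        self.assertEqual(1 + 1, 2)

# Run the class programmatically and inspect the result.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(Gemma4IntegrationTestSketch)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), result.testsRun)  # → 1 2
```

Skipped tests still count toward testsRun (startTest fires for them), which is why runners can report "23 of 28 checks passed"-style totals that include skips.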
Collaborator


@Cyrilvallez update your pretty pretty please :-)

Collaborator


and there are 2 failures (non-integration, but slow tests, if you are motivated):

FAILED tests/models/gemma4/test_modeling_gemma4.py::Gemma4TextModelTest::test_torch_compile_for_training - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 0) (unhinted: Eq(u0, 0)).  (Size-like symbols: none)

consider using data-dependent friendly APIs such as guard_or_false, guard_or_true and statically_known_true.
Caused by: (transformers/src/transformers/integrations/moe.py:231 in _grouped_mm_fallback_backward)
For more information, run with TORCH_LOGS="dynamic"
For extended logs when we create symbols, also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u0"
If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1
For more debugging help, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit?usp=sharing

For C++ stack trace, run with TORCHDYNAMO_EXTENDED_DEBUG_CPP=1

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
FAILED tests/models/gemma4/test_modeling_gemma4.py::Gemma4Vision2TextModelTest::test_sdpa_can_dispatch_on_flash - RuntimeError: No available kernel. Aborting execution.

marvinzh pushed a commit to marvinzh/transformers that referenced this pull request Apr 3, 2026
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Apr 4, 2026
shibizhao pushed a commit to shibizhao/unsloth-npu that referenced this pull request Apr 7, 2026
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
jamesbraza added a commit to EdisonScientific/SkyRL that referenced this pull request Apr 23, 2026
Gemma4Config (added in huggingface/transformers#45192)
and other composite VLM configs (e.g., Qwen2.5-VL) nest attention fields under
text_config rather than exposing them on the top-level config. The ulysses
monkey patch read model.config.num_attention_heads directly, which raises
AttributeError for these models.

PreTrainedConfig.get_text_config returns self for text-only models and the
text sub-config for VLMs, so this is a no-op for Qwen3/Llama3/DeepSeek and
unblocks Gemma4 in transformers 5.6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
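The get_text_config convention the commit relies on can be sketched with stand-in classes (TextConfig, VLMConfig, and num_heads below are hypothetical illustrations, not the real transformers classes):

```python
# Hedged sketch of the get_text_config pattern described above.

class TextConfig:
    """Stand-in for a text-only model config."""
    def __init__(self, num_attention_heads):
        self.num_attention_heads = num_attention_heads

    def get_text_config(self):
        # Text-only configs return themselves.
        return self

class VLMConfig:
    """Stand-in for a composite VLM config (e.g. Gemma4, Qwen2.5-VL)."""
    def __init__(self, text_config):
        # Attention fields are nested under text_config rather than
        # exposed on the top-level config.
        self.text_config = text_config

    def get_text_config(self):
        return self.text_config

def num_heads(config):
    # Reading config.num_attention_heads directly would raise
    # AttributeError for VLMConfig; going through get_text_config
    # works uniformly for both shapes.
    return config.get_text_config().num_attention_heads

print(num_heads(TextConfig(32)))             # → 32
print(num_heads(VLMConfig(TextConfig(16))))  # → 16
```

This is why the no-op claim holds for text-only models: their get_text_config returns self, so the patched read is equivalent to the direct attribute access it replaces.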
jamesbraza added a commit to EdisonScientific/SkyRL that referenced this pull request Apr 23, 2026
jamesbraza added a commit to EdisonScientific/SkyRL that referenced this pull request Apr 29, 2026


8 participants