
Add support for SmolLM3 models #934

Merged
dacorvo merged 14 commits into main from smollm3 on Aug 27, 2025

Conversation

Collaborator

@dacorvo commented Aug 8, 2025

What does this PR do?

This adds support for the SmolLM3 model.

This required the following packages to be updated:

  • transformers -> 4.55.*,
  • vllm -> 0.10.0,
  • pytorch -> 2.7.1
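
As a quick illustration of what this adds, here is a minimal sketch of exporting and running SmolLM3 with optimum-neuron; the static shapes (batch_size, sequence_length) are assumptions for the example, not values taken from this PR.

  from optimum.neuron import NeuronModelForCausalLM
  from transformers import AutoTokenizer

  model_id = "HuggingFaceTB/SmolLM3-3B"

  # export=True compiles the checkpoint for Neuron devices; the shapes below
  # are illustrative defaults, not values mandated by this PR.
  model = NeuronModelForCausalLM.from_pretrained(
      model_id,
      export=True,
      batch_size=1,
      sequence_length=4096,
  )

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  inputs = tokenizer("The capital of France is", return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=20)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))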

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions

This PR is stale because it has been open 15 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Aug 24, 2025
@dacorvo removed the Stale label Aug 25, 2025
@dacorvo force-pushed the smollm3 branch 2 times, most recently from b744417 to 1166877 on August 25, 2025 08:47
Collaborator Author

dacorvo commented Aug 25, 2025

@JingyaHuang there seems to be an issue with SD/SDXL models when bumping the transformers version. The issue comes from changes in transformers that modify how model outputs are processed: huggingface/transformers#39120.
There is now a tracing error when compiling the SD text encoders (CLIP models), because they now output dictionaries instead of tuples. At first I thought outputting dictionaries was itself the error, but it is actually something we enforce in the config; the CLIP models simply ignored the return_dict setting before.
Can you take a look and see what is really expected/supported?
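
For reference, a minimal sketch of the kind of fix this points at, assuming a standalone CLIP text encoder and a direct torch.jit.trace call rather than the pipeline's actual export path: force tuple outputs before tracing instead of relying on the (now True) return_dict default.

  import torch
  from transformers import CLIPTextModel, CLIPTokenizer

  # Hypothetical checkpoint, used only to illustrate the return_dict behaviour.
  model_id = "openai/clip-vit-base-patch32"

  text_encoder = CLIPTextModel.from_pretrained(model_id)
  # Strict tracing rejects dict-like outputs, so ask for plain tuples explicitly.
  text_encoder.config.return_dict = False
  text_encoder.eval()

  tokenizer = CLIPTokenizer.from_pretrained(model_id)
  inputs = tokenizer(
      "a photo of a cat", padding="max_length", max_length=77, return_tensors="pt"
  )

  traced = torch.jit.trace(text_encoder, (inputs["input_ids"],))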

dacorvo added 10 commits August 26, 2025 11:46
Starting from transformers 4.54, there is an error when compiling
Qwen2.5-0.5B with a sequence length of 128. This is a very unlikely
configuration, and not one we want to cache.
The pipeline code is therefore modified to align with default values that
are actually tested in the NeuronModelForCausalLM export tests.
@dacorvo force-pushed the smollm3 branch 3 times, most recently from 224226f to 5701183 on August 26, 2025 15:07
CLIP models used in SD pipelines do not specify return_dict in their config,
but tracing fails if return_dict is True, which is now the default
in transformers.
In the latest transformers version, it is not done automatically
anymore.
@JingyaHuang
Collaborator

From v4.53.3, transformers removed some APIs used by granite here:

https://github.com/huggingface/optimum-neuron/blob/0b7e63536a9d17a3d4530cd397e2231016e66067/optimum/neuron/models/training/granite/modeling_granite.py#L30C32-L30C42

Won't it be a problem for the granite model, @tengomucho? I came across something like:

  File "/home/ubuntu/pyvenv/aws_neuron_venv_2.24_pt_2.7/lib/python3.10/site-packages/optimum/neuron/models/training/granite/modeling_granite.py", line 30, in <module>
    from transformers.utils import LossKwargs, can_return_tuple, logging

@tengomucho
Collaborator

From v4.53.3, transformers removed some APIs used by granite here:

https://github.com/huggingface/optimum-neuron/blob/0b7e63536a9d17a3d4530cd397e2231016e66067/optimum/neuron/models/training/granite/modeling_granite.py#L30C32-L30C42

Won't it be a problem for the granite model, @tengomucho? I came across something like:

  File "/home/ubuntu/pyvenv/aws_neuron_venv_2.24_pt_2.7/lib/python3.10/site-packages/optimum/neuron/models/training/granite/modeling_granite.py", line 30, in <module>
    from transformers.utils import LossKwargs, can_return_tuple, logging

I do not see where they have been removed; I still see them in the v4.53 release:
v4.53: https://github.com/huggingface/transformers/blob/a5923d4de7df2fbd1f373dfcfe983216b79b6937/src/transformers/models/granite/modeling_granite.py#L38
On the main branch they have changed, and they now use the more generic TransformersKwargs, but that happened after the release:
main: https://github.com/huggingface/transformers/blob/ff8b88a948fc2f6aba421ca64ad165291928dcee/src/transformers/models/granite/modeling_granite.py#L37

The latest T5Block layer in transformers does not expect the
past_key_value to be returned by the T5Attention anymore.
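
Purely as an illustration of the shape of that change (made-up tensors, not transformers' actual code): the attention outputs tuple used to carry the present key/value state, and the block-level code unpacking it now expects that entry to be gone.

  import torch

  attn_output = torch.zeros(1, 8, 512)                 # (batch, seq, d_model)
  position_bias = torch.zeros(1, 8, 8, 8)              # (batch, n_heads, seq, seq)
  present_key_value = (torch.zeros(1, 8, 8, 64),) * 2  # cached key/value states

  old_outputs = (attn_output, present_key_value, position_bias)  # what the block used to unpack
  new_outputs = (attn_output, position_bias)                     # what it expects now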
Collaborator Author

dacorvo commented Aug 27, 2025

From v4.53.3, transformers removed some APIs used by granite here:
https://github.com/huggingface/optimum-neuron/blob/0b7e63536a9d17a3d4530cd397e2231016e66067/optimum/neuron/models/training/granite/modeling_granite.py#L30C32-L30C42
Won't it be a problem for the granite model, @tengomucho? I came across something like:

  File "/home/ubuntu/pyvenv/aws_neuron_venv_2.24_pt_2.7/lib/python3.10/site-packages/optimum/neuron/models/training/granite/modeling_granite.py", line 30, in <module>
    from transformers.utils import LossKwargs, can_return_tuple, logging

I do not see where they have been removed; I still see them in the v4.53 release: https://github.com/huggingface/transformers/blob/a5923d4de7df2fbd1f373dfcfe983216b79b6937/src/transformers/models/granite/modeling_granite.py#L38 On the main branch they have changed, and they now use the more generic TransformersKwargs, but that happened after the release: main: https://github.com/huggingface/transformers/blob/ff8b88a948fc2f6aba421ca64ad165291928dcee/src/transformers/models/granite/modeling_granite.py#L37

LossKwargs is gone, and is now TransformersKwargs. I fixed granite and llama in that pull-request (second commit).
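
For illustration, a version-tolerant guarded import is one way to absorb that rename; this is a sketch of the general pattern under the assumption that only the name changed, not necessarily what the commit in this PR does.

  try:
      # recent transformers: the generic kwargs type replaces LossKwargs
      from transformers.utils import TransformersKwargs as LossKwargs
  except ImportError:
      # older transformers releases still ship LossKwargs
      from transformers.utils import LossKwargs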


@dacorvo marked this pull request as ready for review on August 27, 2025 13:16
Collaborator

@JingyaHuang left a comment


lgtm, thanks for the feature and fixing the compatibility with trfrs!!

"granite": "hf-internal-testing/tiny-random-GraniteForCausalLM",
"phi3": "yujiepan/phi-4-tiny-random",
"mixtral": "dacorvo/Mixtral-tiny",
"smollm3": "HuggingFaceTB/SmolLM3-3B",
Collaborator


no tiny version?

Collaborator Author


Unfortunately, no

@dacorvo merged commit 5438a89 into main on Aug 27, 2025
7 of 8 checks passed
@dacorvo deleted the smollm3 branch on August 27, 2025 15:16