
Add support for SmolLM3 models #934

Merged
dacorvo merged 14 commits into main from smollm3 on Aug 27, 2025

Conversation

Collaborator

@dacorvo commented Aug 8, 2025

What does this PR do?

This adds support for the SmolLM3 model.

This required the following packages to be updated:

  • transformers -> 4.55.*,
  • vllm -> 0.10.0,
  • pytorch -> 2.7.1
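
As a quick illustration of what this adds, here is a minimal sketch of exporting and running SmolLM3 with optimum-neuron; the static shapes (batch_size, sequence_length) are assumptions for the example, not values taken from this PR.

  from optimum.neuron import NeuronModelForCausalLM
  from transformers import AutoTokenizer

  model_id = "HuggingFaceTB/SmolLM3-3B"

  # export=True compiles the checkpoint for Neuron devices; the shapes below
  # are illustrative defaults, not values mandated by this PR.
  model = NeuronModelForCausalLM.from_pretrained(
      model_id,
      export=True,
      batch_size=1,
      sequence_length=4096,
  )

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  inputs = tokenizer("The capital of France is", return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=20)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))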

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions

This PR is stale because it has been open 15 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Aug 24, 2025
@dacorvo removed the Stale label Aug 25, 2025
@dacorvo force-pushed the smollm3 branch 2 times, most recently from b744417 to 1166877 on August 25, 2025 08:47
Collaborator Author

dacorvo commented Aug 25, 2025

@JingyaHuang there seems to be an issue with SD/SDXL models when bumping the transformers version. The issue comes from changes in transformers that modify how model outputs are processed: huggingface/transformers#39120.
There is now a tracing error when compiling the SD text encoders (CLIP models), because they now output dictionaries instead of tuples. At first I thought outputting dictionaries was itself the error, but it is actually something we enforce in the config; the CLIP models simply ignored the return_dict setting before.
Can you take a look and see what is really expected/supported?
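
For reference, a minimal sketch of the kind of fix this points at, assuming a standalone CLIP text encoder and a direct torch.jit.trace call rather than the pipeline's actual export path: force tuple outputs before tracing instead of relying on the (now True) return_dict default.

  import torch
  from transformers import CLIPTextModel, CLIPTokenizer

  # Hypothetical checkpoint, used only to illustrate the return_dict behaviour.
  model_id = "openai/clip-vit-base-patch32"

  text_encoder = CLIPTextModel.from_pretrained(model_id)
  # Strict tracing rejects dict-like outputs, so ask for plain tuples explicitly.
  text_encoder.config.return_dict = False
  text_encoder.eval()

  tokenizer = CLIPTokenizer.from_pretrained(model_id)
  inputs = tokenizer(
      "a photo of a cat", padding="max_length", max_length=77, return_tensors="pt"
  )

  traced = torch.jit.trace(text_encoder, (inputs["input_ids"],))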

dacorvo added 10 commits August 26, 2025 11:46
Starting from transformers 4.54, there is an error when compiling
Qwen2.5-0.5B with a sequence length of 128. This is a very unlikely
configuration, and not one we want to cache.
The pipeline code is therefore modified to align with default values that
are actually tested in the NeuronModelForCausalLM export tests.
@dacorvo force-pushed the smollm3 branch 3 times, most recently from 224226f to 5701183 on August 26, 2025 15:07
CLIP models used in SD pipelines do not specify return_dict in their config,
but tracing fails if return_dict is True, which is now the default
in transformers.
In the latest transformers version, it is not done automatically
anymore.
@JingyaHuang
Collaborator

From v4.53.3, transformers removed some APIs used by granite here:

https://github.com/huggingface/optimum-neuron/blob/0b7e63536a9d17a3d4530cd397e2231016e66067/optimum/neuron/models/training/granite/modeling_granite.py#L30C32-L30C42

Won't it be a problem for the granite model, @tengomucho? I came across something like:

  File "/home/ubuntu/pyvenv/aws_neuron_venv_2.24_pt_2.7/lib/python3.10/site-packages/optimum/neuron/models/training/granite/modeling_granite.py", line 30, in <module>
    from transformers.utils import LossKwargs, can_return_tuple, logging

@tengomucho
Collaborator

From v4.53.3, transformers removed some APIs used by granite here:

https://github.com/huggingface/optimum-neuron/blob/0b7e63536a9d17a3d4530cd397e2231016e66067/optimum/neuron/models/training/granite/modeling_granite.py#L30C32-L30C42

Won't it be a problem for the granite model, @tengomucho? I came across something like:

  File "/home/ubuntu/pyvenv/aws_neuron_venv_2.24_pt_2.7/lib/python3.10/site-packages/optimum/neuron/models/training/granite/modeling_granite.py", line 30, in <module>
    from transformers.utils import LossKwargs, can_return_tuple, logging

I do not see where they have been removed; I still see them in the v4.53 release:
v4.53: https://github.com/huggingface/transformers/blob/a5923d4de7df2fbd1f373dfcfe983216b79b6937/src/transformers/models/granite/modeling_granite.py#L38
On the main branch they have changed, and they now use the more generic TransformersKwargs, but that happened after the release:
main: https://github.com/huggingface/transformers/blob/ff8b88a948fc2f6aba421ca64ad165291928dcee/src/transformers/models/granite/modeling_granite.py#L37

The latest T5Block layer in transformers does not expect the
past_key_value to be returned by the T5Attention anymore.
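
Purely as an illustration of the shape of that change (made-up tensors, not transformers' actual code): the attention outputs tuple used to carry the present key/value state, and the block-level code unpacking it now expects that entry to be gone.

  import torch

  attn_output = torch.zeros(1, 8, 512)                 # (batch, seq, d_model)
  position_bias = torch.zeros(1, 8, 8, 8)              # (batch, n_heads, seq, seq)
  present_key_value = (torch.zeros(1, 8, 8, 64),) * 2  # cached key/value states

  old_outputs = (attn_output, present_key_value, position_bias)  # what the block used to unpack
  new_outputs = (attn_output, position_bias)                     # what it expects now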
Collaborator Author

dacorvo commented Aug 27, 2025

From v4.53.3, transformers removed some APIs used by granite here:
https://github.com/huggingface/optimum-neuron/blob/0b7e63536a9d17a3d4530cd397e2231016e66067/optimum/neuron/models/training/granite/modeling_granite.py#L30C32-L30C42
Won't it be a problem for the granite model, @tengomucho? I came across something like:

  File "/home/ubuntu/pyvenv/aws_neuron_venv_2.24_pt_2.7/lib/python3.10/site-packages/optimum/neuron/models/training/granite/modeling_granite.py", line 30, in <module>
    from transformers.utils import LossKwargs, can_return_tuple, logging

I do not see where they have been removed; I still see them in the v4.53 release: https://github.com/huggingface/transformers/blob/a5923d4de7df2fbd1f373dfcfe983216b79b6937/src/transformers/models/granite/modeling_granite.py#L38 On the main branch they have changed, and they now use the more generic TransformersKwargs, but that happened after the release: main: https://github.com/huggingface/transformers/blob/ff8b88a948fc2f6aba421ca64ad165291928dcee/src/transformers/models/granite/modeling_granite.py#L37

LossKwargs is gone, and is now TransformersKwargs. I fixed granite and llama in that pull-request (second commit).
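
For illustration, a version-tolerant guarded import is one way to absorb that rename; this is a sketch of the general pattern under the assumption that only the name changed, not necessarily what the commit in this PR does.

  try:
      # recent transformers: the generic kwargs type replaces LossKwargs
      from transformers.utils import TransformersKwargs as LossKwargs
  except ImportError:
      # older transformers releases still ship LossKwargs
      from transformers.utils import LossKwargs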


@dacorvo marked this pull request as ready for review on August 27, 2025 13:16
Collaborator

@JingyaHuang left a comment


lgtm, thanks for the feature and fixing the compatibility with trfrs!!

"granite": "hf-internal-testing/tiny-random-GraniteForCausalLM",
"phi3": "yujiepan/phi-4-tiny-random",
"mixtral": "dacorvo/Mixtral-tiny",
"smollm3": "HuggingFaceTB/SmolLM3-3B",
Collaborator


no tiny version?

Collaborator Author


Unfortunately, no

@dacorvo merged commit 5438a89 into main on Aug 27, 2025
7 of 8 checks passed
@dacorvo deleted the smollm3 branch on August 27, 2025 15:16