Look for the pad_token_id in the right place for Llama4 #43539

Merged
Rocketknight1 merged 1 commit into main from fix_llama4_pad_token_id
Feb 9, 2026
Conversation

@Rocketknight1
Member

@Rocketknight1 Rocketknight1 commented Jan 27, 2026

Llama4 looks for pad_token_id on self.config in some cases, but I think it actually lives on self.config.text_config. This PR should fix things! There was a similar issue with Qwen3, but thankfully I couldn't find any other affected models.

Fixes #43525
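
For illustration, the failure mode described above can be reproduced with plain stand-in objects. This is a minimal sketch, not the real Llama4Config: the SimpleNamespace objects, the helper name read_root_pad, and the pad id value 3 are all assumptions made up for the example.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for a composite config: the root config has a nested
# text_config, and only the nested one carries pad_token_id (3 is arbitrary).
text_config = SimpleNamespace(pad_token_id=3, bos_token_id=1)
root_config = SimpleNamespace(text_config=text_config)


def read_root_pad(config):
    """Mimic a lookup that assumes pad_token_id lives on the root config."""
    return config.pad_token_id if config.pad_token_id is not None else -1


try:
    read_root_pad(root_config)
    root_lookup_failed = False
except AttributeError:
    # The root object has no pad_token_id attribute at all.
    root_lookup_failed = True

# Reading from the nested text config works.
nested_pad = root_config.text_config.pad_token_id
```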

@Rocketknight1
Member Author

cc @zucchini-nlp

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@zucchini-nlp zucchini-nlp left a comment

Thanks for a quick fix!

Can you also check whether the models reported in #43334 (comment) actually need a fix? Qwen3-VL-MoE is definitely faulty and has no pad token in its text config. I am not sure about the other models.

I want us to fix those pad issues all at once if possible.

Comment on lines -1190 to +1193
-        self.pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else -1
+        if hasattr(self.config, "pad_token_id"):
+            self.pad_token_id = self.config.pad_token_id
+        else:
+            self.pad_token_id = self.config.text_config.pad_token_id or -1
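
To make the behavior of the lookup in this diff easier to see, here is a standalone sketch of the same logic. The function name resolve_pad_token_id and the SimpleNamespace configs are hypothetical stand-ins, not part of transformers, and the token ids are arbitrary.

```python
from types import SimpleNamespace


def resolve_pad_token_id(config):
    """Standalone copy of the lookup in the diff, for experimentation only."""
    if hasattr(config, "pad_token_id"):
        return config.pad_token_id
    # Falls back to the nested text config; `or` maps any falsy value to -1.
    return config.text_config.pad_token_id or -1


# Hypothetical configs covering the different branches.
root_has_pad = SimpleNamespace(pad_token_id=5, text_config=SimpleNamespace(pad_token_id=7))
nested_only = SimpleNamespace(text_config=SimpleNamespace(pad_token_id=7))
nested_none = SimpleNamespace(text_config=SimpleNamespace(pad_token_id=None))
nested_zero = SimpleNamespace(text_config=SimpleNamespace(pad_token_id=0))

results = [
    resolve_pad_token_id(root_has_pad),
    resolve_pad_token_id(nested_only),
    resolve_pad_token_id(nested_none),
    resolve_pad_token_id(nested_zero),
]
```

Because `or` treats 0 as falsy, a text config whose pad id is 0 also resolves to -1 in this sketch. If that distinction matters, an explicit `is not None` check would preserve it.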
Member

I think in the case of Llama4, we need to fix the modeling code to obtain it via pad_token_id = self.config.text_config.pad_token_id. Usually the special tokens live inside the text config.

@zucchini-nlp
Member

I ran a tiny test and got 16 models failing, might be worth checking these ones? 👀

FAILED tests/models/esm/test_modeling_esm.py::EsmModelTest::test_attention_outputs - TypeError: ne() received an invalid combination of arguments - got (NoneType), but expected one of:
FAILED tests/models/exaone4/test_modeling_exaone4.py::Exaone4ModelTest::test_attention_outputs - AttributeError: 'Exaone4Config' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/glm46v/test_modeling_glm46v.py::Glm46VModelTest::test_attention_outputs - AttributeError: 'Glm4vTextConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_attention_outputs - AttributeError: 'Glm4vTextConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/glm_image/test_modeling_glm_image.py::GlmImageModelTest::test_attention_outputs - AttributeError: 'GlmImageTextConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/glm_ocr/test_modeling_glm_ocr.py::GlmOcrModelTest::test_attention_outputs - AttributeError: 'GlmOcrTextConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/gpt_bigcode/test_modeling_gpt_bigcode.py::GPTBigCodeModelTest::test_attention_outputs - AttributeError: 'GPTBigCodeConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/gpt_bigcode/test_modeling_gpt_bigcode.py::GPTBigCodeMHAModelTest::test_attention_outputs - AttributeError: 'GPTBigCodeConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/gpt_neox/test_modeling_gpt_neox.py::GPTNeoXModelTest::test_attention_outputs - AttributeError: 'GPTNeoXConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/gptj/test_modeling_gptj.py::GPTJModelTest::test_attention_outputs - AttributeError: 'GPTJConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/jetmoe/test_modeling_jetmoe.py::JetMoeModelTest::test_attention_outputs - AttributeError: 'JetMoeConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/mpt/test_modeling_mpt.py::MptModelTest::test_attention_outputs - AttributeError: 'MptConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/phi/test_modeling_phi.py::PhiModelTest::test_attention_outputs - AttributeError: 'PhiConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/qwen3_vl_moe/test_modeling_qwen3_vl_moe.py::Qwen3VLMoeModelTest::test_attention_outputs - AttributeError: 'Qwen3VLMoeTextConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/stablelm/test_modeling_stablelm.py::StableLmModelTest::test_attention_outputs - AttributeError: 'StableLmConfig' object has no attribute 'pad_token_id'. Did you mean: 'bos_token_id'?
FAILED tests/models/tvp/test_modeling_tvp.py::TVPModelTest::test_attention_outputs - AttributeError: 'TvpConfig' object has no attribute 'pad_token_id'

@Rocketknight1
Member Author

Hey @zucchini-nlp, sorry for the delay while I chased CI issues! I think this is actually okay, and we don't need to fix other models. This only applies to VLMs where pad_token_id may be on the root config or the text_config, but the other cases of that were fixed here and here. In the other cases in your list, I think those are raw text LMs which probably don't have this issue, since they don't have text_config, right?
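
The distinction drawn here, between composite configs that nest a text_config and plain text LM configs that do not, can be sketched as follows. The helpers text_config_of and pad_token_id_or_default are hypothetical names, the configs are SimpleNamespace stand-ins, and returning a default instead of raising is a choice made for this sketch, not what the PR does.

```python
from types import SimpleNamespace


def text_config_of(config):
    """Return the nested text_config if there is one, else the config itself."""
    return getattr(config, "text_config", config)


def pad_token_id_or_default(config, default=-1):
    """Read pad_token_id from wherever the text settings live."""
    pad = getattr(text_config_of(config), "pad_token_id", None)
    return default if pad is None else pad


# Hypothetical composite (VLM-style) config: pad id sits on the text config.
vlm_config = SimpleNamespace(text_config=SimpleNamespace(pad_token_id=7))
# Hypothetical plain text LM config: no text_config, pad id on the root.
lm_config = SimpleNamespace(pad_token_id=2)
# Hypothetical plain text LM config that defines no pad id at all.
lm_config_no_pad = SimpleNamespace(bos_token_id=1)

vlm_pad = pad_token_id_or_default(vlm_config)
lm_pad = pad_token_id_or_default(lm_config)
missing_pad = pad_token_id_or_default(lm_config_no_pad)
```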

Member

@zucchini-nlp zucchini-nlp left a comment

Oh yeah, we also have llama4! Sure, let's merge it, I wonder why it was fixed before with the other batch of models haha

@Rocketknight1 Rocketknight1 force-pushed the fix_llama4_pad_token_id branch from bba5c45 to 7eb5dda Compare February 9, 2026 12:52
@github-actions
Contributor

github-actions Bot commented Feb 9, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: llama4

@Rocketknight1 Rocketknight1 merged commit 9e4a8c4 into main Feb 9, 2026
26 checks passed
@Rocketknight1 Rocketknight1 deleted the fix_llama4_pad_token_id branch February 9, 2026 17:24
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026
Development

Successfully merging this pull request may close these issues.

AttributeError: 'Llama4Config' object has no attribute 'pad_token_id'

3 participants