Fixes configuration default values #43592

Merged
zucchini-nlp merged 14 commits into huggingface:main from zucchini-nlp:pad-token-ids
Jan 30, 2026

zucchini-nlp (Member) commented Jan 29, 2026

What does this PR do?

Fixes #43525
Fixes #43572

Adds the missing pad_token_id and tie_word_embeddings attributes to config classes, with their default values.
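
For readers unfamiliar with the failure mode, here is a minimal sketch of why a missing attribute breaks downstream code and how declaring a default avoids it. `BrokenConfig`, `FixedConfig`, and the helper functions are hypothetical stand-ins, not the actual transformers classes, and the defaults shown (`None`, `True`) are illustrative assumptions rather than the values this PR sets for any particular model.

```python
class BrokenConfig:
    """Hypothetical config that never declares pad_token_id."""

    def __init__(self, vocab_size=32000, **kwargs):
        self.vocab_size = vocab_size


class FixedConfig:
    """Hypothetical config that declares both attributes with explicit defaults."""

    def __init__(self, vocab_size=32000, pad_token_id=None, tie_word_embeddings=True, **kwargs):
        self.vocab_size = vocab_size
        # Assumed defaults for illustration only; real models may differ.
        self.pad_token_id = pad_token_id
        self.tie_word_embeddings = tie_word_embeddings


def read_pad_token_id(config):
    """Mimic downstream code that reads config.pad_token_id directly."""
    return config.pad_token_id


def fails_with_attribute_error(config):
    """Return True if reading pad_token_id raises AttributeError."""
    try:
        read_pad_token_id(config)
    except AttributeError:
        return True
    return False
```

With these stand-ins, `fails_with_attribute_error(BrokenConfig())` is true, while `FixedConfig()` can be read safely and still accepts an override such as `FixedConfig(pad_token_id=0)`.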

HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp added the "for patch" label (Tag issues / labels that should be included in the next patch) Jan 29, 2026

zucchini-nlp (Member, Author) commented

Adding fix to tie_word_embeddings, don't merge!

zucchini-nlp changed the title from "Fixes 'pad_token_id' issues" to "Fixes configuration default values" Jan 29, 2026

zucchini-nlp (Member, Author) commented

run-slow: cohere2, deformable_detr, emu3, exaone4, falcon_mamba, fast_vlm, flava, florence2, glm46v, got_ocr2, gpt_bigcode, gpt_neox, gptj, internvl, jetmoe, mamba

github-actions (Contributor) commented

This comment contains run-slow, running the specified jobs:

models: ["models/cohere2", "models/deformable_detr", "models/emu3", "models/exaone4", "models/falcon_mamba", "models/fast_vlm", "models/flava", "models/florence2", "models/glm46v", "models/got_ocr2", "models/gpt_bigcode", "models/gpt_neox", "models/gptj", "models/internvl", "models/jetmoe", "models/mamba"]
quantizations: []

Rocketknight1 (Member) left a comment

LGTM with one comment!

Comment thread src/transformers/models/cohere2/configuration_cohere2.py
github-actions (Contributor) commented

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

zucchini-nlp (Member, Author) commented

@bot /style

zucchini-nlp (Member, Author) commented

deformable_detr is flaky now, apparently because of the random order of tests 😢 It is not reproducible locally when I run a single test case.

zucchini-nlp (Member, Author) commented

@bot /repo

github-actions bot (Contributor) commented Jan 29, 2026

The repo consistency bot fixed some files and pushed the changes.

zucchini-nlp enabled auto-merge (squash) January 30, 2026 09:55

github-actions (Contributor) commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: cohere2, cohere2_vision, deepseek_vl, deepseek_vl_hybrid, deformable_detr, emu3, exaone4, falcon_mamba, fast_vlm, flava, florence2, glm46v, got_ocr2, gpt_bigcode, gpt_neox, gptj

zucchini-nlp disabled auto-merge January 30, 2026 10:35

zucchini-nlp (Member, Author) commented

run-slow: llava_onevision, llava_next_video

github-actions (Contributor) commented

This comment contains run-slow, running the specified jobs:

models: ["models/llava_next_video", "models/llava_onevision"]
quantizations: []

github-actions (Contributor) commented

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

zucchini-nlp enabled auto-merge (squash) January 30, 2026 11:02
zucchini-nlp merged commit 562106f into huggingface:main Jan 30, 2026
26 checks passed

Labels

for patch: Tag issues / labels that should be included in the next patch

Development

Successfully merging this pull request may close these issues.

missing pad_token_idx in StableLmConfig after 5.0 update
AttributeError: 'Llama4Config' object has no attribute 'pad_token_id'

3 participants