
Fix save_pretrained writing incorrect tie_word_embeddings=True config after PEFT merge #45156

Closed
Cursx wants to merge 9 commits into huggingface:main from Cursx:fix-issue

Conversation

Cursx commented Apr 1, 2026

What does this PR do?

After PEFT's merge_and_unload(), embed_tokens and lm_head become independent tensors with different values, but config.tie_word_embeddings remains True. The load side already detects this with torch.equal in tie_weights() and skips re-tying, but save_pretrained() writes the stale config as-is. In other words, tie_word_embeddings=True is already semantically wrong in memory; flipping it to False matches the actual state of the weights.

Issue #45127: PEFT's merge_and_unload() creates an inconsistent state (the weights have been untied, but the configuration has not been updated).
Impact: downstream tools (GGUF converters, quantization scripts) trust this config directly, leading to silent weight corruption.
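
The idea of the fix, as a simplified sketch (the helper name and its placement here are illustrative, not the literal diff; the real save_pretrained() also handles sharding, safetensors, etc.):

```python
import torch


def _embeddings_have_diverged(model) -> bool:
    """True if the input and output embeddings no longer share the same values."""
    input_emb = model.get_input_embeddings()
    output_emb = model.get_output_embeddings()
    if input_emb is None or output_emb is None:
        return False
    in_w, out_w = input_emb.weight, output_emb.weight
    if in_w.data_ptr() == out_w.data_ptr():
        return False  # same storage, still genuinely tied
    return in_w.shape != out_w.shape or not torch.equal(in_w, out_w)


# Before serializing config.json, downgrade the flag if the weights have
# actually diverged, so downstream tools (GGUF converters, quantizers)
# see a truthful config.
def save_with_consistent_config(model, save_directory: str):
    if model.config.tie_word_embeddings and _embeddings_have_diverged(model):
        model.config.tie_word_embeddings = False
    model.save_pretrained(save_directory)
```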

Tests

I wrote a simple script to reproduce the problem and tested it locally.
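
Roughly, the reproduction looks like this (the adapter settings below are illustrative; the linked issue uses an extended vocabulary on Qwen2.5-0.5B, but any small model with tie_word_embeddings=True shows the same behavior):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Qwen2.5-0.5B is the model from the linked issue; any tied-embeddings model works.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
assert model.config.tie_word_embeddings

# Training full copies of embed_tokens/lm_head (modules_to_save) is the usual
# extended-vocabulary setup; it is what breaks the tie once merged.
peft_model = get_peft_model(
    model,
    LoraConfig(
        target_modules=["q_proj", "v_proj"],
        modules_to_save=["embed_tokens", "lm_head"],
    ),
)
merged = peft_model.merge_and_unload()

# Stand-in for training: in a real run, fine-tuning drives the two copies apart.
in_w = merged.get_input_embeddings().weight
out_w = merged.get_output_embeddings().weight
with torch.no_grad():
    out_w.add_(0.01)

print(torch.equal(in_w, out_w))           # False: the weights have diverged
print(merged.config.tie_word_embeddings)  # True: the config still claims they are tied

merged.save_pretrained("merged-model")    # before this PR, the stale flag lands in config.json
```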

I ran make fix-repo and performed the following related tests:
• test_save_pretrained_auto_fixes_diverged_tied_embeddings (new test; sketched below)
• test_tied_weights_are_not_tied_if_both_present_but_different (load side)
• test_tied_weights_are_tied_if_both_present_and_similar
• test_tied_weights_are_always_tied_from_config

The CI tests also pass.
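
For reference, the new test asserts roughly the following (simplified; the actual test uses the library's tiny test models and helpers):

```python
import json
import os
import tempfile

import torch
from transformers import LlamaConfig, LlamaForCausalLM


def test_save_pretrained_auto_fixes_diverged_tied_embeddings():
    # Tiny causal LM with tied embeddings.
    config = LlamaConfig(
        vocab_size=64,
        hidden_size=16,
        intermediate_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        num_key_value_heads=2,
        tie_word_embeddings=True,
    )
    model = LlamaForCausalLM(config)

    # Manually untie and diverge lm_head, emulating the state left by a PEFT merge.
    with torch.no_grad():
        model.lm_head.weight = torch.nn.Parameter(model.lm_head.weight.clone() + 0.01)

    with tempfile.TemporaryDirectory() as tmp:
        model.save_pretrained(tmp)
        with open(os.path.join(tmp, "config.json")) as f:
            saved = json.load(f)
        # With this PR, the saved config no longer claims the weights are tied.
        assert saved.get("tie_word_embeddings", False) is False
```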

Fixes #45127

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by
code agents. We are currently bottlenecked by our ability to review and respond to them. As a result,
we ask that new users do not submit pure code agent PRs at this time.
You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents
not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this
repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result,
this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

  • I confirm that this is not a pure code agent PR.

I used multiple AI models (Gemini, Claude, Kimi) to cross-validate edge cases and boundary conditions; different models behave differently around tied embeddings, which made CI failures harder to predict than expected. AI helped me locate these edge cases, and I verified they weren't hallucinations.

I have read CONTRIBUTING.md, and tried my best to follow the instructions therein.

Before submitting

Who can review?

@Cyrilvallez @BenjaminBossan

Cursx changed the title from "Fix issue" to "Fix save_pretrained writing incorrect tie_word_embeddings=True config after PEFT merge" on Apr 1, 2026

github-actions Bot commented Apr 1, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45156&sha=440822



Development

Successfully merging this pull request may close these issues.

[Bug] Model collapse after merging LoRA with extended vocabulary on models with tie_word_embeddings=True (e.g., Qwen2.5 0.5B)
