
Conversation

@lizexu123 lizexu123 commented Jul 7, 2025

Fixed the garbled-output issue in Qwen3-8B. Previously tie_word_embeddings was always treated as True; this change also handles the case where tie_word_embeddings is False, and adds column-split (column-parallel) loading for the lm_head.weight weight. After the fix, Qwen3-8B accuracy is correct.
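The two cases the fix distinguishes can be sketched as follows. This is an illustrative standalone example, not FastDeploy's actual code: the function name `load_lm_head` and its signature are hypothetical, and numpy stands in for Paddle tensors. When the embeddings are tied, the lm_head reuses the embedding table; otherwise `lm_head.weight` is popped from the checkpoint, and in both cases the weight is column-split over the vocab dimension across tensor-parallel ranks.

```python
# Illustrative sketch only (hypothetical names, numpy in place of Paddle).
import numpy as np

def load_lm_head(state_dict, embed_weight, tie_word_embeddings,
                 tp_rank=0, tp_size=1):
    """Return this rank's lm_head weight shard.

    - tied: reuse the embedding weight (transposed to [hidden, vocab]).
    - untied: pop lm_head.weight from the checkpoint.
    Either way, column-split the [hidden, vocab] matrix over ranks.
    """
    if tie_word_embeddings:
        # Weight sharing: lm_head reuses the embedding table.
        full = embed_weight.T  # [hidden_size, vocab_size]
    else:
        full = state_dict.pop("lm_head.weight")
    # Column-parallel split over the vocab dimension.
    shard = np.split(full, tp_size, axis=-1)[tp_rank]
    return shard

# Toy check: vocab=8, hidden=4, two tensor-parallel ranks.
vocab, hidden, tp = 8, 4, 2
sd = {"lm_head.weight":
      np.arange(hidden * vocab, dtype=np.float32).reshape(hidden, vocab)}
emb = np.ones((vocab, hidden), dtype=np.float32)

untied = load_lm_head(dict(sd), emb, tie_word_embeddings=False,
                      tp_rank=0, tp_size=tp)
tied = load_lm_head(dict(sd), emb, tie_word_embeddings=True,
                    tp_rank=1, tp_size=tp)
print(untied.shape, tied.shape)  # → (4, 4) (4, 4)
```

Treating tie_word_embeddings as always True is exactly what produced garbage for checkpoints that ship a separate, untied lm_head.weight: the model then decoded with the wrong output projection.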

@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@ming1753 ming1753 closed this Jul 7, 2025
@ming1753 ming1753 reopened this Jul 7, 2025
@Jiang-Jia-Jun Jiang-Jia-Jun requested a review from yuanlehome July 7, 2025 13:42
@lizexu123 lizexu123 changed the title fix qwen3.py [Bug fix] fix qwen3.py Jul 7, 2025
@lizexu123 lizexu123 changed the title [Bug fix] fix qwen3.py [Bug fix] Fixed the garbled text issues in Qwen3-8B Jul 7, 2025
else:
    if self.tie_word_embeddings:
        self.out_linear.weight.set_value(
            get_tensor(state_dict.pop(self.linear_weight_key)).astype(
Collaborator

This needs to be restored.

Comment on lines 247 to 248
prefix="lm_head",
)
Collaborator

format code

yuanlehome
yuanlehome previously approved these changes Jul 7, 2025
Jiang-Jia-Jun
Jiang-Jia-Jun previously approved these changes Jul 8, 2025
    embedding_dim=fd_config.model_config.hidden_size,
    num_embeddings=fd_config.model_config.vocab_size,
    prefix=(f"{fd_config.model_config.prefix_name}.embed_tokens"),
    prefix="lm_head",
Collaborator

Is the prefix here the same whether tie_word_embeddings is True or False? It looks like line 267 does not use this prefix when it is True?

Collaborator Author

The previous commit did distinguish the two cases. yuanlehome suggested following the approach in ernie4_5_moe.py: below set_state_dict, when the weights are shared, set the lm_head out_linear weight to the embedding layer's weight. I tested Qwen3-0.6B locally and the accuracy is correct.
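The pattern described above can be sketched like this. It is modeled loosely on the ernie4_5_moe.py approach the reviewer mentions; the class and attribute names (`TinyQwen3`, `lm_head_weight`) are illustrative, not the real FastDeploy API, and numpy stands in for Paddle tensors.

```python
# Hedged sketch of weight sharing inside set_state_dict (hypothetical names).
import numpy as np

class TinyQwen3:
    """Toy model holding an embedding table and an lm_head weight."""

    def __init__(self, vocab=8, hidden=4, tie_word_embeddings=True):
        self.tie_word_embeddings = tie_word_embeddings
        self.embed_weight = np.zeros((vocab, hidden), dtype=np.float32)
        self.lm_head_weight = None  # stand-in for lm_head.out_linear.weight

    def set_state_dict(self, state_dict):
        # Load the embedding table first.
        self.embed_weight = state_dict.pop("embed_tokens.weight")
        if self.tie_word_embeddings:
            # Weight sharing: reuse the embedding table instead of
            # expecting a separate lm_head.weight in the checkpoint.
            self.lm_head_weight = self.embed_weight.T
        else:
            self.lm_head_weight = state_dict.pop("lm_head.weight")

# Tied checkpoint: no lm_head.weight key at all.
tied_sd = {"embed_tokens.weight": np.ones((8, 4), dtype=np.float32)}
m = TinyQwen3(tie_word_embeddings=True)
m.set_state_dict(tied_sd)
print(m.lm_head_weight.shape)  # → (4, 8)
```

The design point is that tied checkpoints often omit `lm_head.weight` entirely, so the sharing must happen after the embedding weights are set rather than during lm_head loading.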

@lizexu123 lizexu123 dismissed stale reviews from Jiang-Jia-Jun and yuanlehome via acc9889 July 8, 2025 02:38
@lizexu123 lizexu123 closed this Jul 8, 2025
@lizexu123 lizexu123 reopened this Jul 8, 2025
@yuanlehome yuanlehome merged commit 525be24 into PaddlePaddle:develop Jul 8, 2025
2 of 3 checks passed