[Bug fix] Fixed the garbled text issues in Qwen3-8B #2737
Conversation
```python
else:
    if self.tie_word_embeddings:
        self.out_linear.weight.set_value(
            get_tensor(state_dict.pop(self.linear_weight_key)).astype(
```
This part needs to be restored.
```python
prefix="lm_head",
)
```
format code
```python
embedding_dim=fd_config.model_config.hidden_size,
num_embeddings=fd_config.model_config.vocab_size,
prefix=(f"{fd_config.model_config.prefix_name}.embed_tokens"),
prefix="lm_head",
```
Is the prefix here the same whether tie_word_embeddings is True or False? It looks like when it is True, this prefix isn't used at line 267?
The previous commit did distinguish the two cases. 远乐 suggests referring to the approach in ernie4_5_moe.py: below, in set_state_dict, if the weights are tied, set the lm_head.out_linear weight to the embedding layer's weight. I tested Qwen3-0.6B locally and the accuracy is correct.
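For reference, a minimal sketch of the weight-tying pattern described above, built around the set_value/get_tensor calls visible in the diff fragments. The key names (embed_tokens_weight_key, lm_head_weight_key) and the attribute layout are assumptions for illustration, not the actual ernie4_5_moe.py code:

```python
# Sketch only: key names and attribute layout are assumptions based on the
# diff fragments above, not the real ernie4_5_moe.py implementation.
def set_state_dict(self, state_dict):
    # Always load the embedding weights from the checkpoint.
    self.embed_tokens.weight.set_value(
        get_tensor(state_dict.pop(self.embed_tokens_weight_key))
    )
    if self.tie_word_embeddings:
        # Tied weights: reuse the embedding matrix for lm_head instead of
        # loading a separate lm_head.weight (the checkpoint may omit it).
        # Depending on the linear layer's weight layout, a transpose of the
        # embedding matrix may be needed here.
        self.lm_head.out_linear.weight.set_value(self.embed_tokens.weight)
    else:
        # Untied weights: load lm_head.weight from the checkpoint directly.
        self.lm_head.out_linear.weight.set_value(
            get_tensor(state_dict.pop(self.lm_head_weight_key))
        )
```

The point of the branch is that a tied checkpoint typically stores only the embedding matrix, so a loader that assumes lm_head.weight always exists (or that tying is always on, as this PR fixes) ends up with wrong logits and garbled output.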
Fixed the garbled-output issue in Qwen3-8B. Previously the code assumed tie_word_embeddings was always True; this change also handles the tie_word_embeddings=False case and adds column-split loading for the lm_head.weight. After the fix, Qwen3-8B accuracy is correct.
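To illustrate what column-split loading of lm_head.weight means under tensor parallelism, here is a hypothetical standalone helper; the [hidden_size, vocab_size] layout and the tp_rank/tp_degree parameters are assumptions for the sketch, not FastDeploy's actual loader API:

```python
import numpy as np

def column_split_lm_head(full_weight: np.ndarray,
                         tp_rank: int, tp_degree: int) -> np.ndarray:
    """Return this rank's column shard of lm_head.weight.

    Assumes full_weight has shape [hidden_size, vocab_size] and that the
    vocab dimension divides evenly by the tensor-parallel degree.
    """
    hidden_size, vocab_size = full_weight.shape
    assert vocab_size % tp_degree == 0, "vocab_size must divide tp_degree"
    shard = vocab_size // tp_degree
    # Each rank keeps one contiguous slice of the vocab (output) dimension.
    return full_weight[:, tp_rank * shard : (tp_rank + 1) * shard]

# Example: an 8-way split of a [4096, 151936] Qwen3-style weight gives each
# rank a [4096, 18992] slice; each rank computes logits for its own vocab
# slice, which are gathered before sampling.
```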