[BugFix][Models] Add tie_word_embeddings for lmhead
#4916
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation
动态图运行
Qwen2-0.5B报错:定位到是
self.lm_head.weight权重没有初始化Qwen2-0.5B / 1.5B等小模型为了减少总参数量,提升小模型核心部分的参数占比,开启tie_word_embeddings=True将输入 Embedding 参数与输出 lm_head 的参数共享#3502 没有考虑
Qwen2-0.5B / 1.5B这类小模型,没有像 #3255 一样给lm_head做初始化,故本PR添加Modifications
给 Qwen2 组网部分添加
lm_head的权重初始化Usage or Command
Accuracy Tests
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.