Add tests for qwen + allow uninitialized weights in Llama model #8552
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8552
Note: Links to docs will display an error until the docs builds have been completed. No Failures, 1 Pending as of commit 8b51959 with merge base 2859e47. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 5ef3fef to c58edc5
Force-pushed from c58edc5 to 955b991
try:
    # assign=True: load params/buffers by assignment instead of performing an in-place copy.
    # Because we are using device="meta", tensors do not have memory associated with them
    # and an in-place copy is a no-op. Use assign=True in load_state_dict for this scenario.
    missing, unexpected = self.model_.load_state_dict(
        checkpoint,
        strict=False,
        assign=True,
    )  # self.model_ = Transformer(gptconf)
except RuntimeError as e:
This is so it doesn't error out when loading examples/models/llama/params/demo_rand_params.pth, or any checkpoint that is incompatible with the model architecture. We also have no way to not specify a checkpoint; I looked into removing the default value for that arg, but it's going to take some work since it's relied on internally in a lot of places.
| "llama3_1", | ||
| "llama3_2", | ||
| "static_llama", | ||
| "qwen2_5", |
Sorry, I accidentally deleted the original comment about ordering, but I was going to say that I think it's clearer to list all the Llama models first.
Force-pushed from 47bed4c to 9b5516b
larryliu0820
left a comment
I'm OK with the changes, but I'm really concerned about llama/model.py and I think we should clean it up. I'll create a separate issue.
Summary
Add a basic CI test for the Qwen model. Requires some changes to llama/model.py to allow uninitialized (random) weights.
Test plan