Conversation
Due to the copy, I expect the issue to be the same over here.
ArthurZucker left a comment
I'll work on the from_pretrained issue that prevents the base_model_prefix from working here!
The only issue with this is that it will completely break BC for people who trained a LlamaForQuestionAnswering, as their checkpoints will have transformer instead of model, no?
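For context, a rough sketch of the mismatch being discussed; the exact key names are assumptions based on the attribute rename:

```python
# A checkpoint trained with the old LlamaForQuestionAnswering stores its
# backbone under the `transformer` attribute, so its state dict keys look like:
#   transformer.embed_tokens.weight
#   transformer.layers.0.self_attn.q_proj.weight
#   ...
# After renaming the attribute to `model` (matching the other Llama head
# classes), the class instead expects:
#   model.embed_tokens.weight
#   model.layers.0.self_attn.q_proj.weight
#   ...
# so every backbone weight in an old checkpoint would fail to match.
```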
I have seen issues with from_pretrained, a fix for them is long overdue 👀
Yes, it would be completely breaking :/ I've tried using some loading hooks, which didn't work, but I don't have the time to properly look into this atm. Should I close this then?
@ArthurZucker Any updates on this? Should I close this? 👀
Hey, sorry, no. It's on my TODO list, hope to get to it next week!
No worries!
BTW, I think we can close this; #36033 should make it easier to update without breaking.
Nice, will close then 👍 |
What does this PR do?
Fixes issues with Llama for QnA when using from_pretrained to load any base model. Currently, when we load any Llama model, we get a 100% mismatch (i.e. everything is randomly initialized). The workaround is to manually save the (base) model and then load it from disk (ref. #30381), as sketched below. This is very unintuitive and goes against the usage of auto classes and from_pretrained, which should be quick and easy.
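A minimal sketch of the failure and the workaround; the checkpoint name and local path are placeholders:

```python
from transformers import AutoModel, AutoModelForQuestionAnswering

checkpoint = "meta-llama/Llama-2-7b-hf"  # any Llama checkpoint

# Before this fix: loading a base checkpoint directly into the QA class
# reports a (near) 100% weight mismatch, so the model is randomly initialized.
qa_model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

# Workaround from #30381: save the base model locally first, then load the
# QA class from that local copy so the weight prefixes line up.
base_model = AutoModel.from_pretrained(checkpoint)
base_model.save_pretrained("./llama-base")
qa_model = AutoModelForQuestionAnswering.from_pretrained("./llama-base")
```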
I tried to use loading hooks in #34038 to keep it BC, but to no avail (I could fool the error messages tho :D). So it might break older versions, which I think is warranted to ensure an easy from_pretrained call instead. #29258 tried the same at first too, and then opted for a version which doesn't work as expected.
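For illustration, a minimal sketch of the kind of BC shim such hooks would have to implement; the helper name and checkpoint path are hypothetical, and only the transformer-to-model key renaming reflects the actual change:

```python
import torch
from transformers import LlamaForQuestionAnswering

# Hypothetical helper: rename legacy `transformer.*` keys to the new
# `model.*` prefix so an old QA checkpoint loads into the renamed module.
def remap_legacy_qa_keys(state_dict):
    old, new = "transformer.", "model."
    return {
        (new + key[len(old):]) if key.startswith(old) else key: value
        for key, value in state_dict.items()
    }

# `old_qa_checkpoint` is a placeholder for a checkpoint trained with the
# pre-rename LlamaForQuestionAnswering.
state_dict = torch.load("old_qa_checkpoint/pytorch_model.bin", map_location="cpu")
model = LlamaForQuestionAnswering.from_pretrained(
    "old_qa_checkpoint", state_dict=remap_legacy_qa_keys(state_dict)
)
```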
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker