Fix Llama QnA from_pretrained #34061

Closed
vasqu wants to merge 2 commits into huggingface:main from vasqu:llama-for-qna

Conversation

@vasqu
Contributor

@vasqu vasqu commented Oct 10, 2024

What does this PR do?

Fixes issues with Llama for QnA when using from_pretrained to load any base model. Currently, when we load any Llama base checkpoint into the QnA class, we get a 100% weight mismatch (i.e. everything is randomly initialized). The workaround is to manually save the (base) model and then load it from disk (ref. #30381):

from transformers import AutoModel, AutoModelForQuestionAnswering
model = AutoModel.from_pretrained("meta-llama/llama-2-7b-hf")
model.save_pretrained("base_model")
model = AutoModelForQuestionAnswering.from_pretrained("base_model")

This is very unintuitive and goes against the intended usage of auto classes and from_pretrained, which should be quick and easy.
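For reference, a minimal sketch of the usage this PR is meant to enable (not part of the diff; the checkpoint name is just the one from the example above):

from transformers import AutoModelForQuestionAnswering

# After the fix, this should load the base-model weights directly from the checkpoint
# and only randomly initialize the small QA head on top, without the save/reload detour.
model = AutoModelForQuestionAnswering.from_pretrained("meta-llama/llama-2-7b-hf")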

I tried to use loading hooks in #34038 to keep it BC, but to no avail (I could fool the error messages, though :D). So it might break older versions, which I think is warranted to ensure an easy from_pretrained call instead. #29258 tried the same at first too, and then opted for a version which doesn't work as expected.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker

Contributor Author

Due to the copy mechanism, I expect the issue to be the same over here.

@vasqu vasqu mentioned this pull request Oct 10, 2024
Collaborator

@ArthurZucker ArthurZucker left a comment

I'll work on the from_pretrained issue that prevents the base_model_prefix from working here!

The only issue with this is that it will completely break BC for people who trained a LlamaForQuestionAnswering, as their checkpoints will have transformer instead of model as the prefix, no?
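Roughly, the concern is a prefix mismatch between saved and expected checkpoint keys; a hypothetical sketch (key names are illustrative, not taken from the actual modeling files):

# Keys in a checkpoint saved from the old LlamaForQuestionAnswering (illustrative)
old_keys = [
    "transformer.embed_tokens.weight",
    "transformer.layers.0.self_attn.q_proj.weight",
    "qa_outputs.weight",
]

# Keys the class would expect after renaming the base-model attribute to `model`
new_keys = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "qa_outputs.weight",
]

# Without a remapping/loading hook, the mismatched base-model keys would not be
# matched on load and would be re-initialized, which is the BC break discussed here.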

@ArthurZucker
Collaborator

I have seen issues with from_pretrained; a fix for them is long overdue 👀

@vasqu
Contributor Author

vasqu commented Oct 10, 2024

Yes, it would be completely breaking :/ I've tried using some loading hooks, which didn't work, but I don't have the time to properly look into this atm.

Should I close this then?

@vasqu
Contributor Author

vasqu commented Oct 24, 2024

@ArthurZucker Any updates on this? Should I close this? 👀

@ArthurZucker
Collaborator

Hey, sorry, no. It's on my TODO list, hope to get to it next week!

@vasqu
Contributor Author

vasqu commented Oct 30, 2024

No worries!

@ArthurZucker
Collaborator

BTW I think we can close this; #36033 should make it easier to update without breaking.

@vasqu
Contributor Author

vasqu commented Feb 11, 2025

Nice, will close then 👍

@vasqu vasqu closed this Feb 11, 2025
@vasqu vasqu deleted the llama-for-qna branch February 11, 2025 17:11