Fix base model loading #36581

Closed
SunMarc wants to merge 5 commits into main from base-model-loading

@SunMarc
Member

@SunMarc SunMarc commented Mar 6, 2025

What does this PR do?

Fixes #36579
This PR fixes loading a checkpoint saved from a non-base model into a base model.
The solution is a bit hacky for now, but it will be cleaned up after @Cyrilvallez's refactor, which deals with all of this prefix logic.

The following fails:

from transformers import AutoModel
model = AutoModel.from_pretrained("meta-llama/Llama-3.1-8B-Instruct", device_map="auto")
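
For context, here is a minimal sketch of the kind of prefix mismatch involved. It is not the change made in this PR. It assumes the checkpoint was saved from a model that wraps the base model under a `model.` prefix and adds a head (`lm_head` here). The helper `strip_base_model_prefix` and the toy key names are hypothetical, and plain strings stand in for tensors.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def strip_base_model_prefix(state_dict: Dict[str, T], prefix: str = "model") -> Dict[str, T]:
    """Hypothetical helper: map non-base checkpoint keys onto base-model keys."""
    dotted = prefix + "."
    # If no key carries the prefix, assume the checkpoint already matches the base model.
    if not any(key.startswith(dotted) for key in state_dict):
        return dict(state_dict)
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(dotted):
            # "model.layers.0..." in the checkpoint becomes "layers.0..." in the base model.
            remapped[key[len(dotted):]] = value
        # Keys outside the prefix (for example the head) have no counterpart
        # in the base model, so this sketch drops them.
    return remapped


# Toy checkpoint: strings stand in for real tensors.
checkpoint = {
    "model.embed_tokens.weight": "E",
    "model.layers.0.self_attn.q_proj.weight": "Q",
    "lm_head.weight": "H",
}
base_keys = strip_base_model_prefix(checkpoint)
```

If a loader matches keys without accounting for the prefix, none of the checkpoint keys line up with the base model's parameter names, which is one way a base model can end up with uninitialized weights.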

@github-actions github-actions Bot marked this pull request as draft March 6, 2025 11:19
@github-actions
Contributor

github-actions Bot commented Mar 6, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@SunMarc SunMarc marked this pull request as ready for review March 6, 2025 11:22
@SunMarc SunMarc requested a review from ArthurZucker March 6, 2025 11:22
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jiqing-feng
Contributor

Verified on my issue.

Collaborator

@ArthurZucker ArthurZucker left a comment

Okay trusting you on this one, can you add a test with that legacy model? Will help!

Collaborator

@ArthurZucker ArthurZucker left a comment

Prob needed for a quick fix

@Cyrilvallez
Member

Just checked, and it was fixed by #36033!

@SunMarc
Member Author

SunMarc commented Mar 12, 2025

thanks @Cyrilvallez !

@SunMarc SunMarc closed this Mar 12, 2025

Development

Successfully merging this pull request may close these issues.

AutoModel failed with empty tensor error