Bump transformers to 4.56.1 #136

Merged
jackzhxng merged 6 commits into huggingface:main from jackzhxng:jz/bump-transformers-2
Sep 16, 2025

@jackzhxng
Collaborator

@jackzhxng jackzhxng commented Sep 4, 2025

Some changes needed for the bump:

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jackzhxng
Collaborator Author

Looks like we need to fix some stuff 😂

)
num_heads = getattr(config, "num_key_value_heads", config.num_attention_heads)
head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
self.early_initialization(
Collaborator

what is this doing?

Collaborator Author

(In the PR description)

Collaborator

The linked PR description says that cache_position will also be removed. Is it talking about the cache maintaining cache_position? Something to keep track of.

Collaborator Author

Oh it's the part on early_initialization - basically you need to call this to initialize some of the attributes you need on each cache layer
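
For anyone reading along, here is a minimal sketch of what the two `getattr` lines in the diff above compute before `early_initialization` is called. It does not call into transformers at all: `kv_cache_dims` and the three `SimpleNamespace` configs are hypothetical names and values made up for illustration, and only the two fallback expressions are taken from the diff.

```python
from types import SimpleNamespace


def kv_cache_dims(config):
    """Return (num_heads, head_dim) using the same getattr fallbacks as the diff.

    Hypothetical helper for illustration; not part of transformers.
    """
    # Use num_key_value_heads when the config defines it, otherwise fall back
    # to the number of attention heads.
    num_heads = getattr(config, "num_key_value_heads", config.num_attention_heads)
    # Use an explicit head_dim when present, otherwise derive it from hidden_size.
    head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
    return num_heads, head_dim


# Made-up configs that exercise each fallback path.
gqa_config = SimpleNamespace(hidden_size=2048, num_attention_heads=32, num_key_value_heads=8)
mha_config = SimpleNamespace(hidden_size=768, num_attention_heads=12)
explicit_config = SimpleNamespace(hidden_size=2048, num_attention_heads=8, head_dim=256)

for name, cfg in [("gqa", gqa_config), ("mha", mha_config), ("explicit", explicit_config)]:
    print(name, kv_cache_dims(cfg))
```

One caveat with this pattern: `getattr` only falls back when the attribute is missing, so a config that defines `head_dim` but sets it to `None` would return `None` here rather than the derived value.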

Comment on lines +483 to +487
args=(),
kwargs={
"input_ids": input_ids,
"cache_position": cache_position,
},
Collaborator

Why this change?

Collaborator Author

@kimishpatel
Collaborator

OK, this looks largely benign, as in things that have to be done to make it work. But I would suggest you put comments around the changes to make it easier to review. I don't know why all the changes are needed.

@jackzhxng jackzhxng merged commit 828ae02 into huggingface:main Sep 16, 2025
65 of 81 checks passed
3 participants