Fix jamba #41309

Merged
Cyrilvallez merged 10 commits into main from fix-jamba
Oct 3, 2025

Cyrilvallez (Member) commented Oct 3, 2025

What does this PR do?

Jamba was mostly destroyed in #40132; this PR fixes it!

Slow tests pass.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions Bot (Contributor) commented Oct 3, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: bamba, falcon_h1, granitemoehybrid, jamba, zamba

Comment on lines -151 to +172
- if len(self.key_cache) <= layer_idx:
+ if len(self.key_cache) <= layer_idx or self.key_cache[layer_idx].shape[-1] == 0:
Member Author

Without this check, get_seq_length would return the batch size when the cache is empty: layers are initialized as torch.tensor([[]] * batch_size, device=device), which has shape (batch_size, 0), so shape[-2] is the batch dimension rather than a sequence length... 🫠🫠🫠🫠
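
The shape issue described in this comment can be reproduced in isolation. The following is a minimal sketch, not the actual cache class: `seq_length_unguarded` and `seq_length_guarded` are hypothetical helpers that mirror the old and new conditions from the diff, and the `(batch, heads, seq_len, head_dim)` layout of the populated layer is an assumption made for illustration.

```python
import torch


def seq_length_unguarded(key_cache, layer_idx=0):
    # Hypothetical helper mirroring the pre-fix condition:
    # it only checks that the layer exists.
    if len(key_cache) <= layer_idx:
        return 0
    return key_cache[layer_idx].shape[-2]


def seq_length_guarded(key_cache, layer_idx=0):
    # Hypothetical helper mirroring the post-fix condition:
    # a last dimension of size 0 is treated as "nothing cached yet".
    if len(key_cache) <= layer_idx or key_cache[layer_idx].shape[-1] == 0:
        return 0
    return key_cache[layer_idx].shape[-2]


batch_size = 3

# Placeholder layer built the way the comment describes.
# Its shape is (batch_size, 0), so shape[-2] is the batch size.
empty_layer = torch.tensor([[]] * batch_size)
key_cache = [empty_layer]

placeholder_shape = tuple(empty_layer.shape)
unguarded = seq_length_unguarded(key_cache)  # reports the batch size
guarded = seq_length_guarded(key_cache)      # reports 0

# A populated layer (assumed layout: batch, heads, seq_len, head_dim)
# is unaffected by the extra check.
filled_cache = [torch.zeros(batch_size, 2, 5, 4)]
filled_len = seq_length_guarded(filled_cache)
```

With a batch of 3, the unguarded version reports a sequence length of 3 for an empty cache, while the guarded version reports 0 and still returns the true length once the layer holds keys.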

ArthurZucker (Collaborator) left a comment

ty for cleaning up

Comment thread src/transformers/models/jamba/modular_jamba.py Outdated
Comment thread src/transformers/models/jamba/modular_jamba.py Outdated
Comment thread src/transformers/models/jamba/modular_jamba.py Outdated
Cyrilvallez merged commit c2b3cc3 into main on Oct 3, 2025
20 checks passed
Cyrilvallez deleted the fix-jamba branch on October 3, 2025, 14:54
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
* reactivate tests

* first pass

* fix

* fix bias

* fix and simplify

* finally fix this stupid bug

* add skips

* remove bad stuff

* fix copies

* simplify
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
3 participants