
Fix synced multi-GPU generation with LLMs and VLMs #35893

Merged
zucchini-nlp merged 3 commits into huggingface:main from
ManukyanD:fix_multi_gpu_generation
Feb 5, 2025

Conversation

@ManukyanD
Contributor

What does this PR do?

Generation with some LLMs and VLMs in synced multi-GPU settings crashes because cache_position goes out of bounds. The issue was solved for Gemma2 in an earlier PR. I have noticed that this issue is also present in several other LLMs, as well as several VLMs. This PR fixes the issue for Bamba, Bloom, Chameleon, Cohere2, Jamba, Mllama, Qwen2 VL, Qwen2.5 VL, and Zamba.
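As a rough illustration of the failure mode (a sketch under stated assumptions, not the exact transformers code): during synced multi-GPU generation, a rank that has already finished keeps stepping with dummy tokens, so cache_position can run past the length of input_ids, and indexing input_ids with it then raises an IndexError. The guard adopted from the Gemma2 fix falls back to slicing the last tokens in that case. The helper name slice_input_ids below is hypothetical.

```python
import torch

def slice_input_ids(input_ids: torch.Tensor, cache_position: torch.Tensor) -> torch.Tensor:
    """Sketch of the guard added to each model's prepare_inputs_for_generation
    (hypothetical helper, simplified from the pattern this PR propagates)."""
    if cache_position[-1] >= input_ids.shape[1]:
        # Synced multi-GPU case: cache_position is out of bounds because this
        # rank already finished and is stepping with dummy tokens. Take the
        # trailing tokens instead of indexing with out-of-range positions.
        return input_ids[:, -cache_position.shape[0]:]
    elif input_ids.shape[1] != cache_position.shape[0]:
        # Default decoding step: keep only the tokens at cache_position.
        return input_ids[:, cache_position]
    return input_ids

# Normal decode step: 6 tokens so far, next position 5 is in bounds.
ids = torch.arange(6).unsqueeze(0)              # shape (1, 6)
print(slice_input_ids(ids, torch.tensor([5])))  # tensor([[5]])

# Synced-GPU dummy step: position 7 is out of bounds for length 6.
print(slice_input_ids(ids, torch.tensor([7])))  # tensor([[5]]) — last token, no crash
```

Without the first branch, the out-of-bounds step would execute `input_ids[:, cache_position]` with an index past the sequence length and crash, which matches the failure this PR fixes in models that override prepare_inputs_for_generation.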

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@zucchini-nlp
@gante
@ArthurZucker

Member

@zucchini-nlp zucchini-nlp left a comment


Sorry for the late review, and thanks for fixing! Indeed, we need to use the same exceptions when a model overrides prepare_inputs.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp merged commit d8080d5 into huggingface:main Feb 5, 2025
@ManukyanD ManukyanD deleted the fix_multi_gpu_generation branch February 5, 2025 10:18
@gante
Contributor

gante commented Feb 5, 2025

@ManukyanD Thank you for the fix 💛

MekkCyber pushed a commit that referenced this pull request Feb 7, 2025
* Fix synced multi-GPU generation

* fix copies

---------

Co-authored-by: Davit Manukyan <ManukyanD>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request Feb 13, 2025
sbucaille pushed a commit to sbucaille/transformers that referenced this pull request Feb 16, 2025


4 participants