Llama: partial 4d masks #29731
amyeroberts left a comment:
Thanks for fixing this and the tests ❤️
@gante, thank you for the PR! I have a couple of suggestions:
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Hi @poedator 👋 We have other bug fixes and more impactful features in our pipeline, so I won't work on your suggestions (at least not for now) :) However, we're always open to PRs!
* partial 4d masks
* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
What does this PR do?
Reintroduces support for partial 4D masks in Llama (and other models with support for the static cache).
Fixes #29525
Thank you @poedator for a clean description and a test case -- I was unaware our previous versions supported this use of 4D attention masks 🤗
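For readers unfamiliar with the feature: a "partial" 4D mask is one whose query dimension covers only the newest positions of the sequence, while its key/value dimension spans all positions (as happens when earlier tokens already sit in a static KV cache). The sketch below is not the PR's code, just a minimal numpy illustration of the expected shape and causal pattern; the function name `partial_causal_4d_mask` is made up for this example.

```python
import numpy as np

def partial_causal_4d_mask(query_len, kv_len):
    """Build a causal 4D attention mask for the last `query_len` positions
    of a sequence whose keys/values span `kv_len` positions.

    Returns shape (batch=1, heads_broadcast=1, query_len, kv_len),
    with 1 = attend and 0 = masked.
    """
    offset = kv_len - query_len               # positions already in the cache
    q = np.arange(query_len)[:, None]         # query index within the suffix
    k = np.arange(kv_len)[None, :]            # absolute key index
    # Causal rule: a query at absolute position (q + offset) may attend
    # to any key at position <= (q + offset).
    mask = (k <= q + offset).astype(np.int64)
    return mask[None, None, :, :]

mask = partial_causal_4d_mask(query_len=2, kv_len=5)
# mask[0, 0]:
# [[1 1 1 1 0]
#  [1 1 1 1 1]]
```

A full 4D mask would have `query_len == kv_len`; the PR restores the case where they differ.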