
[CB] Refactors the way we access paged #41370

Merged
ArthurZucker merged 6 commits into main from fix-kernels-cb on Oct 6, 2025

Conversation

@ArthurZucker
Collaborator

What does this PR do?

The user chooses the attention function, and CB (continuous batching) decides on its own which "interface" wrapper to use. This should make things easier.
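A minimal sketch of the idea, not the actual implementation in this PR: the user names an attention function, and the continuous-batching side maps it to the matching paged wrapper. The registry keys are the ones shown in the diff for this PR; the stub function bodies, the `PAGED_ATTENTION_FUNCTIONS` name, and the `resolve_paged_attention` helper are hypothetical and only illustrate the lookup.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the real paged attention functions.
def eager_paged_attention_forward(*args, **kwargs) -> str:
    return "eager"


def sdpa_attention_paged_forward(*args, **kwargs) -> str:
    return "sdpa"


def paged_attention_forward(*args, **kwargs) -> str:
    return "flash"


# Keys mirror the "paged|<implementation>" names that appear in the diff.
PAGED_ATTENTION_FUNCTIONS: Dict[str, Callable] = {
    "paged|flash_attention2": paged_attention_forward,
    "paged|sdpa": sdpa_attention_paged_forward,
    "paged|eager": eager_paged_attention_forward,
}


def resolve_paged_attention(attn_implementation: str) -> Callable:
    """Hypothetical helper: map a user-chosen name to its paged wrapper."""
    # The user only names the function; the "paged|" prefix is added here.
    if attn_implementation.startswith("paged|"):
        key = attn_implementation
    else:
        key = f"paged|{attn_implementation}"
    try:
        return PAGED_ATTENTION_FUNCTIONS[key]
    except KeyError:
        supported = ", ".join(sorted(PAGED_ATTENTION_FUNCTIONS))
        raise ValueError(
            f"No paged wrapper registered for {key!r}. Supported: {supported}"
        ) from None
```

With this sketch, `resolve_paged_attention("sdpa")` and `resolve_paged_attention("paged|sdpa")` return the same function, while a name with no registered wrapper raises a `ValueError` that lists the supported keys.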

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@remi-or (Collaborator) left a comment


LGTM! Not sure we can run slow tests for this, but it's worth making sure the example still works with paged|eager, paged|sdpa or with paged|flash_attention_2 -- kernels is not available on AMD so we need the classic FA package.

"eager_paged": eager_paged_attention_forward,
"paged|flash_attention2": paged_attention_forward,
"paged|sdpa": sdpa_attention_paged_forward,
"paged|eager": eager_paged_attention_forward,
Collaborator

should paged|flex_attention be an option as well? I see it listed below in the tests

Collaborator Author

Not supported yet AFAIK

Collaborator

Ok good to know 👍 ty

@ArthurZucker merged commit 0395ed5 into main on Oct 6, 2025
26 checks passed
@ArthurZucker deleted the fix-kernels-cb branch on October 6, 2025 15:55
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
* up

* refactor the way we handle paged attention

* affect serve as well

* update

* fix

* cup
