
Add epsilon #47

Merged
SimJeg merged 3 commits into main from simon/update-vnorm-2
Feb 12, 2025

Conversation

Collaborator

@SimJeg SimJeg commented Feb 11, 2025

Inspired by experiments on CriticalKVPress (#46), I noticed that the most important parameter is epsilon. This parameter appears to be key to a large performance boost. In this PR I propose a very simple change to ExpectedAttentionPress to include this epsilon. I get even better performance using ||WoV|| instead of ||V|| (see branch simon/update-vnorm).
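To make the role of epsilon concrete, here is a minimal pure-Python sketch. It assumes epsilon is added to each token's attention-based score before the score is rescaled by the value norm, i.e. `(score + epsilon) * ||V||`; the exact formula is not spelled out in this thread, so treat that as an assumption. The helper names `rescaled_scores` and `keep_top_k` and the toy numbers are hypothetical and are not part of the library.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a plain list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def rescaled_scores(attn_scores, values, epsilon=0.0):
    """Hypothetical helper: (score + epsilon) * ||v|| for each cached token.

    Assumption: epsilon is added before the value-norm rescaling.
    """
    return [(s + epsilon) * l2_norm(v) for s, v in zip(attn_scores, values)]


def keep_top_k(scores, k):
    """Indices of the k highest-scoring tokens, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy data: four cached tokens, one attention-based score and one value vector each.
attn_scores = [0.70, 0.25, 0.04, 0.01]
values = [[0.3, 0.4], [1.0, 0.0], [3.0, 4.0], [0.6, 0.8]]

kept_without_eps = keep_top_k(rescaled_scores(attn_scores, values, epsilon=0.0), k=2)
kept_with_eps = keep_top_k(rescaled_scores(attn_scores, values, epsilon=0.1), k=2)

print(kept_without_eps)  # [0, 1]
print(kept_with_eps)     # [0, 2]
```

In this toy case, token 2 has a low attention score but a large value norm. With `epsilon=0.0` it is pruned; with `epsilon=0.1` the value norm carries more weight and it is kept instead of token 1.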

@SimJeg SimJeg requested a review from maxjeblick February 12, 2025 08:11
Collaborator Author

SimJeg commented Feb 12, 2025

Below are additional results for x3 and x4 compression. I'm wondering if I should add CriticalKVPress.vwl1norm in this PR too 🤔 cc. @FFY0

[Image: additional results for x3 and x4 compression]

Collaborator

@maxjeblick maxjeblick left a comment


LGTM thanks!

Contributor

FFY0 commented Feb 12, 2025

> I'm wondering if I should add CriticalKVPress.vwl1norm in this PR too 🤔 cc. @FFY0

In my opinion, it’s worth adding.😀

This operation provides clear benefits while introducing little overhead in real-world deployment. My co-author, Junlin Lv, and I have developed a Triton kernel that optimizes this computation through kernel fusion, reducing memory usage significantly while maintaining high computational efficiency. We plan to open-source this kernel soon. Even with a naive implementation, I believe its overhead in inference remains negligible.

Moreover, to the best of my knowledge, this is the first attempt to leverage pre-trained model parameters to identify critical KV cache entries, making it a promising new research direction. I believe this direction is worth exploring further, as incorporating additional pre-trained parameter information could drive meaningful advancements in the future.
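For readers unfamiliar with the ||WoV|| idea, a naive version can be sketched in a few lines of pure Python. The sketch assumes Wo is the attention output projection restricted to one head (shape `hidden_size x head_dim`) and that the L1 norm is used, as the name `vwl1norm` suggests; both are assumptions, and `projected_value_norms` is a hypothetical helper, not the library's API or the fused Triton kernel mentioned above.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def l1_norm(vec):
    """Sum of absolute values."""
    return sum(abs(x) for x in vec)


def projected_value_norms(values, w_o):
    """Hypothetical helper: ||W_o v||_1 for each cached value vector v."""
    return [l1_norm(matvec(w_o, v)) for v in values]


# Toy setup: head_dim = 2, hidden_size = 3.
# This W_o ignores the second value dimension entirely.
w_o = [[1.0, 0.0],
       [0.0, 0.0],
       [2.0, 0.0]]
values = [[1.0, 5.0], [2.0, 0.0]]

plain_norms = [l1_norm(v) for v in values]
projected_norms = projected_value_norms(values, w_o)

print(plain_norms)      # [6.0, 2.0]
print(projected_norms)  # [3.0, 6.0]
```

The first value vector has the larger plain norm, but most of its mass lies in a direction the projection discards, so the second vector ranks higher once the pre-trained weights are taken into account.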

Collaborator Author

SimJeg commented Feb 12, 2025

I will merge it as is and we'll investigate later! Using ||WoV|| instead of ||V|| makes a lot of sense, but the main contribution of this PR is the addition of epsilon, which is far more important. Why this epsilon works so well still needs to be investigated.

@SimJeg SimJeg merged commit cc4bf60 into main Feb 12, 2025
@SimJeg SimJeg deleted the simon/update-vnorm-2 branch February 12, 2025 13:36
maxjeblick pushed a commit that referenced this pull request Aug 12, 2025
Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>

3 participants