Conversation
Below are additional results for x3 and x4 compression. I'm wondering if I should add
In my opinion, it’s worth adding.😀 This operation provides clear benefits while introducing little overhead in real-world deployment. My co-author, Junlin Lv, and I have developed a Triton kernel that optimizes this computation through kernel fusion, reducing memory usage significantly while maintaining high computational efficiency. We plan to open-source this kernel soon. Even with a naive implementation, I believe its inference overhead remains negligible. Moreover, to the best of my knowledge, this is the first attempt to leverage pre-trained model parameters to identify critical KV cache entries, making it a promising new research direction. I believe this direction is worth exploring further, as incorporating additional pre-trained parameter information could drive meaningful advancements in the future.
I will merge it as is and we'll investigate later! Using ||WoV|| instead of ||V|| makes a lot of sense, but the main contribution of this PR is the addition of epsilon, which is far more important. It still needs to be investigated why this epsilon works so well.
Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>

Inspired by experiments on CriticalKVPress (#46), I noticed the most important parameter is epsilon. This parameter appears to be the key driver of the large performance boost. In this PR I propose a very simple change to ExpectedAttentionPress to include this epsilon. I get even better performance using ||WoV|| instead of ||V|| (see branch simon/update-vnorm).
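As a rough illustration only (this is not kvpress's actual implementation; the shapes, variable names, and scoring formula below are assumptions for the sketch), the interplay between the epsilon term and rescaling by ||WoV|| rather than ||V|| might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: n cached tokens, head dim d, output projection W_o
n, d, d_model = 8, 4, 4
V = rng.normal(size=(n, d))            # cached value vectors
W_o = rng.normal(size=(d, d_model))    # output projection weights
expected_attention = rng.random(n)     # per-token expected attention weight
epsilon = 1e-2                         # small additive term discussed in this PR

# Baseline: expected attention rescaled by the raw value norm ||V||
score_v = expected_attention * np.linalg.norm(V, axis=1)

# Variant: rescale by ||W_o V||, i.e. each value's magnitude after the
# output projection, so the score reflects its contribution to the output
score_wov = expected_attention * np.linalg.norm(V @ W_o, axis=1)

# Adding epsilon keeps tokens with near-zero expected attention from
# being scored as exactly zero and pruned purely on attention alone
score_eps = (expected_attention + epsilon) * np.linalg.norm(V @ W_o, axis=1)

# Under a compression budget, the lowest-scoring entries are evicted first
evict_order = np.argsort(score_eps)
```

Since epsilon is strictly positive, `score_eps` dominates `score_wov` term by term; the effect is that value magnitude still matters for tokens the attention estimate would otherwise discard entirely.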