Conversation
@yuhuixu1993 here is a proposal for the implementation of ThinK.
Please confirm this press is what you want. I explicitly mentioned you in the docstring as a reviewer.
@SimJeg, it looks awesome!!! Thanks for your hard work!! Sorry for the late response, we have a time difference (I live in Singapore). I notice that in this implementation channel pruning is applied to all tokens in the key cache; we would prefer to keep the most recent tokens (e.g. 32) unchanged. I have not tested how the current evaluations perform if we prune all the tokens.

```python
keys = torch.cat(
    [
        keys[:, :, : q_len - self.window_size, :].masked_fill(mask_k, 0),
        keys[:, :, q_len - self.window_size :, :],
    ],
    dim=-2,
)
```
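For context, a minimal self-contained sketch of that masking (the function name and the `mask_k` shape are assumptions, not the PR's actual code):

```python
import torch


def mask_pruned_key_channels(keys: torch.Tensor, mask_k: torch.Tensor, window_size: int = 32) -> torch.Tensor:
    """Zero out pruned key channels everywhere except the most recent `window_size` tokens.

    keys:   (batch, num_heads, q_len, head_dim)
    mask_k: boolean mask of pruned channels, broadcastable to keys[:, :, : q_len - window_size, :]
    """
    q_len = keys.shape[-2]
    pruned = keys[:, :, : q_len - window_size, :].masked_fill(mask_k, 0)
    recent = keys[:, :, q_len - window_size :, :]  # most recent tokens kept intact
    return torch.cat([pruned, recent], dim=-2)
```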
I will update. Are you using 32 for the window size too? (I've been using 64, the default in SnapKV.)
Thanks. Yes, I use 32 in the paper, but 64 is definitely OK.
maxjeblick left a comment
Thanks a lot for this PR.
From the technical side, @yuhuixu1993 already gave some great feedback.
I left a comment regarding the RoPE computation where I'm not sure about the dimensions.
Apart from that, it would be great to add a test that covers the inner_press functionality as well; a possible sketch follows.
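Such a test might look like the following (a hedged sketch: the press signatures, import paths, and the `unit_test_model` fixture are assumptions about this repo's test setup, not verified API):

```python
import torch

from kvpress import SnapKVPress, ThinKPress  # import path assumed


def test_think_press_with_inner_press(unit_test_model):  # fixture name assumed from the existing suite
    # Wrap an inner press and check that a forward pass still runs and returns a cache.
    press = ThinKPress(key_channel_compression_ratio=0.5, inner_press=SnapKVPress(compression_ratio=0.5))
    input_ids = torch.randint(0, 1024, (1, 256))
    with press(unit_test_model):
        cache = unit_test_model(input_ids).past_key_values
    assert cache is not None
```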
@yuhuixu1993 I tried with and without zeroing the channels for the last 32 tokens and did not see any difference on the prompt I tried. May I keep the current version? I'm asking because if other similar presses come along, it would be nice to have a uniform API rather than very custom changes. Also, to implement what you ask, we would need to slightly update the compression ratio to account for the 32 * n_pruned_channels elements that are not removed (see the sketch below).
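For what it's worth, that adjustment could look like this (a hypothetical helper; the name and the per-head element counting are assumptions, not the PR's code):

```python
def effective_key_compression_ratio(
    q_len: int, head_dim: int, n_pruned_channels: int, window_size: int = 32
) -> float:
    """Fraction of key-cache elements actually removed (per head) when the most
    recent `window_size` tokens keep all their channels."""
    total_elements = q_len * head_dim
    removed_elements = max(q_len - window_size, 0) * n_pruned_channels
    return removed_elements / total_elements
```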
@SimJeg I think the current version is OK, as the performance is fine. Many thanks for the experiments!!
maxjeblick left a comment
Thanks a lot, LGTM!
I inspected whether module.rotary_emb(query_states, q_len) is needed, and it seems that
- mixtral
- gpt neox
- open_llama
- idefics (cross attention)
are using it. As transformers is converging on module.rotary_emb(query_states, position_ids), I don't think we need to support these models. One way to check which signature a model expects is sketched below.
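For reference, a heuristic sketch of that signature check (`supports_position_ids_rope` is a hypothetical helper, not part of this PR or of transformers):

```python
import inspect


def supports_position_ids_rope(module) -> bool:
    """Heuristic: newer transformers rotary embeddings take position_ids,
    while older ones (mixtral, gpt neox, open_llama, idefics cross
    attention) take a sequence length instead."""
    params = inspect.signature(module.rotary_emb.forward).parameters
    return "position_ids" in params
```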
Add ThinKPress (NVIDIA#20)
Implementation of ThinK following #18