
Transformers compatability #115

Merged: alessiodevoto merged 138 commits into main from max/transformers_compat on Sep 3, 2025

Conversation

@maxjeblick (Collaborator)

PR description

Updates kvpress to upcoming transformers version 4.56.0.

The following issues/PRs need to be fixed before merging this PR:

PR was tested on huggingface/transformers#40002

TODOs:

  • rerun a subset of the RULER benchmark to ensure no regression

Checklist

  • Tests are working (make test)
  • Code is formatted correctly (make style, on errors try fix with make format)
  • Copyright header is included
  • All commits are signed-off using git commit -s
  • (new press) mypress_press.py is in the presses directory
  • (new press) MyPress is in __init__.py
  • (new press) README.md is updated with a one-liner about the new press in the Available presses section
  • (new press) New press is in the default_presses list in tests/default_presses.py
  • (new press) A docstring is provided that follows the same structure as the existing ones
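For contributors following this checklist, here is a schematic, dependency-free sketch of the scoring idea behind a press such as KnormPress (keys with a large L2 norm are pruned first). This is illustrative only: the real kvpress API differs (presses subclass a base press class, operate on torch tensors inside a forward hook, and live in the presses directory), and the function names below are invented for this sketch.

```python
import math


def key_norm_scores(keys):
    """Score each cached key by its negative L2 norm (the KnormPress idea:
    low-norm keys tend to receive high attention, so they score highest)."""
    return [-math.sqrt(sum(x * x for x in k)) for k in keys]


def compress(keys, compression_ratio):
    """Keep the top (1 - compression_ratio) fraction of keys by score."""
    n_kept = int(len(keys) * (1 - compression_ratio))
    scores = key_norm_scores(keys)
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_kept])  # preserve the original key order
    return [keys[i] for i in kept]


keys = [[3.0, 4.0], [0.1, 0.2], [1.0, 0.0], [0.0, 0.5]]
print(compress(keys, 0.5))  # the two lowest-norm keys survive
```

A real press would compute these scores per head on the cached key tensor and let the shared base class do the pruning; only the scoring function is press-specific.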

@maxjeblick maxjeblick changed the title Max/transformers compat Transformers compatability Aug 12, 2025
@maxjeblick (Collaborator, Author)

To test while the upstream transformers PR is still open:
uv pip install git+https://github.com/vasqu/transformers.git@fix-fa-integration

@maxjeblick maxjeblick force-pushed the max/transformers_compat branch from 66d37d9 to e74d852 Compare August 12, 2025 14:02
maxjeblick and others added 26 commits August 12, 2025 16:14
Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>

* Add DuoAttentionPress

* Fix tests and compression_ratio

* Address feedback

* Update plot

* Update version

Signed-off-by: SimJeg <sjegou@nvidia.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: giulio98 <corallo.giulio@yahoo.it>
Co-authored-by: miriam-16 <miriam.lamari2@gmail.com>
Co-authored-by: FaureElia <s283469@studenti.polito.it>
Co-authored-by: YuhuiXu <yuhuixu1993@126.com>
Co-authored-by: win10 <doss72180@gmail.com>
@manueldeprada

Also note that the flash-attention varlen kwargs are now computed early in generate:
https://github.com/vasqu/transformers/blob/cb89dbee4694ca1fec3d5733c296b729291bb507/src/transformers/generation/utils.py#L681-L692

to avoid expensive recomputation on each attention pass. This has been the case since huggingface/transformers#39474 and huggingface/transformers#40002.

Unfortunately, this means kvpress might cause an out-of-bounds access: the condition assert keys.shape[0] == cu_seqlens_k[-1] must hold, and it doesn't if kvpress changes the keys after cu_seqlens_k was already computed earlier, in generate.

Have a look at 9986c31 as a possible fix. This script tests for such a failure:

import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # surface the CUDA error at the failing kernel

from transformers import AutoModelForCausalLM, AutoTokenizer
from kvpress import KnormPress

ckpt = "meta-llama/Meta-Llama-3-8B-Instruct"
device = "cuda"

model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype="auto").to(device)
model.set_attn_implementation("flash_attention_2")  # the varlen path that precomputes cu_seqlens_k
tok = AutoTokenizer.from_pretrained(ckpt)
inputs = tok("Hello, how are you? bla bla how are you? this is some text lala ddd", return_tensors="pt").to(device)

# KnormPress(0.8) prunes 80% of the KV cache, invalidating the precomputed cu_seqlens_k
with KnormPress(0.8)(model):
    outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
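To make the failing invariant concrete, here is a dependency-free sketch (plain Python, illustrative names only) of the varlen bookkeeping: cu_seqlens_k is the cumulative sum of per-sequence key lengths, flash attention assumes keys.shape[0] == cu_seqlens_k[-1], and pruning keys after cu_seqlens_k was computed breaks that assumption unless it is recomputed from the compressed cache.

```python
from itertools import accumulate


def cu_seqlens(lengths):
    """Cumulative sequence lengths, as used by flash-attention varlen kernels."""
    return [0] + list(accumulate(lengths))


# Two packed sequences of 5 and 3 keys: 8 keys total.
lengths = [5, 3]
cu_k = cu_seqlens(lengths)     # [0, 5, 8], computed early in generate
total_keys = sum(lengths)      # stands in for keys.shape[0]
assert total_keys == cu_k[-1]  # the invariant flash attention relies on

# A press then prunes the cache (e.g. ~50% compression) without updating cu_k:
pruned_lengths = [n // 2 for n in lengths]  # [2, 1]
pruned_total = sum(pruned_lengths)          # 3 keys remain
print(pruned_total == cu_k[-1])             # False -> out-of-bounds access

# Recomputing cu_seqlens from the pruned lengths restores the invariant:
print(pruned_total == cu_seqlens(pruned_lengths)[-1])  # True
```

Whether 9986c31 takes exactly this recompute-after-compression approach is not verified here; the sketch only shows why the precomputed values go stale.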

@maxjeblick (Collaborator, Author)

Thanks a lot for the comments @manueldeprada, this is really appreciated!

@maxjeblick maxjeblick mentioned this pull request Aug 14, 2025
maxjeblick and others added 13 commits August 14, 2025 10:13
Signed-off-by: Max Jeblick <maximilianjeblick@gmail.com>
Signed-off-by: alessiodevoto <devoto.alessio@gmail.com>
copy-pr-bot Bot commented Sep 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@alessiodevoto (Collaborator)

/ok to test f37b768

@alessiodevoto (Collaborator)

Thanks a lot @maxjeblick for handling this, and @Jack-Yu-815 for the review!

@alessiodevoto alessiodevoto merged commit 7dbd3f0 into main Sep 3, 2025
3 checks passed
@alessiodevoto alessiodevoto deleted the max/transformers_compat branch September 3, 2025 07:52
