
Fix cache update! #38046

Merged
Cyrilvallez merged 2 commits into main from fix-cache on May 9, 2025

Conversation

Cyrilvallez (Member) commented May 9, 2025

What does this PR do?

As per the title. #37873 broke the cache update when going beyond the sliding window, see my comment here.
This PR fixes it.
This also addresses the issue mentioned in #37574! TLDR: the order of operations here is important because we check a strict inequality!
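
For intuition, here is a minimal, hypothetical sketch of that interaction (this is not the actual HybridCache code; buffer, seen_tokens, and update are illustrative names). The eviction check must compare against the token count from before the write; incrementing the counter first would shift the strict comparison, and thus the first eviction, by one slot:

import torch

sliding_window = 4
buffer = torch.zeros(sliding_window, dtype=torch.long)  # one slot per cached token
seen_tokens = 0

def update(token: int) -> None:
    global seen_tokens
    # Strict inequality against the pre-update count: evict only once the
    # buffer is actually full. Bumping seen_tokens before this check would
    # move the first eviction by one token, corrupting the window contents.
    if seen_tokens + 1 > sliding_window:
        buffer[:-1] = buffer[1:].clone()  # drop the oldest entry
        buffer[-1] = token
    else:
        buffer[seen_tokens] = token
    seen_tokens += 1

for t in range(1, 7):
    update(t)
print(buffer)  # tensor([3, 4, 5, 6]): only the last sliding_window tokens remain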
Correctness can be verified with:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, CompileConfig

model_id = "google/gemma-2-9b-it"
device = 0

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map=device)

chat1 = [  # This size + the new tokens is > than the sliding window
  {"role": "user", "content": "This is a nice place. " * 675 + "\n\nForget about the previous text, and tell me who you are?"},
]
prompt1 = tokenizer.apply_chat_template(chat1, tokenize=False, add_generation_prompt=True)
chat2 = [  # Short prompt, so this sequence gets padded in the batch
  {"role": "user", "content": "create a list of at least 10 colors please"},
]
prompt2 = tokenizer.apply_chat_template(chat2, tokenize=False, add_generation_prompt=True)

inputs = tokenizer([prompt1, prompt2], padding=True, return_tensors="pt").to(0 if device == "auto" else device)

print(f"Sliding window: {getattr(model.config, 'sliding_window', None)}")
print(f"Input size: {inputs.input_ids.shape}")

cache = "hybrid"
compile_config = CompileConfig(fullgraph=False)
out = model.generate(**inputs, do_sample=False, max_new_tokens=100, cache_implementation=cache, compile_config=compile_config)

text = tokenizer.batch_decode(out[:, inputs.input_ids.shape[-1]:], skip_special_tokens=False)
print("\n\n")
for seq in text:
    print("NEW SEQ:")
    print(seq)

It used to generate correctly for both the sequence longer than the sliding window and the padded sequence, but now generates very badly. This PR fixes it once and for all.

github-actions (bot) marked this pull request as draft May 9, 2025 14:46

github-actions (bot) commented May 9, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

Cyrilvallez (Member, Author) commented:

cc @gante

Cyrilvallez marked this pull request as ready for review May 9, 2025 14:46
HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


ArthurZucker (Collaborator) left a comment:

Lol little but destructive!

Cyrilvallez merged commit aaed2f5 into main May 9, 2025
21 checks passed
Cyrilvallez deleted the fix-cache branch May 9, 2025 15:54
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* fix slicing

* better fix

gante (Contributor) commented May 20, 2025

@Cyrilvallez indeed, my previous PR fixed a small (tested) issue but created a large (untested) issue 🙈
