
revert _prepare_4d_causal_attention_mask_with_cache_position for gpt2#41806

Closed
jiqing-feng wants to merge 1 commit into huggingface:main from jiqing-feng:gpt2

Conversation

@jiqing-feng
Contributor

@jiqing-feng jiqing-feng commented Oct 23, 2025

Hi @zucchini-nlp

PR #39754 removed _prepare_4d_causal_attention_mask_with_cache_position from gpt2, which caused a ~40% performance regression on CPU. You can reproduce it with:
numactl -C 0-7 --membind 0 python test.py

import time
import torch
from transformers import pipeline, set_seed, AutoTokenizer

set_seed(42)

model_id = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = 'left'
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

pipe = pipeline("text-generation", model=model_id, tokenizer=tokenizer, torch_dtype=torch.float16, device_map="cpu")

generation_config = pipe.model.generation_config
generation_config.do_sample = False
generation_config.use_cache = True
generation_config.max_new_tokens = 128
generation_config.min_new_tokens = 128
generation_config.cache_implementation = "static"
generation_config.temperature = 1.0
generation_config.top_p = 1.0
generation_config.num_beams = 1
pipe.model.config._attn_implementation = "sdpa"

inputs = "It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors"

# Warmup runs
for _ in range(5):
    set_seed(42)
    pipe(inputs, generation_config=generation_config)

# Timed runs
for _ in range(5):
    set_seed(42)
    start = time.time()
    pipe(inputs, generation_config=generation_config)
    end = time.time()
    print(f"{pipe.model.dtype} time costs {(end - start) * 1000} ms")

Reverting _prepare_4d_causal_attention_mask_with_cache_position fixes the regression.
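
For context, the helper being reverted builds the full additive 4D mask eagerly with plain tensor ops. Below is a minimal illustrative sketch of that style of mask preparation (not the exact transformers code; the function name and signature here are only for illustration):

import torch

def prepare_static_4d_causal_mask(attention_mask, seq_len, target_len, dtype, cache_position, batch_size):
    # Illustrative sketch only: build a (batch, 1, seq_len, target_len) additive mask
    # where disallowed positions hold the dtype's minimum value.
    min_dtype = torch.finfo(dtype).min
    mask = torch.full((seq_len, target_len), min_dtype, dtype=dtype)
    # Block keys that lie beyond each query's cache position (future tokens).
    mask *= torch.arange(target_len) > cache_position.reshape(-1, 1)
    mask = mask[None, None, :, :].expand(batch_size, 1, -1, -1).clone()
    if attention_mask is not None:
        # Also block padded key positions using the 2D padding mask.
        mask_len = attention_mask.shape[-1]
        padding = mask[..., :mask_len] + attention_mask[:, None, None, :].to(dtype)
        mask[..., :mask_len] = mask[..., :mask_len].masked_fill(padding == 0, min_dtype)
    return mask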

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: gpt2

@jiqing-feng
Contributor Author

run-slow: gpt2

@jiqing-feng jiqing-feng marked this pull request as ready for review October 23, 2025 07:27
@github-actions github-actions Bot requested a review from ArthurZucker October 23, 2025 07:27
@vasqu
Contributor

vasqu left a comment

It's the same issue as in #41639 (vmap in causal masking). I already discussed with @Cyrilvallez that we will add a non-vmap path to the mask creation. This will revert the perf regressions, so I'd like you to wait for that PR instead, as we don't want to reintroduce old functions (which we want to deprecate) into the code.
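
For readers following along, the rough difference is between building the mask through torch.vmap over a per-index mask function and building it with a single broadcasted comparison. A minimal illustrative sketch follows (assumed names, not the actual transformers masking utilities):

import torch

def causal_mask_fn(batch_idx, head_idx, q_idx, kv_idx):
    # Per-index rule: a query may attend to keys at or before its own position.
    return kv_idx <= q_idx

def build_mask_with_vmap(batch_size, q_len, kv_len):
    # Vectorize the scalar rule over (batch, head, query, key) with nested vmaps.
    # Flexible for arbitrary mask functions, but the extra dispatch can be slow on CPU.
    fn = causal_mask_fn
    for in_dims in [(None, None, None, 0), (None, None, 0, None), (None, 0, None, None), (0, None, None, None)]:
        fn = torch.vmap(fn, in_dims=in_dims)
    return fn(torch.arange(batch_size), torch.arange(1), torch.arange(q_len), torch.arange(kv_len))

def build_mask_with_broadcast(batch_size, q_len, kv_len):
    # The same boolean mask from one broadcasted comparison, with no vmap overhead.
    q = torch.arange(q_len)[:, None]
    kv = torch.arange(kv_len)[None, :]
    return (kv <= q).expand(batch_size, 1, q_len, kv_len)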

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng
Contributor Author

OK, please let me know when the PR is ready. Thanks!

@jiqing-feng jiqing-feng deleted the gpt2 branch December 15, 2025 02:11