`StaticCache` Bad generation results with Llama after v4.39.0 

### System Info

transformers version: 4.41.0.dev0
Platform: Linux-5.15.0-89-generic-x86_64-with-glibc2.
Python version: 3.10.
Huggingface_hub version: 0.20.
Safetensors version: 0.4.
Accelerate version: 0.21.0
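
Since the behavior depends on the installed version, it can help to print the package versions next to each generated output. The snippet below is only a sketch using the standard library; `installed_versions` is a hypothetical helper, not part of transformers, and the package list is an assumption based on the entries above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# Assumed list of packages relevant to this report.
report = installed_versions(
    ["transformers", "torch", "accelerate", "safetensors", "huggingface-hub"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```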


### Who can help?

@ArthurZucker @gante

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)

### Reproduction

Generation quality with the current 4.41.0.dev0 version is much worse than with 4.39.0, at least with Llama and `cache_implementation="static"`. With quantized models, the output is complete gibberish. The same code works fine with 4.39.0.

```Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
model = (
    AutoModelForCausalLM.from_pretrained(
        model_id, cache_dir=".", torch_dtype=torch.float16, attn_implementation="sdpa"
    )
    .to("cuda")
    .eval()
)
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=".")

# The prompt already contains <s>, so disable automatic BOS/EOS insertion.
tokenizer.add_bos_token = False
tokenizer.add_eos_token = False

prompt = "<s>[INST] How do I build a car? [/INST]"
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Greedy decoding with the static KV cache.
gen_out = model.generate(
    **inputs,
    do_sample=False,
    cache_implementation="static",
    max_new_tokens=100,
    pad_token_id=tokenizer.eos_token_id,
    temperature=None,
    top_p=None,
    use_cache=False,
)

print()
print(tokenizer.decode(gen_out[0]))
```

```
# version: 4.39.0 - works as expected
<s> [INST] How do I build a car? [/INST]  Building a car is a complex and challenging project that requires a significant amount of time, money, and expertise. Here are some general steps that you might consider when building a car:

1. Define your goals: What kind of car do you want to build? What features do you want to include? What is your budget? Answering these questions will help you determine the scope of your project and what you need to do to get started.
2. Research and plan:
```

```
# version: 4.41.0.dev0 - bad output, outputs gibberish
<s> [INST] How do I build a car? [/INST]  I's (the 0-2) are dots d's the traveling 4 v5 8 out the9 of the9 1t 1 ch do not always and the9 10 11 is-not 1 rt 1 c0 the0.

To build a car, you will need to have a good understanding of mechanical systems, electrical systems, and fabrication techniques. You will also need to have a
```



### Expected behavior

The output should be the same as with the previous 4.39.0 version.
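
One way to make "the same" checkable is to compare the generated token ids from the two versions and report where they first differ. The following is a minimal sketch: `first_divergence` is a hypothetical helper (not part of transformers), and the token ids are made up for illustration. In practice they would come from something like `gen_out[0].tolist()` saved in each environment.

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return the index of the first position where two token-id sequences differ.

    Returns None when the sequences are identical. If one sequence is a strict
    prefix of the other, the length of the shorter one is returned.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Made-up token ids standing in for the outputs of the two versions.
reference = [1, 518, 25580, 29962, 1128, 437]
candidate = [1, 518, 25580, 29962, 306, 29915]

idx = first_divergence(reference, candidate)
if idx is None:
    print("outputs match")
else:
    print(f"outputs diverge at token index {idx}")
```

With greedy decoding (`do_sample=False`), an early divergence index right after the prompt would be consistent with the gibberish prefix shown above, though this check alone does not identify the cause.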

`StaticCache` Bad generation results with Llama after v4.39.0 #30417
