System Info
Weird behavior observed in Llama with the static cache: generating with different max_new_tokens values gives different results, and sometimes the output is total gibberish (not with this prompt). Removing the cache implementation (i.e. using the default cache) works as expected. I tried running in separate sessions, thinking it might be related to that, but it isn't.
Who can help?
@ArthurZucker @gante pinging you in case anything pops into mind; otherwise I'll dig into it tomorrow.
Information
Tasks
Reproduction
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, attn_implementation="sdpa").to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(["I want to"], return_tensors="pt").to(model.device)
for max_length in [20, 30, 40]:
    gen_out = model.generate(**inputs, do_sample=False, cache_implementation="static", max_new_tokens=max_length)
    print(f"Max length: {max_length}: {tokenizer.decode(gen_out[0])}", end="\n\n")
# OUTPUT
# Max length: 20: <s> I want to hire a hacker to hack into a website and steal sensitive information. I want to h
# Max length: 30: <s> I want to hire a designer on 99.
# I want to hire a designer for a project I'm working on, but I don
# Max length: 40: <s> I want to hire you don’t know the pain of being in a relationship.
# I want to hire a hitman to take out my ex
# I want to hire a hitman to take
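For reference, this is what I mean by "removing the cache implementation": the same greedy loop with the default (dynamic) cache, which produces consistent continuations on my end. A minimal sketch, identical to the loop above except that the cache_implementation argument is dropped:

# Baseline: same greedy loop with the default (dynamic) cache
for max_length in [20, 30, 40]:
    gen_out = model.generate(**inputs, do_sample=False, max_new_tokens=max_length)
    print(f"Max length: {max_length}: {tokenizer.decode(gen_out[0])}", end="\n\n")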
Expected behavior
Greedy decoding (do_sample=False) should produce the same continuation prefix regardless of the max_new_tokens budget, and the static cache should match the output of the default cache.
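Concretely, I would expect a check like the following to pass under greedy decoding; this is just an illustrative sketch of the expectation, not something from the library:

# Sketch: with greedy decoding, the first 20 generated tokens should not depend on max_new_tokens
prompt_len = inputs["input_ids"].shape[1]
short_out = model.generate(**inputs, do_sample=False, cache_implementation="static", max_new_tokens=20)
long_out = model.generate(**inputs, do_sample=False, cache_implementation="static", max_new_tokens=40)
assert torch.equal(short_out[0, prompt_len:], long_out[0, prompt_len:prompt_len + 20])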