StaticCache Bad generation results with Llama after v4.39.0  #30417

@mobicham

System Info

transformers version: 4.41.0.dev0
Platform: Linux-5.15.0-89-generic-x86_64-with-glibc2.
Python version: 3.10.
Huggingface_hub version: 0.20.
Safetensors version: 0.4.
Accelerate version: 0.21.0

Who can help?

@ArthurZucker @gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Generation quality with the current 4.41.0.dev0 version is much worse than with the previous 4.39.0 version, at least with Llama. With quantized models, the output is complete gibberish. The same code works fine with 4.39.0.

import torch, os
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id  = "meta-llama/Llama-2-7b-chat-hf"
model     = AutoModelForCausalLM.from_pretrained(model_id, cache_dir='.', torch_dtype=torch.float16, attn_implementation="sdpa").to('cuda').eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir='.')

tokenizer.add_bos_token = False
tokenizer.add_eos_token = False

prompt = "<s>[INST] How do I build a car? [/INST]"

gen_out = model.generate(
    **tokenizer([prompt], return_tensors="pt").to(model.device),
    do_sample=False,
    cache_implementation="static",
    max_new_tokens=100,
    pad_token_id=tokenizer.eos_token_id,
    temperature=None,
    top_p=None,
    use_cache=False,
)

print()
print(tokenizer.decode(gen_out[0]))
# version: 4.39.0 - works as expected
<s> [INST] How do I build a car? [/INST]  Building a car is a complex and challenging project that requires a significant amount of time, money, and expertise. Here are some general steps that you might consider when building a car:

1. Define your goals: What kind of car do you want to build? What features do you want to include? What is your budget? Answering these questions will help you determine the scope of your project and what you need to do to get started.
2. Research and plan:
# version: 4.41.0.dev0 - bad output, outputs gibberish
<s> [INST] How do I build a car? [/INST]  I's (the 0-2) are dots d's the traveling 4 v5 8 out the9 of the9 1t 1 ch do not always and the9 10 11 is-not 1 rt 1 c0 the0.

To build a car, you will need to have a good understanding of mechanical systems, electrical systems, and fabrication techniques. You will also need to have a

Expected behavior

The output should be the same as with the previous 4.39.0 version.
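
Since the reproduction uses greedy decoding (do_sample=False), the generated token IDs from the two versions can be compared directly to see where they first differ. The snippet below is only a minimal sketch of such a check: `first_divergence` is a hypothetical helper, not part of transformers, and it assumes the token IDs from each run have already been converted to plain Python lists.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(expected: Sequence[int], actual: Sequence[int]) -> Optional[int]:
    """Return the index of the first position where two token-ID sequences differ.

    Returns None when the sequences are identical. A length mismatch counts as a
    difference at the first index that exists in only one of the sequences.
    """
    # zip_longest pads the shorter sequence with None, which never equals an int ID.
    for index, (exp_id, act_id) in enumerate(zip_longest(expected, actual)):
        if exp_id != act_id:
            return index
    return None
```

For example, with the 4.39.0 output saved as a list of IDs, calling `first_divergence(saved_ids, gen_out[0].tolist())` on the newer version would give the position of the first mismatching token, which `tokenizer.decode` can then turn back into text around that point.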
