System Info
transformers version: 4.41.0.dev0
Platform: Linux-5.15.0-89-generic-x86_64-with-glibc2.
Python version: 3.10.
Huggingface_hub version: 0.20.
Safetensors version: 0.4.
Accelerate version: 0.21.0
Who can help?
@ArthurZucker @gante
Reproduction
Generation quality with the current 4.41.0.dev0 version is much worse than with the previous 4.39.0 version, at least with Llama. With quantized models, the output is complete gibberish. The same code works fine with 4.39.0:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
# Load the model in fp16 with SDPA attention and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id, cache_dir='.', torch_dtype=torch.float16, attn_implementation="sdpa"
).to('cuda').eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir='.')
# The prompt already contains <s>, so automatic BOS/EOS insertion is turned off.
tokenizer.add_bos_token = False
tokenizer.add_eos_token = False

prompt = "<s>[INST] How do I build a car? [/INST]"
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
# Greedy decoding (do_sample=False), with cache_implementation="static" and use_cache=False.
gen_out = model.generate(
    **inputs, do_sample=False, cache_implementation="static", max_new_tokens=100,
    pad_token_id=tokenizer.eos_token_id, temperature=None, top_p=None, use_cache=False,
)
print()
print(tokenizer.decode(gen_out[0]))
# version: 4.39.0 - works as expected
<s> [INST] How do I build a car? [/INST] Building a car is a complex and challenging project that requires a significant amount of time, money, and expertise. Here are some general steps that you might consider when building a car:
1. Define your goals: What kind of car do you want to build? What features do you want to include? What is your budget? Answering these questions will help you determine the scope of your project and what you need to do to get started.
2. Research and plan:
# version: 4.41.0.dev0 - bad output, starts with gibberish
<s> [INST] How do I build a car? [/INST] I's (the 0-2) are dots d's the traveling 4 v5 8 out the9 of the9 1t 1 ch do not always and the9 10 11 is-not 1 rt 1 c0 the0.
To build a car, you will need to have a good understanding of mechanical systems, electrical systems, and fabrication techniques. You will also need to have a
Expected behavior
The output should be the same as with the previous 4.39.0 version.
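To make "the same" checkable, one option is to compare the generated token ids from both versions and report where they first differ. The following is only a minimal sketch: `first_divergence` is a hypothetical helper (not part of transformers), and the token ids shown are illustrative placeholders, not real model output.

```python
from typing import Optional, Sequence


def first_divergence(reference_ids: Sequence[int], candidate_ids: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if the sequences match."""
    for index, (ref, cand) in enumerate(zip(reference_ids, candidate_ids)):
        if ref != cand:
            return index
    # One sequence is a strict prefix of the other: they diverge where the shorter one ends.
    if len(reference_ids) != len(candidate_ids):
        return min(len(reference_ids), len(candidate_ids))
    return None


# Illustrative token ids only (placeholders, not real model output).
ids_reference = [1, 518, 25580, 29962, 1128]
ids_candidate = [1, 518, 25580, 29962, 306]
print(first_divergence(ids_reference, ids_candidate))
```

In practice, the ids could come from `gen_out[0].tolist()` saved under each transformers version (for example to a JSON file) and loaded into one script for comparison. With greedy decoding, a divergence immediately after the prompt tokens would match the gibberish seen above.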