System Info
- transformers version: 4.43.2
- Platform: Linux-5.15.0-1049-aws-x86_64-with-glibc2.17
- Python version: 3.8.18
- Huggingface_hub version: 0.23.2
- Safetensors version: 0.4.1
- Accelerate version: 0.26.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.1+cu118 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
Who can help?
@amyeroberts
@ArthurZucker
Just filing an official issue so that other users can see it.
Some users didn't understand why they were getting much lower scores with Idefics2. It's a silent bug: the model still produces generations related to the text prompt, so it's hard to notice at first that anything is wrong.
I also flagged it prominently in the model cards, e.g. https://huggingface.co/HuggingFaceM4/idefics2-8b
Information
Tasks
Reproduction
from PIL import Image
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b", torch_dtype=torch.bfloat16, device_map="auto", attn_implementation="flash_attention_2"
)

# Create inputs
path_image = "/fsx/hugo/wow_images/cv_0.png"
image = Image.open(path_image)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What's written on this image?"},
        ],
    },
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate
generated_ids = model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_texts)
Pick any image: this code will give different outputs depending on the Transformers version.
It gives correct outputs with version 4.40, and outputs that seem unrelated to the image with newer versions.
This is most likely due to the new caching strategy implemented in Transformers, which modeling_idefics2.py doesn't take into account.
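Until this is fixed, a simple guard in evaluation scripts can warn when the installed version may be affected. This is only a sketch: the report above confirms 4.40 is good and 4.43.2 is bad, so treating anything after 4.40 as potentially affected is an assumption, and `is_affected` is a hypothetical helper, not part of the library.

```python
def is_affected(version: str) -> bool:
    # Compare (major, minor) against 4.40; the regression is only confirmed
    # for 4.43.2, so ">(4, 40)" is a conservative assumption, not an exact
    # bisection of the breaking release.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (4, 40)

print(is_affected("4.40.2"))  # False
print(is_affected("4.43.2"))  # True
```

In practice one would feed it `transformers.__version__` and either warn or pin the dependency to 4.40.x until the cache handling in modeling_idefics2.py is fixed.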
Expected behavior
Newer versions should produce the same output as version 4.40.