Hi,
Thank you for your interesting and valuable work!
I have a question regarding the evaluation. According to the input prompt, the model is asked to identify the hallucination phrases in a given description and output them as a list. During my experiments, I noticed that the models (Qwen2.5VL, InternVL3) sometimes output a full sentence or several sentences rather than a specific word or concise phrase.
I would greatly appreciate your guidance on the following:
- What is the expected granularity of a hallucination phrase: a single word, a short phrase, an entire sentence, or are multiple sentences also acceptable?
- When the model outputs longer text (one or more sentences) instead of a brief phrase, how do you recommend evaluating such outputs?
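To make the second question concrete, here is a minimal sketch of two lenient scoring rules one could apply when a prediction is longer than the annotated phrase. This is only an illustration and not the benchmark's evaluation code: the function names `contains_phrase` and `token_f1`, the tokenization, and the example strings are all assumptions.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(prediction, gold_phrase):
    """Return True if the gold phrase's tokens appear contiguously in the prediction."""
    pred, gold = tokens(prediction), tokens(gold_phrase)
    if not gold:
        return False
    n = len(gold)
    return any(pred[i:i + n] == gold for i in range(len(pred) - n + 1))


def token_f1(prediction, gold_phrase):
    """Token-level F1 between prediction and gold phrase (hypothetical metric choice)."""
    pred, gold = Counter(tokens(prediction)), Counter(tokens(gold_phrase))
    overlap = sum((pred & gold).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the model returns a whole sentence.
prediction = "There is a red umbrella on the table."
gold = "red umbrella"
print(contains_phrase(prediction, gold))
print(round(token_f1(prediction, gold), 2))
```

Under the containment rule a full-sentence output still counts as a hit, whereas token-level F1 penalizes the extra words, so the choice between them changes the reported scores noticeably.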
I would appreciate your advice or suggestions on how to handle this case in evaluation.
Thank you again for your great work!