[Important] Added README to the Qwen2VL implementation #11642
samkoesnadi wants to merge 3 commits into ggml-org:master
* Also allows running qwen2vl-cli without the --image argument
@HimariO @ggerganov @tc-mb Sorry for tagging you; if you have time, could you review this short PR? :)
Thanks for the invitation, but I'm afraid I can't give a very precise answer. It may be up to gg to decide whether all multimodal CLIs should support a text-only mode. gg may also have answered this a long time ago, but I can't find it. As far as I understand, this decision dates back to the earliest llava support, so you may find some clues in the earliest PR. I hope this helps.
*Have fun with the models ! :)*

## Limitations
We should probably mention that the vision model (clip.cpp) currently has its GPU backend support disabled (#10896).
I have just added this to the Limitations section.
Great work on the Qwen2VL README! I forgot to include one with the original Qwen2VL PR, and I think this covers all the essential information needed to use the CLI tool. I'm not entirely sure about adding a text-only mode to the CLI, as that usage scenario would be better supported by integrating the Qwen2VL gguf model (the LLM component) with llama-cli (or by just using a Qwen2 LLM instead).
Undo changes on qwen2vl-cli
You have a fair point. It is also wise to conform to the other LLaVA models' CLI implementations. I have undone the changes to the CLI, restoring it to how it was. Thanks :D
That makes sense. I originally intended this because, in my use case, I use visual and text-only prompts in the same session. However, the CLI is not intended for chat sessions anyway, so I removed the changes to the CLI. Thank you :D
It took me some time to figure out how to use the Qwen2VL CLI and how to do the conversion. After looking into the code, I wrote up this documentation. To the respective contributors: please feel free to correct me if anything is missing.
Additionally, there are use cases where we just want text-only prediction with Qwen2VL, without an image (while running a chat, for example). So this PR also allows qwen2vl-cli to run without the --image argument. Let me know if you think --image must remain required; otherwise, this change is not intrusive at all.