[Important] Added README to the Qwen2VL implementation by samkoesnadi · Pull Request #11642 · ggml-org/llama.cpp

samkoesnadi · 2025-02-04T05:20:41Z

It took me some time to figure out how to use the Qwen2VL CLI and how to do the conversion. After looking into the code, I wrote up this documentation. Contributors to the respective code, please feel free to correct me if anything is missing.

Additionally, there are use cases where we just want text-only prediction with Qwen2VL, without an image (while running a chat, for example). So this PR also allows qwen2vl-cli to run without the --image argument. Let me know if you think --image must be required; otherwise, this change is not intrusive at all.

* Also allows no --image argument cli to the qwen2vl-cli

samkoesnadi · 2025-02-04T05:41:30Z

@HimariO @ggerganov @tc-mb sorry for the tag; if you have time, could you review this short PR? :)

tc-mb · 2025-02-06T07:51:18Z

Thanks for your invitation, but I'm sorry that I can't give a very accurate answer.

This may require gg to decide whether all multimodal CLIs should support a text-only mode. It is also possible that gg answered this a long time ago, but I can't find it.

As far as I understand, the current behavior comes from the earliest llava support. Perhaps you can find some clues in the earliest PR.

I hope what I know can help you.

HimariO · 2025-02-09T08:18:21Z

+
+*Have fun with the models ! :)*
+
+## Limitations


We should probably mention that the vision model (clip.cpp) currently has its GPU backend support disabled (#10896).

I have just added this to the limitations section.

HimariO · 2025-02-09T08:30:19Z

Great work on the Qwen2VL README! I forgot to include it with the original Qwen2VL PR, but I think it covers all the essential information needed to use the CLI tool.

I'm not entirely sure about adding a text-only mode to the CLI, as that usage scenario would be better supported by integrating the Qwen2VL gguf model (the LLM component) with llama-cli (or by just using the Qwen2 LLM instead).

samkoesnadi · 2025-02-09T08:42:47Z

> Thanks for your invitation, but I'm sorry that I can't give a very accurate answer.
>
> This may require gg to decide whether all multimodal CLIs should support a text-only mode. It is also possible that gg answered this a long time ago, but I can't find it.
>
> As far as I understand, the current behavior comes from the earliest llava support. Perhaps you can find some clues in the earliest PR.
>
> I hope what I know can help you.

You have a fair point. It is also wise to conform to the CLI implementations of the other LLAVA models. I have undone the changes to the CLI, so it is back to how it was. Thanks :D

samkoesnadi · 2025-02-09T08:46:18Z

> Great work on the Qwen2VL README! I forgot to include it with the original Qwen2VL PR, but I think it covers all the essential information needed to use the CLI tool.
>
> I'm not entirely sure about adding a text-only mode to the CLI, as that usage scenario would be better supported by integrating the Qwen2VL gguf model (the LLM component) with llama-cli (or by just using the Qwen2 LLM instead).

That makes sense. My original motivation was that, in my use case, I mix image-and-text and text-only prompts in the same session. However, the CLI is not intended for chat sessions anyway, so I removed the changes to the CLI.

Thank you :D

Added README

23bce61

* Also allows no --image argument cli to the qwen2vl-cli

github-actions Bot added the examples label Feb 4, 2025

Remove white trailing

8777473

la1ty mentioned this pull request Feb 9, 2025

Add minicpm-o and qwen2-vl to the list of supported multimodal models. abetlen/llama-cpp-python#1904

Open

HimariO reviewed Feb 9, 2025

Added GPU support on qwen2vl readme

185e1b1

Undo changes on qwen2vl-cli

samkoesnadi wants to merge 3 commits into ggml-org:master from samkoesnadi:feat/qwen2vl-readme
