Feature Description
Add example for multimodal capabilities
Motivation
#5882 removed the multimodal features from the server. Since this is a highly requested feature, the plan is to reintroduce it at some point (#6168). In the meantime, how about building a solid multimodal example elsewhere and porting it to the server example later on?
Possible Implementation
The implementation could be based on the code removed in https://github.com/ggerganov/llama.cpp/pull/5882/files, which had already implemented this feature in the server.cpp example, hopefully with some performance optimizations added.
For the example, the image file could be provided via a command-line option.
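To make the command-line idea concrete, here is a minimal sketch of how the example might read its arguments. Everything in it is an assumption for illustration: the flag names `--model`, `--image` and `--prompt`, the `example_params` struct and the `parse_example_args` helper are hypothetical and are not taken from #5882 or from the existing argument parser. It only covers argument handling, not image loading or inference.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical options for a standalone multimodal example.
// None of these names come from the llama.cpp code base.
struct example_params {
    std::string model_path; // path to the model file
    std::string image_path; // path to the image to describe (required)
    std::string prompt;     // optional text prompt
};

// Parses "--flag value" pairs (program name already stripped).
// Returns std::nullopt for an unknown flag, a flag without a value,
// or a missing --image option.
std::optional<example_params> parse_example_args(const std::vector<std::string> & args) {
    example_params params;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string & arg = args[i];
        std::string * target = nullptr;
        if (arg == "--model") {
            target = &params.model_path;
        } else if (arg == "--image") {
            target = &params.image_path;
        } else if (arg == "--prompt") {
            target = &params.prompt;
        } else {
            return std::nullopt; // unknown flag
        }
        if (i + 1 >= args.size()) {
            return std::nullopt; // flag given without a value
        }
        *target = args[++i];
    }
    if (params.image_path.empty()) {
        return std::nullopt; // the image is mandatory in this sketch
    }
    return params;
}
```

A `main` function could call this as `parse_example_args({argv + 1, argv + argc})` and print a usage message when it returns `std::nullopt`.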