[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Updated Oct 16, 2024 - Python
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
WACV 2024 Papers: Discover cutting-edge research from WACV 2024, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!
Implementation of the paper "DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding"
A deep learning project to tell a story with an image or a video.
Official implementation of "UniMedVL: Unifying Medical Multimodal Understanding and Generation through Observation-Knowledge-Analysis" - A unified medical vision-language model that integrates multimodal understanding and generation capabilities.
HumanVLM (LLaVA-based): Foundation for Human-Scene Vision-Language Model (Journal of Information Fusion 2025)
🖼️📄E2E Multi-modal Document Preprocessing with Azure Document Intelligence
OllamaMulti-RAG 🚀 is a multimodal AI chat app combining Whisper AI for audio, LLaVA for images, and Chroma DB for PDFs, enhanced with Ollama and OpenAI API. 📄 Built for AI enthusiasts, it welcomes contributions—features, bug fixes, or optimizations—to advance practical multimodal AI research and development collaboratively.
Annuncio generates product advertisements from user inputs, utilizing Aria for descriptions, Allegro for promotional videos, and hashtags for social media discoverability.
Agent ADK Chainlit is a sophisticated multi-modal conversational AI application built on Chainlit that integrates Google ADK agents, document intelligence, web search, and persistent storage capabilities