The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra
Reliable Automation Agents at Scale
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements (e.g., an MBTI measurement agent).
Modern video analytics with VLMs
Run Surfer-H agents powered by Holo1 using the Surfer-H-CLI. Includes example tasks, scripts, and configurations.
[NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis
🎭 Real-time voice-controlled 3D avatar with multimodal AI - speak naturally and watch your AI companion respond with perfect lip-sync
Official n8n custom node for VLM Run
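For context on what a custom n8n node like this involves, here is a minimal TypeScript sketch of the standard node skeleton. The class shape and the `n8n-workflow` types are n8n's public extension API, but the node name, parameters, and the idea of forwarding an image URL to a VLM endpoint are illustrative assumptions, not the actual VLM Run node (exact typings also vary slightly between n8n versions).

```typescript
import type {
  IExecuteFunctions,
  INodeExecutionData,
  INodeType,
  INodeTypeDescription,
} from 'n8n-workflow';

// Hypothetical minimal node: the real VLM Run node exposes different
// operations and credentials; this only shows the standard skeleton.
export class VlmExampleNode implements INodeType {
  description: INodeTypeDescription = {
    displayName: 'VLM Example',
    name: 'vlmExample',
    group: ['transform'],
    version: 1,
    description: 'Send an image URL to a vision-language model endpoint',
    defaults: { name: 'VLM Example' },
    inputs: ['main'],
    outputs: ['main'],
    properties: [
      {
        displayName: 'Image URL',
        name: 'imageUrl',
        type: 'string',
        default: '',
      },
    ],
  };

  async execute(this: IExecuteFunctions): Promise<INodeExecutionData[][]> {
    const items = this.getInputData();
    const returnData: INodeExecutionData[] = [];

    for (let i = 0; i < items.length; i++) {
      const imageUrl = this.getNodeParameter('imageUrl', i) as string;
      // A real node would call the provider's API here (e.g. via
      // this.helpers.httpRequest) and attach the model's response.
      returnData.push({ json: { imageUrl, status: 'queued' } });
    }

    return [returnData];
  }
}
```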
A Node-based CLI tool to generate test plans from video recordings using Google's Gemini models.
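As a rough illustration of that workflow, here is a sketch of sending a screen recording to a Gemini model from Node and asking for a test plan. It uses the public `@google/generative-ai` SDK; the model name, prompt, and file handling are assumptions, not the repository's actual code.

```typescript
import { readFileSync } from 'node:fs';
import { GoogleGenerativeAI } from '@google/generative-ai';

// Assumed model name and prompt; the actual tool may use a different
// Gemini variant and the Files API for large recordings.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? '');
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-pro' });

async function generateTestPlan(videoPath: string): Promise<string> {
  const video = readFileSync(videoPath).toString('base64');
  const result = await model.generateContent([
    'Watch this screen recording and draft a step-by-step test plan ' +
      'covering the user flows it demonstrates.',
    { inlineData: { data: video, mimeType: 'video/mp4' } },
  ]);
  return result.response.text();
}

generateTestPlan('recording.mp4').then(console.log).catch(console.error);
```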
An implementation of FastVLM/LLaVA (or any LLM/VLM) using FastAPI (backend) and React (frontend), with Action/Caption modes and frame control.
A GUI agent application based on UI-TARS (a vision-language model) that lets you control your computer using natural language.
This repository contains the frontend code for Ailert.tech, built with Next.js, Tailwind CSS, and Python.
AI-Powered Multi-Camera Vision LLM System for Factory Optimization
AI-powered disaster alert system for Pakistan that automatically processes official emergency warnings and delivers location-targeted alerts to communities in real-time.
👀 Monitor camera streams in real-time with AI vision models for object detection, contextual understanding, and intelligent video search.
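To make the pattern concrete, here is a hedged TypeScript sketch of the core loop such a monitor might run: grab a frame, send it to an OpenAI-compatible vision endpoint, and log the description. The endpoint, model id, and frame source are assumptions, not the project's actual implementation.

```typescript
// Minimal monitoring-loop sketch (Node 18+, global fetch).
// Assumes an OpenAI-compatible /v1/chat/completions endpoint that
// accepts image_url content parts; the real project may differ.
const ENDPOINT =
  process.env.VLM_ENDPOINT ?? 'http://localhost:8000/v1/chat/completions';

async function describeFrame(frameBase64: string): Promise<string> {
  const res = await fetch(ENDPOINT, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'vision-model', // placeholder model id
      messages: [
        {
          role: 'user',
          content: [
            { type: 'text', text: 'Describe what is happening in this camera frame.' },
            { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${frameBase64}` } },
          ],
        },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content as string;
}

// Poll a hypothetical frame grabber every few seconds and log results.
function monitor(getFrame: () => Promise<string>): void {
  setInterval(async () => {
    const description = await describeFrame(await getFrame());
    console.log(new Date().toISOString(), description);
  }, 5000);
}
```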
An AI-powered location discovery system using multi-modal data (text, images, reviews, real-time factors)