[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Updated May 29, 2025 - Python
An open-source, end-to-end, VLM-based GUI agent
Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
Official implementation of "GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents"
[AAAI-2026] Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
[AAAI 2026 Oral] Official repository for InfiGUI-G1. We introduce Adaptive Exploration Policy Optimization (AEPO) to overcome semantic alignment bottlenecks in GUI agents through efficient, guided exploration.
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.
This is the official website for TuriX Computer-use-Agent
The official code for "GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning"
🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"
A Practical Zoom-in GUI Grounding and Behavior-Based Evaluation method.
Source code of the paper "V-Droid: Advancing Mobile GUI Agent Through Generative Verifiers"
Create your self-hosted, open-source Operator model.
A think-with-image GUI visual grounding model.
This is a quick test of Chinese Scripting Language powered by AI. You can use it to open any text file. No illegal use is allowed! Free for commercial use and academic use.
This dataset contains 3,167 completed tasks of human-computer interactions captured with video, screenshots, DOM snapshots, and detailed interaction events. Created by Paradigm Shift AI for advancing computer use AI agent research.
🛒 An intelligent shopping agent powered by LLMs. Cross-platform search (JD, Taobao, Vipshop), AI-driven product analysis, and smart scoring reports. Web-wide price comparison and an AI-powered shopping decision assistant.
🚀 Generate realistic GUI trajectories using GUI-ReWalk, a framework that enhances automation through reasoning and diverse, high-quality data synthesis.