PhoneAgent is an intelligent iOS agent that performs natural-language tasks on a real iPhone. It uses Large Language Models (LLMs) to understand user intent and automates device interactions through the iOS UI testing framework.
- 🗣️ Voice Interaction:
- Real-time speech-to-text input.
- High-quality Text-to-Speech using Google Chirp 3 HD (via gRPC) with System fallback.
- "Keep-Alive" mechanism for robust background audio.
- 🤖 App Automation:
- Executes complex workflows (e.g., searching and purchasing on Taobao) across apps.
- Uses `PhoneAgentUITests` to drive the UI interactions.
- 💬 Chat Interface:
- SwiftUI-based chat UI for command input and history.
- Live Activities integration for status updates on the Lock Screen and Dynamic Island.
- ⚙️ Configurable:
- Secure API Key management (LLM & Google) via Keychain.
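The secure key storage mentioned above can be sketched with the Keychain Services API. This is an illustrative wrapper, not the app's actual implementation; the service name `"PhoneAgent"` and the helper type are assumptions.

```swift
import Foundation
import Security

/// Minimal sketch of Keychain-backed API key storage.
/// The service name "PhoneAgent" is illustrative only.
enum KeychainStore {
    static func save(key: String, value: String) {
        let base: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: "PhoneAgent",
            kSecAttrAccount as String: key,
        ]
        // Remove any existing item, then store the new value.
        SecItemDelete(base as CFDictionary)
        var attributes = base
        attributes[kSecValueData as String] = Data(value.utf8)
        SecItemAdd(attributes as CFDictionary, nil)
    }

    static func load(key: String) -> String? {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrService as String: "PhoneAgent",
            kSecAttrAccount as String: key,
            kSecReturnData as String: true,
            kSecMatchLimit as String: kSecMatchLimitOne,
        ]
        var result: AnyObject?
        guard SecItemCopyMatching(query as CFDictionary, &result) == errSecSuccess,
              let data = result as? Data else { return nil }
        return String(data: data, encoding: .utf8)
    }
}
```

Storing keys this way keeps them out of `UserDefaults` and source control; the Keychain encrypts items at rest and scopes them to the app's access group.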
The project consists of three main components:
- PhoneAgent App: The main interface for the user, handling voice/text input and communicating with the orchestration layer.
- PhoneAgentUITests: The automation engine that runs in a separate process, driving the actual device interactions.
- AppToTestStream: A bidirectional communication channel between the main App and the UITest runner.
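The README does not specify the transport behind `AppToTestStream`; one way to model the app-to-runner direction is a stream of JSON-encodable command messages. The type and field names below are hypothetical, for illustration only.

```swift
import Foundation

/// Hypothetical sketch of the app ➝ UITest-runner channel.
/// Each message is one LLM-planned step for the automation engine.
struct AgentCommand: Codable {
    let action: String   // e.g. "tap", "type"
    let target: String   // accessibility identifier of the element
    let payload: String? // optional text to type
}

final class CommandStream {
    let commands: AsyncStream<AgentCommand>
    private let continuation: AsyncStream<AgentCommand>.Continuation

    init() {
        var c: AsyncStream<AgentCommand>.Continuation!
        commands = AsyncStream { c = $0 }
        continuation = c
    }

    /// Called on the app side to forward a planned step.
    func send(_ command: AgentCommand) {
        continuation.yield(command)
    }
}
```

The UITest runner would `for await` on `commands` and translate each step into XCUITest calls; because the runner lives in a separate process, the real channel needs an inter-process transport (e.g. a loopback network connection) underneath this interface.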
- Xcode 15+
- iOS 17+ (Target device)
- AutoGLM API Key (Zhipu AI, OpenAI-compatible API)
- Google Cloud API Key (for Text-to-Speech)
- Clone the repository.
- Open `PhoneAgent.xcodeproj`.
- Configure Signing & Capabilities for all targets (`PhoneAgent`, `PhoneAgentWidget`, etc.) with your Apple Developer Team.
- Build and run the PhoneAgent scheme on a physical device.
- Launch the app.
- Enter your API Keys when prompted.
- Tap the microphone or type a command (e.g., "Help me buy a coffee").
- The agent will process the request and begin automation.
- SwiftUI & SwiftData
- AVFoundation (Audio)
- gRPC Swift (Network)
- XCUITest (Automation)
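Driving another app from the UITest target looks roughly like the sketch below. The bundle identifier, element queries, and labels are assumptions for illustration, not taken from the project.

```swift
import XCTest

final class PhoneAgentDriverTests: XCTestCase {
    func testSearchFlow() {
        // Bundle identifier and accessibility labels are hypothetical.
        let app = XCUIApplication(bundleIdentifier: "com.example.shopping")
        app.launch()

        // Wait for the search field, then enter the query.
        let searchField = app.searchFields.firstMatch
        XCTAssertTrue(searchField.waitForExistence(timeout: 10))
        searchField.tap()
        searchField.typeText("coffee")
        app.keyboards.buttons["Search"].tap()
    }
}
```

`XCUIApplication(bundleIdentifier:)` is what lets the runner target apps other than the one under test, which is the basis for cross-app workflows like the Taobao example above.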
This project was inspired by: