PhoneAgent (Hashira)

Advanced iOS agent framework leveraging AutoGLM and XCUITest for complex on-device automation and voice interaction.

PhoneAgent is an advanced iOS intelligent agent capable of performing natural language tasks on a real iPhone. It leverages Large Language Models (LLMs) to understand user intent and automates device interactions using iOS UI Testing frameworks.

(Demo video)

Features

  • 🗣️ Voice Interaction:
    • Real-time speech-to-text input.
    • High-quality text-to-speech using Google Chirp 3 HD (via gRPC), with a system-voice fallback.
    • "Keep-alive" mechanism for reliable background audio.
  • 🤖 App Automation:
    • Executes complex workflows (e.g., searching and purchasing on Taobao) across apps.
    • Drives UI interactions through the PhoneAgentUITests target.
  • 💬 Chat Interface:
    • SwiftUI-based chat UI for command input and history.
    • Live Activities integration for status updates on the Lock Screen and Dynamic Island.
  • ⚙️ Configurable:
    • Secure API key management (LLM and Google Cloud) via the iOS Keychain.
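
The "system fallback" for text-to-speech can be sketched as a simple strategy: try the primary Chirp backend first and fall back to the system voice on failure. The names below (`SpeechBackend`, `FallbackSpeaker`) are illustrative, not the project's actual API; on device, the system backend would wrap AVSpeechSynthesizer and the primary backend would stream gRPC audio.

```swift
import Foundation

// Hypothetical sketch of the TTS fallback described above; names are
// illustrative, not PhoneAgent's actual types.
protocol SpeechBackend {
    func synthesize(_ text: String) throws -> Data
}

struct ChirpBackend: SpeechBackend {
    var isReachable: Bool
    func synthesize(_ text: String) throws -> Data {
        guard isReachable else { throw URLError(.notConnectedToInternet) }
        return Data("chirp:\(text)".utf8) // placeholder for gRPC audio bytes
    }
}

struct SystemBackend: SpeechBackend {
    // On device this would wrap AVSpeechSynthesizer; stubbed here.
    func synthesize(_ text: String) throws -> Data {
        Data("system:\(text)".utf8)
    }
}

struct FallbackSpeaker {
    let primary: SpeechBackend
    let fallback: SpeechBackend
    // Return primary audio when available, otherwise the fallback's.
    func speak(_ text: String) -> Data {
        (try? primary.synthesize(text)) ?? (try! fallback.synthesize(text))
    }
}
```

With an unreachable primary, `FallbackSpeaker(primary: ChirpBackend(isReachable: false), fallback: SystemBackend()).speak("hi")` yields the system backend's output.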

Architecture

The project consists of three main components:

  1. PhoneAgent App: The main interface for the user, handling voice/text input and communicating with the orchestration layer.
  2. PhoneAgentUITests: The automation engine that runs in a separate process, driving the actual device interactions.
  3. AppToTestStream: A bidirectional communication channel between the main app and the UITest runner.
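
A channel like AppToTestStream presumably carries structured commands across the process boundary between the app and the test runner. A minimal sketch, assuming a hypothetical Codable envelope (the real AppToTestStream wire format may differ):

```swift
import Foundation

// Hypothetical message envelope for the app <-> UITest-runner channel;
// illustrates serializing structured commands, not the real protocol.
struct AgentCommand: Codable, Equatable {
    enum Action: String, Codable {
        case tap, typeText, launchApp, screenshot
    }
    let id: UUID
    let action: Action
    let target: String?   // accessibility identifier, bundle id, etc.
    let payload: String?  // text to type, query to run, ...
}

// Encode on the app side, decode on the test-runner side.
func encodeCommand(_ command: AgentCommand) throws -> Data {
    try JSONEncoder().encode(command)
}

func decodeCommand(_ data: Data) throws -> AgentCommand {
    try JSONDecoder().decode(AgentCommand.self, from: data)
}
```

Because both ends share the same Codable type, a command round-trips losslessly through the stream.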

Prerequisites

  • Xcode 15+
  • iOS 17+ (Target device)
  • AutoGLM API key (Zhipu AI; accessed via an OpenAI-compatible API)
  • Google Cloud API key (for Text-to-Speech)

Setup

  1. Clone the repository.
  2. Open PhoneAgent.xcodeproj.
  3. Configure Signing & Capabilities for all targets (PhoneAgent, PhoneAgentWidget, etc.) with your Apple Developer Team.
  4. Build and run the PhoneAgent scheme on a physical device.

Usage

  1. Launch the app.
  2. Enter your API Keys when prompted.
  3. Tap the microphone or type a command (e.g., "Help me buy a coffee").
  4. The agent will process the request and begin automation.
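
To "process the request and begin automation", the agent has to turn the LLM's reply into concrete device actions. A hypothetical sketch, assuming the model is prompted to answer with JSON such as `{"action": "tap", "target": "Buy Now"}` (PhoneAgent's actual prompt/response format is not documented here):

```swift
import Foundation

// Hypothetical device-action type and parser; illustrative only.
enum DeviceAction: Equatable {
    case tap(target: String)
    case typeText(target: String, text: String)
    case done
}

// Parse one JSON tool-call emitted by the model into a DeviceAction,
// returning nil for malformed or unrecognized output.
func parseAction(from json: String) -> DeviceAction? {
    guard let data = json.data(using: .utf8),
          let obj = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
          let action = obj["action"] as? String
    else { return nil }

    switch action {
    case "tap":
        guard let target = obj["target"] as? String else { return nil }
        return .tap(target: target)
    case "type_text":
        guard let target = obj["target"] as? String,
              let text = obj["text"] as? String else { return nil }
        return .typeText(target: target, text: text)
    case "done":
        return .done
    default:
        return nil
    }
}
```

Returning nil instead of crashing on malformed output matters here, since LLM replies are not guaranteed to be well-formed JSON.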

Tech Stack

  • SwiftUI & SwiftData
  • AVFoundation (Audio)
  • gRPC Swift (Network)
  • XCUITest (Automation)

Acknowledgements

This project was inspired by:
