SaswatRay2505/ProjectZ

NEXUS — Networked Executive for Unified Services

A JARVIS-like personal AI assistant with agentic task execution, voice interface, local machine control, and autonomous browser automation.

Features

Phase 0 — Foundation

  • Chat interface with streaming responses (Next.js + FastAPI + WebSocket)
  • Conversation persistence in PostgreSQL
  • LLM integration via OpenAI API with configurable models

Phase 1 — Task Automation Core

  • Agentic orchestration — ReAct loop with multi-step tool execution
  • LLM Router — automatic model selection with budget-aware downgrade
  • 29 built-in tools with supervised approval for destructive actions
  • Google OAuth2 — Gmail (search, read, send, reply) and Calendar (CRUD) tools
  • File system tools — read, write, list directories
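
The agentic orchestration and supervised approval described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual orchestrator: the `Tool` dataclass, `react_loop`, the `decide` callback (standing in for the LLM call), and the `approve` callback are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    destructive: bool = False  # destructive tools require user approval


def react_loop(decide, tools, approve, max_steps=5):
    """Alternate between reasoning (decide) and acting (tool calls).

    decide(history) stands in for the LLM and returns either
    ("final", answer) or ("tool", tool_name, argument).
    """
    history = []
    for _ in range(max_steps):
        action = decide(history)
        if action[0] == "final":
            return action[1], history
        _, name, arg = action
        tool = tools[name]
        if tool.destructive and not approve(name, arg):
            observation = "denied by user"
        else:
            observation = tool.run(arg)
        # The observation is fed back into the next reasoning step.
        history.append((name, arg, observation))
    return "step limit reached", history


# Scripted stand-in for the model: read a file, try to write it, then finish.
def scripted_decide(history):
    if len(history) == 0:
        return ("tool", "read_file", "notes.txt")
    if len(history) == 1:
        return ("tool", "write_file", "notes.txt")
    return ("final", f"done after {len(history)} tool calls")


tools = {
    "read_file": Tool("read_file", lambda path: f"contents of {path}"),
    "write_file": Tool("write_file", lambda path: f"wrote {path}", destructive=True),
}

# The approval callback rejects everything, so the write is never executed.
answer, history = react_loop(scripted_decide, tools, approve=lambda name, arg: False)
```

In a real deployment the `approve` callback would presumably surface a confirmation prompt in the chat UI rather than return a constant.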

Phase 2 — Knowledge & Recall

  • Web search via SearXNG (self-hosted metasearch) with DDGS fallback
  • Vector memory — ChromaDB-backed semantic recall (episodic + long-term facts)
  • Fact extraction — automatic extraction of user facts from conversations
  • Daily briefings — scheduled summary of calendar, email, and news
  • Scheduled tasks — natural language scheduling ("remind me in 5 minutes", "every day at 9am")
  • Real-time notifications — SSE-based live delivery of scheduled task results
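
To make the natural language scheduling examples above concrete, here is a small regex-based sketch that handles only the two quoted phrasings. How NEXUS actually parses schedules is not stated in this README; `parse_schedule` and its return shape are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Seconds per supported time unit.
UNITS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_schedule(text, now):
    """Return a one-off or daily schedule dict, or None if nothing matches."""
    text = text.lower()

    # One-off delay, e.g. "remind me in 5 minutes"
    m = re.search(r"\bin (\d+) (second|minute|hour|day)s?\b", text)
    if m:
        delay = timedelta(seconds=int(m.group(1)) * UNITS[m.group(2)])
        return {"kind": "once", "run_at": now + delay}

    # Daily recurrence, e.g. "every day at 9am" or "every day at 12:30 pm"
    m = re.search(r"\bevery day at (\d{1,2})(?::(\d{2}))?\s*(am|pm)\b", text)
    if m:
        # 12am maps to hour 0 and 12pm to hour 12.
        hour = int(m.group(1)) % 12 + (12 if m.group(3) == "pm" else 0)
        minute = int(m.group(2) or 0)
        return {"kind": "daily", "hour": hour, "minute": minute}

    return None


now = datetime(2024, 1, 1, 12, 0)
once = parse_schedule("remind me in 5 minutes", now)
daily = parse_schedule("every day at 9am", now)
```

A rule-based parser like this is easy to test but brittle; delegating the parse to the LLM and validating its structured output is another plausible design.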

Phase 3 — Advanced Capabilities

  • Voice interface — OpenAI Whisper STT + sentence-streaming TTS with VAD (auto-silence detection)
  • Local machine tools — shell, file manager, git, process manager, clipboard, screenshot, app launcher
  • Browser control — AppleScript/JXA-based Chrome control (navigate, click, type, read, tabs)
  • Autonomous browser automation — vision-powered multi-step browser tasks with GPT-4o:
    • Annotates interactive elements with numbered labels
    • Takes screenshots and reasons about the page
    • Clicks, types, scrolls, and navigates autonomously
    • Hands control back for payment, OTP, CAPTCHA, or login
  • Budget tracking — Redis-backed token/cost monitoring with chat-accessible budget_status tool
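
The budget tracking above pairs naturally with the LLM Router's budget-aware downgrade. One way such a downgrade could work is sketched below, under stated assumptions: the model names, prices, and the `pick_model` function are hypothetical, and the spend figure is passed in directly rather than read from Redis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float  # illustrative price, not a real rate


# Ordered from most to least capable; names and prices are placeholders.
MODELS = [
    ModelOption("large-model", 0.010),
    ModelOption("small-model", 0.001),
]


def pick_model(estimated_tokens, spent, daily_budget, models=MODELS):
    """Return the most capable model whose estimated cost fits the remaining budget."""
    remaining = daily_budget - spent
    for model in models:
        cost = estimated_tokens / 1000 * model.cost_per_1k_tokens
        if cost <= remaining:
            return model.name
    return None  # budget exhausted; the caller decides how to respond


plenty = pick_model(2000, spent=0.0, daily_budget=1.0)
tight = pick_model(2000, spent=0.99, daily_budget=1.0)
empty = pick_model(2000, spent=1.0, daily_budget=1.0)
```

Here `plenty` selects the larger model, `tight` falls back to the smaller one, and `empty` returns `None` because no option fits.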

Architecture

┌─────────────┐     WebSocket      ┌──────────────────────┐
│  Next.js    │◄──────────────────►│  FastAPI Backend     │
│  Frontend   │                    │  (runs natively)     │
│  :3000      │                    │  :8000               │
└─────────────┘                    └──────┬───────────────┘
                                          │
                    ┌─────────────────────┼─────────────────────┐
                    │                     │                     │
              ┌─────▼─────┐        ┌─────▼─────┐        ┌─────▼─────┐
              │ PostgreSQL │        │   Redis   │        │ ChromaDB  │
              │   :5432    │        │   :6379   │        │   :8100   │
              └────────────┘        └───────────┘        └───────────┘
                                                               │
                                                         ┌─────▼─────┐
                                                         │  SearXNG  │
                                                         │   :9090   │
                                                         └───────────┘

The backend runs natively on the host (not in Docker) so that local machine tools (shell, clipboard, screenshot, browser control) have direct OS access. The infrastructure services and the frontend run in Docker.

Quick Start

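# 1. Clone the repository, then configure from the repository root
cp .env.example .env
# Edit .env and set OPENAI_API_KEY at minimum

# 2. Create a Python virtualenv and install the backend
cd backend
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
# Return to the repository root for the remaining steps
cd ..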

# 3. Start infrastructure services
docker compose up -d postgres redis chromadb searxng

# 4. Start the backend natively (enables local machine tools)
./backend/scripts/run-local.sh

# 5. Start the frontend (via Docker)
docker compose up -d nexus-web

# 6. Open browser
open http://localhost:3000

Google OAuth (Gmail + Calendar)

  1. Create OAuth credentials in Google Cloud Console
  2. Add http://localhost:8000/api/oauth/google/callback as an authorized redirect URI
  3. Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET in .env
  4. Visit http://localhost:8000/api/oauth/google/authorize to connect
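
For orientation, the authorize step typically redirects the browser to Google's OAuth 2.0 consent endpoint with the redirect URI registered in step 2. The sketch below builds such a URL with the standard library only. The scope list and the `build_authorize_url` helper are assumptions for illustration; the scopes the project actually requests are not listed in this README.

```python
from urllib.parse import parse_qs, urlencode, urlparse

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
REDIRECT_URI = "http://localhost:8000/api/oauth/google/callback"

# Assumed scopes covering Gmail and Calendar access; adjust to what the app needs.
SCOPES = [
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/calendar",
]


def build_authorize_url(client_id, state):
    """Build the consent-screen URL for the authorization-code flow."""
    params = {
        "client_id": client_id,
        "redirect_uri": REDIRECT_URI,  # must match the URI registered in the console
        "response_type": "code",
        "scope": " ".join(SCOPES),
        "access_type": "offline",  # ask for a refresh token
        "state": state,  # opaque value echoed back, used to guard against CSRF
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


url = build_authorize_url("example-client-id", "random-state")
query = parse_qs(urlparse(url).query)
```

If Google reports a redirect URI mismatch, comparing `query["redirect_uri"]` against the value entered in the Cloud Console is a quick first check.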

Chrome Browser Control (macOS)

For the browser_control and browse_web tools to work:

  1. Open Chrome
  2. Go to View > Developer > Allow JavaScript from Apple Events (enable it)

Services

Service                 URL                     Runs In
Frontend (Next.js)      http://localhost:3000   Docker
Backend API (FastAPI)   http://localhost:8000   Native
PostgreSQL              localhost:5432          Docker
Redis                   localhost:6379          Docker
ChromaDB                localhost:8100          Docker
SearXNG                 localhost:9090          Docker
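
When something fails to start, a quick reachability check against the ports listed above can narrow down which service is missing. The script below is a hypothetical helper, not part of the repository; it only tests whether a TCP connection succeeds and says nothing about whether the service behind the port is healthy.

```python
import socket

# Ports taken from the services list in this README.
SERVICES = {
    "frontend": 3000,
    "backend": 8000,
    "postgres": 5432,
    "redis": 6379,
    "chromadb": 8100,
    "searxng": 9090,
}


def is_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:10s} {'up' if up else 'DOWN'}")
```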

Development

# Backend with hot-reload (local mode)
./backend/scripts/run-local.sh

# Rebuild frontend after changes
docker compose up --build -d nexus-web

# View backend logs
# (output streams to terminal running run-local.sh)

# API docs
open http://localhost:8000/docs

Project Structure

ProjectZ/
├── backend/
│   ├── app/
│   │   ├── api/routes/       # REST + WebSocket endpoints
│   │   ├── core/             # LLM client, orchestrator, streaming
│   │   ├── db/               # Database setup, migrations
│   │   ├── models/           # SQLAlchemy models
│   │   ├── services/         # Business logic (memory, scheduler, etc.)
│   │   └── tools/            # All 29 tool implementations
│   ├── scripts/              # run-local.sh
│   └── pyproject.toml
├── frontend/
│   └── src/
│       ├── app/              # Next.js pages
│       ├── components/       # React components
│       ├── hooks/            # useChat, useVoice, useSchedulerNotifications
│       └── types/            # TypeScript interfaces
├── docker/                   # SearXNG, PostgreSQL, Redis configs
├── docker-compose.yml
└── docs/                     # Architecture documentation

About

This project is an attempt to build a state-of-the-art AI assistant.