AI Inference Managed by AI — A single Go binary that detects hardware, resolves optimal configs from a YAML knowledge base, deploys inference engines via K3S, and exposes 94 MCP tools for AI Agents to operate everything.
- Zero-config hardware detection — automatically discovers GPUs (NVIDIA, AMD, Huawei Ascend, Hygon DCU, Apple Silicon), CPU, and RAM
- Knowledge-driven deployment — YAML catalog of hardware profiles, engines, models, and partition strategies; no engine-specific code branches
- Multi-runtime — K3S (Pod) for clusters, Docker for single-node containers, Native (exec) for bare-metal inference
- 94 MCP tools — full programmatic control for AI Agents over hardware, models, engines, deployments, fleet, and more
- Fleet management — mDNS-based auto-discovery of LAN peers; remote tool execution across heterogeneous devices
- Offline-first — all core functions work with zero network; network is enhancement, not requirement
- Single binary, zero CGO — cross-compiles to Windows, macOS, Linux (amd64/arm64) with no C dependencies
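At its core, an MCP tool surface like this reduces to a name-to-handler registry: dotted tool names such as `hardware.detect` or `deploy.list` route to functions. The sketch below is illustrative only (the types and registry are invented for this example; AIMA's real implementation lives in `internal/mcp/`):

```go
package main

import (
	"errors"
	"fmt"
)

// Tool is one tool in the registry: a dotted name plus a handler.
// Illustrative types only, not AIMA's actual API.
type Tool struct {
	Name    string
	Handler func(args map[string]string) (string, error)
}

// Registry dispatches tool calls by name, the way a remote
// `fleet exec <device-id> hardware.detect` routes to a tool.
type Registry struct{ tools map[string]Tool }

func NewRegistry() *Registry { return &Registry{tools: map[string]Tool{}} }

func (r *Registry) Register(t Tool) { r.tools[t.Name] = t }

// Exec looks up the named tool and invokes its handler.
func (r *Registry) Exec(name string, args map[string]string) (string, error) {
	t, ok := r.tools[name]
	if !ok {
		return "", errors.New("unknown tool: " + name)
	}
	return t.Handler(args)
}

func main() {
	r := NewRegistry()
	r.Register(Tool{"hardware.detect", func(map[string]string) (string, error) {
		return "gpu: none (illustrative stub)", nil
	}})
	out, err := r.Exec("hardware.detect", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The same dispatch shape serves the CLI, the MCP server, and remote fleet execution, which is why the CLI can stay a thin wrapper over the tools.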
Grab a pre-built binary from the Releases page, or build from source:

```shell
git clone https://github.com/Approaching-AI/AIMA.git
cd aima
make build
```

For published product releases, the binary installer is one line:

```shell
curl -fsSL https://raw.githubusercontent.com/Approaching-AI/AIMA/master/install.sh | sh
```

On Windows PowerShell:

```shell
irm https://raw.githubusercontent.com/Approaching-AI/AIMA/master/install.ps1 | iex
```

Notes:
- The installer resolves the latest installable `vX.Y.Z` product release instead of GitHub's `latest` release, because bundle tags such as `bundle/stack/2026-02-26` are not product binaries.
- If tags are ahead of published binaries, the installer warns and stays on the latest installable release until the new assets are uploaded.
- Override the source repo for forks with `AIMA_REPO=<owner>/<repo>`.
- Pin a release with `AIMA_VERSION=v0.2.0`.
- The Windows installer currently targets `windows/amd64` and installs to `%LOCALAPPDATA%\Programs\AIMA`.
```shell
# 1. Detect your hardware
aima hal detect

# 2. Initialize infrastructure (installs K3S + HAMi + aima-serve daemon)
#    Downloads airgap images for offline container startup.
#    Requires root for systemd service installation.
sudo aima init

# 3. Deploy a model (auto-resolves engine + config for your hardware)
aima deploy apply --model qwen3.5-35b-a3b
```

After `aima init`, three components are running as systemd services:
| Component | What it does |
|---|---|
| K3S | Container orchestration (containerd, airgap images pre-loaded) |
| HAMi | GPU virtualization for multi-model sharing (skipped on unsupported hardware) |
| aima-serve | API server on 0.0.0.0:6188 with mDNS broadcast |
The server is now discoverable on the LAN and ready to serve inference requests.
On another device with the AIMA binary — no init or serve needed:
```shell
# Discover servers on the LAN via mDNS (no IP needed)
aima discover

# List all discovered AIMA devices
aima fleet devices

# Query a remote device
aima fleet exec <device-id> hardware.detect
aima fleet exec <device-id> deploy.list

# Call the OpenAI-compatible API directly
curl http://<server-ip>:6188/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5-35b-a3b","messages":[{"role":"user","content":"hello"}]}'
```

Every AIMA server hosts a built-in Web UI at `http://<server-ip>:6188/ui/`.
To discover the server IP first, run `aima discover`.
To get a Fleet dashboard that auto-discovers all LAN peers, run `aima serve --discover` on your own device and open `http://localhost:6188/ui/`.
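The same chat-completions call can be issued from Go. A minimal sketch, assuming only the OpenAI-compatible wire format shown in the curl example; the struct names and the `127.0.0.1` address are placeholders (substitute the IP printed by `aima discover`):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// message and chatRequest follow the OpenAI chat-completions wire format.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// buildChatRequest marshals a single-turn prompt into a request body.
func buildChatRequest(model, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
}

func main() {
	body, err := buildChatRequest("qwen3.5-35b-a3b", "hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	// Placeholder address; use the server IP from `aima discover`.
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Post("http://127.0.0.1:6188/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("no server reachable:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```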
`aima init` starts the server without authentication (LAN trust model). To enable API key authentication:

```shell
# Set API key (hot-reloads, no restart needed)
aima config set api_key <your-key>

# All API/MCP/Fleet requests now require: Authorization: Bearer <your-key>
# The Web UI will prompt for the key automatically.

# Remote fleet commands with authentication
aima fleet devices --api-key <your-key>
```

| Vendor | Tested Devices | SDK |
|---|---|---|
| NVIDIA | RTX 4060, RTX 4090, GB10 (Grace Blackwell) | CUDA |
| AMD | Radeon 8060S (RDNA 3.5), Ryzen AI MAX+ 395 | ROCm / Vulkan |
| Huawei | Ascend 910B1 (8× 64GB HBM, Kunpeng-920 aarch64) | CANN |
| Hygon | BW150 DCU (8× 64GB HBM) | DCU |
| Apple | M4 | Metal |
| Intel | CPU-only | — |
| Engine | GPU Support | Format |
|---|---|---|
| vLLM | NVIDIA CUDA, AMD ROCm, Hygon DCU | Safetensors |
| llama.cpp | NVIDIA CUDA, AMD Vulkan, Apple Metal, CPU | GGUF |
| SGLang | NVIDIA CUDA, Huawei Ascend (CANN) | Safetensors |
| Ollama | All (via llama.cpp) | GGUF |
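Engine selection against hardware reduces to a lookup over a matrix like the one above. A sketch of that idea, using an illustrative subset of the table (the backend shorthand names `cuda`, `rocm`, `vulkan`, `metal`, `cann`, `dcu`, `cpu` are invented for this example):

```go
package main

import "fmt"

// engineSupport mirrors an illustrative subset of the support matrix above.
var engineSupport = map[string][]string{
	"vLLM":      {"cuda", "rocm", "dcu"},
	"llama.cpp": {"cuda", "vulkan", "metal", "cpu"},
	"SGLang":    {"cuda", "cann"},
}

// enginesFor lists every engine whose support row includes the backend.
func enginesFor(backend string) []string {
	var out []string
	for engine, backends := range engineSupport {
		for _, b := range backends {
			if b == backend {
				out = append(out, engine)
			}
		}
	}
	return out
}

func main() {
	// Per the table, only llama.cpp lists Apple Metal.
	fmt.Println(enginesFor("metal"))
}
```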
AIMA follows a layered intelligence architecture (L0-L3):
- L0 — YAML knowledge base defaults
- L1 — Human CLI overrides
- L2 — Golden configs from benchmark history
- L3a — Go Agent loop (tool-calling LLM)
The system is built around four invariants: no code branches for engine/model types (YAML-driven), no container lifecycle management (K3S handles it), MCP tools as the single source of truth, and offline-first operation.
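Conceptually, the layering is a precedence merge: each higher layer overrides the keys it sets and inherits the rest. A minimal sketch of that resolution order (the types, layer contents, and config keys here are illustrative, not AIMA's actual code):

```go
package main

import "fmt"

// Layer holds config overrides contributed by one intelligence layer.
// Later layers in the argument list take precedence over earlier ones.
type Layer struct {
	Name     string
	Settings map[string]string
}

// Resolve merges layers in order: L3 over L2 over L1 over L0 defaults.
func Resolve(layers ...Layer) map[string]string {
	out := map[string]string{}
	for _, l := range layers {
		for k, v := range l.Settings {
			out[k] = v
		}
	}
	return out
}

func main() {
	cfg := Resolve(
		Layer{"L0 yaml defaults", map[string]string{"engine": "vllm", "tensor_parallel": "1"}},
		Layer{"L1 cli override", map[string]string{"tensor_parallel": "2"}},
		Layer{"L2 golden config", map[string]string{"max_model_len": "8192"}},
	)
	// L1's override wins over the L0 default; untouched keys pass through.
	fmt.Println(cfg["engine"], cfg["tensor_parallel"], cfg["max_model_len"])
}
```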
See `design/ARCHITECTURE.md` for the full architecture document.
```
cmd/aima/        Entry point + dependency wiring split by domain
internal/
  hal/           Hardware detection
  knowledge/     YAML knowledge base + SQLite resolver
  runtime/       K3S (Pod) + Docker (container) + Native (exec) runtimes
  mcp/           MCP server + 94 MCP tool registrations/implementations
  agent/         Go Agent loop (L3a)
  cli/           Cobra CLI (thin wrappers over MCP tools)
  ui/            Embedded Web UI (Alpine.js SPA)
  proxy/         OpenAI-compatible HTTP proxy
  fleet/         mDNS fleet discovery + remote execution
  sqlite.go      SQLite state store (package state, modernc.org/sqlite, zero CGO)
  model/         Model scan/download/import + metadata detection
  engine/        Engine image management
  stack/         K3S + HAMi infrastructure installer
catalog/
  hardware/      Hardware profile YAML
  engines/       Engine asset YAML
  models/        Model asset YAML
  partitions/    Partition strategy YAML
  stack/         Stack component YAML
```
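Because the catalog is plain YAML, adding hardware or model support means editing data, not code. Purely as an illustration of the shape such an entry might take, a hypothetical model-asset file (every field name below is invented; see the actual files under `catalog/models/` for the real schema):

```yaml
# Hypothetical sketch only — not the real AIMA schema.
id: qwen3.5-35b-a3b
format: safetensors
engines: [vllm, sglang]
min_vram_gb: 48
```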
```shell
make build
# Output: build/aima (or build/aima.exe on Windows)
```

```shell
make all
# Output:
#   build/aima.exe          (windows/amd64)
#   build/aima-darwin-arm64 (macOS/arm64)
#   build/aima-linux-arm64  (linux/arm64)
#   build/aima-linux-amd64  (linux/amd64)
```

```shell
make release-assets
# Output:
#   build/release/<version>/aima-darwin-arm64
#   build/release/<version>/aima-linux-amd64
#   build/release/<version>/aima-linux-arm64
#   build/release/<version>/aima-windows-amd64.exe
#   build/release/<version>/checksums.txt
```

To upload those assets to the matching GitHub release with `gh`:

```shell
make publish-release-assets
```

Annotated SemVer tag pushes such as `v0.2.1` also trigger `.github/workflows/release.yml`, which builds the same assets and uploads them automatically.

```shell
go test ./...
```

Apache License 2.0. See LICENSE for details.