Chat, API, and fine-tuning in your browser. One command to start. Automatic multi-node clustering. Apache 2.0.
```sh
curl -fsSL https://ainode.dev/install | bash
```

That's it. The installer pulls the unified container image, registers a systemd service, and opens the chat UI at http://localhost:3000.
Distributed (multi-node) install:
```sh
AINODE_PEERS="10.0.0.2,10.0.0.3" curl -fsSL https://ainode.dev/install | bash
```

Two DGX Sparks, one sharded model, 244 GB of aggregated VRAM. NCCL over RoCE at 200 Gbps on ConnectX-7.
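When RoCE is in play, NCCL is usually steered with environment variables. A hedged sketch of the kind of settings involved: the variable names are real NCCL options, but the interface and HCA names below are illustrative assumptions, not AINode's actual configuration.

```python
import os

# Sketch only: NCCL_SOCKET_IFNAME, NCCL_IB_HCA, and NCCL_IB_GID_INDEX are
# real NCCL environment variables; the device names here are assumed.
roce_env = {
    "NCCL_SOCKET_IFNAME": "enp1s0f0",  # control-plane NIC (assumed name)
    "NCCL_IB_HCA": "mlx5_0",           # ConnectX-7 device (assumed index)
    "NCCL_IB_GID_INDEX": "3",          # RoCE v2 GID entry (a common default)
}
os.environ.update(roce_env)
print(os.environ["NCCL_IB_HCA"])
```

In practice the installer would detect these values rather than hard-code them.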
| Repo | What it is |
|---|---|
| ainode | Product source — CLI, API, web UI, engines, training |
| ainode.dev | Marketing site at ainode.dev |
Public on both GHCR (canonical) and Docker Hub (mirror):
```sh
# GHCR — canonical, used by the installer, no rate limits
docker pull ghcr.io/getainode/ainode:latest

# Docker Hub — mirror, for discoverability
docker pull argentaios/ainode:latest
```

Most "local AI" tooling bails out the moment your model is bigger than one GPU. The moment you need two, you're writing Ray configs, debugging NCCL, patching vLLM, and wiring up SSH bootstrap — by hand, at 2 AM.
AINode bundles that entire stack into a single container and turns multi-node inference into a UI checkbox:
- Auto-discovery over UDP on your cluster subnet
- Tensor-parallel sharding across every GPU the cluster sees
- Ray head + worker formation via eugr's launcher
- NCCL over RoCE when ConnectX-7 + RDMA are present
- Graceful fallback to single-node when the cluster shrinks
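The auto-discovery step above can be sketched as a broadcast announce/listen pair. Everything below is illustrative: the port, the message shape, and the `vram_gb` field are assumptions, and this sends over loopback where a real deployment would use the cluster subnet's broadcast address.

```python
import json
import socket
import threading
import time

DISCOVERY_PORT = 47800  # hypothetical port; AINode's real port may differ

def announce(stop, port=DISCOVERY_PORT):
    """Periodically broadcast this node's presence."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    msg = json.dumps({"node": socket.gethostname(), "vram_gb": 122}).encode()
    while not stop.is_set():
        # Loopback for the sketch; real code targets the subnet broadcast address.
        s.sendto(msg, ("127.0.0.1", port))
        time.sleep(0.2)

def listen_once(port=DISCOVERY_PORT, timeout=2.0):
    """Wait for one announcement and return (peer_record, peer_ip)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    s.bind(("", port))
    data, addr = s.recvfrom(4096)
    return json.loads(data), addr[0]

stop = threading.Event()
threading.Thread(target=announce, args=(stop,), daemon=True).start()
peer, ip = listen_once()
stop.set()
print(peer["node"], peer["vram_gb"])
```

The real discoverer would keep listening, age out silent peers, and re-plan the cluster when membership changes.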
Whether you have one GB10 or ten, you run the same install command, and the cluster just pools the VRAM.
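The "pool the VRAM, fall back when the cluster shrinks" logic reduces to a small planning function. A minimal sketch, assuming one record per discovered node; the field names and the ~122 GB-per-node figure (two nodes pooling to the 244 GB quoted above) are assumptions.

```python
def plan_cluster(nodes):
    """Decide a launch plan from the currently visible nodes.

    nodes: list of {"gpus": int, "vram_gb": int} records, one per peer.
    Returns (mode, tensor_parallel_size, total_vram_gb); a one-node
    cluster falls back to a plain single-node launch.
    """
    if not nodes:
        raise ValueError("no nodes discovered")
    tp = sum(n["gpus"] for n in nodes)
    vram = sum(n["vram_gb"] for n in nodes)
    mode = "single-node" if len(nodes) == 1 else "multi-node"
    return mode, tp, vram

# Two GB10 nodes at ~122 GB usable each pool to 244 GB.
print(plan_cluster([{"gpus": 1, "vram_gb": 122}, {"gpus": 1, "vram_gb": 122}]))
# → ('multi-node', 2, 244)
```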
AINode doesn't reinvent inference — it composes the best OSS runtimes and makes them boring to operate:
- vLLM — the inference engine
- Ray — cross-node orchestration
- NCCL — patched for 3-node ring on GB10
- eugr/spark-vllm-docker — the blessed GB10 base image
- Hugging Face — model catalog
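How the pieces above compose can be illustrated by the server invocation such a stack would assemble. `--tensor-parallel-size` and `--distributed-executor-backend ray` are real vLLM flags; the model name is a placeholder, and AINode's actual wiring may differ.

```python
def vllm_serve_command(model, tp_size, multi_node):
    """Assemble a vLLM server invocation (sketch, not AINode's exact code)."""
    cmd = ["vllm", "serve", model, "--tensor-parallel-size", str(tp_size)]
    if multi_node:
        # Ray places the sharded workers across nodes.
        cmd += ["--distributed-executor-backend", "ray"]
    return cmd

print(" ".join(vllm_serve_command("org/model-70b", 2, True)))
# → vllm serve org/model-70b --tensor-parallel-size 2 --distributed-executor-backend ray
```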
v0.4.0 shipped (April 2026). Distributed TP=2 verified on real GB10 hardware. See the full state of play in the product README — including what works, what doesn't, and the lessons learned getting to "it just runs."
Powered by argentos.ai · Apache 2.0 · Made with NVIDIA GB10
If this saved you a weekend, consider sponsoring the work.





