
ArqonDB


AI-native distributed database built from scratch in Rust. ArqonDB unifies key-value storage, vector search (DiskHNSW / SPFresh, PQ-encoded), and temporal graph traversal in a single engine — powered by Raft consensus, LSM-tree compaction, and a sharded metadata plane.

Why ArqonDB

  • Unified engine — KV, vector, and temporal graph in one process. No glue code between three separate systems.
  • 6x faster writes than RocksDB on single-node benchmarks with WAL durability.
  • Built for AI agents — causal graph, reactive state, and CAS primitives designed for agent memory and planning.
  • Pure Rust, zero C++ deps — single static binary, no JNI, no CGO.
  • Production topology — Raft consensus, sharded metadata, stateless gateway, Redis RESP2 compatible.

Highlights

Storage     LSM-tree with leveled compaction, MVCC, bloom filters, sharded block cache
Vector      DiskHNSW / SPFresh with PQ encoding, distributed fan-out search
Graph       Temporal edge traversal (BFS), GraphSST with temperature-based zoning
Consensus   Per-shard Raft groups + separate metadata Raft plane
Interfaces  gRPC, Redis RESP2, REST management API, React UI
SDKs        Python, Java, Rust, Go, and C++

Performance

ArqonDB matches or outperforms RocksDB on all single-node benchmarks. Both use page-cache WAL durability (sync=false) — ArqonDB reuses its Raft log double-buffer WAL engine for standalone mode.

Benchmark                               ArqonDB    RocksDB    Ratio
Sequential write (10K keys)             5.29 ms    33.99 ms   6.4x faster
Sequential read (10K keys)              4.20 ms    9.56 ms    2.3x faster
Random read (10K keys)                  5.40 ms    9.01 ms    1.7x faster
Sequential write + flush (100K x 1KB)   105.40 ms  462.95 ms  4.4x faster
cargo bench --bench kv_benchmark



Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                        Clients                              │
└───────────────────────┬─────────────────────────────────────┘
                        │ gRPC
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                    Gateway (stateless)                      │
│         shard-map cache + leader retry + vector merge       │
└───────────────────────┬─────────────────────────────────────┘
                        │
          ┌─────────────┴──────────────┐
          │                            │
          ▼                            ▼
┌──────────────────┐        ┌──────────────────────────────────┐
│  Metadata Plane  │        │       Data Plane                 │
│  (arqondb-meta)  │        │  (arqondb + data-node)           │
│                  │        │                                  │
│  Raft group      │        │  ShardEngine per node            │
│  MetadataState   │        │  LSM-tree per shard              │
│  ShardMap        │        │  HNSW + PQ vector index          │
│                  │        │  Raft per shard group            │
└──────────────────┘        └──────────────────────────────────┘

Three Binaries

Binary Feature Flag Role
metadata_service (none) Standalone metadata Raft group
raft_engine data-node Data node: ShardEngine + gRPC KV server
gateway (none) Stateless routing gateway + management UI

Component Map

src/
├── engine/
│   ├── mem/          # MemTable: skip-list backed, MVCC-ordered
│   ├── sst/          # SST files: data blocks, index blocks, bloom filters
│   ├── wal/          # Write-ahead log: record framing + CRC
│   ├── version/      # VersionSet: LSM level management, compaction
│   ├── background/   # Background compaction and flush tasks
│   ├── vector/       # HNSW + PQ vector index: ANN search per shard
│   └── shard/        # ShardEngine: maps metadata events → local LSM shards
│
├── raft/
│   ├── node.rs       # RaftNode (public handle) + RaftCore (event loop)
│   ├── log.rs        # RaftLog: 1-indexed, sentinel at [0]
│   ├── state.rs      # RaftRole, RaftState transitions
│   └── transport.rs  # Lazy gRPC connections to peers
│
├── metadata/
│   ├── state.rs      # MetadataState: shards, CFs, node registry
│   ├── op.rs         # MetadataOp variants (CreateShard, RegisterNode, …)
│   ├── manager.rs    # MetadataManager: Raft-backed metadata
│   ├── provider.rs   # MetadataProvider trait (local vs remote)
│   └── router.rs     # ShardRouter: (cf, key) → ShardInfo
│
├── network/
│   ├── grpc_service.rs       # KV gRPC service (GrpcKvService + GrpcShardKvService)
│   ├── redis_service.rs      # Redis-compatible TCP server (RESP2 protocol)
│   ├── raft_service.rs       # Raft RPC handler
│   ├── metadata_service.rs   # Metadata gRPC service
│   ├── metadata_client.rs    # MetadataClient (remote MetadataProvider)
│   └── gateway_service.rs    # Stateless routing gateway
│
└── db/
    └── db_impl.rs    # DBImpl: write group, WAL, memtable, compaction

Getting Started

Prerequisites

  • Rust 1.85+ (rustup update stable)
  • protoc is not required: protoc-bin-vendored bundles a prebuilt binary

Build

# Library + metadata + gateway binaries
cargo build

# Data node (requires data-node feature)
cargo build --features data-node --bin raft_engine

# All binaries
cargo build --features data-node

# Build the web UI
cd src/ui && npm install && npm run build

Test

# All tests (~920 tests)
cargo test

# Integration tests (20 tests)
cargo test --test integration_test

Redis Protocol

ArqonDB includes a Redis-compatible TCP server (RedisServer) that speaks RESP2 — the same wire protocol used by Redis itself. Any existing Redis client library or redis-cli can connect without modification.

Architecture

RedisServer is generic over the KvOps trait, so it plugs into two different positions:

Option A — inside the Gateway (recommended for production):

  redis-cli ──RESP2──► RedisServer(GatewayService)
                              │
                    metadata shard lookup
                              │
                    ┌─────────▼───────────┐
                    │  data node (leader) │
                    └─────────────────────┘

Option B — on a single data node (simple / dev):

  redis-cli ──RESP2──► RedisServer(KvService) ──► local LSM-tree

In Option A the Redis client gets exactly the same routing, leader-retry, and fault-tolerance as gRPC clients — there is no extra hop or intermediate service.
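The reason unmodified Redis clients just work is the RESP2 framing itself. A minimal sketch of how a command is encoded on the wire (this is the standard RESP2 format, not ArqonDB-specific code):

```python
def encode_resp2_command(*args: bytes) -> bytes:
    """Frame a command as a RESP2 array of bulk strings."""
    out = b"*%d\r\n" % len(args)          # array header: element count
    for arg in args:
        out += b"$%d\r\n%s\r\n" % (len(arg), arg)  # bulk string: length + payload
    return out

# The exact bytes redis-cli sends for `SET hello world`:
wire = encode_resp2_command(b"SET", b"hello", b"world")
assert wire == b"*3\r\n$3\r\nSET\r\n$5\r\nhello\r\n$5\r\nworld\r\n"
```

Because the framing is length-prefixed, the server never needs to guess token boundaries, which is what makes binary-safe keys and values possible over this protocol.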

Supported commands

String / key commands

Command Description
SET key value [EX s|PX ms|EXAT ts|PXAT ts|KEEPTTL] [NX|XX] [GET] Store a key/value pair with optional TTL and conditional semantics
GET key Get value, or (nil) if absent or expired
MSET key value [key value …] Set multiple keys
MGET key [key …] Get multiple values (array reply)
GETDEL key Get value then delete the key
STRLEN key Length of stored value (0 if absent)
APPEND key value Merge value into key (append-style merge)
EXISTS key [key …] Count how many of the given keys exist (expired keys not counted)
DEL key [key …] Delete keys; returns count deleted
TYPE key Returns "string" or "none"

TTL / expiry commands

Command Description
EXPIRE key seconds Set expiry in seconds; returns 1 if set, 0 if key not found
PEXPIRE key milliseconds Set expiry in milliseconds
EXPIREAT key unix-time-seconds Set absolute expiry (Unix timestamp in seconds)
PEXPIREAT key unix-time-ms Set absolute expiry (Unix timestamp in milliseconds)
TTL key Remaining seconds; -1 = no expiry, -2 = key not found
PTTL key Remaining milliseconds; -1 = no expiry, -2 = key not found
PERSIST key Remove expiry; returns 1 if removed, 0 if no expiry / no key

Expiry is enforced lazily on reads: expired keys are transparently deleted when accessed and return (nil) / 0 / "none" as appropriate.

TTL metadata is stored in an internal column family (CF 1) so it survives restarts and is replicated through Raft like any other write.
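The lazy-expiry read path can be sketched as follows. This is a hypothetical stand-in (plain dicts for CF 0 and CF 1), not the real engine API — it only illustrates the "check expiry on read, delete transparently" behavior described above:

```python
import time

class LazyTtlStore:
    """Sketch: values in one map (CF 0), expiry timestamps in another (CF 1)."""
    def __init__(self):
        self.data = {}   # CF 0: key -> value
        self.ttl = {}    # CF 1: key -> absolute expiry (unix ms)

    def set(self, key, value, px=None):
        self.data[key] = value
        if px is not None:
            self.ttl[key] = time.time() * 1000 + px
        else:
            self.ttl.pop(key, None)      # KEEPTTL not modeled here

    def get(self, key):
        exp = self.ttl.get(key)
        if exp is not None and time.time() * 1000 >= exp:
            # expired: delete transparently and report (nil)
            self.data.pop(key, None)
            self.ttl.pop(key, None)
            return None
        return self.data.get(key)

store = LazyTtlStore()
store.set("session", "tok", px=10)       # 10 ms TTL
time.sleep(0.05)
assert store.get("session") is None      # lazily deleted on read
```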

Connection commands

Command Description
PING [message] Returns PONG (or echoes message)
ECHO message Echo the message back
QUIT Close the connection
SELECT db No-op (only SELECT 0 accepted)

Server info commands

Command Description
DBSIZE Returns 0 (full scan not yet implemented)
INFO [section] Returns basic server info
COMMAND COUNT Returns number of supported commands
COMMAND DOCS / INFO Empty array (compatibility shim)
FLUSHDB / FLUSHALL Returns -ERR (destructive; not supported)

Explicitly unsupported (returns descriptive -ERR)

Category Commands
Atomic ops INCR, DECR, SETNX, GETSET, …
Lists LPUSH, RPUSH, LRANGE, …
Hashes HSET, HGET, HMGET, …
Sets SADD, SMEMBERS, …
Sorted sets ZADD, ZRANGE, …
Pub/Sub SUBSCRIBE, PUBLISH, …
Transactions MULTI, EXEC, …
Scripting EVAL, EVALSHA, …
Key iteration KEYS, SCAN

All key commands operate on USER_COLUMN_FAMILY_ID (CF 0).

Quickstart — Gateway mode (recommended)

# Terminal 1: metadata server
cargo run --bin metadata_service

# Terminal 2: data node
META_SERVER=http://127.0.0.1:8379 DATA_NODE_ID=1 \
  DATA_ADDR=http://127.0.0.1:7379 RAFT_ADDR=127.0.0.1:7380 \
  cargo run --features data-node --bin raft_engine -- /tmp/node1 0.0.0.0:7379

# Terminal 3: gateway — enable Redis on port 6379
GATEWAY_META=http://127.0.0.1:8379 \
  GATEWAY_REDIS_ADDR=0.0.0.0:6379 \
  cargo run --bin gateway

# Any terminal: works with redis-cli out of the box
redis-cli -p 6379 SET hello world
redis-cli -p 6379 GET hello   # → "world"
redis-cli -p 6379 DEL hello

Environment variable

Variable Default Description
GATEWAY_REDIS_ADDR 0.0.0.0:6379 TCP address for the Redis-compatible listener

Using RedisServer in code

// Single-node (direct DB access)
use arqondb::{DBImpl, network::{KvService, redis_service::RedisServer}};
let svc = KvService::new(DBImpl::open("/tmp/mydb").unwrap());
RedisServer::new(svc).serve("0.0.0.0:6379").await.unwrap();

// Inside gateway (shard-routed)
use arqondb::network::{gateway_service::GatewayService, redis_service::RedisServer, MetadataClient};
let (meta, _sub) = MetadataClient::connect("http://127.0.0.1:8379".to_string()).await?;
RedisServer::new(GatewayService::new(meta)).serve("0.0.0.0:6379").await.unwrap();

gRPC KV API

The ArqonDb gRPC service (defined in proto/arqondb.proto) exposes the following key-value operations:

RPC Description
Put(PutRequest) Write a single key-value pair
Get(GetRequest) Read a key (returns found=false when absent)
Delete(DeleteRequest) Delete a single key
Merge(MergeRequest) Merge an operand into an existing value
BatchWrite(BatchWriteRequest) Atomically apply a batch of Put/Delete/Merge
Scan(ScanRequest) Scan keys with optional prefix filter (paginated)
DeleteByPrefix(DeleteByPrefixRequest) Delete all keys matching a prefix

DeleteByPrefix

rpc DeleteByPrefix(DeleteByPrefixRequest) returns (DeleteByPrefixResponse);
Field Type Description
cf uint32 Column family id (0 = default user CF)
prefix bytes Key prefix to match (must be non-empty)

Returns deleted (uint32) — the number of keys that were deleted. In distributed mode the gateway fans out to all shards in the column family and aggregates the count.
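The fan-out-and-aggregate step can be sketched like this, with hypothetical in-memory shard stubs standing in for the real per-shard RPCs:

```python
def delete_by_prefix(shards, prefix: bytes) -> int:
    """Gateway side: fan out to every shard of the CF, sum deleted counts."""
    assert prefix, "prefix must be non-empty"
    return sum(shard.delete_by_prefix(prefix) for shard in shards)

class FakeShard:
    """Stand-in for a data-node shard; not the real ShardEngine API."""
    def __init__(self, keys):
        self.keys = set(keys)
    def delete_by_prefix(self, prefix):
        hit = {k for k in self.keys if k.startswith(prefix)}
        self.keys -= hit
        return len(hit)

shards = [FakeShard([b"user:1", b"user:2"]), FakeShard([b"user:3", b"cfg:x"])]
assert delete_by_prefix(shards, b"user:") == 3   # 2 + 1 across shards
```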


Vector Index (HNSW + PQ)

ArqonDB includes a built-in HNSW (Hierarchical Navigable Small World) vector index for approximate nearest neighbor (ANN) search, with optional Product Quantization (PQ) for memory-efficient large-scale search. Each node manages named vector indices in memory, accessible via gRPC. The gateway provides distributed vector search: fan-out queries to all nodes hosting an index, then merge results by distance to return the global top-k.
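The merge step is straightforward because every node returns its local top-k already sorted by distance ascending. A sketch with hypothetical result shapes (plain `(id, distance)` tuples, not the real RPC types):

```python
import heapq

def merge_topk(per_node_results, k):
    """per_node_results: one ascending-sorted [(id, distance), ...] per node."""
    merged = heapq.merge(*per_node_results, key=lambda r: r[1])
    return list(merged)[:k]

node1 = [(3, 0.1), (7, 0.4)]
node2 = [(9, 0.2), (1, 0.5)]
node3 = [(4, 0.3)]
assert merge_topk([node1, node2, node3], 3) == [(3, 0.1), (9, 0.2), (4, 0.3)]
```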

Architecture

                              Client
                                │  VectorSearch(query, k=10)
                                ▼
                      ┌──────────────────┐
                      │     Gateway      │
                      │  index metadata  │──── VectorIndexMeta
                      │  lookup          │     { name, node_ids }
                      └────────┬─────────┘
                   fan-out to index nodes only
                 ┌─────────────┼─────────────┐
                 ▼             ▼             ▼
           ┌──────────┐ ┌──────────┐ ┌──────────┐
           │  Node 1  │ │  Node 2  │ │  Node 3  │
           │  top-10  │ │  top-10  │ │  top-10  │
           └────┬─────┘ └────┬─────┘ └────┬─────┘
                └─────────────┼─────────────┘
                              ▼
                     merge by distance
                      return top-10

  Per node:
        ┌──────────────────────────────────┐
        │         VectorIndexManager       │
        │   manages named indices (HNSW    │
        │   or PQ-HNSW)                    │
        └──────────┬───────────────────────┘
                   │
     ┌─────────────┴─────────────┐
     │                           │
  ┌─────────────┐     ┌────────────────────┐
  │  HnswIndex  │     │   PqHnswIndex      │
  │  Full f32   │     │   HNSW + PQ codes  │
  │  < 100K vec │     │   > 100K vectors   │
  └─────────────┘     └────────────────────┘

Plain HNSW stores full f32 vectors in every graph node. Simple, highest accuracy, but memory-intensive for large datasets.

PQ-HNSW adds Product Quantization on top of HNSW:

  • Construction: uses exact distances for graph quality (no accuracy loss during build)
  • Search: uses Asymmetric Distance Computation (ADC) via PQ codes for fast beam traversal, then reranks top candidates with exact distances
  • Memory: 128D vectors drop from 512 bytes → 32 bytes per vector (16x savings)
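A sketch of the ADC idea with the defaults above — 128-D vectors, 32 sub-quantizers of 4-D, 256 centroids each, so one byte per sub-vector (512 B → 32 B). The codebooks here are random for illustration; in practice they come from k-means training:

```python
import numpy as np

dim, num_sub, k_cent = 128, 32, 256
sub_dim = dim // num_sub                       # 4 floats per sub-vector
rng = np.random.default_rng(0)
codebooks = rng.standard_normal((num_sub, k_cent, sub_dim)).astype(np.float32)

def encode(v):
    """Quantize each 4-D sub-vector to its nearest centroid id (1 byte)."""
    subs = v.reshape(num_sub, sub_dim)
    return np.array([np.argmin(((codebooks[s] - subs[s]) ** 2).sum(1))
                     for s in range(num_sub)], dtype=np.uint8)

def adc_distance(query, code):
    """Build per-sub lookup tables once per query, then just sum table hits."""
    q = query.reshape(num_sub, sub_dim)
    tables = ((codebooks - q[:, None, :]) ** 2).sum(-1)   # (num_sub, 256)
    return tables[np.arange(num_sub), code].sum()

v = rng.standard_normal(dim).astype(np.float32)
code = encode(v)
assert v.nbytes == 512 and code.nbytes == 32   # the 16x savings
```

Search-time cost per candidate is `num_sub` table lookups instead of a full 128-float distance, which is why the beam traversal is fast; the exact-distance rerank then recovers accuracy for the final top-k.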

Supported distance metrics

Metric Proto value Description
L2 VECTOR_L2 Squared Euclidean distance
Cosine VECTOR_COSINE 1 − cosine similarity
Inner Product VECTOR_INNER_PRODUCT Negative dot product (smaller = more similar)
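All three metrics are oriented so that smaller means closer, which is what lets shard results be merged by plain ascending distance. A reference sketch of the definitions in the table:

```python
import math

def l2(a, b):                 # squared Euclidean
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):             # 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def inner_product(a, b):      # negative dot product (smaller = more similar)
    return -sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [1.0, 0.0]
assert l2(a, b) == 0.0
assert abs(cosine(a, b)) < 1e-9      # identical direction -> distance 0
assert inner_product(a, b) == -1.0
```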

gRPC API

All vector RPCs are part of the ArqonDb gRPC service defined in proto/arqondb.proto.

Create an index

rpc CreateVectorIndex(CreateVectorIndexRequest) returns (CreateVectorIndexResponse);
Field Type Description
index_name string Unique name for this index
config.dim uint32 Vector dimensionality (required, > 0)
config.metric VectorDistanceMetric Distance metric (default: L2)
config.m uint32 Max connections per graph layer (default: 16)
config.ef_construction uint32 Build-time search width (default: 200, higher = better recall)
config.ef_search uint32 Default query-time search width (default: 64)

Insert / update a vector

rpc VectorPut(VectorPutRequest) returns (VectorPutResponse);

Re-inserting the same vector_id replaces the previous vector.

Delete a vector

rpc VectorDelete(VectorDeleteRequest) returns (VectorDeleteResponse);

Search (k-ANN)

rpc VectorSearch(VectorSearchRequest) returns (VectorSearchResponse);
Field Type Description
query repeated float Query vector (must match index dimension)
k uint32 Number of nearest neighbors to return
ef_search uint32 Override search width for this query (0 = use index default)

Returns a list of VectorSearchResult { id, distance } sorted by distance ascending.

Retrieve a vector

rpc VectorGet(VectorGetRequest) returns (VectorGetResponse);

Drop an index

rpc DropVectorIndex(DropVectorIndexRequest) returns (DropVectorIndexResponse);

Distributed vector search

When running multiple data nodes behind the gateway, vector indexes are automatically distributed:

RPC Routing strategy
CreateVectorIndex Broadcast to all nodes; register index → node_ids in metadata
DropVectorIndex Send to nodes hosting the index; remove from metadata
VectorPut / VectorDelete Hash vector_id to pick one node within the index's node set
VectorGet Hash vector_id to the owning node
VectorSearch Fan-out to all nodes hosting the index, merge by distance, return global top-k

The gateway tracks which nodes host each index via VectorIndexMeta in the metadata Raft group. Only nodes that actually host an index participate in fan-out queries — no unnecessary broadcast to the entire cluster.
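The vector_id → node routing rule can be sketched as a deterministic hash over the index's node set. The actual hash function is an internal detail; this stand-in only shows why a later VectorGet lands on the node a VectorPut wrote to:

```python
import hashlib

def owner_node(vector_id: int, node_ids: list) -> int:
    """Pick one owner within the index's node set, deterministically."""
    h = hashlib.sha256(vector_id.to_bytes(8, "little")).digest()
    return node_ids[int.from_bytes(h[:8], "little") % len(node_ids)]

nodes = [1, 2, 3]
# Same id always routes to the same node, so Get finds what Put wrote:
assert owner_node(42, nodes) == owner_node(42, nodes)
assert all(owner_node(i, nodes) in nodes for i in range(100))
```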

Example: Python client

import grpc
from arqondb_pb2 import *
from arqondb_pb2_grpc import ArqonDbStub

channel = grpc.insecure_channel("127.0.0.1:7379")
stub = ArqonDbStub(channel)

# Create a 128-dim L2 index
stub.CreateVectorIndex(CreateVectorIndexRequest(
    index_name="embeddings",
    config=VectorIndexConfig(dim=128, metric=VECTOR_L2),
))

# Insert vectors
for i in range(1000):
    stub.VectorPut(VectorPutRequest(
        index_name="embeddings",
        vector_id=i,
        vector=[float(x) for x in range(128)],  # your embedding here
    ))

# Search
resp = stub.VectorSearch(VectorSearchRequest(
    index_name="embeddings",
    query=[0.0] * 128,
    k=10,
))
for r in resp.results:
    print(f"id={r.id}  distance={r.distance:.4f}")

Using the index in Rust code

Plain HNSW

use arqondb::engine::vector::{HnswConfig, HnswIndex, DistanceMetric};

let config = HnswConfig::new(128, DistanceMetric::Cosine);
let index = HnswIndex::new(config);

index.insert(1, vec![0.1; 128]);
index.insert(2, vec![0.2; 128]);

let results = index.search(&vec![0.15; 128], 5, None);
for r in &results {
    println!("id={} distance={:.4}", r.id, r.distance);
}

PQ-HNSW (memory-efficient)

use arqondb::engine::vector::{
    PqHnswConfig, PqHnswIndex, PqConfig, HnswConfig, DistanceMetric,
};

// Configure HNSW graph + PQ compression
let config = PqHnswConfig {
    hnsw: HnswConfig::new(128, DistanceMetric::L2),
    pq: PqConfig {
        dim: 128,
        num_sub: 32,        // 32 sub-quantizers of 4D each
        num_centroids: 256, // 256 centroids per sub-quantizer → 1 byte per sub
        metric: DistanceMetric::L2,
        max_iter: 20,
    },
    rerank_k: 100, // rerank top-100 ADC candidates with exact distances
};

let index = PqHnswIndex::new(config);

// Insert vectors (graph built with exact distances)
for i in 0..10000u64 {
    index.insert(i, embedding(i));
}

// Train PQ codebooks on the inserted vectors
index.train();

// Search: ADC beam search → exact rerank → top-k
let results = index.search(&query, 10, None);

// Persistence
let bytes = index.to_bytes();
let restored = PqHnswIndex::from_bytes(&bytes).unwrap();

Upgrading an existing HNSW to PQ-HNSW

use arqondb::engine::vector::{HnswIndex, PqHnswIndex, PqConfig, DistanceMetric};

// Start with a plain HNSW index
let hnsw = HnswIndex::new(/* ... */);
// ... insert vectors ...

// Wrap it with PQ (trains codebooks + encodes all vectors)
let pq_config = PqConfig {
    dim: 128, num_sub: 32, num_centroids: 256,
    metric: DistanceMetric::L2, max_iter: 20,
};
let pq_index = PqHnswIndex::from_hnsw(hnsw, pq_config, 100);

HNSW tuning guide

Parameter Default Effect
m 16 Higher = better recall, more memory, slower insert
ef_construction 200 Higher = better graph quality, slower build
ef_search 64 Higher = better recall, slower query. Must be ≥ k
dim 128 Must match your embedding model output dimension

PQ tuning guide

Parameter Default Effect
num_sub 32 Number of sub-quantizers. Higher = less compression, more accuracy. Must divide dim evenly
num_centroids 256 Centroids per sub-quantizer (max 256). Higher = better approximation, slower training
max_iter 20 k-means training iterations. More = better codebooks, slower training
rerank_k 100 Candidates reranked with exact distances. Higher = better recall, slower search. Use 2-4x your typical k

Typical settings by use case:

Use case                          dim   metric        m   ef_construction  ef_search  PQ num_sub     rerank_k
OpenAI text-embedding-3-small     1536  Cosine        16  200              100        192 (8D each)  200
Sentence-BERT                     384   Cosine        16  200              64         48 (8D each)   100
Image embeddings (CLIP)           512   InnerProduct  24  300              128        64 (8D each)   200
Low-latency (< 1ms)               any   L2            8   100              32         dim/4          50
Memory-constrained (1M+ vectors)  128   L2            16  200              64         32 (4D each)   100

When to use PQ-HNSW vs plain HNSW:

Scenario Recommendation
< 100K vectors Plain HNSW — simpler, no accuracy trade-off
100K–10M vectors PQ-HNSW — 16x memory savings with minimal recall loss
Recall > 99% required Plain HNSW, or PQ-HNSW with high rerank_k
Latency-sensitive search PQ-HNSW — ADC table lookups faster than f32 distance

Client SDKs

ArqonDB provides official gRPC client SDKs for five languages. Each SDK wraps the ArqonDb gRPC service and provides idiomatic APIs for KV operations, batch writes, and vector index management.

Language Path Transport Min Version
Python sdk/python/ grpcio Python 3.9+
Java sdk/java/ grpc-java + Netty Java 17+
Rust sdk/rust/ tonic Rust 1.70+
Go sdk/go/ grpc-go Go 1.21+
C++ sdk/cpp/ gRPC C++ C++17

Quick start (Python)

cd sdk/python && pip install grpcio grpcio-tools protobuf && make proto
from arqondb import ArqonDBClient

with ArqonDBClient("127.0.0.1:7379") as client:
    client.put(b"hello", b"world")
    print(client.get(b"hello"))   # b"world"
    client.delete(b"hello")

Quick start (Java)

cd sdk/java && cp ../../proto/arqondb.proto src/main/proto/ && mvn clean compile
try (ArqonDBClient client = new ArqonDBClient("127.0.0.1", 7379)) {
    client.put("hello".getBytes(), "world".getBytes());
    byte[] val = client.get("hello".getBytes()).orElse(null);
    client.delete("hello".getBytes());
}

Quick start (Rust)

# Cargo.toml
[dependencies]
arqondb-client = { path = "sdk/rust" }
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
let mut client = ArqonDBClient::connect("http://127.0.0.1:7379").await?;
client.put(b"hello", b"world", None).await?;
let val = client.get(b"hello", None).await?;
client.delete(b"hello", None).await?;

Quick start (Go)

cd sdk/go && make proto
client, _ := arqondb.NewClient("127.0.0.1:7379")
defer client.Close()
client.Put(ctx, []byte("hello"), []byte("world"))
val, _ := client.Get(ctx, []byte("hello"))
client.Delete(ctx, []byte("hello"))

Quick start (C++)

cd sdk/cpp && mkdir build && cd build && cmake .. && make
arqondb::Client client("127.0.0.1:7379");
client.put("hello", "world");
auto val = client.get("hello"); // std::optional<std::string>
client.del("hello");

All SDKs support the full API: ping, put, get, delete, merge, batch_write, scan, delete_by_prefix, create_vector_index, drop_vector_index, vector_put, vector_delete, vector_search, vector_get. See each SDK's README for detailed documentation.


Demo: Single-Node Cluster

Start all three services, then use the web console to create column families and run KV operations.

Step 1 — Start the services

# Terminal 1: metadata server
RUST_LOG=info cargo run --bin metadata_service

# Terminal 2: data node
RUST_LOG=info \
  META_SERVER=http://127.0.0.1:8379 \
  DATA_NODE_ID=1 \
  DATA_ADDR=http://127.0.0.1:7379 \
  RAFT_ADDR=127.0.0.1:7380 \
  cargo run --features data-node --bin raft_engine -- /tmp/node1 0.0.0.0:7379

# Terminal 3: gateway  (UI at http://localhost:9380)
RUST_LOG=info \
  GATEWAY_META=http://127.0.0.1:8379 \
  GATEWAY_JWT_SECRET=mysecret \
  GATEWAY_USERS="admin:admin:admin" \
  cargo run --bin gateway

Open http://localhost:9380 — login with admin / admin.

Step 2 — Create column families

In the KV Console, run:

CREATECF metrics
CREATECF logs
CREATECF embeddings

Each command allocates a new CF in the metadata Raft group, and the data node automatically opens a dedicated LSM shard for it.

Step 3 — PUT / GET / MERGE / DELETE / TTL

PUT hello world
GET hello               → "world"

PUT user:alice {"age":30,"city":"Berlin"}
GET user:alice          → "{\"age\":30,\"city\":\"Berlin\"}"
GET user:nobody         → (nil)

PUT  counter 10
MERGE counter 5
MERGE counter 3

DELETE hello
GET hello               → (nil)

Via the Redis-compatible interface (once gateway is running on port 6379):

# Basic TTL
redis-cli SET session:abc token123 EX 3600   # expires in 1 hour
redis-cli TTL session:abc                    # → 3599 (remaining seconds)
redis-cli PTTL session:abc                   # → remaining milliseconds

# Conditional set (NX = only if not exists)
redis-cli SET lock:foo 1 NX EX 30

# Inspect / remove expiry
redis-cli PERSIST session:abc                # → 1 (expiry removed)
redis-cli TTL session:abc                    # → -1 (no expiry)

Control Plane UI

The gateway ships a built-in web management console at GATEWAY_MGMT_ADDR (default 0.0.0.0:9380).

Features

Page Description
Dashboard Live stat cards (status, nodes, shards, column families), per-node shard distribution bars
KV Console Terminal-style REPL — GET, PUT, DELETE, MERGE, CREATECF, DROPCF with command history (↑/↓) and CF selector
Users Create, list, and delete gateway users; role assignment (admin / user)
Cluster SVG network topology graph, node table, shard table, one-click rebalancing
Metrics Parsed Prometheus key metrics (RPC requests/errors, cache hits, WAL bytes) + raw scrape output

Build the UI

cd src/ui
npm install
npm run dev      # Dev server on :5173, proxies /api → :9380
npm run build    # Outputs to src/ui/dist/

After npm run build, the gateway serves the compiled bundle automatically.

Environment Variables

Variable Default Description
GATEWAY_ADDR 0.0.0.0:9379 gRPC listen address
GATEWAY_META http://127.0.0.1:8379 Metadata service URL
GATEWAY_MGMT_ADDR 0.0.0.0:9380 Management HTTP listen address
GATEWAY_UI_DIR src/ui/dist Directory containing the built React app
GATEWAY_JWT_SECRET (unset — auth disabled) HMAC secret for JWT signing
GATEWAY_USERS admin:admin:admin Comma-separated user:pass:role seed list

macOS Installation (launchd)

ArqonDB can be installed as a set of macOS background services that auto-start on login. This uses launchd (the native macOS service manager) to run the metadata server, data node, and gateway as persistent services.

Prerequisites

  1. Build the release binaries (or use --no-build if you already have them):

     cargo build --release --features data-node

  2. Install redis-cli for testing. Homebrew does not offer a standalone redis-cli, so install the full Redis package:

     brew install redis

You do not need to start the Redis server. Only redis-cli (included in the package) is used as a client to connect to ArqonDB.

Install the services

# Full install (builds + installs + starts services)
sudo bash scripts/launchd-install.sh

# If you already built release binaries, skip the build step
sudo bash scripts/launchd-install.sh --no-build

This will:

  1. Copy release binaries to /usr/local/bin/arqondb-{metadata,datanode,gateway}
  2. Create data directories at /usr/local/var/arqondb/
  3. Create log directories at /usr/local/var/log/arqondb/
  4. Install launchd plist files to ~/Library/LaunchAgents/
  5. Start all three services in order (metadata → datanode → gateway)

Verify with redis-cli

Once the services are running, connect with redis-cli:

redis-cli -h 127.0.0.1 -p 6379

# Try some commands
127.0.0.1:6379> SET hello world
OK
127.0.0.1:6379> GET hello
"world"
127.0.0.1:6379> DEL hello
(integer) 1

Troubleshooting: If you see Could not connect to Redis at 127.0.0.1:6379: Connection refused, the gateway may not be running yet. Check status with launchctl list | grep arqondb and inspect logs with tail -f /usr/local/var/log/arqondb/*.log.

Service management

# Check status
launchctl list | grep arqondb

# View logs
tail -f /usr/local/var/log/arqondb/*.log

# Stop a service
launchctl bootout gui/$(id -u)/com.arqondb.gateway

# Start a service
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.arqondb.gateway.plist

# Deploy new code (rebuild + rolling restart)
sudo bash scripts/deploy-local.sh

# Uninstall everything
sudo bash scripts/launchd-uninstall.sh

# Uninstall and remove data
sudo bash scripts/launchd-uninstall.sh --clean

Run a 3-Node Data Cluster

# Start metadata (single node)
cargo run --bin metadata_service

# Start three data nodes
for i in 1 2 3; do
  META_SERVER=http://127.0.0.1:8379 \
  DATA_NODE_ID=$i \
  DATA_ADDR=http://127.0.0.1:$((7379 + (i-1)*10)) \
  RAFT_ADDR=127.0.0.1:$((7380 + (i-1)*10)) \
  cargo run --features data-node --bin raft_engine -- /tmp/node$i 0.0.0.0:$((7379 + (i-1)*10)) &
done

# Start gateway
GATEWAY_META=http://127.0.0.1:8379 cargo run --bin gateway

Key Design Decisions

LSM-Tree Storage Engine

  • MemTable: concurrent-safe skip list with MVCC key ordering (user_key ASC, seq DESC, type DESC)
  • WAL: record-framed with hardware-accelerated CRC32C, supports fragmentation for large writes; background sync thread for durability
  • SST: data blocks with prefix compression and configurable restart interval; bloom filters per block range; sharded LRU block cache
  • Compaction: leveled compaction, version-set tracks live files and sequence numbers
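The MVCC ordering above can be illustrated with a sort key: newer versions of a key sort first, so a read at a snapshot just scans forward to the first entry whose sequence is ≤ the snapshot. Hypothetical type tags, not the real encoding:

```python
PUT, DELETE = 1, 0   # hypothetical type tags for illustration

def mvcc_sort_key(entry):
    user_key, seq, typ = entry
    return (user_key, -seq, -typ)    # user_key ASC, seq DESC, type DESC

entries = [(b"a", 5, PUT), (b"a", 9, DELETE), (b"b", 2, PUT), (b"a", 7, PUT)]
entries.sort(key=mvcc_sort_key)
assert entries == [(b"a", 9, DELETE), (b"a", 7, PUT), (b"a", 5, PUT), (b"b", 2, PUT)]

def get_at_snapshot(entries, key, snapshot_seq):
    for k, seq, typ in entries:                  # already MVCC-sorted
        if k == key and seq <= snapshot_seq:
            return None if typ == DELETE else (k, seq)
    return None

assert get_at_snapshot(entries, b"a", 8) == (b"a", 7)   # skips seq 9
assert get_at_snapshot(entries, b"a", 20) is None       # newest is a DELETE
```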

Raft Consensus

  • RaftCore runs in a single tokio::spawn task — no lock contention on consensus state
  • All messages (proposals, peer RPCs, timer ticks) pass through an mpsc::UnboundedSender<RaftMsg>
  • In-flight proposals tracked by log index in pending: HashMap<u64, oneshot::Sender<ProposeResult>>
  • Heartbeat: 50ms; election timeout: 150–300ms (randomized)
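The timer relationship above is the standard Raft safety margin: heartbeats (50 ms) arrive several times per election timeout, and the timeout is randomized so followers rarely expire simultaneously and split the vote. A sketch of that policy:

```python
import random

HEARTBEAT_MS = 50

def new_election_timeout(rng=random.random):
    """Draw a fresh timeout uniformly from [150, 300) ms."""
    return 150 + rng() * 150

timeouts = [new_election_timeout() for _ in range(1000)]
assert all(150 <= t < 300 for t in timeouts)
assert HEARTBEAT_MS * 2 < min(timeouts)   # heartbeats always arrive in time
```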

Metadata Plane

  • Separate Raft group manages shard map, column family registry, and node membership
  • Data nodes subscribe to SubscribeShardEvents stream; gateway caches the shard map locally
  • MetadataProvider trait abstracts local (embedded) vs remote (gRPC) metadata

Write Path

  1. Client → Gateway (shard lookup via metadata)
  2. Gateway → Shard leader's data_addr (gRPC)
  3. GrpcShardKvService → ShardEngine::put/get/delete/merge routes to the correct shard's KvService
  4. KvService::write → RaftNode::propose(WriteBatch::encode_to_bytes())
  5. Raft commits → state machine applies → WAL flush → MemTable insert

Roadmap

  • Compaction: leveled compaction with k-way merge, tombstone GC, merge operator support, TTL-aware expiry, and write-amplification optimization (post-delete skip at base level)
  • Vector index: HNSW index per node for ANN search — L2, Cosine, Inner Product metrics; PQ compression for memory-efficient large-scale search; distributed search via gateway with index-aware fan-out and top-k merge
  • Snapshot & restore: Raft InstallSnapshot for catching up lagging replicas
  • Merge operator: wire through ShardEngine so MERGE reads finalize correctly
  • Benchmarks: write/read throughput vs RocksDB baseline — ArqonDB outperforms RocksDB on all single-node benchmarks including flush-to-SST workloads
  • Client SDKs: gRPC client libraries for Python, Java, Rust, Go, and C++
  • CLI client: arqondb-cli for interactive get/put/scan
  • Control Plane UI: React 18 management console with KV console, cluster topology, metrics
  • ShardEngine gRPC server: data nodes expose ArqonDb gRPC endpoint for gateway routing
  • Redis protocol: RESP2-compatible TCP server — connect any Redis client directly to ArqonDB
  • TTL / expiry: EXPIRE, PEXPIRE, EXPIREAT, PEXPIREAT, TTL, PTTL, PERSIST; SET EX/PX/EXAT/PXAT/NX/XX/GET options

Contributing

Good first issues:

  • Implement DBImpl::iterator (currently todo!())

  • Add DBImpl::flush and compact_range implementations

  • Write benchmarks comparing MemTable throughput to a BTreeMap baseline

Deeper contributions:

  • Add IVF (Inverted File Index) pre-filtering to PQ-HNSW for billion-scale search

How to contribute:

  1. Fork the repo and create a branch
  2. Make your changes with tests
  3. Run cargo test and cargo clippy
  4. Open a PR describing what you changed and why

License

Apache 2.0 — see LICENSE.
