scitix/reflux
Reflux


An ultra-fast OCI distribution proxy and caching engine designed to scale AI image delivery across massive HPC-K8s clusters.

中文 | English

✨ Features

  • 🚀 High Performance: Containerd-based proxying with intelligent caching
  • 🔄 Dual Mode Architecture:
    • Proxy Mode: Transparent proxy with blob request optimization
    • SuperNode Mode: Pull images from Registry and store to distributed shared storage
  • 📦 OCI v2 API Compatible: Full support for OCI Registry API v2 specification
  • 🌊 JSONL Streaming: Custom streaming protocol for distributed data retrieval
  • 📦 Chunked Storage: Concurrent downloads with incremental updates on distributed storage
  • 🔍 Smart Caching: On-demand pulling with intelligent cache management
  • 🎯 Image Warmup: CRD-based image preheating from Registry for SuperNode mode
  • 🐳 Kubernetes Native: Helm charts with RBAC and leader election
  • 💾 Distributed Storage: Native support for POSIX-compatible distributed shared storage systems
  • 🔧 Containerd Integration: Seamless integration with Containerd runtime

🏗️ Architecture Overview

Reflux uses a unified service binary whose role is selected by the --blob-mode flag, built on Containerd and backed by distributed shared storage:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Containerd/Node │────│   Proxy Mode    │────│  SuperNode Mode │
│                 │    │                 │    │                 │
│ nerdctl/ctr pull│    │ • Transparent   │    │ • Pull from     │
│ nerdctl/ctr push│    │ • Optimized     │    │   Registry      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │                        │
                              └────────────────────────┘
                                       │
                                ┌─────────────────┐
                                │ Distributed     │
                                │ Shared Storage  │
                                │                 │
                                │ POSIX-compatible│
                                └─────────────────┘
                                       │
                                ┌─────────────────┐
                                │   Registry      │
                                │   (Harbor/etc.) │
                                │                 │
                                │ Private/Upstream│
                                └─────────────────┘


📈 Performance Benchmark

Production Test Results (400-node AI-HPC Cluster)


  • Test Scenario: 400 nodes concurrently pulling a 20 GB AI training model image
  • Client Download Bandwidth: 77.4 GB/s aggregate
  • SuperNode Upstream Bandwidth: 253 MB/s (Harbor Registry)
  • Full Image Download Time: 3 minutes
  • Efficiency: ~306x throughput amplification over the upstream link

🚀 Quick Start

Prerequisites

  • Go 1.19+
  • Containerd (for containerized deployment)
  • Kubernetes 1.19+ (for Kubernetes deployment)
  • Distributed shared storage (Ceph, NFS, S3, etc.) - Required for SuperNode mode

Build

# Clone repository
git clone <repository-url>
cd reflux

# Build proxy server
make build-proxy

# Build controller
make build-controller

# Build all components
make all

Run Proxy Mode

./bin/reflux-proxy --blob-mode=proxy --http-port=8080 --upstream=https://harbor.example.com --storage=/mnt/shared-storage/reflux

Run SuperNode Mode

# Run proxy server
./bin/reflux-proxy --blob-mode=supernode --http-port=8080 --upstream=https://harbor.example.com --storage=/mnt/shared-storage/reflux

# Run controller (with leader election)
./bin/reflux-controller --blob-mode=supernode --storage=/mnt/shared-storage/reflux --leader-elect
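
SuperNode mode supports CRD-based image preheating (see Features). The resource below is a purely hypothetical sketch of what a warmup object might look like — the `apiVersion`, the `ImageWarmup` kind, and the `image` field are illustrative assumptions, not the project's actual schema; consult the Helm chart's CRDs for the real definition:

```yaml
# Hypothetical illustration only — group, kind, and fields are assumed,
# not taken from the Reflux CRDs.
apiVersion: reflux.scitix.io/v1alpha1
kind: ImageWarmup
metadata:
  name: warmup-llm-base
spec:
  image: harbor.example.com/ai/llm-base:latest
```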

🚀 Deployment

Containerd Integration

Reflux is designed to work seamlessly with Containerd. Configure Containerd to use Reflux as a registry mirror:

# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"


# /etc/containerd/certs.d/harbor.example.com/hosts.toml
server = "https://harbor.example.com"

[host."https://localhost:9980"]
  capabilities = ["pull"]
  skip_verify = true
  [host."https://localhost:9980".header]
    x-custom-2 = ["value1", "value2"]

Generate a self-signed certificate

# Generate a private key
openssl genrsa -out tls.key 2048

# Generate a Certificate Signing Request (CSR) with the specified subject
openssl req -new -key tls.key -out tls.csr -subj "/CN=reflux"

# Generate a self-signed certificate valid for 365 days
openssl x509 -req -in tls.csr -signkey tls.key -out tls.crt -days 365

# Create a Kubernetes TLS Secret
kubectl create secret tls reflux-tls --cert=tls.crt --key=tls.key -n reflux-system

# Clean up temporary files (optional)
rm tls.key tls.csr tls.crt

Kubernetes (Helm)

cd deploy/reflux
helm install reflux .

🚀 Use Cases

AI-HPC Kubernetes Clusters

Reflux is specifically designed for AI and High-Performance Computing (HPC) environments where:

  • Large-scale model images: Efficiently cache and distribute large AI/ML model containers
  • Distributed training: Ensure consistent image availability across compute nodes
  • High concurrency: Handle simultaneous image pulls during job scheduling
  • Network optimization: Reduce bandwidth usage and improve pull speeds
