
Flow

A peer-to-peer network for storing, sharing, and searching knowledge.

Flow lets you store content locally with full ownership, share it selectively with others, and search across the entire network. Every piece of content is tracked with provenance, and contributions can be rewarded automatically.

What Flow Does

| Capability | Description |
| --- | --- |
| Local-first storage | Your data stays on your machine, content-addressed and cryptographically verified |
| Selective sharing | Share specific content with specific people or agents — you control access |
| Semantic search | Find content by meaning, not just keywords — works across local and network data |
| Provenance tracking | Know where content came from and how it's been used |
| Programmable rewards | Define rules for compensating valuable contributions (coming soon) |

Concept Diagram

(Figure: Flow concept diagram)

The core idea: Each user controls their own Knowledge Base. They can share parts of it with others on the network. The network enforces access rules and tracks contributions.

Core Principles

| Principle | Description |
| --- | --- |
| Local-first | Your data lives on your machine; cloud is optional |
| Content-addressed | Data is identified by its hash (CID), not location |
| Verifiable | Every claim can be cryptographically verified |
| Capability-based | Access controlled by tokens you grant, not central authority |
| Peer-to-peer | Nodes connect directly; no central server required |
| Provenance-tracked | Origin and history of all content is preserved |
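
The content-addressed principle is easiest to see with a hash: because the identifier is derived from the bytes themselves, identical content yields the same ID on every machine. Flow uses IPLD CIDs rather than bare digests, so the SHA-256 sketch below only illustrates the idea:

```bash
# Illustration only: Flow uses IPLD CIDs, not bare SHA-256 digests,
# but the principle is the same -- the ID is a function of the bytes,
# not of where they are stored.
echo -n "hello flow" | sha256sum   # same bytes => same ID, on any machine
echo -n "hello floW" | sha256sum   # different bytes => entirely different ID
```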

Architecture

Flow is organized into four architectural layers:

Foundation Layer

Core infrastructure for data and connectivity.

| Component | Purpose | Spec |
| --- | --- | --- |
| Storage | Content-addressed storage (IPLD/CIDs), RocksDB block store | spec |
| Identity & Auth | DIDs, verifiable credentials, capability tokens | spec |
| Network | libp2p, peer discovery, GossipSub messaging | spec |

Intelligence Layer

Understanding and context for content.

| Component | Purpose | Spec |
| --- | --- | --- |
| Knowledge Graph | Semantic indexing, embeddings, vector search | spec |
| Agents | Autonomous actors (Sense→Learn→Reason→Predict→Act) | spec |
| MCP | Model Context Protocol for AI/tool integration | spec |

Execution Layer

Coordination and computation.

| Component | Purpose | Spec |
| --- | --- | --- |
| Coordination | State sync across nodes, conflict resolution | spec |
| Workflows | DAG-based task execution with signed transitions | spec |
| Compute | Distributed execution (local, Bacalhau, etc.) | spec |

Experience Layer

User-facing capabilities.

| Component | Purpose | Spec |
| --- | --- | --- |
| User Interface | Web/mobile/desktop apps for content management | spec |
| Incentives | Programmable rewards and reputation tracking | spec |

How It Works Today

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Add Files  │ ──► │   Index &   │ ──► │   Search    │
│  to Space   │     │   Embed     │     │  Locally    │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Publish   │ ◄── │  Discover   │ ◄── │   Connect   │
│  to Network │     │    Peers    │     │  to Network │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
                                        ┌─────────────┐
                                        │  Federated  │
                                        │   Search    │
                                        └─────────────┘
```
  1. Add content to a local space (files are chunked and hashed)
  2. Automatic indexing generates embeddings for semantic search
  3. Search locally using natural language queries
  4. Connect to network and discover peers via mDNS
  5. Publish content to make it available to other nodes
  6. Federated search queries both local and network content
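
As a sketch of how these steps might look against the node's REST API (port 8080): the endpoint paths and payloads below are hypothetical placeholders for illustration, not the actual routes.

```bash
# Hypothetical sketch of the steps above. Endpoint paths and payloads are
# placeholders -- the node's actual REST routes may differ.

# 1-2. Create a space and add a file; chunking, hashing, and embedding
#      happen automatically on the node
curl -X POST http://localhost:8080/spaces \
     -H 'Content-Type: application/json' -d '{"name":"notes"}'
curl -X POST http://localhost:8080/spaces/notes/files -F "file=@paper.pdf"

# 3. Search locally with a natural-language query
curl "http://localhost:8080/search?q=consensus+protocols&scope=local"

# 4. Peer discovery happens automatically via mDNS once the node is running

# 5. Publish the content so other nodes can discover it
curl -X POST http://localhost:8080/spaces/notes/publish

# 6. Federated search across local and network content
curl "http://localhost:8080/search?q=consensus+protocols&scope=network"
```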

Vision (Coming Soon)

  • 🔮 Autonomous agents that act on your behalf
  • 🔮 DAG-based workflows with verifiable execution
  • 🔮 Programmable rewards for contributions
  • 🔮 Decentralized compute marketplace

Implementation Status

Last Updated: 2026-01-23 | Overall Progress: ~95% of core features complete

| Layer | Status | Description | Code Location |
| --- | --- | --- | --- |
| Storage | ✅ Complete | IPLD/CIDs, RocksDB block store, DAG builder, 3 chunking algorithms | back-end/node/src/modules/storage/ |
| Access & Auth | 🚧 Partial | WebAuthn registration/login implemented; DIDs and VCs planned | back-end/node/src/modules/identity/ |
| Network | ✅ Complete | libp2p (Kademlia, GossipSub, mDNS), content transfer protocol, peer registry | back-end/node/src/modules/network/ |
| Coordination & Sync | ✅ Complete | GossipSub pub/sub with persistent message store, content announcements | back-end/node/src/modules/network/gossipsub/ |
| Knowledge Graph | ✅ Complete | Semantic indexing (FastEmbed), Qdrant vector storage, RAG pipeline | back-end/node/src/modules/ai/ |
| Distributed Search | ✅ Complete | SearchRouter, federated search, live query engine, result aggregation | back-end/node/src/modules/query/ |
| User Interface | ✅ Complete | React/Vite web app with spaces, search, content management | user-interface/flow-web/ |
| Agent | 📋 Planned | SLRPA agent framework — next development phase | |
| MCP | 📋 Planned | Model Context Protocol integration | |
| Execution | 📋 Planned | DAG workflow engine | |
| Compute | 📋 Planned | Bacalhau integration for distributed compute | |
| Incentive | 📋 Planned | Reward and reputation system | |

What You Can Do Today

  • Store content locally — Files chunked (FastCDC/Fixed/Rabin), hashed (CID), stored in RocksDB
  • Build DAGs — Large files automatically structured as Merkle trees (174-fanout; a back-of-envelope sketch follows this list)
  • Semantic search — Natural language queries via FastEmbed + Qdrant
  • Distributed search — Query local spaces + network content in a single request
  • Create spaces — Organize content into named, indexed collections
  • Publish to network — Announce content via DHT + GossipSub
  • Retrieve from network — Fetch content by CID from remote peers
  • Remote indexing — Automatically index discovered network content
  • Peer discovery — mDNS local discovery + Kademlia DHT
  • Web interface — Full-featured React app for content management
  • REST API — 20+ endpoints for programmatic access
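
A back-of-envelope sketch of the DAG shape: with 174 links per node, even very large files stay shallow. The ~1 MiB average chunk size below is an assumption for illustration; actual sizes depend on the chunking algorithm.

```bash
# Back-of-envelope: DAG depth for a 10 GiB file, assuming ~1 MiB average
# chunks (illustrative; real chunk sizes depend on the chunker).
chunk=$(( 1024 * 1024 ))              # assumed average chunk size: 1 MiB
size=$(( 10 * 1024 * 1024 * 1024 ))   # example file size: 10 GiB
fanout=174                            # links per DAG node (from the docs above)

leaves=$(( (size + chunk - 1) / chunk ))    # number of leaf chunks
depth=0; nodes=$leaves
while (( nodes > 1 )); do                   # each level groups 174 children
  nodes=$(( (nodes + fanout - 1) / fanout ))
  depth=$(( depth + 1 ))
done
echo "$leaves leaf chunks, DAG depth $depth"   # 10240 leaf chunks, DAG depth 2
```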

Coming Soon

  • 🚧 Decentralized identity (DIDs) and verifiable credentials
  • 📋 Autonomous agents (SLRPA: Sense→Learn→Reason→Predict→Act)
  • 📋 DAG-based workflow execution
  • 📋 Distributed compute marketplace
  • 📋 Programmable incentives and rewards

For the detailed roadmap, see specs/ROADMAP.md.


Getting Started

Prerequisites

Before you begin, ensure you have the following installed:

| Tool | Version | Purpose |
| --- | --- | --- |
| Rust | 1.75+ | Backend development |
| Cargo | Latest | Rust package manager |
| Node.js | 18+ | Frontend development |
| npm/yarn/pnpm | Latest | Node package manager |
| Docker | 20+ | Running Qdrant and Redis |
| nx | Latest | Monorepo task runner |

Install nx globally:

npm install -g nx

Quick Start

One Command (after initial setup)

nx start-all flow

This starts everything in parallel: Docker (Qdrant + Redis), backend node, and web UI.

First Time Setup

  1. Clone the repository:

    git clone https://github.com/your-org/flow.git
    cd flow
  2. Copy environment file:

    cp .env.example .env

    Note: .env is gitignored - never commit files containing secrets

  3. Install frontend dependencies:

    nx install-all user-interface
  4. Build the backend:

    nx build back-end
  5. Start all services:

    nx start-all flow
  6. Open the web UI in your browser (by default, the Vite dev server runs at http://localhost:5173)

Stop All Services

nx stop-all flow

Environment Variables

Copy .env.example to .env and configure as needed.

Security Warning

  • Never commit .env files containing secrets to version control
  • The .env file is already in .gitignore - keep it that way
  • Only .env.example (with placeholder values) should be committed
  • If you accidentally commit secrets, rotate them immediately and use git filter-branch or BFG to remove from history

Quick Start Configuration

For most users, these are the only variables you need to set:

```bash
# Required - Database
DATABASE_URL="sqlite:///db/data.sqlite?mode=rwc"

# Required - Vector Database (must have Qdrant running)
QDRANT_URL="http://localhost:6334"
QDRANT_SKIP_API_KEY=true  # For local development

# Optional - Server ports (defaults shown)
REST_PORT=8080
WEBSOCKET_PORT=8081
```

Everything else has sensible defaults. See the full reference below for advanced configuration.


Full Configuration Reference (click to expand)

Database Configuration

| Variable | Default | Description |
| --- | --- | --- |
| DATABASE_URL | sqlite:///db/data.sqlite?mode=rwc | SQLite database connection string |
| DB_MAX_CONNECTIONS | 100 | Maximum database connections |
| DB_MIN_CONNECTIONS | 5 | Minimum database connections |
| DB_CONNECT_TIMEOUT | 8 | Connection timeout in seconds |
| DB_IDLE_TIMEOUT | 600 | Idle connection timeout in seconds |
| DB_MAX_LIFETIME | 1800 | Maximum connection lifetime in seconds |
| DB_LOGGING_ENABLED | false | Enable SQL query logging |

Storage Configuration

| Variable | Default | Description |
| --- | --- | --- |
| KV_STORE_PATH | $HOME/.config/flow/kv | Path to RocksDB key-value store |

Server Configuration

| Variable | Default | Description |
| --- | --- | --- |
| HOST | 0.0.0.0 | Server bind address |
| REST_PORT | 8080 | REST API port |
| WEBSOCKET_PORT | 8081 | WebSocket port |
| CORS_ORIGINS | http://localhost:3000,http://localhost:5173 | Allowed CORS origins (comma-separated) |

Vector Database (Qdrant)

| Variable | Default | Description |
| --- | --- | --- |
| QDRANT_URL | http://localhost:6334 | Qdrant gRPC endpoint |
| QDRANT_SKIP_API_KEY | false | Skip API key auth (for local/test instances) |

Cache (Redis)

| Variable | Default | Description |
| --- | --- | --- |
| REDIS_URL | (optional) | Redis connection URL for caching indexed content |
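
Since REDIS_URL has no default, a typical local value (assuming the Redis container from the Docker Services section below, on its default port) would be:

```bash
# Assumes the local Redis container on its default port 6379
REDIS_URL=redis://localhost:6379
```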

AI Indexing Configuration

| Variable | Default | Description |
| --- | --- | --- |
| VECTOR_SIZE | 384 | Embedding vector dimensions (FastEmbed default) |
| MIN_CHUNK_SIZE | 10 | Minimum chunk size in characters |
| MAX_CHUNK_SIZE | 20000 | Maximum chunk size in characters |
| MAX_FILE_SIZE | 10000000 | Maximum file size to index (10 MB) |
| EMBED_BATCH_SIZE | 20 | Batch size for embedding generation |
| STORAGE_BATCH_SIZE | 50 | Batch size for Qdrant storage |
| CONCURRENCY | 4 | Pipeline concurrency level |
| RATE_LIMIT_MS | 100 | Delay between operations, in milliseconds |
| ALLOWED_EXTENSIONS | md,rs,txt,... | File extensions to index (comma-separated) |
| EXCLUDE_PATTERNS | (empty) | Patterns to exclude from indexing |
| MAX_FAILURE_RATE | 0.5 | Maximum failure rate before stopping |
| MIN_SAMPLE_SIZE | 10 | Minimum samples for failure rate calculation |
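
For example, to narrow indexing to a couple of file types and skip build output (values are illustrative, and the exact EXCLUDE_PATTERNS syntax is an assumption; check .env.example for the real format):

```bash
# Illustrative: index only Markdown and Rust files, cap indexable size at 5 MB
ALLOWED_EXTENSIONS=md,rs
MAX_FILE_SIZE=5000000
# Pattern syntax here is assumed; verify against .env.example
EXCLUDE_PATTERNS=target,node_modules
```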

AI Metadata Generation (Optional)

These features require Ollama with the llama3.2:3b model installed.

| Variable | Default | Description |
| --- | --- | --- |
| ENABLE_METADATA_QA | false | Generate Q&A pairs for chunks |
| ENABLE_METADATA_SUMMARY | false | Generate summaries for chunks |
| ENABLE_METADATA_KEYWORDS | false | Extract keywords from chunks |
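
For example, with Ollama and llama3.2:3b installed, you could enable summaries and keywords:

```bash
# Requires a local Ollama install with the llama3.2:3b model (see note above)
ENABLE_METADATA_SUMMARY=true
ENABLE_METADATA_KEYWORDS=true
```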

Distributed Search

| Variable | Default | Description |
| --- | --- | --- |
| FLOW_SEARCH_RESPOND_TO_QUERIES | true | Respond to network search queries |
| FLOW_SEARCH_MAX_RESULTS_PER_QUERY | 10 | Max results per search query |
| FLOW_SEARCH_PEER_COUNT | 5 | Number of peers to query |
| FLOW_SEARCH_NETWORK_TIMEOUT_MS | 5000 | Network query timeout in milliseconds |
| FLOW_SEARCH_RETRY_ENABLED | true | Enable query retries |
| FLOW_SEARCH_CACHE_TTL_SECS | 300 | Search cache TTL in seconds |
| FLOW_SEARCH_CACHE_MAX_ENTRIES | 1000 | Maximum cache entries |
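
On a slow or sparsely connected network you might trade latency for recall, for example (values are illustrative, not tested recommendations):

```bash
# Illustrative tuning for a slow network: query more peers, wait longer,
# and cache results longer. Values are examples, not recommendations.
FLOW_SEARCH_PEER_COUNT=10
FLOW_SEARCH_NETWORK_TIMEOUT_MS=15000
FLOW_SEARCH_CACHE_TTL_SECS=600
```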

P2P Network Configuration

| Variable | Default | Description |
| --- | --- | --- |
| GOSSIPSUB_ENABLED | true | Enable GossipSub protocol |
| GOSSIPSUB_HEARTBEAT_INTERVAL_MS | 1000 | Heartbeat interval in milliseconds |
| GOSSIPSUB_MESH_N | 6 | Target mesh size |
| GOSSIPSUB_MESH_N_LOW | 4 | Minimum mesh size |
| GOSSIPSUB_MESH_N_HIGH | 12 | Maximum mesh size |
| GOSSIPSUB_MAX_MESSAGE_SIZE | 65536 | Maximum message size |
| GOSSIPSUB_VALIDATE_SIGNATURES | true | Validate message signatures |
| GOSSIP_MESSAGE_DB_PATH | $HOME/.config/flow/gossip | Message store path |
| MESSAGE_STORE_ENABLED | true | Enable message persistence |
| MESSAGE_STORE_MAX_PER_TOPIC | 10000 | Max messages per topic |
| MESSAGE_STORE_MAX_TOTAL | 100000 | Max total messages |
| MESSAGE_STORE_CLEANUP_INTERVAL | 60 | Cleanup interval in seconds |
| NETWORK_MDNS_QUERY_INTERVAL | 5 | mDNS discovery interval |
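
When adjusting mesh sizes, keep the standard GossipSub invariant MESH_N_LOW <= MESH_N <= MESH_N_HIGH. For example, a low-bandwidth node might shrink its mesh (illustrative values):

```bash
# Illustrative smaller mesh for a low-bandwidth node.
# GossipSub expects MESH_N_LOW <= MESH_N <= MESH_N_HIGH.
GOSSIPSUB_MESH_N=4
GOSSIPSUB_MESH_N_LOW=3
GOSSIPSUB_MESH_N_HIGH=8
```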

Peer Registry

| Variable | Default | Description |
| --- | --- | --- |
| PEER_REGISTRY_DB_PATH | $HOME/.config/flow/peers | Peer registry database path |
| PEER_REGISTRY_FLUSH_INTERVAL | 30 | Flush interval in seconds |
| PEER_REGISTRY_MAX_FAILURES | 5 | Max failures before peer removal |
| PEER_REGISTRY_TTL_SECS | 86400 | Peer TTL in seconds (24 hours) |

WebAuthn Configuration

| Variable | Default | Description |
| --- | --- | --- |
| WEBAUTHN_RP_ID | localhost | Relying Party ID |
| WEBAUTHN_RP_ORIGIN | http://localhost:3000 | Relying Party origin |
| WEBAUTHN_RP_NAME | Flow | WebAuthn Relying Party display name |
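
Outside of localhost, the RP ID must match the effective domain of the origin or WebAuthn ceremonies will fail. A sketch for a hypothetical deployment (flow.example.com is a placeholder):

```bash
# Hypothetical production values; flow.example.com is a placeholder domain.
# WEBAUTHN_RP_ID must be the origin's domain (or a registrable suffix of it).
WEBAUTHN_RP_ID=flow.example.com
WEBAUTHN_RP_ORIGIN=https://flow.example.com
WEBAUTHN_RP_NAME=Flow
```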

Logging

| Variable | Default | Description |
| --- | --- | --- |
| RUST_LOG | info | Log level (error, warn, info, debug, trace) |
| LOG_LEVEL | info | Alternative log level variable |

You can also set per-module log levels:

RUST_LOG=node::modules::network=debug,node::modules::ai=trace

Available Commands

The project uses nx to manage the workspace.

Workspace Commands (flow)

| Command | Description |
| --- | --- |
| nx start-all flow | Start all services (backend + docker + frontend) |
| nx stop-all flow | Stop all services |
| nx docker-up flow | Start Qdrant + Redis containers |
| nx docker-down flow | Stop Qdrant + Redis containers |

Back-End Commands (back-end)

| Command | Description |
| --- | --- |
| nx run-node back-end | Run the Flow node |
| nx stop back-end | Stop the node |
| nx build back-end | Build for production |
| nx test back-end | Run tests |
| nx docker-up back-end | Start Docker containers |
| nx docker-down back-end | Stop Docker containers |
| nx docker-logs back-end | Tail container logs |

User Interface Commands (user-interface)

| Command | Description |
| --- | --- |
| nx install-all user-interface | Install all UI dependencies |
| nx dev-web user-interface | Run web app in dev mode |
| nx dev-mobile user-interface | Run mobile app in dev mode |
| nx dev-desktop user-interface | Run desktop app in dev mode |
| nx build-all user-interface | Build all UI applications |
| nx stop user-interface | Stop all frontend processes |

Docker Services

The backend requires two Docker services:

Qdrant (Vector Database)

  • Image: qdrant/qdrant:latest
  • Ports: 6333 (HTTP), 6334 (gRPC)
  • Purpose: Stores embeddings for semantic search

Redis (Cache)

  • Image: redis:7-alpine
  • Port: 6379
  • Purpose: Caches indexed content to avoid re-processing

Start both with:

nx docker-up back-end

Or manually:

cd back-end
docker-compose up -d
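
If you prefer not to use the compose file, roughly equivalent docker run commands (using the images, ports, and container names listed above) are:

```bash
# Roughly equivalent to the compose file, using the images and ports above
docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest
docker run -d --name redis -p 6379:6379 redis:7-alpine
```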

Running Tests

Backend Tests

```bash
# Run all tests
nx test back-end

# Run specific test
cd back-end/node
cargo test test_name

# Run integration tests (requires Docker)
cargo test indexing_e2e -- --ignored --test-threads=1
```

Frontend Tests

cd user-interface/flow-web
npm test

Troubleshooting

Docker containers won't start

```bash
# Check if ports are in use
lsof -i :6333
lsof -i :6334
lsof -i :6379

# Remove old containers
docker rm -f qdrant redis
```

Qdrant connection errors

  1. Ensure Docker is running: docker ps
  2. Check Qdrant logs: docker logs qdrant
  3. For local testing without API key, set: QDRANT_SKIP_API_KEY=true

Database errors

  1. Ensure the database directory exists
  2. Check permissions on the SQLite file
  3. Try removing and recreating: rm -rf db/ then restart

Indexing not working

  1. Check Qdrant is running: curl http://localhost:6333/health
  2. Check allowed extensions include your file types
  3. Review logs: RUST_LOG=debug nx run-node back-end

Frontend can't connect to backend

  1. Ensure backend is running on port 8080
  2. Check CORS_ORIGINS includes your frontend URL
  3. Try: CORS_ORIGINS="http://localhost:5173" nx run-node back-end

Contributing

See the Contributing guide to learn how you can contribute.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Where to get help?

Join the Discord community and chat with the development team.
