A fast, in-memory time-series analytics database written in Rust. Inspired by Facebook's Scuba, Snorkel is designed for real-time analytics on event data with SQL-like queries, automatic data expiration, and horizontal scaling.
- Columnar Storage - Cache-efficient memory layout optimized for analytics queries
- String Dictionary Encoding - Compact storage for repeated string values
- Time-Based Sharding - Data partitioned by time for efficient range queries and TTL expiration
- SQL-Like Queries - Familiar syntax with SELECT, WHERE, GROUP BY, ORDER BY, LIMIT
- Rich Aggregations - COUNT, SUM, AVG, MIN, MAX, PERCENTILE
- JSON Auto-Flattening - Nested JSON objects automatically flattened to dot-notation columns
- Distributed Queries - Fan-out queries across multiple nodes with result merging
- Query Caching - TTL-based result cache with automatic invalidation
- Bloom Filters - Fast shard pruning for equality filters
- SIMD Aggregations - Vectorized single-pass statistics computation
- Web UI - Visual query builder with interactive charts
- No Dependencies - Pure Rust with no external database requirements
# Clone and build
git clone https://github.com/jamra/snorkel.git
cd snorkel
cargo build --release
# Run the server (default port 8080)
cargo run --release
# Or with custom settings
SNORKEL_PORT=9000 SNORKEL_MAX_MEMORY_MB=2048 cargo run --releasecurl -X POST http://localhost:9000/ingest \
-H "Content-Type: application/json" \
-d '{
"table": "events",
"rows": [
{"timestamp": 1708700000000, "event": "click", "user_id": 123, "latency_ms": 45},
{"timestamp": 1708700001000, "event": "view", "user_id": 456, "latency_ms": 120}
]
}'# Count events by type
curl -X POST http://localhost:9000/query \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT event, COUNT(*), AVG(latency_ms) FROM events GROUP BY event"}'
# Filter and aggregate
curl -X POST http://localhost:9000/query \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT event, COUNT(*) FROM events WHERE latency_ms > 100 GROUP BY event"}'Nested JSON is automatically flattened on ingest:
# Ingest nested data
curl -X POST http://localhost:9000/ingest \
-H "Content-Type: application/json" \
-d '{
"table": "orders",
"rows": [
{
"timestamp": 1708700000000,
"customer": {"name": "Alice", "tier": "premium"},
"order": {"total": 99.99, "items": 3}
}
]
}'
# Query with dot notation
curl -X POST http://localhost:9000/query \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT customer.tier, SUM(order.total) FROM orders GROUP BY customer.tier"}'Open http://localhost:9000/ in your browser for the visual query builder:
- Select tables and metrics from dropdowns
- Build filters with point-and-click
- Visualize results as bar, line, or area charts
- View sample data alongside aggregations
| Endpoint | Method | Description |
|---|---|---|
/ |
GET | Web UI |
/health |
GET | Health check |
/ingest |
POST | Insert rows |
/query |
POST | Execute SQL query |
/tables |
GET | List all tables |
/tables |
POST | Create table with config |
/tables/:name/schema |
GET | Get table schema |
/tables/:name |
DELETE | Drop table |
/stats |
GET | Server statistics |
Environment variables:
| Variable | Default | Description |
|---|---|---|
SNORKEL_HOST |
0.0.0.0 |
Bind address |
SNORKEL_PORT |
8080 |
Server port |
SNORKEL_MAX_MEMORY_MB |
1024 |
Maximum memory usage |
Run multiple nodes for horizontal scaling. All nodes are equal - any node can coordinate queries.
# Node 1
SNORKEL_PORT=9000 \
SNORKEL_NODE_ID=node-1 \
SNORKEL_PEERS="10.0.0.2:8080,10.0.0.3:8080" \
cargo run
# Node 2
SNORKEL_PORT=8080 \
SNORKEL_NODE_ID=node-2 \
SNORKEL_PEERS="10.0.0.1:9000,10.0.0.3:8080" \
cargo run
# Node 3
SNORKEL_PORT=8080 \
SNORKEL_NODE_ID=node-3 \
SNORKEL_PEERS="10.0.0.1:9000,10.0.0.2:8080" \
cargo runKey points:
- Each node lists all OTHER nodes as peers
- Query any node - it will fan out to all peers and merge results
- Put a load balancer (nginx, HAProxy, k8s Ingress) in front for production
Or use the provided scripts for local testing:
./run_cluster.sh # Start 3-node cluster
./stop_cluster.sh # Stop all nodes
./seed_data.sh # Load sample data┌─────────────────────────────────────────────────────────────────┐
│ HTTP API │
│ (Axum + Tower middleware) │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌──────────────────────────────────────────┐ │
│ │Query Cache │ │ Query Engine │ │
│ │ (moka) │◄─┤ Parser → Planner → Executor (parallel) │ │
│ │ TTL-based │ │ │ │
│ └─────────────┘ │ Optimizations: │ │
│ │ • Predicate pushdown (row masks) │ │
│ │ • SIMD aggregations (single-pass) │ │
│ │ • Bloom filter shard pruning │ │
│ └──────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Storage Engine │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Table 1 │ │ Table 2 │ │ Table N │ │
│ │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ │
│ │ │ Shard │ │ │ │ Shard │ │ │ │ Shard │ │ │
│ │ │ • Columns │ │ │ │ • Columns │ │ │ │ • Columns │ │ │
│ │ │ • Bloom │ │ │ │ • Bloom │ │ │ │ • Bloom │ │ │
│ │ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Alert Checker │ │ TTL Expiration │ │ Load Balancer │ │
│ │ (background) │ │ (background) │ │ (cluster mode) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Cluster Mode
┌─────────────────────────────────────────────────────────────────┐
│ Coordinator │
│ (fan-out queries, merge results) │
├──────────────────┬──────────────────┬──────────────────────────┤
│ Node 1 │ Node 2 │ Node N │
│ (local data) │ (local data) │ (local data) │
└──────────────────┴──────────────────┴──────────────────────────┘
SELECT column1, column2, AGG(column3)
FROM table_name
WHERE condition1 AND condition2
GROUP BY column1, column2
ORDER BY column1 [ASC|DESC]
LIMIT nCOUNT(*)/COUNT(column)SUM(column)AVG(column)MIN(column)MAX(column)PERCENTILE(column, 0.95)
- Comparison:
=,!=,>,<,>=,<= - Pattern:
LIKE - Logical:
AND,OR
Snorkel is optimized for high-throughput ingestion and sub-millisecond queries.
| Metric | Result |
|---|---|
| Ingest throughput | 300-390K rows/sec |
| Query latency (avg) | 85-100µs |
| Cache speedup | 287x |
| Bloom filter speedup | 9.5x |
| Memory efficiency | 75 bytes/row |
| Query Type | Average | Min |
|---|---|---|
| COUNT(*) | 95µs | 53µs |
| GROUP BY | 85µs | 53µs |
| AVG aggregation | 89µs | 57µs |
| Multiple aggregates | 85µs | 59µs |
| Top N with ORDER BY | 79µs | 56µs |
- Bloom filters - Prune shards that don't contain searched values (9.5x speedup for selective queries)
- Predicate pushdown - Build row masks once per shard instead of per-row filter evaluation
- SIMD-friendly aggregations - Single-pass statistics computation with auto-vectorization
- Query result caching - TTL-based cache with 287x speedup for repeated queries
- Parallel shard processing - Rayon-based parallel execution across CPU cores
# Start server
cargo run --release &
# Run benchmark (choose scale)
cargo run --release --bin benchmark small # 10K rows/table
cargo run --release --bin benchmark medium # 100K rows/table
cargo run --release --bin benchmark large # 1M rows/table
cargo run --release --bin benchmark xlarge # 10M rows/table# Run tests
cargo test
# Run with logging
RUST_LOG=snorkel=debug cargo run
# Build release
cargo build --releaseMIT