- Overview
- Package Structure
- Dual Storage Engine
- Raft Storage Layer Deep Dive
- Data Flow
- Key Component Relationships

MetaStore is a lightweight distributed KV storage system built on the etcd Raft consensus library. It supports two storage engines:
- Memory Mode (Memory + WAL) - Default mode, fast and lightweight
- RocksDB Mode - Full persistence, suitable for large datasets

```
┌─────────────────────────────────────────────────┐
│                  HTTP REST API                  │
│            GET/PUT/POST/DELETE /key             │
└──────────────────┬──────────────────────────────┘
                   │
                   ↓
┌─────────────────────────────────────────────────┐
│       KV Store Layer (Application Layer)        │
│ ┌───────────────────┐  ┌──────────────────────┐ │
│ │ Memory KV Store   │  │ RocksDB KV Store     │ │
│ │ (Memory Mode)     │  │ (RocksDB Mode)       │ │
│ └───────────────────┘  └──────────────────────┘ │
└──────────────────┬──────────────────────────────┘
                   │
                   ↓ Committed via Raft
┌─────────────────────────────────────────────────┐
│     Raft Consensus Layer (Consensus Layer)      │
│ ┌───────────────────┐  ┌──────────────────────┐ │
│ │ raftNode          │  │ raftNodeRocks        │ │
│ │ (Memory Node)     │  │ (RocksDB Node)       │ │
│ └───────────────────┘  └──────────────────────┘ │
└──────────────────┬──────────────────────────────┘
                   │
                   ↓ Raft Log Storage
┌─────────────────────────────────────────────────┐
│        Raft Storage Layer (Raft Storage)        │
│ ┌───────────────────┐  ┌──────────────────────┐ │
│ │ MemoryStorage     │  │ RocksDBStorage       │ │
│ │ + WAL             │  │ (raftlog.go)         │ │
│ └───────────────────┘  └──────────────────────┘ │
└─────────────────────────────────────────────────┘
```

```
internal/
├── kvstore/                # Interface Definition Layer
│   └── store.go            # Store interface + Commit/KV types
│
├── memory/                 # Memory Implementation Layer
│   ├── kvstore.go          # Memory KV store implementation
│   └── kvstore_test.go     # Unit tests
│
├── rocksdb/                # RocksDB Implementation Layer
│   ├── kvstore.go          # RocksDB KV store (application data)
│   ├── raftlog.go          # RocksDB Raft storage (Raft internal data) ⭐
│   └── raftlog_test.go     # Raft storage tests
│
├── raft/                   # Raft Consensus Layer
│   ├── node.go             # Memory mode Raft node
│   ├── node_rocksdb.go     # RocksDB mode Raft node
│   ├── node_test.go        # Raft tests
│   └── listener.go         # Network listener
│
└── http/                   # HTTP API Layer
    └── api.go              # REST API handler
```

| Package | Responsibility | Dependencies | Key Types |
|---|---|---|---|
| `kvstore` | Define KV store interface | None | `Store`, `Commit`, `KV` |
| `memory` | Implement memory KV store | `kvstore` | `Memory` |
| `rocksdb` | Implement RocksDB KV + Raft storage | `kvstore` | `RocksDB`, `RocksDBStorage` |
| `raft` | Implement Raft consensus protocol | `kvstore`, `rocksdb` | `raftNode`, `raftNodeRocks` |
| `http` | Provide HTTP REST API | `kvstore` | `httpKVAPI` |

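
The `kvstore` package is described as holding the `Store` interface plus the `Commit` and `KV` types. The following minimal sketch shows what `KV` and `Commit` could look like, and how a `KV` might be encoded into the string that travels through `proposeC`. The `Data`/`ApplyDoneC` field names come from the data-flow description in this document. The `KV` field names, the `encodeKV`/`decodeKV` helpers, and the use of `encoding/gob` (the encoding etcd's `raftexample` uses) are assumptions, not this project's actual source.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"strings"
)

// KV is the payload carried through Raft (field names are assumed).
type KV struct {
	Key string
	Val string
}

// Commit is what the Raft node hands to the KV store after entries commit.
type Commit struct {
	Data       []string        // encoded KV entries, in log order
	ApplyDoneC chan<- struct{} // closed by the store once Data is applied
}

// encodeKV turns a KV into an opaque string suitable for proposeC (hypothetical helper).
func encodeKV(kv KV) (string, error) {
	var buf strings.Builder
	if err := gob.NewEncoder(&buf).Encode(kv); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// decodeKV reverses encodeKV (hypothetical helper).
func decodeKV(s string) (KV, error) {
	var kv KV
	err := gob.NewDecoder(bytes.NewBufferString(s)).Decode(&kv)
	return kv, err
}

func main() {
	enc, err := encodeKV(KV{Key: "mykey", Val: "myvalue"})
	if err != nil {
		panic(err)
	}
	done := make(chan struct{})
	c := &Commit{Data: []string{enc}, ApplyDoneC: done}
	for _, d := range c.Data {
		kv, err := decodeKV(d)
		if err != nil {
			panic(err)
		}
		fmt.Println(kv.Key, kv.Val) // mykey myvalue
	}
	close(c.ApplyDoneC) // signal "applied" back to the Raft side
	<-done
}
```

Encoding each proposal with a fresh `gob.Encoder` makes every message self-describing, so each one can be decoded independently on the apply side.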

| Feature | Memory Mode (Memory + WAL) | RocksDB Mode |
|---|---|---|
| Application KV Storage | `internal/memory/kvstore.go` | `internal/rocksdb/kvstore.go` |
| Raft Node | `internal/raft/node.go` | `internal/raft/node_rocksdb.go` |
| Raft Log Storage | `raft.MemoryStorage` (etcd) | `rocksdb.RocksDBStorage` ⭐ |
| WAL Persistence | `wal.WAL` (etcd) | ✅ Built into RocksDB |
| Snapshot Storage | Filesystem | RocksDB |
| Data Location | Memory + WAL files | All in RocksDB |
| CLI Flag | `--storage=memory` | `--storage=rocksdb` |
| Use Case | Fast, lightweight deployment | Large datasets, full persistence |

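
The following sketch shows how the `--storage` flag from the table could be parsed and validated with Go's standard `flag` package. `parseStorage` is a hypothetical helper. Its default of `memory` follows the statement that Memory mode is the default. The real `cmd/metastore/main.go` may wire this differently.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseStorage reads --storage and accepts only the two documented engines.
func parseStorage(args []string) (string, error) {
	fs := flag.NewFlagSet("metastore", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	storage := fs.String("storage", "memory", "storage engine: memory or rocksdb")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *storage {
	case "memory", "rocksdb":
		return *storage, nil
	default:
		return "", fmt.Errorf("unknown --storage value %q", *storage)
	}
}

func main() {
	for _, args := range [][]string{nil, {"--storage=rocksdb"}, {"--storage=disk"}} {
		engine, err := parseStorage(args)
		fmt.Println(engine, err)
	}
}
```

Go's `flag` package accepts both `-storage` and `--storage`, so either spelling works with this sketch.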

Memory mode data path:

```
┌─────────────────────────────────────────────────┐
│ internal/memory/kvstore.go                      │
│ Memory                                          │
│ (User KV data stored in memory)                 │
└──────────────────┬──────────────────────────────┘
                   ↓ Propose to Raft
┌─────────────────────────────────────────────────┐
│ internal/raft/node.go                           │
│ raftNode                                        │
│ (Raft consensus node)                           │
└──────────────────┬──────────────────────────────┘
                   ↓ Raft log storage
┌─────────────────────────────────────────────────┐
│ raft.MemoryStorage (etcd built-in)              │
│ (Raft logs stored in memory)                    │
│ +                                               │
│ wal.WAL (etcd built-in)                         │
│ (WAL file persistence)                          │
└──────────────────┬──────────────────────────────┘
                   ↓
┌─────────────────────────────────────────────────┐
│ Memory + WAL files + Snapshot files             │
│ Directory: ./metaStore-{id}/                    │
└─────────────────────────────────────────────────┘
```

RocksDB mode data path:

```
┌─────────────────────────────────────────────────┐
│ internal/rocksdb/kvstore.go                     │
│ RocksDB                                         │
│ (User KV data, key prefix: kv_data_)            │
└──────────────────┬──────────────────────────────┘
                   ↓ Propose to Raft
┌─────────────────────────────────────────────────┐
│ internal/raft/node_rocksdb.go                   │
│ raftNodeRocks                                   │
│ (Raft consensus node)                           │
└──────────────────┬──────────────────────────────┘
                   ↓ Raft log storage
┌─────────────────────────────────────────────────┐
│ internal/rocksdb/raftlog.go ⭐                  │
│ RocksDBStorage                                  │
│ (Raft log data, key prefix: raft_log_, etc.)    │
│ Replaces MemoryStorage + WAL combination        │
└──────────────────┬──────────────────────────────┘
                   ↓
┌─────────────────────────────────────────────────┐
│ RocksDB Database (all data)                     │
│ Directory: ./data/{id}/                         │
│                                                 │
│ Contains:                                       │
│ - User KV data (kv_data_*)                      │
│ - Raft logs (raft_log_*)                        │
│ - Raft HardState (hard_state)                   │
│ - Raft ConfState (conf_state)                   │
│ - Snapshot metadata (snapshot_meta)             │
└─────────────────────────────────────────────────┘
```

The Raft storage layer is the part of the project most likely to cause confusion.
`raftlog.go` implements the `raft.Storage` interface, providing Raft log storage for RocksDB mode. Key points:

- etcd Raft library requirement
  - The etcd Raft library requires a storage backend that implements the `raft.Storage` interface.
  - etcd provides `raft.MemoryStorage`, an in-memory implementation.
  - This project needs RocksDB persistence, so it implements its own backend.
- Different from `kvstore.go`
  - `kvstore.go` = application-layer KV storage (stores user data)
  - `raftlog.go` = Raft-layer log storage (stores Raft internal state)
- Replaces MemoryStorage + WAL
  - Memory mode needs the `raft.MemoryStorage` + `wal.WAL` combination.
  - RocksDB mode uses `RocksDBStorage` to replace the entire combination.
  - All data lives in RocksDB, so no separate WAL files are needed.

```go
const (
	raftLogPrefix = "raft_log_"     // Raft log entries
	hardStateKey  = "hard_state"    // Raft HardState (Term, Vote, Commit)
	confStateKey  = "conf_state"    // Cluster configuration state
	snapshotKey   = "snapshot_meta" // Snapshot metadata
	firstIndexKey = "first_index"   // First log index
	lastIndexKey  = "last_index"    // Last log index
)
```

These are all Raft consensus protocol internal states, not user data!
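
The following minimal, standard-library-only sketch makes the key layout concrete. It covers the log half of a `raft.Storage`-style backend (`Append`, `Entries`, `FirstIndex`, `LastIndex`) over a single map that stands in for RocksDB, and reuses three of the key names above. Several parts are assumptions for illustration, not the contents of `raftlog.go`: the 8-byte big-endian index encoding, the `term + data` value layout, and the `entry`/`logStore` types. Error behavior loosely follows etcd's `ErrCompacted`/`ErrUnavailable` contract.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Key names copied from raftlog.go.
const (
	raftLogPrefix = "raft_log_"
	firstIndexKey = "first_index"
	lastIndexKey  = "last_index"
)

// Stand-ins for etcd raft's ErrCompacted and ErrUnavailable.
var (
	errCompacted   = errors.New("requested index is compacted")
	errUnavailable = errors.New("requested entry is unavailable")
)

// entry is a simplified stand-in for raftpb.Entry.
type entry struct {
	Index, Term uint64
	Data        []byte
}

// logStore keeps every value in one map that plays the role of RocksDB.
type logStore struct{ kv map[string][]byte }

// u64 encodes an integer as 8 big-endian bytes (an assumed encoding).
func u64(i uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, i)
	return b
}

func logKey(i uint64) string { return raftLogPrefix + string(u64(i)) }

// index reads a persisted index, falling back to def when the key is absent.
func (s *logStore) index(key string, def uint64) uint64 {
	if v, ok := s.kv[key]; ok {
		return binary.BigEndian.Uint64(v)
	}
	return def
}

func (s *logStore) FirstIndex() uint64 { return s.index(firstIndexKey, 1) }
func (s *logStore) LastIndex() uint64  { return s.index(lastIndexKey, 0) }

// Append stores contiguous entries and replaces any conflicting suffix.
func (s *logStore) Append(ents []entry) error {
	if len(ents) == 0 {
		return nil
	}
	first, last := ents[0].Index, s.LastIndex()
	if first < s.FirstIndex() {
		return errCompacted // etcd's MemoryStorage truncates instead; simplified here
	}
	if first > last+1 {
		return fmt.Errorf("gap: last index %d, next entry %d", last, first)
	}
	for i := first; i <= last; i++ {
		delete(s.kv, logKey(i)) // drop entries that are being overwritten
	}
	for _, e := range ents {
		s.kv[logKey(e.Index)] = append(u64(e.Term), e.Data...) // value = term + data
	}
	s.kv[lastIndexKey] = u64(ents[len(ents)-1].Index)
	return nil
}

// Entries returns [lo, hi) capped by maxSize, but always at least one entry.
func (s *logStore) Entries(lo, hi, maxSize uint64) ([]entry, error) {
	if lo < s.FirstIndex() {
		return nil, errCompacted
	}
	if hi > s.LastIndex()+1 {
		return nil, errUnavailable
	}
	var out []entry
	var size uint64
	for i := lo; i < hi; i++ {
		v := s.kv[logKey(i)]
		if size += uint64(len(v)); len(out) > 0 && size > maxSize {
			break
		}
		out = append(out, entry{Index: i, Term: binary.BigEndian.Uint64(v[:8]), Data: v[8:]})
	}
	return out, nil
}

func main() {
	s := &logStore{kv: map[string][]byte{}}
	_ = s.Append([]entry{{1, 1, []byte("a")}, {2, 1, []byte("b")}, {3, 1, []byte("c")}})
	_ = s.Append([]entry{{3, 2, []byte("c2")}}) // conflicting entry at index 3 replaced
	ents, _ := s.Entries(1, 4, 1<<20)
	fmt.Println(len(ents), s.LastIndex(), ents[2].Term, string(ents[2].Data)) // 3 3 2 c2
}
```

One reason to keep `first_index` and `last_index` under their own keys is that a restarted node can answer `FirstIndex`/`LastIndex` with a point lookup instead of scanning every `raft_log_` key.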
The `RocksDBStorage` type in `raftlog.go` (signatures only; `HardState`, `ConfState`, `Entry`, and `Snapshot` correspond to etcd's `raftpb` types):

```go
type RocksDBStorage struct {
	db     *grocksdb.DB
	nodeID string
	// ...
}

// Required by the raft.Storage interface:
func (s *RocksDBStorage) InitialState() (HardState, ConfState, error)
func (s *RocksDBStorage) Entries(lo, hi, maxSize uint64) ([]Entry, error)
func (s *RocksDBStorage) Term(index uint64) (uint64, error)
func (s *RocksDBStorage) FirstIndex() (uint64, error)
func (s *RocksDBStorage) LastIndex() (uint64, error)
func (s *RocksDBStorage) Snapshot() (Snapshot, error)

// Additional persistence methods:
func (s *RocksDBStorage) Append(entries []Entry) error
func (s *RocksDBStorage) SetHardState(st HardState) error
func (s *RocksDBStorage) CreateSnapshot(...) (Snapshot, error)
func (s *RocksDBStorage) ApplySnapshot(snap Snapshot) error
func (s *RocksDBStorage) Compact(compactIndex uint64) error
```

Memory mode node (`internal/raft/node.go`), simplified:

```go
type raftNode struct {
	node        raft.Node
	raftStorage *raft.MemoryStorage // ← etcd built-in
	wal         *wal.WAL            // ← etcd WAL
	// ...
}

// Initialization (arguments and error handling omitted)
func NewNode(...) {
	rc.raftStorage = raft.NewMemoryStorage()
	rc.wal = wal.Create(waldir, nil)
	// Start Raft
	raft.NewRawNode(&raft.Config{
		Storage: rc.raftStorage, // ← Use MemoryStorage
	})
}
```

RocksDB mode node (`internal/raft/node_rocksdb.go`), simplified:

```go
type raftNodeRocks struct {
	node        raft.Node
	raftStorage *rocksdb.RocksDBStorage // ← raftlog.go implementation!
	rocksDB     *grocksdb.DB
	// No WAL needed!
}

// Initialization (arguments and error handling omitted)
func NewNodeRocksDB(..., rocksDB *grocksdb.DB) {
	// Create RocksDBStorage
	rc.raftStorage = rocksdb.NewRocksDBStorage(rocksDB, "node_1")
	// Start Raft
	raft.NewRawNode(&raft.Config{
		Storage: rc.raftStorage, // ← Use RocksDBStorage
	})
}
```

Write path:

```
1. HTTP API receives request
   ↓
   internal/http/api.go:ServeHTTP()
2. Call KV Store's Propose method
   ↓
   Memory:  internal/memory/kvstore.go:Propose()
   RocksDB: internal/rocksdb/kvstore.go:Propose()
3. Send to Raft proposal channel
   ↓
   proposeC <- encodedKV
4. Raft node receives proposal
   ↓
   Memory:  internal/raft/node.go:serveChannels()
   RocksDB: internal/raft/node_rocksdb.go:serveChannels()
5. Raft appends the entry to its log and replicates it to followers
   ↓
   Memory:  raftStorage.Append() → MemoryStorage + WAL
   RocksDB: raftStorage.Append() → RocksDBStorage (raftlog.go)
6. Once a quorum acknowledges the entry, it is committed and published
   ↓
   commitC <- &Commit{Data: [...]string, ApplyDoneC: ...}
7. KV Store applies committed entries
   ↓
   Memory:  internal/memory/kvstore.go:readCommits()
            → Write to memory map
   RocksDB: internal/rocksdb/kvstore.go:readCommits()
            → Write to RocksDB (kv_data_ prefix)
8. Return success response
```
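
The write path can be simulated end to end with channels. In this sketch, `fakeRaft` is a hypothetical stand-in for steps 4 through 6: it commits every proposal immediately, with no replication. `readCommits` plays step 7, and closing `applyDoneC` is the handshake that tells the Raft side a batch has been applied. For brevity, proposals use a simplified `key=value` string instead of the real encoding.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type commit struct {
	data       []string
	applyDoneC chan<- struct{}
}

// fakeRaft "commits" each proposal at once and waits until it is applied.
func fakeRaft(proposeC <-chan string, commitC chan<- *commit) {
	defer close(commitC)
	for p := range proposeC {
		done := make(chan struct{})
		commitC <- &commit{data: []string{p}, applyDoneC: done}
		<-done // block until the store has applied this batch
	}
}

type memStore struct {
	mu      sync.RWMutex
	kvStore map[string]string
}

// readCommits applies every committed entry, then signals applyDoneC.
func (s *memStore) readCommits(commitC <-chan *commit) {
	for c := range commitC {
		s.mu.Lock()
		for _, d := range c.data {
			k, v, _ := strings.Cut(d, "=") // simplified encoding for this sketch
			s.kvStore[k] = v
		}
		s.mu.Unlock()
		close(c.applyDoneC)
	}
}

func (s *memStore) Lookup(k string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.kvStore[k]
	return v, ok
}

// run proposes all writes and returns once every one has been applied.
func run(writes []string) *memStore {
	proposeC := make(chan string)
	commitC := make(chan *commit)
	s := &memStore{kvStore: map[string]string{}}
	applied := make(chan struct{})
	go fakeRaft(proposeC, commitC)
	go func() { s.readCommits(commitC); close(applied) }()
	for _, w := range writes {
		proposeC <- w // step 3
	}
	close(proposeC)
	<-applied
	return s
}

func main() {
	s := run([]string{"mykey=myvalue", "mykey=v2"})
	v, _ := s.Lookup("mykey")
	fmt.Println(v) // v2
}
```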

Read path:

```
1. HTTP API receives request
   ↓
   internal/http/api.go:ServeHTTP()
2. Call KV Store's Lookup method
   ↓
   Memory:  internal/memory/kvstore.go:Lookup()
            → Read from memory map
   RocksDB: internal/rocksdb/kvstore.go:Lookup()
            → Read from RocksDB (kv_data_ prefix)
3. Return result
```
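
Startup recovery, Memory mode. The snapshot is applied before WAL replay because `MemoryStorage.ApplySnapshot` overwrites everything already in the storage, so entries appended first would be lost:

```
1. Start node
   ↓
   internal/raft/node.go:NewNode()
2. Load snapshot (if exists)
   ↓
   snapshotter.Load()
   raftStorage.ApplySnapshot(snapshot)
3. Replay WAL entries recorded after the snapshot
   ↓
   wal.OpenForRead(waldir)
   raftStorage.Append(entries from WAL)
4. KV Store recovers from snapshot
   ↓
   internal/memory/kvstore.go:recoverFromSnapshot()
   → Rebuild memory map
5. Continue processing new requests
```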
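
A common approach, and the one etcd's `raftexample` takes, is to serialize the whole map as JSON for a snapshot and decode it back in `recoverFromSnapshot`. The sketch below assumes that format and a hypothetical `walEntry` record. It also shows why the snapshot index matters: entries at or below it are already reflected in the restored map and must be skipped during replay.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// walEntry is a simplified, hypothetical log record: one key/value write.
type walEntry struct {
	Index    uint64
	Key, Val string
}

type memStore struct {
	mu      sync.RWMutex
	kvStore map[string]string
	applied uint64 // index of the last entry reflected in kvStore
}

// getSnapshot serializes the whole map (JSON is an assumption here).
func (s *memStore) getSnapshot() ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return json.Marshal(s.kvStore)
}

// recoverFromSnapshot replaces the map with the snapshot contents.
func (s *memStore) recoverFromSnapshot(snap []byte, snapIndex uint64) error {
	var m map[string]string
	if err := json.Unmarshal(snap, &m); err != nil {
		return err
	}
	if m == nil {
		m = map[string]string{} // a snapshot of a nil map decodes to nil
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.kvStore, s.applied = m, snapIndex
	return nil
}

// replay re-applies log entries, skipping those the snapshot already covers.
func (s *memStore) replay(ents []walEntry) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, e := range ents {
		if e.Index <= s.applied {
			continue
		}
		s.kvStore[e.Key] = e.Val
		s.applied = e.Index
	}
}

func main() {
	walLog := []walEntry{{1, "a", "1"}, {2, "b", "2"}, {3, "a", "10"}, {4, "c", "3"}}

	// Before the "crash": entries 1-2 applied, then a snapshot taken at index 2.
	old := &memStore{kvStore: map[string]string{}}
	old.replay(walLog[:2])
	snap, _ := old.getSnapshot()

	// After restart: restore the snapshot, then replay the full log.
	s := &memStore{}
	if err := s.recoverFromSnapshot(snap, 2); err != nil {
		panic(err)
	}
	s.replay(walLog)
	fmt.Println(s.kvStore, s.applied) // map[a:10 b:2 c:3] 4
}
```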

Startup recovery, RocksDB mode:

```
1. Start node
   ↓
   internal/raft/node_rocksdb.go:NewNodeRocksDB()
2. Open RocksDB
   ↓
   rocksdb.Open("data/1")
3. Create RocksDBStorage
   ↓
   internal/rocksdb/raftlog.go:NewRocksDBStorage()
   → Automatically load firstIndex, lastIndex from RocksDB
4. Load snapshot (if exists)
   ↓
   snapshotter.Load()
   raftStorage.ApplySnapshot(snapshot)
5. KV Store recovers from RocksDB
   ↓
   internal/rocksdb/kvstore.go:recoverFromSnapshot()
   → All data already in RocksDB, no additional recovery needed
6. Continue processing new requests
```

In RocksDB mode, the same RocksDB database instance is shared by two components:

```go
// cmd/metastore/main.go
db := rocksdb.Open("data/1")

// Purpose 1: Application layer KV storage
kvs := rocksdb.NewRocksDB(db, "node_1", ...)
// Writes key: "kv_data_mykey" → value: "myvalue"

// Purpose 2: Raft log storage
raftStorage := rocksdb.NewRocksDBStorage(db, "node_1")
// Writes key: "raft_log_123" → value: <raft entry>
// Writes key: "hard_state" → value: <term, vote, commit>
```

Data types are distinguished by different key prefixes:

| Prefix / Key | Purpose | Defined In |
|---|---|---|
| `kv_data_*` | User KV data | `internal/rocksdb/kvstore.go` |
| `raft_log_*` | Raft log entries | `internal/rocksdb/raftlog.go` |
| `hard_state` | Raft HardState | `internal/rocksdb/raftlog.go` |
| `conf_state` | Raft ConfState | `internal/rocksdb/raftlog.go` |
| `snapshot_meta` | Snapshot metadata | `internal/rocksdb/raftlog.go` |
| `first_index` | First log index | `internal/rocksdb/raftlog.go` |
| `last_index` | Last log index | `internal/rocksdb/raftlog.go` |

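
Because both components share one keyspace, anything that iterates over Raft log keys depends on the prefix and on byte-wise key order. The document does not say how the index after `raft_log_` is encoded (`raft_log_123` above may be a readable rendering). The sketch below uses a sorted slice as a stand-in for an ordered RocksDB iterator. It shows why a fixed-width big-endian encoding is a common choice: with plain decimal text, index 10 sorts before index 9. `decimalKey`, `fixedKey`, and `scanPrefix` are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
	"strings"
)

const raftLogPrefix = "raft_log_"

// decimalKey renders the index as text, e.g. "raft_log_123".
func decimalKey(i uint64) string { return fmt.Sprintf("%s%d", raftLogPrefix, i) }

// fixedKey appends the index as 8 big-endian bytes, so byte order equals numeric order.
func fixedKey(i uint64) string {
	var b [8]byte
	binary.BigEndian.PutUint64(b[:], i)
	return raftLogPrefix + string(b[:])
}

// scanPrefix mimics a prefix iteration over an ordered store: keys come back in byte order.
func scanPrefix(keys []string, prefix string) []string {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	var out []string
	for _, k := range sorted {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	other := []string{"kv_data_mykey", "hard_state", "conf_state", "snapshot_meta"}
	var dec, fix []string
	for _, i := range []uint64{9, 10, 11} {
		dec = append(dec, decimalKey(i))
		fix = append(fix, fixedKey(i))
	}

	fmt.Println(scanPrefix(append(dec, other...), raftLogPrefix))
	// [raft_log_10 raft_log_11 raft_log_9]: text order, not log order

	for _, k := range scanPrefix(append(fix, other...), raftLogPrefix) {
		fmt.Print(binary.BigEndian.Uint64([]byte(k[len(raftLogPrefix):])), " ")
	}
	fmt.Println() // 9 10 11
}
```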

```
┌──────────────────────────────────────┐
│ etcd Raft Library (go.etcd.io)       │
│                                      │
│ Requires: raft.Storage interface     │
└──────────────┬───────────────────────┘
               │
               ↓ Provide implementation
┌──────────────────────────────────────┐
│ Memory Mode                          │
│    ┌────────────────────────────┐    │
│    │ raft.MemoryStorage         │    │
│    │ (etcd built-in impl)       │    │
│    └────────────────────────────┘    │
│                  +                   │
│    ┌────────────────────────────┐    │
│    │ wal.WAL                    │    │
│    │ (etcd built-in WAL)        │    │
│    └────────────────────────────┘    │
└──────────────────────────────────────┘
                  OR
┌──────────────────────────────────────┐
│ RocksDB Mode                         │
│    ┌────────────────────────────┐    │
│    │ rocksdb.RocksDBStorage     │    │
│    │ (raftlog.go custom impl)   │    │
│    │                            │    │
│    │ Replaces MemoryStorage+WAL │    │
│    └────────────────────────────┘    │
└──────────────────────────────────────┘
```

```
kvstore.Store interface
    ↑ implemented by
    ├── internal/memory/Memory
    └── internal/rocksdb/RocksDB

raft.Storage interface (defined by etcd)
    ↑ implemented by
    ├── raft.MemoryStorage        (etcd built-in)
    └── rocksdb.RocksDBStorage    (raftlog.go custom)
```
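
The pluggable design rests on Go interfaces. The following minimal sketch uses a hypothetical, trimmed-down `Store` (its `Apply` method stands in for applying a committed entry) and two toy implementations. One keeps its own map. The other writes under `kv_data_` into a keyspace shared with Raft data. The `var _ Store = (*T)(nil)` lines are the idiomatic Go compile-time check that a type satisfies an interface.

```go
package main

import "fmt"

// Store is a hypothetical stand-in for kvstore.Store; the real write path goes through Raft.
type Store interface {
	Apply(key, val string)
	Lookup(key string) (string, bool)
}

// memoryStore keeps user data in its own map, like internal/memory.
type memoryStore struct{ m map[string]string }

func (s *memoryStore) Apply(k, v string) { s.m[k] = v }
func (s *memoryStore) Lookup(k string) (string, bool) {
	v, ok := s.m[k]
	return v, ok
}

// prefixedStore writes into a shared keyspace under kv_data_, like internal/rocksdb.
type prefixedStore struct{ db map[string]string }

const kvDataPrefix = "kv_data_"

func (s *prefixedStore) Apply(k, v string) { s.db[kvDataPrefix+k] = v }
func (s *prefixedStore) Lookup(k string) (string, bool) {
	v, ok := s.db[kvDataPrefix+k]
	return v, ok
}

// Compile-time checks: the build breaks if either type stops satisfying Store.
var (
	_ Store = (*memoryStore)(nil)
	_ Store = (*prefixedStore)(nil)
)

// newStore picks an implementation; callers only ever see the interface.
func newStore(engine string, shared map[string]string) (Store, error) {
	switch engine {
	case "memory":
		return &memoryStore{m: map[string]string{}}, nil
	case "rocksdb":
		return &prefixedStore{db: shared}, nil
	}
	return nil, fmt.Errorf("unknown engine %q", engine)
}

func main() {
	shared := map[string]string{"hard_state": "<term, vote, commit>"} // Raft data, same keyspace
	for _, engine := range []string{"memory", "rocksdb"} {
		s, err := newStore(engine, shared)
		if err != nil {
			panic(err)
		}
		s.Apply("mykey", "myvalue")
		v, _ := s.Lookup("mykey")
		fmt.Println(engine, v)
	}
	fmt.Println(len(shared)) // 2: hard_state plus kv_data_mykey
}
```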
- Layered Architecture: HTTP → KV Store → Raft → Storage
- Dual Mode Support: Memory mode (fast, WAL-backed) vs RocksDB mode (fully persistent)
- Interface Abstraction: Pluggable storage engines through interfaces
- Shared Storage: In RocksDB mode, user data and Raft data share the same database

| File | Responsibility | Interface |
|---|---|---|
| `internal/memory/kvstore.go` | Memory mode user KV storage | `kvstore.Store` |
| `internal/rocksdb/kvstore.go` | RocksDB mode user KV storage | `kvstore.Store` |
| `internal/rocksdb/raftlog.go` | RocksDB mode Raft log storage | `raft.Storage` |
| `internal/raft/node.go` | Memory mode Raft node | - |
| `internal/raft/node_rocksdb.go` | RocksDB mode Raft node | - |


Although the package and file names appear to overlap (`memory`, `rocksdb`), each file has a clear and unique responsibility:
- Application Layer Storage vs Raft Layer Storage: completely different layers
- Memory Mode vs RocksDB Mode: two optional implementation approaches
- Interface Definition vs Interface Implementation: clear abstraction levels

This is a well-designed distributed system architecture that follows Go best practices!