A fault-tolerant, multi-threaded distributed system built in C++ from scratch. Uses TCP sockets, State Machine Replication (SMR), a load-balancing layer with LRU caching, and round-robin routing to ensure high availability and consistent order processing across replicated servers.
Developed as part of CS 7610 — Distributed Systems at Northeastern University.
```
        Clients
           │
           ▼
Load Balancer (with LRU cache + SMR log)
           │  round-robin / failover routing
           ├── Primary Server ──┐
           ├── Replica 1        │  State Machine Replication
           └── Replica N ───────┘  (committed across all nodes)
```
| Component | Role |
|---|---|
| ClientStub / ClientThread | Abstracts TCP I/O; spawns concurrent customer threads (see sketch below) |
| ServerStub / ServerThread | Accepts connections; dispatches engineer + expert threads |
| LoadBalancer | Routes requests round-robin; caches last order via LRU; manages primary election |
| ServerClientStub | Inter-server communication for replication commits |
| ClientTimer | Measures per-request latency and aggregate throughput |
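For concreteness, here is a minimal sketch of how a `ClientStub` like the one in the table might wrap raw TCP I/O with POSIX sockets. The `Order` layout and `OrderLaptop` name are illustrative assumptions, not the project's actual wire format.

```cpp
// Sketch of a ClientStub-style wrapper over raw TCP (POSIX sockets).
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdexcept>
#include <string>

struct Order { int customer_id; int order_number; int laptop_type; };  // assumed layout

class ClientStub {
    int fd_ = -1;
public:
    void Init(const std::string& ip, int port) {
        fd_ = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(static_cast<uint16_t>(port));
        inet_pton(AF_INET, ip.c_str(), &addr.sin_addr);
        if (connect(fd_, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0)
            throw std::runtime_error("connect failed");
    }
    Order OrderLaptop(const Order& req) {
        // Fixed-size binary frames for simplicity; a real stub would marshal
        // each field (e.g. htonl per int) rather than sending the raw struct.
        send(fd_, &req, sizeof(req), 0);
        Order resp{};
        recv(fd_, &resp, sizeof(resp), MSG_WAITALL);
        return resp;
    }
    ~ClientStub() { if (fd_ >= 0) close(fd_); }
};
```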
- State Machine Replication — SMR log replicated to all nodes before committing; survives primary failure
- LRU Read Cache — Load balancer caches per-customer last-order lookups to reduce replica reads (sketched after this list)
- Round-Robin Routing — Requests distributed across replicas with automatic failover
- Thread Pool (Expert Engineers) — Pooled worker threads handle custom requests concurrently with mutex/CV synchronization
- RPC-like Stubs — Request/response serialization over raw TCP; no external RPC library
- Performance Metrics — Client reports mean latency (ms) and throughput (ops/sec) at exit
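A minimal sketch of the per-customer LRU read cache mentioned above, using the standard doubly linked list plus hash map layout. The `LRUCache` name and the `int → int` key/value types (customer ID to last order number) are assumptions for illustration.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

class LRUCache {
    std::size_t capacity_;
    std::list<std::pair<int, int>> items_;  // front = most recent {customer, order}
    std::unordered_map<int, std::list<std::pair<int, int>>::iterator> index_;

public:
    explicit LRUCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns true and fills *order if the customer's last order is cached.
    bool Get(int customer_id, int* order) {
        auto it = index_.find(customer_id);
        if (it == index_.end()) return false;
        items_.splice(items_.begin(), items_, it->second);  // mark most recent
        *order = it->second->second;
        return true;
    }

    // Insert or refresh; evicts the least recently used entry when full.
    void Put(int customer_id, int order) {
        auto it = index_.find(customer_id);
        if (it != index_.end()) {
            it->second->second = order;
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (items_.size() == capacity_) {
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.emplace_front(customer_id, order);
        index_[customer_id] = items_.begin();
    }
};
```

On a hit, the balancer can answer a last-order read without contacting any replica, which is the point of caching at this layer.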
- Linux (tested on Khoury College cluster)
- C++11 compiler (`g++`)
- `make`

Build with:

```bash
make all
# Produces: ./server ./client ./load
```

Single server:

```bash
# Start server on port 12347, with 2 expert threads, server ID 1, 0 replicas
./server 12347 1 0

# Run 4 customer threads, each placing 10 regular orders
./client 127.0.0.1 12347 4 10 0
```

Replicated setup behind the load balancer:

```bash
# Start two replica servers
./server 12348 2 1 2 127.0.0.1 12349   # primary (ID=2), knows about replica at 12349
./server 12349 2 1 1 127.0.0.1 12348   # replica

# Start load balancer: port 12347, LRU cache size 100, 2 replicas
./load 12347 100 2 2 127.0.0.1 12348 3 127.0.0.1 12349

# Run clients through the load balancer
./client 127.0.0.1 12347 8 50 1   # custom laptop orders
```

Client usage:

```bash
./client [ip] [port] [# customers] [# orders] [laptop type]
```

laptop type: `0` = regular, `1` = custom
The load balancer forwards write operations to the primary, which replicates via the SMR log to all replicas before returning a response. Reads can be served from any replica (or the LRU cache).
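As a rough sketch of that commit path on the primary (the `MapOp` record, the `ReplicateAndCommit` helper, and the ack protocol here are assumptions, not the project's actual interface):

```cpp
#include <vector>

// One SMR log entry; fields are placeholders for this sketch.
struct MapOp { int opcode = 0; int customer_id = 0; int order_number = 0; };

// Hypothetical inter-server stub: ships one log entry over the inter-server
// TCP connection and blocks until the replica acknowledges it.
struct ServerClientStub {
    bool SendRecord(int index, const MapOp& op) {
        (void)index; (void)op;
        return true;  // placeholder; the real version does send()/recv()
    }
};

// Primary-side write path: append to the local SMR log, replicate to every
// replica, and only report success (commit) once all replicas have acked.
bool ReplicateAndCommit(std::vector<MapOp>& smr_log,
                        std::vector<ServerClientStub>& replicas,
                        const MapOp& op) {
    smr_log.push_back(op);
    const int index = static_cast<int>(smr_log.size()) - 1;
    for (auto& replica : replicas) {
        if (!replica.SendRecord(index, op)) {
            return false;  // a real system would retry or drop the replica
        }
    }
    return true;  // committed across all nodes; safe to respond to the client
}
```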
- Each incoming client connection gets a dedicated engineer thread
- Custom requests are queued to a shared expert thread pool (mutex + condition variable); see the sketch after this list
- Replica stubs maintain their own connection pool, protected by a round-robin lock
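A compact sketch of such a mutex/condition-variable pool in C++11; `ExpertPool` and `CustomOrder` are illustrative names, not the project's own.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct CustomOrder { int customer_id; int order_number; };

class ExpertPool {
    std::queue<CustomOrder> queue_;
    std::mutex mtx_;
    std::condition_variable cv_;
    std::vector<std::thread> experts_;
    bool stop_ = false;

    void ExpertLoop() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mtx_);
            cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
            if (stop_ && queue_.empty()) return;
            CustomOrder order = queue_.front();
            queue_.pop();
            lock.unlock();
            // ... build the custom laptop (the expensive part) ...
        }
    }

public:
    explicit ExpertPool(int num_experts) {
        for (int i = 0; i < num_experts; ++i)
            experts_.emplace_back(&ExpertPool::ExpertLoop, this);
    }
    void Submit(const CustomOrder& order) {
        { std::lock_guard<std::mutex> lock(mtx_); queue_.push(order); }
        cv_.notify_one();  // wake exactly one idle expert
    }
    ~ExpertPool() {
        { std::lock_guard<std::mutex> lock(mtx_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : experts_) t.join();
    }
};
```

Because `Submit` only enqueues and notifies, the engineer thread hands off a custom order and returns to its client immediately.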
- Load balancer tracks the current primary index
- On connection failure, promotes the next replica to primary and retries (see the sketch below)
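A sketch of that promote-and-retry loop, assuming a hypothetical `SendToServer` helper that reports connection failures:

```cpp
#include <cstddef>
#include <vector>

struct Request  { /* serialized order bytes, elided */ };
struct Response { bool ok = false; };

class FailoverRouter {
    std::vector<int> server_fds_;   // one TCP connection per replica
    std::size_t primary_ = 0;       // index of the current primary

    // Hypothetical helper: sends on fd, returns false if the peer is down.
    bool SendToServer(int fd, const Request& req, Response* resp) {
        (void)fd; (void)req; (void)resp;
        return true;  // placeholder; the real version does socket I/O
    }

public:
    Response ForwardWrite(const Request& req) {
        Response resp;
        for (std::size_t attempt = 0; attempt < server_fds_.size(); ++attempt) {
            if (SendToServer(server_fds_[primary_], req, &resp)) return resp;
            // Primary unreachable: promote the next replica and retry.
            primary_ = (primary_ + 1) % server_fds_.size();
        }
        return resp;  // all replicas down; resp.ok stays false
    }
};
```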
The client timer reports:
- Latency — elapsed time per request (microseconds → milliseconds)
- Throughput — total orders / total wall time (ops/sec)
Use these metrics to compare single-server vs. replicated configurations under varying load.
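A minimal sketch of how such a timer can be built on `std::chrono`; the `Start`/`RequestBegin`/`RequestEnd`/`Report` interface is an assumption, not the project's actual `ClientTimer` API.

```cpp
#include <chrono>
#include <cstdio>

class ClientTimer {
    using Clock = std::chrono::steady_clock;
    Clock::time_point start_, req_start_;
    double total_latency_us_ = 0.0;
    long completed_ = 0;

public:
    void Start()        { start_ = Clock::now(); }       // whole-run wall clock
    void RequestBegin() { req_start_ = Clock::now(); }
    void RequestEnd() {
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      Clock::now() - req_start_).count();
        total_latency_us_ += static_cast<double>(us);
        ++completed_;
    }
    void Report() const {
        double wall_s = std::chrono::duration<double>(Clock::now() - start_).count();
        // Mean latency: total request time / requests, microseconds -> ms.
        std::printf("mean latency: %.3f ms\n",
                    (total_latency_us_ / completed_) / 1000.0);
        // Throughput: total completed orders / total wall time.
        std::printf("throughput:   %.1f ops/sec\n", completed_ / wall_s);
    }
};
```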
MIT License — see LICENSE for details.