StreamFlow is an event streaming platform designed to demonstrate the complete Data Engineering Lifecycle, evolving from simple message passing into a managed, resilient ecosystem that follows industry best practices.
```mermaid
flowchart TB
    %% --- Master Theme & Precision Styling ---
    accTitle: StreamFlow Master-Level System Architecture
    accDescr: An exhaustive mapping of the end-to-end Data Engineering ecosystem.
    classDef default fill:#0f172a,stroke:#334155,stroke-width:1px,color:#cbd5e1;
    classDef producer fill:#020617,stroke:#10b981,stroke-width:2px,color:#fff;
    classDef kafka fill:#1e293b,stroke:#3b82f6,stroke-width:2px,color:#fff;
    classDef processing fill:#0f172a,stroke:#6366f1,stroke-width:2px,color:#fff;
    classDef storage fill:#020617,stroke:#ef4444,stroke-width:2px,color:#fff;
    classDef frontend fill:#1e293b,stroke:#f59e0b,stroke-width:2px,color:#fff;
    classDef network fill:#0f172a,stroke:#64748b,stroke-dasharray: 4 4,color:#94a3b8;
    classDef init fill:#111827,stroke:#9ca3af,stroke-width:1px,color:#9ca3af,stroke-dasharray: 2 2;

    %% --- Layer 1: Ingestion Ecosystem ---
    subgraph INGEST ["INGESTION LAYER (NODE.JS 20)"]
        direction LR
        P1["eventProducer.js\nMath.random() Distribution\n3s Intermittent Emission"]
        P1 --> P2["security.js\nSHA-256 PII Redaction\nMasked: {user_email}"]
    end

    %% --- Layer 2: Transport (Broker) ---
    subgraph BROKER ["PERSISTENT DATA BUS (APACHE KAFKA)"]
        direction TB
        ZK["Zookeeper\nCluster Coordination\nPorts: 2181 / 2888 / 3888"]
        K1["Kafka Broker v7.5\nPLAINTEXT: 9092\nINTERNAL: 29092"]
        INIT["init-kafka (InitContainer)\nExec: createTopics.js\nCreates Partitions & Specs"]
        subgraph TOPICS ["TOPIC REGISTRY"]
            T1["user-events\nPartitions: 3\nRetention: 168h"]
            T2["analytics\nAggregated Telemetry"]
            T3["notifications\nVerified Data Objects"]
            T4["dead-letter-topic\nFault Isolation (DLQ)"]
        end
        INIT -.->|Check Health| K1
        ZK -.->|Leader Election| K1
        K1 === T1 & T2 & T3 & T4
    end

    %% --- Layer 3: Transformation Workspace ---
    subgraph PROCESS ["TRANSFORMATION LAYER (DISTRIBUTED CONSUMERS)"]
        direction LR
        C1["analyticsConsumer.js\nGroup: analytics-group\nWindow: 10s Tumbling"]
        C2["notificationConsumer.js\nGroup: notification-group\nZod Schema Enforcement"]
    end

    %% --- Layer 4: Persistence & Reliability ---
    subgraph SINK ["PERSISTENCE & FAULT TOLERANCE"]
        direction TB
        DB_S["dbConsumer.js\nGroup: db-group\nPostgreSQL Sink"]
        PG["PostgreSQL 15\nTable: 'events'\nInternal Port: 5432"]
        DLQ_S["Error Observer\nSchema/JSON Fault\nPersistence to T4"]
    end

    %% --- Layer 5: Serving & Real-Time Observer ---
    subgraph SERVE ["OBSERVABILITY COCKPIT (THE BRIDGE)"]
        direction TB
        WS["websocketBridge.js\nSocket.IO v4.8\nPort: 3001 | 1.2k req/s"]
        DASH["React Dashboard\nVite 8 + Tailwind 4\nRecharts Gallery"]
    end

    %% --- Advanced Data Flow ---
    P2 ===>|"KAFKA_BROKER:29092"| T1
    T1 --->|"Stateful Map"| C1
    C1 --->|"Agg: LoginCount"| T2
    T1 --->|"Schema Validate"| C2
    C2 --"Success"--> T3
    C2 --"DLQ Redirect"--> T4
    T1 & T2 ---> DB_S
    DB_S ===>|"DATABASE_URL"| PG
    T3 & T4 ---> DLQ_S

    %% Critical Serving Logic
    T1 & T2 & T3 & T4 -.->|"Event Pipeline"| WS
    WS ===>|"Broadcasting\n'kafka-event'"| DASH

    %% --- Class Assignments ---
    class P1,P2 producer;
    class K1,T1,T2,T3,T4 kafka;
    class C1,C2 processing;
    class DB_S,PG,DLQ_S storage;
    class WS,DASH frontend;
    class INGEST,BROKER,PROCESS,SINK,SERVE network;
    class INIT init;
```
StreamFlow implements the industry-standard Data Engineering Lifecycle with high-fidelity practices at every stage:
- High-Velocity Simulation: `eventProducer.js` generates standardized JSON login events.
  - Undercurrent (Security): in-transit data masking (SHA-256) redacts sensitive `user_email` values before they enter the stream.
- Resilient Transport: Apache Kafka acts as the decoupled, durable intake buffer.
  - Architectural Scaling: each topic is configured with 3 partitions to enable parallel, high-throughput ingestion.
- Stateful Processing: `analyticsConsumer.js` performs tumbling-window aggregations over 10-second intervals.
  - Undercurrent (Data Management): Zod schema validation enforces a strict data contract; invalid payloads are routed to a Dead Letter Queue (DLQ).
- Relational Persistence: `dbConsumer.js` sinks validated events into PostgreSQL 15.
  - Structured Schema: hardened table definitions ensure consistent historical data retrieval.
- Real-Time Delivery: the WebSocket Bridge (Socket.IO) serves as the presentation layer.
- Observability Cockpit: a React dashboard provides live visibility into end-to-end pipeline velocity.
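The tumbling-window step above can be sketched as a small self-contained counter. This is a simplification of what `analyticsConsumer.js` presumably does; the bucketing and flush logic here are assumptions, not the project's code:

```javascript
// Minimal tumbling-window counter: events fall into fixed, non-overlapping
// 10-second windows keyed by the window's start timestamp.
const WINDOW_MS = 10_000;

function windowStart(timestampMs) {
  return Math.floor(timestampMs / WINDOW_MS) * WINDOW_MS;
}

class TumblingWindowCounter {
  constructor() {
    this.windows = new Map(); // windowStartMs -> { [eventType]: count }
  }

  add(event, timestampMs) {
    const key = windowStart(timestampMs);
    const counts = this.windows.get(key) ?? {};
    counts[event.event_type] = (counts[event.event_type] ?? 0) + 1;
    this.windows.set(key, counts);
  }

  // Emit and drop every window that closed before `nowMs`; the consumer
  // would publish each emitted aggregate to the 'analytics' topic.
  flushClosed(nowMs) {
    const closed = [];
    for (const [start, counts] of this.windows) {
      if (start + WINDOW_MS <= nowMs) {
        closed.push({ windowStart: start, counts });
        this.windows.delete(start);
      }
    }
    return closed;
  }
}
```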
- Streaming: Apache Kafka, Zookeeper
- Backend: Node.js, Express, Socket.io, KafkaJS
- Frontend: React, Vite, TailwindCSS v4, Recharts
- Database: PostgreSQL 15
- Tools: Docker, Jest, Zod, Framer Motion
The following diagram traces the temporal journey of a single event from generation to persistence and real-time visualization:
```mermaid
sequenceDiagram
    autonumber
    participant P as eventProducer.js
    participant K as Apache Kafka
    participant A as analyticsConsumer.js
    participant N as notificationConsumer.js
    participant B as websocketBridge.js
    participant D as React Dashboard
    participant PG as PostgreSQL 15

    Note over P: [Generation] Create JSON Event
    P->>P: maskEmail(user_email)
    P->>K: Publish to 'user-events' (Partition 0-2)
    Note right of K: [Ingestion] Durable Persistence
    par Ingestion Broadcast
        K->>A: Stream to Analytics
    and
        K->>N: Stream to Notification
    and
        K->>PG: Sync to DB (dbConsumer.js)
    end
    Note over A: [Transformation] 10s Tumbling Window
    A->>K: Publish Aggregates to 'analytics'
    Note over N: [Validation] Zod Schema Check
    alt Success
        N->>K: Publish to 'notifications'
    else Failure
        N->>K: Route to 'dead-letter-topic'
    end
    K->>B: Stream 'notifications' & 'analytics'
    B->>D: Socket.IO Broadcast ('kafka-event')
    Note over D: [Serving] Update Recharts & Logs
```
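The validation branch of the sequence above can be sketched without Zod, using a plain-JS stand-in for the schema contract. The exact field rules are assumptions for illustration; the project's real contract lives in its Zod schema:

```javascript
// Stand-in for the Zod schema contract: a valid event needs a string
// event_type, a hex-masked user_email, and a parseable timestamp.
// Unparseable or non-conforming payloads go to the dead-letter topic.
function validateEvent(raw) {
  let event;
  try {
    event = JSON.parse(raw);
  } catch {
    return { ok: false, reason: 'invalid-json' };
  }
  if (typeof event.event_type !== 'string') return { ok: false, reason: 'event_type' };
  if (!/^[0-9a-f]+$/.test(event.user_email ?? '')) return { ok: false, reason: 'user_email' };
  if (Number.isNaN(Date.parse(event.timestamp))) return { ok: false, reason: 'timestamp' };
  return { ok: true, event };
}

// DLQ routing decision: success publishes to 'notifications',
// failure redirects to 'dead-letter-topic'.
function targetTopic(raw) {
  return validateEvent(raw).ok ? 'notifications' : 'dead-letter-topic';
}
```

Keeping the routing decision as a pure function makes the "Success / Failure" branch of the diagram trivially unit-testable with Jest, independent of any running broker.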
The StreamFlow dashboard provides real-time throughput metrics, deep payload inspection, and infrastructure health monitoring.

Full visibility into the 3-partition scaling and message distribution via Kafka-UI.

Observation of the end-to-end data flow: Producer -> Analytics -> Bridge -> DB.

```bash
# 1. Start the infrastructure (Kafka, Zookeeper, PostgreSQL, Kafka-UI)
docker-compose up -d

# 2. Create the topics, then start the pipeline services
npm run create:topics
npm run start:producer
npm run start:analytics
npm run start:db
npm run start:bridge

# 3. Launch the dashboard
cd dashboard && npm run dev
```

To ensure the platform is enterprise-ready, the following network and service specifications are implemented:
| Component | Technology | Internal Port | External Port | Role |
|---|---|---|---|---|
| Broker | Apache Kafka 7.5 | 29092 | 9092 | Distributed Event Log |
| Coordinator | Zookeeper | 2181 | - | Cluster State Manager |
| Persistence | PostgreSQL 15 | 5432 | 5432 | Historical Sink |
| Bridge | Express + Socket.IO | 3001 | 3001 | WebSocket Telemetry |
| Dashboard | React + Vite | 5173 | 5173 | Observability Cockpit |
| Monitoring | Kafka-UI | 8080 | 8080 | Cluster Management |
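The port matrix above corresponds to a Compose mapping along these lines. This is a hedged sketch only: the service names, image tag, and listener environment variable are assumptions, not the project's actual `docker-compose.yml`:

```yaml
services:
  kafka:
    ports:
      - "9092:9092"        # external PLAINTEXT listener; 29092 stays internal
    environment:
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,INTERNAL://kafka:29092
  db:
    image: postgres:15
    ports:
      - "5432:5432"
  bridge:
    ports:
      - "3001:3001"        # Socket.IO telemetry
  kafka-ui:
    ports:
      - "8080:8080"        # cluster management UI
```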
The environment uses a robust Init-Container pattern:
- Stage 1: the `db` and `kafka` services start with health checks.
- Stage 2: `init-kafka` waits for cluster availability, then executes topic creation (3 partitions per topic).
- Stage 3: the operational microservices (`producer`, `analytics`, `bridge`, etc.) start only after Stage 2 succeeds, preventing race conditions and partial failures.
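Stage 2's "wait for cluster availability" boils down to a retry loop. A minimal generic sketch, where the probe function, attempt count, and delay are illustrative assumptions (`init-kafka` would use a probe like "can I list topics on the broker?"):

```javascript
// Retry an async probe until it succeeds or the attempts run out.
// Resolves with the probe's value, or throws after the final failure.
async function waitFor(probe, { attempts = 10, delayMs = 1000 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await probe();
    } catch (err) {
      if (i === attempts) {
        throw new Error(`probe failed after ${attempts} attempts: ${err.message}`);
      }
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Once `waitFor` resolves, `createTopics.js` can safely issue its topic-creation requests, which is what keeps Stage 3 services from racing an unready broker.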
- Platform Health: Automated service health checks and 24/7 Kafka monitoring via Kafka-UI.
- Real-Time Telemetry: Millisecond-latency tracking via the Socket.IO Bridge and Payload Inspector.
- Orchestration: Fully containerized setup with Docker Compose managing the microservices mesh.
- CI/CD: GitHub Actions pipeline for automated build validation and regression testing.
- Automated Testing: Comprehensive Jest suite covering PII masking and Zod schema contracts.
- Fault Isolation: Dead Letter Queue (DLQ) prevents upstream data corruption.
Β© 2026 MNH (@noumanic). Licensed under the MIT License.