nasitsony/README.md

Hi, I'm Nasit Sony 👋

Distributed Systems & AI Infrastructure Engineer

I build correctness-first systems, from storage engines and consensus protocols to fault-tolerant pipelines and orchestration platforms.

I focus on what happens when systems fail:

  • crashes
  • retries
  • duplicate processing
  • network delays and reordering
  • adversarial behavior (BFT)

🧠 About Me

I design and implement distributed systems where correctness is a requirement, not a best-effort goal.

My work spans:

  • Storage systems (WAL, crash recovery, replication)
  • Consensus protocols (Raft, asynchronous BFT)
  • Fault-tolerant pipelines (Kafka, idempotency, retries)
  • Orchestration systems (workflow + job scheduling)
  • AI infrastructure (RAG pipelines, inference routing)

I treat AI systems as distributed systems problems, not just APIs.


⚑ Experience Snapshot

💰 Production Systems (Fintech)

  • Built international money transfer systems handling $600M+ annual volume
  • Focus: correctness, consistency, and reliability under real-world constraints

🔬 Distributed Systems & BFT Research

  • Published work in Springer journals and international conferences
  • Designed and implemented asynchronous Byzantine fault-tolerant protocols
  • Focus: bridging theoretical guarantees with real system behavior

🚀 What I Build

🧱 Storage Layer: VeriStore

Crash-consistent KV engine with WAL durability, snapshotting, and Raft-based replication.

Handles:

  • process crashes (kill -9)
  • partial/torn writes
  • deterministic recovery via WAL replay
  • leader failover and log consistency
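
As a rough illustration of deterministic recovery via WAL replay, here is a minimal Python sketch. It is not VeriStore's actual on-disk format (VeriStore is written in C++ and its record layout is not described here). The framing (length, CRC32, key length, key, value) and the names `encode_record` and `replay` are assumptions made for this example. The idea shown: replay applies records in order and stops at the first record that is truncated or fails its checksum, so a torn tail write is ignored rather than half-applied.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def encode_record(key: bytes, value: bytes) -> bytes:
    """Frame one put as: length | crc32 | key_len | key | value."""
    payload = struct.pack("<I", len(key)) + key + value
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> dict:
    """Rebuild state from the log, stopping at the first torn or corrupt record."""
    state = {}
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        # A short payload means the append was cut off; a CRC mismatch means
        # the bytes on disk are not what was written. Either way, stop here.
        if length < 4 or len(payload) < length or zlib.crc32(payload) != crc:
            break
        (key_len,) = struct.unpack_from("<I", payload, 0)
        key = payload[4:4 + key_len]
        value = payload[4 + key_len:]
        state[key] = value
        offset = start + length
    return state


# Two complete records, then a third whose last bytes never reached disk.
log = encode_record(b"a", b"1") + encode_record(b"b", b"2")
torn = log + encode_record(b"c", b"3")[:-2]
```

Replaying `torn` yields the same state as replaying `log`: the incomplete third record is dropped, and repeating the replay over the same bytes always produces the same result.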

🧠 Consensus Layer: Async-BFT Framework

Asynchronous Byzantine fault-tolerant consensus framework (MVBA, ABBA).

Simulates:

  • adversarial nodes
  • message delays and reordering
  • quorum-based agreement under failure
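
The quorum arithmetic behind agreement under Byzantine failure can be sketched as follows. This uses the standard bound for asynchronous BFT (n ≥ 3f + 1 nodes to tolerate f Byzantine nodes, with quorums of size n − f); the function names are hypothetical and are not taken from the framework's code.

```python
def max_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Nodes to wait for: waiting for more than n - f could block forever."""
    return n - max_faults(n)


def min_honest_overlap(n: int) -> int:
    """Honest nodes guaranteed to sit in the intersection of any two quorums."""
    f = max_faults(n)
    q = quorum_size(n)
    overlap = 2 * q - n   # two quorums of size q share at least this many nodes
    return overlap - f    # worst case: every faulty node is in the overlap


for n in (4, 7, 10):
    print(n, max_faults(n), quorum_size(n), min_honest_overlap(n))
```

Because any two quorums share at least one honest node, two conflicting values cannot both gather a quorum, which is what keeps agreement safe even when messages are delayed or reordered.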

⚙️ Orchestration Layer: Veriflow

Kubernetes-based job orchestration control plane.

Implements:

  • idempotent job submission
  • concurrency-safe scheduling (row claiming with SKIP LOCKED)
  • reconciliation-driven execution recovery
  • append-only event timeline for auditability

🔄 Data Pipeline: SmartSearch

Fault-aware async ingestion + semantic retrieval backend.

Handles:

  • worker crashes mid-processing
  • Kafka replay / duplicate delivery
  • idempotent ingestion and deterministic recovery
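
To make the duplicate-delivery case concrete, here is a small in-memory sketch of idempotent ingestion. It is an illustration only: SmartSearch is a Java service, and the class `IdempotentIngestor`, its fields, and the message shape are assumptions for this example. In a real backend, the "seen" marker and the index write would need to be committed in one transaction so that a crash between them cannot leave them inconsistent.

```python
class IdempotentIngestor:
    """Applies each message at most once, keyed by a stable message ID."""

    def __init__(self):
        self.processed_ids = set()  # IDs whose effects are already applied
        self.index = {}             # doc_id -> text (stand-in for the real store)

    def handle(self, message_id: str, doc_id: str, text: str) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.processed_ids:
            return False
        # Upsert by doc_id, so even re-applying would not create a second row.
        self.index[doc_id] = text
        self.processed_ids.add(message_id)
        return True


ingestor = IdempotentIngestor()
batch = [("m1", "d1", "hello"), ("m2", "d2", "world")]

# Simulate a replay: the same batch is delivered twice.
results = [ingestor.handle(*msg) for msg in batch + batch]
```

After the replay, the index holds exactly one entry per document, the same state a single delivery would have produced.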

🔁 Workflow Orchestration: AgentFlow

Failure-aware workflow execution engine with explicit state transitions.

Features:

  • step-level execution and retry
  • timeout handling and recovery
  • deterministic state reconstruction
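
Explicit state transitions and deterministic state reconstruction can be sketched like this. The state names, the `ALLOWED` table, and the functions `transition` and `reconstruct` are hypothetical; AgentFlow's real state model is not described here. The sketch shows one common approach: every transition is checked against an allow-list, and the current state is derived by folding over the recorded transitions from a known starting state.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# Allow-list of legal moves. FAILED -> RUNNING models a step-level retry.
ALLOWED = {
    State.PENDING: {State.RUNNING},
    State.RUNNING: {State.SUCCEEDED, State.FAILED},
    State.FAILED: {State.RUNNING},
    State.SUCCEEDED: set(),
}


def transition(current: State, target: State) -> State:
    """Move to target, rejecting anything not in the allow-list."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def reconstruct(events: list) -> State:
    """Derive the current state by replaying recorded transitions in order."""
    state = State.PENDING
    for target in events:
        state = transition(state, target)
    return state


history = [State.RUNNING, State.FAILED, State.RUNNING, State.SUCCEEDED]
final_state = reconstruct(history)
```

Because the result depends only on the ordered event history, a restarted process that replays the same history arrives at the same state.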

💥 Engineering Philosophy

I design systems for failure, not just success.

I ask:

  • What if a worker crashes mid-processing?
  • What if a write is partially persisted?
  • What if messages are replayed?
  • What if nodes behave maliciously?

I build systems that:

  • recover deterministically
  • enforce explicit state transitions
  • prevent duplication and corruption
  • remain correct under failure

🧰 Tech Stack

Languages:
Java, C++, Go, Python

Backend & Infra:
Spring Boot, Kafka, PostgreSQL, Kubernetes, Docker

Distributed Systems:
WAL, replication, consensus (Raft, BFT), idempotency, retries

AI Infrastructure:
Embeddings, RAG pipelines, vector search (pgvector)


📚 Research

Prioritized-MVBA: Asynchronous Byzantine Agreement Protocol
Published in Springer journals & international conferences

🔗 https://scholar.google.com/citations?user=mBIQ1-0AAAAJ&hl=en


🎯 Current Focus

  • Distributed systems & storage engines
  • Fault-tolerant AI infrastructure
  • Consensus protocol engineering

📬 Connect

🔗 LinkedIn: https://www.linkedin.com/in/nasitsony

Pinned

  1. VeriStore

    Correctness-first C++ storage engine with WAL durability, crash recovery, Raft replication, and a minimal S3-style object store.

    C++

  2. async-bft-suite

    Prototype framework implementing three asynchronous BFT agreement protocols (Cachin MVBA, VABA, pMVBA) with a unified simulation harness and comparable metrics.

    Python

  3. SmartSearch

    Production-style semantic search and RAG backend built as a distributed system. Features async ingestion (Kafka), embedding pipelines, pgvector search, and strong reliability guarantees, including…

    Java

  4. veriflow-control-plane

    Fault-tolerant Kubernetes job orchestration control plane with persistent lifecycle tracking and reconciliation-driven execution recovery.

    Go

  5. agentflow

    Control-plane system for reliable, stateful task orchestration with idempotency, retries, and failure-aware execution.

    Java

  6. llm-serving-cache

    Distributed inference cache using VeriStore

    C++