🧬 NanoStream: HFT-Inspired Zero-GC Nanopore Read Until Aligner

NanoStream is an ultra-low-latency streaming alignment engine designed for Oxford Nanopore Technologies (ONT) raw signal processing. It enables real-time "Read Until" (adaptive sampling) molecule ejection with sub-millisecond reaction times.

Built on modern Java (Project Panama and the Vector API), it abandons traditional enterprise Java patterns in favor of High-Frequency Trading (HFT) architectures: zero heap allocation on the hot path, hardware SIMD vectorization, cache-line padding, and lock-free thread synchronization.

🚀 Key Features

Zero-GC Hot Path: All signal processing, Dynamic Time Warping (DTW) alignment, and output formatting happen in off-heap memory (MemorySegment, Arena), so the hot path causes no garbage collector pauses (tested with EpsilonGC and Generational ZGC).
Hardware SIMD Acceleration: Uses jdk.incubator.vector to compute DTW distances with hardware FMA (Fused Multiply-Add) instructions.
HFT Synchronization: Employs a custom lock-free MPSC ring buffer with explicit cache-line padding to avoid false sharing and the L1 cache-line invalidations it causes.
Mechanical Sympathy: Thread affinity (AffinityLock) pins the SIMD orchestrator to a physical CPU core, while branchless code lets the JIT compiler emit CMOV instructions instead of hard-to-predict branches.
Double-Buffered I/O: Uses Sequence Barriers to read Apache Arrow IPC (POD5) files seamlessly without blocking the computational engine.
Enterprise SPI: Zero-overhead plugin system for real-time target matching logic (e.g., AMR detection).
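
To make the zero-GC hot path more concrete, here is a minimal sketch of a banded DTW recurrence that keeps the signal, the reference, and both scratch rows in off-heap MemorySegment buffers. It is a scalar reference under stated assumptions, not NanoStream's SIMD kernel: the class name OffHeapDtwSketch, the method signatures, the Sakoe-Chiba style band, and the absolute-difference cost are all hypothetical choices made for illustration. It assumes a JDK where java.lang.foreign is a final API (JDK 22 or later).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch: names and signatures are illustrative, not NanoStream's API.
final class OffHeapDtwSketch {
    private static final ValueLayout.OfFloat F32 = ValueLayout.JAVA_FLOAT;
    private static final float INF = Float.POSITIVE_INFINITY;

    /**
     * Global DTW restricted to cells with |i - j| <= band.
     * prev and curr are caller-owned off-heap rows of (m + 1) floats,
     * so this method itself allocates nothing.
     */
    static float bandedDtw(MemorySegment signal, int n, MemorySegment ref, int m,
                           int band, MemorySegment prev, MemorySegment curr) {
        if (band < Math.abs(n - m)) {
            throw new IllegalArgumentException("band must be >= |n - m|");
        }
        // Row 0: only the origin cell is reachable.
        prev.setAtIndex(F32, 0, 0f);
        for (int j = 1; j <= m; j++) prev.setAtIndex(F32, j, INF);

        for (int i = 1; i <= n; i++) {
            int lo = Math.max(1, i - band);
            int hi = Math.min(m, i + band);
            // Sentinels just outside the band, so stale values from two rows ago are never read.
            curr.setAtIndex(F32, lo - 1, INF);
            if (hi < m) curr.setAtIndex(F32, hi + 1, INF);

            float s = signal.getAtIndex(F32, i - 1);
            for (int j = lo; j <= hi; j++) {
                float cost = Math.abs(s - ref.getAtIndex(F32, j - 1));
                // Math.min avoids an explicit if/else chain in the source.
                float best = Math.min(prev.getAtIndex(F32, j - 1),
                        Math.min(prev.getAtIndex(F32, j), curr.getAtIndex(F32, j - 1)));
                curr.setAtIndex(F32, j, cost + best);
            }
            MemorySegment tmp = prev; // swap rows without copying
            prev = curr;
            curr = tmp;
        }
        return prev.getAtIndex(F32, m);
    }

    /** Convenience wrapper for experiments; it allocates, so it is not hot-path code. */
    static float align(float[] signal, float[] ref, int band) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment s = copyOffHeap(arena, signal);
            MemorySegment r = copyOffHeap(arena, ref);
            long rowBytes = (ref.length + 1L) * Float.BYTES;
            MemorySegment prev = arena.allocate(rowBytes, Float.BYTES);
            MemorySegment curr = arena.allocate(rowBytes, Float.BYTES);
            return bandedDtw(s, signal.length, r, ref.length, band, prev, curr);
        }
    }

    private static MemorySegment copyOffHeap(Arena arena, float[] data) {
        MemorySegment seg = arena.allocate((long) data.length * Float.BYTES, Float.BYTES);
        for (int i = 0; i < data.length; i++) seg.setAtIndex(F32, i, data[i]);
        return seg;
    }
}
```

In a real hot path the arena and the two scratch rows would presumably be allocated once at startup and reused for every read; the align helper exists only so the recurrence can be tried from ordinary heap arrays.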

📊 Performance Benchmarks

Hardware: AWS c7i.2xlarge (Intel Xeon 4th Gen Sapphire Rapids, AVX-512), Ubuntu 22.04. Workload: 10,000 continuous DTW alignments (Raw signal length: 4000, Reference length: 10000).

How NanoStream compares against a standard Java implementation and a highly optimized C++ baseline (simulating tools like Uncalled or Sigmap):

| Implementation | Throughput (alignments/sec) | P99 latency (ms) | Max GC pause | Heap alloc rate |
|---|---|---|---|---|
| NanoStream (Java Panama + SIMD) | ~42,500 | 0.18 | 0 ms (Zero-GC) | 0 MB/sec |
| Native C++ (AVX2 + -O3) | ~44,000 | 0.15 | N/A | N/A |
| Standard Java (Heap arrays + Math.min) | ~8,200 | 12.50 | 45 ms (G1GC) | ~2.4 GB/sec |

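Conclusion: In this benchmark NanoStream reaches about 97% of the C++ baseline's throughput (~42,500 vs. ~44,000 alignments/sec) with a comparable P99 latency (0.18 ms vs. 0.15 ms) and no GC pauses, while offering the memory safety and large ecosystem of the JVM for writing biological detection plugins.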

🏗️ Architecture Overview

The system is split into three concurrent layers that interact via a lock-free Ring Buffer:

POD5-Dispatcher (Virtual Thread): Reads compressed raw signals using libvbz, maps them to off-heap memory, and publishes coordinates to the Ring Buffer.
SIMD-Orchestrator (Platform Thread + CPU Affinity): Spins on the Ring Buffer, executes SIMD Banded DTW directly on raw pointers, and evaluates plugins.
Hardware Controller (Virtual Thread): Dispatches UDP commands back to the sequencer hardware to eject non-target pores (EJECT_PORE).
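
The hand-off between the dispatcher and the orchestrator can be sketched as a bounded multi-producer, single-consumer ring. The version below is a minimal illustration that uses per-slot sequence numbers (the scheme known from Dmitry Vyukov's bounded queue), not NanoStream's actual buffer: the class name MpscRingSketch, the long payload standing in for "coordinates", the EMPTY sentinel, and the one-sided padding trick are all assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: a bounded MPSC ring carrying long "coordinates"
// (for example, an off-heap offset). Not NanoStream's implementation.
final class MpscRingSketch {
    static final long EMPTY = Long.MIN_VALUE; // sentinel: must never be used as a payload

    // Rough padding approximation: trailing fields keep the hot counter away from
    // whatever is laid out after it. Actual field layout is JVM-dependent.
    static final class PaddedAtomicLong extends AtomicLong {
        long p1, p2, p3, p4, p5, p6, p7;
    }

    private final int mask;
    private final long[] slots;                 // payloads
    private final AtomicLongArray sequences;    // per-slot publication state
    private final PaddedAtomicLong tail = new PaddedAtomicLong(); // claimed by producers via CAS
    private long head;                          // touched by the single consumer only

    MpscRingSketch(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        mask = capacity - 1;
        slots = new long[capacity];
        sequences = new AtomicLongArray(capacity);
        for (int i = 0; i < capacity; i++) sequences.set(i, i); // slot i is free for ticket i
    }

    /** Safe to call from many producer threads. Returns false when the ring is full. */
    boolean offer(long value) {
        while (true) {
            long ticket = tail.get();
            int idx = (int) (ticket & mask);
            long diff = sequences.get(idx) - ticket;
            if (diff == 0) {
                if (tail.compareAndSet(ticket, ticket + 1)) {
                    slots[idx] = value;
                    sequences.set(idx, ticket + 1); // volatile store publishes the payload
                    return true;
                }
            } else if (diff < 0) {
                return false; // the consumer has not freed this slot yet
            }
            // diff > 0: another producer won this ticket; retry with a fresh tail.
        }
    }

    /** Must be called from exactly one consumer thread. Returns EMPTY when nothing is ready. */
    long poll() {
        int idx = (int) (head & mask);
        if (sequences.get(idx) != head + 1) return EMPTY; // not published yet
        long value = slots[idx];
        sequences.set(idx, head + mask + 1); // free the slot for the next lap
        head++;
        return value;
    }
}
```

A spinning consumer would call poll() in a loop and treat EMPTY as "try again". Padding the consumer's head field and adding a wait strategy are left out to keep the sketch short.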
