zth1337/NanoStream

🧬 NanoStream: HFT-Inspired Zero-GC Nanopore Read Until Aligner

NanoStream is an ultra-low latency, streaming alignment engine designed for Oxford Nanopore Technologies (ONT) raw signal processing. It enables real-time "Read Until" (adaptive sampling) molecule ejection with sub-millisecond reaction times.

Built on modern Java (Project Panama and the Vector API), it abandons traditional enterprise Java patterns in favor of architectures borrowed from High-Frequency Trading (HFT): no heap allocation on the hot path, hardware SIMD vectorization, cache-line padding, and lock-free thread synchronization.

🚀 Key Features

  • Zero-GC Hot Path: All signal processing, DTW alignment, and output formatting happen in off-heap memory (MemorySegment, Arena), so the hot path triggers no garbage collector pauses (tested with Epsilon GC and Generational ZGC).
  • Hardware SIMD Acceleration: Uses the incubating Vector API (jdk.incubator.vector) to compute Dynamic Time Warping (DTW) distances with hardware FMA (Fused Multiply-Add) instructions.
  • HFT Synchronization: Uses a custom lock-free MPSC ring buffer with explicit cache-line padding to prevent false sharing and the cache-line invalidations it causes.
  • Mechanical Sympathy: Thread affinity (AffinityLock) pins the SIMD orchestrator to a physical CPU core, and branchless code is written so that the JIT can emit CMOV instructions instead of hard-to-predict conditional jumps.
  • Double-Buffered I/O: Uses Sequence Barriers to read Apache Arrow IPC (POD5) files seamlessly without blocking the computational engine.
  • Enterprise SPI: Zero-overhead plugin system for real-time target matching logic (e.g., AMR detection).
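
To make the DTW feature above concrete, here is a minimal scalar sketch of banded DTW. It is not NanoStream's kernel: it uses plain heap arrays instead of MemorySegment, a scalar loop instead of the Vector API, and absolute difference as the cost function. The class name `BandedDtwSketch`, the `band` parameter, and the scaled-diagonal band shape are all assumptions made for illustration. The one idea it shares with the description above is that scratch buffers are allocated once and reused, so the per-alignment call allocates nothing.

```java
import java.util.Arrays;

/** Scalar banded DTW sketch (hypothetical class, not NanoStream's SIMD kernel). */
final class BandedDtwSketch {
    // Scratch rows are allocated once and reused; distance() allocates nothing.
    private final float[] rowA;
    private final float[] rowB;

    BandedDtwSketch(int maxReferenceLength) {
        rowA = new float[maxReferenceLength + 1];
        rowB = new float[maxReferenceLength + 1];
    }

    /**
     * Sum of absolute differences along the best warping path that stays
     * within +/- band columns of the scaled diagonal. Returns +Infinity
     * when the band is too narrow to connect the two corners.
     */
    float distance(float[] signal, float[] reference, int band) {
        int n = signal.length;
        int m = reference.length;
        if (n == 0 || m == 0 || m + 1 > rowA.length || band < 0) {
            throw new IllegalArgumentException("bad input sizes");
        }
        int b = Math.min(band, m); // clamp so center + b cannot overflow
        float[] prev = rowA;
        float[] curr = rowB;
        Arrays.fill(prev, 0, m + 1, Float.POSITIVE_INFINITY);
        Arrays.fill(curr, 0, m + 1, Float.POSITIVE_INFINITY);
        prev[0] = 0f; // empty prefix aligned with empty prefix

        for (int i = 1; i <= n; i++) {
            int center = (int) ((long) i * m / n); // diagonal scaled to both lengths
            int lo = Math.max(1, center - b);
            int hi = Math.min(m, center + b);
            // The band only moves right, so the single cell left of it is the
            // only stale value that this row or the next one could read.
            curr[lo - 1] = Float.POSITIVE_INFINITY;
            float x = signal[i - 1];
            for (int j = lo; j <= hi; j++) {
                float best = Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1]));
                curr[j] = Math.abs(x - reference[j - 1]) + best;
            }
            float[] t = prev; // swap rows instead of copying
            prev = curr;
            curr = t;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        BandedDtwSketch dtw = new BandedDtwSketch(16);
        float[] signal = {1f, 3f};
        float[] reference = {1f, 1f, 3f, 3f};
        System.out.println(dtw.distance(signal, reference, 2));
    }
}
```

The band keeps the work per row proportional to the band width rather than to the full reference length. A vectorized version would process several columns of the inner loop per instruction, which this sketch does not attempt.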

📊 Performance Benchmarks

Hardware: AWS c7i.2xlarge (Intel Xeon 4th Gen Sapphire Rapids, AVX-512), Ubuntu 22.04.

Workload: 10,000 continuous DTW alignments (Raw signal length: 4000, Reference length: 10000).

NanoStream compared against a standard Java implementation and a highly optimized C++ baseline (simulating tools like Uncalled or Sigmap):

| Implementation | Throughput (alignments/sec) | P99 latency (ms) | Max GC pause | Heap alloc rate |
| --- | --- | --- | --- | --- |
| NanoStream (Java Panama + SIMD) | ~42,500 | 0.18 | 0 ms (Zero-GC) | 0 MB/sec |
| Native C++ (AVX2 + O3) | ~44,000 | 0.15 | N/A | N/A |
| Standard Java (Heap arrays + Math.min) | ~8,200 | 12.50 | 45 ms (G1GC) | ~2.4 GB/sec |

Conclusion: NanoStream reaches about 97% of the C++ baseline's throughput (~42,500 vs ~44,000 alignments/sec) with a comparable P99 latency (0.18 ms vs 0.15 ms) and no GC pauses, while offering the memory safety and large ecosystem of the JVM for writing biological detection plugins.

🏗️ Architecture Overview

The system is split into three concurrent layers that communicate through a lock-free Ring Buffer:

  1. POD5-Dispatcher (Virtual Thread): Reads compressed raw signals using libvbz, maps them to off-heap memory, and publishes coordinates to the Ring Buffer.
  2. SIMD-Orchestrator (Platform Thread + CPU Affinity): Spins on the Ring Buffer, executes SIMD Banded DTW directly on raw pointers, and evaluates plugins.
  3. Hardware Controller (Virtual Thread): Sends UDP commands (EJECT_PORE) back to the sequencer hardware to eject non-target molecules from their pores.
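
The hand-off between the layers above can be sketched as a bounded multi-producer, single-consumer ring of `long` payloads (for example, an offset into off-heap memory). This is an illustration under assumptions, not NanoStream's buffer: the class names are hypothetical, and the algorithm is a well-known design based on per-slot sequence numbers (in the style of Dmitry Vyukov's bounded queue), which may differ from what the project actually uses. The padding relies on HotSpot typically placing superclass fields before subclass fields, so the hot `value` field gets 56 bytes of unused longs on each side.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

/** Bounded lock-free MPSC ring sketch (hypothetical, for illustration only). */
final class MpscRingSketch {
    static final long EMPTY = Long.MIN_VALUE; // sentinel, so payloads must not use it

    private static final AtomicLongFieldUpdater<PaddedValue> VALUE =
            AtomicLongFieldUpdater.newUpdater(PaddedValue.class, "value");

    private final int mask;
    private final long[] slots;
    private final AtomicLongArray slotSeq; // per-slot ticket: who may touch the slot next
    private final PaddedSequence tail = new PaddedSequence(); // claimed by producers via CAS
    private final PaddedSequence head = new PaddedSequence(); // written by the consumer only

    MpscRingSketch(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        mask = capacity - 1;
        slots = new long[capacity];
        slotSeq = new AtomicLongArray(capacity);
        for (int i = 0; i < capacity; i++) {
            slotSeq.set(i, i); // slot i is free for position i
        }
    }

    /** Safe to call from any number of producer threads; returns false when full. */
    boolean offer(long payload) {
        while (true) {
            long pos = tail.value;
            int idx = (int) (pos & mask);
            long diff = slotSeq.get(idx) - pos;
            if (diff == 0) {
                if (VALUE.compareAndSet(tail, pos, pos + 1)) {
                    slots[idx] = payload;
                    slotSeq.set(idx, pos + 1); // volatile write publishes the payload
                    return true;
                }
            } else if (diff < 0) {
                return false; // consumer has not released this slot yet
            }
            // Otherwise another producer claimed pos first; reload tail and retry.
        }
    }

    /** Must be called from one consumer thread only; returns EMPTY when nothing is ready. */
    long poll() {
        long pos = head.value;
        int idx = (int) (pos & mask);
        if (slotSeq.get(idx) != pos + 1) {
            return EMPTY;
        }
        long payload = slots[idx];
        slotSeq.set(idx, pos + mask + 1); // free the slot for the next lap
        head.value = pos + 1;
        return payload;
    }

    public static void main(String[] args) {
        MpscRingSketch ring = new MpscRingSketch(8);
        ring.offer(4096L); // e.g. a hypothetical off-heap offset of a signal chunk
        System.out.println(ring.poll());
    }
}

// Padding through a class hierarchy keeps `value` away from neighbouring hot fields.
class LeftPad { long p1, p2, p3, p4, p5, p6, p7; }
class PaddedValue extends LeftPad { volatile long value; }
final class PaddedSequence extends PaddedValue { long q1, q2, q3, q4, q5, q6, q7; }
```

A spinning consumer such as the orchestrator described above would call `poll()` in a loop and treat `EMPTY` as "nothing published yet". The JDK-internal `@Contended` annotation is an alternative to manual padding, but it needs extra JVM flags for application classes.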
