A high-performance Limit Order Book implementation in C++ with progressive optimizations, designed for correctness and speed.
This package features:
- Three progressive implementations of an order book with increasing performance optimizations
- ITCH 5.0 parser for binary market data feed format (Nasdaq TotalView-ITCH)
- Generation of synthetic order streams used for testing and benchmarking
- Comprehensive test suite using GoogleTest with differential testing across all versions
Each version explores different trade-offs in data structure design and memory management:
Version 1: `std::map` + `std::list`

Data Structures:
- `std::map<price, std::list<Order>>` for bid/ask price levels
- `std::unordered_map<order_id, OrderLocation>` for O(1) order ID lookup
- Each price level stores orders in a doubly-linked list (`std::list`)
Characteristics:
- O(log n) price level lookup via `std::map`
- O(1) order insertion/deletion at any position within a price level
- Poor cache locality: linked list nodes scattered in memory
- Stable iterators enable simple O(1) order cancellation
Complexity:
- Order insertion/matching: O(log n) + O(m) where n = price levels, m = orders at that level
- Order cancellation: O(1) erasure + O(log n) potential empty level cleanup
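A minimal sketch of this layout for the bid side (type and member names such as `BookV1` and `OrderLocation` are illustrative assumptions, not the project's actual identifiers; asks would mirror this with `std::less` ordering):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <list>
#include <map>
#include <unordered_map>

struct Order {
    uint64_t id;
    uint32_t qty;
};

struct OrderLocation {
    int64_t price;                  // which price level holds the order
    std::list<Order>::iterator it;  // std::list iterators stay valid when other elements are erased
};

struct BookV1 {
    // Best bid (highest price) first.
    std::map<int64_t, std::list<Order>, std::greater<int64_t>> bids;
    std::unordered_map<uint64_t, OrderLocation> order_lookup;

    void add_bid(uint64_t id, int64_t price, uint32_t qty) {
        auto& level = bids[price];   // O(log n) price level lookup
        level.push_back({id, qty});  // O(1) FIFO append
        order_lookup.emplace(id, OrderLocation{price, std::prev(level.end())});
    }

    void cancel(uint64_t id) {
        auto found = order_lookup.find(id);  // O(1) ID lookup
        if (found == order_lookup.end()) return;
        auto lvl = bids.find(found->second.price);
        lvl->second.erase(found->second.it);       // O(1): stable iterator, no reindexing
        if (lvl->second.empty()) bids.erase(lvl);  // O(log n) empty-level cleanup
        order_lookup.erase(found);
    }
};
```

The stored iterator is what makes cancellation O(1): erasing one `std::list` node never invalidates iterators to the others.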
Version 2: `std::vector` with lazy deletion

Data Structures:
- `std::map<price, std::vector<LazyOrder>>` for bid/ask price levels
- `std::unordered_map<order_id, OrderLocation>` storing vector indices
- Orders stored in vectors with an `active` flag for lazy deletion
Characteristics:
- Improved cache locality: contiguous memory layout of vectors
- Lazy deletion: orders marked inactive instead of immediately removed
- Performance penalty: cleanup operations must update all vector indices in `order_lookup`
- Iteration overhead: must skip inactive orders during matching
Why v2 is slower than v1: Despite better cache locality from vectors, v2 is ~5-14% slower than v1 because:
- The `cleanup_price_level()` function (called after deletions) must rebuild the vector and update every moved order's index in the `order_lookup` hashmap
- During order matching, the code must iterate through and skip inactive orders
- The overhead of index maintenance and inactive order checks outweighs the cache locality benefits
In contrast, v1's `std::list` provides O(1) deletion without any index updates.
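The cost can be seen in a sketch of one lazily-deleted price level (identifiers like `LevelV2` and `LazyOrder`'s members are illustrative assumptions; in the real book a single `order_lookup` would be shared across all levels):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct LazyOrder {
    uint64_t id;
    uint32_t qty;
    bool active;  // false = cancelled; matching must skip these
};

struct LevelV2 {
    std::vector<LazyOrder> orders;
    std::unordered_map<uint64_t, size_t> order_lookup;  // id -> index into `orders`

    void add(uint64_t id, uint32_t qty) {
        order_lookup[id] = orders.size();
        orders.push_back({id, qty, true});
    }

    void cancel(uint64_t id) {
        orders[order_lookup.at(id)].active = false;  // O(1), but leaves a hole
        order_lookup.erase(id);
    }

    // The costly part: compacting the vector moves elements, which
    // invalidates every stored index, so each surviving order must be
    // re-registered in the hashmap.
    void cleanup_price_level() {
        std::vector<LazyOrder> compacted;
        compacted.reserve(orders.size());
        for (const auto& o : orders)
            if (o.active) compacted.push_back(o);
        orders = std::move(compacted);
        for (size_t i = 0; i < orders.size(); ++i)
            order_lookup[orders[i].id] = i;  // O(m) hashmap writes per cleanup
    }
};
```

The O(m) reindexing loop in `cleanup_price_level()` is the overhead that v1's stable iterators avoid entirely.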
Version 3: intrusive linked lists with a pre-allocated memory pool

Data Structures:
- `std::map<price, PriceLevelHead>` stores only head/tail pointers for each price level
- Pre-allocated `std::vector<OrderNode>` memory pool (default 4M orders)
- Orders stored as intrusive doubly-linked list nodes within the pool
- Free list for O(1) node allocation/deallocation
Characteristics:
- Zero allocation overhead: all memory pre-allocated on construction
- O(1) insertion/deletion: intrusive linked list with direct index manipulation
- Cache locality: the entire pool is contiguous memory
- No index maintenance: removing nodes requires only updating next/prev pointers
- No cleanup needed: nodes immediately returned to free list
- CMake 3.14 or higher
- C++17 compatible compiler (GCC, Clang, or MSVC)
```sh
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build -j$(nproc)
# The executable will be at: build/src/cpp_orderbook
```

When compiling for benchmarks, a Release build is recommended for optimal performance:

```sh
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
```

Build targets:
- `cpp_orderbook` - Main executable for running benchmarks
- `unit_tests` - Test suite (GoogleTest)
- `cpp_orderbook` - Library target used by executable and tests
The project uses GoogleTest for unit testing. To run the tests after building:
```sh
# From build directory
ctest
```

The main executable supports benchmarking all order book versions on different input datasets:

```sh
./src/cpp_orderbook -i <input.csv> --version <1-3> [--verbosity <0-1>] [--repetitions <n>]
```

- `-i <file>` - Input CSV file with order events (required)
- `--version <1-3>` - Order book version to run (default: 1)
- `--verbosity <0-1>` - Output level (0=silent, 1=summary)
- `--repetitions <n>` - Number of benchmark repetitions to average over (default: 10)
You can generate test order streams programmatically using the `generate_events` tool (built from generate_events.cpp):
```sh
./src/generate_events test_events.csv 100000
```

All benchmarks are single-threaded, measured on a 13th Gen Intel(R) Core(TM) i7-1360P (x86_64), compiled with GCC 14.2.0.

Generated by `./src/generate_events test_events.csv 1000000`:
```
Running Version 1...
Throughput: (1.0745 +- 0.0100)*1e+7 orders/sec
p99 per-event processing time: (2.410 +- 0.055)*1e+2 ns
Running Version 2...
Throughput: (1.023 +- 0.015)*1e+7 orders/sec
p99 per-event processing time: (3.66 +- 0.13)*1e+2 ns
Running Version 3...
Throughput: (1.7037 +- 0.0081)*1e+7 orders/sec
p99 per-event processing time: (1.940 +- 0.023)*1e+2 ns
```
Analysis:
- v2 vs v1: Version 2 is ~5% slower despite vector cache locality, due to index maintenance overhead in `cleanup_price_level()` and iteration over inactive orders
- v3 vs v1: Version 3 achieves ~58% higher throughput by eliminating allocations and index updates through a pre-allocated memory pool and intrusive linked lists
`MSFT_2012-06-21_34200000_57600000_message_10.csv`:
```
Running Version 1...
Throughput: (1.925 +- 0.030)*1e+7 orders/sec
p99 per-event processing time: (1.598 +- 0.064)*1e+2 ns
Running Version 2...
Throughput: (1.666 +- 0.014)*1e+7 orders/sec
p99 per-event processing time: (2.896 +- 0.018)*1e+2 ns
Running Version 3...
Throughput: (2.340 +- 0.013)*1e+7 orders/sec
p99 per-event processing time: (1.339 +- 0.033)*1e+2 ns
```
This project is licensed under the MIT License. By using or interacting with this software, you agree to its terms.
- LOBSTER Data Samples: https://data.lobsterdata.com/info/DataSamples.php
- ITCH 5.0 Specification: Nasdaq TotalView-ITCH protocol documentation