Quant-finance benchmarks for C++ automatic differentiation libraries. Heston MC, SABR calibration, XVA CVA, and LIBOR swaption — finite differences, XAD, CppAD, Adept 2, autodiff. Reproducible from source.
GCC 13.3, Intel Xeon Platinum 8488C, Ubuntu 24.04, -O3 -mavx2 -mfma, 10K MC paths. Canonical CSV: results/results.csv.
Gradient time: the cost of computing all sensitivities. Bold = fastest gradient time in each row (Primal is a reference cost, not a gradient). The Primal column is the cost of evaluating the same workload once with raw doubles (no AD machinery), so the gap between Primal and each AAD library shows the recording-and-reverse-sweep overhead.
| # | Benchmark | Sensitivities (N) | Primal | FD | XAD | XAD-Codegen | CppAD | Adept | autodiff |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Heston MC | 8 | 9.3 ms | 83 ms | 40 ms | **21 ms** | 268 ms | 91 ms | DNF |
| 2 | SABR Calib | 15 | 2.1 ms | 33 ms | 8.4 ms | **4.9 ms** | 32 ms | 19 ms | 38 ms |
| 3 | XVA CVA | 40 | 591 ms | 24.1 s | 2.6 s | **0.57 s** | 8.0 s | 7.1 s | DNF |
| 4 | LIBOR Swaption | 161 | 138 ms | 21.6 s | 1.00 s | **0.31 s** | 4.57 s | 1.15 s | DNF |
Median of 10 measured iterations after warmup. Timings use reverse mode for the AAD libraries; autodiff's SABR entry uses forward mode. DNF = did not finish. The Primal column is the median of the AAD libraries' raw-double primal benchmarks, all of which run the same numerical kernel as the gradient timings, so there are no RNG-cost asymmetries. As expected for forward-difference bumping, FD/Primal ≈ N+1 on all four rows.
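The FD column follows directly from how forward-difference bumping works: one base valuation, then one bumped valuation per input, so N inputs cost N+1 pricer calls. Below is a minimal sketch of that scheme. The names `FdResult` and `forward_difference_gradient` are hypothetical and not taken from this suite, and the sketch assumes an absolute bump size `h`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Result of a forward-difference gradient: the estimate plus the number of
// pricer calls it took (N + 1 for N inputs). Hypothetical helper type.
struct FdResult {
    std::vector<double> grad;
    std::size_t evaluations = 0;
};

// Forward-difference bumping (sketch, absolute bump h assumed):
// dV/dx_i ~= (V(x + h*e_i) - V(x)) / h for every input i.
FdResult forward_difference_gradient(
    const std::function<double(const std::vector<double>&)>& price,
    std::vector<double> x, double h = 1e-6) {
    assert(h > 0.0);
    FdResult r;
    const double base = price(x);  // unbumped valuation
    ++r.evaluations;
    r.grad.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double saved = x[i];
        x[i] = saved + h;  // bump input i only
        r.grad[i] = (price(x) - base) / h;
        ++r.evaluations;
        x[i] = saved;  // restore before bumping the next input
    }
    return r;
}
```

Because every bump re-runs the full pricer, the cost grows linearly with the number of sensitivities. This is why the FD/AAD gap in the table widens as N grows.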
- Adjoint AD vs finite differences. FD cost scales as O(N) in the input count, so its gap to the fastest AAD backend (XAD-Codegen) widens from ~4× on Heston (8 inputs) to ~70× on LIBOR (161 inputs). Against XAD's tape mode, the gap is ~2× and ~22× respectively. FD is fine for spot checks but quickly becomes the bottleneck once you need more than a handful of sensitivities.
- Tape libraries cluster within an order of magnitude. XAD's tape mode is the fastest tape library on every benchmark, by margins ranging from ~1.15× (LIBOR, vs Adept) to ~2.7× (XVA CVA, vs Adept). CppAD is the slowest of the three on all four benchmarks, but it stays within an order of magnitude of XAD.
- XAD-Codegen compiles the recorded graph to AVX2 native code at runtime and is ~1.7×–4.6× faster than XAD's own tape mode on the four benchmarks. Switching from tape to Codegen requires expressing data-dependent branches via `xad::less(a,b).If(then,else)` so that the recorded graph is branch-free; no code changes are needed outside the per-path kernel.
- autodiff completes only 1 of 4 benchmarks (SABR, using forward `dual` mode). Forward mode is O(N) in input count. The alternative `var` reverse mode allocates a fresh heap-based expression tree on every gradient call. Neither approach scales to MC pricing or larger calibrations, so autodiff isn't a peer for the AAD workloads benchmarked here.
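To see why forward mode is O(N), note that a dual number carries one tangent. A full gradient therefore needs one pass per input, each seeded with a unit tangent. The sketch below illustrates this on an arbitrary 3-input function. `Dual`, `model` and `forward_gradient` are hypothetical names and are not autodiff's actual `dual` API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal dual number for illustration (not autodiff's API):
// a value plus one tangent, i.e. one directional derivative per pass.
struct Dual {
    double val;
    double tan;
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.tan + b.tan}; }
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.tan * b.val + a.val * b.tan};  // product rule
}
Dual exp(Dual a) {
    const double e = std::exp(a.val);
    return {e, e * a.tan};  // chain rule
}

// Illustrative 3-input model: f(x) = exp(x0 * x1) + x2 * x2.
Dual model(const std::vector<Dual>& x) {
    assert(x.size() == 3);
    return exp(x[0] * x[1]) + x[2] * x[2];
}

// Full gradient in forward mode: pass i seeds the unit tangent e_i, so
// N inputs need N passes through the model.
std::vector<double> forward_gradient(const std::vector<double>& x,
                                     std::size_t& passes) {
    std::vector<double> grad(x.size());
    passes = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        std::vector<Dual> xd(x.size());
        for (std::size_t j = 0; j < x.size(); ++j)
            xd[j] = {x[j], j == i ? 1.0 : 0.0};
        grad[i] = model(xd).tan;
        ++passes;
    }
    return grad;
}
```

With 15 SABR inputs this means 15 model passes per gradient, which is consistent with forward mode landing close to FD in that row.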
| Library | Modes available | Recording approach |
|---|---|---|
| XAD | Forward & Adjoint, higher-order | Tape-based; optional xad-codegen backend compiles the recorded graph to AVX2 native code |
| CppAD | Forward & Reverse, higher-order | Tape-based ADFun record/replay |
| Adept 2 | Forward & Reverse | Expression templates with stack recording |
| autodiff | Forward (`dual`) & Reverse (`var`) | Compile-time dual numbers / runtime expression tree |
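All four libraries support both forward and reverse modes. The suite uses reverse mode, the standard choice for many-inputs/one-output workloads such as risk and pricing, for every library except autodiff. autodiff's only completed benchmark (SABR) runs in forward `dual` mode.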
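Reverse mode suits many-inputs/one-output workloads because a single backward sweep over the recorded tape yields the sensitivity to every input. The sketch below shows the idea with a toy tape that stores, for each operation, its parents and local partial derivatives. `Tape`, `Var`, `make_input` and `reverse_sweep` are hypothetical names. They do not reflect the API or internal layout of XAD, CppAD or Adept.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal tape for illustration (not the API of XAD, CppAD or
// Adept). Each node records up to two parents and the local partials.
struct Node {
    std::size_t parent[2];
    double partial[2];
};

struct Tape {
    std::vector<Node> nodes;
    std::size_t push(std::size_t a, double da, std::size_t b, double db) {
        nodes.push_back(Node{{a, b}, {da, db}});
        return nodes.size() - 1;
    }
};

// An active variable: its value plus its position on the tape.
struct Var {
    double val;
    std::size_t idx;
    Tape* tape;
};

Var make_input(Tape& t, double v) {
    const std::size_t i = t.nodes.size();
    t.push(i, 0.0, i, 0.0);  // leaf: self-reference with zero partials
    return {v, i, &t};
}

Var operator+(Var a, Var b) {
    assert(a.tape == b.tape);
    return {a.val + b.val, a.tape->push(a.idx, 1.0, b.idx, 1.0), a.tape};
}
Var operator*(Var a, Var b) {
    assert(a.tape == b.tape);
    return {a.val * b.val, a.tape->push(a.idx, b.val, b.idx, a.val), a.tape};
}
Var exp(Var a) {
    const double e = std::exp(a.val);
    return {e, a.tape->push(a.idx, e, a.idx, 0.0), a.tape};
}

// Reverse sweep: seed the output adjoint with 1 and walk the tape backwards.
// Nodes are recorded after their parents, so one pass yields d(out)/d(node)
// for every recorded node, including all inputs.
std::vector<double> reverse_sweep(const Tape& t, const Var& out) {
    std::vector<double> adj(t.nodes.size(), 0.0);
    adj[out.idx] = 1.0;
    for (std::size_t k = out.idx + 1; k-- > 0;) {
        const Node& n = t.nodes[k];
        adj[n.parent[0]] += n.partial[0] * adj[k];
        adj[n.parent[1]] += n.partial[1] * adj[k];
    }
    return adj;
}
```

Recording every operation and sweeping back over it is the overhead that the Primal-vs-AAD gap in the results table measures. Production tape libraries use more compact storage and support far more operations than this toy version.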
```bash
cmake -B build -GNinja -DCMAKE_BUILD_TYPE=Release
cmake --build build
./build/ad_benchmarks
```

To enable XAD-Codegen results (xad-codegen is commercially licensed):
```bash
cmake -B build -GNinja -DCMAKE_BUILD_TYPE=Release \
  -DXAD_DIR=/path/to/xad \
  -DXAD_CODEGEN_DIR=/path/to/xad-codegen \
  -DENABLE_XAD_JIT=ON
```

Run `./build/ad_benchmarks --help` for CLI options (`--paths`, `--iters`, `--warmup`, `--csv`, `--only`, `--skip`).
To regenerate the chart from a CSV:

```bash
python scripts/plot_results.py results/results.csv results/chart.png
```

- Identical compiler flags across libraries: `-O3 -mavx2 -mfma` (GCC/Clang) or `/O2 /arch:AVX2 /fp:fast` (MSVC).
- Idiomatic APIs. Each library uses its own recommended pattern: XAD reverse-mode tape (`xad::adj<double>`), CppAD `ADFun` record/replay, Adept `Stack` recording, and autodiff `dual` forward mode for SABR. No micro-optimizations are applied to one library that wouldn't be applied to another.
- Median of measured iterations after warmup; warmup excluded.
- During development, every library's gradients were checked against finite differences and agree within numerical tolerance.
- Same machine, same run. Re-running on a different machine scales all rows by roughly the same factor.
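The "median of measured iterations after warmup" rule can be sketched in a few lines. The sketch runs `warmup` untimed iterations, then times `iters` iterations with `std::chrono::steady_clock` and reports the median. The function names `median` and `median_time_ms` are hypothetical. The sketch only mirrors the semantics of the `--warmup` and `--iters` options as described above; it is not the suite's actual harness.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a sample (sorts a copy; even sizes average the middle pair).
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 == 1 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Hypothetical timing helper: `warmup` untimed runs, then `iters` timed runs;
// returns the median wall-clock time in milliseconds.
template <class F>
double median_time_ms(F&& work, int warmup, int iters) {
    assert(iters > 0);
    for (int i = 0; i < warmup; ++i) work();  // excluded from the result
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(iters));
    for (int i = 0; i < iters; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return median(samples);
}
```

Taking the median rather than the mean keeps one-off outliers from skewing a row. Typical sources of such outliers are page faults or a first-call allocation that survives warmup.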
PRs and issues welcome — especially:
- More AD libraries (open an issue or PR with a wrapper following the pattern in `xad/`, `cppad/`, `adept/`, `autodiff/`).
- More finance kernels (Bermudan/American MC, multi-curve bootstrapping, equity vol surface fitters, real PFE / ECL).
- Methodology improvements or fairer wirings for any of the existing libraries.
The LIBOR swaption benchmark is adapted from Prof. Mike Giles' canonical adjoint LMM C++ code. Thanks to the maintainers of CppAD, Adept 2, and autodiff — a meaningful comparison is only possible because their libraries exist.
Copyright © 2026 Xcelerit Computing Ltd. Licensed under the MIT License. See CITATION.cff for citation metadata.
