A generic neural network inference engine in pure C. It runs 258x faster than PyTorch on small networks and reveals the crossover point on large ones, where BLAS optimization becomes the deciding factor.
c machine-learning neural-networks blas systems-programming performance-optimization inference-engine matrix-operations low-level-optimization pytorch-comparison
Updated Mar 25, 2026 · C