As of this PR to llama.cpp, the CUDA binaries can run on the CPU alone, as long as n_gpu_layers = 0.
This might mean that we can significantly simplify our distribution of binaries by removing the CPU-only variants and shipping only the CUDA ones.
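To make the idea concrete, here is a rough sketch of what backend selection could look like if we shipped only the CUDA build. The names `LoadPlan` and `plan_load` and the `"cuda"` variant label are hypothetical, invented for illustration, and not part of llama.cpp or of our current loader. It also assumes the CUDA binary loads successfully on a machine without a usable GPU, which still needs to be verified.

```python
from dataclasses import dataclass


@dataclass
class LoadPlan:
    binary: str        # native library variant to load (hypothetical label)
    n_gpu_layers: int  # value passed through to llama.cpp


def plan_load(cuda_available: bool, requested_gpu_layers: int) -> LoadPlan:
    """Always choose the CUDA build; fall back to CPU by forcing n_gpu_layers to 0."""
    if cuda_available:
        # A GPU is usable, so honor whatever offload count the caller asked for.
        return LoadPlan(binary="cuda", n_gpu_layers=requested_gpu_layers)
    # No usable GPU: same binary, but nothing is offloaded.
    return LoadPlan(binary="cuda", n_gpu_layers=0)


if __name__ == "__main__":
    print(plan_load(cuda_available=False, requested_gpu_layers=32))
    print(plan_load(cuda_available=True, requested_gpu_layers=32))
```

The point of the sketch is that the choice of binary no longer depends on the hardware; only the n_gpu_layers value does.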