Only Distribute CUDA Binaries?

As of [this PR to llama.cpp](https://github.com/ggerganov/llama.cpp/pull/3946), the CUDA binaries can run on the CPU alone, as long as `n_gpu_layers = 0`.

This might mean that we can significantly simplify our distribution of binaries by removing the CPU-only variants and shipping only CUDA ones.
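
If we went this way, the caller would need to fall back to `n_gpu_layers = 0` whenever no GPU is usable. Below is a rough sketch of that fallback in Python, only to illustrate the idea. The function names `cuda_driver_present` and `choose_n_gpu_layers` are hypothetical, and probing for the driver library by name (`nvcuda.dll` on Windows, `libcuda.so.1` on Linux) is an assumed heuristic, not something the PR specifies. It also does not check whether the CUDA runtime libraries the binary links against are installed, which may still be required just to load it.

```python
import ctypes
import sys


def cuda_driver_present() -> bool:
    """Best-effort check: can the NVIDIA driver library be loaded?

    Assumption: the driver is exposed under its usual library name.
    This is a heuristic and says nothing about whether a GPU is usable.
    """
    if sys.platform == "win32":
        names = ["nvcuda.dll"]
    else:
        names = ["libcuda.so.1", "libcuda.so"]
    for name in names:
        try:
            ctypes.CDLL(name)
            return True
        except OSError:
            # Library not found or not loadable; try the next candidate.
            continue
    return False


def choose_n_gpu_layers(requested: int, gpu_available: bool) -> int:
    """Return the number of layers to offload; 0 keeps everything on the CPU."""
    return requested if gpu_available else 0


if __name__ == "__main__":
    layers = choose_n_gpu_layers(requested=32, gpu_available=cuda_driver_present())
    print(f"n_gpu_layers = {layers}")
```

The open question is whether the CUDA binary loads at all on a machine without the CUDA runtime installed; if it does not, a check like this would not be enough on its own.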