Llama.cpp 30B runs with only 6GB of RAM now: https://github.com/ggerganov/llama.cpp/pull/613