This error occurs with a quantized 70B model that loads fine on the current master branch of llama.cpp:
```
llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
llama_init_from_file: failed to load model
```
I am guessing this just needs an updated PyPI release. I will try building from source in the meantime.
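For anyone hitting the same error, here is a rough sketch of installing the bindings from the latest source instead of the PyPI release. This assumes the package in question is llama-cpp-python and that its repository lives at the URL below; adjust if your setup differs.

```shell
# Install the Python bindings from the latest git source instead of PyPI
# (assumption: the affected package is llama-cpp-python at the URL below).
# pip clones the repo, pulls the vendored llama.cpp submodule, and builds
# the shared library locally, so it picks up the newest tensor-shape handling.
pip install --upgrade --force-reinstall \
    git+https://github.com/abetlen/llama-cpp-python.git
```

Note that `--force-reinstall` ensures the already-installed PyPI wheel is replaced even if the version number has not changed.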