Please enable Metal inference for GGML weight models.
llama.cpp can now generate text using Metal inference on Apple Silicon. That is great news for M1/M2 users: I can run a LLaMA 65B GGML q4_0 model on my M1 Max at 4-5 tokens/s. It is really fantastic!
The koboldcpp repository already carries the relevant source files from llama.cpp (ggml-metal.h, ggml-metal.m, and ggml-metal.metal), so please hook them up so they are actually used during text generation. It would be a very special present for Apple Silicon users.
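For reference, llama.cpp enables this path by building with `LLAMA_METAL=1` and running with `-ngl 1`; in the code, the integration boils down to a handful of calls from ggml-metal.h. Below is a minimal, hypothetical sketch of those hooks. The `ggml_metal_*` names are taken from the header already vendored in this tree (double-check the exact signatures there), while `setup_metal`, `eval_with_metal`, and the buffer label are illustrative names of mine, not from either codebase:

```c
// Sketch only: mirrors how llama.cpp wires up Metal when built with
// LLAMA_METAL=1. Verify signatures against ggml-metal.h in this repo.
#include "ggml.h"
#include "ggml-metal.h"

// One-time setup at model load. "weights" is just a debug label; a real
// integration also registers the KV cache and eval/scratch buffers the
// same way.
struct ggml_metal_context * setup_metal(void * weights_buf, size_t weights_size) {
    struct ggml_metal_context * mctx = ggml_metal_init();
    // Map the host allocation so the Metal kernels can address it directly
    // (Apple Silicon unified memory, so no copy happens here).
    ggml_metal_add_buffer(mctx, "weights", weights_buf, weights_size);
    return mctx;
}

// Per evaluation: run the compute graph on the GPU instead of calling
// ggml_graph_compute(), then read the output tensor back to the host.
void eval_with_metal(struct ggml_metal_context * mctx,
                     struct ggml_cgraph * gf,
                     struct ggml_tensor * output) {
    ggml_metal_graph_compute(mctx, gf);
    ggml_metal_get_tensor(mctx, output);
}
```

The heavy lifting (the kernels in ggml-metal.metal) is already in the tree, so the remaining work looks to be mostly plumbing calls like these into koboldcpp's model-load and generation paths.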