Please enable Metal inference for GGML weight models.
llama.cpp can now generate text using Metal inference on Apple Silicon. That is great news for M1/M2 users: I can run a LLaMA 65B GGML q4_0 model on my M1 Max at 4-5 tokens/s. It is really fantastic!
The koboldcpp repository already carries the relevant source files from llama.cpp (ggml-metal.h, ggml-metal.m, and ggml-metal.metal), so please hook them up so they are actually used during text generation. It would be a very special present for Apple Silicon users.
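For reference, llama.cpp enables this path by building with `LLAMA_METAL=1` and running with `-ngl 1`; in the code, the integration boils down to a handful of calls from ggml-metal.h. Below is a minimal, hypothetical sketch of those hooks. The `ggml_metal_*` names are taken from the header already vendored in this tree (double-check the exact signatures there), while `setup_metal`, `eval_with_metal`, and the buffer label are illustrative names of mine, not from either codebase:

```c
// Sketch only: mirrors how llama.cpp wires up Metal when built with
// LLAMA_METAL=1. Verify signatures against ggml-metal.h in this repo.
#include "ggml.h"
#include "ggml-metal.h"

// One-time setup at model load. "weights" is just a debug label; a real
// integration also registers the KV cache and eval/scratch buffers the
// same way.
struct ggml_metal_context * setup_metal(void * weights_buf, size_t weights_size) {
    struct ggml_metal_context * mctx = ggml_metal_init();
    // Map the host allocation so the Metal kernels can address it directly
    // (Apple Silicon unified memory, so no copy happens here).
    ggml_metal_add_buffer(mctx, "weights", weights_buf, weights_size);
    return mctx;
}

// Per evaluation: run the compute graph on the GPU instead of calling
// ggml_graph_compute(), then read the output tensor back to the host.
void eval_with_metal(struct ggml_metal_context * mctx,
                     struct ggml_cgraph * gf,
                     struct ggml_tensor * output) {
    ggml_metal_graph_compute(mctx, gf);
    ggml_metal_get_tensor(mctx, output);
}
```

The heavy lifting (the kernels in ggml-metal.metal) is already in the tree, so the remaining work looks to be mostly plumbing calls like these into koboldcpp's model-load and generation paths.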