[Enhancement] Add support for Metal inference #216

@beebopkim

Description

Please enable Metal inference for GGML weight models.

llama.cpp can now generate text using Metal inference on Apple Silicon computers. This is very good news for M1/M2 users: I can run the LLaMA 65B GGML q4_0 model on my M1 Max machine at 4 to 5 tokens/s. It is awesome and really fantastic!

The koboldcpp repository already contains the related source files from llama.cpp, such as ggml-metal.h, ggml-metal.m, and ggml-metal.metal. Please make them usable during inference for text generation. It would be a very special present for Apple Silicon users.
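For context, in upstream llama.cpp the Metal backend is enabled at build time and activated by offloading layers to the GPU at run time. A minimal sketch of how this works there, assuming a standard llama.cpp checkout on Apple Silicon (the model path below is a placeholder):

```shell
# Build llama.cpp with the Metal backend compiled in (Apple Silicon only)
make clean
LLAMA_METAL=1 make

# Run inference; -ngl offloads layers to the GPU, which routes
# the computation through the Metal backend
./main -m ./models/llama-65b-q4_0.bin -p "Hello" -n 128 -ngl 1
```

The request here is essentially for koboldcpp to expose an equivalent build option and runtime switch for its bundled copies of the ggml-metal sources.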

Metadata


Labels: enhancement (New feature or request)
