Faster Q5_K and Q6_K on Metal by ikawrakow · Pull Request #2294 · ggml-org/llama.cpp

ikawrakow · 2023-07-20T14:10:19Z

Along the same lines as #2290. Here the speedup is not quite as large as for Q4_K, but still significant:

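| Model | Master (ms/t) | This PR (ms/t) | Speedup |
|---|---|---|---|
| Q5_K_S 7B | 26.2 | 22.8 | 14.9% |
| Q5_K_S 13B | 46.1 | 39.3 | 17.4% |
| Q5_K_S 33B | 115.5 | 97.0 | 19.1% |
| Q5_K_S 65B | 214.6 | 181.1 | 18.5% |
| Q6_K 7B | 25.6 | 24.6 | 4.1% |
| Q6_K 13B | 46.1 | 44.3 | 4.1% |
| Q6_K 33B | 116.6 | 111.0 | 5.0% |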

The table shows token generation time in ms per token (lower is better) on an M2 Max with a 30-core GPU. The system has 64 GB of RAM, and the 65B Q6_K model does not run successfully on it, so it is not listed.
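For readers who want to check the Speedup column, here is a minimal sketch of the arithmetic. It assumes the speedup is the throughput ratio `master / PR - 1`. The PR does not state this definition, but it is consistent with the reported figures. The helper name `speedup_percent` is hypothetical. Because the published timings are rounded to one decimal, a recomputed value can differ from the table in the last digit.

```python
# Timings in ms/token copied from the PR's table: (master, this PR).
timings = {
    "Q5_K_S 7B": (26.2, 22.8),
    "Q5_K_S 13B": (46.1, 39.3),
    "Q5_K_S 33B": (115.5, 97.0),
    "Q5_K_S 65B": (214.6, 181.1),
    "Q6_K 7B": (25.6, 24.6),
    "Q6_K 13B": (46.1, 44.3),
    "Q6_K 33B": (116.6, 111.0),
}


def speedup_percent(master_ms: float, pr_ms: float) -> float:
    """Throughput gain in percent (assumed definition: master / PR - 1)."""
    return (master_ms / pr_ms - 1.0) * 100.0


for model, (master_ms, pr_ms) in timings.items():
    # Tokens per second is the reciprocal of ms/token, scaled by 1000.
    tokens_per_s = 1000.0 / pr_ms
    print(f"{model}: {speedup_percent(master_ms, pr_ms):.1f}% faster, "
          f"{tokens_per_s:.1f} t/s with this PR")
```

Under this definition, a 14.9% speedup means 14.9% more tokens per second, which is a somewhat smaller percentage reduction in time per token.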

Iwan Kawrakow added 3 commits July 20, 2023 16:00

Faster Q6_K on Metal

fa9d54e

Faster Q5_K on Metal

463f420

Another Q5_K speedup

5f2e4bd

ikawrakow requested a review from ggerganov July 20, 2023 14:10

ggerganov approved these changes Jul 20, 2023

ikawrakow merged commit e782c9e into master Jul 20, 2023

ikawrakow mentioned this pull request Jul 20, 2023

Faster Q2_K on Metal #2297

Merged

j-f1 deleted the ik/metal_faster_q6k branch July 21, 2023 12:43

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

Faster Q5_K and Q6_K on Metal (ggml-org#2294)

6a870b4

* Faster Q6_K on Metal
* Faster Q5_K on Metal
* Another Q5_K speedup

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

ikawrakow merged 3 commits into master from ik/metal_faster_q6k
