The MLX Challenge by ggerganov · Pull Request #6539 · ggml-org/llama.cpp

ggerganov · 2024-04-08T09:55:49Z

ref https://twitter.com/awnihannun/status/1777072588633882741

This branch starts from the flash-attention branch (#5021, #6508).

To perform a benchmark for the challenge, run:

# generate pure 4-bit model
./quantize --pure models/mistral-7b/ggml-model-f16.gguf models/mistral-7b/ggml-model-q4_0-pure.gguf q4_0

make -j llama-bench
./llama-bench -m ./models/mistral-7b/ggml-model-q4_0-pure.gguf -p 0 -t 4 -n 128 -r 10 -fa 1

Current numbers on M2 Ultra:

model	size	params	backend	ngl	threads	test	t/s
llama 7B Q4_0	3.79 GiB	7.24 B	Metal	99	4	tg 128	102.29 ± 0.07

build: 22df85f (2707)
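
As a rough sanity check on the figure above, the sketch below estimates how fast the weights are being streamed during token generation. It assumes (this is not stated in the PR) that every generated token reads the full 3.79 GiB of weights exactly once, and it ignores KV-cache and activation traffic. `implied_read_rate_gib_s` is a hypothetical helper name used only for illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def implied_read_rate_gib_s(model_size_gib: float, tokens_per_s: float) -> float:
    """Weight data streamed per second, assuming one full pass over the
    weights per generated token (a simplifying assumption)."""
    return model_size_gib * tokens_per_s


# Values taken from the llama-bench table: 3.79 GiB model, 102.29 t/s (tg 128)
rate = implied_read_rate_gib_s(3.79, 102.29)
print(f"{rate:.1f} GiB/s (~{rate * GIB / 1e9:.0f} GB/s)")
```

Under that assumption the benchmark corresponds to roughly 388 GiB/s of weight reads, which can be compared against the memory bandwidth of the machine being tested.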

ggerganov · 2024-11-17T09:29:55Z

We don't support a group size of 64 at the moment (which is what I think MLX uses), so we can't make an apples-to-apples comparison with MLX.
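
To make the group-size point concrete, here is a minimal sketch of how group size and per-group metadata determine storage cost per weight. The layouts are assumptions, not taken from the PR: Q4_0 is modeled as 32 weights sharing one 16-bit scale, and the group-of-64 case is modeled with either a 16-bit scale alone or a 16-bit scale plus a 16-bit bias. Both function names are hypothetical.

```python
def bits_per_weight(group_size: int, bits: int = 4, metadata_bits: int = 16) -> float:
    """Average storage per weight: quantized bits plus the per-group
    metadata (scale, optional bias) amortized over the group."""
    return bits + metadata_bits / group_size


def effective_bits_per_weight(size_gib: float, params_billion: float) -> float:
    """Bits per weight implied by a file size and a parameter count."""
    return size_gib * 1024 ** 3 * 8 / (params_billion * 1e9)


print(bits_per_weight(32))                    # assumed Q4_0 layout: 32 weights + fp16 scale
print(bits_per_weight(64))                    # group of 64 with a single fp16 scale
print(bits_per_weight(64, metadata_bits=32))  # group of 64 with fp16 scale + fp16 bias

# Cross-check against the benchmark table: 3.79 GiB for 7.24 B parameters
print(f"{effective_bits_per_weight(3.79, 7.24):.2f}")
```

The cross-check lands close to 4.5 bits per weight, consistent with the assumed 32-weight layout for the pure Q4_0 model. A different group size changes both the quantization error and the bytes read per token, which is why the two setups are not directly comparable.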

Base automatically changed from gg/flash-attn-vec to gg/flash-attn April 18, 2024 11:33

ggerganov force-pushed the gg/flash-attn branch 4 times, most recently from 82b282c to ce281b9 Compare April 24, 2024 14:54

mofosyne added the "Review Complexity : High" (Generally require indepth knowledge of LLMs or GPUs) and "performance" (Speed related topics) labels May 10, 2024

llama : more metal-friendly KV cache PAD

33a004e

ggerganov force-pushed the mlx-challenge branch from 22df85f to 33a004e Compare May 13, 2024 07:40

ggerganov changed the base branch from gg/flash-attn to master May 13, 2024 07:40

atelepov approved these changes Jul 24, 2024

ggerganov closed this Nov 17, 2024

ggerganov wants to merge 1 commit into master from mlx-challenge

3 participants