ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot by Vithulep · Pull Request #11917 · ggml-org/llama.cpp

Vithulep · 2025-02-17T03:58:13Z

This PR adds SVE (Scalable Vector Extension) kernels for the q3_K_q8_K vector dot product on the Arm architecture. Similar SVE support was proposed in PR #7433 and #11227.

It contains the SVE implementation of the vector dot product used with Q3_K quantization.
Accuracy and performance were measured by running a Q3_K quantized model of mistral-7b-v01 on Graviton 3 (Perf 01 XL).
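
For readers unfamiliar with the pattern, here is a minimal sketch of a vector-length-agnostic int8 dot product with a single scale factor. It is not the kernel from this PR: the function name `dot_i8_scaled` is hypothetical, and the Q3_K-specific unpacking of the quantized values and per-group scales is left out entirely. The SVE path is only compiled when the compiler defines `__ARM_FEATURE_SVE`; otherwise a scalar loop computes the same result.

```c
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper (not from the PR): dot product of two int8 arrays,
// accumulated in int32 and multiplied by one float scale at the end.
static float dot_i8_scaled(const int8_t *x, const int8_t *y, int n, float scale) {
    int32_t sum = 0;
#if defined(__ARM_FEATURE_SVE)
    // Vector-length-agnostic loop: svcntb() is the number of int8 lanes per vector.
    svint32_t acc = svdup_n_s32(0);
    for (int i = 0; i < n; i += (int) svcntb()) {
        // The predicate masks off lanes past the end; inactive lanes load as zero.
        const svbool_t pg = svwhilelt_b8_s32(i, n);
        const svint8_t vx = svld1_s8(pg, x + i);
        const svint8_t vy = svld1_s8(pg, y + i);
        // Each int32 lane accumulates the products of four adjacent int8 pairs.
        acc = svdot_s32(acc, vx, vy);
    }
    // Horizontal add across all int32 lanes.
    sum = (int32_t) svaddv_s32(svptrue_b32(), acc);
#else
    // Scalar fallback so the sketch also builds on non-SVE targets.
    for (int i = 0; i < n; ++i) {
        sum += (int32_t) x[i] * (int32_t) y[i];
    }
#endif
    return scale * (float) sum;
}
```

Because the loop asks the hardware for its vector length instead of hard-coding one, the same binary should work on SVE implementations of different widths.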

Performance

With this PR, the SVE implementation is roughly 1.02x to 1.15x faster than the NEON implementation.

Decoding Throughput (TPOT)

Threads	NEON (original)	This PR (SVE)	Ratio
2	4.21	4.86	1.15
4	8.26	9.37	1.13
8	15.90	17.49	1.10
16	29.09	31.05	1.07
32	42.59	43.80	1.03
48	48.36	49.41	1.02
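
The Ratio column can be reproduced by dividing the SVE figure by the NEON figure for each thread count. The small sketch below does that with the values copied from the table; the function names are made up for illustration, and it assumes both columns are throughputs in the same unit, so a ratio above 1 means SVE is faster.

```c
#include <stdio.h>

// Speedup of SVE over NEON, assuming both arguments are throughputs.
static double speedup(double neon, double sve) {
    return sve / neon;
}

// Prints one line per thread count using the figures from the table above.
static void print_speedups(void) {
    const int    threads[] = {2, 4, 8, 16, 32, 48};
    const double neon[]    = {4.21, 8.26, 15.90, 29.09, 42.59, 48.36};
    const double sve[]     = {4.86, 9.37, 17.49, 31.05, 43.80, 49.41};
    for (int i = 0; i < 6; ++i) {
        printf("%2d threads: %.2f\n", threads[i], speedup(neon[i], sve[i]));
    }
}
```

Calling `print_speedups()` from a `main` function prints the ratios rounded to two decimals.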

The command used to measure the performance is

./llama-bench -m ${PATH_TO_MODEL} -n 0 -n 16 -p 64 -t 2,4,8,16,32,48

Perplexity

I also verified that perplexity matches between the NEON and SVE implementations.

NEON (original)	SVE (this PR)
2.9394 +/- 0.35779	2.9394 +/- 0.35779

ggerganov

Improve the formatting of the code to be more consistent with the rest of the code. I've given a few hints below.

Vithulep · 2025-02-18T04:51:32Z

> Improve the formatting of the code to be more consistent with the rest of the code. I've given a few hints below.

Thank you. Improved the code formatting for consistency.

ggerganov

Haven't run any tests myself, so taking a small leap of faith here, assuming you've done all the necessary tests for this change.

Vithulep · 2025-02-21T03:30:52Z

> Haven't run any tests myself, so taking a small leap of faith here, assuming you've done all the necessary tests for this change.

Thank you! We've done all the necessary tests for this change.

…rg#11917)

* Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file
* Improved Formating of code in ggml-cpu-quants.c file
* style : minor fixes
* style : less whitespaces
* style : ptr spaceing

---------

Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file

e2fdc47

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Feb 17, 2025

ggerganov reviewed Feb 17, 2025

Comment thread ggml/src/ggml-cpu/ggml-cpu-quants.c Outdated

Comment thread ggml/src/ggml-cpu/ggml-cpu-quants.c Outdated

Comment thread ggml/src/ggml-cpu/ggml-cpu-quants.c Outdated

Improved Formating of code in ggml-cpu-quants.c file

3b10dff

ggerganov approved these changes Feb 20, 2025

style : minor fixes

d4f0941

ggerganov reviewed Feb 20, 2025

Comment thread ggml/src/ggml-cpu/ggml-cpu-quants.c

ggerganov reviewed Feb 20, 2025

Comment thread ggml/src/ggml-cpu/ggml-cpu-quants.c

style : less whitespaces

da6f6b9

ggerganov reviewed Feb 20, 2025

Comment thread ggml/src/ggml-cpu/ggml-cpu-quants.c Outdated

style : ptr spaceing

bc44992

ggerganov merged commit 4806498 into ggml-org:master Feb 20, 2025

ggerganov merged 5 commits into ggml-org:master from Vithulep:Q3_SVE_Kernel

3 participants
