kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 by chaxu01 · Pull Request #20043 · ggml-org/llama.cpp

chaxu01 · 2026-03-02T15:49:57Z

This patch introduces an SME2-based FP16 compute path for Q4_0 GEMM to improve performance on AArch64.

Benchmark results for Llama-3.2-1B-Instruct-Q4_0, pp512 throughput in t/s (Mac M4 Pro, GGML_KLEIDIAI_SME=1):

Threads	w/o fp16q4	w/ fp16q4	Improvement
1	183.76 ± 0.29	297.03 ± 0.39	+61.6%
2	349.14 ± 5.09	548.97 ± 6.24	+57.2%
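As a quick sanity check on the Improvement column, the percentages can be recomputed from the mean throughputs in the table. This is only a sketch of the arithmetic, not part of the patch: the helper name `improvement_pct` is made up for illustration, and the ± spreads are ignored.

```python
# Recompute the "Improvement" column from the mean t/s values in the table.
# improvement_pct is a hypothetical helper, not part of llama.cpp or KleidiAI.

def improvement_pct(baseline_tps: float, new_tps: float) -> float:
    """Relative throughput gain of new_tps over baseline_tps, in percent."""
    return (new_tps / baseline_tps - 1.0) * 100.0

# (threads, t/s without fp16q4, t/s with fp16q4), copied from the table above
rows = [
    (1, 183.76, 297.03),
    (2, 349.14, 548.97),
]

for threads, base, new in rows:
    print(f"{threads} thread(s): {improvement_pct(base, new):+.1f}%")
```

Both rows reproduce the reported figures when rounded to one decimal place.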

kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64

46dcd95

chaxu01 requested a review from ggerganov as a code owner March 2, 2026 15:49

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Mar 2, 2026

ggerganov approved these changes Mar 3, 2026

ggerganov merged commit 137435f into ggml-org:master Mar 3, 2026
77 of 78 checks passed

Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026

kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 (ggml-o…

6585968

…rg#20043)

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 (ggml-o…

36d73d6

…rg#20043)

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 (ggml-o…

413039e

…rg#20043)

ggerganov merged 1 commit into ggml-org:master from chaxu01:feature/sme-fp16q4
