
kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 #20043

Merged: ggerganov merged 1 commit into ggml-org:master from chaxu01:feature/sme-fp16q4 on Mar 3, 2026

@chaxu01 (Collaborator) commented Mar 2, 2026

This patch introduces an SME2-based FP16 compute path for Q4_0 GEMM in the KleidiAI path, improving prompt-processing performance on AArch64.
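For readers unfamiliar with the data layout, here is a minimal scalar reference sketch in plain Python of the arithmetic an FP16 compute path over Q4_0 weights performs: unpack each Q4_0 block into FP16 values, round the activations to FP16, then multiply. It assumes ggml's Q4_0 block layout (32 weights per block, one FP16 scale `d`, and 16 bytes of packed 4-bit codes decoded as `(q - 8) * d`, low nibbles first). The helper names are hypothetical. This is not the KleidiAI SME2 kernel: the PR description does not detail its packing, tiling, or accumulation precision, so this sketch simply accumulates in Python's double precision.

```python
import struct

QK4_0 = 32  # weights per Q4_0 block (assumed ggml layout)


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dequantize_q4_0_block(d, qs):
    """Hypothetical helper: unpack one Q4_0 block into 32 FP16 values.

    d  -- the block's FP16 scale
    qs -- 16 bytes; low nibbles hold elements 0..15, high nibbles 16..31.
    Each 4-bit code q in 0..15 represents (q - 8) * d.
    """
    assert len(qs) == QK4_0 // 2
    out = [0.0] * QK4_0
    for j, byte in enumerate(qs):
        out[j] = to_fp16(((byte & 0x0F) - 8) * d)
        out[j + QK4_0 // 2] = to_fp16(((byte >> 4) - 8) * d)
    return out


def gemm_fp16_operands(a_rows, w_rows):
    """Hypothetical helper: C[m][n] = sum_k A[m][k] * W[n][k].

    Both operands are rounded to FP16 first; accumulation here is in
    double precision, which is an assumption, not the kernel's behavior.
    """
    a16 = [[to_fp16(x) for x in row] for row in a_rows]
    w16 = [[to_fp16(x) for x in row] for row in w_rows]
    return [[sum(x * y for x, y in zip(a, w)) for w in w16] for a in a16]
```

For example, a block with scale `0.25` and bytes `0x00..0x0F` decodes to `-2.0, -1.75, ..., 1.75` in the first half and `-2.0` throughout the second half; multiplying it by an all-ones activation row gives `-34.0`.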

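Benchmark results for Llama-3.2-1B-Instruct-Q4_0, prompt processing of 512 tokens (pp512, tokens/s) on a Mac M4 Pro with `GGML_KLEIDIAI_SME=1`:

| Threads | w/o fp16q4 (t/s) | w/ fp16q4 (t/s) | Improvement |
|---|---|---|---|
| 1 | 183.76 ± 0.29 | 297.03 ± 0.39 | +61.6% |
| 2 | 349.14 ± 5.09 | 548.97 ± 6.24 | +57.2% |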
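The Improvement column is the relative gain in mean throughput, `(new / baseline - 1) * 100`. A short Python check against the mean values in the table reproduces it. The function and variable names are illustrative, and the ± spreads are not propagated.

```python
def improvement_pct(baseline_tps, new_tps):
    """Relative throughput gain in percent: (new / baseline - 1) * 100."""
    return (new_tps / baseline_tps - 1.0) * 100.0


# Mean pp512 throughput (t/s) from the table: threads -> (w/o fp16q4, w/ fp16q4)
RESULTS = {1: (183.76, 297.03), 2: (349.14, 548.97)}

for threads, (base, new) in RESULTS.items():
    print(f"{threads} thread(s): +{improvement_pct(base, new):.1f}%")
# prints:
# 1 thread(s): +61.6%
# 2 thread(s): +57.2%
```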

chaxu01 requested a review from ggerganov as a code owner on March 2, 2026 15:49
github-actions (bot) added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 2, 2026
ggerganov merged commit 137435f into ggml-org:master on Mar 3, 2026
77 of 78 checks passed
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

Labels: ggml (changes relating to the ggml tensor library for machine learning)


2 participants