opencl: add q5_K gemm and gemv kernels for Adreno by shaofeiqi · Pull Request #21595 · ggml-org/llama.cpp

shaofeiqi · 2026-04-08T00:01:45Z

Overview

Add Q5_K GEMM and GEMV kernels to the Adreno backend to improve performance for Q5_K quantized models.
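For readers unfamiliar with the format, here is a minimal reference sketch in Python of what a Q5_K GEMV has to compute for one 256-weight super-block: unpack the 6-bit scales and mins, rebuild each 5-bit quant from its low nibble and high bit, then take the dot product. The layout follows my understanding of ggml's CPU reference for `block_q5_K` (fp16 `d` and `dmin`, 12 bytes of packed scales, 32 bytes of `qh`, 128 bytes of `qs`). It is not the OpenCL kernel from this PR, the function names are illustrative, and `d`/`dmin` are assumed to be already converted to Python floats.

```python
QK_K = 256  # weights per Q5_K super-block (assumed, as in ggml)


def get_scale_min_k4(j, scales):
    """Unpack the j-th 6-bit (scale, min) pair from the 12 packed bytes."""
    if j < 4:
        return scales[j] & 63, scales[j + 4] & 63
    sc = (scales[j + 4] & 0x0F) | ((scales[j - 4] >> 6) << 4)
    mn = (scales[j + 4] >> 4) | ((scales[j] >> 6) << 4)
    return sc, mn


def dequantize_q5_k_block(d, dmin, scales, qh, qs):
    """Expand one super-block into 256 floats.

    d, dmin : super-block scale and min scale, already converted to float
    scales  : 12 bytes of packed 6-bit sub-block scales and mins
    qh      : 32 bytes holding the 5th (high) bit of every quant
    qs      : 128 bytes holding the low 4 bits, two quants per byte
    """
    out = []
    u1, u2 = 1, 2  # bit masks selecting the high bit for each 32-weight half
    for chunk in range(QK_K // 64):
        ql = qs[32 * chunk: 32 * chunk + 32]
        sc1, m1 = get_scale_min_k4(2 * chunk, scales)
        sc2, m2 = get_scale_min_k4(2 * chunk + 1, scales)
        # Low nibbles form the first 32 weights of this 64-weight chunk.
        out += [d * sc1 * ((ql[l] & 0x0F) + (16 if qh[l] & u1 else 0)) - dmin * m1
                for l in range(32)]
        # High nibbles form the next 32 weights.
        out += [d * sc2 * ((ql[l] >> 4) + (16 if qh[l] & u2 else 0)) - dmin * m2
                for l in range(32)]
        u1 <<= 2
        u2 <<= 2
    return out


def gemv_block(block, x):
    """Dot product of one dequantized super-block with 256 activations."""
    w = dequantize_q5_k_block(*block)
    return sum(wi * xi for wi, xi in zip(w, x))
```

A GPU kernel would typically fuse the unpacking with the multiply-accumulate rather than materialize the dequantized weights. The sketch separates the two steps only for readability.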

Additional information

With Qwen3.5-9B-Q5_K_M.gguf on Snapdragon 8 Elite Gen 5:

master:

common_perf_print: prompt eval time =    7754.19 ms /    89 tokens (   87.13 ms per token,    11.48 tokens per second)
common_perf_print:        eval time =   54689.77 ms /   137 runs   (  399.20 ms per token,     2.51 tokens per second)

this PR:

common_perf_print: prompt eval time =    1601.59 ms /    89 tokens (   18.00 ms per token,    55.57 tokens per second)
common_perf_print:        eval time =   26400.97 ms /   126 runs   (  209.53 ms per token,     4.77 tokens per second)
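The two runs generated different numbers of tokens (137 vs. 126), so tokens per second is the comparable figure rather than total time. A small Python sketch of the speedup arithmetic, using only the numbers from the logs above (the variable names are illustrative):

```python
# Throughput in tokens per second, copied from the common_perf_print lines.
master = {"prompt_eval": 11.48, "eval": 2.51}
this_pr = {"prompt_eval": 55.57, "eval": 4.77}


def speedup(before_tps, after_tps):
    """Ratio of new throughput to old throughput."""
    return after_tps / before_tps


speedups = {phase: speedup(master[phase], this_pr[phase]) for phase in master}

for phase, ratio in speedups.items():
    print(f"{phase}: {ratio:.2f}x")
# prompt_eval: 4.84x
# eval: 1.90x
```

Prompt evaluation processes many tokens at once and is usually dominated by matrix-matrix products, while token generation is dominated by matrix-vector products, so the two ratios roughly reflect the GEMM and GEMV paths respectively.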

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: No

max-krasnyansky

Nice to see Q5_K. Will get started on the Hexagon version too :)

shaofeiqi requested a review from a team as a code owner April 8, 2026 00:01

github-actions bot added the ggml and OpenCL labels Apr 8, 2026

opencl: add q5_K gemm and gemv kernels for Adreno

467ffd3

lhez force-pushed the sq/q5_k-adreno branch from d684323 to 467ffd3 Compare April 11, 2026 09:54

lhez approved these changes Apr 16, 2026


lhez requested a review from max-krasnyansky April 16, 2026 19:02

max-krasnyansky approved these changes Apr 16, 2026


max-krasnyansky merged commit e45dbde into ggml-org:master Apr 16, 2026
87 of 89 checks passed

cnsiva pushed a commit to saas-home/llama.cpp that referenced this pull request Apr 17, 2026

opencl: add q5_K gemm and gemv kernels for Adreno (ggml-org#21595)

f840dca

samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request Apr 19, 2026

opencl: add q5_K gemm and gemv kernels for Adreno (ggml-org#21595)

36a3c1b

mengqin pushed a commit to mengqin/llama.cpp that referenced this pull request Apr 20, 2026

opencl: add q5_K gemm and gemv kernels for Adreno (ggml-org#21595)

156ab19

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Apr 21, 2026

opencl: add q5_K gemm and gemv kernels for Adreno (ggml-org#21595)

fff2344

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026

opencl: add q5_K gemm and gemv kernels for Adreno (ggml-org#21595)

af820f0

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

opencl: add q5_K gemm and gemv kernels for Adreno (ggml-org#21595)

9839f80

jimbothigpen pushed a commit to jimbothigpen/frankenturbo2 that referenced this pull request May 2, 2026

opencl: add q5_K gemm and gemv kernels for Adreno (ggml-org#21595)

14f6c04

max-krasnyansky merged 1 commit into ggml-org:master from qualcomm:sq/q5_k-adreno
