
opencl: add q5_K gemm and gemv kernels for Adreno #21595

Merged: max-krasnyansky merged 1 commit into ggml-org:master from qualcomm:sq/q5_k-adreno on Apr 16, 2026


@shaofeiqi (Contributor)

Overview

Adds Q5_K GEMM (matrix-matrix) and GEMV (matrix-vector) kernels for Adreno GPUs to the OpenCL backend, improving performance for Q5_K-quantized models.
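For readers unfamiliar with the format, below is a scalar Python sketch of what a Q5_K GEMV computes. It assumes the `block_q5_K` layout used by ggml's CPU reference code: 256 weights per super-block, stored as an fp16 scale `d` and min `dmin`, 12 bytes of packed 6-bit sub-block scales and mins, 32 bytes of fifth (high) bits, and 128 bytes of low nibbles, 176 bytes in total. The function names are hypothetical, and this is not the Adreno kernel from this PR, which is organized very differently for the GPU.

```python
import struct

QK_K = 256          # weights per Q5_K super-block (assumed ggml layout)
K_SCALE_SIZE = 12   # bytes of packed 6-bit sub-block scales and mins
BLOCK_BYTES = 2 + 2 + K_SCALE_SIZE + QK_K // 8 + QK_K // 2  # 176 bytes


def get_scale_min_k4(j, q):
    """Unpack the j-th (scale, min) pair (j in 0..7) from the 12 packed bytes."""
    if j < 4:
        return q[j] & 63, q[j + 4] & 63
    sc = (q[j + 4] & 0xF) | ((q[j - 4] >> 6) << 4)
    m = (q[j + 4] >> 4) | ((q[j] >> 6) << 4)
    return sc, m


def dequantize_block_q5_k(block):
    """Hypothetical scalar reference: one 176-byte block -> 256 floats."""
    d, dmin = struct.unpack_from("<ee", block, 0)  # two little-endian fp16 values
    scales = block[4:4 + K_SCALE_SIZE]
    qh = block[16:16 + QK_K // 8]    # fifth bit of each 5-bit quant
    qs = block[48:48 + QK_K // 2]    # low 4 bits, two quants per byte
    out = []
    is_, u1, u2 = 0, 1, 2
    for j in range(0, QK_K, 64):     # 64 weights = two 32-weight sub-blocks
        ql = qs[j // 2:j // 2 + 32]
        sc1, m1 = get_scale_min_k4(is_, scales)
        sc2, m2 = get_scale_min_k4(is_ + 1, scales)
        for l in range(32):          # first sub-block: low nibbles
            q = (ql[l] & 0xF) + (16 if qh[l] & u1 else 0)
            out.append(d * sc1 * q - dmin * m1)
        for l in range(32):          # second sub-block: high nibbles
            q = (ql[l] >> 4) + (16 if qh[l] & u2 else 0)
            out.append(d * sc2 * q - dmin * m2)
        is_ += 2
        u1 <<= 2                     # move to the next pair of qh bits
        u2 <<= 2
    return out


def gemv_q5_k(rows, x):
    """y = W @ x, where each row of W is a list of Q5_K blocks."""
    y = []
    for blocks in rows:
        w = [v for b in blocks for v in dequantize_block_q5_k(b)]
        y.append(sum(wi * xi for wi, xi in zip(w, x)))
    return y
```

A GEMM is the same computation applied to several activation columns at once, which matches batched prompt processing; single-token generation is the GEMV case. A real kernel would fuse dequantization into the dot product instead of materializing each row.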

Additional information

With Qwen3.5-9B-Q5_K_M.gguf on a Snapdragon 8 Elite Gen 5:

master:

common_perf_print: prompt eval time =    7754.19 ms /    89 tokens (   87.13 ms per token,    11.48 tokens per second)
common_perf_print:        eval time =   54689.77 ms /   137 runs   (  399.20 ms per token,     2.51 tokens per second) 

this PR:

common_perf_print: prompt eval time =    1601.59 ms /    89 tokens (   18.00 ms per token,    55.57 tokens per second)
common_perf_print:        eval time =   26400.97 ms /   126 runs   (  209.53 ms per token,     4.77 tokens per second)
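The relative change is easier to read as a ratio. The short calculation below uses only the tokens-per-second figures from the two logs above. The runs generated different numbers of tokens (137 vs 126), so treat the ratios as approximate.

```python
# Tokens-per-second figures copied from the common_perf_print output above.
MASTER = {"prompt eval": 11.48, "eval": 2.51}
THIS_PR = {"prompt eval": 55.57, "eval": 4.77}


def speedup(before_tps, after_tps):
    """Throughput ratio; a value above 1.0 means this PR is faster."""
    return after_tps / before_tps


for phase in MASTER:
    print(f"{phase}: {speedup(MASTER[phase], THIS_PR[phase]):.2f}x")
# prompt eval: 4.84x
# eval: 1.90x
```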


shaofeiqi requested a review from a team as a code owner on April 8, 2026 00:01
github-actions (bot) added the labels ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) on Apr 8, 2026
lhez requested a review from max-krasnyansky on April 16, 2026 19:02
max-krasnyansky (Member) left a comment


Nice to see Q5_K. Will get started on the Hexagon version too :)

max-krasnyansky merged commit e45dbde into ggml-org:master on Apr 16, 2026
87 of 89 checks passed
cnsiva pushed a commit to saas-home/llama.cpp that referenced this pull request Apr 17, 2026
samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request Apr 19, 2026
mengqin pushed a commit to mengqin/llama.cpp that referenced this pull request Apr 20, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Apr 21, 2026
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
jimbothigpen pushed a commit to jimbothigpen/frankenturbo2 that referenced this pull request May 2, 2026


3 participants