ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot #20633
Merged
ggerganov merged 3 commits into ggml-org:master, Apr 16, 2026
Conversation
Force-pushed from c7c6abc to d618925
Pull request overview
Adds RVV 128-bit (VLEN=128) implementations for several quantized vector dot kernels in the RISC-V backend to improve coverage and performance on smaller-VLEN targets.
Changes:
- Introduces new `*_vl128` RVV kernels and dispatches them via `__riscv_vlenb() * 8 == 128`.
- Applies `NOINLINE` to multiple RVV kernels and refactors some unpack/reduction logic.
- Updates some existing RVV kernels (notably `tq1_0` and `iq4_xs`) to different vector types / gather patterns.
Comment on lines 4118 to 4127

```c
void ggml_vec_dot_tq1_0_q8_K(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc) {
#if defined __riscv_v_intrinsic
    switch (__riscv_vlenb() * 8) {
        case 256:
            ggml_vec_dot_tq1_0_q8_K_vl256(n, s, bs, vx, bx, vy, by, nrc);
            break;
        case 128:
            ggml_vec_dot_tq1_0_q8_K_vl128(n, s, bs, vx, bx, vy, by, nrc);
            break;
        default:
            ggml_vec_dot_tq1_0_q8_K_generic(n, s, bs, vx, bx, vy, by, nrc);
            break;
    }
```
Comment on lines +4109 to +4111

```c
vint16m2_t sumb = __riscv_vadd_vv_i16m2(suml1, __riscv_vlmul_ext_v_i16m1_i16m2(__riscv_vadd_vv_i16m1(suml2, suml3, 16)), 16);

// before: non-widening reduction through a mismatched i32 view of sumb
// vint32m1_t sum = __riscv_vredsum_vs_i32m2_i32m1(sumb, __riscv_vmv_v_x_i32m1(0, 1), 16);
// after: widening reduction from the 16-bit lanes directly into i32
vint32m1_t sum = __riscv_vwredsum_vs_i16m2_i32m1(sumb, __riscv_vmv_v_x_i32m1(0, 1), 32);
```
Comment on lines +2165 to +2166

```c
int sumi  = __riscv_vmv_x_s_i32m1_i32(__riscv_vredsum_vs_i32m2_i32m1(sumi_v,  __riscv_vmv_v_x_i32m1(0, 1), 8));
int sumi1 = __riscv_vmv_x_s_i32m1_i32(__riscv_vredsum_vs_i32m2_i32m1(sumi1_v, __riscv_vmv_v_x_i32m1(0, 1), 8));
```
```c
// Final lsums.
int32_t lsums_s[8];
vint32m1_t one_scalar = __riscv_vmv_v_x_i32m1(0, 1);

    lsums_s[0] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 0), one_scalar, 32));
    lsums_s[1] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 1), one_scalar, 32));
}
__asm__ __volatile__("" ::: "memory");

    lsums_s[2] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 0), one_scalar, 32));
    lsums_s[3] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 1), one_scalar, 32));
}
__asm__ __volatile__("" ::: "memory");

    lsums_s[4] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 0), one_scalar, 32));
    lsums_s[5] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 1), one_scalar, 32));
}
__asm__ __volatile__("" ::: "memory");

    lsums_s[6] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 0), one_scalar, 32));
    lsums_s[7] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 1), one_scalar, 32));
}
__asm__ __volatile__("" ::: "memory");
```
Collaborator

> I noticed recent RVV kernels (in this and previous PRs) aren't guarded by ISA test macros, which breaks non-RVV builds. Could you fix this?
xctan
approved these changes
Mar 18, 2026
Force-pushed from cf95828 to 05a5425
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
Force-pushed from 05a5425 to 80c0ac3
Contributor (Author)

> @ggerganov, could you please review this PR?
cnsiva pushed a commit to saas-home/llama.cpp that referenced this pull request on Apr 17, 2026:

> …gml-org#20633)
> * ggml-cpu: add 128-bit impls for i-quants, ternary quants
> * ggml-cpu: add 128-bit impls for iq2_xs, iq3_s, iq3_xxs, tq2_0
> * ggml-cpu: refactor; add rvv checks
>
> Co-authored-by: taimur-10x <taimur.ahmad@10xengineers.ai>
> Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
mengqin (mengqin/llama.cpp, Apr 20, 2026), ArberSephirotheca (ArberSephirotheca/llama.cpp, Apr 21, 2026), arthw (arthw/llama.cpp, Apr 23, 2026), rsenthilkumar6 (rsenthilkumar6/llama.cpp, May 1, 2026), and jimbothigpen (jimbothigpen/frankenturbo2, May 2, 2026) pushed commits referencing this pull request with the same message.
Summary
This PR adds RVV 128-bit implementations for quantized vector dot kernels.
Key Changes
Testing
Kernels were functionally tested through test-quantize-fns for 128-bit on QEMU.
Future Work
Subsequent PRs plan to extend existing RVV kernels for quantization types to higher VLENs (512-bit and 1024-bit).