ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot #20633

Merged: ggerganov merged 3 commits into ggml-org:master from riseproject-dev:10x/riscv-quant-vec-dot-128b on Apr 16, 2026
Conversation

@rehan-10xengineer (Contributor)
Summary

This PR adds RVV 128-bit implementations for quantized vector dot kernels.

Key Changes

  • Added the following RVV kernels:

    Kernel                      VLEN
    ggml_vec_dot_iq1_s_q8_K     128
    ggml_vec_dot_iq1_m_q8_K     128
    ggml_vec_dot_iq2_xs_q8_K    128
    ggml_vec_dot_iq3_s_q8_K     128
    ggml_vec_dot_iq3_xxs_q8_K   128
    ggml_vec_dot_iq4_xs_q8_K    128
    ggml_vec_dot_tq1_0_q8_K     128
    ggml_vec_dot_tq2_0_q8_K     128

Testing

Kernels were functionally tested with test-quantize-fns at VLEN=128 on QEMU.

Future Work

Subsequent PRs will extend the existing RVV quantization kernels to higher VLENs (512-bit and 1024-bit).

@github-actions github-actions Bot added the ggml changes relating to the ggml tensor library for machine learning label Mar 16, 2026
@taimur-10x taimur-10x force-pushed the 10x/riscv-quant-vec-dot-128b branch from c7c6abc to d618925 Compare March 16, 2026 12:15
@ggerganov ggerganov requested review from Copilot and xctan March 16, 2026 12:48
Copilot AI left a comment
Pull request overview

Adds RVV 128-bit (VLEN=128) implementations for several quantized vector dot kernels in the RISC-V backend to improve coverage/perf on smaller VLEN targets.

Changes:

  • Introduces new *_vl128 RVV kernels and dispatches them via __riscv_vlenb() * 8 == 128.
  • Applies NOINLINE to multiple RVV kernels and refactors some unpack/reduction logic.
  • Updates some existing RVV kernels (notably tq1_0 and iq4_xs) to different vector types / gather patterns.


Comment on lines 4118 to 4127 in ggml/src/ggml-cpu/arch/riscv/quants.c (outdated):

```c
void ggml_vec_dot_tq1_0_q8_K(int n, float * GGML_RESTRICT s, size_t bs, const void * GGML_RESTRICT vx, size_t bx, const void * GGML_RESTRICT vy, size_t by, int nrc) {
#if defined __riscv_v_intrinsic
    switch (__riscv_vlenb() * 8) {
        case 256:
            ggml_vec_dot_tq1_0_q8_K_vl256(n, s, bs, vx, bx, vy, by, nrc);
            break; // missing in the reviewed hunk: case 256 fell through into case 128
        case 128:
            ggml_vec_dot_tq1_0_q8_K_vl128(n, s, bs, vx, bx, vy, by, nrc);
            break;
        default:
            ggml_vec_dot_tq1_0_q8_K_generic(n, s, bs, vx, bx, vy, by, nrc);
            break; // the reviewed hunk also called the vl256 kernel here, overwriting the generic result
    }
```
Comment on lines +4109 to +4111:

```c
vint16m2_t sumb = __riscv_vadd_vv_i16m2(suml1, __riscv_vlmul_ext_v_i16m1_i16m2(__riscv_vadd_vv_i16m1(suml2, suml3, 16)), 16);

// The i16 partial sums are now widened and reduced in a single step; the
// earlier version reduced sumb with __riscv_vredsum_vs_i32m2_i32m1, i.e.
// it treated the i16 vector as i32 elements.
vint32m1_t sum = __riscv_vwredsum_vs_i16m2_i32m1(sumb, __riscv_vmv_v_x_i32m1(0, 1), 32);
```
Comment on lines +2165 to +2166:

```c
// Earlier version; note the float literal 0.0f passed to the integer splat
// __riscv_vmv_v_x_i32m1 (it converts implicitly, but should be 0).
int sumi  = __riscv_vmv_x_s_i32m1_i32(__riscv_vredsum_vs_i32m2_i32m1(sumi_v,  __riscv_vmv_v_x_i32m1(0.0f, 1), 8));
int sumi1 = __riscv_vmv_x_s_i32m1_i32(__riscv_vredsum_vs_i32m2_i32m1(sumi1_v, __riscv_vmv_v_x_i32m1(0.0f, 1), 8));
```

The replacement hunk reduces the i16 partial sums pairwise into `lsums_s` with widening reductions, inserting a compiler barrier after each pair (reconstructed from the rendered diff; the remaining pairs follow the same shape):

```c
// Final lsums.
int32_t lsums_s[8];
vint32m1_t one_scalar = __riscv_vmv_v_x_i32m1(0, 1);

lsums_s[0] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 0), one_scalar, 32));
lsums_s[1] = __riscv_vmv_x_s_i32m1_i32(__riscv_vwredsum_vs_i16m4_i32m1(__riscv_vget_v_i16m8_i16m4(lsum0, 1), one_scalar, 32));
__asm__ __volatile__("" ::: "memory");
// ... lsums_s[2..7] are computed the same way, with a compiler barrier
// after each pair.
```
@xctan (Collaborator) commented Mar 17, 2026

I noticed recent RVV kernels (in this and previous PRs) aren't guarded by ISA test macros, which breaks non-RVV builds. Could you fix this?

@taimur-10x taimur-10x force-pushed the 10x/riscv-quant-vec-dot-128b branch 2 times, most recently from cf95828 to 05a5425 Compare March 18, 2026 12:47
@rehan-10xengineer rehan-10xengineer force-pushed the 10x/riscv-quant-vec-dot-128b branch from 05a5425 to 80c0ac3 Compare April 14, 2026 11:37
@rehan-10xengineer (Contributor, Author)

@ggerganov could you please review this PR?

@ggerganov ggerganov added the merge ready A maintainer can use this label to indicate that they consider the changes final and ready to merge. label Apr 14, 2026
@ggerganov ggerganov merged commit 1e796eb into ggml-org:master Apr 16, 2026
50 checks passed
cnsiva pushed a commit to saas-home/llama.cpp that referenced this pull request Apr 17, 2026
…gml-org#20633)

* ggml-cpu: add 128-bit impls for i-quants, ternary quants

* ggml-cpu: add 128-bit impls for iq2_xs, iq3_s, iq3_xxs, tq2_0

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: refactor; add rvv checks

---------

Co-authored-by: taimur-10x <taimur.ahmad@10xengineers.ai>
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
The same commit was also picked up, with an identical message, by: mengqin/llama.cpp (Apr 20, 2026), ArberSephirotheca/llama.cpp (Apr 21, 2026), arthw/llama.cpp (Apr 23, 2026), rsenthilkumar6/llama.cpp (May 1, 2026), and jimbothigpen/frankenturbo2 (May 2, 2026).

Labels

ggml: changes relating to the ggml tensor library for machine learning
merge ready: A maintainer can use this label to indicate that they consider the changes final and ready to merge.


6 participants