vulkan: Support GGML_TYPE_NVFP4 by jeffbolznv · Pull Request #21455 · ggml-org/llama.cpp

jeffbolznv · 2026-04-05T03:43:53Z

Overview

This adds nvfp4 support for get_rows, dequant, and mul_mat(_id). For mul_mat, it does not add support for the dp4/q8_1 path; everything goes through the fp16/fp32 path.
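
For readers new to the format, here is a rough sketch of what dequantizing an FP4 block involves. It is not the shader code from this PR. It assumes 16 four-bit E2M1 values sharing one unsigned E4M3 scale byte, and the nibble order, the function names, and the block size used here are illustrative assumptions rather than the ggml layout. Any additional per-tensor scale is left out.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_e2m1(code):
    """Decode one 4-bit E2M1 code (1 sign, 2 exponent, 1 mantissa bit)."""
    mag = E2M1_MAGNITUDES[code & 0x7]
    return -mag if code & 0x8 else mag


def decode_ue4m3(byte):
    """Decode an E4M3 scale as unsigned: 4 exponent bits (bias 7), 3 mantissa bits.

    The top bit is ignored and the NaN encoding is not special-cased;
    both are simplifications made for this sketch.
    """
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0:
        # Subnormal: no implicit leading 1.
        return (man / 8.0) * 2.0 ** -6
    return (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def dequant_block(scale_byte, packed):
    """Dequantize 8 packed bytes (16 codes) that share one scale.

    Hypothetical nibble order: low nibble first, then high nibble.
    """
    scale = decode_ue4m3(scale_byte)
    out = []
    for b in packed:
        out.append(decode_e2m1(b & 0xF) * scale)
        out.append(decode_e2m1(b >> 4) * scale)
    return out
```

The fp16/fp32 path mentioned above amounts to doing this expansion first and then multiplying in floating point, as opposed to an integer dot-product path.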

Additional information

I haven't done a ton of perf tuning, partly to get the basic functionality in first and partly because my normal test system is temporarily out of commission.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES. I asked AI to do this; it did the boilerplate but couldn't get the indexing calculations right, so I ended up rewriting all of them myself.

0cc4m · 2026-04-09T08:44:08Z

This also needs a rebase.

0cc4m · 2026-04-10T12:59:45Z

The CM pipeline has hit a sentinel mismatch with ROPE twice in a row now, but this PR doesn't touch any code that could affect that. Bad luck? Also, another conflict.

jeffbolznv · 2026-04-10T13:42:48Z

The sentinel error is weird, and I think I saw something similar with an ADD test in another PR. There should be explicit bounds checking in all the rope variants. Maybe some kind of synchronization issue, though I'm not sure how...
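
For context, a sentinel check of the kind discussed here can be sketched roughly as follows. This shows the general idea only, not the test harness's actual implementation. The guard value, buffer layout, and function names are all hypothetical.

```python
SENTINEL = -9999.0  # hypothetical guard value placed after the real output


def run_with_sentinels(op, n_out, n_guard=4):
    """Run `op` on a buffer padded with guard values and report whether
    the guards survived. A False result corresponds to a sentinel mismatch."""
    buf = [0.0] * n_out + [SENTINEL] * n_guard
    op(buf, n_out)
    return all(v == SENTINEL for v in buf[n_out:])


def good_op(buf, n):
    # Writes exactly n elements, staying inside the output region.
    for i in range(n):
        buf[i] = float(i)


def overrunning_op(buf, n):
    # Off-by-one: writes one element past the output region.
    for i in range(n + 1):
        buf[i] = float(i)
```

An op with correct bounds checking leaves the guard region untouched, so an intermittent mismatch points toward either an out-of-bounds write elsewhere or a timing issue around when the guards are read.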

0cc4m · 2026-04-14T09:27:48Z

@ggml-org/maintainers Another approval needed.

ggerganov · 2026-04-14T10:57:09Z

Did you notice which runner the sentinel error for the ROPE test occurred on?

I wonder if it always happens on @taronaeo's self-hosted runner like in this case: https://github.com/ggml-org/llama.cpp/actions/runs/24360207449/job/71154501348

0cc4m · 2026-04-14T11:40:23Z

I think it was the Nvidia Coopmat1 runner.

pwilkin · 2026-04-14T12:00:47Z

There was also a CPU crash on OUT_PROD:

OUT_PROD(type_a=q8_0,type_b=f16,m=256,n=16,k=16,bs=[3,3],nr=[2,2],trans_b=0): not supported [CPU] 
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-cpu/ops.cpp:4371: fatal error
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-cpu/ops.cpp:4371: fatal error
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-cpu/ops.cpp:4371: fatal error
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-cpu/ops.cpp:4371: fatal error
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-base.so.0(+0x17394)[0xff7c23c57394]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-base.so.0(ggml_print_backtrace+0x21c)[0xff7c23c5784c]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-base.so.0(ggml_abort+0x138)[0xff7c23c57a18]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-cpu.so.0(ggml_compute_forward_out_prod+0x508)[0xff7c23694c88]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-cpu.so.0(+0x17cc0)[0xff7c23637cc0]
/lib/aarch64-linux-gnu/libgomp.so.1(+0x1ba2c)[0xff7c235dba2c]
/lib/aarch64-linux-gnu/libc.so.6(+0x80398)[0xff7c23810398]
/lib/aarch64-linux-gnu/libc.so.6(+0xe9e9c)[0xff7c23879e9c]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-base.so.0(+0x17394)[0xff7c23c57394]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-base.so.0(ggml_print_backtrace+0x21c)[0xff7c23c5784c]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-base.so.0(ggml_abort+0x138)[0xff7c23c57a18]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-cpu.so.0(ggml_compute_forward_out_prod+0x508)[0xff7c23694c88]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-cpu.so.0(+0x17cc0)[0xff7c23637cc0]
/lib/aarch64-linux-gnu/libgomp.so.1(+0x1ba2c)[0xff7c235dba2c]
/lib/aarch64-linux-gnu/libc.so.6(+0x80398)[0xff7c23810398]
/lib/aarch64-linux-gnu/libc.so.6(+0xe9e9c)[0xff7c23879e9c]

CISC · 2026-04-14T12:12:09Z

Quoting pwilkin: "There was also a CPU crash on OUT_PROD"

That was fixed with #21716.

jeffbolznv requested a review from a team as a code owner April 5, 2026 03:43

github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Apr 5, 2026

0cc4m mentioned this pull request Apr 5, 2026

Misc. bug: NVFP4 Type CPU Backend high error on ARM #21462

Closed

richarddd mentioned this pull request Apr 7, 2026

cpu: fix ARM NEON nvfp4 dot product on non-dotprod targets #21559

Merged

0cc4m reviewed Apr 9, 2026

Comment thread ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs.glsl

Comment thread ggml/src/ggml-vulkan/vulkan-shaders/types.glsl

jeffbolznv force-pushed the nvfp4 branch from 952a5d3 to 096aa72 Compare April 9, 2026 14:24

vulkan: Support GGML_TYPE_NVFP4

bb7b6bd

This adds nvfp4 support for get_rows, dequant, and mul_mat(_id). For mul_mat, it does not add support for the dp4/q8_1 path, it's all via fp16/fp32.

jeffbolznv force-pushed the nvfp4 branch from 096aa72 to bb7b6bd Compare April 10, 2026 13:42

0cc4m approved these changes Apr 14, 2026

am17an approved these changes Apr 14, 2026

0cc4m merged commit 6a6780a into ggml-org:master Apr 14, 2026
45 of 48 checks passed

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026

vulkan: Support GGML_TYPE_NVFP4 (ggml-org#21455)

9f7c31a

This adds nvfp4 support for get_rows, dequant, and mul_mat(_id). For mul_mat, it does not add support for the dp4/q8_1 path, it's all via fp16/fp32.

vkhaitan added a commit to vkhaitan/vllama.cpp that referenced this pull request Apr 27, 2026

Revert "vulkan: Support GGML_TYPE_NVFP4 (ggml-org#21455)"

15f85e3

This reverts commit 6a6780a.

vkhaitan added a commit to vkhaitan/vllama.cpp that referenced this pull request Apr 29, 2026

Revert "vulkan: Support GGML_TYPE_NVFP4 (ggml-org#21455)"

1f9e87f

This reverts commit 6a6780a.
