
vulkan: Support GGML_TYPE_NVFP4 #21455

Merged
0cc4m merged 1 commit into ggml-org:master from jeffbolznv:nvfp4 on Apr 14, 2026

Conversation

@jeffbolznv
Contributor

Overview

This adds nvfp4 support for get_rows, dequant, and mul_mat(_id). For mul_mat, it does not add support for the dp4/q8_1 path; everything goes through fp16/fp32.
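As background, dequantizing an NVFP4 block conceptually looks like the sketch below (CPU-side C++, assuming the common NVFP4 layout of 16 FP4/E2M1 values per block with one FP8/E4M3 scale; the struct name, field names, and nibble packing order are illustrative assumptions, not the exact ggml or shader definitions):

```cpp
#include <cmath>
#include <cstdint>

// Illustrative NVFP4 block: 16 values per block, one E4M3-encoded scale,
// 4-bit E2M1 values packed two per byte. Not the actual ggml struct.
struct block_nvfp4 {
    uint8_t d;      // per-block scale, FP8 E4M3 bit pattern
    uint8_t qs[8];  // 16 x 4-bit E2M1 values
};

// The eight magnitudes representable by FP4 E2M1; bit 3 is the sign.
static const float kE2M1[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Decode an FP8 E4M3 bit pattern to float (bias 7; NaN handling omitted).
static float e4m3_to_float(uint8_t v) {
    const int   exp = (v >> 3) & 0xF;
    const float man = (v & 0x7) / 8.0f;
    const float mag = exp == 0 ? man * std::ldexp(1.0f, -6)            // subnormal
                               : (1.0f + man) * std::ldexp(1.0f, exp - 7);
    return (v & 0x80) ? -mag : mag;
}

// Dequantize one block into 16 floats.
static void dequant_block_nvfp4(const block_nvfp4 * b, float * out) {
    const float d = e4m3_to_float(b->d);
    for (int i = 0; i < 8; ++i) {
        const uint8_t lo = b->qs[i] & 0xF;  // even element: low nibble
        const uint8_t hi = b->qs[i] >> 4;   // odd element: high nibble
        out[2*i + 0] = d * (lo & 0x8 ? -kE2M1[lo & 0x7] : kE2M1[lo & 0x7]);
        out[2*i + 1] = d * (hi & 0x8 ? -kE2M1[hi & 0x7] : kE2M1[hi & 0x7]);
    }
}
```

Under that layout, element i of a row lives in block i/16, byte (i%16)/2, and the low or high nibble by parity; this is the per-element indexing that get_rows, dequant, and mul_mat all have to agree on.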

Additional information

I haven't done a ton of perf tuning, partly to get the basic functionality in first and partly because my normal test system is temporarily out of commission.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES. I asked AI to do this; it handled the boilerplate but couldn't get the indexing calculations right, so I ended up rewriting all of them myself.

@jeffbolznv jeffbolznv requested a review from a team as a code owner April 5, 2026 03:43
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Apr 5, 2026
Comment thread ggml/src/ggml-vulkan/vulkan-shaders/dequant_funcs.glsl
Comment thread ggml/src/ggml-vulkan/vulkan-shaders/types.glsl
@0cc4m
Contributor

0cc4m commented Apr 9, 2026

This also needs a rebase.

@0cc4m
Contributor

0cc4m commented Apr 10, 2026

The CM pipeline has hit a sentinel mismatch on ROPE twice in a row now, but this PR doesn't touch any code that could affect that. Bad luck? Also, there's another merge conflict.

This adds nvfp4 support for get_rows, dequant, and mul_mat(_id). For
mul_mat, it does not add support for the dp4/q8_1 path, it's all via
fp16/fp32.
@jeffbolznv
Contributor Author

The sentinel error is weird, and I think I saw something similar with an ADD test in another PR. There should be explicit bounds checking in all the rope variants. Maybe some kind of synchronization issue, though I'm not sure how...
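For context, the sentinel check works roughly like this: pad each tensor allocation with known bytes and verify the padding after the op runs, so any out-of-bounds write shows up as a mismatch. A minimal sketch of the idea (not the actual test-backend-ops code; names and sizes are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Minimal sketch of sentinel-based OOB detection; values are illustrative.
constexpr uint8_t kSentinelByte = 0xA5;
constexpr size_t  kPadBytes     = 256;

// Allocate data_bytes of payload surrounded by sentinel padding.
std::vector<uint8_t> alloc_with_sentinels(size_t data_bytes) {
    std::vector<uint8_t> buf(data_bytes + 2 * kPadBytes);
    std::memset(buf.data(), kSentinelByte, kPadBytes);                          // front pad
    std::memset(buf.data() + kPadBytes + data_bytes, kSentinelByte, kPadBytes); // back pad
    return buf;
}

// After the op runs, any flipped pad byte means an out-of-bounds write.
bool sentinels_intact(const std::vector<uint8_t> & buf, size_t data_bytes) {
    for (size_t i = 0; i < kPadBytes; ++i) {
        if (buf[i] != kSentinelByte)                          return false;
        if (buf[kPadBytes + data_bytes + i] != kSentinelByte) return false;
    }
    return true;
}
```

If the kernel itself never writes out of bounds, a mismatch like this would point at missing synchronization, e.g. the sentinel check reading back before a prior dispatch's writes have landed.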

@0cc4m
Contributor

0cc4m commented Apr 14, 2026

@ggml-org/maintainers Another approval needed.

@0cc4m 0cc4m merged commit 6a6780a into ggml-org:master Apr 14, 2026
45 of 48 checks passed
@ggerganov
Member

Did you notice which runner the sentinel error for the ROPE test was on?

I wonder if it always happens on @taronaeo's self-hosted runner like in this case: https://github.com/ggml-org/llama.cpp/actions/runs/24360207449/job/71154501348

@0cc4m
Contributor

0cc4m commented Apr 14, 2026

I think it was the Nvidia Coopmat1 runner.

@pwilkin
Member

pwilkin commented Apr 14, 2026

There was also a CPU crash on OUT_PROD:

OUT_PROD(type_a=q8_0,type_b=f16,m=256,n=16,k=16,bs=[3,3],nr=[2,2],trans_b=0): not supported [CPU] 
/home/runner/work/llama.cpp/llama.cpp/ggml/src/ggml-cpu/ops.cpp:4371: fatal error
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-base.so.0(+0x17394)[0xff7c23c57394]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-base.so.0(ggml_print_backtrace+0x21c)[0xff7c23c5784c]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-base.so.0(ggml_abort+0x138)[0xff7c23c57a18]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-cpu.so.0(ggml_compute_forward_out_prod+0x508)[0xff7c23694c88]
/home/runner/work/llama.cpp/llama.cpp/build-ci-release/bin/libggml-cpu.so.0(+0x17cc0)[0xff7c23637cc0]
/lib/aarch64-linux-gnu/libgomp.so.1(+0x1ba2c)[0xff7c235dba2c]
/lib/aarch64-linux-gnu/libc.so.6(+0x80398)[0xff7c23810398]
/lib/aarch64-linux-gnu/libc.so.6(+0xe9e9c)[0xff7c23879e9c]

@CISC
Member

CISC commented Apr 14, 2026

> There was also a CPU crash on OUT_PROD:

That was fixed with #21716

mengqin pushed a commit to mengqin/llama.cpp that referenced this pull request Apr 20, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Apr 21, 2026
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026
vkhaitan added a commit to vkhaitan/vllama.cpp that referenced this pull request Apr 27, 2026
vkhaitan added a commit to vkhaitan/vllama.cpp that referenced this pull request Apr 29, 2026