vulkan: Support GGML_TYPE_NVFP4 #21455
This also needs a rebase.

The CM pipeline has hit a sentinel mismatch with ROPE twice in a row now, but this PR doesn't touch any code that could affect that. Bad luck? Also, another conflict.
The sentinel error is weird, and I think I saw something similar with an ADD test in another PR. There should be explicit bounds checking in all the rope variants. Maybe some kind of synchronization issue, though I'm not sure how...
@ggml-org/maintainers Another approval needed.
Did you notice which runner the sentinel error for the ROPE test happened on? I wonder if it always happens on @taronaeo's self-hosted runner, as in this case: https://github.com/ggml-org/llama.cpp/actions/runs/24360207449/job/71154501348
I think it was the Nvidia Coopmat1 runner.
There was also a CPU crash on OUT_PROD:
That was fixed with #21716.
This reverts commit 6a6780a.
Overview
This adds nvfp4 support for get_rows, dequant, and mul_mat(_id). For mul_mat, it does not add support for the dp4/q8_1 path; everything goes through fp16/fp32.
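Both the dequant and the fp16/fp32 mul_mat routes come down to turning packed FP4 codes back into floats. The sketch below is a hedged illustration, not the shader code in this PR. It follows NVIDIA's published NVFP4 description (E2M1 4-bit values, one FP8 E4M3 scale per 16-element block, plus an optional per-tensor FP32 scale). The nibble packing, `BLOCK_SIZE`, and the helper names `e4m3_to_float`, `dequant_nvfp4_block`, and `dot_f32` are hypothetical and may not match ggml's actual `block_nvfp4` layout. `dot_f32` mimics the fp16/fp32 route: dequantize first, then run an ordinary float dot product, rather than an integer dot product against q8_1 activations.

```python
import math

# E2M1 (FP4) code -> value. Bit 3 is the sign; the low 3 bits select the magnitude.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
               -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]

# Elements per E4M3 scale in NVIDIA's NVFP4 description (assumed, not taken from ggml).
BLOCK_SIZE = 16


def e4m3_to_float(b: int) -> float:
    """Decode one FP8 E4M3 byte: bias 7, no infinities, S.1111.111 is NaN."""
    sign = -1.0 if b & 0x80 else 1.0
    exp = (b >> 3) & 0xF
    man = b & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan
    if exp == 0:
        # Subnormal: 2^(1-7) * (man / 8) == man * 2^-9
        return sign * man * 2.0 ** -9
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def dequant_nvfp4_block(scale_byte: int, qs: bytes, tensor_scale: float = 1.0) -> list[float]:
    """Dequantize one hypothetical NVFP4 block: one E4M3 scale + 8 bytes of packed FP4.

    Hypothetical packing: the low nibble of qs[i] is element 2*i and the
    high nibble is element 2*i + 1. tensor_scale models the optional
    per-tensor FP32 scale of NVIDIA's two-level scheme.
    """
    assert len(qs) == BLOCK_SIZE // 2
    d = e4m3_to_float(scale_byte) * tensor_scale
    out = []
    for byte in qs:
        out.append(E2M1_VALUES[byte & 0x0F] * d)
        out.append(E2M1_VALUES[byte >> 4] * d)
    return out


def dot_f32(weights: list[float], activations: list[float]) -> float:
    """Plain float dot product over already-dequantized weights (the fp32 route)."""
    return sum(w * a for w, a in zip(weights, activations))
```

For example, `dequant_nvfp4_block(0x40, bytes([0x21, 0xF7] + [0] * 6))` decodes a block scale of 2.0 and yields 1.0, 2.0, 12.0, -12.0 followed by zeros, so its `dot_f32` with sixteen ones is 3.0.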
Additional information
I haven't done a ton of perf tuning, partly to get the basic functionality in first and partly because my normal test system is temporarily out of commission.
Requirements