vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron #18295
0cc4m merged 3 commits into ggml-org:master
Conversation
ggerganov
left a comment
Ack on the ggml-backend changes
From my side it looks fine, but the Vulkan Mac CI is reporting an issue. Can you look into that?
Also handle GGML_OP_SCALE at the end (nemotron, deepseek2). Fewer pipeline variants and spec constants, just use push constants. In test_topk_moe, change exp_probs_b to be 1D, matching real networks. Update test-backend-ops and ggml-backend to allow verifying multiple outputs in a fusion test (topk_moe has two outputs). Previously only the final node was verified.
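For readers unfamiliar with this routing pattern, here is a minimal CPU sketch of what a fused sigmoid top-k MoE step might compute for a single token. It is not the Vulkan shader or ggml's API: the function name, the struct, and the `normalize` flag are hypothetical, and the order of operations (sigmoid, add the 1D `exp_probs_b` bias for selection only, pick the top k, take weights from the unbiased probabilities, then scale) is an assumption based on how this kind of routing is commonly described.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical reference result for one token: the two outputs of the fused op.
struct TopkMoeResult {
    std::vector<int>   ids;      // selected expert indices, best first
    std::vector<float> weights;  // routing weight for each selected expert
};

// Illustrative only: names and argument layout are assumptions, not ggml's API.
TopkMoeResult topk_moe_sigmoid_ref(const std::vector<float> & logits,
                                   const std::vector<float> & exp_probs_b, // 1D: one bias per expert
                                   int k, bool normalize, float scale) {
    const int n = (int) logits.size();
    assert(exp_probs_b.size() == logits.size());
    assert(k > 0 && k <= n);

    // Sigmoid gating; the bias only influences which experts get selected.
    std::vector<float> probs(n), biased(n);
    for (int i = 0; i < n; ++i) {
        probs[i]  = 1.0f / (1.0f + std::exp(-logits[i]));
        biased[i] = probs[i] + exp_probs_b[i];
    }

    // Indices of the k largest biased scores, in descending order.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::partial_sort(order.begin(), order.begin() + k, order.end(),
                      [&](int a, int b) { return biased[a] > biased[b]; });

    TopkMoeResult r;
    float sum = 0.0f;
    for (int j = 0; j < k; ++j) {
        r.ids.push_back(order[j]);
        r.weights.push_back(probs[order[j]]);  // weights come from the unbiased probabilities
        sum += probs[order[j]];
    }
    for (float & w : r.weights) {
        if (normalize) {
            w /= sum;   // optional normalization over the selected experts
        }
        w *= scale;     // the trailing scale step (the GGML_OP_SCALE analogue)
    }
    return r;
}
```

With four experts, all-zero logits, biases `{0, 0.5, 0, 0.25}`, `k = 2`, normalization on, and `scale = 2`, this sketch selects experts 1 and 3 and gives each a weight of 1.0.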
4a17402 to 75bcc84
I don't know why the Mac system is failing. The failures are in test cases that should be fused, so I don't think it's a fluke, but I can't reproduce them locally on NVIDIA or lavapipe, and I can't find anything from code inspection. While investigating, I found that ties sometimes lead to spurious failures, so I've updated the tests to avoid them. I doubt this is related to the Mac failures. If it still fails in CI, I'll probably need to disable this fusion for MoltenVK.
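The comment does not say how the tests were changed to avoid ties. One possible approach, sketched below under that caveat, is to fill the test input with values that are distinct by construction and then shuffle them. The helper name `make_tie_free_values` is hypothetical and is not taken from test-backend-ops.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: returns n pairwise-distinct floats in [-1, 1) in a
// seeded random order, so a top-k over them has no ties to break.
std::vector<float> make_tie_free_values(int n, unsigned seed) {
    assert(n > 0);
    std::vector<float> v(n);
    for (int i = 0; i < n; ++i) {
        // Evenly spaced values; distinct for any realistic expert count.
        v[i] = -1.0f + 2.0f * (float) i / (float) n;
    }
    std::mt19937 rng(seed);
    std::shuffle(v.begin(), v.end(), rng);
    return v;
}
```

Note that distinct logits do not by themselves guarantee distinct selection scores once a per-expert bias is added, so a real test would need to account for the bias as well.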
bfbd40e to 03b18c9
03b18c9 to 86df563
I tried a couple of experiments through CI but don't have a workaround for the MoltenVK failures, so I've disabled the new fusion for MoltenVK.
…gml-org#18295)
* vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron
  Also handle GGML_OP_SCALE at the end (nemotron, deepseek2).
  Fewer pipeline variants and spec constants, just use push constants.
  In test_topk_moe, change exp_probs_b to be 1D, matching real networks.
  Update test-backend-ops and ggml-backend to allow verifying multiple outputs in a fusion test (topk_moe has two outputs). Previously only the final node was verified.
* change test_topk_moe to allow results in arbitrary order
* disable sigmoid fusion for moltenvk
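As an illustration of what allowing results in arbitrary order can mean when checking both topk_moe outputs, the sketch below compares two (ids, weights) result sets without regard to position. It is an assumption about the general idea, not the code in test-backend-ops; `topk_results_match` and its tolerance parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical check: two top-k results match if they select the same experts
// with the same weights (within tol), regardless of the order they are listed in.
bool topk_results_match(const std::vector<int> & ids_a, const std::vector<float> & w_a,
                        const std::vector<int> & ids_b, const std::vector<float> & w_b,
                        float tol) {
    assert(ids_a.size() == w_a.size() && ids_b.size() == w_b.size());
    if (ids_a.size() != ids_b.size()) {
        return false;
    }

    // Pair each expert id with its weight, then sort by id so position no longer matters.
    auto sorted_pairs = [](const std::vector<int> & ids, const std::vector<float> & w) {
        std::vector<std::pair<int, float>> p;
        for (size_t i = 0; i < ids.size(); ++i) {
            p.emplace_back(ids[i], w[i]);
        }
        std::sort(p.begin(), p.end());
        return p;
    };
    const auto a = sorted_pairs(ids_a, w_a);
    const auto b = sorted_pairs(ids_b, w_b);

    for (size_t i = 0; i < a.size(); ++i) {
        if (a[i].first != b[i].first) {
            return false;   // different expert selected
        }
        if (std::fabs(a[i].second - b[i].second) > tol) {
            return false;   // same expert, different weight
        }
    }
    return true;
}
```

Keeping each id paired with its weight means both outputs are verified together, rather than only the final node.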