cpu: fix ARM NEON nvfp4 dot product on non-dotprod targets by richarddd · Pull Request #21559 · ggml-org/llama.cpp

richarddd · 2026-04-07T13:49:37Z

Overview

Fix incorrect nvfp4 vec_dot results on ARM targets without dotprod support. The ggml_vdotq_s32 fallback groups products into lanes differently from native vdotq_s32: the sum across all four lanes is identical, but the individual lane values differ. Because nvfp4 applies a different scale to each sub-block and relies on the per-lane values to do so, the fallback produces wrong results.
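
To illustrate the grouping difference, here is a portable scalar sketch, not the ggml code. `lanes_native` models the documented SDOT behaviour (lane i sums the products of bytes 4i..4i+3). `lanes_fallback` models an assumed emulation built from widening multiplies of the low and high halves followed by pairwise adds; the exact fallback in ggml may differ. `scaled_sum` and its lane-to-scale mapping are hypothetical and only show why per-lane values matter once scales differ.

```c
#include <stdint.h>

/* Model of native vdotq_s32: lane i = sum of products of bytes 4i..4i+3. */
static void lanes_native(const int8_t a[16], const int8_t b[16], int32_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i] = 0;
        for (int j = 0; j < 4; ++j) {
            out[i] += (int32_t)a[4 * i + j] * b[4 * i + j];
        }
    }
}

/* Assumed fallback model: multiply low and high 8-byte halves, add adjacent
 * pairs, then add the two halves. Lane i mixes bytes 2i, 2i+1, 8+2i, 8+2i+1. */
static void lanes_fallback(const int8_t a[16], const int8_t b[16], int32_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i] = (int32_t)a[2 * i]         * b[2 * i]
               + (int32_t)a[2 * i + 1]     * b[2 * i + 1]
               + (int32_t)a[8 + 2 * i]     * b[8 + 2 * i]
               + (int32_t)a[8 + 2 * i + 1] * b[8 + 2 * i + 1];
    }
}

/* Sum of all four lanes: identical for both groupings. */
static int32_t lane_total(const int32_t v[4]) {
    return v[0] + v[1] + v[2] + v[3];
}

/* Hypothetical scaling: lanes 0-1 belong to one sub-block, lanes 2-3 to another. */
static float scaled_sum(const int32_t v[4], float scale_lo, float scale_hi) {
    return scale_lo * (float)(v[0] + v[1]) + scale_hi * (float)(v[2] + v[3]);
}
```

With a = 1..16 and b all ones, the native model gives lanes {10, 26, 42, 58} and the fallback model gives {22, 30, 38, 46}. Both total 136, but with scales 1.0 and 2.0 `scaled_sum` returns 236 for the native lanes and 220 for the fallback lanes.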

This change uses native vdotq_s32 directly when dotprod is available (the fast path is unchanged) and a per-sub-block 8-wide path on non-dotprod targets.
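
The idea behind a per-sub-block path can be sketched in portable scalar C. This is an illustration under stated assumptions, not the code from this PR: `dot8` stands in for an 8-wide widening multiply plus horizontal add, `dot_subblocks_8wide` is a hypothetical name, and the sub-block length is assumed to be a multiple of 8. The point is that each integer sum is confined to one sub-block before its scale is applied, so no lane ever mixes values from two sub-blocks.

```c
#include <stdint.h>

/* Dot product of 8 signed bytes, accumulated in 32 bits. */
static int32_t dot8(const int8_t *a, const int8_t *b) {
    int32_t acc = 0;
    for (int i = 0; i < 8; ++i) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}

/* Hypothetical helper: process n_sub sub-blocks of sub_len values each
 * (sub_len assumed to be a multiple of 8), applying scales[s] only to the
 * integer sum of sub-block s. */
static float dot_subblocks_8wide(const int8_t *a, const int8_t *b,
                                 const float *scales, int n_sub, int sub_len) {
    float result = 0.0f;
    for (int s = 0; s < n_sub; ++s) {
        int32_t acc = 0;
        for (int k = 0; k < sub_len; k += 8) {
            acc += dot8(a + s * sub_len + k, b + s * sub_len + k);
        }
        result += scales[s] * (float)acc;
    }
    return result;
}
```

For a = 1..16, b all ones, two sub-blocks of 8 values, and scales {1.0, 2.0}, this returns 36 * 1.0 + 100 * 2.0 = 236.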

Additional information

Related to #21455

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES, used AI to confirm and validate the fix as well as reproduce the error in an isolated test

0cc4m

I can confirm this fixes the issue on DGX Spark.

0cc4m · 2026-04-14T10:59:47Z

@ggml-org/maintainers Another review needed.

…g#21559)

fix ARM NEON nvfp4 dot product on non-dotprod targets

1e83428

richarddd requested a review from ggerganov as a code owner April 7, 2026 13:49

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) Apr 7, 2026

0cc4m approved these changes Apr 10, 2026


ngxson approved these changes Apr 14, 2026


ggerganov merged commit 2e05f06 into ggml-org:master Apr 14, 2026
47 checks passed

mengqin pushed a commit to mengqin/llama.cpp that referenced this pull request Apr 20, 2026

ggml : fix ARM NEON nvfp4 dot product on non-dotprod targets (ggml-or…

36b3eaf

…g#21559)

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Apr 21, 2026

ggml : fix ARM NEON nvfp4 dot product on non-dotprod targets (ggml-or…

01f3ea0

…g#21559)

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026

ggml : fix ARM NEON nvfp4 dot product on non-dotprod targets (ggml-or…

e5498e1

…g#21559)

ggerganov merged 1 commit into ggml-org:master from richarddd:fix/arm-nvfp4-dotprod-fallback


4 participants
