cpu: fix ARM NEON nvfp4 dot product on non-dotprod targets #21559

Merged
ggerganov merged 1 commit into ggml-org:master from richarddd:fix/arm-nvfp4-dotprod-fallback
Apr 14, 2026

@richarddd
Contributor

Overview

Fixes #21462

Fix incorrect nvfp4 vec_dot results on ARM targets without dotprod support. The ggml_vdotq_s32 fallback groups the element products into lanes differently from native vdotq_s32: the sum over all four lanes is identical, but the individual lane values differ. Because nvfp4 applies a different scale to each sub-block and reads the sub-block sums from individual lanes, the fallback produces wrong results.

This PR uses native vdotq_s32 directly when dotprod is available (unchanged fast path), and a per-sub-block 8-wide path on non-dotprod targets.
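The lane-grouping difference described above can be illustrated with a small scalar model. This is a sketch, not the code from this PR. It assumes the fallback is built from widening multiplies on the low and high halves followed by pairwise adds (`vmull_s8` + `vpaddlq_s16`), which would place products from both halves of the vector in every lane. The function names and the integer scales are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of native vdotq_s32: lane i accumulates the four
 * consecutive products a[4i..4i+3] * b[4i..4i+3]. */
static void dot_native_model(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int lane = 0; lane < 4; ++lane) {
        for (int k = 0; k < 4; ++k) {
            acc[lane] += (int32_t) a[4 * lane + k] * b[4 * lane + k];
        }
    }
}

/* Scalar model of an ASSUMED fallback (vmull_s8 on each half, then
 * vpaddlq_s16 on each result, then add): lane i accumulates
 * a[2i]*b[2i] + a[2i+1]*b[2i+1] + a[8+2i]*b[8+2i] + a[9+2i]*b[9+2i]. */
static void dot_fallback_model(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int lane = 0; lane < 4; ++lane) {
        for (int k = 0; k < 2; ++k) {
            acc[lane] += (int32_t) a[2 * lane + k] * b[2 * lane + k];
            acc[lane] += (int32_t) a[8 + 2 * lane + k] * b[8 + 2 * lane + k];
        }
    }
}

/* Plain sum of the four lanes. */
static int32_t lane_sum(const int32_t v[4]) {
    return v[0] + v[1] + v[2] + v[3];
}

/* Sum of the lanes after applying one (hypothetical) scale per lane. */
static int32_t scaled_sum(const int32_t v[4], const int32_t s[4]) {
    return v[0] * s[0] + v[1] * s[1] + v[2] * s[2] + v[3] * s[3];
}

/* Returns 1 if the two models agree on the scaled result, 0 otherwise. */
static int lane_grouping_demo(void) {
    int8_t a[16] = {1, 1, 1, 1};          /* only the first four elements are non-zero */
    int8_t b[16];
    for (int i = 0; i < 16; ++i) b[i] = 1;

    const int32_t scales[4] = {1, 2, 3, 4}; /* hypothetical per-lane scales */
    int32_t n[4] = {0}, f[4] = {0};

    dot_native_model(n, a, b);
    dot_fallback_model(f, a, b);

    /* The unscaled totals always match ... */
    assert(lane_sum(n) == lane_sum(f));

    /* ... but the per-lane scaled results need not. */
    return scaled_sum(n, scales) == scaled_sum(f, scales);
}
```

With this input the native model puts all four products in lane 0, while the assumed fallback splits them across lanes 0 and 1, so `lane_grouping_demo()` returns 0 even though the unscaled totals agree. Any kernel that multiplies individual lanes by different scales is sensitive to this, whereas one that only reduces across all lanes is not.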

Additional information

Related to #21455

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, used AI to confirm and validate the fix as well as reproduce the error in an isolated test

@richarddd richarddd requested a review from ggerganov as a code owner April 7, 2026 13:49
@github-actions github-actions Bot added the ggml changes relating to the ggml tensor library for machine learning label Apr 7, 2026
Contributor

@0cc4m 0cc4m left a comment

I can confirm this fixes the issue on DGX Spark.

@0cc4m
Contributor

0cc4m commented Apr 14, 2026

@ggml-org/maintainers Another review needed.

@ggerganov ggerganov merged commit 2e05f06 into ggml-org:master Apr 14, 2026
47 checks passed
mengqin pushed a commit to mengqin/llama.cpp that referenced this pull request Apr 20, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Apr 21, 2026
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026

Labels

ggml changes relating to the ggml tensor library for machine learning

Development

Successfully merging this pull request may close these issues.

Misc. bug: NVFP4 Type CPU Backend high error on ARM

4 participants