Skip to content

Implement SVE2 non-temporal scatter stores#123892

Open
ylpoonlg wants to merge 1 commit intodotnet:mainfrom
ylpoonlg:github-sve2_scatterstore
Open

Implement SVE2 non-temporal scatter stores#123892
ylpoonlg wants to merge 1 commit intodotnet:mainfrom
ylpoonlg:github-sve2_scatterstore

Conversation

@ylpoonlg
Copy link
Contributor

@ylpoonlg ylpoonlg commented Feb 2, 2026

Contributes to #122033 .

@dotnet/arm64-contrib @a74nh

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Feb 2, 2026
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@stephentoub
Copy link
Member

🤖 Copilot Code Review — PR #123892

Holistic Assessment

Motivation: This PR implements SVE2 non-temporal scatter store intrinsics as part of the larger SVE2 API work (issue #122033). The APIs were approved in issue #94023 and have clear justification — they expose ARM SVE2 hardware capabilities for scatter stores with cache-bypassing behavior.

Approach: The implementation follows established patterns from existing SVE1 scatter intrinsics and SVE2 gather intrinsics. The codegen correctly distinguishes between index-based variants (which need LSL scaling) and byte-offset variants (which don't).

Summary: ⚠️ Needs Human Review. The code appears correct but has a few items that warrant human verification: (1) register ordering in the emitted instructions, and (2) test coverage for all new API overloads. The implementation is consistent with existing patterns but this is a large API surface addition.


Detailed Findings

✅ Codegen Logic — Index scaling is correct

The codegen correctly applies LSL shifts only to index-based variants:

  • Scatter16BitNarrowingNonTemporal → LSL 1 (16-bit = 2 bytes) ✓
  • Scatter32BitNarrowingNonTemporal → LSL 2 (32-bit = 4 bytes) ✓
  • ScatterNonTemporal → LSL 3 (64-bit = 8 bytes) ✓
  • Scatter8BitNarrowingNonTemporal — has no 4-operand index variant, only vector addresses (correct since 8-bit = 1 byte, no scaling needed)
  • All WithByteOffsets* variants — no LSL applied (correct)

This matches the pattern used for NI_Sve2_GatherVector*NonTemporal intrinsics.

✅ Register Ordering — Matches SVE2 instruction encoding

The register swap (op3Reg, op2Reg) for the 4-operand form is consistent with the SVE2 gather non-temporal implementation at line ~2369. The SVE2 non-temporal scatter/gather instructions use a different operand encoding than SVE1 scatter/gather.

✅ Template Bug Fix — Correct

The fix from typeof(Sve) to typeof({Isa}) in SveScatterVectorBases.template is necessary for reflection-based tests to work with Sve2 intrinsics.

✅ API Surface — Consistent with approved proposal

The APIs follow the naming convention from issue #122033, correctly adding "NonTemporal" suffix to distinguish from existing SVE1 ScatterXxx APIs. The 32-bit address variants are commented out with reference to issue #103297.

⚠️ Test Coverage — Verify test generation runs correctly

The test definitions in Sve2Tests.cs look comprehensive, but the test templates use reflection which failed with the unfixed template. A human reviewer should verify:

  1. The generated tests compile and run on SVE2 hardware
  2. All 55 new API overloads are covered by tests (spot-check the commented-out test entries match commented-out APIs)

💡 Minor — Consider adding assertion for 8-bit cases

In the codegen, NI_Sve2_Scatter8BitNarrowingNonTemporal doesn't apply any LSL shift (correct since it's 1 byte), but this falls through silently. Consider adding an explicit // no shift needed for 8-bit comment for clarity, or an assert that validates the intrinsic ID in the no-shift fallthrough case.


Review conducted with GPT-5.1-Codex model. Gemini-3-Pro timed out after 4+ minutes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Runtime.Intrinsics community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants