Fix TensorPrimitives.IndexOfMax by lilinus · Pull Request #127454 · dotnet/runtime

lilinus · 2026-04-27T14:09:54Z

Fixes #124233

Tried to include feedback from #124274 (comment).

Summary of changes:

- Replaced the interface IIndexOfOperator with the specialized IIndexOfMinMaxOperator.
- IndexOfMinMaxCore delegates to ten different methods:
  - IndexOfMinMaxVector128/256/512Size4Plus when sizeof(T) is 4 or 8. The result index fits in one vector.
  - IndexOfMinMaxVector128/256/512Size2 when sizeof(T) is 2. The result index fits in two vectors.
  - IndexOfMinMaxVector128/256/512Size1 when sizeof(T) is 1. The result index fits in four vectors.
  - IndexOfMinMaxNaive as the fallback.
- For the vector methods, the final aggregation is a horizontal aggregation of the values in the lanes. The corresponding index is then found by matching that value bitwise.
- The search is done left-to-right, so there is no need for the IndexLessThan methods.
- The IsQuickReturn methods are extracted, since they need to be implemented differently for IndexOfMaxNumber and friends (for [API Proposal]: Add missing Min/MaxNumber generic math APIs on TensorPrimitives #98862).
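
The final aggregation step described above can be modelled with plain arrays standing in for vector lanes. This is only a scalar sketch: `LaneAggregationSketch` and `AggregateMax` are hypothetical names, the lanes hold `int` so `==` stands in for the bitwise match, and picking the smallest index among matching lanes is an assumption about the tie-break, not something the PR description states.

```csharp
using System;

public static class LaneAggregationSketch
{
    // Scalar model of the final step. laneValues[i] is the best value seen in lane i,
    // laneIndices[i] is the position where that lane first saw it.
    public static int AggregateMax(ReadOnlySpan<int> laneValues, ReadOnlySpan<int> laneIndices)
    {
        // Step 1: horizontal aggregation of the values only.
        int best = laneValues[0];
        for (int lane = 1; lane < laneValues.Length; lane++)
        {
            best = Math.Max(best, laneValues[lane]);
        }

        // Step 2: match the aggregated value against every lane and keep the
        // smallest index among the matching lanes (assumed tie-break).
        int bestIndex = int.MaxValue;
        for (int lane = 0; lane < laneValues.Length; lane++)
        {
            if (laneValues[lane] == best && laneIndices[lane] < bestIndex)
            {
                bestIndex = laneIndices[lane];
            }
        }

        return bestIndex;
    }
}
```

For floating-point lanes a real implementation would need to compare bit patterns rather than use `==`, since `NaN` never compares equal to itself.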

dotnet-policy-service · 2026-04-27T14:12:47Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR refactors the IndexOfMin/Max* tensor primitives to fix incorrect indices (notably for small element sizes) by introducing a specialized min/max operator interface and a new IndexOfMinMaxCore implementation with multiple vectorized paths.

Changes:

- Replaced IIndexOfOperator with IIndexOfMinMaxOperator and moved/rewrote IndexOfMinMaxCore into shared code.
- Implemented specialized Vector128/256/512 routines for sizeof(T) = 1/2/4/8 plus a naive fallback.
- Added regression tests for IndexOfMax on byte/ushort when the correct index exceeds the element type’s max value.
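
To make the regression scenario concrete, here is a scalar sketch of how an index can go wrong once it exceeds the element type's maximum. It assumes the failure mode is an index stored at the same width as the element (a `byte` here), which is one reading of the overview above. `ByteIndexWrapSketch` and both methods are hypothetical and are not code from the PR.

```csharp
using System;

public static class ByteIndexWrapSketch
{
    // Index tracked as int, independent of sizeof(T). Returns -1 for empty input.
    public static int IndexOfMaxIntIndex(ReadOnlySpan<byte> x)
    {
        if (x.IsEmpty) return -1;

        byte result = x[0];
        int resultIndex = 0;
        for (int i = 1; i < x.Length; i++)
        {
            // Strict comparison keeps the left-most occurrence on ties.
            if (x[i] > result)
            {
                result = x[i];
                resultIndex = i;
            }
        }
        return resultIndex;
    }

    // Same search, but the index is stored in a byte, so it wraps modulo 256.
    public static int IndexOfMaxByteIndex(ReadOnlySpan<byte> x)
    {
        if (x.IsEmpty) return -1;

        byte result = x[0];
        byte resultIndex = 0;
        for (int i = 1; i < x.Length; i++)
        {
            if (x[i] > result)
            {
                result = x[i];
                resultIndex = unchecked((byte)i); // truncates indices above 255
            }
        }
        return resultIndex;
    }
}
```

With 300 elements and the maximum at position 260, the first method returns 260 while the second returns 4, which is the kind of mismatch a regression test with more than 255 elements would catch.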

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/libraries/System.Numerics.Tensors/tests/TensorPrimitives.Generic.cs | Adds regression tests for `IndexOfMax` returning indices > 255 and > 65535. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMinMagnitude.cs | Updates operator to the new interface and comparison/aggregation model. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMin.cs | Updates operator to the new interface and comparison/aggregation model. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMaxMagnitude.cs | Updates operator to the new interface and comparison/aggregation model. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMax.cs | Replaces per-method core logic with the shared core + new operator shape. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/Common/TensorPrimitives.IIndexOfOperator.cs | Introduces `IIndexOfMinMaxOperator` and the new shared vectorized implementations. |

Copilot · 2026-04-27T14:19:50Z

+            while (!span.IsEmpty)
            {
-                // Compare 0 with 1
-                tmpResult = Vector128.Shuffle(result.AsInt64(), Vector128.Create(1, 0)).As<long, T>();
-                tmpIndex = Vector128.Shuffle(resultIndex.AsInt64(), Vector128.Create(1, 0)).As<long, T>();
-                TIndexOfOperator.Invoke(ref result, tmpResult, ref resultIndex, tmpIndex);
+                Vector256<T> current;
+                if (span.Length >= Vector256<T>.Count)
+                {
+                    current = Vector256.Create(span);
+                    span = span.Slice(Vector256<T>.Count);
+                }
+                else
+                {
+                    // Process a final back-shifted to cover remaining elements in x in one vector.
+                    int start = x.Length - Vector256<T>.Count;
+                    current = Vector256.Create(x.Slice(start));
+                    currentIndex = Vector256.Create(TInt.CreateChecked(start)) + Vector256<TInt>.Indices;
+                    span = ReadOnlySpan<T>.Empty;
+                }


These hot loops repeatedly create/slice spans and use Vector*.Create(span) each iteration. Compared to the previous LoadUnsafe(ref, offset) style, this may inhibit bounds-check elimination and add overhead (extra slicing, length checks, and potentially less optimal codegen). Consider switching back to a ref+offset iteration pattern (ref T xRef + LoadUnsafe/Unsafe.Add) or otherwise restructuring the loop to minimize span slicing in the steady state.
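
For reference, a ref+offset loop of the kind this comment suggests might look like the sketch below. It computes only the maximum value of an `int` span to keep the loop shape visible. It is not the PR's index-tracking code, and `RefOffsetLoopSketch.Max` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class RefOffsetLoopSketch
{
    public static int Max(ReadOnlySpan<int> x)
    {
        if (x.IsEmpty) throw new ArgumentException("Input must not be empty.", nameof(x));

        int count = Vector128<int>.Count;
        if (!Vector128.IsHardwareAccelerated || x.Length < count)
        {
            // Scalar fallback for short inputs or missing SIMD support.
            int scalarMax = x[0];
            for (int j = 1; j < x.Length; j++) scalarMax = Math.Max(scalarMax, x[j]);
            return scalarMax;
        }

        // Take one ref up front; the loop then only advances an element offset.
        ref int xRef = ref MemoryMarshal.GetReference(x);
        nuint lastVectorStart = (nuint)(x.Length - count);

        Vector128<int> result = Vector128.LoadUnsafe(ref xRef);
        for (nuint offset = (nuint)count; offset <= lastVectorStart; offset += (nuint)count)
        {
            result = Vector128.Max(result, Vector128.LoadUnsafe(ref xRef, offset));
        }

        // Back-shifted final load covers any remaining tail elements.
        // Re-reading earlier elements is harmless for Max.
        result = Vector128.Max(result, Vector128.LoadUnsafe(ref xRef, lastVectorStart));

        // Horizontal reduction over the lanes.
        int max = result.GetElement(0);
        for (int lane = 1; lane < count; lane++) max = Math.Max(max, result.GetElement(lane));
        return max;
    }
}
```

The steady-state loop contains no span slicing or length checks. Whether that yields measurably better codegen than the `Vector256.Create(span)` form would need to be confirmed with benchmarks.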

tannergooding · 2026-04-28T16:59:47Z

+                return sizeof(T) == 8 ? IndexOfMinMaxVector512Size4Plus<T, TOperator, ulong>(x) :
+                    sizeof(T) == 4 ? IndexOfMinMaxVector512Size4Plus<T, TOperator, uint>(x) :
+                    sizeof(T) == 2 ? IndexOfMinMaxVector512Size2<T, TOperator>(x) :
+                    IndexOfMinMaxVector512Size1<T, TOperator>(x);


These paths could be shared between the 128-256 paths, as they are 1To4, 1To2, or 1To1.

Effectively, instead of returning result we should be returning a resultMask indicating which elements are modified. We then widen them to int for small types and add to the tracked 1/2/4 total results.
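
One possible reading of this suggestion, for the 1To4 case (`byte` elements with `int` indices), is sketched below. `MaskWideningSketch`, `WidenMask`, and `UpdateIndices` are hypothetical names, and the sketch selects indices with the widened mask rather than adding to them. The actual shape of the shared helper is not specified in the comment.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class MaskWideningSketch
{
    // Widens a per-byte mask (0x00 or 0xFF per lane) into four per-int masks.
    // M0 covers byte lanes 0-3, M1 lanes 4-7, M2 lanes 8-11, M3 lanes 12-15.
    public static (Vector128<int> M0, Vector128<int> M1, Vector128<int> M2, Vector128<int> M3)
        WidenMask(Vector128<byte> mask)
    {
        // Reinterpret as signed so 0xFF sign-extends to all ones at each step.
        Vector128<sbyte> signedMask = mask.AsSByte();
        Vector128<short> lower = Vector128.WidenLower(signedMask);
        Vector128<short> upper = Vector128.WidenUpper(signedMask);

        return (Vector128.WidenLower(lower), Vector128.WidenUpper(lower),
                Vector128.WidenLower(upper), Vector128.WidenUpper(upper));
    }

    // Takes the candidate index in lanes where the mask is set and keeps the
    // tracked index everywhere else.
    public static Vector128<int> UpdateIndices(
        Vector128<int> widenedMask, Vector128<int> candidateIndices, Vector128<int> trackedIndices)
        => Vector128.ConditionalSelect(widenedMask, candidateIndices, trackedIndices);
}
```

The same widening would apply once (1To2) for 2-byte elements and not at all (1To1) when the index lanes are already as wide as the elements.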

lilinus added 3 commits April 27, 2026 15:44

- Fix and refactor IndexOfMinMaxCore (8b76b08)
- Add test cases (2bf8dee)
- Minor fix (97a1aea)

Copilot AI review requested due to automatic review settings April 27, 2026 14:09

github-actions Bot added the area-System.Numerics label Apr 27, 2026

dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Apr 27, 2026

Copilot started reviewing on behalf of lilinus April 27, 2026 14:11

lilinus changed the title ~~Fix tensor indexof~~ Fix TensorPrimitives.IndexOfMax Apr 27, 2026

Copilot AI reviewed Apr 27, 2026

FB

2304dc9

This was referenced Apr 28, 2026

Android arm32 device not found (armeabi-v7a architecture unavailable) #125440

Open

[android] Got a SIGSEGV while executing native code. at System.DateTime.get_Now() #127500

Open

tannergooding reviewed Apr 28, 2026

lilinus wants to merge 4 commits into dotnet:main from lilinus:fix-tensor-indexof

3 participants
