Fix TensorPrimitives.IndexOfMax#127454

Open
lilinus wants to merge 4 commits into dotnet:main from lilinus:fix-tensor-indexof

Conversation

@lilinus
Contributor

@lilinus lilinus commented Apr 27, 2026

Fixes #124233

Tried to include feedback from #124274 (comment).

Summary of changes:

  • Change interface IIndexOfOperator to specialized IIndexOfMinMaxOperator.
  • IndexOfMinMaxCore delegates to ten different methods:
    • IndexOfMinMaxVector128/256/512Size4Plus when sizeof(T) is 4 or 8. The result index fits in one vector.
    • IndexOfMinMaxVector128/256/512Size2 when sizeof(T) is 2. The result index fits in two vectors.
    • IndexOfMinMaxVector128/256/512Size1 when sizeof(T) is 1. The result index fits in four vectors.
    • IndexOfMinMaxNaive as fallback.
  • For vector methods, the final aggregation is done by horizontally aggregating the values across the lanes. The corresponding index is then found by matching that value bitwise.
  • The search is done left-to-right, so there is no need for the IndexLessThan methods.
  • The IsQuickReturn methods are extracted out since they need to be implemented differently for IndexOfMaxNumber and friends (for [API Proposal]: Add missing Min/MaxNumber generic math APIs on TensorPrimitives #98862).
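
To illustrate why the index needs more lanes than the values for small element types, here is a minimal scalar sketch. It is not code from this PR: `IndexWidthSketch` and its members are hypothetical names, the input is assumed to be non-empty, and the index lanes are assumed to be 32 bits wide for element sizes below 4 bytes.

```csharp
using System;

static class IndexWidthSketch
{
    // Hypothetical helper: number of 32-bit index vectors needed to cover the
    // lanes of one value vector whose elements are elementSize bytes wide.
    // For 8-byte elements the index is assumed to be tracked at 8 bytes,
    // so one vector is still enough.
    public static int IndexVectorsPerValueVector(int elementSize) =>
        Math.Max(1, sizeof(int) / elementSize);

    // Buggy variant: the running index is stored at the element's own width,
    // which is what happens when index lanes share the value lane type.
    public static int IndexOfMaxNarrowIndex(ReadOnlySpan<byte> x)
    {
        byte best = x[0];
        byte bestIndex = 0;
        for (int i = 1; i < x.Length; i++)
        {
            if (x[i] > best)
            {
                best = x[i];
                bestIndex = unchecked((byte)i); // wraps modulo 256
            }
        }
        return bestIndex;
    }

    // Fixed variant: the index is tracked as an int, so it cannot wrap
    // for any valid span length.
    public static int IndexOfMaxWideIndex(ReadOnlySpan<byte> x)
    {
        byte best = x[0];
        int bestIndex = 0;
        for (int i = 1; i < x.Length; i++)
        {
            // Strict comparison keeps the first (left-most) maximum.
            if (x[i] > best)
            {
                best = x[i];
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

With a 300-element byte input whose only non-zero value sits at index 299, the narrow variant reports 43 (299 modulo 256) while the wide variant reports 299.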

Copilot AI review requested due to automatic review settings April 27, 2026 14:09
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution label (indicates that the PR has been added by a community member) Apr 27, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@lilinus lilinus changed the title from "Fix tensor indexof" to "Fix TensorPrimitives.IndexOfMax" Apr 27, 2026
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR refactors the IndexOfMin/Max* tensor primitives to fix incorrect indices (notably for small element sizes) by introducing a specialized min/max operator interface and a new IndexOfMinMaxCore implementation with multiple vectorized paths.

Changes:

  • Replaced IIndexOfOperator with IIndexOfMinMaxOperator and moved/rewrote IndexOfMinMaxCore into shared code.
  • Implemented specialized Vector128/256/512 routines for sizeof(T) = 1/2/4/8 plus a naive fallback.
  • Added regression tests for IndexOfMax on byte/ushort when the correct index exceeds the element type’s max value.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/libraries/System.Numerics.Tensors/tests/TensorPrimitives.Generic.cs | Adds regression tests for IndexOfMax returning indices > 255 and > 65535. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMinMagnitude.cs | Updates operator to the new interface and comparison/aggregation model. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMin.cs | Updates operator to the new interface and comparison/aggregation model. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMaxMagnitude.cs | Updates operator to the new interface and comparison/aggregation model. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMax.cs | Replaces per-method core logic with the shared core + new operator shape. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/Common/TensorPrimitives.IIndexOfOperator.cs | Introduces IIndexOfMinMaxOperator and the new shared vectorized implementations. |

Comment thread src/libraries/System.Numerics.Tensors/tests/TensorPrimitives.Generic.cs Outdated
Comment thread src/libraries/System.Numerics.Tensors/tests/TensorPrimitives.Generic.cs Outdated
Comment on lines +324 to +339
while (!span.IsEmpty)
{
// Compare 0 with 1
tmpResult = Vector128.Shuffle(result.AsInt64(), Vector128.Create(1, 0)).As<long, T>();
tmpIndex = Vector128.Shuffle(resultIndex.AsInt64(), Vector128.Create(1, 0)).As<long, T>();
TIndexOfOperator.Invoke(ref result, tmpResult, ref resultIndex, tmpIndex);
Vector256<T> current;
if (span.Length >= Vector256<T>.Count)
{
current = Vector256.Create(span);
span = span.Slice(Vector256<T>.Count);
}
else
{
// Process a final back-shifted to cover remaining elements in x in one vector.
int start = x.Length - Vector256<T>.Count;
current = Vector256.Create(x.Slice(start));
currentIndex = Vector256.Create(TInt.CreateChecked(start)) + Vector256<TInt>.Indices;
span = ReadOnlySpan<T>.Empty;
}

Copilot AI Apr 27, 2026

These hot loops repeatedly create/slice spans and use Vector*.Create(span) each iteration. Compared to the previous LoadUnsafe(ref, offset) style, this may inhibit bounds-check elimination and add overhead (extra slicing, length checks, and potentially less optimal codegen). Consider switching back to a ref+offset iteration pattern (ref T xRef + LoadUnsafe/Unsafe.Add) or otherwise restructuring the loop to minimize span slicing in the steady state.
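
As a rough illustration of the ref+offset pattern mentioned above, the sketch below computes a plain maximum over an `int` span. It is not the PR's code: `RefOffsetLoopSketch` is a hypothetical name, the reduction is simplified to a maximum without index tracking, and the back-shifted final load only works here because taking a maximum is idempotent over re-read elements.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

static class RefOffsetLoopSketch
{
    // Hypothetical example: the steady-state loop only advances an offset
    // from a single ref, with no span slicing per iteration.
    public static int Max(ReadOnlySpan<int> x)
    {
        if (x.IsEmpty)
        {
            throw new ArgumentException("Input must not be empty.", nameof(x));
        }

        ref int xRef = ref MemoryMarshal.GetReference(x);
        nuint length = (nuint)x.Length;
        nuint lanes = (nuint)Vector128<int>.Count;

        if (length < lanes)
        {
            // Too short for one vector: scalar loop over the same ref.
            int scalarMax = xRef;
            for (nuint j = 1; j < length; j++)
            {
                scalarMax = Math.Max(scalarMax, Unsafe.Add(ref xRef, j));
            }
            return scalarMax;
        }

        Vector128<int> best = Vector128.LoadUnsafe(ref xRef, 0);
        for (nuint i = lanes; i + lanes <= length; i += lanes)
        {
            best = Vector128.Max(best, Vector128.LoadUnsafe(ref xRef, i));
        }

        // One final back-shifted vector covers any remaining elements;
        // overlapping lanes are harmless for a maximum.
        best = Vector128.Max(best, Vector128.LoadUnsafe(ref xRef, length - lanes));

        // Horizontal reduction across lanes.
        int max = best.GetElement(0);
        for (int k = 1; k < Vector128<int>.Count; k++)
        {
            max = Math.Max(max, best.GetElement(k));
        }
        return max;
    }
}
```

Whether this actually generates better code than the span-slicing loop would need to be confirmed by benchmarks or by inspecting the JIT output.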

Comment on lines +38 to +41
return sizeof(T) == 8 ? IndexOfMinMaxVector512Size4Plus<T, TOperator, ulong>(x) :
       sizeof(T) == 4 ? IndexOfMinMaxVector512Size4Plus<T, TOperator, uint>(x) :
       sizeof(T) == 2 ? IndexOfMinMaxVector512Size2<T, TOperator>(x) :
       IndexOfMinMaxVector512Size1<T, TOperator>(x);
Member

These paths could be shared between the 128-256 paths, as they are 1To4, 1To2, or 1To1.

Effectively, instead of returning result, we should be returning a resultMask indicating which elements are modified. We then widen them to int for small types and add to the tracked 1/2/4 total results.
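
One possible reading of the mask-widening suggestion, sketched for the 1To2 case (`ushort` values, `int` indices) with `Vector128`. This is an interpretation under stated assumptions, not the reviewer's or the PR's implementation: `MaskWideningSketch` is a hypothetical name, the final aggregation is a simple per-lane loop, and the tail is handled with scalar code.

```csharp
using System;
using System.Runtime.Intrinsics;

static class MaskWideningSketch
{
    // Hypothetical example: values live in one Vector128<ushort>, while the
    // winning indices are tracked in two Vector128<int> (lower and upper half).
    public static int IndexOfMax(ReadOnlySpan<ushort> x)
    {
        if (x.IsEmpty) return -1;

        int lanes = Vector128<ushort>.Count; // 8 values per vector
        ushort bestValue = x[0];
        int bestIndex = 0;
        int i = 0;

        if (x.Length >= lanes)
        {
            Vector128<ushort> best = Vector128.Create(x); // first 8 elements
            Vector128<int> idxLo = Vector128.Create(0, 1, 2, 3);
            Vector128<int> idxHi = Vector128.Create(4, 5, 6, 7);
            Vector128<int> bestLo = idxLo, bestHi = idxHi;
            Vector128<int> step = Vector128.Create(lanes);

            for (i = lanes; i + lanes <= x.Length; i += lanes)
            {
                Vector128<ushort> cur = Vector128.Create(x.Slice(i));
                idxLo += step;
                idxHi += step;

                // All-ones lanes mark elements that strictly beat the current best.
                Vector128<ushort> mask = Vector128.GreaterThan(cur, best);
                best = Vector128.ConditionalSelect(mask, cur, best);

                // Sign-extending the 16-bit mask gives all-ones 32-bit lanes.
                Vector128<short> signedMask = mask.AsInt16();
                bestLo = Vector128.ConditionalSelect(Vector128.WidenLower(signedMask), idxLo, bestLo);
                bestHi = Vector128.ConditionalSelect(Vector128.WidenUpper(signedMask), idxHi, bestHi);
            }

            // Horizontal step: largest value wins, ties go to the smallest index.
            bestValue = best.GetElement(0);
            bestIndex = bestLo.GetElement(0);
            for (int k = 1; k < lanes; k++)
            {
                ushort v = best.GetElement(k);
                int idx = k < 4 ? bestLo.GetElement(k) : bestHi.GetElement(k - 4);
                if (v > bestValue || (v == bestValue && idx < bestIndex))
                {
                    bestValue = v;
                    bestIndex = idx;
                }
            }
        }

        // Scalar tail; strict comparison keeps the earlier index on ties.
        for (; i < x.Length; i++)
        {
            if (x[i] > bestValue)
            {
                bestValue = x[i];
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

Because the indices are held in `int` lanes, an input longer than 65,536 elements still reports the correct position, which is the case the regression tests in this PR target for `ushort`.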

Labels

area-System.Numerics, community-contribution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TensorPrimitives.IndexOfMax produces incorrect results with vectorized paths

3 participants