Fix TensorPrimitives.IndexOfMax#127454
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-numerics |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR refactors the IndexOfMin/Max* tensor primitives to fix incorrect indices (notably for small element sizes) by introducing a specialized min/max operator interface and a new IndexOfMinMaxCore implementation with multiple vectorized paths.
Changes:
- Replaced
IIndexOfOperatorwithIIndexOfMinMaxOperatorand moved/rewroteIndexOfMinMaxCoreinto shared code. - Implemented specialized Vector128/256/512 routines for
sizeof(T)= 1/2/4/8 plus a naive fallback. - Added regression tests for
IndexOfMaxonbyte/ushortwhen the correct index exceeds the element type’s max value.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.Numerics.Tensors/tests/TensorPrimitives.Generic.cs | Adds regression tests for IndexOfMax returning indices > 255 and > 65535. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMinMagnitude.cs | Updates operator to the new interface and comparison/aggregation model. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMin.cs | Updates operator to the new interface and comparison/aggregation model. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMaxMagnitude.cs | Updates operator to the new interface and comparison/aggregation model. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.IndexOfMax.cs | Replaces per-method core logic with the shared core + new operator shape. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/Common/TensorPrimitives.IIndexOfOperator.cs | Introduces IIndexOfMinMaxOperator and the new shared vectorized implementations. |
| while (!span.IsEmpty) | ||
| { | ||
| // Compare 0 with 1 | ||
| tmpResult = Vector128.Shuffle(result.AsInt64(), Vector128.Create(1, 0)).As<long, T>(); | ||
| tmpIndex = Vector128.Shuffle(resultIndex.AsInt64(), Vector128.Create(1, 0)).As<long, T>(); | ||
| TIndexOfOperator.Invoke(ref result, tmpResult, ref resultIndex, tmpIndex); | ||
| Vector256<T> current; | ||
| if (span.Length >= Vector256<T>.Count) | ||
| { | ||
| current = Vector256.Create(span); | ||
| span = span.Slice(Vector256<T>.Count); | ||
| } | ||
| else | ||
| { | ||
| // Process a final back-shifted to cover remaining elements in x in one vector. | ||
| int start = x.Length - Vector256<T>.Count; | ||
| current = Vector256.Create(x.Slice(start)); | ||
| currentIndex = Vector256.Create(TInt.CreateChecked(start)) + Vector256<TInt>.Indices; | ||
| span = ReadOnlySpan<T>.Empty; | ||
| } |
There was a problem hiding this comment.
These hot loops repeatedly create/slice spans and use Vector*.Create(span) each iteration. Compared to the previous LoadUnsafe(ref, offset) style, this may inhibit bounds-check elimination and add overhead (extra slicing, length checks, and potentially less optimal codegen). Consider switching back to a ref+offset iteration pattern (ref T xRef + LoadUnsafe/Unsafe.Add) or otherwise restructuring the loop to minimize span slicing in the steady state.
| return sizeof(T) == 8 ? IndexOfMinMaxVector512Size4Plus<T, TOperator, ulong>(x) : | ||
| sizeof(T) == 4 ? IndexOfMinMaxVector512Size4Plus<T, TOperator, uint>(x) : | ||
| sizeof(T) == 2 ? IndexOfMinMaxVector512Size2<T, TOperator>(x) : | ||
| IndexOfMinMaxVector512Size1<T, TOperator>(x); |
There was a problem hiding this comment.
These paths could be shared between the 128-256 paths, as they are 1To4, 1To2, or 1To1
Effectively instead of returning result we should be returning a resultMask indicating which elements are modified. We then widen then to int for small types and add to the tracked 1/2/4 total results.
Fixes #124233
Tried to include feedback from #124274 (comment).
Summary of changes:
IIndexOfOperatorto specializedIIndexOfMinMaxOperator.IndexOfMinMaxCoredelegates to ten different methods:IndexOfMinMaxVector128/256/512Size4Pluswhensizeof(T)is 4 or 8. The result index fits in one vector.IndexOfMinMaxVector128/256/512Size2whensizeof(T)is 2. The result index fits in two vectors.IndexOfMinMaxVector128/256/512Size1whensizeof(T)is 1. The result index fits in four vectors.IndexOfMinMaxNaiveas fallback.IndexLessThanmethodsIsQuickReturnmethods are extracted out since they need to be implemented differently forIndexOfMaxNumberand friends (for [API Proposal]: Add missing Min/MaxNumber generic math APIs on TensorPrimitives #98862).