Use safe Span.Slice loop pattern in Enumerable.SumSignedIntegersVectorized#127429
Merged
EgorBo merged 6 commits intodotnet:mainfrom Apr 27, 2026
…rized Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Hoist `overflowTracking` out of the unrolled main loop and the tail loop so that a single `vptest + jne` runs once after all data is processed instead of once every 4 vectors. The sign-bit overflow trick is unchanged: the bits accumulated across the whole input still land in the same lanes, so a single final test is sufficient.
Also unifies the previous separate `if (... >= Count) { Vector<T> overflowTracking = Zero; do { ... } while (...); if (...) Throw; }` tail block into the same shared `overflowTracking` chain as the main loop, which lets us drop the `do..while` and the second sign-bit test entirely.
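As a rough scalar illustration of why one final test is enough, the sketch below applies the same sign-bit idea to plain `int` values. `SignBitOverflow` and `WrappingSum` are hypothetical names used only here, not code from this PR: the tracking word is OR-ed on every addition, so an overflow seen at any step leaves its sign bit set until the end.

```csharp
using System;

public static class SignBitOverflow
{
    // Hypothetical helper for illustration only; not the code from the PR.
    // Returns the wrapping sum and whether any single addition overflowed.
    public static (int Sum, bool Overflowed) WrappingSum(ReadOnlySpan<int> values)
    {
        int acc = 0;
        int tracking = 0; // sticky: bits are only ever OR-ed in

        foreach (int v in values)
        {
            int sum = unchecked(acc + v);

            // Signed overflow happened iff both operands have the same sign
            // and the result has the opposite sign. In that case the sign bit
            // of both (acc ^ sum) and (v ^ sum) is set.
            tracking |= (acc ^ sum) & (v ^ sum);
            acc = sum;
        }

        // One test after the loop: a negative value means the sign bit is set.
        return (acc, tracking < 0);
    }
}
```

Note that `WrappingSum(new[] { int.MaxValue, 1, -1 })` reports an overflow even though the wrapped result happens to equal the true total, because the flag records that an intermediate addition overflowed.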
Microbenchmark on Ryzen 9 7950X (AVX-512, `int[].Sum`):
```
| N      | Original | This PR | Ratio |
|------- |---------:|--------:|------:|
| 50     | 4.85 ns  | 4.71 ns | 0.97x |
| 100    | 6.11 ns  | 5.87 ns | 0.96x |
| 100000 | 4.10 us  | 3.77 us | 0.92x |
```
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from ed424bd to dd072a0
EgorBo added a commit that referenced this pull request Apr 26, 2026
As part of the "remove unsafe" work, we're introducing lots of new `Span.Slice`, `Vector.Create`, etc. calls. It turns out these negatively impact the inliner time budget and lead to bad regressions (I hit this in #127429). <img width="1268" height="182" alt="{B127EBE7-9CB9-4219-BD0D-9B4FE009E9CC}" src="https://github.com/user-attachments/assets/4507d429-4abc-4f59-bcc4-11a6a719f541" /> I suggest we exclude these from the budget check, just like we already do for small methods. Just a few [hits](MihuBot/runtime-utils#1864) with PMI jit-diffs.
@EgorBot -amd -intel -arm

using System.Linq;
using BenchmarkDotNet.Attributes;

public class SumBench
{
    // Cover edge cases (below Vector<T>.Count*4 dispatch threshold), main-loop boundaries
    // around Vector<T>.Count multiples on V128/V256/V512 hosts, and L1/L2/RAM-resident sizes.
    [Params(4, 8, 16, 32, 50, 100, 256, 1024, 4096, 16384, 65536, 1_000_000)]
    public int N;

    private int[] _ints = null!;
    private long[] _longs = null!;

    [GlobalSetup]
    public void Setup()
    {
        _ints = Enumerable.Range(0, N).ToArray();
        _longs = Enumerable.Range(0, N).Select(i => (long)i).ToArray();
    }

    [Benchmark]
    public int SumInt() => _ints.Sum();

    [Benchmark]
    public long SumLong() => _longs.Sum();
}
MihaZupan approved these changes Apr 27, 2026
Note
This PR was authored with assistance from GitHub Copilot (AI-generated content).
This PR replaces the unsafe code with a safe equivalent. It also moves the overflow check to the end, so we no longer pay for it on every iteration. The assumption is that overflow is rare, and when it does happen the exception is so expensive that bailing out early is not worth it.
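A minimal sketch of what such a loop can look like, assuming `int` elements and one vector per iteration (the real method is generic and unrolled, and `DeferredOverflowSum` is a hypothetical name, not the code from this PR). The span is consumed with `Slice` instead of pointer arithmetic, overflow bits are accumulated with OR, and the sign-bit test runs once after the loop:

```csharp
using System;
using System.Numerics;

public static class DeferredOverflowSum
{
    // Hypothetical helper for illustration only; not the code from the PR.
    public static int Sum(ReadOnlySpan<int> span)
    {
        Vector<int> acc = Vector<int>.Zero;
        Vector<int> overflowTracking = Vector<int>.Zero;

        // Safe loop: the length check guards the load, and Slice advances the span.
        while (span.Length >= Vector<int>.Count)
        {
            Vector<int> data = new Vector<int>(span); // reads the first Count elements
            Vector<int> sum = acc + data;             // lane-wise wrapping add

            // The sign bit of a lane is set if that lane overflowed in this step.
            overflowTracking |= (acc ^ sum) & (data ^ sum);
            acc = sum;

            span = span.Slice(Vector<int>.Count);
        }

        // Single deferred check: did any lane overflow at any point?
        if ((overflowTracking & new Vector<int>(int.MinValue)) != Vector<int>.Zero)
        {
            throw new OverflowException();
        }

        // Combine the lanes and the scalar remainder with checked arithmetic.
        int result = 0;
        for (int i = 0; i < Vector<int>.Count; i++)
        {
            result = checked(result + acc[i]);
        }
        foreach (int value in span)
        {
            result = checked(result + value);
        }
        return result;
    }
}
```

Because each lane is checked independently, this sketch can throw for inputs where the order of additions matters, so it should be read as an illustration of the loop shape rather than a drop-in replacement for `Enumerable.Sum`.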
Perf
Benchmark:
@EgorBot request sweeping `int[].Sum()` and `long[].Sum()` over N ∈ {4, 8, 16, 32, 50, 100, 256, 1024, 4096, 16384, 65536, 1_000_000}. Full results: EgorBot/Benchmarks#155.
TL;DR: neutral on most sizes, with notable wins around the small-vectorized regime (N=32-256). No regressions.
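Highlights (`main` time vs PR time, ratio = main/PR; values >1.00 mean PR is faster):
[Table: eight highlighted rows for the benchmarks SumInt, SumInt, SumInt, SumInt, SumLong, SumInt, SumInt, SumLong; the remaining columns are not preserved.]
All other size/workload combinations are within run-to-run noise (ratio in [0.99, 1.03]).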
The wins at N=32-256 mostly come from the deferred overflow check paying off (the test+branch is a meaningful fraction of cycles when the loop body is small and only runs a few times). The 1.51x at N=32 on M4 is amplified because Apple Silicon also benefits from the simpler control flow (one fewer branch per main-loop iteration).