Skip to content

Use safe Span.Slice loop pattern in Enumerable.SumSignedIntegersVectorized#127429

Merged
EgorBo merged 6 commits intodotnet:mainfrom
EgorBo:copilot/safe-loop-sum
Apr 27, 2026
Merged

Use safe Span.Slice loop pattern in Enumerable.SumSignedIntegersVectorized#127429
EgorBo merged 6 commits intodotnet:mainfrom
EgorBo:copilot/safe-loop-sum

Conversation

@EgorBo
Copy link
Copy Markdown
Member

@EgorBo EgorBo commented Apr 25, 2026

Note

This PR was authored with assistance from GitHub Copilot (AI-generated content).

This PR replaces unsafe code with safe one. Also, slightly changes the code to check overflow status in the end so we don't pay for overflow check every iteration and assume overflow happens rarely and when it does - the cost of the exception is so big that it doesn't make sense to quit earlier.

Perf

Benchmark: @EgorBot request sweeping int[].Sum() and long[].Sum() over N ∈ {4, 8, 16, 32, 50, 100, 256, 1024, 4096, 16384, 65536, 1_000_000}. Full results: EgorBot/Benchmarks#155.

TL;DR: neutral on most sizes, with notable wins around the small-vectorized regime (N=32-256). No regressions.

Highlights (main time vs PR time, ratio = main/PR; values >1.00 mean PR is faster):

Hardware Workload N main PR main/PR
Apple M4 SumInt 32 2.73 ns 1.82 ns 1.51x
Apple M4 SumInt 16 1.11 ns 1.06 ns 1.05x
Apple M4 SumInt 65536 7.89 us 7.69 us 1.03x
Intel Xeon SumInt 32 4.53 ns 3.71 ns 1.22x
Intel Xeon SumLong 32 5.63 ns 4.60 ns 1.22x
Intel Xeon SumInt 50 6.04 ns 5.18 ns 1.17x
Intel Xeon SumInt 256 21.14 ns 19.26 ns 1.10x
Intel Xeon SumLong 1M 272.7 us 248.4 us 1.10x

All other size/workload combinations are within run-to-run noise (ratio in [0.99, 1.03]).

The wins at N=32-256 are mostly the deferred-overflow-check paying off (the test+branch is a meaningful fraction of cycles when the loop body is small and only runs a few times). The 1.51x at N=32 on M4 is amplified because Apple Silicon also benefits from the simpler control flow (one fewer branch per main-loop iter).

…rized

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

This comment was marked as resolved.

@EgorBo

This comment was marked as outdated.

Hoist `overflowTracking` out of the unrolled main loop and the tail loop so a single `vptest + jne` runs once after all data is processed instead of every 4 vectors. The sign-bit overflow trick is unchanged  the bits accumulated across the whole input still land in the same lanes, so a single final test is sufficient.

Also unifies the previous separate `if (... >= Count) { Vector<T> overflowTracking = Zero; do { ... } while (...); if (...) Throw; }` tail block into the same shared `overflowTracking` chain as the main loop, which lets us drop the `do..while` and the second sign-bit test entirely.

Microbenchmark on Ryzen 9 7950X (AVX-512, `int[].Sum`):

```n| N      | Original | This PR  | Ratio |
|------- |---------:|---------:|------:|
| 50     | 4.85 ns  | 4.71 ns  | 0.97x |
| 100    | 6.11 ns  | 5.87 ns  | 0.96x |
| 100000 | 4.10 us  | 3.77 us  | 0.92x |
```n
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@EgorBo

This comment was marked as outdated.

This comment was marked as resolved.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@EgorBo EgorBo force-pushed the copilot/safe-loop-sum branch from ed424bd to dd072a0 Compare April 26, 2026 12:22
@dotnet dotnet deleted a comment from dotnet-policy-service Bot Apr 26, 2026
EgorBo added a commit that referenced this pull request Apr 26, 2026
As part of the "remove unsafe" work, we're introducing lots of new
`Span.Slice`, `Vector.Create`, etc calls. It turns out these negatively
impact on the inliner time budget and lead to bad regressions (I've hit
it in #127429).

<img width="1268" height="182"
alt="{B127EBE7-9CB9-4219-BD0D-9B4FE009E9CC}"
src="https://github.com/user-attachments/assets/4507d429-4abc-4f59-bcc4-11a6a719f541"
/>

I suggest we exclude these from the budget check just like we already do
for small methods.

Just a few [hits](MihuBot/runtime-utils#1864)
with PMI jit-diffs.
Copilot AI review requested due to automatic review settings April 26, 2026 17:02
@EgorBo
Copy link
Copy Markdown
Member Author

EgorBo commented Apr 26, 2026

@EgorBot -amd -intel -arm

using System.Linq;
using BenchmarkDotNet.Attributes;

public class SumBench
{
    // Cover edge cases (below Vector<T>.Count*4 dispatch threshold), main-loop boundaries
    // around Vector<T>.Count multiples on V128/V256/V512 hosts, and L1/L2/RAM-resident sizes.
    [Params(4, 8, 16, 32, 50, 100, 256, 1024, 4096, 16384, 65536, 1_000_000)]
    public int N;

    private int[] _ints = null!;
    private long[] _longs = null!;

    [GlobalSetup]
    public void Setup()
    {
        _ints = Enumerable.Range(0, N).ToArray();
        _longs = Enumerable.Range(0, N).Select(i => (long)i).ToArray();
    }

    [Benchmark]
    public int SumInt() => _ints.Sum();

    [Benchmark]
    public long SumLong() => _longs.Sum();
}

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@EgorBo EgorBo marked this pull request as ready for review April 26, 2026 17:52
Copilot AI review requested due to automatic review settings April 26, 2026 20:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread src/libraries/System.Linq/src/System/Linq/Sum.cs
@EgorBo EgorBo merged commit 83f1a5d into dotnet:main Apr 27, 2026
95 checks passed
@EgorBo EgorBo deleted the copilot/safe-loop-sum branch April 27, 2026 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants