Add Vector.Sum<T> for netstandard2.0 #273

Closed
adamgauthier wants to merge 1 commit into dotnet:main from adamgauthier:add-vector-sum

@adamgauthier
Contributor

Summary

Adds Vector.Sum<T>(Vector<T>) to the System.Numerics.Vectors package for netstandard2.0 and .NET Framework consumers.

This API has been available in the .NET runtime since .NET 6.0 but was never included in this package. Other comparable static methods (Dot, Abs, Min, Max, SquareRoot, etc.) are all present.

Fixes #272

Implementation

Sum is implemented as Dot(value, Vector<T>.One) rather than a standalone method with its own ScalarAdd loop:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static T Sum<T>(Vector<T> value) where T : struct
{
    return Dot(value, Vector<T>.One);
}

Why Dot?

The .NET Framework JIT has a hardcoded list of Vector methods it recognizes as intrinsics. Dot is on that list; a new Sum method is not. Adding [Intrinsic] to a new method doesn't help — the Framework JIT ignores it for methods it doesn't know about.

A standalone Sum using the ScalarAdd/typeof(T) pattern (same as DotProduct) would run the full managed code path on .NET Framework: a generic loop where each iteration does a 10-way typeof(T) chain. Benchmarks show this is 11-17x slower than routing through Dot.

By expressing Sum as Dot(value, One), we piggyback on Dot's existing intrinsification. The multiply-by-one is essentially free at the hardware level.
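
To make the contrast concrete, here is a minimal sketch of the two shapes being compared. It is not the package's actual source: `SumSketch`, `SumGeneric`, `SumViaDot`, and this `ScalarAdd` are hypothetical names written for illustration, and only three of the ten primitive cases are spelled out.

```csharp
using System;
using System.Numerics;

static class SumSketch
{
    // Hypothetical stand-in for a ScalarAdd-style helper: the element type
    // is re-checked on every call. Only three of the ten primitive cases
    // are written out to keep the sketch short.
    static T ScalarAdd<T>(T left, T right) where T : struct
    {
        if (typeof(T) == typeof(float))
            return (T)(object)((float)(object)left + (float)(object)right);
        if (typeof(T) == typeof(double))
            return (T)(object)((double)(object)left + (double)(object)right);
        if (typeof(T) == typeof(int))
            return (T)(object)((int)(object)left + (int)(object)right);
        throw new NotSupportedException(typeof(T).Name);
    }

    // Standalone approach: one ScalarAdd call per lane of the vector.
    public static T SumGeneric<T>(Vector<T> value) where T : struct
    {
        T sum = default;
        for (int i = 0; i < Vector<T>.Count; i++)
            sum = ScalarAdd(sum, value[i]);
        return sum;
    }

    // Approach taken in this PR: a single call to Dot with a vector of ones.
    public static T SumViaDot<T>(Vector<T> value) where T : struct
        => Vector.Dot(value, Vector<T>.One);
}
```

For integer inputs both methods return the same value. For floating-point inputs the results can differ in the last bits, because the additions may be performed in a different order.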

Runtime behavior

On .NET 6+, the package's _._ placeholder means this code is never used — the runtime's intrinsified Vector.Sum takes over via type forwarding. The Dot-based implementation only matters on .NET Framework, where it provides SIMD-accelerated performance.

We verified this: calling Vector.Sum through a netstandard2.0 library on .NET 6/8 produces identical performance to calling the runtime's Vector.Sum directly (0.85 ns vs 0.85 ns, ratio 1.00). The type forwarding works with zero overhead.

Benchmarks

All methods go through a netstandard2.0 class library compiled once, then consumed by each runtime — simulating the real scenario for library authors.

BenchmarkDotNet v0.14.0, X64 RyuJIT AVX2
OperationsPerInvoke=10000

What's being benchmarked

Each row corresponds to a different Sum strategy, all compiled into a netstandard2.0 library:

// Package Sum — our fix (calls Dot under the hood on Framework,
// type-forwards to runtime's Vector.Sum on .NET 6+)
float PackageSum(Vector<float> v) => Vector.Sum(v);

// Runtime Sum — calling the runtime's Vector.Sum directly (net6.0+ only)
float RuntimeSum(Vector<float> v) => Vector.Sum(v);

// Dot workaround — what people do today
float DotWorkaround(Vector<float> v) => Vector.Dot(v, Vector<float>.One);

// #if polyfill — compiles the #else branch on netstandard2.0, locked to manual loop
#if NET6_0_OR_GREATER
float IfdefPolyfill(Vector<float> v) => Vector.Sum(v);
#else
float IfdefPolyfill(Vector<float> v) { var s = 0f; for (int i = 0; i < Vector<float>.Count; i++) s += v[i]; return s; }
#endif

// ScalarAdd chain — what a naive internal Sum implementation would do
float ScalarAddChain(Vector<float> v) => SumGeneric(v);
// (generic loop with typeof(T) dispatch on every addition)

Results

| Method | .NET Framework 4.8.1 | .NET 6.0 | .NET 8.0 |
|---|---|---|---|
| Package Sum (Dot-based) | 1.61 ns | 0.87 ns | 0.86 ns |
| Runtime Sum (direct) | N/A | 0.85 ns | 0.85 ns |
| Dot(v, One) workaround | 1.63 ns | 0.33 ns | 0.33 ns |
| #if polyfill (manual loop) | 2.07 ns | 2.06 ns | 2.07 ns |
| ScalarAdd typeof chain | 18.64 ns | 14.75 ns | 14.85 ns |

Key observations:

  • Package Sum ≈ Runtime Sum on .NET 6/8 (0.86 vs 0.85 ns) — type forwarding resolves to the same intrinsified implementation with zero overhead.
  • Package Sum ≈ Dot workaround on .NET Framework (~1.6 ns) — both route through the same JIT-intrinsified Dot path.
  • Package Sum on .NET 6/8 (0.86 ns) is about 2.6x slower than raw Dot (0.33 ns) because the runtime's Vector.Sum uses a different instruction sequence (shuffle+add vs dpps). This is a JIT codegen detail, not a package issue.
  • #if polyfill is locked at ~2.1 ns everywhere — a netstandard2.0 library compiles the #else manual loop, which never improves regardless of the consumer's runtime. The package fix is 2.4x faster on .NET 6/8.
  • ScalarAdd chain is 11-17x slower than Package Sum (18.64 vs 1.61 ns on .NET Framework 4.8.1, 14.75 vs 0.87 ns on .NET 6.0), which is why we use Dot instead of a standalone implementation.

The Vector.Sum<T> API has been available in the .NET runtime
since .NET 6.0 but was never included in this package.
Implements Sum via Dot(value, Vector<T>.One) so that the
.NET Framework JIT can intrinsify the call through Dot.

Fixes dotnet#272

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adamgauthier requested a review from a team as a code owner April 16, 2026 05:42
@adamgauthier
Contributor Author

@dotnet-policy-service agree company="Microsoft"

@tannergooding
Member

The API surface for this package is effectively frozen and is no longer being versioned. Users can polyfill the API using extension members if really desired, and can otherwise trivially define their own utility function.

Member

@tannergooding left a comment

This does not meet the contribution bar: https://github.com/dotnet/maintenance-packages?tab=readme-ov-file#contribution-bar

We only consider fixes that unblock critical issues

  • New features and APIs are not accepted
  • Infrastructure changes and other non-functional changes are considered

@adamgauthier
Contributor Author

Thanks @tannergooding for the explanation here and the additional context offline.

Closing this. My motivation was that adding Sum<T> would let netstandard2.0 consumers transparently pick up the inbox implementation on modern .NET (the way Vector<T> itself appears to "just work" today via assembly-name unification). But after digging in, that intuition doesn't hold: the netstandard ref assembly version is permanently frozen, so new APIs can't be added without breaking the contract (ericstj's comment, corefx#29182), and even the unification I was relying on is more fragile than I realized.

The officially supported path is to multi-target and let the modern TFM bind directly to the inbox type, which is the normal and expected scenario for maximizing performance. On my side, moving the library to C# 14 also let me drop all the #if ceremony via downlevel extension-member polyfills, which removed most of the friction that pushed me toward this PR in the first place.
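
For anyone landing here later, a rough sketch of what such a polyfill could look like. This is an illustration, not the code from my library: `VectorPolyfills` and `Example` are hypothetical names, it assumes the project is built with the C# 14 compiler and multi-targets netstandard2.0 alongside a modern TFM, and it relies on my understanding that extension members are only considered when the type itself has no applicable member.

```csharp
using System.Numerics;

// Hypothetical polyfill class. Requires C# 14 extension members.
internal static class VectorPolyfills
{
    extension(Vector)
    {
        // Assumption: on TFMs where Vector.Sum already exists, the real
        // static method is found first and this member is never selected.
        // On netstandard2.0 it is the only Sum available, so it is used.
        public static T Sum<T>(Vector<T> value) where T : struct
            => Vector.Dot(value, Vector<T>.One);
    }
}

internal static class Example
{
    // The call site is identical on every TFM, with no #if needed.
    public static float Total(Vector<float> v) => Vector.Sum(v);
}
```

The binding is still decided at compile time per TFM, so the netstandard2.0 build stays on the Dot-based fallback and only the modern target picks up the inbox Vector.Sum.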

Development

Successfully merging this pull request may close these issues.

Vector.Sum<T> missing from System.Numerics.Vectors package
