Add SquareRoot and Logarithm to SVE microbenchmark by ylpoonlg · Pull Request #5021 · dotnet/performance

ylpoonlg · 2025-10-27T15:31:33Z

This PR adds the following two SVE benchmarks:

SquareRoot

Results on Nvidia Grace

Method	Size	Mean	Error	StdDev	Median	Min	Max	Allocated
Scalar	15	14.356 ns	0.0053 ns	0.0047 ns	14.354 ns	14.349 ns	14.366 ns	-
Vector128SquareRoot	15	5.285 ns	0.1041 ns	0.0974 ns	5.276 ns	5.177 ns	5.498 ns	-
SveSquareRoot	15	2.717 ns	0.0254 ns	0.0225 ns	2.720 ns	2.671 ns	2.743 ns	-
SveTail	15	5.406 ns	0.0170 ns	0.0159 ns	5.400 ns	5.393 ns	5.441 ns	-
Scalar	127	132.888 ns	0.3102 ns	0.2901 ns	132.783 ns	132.602 ns	133.404 ns	-
Vector128SquareRoot	127	35.431 ns	0.0305 ns	0.0286 ns	35.419 ns	35.402 ns	35.478 ns	-
SveSquareRoot	127	32.119 ns	0.0049 ns	0.0041 ns	32.120 ns	32.114 ns	32.125 ns	-
SveTail	127	35.485 ns	0.0202 ns	0.0189 ns	35.479 ns	35.467 ns	35.529 ns	-
Scalar	527	557.405 ns	0.3753 ns	0.3511 ns	557.235 ns	557.096 ns	558.034 ns	-
Vector128SquareRoot	527	141.569 ns	0.0777 ns	0.0727 ns	141.537 ns	141.494 ns	141.729 ns	-
SveSquareRoot	527	138.183 ns	0.0420 ns	0.0372 ns	138.173 ns	138.143 ns	138.268 ns	-
SveTail	527	141.521 ns	0.0509 ns	0.0451 ns	141.503 ns	141.473 ns	141.610 ns	-
Scalar	10015	10,589.862 ns	6.5392 ns	5.1054 ns	10,587.659 ns	10,586.120 ns	10,601.373 ns	-
Vector128SquareRoot	10015	2,649.680 ns	0.9145 ns	0.8107 ns	2,649.360 ns	2,648.837 ns	2,651.312 ns	-
SveSquareRoot	10015	2,648.157 ns	0.6936 ns	0.6149 ns	2,648.013 ns	2,647.016 ns	2,649.408 ns	-
SveTail	10015	2,650.875 ns	0.7506 ns	0.6268 ns	2,650.927 ns	2,649.866 ns	2,652.168 ns	-

Logarithm

The algorithm is ported from: https://github.com/ARM-software/optimized-routines/blob/v25.07/math/aarch64/sve/logf.c.
The accuracy is around 3 ULP.
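The range reduction used by this family of logf routines can be sketched in scalar C#. This is an illustration under stated assumptions, not the code from the PR: the class name `LogReductionSketch`, the offset constant `Off` (believed to match the upstream routine, but treat it as an assumption), and the derivation of the exponent `n` are not taken from this page. `MathF.Log` stands in for the polynomial approximation, whose coefficients are not reproduced here, so the sketch says nothing about the 3 ULP figure. It is only meaningful for positive, finite, normal inputs.

```csharp
using System;

public static class LogReductionSketch
{
    // Assumed offset: the bit pattern of a float close to 2/3.
    private const uint Off = 0x3f2aaaabu;
    private const float Ln2 = 0.6931472f;

    // Computes log(x) as n * ln(2) + log(1 + r), with r roughly in [-1/3, 1/3).
    public static float Log(float x)
    {
        // Subtracting the offset re-centres the exponent field.
        uint uOff = BitConverter.SingleToUInt32Bits(x) - Off;

        // The arithmetic shift keeps the sign, so n is negative for small x.
        int n = (int)uOff >> 23;

        // Keep the mantissa bits and add the offset back: the result is a
        // float in roughly [2/3, 4/3).
        uint u = (uOff & 0x007fffffu) + Off;
        float r = BitConverter.UInt32BitsToSingle(u) - 1.0f;

        // Stand-in for the polynomial approximation of log(1 + r).
        float logReduced = MathF.Log(1.0f + r);

        return n * Ln2 + logReduced;
    }
}
```

For example, `LogReductionSketch.Log(8.0f)` reduces to `n = 3` and `r = 0`, so the result is `3 * ln(2)`.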

Method	Size	Mean	Error	StdDev	Median	Min	Max	Allocated
Scalar	15	48.38 ns	0.241 ns	0.188 ns	48.45 ns	47.96 ns	48.57 ns	-
Vector128Logarithm	15	17.19 ns	0.021 ns	0.019 ns	17.19 ns	17.16 ns	17.22 ns	-
SveLogarithm	15	20.06 ns	0.050 ns	0.046 ns	20.07 ns	19.99 ns	20.12 ns	-
Scalar	127	403.73 ns	3.722 ns	3.299 ns	401.57 ns	401.29 ns	410.48 ns	-
Vector128Logarithm	127	92.86 ns	0.038 ns	0.034 ns	92.84 ns	92.82 ns	92.93 ns	-
SveLogarithm	127	104.63 ns	0.165 ns	0.154 ns	104.64 ns	104.27 ns	104.86 ns	-
Scalar	527	1,661.68 ns	2.010 ns	1.570 ns	1,661.00 ns	1,660.82 ns	1,666.26 ns	-
Vector128Logarithm	527	359.11 ns	0.301 ns	0.281 ns	358.92 ns	358.80 ns	359.56 ns	-
SveLogarithm	527	399.49 ns	1.053 ns	0.985 ns	399.69 ns	397.88 ns	401.24 ns	-
Scalar	10015	31,281.10 ns	43.606 ns	38.656 ns	31,271.90 ns	31,240.84 ns	31,367.99 ns	-
Vector128Logarithm	10015	6,708.43 ns	2.011 ns	1.783 ns	6,707.70 ns	6,705.88 ns	6,712.16 ns	-
SveLogarithm	10015	7,845.96 ns	16.922 ns	15.829 ns	7,843.81 ns	7,818.20 ns	7,875.49 ns	-

@dotnet/arm64-contrib @SwapnilGaikwad

SwapnilGaikwad · 2025-10-29T08:06:18Z

Hi @LoopedBard3, here is another benchmark from the series of SVE benchmarks for review. Kindly take a look.
Thanks.

Copilot

Pull Request Overview

This PR adds two new SVE (Scalable Vector Extension) benchmark files for Arm64 architecture: one for square root operations and one for logarithm operations. These benchmarks compare scalar implementations against Vector128 and SVE-based vectorized implementations.

Implements SquareRoot benchmarks with scalar, Vector128, and two SVE variants
Implements Logarithm benchmarks with scalar, Vector128, and SVE implementations using Arm's optimized-routines algorithm
Both benchmarks include verification logic to ensure correctness of vectorized implementations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
src/benchmarks/micro/sve/SquareRoot.cs	Adds benchmark comparing scalar, Vector128, and SVE square root implementations with tail handling strategies
src/benchmarks/micro/sve/Logarithm.cs	Adds benchmark for logarithm calculation using optimized polynomial approximation from Arm's optimized-routines library
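The benchmark sizes (15, 127, 527, 10015) are not multiples of a 4-lane float vector, so every vectorized variant has to deal with leftover elements. One common strategy, a full-width vector loop followed by a scalar tail, can be sketched with the portable `System.Numerics.Vector<T>` API. This is a generic illustration, not the code in `SquareRoot.cs`: the class and method names are hypothetical, and the PR's SVE variants may handle the tail differently (for example with predicated loads).

```csharp
using System;
using System.Numerics;

public static class SquareRootSketch
{
    // Scalar reference: one MathF.Sqrt call per element.
    public static void Scalar(ReadOnlySpan<float> input, Span<float> output)
    {
        for (int i = 0; i < input.Length; i++)
        {
            output[i] = MathF.Sqrt(input[i]);
        }
    }

    // Vector main loop followed by a scalar loop for the leftover elements.
    public static void VectorWithScalarTail(ReadOnlySpan<float> input, Span<float> output)
    {
        int lanes = Vector<float>.Count;
        int i = 0;

        // Process full vectors while at least `lanes` elements remain.
        for (; i <= input.Length - lanes; i += lanes)
        {
            Vector<float> v = new Vector<float>(input.Slice(i, lanes));
            Vector.SquareRoot(v).CopyTo(output.Slice(i, lanes));
        }

        // Scalar tail: fewer than `lanes` elements are left.
        for (; i < input.Length; i++)
        {
            output[i] = MathF.Sqrt(input[i]);
        }
    }
}
```

Comparing the output of `VectorWithScalarTail` against `Scalar` on the same input is the kind of verification the overview describes for the vectorized implementations.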

Copilot · 2025-10-29T17:23:29Z

+                {
+                    // Since pLoop is a Vector<uint> predicate, we load the input as uint array,
+                    // then cast it back to Vector<float>.
+                    // This is preferrable to casting pLoop to Vector<float>, which would cause


Corrected spelling of 'preferrable' to 'preferable'.

Suggested change

- // This is preferrable to casting pLoop to Vector<float>, which would cause
+ // This is preferable to casting pLoop to Vector<float>, which would cause

Copilot · 2025-10-29T17:23:30Z

+                    // Since pLoop is a Vector<uint> predicate, we load the input as uint array,
+                    // then cast it back to Vector<float>.
+                    // This is preferrable to casting pLoop to Vector<float>, which would cause
+                    // a unnecessary conversion from predicate to vector in the codegen.


Corrected grammar: 'a unnecessary' should be 'an unnecessary'.

Suggested change

- // a unnecessary conversion from predicate to vector in the codegen.
+ // an unnecessary conversion from predicate to vector in the codegen.

Copilot · 2025-10-29T17:23:30Z

+                    Vector128<uint> u = AdvSimd.And(u_off, Vector128.Create(0x007fffffu));
+                    u = AdvSimd.Add(u, offVec);
+
+                    Vector128<float> r = Sve.Subtract(u.AsSingle(), Vector128.Create(1.0f));


Using Sve.Subtract with Vector128 types is incorrect. Use AdvSimd.Subtract instead, matching the pattern used elsewhere in this method (lines 105, 114, 118), where AdvSimd operations are applied to Vector128 types.

Suggested change

- Vector128<float> r = Sve.Subtract(u.AsSingle(), Vector128.Create(1.0f));
+ Vector128<float> r = AdvSimd.Subtract(u.AsSingle(), Vector128.Create(1.0f));

LoopedBard3 · Oct 29, 2025

I think this is a fair question. @ylpoonlg, is there a reason to choose one over the other?

ylpoonlg · Oct 30, 2025

This was a mistake, sorry. It only worked because Sve is a subclass of AdvSimd, but AdvSimd is the correct one.
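The reason the original call compiled is a general C# rule: a static method declared on a base class can also be named through a derived class. A minimal sketch with hypothetical stand-in classes (`BaseOps` playing the role of AdvSimd, `DerivedOps` the role of Sve; neither is a real API):

```csharp
using System;

// Hypothetical stand-in for the base intrinsics class.
public class BaseOps
{
    public static float Subtract(float a, float b) => a - b;
}

// Hypothetical stand-in for the derived intrinsics class.
// It declares no Subtract of its own.
public class DerivedOps : BaseOps
{
    public static float Negate(float a) => -a;
}

public static class StaticInheritanceDemo
{
    public static float ViaDerived(float a, float b)
    {
        // Compiles and binds to BaseOps.Subtract: member lookup on a
        // derived type includes static members inherited from its base.
        return DerivedOps.Subtract(a, b);
    }

    public static float ViaBase(float a, float b)
    {
        // Naming the declaring class states the intent directly.
        return BaseOps.Subtract(a, b);
    }
}
```

Both calls resolve to the same method, so naming the declaring class is a readability fix and should not change behavior.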

Add SquareRoot and Logarithm to SVE microbenchmark

25e991d

ylpoonlg marked this pull request as ready for review October 27, 2025 15:33

LoopedBard3 requested review from DrewScoggins, LoopedBard3 and Copilot October 29, 2025 17:13

Copilot AI reviewed Oct 29, 2025

LoopedBard3 previously approved these changes Oct 29, 2025

Fix AdvSimd intrinsic call

d3a9f6a

ylpoonlg dismissed LoopedBard3’s stale review via d3a9f6a October 30, 2025 09:37

Fix typos in comments

1efb11a

LoopedBard3 approved these changes Oct 30, 2025

LoopedBard3 merged commit 4aa9b56 into dotnet:main Oct 30, 2025
73 of 80 checks passed

a74nh mentioned this pull request Apr 1, 2026

Improve Arm64 for .NET11 dotnet/runtime#121787

Open

20 tasks

LoopedBard3 merged 3 commits into dotnet:main from ylpoonlg:github-sqrt-log
