Add SquareRoot and Logarithm to SVE microbenchmark#5021
Add SquareRoot and Logarithm to SVE microbenchmark#5021LoopedBard3 merged 3 commits intodotnet:mainfrom
Conversation
|
Hi @LoopedBard3 , here is another benchmark from the series of SVE benchmarks for review. Kindly take a look. |
There was a problem hiding this comment.
Pull Request Overview
This PR adds two new SVE (Scalable Vector Extension) benchmark files for Arm64 architecture: one for square root operations and one for logarithm operations. These benchmarks compare scalar implementations against Vector128 and SVE-based vectorized implementations.
- Implements SquareRoot benchmarks with scalar, Vector128, and two SVE variants
- Implements Logarithm benchmarks with scalar, Vector128, and SVE implementations using Arm's optimized-routines algorithm
- Both benchmarks include verification logic to ensure correctness of vectorized implementations
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/benchmarks/micro/sve/SquareRoot.cs | Adds benchmark comparing scalar, Vector128, and SVE square root implementations with tail handling strategies |
| src/benchmarks/micro/sve/Logarithm.cs | Adds benchmark for logarithm calculation using optimized polynomial approximation from Arm's optimized-routines library |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| { | ||
| // Since pLoop is a Vector<uint> predicate, we load the input as uint array, | ||
| // then cast it back to Vector<float>. | ||
| // This is preferrable to casting pLoop to Vector<float>, which would cause |
There was a problem hiding this comment.
Corrected spelling of 'preferrable' to 'preferable'.
| // This is preferrable to casting pLoop to Vector<float>, which would cause | |
| // This is preferable to casting pLoop to Vector<float>, which would cause |
| // Since pLoop is a Vector<uint> predicate, we load the input as uint array, | ||
| // then cast it back to Vector<float>. | ||
| // This is preferrable to casting pLoop to Vector<float>, which would cause | ||
| // a unnecessary conversion from predicate to vector in the codegen. |
There was a problem hiding this comment.
Corrected grammar: 'a unnecessary' should be 'an unnecessary'.
| // a unnecessary conversion from predicate to vector in the codegen. | |
| // an unnecessary conversion from predicate to vector in the codegen. |
| Vector128<uint> u = AdvSimd.And(u_off, Vector128.Create(0x007fffffu)); | ||
| u = AdvSimd.Add(u, offVec); | ||
|
|
||
| Vector128<float> r = Sve.Subtract(u.AsSingle(), Vector128.Create(1.0f)); |
There was a problem hiding this comment.
Using Sve.Subtract with Vector128 types is incorrect. Should use AdvSimd.Subtract instead to match the pattern used elsewhere in this method (lines 105, 114, 118) where AdvSimd operations are used for Vector128 types.
| Vector128<float> r = Sve.Subtract(u.AsSingle(), Vector128.Create(1.0f)); | |
| Vector128<float> r = AdvSimd.Subtract(u.AsSingle(), Vector128.Create(1.0f)); |
There was a problem hiding this comment.
I think this is a fair question, @ylpoonlg is there a reason to choose one over another?
There was a problem hiding this comment.
This was a mistake sorry. It only worked because Sve is a subclass of AdvSimd, but AdvSimd is the correct one.
This PR adds the following two SVE benchmarks:
SquareRoot
Results on Nvidia Grace
Logarithm
The algorithm is ported from: https://github.com/ARM-software/optimized-routines/blob/v25.07/math/aarch64/sve/logf.c.
The accuracy is around 3ULP.
@dotnet/arm64-contrib @SwapnilGaikwad