Add Scalar Intel hardware intrinsic functions

This proposal extends dotnet/runtime#23057 to additionally cover intrinsics for the `scalar` forms of the x86 hardware instructions.

## Rationale
Currently, CoreFX implicitly emits various SSE/SSE2 instructions when performing operations on scalar floating-point types. However, there is no way to explicitly emit specific, optimized code sequences that take full advantage of the underlying hardware.

For example, the `System.Math` and `System.MathF` APIs are currently implemented as FCALLs to the underlying C Runtime implementation of the corresponding method. The underlying C Runtime implementations are most frequently implemented as hand-optimized assembly or C/C++ intrinsics to ensure that they provide the best throughput possible and that they take advantage of newer instruction sets when available (`cos`/`cosf`, for example, frequently have one code path for SSE and one for FMA3).

Due to these methods depending on the underlying C Runtime implementation:
* We are at a significantly delayed cadence for getting bug fixes
  * We often have to add hacks to work around these bugs as we find them, decreasing perf
* The implementations between platforms and architectures often differ
  * This leads to perf differences: https://github.com/dotnet/coreclr/issues/9373
  * This leads to input/output differences between Operating Systems (Windows, Linux, Mac, etc)
  * This leads to input/output differences between Platforms (x86, x64, ARM, etc)
* Updating these methods requires modifying the runtime

By providing scalar intrinsics for the Intel hardware functions it becomes much easier to implement these functions in managed code, which:
 * Means fewer workarounds as bugs can be fixed directly
 * Keeps perf more consistent
 * Keeps input/output differences minimal or nonexistent between operating systems
 * Helps keep input/output differences minimal between platforms
 * Allows most bug fixes to be made independent of the runtime

Furthermore, with the addition of dotnet/runtime#23057, it may become more pertinent to also have these scalar intrinsics, to ensure that the codegen when intermixing scalar and vectorized operations remains "optimal" for the end user's customized/hand-optimized algorithms.

## Proposed API
The current design in dotnet/runtime#23057 creates a class per instruction set and exposes methods such as `Vector128<double> Sse2.Sqrt(Vector128<double>)`, which corresponds to the `__m128d _mm_sqrt_pd(__m128d)` C/C++ intrinsic.

This would additionally extend the surface area to expose the scalar forms of all the instructions with the same name as the vector intrinsic, but with a `Scalar` postfix. This is required to differentiate between the vector and scalar APIs due to them taking the same types as their inputs.

For example, we would expose:
```csharp
public static class Sse2
{
    // __m128d _mm_add_sd(__m128d a, __m128d b);
    public static Vector128<double> AddScalar(Vector128<double> left, Vector128<double> right);

    // ...

    // No corresponding C/C++ intrinsic, used when upper should be taken from `value`
    public static Vector128<double> SqrtScalar(Vector128<double> value);

    // __m128d _mm_sqrt_sd(__m128d a, __m128d b);
    public static Vector128<double> SqrtScalar(Vector128<double> upper, Vector128<double> value);

    // ...
}
```
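
To illustrate the lane semantics implied by the signatures above, here is a minimal usage sketch. It assumes the methods end up in `System.Runtime.Intrinsics.X86.Sse2` with the names proposed here; the class name `ScalarIntrinsicsSketch`, the helper `Compute`, and the input values are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical demo class; not part of the proposal.
public static class ScalarIntrinsicsSketch
{
    // Returns the results of the three scalar operations so they can be inspected.
    public static (Vector128<double> Sum, Vector128<double> Root, Vector128<double> Mixed) Compute()
    {
        // Element 0 is the lower (scalar) lane; element 1 is the upper lane.
        Vector128<double> left = Vector128.Create(9.0, 100.0);
        Vector128<double> right = Vector128.Create(16.0, 200.0);

        // Lower lane: 9.0 + 16.0. Upper lane: copied from `left`.
        Vector128<double> sum = Sse2.AddScalar(left, right);

        // Lower lane: sqrt of the lower lane of `sum`. Upper lane: taken from `sum` itself.
        Vector128<double> root = Sse2.SqrtScalar(sum);

        // Lower lane: sqrt of the lower lane of `right`. Upper lane: taken from `left`.
        Vector128<double> mixed = Sse2.SqrtScalar(left, right);

        return (sum, root, mixed);
    }

    public static void Main()
    {
        // Callers are expected to guard on hardware support before using the intrinsics.
        if (!Sse2.IsSupported)
        {
            Console.WriteLine("SSE2 is not supported on this machine.");
            return;
        }

        var (sum, root, mixed) = Compute();
        Console.WriteLine($"sum:   lower={sum.GetElement(0)}, upper={sum.GetElement(1)}");
        Console.WriteLine($"root:  lower={root.GetElement(0)}, upper={root.GetElement(1)}");
        Console.WriteLine($"mixed: lower={mixed.GetElement(0)}, upper={mixed.GetElement(1)}");
    }
}
```

In every case only the lower lane is computed. The two `SqrtScalar` overloads differ solely in where the untouched upper lane comes from, which is why both are proposed.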

## Other Thoughts
Most of the remaining sections (Intended Audience, Semantics and Usage, Implementation Roadmap, etc.) are the same as in dotnet/runtime#23057.
