#1579 notes some unfinished business:
Generation for the Simd and m128i (etc.) types should be equivalent, but the implementations differ: the Simd impls currently use fill to avoid additional unsafe code.
Notice from the above that u32x4, u16x8 and u8x16 are the same size as u128 and m128i, yet cost roughly twice as much to generate here. This suggests the fill code may be sub-optimal.
Additionally, the m128i impl performed even worse when transmuting a u128 value (~4.3 ns, or +130%), which, as far as I can tell, is purely because the u128 value is returned via rax/rdx while the __m128i value is returned via rdx/r10 (with rax holding the struct return address). I don't understand this.
Optimizing Fill for such cases may not be possible without specialization, and even then it's unclear whether we'd want to, given the implied value-breaking changes.
Optimizing the SIMD impls would require either specialization or replacing the generic Simd<$ty, LANES> impls with a (large) number of specific impls.