provide Standard for x86 __m128/256i on stable Rust, add 128xN/sizexN SIMD types #1162
TheIronBorn wants to merge 4 commits into rust-random:master
Looks good! I wonder how to document this; it is not very discoverable at the moment. Should we document this in the README?
Hmm. We don't have any documentation on |
dhardy left a comment
The use of unsafe needs attention; after that I'd like to do another review.
I don't think we need the |
The extra internal NE trait might reduce code duplication for future architectures, but it's still probably minimal.
Noticed we mention SIMD |
src/distributions/integer.rs
Outdated
#[cfg(target_arch = "x86")] use core::arch::x86::*;
#[cfg(target_arch = "x86_64")] use core::arch::x86_64::*;
We only want two items, right? I'm not so keen on using glob imports.
4 items now. Added 2 setzero intrinsics
True, though if you make the change below those will go away.
src/distributions/integer.rs
Outdated
(__m128i, _mm_setzero_si128),
(__m256i, _mm256_setzero_si256)
I'm baffled: (1) the types exist without additional target features while the constructors require (sse2 / avx), and (2) the constructors are unsafe. Maybe I should learn a little more about SIMD here...
Stupid questions, but:
- This code will fail to compile without sse2/avx, right?
- Is there a reason we shouldn't simply transmute an array with suitable alignment? Especially since we're mostly doing that with the pointer-cast anyway.
AFAIK there are no dedicated instructions for the setzero intrinsics. Usually they get compiled either down to XORing the same register or to writing zero bytes to memory. I am also a bit surprised that they are gated on sse2/avx, while types themselves are not.
I agree that transmuting arrays would be a simpler solution, but instead of creating an array with proper alignment I think it will be easier to write something like this:
let mut buf = [0u8; mem::size_of::<$ty>()];
rng.fill_bytes(&mut buf);
unsafe { mem::transmute_copy(&buf) }
transmute_copy will handle the alignment requirements and in practice should be properly optimized out by the compiler.
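The suggestion above can be sketched as a standalone function. This is a minimal illustration, not the code from the PR: `fill_bytes` here is a hypothetical counter-based stand-in for `RngCore::fill_bytes`, and `Vec128` is an alias introduced only for this example (it falls back to `u128` on non-x86 targets so the sketch stays portable).

```rust
use std::mem;

// `Vec128` is an illustrative alias, not a name from rand or the PR.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::__m128i as Vec128;
#[cfg(target_arch = "x86")]
use std::arch::x86::__m128i as Vec128;
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
type Vec128 = u128; // portable stand-in with the same size

/// Hypothetical stand-in for `RngCore::fill_bytes`: writes an
/// incrementing counter so the result is easy to inspect.
fn fill_bytes(counter: &mut u8, dest: &mut [u8]) {
    for b in dest.iter_mut() {
        *b = *counter;
        *counter = counter.wrapping_add(1);
    }
}

/// Fill a plain byte buffer, then copy its bits into the vector type.
fn random_vec128(counter: &mut u8) -> Vec128 {
    let mut buf = [0u8; mem::size_of::<Vec128>()];
    fill_bytes(counter, &mut buf);
    // SAFETY: source and destination have the same size and every bit
    // pattern is a valid `Vec128`. `transmute_copy` performs an
    // unaligned read, so `buf` needs no special alignment.
    unsafe { mem::transmute_copy(&buf) }
}

/// Helper for inspecting the result as raw bytes.
fn to_bytes(v: Vec128) -> [u8; 16] {
    // SAFETY: both types are exactly 16 bytes of plain data.
    unsafe { mem::transmute(v) }
}
```

No SIMD intrinsic is called here, so the sketch does not depend on any target feature being enabled.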
It will compile just fine, but without sse/avx it will fail to run.
@TheIronBorn
Using intrinsics without properly checking required target features (either at compile or at run time) is considered UB.
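As a rough illustration of the run-time check mentioned above, the sketch below guards a call to `_mm_setzero_si128` with `is_x86_feature_detected!`. The function names `zeroed_sse2` and `zeroed_checked` are made up for this example and do not appear in the PR; on non-x86_64 targets the checked function simply returns `None`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_setzero_si128};

/// Only sound to call when the CPU supports SSE2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn zeroed_sse2() -> __m128i {
    _mm_setzero_si128()
}

/// Checks the feature at run time before touching the intrinsic and
/// returns the vector as raw bytes for easy inspection.
#[cfg(target_arch = "x86_64")]
fn zeroed_checked() -> Option<[u8; 16]> {
    if is_x86_feature_detected!("sse2") {
        // SAFETY: SSE2 support was verified on the line above.
        let v = unsafe { zeroed_sse2() };
        // SAFETY: `__m128i` is 16 bytes of plain data.
        Some(unsafe { std::mem::transmute::<__m128i, [u8; 16]>(v) })
    } else {
        None
    }
}

/// Fallback for targets without the x86_64 intrinsics.
#[cfg(not(target_arch = "x86_64"))]
fn zeroed_checked() -> Option<[u8; 16]> {
    None
}
```

The compile-time alternative would be gating the code with `#[cfg(target_feature = "sse2")]`, which removes the run-time branch but means the implementation is absent from builds that do not enable the feature.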
let mut vec: $ty = <$ty>::default();
unsafe {
    let ptr = &mut vec;
    let b_ptr = &mut *(ptr as *mut $ty as *mut [u8; mem::size_of::<$ty>()]);
    rng.fill_bytes(b_ptr);
}
vec.to_le()
I think this is correct, but we should really use from_bits like the old code to avoid unsafe (but do use fill_bytes instead of gen).
Unfortunately from_bits is not documented on docs.rs; I just dropped a PR for that.
I'm confused by this. Do you mean use fill_bytes on a regular array and then from_slice_unaligned? That would avoid all unsafe.
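The general shape of that unsafe-free approach can be shown with the standard library alone. The sketch below is only an analogue: it uses a plain `[u32; 4]` in place of a packed_simd vector, a hypothetical counter-based `fill_bytes` in place of `RngCore::fill_bytes`, and `u32::from_le_bytes` to get the same lane values on little- and big-endian targets (the role `to_le` plays in the diff).

```rust
/// Hypothetical stand-in for `RngCore::fill_bytes`: writes an
/// incrementing counter so the lanes are easy to predict.
fn fill_bytes(counter: &mut u8, dest: &mut [u8]) {
    for b in dest.iter_mut() {
        *b = *counter;
        *counter = counter.wrapping_add(1);
    }
}

/// Fill a regular byte array, then build each lane from four bytes
/// interpreted as little-endian. No `unsafe` is needed.
fn random_u32x4(counter: &mut u8) -> [u32; 4] {
    let mut bytes = [0u8; 16];
    fill_bytes(counter, &mut bytes);

    let mut lanes = [0u32; 4];
    for (lane, chunk) in lanes.iter_mut().zip(bytes.chunks_exact(4)) {
        *lane = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    lanes
}
```

Because the byte order is fixed explicitly, the same input bytes give the same lane values regardless of the host's endianness.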
Hmm, I hadn't figured on Simd<[u8; 2]> etc. being hard to construct from an array. Maybe my suggestion doesn't make sense then.
We could do something like
let mut bytes = [0_u8; mem::size_of::<$ty>()];
rng.fill_bytes(&mut bytes);
let vec = $ty::from_bits($u8xN::from_slice_unaligned(&bytes));
vec.to_le()
but usizexN don't have from_bits,