-
Notifications
You must be signed in to change notification settings - Fork 3
Faster random seq generation using rand_xoshiro #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Ah nice!
(It's not like this adds recursive dependencies so it's gonna be fine anyway, but I'm curious and just came across SmallRng earlier this week.) Edit: Ah I see that this is maintained by the same people as |
|
Ah: Additional context on Vigna's rng page: https://prng.di.unimi.it/ Should we just use that instead? It feels 'simpler' instead. (But feel free to argue that it's better to pin the algorithm for reproducability, in which case yes, we should just use the crate directly as you did.) |
|
Oh I didn't know that! I agree, using |
|
I did a small benchmark here: https://github.com/imartayan/prng-bench |
|
The difference seems less significant on x86 than ARM, but |
|
Some more background reading:
Also note: On my hardware, fastrand is vectorized by default, which is as fast as smallrng. When preventing vectorization, it becomes twice as fast. But then we'd have to modify the library. |
| fn random(n: usize) -> Self { | ||
| let mut rng = rand::rng(); | ||
| let mut seq = vec![0; n]; | ||
| rand::rngs::SmallRng::from_os_rng().fill_bytes(&mut seq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm; this generates 4x the randomness we need. We could instead generate 4x less and then only use 2 bits at a time in the loop below. But probably it's fine as it.
If we really wanted to optimize this we might be able to replace the array indexing below with simd shuffles but not feeling like that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, what might be faster is to make an 256 long array mapping bytes to [u8; 4], and then appending 4 values at a time. That's still relatively clean and should be fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Merged for now. We can revisit this if becomes a bottleneck later.)
The current PRNG is quite slow, this is frustrating when it becomes the slowest part of a benchmark.
Switching to
rand_xoshiro(based on Blackman & Vigna's Xoshiro) makes it an order of magnitude faster.