Conversation

@imartayan (Collaborator)

The current PRNG is quite slow, which is frustrating when it becomes the slowest part of a benchmark.
Switching to rand_xoshiro (based on Blackman & Vigna's xoshiro generators) makes it an order of magnitude faster.
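For reference, a minimal sketch of what using rand_xoshiro directly looks like; the specific generator (Xoshiro256PlusPlus) and the fixed seed are illustrative choices, not necessarily what this PR picks:

```rust
use rand_xoshiro::rand_core::{RngCore, SeedableRng};
use rand_xoshiro::Xoshiro256PlusPlus;

fn main() {
    // xoshiro256++ seeded deterministically; a fixed seed like this is only for tests/benchmarks.
    let mut rng = Xoshiro256PlusPlus::seed_from_u64(42);
    let mut buf = vec![0u8; 1024];
    rng.fill_bytes(&mut buf);
    println!("first byte: {}, next word: {}", buf[0], rng.next_u64());
}
```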

@RagnarGrootKoerkamp (Contributor) commented Mar 17, 2025

Ah nice!
Just to check: could you do a quick&dirty comparison against the following?

(It's not like this adds recursive dependencies, so it's gonna be fine anyway, but I'm curious and just came across SmallRng earlier this week.)

Edit: Ah I see that this is maintained by the same people as rand itself 👍

@RagnarGrootKoerkamp (Contributor) commented Mar 17, 2025

Ah: SmallRng is already using xoshiro!

Additional context on Vigna's rng page: https://prng.di.unimi.it/

Should we just use that instead? It feels 'simpler'. (But feel free to argue that it's better to pin the algorithm for reproducibility, in which case yes, we should just use the crate directly as you did.)

@imartayan (Collaborator, Author)

Oh I didn't know that! I agree, using SmallRng seems simpler.

@imartayan (Collaborator, Author)

I did a small benchmark here: https://github.com/imartayan/prng-bench
Surprisingly, fastrand seems much faster than all other methods.
Can you reproduce these results?
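For reference, a micro-benchmark of roughly this shape illustrates the kind of comparison involved; the summing loop, seeds, and assumption of matching crate versions (rand 0.9, rand_xoshiro 0.7, fastrand 2) are mine, not the actual prng-bench code:

```rust
use std::time::Instant;

use rand::rngs::SmallRng;
use rand::{RngCore, SeedableRng};
use rand_xoshiro::Xoshiro256PlusPlus;

// Time N draws from a generator, summing them so the loop isn't optimized away.
fn bench(name: &str, mut next: impl FnMut() -> u64) {
    const N: u64 = 100_000_000;
    let start = Instant::now();
    let mut acc = 0u64;
    for _ in 0..N {
        acc = acc.wrapping_add(next());
    }
    println!("{name}: {:?} (acc = {acc})", start.elapsed());
}

fn main() {
    let mut small = SmallRng::seed_from_u64(42);
    bench("SmallRng", || small.next_u64());

    let mut xoshiro = Xoshiro256PlusPlus::seed_from_u64(42);
    bench("rand_xoshiro", || xoshiro.next_u64());

    let mut fast = fastrand::Rng::with_seed(42);
    bench("fastrand", || fast.u64(..));
}
```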

@imartayan (Collaborator, Author)

The difference seems less significant on x86 than on ARM, but fastrand is still the fastest.
Given these results, I think we should use fastrand. Are you okay with that?

@RagnarGrootKoerkamp (Contributor) commented Mar 17, 2025

Some more background reading:

Also note: on my hardware, fastrand is vectorized by default, which is as fast as SmallRng. When preventing vectorization, it becomes twice as fast, but then we'd have to modify the library.

fn random(n: usize) -> Self {
    let mut rng = rand::rng();
    let mut seq = vec![0u8; n];
    // Fill the buffer with n random bytes from a freshly seeded SmallRng.
    rand::rngs::SmallRng::from_os_rng().fill_bytes(&mut seq);
@RagnarGrootKoerkamp (Contributor)

Hmm; this generates 4x the randomness we need. We could instead generate a quarter as many bytes and then only use 2 bits at a time in the loop below. But probably it's fine as it is.

If we really wanted to optimize this, we might be able to replace the array indexing below with SIMD shuffles, but I don't feel like doing that now.
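For illustration, a hypothetical standalone version of the 2-bits-at-a-time idea; the helper name and the assumption that we want 2-bit symbols (values 0..4) are mine, not this crate's actual code:

```rust
use rand::rngs::SmallRng;
use rand::{RngCore, SeedableRng};

// Hypothetical helper: build n 2-bit symbols (values 0..4) from only n/4 random bytes.
fn random_symbols(n: usize) -> Vec<u8> {
    let mut bytes = vec![0u8; n.div_ceil(4)];
    SmallRng::from_os_rng().fill_bytes(&mut bytes);
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Take 2 bits of byte i/4, at bit offset 2*(i%4).
        out.push((bytes[i / 4] >> (2 * (i % 4))) & 0b11);
    }
    out
}
```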

@RagnarGrootKoerkamp (Contributor)

Actually, what might be faster is to make a 256-entry array mapping each byte to a [u8; 4], and then appending 4 values at a time. That's still relatively clean and should be fast.
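A sketch of that lookup-table idea, assuming each byte is decoded into four 2-bit values (the table and function names are illustrative):

```rust
// 256-entry table mapping each byte to its four 2-bit values.
const BYTE_TO_SYMBOLS: [[u8; 4]; 256] = {
    let mut table = [[0u8; 4]; 256];
    let mut b = 0;
    while b < 256 {
        let x = b as u8;
        table[b] = [x & 0b11, (x >> 2) & 0b11, (x >> 4) & 0b11, (x >> 6) & 0b11];
        b += 1;
    }
    table
};

// Append 4 symbols per random byte, then truncate to the requested length.
fn decode_bytes(bytes: &[u8], n: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(bytes.len() * 4);
    for &byte in bytes {
        out.extend_from_slice(&BYTE_TO_SYMBOLS[byte as usize]);
    }
    out.truncate(n);
    out
}
```

At generation time this trades per-element bit fiddling for one table lookup plus a 4-byte copy per random byte.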

@RagnarGrootKoerkamp (Contributor)

(Merged for now. We can revisit this if it becomes a bottleneck later.)

@RagnarGrootKoerkamp merged commit cf4a2b7 into rust-seq:master on Mar 21, 2025
2 checks passed