This change removes division from the rejection sampler in `random_range()` by replacing the divisions in range sampling with bit shifting. Instead of finding a well-fitting range, we generate a number in [0, 2^n) and reject out-of-range values. In synthetic benchmarks, this approximately doubles the throughput.
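A minimal Rust sketch of the bitmask rejection idea described above, restricted to `u32`. `XorShift32` and `sample_range_bitmask` are hypothetical names introduced for illustration. The xorshift generator is only a dependency-free stand-in, not the generator `rand` uses, and this is not the code from the PR.

```rust
/// Hypothetical stand-in generator (xorshift32) so the sketch has no
/// dependencies; it is not the generator used by `rand`.
struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    fn new(seed: u32) -> Self {
        // An all-zero state would make xorshift output zeros forever.
        XorShift32 { state: if seed == 0 { 0x9E37_79B9 } else { seed } }
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}

/// Uniform sample from [low, high) using only a mask and a comparison.
fn sample_range_bitmask(rng: &mut XorShift32, low: u32, high: u32) -> u32 {
    assert!(low < high, "empty range");
    let range = high - low; // number of possible results, at least 1
    // Smallest all-ones mask 2^n - 1 with 2^n >= range. The shift amount is
    // 32 when range == 1, so checked_shr maps that case to a mask of 0.
    let mask = u32::MAX.checked_shr((range - 1).leading_zeros()).unwrap_or(0);
    loop {
        // Keep only the low n bits: a value in [0, 2^n).
        let v = rng.next_u32() & mask;
        // Reject values outside the range and draw again.
        if v < range {
            return low + v;
        }
    }
}

fn main() {
    let mut rng = XorShift32::new(12345);
    // Ten rolls of a six-sided die: values in [1, 7).
    let rolls: Vec<u32> = (0..10).map(|_| sample_range_bitmask(&mut rng, 1, 7)).collect();
    println!("{:?}", rolls);
}
```

The acceptance rate is what makes benchmarks of this method sensitive to the chosen range. For a range of 6 the mask is 7, so 6 of every 8 masked values are accepted. For a range of 2^31 + 1 the mask covers all of `u32`, and only just over half of the draws are accepted.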
I can't reproduce this. In my tests:
So your algorithm is definitely useful if we want to cut out the extra field, or where the range can't be precomputed, but not with the benchmark you used(?). Actually, my test was using a different range (
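For comparison, here is a sketch of a conventional division-based scheme, assuming it resembles what `Range` did at the time. The rejection zone is computed once with a modulus and stored in the struct (the extra field), and every accepted sample still pays for a `%`. `ModuloRange`, its fields, and the xorshift stand-in are all hypothetical. The generator is repeated so the block compiles on its own.

```rust
/// Hypothetical xorshift32 stand-in, not the generator used by `rand`.
struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    fn new(seed: u32) -> Self {
        // An all-zero state would make xorshift output zeros forever.
        XorShift32 { state: if seed == 0 { 0x9E37_79B9 } else { seed } }
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}

/// Hypothetical division-based range; `zone` is the precomputed extra field.
struct ModuloRange {
    low: u32,
    range: u32,
    // Largest multiple of `range` not exceeding u32::MAX; draws >= zone are rejected.
    zone: u32,
}

impl ModuloRange {
    fn new(low: u32, high: u32) -> Self {
        assert!(low < high, "empty range");
        let range = high - low;
        // One division, paid once when the range is constructed.
        let zone = u32::MAX - u32::MAX % range;
        ModuloRange { low, range, zone }
    }

    fn sample(&self, rng: &mut XorShift32) -> u32 {
        loop {
            let v = rng.next_u32();
            // [0, zone) holds exactly zone / range copies of each residue,
            // so v % range is uniform; anything at or above zone is rejected.
            if v < self.zone {
                // A second division on every accepted sample.
                return self.low + v % self.range;
            }
        }
    }
}

fn main() {
    let mut rng = XorShift32::new(12345);
    let die = ModuloRange::new(1, 7);
    let rolls: Vec<u32> = (0..10).map(|_| die.sample(&mut rng)).collect();
    println!("{:?}", rolls);
}
```

The bitmask sampler needs neither the stored `zone` nor either `%`, which is why it is attractive when the range cannot be precomputed.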
Using bit shifts instead of a modulus improves performance. Benchmarking this method is more involved, though, because the results depend heavily on how close the range is to a power of two. I think dhardy#2, dhardy#68 and dhardy#69 do more to improve the performance of `Range`. The techniques are: never working on values smaller than 32 bits, using a widening multiply instead of a modulus, and a
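A sketch of the widening-multiply technique, using Lemire's "nearly divisionless" method as an illustrative stand-in. It is not the code from dhardy#2, dhardy#68 or dhardy#69. The 32-bit random word is multiplied by the range into a 64-bit product. The high half is the result, and the low half decides whether to reject. A modulus is computed only on the rare path where the low half falls below the range. `sample_range_widening` and the xorshift generator are hypothetical names.

```rust
/// Hypothetical xorshift32 stand-in, not the generator used by `rand`.
struct XorShift32 {
    state: u32,
}

impl XorShift32 {
    fn new(seed: u32) -> Self {
        // An all-zero state would make xorshift output zeros forever.
        XorShift32 { state: if seed == 0 { 0x9E37_79B9 } else { seed } }
    }

    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}

/// Uniform sample from [low, high) via a 32x32 -> 64-bit widening multiply.
fn sample_range_widening(rng: &mut XorShift32, low: u32, high: u32) -> u32 {
    assert!(low < high, "empty range");
    let range = high - low;
    // The high 32 bits of x * range lie in [0, range); the low 32 bits
    // show whether x landed in one of the slightly over-represented slots.
    let mut m = u64::from(rng.next_u32()) * u64::from(range);
    let mut lo = m as u32;
    if lo < range {
        // Rare path: t = 2^32 mod range, computed as (2^32 - range) % range.
        let t = range.wrapping_neg() % range;
        while lo < t {
            m = u64::from(rng.next_u32()) * u64::from(range);
            lo = m as u32;
        }
    }
    low + (m >> 32) as u32
}

fn main() {
    let mut rng = XorShift32::new(12345);
    let rolls: Vec<u32> = (0..10).map(|_| sample_range_widening(&mut rng, 1, 7)).collect();
    println!("{:?}", rolls);
}
```

Unlike the bitmask sampler, the chance of rejecting a draw here stays below range / 2^32 however close the range is to a power of two. The cost is a 64-bit multiply on every sample.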
Yes, it sounds like we can close this now (we still need to get that code merged, of course)!

Thanks for the effort on this, though!