I have two suggestions for this code:
s0 = folded_multiply(load(v, 0) ^ s0, load(v, 32) ^ seeds[0]);
- Use more variables. MUL latency is 3 cycles (4 for the upper 64 bits, afaik) and the other operations take 1 cycle each, so at least 6 variables are required to hide the latency.
- Mix the variables. Instead of seeds[0], use the next variable, i.e. s1 in the s0 computation and so on. This way, entropy from a single input bit will spread across all the variables, which would allow using the result as a proper 128-256 bit hash if someone needs one.
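
To make both suggestions concrete, here is a rough sketch of what I have in mind. It is not the actual foldhash code: `LANES`, `BLOCK`, `mix_blocks`, the 16-bytes-per-lane layout and the simplified `load` are all made up for illustration, and `folded_multiply` is my assumption of the usual definition (XOR of the low and high halves of the 128-bit product). Tail handling and finalization are left out.

```rust
// Assumed definition: XOR of the low and high 64 bits of the 128-bit product.
fn folded_multiply(a: u64, b: u64) -> u64 {
    let full = (a as u128) * (b as u128);
    (full as u64) ^ ((full >> 64) as u64)
}

// Simplified stand-in for the library's load: little-endian u64 at `offset`.
fn load(v: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&v[offset..offset + 8]);
    u64::from_le_bytes(buf)
}

const LANES: usize = 6; // hypothetical: enough independent multiplies in flight
const BLOCK: usize = LANES * 16; // hypothetical layout: two u64 words per lane

// Processes whole blocks only; any trailing partial block is ignored here.
fn mix_blocks(data: &[u8], mut s: [u64; LANES]) -> [u64; LANES] {
    for v in data.chunks_exact(BLOCK) {
        let prev = s; // snapshot, so the six multiplies do not depend on each other
        for i in 0..LANES {
            let a = load(v, 16 * i) ^ prev[i];
            // The neighbouring lane takes the place of the fixed seeds[0].
            let b = load(v, 16 * i + 8) ^ prev[(i + 1) % LANES];
            s[i] = folded_multiply(a, b);
        }
    }
    s
}

fn main() {
    let data = [0u8; BLOCK];
    println!("{:?}", mix_blocks(&data, [1, 2, 3, 4, 5, 6]));
}
```

Within one block the six multiplies are independent, so they can overlap in the pipeline. Because each lane reads its neighbour's previous value, a change in one lane reaches one more lane per block, so it takes a few blocks before it has touched all of them.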
Code reference: foldhash/src/lib.rs, line 314 in 8f878c6