factor: Faster modular arithmetic with the Montgomery transform #1529
Marked as WIP: 2 of the tests I introduced now fail after switching to the Montgomery transform. I suspect an overflow-related issue somewhere.
Let me know when you finish this. It looks good to me, other than the test failures.
f07992e to 813e57d
@Arcterus I fixed one of the test failures; I will take a break before having a go at the next one.
This is a faster way to perform arithmetic mod n, when n is odd and a 64-bit number.
In debug mode, it checks that all arithmetic operations coincide with the plain-`u64` versions, as long as the latter do not overflow.
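For readers unfamiliar with the technique, here is a minimal sketch of Montgomery multiplication modulo an odd 64-bit n, with R = 2^64 and `u128` intermediates. It is not the PR's code: the type and method names (`Montgomery`, `to_mont`, `from_mont`, `mul`, `redc`) are hypothetical, and the debug-only check merely imitates the idea described above by comparing against plain `u64` multiplication whenever that does not overflow.

```rust
/// Montgomery arithmetic modulo an odd 64-bit `n`, with R = 2^64.
/// Hypothetical sketch: names are illustrative, not the PR's API.
struct Montgomery {
    n: u64,
    n_inv: u64, // n^{-1} mod 2^64
    r2: u64,    // R^2 mod n, used to enter Montgomery form
}

impl Montgomery {
    fn new(n: u64) -> Self {
        assert!(n % 2 == 1, "the modulus must be odd");
        // Newton's iteration: an odd `n` is its own inverse mod 8, and each
        // step doubles the number of correct low bits (3 -> 96 after 5 steps).
        let mut n_inv = n;
        for _ in 0..5 {
            n_inv = n_inv.wrapping_mul(2u64.wrapping_sub(n.wrapping_mul(n_inv)));
        }
        debug_assert_eq!(n.wrapping_mul(n_inv), 1);
        let r = (1u128 << 64) % n as u128; // R mod n
        let r2 = (r * r % n as u128) as u64;
        Montgomery { n, n_inv, r2 }
    }

    /// REDC: for t < n * 2^64, returns t * R^{-1} mod n without overflowing.
    fn redc(&self, t: u128) -> u64 {
        let (t_lo, t_hi) = (t as u64, (t >> 64) as u64);
        // m * n has the same low 64 bits as t, so (t - m*n) / 2^64 equals
        // t_hi - mn_hi, which lies strictly between -n and n.
        let m = t_lo.wrapping_mul(self.n_inv);
        let mn_hi = ((m as u128 * self.n as u128) >> 64) as u64;
        let (res, borrow) = t_hi.overflowing_sub(mn_hi);
        if borrow { res.wrapping_add(self.n) } else { res }
    }

    /// Converts a plain value into Montgomery form (a * R mod n).
    fn to_mont(&self, a: u64) -> u64 {
        self.redc((a % self.n) as u128 * self.r2 as u128)
    }

    /// Converts a Montgomery-form value back into a plain residue.
    fn from_mont(&self, x: u64) -> u64 {
        self.redc(x as u128)
    }

    /// Product of two values in Montgomery form.
    fn mul(&self, x: u64, y: u64) -> u64 {
        let r = self.redc(x as u128 * y as u128);
        // Debug builds only: compare with plain u64 arithmetic when it cannot overflow.
        if cfg!(debug_assertions) {
            let (a, b) = (self.from_mont(x), self.from_mont(y));
            if let Some(p) = a.checked_mul(b) {
                debug_assert_eq!(self.from_mont(r), p % self.n);
            }
        }
        r
    }
}

fn main() {
    let n = 1_000_000_007u64;
    let m = Montgomery::new(n);
    let (a, b) = (123_456_789u64, 987_654_321u64);
    let prod = m.from_mont(m.mul(m.to_mont(a), m.to_mont(b)));
    assert_eq!(prod as u128, a as u128 * b as u128 % n as u128);
    println!("{a} * {b} mod {n} = {prod}");
}
```

The `redc` variant shown subtracts m·n instead of adding it, which keeps every intermediate within `u128` even when n is close to 2^64.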
Just call `u64::wrapping_{mul,sub}` instead of (de)constructing `Wrapping<u64>` values.
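As a hypothetical illustration of that change, the two functions below compute the inverse of an odd n modulo 2^64 by Newton's iteration (a step Montgomery arithmetic typically needs), once with `u64::wrapping_{mul,sub}` and once with `std::num::Wrapping<u64>`. The function names are made up for this sketch; both versions produce identical results.

```rust
use std::num::Wrapping;

/// Inverse of odd `n` modulo 2^64, calling `u64::wrapping_{mul,sub}` directly.
fn inv_mod_2_64(n: u64) -> u64 {
    debug_assert!(n % 2 == 1);
    // n * n ≡ 1 (mod 8) for odd n, so `n` is a 3-bit-accurate starting guess;
    // each step x <- x * (2 - n * x) doubles the number of correct low bits.
    let mut x = n;
    for _ in 0..5 {
        x = x.wrapping_mul(2u64.wrapping_sub(n.wrapping_mul(x)));
    }
    x
}

/// Same computation, (de)constructing `Wrapping<u64>` values.
fn inv_mod_2_64_wrapping(n: u64) -> u64 {
    let n = Wrapping(n);
    let mut x = n;
    for _ in 0..5 {
        x = x * (Wrapping(2) - n * x);
    }
    x.0
}

fn main() {
    for n in [1u64, 3, 12345, u64::MAX] {
        let x = inv_mod_2_64(n);
        assert_eq!(n.wrapping_mul(x), 1);
        assert_eq!(x, inv_mod_2_64_wrapping(n));
    }
}
```

Both express the same wrapping arithmetic; the direct method calls simply avoid wrapping and unwrapping values at every step.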
This should now be working, and ~45% faster than … This also unlocks a lot of further algorithmic work, as we now have fast, non-overflowing modular arithmetic.
Approx. 25% speedup
Update: about 59% faster than …
We have integration tests here: …
Yes, the integration tests do exercise the new code too (essentially, anything that calls into …).
@sylvestre Done :)
This fails
Odd; can you provide the log output (and preferably the backtrace too)?
Here’s the backtrace. There is no output on …
@Arcterus Thanks, I think I know what the issue might be then. Will have a deeper look after dinner. |
Sounds good 👍 |
@Arcterus Fixed |
🎉 |
…lures) - probably fixes uutils#1531 (via uutils#1529) per @nbraud
This can probably be optimised further, but as of commit 4851619 this is already ~2.43 times faster than before, taking ~3.55 s to factor all integers from 2 to 10⁶.
- … `factor::{factor,miller_rabin}`.
- … `Arithmetic` trait, implement the Montgomery transform on 64-bit integers (requires 128-bit integers for some intermediate values).
- … `debug_assert!`

A further optimisation would be to make the `Montgomery` implementation generic and add a 32-bit variant. Moreover, a 32-bit variant could use a shorter basis in the Miller-Rabin primality test (either 3 witnesses or a single witness, depending on n). This seems out of scope for this PR, however. Such an optimisation would also be premature at this point, and its impact hard to measure: after this change, `factor` only spends ~10% of its time in the Miller-Rabin primality test and another ~10% in Pollard's ρ algorithm, versus ~35% in `table::factor` and ~25% printing the factorisations (!)
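To make the 32-bit idea concrete, here is a minimal sketch (not part of this PR) of a deterministic Miller-Rabin test for `u32` inputs. It assumes the well-known result that the witnesses 2, 7 and 61 suffice for every n < 4,759,123,141, which covers all of `u32`; since every operand stays below 2^32, `u64` intermediates never overflow and no Montgomery form is needed. The function names `pow_mod` and `is_prime_u32` are hypothetical.

```rust
/// Modular exponentiation for moduli below 2^32: products of two residues
/// fit in u64, so plain `%` is safe here.
fn pow_mod(mut base: u64, mut exp: u64, n: u64) -> u64 {
    let mut acc = 1 % n;
    base %= n;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % n;
        }
        base = base * base % n;
        exp >>= 1;
    }
    acc
}

/// Deterministic Miller-Rabin for 32-bit inputs with the witness set {2, 7, 61}.
fn is_prime_u32(n: u32) -> bool {
    let n = n as u64;
    if n < 2 {
        return false;
    }
    // Handle the witnesses themselves and their multiples up front.
    for p in [2u64, 7, 61] {
        if n == p {
            return true;
        }
        if n % p == 0 {
            return false;
        }
    }
    // Write n - 1 = d * 2^s with d odd.
    let s = (n - 1).trailing_zeros();
    let d = (n - 1) >> s;
    'witness: for a in [2u64, 7, 61] {
        let mut x = pow_mod(a, d, n);
        if x == 1 || x == n - 1 {
            continue;
        }
        for _ in 1..s {
            x = x * x % n;
            if x == n - 1 {
                continue 'witness;
            }
        }
        return false; // `a` proves n composite
    }
    true
}

fn main() {
    assert_eq!((0..100u32).filter(|&n| is_prime_u32(n)).count(), 25);
    assert!(is_prime_u32(4_294_967_291)); // largest prime below 2^32
    assert!(!is_prime_u32(2047)); // 23 * 89, strong pseudoprime to base 2
}
```

Three fixed witnesses keep the 32-bit test cheap; the single-witness variant mentioned above would need a different, n-dependent choice of base and is not shown here.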