Skip to content

Performance improvements for factor#1525

Merged
rivy merged 15 commits into
uutils:masterfrom
nicoonoclaste:factor/faster
May 24, 2020
Merged

Performance improvements for factor#1525
rivy merged 15 commits into
uutils:masterfrom
nicoonoclaste:factor/faster

Conversation

@nicoonoclaste
Copy link
Copy Markdown
Contributor

@nicoonoclaste nicoonoclaste commented May 24, 2020

First pass at fixing #1456: after those changes, factor is 4× faster, and
“only” 48× slower than the GNU implementation (when factoring all numbers from 2
to 10⁶).

  • Replaced the Vec of factors with a datatype that only stores each prime once
  • Moved each algorithm (table-based, Pollard's rho, and Miller-Rabin) to its own module, and decoupled them.
  • Removed unecessary calls to is_prime (~50% of the perf. gain)
  • Replaced iterated division by 2 with u64::trailing_zeros
  • Refactored and optimized the Miller-Rabin primality test, extracted dividers from the result (the other half of the performance gain)

It is clearer to see what is going on, as opposed to passing around an
unmarked `Vec<u64>`, and there is a single place to add invariants checks.

This is also a more compact memory representation: each prime factor is
represented only once, with an additional byte for multiplicity.  The
performance impact is however not significant.
Also decoupled the factorisation methods; now factor::factor contains
the logic that chains the different algorithms and aggregates results.

As a side-effect, rho::factor now performs extraneous allocations (as each
recursive step creates a new `Factors` value, which is then aggregated into
the previous one) but there is no significant performance impact.
No significant performance impact (most of the time is spent elsewhere),
but an easy and satisfying fix nevertheless.
When the remainder is smaller than the max. entry in the table,
it is guaranteed to be prime.
50% performance improvement on factoring all numbers between 2 and 10⁶.
Replace iterated division with u64::trailing_zeros, hoist the selection of `mul`
out of the loop, another cool 49.5% runtime improvement.
@nicoonoclaste
Copy link
Copy Markdown
Contributor Author

nicoonoclaste commented May 24, 2020

I am going to have a deeper look at the implementation of Pollard's rho, but I suspect further improvements will require switching to a wholly-different factoring algorithm.

Factoring Small to Medium Size Integers: An Experimental Comparison (2010) suggests that Shanks's SQUFOF (square forms factorization) is the fastest algorithm for integers up to ~60 bits.

A possible alternative I might investigate is Hart's “one-line factoring algorithm”, which is said to be competitive with SQUFOF and simpler to implement.

PS: This would presumably be the subject of a second PR.

@sylvestre
Copy link
Copy Markdown
Contributor

Thanks for the PR. There are some rust format and clippy issues, could you please fix them?

@nicoonoclaste
Copy link
Copy Markdown
Contributor Author

Thanks for the PR. There are some rust format and clippy issues, could you please fix them?

Done; I was trying first to recall how to deal with not having NLLs (as the build is currently failing on obsolete versions of Rust).

@nicoonoclaste
Copy link
Copy Markdown
Contributor Author

I didn't see the format issues, though (or at least, running cargo fmt didn't introduce any change)

@rivy rivy self-requested a review May 24, 2020 15:17
Copy link
Copy Markdown
Contributor

@rivy rivy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see the format issues, though (or at least, running cargo fmt didn't introduce any change)

It looks like it's just cargo clippy issues, specifically large numbers of single character variable names, unneeded returns, long literals lacking separators, ....

If you look at the "Checks" tabs and pull up the CICD section, it lists all the warnings.

It is acceptable to use an #[allow(...)] for the naming issue if you have reasons.

Thanks for your work!

@nicoonoclaste
Copy link
Copy Markdown
Contributor Author

nicoonoclaste commented May 24, 2020

It looks like it's just cargo clippy issues, specifically about a large number of single character variable names.

Yes, I already dealt with it (by using 1 fewer variable >_>')

Thanks for your work!

You are welcome <3

@nicoonoclaste nicoonoclaste requested a review from rivy May 24, 2020 15:27
Comment thread src/uu/factor/src/factor.rs Outdated
nicoonoclaste and others added 3 commits May 24, 2020 19:10
Instead of computing a^r and a^(n-1) = a^(r 2ⁱ) separately,
compute the latter by repeatedly squaring the former.

33.6% performance improvement
Co-authored-by: Roy Ivy III <rivy.dev@gmail.com>
@nicoonoclaste
Copy link
Copy Markdown
Contributor Author

This is “merely” 37× slower than GNU factor, now, and I still haven't implemented any fancier number theory (aside from extracting dividers from the M-H primality test)

Comment thread src/uu/factor/src/miller_rabin.rs
@nicoonoclaste
Copy link
Copy Markdown
Contributor Author

@rivy @sylvestre I found another 2× speedup with a loop exchange (and essentially inlining a batch version of pow inside rabin_miller::test) but I'd rather postpone it to another PR:
I prototyped it using lazy_static and nalgebra, but those dependencies aren't really needed — all I need is all, any, and map for [u64; 7] — so I'd rather rewrite it more cleanly, but I'm not doing that tonight.

@rivy rivy merged commit 09abcf8 into uutils:master May 24, 2020
@rivy
Copy link
Copy Markdown
Contributor

rivy commented May 24, 2020

@nbraud , thanks! Sounds like your having fun! Mush on at your leisure.

@nicoonoclaste
Copy link
Copy Markdown
Contributor Author

@rivy Thanks a bunch for reviewing and merging so quickly :O

@nicoonoclaste
Copy link
Copy Markdown
Contributor Author

And yes, golfing this to be ~93% faster was fun :3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants