perf/factor ~ deduplicate divisors #1571
Conversation
Expected performance gain is ⪆20%, but I am missing precise measurements: the only computer I can currently use for this is my laptop, and as the weather got rather warm here, I'm having trouble benchmarking precisely due to thermal throttling. Moreover, the performance figures in the commit messages are likely wrong and need to be measured again, as I reordered the commits.
Thinking further about it, I should submit everything up to 25176ee (i.e. using a flat vector, eliding most heap allocations, etc.) as a separate PR that doesn't involve heavy maths or algorithmics. Splitting this into a few PRs with narrower scope should make it easier to review, and it means those parts can be reviewed/merged while I'm still working on this. :)
Rebased on top of #1572 after those commits got extracted.
Unrelated CI failure:
Test failure should be fixed by #1586.
Rebased onto 'master'.
Overall, this looks ready for merge and narrows the performance gap with GNU.

## 10,000,001 factorizations

```console
$ hyperfine -L exe "factor,../../../target/release/factor,../../../target/release/coreutils factor" "seq 0 $((10 ** 7)) | {exe} > /dev/null"
Benchmark #1: seq 0 10000000 | factor > /dev/null
  Time (mean ± σ):      2.974 s ±  0.088 s    [User: 2.364 s, System: 0.809 s]
  Range (min … max):    2.881 s …  3.117 s    10 runs

Benchmark #2: seq 0 10000000 | ../../../target/release/factor > /dev/null
  Time (mean ± σ):     10.221 s ±  0.152 s    [User: 10.139 s, System: 0.292 s]
  Range (min … max):    9.901 s … 10.331 s    10 runs

Benchmark #3: seq 0 10000000 | ../../../target/release/coreutils factor > /dev/null
  Time (mean ± σ):     10.585 s ±  0.141 s    [User: 10.580 s, System: 0.204 s]
  Range (min … max):   10.254 s … 10.693 s    10 runs

Summary
  'seq 0 10000000 | factor > /dev/null' ran
    3.44 ± 0.11 times faster than 'seq 0 10000000 | ../../../target/release/factor > /dev/null'
    3.56 ± 0.12 times faster than 'seq 0 10000000 | ../../../target/release/coreutils factor > /dev/null'
```

## 101 factorizations

```console
$ hyperfine -L exe "factor,../../../target/release/factor,../../../target/release/coreutils factor" "seq $((10 ** 7)) $((10 ** 5)) $((2 * (10 ** 7))) | {exe} > /dev/null"
Benchmark #1: seq 10000000 100000 20000000 | factor > /dev/null
  Time (mean ± σ):       4.7 ms ±   0.5 ms    [User: 0.8 ms, System: 6.9 ms]
  Range (min … max):     3.7 ms …   6.6 ms    399 runs

  Warning: Command took less than 5 ms to complete. Results might be inaccurate.

Benchmark #2: seq 10000000 100000 20000000 | ../../../target/release/factor > /dev/null
  Time (mean ± σ):       6.8 ms ±   0.6 ms    [User: 1.0 ms, System: 9.3 ms]
  Range (min … max):     5.7 ms …   9.4 ms    279 runs

Benchmark #3: seq 10000000 100000 20000000 | ../../../target/release/coreutils factor > /dev/null
  Time (mean ± σ):       7.5 ms ±   0.6 ms    [User: 1.1 ms, System: 9.8 ms]
  Range (min … max):     6.3 ms …   9.8 ms    283 runs

Summary
  'seq 10000000 100000 20000000 | factor > /dev/null' ran
    1.45 ± 0.21 times faster than 'seq 10000000 100000 20000000 | ../../../target/release/factor > /dev/null'
    1.60 ± 0.22 times faster than 'seq 10000000 100000 20000000 | ../../../target/release/coreutils factor > /dev/null'
```

Thanks to @nbraud for the initial heavy lifting! There are some unrelated new clippy warnings that I'll fix in a subsequent PR (see #1603).
This way, we can easily replace `u8` with a larger type when moving to support larger integers.
The new type can be used to represent in-progress factorisations, which contain non-prime factors.
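A minimal sketch of what such a type could look like (names and layout are illustrative, not the PR's exact code): the exponent is a type alias over `u8` so it can be widened later in one place, and the multiset may hold non-prime factors while a factorisation is in progress.

```rust
/// Exponent type; a type alias makes it easy to widen later (e.g. to u32)
/// when moving to support larger integers.
type Exponent = u8;

/// A multiset of divisors, stored as (factor, exponent) pairs.
/// Unlike a finished factorisation, it may contain non-prime factors.
#[derive(Debug, Default, PartialEq)]
struct Decomposition(Vec<(u64, Exponent)>);

impl Decomposition {
    /// Record `factor`, merging with an existing entry if one is present.
    fn add(&mut self, factor: u64, exp: Exponent) {
        match self.0.iter_mut().find(|(f, _)| *f == factor) {
            Some((_, e)) => *e += exp,
            None => self.0.push((factor, exp)),
        }
    }
}

fn main() {
    let mut d = Decomposition::default();
    d.add(2, 3);
    d.add(15, 1); // a non-prime factor: fine for an in-progress factorisation
    d.add(2, 1); // merges with the existing entry for 2
    assert_eq!(d.0, vec![(2, 4), (15, 1)]);
    println!("{:?}", d);
}
```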
~18% faster than BTreeMap, and ~5% faster than 'master'
~2.9% faster than the previous commit, ~11% faster than “master” overall.
~7% slowdown, paves the way for upcoming improvements
~17% faster, many optimisation opportunities still missed >:)
The invariant is checked by a `debug_assert!`, and follows from the previous
commit, as `dec` and `factors` only ever contain coprime numbers:
- true at the start: factors = ∅ and dec = { n¹ };
- on every loop iteration, we pull an element `f` out of `dec` and either:
  - discover it is prime, and add it to `factors`;
  - split it into a number of coprime factors, which get reinserted into `dec`;
    the invariant is maintained, as all divisors of `f` are coprime with all
    numbers in `dec` and `factors` (since `f` itself is coprime with them).
As we only add elements to `Decomposition` objects that are coprime with the
existing ones, they are distinct.
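The coprimality invariant can be expressed as a checkable predicate along these lines (a sketch with illustrative helper names, not the PR's code); note that pairwise-coprime numbers greater than 1 are automatically distinct, which is why coprimality implies no duplicates.

```rust
/// Euclid's algorithm for the greatest common divisor.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

/// Check that every pair of numbers in `values` is coprime; for values > 1
/// this also implies the elements are pairwise distinct.
fn pairwise_coprime(values: &[u64]) -> bool {
    values
        .iter()
        .enumerate()
        .all(|(i, &a)| values[i + 1..].iter().all(|&b| gcd(a, b) == 1))
}

fn main() {
    // e.g. splitting f = 105 into coprime factors 3, 5, 7 keeps the invariant:
    assert!(pairwise_coprime(&[3, 5, 7]));
    // whereas 6 and 15 share the factor 3:
    assert!(!pairwise_coprime(&[6, 15]));
}
```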
This avoids allocating on the heap when factoring most numbers, without using much space on the stack. This is ~3.5% faster than the previous commit, and ~8.3% faster than “master”.
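For context on why a small inline buffer can cover most cases: a `u64` has at most 15 distinct prime factors, since the product of the first 16 primes already overflows 64 bits. This stdlib-only check verifies that bound (illustrative; the inline capacity the PR actually picks is not shown here):

```rust
/// Count how many of the first primes can be multiplied together before the
/// product overflows a u64; this bounds the number of distinct prime factors
/// any u64 can have, and hence the entries a factorisation needs to store.
fn max_distinct_prime_factors() -> usize {
    let primes: [u64; 16] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53];
    let mut product: u128 = 1;
    let mut count = 0;
    for &p in &primes {
        product *= u128::from(p);
        if product > u128::from(u64::MAX) {
            break;
        }
        count += 1;
    }
    count
}

fn main() {
    // The product of the first 15 primes (2·3·…·47) fits in a u64,
    // but multiplying by the 16th prime (53) overflows.
    assert_eq!(max_distinct_prime_factors(), 15);
    println!("max distinct prime factors of a u64: {}", max_distinct_prime_factors());
}
```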
@uutils/maintainers, if there are no objections, I'll merge this on Monday.
Closed via merge commit effb94b.
This PR deduplicates divisor handling between the `factor()` function and the `Factors` data structure:

- introduce `Decomposition`, a representation of divisor multisets, common to `factor()` and `Factors`;
- optimise `Decomposition`, first by switching to a flat vector representation, then by using `smallvec` to stack-allocate the space in most cases;
- rework the `fmt` implementation for `Factors`, avoiding a data copy to sort the factors.

This is marked WiP, as the implementation is rather ugly and leaves some performance on the table.
Optimise `Decomposition` again, knowing we do not ever add the same factor twice (so we can skip looking for it).
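Because the coprimality invariant guarantees a newly split factor is not already present, insertion can skip the linear search over existing entries and push unconditionally, keeping the check only as a debug assertion. A sketch with hypothetical names, not the PR's exact code:

```rust
type Exponent = u8;

#[derive(Debug, Default, PartialEq)]
struct Decomposition(Vec<(u64, Exponent)>);

impl Decomposition {
    /// Insert a factor known (by the coprimality invariant) to be absent:
    /// no `find` over existing entries, just a push. The invariant is only
    /// checked in debug builds.
    fn push_new(&mut self, factor: u64, exp: Exponent) {
        debug_assert!(self.0.iter().all(|&(f, _)| f != factor));
        self.0.push((factor, exp));
    }
}

fn main() {
    let mut d = Decomposition::default();
    d.push_new(3, 2);
    d.push_new(5, 1); // coprime with 3, so guaranteed not already present
    assert_eq!(d.0, vec![(3, 2), (5, 1)]);
    println!("{:?}", d);
}
```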