factor: Refactor and optimise the Factors data structure #1572
Conversation
There seems to be a very noticeable regression in dc83bed, but I cannot quite tell whether this is a measurement error or something else (maybe this needs some inlining hints or some such?)
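For context on the inlining guess: in Rust, non-generic functions are not inlined across crate boundaries unless marked, so a missing hint can plausibly explain a regression like this. A generic illustration, not code from this PR; `square_mod` is a made-up helper:

```rust
// Without #[inline], a small non-generic helper defined in another crate
// is typically not inlined at call sites (absent LTO), which can be costly
// in a hot inner loop such as a factoring loop.
#[inline]
fn square_mod(a: u64, n: u64) -> u64 {
    // Widen to u128 so a * a cannot overflow before the reduction mod n.
    ((a as u128 * a as u128) % (n as u128)) as u64
}
```

`#[inline(always)]` is the stronger variant, but the plain hint is usually enough to make the function eligible for cross-crate inlining.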
Looks like
OK, pulled out the
OK, retesting this using the fantastic
If someone else wants to run the same measurement, I'd be interested; just run
@nbraud I ran your script. For reference, here are the specs of the server I ran it on, and here are the commands I ran:

```shell
git clone https://github.com/nbraud/coreutils.git
cd coreutils/
git checkout factor/faster/flat-vec
cd src/uu/factor/
cargo install hyperfine
wget https://gist.githubusercontent.com/nbraud/fd1fea38f89a7b436fcd10d755040fad/raw/b767b6c00103a5de1ce79dcf3ba44431f01f9d3f/bench.sh -qO - | nohup time bash -s &
```
Oops, yes,
Interesting; I would have expected the measurements to have much lower noise than on my system, but the opposite happened. Is this a shared system, with other workloads running concurrently? In any case, this should be good for review and merge. :)
Yes, it is a shared system, but I ran your script overnight, so nothing else was running. I think part of the issue is that the runtime for each command is too short. I would try 2..10⁸, or maybe try going backwards from 2⁶⁴−1, since larger numbers obviously take longer to factor, such as (2⁶⁴−1) − 10⁶ .. 2⁶⁴−1 (see the script I shared here).
'k, thanks for the info.
~15 seconds should be long enough to get a decent measurement. I would blame the randomised nature of the algorithms, except that I'm getting an order of magnitude less noise (and ~10% faster runs) on my laptop, despite sharing 2 physical cores with a bunch of rather noisy applications (browser and such) and suffering from thermal effects, so I'm somewhat stumped as to why you'd be getting less reliable measurements on that system. In any case, I'm currently assembling a new workstation, so I should be able to get much more precise measurements in the future (little to no thermal effects, and I can much more easily set aside one physical core for benchmarks).
Yeah, but I'm currently still trying to hunt down all the big inefficiencies (GNU factor is currently ~4× faster, even on small integers), whereas for larger numbers I expect the factor-finding algorithm to matter a lot more. I confirmed that in a branch where I implemented Hart's “One-Line Factoring Algorithm”, which seems to be a large speedup (compared to the current implementation) on larger numbers, but did not make any meaningful impact on the performance on small integers (again, on the range 2..10⁷). The weasel wording (“seems”) is there because I don't yet have the kind of benchmarking and data-analysis setup I'd need to test that hypothesis with higher confidence; the longer-term plan is to set that up, then implement optimisations for “large” 64-bit numbers, like Barrett reduction.
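For readers unfamiliar with it, Hart's method searches for a multiple n·i whose ceiling square root s satisfies s² ≡ t² (mod n); then gcd(n, s − t) divides n nontrivially. A minimal sketch of the idea, not the branch's actual code (all names here are mine):

```rust
/// Integer square root (floor) by Newton's method.
fn isqrt(n: u128) -> u128 {
    if n < 2 {
        return n;
    }
    let mut x = n;
    let mut y = (x + 1) / 2;
    while y < x {
        x = y;
        y = (x + n / x) / 2;
    }
    x
}

fn gcd(mut a: u128, mut b: u128) -> u128 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Returns a nontrivial factor of `n`; assumes `n` is odd, composite,
/// and not a perfect square (the usual preconditions for Hart's method).
fn hart_olf(n: u64) -> u64 {
    let n = n as u128;
    for i in 1u128.. {
        // s = ceil(sqrt(n * i)); u128 keeps n * i from overflowing.
        let ni = n * i;
        let mut s = isqrt(ni);
        if s * s < ni {
            s += 1;
        }
        // If s² mod n is itself a perfect square t², we found a congruence
        // of squares, and gcd(n, s - t) yields a factor.
        let m = (s * s) % n;
        let t = isqrt(m);
        if t * t == m {
            let f = gcd(n, s - t);
            if f != 1 && f != n {
                return f as u64;
            }
        }
    }
    unreachable!()
}
```

The appeal for this use case is that the per-iteration cost is tiny (one square root, one modular reduction), which matters more than asymptotics for 64-bit inputs.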
That server is better for testing multithreaded programs, since it has 30 cores. I ran your script again; for comparison, here is the output.
Rebased to fix the conflict.
Unrelated test failure.
This way, we can easily replace `u8` with a larger type when moving to support larger integers.
The new type can be used to represent in-progress factorisations, which contain non-prime factors.
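The shape of such a flat-vector type might look like the following sketch (names and details assumed, not the PR's exact code):

```rust
// A (factor, exponent) flat-vector factorisation type. Exponents fit in a
// u8 because a u64 has at most 63 prime factors counted with multiplicity;
// replacing u8 with a wider type is what would allow larger integers.
// Non-prime factors can be stored too, representing an in-progress run.
struct Decomposition(Vec<(u64, u8)>);

impl Decomposition {
    fn new() -> Self {
        Decomposition(Vec::new())
    }

    /// Record one occurrence of `factor`, merging with an existing entry.
    /// A linear scan is fine: the Vec holds at most a handful of entries.
    fn add(&mut self, factor: u64) {
        if let Some(i) = self.0.iter().position(|&(f, _)| f == factor) {
            self.0[i].1 += 1;
        } else {
            self.0.push((factor, 1));
        }
    }

    /// Multiply the factorisation back out, e.g. to sanity-check it.
    fn product(&self) -> u64 {
        self.0.iter().map(|&(f, e)| f.pow(e as u32)).product()
    }
}
```

With so few entries, the linear scan over a contiguous Vec tends to beat a `BTreeMap`'s pointer-chasing, which is consistent with the speedups quoted below.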
~18% faster than `BTreeMap`, and ~5% faster than “master”.
~2.9% faster than the previous commit, ~11% faster than “master” overall.
Test failure should be fixed by #1586.
Sounds great, sorry for the latency!
- `Decomposition` struct
- `smallvec`
- `fmt` implementation of `Factors`

This set of changes is extracted from #1571. Approx. 12% faster than master.