factor: Refactor and optimise the Factors data structure #1572
Conversation
There seems to be a very noticeable regression in dc83bed, but I cannot quite tell whether this is a measurement error or something else (maybe this needs some inlining hints or some such?)
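For context on the inlining guess: in Rust, non-generic functions are not inlined across crate boundaries unless marked, so a missing hint can plausibly explain a regression like this. A generic illustration, not code from this PR; `square_mod` is a made-up helper:

```rust
// Without #[inline], a small non-generic helper defined in another crate
// is typically not inlined at call sites (absent LTO), which can be costly
// in a hot inner loop such as a factoring loop.
#[inline]
fn square_mod(a: u64, n: u64) -> u64 {
    // Widen to u128 so a * a cannot overflow before the reduction mod n.
    ((a as u128 * a as u128) % (n as u128)) as u64
}
```

`#[inline(always)]` is the stronger variant, but the plain hint is usually enough to make the function eligible for cross-crate inlining.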
Looks like
OK, pulled out the
OK, retesting this using the fantastic
If someone else wants to run the same measurement, I'd be interested; just run
@nbraud I ran your script. For reference, here are the specs of the server I ran it on, and here are the commands I ran:

```shell
git clone https://github.com/nbraud/coreutils.git
cd coreutils/
git checkout factor/faster/flat-vec
cd src/uu/factor/
cargo install hyperfine
wget https://gist.githubusercontent.com/nbraud/fd1fea38f89a7b436fcd10d755040fad/raw/b767b6c00103a5de1ce79dcf3ba44431f01f9d3f/bench.sh -qO - | nohup time bash -s &
```
Oops, yes,
Interesting; I would have expected the measurements to have much lower noise than on my system, but the opposite happened. Is this a shared system, with other workloads running concurrently? In any case, this should be good for review and merge. :)
Yes, it is a shared system, but I ran your script overnight, so nothing else was running. I think part of the issue is that the runtime for each command is too short. I would try 2..10⁸, or maybe try going backwards from 2⁶⁴−1, since larger numbers obviously take longer to factor, such as (2⁶⁴−1) − 10⁶ .. 2⁶⁴−1 (see the script I shared here).
'k, thanks for the info.
~15 seconds should be long enough to get a decent measurement. I would blame the randomised nature of the algorithms, except that I'm getting an order of magnitude less noise (and ~10% faster runs) on my laptop, despite sharing 2 physical cores with a bunch of rather noisy applications (browser and such) and suffering from thermal effects, so I'm somewhat stumped as to why you'd be getting less reliable measurements on that system. In any case, I'm currently assembling a new workstation, so I should be able to get much more precise measurements in the future (little to no thermal effects, and I can much more easily set aside one physical core for benchmarks).
Yeah, but I'm currently still trying to hunt down all the big inefficiencies (GNU factor is currently ~4× faster, even on small integers), whereas for larger numbers I expect the factor-finding algorithm to matter a lot more. I confirmed that in a branch where I implemented Hart's “One-Line Factoring Algorithm”, which seems to be a large speedup (compared to the current implementation) on larger numbers, but did not make any meaningful impact on the performance on small integers (again, on the range 2..10⁷). The weasel wording (“seems”) is there because I don't yet have the kind of benchmarking and data-analysis setup I'd need to test that hypothesis with higher confidence; the longer-term plan is to set that up, then implement optimisations for “large” 64-bit numbers, like Barrett reduction.
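For readers unfamiliar with it, Hart's method searches for a multiple n·i whose ceiling square root s satisfies s² ≡ t² (mod n); then gcd(n, s − t) divides n nontrivially. A minimal sketch of the idea, not the branch's actual code (all names here are mine):

```rust
/// Integer square root (floor) by Newton's method.
fn isqrt(n: u128) -> u128 {
    if n < 2 {
        return n;
    }
    let mut x = n;
    let mut y = (x + 1) / 2;
    while y < x {
        x = y;
        y = (x + n / x) / 2;
    }
    x
}

fn gcd(mut a: u128, mut b: u128) -> u128 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Returns a nontrivial factor of `n`; assumes `n` is odd, composite,
/// and not a perfect square (the usual preconditions for Hart's method).
fn hart_olf(n: u64) -> u64 {
    let n = n as u128;
    for i in 1u128.. {
        // s = ceil(sqrt(n * i)); u128 keeps n * i from overflowing.
        let ni = n * i;
        let mut s = isqrt(ni);
        if s * s < ni {
            s += 1;
        }
        // If s² mod n is itself a perfect square t², we found a congruence
        // of squares, and gcd(n, s - t) yields a factor.
        let m = (s * s) % n;
        let t = isqrt(m);
        if t * t == m {
            let f = gcd(n, s - t);
            if f != 1 && f != n {
                return f as u64;
            }
        }
    }
    unreachable!()
}
```

The appeal for this use case is that the per-iteration cost is tiny (one square root, one modular reduction), which matters more than asymptotics for 64-bit inputs.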
That server is better for testing multithreaded programs, since it has 30 cores. I ran your script again; for comparison, here is the output.
Rebased to fix the conflict.
Unrelated test failure.
This way, we can easily replace `u8` with a larger type when moving to support larger integers.
The new type can be used to represent in-progress factorisations, which contain non-prime factors.
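The shape of such a flat-vector type might look like the following sketch (names and details assumed, not the PR's exact code):

```rust
// A (factor, exponent) flat-vector factorisation type. Exponents fit in a
// u8 because a u64 has at most 63 prime factors counted with multiplicity;
// replacing u8 with a wider type is what would allow larger integers.
// Non-prime factors can be stored too, representing an in-progress run.
struct Decomposition(Vec<(u64, u8)>);

impl Decomposition {
    fn new() -> Self {
        Decomposition(Vec::new())
    }

    /// Record one occurrence of `factor`, merging with an existing entry.
    /// A linear scan is fine: the Vec holds at most a handful of entries.
    fn add(&mut self, factor: u64) {
        if let Some(i) = self.0.iter().position(|&(f, _)| f == factor) {
            self.0[i].1 += 1;
        } else {
            self.0.push((factor, 1));
        }
    }

    /// Multiply the factorisation back out, e.g. to sanity-check it.
    fn product(&self) -> u64 {
        self.0.iter().map(|&(f, e)| f.pow(e as u32)).product()
    }
}
```

With so few entries, the linear scan over a contiguous Vec tends to beat a `BTreeMap`'s pointer-chasing, which is consistent with the speedups quoted below.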
~18% faster than `BTreeMap`, and ~5% faster than “master”.
~2.9% faster than the previous commit, ~11% faster than “master” overall.
Test failure should be fixed by #1586.
Sounds great, sorry for the latency!
- `Decomposition` struct
- `smallvec`
- `fmt` implementation of `Factors`

This set of changes is extracted from #1571. Approx. 12% faster than master.