-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
factor::table: Implement a batched version w/ improved performance #2143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
c68c83c
factor::table: Take mutable refs
nicoonoclaste cd04742
factor::table: Add chunked implementation and microbenchmarks
nicoonoclaste 1fd5f9d
factor::table::factor_chunk: Turn loop inside-out
nicoonoclaste 7c28754
factor::table: Fixup microbenchmark
nicoonoclaste 12efaa6
factor: Add BENCHMARKING.md
nicoonoclaste ae15bf1
factor::benches::table: Report throughput (in numbers/s)
nicoonoclaste e9f8194
factor::benchmarking(doc): Add guidance on running µbenches
nicoonoclaste 1d75f09
factor::benchmarking(doc): Add guidance on writing µbenches
nicoonoclaste ddfcd2e
factor::benchmarking: Add wishlist / planned work
nicoonoclaste 7c649bc
factor::benches: Add check against ASLR
nicoonoclaste 1cd001f
factor::benches::table: Match BenchmarkId w/ criterion's conventions
nicoonoclaste 00322b9
factor: Move benchmarks out-of-crate
nicoonoclaste 9afed1f
Update Cargo.lock
nicoonoclaste 998b3c1
factor: Make random Factors instance generatable for tests
nicoonoclaste a0a103b
factor::table::chunked: Add test (equivalent to the single-number ver…
nicoonoclaste File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,116 @@ | ||
| # Benchmarking `factor` | ||
|
|
||
| The benchmarks for `factor` are located under `tests/benches/factor` | ||
| and can be invoked with `cargo bench` in that directory. | ||
|
|
||
| They are located outside the `uu_factor` crate, as they do not comply | ||
| with the project's minimum supported Rust version, *i.e.* may require | ||
| a newer version of `rustc`. | ||
|
|
||
|
|
||
| ## Microbenchmarking deterministic functions | ||
|
|
||
| We currently use [`criterion`] to benchmark deterministic functions, | ||
| such as `gcd` and `table::factor`. | ||
|
|
||
| However, µbenchmarks are by nature unstable: not only are they specific to | ||
| the hardware, operating system version, etc., but they are noisy and affected | ||
| by other tasks on the system (browser, compile jobs, etc.), which can cause | ||
| `criterion` to report spurious performance improvements and regressions. | ||
|
|
||
| This can be mitigated by getting as close to [idealised conditions][lemire] | ||
| as possible: | ||
| - minimize the amount of computation and I/O running concurrently to the | ||
| benchmark, *i.e.* close your browser and IM clients, don't compile at the | ||
| same time, etc. ; | ||
| - ensure the CPU's [frequency stays constant] during the benchmark ; | ||
| - [isolate a **physical** core], set it to `nohz_full`, and pin the benchmark | ||
| to it, so it won't be preempted in the middle of a measurement ; | ||
| - disable ASLR by running `setarch -R cargo bench`, so we can compare results | ||
| across multiple executions. | ||
|
|
||
|
|
||
| [`criterion`]: https://bheisler.github.io/criterion.rs/book/index.html | ||
| [lemire]: https://lemire.me/blog/2018/01/16/microbenchmarking-calls-for-idealized-conditions/ | ||
| [isolate a **physical** core]: https://pyperf.readthedocs.io/en/latest/system.html#isolate-cpus-on-linux | ||
| [frequency stays constant]: XXXTODO | ||
|
|
||
|
|
||
| ### Guidance for designing µbenchmarks | ||
|
|
||
| *Note:* this guidance is specific to `factor` and takes its application domain | ||
| into account; do not expect it to generalise to other projects. It is based | ||
| on Daniel Lemire's [*Microbenchmarking calls for idealized conditions*][lemire], | ||
| which I recommend reading if you want to add benchmarks to `factor`. | ||
|
|
||
| 1. Select a small, self-contained, deterministic component | ||
| `gcd` and `table::factor` are good example of such: | ||
| - no I/O or access to external data structures ; | ||
| - no call into other components ; | ||
| - behaviour is deterministic: no RNG, no concurrency, ... ; | ||
| - the test's body is *fast* (~100ns for `gcd`, ~10µs for `factor::table`), | ||
| so each sample takes a very short time, minimizing variability and | ||
| maximizing the numbers of samples we can take in a given time. | ||
|
|
||
| 2. Benchmarks are immutable (once merged in `uutils`) | ||
| Modifying a benchmark means previously-collected values cannot meaningfully | ||
| be compared, silently giving nonsensical results. If you must modify an | ||
| existing benchmark, rename it. | ||
|
|
||
| 3. Test common cases | ||
| We are interested in overall performance, rather than specific edge-cases; | ||
| use **reproducibly-randomised inputs**, sampling from either all possible | ||
| input values or some subset of interest. | ||
|
|
||
| 4. Use [`criterion`], `criterion::black_box`, ... | ||
| `criterion` isn't perfect, but it is also much better than ad-hoc | ||
| solutions in each benchmark. | ||
|
|
||
|
|
||
| ## Wishlist | ||
|
|
||
| ### Configurable statistical estimators | ||
|
|
||
| `criterion` always uses the arithmetic average as estimator; in µbenchmarks, | ||
| where the code under test is fully deterministic and the measurements are | ||
| subject to additive, positive noise, [the minimum is more appropriate][lemire]. | ||
|
|
||
|
|
||
| ### CI & reproducible performance testing | ||
|
|
||
| Measuring performance on real hardware is important, as it relates directly | ||
| to what users of `factor` experience; however, such measurements are subject | ||
| to the constraints of the real-world, and aren't perfectly reproducible. | ||
| Moreover, the mitigations for it (described above) aren't achievable in | ||
| virtualized, multi-tenant environments such as CI. | ||
|
|
||
| Instead, we could run the µbenchmarks in a simulated CPU with [`cachegrind`], | ||
| measure execution “time” in that model (in CI), and use it to detect and report | ||
| performance improvements and regressions. | ||
|
|
||
| [`iai`] is an implementation of this idea for Rust. | ||
|
|
||
| [`cachegrind`]: https://www.valgrind.org/docs/manual/cg-manual.html | ||
| [`iai`]: https://bheisler.github.io/criterion.rs/book/iai/iai.html | ||
|
|
||
|
|
||
| ### Comparing randomised implementations across multiple inputs | ||
|
|
||
| `factor` is a challenging target for system benchmarks as it combines two | ||
| characteristics: | ||
|
|
||
| 1. integer factoring algorithms are randomised, with large variance in | ||
| execution time ; | ||
|
|
||
| 2. various inputs also have large differences in factoring time, that | ||
| corresponds to no natural, linear ordering of the inputs. | ||
|
|
||
|
|
||
| If (1) was untrue (i.e. if execution time wasn't random), we could faithfully | ||
| compare 2 implementations (2 successive versions, or `uutils` and GNU) using | ||
| a scatter plot, where each axis corresponds to the perf. of one implementation. | ||
|
|
||
| Similarly, without (2) we could plot numbers on the X axis and their factoring | ||
| time on the Y axis, using multiple lines for various quantiles. The large | ||
| differences in factoring times for successive numbers, mean that such a plot | ||
| would be unreadable. | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sylvestre Do you agree it would make sense to use something like
iaiin factor's CI, and automatically report significant performance changes on the PR's discussion?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not? Are you aware of any success story with iai ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, but the tool is very new. I seem to recall (non-Rust) projects having decent success using
cachegrindin CI that way, though.