This repository was archived by the owner on Jun 16, 2020. It is now read-only.

CI improvements; re-enable criterion comparison #204

Closed
silvanshade wants to merge 11 commits into bytecodealliance:master from silvanshade:ci-improvements

Conversation

@silvanshade

This PR addresses issue #189.

I spent some time investigating the slow criterion benchmark runs, and there does indeed seem to be something odd going on when running them through GitHub Actions CI.

I noticed that even when the benchmark was made trivial (example below), the benchmark job would still never get past the initial warmup phase, even after more than 20 minutes. It just sits there and reports no progress in the console view.

use criterion::{criterion_group, criterion_main, Criterion};

fn noop() {}

fn bench_noop(c: &mut Criterion) {
    c.bench_function("noop", |b| b.iter(|| noop()));
}

criterion_group!(benches, bench_noop);
criterion_main!(benches);

As a point of reference, compiling the project in release mode for benchmarking with cargo bench --no-run takes only about 2-3 minutes.

I couldn't find anyone else reporting this, but there aren't many projects using criterion on CI.

Rather than sink a bunch of time into figuring out why that wasn't working on GitHub (since there's not much to go on), I just re-enabled Travis for that particular job.

The benchmarks run on the files in benches/corpus, which is a random subset I selected from the tests directory. Even on Travis, though, the benchmark run still takes a good chunk of time (around 20 minutes) with only a handful of files in the corpus. Some thought should go into tuning that selection and perhaps adjusting the test parameters.

As for the other jobs (formatting, docs, tests), I split them out of the test script into separate GitHub Actions jobs so that they can run in parallel and take advantage of pre-rolled actions from the actions-rs project.

@yurydelendik
Collaborator

/cc @fitzgen

@fitzgen
Member

fitzgen commented Mar 6, 2020

First off: thanks for digging into this @darinmorrison! I know CI stuff is not always the most glamorous of contributions, and I appreciate your efforts.

IIRC, we were avoiding actions-rs in other bytecodealliance repos because of the potential for supply chain attacks. I'm not sure if this was specific to wasmtime and its dist binaries or not, and whether the same concerns apply here. cc @tschneidereit?

The other thing is that we've generally moved everything away from Travis, and I'm not sure it's worth maintaining two CI configurations just to keep the benchmarks running in CI. I definitely believe we should keep building/checking the benchmarks, but I'm not convinced that actually running them is useful. The results are going to be super noisy on CI machines. So perhaps the best thing is just to add a cargo check --benches job to GitHub Actions? Or do you find benchmark results in CI pretty useful, @yurydelendik?
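The check-only approach suggested here amounts to two commands (a sketch; the second command is optional and uses the --no-run flag mentioned earlier in the thread):

```shell
# Type-check and build the benchmark targets without executing them;
# this is what a check-only CI job would run instead of the full
# criterion suite.
cargo check --benches

# Optionally also verify the benchmarks link in release mode,
# without running any iterations:
cargo bench --no-run
```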

@silvanshade
Author

> IIRC, we were avoiding actions-rs in other bytecodealliance repos because of the potential for supply chain attacks. I'm not sure if this was specific to wasmtime and its dist binaries or not, and whether the same concerns apply here. cc @tschneidereit?

This aspect of github actions is something I've had concerns about too.

In this case, as long as the actions are not used for automated commits or for generating the binary artifacts, the risk is probably minimal, especially if the actions are pinned to their release revisions.

There are some advantages to the pre-rolled actions beyond convenience: they provide better formatting of output from cargo and rustc, and a smoother, more efficient install of the toolchain (particularly on non-Linux platforms, if you decide to test on those).

But I'd be happy to switch back to using the manual commands if preferred.

> The other thing is that we've generally moved everything away from Travis, and I'm not sure that it is worth maintaining two CI configurations just to keep the benchmarks running in CI. I definitely believe that we should keep building/checking the benchmarks, but actually running them I'm not convinced is useful. The results are going to be super noisy on CI machines.

Fair enough. To be honest, I'm also not sure that running criterion on CI is going to yield great results given the experience I had trying to get this to work.

The main concern I have is that it takes a very long time to run those benchmarks even when they do make progress (on travis). As I mentioned, it took around 20 minutes to generate the results with only a handful of files. Locally, the same benchmark run only took a minute or two.

So there's a scalability issue with running the benchmarks on CI. Maybe the test parameters can be tuned for fewer iterations to reduce that effect, but then the results will be noisier.

Would it be enough to just generate new comparison results as part of a release process script? Minor performance regressions at the per-commit level might be missed that way, but you could remind contributors to check for that with a PR template.

Another option could be to run the benchmarks as part of a build script, with the results committed to the repo. That way, critcmp could still be run on CI.
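A sketch of what that critcmp comparison could look like, using criterion's --save-baseline flag (the baseline names "main" and "feature" are illustrative, not from this PR):

```shell
# On the base branch (or during a release), record a baseline.
# Criterion stores it under target/criterion/, which could then be
# committed to the repo as proposed above.
cargo bench -- --save-baseline main

# On the feature branch, record a second baseline:
cargo bench -- --save-baseline feature

# Compare the two baselines; critcmp prints a side-by-side table
# of timings, so only this cheap step needs to run on CI:
critcmp main feature
```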
