This repository was archived by the owner on Jun 16, 2020. It is now read-only.

CI improvements; re-enable criterion comparison #204

Closed
silvanshade wants to merge 11 commits into bytecodealliance:master from silvanshade:ci-improvements

Conversation

@silvanshade

This PR addresses issue #189.

I spent some time investigating the slow criterion benchmark runs, and there does indeed seem to be something odd going on when running them through GitHub Actions CI.

I noticed that even when the benchmark was made trivial (example below), the benchmark job would still never get past the initial warmup phase, even after more than 20 minutes. It just sits there and reports no progress in the console view.

use criterion::{criterion_group, criterion_main, Criterion};

fn noop() {}

fn bench_noop(c: &mut Criterion) {
    c.bench_function("noop", |b| b.iter(|| noop()));
}

criterion_group!(benches, bench_noop);
criterion_main!(benches);

As a point of reference, compiling the project in release mode for benchmarking with cargo bench --no-run takes only about 2-3 minutes.

I couldn't find anyone else reporting this, but there aren't many projects using criterion on CI.

Rather than sink a bunch of time into figuring out why that wasn't working on GitHub (since there's not much to go on), I just re-enabled Travis for that particular job.

The benchmarks run on the files in benches/corpus, which is a random subset I selected from the tests directory. Even on Travis, though, the benchmark run still takes a good chunk of time (around 20 minutes) with only a handful of files in the corpus. Some thought should go into tuning that selection and perhaps adjusting the test parameters.

As for the other jobs (formatting, docs, tests), I split them out of the test script into separate GitHub Actions jobs so that they can run in parallel and take advantage of pre-rolled actions from the actions-rs project.

@yurydelendik
Collaborator

/cc @fitzgen

@fitzgen
Member

fitzgen commented Mar 6, 2020

First off: thanks for digging into this @darinmorrison! I know CI stuff is not always the most glamorous of contributions, and I appreciate your efforts.

IIRC, we were avoiding actions-rs in other bytecodealliance repos because of the potential for supply chain attacks. I'm not sure if this was specific to wasmtime and its dist binaries or not, and whether the same concerns apply here. cc @tschneidereit?

The other thing is that we've generally moved everything away from Travis, and I'm not sure it's worth maintaining two CI configurations just to keep the benchmarks running in CI. I definitely believe we should keep building/checking the benchmarks, but I'm not convinced that actually running them is useful. The results are going to be super noisy on CI machines. So perhaps the best thing is just to add a cargo check --benches job to GitHub Actions? Or do you find benchmark results in CI pretty useful, @yurydelendik?
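The check-only approach suggested here amounts to two commands (a sketch; the second command is optional and uses the --no-run flag mentioned earlier in the thread):

```shell
# Type-check and build the benchmark targets without executing them;
# this is what a check-only CI job would run instead of the full
# criterion suite.
cargo check --benches

# Optionally also verify the benchmarks link in release mode,
# without running any iterations:
cargo bench --no-run
```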

@silvanshade
Author

> IIRC, we were avoiding actions-rs in other bytecodealliance repos because of the potential for supply chain attacks. I'm not sure if this was specific to wasmtime and its dist binaries or not, and whether the same concerns apply here. cc @tschneidereit?

This aspect of github actions is something I've had concerns about too.

In this case, as long as the actions are not used for automated commits or for generating the binary artifacts, the risk is probably minimal, especially if the actions are pinned to their release revisions.

There are some advantages to the pre-rolled actions beyond convenience: they provide better formatting of output from cargo and rustc, and a smoother, more efficient install of the toolchain (particularly on non-Linux platforms, if you decide to test on those).

But I'd be happy to switch back to using the manual commands if preferred.

> The other thing is that we've generally moved everything away from Travis, and I'm not sure that it is worth maintaining two CI configurations just to keep the benchmarks running in CI. I definitely believe that we should keep building/checking the benchmarks, but actually running them I'm not convinced is useful. The results are going to be super noisy on CI machines.

Fair enough. To be honest, I'm also not sure that running criterion on CI is going to yield great results given the experience I had trying to get this to work.

The main concern I have is that it takes a very long time to run those benchmarks even when they do make progress (on travis). As I mentioned, it took around 20 minutes to generate the results with only a handful of files. Locally, the same benchmark run only took a minute or two.

So there's a scalability issue with running the benchmarks on CI. Maybe the test parameters can be tuned for fewer iterations to reduce that effect, but then the results will be noisier.

Would it be enough to just generate new comparison results as part of a release process script? Minor performance regressions at the per-commit level might be missed that way, but you could remind contributors to check for that with a PR template.

Another option could be to run the benchmarks as part of a build script, with the results committed to the repo. That way, critcmp could still be run on CI.
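A sketch of what that critcmp comparison could look like, using criterion's --save-baseline flag (the baseline names "main" and "feature" are illustrative, not from this PR):

```shell
# On the base branch (or during a release), record a baseline.
# Criterion stores it under target/criterion/, which could then be
# committed to the repo as proposed above.
cargo bench -- --save-baseline main

# On the feature branch, record a second baseline:
cargo bench -- --save-baseline feature

# Compare the two baselines; critcmp prints a side-by-side table
# of timings, so only this cheap step needs to run on CI:
critcmp main feature
```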
