CI improvements; re-enable criterion comparison #204
silvanshade wants to merge 11 commits into bytecodealliance:master
Conversation
/cc @fitzgen
First off: thanks for digging into this @darinmorrison! I know CI work isn't always the most glamorous of contributions, and I appreciate your efforts.

IIRC, we were avoiding actions-rs in other bytecodealliance repos because of the potential for supply chain attacks. I'm not sure whether that concern was specific to wasmtime and its dist binaries, or whether it applies here as well. cc @tschneidereit?

The other thing is that we've generally moved everything away from Travis, and I'm not sure it's worth maintaining two CI configurations just to keep the benchmarks running in CI. I definitely believe we should keep building/checking the benchmarks, but I'm not convinced that actually running them is useful: the results are going to be super noisy on CI machines. So perhaps the best thing to do is just enable a build-only job for the benchmarks.
This aspect of GitHub Actions is something I've had concerns about too. In this case, as long as the actions are not used for automated commits or for generating the binary artifacts, the risk is probably minimal, especially if the actions are pinned to their release revisions. There are some advantages to using the pre-rolled actions beyond just convenience, since they enable better formatting of output from the cargo commands. But I'd be happy to switch back to using the manual commands if preferred.
Fair enough. To be honest, I'm also not sure that running criterion on CI is going to yield great results, given the experience I had trying to get this to work. The main concern is that the benchmarks take a very long time even when they do make progress (on Travis): as I mentioned, it took around 20 minutes to generate results for only a handful of files, whereas the same benchmark run locally takes only a minute or two. So there's a scalability issue with running the benchmarks on CI. The test parameters could perhaps be tuned for fewer iterations to reduce that effect, but then the results will be noisier.

Would it be enough to just generate new comparison results as part of a release process script? Minor per-commit performance regressions might be missed that way, but a PR template could remind contributors to check for them. Another option could be to run the benchmarks as part of a build script, with the results committed to the repo. That way, up-to-date comparison results would still be available without running the benchmarks on CI.
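Tuning for fewer iterations would amount to overriding criterion's defaults. A sketch of what that might look like, assuming the criterion 0.3-era API (the benchmark name and body are hypothetical stand-ins, not the actual wasmparser benches):

```rust
use std::time::Duration;

use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical benchmark standing in for the real corpus benches.
fn parse_corpus(c: &mut Criterion) {
    c.bench_function("parse_corpus", |b| b.iter(|| { /* parse one corpus file */ }));
}

// Shrink the sample count, warmup, and measurement window so a CI run
// finishes quickly -- at the cost of noisier results.
fn ci_config() -> Criterion {
    Criterion::default()
        .sample_size(10) // the minimum criterion allows
        .warm_up_time(Duration::from_millis(500))
        .measurement_time(Duration::from_secs(1))
}

criterion_group! {
    name = benches;
    config = ci_config();
    targets = parse_corpus
}
criterion_main!(benches);
```

This is bench-harness configuration rather than a standalone program; it needs the `criterion` dev-dependency and a `[[bench]]` entry with `harness = false` in `Cargo.toml` to run.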
This PR addresses issue #189.
I spent some time investigating the slow criterion benchmark runs, and there indeed seems to be something weird going on with running them through GitHub Actions CI.
I noticed that even if the benchmark was made trivial (example below), the benchmark job would still never get beyond the initial warmup phase, even after more than 20 minutes. It just sits there and reports no progress in the console view.
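The original example was not preserved here; a trivial criterion benchmark of the kind described would look something like the following (the function name is a hypothetical placeholder):

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// A deliberately trivial benchmark: even something this small never
// made it past the warmup phase when run under GitHub Actions.
fn trivial(c: &mut Criterion) {
    c.bench_function("trivial", |b| b.iter(|| 1 + 1));
}

criterion_group!(benches, trivial);
criterion_main!(benches);
```

As with any criterion bench, this is a harness fragment: it requires the `criterion` dev-dependency and a `[[bench]]` target with `harness = false` to actually execute.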
As a point of reference, compiling the project in release mode for benchmarking with `cargo bench --no-run` only takes about 2-3 minutes. I couldn't find anyone else reporting this, but then there aren't many projects running criterion on CI.
Rather than sink a bunch of time into trying to figure out why that wasn't working on GitHub (since there's not much to go on), I just re-enabled Travis for that particular job.
The benchmarks run on the files in `benches/corpus`, which consists of a random subset that I selected from the tests directory. However, even on Travis the benchmark run still takes a good chunk of time (around 20 min) with only a handful of files in the corpus. Some thought should go into tuning that selection and perhaps modifying the test parameters.

As for the other jobs (formatting, docs, tests), I split them out from the test script into separate GitHub Actions jobs so that they can run in parallel and also take advantage of the pre-rolled actions from the actions-rs project.