Add 'benchcomp visualize' & error on regression #2348
Merged
karkhaz merged 1 commit into model-checking:main on Apr 7, 2023
Conversation
Force-pushed afe4f61 to 778cecc
Force-pushed 57ef7d6 to 3b93138
qinheping (Contributor) reviewed on Apr 6, 2023:
Is it possible to report the name of the regressed benchmark in the error message?
Also, it would be great to have a test that contains metrics for more than one benchmark.
Force-pushed 3b93138 to 91e468c
Contributor (Author):
Done both, thank you! I'm printing the benchmark name out as a warning to stderr. Note that this visualization is meant to be fairly minimal; it's not supposed to produce fancy output. The idea is that we'd implement richer visualizations (Markdown output for GitHub Actions, HTML reports, etc.) when we need them. But I do think a warning message to the terminal is a good idea here.
Force-pushed 91e468c to 9f9e537
This commit adds an implementation for the `benchcomp visualize`
command. Currently, there is one visualization, "error_on_regression",
which causes `benchcomp` or `benchcomp visualize` to terminate with a
return code of 1 if there was a regression in any of the metrics.
Users can specify the following in their config file:

```yaml
visualize:
- type: error_on_regression
  variant_pairs:
  - [variant_1, variant_2]
  - [variant_1, variant_3]
  checks:
  - metric: runtime
    test: "lambda old, new: new / old > 1.1"
  - metric: passed
    test: "lambda old, new: False if not old else not new"
```
This says to check whether any benchmark regressed when run under
variant_2 compared to variant_1. A benchmark is considered to have
regressed if the value of the 'runtime' metric under variant_2 is more
than 10% higher than the value under variant_1. Furthermore, the benchmark is
also considered to have regressed if it was previously passing, but is
now failing. These same checks are performed on all benchmarks run under
variant_3 compared to variant_1. If any of those lambda functions
returns True, then benchcomp will terminate with a return code of 1.
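The checks above can be sketched in Python. This is a hypothetical illustration of how `error_on_regression` might evaluate the lambda strings, not benchcomp's actual implementation; the `any_regressions` function and the shape of the `metrics` dict are assumptions made for this example.

```python
import sys

def any_regressions(metrics, old_variant, new_variant, checks):
    """Return True if any benchmark fails any check for this variant pair.

    Hypothetical sketch: metrics maps benchmark name -> variant name ->
    metric name -> value. Each check's "test" string is evaluated into a
    two-argument predicate, as in the config above.
    """
    regressed = False
    for check in checks:
        test = eval(check["test"])  # e.g. "lambda old, new: new / old > 1.1"
        metric = check["metric"]
        for bench, per_variant in metrics.items():
            old = per_variant[old_variant][metric]
            new = per_variant[new_variant][metric]
            if test(old, new):
                # Warn on stderr, as described in the review discussion
                print(f"warning: '{bench}' regressed on '{metric}'",
                      file=sys.stderr)
                regressed = True
    return regressed

checks = [
    {"metric": "runtime", "test": "lambda old, new: new / old > 1.1"},
    {"metric": "passed",  "test": "lambda old, new: False if not old else not new"},
]
metrics = {
    "bench_1": {"variant_1": {"runtime": 10.0, "passed": True},
                "variant_2": {"runtime": 12.0, "passed": True}},   # 20% slower
    "bench_2": {"variant_1": {"runtime": 10.0, "passed": True},
                "variant_2": {"runtime": 10.2, "passed": False}},  # now failing
}
exit_code = 1 if any_regressions(metrics, "variant_1", "variant_2", checks) else 0
```

Here bench_1 trips the runtime check (a 20% slowdown exceeds the 10% threshold) and bench_2 trips the passed check (it passed under variant_1 but not variant_2), so the sketch would exit with code 1.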
This commit fixes model-checking#2338.
Force-pushed 9f9e537 to 4b5223e
qinheping approved these changes on Apr 6, 2023
Testing:
How is this change tested? Two new regression tests
Is this a refactor change? No
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.