Add 'benchcomp visualize' & error on regression#2348

Merged
karkhaz merged 1 commit into model-checking:main from karkhaz:kk-visualize-error-on-regression
Apr 7, 2023
Conversation

@karkhaz (Contributor) commented Apr 4, 2023

This commit adds an implementation for the `benchcomp visualize`
command. Currently, there is one visualization, "error_on_regression",
which causes `benchcomp` or `benchcomp visualize` to terminate with a
return code of 1 if there was a regression in any of the metrics.

Users can specify the following in their config file:

    visualize:
    - type: error_on_regression
      variant_pairs:
      - [variant_1, variant_2]
      - [variant_1, variant_3]
      checks:
      - metric: runtime
        test: "lambda old, new: new / old > 1.1"
      - metric: passed
        test: "lambda old, new: False if not old else not new"

This says to check whether any benchmark regressed when run under
variant_2 compared to variant_1. A benchmark is considered to have
regressed if the value of the 'runtime' metric under variant_2 is more
than 10% higher than the value under variant_1. The benchmark is also
considered to have regressed if it was previously passing but is now
failing. The same checks are performed on all benchmarks run under
variant_3 compared to variant_1. If any of those lambda functions
returns True, then benchcomp will terminate with a return code of 1.
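The check logic described above can be sketched in Python. This is an illustrative reconstruction, not benchcomp's actual code: the function name `any_regression` and the dict layout (benchmark name mapped to a metric dict) are assumptions made for the example.

```python
def any_regression(checks, old_results, new_results):
    """Return True if any benchmark regressed under any configured check.

    old_results / new_results map a benchmark name to a dict of its
    metrics, e.g. {"bench_1": {"runtime": 10.0, "passed": True}}.
    This layout is hypothetical; benchcomp's internal format may differ.
    """
    for check in checks:
        # The config stores each test as a lambda in a string, so
        # evaluate it to obtain a callable of (old, new).
        test = eval(check["test"])
        metric = check["metric"]
        for name, old_metrics in old_results.items():
            new_metrics = new_results.get(name)
            # Skip benchmarks that did not run under the second variant.
            if new_metrics is None or metric not in new_metrics:
                continue
            if test(old_metrics[metric], new_metrics[metric]):
                return True
    return False
```

With the config from the description, a runtime increase from 10.0 to 12.0 (a 20% slowdown) would count as a regression, while 10.0 to 10.5 would not; a benchmark flipping from passing to failing would also count.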

This commit fixes #2338.

Testing:

  • How is this change tested? Two new regression tests

  • Is this a refactor change? No

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@karkhaz force-pushed the kk-visualize-error-on-regression branch 2 times, most recently from afe4f61 to 778cecc on April 4, 2023 18:11
@karkhaz marked this pull request as ready for review April 4, 2023 18:14
@karkhaz requested a review from a team as a code owner April 4, 2023 18:14
@karkhaz marked this pull request as draft April 4, 2023 18:15
@karkhaz force-pushed the kk-visualize-error-on-regression branch 4 times, most recently from 57ef7d6 to 3b93138 on April 6, 2023 17:00
@karkhaz marked this pull request as ready for review April 6, 2023 17:01
@qinheping (Contributor) left a comment

Is it possible to report the name of the regressed benchmark in the error message?
Also, it would be great to have a test that covers metrics from more than one benchmark.

@karkhaz force-pushed the kk-visualize-error-on-regression branch from 3b93138 to 91e468c on April 6, 2023 23:25
@karkhaz (Contributor, Author) commented Apr 6, 2023

Done both, thank you! I'm printing it out as a warning to stderr.

Note that this visualization is supposed to be fairly minimal; it's not meant to produce fancy output. The idea is that we'd implement more visualizations for Markdown output (for GitHub Actions), HTML reports, etc. when we need them. But I do think a warning message to the terminal is a good idea here.
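A warning like the one described could be printed as follows. This is a hypothetical sketch: the function name `warn_regression` and the exact message wording are invented for illustration and are not benchcomp's actual output format.

```python
import sys

def warn_regression(benchmark, metric, old, new):
    # Hypothetical message format; benchcomp's actual wording may differ.
    # Printing to stderr keeps the warning out of any structured output
    # the tool might write to stdout.
    print(
        f"Warning: benchmark '{benchmark}' regressed on metric "
        f"'{metric}' ({old} -> {new})",
        file=sys.stderr)
```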

@karkhaz force-pushed the kk-visualize-error-on-regression branch from 91e468c to 9f9e537 on April 6, 2023 23:28
@karkhaz force-pushed the kk-visualize-error-on-regression branch from 9f9e537 to 4b5223e on April 6, 2023 23:31
@karkhaz enabled auto-merge (squash) April 6, 2023 23:31
@qinheping (Contributor) left a comment

Thank you @karkhaz!

@karkhaz merged commit 94a26d1 into model-checking:main Apr 7, 2023
@karkhaz deleted the kk-visualize-error-on-regression branch April 7, 2023 01:23


Development

Successfully merging this pull request may close these issues.

Implement error return on regression action