Add 'benchcomp visualize' & error on regression#2348

Merged
karkhaz merged 1 commit into model-checking:main from karkhaz:kk-visualize-error-on-regression
Apr 7, 2023
Conversation

@karkhaz (Contributor) commented Apr 4, 2023

This commit adds an implementation for the `benchcomp visualize`
command. Currently, there is one visualization, "error_on_regression",
which causes `benchcomp` or `benchcomp visualize` to terminate with a
return code of 1 if there was a regression in any of the metrics.

Users can specify the following in their config file:

    visualize:
    - type: error_on_regression
      variant_pairs:
      - [variant_1, variant_2]
      - [variant_1, variant_3]
      checks:
      - metric: runtime
        test: "lambda old, new: new / old > 1.1"
      - metric: passed
        test: "lambda old, new: False if not old else not new"

This says to check whether any benchmark regressed when run under
variant_2 compared to variant_1. A benchmark is considered to have
regressed if the value of the 'runtime' metric under variant_2 is more
than 10% higher than the value under variant_1. The benchmark is also
considered to have regressed if it was previously passing but is now
failing. The same checks are performed on all benchmarks run under
variant_3 compared to variant_1. If any of those lambda functions
returns True, then benchcomp will terminate with a return code of 1.
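The check logic described above can be sketched in Python. This is an illustrative reconstruction, not benchcomp's actual code: the function name `any_regression` and the dict layout (benchmark name mapped to a metric dict) are assumptions made for the example.

```python
def any_regression(checks, old_results, new_results):
    """Return True if any benchmark regressed under any configured check.

    old_results / new_results map a benchmark name to a dict of its
    metrics, e.g. {"bench_1": {"runtime": 10.0, "passed": True}}.
    This layout is hypothetical; benchcomp's internal format may differ.
    """
    for check in checks:
        # The config stores each test as a lambda in a string, so
        # evaluate it to obtain a callable of (old, new).
        test = eval(check["test"])
        metric = check["metric"]
        for name, old_metrics in old_results.items():
            new_metrics = new_results.get(name)
            # Skip benchmarks that did not run under the second variant.
            if new_metrics is None or metric not in new_metrics:
                continue
            if test(old_metrics[metric], new_metrics[metric]):
                return True
    return False
```

With the config from the description, a runtime increase from 10.0 to 12.0 (a 20% slowdown) would count as a regression, while 10.0 to 10.5 would not; a benchmark flipping from passing to failing would also count.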

This commit fixes #2338.

Testing:

  • How is this change tested? Two new regression tests

  • Is this a refactor change? No

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@karkhaz force-pushed the kk-visualize-error-on-regression branch 2 times, most recently from afe4f61 to 778cecc on April 4, 2023 18:11
@karkhaz marked this pull request as ready for review April 4, 2023 18:14
@karkhaz requested a review from a team as a code owner April 4, 2023 18:14
@karkhaz marked this pull request as draft April 4, 2023 18:15
@karkhaz force-pushed the kk-visualize-error-on-regression branch 4 times, most recently from 57ef7d6 to 3b93138 on April 6, 2023 17:00
@karkhaz marked this pull request as ready for review April 6, 2023 17:01
@qinheping (Contributor) left a comment

Is it possible to report the name of the regressed benchmark in the error message?
Also, it would be great to have a test that covers metrics from more than one benchmark.

@karkhaz force-pushed the kk-visualize-error-on-regression branch from 3b93138 to 91e468c on April 6, 2023 23:25
@karkhaz (Contributor, Author) commented Apr 6, 2023

Done both, thank you! I'm printing it out as a warning to stderr.

Note that this visualization is supposed to be fairly minimal; it's not meant to produce fancy output. The idea is that we'd implement more visualizations for Markdown output (for GitHub Actions), HTML reports, etc. when we need them. But I do think a warning message to the terminal is a good idea here.
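A warning like the one described could be printed as follows. This is a hypothetical sketch: the function name `warn_regression` and the exact message wording are invented for illustration and are not benchcomp's actual output format.

```python
import sys

def warn_regression(benchmark, metric, old, new):
    # Hypothetical message format; benchcomp's actual wording may differ.
    # Printing to stderr keeps the warning out of any structured output
    # the tool might write to stdout.
    print(
        f"Warning: benchmark '{benchmark}' regressed on metric "
        f"'{metric}' ({old} -> {new})",
        file=sys.stderr)
```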

@karkhaz force-pushed the kk-visualize-error-on-regression branch from 91e468c to 9f9e537 on April 6, 2023 23:28
@karkhaz force-pushed the kk-visualize-error-on-regression branch from 9f9e537 to 4b5223e on April 6, 2023 23:31
@karkhaz enabled auto-merge (squash) April 6, 2023 23:31
@qinheping (Contributor) left a comment

Thank you @karkhaz!

@karkhaz merged commit 94a26d1 into model-checking:main Apr 7, 2023
@karkhaz deleted the kk-visualize-error-on-regression branch April 7, 2023 01:23


Development

Successfully merging this pull request may close these issues.

Implement error return on regression action