CI for performance regressions #973

Merged
DavidEiglspergerQC merged 20 commits into main from feat/ci_regression
Mar 13, 2026
Conversation

Contributor

@DavidEiglspergerQC DavidEiglspergerQC commented Feb 10, 2026

This is a first implementation of automated CI regression detection. If you think this should be done differently (e.g. using asv), I am more than happy to adapt; this is just the approach I found most lightweight and easiest to implement given the current state of the updated benchmarking. The specific problems being benchmarked also need to be discussed, so that we cover the most representative ones. We might also consider using a dedicated machine instead of GitHub-hosted runners.

Runtime Regression CI

This PR adds a lightweight automated runtime regression check for glum.

  • Added a dedicated workflow: .github/workflows/runtime-regression.yml

    • runs on pull_request updates and pushes to main (we may need to adapt this; it was just the easiest setting for testing)
    • compares base vs head in one run
  • Reused existing benchmark tooling (no new harness):

    • glum_benchmarks/run_benchmarks.py
    • glum_benchmarks/compare_results.py
  • Added CI benchmark config: glum_benchmarks/config_ci.yaml

    • focused representative glum problems
    • iterations: 10, num_threads: 1
    • only benchmark + analysis steps enabled
    • thresholds:
      • max_rel_slowdown: 0.15
      • max_abs_slowdown_sec: 0.05
      • max_regressed_cases: 0
  • CI stability/noise controls:

    • warmup run (discarded), then measured run
    • separate cache per ref
    • fixed thread/hash env vars
    • CI uses trimmed_mean aggregation; default benchmark behavior remains min

If regression thresholds are exceeded, the workflow fails and publishes a summary/artifacts for inspection.
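To make the threshold semantics concrete, here is a hypothetical sketch of how the three thresholds could interact. The function and variable names are my own illustration, not the actual compare_results.py implementation, and the choice that a case must exceed both the relative and the absolute threshold to count as regressed is an assumption:

```python
# Hypothetical sketch of the regression check; names and the AND-combination
# of thresholds are illustrative, not the actual compare_results.py logic.

MAX_REL_SLOWDOWN = 0.15      # fail if head is >15% slower than base...
MAX_ABS_SLOWDOWN_SEC = 0.05  # ...and at least 50 ms slower in absolute terms
MAX_REGRESSED_CASES = 0      # no regressed benchmark cases allowed


def is_regression(base_sec: float, head_sec: float) -> bool:
    """A case counts as regressed only if it exceeds BOTH thresholds,
    so very fast benchmarks are not flagged by relative noise alone."""
    rel = (head_sec - base_sec) / base_sec
    abs_slowdown = head_sec - base_sec
    return rel > MAX_REL_SLOWDOWN and abs_slowdown > MAX_ABS_SLOWDOWN_SEC


def check(results: dict[str, tuple[float, float]]) -> bool:
    """results maps problem name -> (base_seconds, head_seconds).
    Returns True if CI should pass."""
    regressed = [name for name, (b, h) in results.items() if is_regression(b, h)]
    for name in regressed:
        b, h = results[name]
        print(f"REGRESSION {name}: {b:.3f}s -> {h:.3f}s (+{(h - b) / b:.0%})")
    return len(regressed) <= MAX_REGRESSED_CASES
```

Requiring both a relative and an absolute slowdown is a common way to avoid false positives on sub-millisecond benchmarks, where a 15% swing can be pure timer noise.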

@DavidEiglspergerQC changed the title from "very first CI integration implementation" to "CI integration implementation" Feb 10, 2026
@DavidEiglspergerQC changed the title from "CI integration implementation" to "CI for performance regressions" Feb 10, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a lightweight automated runtime regression detection system for the glum benchmarking pipeline, aimed at catching performance degradations in pull requests before they're merged.

Changes:

  • Added a new GitHub Actions workflow that runs performance benchmarks comparing the PR base against the head commit
  • Implemented a new comparison script that analyzes benchmark results and fails CI when regression thresholds are exceeded
  • Extended the runtime measurement utility to support configurable aggregation methods (min vs trimmed mean) to reduce CI noise
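The trimmed-mean choice is worth unpacking: it discards the extreme observations before averaging, which makes it more robust to CI noise than a plain mean while still using more of the samples than `min` does. A minimal sketch of the idea (the actual util.py implementation may differ):

```python
def trimmed_mean(samples: list[float], trim_frac: float = 0.1) -> float:
    """Mean of the samples after dropping the lowest and highest
    trim_frac fraction of observations from each end."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    k = int(len(ordered) * trim_frac)  # count trimmed from each end
    kept = ordered[k: len(ordered) - k] or ordered  # keep all if too few samples
    return sum(kept) / len(kept)
```

With the CI config's 10 iterations and a 10% trim, this drops exactly the fastest and slowest run, so a single OS-scheduling hiccup on a shared runner cannot dominate the aggregate. (scipy.stats.trim_mean provides the same statistic if SciPy is available.)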

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

Summary per file:

  • .github/workflows/runtime-regression.yml: new workflow that runs benchmarks for base and head commits in separate worktrees, then compares results
  • glum_benchmarks/config_ci.yaml: CI-specific benchmark configuration with reduced scope and regression thresholds
  • glum_benchmarks/compare_results.py: new script to compare base/head benchmark CSVs and detect regressions
  • glum_benchmarks/util.py: enhanced runtime measurement to support trimmed mean aggregation via environment variables
  • glum_benchmarks/run_benchmarks.py: added CLI arguments for config path and run name overrides, plus threshold metadata fields
  • pixi.toml: added rich dependency for prettier console output
  • pixi.lock: lock file updates for rich and its dependencies (markdown-it-py, mdurl)

Comment thread glum_benchmarks/util.py Outdated
Comment thread .github/workflows/runtime-regression.yml Outdated
Comment thread .github/workflows/runtime-regression.yml Outdated
Comment thread glum_benchmarks/compare_results.py Outdated
Comment thread .github/workflows/runtime-regression.yml Outdated
Comment thread glum_benchmarks/compare_results.py Outdated
@DavidEiglspergerQC DavidEiglspergerQC marked this pull request as ready for review February 17, 2026 16:32
@DavidEiglspergerQC DavidEiglspergerQC self-assigned this Mar 5, 2026
Base automatically changed from fix/update_benchmarks to main March 5, 2026 19:52
@DavidEiglspergerQC DavidEiglspergerQC linked an issue Mar 6, 2026 that may be closed by this pull request
Member


Overall this is good. Let's do one more round of integration with the existing systems.

Comment thread glum_benchmarks/compare_results.py Outdated
Comment thread glum_benchmarks/config_ci.yaml
Comment thread glum_benchmarks/compare_results.py
Comment thread glum_benchmarks/compare_results.py
Comment thread glum_benchmarks/compare_results.py Outdated
Member


Your solution to the config circular import is great. I tested a CI run with a regression and the output was good. I think this is ready to merge.

@DavidEiglspergerQC DavidEiglspergerQC merged commit e92aa88 into main Mar 13, 2026
25 checks passed
@DavidEiglspergerQC DavidEiglspergerQC deleted the feat/ci_regression branch March 13, 2026 08:29

Development

Successfully merging this pull request may close these issues.

Automatic performance benchmarks

3 participants